CN113160081A - Depth face image restoration method based on perception deblurring - Google Patents

Depth face image restoration method based on perception deblurring

Info

Publication number
CN113160081A
Authority
CN
China
Prior art keywords
layer
layers
picture
resolution
deblurring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110408739.6A
Other languages
Chinese (zh)
Inventor
赵汉理
刘影
黄辉
Current Assignee
Wenzhou University
Original Assignee
Wenzhou University
Priority date
Filing date
Publication date
Application filed by Wenzhou University filed Critical Wenzhou University
Priority to CN202110408739.6A priority Critical patent/CN113160081A/en
Publication of CN113160081A publication Critical patent/CN113160081A/en
Withdrawn legal-status Critical Current

Classifications

    • G06T5/77
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10024 Color image
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a depth face image restoration method based on perception deblurring, which comprises the following steps. Step S1: give a face image restoration training set and test set. Step S2: design the coarse reconstruction generator G1, the global discriminator D1 and the local discriminator D2. Step S3: using G1, D1 and D2, propagate data forward, calculate the losses, propagate them backward, and update the model parameters. Step S4: repeat step S3 until the coarse reconstruction training finishes. Step S5: design the perception deblurring generator G2 and the discriminator D3. Step S6: using G2 and D3, propagate data forward, calculate the losses, propagate them backward, and update the model parameters. Step S7: repeat step S6 until the training of the perception deblurring stage finishes. Step S8: use the test set to evaluate the models and save the optimal model. Given a face picture with missing information, the invention can quickly produce a prediction result with good visual effect, coherent content and correct semantic information.

Description

Depth face image restoration method based on perception deblurring
Technical Field
The invention belongs to the field of image restoration, and particularly relates to a face image restoration algorithm based on CNN and GAN, which better solves the problems of insufficient consistency of restored content and discontinuous content at the restoration boundary in existing face image restoration.
Background
The problem of image restoration, which aims at filling in missing pixels of damaged images, has attracted much attention and has been a valuable and active research topic for decades, since high-quality images benefit a wide range of applications, such as restoration of old photographs and object removal. High-quality image inpainting generally requires filling missing regions with content that is not only visually realistic but also semantically reasonable. Existing methods can be broadly divided into two categories. The first category applies texture synthesis techniques to the fill area. In particular, these methods typically sample full-image-resolution patches of the source image and paste them into the missing region, allowing detailed results to be synthesized. However, due to the lack of high-level understanding of the images, such methods often fail to produce semantically reasonable results. To solve this problem, a second group of methods proposes to encode the semantic context of an image into a latent feature space through a deep neural network and then generate semantically relevant information with a generative model. However, producing visually realistic results from a compact latent feature remains a challenge, as full-image-resolution detail is typically lost through stacked convolution and pooling. Face restoration is a branch of image restoration, is an important task in computer vision, and is widely applied to image editing, image rendering and computational photography.
Disclosure of Invention
The invention provides a depth face image restoration method based on perception deblurring, aiming at the low efficiency and poor robustness of deep-learning and traditional face image restoration algorithms, and at the insufficient content consistency and discontinuous content at the restoration boundary of prior-art face image restoration methods. The method uses a two-stage training mode. In the first stage, a neural network combining large-receptive-field dilated convolution with the information of the known region and the region to be repaired completes a coarse reconstruction and generates a semi-finished product. In the second stage, a neural network with a perception deblurring method performs detail refinement on the semi-finished product generated in the first stage to generate a completely repaired picture.
The technical scheme of the invention is a depth face image restoration method based on perception deblurring, which comprises the following steps:
step S1: a facial image restoration data set is given and divided into a training set and a test set;
step S2: construct the coarse reconstruction stage generator G1, the global discriminator D1 and the local discriminator D2 of the depth face image restoration method based on perception deblurring; construct an ADADELTA optimizer;
step S3: in each iteration of coarse reconstruction training, sample a picture x from the training set and randomly generate a binary mask m in which the region to be repaired has value 1 and the remaining regions have value 0; take x⊙(1−m) as the input of G1 for one training iteration, where ⊙ denotes element-wise multiplication of corresponding matrix entries, and D1,2 denotes D1 and D2 with their last feature layers connected; the input is forward-propagated through G1 to obtain a semi-finished result x'; the probability that D1,2 judges x' to be a real picture is used to calculate Ladv1, and the error between x' and x is used to calculate Lpsnr and Lssim; with the parameters of D1 and D2 fixed, Ladv1, Lpsnr and Lssim are back-propagated to update the parameters of G1; with the parameters of G1 fixed, the probability that D1,2 judges x to be a real picture is used with BCELoss to calculate a discrimination error that is back-propagated to update the parameters of D1 and D2, and the probability that D1,2 judges x' to be a fake picture is likewise used with BCELoss to calculate a discrimination error that is back-propagated to update the parameters of D1 and D2;
step S4: as G1 and D1,2 alternately update their parameters, G1 strives to generate ever more realistic samples while D1,2 strives to judge whether a generated sample is real or fake; the two play against each other until Nash equilibrium is reached, i.e., step S3 is repeated until the first-stage training finishes;
step S5: construct the perception deblurring stage generator G2 and the discriminator D3 of the depth face image restoration method based on perception deblurring; construct an Adam optimizer;
step S6: in each iteration of perceptual deblurring training, sample a pair of pictures (x, y), where y = x⊙(1−m) + x'⊙m, and take y as the input of G2; forward propagation through G2 yields the result y'; the probability that D3 judges y' to be a real picture, i.e., the difference between the result of forward-propagating y' through D3 and True, is used to calculate Ladv2; the error between y' and x is used to calculate Lper; with the parameters of D3 fixed, Ladv2 and Lper are back-propagated to update the parameters of G2; with the parameters of G2 fixed, the probability that D3 judges x to be a real picture, i.e., a discrimination error calculated from the difference, is back-propagated to update the parameters of D3, and the probability that D3 judges y' to be a fake picture is likewise used to calculate a discrimination error that is back-propagated to update the parameters of D3;
step S7: as G2 and D3 alternately update their parameters, G2 strives to generate ever more realistic samples while D3 strives to judge whether a generated sample is real or fake; the two play against each other until Nash equilibrium is reached, i.e., step S6 is repeated until the second-stage training finishes;
step S8: in the testing stage, select pictures from the test set and forward-propagate them through G1 and G2 in sequence to obtain a prediction; the final prediction z is obtained by calculating z = x⊙(1−m) + y'⊙m; through qualitative and quantitative tests on z, evaluate the models and save the optimal model;
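The element-wise mask arithmetic used in these steps can be sketched with NumPy; this is a minimal illustration with toy shapes and values, and the rectangle position is an arbitrary assumption rather than the patent's configuration:

```python
import numpy as np

def make_mask(h, w, top, left, mh, mw):
    """Binary mask m: 1 in the region to be repaired, 0 elsewhere."""
    m = np.zeros((h, w), dtype=np.float32)
    m[top:top + mh, left:left + mw] = 1.0
    return m

def compose(x, y_pred, m):
    """z = x*(1-m) + y'*m: keep known pixels from x, fill the hole from y'."""
    m3 = m[..., None]                 # broadcast the mask over the RGB channels
    return x * (1.0 - m3) + y_pred * m3

x = np.full((8, 8, 3), 0.2, dtype=np.float32)        # toy "real picture"
y_pred = np.full((8, 8, 3), 0.9, dtype=np.float32)   # toy network output
m = make_mask(8, 8, top=2, left=2, mh=4, mw=4)
z = compose(x, y_pred, m)
```

The same element-wise products give the generator inputs x⊙(1−m) in step S3 and y = x⊙(1−m) + x'⊙m in step S6.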
In step S3, the input is forward-propagated through G1 to obtain x', specifically:
A batch of original data is sampled from the training set; each real picture x in the original data contains three RGB channels. The picture is first preprocessed. Then a single-channel mask m is randomly generated by initializing a single-channel two-dimensional matrix of the same size as the original image, setting the matrix to 1 in the region to be repaired and to 0 in the remaining regions. Then x⊙(1−m) is taken as input and normalized so that all pixel values lie in [−1, 1]. The input is forward-propagated through G1 to obtain x'. The probability that D1,2 judges x' to be a real picture is used, i.e., after x' is forward-propagated through D1,2, Ladv1 is calculated from the BCELoss between the result and True. The calculation method is:
Ladv1=-log[D1,2(G1(x,m))]
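This adversarial term is the binary cross-entropy of the discriminator's output probability against the "real" label 1, which a few lines of NumPy make concrete; the 0.8 probability below is an arbitrary illustrative value:

```python
import numpy as np

def bce(pred_prob, target, eps=1e-12):
    """Binary cross-entropy of one probability against a 0/1 target label."""
    return -(target * np.log(pred_prob + eps)
             + (1 - target) * np.log(1 - pred_prob + eps))

# If the discriminator assigns probability 0.8 to the generated picture being
# real, the generator's adversarial term is -log(0.8).
l_adv1 = bce(0.8, 1.0)
```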
Meanwhile, Lpsnr and Lssim are calculated from the error between x' and x in the mask region, where Lpsnr is given by

Lpsnr = 10 · log10( k^2 / Σ_i (x'_i − x_i)^2 )

with k the side length of the input picture and the sum taken over the mask region; Lssim is given by
Lssim = [(2 μa μb + M1)(2 σab + M2)] / [(μa^2 + μb^2 + M1)(σa^2 + σb^2 + M2)]

where the subscripts a and b denote the mask regions of x' and x respectively, μa and μb are the means calculated from a and b with a Gaussian filter, σa is the standard deviation of a, σb is the standard deviation of b, and σab is the covariance of a and b. To prevent the denominator and numerator from being zero, M1 and M2 take the small values 0.0001 and 0.0009. With the parameters of D1 and D2 fixed, Ladv1, Lpsnr and Lssim are back-propagated to update the parameters of G1, with the loss function
LG1 = Lssim + λpsnr · Lpsnr + λadv1 · Ladv1
where λpsnr and λadv1 are 1.16 and 0.0003 respectively. With the parameters of G1 fixed, the probability that D1,2 judges x to be a real picture is used, i.e., a discrimination error is calculated with BCELoss and back-propagated to update the parameters of D1 and D2; likewise, the probability that D1,2 judges x' to be a fake picture is used with BCELoss to calculate a discrimination error that is back-propagated to update the parameters of D1 and D2;
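The PSNR and SSIM quantities behind these losses can be sketched in NumPy. This is a simplified illustration: it computes SSIM from global statistics rather than the Gaussian-windowed, mask-restricted statistics the patent describes, and PSNR over the whole image:

```python
import numpy as np

def psnr(x_pred, x, max_val=1.0):
    """PSNR = 10*log10(MAX^2 / MSE); higher means x_pred is closer to x."""
    mse = np.mean((x_pred - x) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(a, b, M1=0.0001, M2=0.0009):
    """SSIM from global statistics; M1 and M2 keep the numerator and
    denominator nonzero (the patent uses Gaussian-filtered local stats)."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + M1) * (2 * cov + M2)) / \
           ((mu_a ** 2 + mu_b ** 2 + M1) * (var_a + var_b + M2))

a = np.linspace(0.0, 1.0, 64).reshape(8, 8)
s_same = ssim_global(a, a)   # identical images give SSIM = 1
p_shift = psnr(a, a + 0.1)   # a constant 0.1 offset gives about 20 dB
```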
In step S6, y is forward-propagated through G2 to obtain the prediction y', specifically:
In each iteration of the perceptual deblurring stage, a pair of pictures (x, y) is sampled, where y = x⊙(1−m) + x'⊙m, and y is taken as the input of G2. After forward propagation through G2, the result y' is obtained. The probability that D3 judges y' to be a real picture, i.e., the difference between the result of forward-propagating y' through D3 and True, is used to calculate Ladv2; therefore Ladv2 = −D3(G2(y)). The perceptual loss Lper is calculated from the error between y' and x, giving

Lper = Σ_j (1 / (Wj · Hj)) · ||φj(y') − φj(x)||^2

where φj is the feature map obtained after the j-th convolutional layer and activation function of the VGG19 network, Wj and Hj are the dimensions of the j-th layer feature map, and the VGG19 network has been trained in advance on the ImageNet data set. With the parameters of D3 fixed, Lper and Ladv2 are back-propagated to update the parameters of G2; the loss function of G2 is

LG2 = Lper + λadv2 · Ladv2

where λadv2 is 0.01. With the parameters of G2 fixed, the probability that D3 judges x to be a real picture, i.e., a discrimination error calculated from the difference, is back-propagated to update the parameters of D3; likewise, the probability that D3 judges y' to be a fake picture is used to calculate a discrimination error that is back-propagated to update the parameters of D3.
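The shape of the perceptual loss can be sketched as follows. The feature maps are supplied directly as arrays, a stand-in for running a VGG19 pretrained on ImageNet, which this sketch does not load:

```python
import numpy as np

def perceptual_loss(feats_pred, feats_real):
    """Lper = sum_j (1/(Wj*Hj)) * ||phi_j(y') - phi_j(x)||^2, where phi_j
    would be VGG19 feature maps; here they are passed in as plain arrays."""
    total = 0.0
    for fp, fr in zip(feats_pred, feats_real):
        w, h = fp.shape[-2], fp.shape[-1]      # Wj, Hj of this feature map
        total += np.sum((fp - fr) ** 2) / (w * h)
    return total

f_pred = [np.ones((2, 2)), np.zeros((4, 4))]   # toy "predicted" features
f_real = [np.zeros((2, 2)), np.zeros((4, 4))]  # toy "real-image" features
loss = perceptual_loss(f_pred, f_real)         # 4/(2*2) + 0 = 1.0
```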
As a further improvement, in step S2, the coarse reconstruction stage generator G1, the global discriminator D1 and the local discriminator D2 of the depth face image restoration method based on perception deblurring are constructed, along with an ADADELTA optimizer; specifically:
G1, the coarse reconstruction generator, is a deep neural network based on a convolutional neural network and a generative adversarial network. In the coarse reconstruction stage, convolutional layers and dilated convolutional layers with rich receptive fields extract the features of the whole face picture, and deconvolution layers recover the low-resolution feature maps to the original resolution. G1 has 17 layers in total: 11 convolutional layers, 4 dilated convolutional layers and 2 deconvolution layers. The first 6 layers are convolutional layers; the output feature map of layer 3 is one half of the original resolution and that of layer 5 is one quarter. Layers 7 to 10 are dilated convolutional layers with dilation coefficients 2, 4, 8 and 16 respectively. Layers 11 and 12 are convolutional layers. Layer 13 is a deconvolution layer whose output feature map has twice the resolution of its input. Layer 14 is a convolutional layer. Layer 15 is a deconvolution layer whose output feature map has twice the resolution of its input, making its output resolution the same as that of the input picture. Layers 16 and 17 are convolutional layers, and the output of layer 17 is the coarse reconstruction result x' with three channels and the same resolution as the input picture. Except for the network layers that change the output feature map resolution, the other layers leave the resolution unchanged. Each of the first 16 layers is followed by Batch Normalization and a ReLU activation function; layer 17 is followed by a Sigmoid activation function.
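The resolution schedule above can be checked with a small bookkeeping script. The per-layer scale factors are inferred from the stated resolutions, and treating the second deconvolution as layer 15 is an inference, since the original text repeats "layer 13" at that point:

```python
# Track the output feature-map resolution through G1's 17 layers as described:
# layers 3 and 5 halve the resolution, deconvolution layers 13 and 15 double
# it, and every other layer (including dilated layers 7-10) preserves it.
def g1_resolution_trace(input_size):
    scale_by_layer = {3: 0.5, 5: 0.5, 13: 2.0, 15: 2.0}
    size, trace = input_size, {}
    for layer in range(1, 18):
        size = int(size * scale_by_layer.get(layer, 1.0))
        trace[layer] = size
    return trace

trace = g1_resolution_trace(256)   # e.g. a 256x256 input picture
```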
D1 is the global discriminator, a convolutional neural network used to discriminate, from a global perspective, the probability that the first-stage generated picture x' is a real picture. D1 has 6 layers in total: 5 convolutional layers and 1 fully connected layer. The output feature resolution of the first 4 convolutional layers is one half of their input resolution; the output resolution of the 5th convolutional layer is unchanged. Each of the 5 convolutional layers is followed by Batch Normalization and a ReLU activation function.
D2 is the local discriminator, a convolutional neural network used to discriminate, from a local perspective, the probability that the first-stage generated picture x' is a real picture. D2 has 5 layers in total: 4 convolutional layers and 1 fully connected layer. The output feature resolution of the first 3 convolutional layers is one half of their input resolution; the output resolution of the 4th convolutional layer is unchanged. Each of the 4 convolutional layers is followed by Batch Normalization and a ReLU activation function. The layer-5 output feature map of D2 and the layer-6 output feature map of D1 are joined by a connection operation to form a fully connected layer, which is followed by a ReLU activation function.
The first stage uses an ADADELTA optimizer, which facilitates rapid fitting of the network. In the initialization parameters, the learning rate is 1.0, the coefficient ρ for calculating the running average is 0.9, the term ε added to the denominator to improve numerical stability is 1e-6, and the weight decay coefficient is 0.
As a further improvement, in step S5, the perception deblurring stage generator G2 and the discriminator D3 of the depth face image restoration method based on perception deblurring are constructed, along with an Adam optimizer; specifically:
G2, the perception deblurring generation network, is a deep neural network based on CNN and GAN. In the perception deblurring stage, a neural network with residual connections extracts features from the roughly reconstructed picture, and deconvolution layers recover the low-resolution feature maps to the original resolution. G2 has 13 layers in total: 11 convolutional layers and 2 deconvolution layers. The first 3 layers are convolutional layers; the output resolutions of layers 2 and 3 are one half and one quarter of the real picture respectively. The 1st residual block contains convolutional layers 4 and 5, the 2nd residual block contains convolutional layers 6 and 7, and the 3rd residual block contains convolutional layers 8 and 9; each residual block contains a residual link. Layers 10 and 11 are deconvolution layers, each doubling the resolution of its input feature map, so the output resolution of layer 11 is the same as that of the input picture. Layer 12 is a convolutional layer whose output is the perception deblurring result y' with three channels and the same resolution as the input picture. Except for the network layers that change the output feature map resolution, the other layers leave the resolution unchanged. Each of the first 11 layers is followed by Batch Normalization and a ReLU activation function; layer 12 is followed by a Tanh activation function.
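The residual links mentioned above let each block learn only a refinement on top of its input, which suits the detail-refinement role of this stage. A minimal sketch of the idea, with toy arrays in place of convolutional layers:

```python
import numpy as np

def residual_block(x, refine):
    """A residual block computes output = x + F(x): the skip link passes x
    through unchanged and the inner layers only learn the refinement F."""
    return x + refine(x)

x = np.full((4, 4), 2.0)
identity_out = residual_block(x, lambda t: np.zeros_like(t))  # F = 0 keeps x
refined_out = residual_block(x, lambda t: 0.5 * t)            # adds a refinement
```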
D3, the perception deblurring discriminator, is a convolutional neural network used to discriminate the probability that the second-stage prediction y' is a real picture. D3 has 5 convolutional layers in total; the output feature map resolution of each of the 5 convolutional layers is one half of its input resolution, and the discrimination probability is finally obtained through a Sigmoid activation function. Each of the first 4 convolutional layers is followed by Batch Normalization and a LeakyReLU activation function; layer 5 is followed by a Sigmoid activation function.
The second stage uses an Adam optimizer to accelerate network convergence. In the initialization parameters, the learning rate is 1e-3, the coefficients β for calculating the running averages of the gradient and its square are (0.9, 0.999), the term ε added to the denominator to improve numerical stability is 1e-8, and the weight decay coefficient is 0.
As a further improvement, in step S8, the model is evaluated and the optimal model is saved through qualitative and quantitative tests on z, specifically:
A batch of real pictures is sampled from the test set, cropped and scaled to the target resolution; the region to be repaired is hollowed out of the three RGB channels with a single-channel mask representing that region, and the three channels are normalized and used as input. The input is forward-propagated through G1 and then G2, after which a complete repair result is obtained; this operation is repeated until the whole test set has been sampled without repetition. The qualitative test visually judges all the complete repair results, including whether the content inside the repair region and at its boundary is coherent and whether the color, brightness and contrast of the picture are consistent. The quantitative test calculates the PSNR and SSIM values between all the complete repair results and the corresponding originals; the higher, the better. Repeating steps S2 to S6 yields several trained models; qualitative and quantitative tests are performed on the different models, and a final model with excellent results in both tests is selected, with preference given to the model with the better qualitative result.
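The quantitative part of this selection can be sketched as a ranking over checkpoint scores. Ordering by PSNR first is an assumption of this sketch; the patent gives priority to the qualitative (visual) judgment when the tests disagree, and the checkpoint names and scores below are purely illustrative:

```python
def select_best(model_scores):
    """Rank candidate models by their (PSNR, SSIM) pair, higher being
    better, and return the name of the top one."""
    return max(model_scores, key=lambda name: model_scores[name])

# Hypothetical checkpoint names with (PSNR, SSIM) scores on the test set.
scores = {"epoch10": (24.1, 0.86), "epoch20": (26.3, 0.90), "epoch30": (25.8, 0.91)}
best = select_best(scores)
```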
Compared with the prior art, the depth face image restoration method based on perception deblurring provided by the invention has the following beneficial effects:
the invention provides a depth face image restoration method based on perception deblurring, which uses a two-stage training mode, wherein one stage is used for finishing rough reconstruction by combining a neural network with large receptive field cavity convolution and information of a known region and a region to be restored, and generating a semi-finished product. And in the second stage, the neural network with a perception deblurring method is used for carrying out detail refinement on the semi-finished product generated in the first stage to generate a completely repaired picture. When the face picture with missing information is given, the model can quickly give a prediction result. The prediction result has good visual effect, has semantic information of the human face, and keeps continuous content in the result and at the boundary. The human face image restoration model provided by the invention has good robustness on the input image of the human face image with large internal color difference of the front face, the side face, the makeup and the like, and has good value and prospect in practical application.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is within the scope of the present invention for those skilled in the art to obtain other drawings based on the drawings without inventive exercise.
Fig. 1 is a flowchart of a depth face image restoration method based on perceptual deblurring according to an embodiment of the present invention;
fig. 2 is an architecture diagram of a depth human face image repairing method based on perceptual deblurring according to an embodiment of the present invention.
Detailed Description
For completeness and clarity of description of technical solutions in the embodiments of the present invention, the following detailed description will be further developed with reference to the accompanying drawings in the embodiments of the present invention. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
As shown in fig. 1, the present invention provides a depth face image repairing method based on perception deblurring, which comprises the following steps:
in step S1, a face image inpainting data set is given and divided into a training set and a test set with a ratio of 9: 1.
In step S2, the coarse reconstruction stage generator G1, the global discriminator D1 and the local discriminator D2 of the depth face image restoration method based on perception deblurring are constructed, along with an ADADELTA optimizer. Specifically:
In the training flow chart of the depth face image restoration method based on perception deblurring given in FIG. 2, G1, the coarse reconstruction generator, is a deep neural network based on a convolutional neural network (CNN) and a generative adversarial network (GAN). In the coarse reconstruction stage, convolutional layers and dilated convolutional layers with rich receptive fields extract the features of the whole face picture, and deconvolution layers recover the low-resolution feature maps to the original resolution. G1 contains 17 layers in total, of which 11 are convolutional layers, 4 are dilated convolutional layers and 2 are deconvolution layers. The first 6 layers are convolutional layers; the output feature map of layer 3 is one half of the original resolution and that of layer 5 is one quarter. Layers 7 to 10 are dilated convolutional layers with dilation coefficients 2, 4, 8 and 16 respectively. Layers 11 and 12 are convolutional layers. Layer 13 is a deconvolution layer whose output feature map has twice the resolution of its input. Layer 14 is a convolutional layer. Layer 15 is a deconvolution layer whose output feature map has twice the resolution of its input, making its output resolution the same as that of the input picture. Layers 16 and 17 are convolutional layers, and the output of layer 17 is the coarse reconstruction result x' with three channels and the same resolution as the input picture. Except for the network layers that change the output feature map resolution, the other layers leave the resolution unchanged. Each of the first 16 layers is followed by Batch Normalization and a ReLU activation function.
Layer 17 is followed by a Sigmoid activation function. Table 1 below shows the G1 network detail parameters of the depth face image restoration method based on perceptual deblurring.
Table 1: G1 network detail parameters in the depth face image restoration method based on perception deblurring
[Table 1 image: G1 network detail parameters, not reproduced]
D1, the global discriminator, is a convolutional neural network used to discriminate, from a global perspective, the probability that the generated picture x' is a real picture. D1 contains 6 layers in total, of which 5 are convolutional layers and 1 is a fully connected layer. The output feature resolution of the first 4 convolutional layers is one half of their input resolution; the output resolution of the 5th convolutional layer is unchanged. Each of the 5 convolutional layers is followed by Batch Normalization and a ReLU activation function. Table 2 below shows the D1 network detail parameters of the depth face image restoration method based on perception deblurring.
Table 2: D1 network detail parameters in the depth face image restoration method based on perception deblurring
[Table 2 image: D1 network detail parameters, not reproduced]
D2, the local discriminator, is a convolutional neural network used to discriminate, from a local perspective, the probability that the first-stage generated picture x' is a real picture. D2 contains 5 layers in total, of which 4 are convolutional layers and 1 is a fully connected layer. The output feature resolution of the first 3 convolutional layers is one half of their input resolution; the output resolution of the 4th convolutional layer is unchanged. Each of the 4 convolutional layers is followed by Batch Normalization and a ReLU activation function. The layer-5 output feature map of D2 and the layer-6 output feature map of D1 are joined by a connection operation to form a fully connected layer, which is followed by a ReLU activation function. Table 3 below shows the D2 network detail parameters of the depth face image restoration method based on perception deblurring.
Table 3: D2 network detail parameters in the depth face image restoration method based on perception deblurring
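The global/local discriminator pair with joined final feature layers (D1,2) described above can be sketched as follows. This is a minimal PyTorch sketch: the kernel sizes, channel widths, and the Sigmoid-terminated fully-connected head are assumptions standing in for the exact values in Tables 2 and 3.

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout, stride):
    # 5x5 conv (kernel size is an assumption; Tables 2-3 give the real ones)
    return nn.Sequential(
        nn.Conv2d(cin, cout, 5, stride=stride, padding=2),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class D12(nn.Module):
    """Sketch of D1 (global) and D2 (local) with their final features joined."""
    def __init__(self):
        super().__init__()
        # D1: 4 resolution-halving convs + 1 stride-1 conv, then a FC layer.
        self.d1 = nn.Sequential(
            conv_bn_relu(3, 64, 2), conv_bn_relu(64, 128, 2),
            conv_bn_relu(128, 256, 2), conv_bn_relu(256, 256, 2),
            conv_bn_relu(256, 256, 1),
        )
        # D2: 3 resolution-halving convs + 1 stride-1 conv, then a FC layer.
        self.d2 = nn.Sequential(
            conv_bn_relu(3, 64, 2), conv_bn_relu(64, 128, 2),
            conv_bn_relu(128, 256, 2), conv_bn_relu(256, 256, 1),
        )
        self.fc1 = nn.Linear(256 * 16 * 16, 512)  # 256x256 input -> 16x16 features
        self.fc2 = nn.Linear(256 * 16 * 16, 512)  # 128x128 local patch -> 16x16
        # Concatenated features -> fully-connected head; a Sigmoid is assumed
        # so that BCELoss receives a probability.
        self.head = nn.Sequential(nn.ReLU(inplace=True), nn.Linear(1024, 1), nn.Sigmoid())

    def forward(self, full, patch):
        f1 = self.fc1(self.d1(full).flatten(1))
        f2 = self.fc2(self.d2(patch).flatten(1))
        return self.head(torch.cat([f1, f2], dim=1))  # probability of "real"

net = D12().eval()
with torch.no_grad():
    prob = net(torch.randn(2, 3, 256, 256), torch.randn(2, 3, 128, 128))
prob_shape = tuple(prob.shape)
prob_in_range = bool(((prob >= 0) & (prob <= 1)).all())
```

The two branches shrink a 256×256 picture and a 128×128 patch to 16×16 feature maps before the joint head, matching the "halve the resolution" description of the first convolutional layers.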
ADADELTA is an extension of the Adagrad algorithm that addresses Adagrad's continually shrinking learning rate and adapts the learning rate automatically. Stage one uses an ADADELTA optimizer, which helps the network fit quickly. In the initialization parameters, the learning rate is 1.0, the coefficient rho for computing the running average is 0.9, the term epsilon added to the denominator for numerical stability is 1e-6, and the weight decay coefficient is 0.
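With those initialization values, the stage-one optimizer construction can be sketched in PyTorch; the parameter list below is a hypothetical stand-in for the G1/D1/D2 parameters.

```python
import torch

# Hypothetical stand-in for the stage-one network parameters.
params = [torch.nn.Parameter(torch.zeros(4, 4))]

# ADADELTA with the initialization stated above:
# lr=1.0, rho=0.9, eps=1e-6, weight_decay=0.
opt = torch.optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-6, weight_decay=0)
```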
In step S3, in each training iteration of the coarse reconstruction stage, a picture x is sampled from the training set and a binary mask m is randomly generated, in which the region to be repaired has value 1 and the remaining region has value 0; x⊙(1−m) is taken as the input of G1 for one training iteration, where ⊙ denotes element-wise multiplication of matrices. D1,2 denotes D1 and D2 with their final feature layers connected. The input is forward-propagated through G1 to obtain a semi-finished result x'. L_adv1 is computed from the probability with which D1,2 judges x' to be a real picture, and L_psnr and L_ssim are computed from the error between x' and x. With the parameters of D1 and D2 fixed, L_adv1, L_psnr and L_ssim are back-propagated to update the parameters of G1. With the parameters of G1 fixed, the probability with which D1,2 judges x to be a real picture is scored with BCELoss and the judgment error is back-propagated to update the parameters of D1 and D2; likewise, the probability with which D1,2 judges x' to be a fake picture is scored with BCELoss and the judgment error is back-propagated to update the parameters of D1 and D2. The method specifically comprises the following steps:
A batch of raw data is sampled from the training set; each real picture x in it has three RGB channels. The picture is preprocessed: for example, with a target resolution of 256×256, the original image is randomly cropped and scaled to reach the target resolution. A single-channel mask m is then generated at random by initializing a single-channel two-dimensional matrix of the same size as the original image, setting its value to 1 in the region to be repaired and 0 elsewhere. For example, if the region to be repaired is a rectangle one quarter the size of the original image, a coordinate is randomly chosen in m as its upper-left corner, and the upper-right, lower-left and lower-right corners of the region follow from it. When the upper-left coordinate is sampled, it is constrained so that the lower-right corner does not fall outside the original image; this yields a valid region to be repaired, and m is set to 1 inside the rectangle and 0 outside it. x⊙(1−m) is taken as input and normalized so that all pixel values lie in [−1, 1]. The input is forward-propagated through G1 to obtain x'. D1,2 judges the probability that x' is a real picture; that is, after x' is forward-propagated through D1,2, L_adv1 is computed from the BCELoss between the result and True. Therefore L_adv1 = −log[D1,2(G1(x, m))]. L_psnr and L_ssim are computed from the error between x' and x within the mask region, where L_psnr is given by
L_psnr = (1/k^2) · Σ_{i,j} (x'_{i,j} − x_{i,j})^2
where k is the side length of the input picture, and L_ssim is given by
L_ssim = 1 − SSIM(a, b),  SSIM(a, b) = [(2·μ_a·μ_b + M1)(2·σ_ab + M2)] / [(μ_a² + μ_b² + M1)(σ_a² + σ_b² + M2)]
where a and b are the mask regions of x' and x, μ_a and μ_b are the values computed from a and b by a Gaussian filter, σ_a is the standard deviation of a, σ_b is the standard deviation of b, and σ_ab is the covariance of a and b. To keep the numerator and denominator from being zero, M1 and M2 take the small values 0.0001 and 0.0009. With the parameters of D1 and D2 fixed, L_adv1, L_psnr and L_ssim are back-propagated to update the parameters of G1, with loss function
L_G1 = L_ssim + λ_psnr · L_psnr + λ_adv1 · L_adv1
where λ_psnr and λ_adv1 are 1.16 and 0.0003 respectively. With the parameters of G1 fixed, D1,2 judges the probability that x is a real picture, i.e. the judgment error is computed with BCELoss and back-propagated to update the parameters of D1 and D2; D1,2 likewise judges the probability that x' is a fake picture, i.e. the judgment error is computed with BCELoss and back-propagated to update the parameters of D1 and D2.
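The mask construction described above can be sketched in NumPy. Realizing the quarter-area rectangle as a square of half the side length, and the 256×256 target resolution, follow the example in the text; the random picture is a stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_quarter_mask(size=256):
    # Quarter-area rectangle realized as a (size/2) x (size/2) square of ones;
    # the upper-left corner is sampled so the lower-right stays inside the image.
    h = w = size // 2
    m = np.zeros((size, size), dtype=np.float32)
    top = int(rng.integers(0, size - h + 1))
    left = int(rng.integers(0, size - w + 1))
    m[top:top + h, left:left + w] = 1.0
    return m

m = random_quarter_mask()
x = rng.random((3, 256, 256), dtype=np.float32)  # stand-in RGB picture
x_in = x * (1.0 - m)   # x ⊙ (1 - m): hole out the region to repair
# (in the real pipeline x_in would then be normalized into [-1, 1])

mask_area = float(m.sum())
hole_is_zero = bool(float(np.abs(x_in[:, m == 1]).max()) == 0.0)
```

The single-channel mask broadcasts over the three RGB channels, so the region to be repaired is zeroed in every channel.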
In step S4, during the alternating parameter updates of G1 and D1,2, G1 keeps trying to generate more realistic samples, while D1,2 tries to judge whether the generated samples are real or fake; the two play against each other until Nash equilibrium is reached, i.e. step S3 is repeated until stage-one training finishes. Specifically:
First comes pre-training: G1 is trained alone for several iterations, i.e. only G1 is updated several times; then D1,2 is trained alone for several iterations, i.e. only D1,2 is updated several times. After that the two are trained together, alternately updating the parameters of G1 and D1,2. When updating the parameters of G1, the parameters of D1,2 are kept fixed; when updating the parameters of D1,2, the parameters of G1 are kept fixed.
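The pre-train-then-alternate schedule above can be sketched as plain Python. The update functions and iteration counts are hypothetical stand-ins; each real update would additionally freeze the other network's parameters.

```python
# Hypothetical stand-ins for the real parameter updates; each call is logged
# so the schedule can be inspected.
log = []

def update_g1():
    log.append("G")   # D1,2 parameters would be held fixed here

def update_d12():
    log.append("D")   # G1 parameters would be held fixed here

G1_PRETRAIN_ITERS, D12_PRETRAIN_ITERS, JOINT_ITERS = 2, 2, 3  # illustrative counts

for _ in range(G1_PRETRAIN_ITERS):   # pre-train G1 alone
    update_g1()
for _ in range(D12_PRETRAIN_ITERS):  # pre-train D1,2 alone
    update_d12()
for _ in range(JOINT_ITERS):         # then alternate the two updates
    update_g1()
    update_d12()
```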
In step S5, the perceptual-deblurring-stage generator G2 and discriminator D3 of the perceptual-deblurring-based deep face image restoration method are constructed, together with an Adam optimizer. Specifically:
In the training flow chart of the perceptual-deblurring-based deep face image restoration method given in FIG. 2, G2 is the perceptual deblurring generator, a deep neural network based on CNN and GAN. In the perceptual deblurring stage, a neural network with residual connections extracts features from the coarsely reconstructed picture, and deconvolution layers restore the low-resolution feature maps to the original resolution. G2 contains 13 layers in total: 11 convolutional layers and 2 deconvolution layers. The first 3 layers are convolutional layers; the output resolutions of layers 2 and 3 are one half and one quarter of the real picture, respectively. The 1st residual block contains convolutional layers 4 and 5, the 2nd residual block contains layers 6 and 7, and the 3rd residual block contains layers 8 and 9; each residual block contains a residual link. Layers 10 and 11 are deconvolution layers whose output feature maps have twice the input resolution, and the output resolution of layer 11 equals the input picture resolution. Layer 12 is a convolutional layer whose output is the perceptual deblurring result y', with three channels and the same resolution as the input picture. Apart from the network layers whose output feature resolution changes, the remaining layers leave the resolution unchanged. Each of the first 11 layers is followed by BatchNormalization and a ReLU activation function; layer 12 is followed by a Tanh activation function. Table 4 below lists the G2 network parameters of the perceptual-deblurring-based deep face image restoration method of the invention.
Table 4: G2 network parameters in the perceptual-deblurring-based deep face image restoration method
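The G2 layout described above (downscaling convolutions, three two-layer residual blocks, two deconvolutions back to full resolution, Tanh output) can be sketched in PyTorch; the channel widths and kernel sizes are assumptions in place of the exact values in Table 4.

```python
import torch
import torch.nn as nn

def cbr(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class ResBlock(nn.Module):
    """Two conv layers plus a residual link (layers 4-9 of G2 form three of these)."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(cbr(c, c), cbr(c, c))
    def forward(self, x):
        return x + self.body(x)

class G2(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            cbr(3, 64), cbr(64, 128, 2), cbr(128, 256, 2),  # layers 1-3: down to 1/4
            ResBlock(256), ResBlock(256), ResBlock(256),     # layers 4-9
            nn.ConvTranspose2d(256, 128, 4, 2, 1),           # layer 10: x2 resolution
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),            # layer 11: back to full size
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, 1, 1), nn.Tanh(),            # layer 12: y' in [-1, 1]
        )
    def forward(self, y):
        return self.net(y)

g2 = G2().eval()
with torch.no_grad():
    y_ref = g2(torch.randn(2, 3, 256, 256))
out_shape = tuple(y_ref.shape)
out_bounded = bool((y_ref.abs() <= 1).all())
```

The Tanh output keeps y' in [−1, 1], matching the normalized pixel range of the inputs.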
D3, the perceptual deblurring discriminator, is a convolutional neural network used to judge the probability that the stage-two prediction y' is a real picture. D3 contains 5 convolutional layers in total. The output feature resolution of each of the 5 convolutional layers is one half of its input resolution, and the discrimination probability is finally obtained through a Sigmoid activation function. Each of the first 4 convolutional layers is followed by BatchNormalization and a LeakyReLU activation function; layer 5 is followed by the Sigmoid activation function. Table 5 below lists the D3 network parameters of the perceptual-deblurring-based deep face image restoration method of the invention.
Table 5: D3 network parameters in the perceptual-deblurring-based deep face image restoration method
Adam is an effective stochastic optimization method that computes adaptive learning rates for different parameters from estimates of the first and second moments of the gradient. An Adam optimizer is used in stage two to speed up network convergence. In the initialization parameters, the learning rate is 1e-3, the coefficients beta for computing the running averages of the gradient and its square are (0.9, 0.999), the term epsilon added to the denominator for numerical stability is 1e-8, and the weight decay coefficient is 0.
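With those initialization values, the stage-two optimizer construction can be sketched in PyTorch; the parameter list is a hypothetical stand-in for the G2/D3 parameters.

```python
import torch

# Hypothetical stand-in for the stage-two network parameters.
params = [torch.nn.Parameter(torch.zeros(4, 4))]

# Adam with the initialization stated above:
# lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.
opt = torch.optim.Adam(params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0)
```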
In step S6, in each training iteration of the perceptual deblurring stage, a pair of pictures (x, y) is sampled, where y = x⊙(1−m) + x'⊙m, and y is taken as the input of G2. Forward propagation through G2 yields the result y'. D3 judges the probability that y' is a real picture; that is, L_adv2 is computed from the difference between the result of forward-propagating y' through D3 and True. L_per is computed from the error between y' and x. With the parameters of D3 fixed, L_adv2 and L_per are back-propagated to update the parameters of G2. With the parameters of G2 fixed, D3 judges the probability that x is a real picture, i.e. the judgment error computed from the difference is back-propagated to update the parameters of D3; D3 likewise judges the probability that y' is a fake picture, i.e. the judgment error computed from the difference is back-propagated to update the parameters of D3. Specifically:
Compute y = x⊙(1−m) + x'⊙m, sample a pair of pictures (x, y), and take y as the input of G2. After y is forward-propagated through G2, the result y' is obtained. D3 judges the probability that y' is a real picture; that is, L_adv2 is computed from the difference between the result of forward-propagating y' through D3 and True. Therefore L_adv2 = −D3(G2(y)). L_per is computed from the error between y' and x, giving
L_per = (1 / (W_j · H_j)) · ‖φ_j(x) − φ_j(y')‖²
where φ_j(·) is the feature map obtained after the j-th convolutional layer and activation function of the VGG19 network, W_j and H_j are the dimensions of the j-th-layer feature map, and the VGG19 network is pre-trained on the ImageNet dataset. With the parameters of D3 fixed, L_per and L_adv2 are back-propagated to update the parameters of G2, whose loss function is
L_G2 = L_per + λ_adv2 · L_adv2
where λ_adv2 is 0.01. With the parameters of G2 fixed, D3 judges the probability that x is a real picture, i.e. the judgment error computed from the difference is back-propagated to update the parameters of D3; D3 likewise judges the probability that y' is a fake picture, i.e. the judgment error computed from the difference is back-propagated to update the parameters of D3.
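Under the reading of L_per above (squared feature error normalized by the feature-map size), the two stage-two generator losses can be sketched in NumPy. The feature extractor φ_j is passed in directly here as a stand-in for the pretrained VGG19 layer, so this sketch only shows the arithmetic, not the feature extraction.

```python
import numpy as np

def perceptual_loss(phi_x, phi_y):
    # L_per = ||phi_j(x) - phi_j(y')||^2 / (W_j * H_j); phi_j would be the
    # j-th convolutional feature of a VGG19 pretrained on ImageNet.
    Wj, Hj = phi_x.shape[-2], phi_x.shape[-1]
    return float(np.sum((phi_x - phi_y) ** 2) / (Wj * Hj))

LAMBDA_ADV2 = 0.01

def g2_loss(phi_x, phi_y, l_adv2):
    # L_G2 = L_per + lambda_adv2 * L_adv2
    return perceptual_loss(phi_x, phi_y) + LAMBDA_ADV2 * l_adv2

# Demo on toy 4-channel 8x8 "features": per-position squared error is 1.
demo_per = perceptual_loss(np.zeros((4, 8, 8)), np.ones((4, 8, 8)))
demo_g2 = g2_loss(np.zeros((4, 8, 8)), np.ones((4, 8, 8)), l_adv2=1.0)
```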
In step S7, during the alternating parameter updates of G2 and D3, G2 keeps trying to generate more realistic samples, while D3 tries to judge whether the generated samples are real or fake; the two play against each other until Nash equilibrium is reached, i.e. step S6 is repeated until stage-two training finishes. Specifically:
The two are trained together, alternately updating the parameters of G2 and D3. When updating the parameters of G2, the parameters of D3 are kept fixed; when updating the parameters of D3, the parameters of G2 are kept fixed.
In step S8, a picture is selected from the test set and forward-propagated through G1 and then G2 to obtain a prediction. The final prediction z is obtained by computing z = x⊙(1−m) + y'⊙m. The model is judged through qualitative and quantitative tests on z and the optimal model is saved. Specifically:
A batch of real pictures is sampled from the test set; the pictures are cropped and scaled to the target resolution, the region to be repaired is hollowed out of the three RGB channels through the single-channel mask that represents it, and the pictures are normalized and used as input. The input is forward-propagated through G1 and then G2 to obtain a complete restoration result. This is repeated until the whole test set has been sampled without repetition. The qualitative test judges all the resulting complete restorations visually, checking whether the content inside the repaired region and at its boundary is consistent, and whether the color, brightness and contrast of the picture are consistent. The quantitative test computes the PSNR and SSIM values between all the final complete restorations and the corresponding original images; the higher these values, the better. Repeating steps S2 to S6 yields several trained models. Qualitative and quantitative tests are run on the different models, and a final model that performs well in both is selected, with preference given to the model with the better qualitative result.
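The quantitative test and the final composite z = x⊙(1−m) + y'⊙m can be sketched in NumPy. The SSIM here uses a single global window for brevity (the full method uses Gaussian-filtered local statistics), and the pictures are random stand-ins.

```python
import numpy as np

def psnr(x, y, peak=1.0):
    # PSNR in dB for pictures with values in [0, peak].
    mse = float(np.mean((x - y) ** 2))
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(a, b, M1=0.0001, M2=0.0009):
    # Single-window SSIM with the constants from the text; the full method
    # evaluates local Gaussian-filtered statistics instead.
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = float(np.mean((a - mu_a) * (b - mu_b)))
    return float((2 * mu_a * mu_b + M1) * (2 * cov + M2)
                 / ((mu_a ** 2 + mu_b ** 2 + M1) * (var_a + var_b + M2)))

rng = np.random.default_rng(1)
x = rng.random((3, 64, 64))            # stand-in original picture
y_pred = np.clip(x + 0.05, 0.0, 1.0)   # stand-in restoration y'
m = np.zeros((64, 64)); m[16:48, 16:48] = 1.0
z = x * (1.0 - m) + y_pred * m         # final prediction z = x⊙(1−m) + y'⊙m

psnr_uniform = psnr(x, x + 0.1)        # uniform 0.1 error: MSE = 0.01 -> 20 dB
ssim_identity = ssim_global(x, x)      # identical pictures -> SSIM of 1
z_outside_ok = bool(np.allclose(z[:, m == 0], x[:, m == 0]))
z_inside_ok = bool(np.allclose(z[:, m == 1], y_pred[:, m == 1]))
```

The composite keeps the known pixels of x untouched and takes the prediction only inside the mask, which is exactly what the qualitative boundary-consistency check inspects.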
In summary, the present invention provides a perceptual-deblurring-based deep face image restoration method that uses a two-stage approach. In the first stage, a neural network with large-receptive-field dilated convolutions combines information from the known region and the region to be repaired, completing a coarse reconstruction and producing a semi-finished result. In the second stage, a neural network with a perceptual deblurring method refines the details of the first-stage output to generate a fully repaired picture. Given a face picture with missing information, the proposed model quickly produces a prediction. The prediction has a good visual effect, carries the semantic information of the face, and keeps the content continuous both inside the result and at its boundary. The proposed face image restoration model is robust to inputs such as frontal faces, profile faces, and made-up faces with large internal color differences, and has good value and prospects in practical applications.
It will be appreciated by persons skilled in the art that the invention is not limited to details of the foregoing embodiments, and that the invention can be embodied in other specific forms without departing from the spirit or scope of the invention. In addition, various modifications and alterations of this invention may be made by those skilled in the art without departing from the spirit and scope of this invention, and such modifications and alterations should also be viewed as being within the scope of this invention.

Claims (4)

1. A perceptual-deblurring-based deep face image restoration method, characterized by comprising the following steps:
step S1: a facial image restoration data set is given and divided into a training set and a test set;
step S2: construct the coarse-reconstruction-stage generator G1, global discriminator D1 and local discriminator D2 of the perceptual-deblurring-based deep face image restoration method, and construct an ADADELTA optimizer;
step S3: in each training iteration of the coarse reconstruction stage, sample a picture x from the training set and randomly generate a binary mask m in which the region to be repaired has value 1 and the remaining region has value 0; take x⊙(1−m) as the input of G1 for one training iteration, where ⊙ denotes element-wise multiplication of matrices; D1,2 denotes D1 and D2 with their final feature layers connected; forward-propagate the input through G1 to obtain a semi-finished result x'; compute L_adv1 from the probability with which D1,2 judges x' to be a real picture, and compute L_psnr and L_ssim from the error between x' and x; with the parameters of D1 and D2 fixed, back-propagate L_adv1, L_psnr and L_ssim to update the parameters of G1; with the parameters of G1 fixed, score with BCELoss the probability with which D1,2 judges x to be a real picture, back-propagate the judgment error, and update the parameters of D1 and D2; score with BCELoss the probability with which D1,2 judges x' to be a fake picture, back-propagate the judgment error, and update the parameters of D1 and D2;
step S4: during the alternating parameter updates of G1 and D1,2, G1 keeps trying to generate more realistic samples while D1,2 tries to judge whether the generated samples are real or fake; the two play against each other until Nash equilibrium is reached, i.e. repeat step S3 until stage-one training finishes;
step S5: construct the perceptual-deblurring-stage generator G2 and discriminator D3 of the perceptual-deblurring-based deep face image restoration method, and construct an Adam optimizer;
step S6: in each training iteration of the perceptual deblurring stage, sample a pair of pictures (x, y), where y = x⊙(1−m) + x'⊙m, and take y as the input of G2; forward-propagate through G2 to obtain the result y'; compute L_adv2 from the probability with which D3 judges y' to be a real picture, i.e. from the difference between the result of forward-propagating y' through D3 and True; compute L_per from the error between y' and x; with the parameters of D3 fixed, back-propagate L_adv2 and L_per to update the parameters of G2; with the parameters of G2 fixed, use D3 to judge the probability that x is a real picture, i.e. compute the judgment error from the difference, back-propagate, and update the parameters of D3; use D3 to judge the probability that y' is a fake picture, i.e. compute the judgment error from the difference, back-propagate, and update the parameters of D3;
step S7: during the alternating parameter updates of G2 and D3, G2 keeps trying to generate more realistic samples while D3 tries to judge whether the generated samples are real or fake; the two play against each other until Nash equilibrium is reached, i.e. repeat step S6 until stage-two training finishes;
step S8: in the testing stage, select a picture from the test set and forward-propagate it through G1 and then G2 to obtain a prediction; obtain the final prediction z by computing z = x⊙(1−m) + y'⊙m; judge the model through qualitative and quantitative tests on z and save the optimal model;
in step S3, the input is forward-propagated through G1 to obtain x', specifically:
sample a batch of raw data from the training set, each real picture x of which has three RGB channels; preprocess the pictures; then randomly generate a single-channel mask m by initializing a single-channel two-dimensional matrix of the same size as the original image, setting its value to 1 in the region to be repaired and 0 elsewhere; then take x⊙(1−m) as input and normalize it so that all pixel values lie in [−1, 1]; forward-propagate the input through G1 to obtain x'; use D1,2 to judge the probability that x' is a real picture, i.e. after x' is forward-propagated through D1,2, compute L_adv1 from the BCELoss between the result and True; the calculation is:
L_adv1 = −log[D1,2(G1(x, m))]
meanwhile, compute L_psnr and L_ssim from the error between x' and x within the mask region, where L_psnr is given by
L_psnr = (1/k^2) · Σ_{i,j} (x'_{i,j} − x_{i,j})^2
where k is the side length of the input picture, and L_ssim is given by
L_ssim = 1 − SSIM(a, b),  SSIM(a, b) = [(2·μ_a·μ_b + M1)(2·σ_ab + M2)] / [(μ_a² + μ_b² + M1)(σ_a² + σ_b² + M2)]
where a and b are the mask regions of x' and x, μ_a and μ_b are the values computed from a and b by a Gaussian filter, σ_a is the standard deviation of a, σ_b the standard deviation of b, and σ_ab the covariance of a and b; to keep the numerator and denominator from being zero, M1 and M2 take the small values 0.0001 and 0.0009; with the parameters of D1 and D2 fixed, back-propagate L_adv1, L_psnr and L_ssim to update the parameters of G1, with loss function
L_G1 = L_ssim + λ_psnr · L_psnr + λ_adv1 · L_adv1
where λ_psnr and λ_adv1 are 1.16 and 0.0003 respectively; with the parameters of G1 fixed, use D1,2 to judge the probability that x is a real picture, i.e. compute the judgment error with BCELoss, back-propagate, and update the parameters of D1 and D2; use D1,2 to judge the probability that x' is a fake picture, i.e. compute the judgment error with BCELoss, back-propagate, and update the parameters of D1 and D2;
in step S6, y is forward-propagated through G2 to obtain the prediction y', specifically:
in each training iteration of the perceptual deblurring stage, sample a pair of pictures (x, y), where y = x⊙(1−m) + x'⊙m, and take y as the input of G2; after y is forward-propagated through G2, the result y' is obtained; use D3 to judge the probability that y' is a real picture, i.e. compute L_adv2 from the difference between the result of forward-propagating y' through D3 and True; therefore L_adv2 = −D3(G2(y)); compute the perceptual loss L_per from the error between y' and x, giving
L_per = (1 / (W_j · H_j)) · ‖φ_j(x) − φ_j(y')‖²
where φ_j(·) is the feature map obtained after the j-th convolutional layer and activation function of the VGG19 network, W_j and H_j are the dimensions of the j-th-layer feature map, and the VGG19 network is pre-trained on the ImageNet dataset; with the parameters of D3 fixed, back-propagate L_per and L_adv2 to update the parameters of G2, whose loss function is
L_G2 = L_per + λ_adv2 · L_adv2
where λ_adv2 is 0.01; with the parameters of G2 fixed, use D3 to judge the probability that x is a real picture, i.e. compute the judgment error from the difference, back-propagate, and update the parameters of D3; use D3 to judge the probability that y' is a fake picture, i.e. compute the judgment error from the difference, back-propagate, and update the parameters of D3.
2. The perceptual-deblurring-based deep face image restoration method according to claim 1, characterized in that in step S2, the coarse-reconstruction-stage generator G1, global discriminator D1 and local discriminator D2 of the perceptual-deblurring-based deep face image restoration method are constructed, and an ADADELTA optimizer is constructed; specifically:
G1, the coarse reconstruction generator, is a deep neural network based on a convolutional neural network and a generative adversarial network; in the coarse reconstruction stage, convolutional layers and dilated convolutional layers with rich receptive fields extract the features of the whole face picture, and deconvolution layers restore the low-resolution feature maps to the original resolution; G1 contains 17 layers in total: 11 convolutional layers, 4 dilated convolutional layers and 2 deconvolution layers; the first 6 layers are convolutional layers, the layer-3 output feature map has one half of the original resolution and the layer-5 output feature map one quarter; layers 7 to 10 are dilated convolutional layers with dilation factors 2, 4, 8 and 16 respectively; layers 11 and 12 are convolutional layers; layer 13 is a deconvolution layer whose output feature map has twice the input resolution; layer 14 is a convolutional layer; layer 15 is a deconvolution layer whose output feature map has twice the input resolution, and the layer-15 output resolution equals the input picture resolution; layers 16 and 17 are convolutional layers, and the output of layer 17 is the coarse reconstruction result x', with three channels and the same resolution as the input picture; apart from the network layers whose output feature resolution changes, the remaining layers leave the resolution unchanged; each of the first 16 layers is followed by BatchNormalization and a ReLU activation function; layer 17 is followed by a Sigmoid activation function;
D1, the global discriminator, is a convolutional neural network used to judge, from a global view in stage one, the probability that the generated picture x' is a real picture; D1 contains 6 layers in total: 5 convolutional layers and 1 fully-connected layer; the output feature resolution of each of the first 4 convolutional layers is one half of the input resolution; the output resolution of the 5th convolutional layer is unchanged; each of the 5 convolutional layers is followed by BatchNormalization and a ReLU activation function;
D2, the local discriminator, is a convolutional neural network used to judge, from a local view in stage one, the probability that the generated picture x' is a real picture; D2 contains 5 layers in total: 4 convolutional layers and 1 fully-connected layer; the output feature resolution of each of the first 3 convolutional layers is one half of the input resolution; the output resolution of the 4th convolutional layer is unchanged; each of the 4 convolutional layers is followed by BatchNormalization and a ReLU activation function; the layer-5 output feature map and the layer-6 output feature map of D1 are joined by a concatenation operation to form a fully-connected layer, which is followed by a ReLU activation function;
in stage one, an ADADELTA optimizer is used, which helps the network fit quickly; in the initialization parameters, the learning rate is 1.0, the coefficient rho for computing the running average is 0.9, the term epsilon added to the denominator for numerical stability is 1e-6, and the weight decay coefficient is 0.
3. The perceptual-deblurring-based deep face image restoration method according to claim 1, characterized in that in step S5, the perceptual-deblurring-stage generator G2 and discriminator D3 of the perceptual-deblurring-based deep face image restoration method are constructed, and an Adam optimizer is constructed; specifically:
G2, the perceptual deblurring generation network, is a deep neural network based on CNN and GAN; in the perceptual deblurring stage, a neural network with residual connections extracts features from the coarsely reconstructed picture, and deconvolution layers restore the low-resolution feature maps to the original resolution; G2 contains 13 layers in total: 11 convolutional layers and 2 deconvolution layers; the first 3 layers are convolutional layers, and the output resolutions of layers 2 and 3 are one half and one quarter of the real picture respectively; the 1st residual block contains convolutional layers 4 and 5, the 2nd residual block contains layers 6 and 7, and the 3rd residual block contains layers 8 and 9; each residual block contains a residual link; layers 10 and 11 are deconvolution layers whose output feature maps have twice the input resolution, and the output resolution of layer 11 equals the input picture resolution; layer 12 is a convolutional layer whose output is the perceptual deblurring result y', with three channels and the same resolution as the input picture; apart from the network layers whose output feature resolution changes, the remaining layers leave the resolution unchanged; each of the first 11 layers is followed by BatchNormalization and a ReLU activation function; layer 12 is followed by a Tanh activation function;
D3, the perceptual deblurring discriminator, is a convolutional neural network used to judge the probability that the stage-two prediction y' is a real picture; D3 contains 5 convolutional layers in total; the output feature resolution of each of the 5 convolutional layers is one half of the input resolution, and the discrimination probability is finally obtained through a Sigmoid activation function; each of the first 4 convolutional layers is followed by BatchNormalization and a LeakyReLU activation function; layer 5 is followed by the Sigmoid activation function;
in stage two, an Adam optimizer is used to speed up network convergence; in the initialization parameters, the learning rate is 1e-3, the coefficients beta for computing the running averages of the gradient and its square are (0.9, 0.999), the term epsilon added to the denominator for numerical stability is 1e-8, and the weight decay coefficient is 0.
4. The perceptual-deblurring-based deep face image restoration method according to claim 1, characterized in that in step S8, the model is judged and the optimal model saved through qualitative and quantitative tests on z, specifically:
sample a batch of real pictures from the test set; crop and scale the pictures to the target resolution, hollow the region to be repaired out of the three RGB channels through the single-channel mask that represents it, and normalize the pictures for use as input; forward-propagate the input through G1 and then G2 to obtain a complete restoration result; repeat until the whole test set has been sampled without repetition; the qualitative test judges all the resulting complete restorations visually, checking whether the content inside the repaired region and at its boundary is consistent, and whether the color, brightness and contrast of the picture are consistent; the quantitative test computes the PSNR and SSIM values between all the final complete restorations and the corresponding original images, the higher the better; repeating steps S2 to S6 yields several trained models; run qualitative and quantitative tests on the different models and select a final model that performs well in both, with preference given to the model with the better qualitative result.
CN202110408739.6A 2021-04-16 2021-04-16 Depth face image restoration method based on perception deblurring Withdrawn CN113160081A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110408739.6A CN113160081A (en) 2021-04-16 2021-04-16 Depth face image restoration method based on perception deblurring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110408739.6A CN113160081A (en) 2021-04-16 2021-04-16 Depth face image restoration method based on perception deblurring

Publications (1)

Publication Number Publication Date
CN113160081A true CN113160081A (en) 2021-07-23

Family

ID=76868098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110408739.6A Withdrawn CN113160081A (en) 2021-04-16 2021-04-16 Depth face image restoration method based on perception deblurring

Country Status (1)

Country Link
CN (1) CN113160081A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113592736A (en) * 2021-07-27 2021-11-02 温州大学 Semi-supervised image deblurring method based on fusion attention mechanism
CN113592736B (en) * 2021-07-27 2024-01-12 温州大学 Semi-supervised image deblurring method based on fused attention mechanism

Similar Documents

Publication Publication Date Title
CN113240580B (en) Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
CN111784602B (en) Method for generating countermeasure network for image restoration
CN110363716B (en) High-quality reconstruction method for generating confrontation network composite degraded image based on conditions
CN112149459B (en) Video saliency object detection model and system based on cross attention mechanism
CN110175986B (en) Stereo image visual saliency detection method based on convolutional neural network
CN111429347A (en) Image super-resolution reconstruction method and device and computer-readable storage medium
CN110728219A (en) 3D face generation method based on multi-column multi-scale graph convolution neural network
CN112541864A (en) Image restoration method based on multi-scale generation type confrontation network model
CN111563418A (en) Asymmetric multi-mode fusion significance detection method based on attention mechanism
CN109005398B (en) Stereo image parallax matching method based on convolutional neural network
CN110675339A (en) Image restoration method and system based on edge restoration and content restoration
CN112801914A (en) Two-stage image restoration method based on texture structure perception
CN112949553A (en) Face image restoration method based on self-attention cascade generation countermeasure network
CN115049556A (en) StyleGAN-based face image restoration method
CN112149662A (en) Multi-mode fusion significance detection method based on expansion volume block
Chen et al. MFMAM: Image inpainting via multi-scale feature module with attention module
CN114549387A (en) Face image highlight removal method based on pseudo label
CN117197627B (en) Multi-mode image fusion method based on high-order degradation model
CN113160081A (en) Depth face image restoration method based on perception deblurring
CN116523985B (en) Structure and texture feature guided double-encoder image restoration method
CN115526891B (en) Training method and related device for defect data set generation model
CN116524307A (en) Self-supervision pre-training method based on diffusion model
CN114708586A (en) Method for extracting three-dimensional face representation from image and video
CN113298814A (en) Indoor scene image processing method based on progressive guidance fusion complementary network
Kumar et al. Underwater Image Enhancement using deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210723