CN113936318A - Human face image restoration method based on GAN human face prior information prediction and fusion - Google Patents


Info

Publication number
CN113936318A
Authority
CN
China
Prior art keywords
face
image
information
stage
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111218941.9A
Other languages
Chinese (zh)
Inventor
李孝杰
万启慧
史沧红
张浩
严喆
张宪
吴锡
吕建成
周激流
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology filed Critical Chengdu University of Information Technology
Priority to CN202111218941.9A
Publication of CN113936318A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods


Abstract

The invention relates to a human face image restoration method based on GAN face prior information prediction and fusion. The neural network of the method takes a VAE structure as its backbone and comprises two stages. First, in the stage-I stage, a coarse network generates a rough image containing the structural and content information of the face, while face-generation guidance information is obtained by fusing the intermediate features of the face contour, region and key points. Then, to better account for the face structure information, a refinement network in the stage-II stage refines the stage-I result: the guidance information is introduced into a second generator to refine the face details and structure. Finally, a natural, harmonious and structurally symmetric face image is generated.

Description

Human face image restoration method based on GAN human face prior information prediction and fusion
Technical Field
The invention relates to the field of image processing, in particular to a human face image restoration method based on GAN human face prior information prediction and fusion.
Background
Image completion is a popular topic in computer vision and aims to fill the missing parts of an image with visually plausible content. Face completion is a special case of image completion; it aims to repair occluded facial regions without being constrained by pose or orientation. However, existing face completion methods consider only simple facial features, and their results are still unsatisfactory and easily detected as repaired. Furthermore, there are often ambiguous boundaries and details near the missing part. In particular, for face repair, face region information (structure, contour and content information) has not been fully utilized, which leads to unnatural generated faces, such as asymmetric eyebrows or eyes of different sizes. Unlike general image restoration, face restoration requires content, contour and structure information about the target object to achieve natural and realistic output. General image restoration methods focus only on the sharpness of the whole image, do not consider the particularity of the human face, and do not fully explore and exploit facial semantic information, so the generated faces are unnatural, blurry and distorted, and lack facial texture details. Under special conditions such as the COVID-19 epidemic, face completion can effectively remove a mask and restore the full face. Therefore, deep-learning-based face image completion remains a challenging subject in face repair.
The prior art has the following defects:
1. It is difficult to complete an image with a large missing region
Traditional image completion methods generally rely on background continuity to fill a missing foreground region, copying similar regions into the hole to complete the image. Such methods cannot solve face completion with large missing areas, since a large missing face region cannot simply be filled with other face regions. In fact, large missing regions under square masks are harder to complete than irregular masks or smaller square masks, because the receptive field of a convolution kernel is square: once the kernel lies entirely inside the missing region, it cannot capture any useful information. For irregular or small missing parts, by contrast, the kernel can still capture useful information in its receptive field from the background. Therefore, some image repair methods only repair irregular or smaller square masks to verify their effectiveness, which does not match practical requirements.
2. It is difficult to generate a natural harmonious face from a background image
For a face image with missing regions, the content of the missing region differs greatly from that of the background region, so it is difficult to generate a natural and harmonious face from the background alone. For example, some image inpainting methods use an attention mechanism to search background regions for blocks similar to the missing region, but similarity matching between each missing block and the surrounding background blocks takes a long time to train and is prone to deforming facial features.
3. The adaptability of the repair network and the correctness of the repair result need to be improved
Completion of face images with missing regions focuses on reconstructing face parts with natural and harmonious characteristics. Natural-image completion methods, and partial face completion methods, focus only on the sharpness of the whole image or consider facial features only superficially; they neither account for the particularity of the human face nor fully explore and exploit facial semantic information, so the generated faces are unnatural, blurry and distorted, and lack facial texture details. Face completion remains challenging because it requires generating semantically new pixels for the missing key components while maintaining structural and appearance consistency. Further research is needed to improve the adaptability of the repair network and the correctness of the repair result.
To solve these problems, we propose a new generative adversarial network that can restore faces with large missing areas with the assistance of a network that obtains fused face prior information.
Disclosure of Invention
Aiming at the defects of the prior art, the GAN-based face image restoration method using face prior information prediction and fusion comprises the following steps:
step 1: downloading a public face data set, preprocessing the data set, and constructing images x_θ with missing face regions; meanwhile, dividing the data proportionally into a training set, a validation set and a test set;
the face image completion method mainly comprises two stages, stage-I and stage-II, specifically as follows:
step 2: the coarse neural network model of the stage-I stage comprises a first generator, two encoders and three decoders. First, the image x_θ lacking face information constructed in step 1 is sent into a network whose backbone is a variational autoencoder (VAE) structure; through nonlinear reconstruction by the two encoders and three decoders, the face contour information M'_{θ-f}, face region information x'_{θ-f} and face key point information x'_{θ-l} are obtained. The face contour, structure and content information reconstructed by the VAE network are fused to obtain face prior guidance information that helps generate the contour, structure and content of a clear face. The image x_θ with missing face information is sent to the first generator, and the face prior guidance information is fused in the intermediate layer of the first generator to fully explore the face region information and generate a low-resolution face image;
the method comprises the following specific steps:
step 21: inputting the missing face image x_θ into the two encoders and three decoders in turn, and obtaining the face contour information M'_{θ-f}, face region information x'_{θ-f} and face key point information x'_{θ-l} through nonlinear reconstruction, thereby extracting the contour, region and key point information of the face respectively;
step 22: combining the face contour, region and key point information to construct the face coding feature vectors z_{θ-f} and z_{θ-l}; finally, fusing z_{θ-f} and z_{θ-l} by feature fusion to construct a face feature expression space and obtain higher-quality face prior guidance information z_{θ-M};
step 23: in the stage-I training stage, passing the missing face image x_θ through the first generator, fusing the face prior guidance information in the first generator's intermediate layer, and, after splicing it with the intermediate feature map of the first generator, generating a low-resolution natural symmetric face image under the action of the prior guidance information;
step 24: iteratively training the stage-I neural network model with the set batch size per training step, and iteratively updating the network parameters of the stage-I generator, encoders and decoders according to the reconstruction loss functions of the face contour, content and structure information and the face information prediction loss function, completing the training of the stage-I face completion network;
step 25: judging whether the set number of validation iterations has been reached; if so, validating the model once and saving it; if not, going to step 26;
step 26: judging whether the set total number of iterations has been reached; if so, ending the training; otherwise, repeating steps 21-25;
step 3: after the coarse neural network training of the stage-I stage is finished, freezing the trained stage-I network and its parameters, and starting the learning and training process of the stage-II fine neural network, the stage-II network structure mainly comprising a second generator, a global discriminator and a block discriminator, specifically as follows:
step 31: in the stage-II stage, the low-resolution natural symmetric face image generated in stage-I is input into the second generator; meanwhile, to better account for face structure information, the face prior guidance information is further introduced into the intermediate layer of the second generator to refine the details and structure of the face and generate a first face repair image of higher resolution;
step 32: sending the first face repair image into two discriminators; by means of the adversarial idea of the GAN network, the second generator is made to generate a high-resolution face repair image with a symmetric face structure, wherein the global discriminator judges the distribution consistency of the image as a whole and the block discriminator supervises the generation details of the image in each patch;
step 33: iteratively training the stage-II face image refinement network with the set batch size per training step;
step 34: judging whether the set number of validation iterations has been reached; if so, validating the model once and saving it; if not, going to step 35;
step 35: judging whether the set total number of iterations has been reached; if so, ending the training; otherwise, repeating steps 31-34;
according to a preferred embodiment, in the stage-I training, three latent feature discriminators are used in place of the original KL divergence as constraints; this reduces the interference that the large magnitude of the KL divergence causes to the reconstruction loss of the face part, and at the same time the discriminators compete with the encoders, enhancing the encoders' learning capability and yielding more accurate face feature information.
According to a preferred embodiment, the preprocessing method comprises:
four reference standard images are constructed to assist face completion: a standard face contour image M_{θ-f}, a standard face content image X_{θ-f}, a standard face structure image Z_{θ-l} and a standard portrait foreground map X_FG, obtained as follows:
standard face contour image M_{θ-f}: extract 68 face key points from the original face using a face detection and alignment method, and expand the obtained 41 key points one by one by 3% of their size so that the eyebrow and boundary information of the face is included, obtaining the face contour image M_{θ-f};
standard face content image X_{θ-f}: multiply the original image X_real by the standard face contour image M_{θ-f} to obtain the standard face content image X_{θ-f};
standard face structure image Z_{θ-l}: obtain the face structure image Z_{θ-l} by dilating and fusing the 41 key points of the face, including the eyes, nose and mouth;
standard portrait foreground map X_FG: obtained by segmenting the portrait using the Baidu segmentation interface.
The invention has the beneficial effects that:
1. The adversarial network is generated in two stages; with the assistance of the obtained face fusion information network, the face image can be repaired in stages from coarse to fine. Meanwhile, the structural information and enhanced texture of the face are fused, finally generating a high-resolution face image with realistic details.
2. Aiming at the face structure distortion, asymmetric face information and face blurring produced by existing face completion methods, a generative adversarial network based on encoding of face structure, contour and content information is proposed. The method improves the generation quality of large-area face completion and obtains satisfactory results even when the missing area differs greatly from the background content.
3. Aiming at the problem that a single generator in a GAN structure carries an excessive learning burden, so that the generated face is sometimes speckled, a VAE-based multi-generator face completion generative adversarial network is proposed: two encoders and three decoders are adopted in the stage-I stage, reducing the learning burden of the generator while simultaneously acquiring the interdependent structure, content and contour information of the face image to generate a high-quality face image.
4. The network is constrained with reconstruction loss and adversarial loss, improving network performance and bringing the features of the generated image close to those of the corresponding original face image, further improving the final face completion result.
5. The semantic information of the face is fused into both the stage-I and stage-II generators, so that it can be fully explored and exploited to guide face repair; this also enhances the robustness of the generators, making the face completion result more stable.
Drawings
FIG. 1 is a flow chart of a method of the facial image restoration network of the present invention;
FIG. 2 is a diagram of a face image restoration network according to the present invention;
FIG. 3 is a diagram of the six kinds of face information used in the present invention; and
FIG. 4 is a graph comparing the experimental results of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
The invention mainly solves the problems of unclear, unnatural and asymmetric results in face image restoration. The face image repair task mainly fills in images lacking face content; it is a special case of image repair and aims to repair occluded facial regions without being constrained by pose or orientation. For face repair, face region information (structure, contour and content information) has not been fully utilized, which leads to unnatural generated faces such as asymmetric eyebrows and eyes of different sizes. Therefore, existing face image restoration algorithms urgently need improvement so that they can generate high-quality results.
The following detailed description is made with reference to the accompanying drawings.
Fig. 1 is a flow chart of the face image restoration method of the present invention, and Fig. 2 is a structure diagram of the face image restoration network; the method of the invention is described in detail below with reference to Figs. 1 and 2. The invention provides a face image restoration method based on GAN (generative adversarial network) face prior information prediction and fusion, comprising the following steps:
Step 1: download a public face data set such as CelebA-HQ, preprocess the data set, and construct images x_θ with missing face regions; divide the data into training, validation and test sets in the conventional ratio 28:1:1. In one specific embodiment, the CelebA-HQ data set contains 30000 face images and is divided into 3 subsets: 28000 training images, 1000 validation images and 1000 test images.
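For concreteness, the masking and splitting described in step 1 can be sketched as follows; the centered square mask and the helper names are illustrative assumptions, not specified by the patent:

```python
import numpy as np

def make_masked_face(image, mask_ratio=0.5):
    """Construct a missing-face image x_theta by zeroing a square region.

    `image` is an H x W x 3 uint8 array. The square side is `mask_ratio` of
    the image side, matching the large square masks discussed above; the
    mask is centered here purely for illustration.
    """
    h, w = image.shape[:2]
    mh, mw = int(h * mask_ratio), int(w * mask_ratio)
    top, left = (h - mh) // 2, (w - mw) // 2
    mask = np.ones((h, w, 1), dtype=np.float32)
    mask[top:top + mh, left:left + mw] = 0.0
    x_theta = (image.astype(np.float32) * mask).astype(np.uint8)
    return x_theta, mask

# 28:1:1 split of the 30000 CelebA-HQ images
indices = np.arange(30000)
train_idx = indices[:28000]
val_idx = indices[28000:29000]
test_idx = indices[29000:]
```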
The face image completion method mainly comprises two stages, stage-I and stage-II, specifically as follows:
Step 2: the coarse neural network model of the stage-I stage comprises a first generator, two encoders and three decoders. First, the image x_θ lacking face information constructed in step 1 is sent into a network whose backbone is a variational autoencoder (VAE) structure; through nonlinear reconstruction by the two encoders and three decoders, the face contour information M'_{θ-f}, face region information x'_{θ-f} and face key point information x'_{θ-l} are obtained. The face contour, structure and content information reconstructed by the VAE network are fused to obtain face prior guidance information that helps generate the contour, structure and content of a clear face. The image x_θ with missing face information is sent to the first generator, and the face prior guidance information is fused in the intermediate layer of the first generator to fully explore the face region information and generate a low-resolution face image. The image generated in the first stage is a low-resolution face image with a clearer contour, a more symmetric structure and more complete content.
The method comprises the following specific steps:
Step 21: input the missing face image x_θ into the two encoders and three decoders in turn, and obtain the face contour information M'_{θ-f}, face region information x'_{θ-f} and face key point information x'_{θ-l} through nonlinear reconstruction, extracting the contour, region and key point information of the face respectively.
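The two-encoder/three-decoder reconstruction of step 21 could be wired as in the sketch below; the layer sizes, latent dimension and the encoder-to-decoder routing are assumptions, since the patent does not publish the exact architecture:

```python
import torch
import torch.nn as nn

def down(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 4, 2, 1),
                         nn.BatchNorm2d(cout), nn.LeakyReLU(0.2))

def up(cin, cout):
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, 2, 1),
                         nn.BatchNorm2d(cout), nn.ReLU())

class Encoder(nn.Module):
    def __init__(self, z_dim=256):
        super().__init__()
        self.net = nn.Sequential(down(3, 64), down(64, 128), down(128, 256),
                                 nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                                 nn.Linear(256 * 4 * 4, z_dim))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self, z_dim=256, out_ch=3):
        super().__init__()
        self.fc = nn.Linear(z_dim, 256 * 8 * 8)
        # 8 -> 16 -> 32 -> 64 -> 128 -> 256 spatial resolution
        self.net = nn.Sequential(up(256, 128), up(128, 64), up(64, 32), up(32, 16),
                                 nn.ConvTranspose2d(16, out_ch, 4, 2, 1), nn.Tanh())
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 256, 8, 8))

# Two encoders yield z_{theta-f} and z_{theta-l} from x_theta; three decoders
# reconstruct the contour M'_{theta-f} (1 channel), the content x'_{theta-f}
# and the key-point/structure map x'_{theta-l} (routing assumed).
enc_f, enc_l = Encoder(), Encoder()
dec_contour = Decoder(out_ch=1)
dec_content = Decoder(out_ch=3)
dec_structure = Decoder(out_ch=3)
```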
Step 22: combine the face contour, region and key point information to construct the face coding feature vectors z_{θ-f} and z_{θ-l}; finally, fuse z_{θ-f} and z_{θ-l} by feature fusion to construct a face feature expression space and obtain higher-quality face prior guidance information z_{θ-M}. The mathematical expression of z_{θ-M} is:

    z_{θ-M} = z_{θ-f} ⊕ z_{θ-l}

wherein ⊕ denotes splicing (concatenation) along the channels. The symbol θ has no practical meaning by itself and acquires meaning together with M, f and l: θ-M denotes the face contour information, θ-f denotes the face region features, and θ-l denotes the face key point information. z_{θ-f} is an intermediate quantity used to learn the distribution of the face features.
Unlike the conventional VAE structure, we use two encoders and three decoders so as to capture the mutually dependent structure, content and contour information in the face image. The constructed face coding feature vectors z_{θ-f} and z_{θ-l} therefore also contain these three kinds of information, and z_{θ-M} finally provides higher-quality face prior guidance information.
In the stage-I stage, an encoding-decoding network with a VAE backbone is constructed. By adding constraints, three latent feature discriminators are used in place of the original KL divergence; this reduces the interference that the large magnitude of the KL divergence causes to the reconstruction loss of the face part, and at the same time the discriminators compete with the encoders, enhancing their learning capability and yielding more accurate face feature information.
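In the spirit of an adversarial autoencoder, the latent feature discriminators can be sketched as below: each D_i scores whether a latent vector comes from the standard normal prior or from an encoder, standing in for the KL term; the MLP width and the BCE loss form are assumptions:

```python
import torch
import torch.nn as nn

class LatentDiscriminator(nn.Module):
    """One of the three latent feature discriminators D_i."""
    def __init__(self, z_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 256), nn.LeakyReLU(0.2),
                                 nn.Linear(256, 256), nn.LeakyReLU(0.2),
                                 nn.Linear(256, 1))
    def forward(self, z):
        return self.net(z)

bce = nn.BCEWithLogitsLoss()

def latent_d_loss(disc, z_encoded):
    """Discriminator side: 'real' samples come from N(0, I)."""
    real_logits = disc(torch.randn_like(z_encoded))
    fake_logits = disc(z_encoded.detach())
    return (bce(real_logits, torch.ones_like(real_logits)) +
            bce(fake_logits, torch.zeros_like(fake_logits)))

def latent_e_loss(disc, z_encoded):
    """Encoder side: fool D_i -- this adversarial term replaces the
    KL-divergence constraint of a conventional VAE."""
    logits = disc(z_encoded)
    return bce(logits, torch.ones_like(logits))
```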
Step 23: to generate a natural, harmonious and symmetric face image, in the stage-I training stage the missing face image x_θ is passed through the first generator; the face prior guidance information is fused in the first generator's intermediate layer and, after being spliced with the intermediate feature map of the first generator, a low-resolution natural symmetric face image is generated under the action of the prior guidance information.
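One plausible way to realize this fusion is to broadcast z_{θ-M} over the spatial grid and splice it onto the generator's intermediate feature map; the broadcast-and-concatenate mechanism below is an assumption, as the patent states only that the guidance is fused and spliced with the intermediate feature map:

```python
import torch
import torch.nn as nn

class GuidedFusion(nn.Module):
    """Splice the face prior guidance z_{theta-M} onto an intermediate
    feature map of the generator (mechanism assumed)."""
    def __init__(self, feat_ch, z_dim, out_ch):
        super().__init__()
        self.merge = nn.Conv2d(feat_ch + z_dim, out_ch, kernel_size=3, padding=1)

    def forward(self, feat, z_guide):
        b, _, h, w = feat.shape
        # Broadcast the guidance vector to every spatial location, then
        # concatenate it with the feature map along the channel axis.
        z_map = z_guide.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.merge(torch.cat([feat, z_map], dim=1))
```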
Step 24: iteratively train the stage-I neural network model with the set batch size per training step, and iteratively update the network parameters of the stage-I generator, encoders and decoders according to the reconstruction loss functions of the face contour, content and structure information and the face information prediction loss function, completing the training of the stage-I face completion network.
Step 25: judge whether the set number of validation iterations has been reached; if so, validate the model once and save it; if not, go to step 26.
Step 26: judge whether the set total number of iterations has been reached; if so, end the training; otherwise, repeat steps 21-25.
Step 3: after the coarse neural network training of the stage-I stage is finished, freeze the trained stage-I network and its parameters, and start the learning and training process of the stage-II fine neural network. The stage-II network structure mainly comprises a second generator, a global discriminator and a block discriminator, specifically as follows:
Step 31: in the stage-II stage, the low-resolution natural symmetric face image generated in stage-I is input into the second generator; meanwhile, to better account for face structure information, the face prior guidance information is further introduced into the intermediate layer of the second generator to refine the details and structure of the face and generate a first face repair image of higher resolution.
Step 32: the first face repair image is sent into two discriminators; by means of the adversarial idea of the GAN network, the second generator is made to generate a high-resolution face repair image with a symmetric face structure. The global discriminator judges the distribution consistency of the image as a whole, while the block discriminator supervises the generation details of the image in each patch.
Unlike a conventional region discriminator that focuses only on the generated region, the block (patch) discriminator cuts the entire image into many small patches and then judges whether each patch is real. Thus the global discriminator supervises the consistency between the generated region and the background across the whole image, while the block discriminator serves the specific purpose of restoring texture details. When the two discriminators can no longer distinguish the final restored face image from the original face image, the second generator and the discriminator network have reached a balance, and the second generator can capture the true distribution of the face image data.
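One plausible reading of the block discriminator is a PatchGAN-style network whose output is a grid of logits, each judging one patch, while the global discriminator reduces the whole image to a single score; the layer counts below are assumptions:

```python
import torch.nn as nn

def d_block(cin, cout, norm=True):
    layers = [nn.Conv2d(cin, cout, 4, 2, 1)]
    if norm:
        layers.append(nn.InstanceNorm2d(cout))
    layers.append(nn.LeakyReLU(0.2))
    return layers

class BlockDiscriminator(nn.Module):
    """Outputs an N x N map of logits; each cell judges one patch of the
    input, so texture details are supervised in every patch."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(*d_block(in_ch, 64, norm=False), *d_block(64, 128),
                                 *d_block(128, 256), nn.Conv2d(256, 1, 4, 1, 1))
    def forward(self, x):
        return self.net(x)  # e.g. 256x256 input -> 31x31 patch logits

class GlobalDiscriminator(nn.Module):
    """Reduces the whole image to a single logit, judging the overall
    distribution consistency of generated region and background."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(*d_block(in_ch, 64, norm=False), *d_block(64, 128),
                                 *d_block(128, 256), *d_block(256, 512),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, 1))
    def forward(self, x):
        return self.net(x)
```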
Step 33: iteratively train the stage-II face image refinement network with the set batch size per training step.
In each training step, the discriminator parameters are first trained and updated according to the adversarial loss function and then frozen; next, the generator parameters are updated according to the adversarial loss function and the reconstruction loss function. Training of the stage-II face image refinement network is completed in this alternating manner;
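A condensed sketch of this alternating update is given below; the single optimizer over both discriminators, the equal loss weighting and the BCE adversarial form are assumptions:

```python
import torch

bce = torch.nn.BCEWithLogitsLoss()
l1 = torch.nn.L1Loss()

def stage2_step(g2, d_global, d_block, opt_g, opt_d, x_lowres, z_guide, x_real):
    # 1) Update both discriminators on real vs. generated images.
    with torch.no_grad():
        x_fake = g2(x_lowres, z_guide)
    d_loss = 0.0
    for d in (d_global, d_block):
        real_logits, fake_logits = d(x_real), d(x_fake)
        d_loss = d_loss + bce(real_logits, torch.ones_like(real_logits)) \
                        + bce(fake_logits, torch.zeros_like(fake_logits))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) With the discriminators frozen (no opt_d step), update the second
    #    generator with adversarial + reconstruction losses.
    x_rec = g2(x_lowres, z_guide)
    g_adv = 0.0
    for d in (d_global, d_block):
        logits = d(x_rec)
        g_adv = g_adv + bce(logits, torch.ones_like(logits))
    g_loss = g_adv + l1(x_rec, x_real)   # L_adv^{G2} + L_rec^{G2}
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```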
Step 34: judge whether the set number of validation iterations has been reached; if so, validate the model once and save it; if not, go to step 35.
Step 35: judge whether the set total number of iterations has been reached; if so, end the training; otherwise, repeat steps 31 to 34.
The loss function of the face image restoration method provided by the invention comprises reconstruction loss functions for reconstructing face structure information, a face information prediction loss function, and adversarial loss functions for reconstructing face distribution information, wherein:
the reconstruction loss function is used to constrain the global structure of the generated face image and to guide the extraction of the contour, content and structure information of the face image;
the face information prediction loss function is used to guide the extraction of the contour, content and structure information of the face image;
the adversarial loss function is used to restore the detail information of the face image so that the face looks clearer.
The loss function of the stage-I stage comprises reconstruction loss functions and a latent classification loss function, specifically:

    L_stage-I = L_rec^{E,D} + L_rec^{G1} + L_latent

wherein L_rec^{E,D} represents the reconstruction loss function of the encoder-decoder, L_rec^{G1} represents the reconstruction loss function of the first generator, and L_latent represents the latent classification loss function. L_rec^{E,D} combines a cross-entropy constraint between the real and reconstructed face contour images with L1 constraints between the real and reconstructed face content and structure images; L_rec^{G1} is a foreground-weighted L1 constraint between the first generator's output and the original image (the detailed expressions appear as equation images in the original publication).

Wherein M_{θ-f} represents the real face contour image, x_{θ-f} the real face content image and x_{θ-l} the real face structure image; M'_{θ-f}, x'_{θ-f} and x'_{θ-l} represent the generated face contour, content and structure images, respectively; the cross-entropy loss and the L1-norm constraint ||·||_1 are used; λ_gen1 is a hyperparameter controlling the loss weight; ⊙ denotes element-by-element multiplication; x_real represents the original real image data set, x_t represents the result of the first generator in stage-I, x_FG represents the portrait foreground map, and the hyperparameter η is set to 0.5.
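Since the detailed loss expressions appear only as images in the original publication, the sketch below is one plausible implementation consistent with the symbol definitions above; the exact combination and weighting are assumptions:

```python
import torch
import torch.nn.functional as F

def rec_loss_enc_dec(m_fake, m_real, xf_fake, xf_real, xl_fake, xl_real):
    """L_rec^{E,D}: cross-entropy on the contour mask plus L1 constraints
    on the face content and structure images (combination assumed)."""
    contour = F.binary_cross_entropy_with_logits(m_fake, m_real)
    content = F.l1_loss(xf_fake, xf_real)
    structure = F.l1_loss(xl_fake, xl_real)
    return contour + content + structure

def rec_loss_g1(x_t, x_real, x_fg, lambda_gen1=1.0, eta=0.5):
    """L_rec^{G1}: foreground-weighted L1 between the stage-I result x_t and
    the original image x_real; the portrait foreground x_FG is emphasized by
    eta = 0.5 (weighting form assumed)."""
    weight = 1.0 + eta * x_fg
    return lambda_gen1 * torch.mean(torch.abs((x_t - x_real) * weight))
```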
L_latent uses three classifiers as latent feature discriminators to extract latent feature constraints in place of the KL-divergence constraint, so as to reduce the interference that the large magnitude of the KL divergence causes to the reconstruction loss of the face part, while at the same time competing with the encoders, enhancing their learning capability and obtaining more accurate face feature information. The specific discriminator losses are as follows:

    [equation images: latent classification losses of D_0, D_1 and D_2]

wherein D_i (i ∈ {0, 1, 2}) represents a latent discriminator.

In addition, the adversarial loss of the latent discriminator D_i is defined as follows:

    [equation images: adversarial losses of D_0, D_1 and D_2]

wherein z represents a standard normal distribution feature vector, z_{θ-f} and z_{θ-l} represent the face coding feature vectors, and z_{θ-M} represents the face feature expression space, namely the face prior guidance information.
The loss function of the stage-II stage comprises the adversarial loss and the reconstruction loss, as follows:

    L_stage-II = L_adv^{G2} + L_rec^{G2}

wherein L_adv^{G2} represents the local and global adversarial loss functions of the generator and L_rec^{G2} represents the reconstruction loss function of the second generator. The adversarial loss of the second generator is used to repair the detail information of the face image so that the face looks clearer:

    [equation images: local and global adversarial losses of the second generator]

The reconstruction loss function of the second generator is used to constrain the global structure of the generated face image:

    [equation image: reconstruction loss of the second generator]

wherein x_rec represents the high-resolution image generated by the generator in stage-II.

In addition, the adversarial loss functions of the discriminators are used to repair the detail information of the face image so that the face looks clearer:

    [equation images: local and global adversarial losses of the discriminators]
According to a preferred embodiment, the preprocessing method comprises:
obtaining the missing image x_θ through mask occlusion;
constructing four reference standard images to assist face completion: a standard face contour image M_{θ-f}, a standard face content image x_{θ-f}, a standard face structure image Z_{θ-l} and a standard portrait foreground map X_FG, obtained as follows:
standard face contour image M_{θ-f}: extract 68 face key points from the original face using a face detection and alignment method, and expand the obtained 41 key points one by one by 3% of their size so that the eyebrow and boundary information of the face is included, obtaining the face contour image M_{θ-f};
standard face content image x_{θ-f}: multiply the original image X_real by the standard face contour image M_{θ-f} to obtain the standard face content image x_{θ-f};
standard face structure image Z_{θ-l}: obtain the face structure image Z_{θ-l} by dilating and fusing the 41 key points of the face, including the eyes, nose and mouth;
standard portrait foreground map X_FG: obtained by segmenting the portrait using the Baidu segmentation interface.
Wherein the standard face contour image M_{θ-f}, the standard face content image x_{θ-f} and the standard face structure image Z_{θ-l} serve as reference standards when the network reconstructs the corresponding content in the loss functions.
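A sketch of this preprocessing is given below. The `face_alignment` package, the convex-hull construction and the dilation kernel are assumptions; the patent specifies only 68 detected key points, per-point 3% expansion, the 41 inner key points for the structure image, and an external portrait-segmentation service for X_FG:

```python
import cv2
import numpy as np
import face_alignment  # assumed 68-point landmark detector; any equivalent works

fa = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D)

def reference_images(x_real):
    """Build M_{theta-f}, X_{theta-f} and Z_{theta-l} for one RGB image."""
    h, w = x_real.shape[:2]
    pts = fa.get_landmarks(x_real)[0].astype(np.int32)   # 68 x 2 key points
    k = max(1, int(0.03 * max(h, w)))                    # ~3% expansion radius
    kernel = np.ones((k, k), np.uint8)

    # M_{theta-f}: fill the convex hull of the landmarks, then dilate so the
    # eyebrow and face-boundary information is included in the mask.
    m_contour = np.zeros((h, w), np.uint8)
    cv2.fillConvexPoly(m_contour, cv2.convexHull(pts), 1)
    m_contour = cv2.dilate(m_contour, kernel)

    # X_{theta-f}: the original image masked by the contour.
    x_content = x_real * m_contour[..., None]

    # Z_{theta-l}: dilated union of the 41 inner key points (eyes, nose,
    # mouth; indices 27..67 in the 68-point convention).
    z_structure = np.zeros((h, w), np.uint8)
    for (px, py) in pts[27:]:
        cv2.circle(z_structure, (int(px), int(py)), k, 1, -1)
    z_structure = cv2.dilate(z_structure, kernel)

    return m_contour, x_content, z_structure
```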
The face image restoration method further comprises testing the trained restoration network: the network's input images are processed according to the method of step 1, the stage-I and stage-II networks are run according to steps 2 and 3, and after training is finished the stage-II generator outputs the test result.
Our model was evaluated on the natural face image data set CelebA-HQ, divided into 28000 training images, 1000 validation images and 1000 test images; the CelebA-HQ face images are 256 × 256.
Fig. 3 shows the six kinds of face information images of the training data set: (a) original image x_t; (b) masked face image x_θ; (c) face contour image M_{θ-f}; (d) face content image x_{θ-f}; (e) face structure image x_{θ-l}; (f) portrait foreground image x_FG.
In addition, we compare our method with six of the best existing face restoration methods: PM (PatchMatch), GLCIC (Globally and Locally Consistent Image Completion), CA (Contextual Attention), PICNet (Pluralistic Image Completion), PEN (Pyramid-context Encoder Network) and CSA (Coherent Semantic Attention), using the same irregular mask data set.
First, we qualitatively compared our model with PM, GLCIC, CA, PICNet, PEN and CSA. Fig. 4 shows the results of the different methods on the CelebA-HQ data set; the cut-out area is shown in black in Fig. 4(b). In Fig. 4(c), when the missing region differs greatly from its surroundings, PM cannot complete the whole face. In Fig. 4(d), although GLCIC can complete the whole face, the inpainted area is too blurred. In Fig. 4(e), the face completed by CA is severely distorted. In Fig. 4(g), PICNet can return a clear face, but the face is not harmonious; this is because PICNet aims to produce a clear image by strengthening the constraint ability of the discriminator, which destroys the structural consistency of the image and causes distortion. In Fig. 4(h), PEN performs well on the CelebA-HQ data set but not on low-resolution data sets. In Fig. 4(i), CSA produces very good results similar to our approach, but it takes a long time to train because it needs to find similar blocks and compute similarities to surrounding blocks; in addition, the CSA-completed face image may contain some noise (see Fig. 4(i)). In contrast, our model achieves a natural and realistic result in Fig. 4(j). The experimental results show that a model with reconstruction loss, face information prediction loss and adversarial loss improves the naturalness and clarity of the generated faces and finally produces high-quality face image restoration results.
TABLE 1. Objective evaluation index comparison of the experimental results on the CelebA-HQ data set
[Table 1, comparing PSNR, SSIM and L1 across methods, is rendered as an image in the original publication]
To further evaluate the effectiveness of the method, we also performed quantitative comparison experiments; the quantitative results on the data set are shown in Table 1, which reports the results of the different methods on CelebA-HQ. By exploiting the guidance information and the fused information of the second stage, our method shows the most advanced level among all methods on the three categories of indicators (PSNR, SSIM and L1), and it achieves the best generalization performance with large masks. Specifically, our method with stage-II and the guidance part achieved PSNR 25.823, SSIM 0.890 and L1 6.74 on the CelebA-HQ data set. Furthermore, as can be seen from Table 1, PEN produced results comparable to ours in terms of PSNR, SSIM and L1 on CelebA-HQ; however, PEN's completion quality is much poorer than that of our method. The quantitative results show that our method as a whole achieves better performance in PSNR, SSIM and L1 than all other methods.
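For reference, the three reported indices can be computed as in the sketch below (using scikit-image; treating the L1 index as mean absolute error scaled to percent is an assumption, though it is the common convention in inpainting papers):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred, target):
    """pred/target: H x W x 3 uint8 images (restored vs. ground truth)."""
    psnr = peak_signal_noise_ratio(target, pred, data_range=255)
    ssim = structural_similarity(target, pred, channel_axis=-1, data_range=255)
    l1 = np.mean(np.abs(pred.astype(np.float64) -
                        target.astype(np.float64))) / 255.0 * 100.0
    return psnr, ssim, l1
```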
It should be noted that the above-mentioned embodiments are exemplary, and those skilled in the art, having the benefit of the present disclosure, may devise various arrangements that fall within the scope of the invention. It should be understood that the specification and drawings are illustrative only and do not limit the claims. The scope of the invention is defined by the claims and their equivalents.

Claims (3)

1. A face image restoration method based on GAN face prior information prediction and fusion, characterized by comprising the following steps:
step 1: downloading a public face data set, preprocessing the data set, constructing images x_θ lacking faces, and proportionally dividing the data into a training set, a validation set and a test set;
step 2: the face image completion method mainly comprises two stages, stage-I and stage-II, as follows:
the coarse neural network model of the stage-I stage comprises a first generator, two encoders and three decoders; first, the image x_θ lacking face information constructed in step 1 is sent into a network whose backbone is a variational autoencoder (VAE) structure, and the face contour information M'_{θ-f}, face region information x'_{θ-f} and face key point information x'_{θ-l} are obtained through nonlinear reconstruction by the two encoders and three decoders; the face contour, structure and content information reconstructed by the VAE network are fused to obtain face prior guidance information that helps generate the contour, structure and content of a clear face; the image x_θ missing face information is sent to the first generator, and the face prior guidance information is fused in the intermediate layer of the first generator to fully explore the face region information and generate a low-resolution face image;
step 21: inputting the missing face image x_θ into the two encoders and three decoders in turn, and obtaining the face contour information M'_{θ-f}, face region information x'_{θ-f} and face key point information x'_{θ-l} through nonlinear reconstruction, thereby extracting the contour, region and key point information of the face respectively;
step 22: combining the face contour, region and key point information to construct the face coding feature vectors z_{θ-f} and z_{θ-l}, and finally fusing z_{θ-f} and z_{θ-l} by feature fusion to construct a face feature expression space and obtain higher-quality face prior guidance information z_{θ-M};
step 23: in the stage-I training stage, passing the missing face image x_θ through the first generator, fusing the face prior guidance information in the first generator's intermediate layer, and, after splicing it with the intermediate feature map of the first generator, generating a low-resolution natural symmetric face image under the action of the prior guidance information;
step 24: iteratively training the stage-I neural network model with the set batch size per training step, and iteratively updating the network parameters of the stage-I generator, encoders and decoders according to the reconstruction loss functions of the face contour, content and structure information and the face information prediction loss function, completing the training of the stage-I face completion network;
step 25: judging whether the set number of validation iterations has been reached; if so, validating the model once and saving it; if not, going to step 26;
step 26: judging whether the set total number of iterations has been reached; if so, ending the training; otherwise, repeating steps 21-25;
step 3: after the coarse neural network training of the stage-I stage is finished, freezing the trained stage-I network and its parameters, and starting the learning and training process of the stage-II fine neural network, the stage-II network structure mainly comprising a second generator, a global discriminator and a block discriminator, specifically as follows:
step 31: in the stage-II stage, inputting the low-resolution natural symmetric face image generated in stage-I into the second generator; meanwhile, to better account for face structure information, further introducing the face prior guidance information into the intermediate layer of the second generator to refine the details and structure of the face and generate a first face repair image of higher resolution;
step 32: sending the first face repair image into two discriminators, and, by means of the adversarial idea of the GAN network, making the second generator generate a high-resolution face repair image with a symmetric face structure, wherein the global discriminator judges the distribution consistency of the image as a whole and the block discriminator supervises the generation details of the image in each patch;
step 33: iteratively training the stage-II face image refinement network with the set batch size per training step;
step 34: judging whether the set number of validation iterations has been reached; if so, validating the model once and saving it; if not, going to step 35;
step 35: judging whether the set total number of iterations has been reached; if so, ending the training; otherwise, repeating steps 31-34;
step 4: testing the trained restoration model on the test set.
2. The face image restoration method according to claim 1, wherein in the stage-I training, three latent feature discriminators are used in place of the original KL divergence as constraints, reducing the interference that the large magnitude of the KL divergence causes to the reconstruction loss of the face part, while at the same time competing with the encoders, enhancing their learning capability and obtaining more accurate face feature information.
3. The face image restoration method according to claim 2, wherein the preprocessing method comprises:
constructing four reference standard images to assist face completion: a standard face contour image M_{θ-f}, a standard face content image X_{θ-f}, a standard face structure image Z_{θ-l} and a standard portrait foreground map X_FG, obtained as follows:
standard face contour image M_{θ-f}: extracting 68 face key points from the original face using a face detection and alignment method, and expanding the obtained 41 key points one by one by 3% of their size so that the eyebrow and boundary information of the face is included, obtaining the face contour image M_{θ-f};
standard face content image X_{θ-f}: multiplying the original image X_real by the standard face contour image M_{θ-f} to obtain the standard face content image X_{θ-f};
standard face structure image Z_{θ-l}: obtaining the face structure image Z_{θ-l} by dilating and fusing the 41 key points of the face, including the eyes, nose and mouth;
standard portrait foreground map X_FG: obtained by segmenting the portrait using the Baidu segmentation interface.
CN202111218941.9A 2021-10-20 2021-10-20 Human face image restoration method based on GAN human face prior information prediction and fusion Pending CN113936318A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111218941.9A CN113936318A (en) 2021-10-20 2021-10-20 Human face image restoration method based on GAN human face prior information prediction and fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111218941.9A CN113936318A (en) 2021-10-20 2021-10-20 Human face image restoration method based on GAN human face prior information prediction and fusion

Publications (1)

Publication Number Publication Date
CN113936318A true CN113936318A (en) 2022-01-14

Family

ID=79280510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111218941.9A Pending CN113936318A (en) 2021-10-20 2021-10-20 Human face image restoration method based on GAN human face prior information prediction and fusion

Country Status (1)

Country Link
CN (1) CN113936318A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114693972A (en) * 2022-03-29 2022-07-01 电子科技大学 Reconstruction-based intermediate domain self-adaptive method
CN114693972B (en) * 2022-03-29 2023-08-29 电子科技大学 Intermediate domain field self-adaption method based on reconstruction
CN114913588A (en) * 2022-06-20 2022-08-16 电子科技大学 Face image restoration and recognition method applied to complex scene
CN114913588B (en) * 2022-06-20 2023-04-25 电子科技大学 Face image restoration and recognition method applied to complex scene
WO2023245927A1 (en) * 2022-06-23 2023-12-28 中国科学院自动化研究所 Image generator training method and apparatus, and electronic device and readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination