CN114627204A - Two-dimensional reconstruction method, system, equipment and storage medium based on human face - Google Patents
- Publication number
- CN114627204A (application CN202210265066.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- face image
- key points
- face
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The embodiment of the application discloses a two-dimensional reconstruction method, system, equipment and storage medium based on a human face. The method comprises the following steps: acquiring an unoccluded face image and an occluded face image; inputting the occluded face image into a pre-trained key point recognition model to obtain the key points in the occluded face image; and inputting the occluded face image and its key points into an image generator network to obtain a de-occluded reconstructed face image generated from the occluded face image. The image generator network is trained, against an image discriminator network, on the unoccluded face image and its key points and on the occluded face image and its key points. The occluded face image is thus restored accurately and conveniently.
Description
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, and in particular to a face-based two-dimensional reconstruction method, system, equipment and storage medium.
Background
In recent years, Virtual Reality (VR) has developed significantly, allowing users to explore new environments (real and imaginary) and to engage with media in an immersive manner not previously available.
Sharing these experiences, however, is difficult, because current head-mounted VR devices completely hide the wearer's face. In a social setting this hinders the interaction of two or more users sharing a 3D immersive experience, making it difficult for others to observe the wearer's expressions and understand their feelings.
There is therefore a need for a solution that virtually removes the facial occlusion and restores the wearer's face, so that viewers can perceive the user's experience and this sense of disconnection is alleviated.
Disclosure of Invention
Therefore, the embodiments of the application provide a two-dimensional reconstruction method, system, equipment and storage medium based on a human face that restore an occluded face image accurately, conveniently and rapidly.
In order to achieve the above object, the embodiments of the present application provide the following technical solutions:
according to a first aspect of the embodiments of the present application, there is provided a two-dimensional reconstruction method based on a human face, the method including:
acquiring an unoccluded face image and an occluded face image;
inputting the occluded face image into a pre-trained key point recognition model to obtain key points in the occluded face image;
inputting the occluded face image and its key points into an image generator network to obtain a de-occluded reconstructed face image generated based on the occluded face image; wherein the image generator network is trained, based on an image discriminator network, on the unoccluded face image and its key points and on the occluded face image and its key points.
Optionally, the training process of the image generator network includes the following steps:
taking the consistency between the key points of the de-occluded reconstructed face image generated by the image generator network and the key points of the corresponding occluded image as a constraint term, and calculating a key point loss function between the key points of the occluded image and the key points of the de-occluded reconstructed face image;
inputting the de-occluded reconstructed face image and the unoccluded face image in pairs into the image discriminator network, the image discriminator network outputting a discrimination result indicating whether its input is a real unoccluded face image or a de-occluded reconstructed face image, and feeding the discrimination result back to the image generator network;
the image discriminator network is trained on three types of data so as to update the weights of the image generator network until the weights satisfy a preset value; the three types of data comprise: an unoccluded face image together with its corresponding key points; a generated de-occluded reconstructed face image together with the key points of the occluded face from which it was generated; and an unoccluded face image together with key points that do not match it.
Optionally, inputting the occluded face image and its key points into an image generator network to obtain a de-occluded reconstructed face image generated based on the occluded face image includes:
inputting the RGBA color image of the occluded face and its key points into a down-sampling module to reduce the resolution and increase the feature dimension; then extracting high-dimensional feature information through a residual network module; then extracting attention weights through a classifier after feature map encoding, so that the feature map attention module applies the learned, input-dependent attention weights to a further residual network module; and finally restoring the size of the input image through an up-sampling module.
Optionally, the feature map attention module is configured to use an attention map obtained from an auxiliary classifier to distinguish the source domain from the target domain of the occluded part, so as to determine the region requiring dense transformation.
According to a second aspect of the embodiments of the present application, there is provided a face-based two-dimensional reconstruction system, the system comprising:
the image acquisition module is used for acquiring an unoccluded face image and an occluded face image;
the key point determination module is used for inputting the occluded face image into a pre-trained key point recognition model to obtain key points in the occluded face image;
the face reconstruction module is used for inputting the occluded face image and its key points into an image generator network to obtain a de-occluded reconstructed face image generated based on the occluded face image; wherein the image generator network is trained, based on an image discriminator network, on the unoccluded face image and its key points and on the occluded face image and its key points.
Optionally, the training process of the image generator network includes the following steps:
taking the consistency between the key points of the de-occluded reconstructed face image generated by the image generator network and the key points of the corresponding occluded image as a constraint term, and calculating a key point loss function between the key points of the occluded image and the key points of the de-occluded reconstructed face image;
inputting the de-occluded reconstructed face image and the unoccluded face image in pairs into the image discriminator network, the image discriminator network outputting a discrimination result indicating whether its input is a real unoccluded face image or a de-occluded reconstructed face image, and feeding the discrimination result back to the image generator network;
the image discriminator network is trained on three types of data so as to update the weights of the image generator network until the weights satisfy a preset value; the three types of data comprise: an unoccluded face image together with its corresponding key points; a generated de-occluded reconstructed face image together with the key points of the occluded face from which it was generated; and an unoccluded face image together with key points that do not match it.
Optionally, the face reconstruction module is specifically configured to: input the RGBA color image of the occluded face and its key points into a down-sampling module to reduce the resolution and increase the feature dimension; then extract high-dimensional feature information through a residual network module; then extract attention weights through a classifier after feature map encoding, so that the feature map attention module applies the learned, input-dependent attention weights to a further residual network module; and finally restore the size of the input image through an up-sampling module.
Optionally, the feature map attention module is configured to use an attention map obtained from an auxiliary classifier to distinguish the source domain from the target domain of the occluded part, so as to determine the region requiring dense transformation.
According to a third aspect of the embodiments of the present application, there is provided a two-dimensional reconstruction apparatus based on a human face, including:
a memory for storing a computer program;
a processor for implementing the steps of the face-based two-dimensional reconstruction method described in any one of the above when executing the computer program.
According to a fourth aspect of the embodiments of the present application, there is provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the face-based two-dimensional reconstruction method described in any one of the above.
In summary, the embodiments of the present application provide a two-dimensional reconstruction method, system, equipment and storage medium based on a human face: an unoccluded face image and an occluded face image are acquired; the occluded face image is input into a pre-trained key point recognition model to obtain the key points in the occluded face image; and the occluded face image and its key points are input into an image generator network to obtain a de-occluded reconstructed face image generated from the occluded face image, the image generator network being trained, against an image discriminator network, on the unoccluded face image and its key points and on the occluded face image and its key points. The occluded face image is thus restored accurately and conveniently.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other drawings can be obtained from them by those of ordinary skill in the art without inventive effort.
The structures, ratios, sizes and the like shown in this specification are only used to match the content disclosed in the specification so that those skilled in the art can understand and read it; they do not limit the conditions under which the invention can be implemented and therefore have no technical significance. Any structural modification, change of ratio or adjustment of size that does not affect the functions and purposes of the invention still falls within the scope of the invention.
Fig. 1 is a schematic flow chart of a two-dimensional reconstruction method based on a human face according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the application of a conventional cycle-consistent generative adversarial network model;
FIG. 3 is a schematic structural diagram of the cycle-consistent generative adversarial network model provided in an embodiment of the present application;
FIG. 4 is a schematic flow chart diagram of an embodiment provided by an embodiment of the present application;
fig. 5 is a block diagram of a two-dimensional reconstruction system based on a human face according to an embodiment of the present application.
Detailed Description
The present invention is described by way of particular embodiments, and other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It is to be understood that the described embodiments are merely exemplary of the invention and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort fall within the protection scope of the present invention.
In a common face reconstruction method in the prior art, three-dimensional feature points of the face need to be collected in advance, a three-dimensional model of the face needs to be registered, and the two-dimensional face image is restored via three-dimensional space. The commonly used 2D image translation methods pix2pix and pix2pixHD require a large number of paired data sets to complete model training, that is, for the same scene, a pair of face images with exactly consistent angle, position and size, one wearing the VR device and one with the occlusion removed, which is almost impossible to obtain.
The embodiments of the application mainly relate to the fields of face alignment, face reconstruction and image generation. Using deep learning, an improved algorithm based on an adversarial neural network is provided: it detects the facial expression of the visible part of the face and uses it to restore and reconstruct the face region occluded over a large area. This VR-headset removal method can be combined with mixed reality and has shown good results in the implementation of a number of virtual reality games and experiences.
Fig. 1 shows a flow of a two-dimensional reconstruction method based on a human face according to an embodiment of the present application, where the method includes the following steps:
step 101: and acquiring an unobstructed face image and an obstructed face image.
Step 102: and inputting the occluded human face image into a pre-trained key point recognition model to obtain key points in the occluded human face image.
Step 103: inputting the occluded human face image and key points thereof into an image generator network to obtain a reconstructed human face image which is generated based on the occluded human face image and is subjected to occlusion removal; the image generator network is obtained by training the non-shielded face image and the key points thereof, and the shielded face image and the key points thereof based on an image discriminator network.
In a possible implementation of step 102, the key point recognition model is a convolutional neural network trained on a preset set of sample inputs and sample outputs; it extracts feature maps from the face image and identifies the key points in it. The key points may be the coordinates of key points of the eyebrows, eyes, mouth, nose, ears and so on.
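The application does not fix the structure of this key point recognition model. The following is a minimal PyTorch sketch under the assumption that it is a small CNN regressing 68 (x, y) key point coordinates; the layer sizes, the 68-point convention and the class name are illustrative only.

```python
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    """Hypothetical key point recognition model: regresses K (x, y) landmark
    coordinates from a face image. Architecture and K = 68 are assumptions."""
    def __init__(self, num_points: int = 68):
        super().__init__()
        self.num_points = num_points
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),   # 256 -> 128
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 128 -> 64
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(), # 64 -> 32
            nn.AdaptiveAvgPool2d(1),                               # global pooling
        )
        self.head = nn.Linear(128, num_points * 2)                 # (x, y) per key point

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(x).flatten(1)                  # (N, 128)
        return self.head(feat).view(-1, self.num_points, 2) # (N, K, 2) coordinates
```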
In one possible embodiment, the training process of the image generator network comprises the following steps: taking the consistency between the key points of the de-occluded reconstructed face image generated by the image generator network and the key points of the corresponding occluded image as a constraint term, and calculating a key point loss function between the key points of the occluded image and the key points of the de-occluded reconstructed face image; inputting the de-occluded reconstructed face image and the unoccluded face image in pairs into the image discriminator network, the image discriminator network outputting a discrimination result indicating whether its input is a real unoccluded face image or a de-occluded reconstructed face image, and feeding the discrimination result back to the image generator network; the image discriminator network is trained on three types of data so as to update the weights of the image generator network until the weights satisfy a preset value. The three types of data comprise: an unoccluded face image together with its corresponding key points; a generated de-occluded reconstructed face image together with the key points of the occluded face from which it was generated; and an unoccluded face image together with key points that do not match it.
In a possible implementation of step 103, inputting the occluded face image and its key points into an image generator network to obtain a de-occluded reconstructed face image generated based on the occluded face image includes:
inputting the RGBA color image of the occluded face and its key points into a down-sampling module to reduce the resolution and increase the feature dimension; then extracting high-dimensional feature information through a residual network module; then extracting attention weights through a classifier after feature map encoding, so that the feature map attention module applies the learned, input-dependent attention weights to a further residual network module; and finally restoring the size of the input image through an up-sampling module.
In one possible embodiment, the feature map attention module is configured to use an attention map obtained from an auxiliary classifier to distinguish the source domain from the target domain of the occluded part, so as to determine the region requiring dense transformation.
Therefore, the face-based two-dimensional reconstruction method provided by the embodiments of the application takes the specified face image as a condition and generates a complete, unoccluded, realistic face from the corresponding view angle. It is a face image translation and generation framework that removes the occluding object from a partially occluded face and restores the facial features corresponding to the same view angle, while ensuring that the generated facial expression and structure are fully consistent with the unoccluded part of the original image.
An improved algorithm based on an adversarial neural network detects the facial expression of the visible part of the face and uses it to restore and reconstruct the face region occluded over a large area; this VR-headset removal method can be combined with mixed reality, and its effect has been demonstrated in a number of virtual reality games and experiences.
It should be noted that the method provided by the embodiment of the present application is not limited to the VR-headset removal scenario and is also applicable to scenarios in which most of the face is occluded.
The algorithm model provided by the embodiment of the application is modified from the cycle-consistent generative adversarial network (CycleGAN); by adding face key point information (landmarks), the constraint imposed by the algorithm on the generated face structure becomes more stable. The algorithm must ensure that, when a face is generated, the features of the unoccluded part of the face remain unchanged while the facial features of the occluded region are reconstructed in accordance with the facial structure.
Therefore, the algorithm model structure is improved as follows:
1. CycleGAN is used as the basic network structure, realizing an unsupervised (self-supervised) face generation framework that does not require paired images as a training set and thus solving the problem that training data for supervised GAN models is difficult to acquire. Model training can be completed simply by forming the two subsets required by the CycleGAN model from a number of helmet-wearing images taken at arbitrary angles and a number of face images without helmet occlusion.
2. The images generated by an ordinary CycleGAN model have no paired data to act as a constraint, so it is difficult to ensure that the generated face is consistent with the original image in position and size (as shown in Fig. 2).
In Fig. 2, the domain A real image refers to a face image of a person wearing a VR helmet or glasses, and the domain B real image refers to a face image without occlusion. G_AB is the image generator that takes an image from domain A as input and generates the corresponding domain B image. G_BA is the generator that takes a domain B image as input and generates the corresponding domain A image; it reconstructs an image belonging to domain A so that, when G_AB generates a domain B image from domain A, the shape remains unchanged. D_B is the discriminator, among the several discriminators, responsible for judging the authenticity of domain B images, i.e. whether an input domain B image really exists or was synthesized by a machine.
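A minimal sketch of how the G_AB / G_BA / D_B relation of Fig. 2 is typically expressed as generator-side losses, assuming a least-squares adversarial term and an L1 cycle-consistency term with weight lambda_cyc = 10; these choices and the function names are illustrative assumptions, not details from the application.

```python
import torch
import torch.nn.functional as F

def cycle_generator_loss(G_AB, G_BA, D_B, real_A, lambda_cyc: float = 10.0):
    """One generator-side training term for the A -> B -> A direction of Fig. 2."""
    fake_B = G_AB(real_A)          # de-occluded (domain B) face generated from domain A
    rec_A = G_BA(fake_B)           # cycle back to domain A
    pred = D_B(fake_B)             # discriminator score for the generated domain B image
    adv = F.mse_loss(pred, torch.ones_like(pred))  # G_AB tries to fool D_B
    cyc = F.l1_loss(rec_A, real_A)                 # cycle-consistency: A -> B -> A == A
    return adv + lambda_cyc * cyc
```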
In the embodiment of the application, the face with its unoccluded part and the marked key points is used as the input of the network model, and the error between the key point positions detected on the generated face and those of the input face is constructed as a loss function for the deep learning fit. This ensures that the face generated by CycleGAN is consistent with the original occluded face; the specific network structure is shown in Fig. 3.
3. The network model provided by the embodiment of the application is mainly used to remove the VR device, recover the facial features of the occluded part, and keep the facial structure of the unoccluded part unchanged as far as possible. An attention module is added to the improved model structure: an attention map obtained from an auxiliary classifier distinguishes the source domain from the target domain of the occluded part, helping the model know where dense transformation is needed and guiding it to flexibly control the amount of change in shape and texture without modifying the model architecture or hyper-parameters.
The implementation details of the method provided in the embodiments of the present application are further described below with reference to the drawings.
1. Data set preparation
The training data set is divided into two groups, unoccluded face images and occluded face images, and the two groups do not need to be paired. For each image in both groups, the face key points are extracted (for occluded images, only the visible key points), and the key point image together with the RGB image is used as the model training input.
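A sketch of this preparation step, under the assumption that the key points are rasterized into a single-channel map of the same size as the RGB image and stacked with it to form the network input; this encoding of the key point image, and the helper names, are illustrative assumptions.

```python
import numpy as np

def make_keypoint_map(keypoints, height: int, width: int) -> np.ndarray:
    """Rasterize visible (x, y) key points into a binary map of the image size."""
    kp_map = np.zeros((height, width), dtype=np.float32)
    for x, y in keypoints:                       # only visible key points for occluded faces
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < width and 0 <= yi < height:
            kp_map[yi, xi] = 1.0
    return kp_map

def make_model_input(rgb_image: np.ndarray, keypoints) -> np.ndarray:
    """Stack the RGB image (H, W, 3) and the key point map into a (H, W, 4) input."""
    h, w = rgb_image.shape[:2]
    kp_map = make_keypoint_map(keypoints, h, w)
    return np.concatenate([rgb_image.astype(np.float32), kp_map[..., None]], axis=-1)
```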
2. Design of model
The embodiment of the application improves on the cycle-consistent generative adversarial network; the network structure is shown in Fig. 3. The detected face key points are fed into the network as a condition, and the loss function is improved.
(1) Using consistency between the key points of the generator's output image and the key points of the original image as a new constraint term, an L2 loss is computed between the key points of the visible part of the original occluded input image and the corresponding key points extracted from the generated image.
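A sketch of this key point L2 loss under the assumption that key points are given as (N, K, 2) coordinate tensors and that a (N, K) visibility mask marks which key points of the occluded input are visible; names and shapes are illustrative.

```python
import torch

def keypoint_l2_loss(kp_input: torch.Tensor, kp_generated: torch.Tensor,
                     visible: torch.Tensor) -> torch.Tensor:
    """L2 distance between the visible key points of the occluded input and the
    corresponding key points re-detected on the generated face."""
    diff = (kp_input - kp_generated) ** 2          # (N, K, 2)
    per_point = diff.sum(dim=-1)                   # squared distance per key point
    masked = per_point * visible                   # ignore occluded (invisible) points
    return masked.sum() / visible.sum().clamp(min=1.0)
```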
(2) Meanwhile, a conditional discriminator is defined whose input comprises three types: an unoccluded face with its corresponding key points (positive samples), a generated face with the key points of the occluded face (negative samples), and an unoccluded face with key points that do not match it (negative samples).
Three types of images are used in the data set for discriminator training: images acquired in advance and images produced by the generator.
1. The first part consists of complete, unoccluded faces acquired in advance, together with the face key points detected and marked on these images (positive samples).
2. The second part is not a real pre-collected image; instead, the upstream generator produces a face image from an occluded face and its key points, and this generated face, together with the key points marked on the occluded face, forms the second type of input (negative samples).
3. The last part consists of complete, unoccluded faces collected in advance, but random disturbances (translation, stretching deformation and the like) are added to the face key points detected and marked on these images, so that the key point positions no longer match the face shape in the image (negative samples).
The function of the last set of inputs is as follows: faces with key points that do not match them are added as additional false samples to force the generator to produce a face whose position and structure better match the occluded input; otherwise the discriminator might treat pairs with unmatched key points as true samples as well.
These three parts of the input are three different data combinations. The first two are the two kinds of input commonly used in the discriminator part of a generative adversarial network; the third is a data input designed and added according to the characteristics of this model. The third part feeds the discriminator a real face image with key point marks whose positions do not correspond to it, telling the discriminator what counts as a negative sample. In practice, as long as these negative samples are added to the discriminator's training set, neither the model structure nor the loss function needs to be changed, and the model parameters are optimized according to the distribution of the input data during training.
The purpose of this negative-sample input is to constrain the model to learn the correspondence between the marked key points and the face more accurately. The discriminator's job is to judge whether an input image was generated from key points or is a genuinely existing real image; without a strong constraint tying the key points to the face, the discriminator could judge whether an image is generated merely by learning features such as facial texture. In that case the generator might synthesize a highly realistic face that looks like a real face but is far from the positions of the input key points (for example, key points extracted from a small, occluded face in a picture could produce a large face whose position differs from the original image, which defeats the purpose of occlusion removal even though the face closely resembles the real person). A set of negative samples (real images paired with key point marks at unmatched positions) is therefore added, training the discriminator to treat a generated image with the correct face type but the wrong position and size as a negative sample as well.
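An illustrative sketch of how the three discriminator input types described above could be assembled, assuming key points are (N, K, 2) tensors in normalized coordinates and that a simple random translation and stretch serves as the perturbation; the tensor shapes, perturbation scale and function name are assumptions rather than details fixed by the application.

```python
import torch

def build_discriminator_inputs(real_faces, real_kps, generated_faces, occluded_kps,
                               perturb_scale: float = 0.05):
    """Assemble the three (image, key points, label) input types described above."""
    n = real_faces.size(0)
    # 1. Positive samples: real unoccluded faces with their matching key points.
    positives = (real_faces, real_kps, torch.ones(n))
    # 2. Negative samples: generator output paired with the key points of the occluded
    #    faces it was generated from.
    neg_generated = (generated_faces, occluded_kps, torch.zeros(generated_faces.size(0)))
    # 3. Negative samples: real faces with randomly translated / stretched key points,
    #    so that key point positions no longer match the face.
    shift = perturb_scale * torch.randn(n, 1, 2)          # random translation
    scale = 1.0 + perturb_scale * torch.randn(n, 1, 1)    # random stretching
    neg_mismatched = (real_faces, real_kps * scale + shift, torch.zeros(n))
    return positives, neg_generated, neg_mismatched
```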
(3) An identity consistency loss is added to ensure that the color distributions of the input and output images are similar: a given unoccluded face image should remain unchanged after passing through the network. The identity consistency loss is part of the loss function design of the generator network and is independent of the discriminator. The input of the generator is the combination of an occluded face and its face key point marks, and the output is the generated, unoccluded face. The consistency loss is designed so that, during generator training, when an unoccluded face and its key point marks are input, the expected output face image is identical to the input image; this constraint of consistency between input and output is therefore added to the loss function.
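A minimal sketch of this identity consistency term, assuming the generator is called with an (image, key point map) pair; the L1 distance, the weight and the call signature are assumptions for illustration.

```python
import torch.nn.functional as F

def identity_consistency_loss(generator, unoccluded_face, keypoint_map, weight: float = 5.0):
    """When an already unoccluded face and its key points are fed to the generator,
    the output should reproduce the input."""
    regenerated = generator(unoccluded_face, keypoint_map)  # assumed (image, key points) call
    return weight * F.l1_loss(regenerated, unoccluded_face)
```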
(4) To make generation and discrimination more targeted, i.e. to convert and recognise specifically the occluded region, an activation map module is added in the embodiment of the application (the feature map attention part in Fig. 4). It finds the regions that matter most for judging whether a picture is real, so that the generator and discriminator can generate and discriminate these regions more specifically.
Fig. 4 shows the structure of the generator G_AB described above. Its input is the occluded face image together with the detected face key points, and its output is the de-occluded reconstructed face image generated from this input.
In Fig. 4, the up-sampling, down-sampling and residual network sub-modules are all existing network structures. The feature map attention mechanism learns a set of feature weights, applies the learned weight information to each feature of the sub-network, and propagates it to the subsequent network; adding the learned, input-dependent attention weights to the residual network yields an adaptive residual network.
The input RGBA color image is first down-sampled, which reduces the resolution and increases the feature dimension (for example, an original 4-channel RGBA image at 256 × 256 resolution is converted by down-sampling into a 32-channel feature map at 64 × 64 resolution), and high-dimensional feature information is then extracted through several levels of residual networks. Attention weights are then extracted by the classifier after feature encoding, and the weights are applied to the feature map. The last part mirrors the first: the size of the input image is restored by up-sampling through a residual network.
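The following is a compact sketch of this G_AB data flow (down-sampling, residual blocks, classifier-derived attention, up-sampling) with the 256 × 256 × 4 to 64 × 64 × 32 shapes mentioned above; the channel counts, number of residual blocks, normalization layers and simplified per-channel attention are illustrative assumptions rather than the application's exact architecture.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

class GeneratorSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Down-sampling: 4 x 256 x 256 -> 32 x 64 x 64 (lower resolution, more channels)
        self.down = nn.Sequential(
            nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.res1 = nn.Sequential(*[ResBlock(32) for _ in range(3)])  # high-dim features
        # Auxiliary classifier over the encoded feature map; its weights act as
        # per-channel attention applied back onto the feature map.
        self.aux_fc = nn.Linear(32, 1)
        self.res2 = nn.Sequential(*[ResBlock(32) for _ in range(3)])  # adaptive residual part
        # Up-sampling mirrors the encoder: 32 x 64 x 64 -> 3 x 256 x 256 output image
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())

    def forward(self, x):                      # x: (N, 4, 256, 256) RGBA / key point input
        feat = self.res1(self.down(x))         # (N, 32, 64, 64)
        pooled = feat.mean(dim=(2, 3))         # global average pooling, (N, 32)
        logit = self.aux_fc(pooled)            # would also feed an auxiliary classification loss
        attn = self.aux_fc.weight.view(1, -1, 1, 1)  # learned per-channel attention weights
        feat = feat * attn                     # apply attention to the feature map
        return self.up(self.res2(feat))        # reconstructed, de-occluded face
```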
In summary, the face-based two-dimensional reconstruction method provided by the embodiment of the application acquires an unoccluded face image and an occluded face image; inputs the occluded face image into a pre-trained key point recognition model to obtain the key points in the occluded face image; and inputs the occluded face image and its key points into an image generator network to obtain a de-occluded reconstructed face image generated from the occluded face image, the image generator network being trained, against an image discriminator network, on the unoccluded face image and its key points and on the occluded face image and its key points. The occluded face image is thus restored accurately and conveniently.
Based on the same technical concept, an embodiment of the present application further provides a two-dimensional reconstruction system based on a human face, as shown in fig. 5, the system includes:
an image obtaining module 501, configured to obtain an unobstructed face image and an obstructed face image.
A keypoint determination module 502, configured to input the occluded face image into a pre-trained keypoint recognition model, so as to obtain keypoints in the occluded face image.
A face reconstruction module 503, configured to input the occluded face image and the key points thereof into an image generator network, so as to obtain a reconstructed face image without occlusion, which is generated based on the occluded face image; the image generator network is obtained by training the non-shielded face image and key points thereof, and the shielded face image and key points thereof based on an image discriminator network.
In one possible embodiment, the training process of the image generator network comprises the following steps: taking the consistency between the key points of the de-occluded reconstructed face image generated by the image generator network and the key points of the corresponding occluded image as a constraint term, and calculating a key point loss function between the key points of the occluded image and the key points of the de-occluded reconstructed face image; inputting the de-occluded reconstructed face image and the unoccluded face image in pairs into the image discriminator network, the image discriminator network outputting a discrimination result indicating whether its input is a real unoccluded face image or a de-occluded reconstructed face image, and feeding the discrimination result back to the image generator network; the image discriminator network is trained on three types of data so as to update the weights of the image generator network until the weights satisfy a preset value. The three types of data comprise: an unoccluded face image together with its corresponding key points; a generated de-occluded reconstructed face image together with the key points of the occluded face from which it was generated; and an unoccluded face image together with key points that do not match it.
In a possible implementation, the face reconstruction module 503 is specifically configured to: input the RGBA color image of the occluded face and its key points into a down-sampling module to reduce the resolution and increase the feature dimension; then extract high-dimensional feature information through a residual network module; then extract attention weights through a classifier after feature map encoding, so that the feature map attention module applies the learned, input-dependent attention weights to a further residual network module; and finally restore the size of the input image through an up-sampling module.
In one possible embodiment, the feature map attention module is configured to use an attention map obtained from an auxiliary classifier to distinguish the source domain from the target domain of the occluded part, so as to determine the region requiring dense transformation.
Based on the same technical concept, an embodiment of the present application further provides an apparatus comprising a data acquisition device, a processor and a memory; the data acquisition device is used for acquiring data; the memory is used to store one or more program instructions; and the processor is configured to execute the one or more program instructions to perform the method described above.
Based on the same technical concept, the embodiment of the present application also provides a computer storage medium, wherein the computer storage medium contains one or more program instructions, and the one or more program instructions are used for executing the method.
In the present specification, each embodiment of the method is described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. Reference is made to the description of the method embodiments.
It is noted that while the operations of the methods of the present invention are depicted in the drawings in a particular order, this is not a requirement or suggestion that the operations must be performed in this particular order or that all of the illustrated operations must be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Although the present application provides method steps as in embodiments or flowcharts, additional or fewer steps may be included based on conventional or non-inventive approaches. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an apparatus or client product in practice executes, it may execute sequentially or in parallel (e.g., in a parallel processor or multithreaded processing environment, or even in a distributed data processing environment) according to the embodiments or methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in processes, methods, articles, or apparatus that include the recited elements is not excluded.
The units, devices, modules, etc. set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, which are described separately. Of course, in implementing the present application, the functions of each module may be implemented in one or more pieces of software and/or hardware, or a module that implements the same function may be implemented by a combination of a plurality of sub-modules or sub-units, or the like. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The above-mentioned embodiments are further described in detail for the purpose of illustrating the invention, and it should be understood that the above-mentioned embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A face-based two-dimensional reconstruction method, characterized by comprising the following steps:
acquiring an unoccluded face image and an occluded face image;
inputting the occluded face image into a pre-trained key point recognition model to obtain key points in the occluded face image;
inputting the occluded face image and its key points into an image generator network to obtain a de-occluded reconstructed face image generated based on the occluded face image; wherein the image generator network is trained, based on an image discriminator network, on the unoccluded face image and its key points and on the occluded face image and its key points.
2. The face-based two-dimensional reconstruction method according to claim 1, wherein the training process of the image generator network comprises the following steps:
taking the consistency between the key points of the de-occluded reconstructed face image generated by the image generator network and the key points of the corresponding occluded image as a constraint term, and calculating a key point loss function between the key points of the occluded image and the key points of the de-occluded reconstructed face image;
inputting the de-occluded reconstructed face image and the unoccluded face image in pairs into the image discriminator network, the image discriminator network outputting a discrimination result indicating whether its input is a real unoccluded face image or a de-occluded reconstructed face image, and feeding the discrimination result back to the image generator network;
the image discriminator network is trained on three types of data so as to update the weights of the image generator network until the weights satisfy a preset value; the three types of data comprise: an unoccluded face image together with its corresponding key points; a generated de-occluded reconstructed face image together with the key points of the occluded face from which it was generated; and an unoccluded face image together with key points that do not match it.
3. The face-based two-dimensional reconstruction method according to claim 1, wherein inputting the occluded face image and its key points into an image generator network to obtain a de-occluded reconstructed face image generated based on the occluded face image comprises:
inputting the RGBA color image of the occluded face and its key points into a down-sampling module to reduce the resolution and increase the feature dimension; then extracting high-dimensional feature information through a residual network module; then extracting attention weights through a classifier after feature map encoding, so that the feature map attention module applies the learned, input-dependent attention weights to a further residual network module; and finally restoring the size of the input image through an up-sampling module.
4. The face-based two-dimensional reconstruction method according to claim 3, wherein the feature map attention module is configured to use an attention map obtained from an auxiliary classifier to distinguish the source domain from the target domain of the occluded part, so as to determine the region requiring dense transformation.
5. A face-based two-dimensional reconstruction system, the system comprising:
the image acquisition module is used for acquiring an unoccluded face image and an occluded face image;
the key point determination module is used for inputting the occluded face image into a pre-trained key point recognition model to obtain key points in the occluded face image;
the face reconstruction module is used for inputting the occluded face image and its key points into an image generator network to obtain a de-occluded reconstructed face image generated based on the occluded face image; wherein the image generator network is trained, based on an image discriminator network, on the unoccluded face image and its key points and on the occluded face image and its key points.
6. The face-based two-dimensional reconstruction system according to claim 5, wherein the training process of the image generator network comprises the following steps:
taking the consistency between the key points of the de-occluded reconstructed face image generated by the image generator network and the key points of the corresponding occluded image as a constraint term, and calculating a key point loss function between the key points of the occluded image and the key points of the de-occluded reconstructed face image;
inputting the de-occluded reconstructed face image and the unoccluded face image in pairs into the image discriminator network, the image discriminator network outputting a discrimination result indicating whether its input is a real unoccluded face image or a de-occluded reconstructed face image, and feeding the discrimination result back to the image generator network;
the image discriminator network is trained on three types of data so as to update the weights of the image generator network until the weights satisfy a preset value; the three types of data comprise: an unoccluded face image together with its corresponding key points; a generated de-occluded reconstructed face image together with the key points of the occluded face from which it was generated; and an unoccluded face image together with key points that do not match it.
7. The face-based two-dimensional reconstruction system according to claim 5, wherein the face reconstruction module is specifically configured to: input the RGBA color image of the occluded face and its key points into a down-sampling module to reduce the resolution and increase the feature dimension; then extract high-dimensional feature information through a residual network module; then extract attention weights through a classifier after feature map encoding, so that the feature map attention module applies the learned, input-dependent attention weights to a further residual network module; and finally restore the size of the input image through an up-sampling module.
8. The face-based two-dimensional reconstruction system according to claim 7, wherein the feature map attention module is configured to use an attention map obtained from an auxiliary classifier to distinguish the source domain from the target domain of the occluded part, so as to determine the region requiring dense transformation.
9. A two-dimensional face-based reconstruction apparatus, comprising:
a memory for storing a computer program;
a processor, configured to implement the steps of the face-based two-dimensional reconstruction method according to any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium and, when executed by a processor, implements the steps of the face-based two-dimensional reconstruction method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210265066.8A CN114627204A (en) | 2022-03-17 | 2022-03-17 | Two-dimensional reconstruction method, system, equipment and storage medium based on human face |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210265066.8A CN114627204A (en) | 2022-03-17 | 2022-03-17 | Two-dimensional reconstruction method, system, equipment and storage medium based on human face |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114627204A true CN114627204A (en) | 2022-06-14 |
Family
ID=81901720
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210265066.8A Pending CN114627204A (en) | 2022-03-17 | 2022-03-17 | Two-dimensional reconstruction method, system, equipment and storage medium based on human face |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114627204A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115146349A (en) * | 2022-07-06 | 2022-10-04 | 北京林业大学 | Method and device for locally updating design |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110717977B (en) | Method, device, computer equipment and storage medium for processing game character face | |
CN110222573B (en) | Face recognition method, device, computer equipment and storage medium | |
WO2021258989A1 (en) | Facial anti-counterfeiting recognition method and apparatus, and device and storage medium | |
CN112530019B (en) | Three-dimensional human body reconstruction method and device, computer equipment and storage medium | |
Liu et al. | Psgan++: Robust detail-preserving makeup transfer and removal | |
Akhtar et al. | Attack to fool and explain deep networks | |
CN115565238B (en) | Face-changing model training method, face-changing model training device, face-changing model training apparatus, storage medium, and program product | |
KR20230028253A (en) | Face image processing method, face image processing model training method, device, device, storage medium and program product | |
Saxena et al. | Detecting Deepfakes: A Novel Framework Employing XceptionNet-Based Convolutional Neural Networks. | |
Nguyen et al. | Capsule-forensics networks for deepfake detection | |
Basak et al. | Learning 3D head pose from synthetic data: A semi-supervised approach | |
CN116958637A (en) | Training method, device, equipment and storage medium of image detection model | |
CN114627204A (en) | Two-dimensional reconstruction method, system, equipment and storage medium based on human face | |
CN116340887A (en) | Multi-mode false news detection method and system | |
CN116978130A (en) | Image processing method, image processing device, computer device, storage medium, and program product | |
CN116958306A (en) | Image synthesis method and device, storage medium and electronic equipment | |
CN114612991A (en) | Conversion method and device for attacking face picture, electronic equipment and storage medium | |
Vaca-Castano et al. | Holistic object detection and image understanding | |
CN115708135A (en) | Face recognition model processing method, face recognition method and device | |
Paul | Deepfakes generated by generative adversarial networks | |
Dameron | Real vs Fake Faces: DeepFakes and Face Morphing | |
Zingarini et al. | M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations | |
Peng et al. | Disguised Heterogeneous Face Generation With Iterative-Adversarial Style Unification | |
Zheng et al. | Pipeline generative adversarial networks for facial images generation with multiple attributes | |
CN115147526B (en) | Training of clothing generation model and method and device for generating clothing image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |