CN114399590A

CN114399590A - Face occlusion removal and three-dimensional model generation method based on face analysis graph

Info

Publication number: CN114399590A
Application number: CN202111592070.7A
Authority: CN
Inventors: 赵大鹏; 蔡锦康; 齐越
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-12-23
Filing date: 2021-12-23
Publication date: 2022-04-26

Abstract

The embodiment of the disclosure discloses a face occlusion removal and three-dimensional model generation method based on a face analysis graph. One embodiment of the method comprises: inputting the occlusion face image into a pre-trained deep learning network to obtain an occlusion face analytical image; inputting the shielded face image into a pre-trained feature point extraction network to obtain a face feature point group; connecting each face feature point in the face feature point group to generate a feature point face analysis graph; combining the shielding face analysis graph and the feature point face analysis graph to obtain a non-shielding face analysis graph; inputting the non-occlusion face analysis image and the occlusion face image into a pre-trained face synthesis network to obtain a non-occlusion face rendering image; and inputting the non-shielding face rendering image into a pre-trained three-dimensional face synthesis model to obtain a three-dimensional face model. The embodiment effectively removes the occlusion in the face image.

Description

Face occlusion removal and three-dimensional model generation method based on face analysis graph

Technical Field

The embodiment of the disclosure relates to the field of computer vision, in particular to a face occlusion removal and three-dimensional model generation method based on a face analysis graph.

Background

With the development of computer information technology, how to generate a three-dimensional face model from a two-dimensional face picture becomes an important research topic. At present, when a three-dimensional face model is generated according to a two-dimensional face photograph, the method generally adopted is as follows: and obtaining a three-dimensional face image according to the two-dimensional image by using a deep learning method.

However, when generating a three-dimensional face model from a two-dimensional face photograph in the above manner, there are often technical problems as follows:

when a three-dimensional face image is obtained according to a two-dimensional image by using a deep learning method, the occlusion in the face image cannot be effectively removed.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Some embodiments of the present disclosure propose face occlusion removal and three-dimensional model generation methods, apparatuses, devices, and media to address one or more of the technical problems mentioned in the background section above.

In a first aspect, some embodiments of the present disclosure provide a face mask removal and three-dimensional model generation method, including: inputting the occlusion face image into a pre-trained deep learning network to obtain an occlusion face analytical image; inputting the shielding face image into a pre-trained feature point extraction network to obtain a face feature point group; connecting each face feature point in the face feature point group to generate a feature point face analysis graph; combining the occlusion face analysis graph and the feature point face analysis graph to obtain an occlusion-free face analysis graph; inputting the non-shielding face analysis image and the shielding face image into a pre-trained face synthesis network to obtain a non-shielding face rendering image; and inputting the non-shielding face rendering image into a pre-trained three-dimensional face synthesis model to obtain a three-dimensional face model.

In a second aspect, some embodiments of the present disclosure provide a face mask removal and three-dimensional model generation apparatus, the apparatus comprising: the first input unit is configured to input the occlusion face image into a pre-trained deep learning network to obtain an occlusion face analysis image; the second input unit is configured to input the shielding face image into a pre-trained feature point extraction network to obtain a face feature point group; a connection processing unit configured to perform connection processing on each of the face feature points in the face feature point group to generate a feature point face analysis map; a combination processing unit configured to combine the occlusion face analysis graph and the feature point face analysis graph to obtain an occlusion-free face analysis graph; a third input unit, configured to input the non-occlusion face analysis graph and the occlusion face graph into a pre-trained face synthesis network to obtain a non-occlusion face rendering graph; and the fourth input unit is configured to input the non-shielding face rendering image into a pre-trained three-dimensional face synthesis model to obtain a three-dimensional face model.

In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect.

In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect.

The above embodiments of the present disclosure have the following beneficial effects: by the face occlusion removal and three-dimensional model generation methods of some embodiments of the present disclosure, occlusion in a face image can be effectively removed. Specifically, the reason why the occlusion in the face image cannot be effectively removed is: when a three-dimensional face image is obtained according to a two-dimensional image by using a deep learning method, the occlusion in the face image cannot be effectively removed. Based on this, according to the face occlusion removal and three-dimensional model generation method of some embodiments of the present disclosure, first, an occlusion face image is input to a pre-trained deep learning network to obtain an occlusion face analysis image. Therefore, data support can be provided for obtaining the non-occlusion face analysis graph. Secondly, inputting the shielding face image into a pre-trained feature point extraction network to obtain a face feature point group; and connecting each face characteristic point in the face characteristic point group to generate a characteristic point face analysis graph. Therefore, data support can be provided for obtaining the non-occlusion face analysis graph. Moreover, because the face characteristic points are extracted, the face characteristic points can be combined with a deep learning method to generate a three-dimensional face model, so that the shielding in the face image can be effectively removed. And then, combining the occlusion face analysis graph and the feature point face analysis graph to obtain a non-occlusion face analysis graph. Therefore, the generation of the non-occlusion face rendering image is facilitated. And then inputting the non-shielding face analysis graph and the shielding face graph into a pre-trained face synthesis network to obtain a non-shielding face rendering graph. Therefore, a three-dimensional face model can be obtained. And finally, inputting the non-shielding face rendering image into a pre-trained three-dimensional face synthesis model to obtain a three-dimensional face model. Thus, the generation of the three-dimensional face model is completed. And effectively removing the occlusion in the face image.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and elements are not necessarily drawn to scale.

FIG. 1 is a schematic diagram of one application scenario of the face occlusion removal and three-dimensional model generation methods of some embodiments of the present disclosure;

FIG. 2 is a flow diagram of some embodiments of a face mask removal and three-dimensional model generation method according to the present disclosure;

FIG. 3 is a schematic structural diagram of some embodiments of a face mask removal and three-dimensional model generation apparatus according to the present disclosure;

FIG. 4 is a schematic block diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

FIG. 1 is a schematic diagram of an application scenario of the face occlusion removal and three-dimensional model generation methods of some embodiments of the present disclosure.

In the application scenario of fig. 1, first, the computing device 101 may input the occlusion face map 102 into a depth learning network trained in advance, resulting in an occlusion face analysis map 103. Secondly, the computing device 101 may input the above-mentioned occlusion face image 102 to a pre-trained feature point extraction network to obtain a face feature point group 104. Then, the computing device 101 may perform a connection process on the individual face feature points in the above-described face-face feature point group 104 to generate a feature point face analysis map 105. Then, the computing device 101 may combine the occlusion face analysis graph 103 and the feature point face analysis graph 105 to obtain an occlusion-free face analysis graph 106. Then, the computing device 101 may input the non-occlusion face analysis graph 106 and the occlusion face graph 102 to a pre-trained face synthesis network, so as to obtain a non-occlusion face rendering graph 107. Finally, the computing device 101 may input the above-mentioned non-occlusion face rendering map 107 into a pre-trained three-dimensional face synthesis model, so as to obtain a three-dimensional face model 108.

The computing device 101 may be hardware or software. When the computing device is hardware, it may be implemented as a distributed cluster composed of multiple servers or terminal devices, or may be implemented as a single server or a single terminal device. When the computing device is embodied as software, it may be installed in the hardware devices enumerated above. It may be implemented, for example, as multiple software or software modules to provide distributed services, or as a single software or software module. And is not particularly limited herein.

It should be understood that the number of computing devices in FIG. 1 is merely illustrative. There may be any number of computing devices, as implementation needs dictate.

With continued reference to FIG. 2, a flow 200 of some embodiments of a face occlusion removal and three-dimensional model generation method according to the present disclosure is shown. The face occlusion removal and three-dimensional model generation method comprises the following steps:

step 201, inputting the occlusion face image into a pre-trained deep learning network to obtain an occlusion face analysis image.

In some embodiments, an executing subject of the face occlusion removal and three-dimensional model generation method (e.g., the computing device 101 shown in fig. 1) may input an occlusion face map into a pre-trained deep learning network, resulting in an occlusion face parse map. The deep learning network may be a network that extracts face feature vectors in an occlusion face image and then fuses the face feature vectors to obtain an occlusion face analysis image, and the deep learning network may include but is not limited to: SegNet networks.

Step 202, inputting the occlusion face image into a pre-trained feature point extraction network to obtain a face feature point group.

In some embodiments, the execution subject may input the occlusion face image to a pre-trained feature point extraction network to obtain a face feature point group. The feature point extraction network may be a network for extracting a preset number of feature points of the face of the person from the mask face image. The preset number may be a preset number of extracted facial feature points. For example, the preset number may be: 68. The facial feature points may be points extracted by the feature point extraction network to characterize facial features of the human face. Here, the face facial features may be: human skin, nose, eyes, eyebrows, and lips.

Optionally, before step 202, the sample of the unobstructed face image is input into a lightweight neural network to train the lightweight neural network, and the trained lightweight neural network is obtained as a feature point extraction network.

In some embodiments, the execution subject may input the non-occlusion face image sample to a lightweight neural network to train the lightweight neural network, and obtain the trained lightweight neural network as a feature point extraction network. The lightweight neural network may be a network initially used for training to obtain a feature point extraction network, and the lightweight neural network may include, but is not limited to: a MobileNet-V3 network.

As an example, inputting the sample of the non-occluded face image into the lightweight neural network to train the lightweight neural network, and obtaining the trained lightweight neural network as the feature point extraction network may include the following steps:

firstly, determining the network structure of the lightweight neural network and initializing the network parameters of the lightweight neural network.

And secondly, acquiring a non-occlusion face image sample. The non-shielding face image sample comprises a non-shielding face image and a face feature point group corresponding to the non-shielding face image.

And thirdly, respectively taking the non-occlusion face image and the face feature point group in the non-occlusion face image sample as the input and the expected output of the lightweight neural network, and training the lightweight neural network by using a deep learning method.

And fourthly, determining the trained lightweight neural network as the feature point extraction network.

In practice, the loss function used to train the lightweight neural network is:

wherein,

to train the lightweight neural network using the loss function,

is the ith individual face eigenvalue in the matrix of face eigenvalues,

is the ith personal face characteristic true value in a matrix consisting of the personal face characteristic true values, Zr is a matrix consisting of the personal face characteristic values,

represents a 2-normThe face feature value is a coordinate value of a face feature point in the non-occluded face image, and the face feature true value is a coordinate value of a face feature point in the face feature point group.

Step 203, connecting each face feature point in the face feature point group to generate a feature point face analysis map.

In some embodiments, the execution subject may perform a connection process on each facial feature point in the facial feature point group to generate a feature point face analysis map. Wherein, the above feature point face analysis graph includes: human skin, nose, eyes, eyebrows, and lips. The above-mentioned connection processing may be processing of connecting the face feature points by various methods. Here, the face feature points may be connected by an OpenCV algorithm.

And 204, combining the occlusion face analysis graph and the feature point face analysis graph to obtain a non-occlusion face analysis graph.

In some embodiments, the execution subject may combine the occlusion face analysis map and the feature point face analysis map to obtain an occlusion-free face analysis map.

In practice, the execution subject may obtain the non-occlusion face analysis map by:

firstly, carrying out backup processing on the occlusion face analytic graph to obtain a backup occlusion face analytic graph as a target analytic graph.

Secondly, for each feature pixel in the feature point face analysis image, executing the following processing steps:

1. and determining whether the gray value corresponding to the characteristic pixel is in a first preset range. The first preset range may be a range of preset gray-scale values corresponding to pixels.

2. And in response to determining that the gray value corresponding to the feature pixel is within the first preset range, determining whether the gray value corresponding to the occlusion pixel corresponding to the feature pixel in the occlusion face analysis image is within the first preset range.

3. And replacing the target pixel corresponding to the shielding pixel in the target analysis map with the characteristic pixel to update the target analysis map in response to determining that the gray value corresponding to the shielding pixel corresponding to the characteristic pixel in the shielding face analysis map is not within the first preset range.

And thirdly, determining the updated target analysis graph as an updated target analysis graph. The updated target analysis map may be an analysis map obtained by performing a processing step on each feature pixel in the feature point face analysis map.

Fourthly, for each feature pixel in the feature point face analysis image, the following replacing steps are executed:

1. and determining whether the gray value corresponding to the characteristic pixel is in a second preset range. The second preset range may be a range of gray scale values corresponding to preset pixels different from the first preset range.

2. And in response to determining that the gray value corresponding to the feature pixel is within the second preset range, determining whether the gray value corresponding to the occlusion pixel corresponding to the feature pixel in the occlusion face analysis map is within the second preset range.

3. And replacing the update target pixel corresponding to the shielding pixel in the update target analysis map with the characteristic pixel to update the update target analysis map in response to determining that the gray value corresponding to the shielding pixel corresponding to the characteristic pixel in the shielding face analysis map is not within the second preset range.

And fifthly, determining the updated target analysis graph as a non-occlusion face analysis graph. The updated target analysis map may be an analysis map obtained by performing a replacement step on each feature pixel in the feature point face analysis map.

And step 205, inputting the non-occlusion face analysis graph and the occlusion face graph into a pre-trained face synthesis network to obtain a non-occlusion face rendering graph.

In some embodiments, the execution subject may input the non-occlusion face analysis graph and the occlusion face graph to a pre-trained face synthesis network to obtain a non-occlusion face rendering graph. The face synthesis network may be configured to distinguish and modify the non-occlusion face analysis graph and the occlusion face graph to generate a non-occlusion face rendering graph. Optionally, before step 205, the non-occlusion face analysis graph sample and the occlusion face graph sample are input to the generative confrontation network to train the generative confrontation network, and the trained generative confrontation network is obtained as the face synthesis network.

In some embodiments, the executing agent may input the non-occlusion face analysis graph sample and the occlusion face graph sample to a generative confrontation network to train the generative confrontation network, and obtain the trained generative confrontation network as the face synthesis network. The generative confrontation network may be a network initially used for training to obtain a face synthesis network, and the generative confrontation network may include but is not limited to: pix2PixHD networks.

As an example, inputting the non-occlusion face analysis graph sample and the occlusion face graph sample into the generative confrontation network to train the generative confrontation network, and obtaining the trained generative confrontation network as the face synthesis network may include the following steps:

the method comprises the steps of firstly, determining the network structure of the generative countermeasure network and initializing the network parameters of the generative countermeasure network.

And secondly, acquiring a non-shielding face analysis image sample and a shielding face image sample. The non-shielding face analytic pattern comprises a non-shielding face analytic pattern, and the shielding face image sample comprises a shielding face image and a non-shielding face rendering image corresponding to the shielding face image.

And thirdly, taking an unoccluded face analysis graph in the unoccluded face analysis graph sample and an occluded face graph in the occluded face graph sample as the input of the generative confrontation network, taking an unoccluded face rendering graph in the occluded face graph sample as the expected output of the generative confrontation network, and training the generative confrontation network by using a deep learning method.

And fourthly, determining the trained generative confrontation network as the face synthesis network.

In practice, the loss function used to train the generative confrontation network is:

the penalty function:

wherein,

to combat the loss function, I_facRepresenting an occlusion face map, D_1，2Represents D₁And D₂Two discriminators of different types are provided,

for expressing symbols for evaluation functions, M_αRepresenting an occlusion face analysis graph, I_outRepresenting an unobstructed face rendering.

Feature conformity loss function:

wherein,

in order to characterize the conformity loss function,

for evaluation function representation symbols, T represents the discriminator D₁And D₂The total number of layers of (a) and (b),

representation discriminator D₁And D₂Acting on the t-th convolutional layer of the generative countermeasure network, I_facRepresenting an occlusion face map, M_αRepresenting an occlusion face analysis graph, I_outRepresenting an unobstructed face rendering map, | | | | | non-calculation₁Representing a 1-norm.

Perceptual loss function:

wherein,

in order to be a function of the perceptual loss,

the symbols are represented for the evaluation function, N represents the total number of convolutional layers determining the perceptual loss, F^(o)A feature diagram representing the output of the O-th convolutional layer of the convolutional neural network VGG (visual Geometry group), I_facRepresenting an occlusion face image, I_outRepresenting an unobstructed face rendering map, | | | | | non-calculation₁Denotes the 1-norm, M_oThe size of the feature map is shown.

Establishing a comprehensive loss function of an antagonistic loss function, a characteristic conformity loss function and a perception loss function:

wherein,

in order to combat the loss-function,

in order to characterize the conformity loss function,

as a function of perceptual loss, λ₁＝λ₂＝10。

And step 206, inputting the non-shielding face rendering image into a pre-trained three-dimensional face synthesis model to obtain a three-dimensional face model.

In some embodiments, the execution subject may input the non-occlusion face rendering image into a pre-trained three-dimensional face synthesis model to obtain a three-dimensional face model. The three-dimensional face synthesis model can be a model which establishes an input non-shielding face rendering image to obtain a three-dimensional face model corresponding to the non-shielding face rendering image.

Optionally, before step 206, the non-occluded face rendering image sample is input to a face three-dimensional reconstruction network to train the face three-dimensional reconstruction network, and the trained face three-dimensional reconstruction network is obtained as a three-dimensional face synthesis model.

In some embodiments, the execution subject may input the non-occlusion face rendering image sample to a face three-dimensional reconstruction network to train the face three-dimensional reconstruction network, and obtain the trained face three-dimensional reconstruction network as a three-dimensional face synthesis model. The three-dimensional face reconstruction network may be a network initially used for training to obtain a three-dimensional face synthesis model, and the three-dimensional face reconstruction network may include but is not limited to: bessel Face model BFM (Basel Face model).

As an example, inputting the sample of the non-occlusion face rendering image into a face three-dimensional reconstruction network to train the face three-dimensional reconstruction network, and obtaining the trained face three-dimensional reconstruction network as a three-dimensional face synthesis model may include the following steps:

the method comprises the steps of firstly, determining a network structure of the human face three-dimensional reconstruction network and initializing network parameters of the human face three-dimensional reconstruction network.

And secondly, obtaining a sample of the non-shielding face rendering image. The non-shielding face rendering pattern book comprises a non-shielding face rendering image and a three-dimensional face model corresponding to the non-shielding face rendering image.

And thirdly, respectively taking the non-shielding face rendering image and the three-dimensional face model in the non-shielding face rendering image sample as the input and the expected output of the face three-dimensional reconstruction network, and training the face three-dimensional reconstruction network by using a deep learning method.

And fourthly, determining the trained three-dimensional face reconstruction network as the three-dimensional face synthesis model.

In practice, the loss function used for training the three-dimensional face reconstruction network is as follows:

characteristic point loss function:

wherein,

n is a function of the loss of the characteristic point, and represents the total number of convolution layers for determining the loss of the characteristic point,

to render a non-occluded face_outInput to feature point extraction network Z_lmkThe obtained characteristic point diagram is used as a characteristic point diagram,

to render face image sample I_yInput to feature point extraction network Z_lmkThe obtained feature point graph, | | | | non-conducting phosphor₂The expression is given in the 2-norm,

rendering face map samples I, representing the square of the 2-norm_yThe execution subject may input the non-occlusion face image into the face three-dimensional reconstruction network to obtain a three-dimensional face model, and determine a rendered face image sample by various methods. Here, the execution subject may determine to render the face map sample by a perspective projection method.

Pixel level loss function:

wherein,

is a pixel level loss function, k is the pixel sequence number of pixels in the non-shielding face rendering image and the rendering face image sample, M is a face projection area, | | | |₂Denotes the 2-norm, I_outRepresenting an unobstructed face rendering, I_yTo render face image samples, P_kThe number of attention weights of the occlusion region. Here, P_kThe specific values of (A) are as follows:

regular loss function:

wherein,

as a regular loss function, ω_α＝1.0，ω_β＝1.75e-3，

And

representing the eigenvalues of the dependency matrix in descending order, | | | | | | non-calculation₂The expression is given in the 2-norm,

representing the square of the 2-norm.

Face feature loss function:

wherein,

is a face feature loss function, | | | | | represents a 2-norm, G () represents a feature extraction function used in the face recognition method FaceNet method,<G(I_out)，G(I_y)>represents G (I)_out) And G (I)_y) The inner product of (d).

Establishing a comprehensive loss function of a characteristic point loss function, a pixel level loss function, a regular loss function and a face characteristic loss function:

wherein λ is₃＝1.6e-3，λ₄＝1.4，λ₅＝3.7e-4，λ₆＝0.2，

In order to be a function of the loss of the characteristic point,

as a function of the loss of pixel level,

in order to be a regular loss function,

is a face feature loss function.

With further reference to fig. 3, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of a face mask removal and three-dimensional model generation apparatus, which correspond to those shown in fig. 2, and which may be applied in various electronic devices.

As shown in fig. 3, the face mask removal and three-dimensional model generation apparatus 300 of some embodiments includes: a first input unit 301, a second input unit 302, a connection processing unit 303, a combination processing unit 304, a third input unit 305, and a fourth input unit 306. The first input unit 301 is configured to input the occlusion face image into a pre-trained deep learning network, so as to obtain an occlusion face analysis image; the second input unit 302 is configured to input the above-mentioned occlusion face image into a pre-trained feature point extraction network, so as to obtain a face feature point group; the connection processing unit 303 is configured to perform connection processing on each of the face feature points in the face feature point group to generate a feature point face analysis map; the combination processing unit 304 is configured to combine the occlusion face analysis graph and the feature point face analysis graph to obtain an occlusion-free face analysis graph; the third input unit 305 is configured to input the non-occlusion face analysis graph and the occlusion face graph into a pre-trained face synthesis network, so as to obtain a non-occlusion face rendering graph; the fourth input unit 306 is configured to input the above-mentioned non-occluded face rendering map into a pre-trained three-dimensional face synthesis model, resulting in a three-dimensional face model.

It will be understood that the units described in the apparatus 300 correspond to the various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 300 and the units included therein, and are not described herein again.

Referring now to FIG. 4, a block diagram of an electronic device (such as computing device 101 shown in FIG. 1)400 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 4, electronic device 400 may include a processing device (e.g., central processing unit, graphics processor, etc.) 401 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)402 or a program loaded from a storage device 408 into a Random Access Memory (RAM) 403. In the RAM403, various programs and data necessary for the operation of the electronic apparatus 400 are also stored. The processing device 401, the ROM 402, and the RAM403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.

Generally, the following devices may be connected to the I/O interface 405: input devices 405 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 408 including, for example, tape, hard disk, etc.; and a communication device 409. The communication means 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. While fig. 4 illustrates an electronic device 400 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 4 may represent one device or may represent multiple devices as desired.

In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 409, or from the storage device 408, or from the ROM 402. The computer program, when executed by the processing apparatus 401, performs the above-described functions defined in the methods of some embodiments of the present disclosure.

It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, the clients, servers may communicate using any currently known or future developed network Protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the Internet (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: and inputting the occlusion face image into a pre-trained deep learning network to obtain an occlusion face analytic image. And inputting the shielding face image into a pre-trained feature point extraction network to obtain a face feature point group. And connecting each face characteristic point in the face characteristic point group to generate a characteristic point face analysis graph. And combining the occlusion face analysis graph and the feature point face analysis graph to obtain a non-occlusion face analysis graph. And inputting the non-shielding face analysis image and the shielding face image into a pre-trained face synthesis network to obtain a non-shielding face rendering image. And inputting the non-shielding face rendering image into a pre-trained three-dimensional face synthesis model to obtain a three-dimensional face model.

Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + +, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in some embodiments of the present disclosure may be implemented by software, and may also be implemented by hardware. The described units may also be provided in a processor, and may be described as: a processor includes a first input unit, a second input unit, a connection processing unit, a combination processing unit, a third input unit, and a fourth input unit. The names of the units do not form a limitation on the units themselves in some cases, for example, the first input unit may also be described as "a unit that inputs an occlusion face image into a pre-trained deep learning network to obtain an occlusion face analysis image".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept as defined above. For example, the above features and (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure are mutually replaced to form the technical solution.

Claims

1. A face occlusion removal and three-dimensional model generation method comprises the following steps:

inputting the occlusion face image into a pre-trained deep learning network to obtain an occlusion face analytical image;

inputting the shielding face image into a pre-trained feature point extraction network to obtain a face feature point group;

connecting each face feature point in the face feature point group to generate a feature point face analysis graph;

combining the occlusion face analysis graph and the feature point face analysis graph to obtain an occlusion-free face analysis graph;

inputting the non-shielding face analysis graph and the shielding face graph into a pre-trained face synthesis network to obtain a non-shielding face rendering graph;

and inputting the non-shielding face rendering image into a pre-trained three-dimensional face synthesis model to obtain a three-dimensional face model.

2. The method according to claim 1, wherein before inputting the occlusion face image into a pre-trained feature point extraction network to obtain a face feature point group, the method further comprises:

and inputting the non-occlusion face image sample into a lightweight neural network to train the lightweight neural network, and taking the trained lightweight neural network as a feature point extraction network.

3. The method according to claim 1, wherein the combining the occlusion face analysis graph and the feature point face analysis graph to obtain an occlusion-free face analysis graph comprises:

carrying out backup processing on the occlusion face analytic graph to obtain a backup occlusion face analytic graph as a target analytic graph;

for each feature pixel in the feature point face analysis graph, executing the following processing steps:

determining whether the gray value corresponding to the characteristic pixel is in a first preset range;

in response to determining that the gray value corresponding to the feature pixel is within the first preset range, determining whether the gray value corresponding to the occlusion pixel corresponding to the feature pixel in the occlusion face analysis graph is within the first preset range;

in response to determining that the gray value corresponding to the shielding pixel of the shielding face analysis graph corresponding to the feature pixel is not within the first preset range, replacing the target pixel of the target analysis graph corresponding to the shielding pixel with the feature pixel so as to update the target analysis graph;

and determining the updated target analysis chart as an updated target analysis chart.

4. The method of claim 3, wherein the method further comprises:

for each feature pixel in the feature point face analysis graph, the following replacement steps are carried out:

determining whether the gray value corresponding to the characteristic pixel is in a second preset range;

in response to determining that the gray value corresponding to the feature pixel is within the second preset range, determining whether the gray value corresponding to the occlusion pixel corresponding to the feature pixel in the occlusion face analysis graph is within the second preset range;

in response to determining that the gray value corresponding to the shielding pixel corresponding to the feature pixel in the shielding face analysis graph is not within the second preset range, replacing the updating target pixel corresponding to the shielding pixel in the updating target analysis graph with the feature pixel so as to update the updating target analysis graph;

and determining the updated target analysis graph as an unobstructed face analysis graph.

5. The method of claim 1, wherein before the inputting the non-occlusion face analysis graph and the occlusion face graph into a pre-trained face synthesis network to obtain a non-occlusion face rendering graph, the method further comprises:

inputting the non-shielding face analysis image sample and the shielding face image sample into a generating type confrontation network to train the generating type confrontation network, and obtaining the trained generating type confrontation network as a face synthesis network.

6. The method of claim 1, wherein before the inputting the non-occluded face rendering map into a pre-trained three-dimensional face synthesis model, resulting in a three-dimensional face model, the method further comprises:

and inputting the non-shielding face rendering image sample into a face three-dimensional reconstruction network to train the face three-dimensional reconstruction network, and obtaining the trained face three-dimensional reconstruction network as a three-dimensional face synthesis model.

7. A face occlusion removal and three-dimensional model generation apparatus, comprising:

the first input unit is configured to input the occlusion face image into a pre-trained deep learning network to obtain an occlusion face analysis image;

the second input unit is configured to input the shielding face image into a pre-trained feature point extraction network to obtain a face feature point group;

a connection processing unit configured to perform connection processing on each of the face feature points in the face feature point group to generate a feature point face analysis map;

the combined processing unit is configured to combine the occlusion face analysis graph and the feature point face analysis graph to obtain a non-occlusion face analysis graph;

a third input unit, configured to input the non-occlusion face analysis graph and the occlusion face graph into a pre-trained face synthesis network to obtain a non-occlusion face rendering graph;

and the fourth input unit is configured to input the non-shielding face rendering image into a pre-trained three-dimensional face synthesis model to obtain a three-dimensional face model.

8. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon;

when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.

9. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1 to 6.