CN112396692A - Face reconstruction method and device, computer equipment and storage medium - Google Patents

Face reconstruction method and device, computer equipment and storage medium

Info

Publication number
CN112396692A
Authority
CN
China
Prior art keywords
point information
face
dense point
target
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011337942.0A
Other languages
Chinese (zh)
Other versions
CN112396692B (en)
Inventor
林纯泽
陈祖凯
王权
徐胜伟
钱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202011337942.0A (granted as CN112396692B)
Publication of CN112396692A
Priority to JP2023531694A (published as JP2023551247A)
Priority to KR1020237018454A (published as KR20230098313A)
Priority to PCT/CN2021/108629 (published as WO2022110855A1)
Priority to TW110130613A (published as TW202221645A)
Application granted
Publication of CN112396692B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10028 - Range image; Depth image; 3D point clouds
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30196 - Human being; Person
    • G06T 2207/30201 - Face

Abstract

The present disclosure provides a face reconstruction method, apparatus, computer device and storage medium, wherein the method comprises obtaining dense point information of an original face included in a target image; fitting dense point information of an original face by utilizing dense point information of first reference faces corresponding to a plurality of reference images respectively to obtain target coefficients corresponding to dense point information of a plurality of groups of first reference faces respectively; determining dense point information of a target face model based on dense point information of a second reference face with a preset style and target coefficients corresponding to the dense point information of multiple groups of first reference faces respectively; the second reference face is generated based on the first reference face in the reference image; and generating a target face model corresponding to the original face of the target image based on the dense point information of the target face model.

Description

Face reconstruction method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a face reconstruction method, an apparatus, a computer device, and a storage medium.
Background
Face reconstruction can establish a virtual face three-dimensional model according to a real face or according to the user's preference, and has wide application in fields such as games, animation and virtual social interaction. For example, in a game, a player can use the face reconstruction system provided by the game program to generate a virtual face three-dimensional model from the real face contained in an image supplied by the player, and then participate in the game with a greater sense of immersion by using the created virtual face three-dimensional model.
At present, the similarity between a virtual human face three-dimensional model obtained based on a human face reconstruction method and a real human face is low.
Disclosure of Invention
The embodiment of the disclosure at least provides a face reconstruction method, a face reconstruction device, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a face reconstruction method, including: acquiring dense point information of an original face included in a target image; fitting the dense point information of the original face by utilizing dense point information of a first reference face corresponding to a plurality of reference images respectively to obtain a plurality of groups of target coefficients corresponding to the dense point information of the first reference face respectively; determining dense point information of a target face model based on dense point information of a second reference face with a preset style and multiple groups of target coefficients corresponding to the dense point information of the first reference face respectively; the second reference face is generated based on the first reference face in the reference image; and generating a target face model corresponding to the original face of the target image based on the dense point information of the target face model.
In this embodiment, the target coefficient is used as a medium to establish an association relationship between the dense point information of the original face and the dense point information of the plurality of first reference faces, and the association relationship can characterize the association between the dense point information of the second reference face determined based on the dense point information of the first reference face and the dense point information of the target face model established based on the original face, so that the generated target face model has the features (such as shape features and the like) of the original face in the target image, has a higher similarity with the original face, and can have a preset style.
In an optional implementation manner, the determining dense point information of a target face model based on dense point information of a second reference face having a preset style and multiple sets of target coefficients corresponding to the dense point information of the first reference face respectively includes: generating average value information of the dense point information of the multiple groups of second reference faces based on the dense point information of the multiple groups of second reference faces; and generating the dense point information of the target face model based on the target coefficients respectively corresponding to the dense point information of the multiple groups of second reference faces, the average value information and the dense point information of the multiple groups of first reference faces.
In an optional implementation manner, the generating dense point information of the target face model based on target coefficients corresponding to multiple sets of dense point information of the second reference face, the average value information, and multiple sets of dense point information of the first reference face respectively includes: determining difference value information of the dense point information of each group of second reference faces based on the dense point information of each group of second reference faces in the dense point information of the groups of second reference faces and the average value information; performing interpolation processing on difference value information respectively corresponding to the dense point information of the multiple groups of second reference faces on the basis of target coefficients respectively corresponding to the dense point information of the multiple groups of first reference faces; and generating dense point information of the target face model based on the interpolation processing result and the mean value information.
In the embodiment, the average value information can be used for accurately representing the average characteristics of dense point information of a plurality of groups of second reference faces; the difference between the dense point information of each group of second reference faces and the average characteristic of the dense point information of the multiple groups of second reference faces can be accurately represented by using the difference information of the dense point information of each group of second reference faces, so that the average value information is adjusted by using more accurate difference information, and the dense point information of the target face model is more simply determined.
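As a concrete illustration of this embodiment, the following minimal sketch (Python with NumPy) shows one way the average value information, the difference value information and the target coefficients could be combined into dense point information of the target face model; the array names, shapes and function name are assumptions introduced here for illustration and are not taken from the disclosure.

```python
import numpy as np

def blend_target_dense_points(second_ref_points, target_coeffs):
    """Hypothetical sketch of this embodiment: combine dense point information of the
    styled (second) reference faces into dense point information of the target face model.

    second_ref_points: array of shape (K, N, 3), K groups of N 3D dense points
    target_coeffs:     array of shape (K,), target coefficients obtained by fitting
    """
    # Average value information: mean features of the K groups of second reference faces
    mean_points = second_ref_points.mean(axis=0)                 # (N, 3)

    # Difference value information: how each group deviates from the mean
    diffs = second_ref_points - mean_points                      # (K, N, 3)

    # Interpolation: weight each group's deviation by its target coefficient and sum
    interpolated = np.tensordot(target_coeffs, diffs, axes=1)    # (N, 3)

    # Dense point information of the target face model
    return mean_points + interpolated
```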
In an optional implementation manner, the obtaining dense point information of an original face included in a target image includes: acquiring the target image comprising the original face; and processing the target image by using a pre-trained neural network to obtain dense point information of the original face in the target image.
In the embodiment, the face features of the original face in the target image can be more accurately represented by using the dense point information of the original face.
In an optional embodiment, the method further comprises: acquiring a plurality of reference images including a first reference face; and processing each reference image by utilizing a pre-trained neural network aiming at each reference image in the plurality of reference images to obtain dense point information of the first reference face in each reference image.
In the embodiment, the dense point information of the first reference face is utilized, and the face features corresponding to the first reference face in the reference image can be represented more accurately. Meanwhile, a plurality of reference images containing the first reference face can cover the face appearance characteristics as wide as possible.
In an optional implementation manner, the fitting the dense point information of the original face with the dense point information of the first reference face corresponding to each of the multiple reference images to obtain multiple sets of target coefficients corresponding to the dense point information of the first reference face respectively includes: performing least square processing on the dense point information of the original face and the dense point information of the first reference face to obtain a plurality of groups of intermediate coefficients corresponding to the dense point information of the first reference face respectively; and determining a target coefficient corresponding to the dense point information of each group of the first reference faces based on the intermediate coefficient corresponding to the dense point information of each group of the first reference faces in the dense point information of the groups of the first reference faces.
In the embodiment, the fitting condition when the dense point information of the original face is fitted by using the dense point information of the plurality of first reference faces can be accurately represented by using the target coefficient.
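As an illustration, a minimal NumPy sketch of the least-squares step is given below; it assumes that each group of dense point information is flattened into one column of a design matrix, and the function and variable names are hypothetical rather than taken from the disclosure.

```python
import numpy as np

def fit_intermediate_coeffs(original_points, first_ref_points):
    """Hypothetical sketch of the least-squares processing in this embodiment.

    original_points:  array of shape (N, 3), dense points of the original face
    first_ref_points: array of shape (K, N, 3), K groups of first-reference dense points
    Returns one intermediate coefficient per group of first-reference dense point information.
    """
    # Flatten each group of dense points into one column of the design matrix
    A = first_ref_points.reshape(first_ref_points.shape[0], -1).T   # (3N, K)
    b = original_points.reshape(-1)                                 # (3N,)

    # Solve A @ coeffs ~= b in the least-squares sense
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)                  # (K,)
    return coeffs
```

The intermediate coefficients returned here would then be adjusted, for example per face part, to obtain the target coefficients described below.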
In an optional implementation manner, the determining, based on the intermediate coefficient corresponding to the dense point information of each group of the first reference faces in the multiple groups of dense point information of the first reference faces, a target coefficient corresponding to the dense point information of each group of the first reference faces includes: determining first dense point information representing the part of the first reference face corresponding to the target face model from the dense point information of each group of the first reference faces; adjusting an intermediate coefficient corresponding to the first type of dense point information in the dense point information of the first reference face to obtain a first target coefficient; determining an intermediate coefficient corresponding to second dense point information in dense point information of the first reference face as a second target coefficient; the second type of dense point information is dense point information except the first type of dense point information in the dense point information of the first reference face; and obtaining the target coefficient of the dense point information of each group of the first reference face based on the first target coefficient and the second target coefficient.
In an optional embodiment, the method further comprises: adjusting dense point information of a first reference face of the first reference face in the reference image to obtain dense point information of a second reference face with a preset style; or generating a virtual face image comprising a second reference face with the preset style based on a first reference face in the reference image; and generating dense point information of the second reference face in the virtual face image by utilizing a pre-trained neural network.
In this embodiment, by adjusting the target coefficients corresponding to part of the face, the dense point information obtained by fitting the target coefficients and the dense point information of the plurality of corresponding first reference faces can be made to be similar to the dense point information of the original face corresponding to the target image, that is, the obtained target coefficients can more accurately represent the coefficients when the dense points of the first reference faces are fitted to the dense points of the original face.
In an alternative embodiment, training the neural network comprises: acquiring a sample image set; the sample image set comprises a plurality of first sample images of a first sample face and a second sample image of a second sample face; the plurality of first sample images are divided into a plurality of first sample image subsets, and each first sample image subset comprises images of first sample faces with the same expression, which are acquired from a plurality of preset acquisition angles respectively; acquiring dense point information of a first sample face of a first sample image and dense point information of a second sample face of a second sample image in the sample image set; performing feature learning on a first sample image and a second sample image in the sample image set by using an initial neural network to obtain predicted dense point information of a first sample face of the first sample image and predicted dense point information of a second sample face of the second sample image; and training the initial neural network by using the dense point information and the predicted dense point information of the first sample face and the dense point information and the predicted dense point information of the second sample face, and obtaining the neural network after training.
In an optional implementation, the obtaining dense point information of the second sample face of the second sample image includes: acquiring face key point information of each second sample image; and fitting and generating dense point information of a second sample face of a second sample image by using the face key point information of the second sample image and the second sample image.
In an optional embodiment, the sample image set further includes: a third sample image; the third sample image is obtained by performing data enhancement processing on the first sample image;
the face reconstruction method further comprises the following steps: acquiring dense point information of a third sample face of the third sample image; performing feature learning on a third sample image by using the initial neural network to obtain predicted dense point information of a third sample face of the third sample image;
the training of the initial neural network by using the dense point information and the predicted dense point information of the first sample face and the dense point information and the predicted dense point information of the second sample face, and obtaining the neural network after the training, includes: and training the initial neural network by using the dense point information and the predicted dense point information of the first sample face, the dense point information and the predicted dense point information of the second sample face, and the dense point information and the predicted dense point information of the third sample face, and obtaining the neural network after training.
In an alternative embodiment, the data enhancement process includes at least one of: random occlusion processing, gaussian noise processing, motion blur processing, and color area channel change processing.
In this embodiment, the number of the first sample image, the second sample image, and the third sample image can be adjusted to obtain neural networks with different advantages, so as to obtain a better neural network for actual needs; meanwhile, the third sample image is obtained through data enhancement processing, so that the trained neural network has stronger data processing capability when the third sample image is included in the sample image.
Meanwhile, the face included in the second sample image can cover the face appearance features as wide as possible, so that the obtained neural network has better generalization capability.
In a second aspect, an embodiment of the present disclosure further provides a face reconstruction apparatus, including:
the first acquisition module is used for acquiring dense point information of an original face included in a target image;
the first processing module is used for fitting the dense point information of the original face by utilizing the dense point information of the first reference face corresponding to the multiple reference images respectively to obtain multiple groups of target coefficients corresponding to the dense point information of the first reference face respectively;
the determining module is used for determining dense point information of a target face model based on dense point information of a second reference face with a preset style and multiple groups of target coefficients corresponding to the dense point information of the first reference face; the second reference face is generated based on the first reference face in the reference image;
and the generating module is used for generating a target face model corresponding to the original face of the target image based on the dense point information of the target face model.
In an optional embodiment, when determining the dense point information of the target face model based on the dense point information of the second reference face having the preset style and the target coefficients corresponding to the multiple sets of dense point information of the first reference face, the determining module is configured to: generating average value information of the dense point information of the multiple groups of second reference faces based on the dense point information of the multiple groups of second reference faces; and generating the dense point information of the target face model based on the target coefficients respectively corresponding to the dense point information of the multiple groups of second reference faces, the average value information and the dense point information of the multiple groups of first reference faces.
In an optional embodiment, when the determining module generates the dense point information of the target face model based on target coefficients corresponding to multiple sets of dense point information of the second reference face, the mean information, and multiple sets of dense point information of the first reference face, respectively, the determining module is configured to: determining difference value information of the dense point information of each group of second reference faces based on the dense point information of each group of second reference faces in the dense point information of the groups of second reference faces and the average value information; performing interpolation processing on difference value information respectively corresponding to the dense point information of the multiple groups of second reference faces on the basis of target coefficients respectively corresponding to the dense point information of the multiple groups of first reference faces; and generating dense point information of the target face model based on the interpolation processing result and the mean value information.
In an optional implementation, the first obtaining module, when obtaining dense point information of an original face included in a target image, is configured to: acquiring the target image comprising the original face; and processing the target image by using a pre-trained neural network to obtain dense point information of the original face in the target image.
In an optional implementation, the system further includes a second processing module, configured to: acquiring a plurality of reference images including a first reference face; and processing each reference image by utilizing a pre-trained neural network aiming at each reference image in the plurality of reference images to obtain dense point information of the first reference face in each reference image.
In an optional implementation manner, when dense point information of a first reference face corresponding to multiple reference images is used to fit dense point information of the original face, and multiple sets of target coefficients corresponding to dense point information of the first reference face are obtained, the first processing module is configured to: performing least square processing on the dense point information of the original face and the dense point information of the first reference face to obtain a plurality of groups of intermediate coefficients corresponding to the dense point information of the first reference face respectively; and determining a target coefficient corresponding to the dense point information of each group of the first reference faces based on the intermediate coefficient corresponding to the dense point information of each group of the first reference faces in the dense point information of the groups of the first reference faces.
In an optional implementation manner, when determining, based on the intermediate coefficient corresponding to the dense point information of each group of first reference faces in the multiple groups of dense point information of the first reference faces, a target coefficient corresponding to the dense point information of each group of the first reference faces, the first processing module is configured to: determining first dense point information representing the part of the first reference face corresponding to the target face model from the dense point information of each group of the first reference faces; adjusting an intermediate coefficient corresponding to the first type of dense point information in the dense point information of the first reference face to obtain a first target coefficient; determining an intermediate coefficient corresponding to second dense point information in dense point information of the first reference face as a second target coefficient; the second type of dense point information is dense point information except the first type of dense point information in the dense point information of the first reference face; and obtaining the target coefficient of the dense point information of each group of the first reference face based on the first target coefficient and the second target coefficient.
In an optional embodiment, the apparatus further includes an adjusting module, configured to: adjusting dense point information of a first reference face of the first reference face in the reference image to obtain dense point information of a second reference face with a preset style; or generating a virtual face image comprising a second reference face with the preset style based on a first reference face in the reference image; and generating dense point information of the second reference face in the virtual face image by utilizing a pre-trained neural network.
In an optional embodiment, the apparatus further comprises a training module, configured to, when training the neural network: acquiring a sample image set; the sample image set comprises a plurality of first sample images of a first sample face and a second sample image of a second sample face; the plurality of first sample images are divided into a plurality of first sample image subsets, and each first sample image subset comprises images of first sample faces with the same expression, which are acquired from a plurality of preset acquisition angles respectively; acquiring dense point information of a first sample face of a first sample image and dense point information of a second sample face of a second sample image in the sample image set; performing feature learning on a first sample image and a second sample image in the sample image set by using an initial neural network to obtain predicted dense point information of a first sample face of the first sample image and predicted dense point information of a second sample face of the second sample image; and training the initial neural network by using the dense point information and the predicted dense point information of the first sample face and the dense point information and the predicted dense point information of the second sample face, and obtaining the neural network after training.
In an optional embodiment, when obtaining dense point information of a second sample face of the second sample image, the training module is configured to: acquiring face key point information of each second sample image; and fitting and generating dense point information of a second sample face of a second sample image by using the face key point information of the second sample image and the second sample image.
In an optional embodiment, the sample image set further includes: a third sample image; the third sample image is obtained by performing data enhancement processing on the first sample image;
the system further comprises a second obtaining module, configured to: acquiring dense point information of a third sample face of the third sample image; performing feature learning on a third sample image by using the initial neural network to obtain predicted dense point information of a third sample face of the third sample image;
the training module is further configured to: performing feature learning on a third sample image in the sample image set by using the initial neural network to obtain predicted dense point information of a third sample face of the third sample image;
the training module is used for training the initial neural network by using the dense point information and the predicted dense point information of the first sample face and the dense point information and the predicted dense point information of the second sample face, and when the neural network is obtained after training is completed, the training module is used for: and training the initial neural network by using the dense point information and the predicted dense point information of the first sample face, the dense point information and the predicted dense point information of the second sample face, and the dense point information and the predicted dense point information of the third sample face, and obtaining the neural network after training.
In an alternative embodiment, the data enhancement process includes at least one of: random occlusion processing, gaussian noise processing, motion blur processing, and color area channel change processing.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor, the processor is configured to execute the machine-readable instructions stored in the memory, and the machine-readable instructions, when executed by the processor, cause the processor to perform the steps in the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, this disclosure also provides a computer-readable storage medium having a computer program stored thereon, where the computer program is executed to perform the steps in the first aspect or any one of the possible implementation manners of the first aspect.
For the description of the effects of the above-mentioned face reconstruction apparatus, computer device, and computer-readable storage medium, reference is made to the description of the above-mentioned face reconstruction method, which is not repeated herein.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments will be briefly described below. The drawings herein are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It is appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope, for those skilled in the art will be able to derive additional related drawings therefrom without exercising inventive effort.
Fig. 1 shows a flowchart of a face reconstruction method provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating a specific method of training a neural network provided by an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a first sample image provided by an embodiment of the present disclosure, and a third sample image determined using the first sample image;
fig. 4 is a flowchart illustrating a specific method for obtaining dense point information of a second sample face of a second sample image according to an embodiment of the present disclosure;
FIG. 5 is a diagram illustrating a specific example of a neural network architecture provided by an embodiment of the present disclosure;
FIG. 6 is a flow chart of a method for determining target coefficients corresponding to a plurality of reference images;
fig. 7 is a flowchart illustrating a specific method for determining a target coefficient corresponding to dense point information of each group of first reference faces according to an embodiment of the present disclosure;
fig. 8 is a flowchart illustrating a specific method for determining dense point information of a target face model according to an embodiment of the present disclosure;
fig. 9 is a flowchart illustrating a specific method for generating dense point information of a target face model according to an embodiment of the present disclosure;
fig. 10 is a schematic diagram of a face reconstruction apparatus provided in an embodiment of the present disclosure;
fig. 11 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Research shows that when a virtual face three-dimensional model is obtained by face reconstruction based on a face included in a portrait image, face dense points corresponding to the face are generally obtained from the face image first, and the face dense points are then adjusted multiple times according to the specific style of the virtual face three-dimensional model to generate a virtual image. Because the face dense points corresponding to different faces are different, even if the style of the virtual face to be reconstructed is the same, the adjustments applied when reconstructing the face dense points of different faces according to the determined style differ, and these adjustments carry large uncertainty. As a result, the specific direction of adjustment is difficult to control during the adjustment process, the generated virtual face three-dimensional model may differ considerably from the real face, and the similarity between the virtual face three-dimensional model and the real face is low.
In addition, when face reconstruction is performed on different faces, adjustment schemes for different faces need to be set according to face dense points corresponding to the different faces and requirements of styles on the face dense points, so that when the face dense points of each face are adjusted based on different adjustment schemes, more time is consumed, and the face reconstruction efficiency is low.
Based on the above research, the present disclosure provides a face reconstruction method, apparatus, computer device, and storage medium, in which a target coefficient is used as a medium to establish an association relationship between dense point information of an original face and dense point information of a plurality of first reference faces. The association relationship can represent an association between the dense point information of a second reference face determined based on the dense point information of the first reference faces and the dense point information of a target face model established based on the original face, so that the generated target face model has the features (such as shape features) of the original face in the target image, has a higher similarity with the original face, and also has a preset style.
In addition, according to the scheme, for different styles, only the dense point information of the second reference faces respectively corresponding to the plurality of reference images needs to be generated; the dense point information of the second reference faces with the preset style can then be used directly to generate the target face model for the original face, without designing a separate adjustment scheme for each face.
The above-mentioned drawbacks are the results of the inventor after practical and careful study, and therefore, the discovery process of the above-mentioned problems and the solutions proposed by the present disclosure to the above-mentioned problems should be the contribution of the inventor in the process of the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, first, a face reconstruction method disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the face reconstruction method provided in the embodiments of the present disclosure is generally a computer device with certain computing power, and the computer device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle mounted device, a wearable device, or a server or other processing device. In some possible implementations, the face reconstruction method may be implemented by a processor calling computer readable instructions stored in a memory.
The following explains the face reconstruction method provided by the embodiment of the present disclosure.
Referring to fig. 1, which is a flowchart of a face reconstruction method provided in the embodiment of the present disclosure, the method includes steps S101 to S104, where:
s101: acquiring dense point information of an original face included in a target image;
s102: fitting dense point information of an original face by utilizing dense point information of first reference faces corresponding to a plurality of reference images respectively to obtain target coefficients corresponding to dense point information of a plurality of groups of first reference faces respectively;
s103: determining dense point information of a target face model based on dense point information of a second reference face with a preset style and target coefficients corresponding to the dense point information of multiple groups of first reference faces respectively; the second reference face is generated based on the first reference face in the reference image;
s104: and generating a target face model corresponding to the original face of the target image based on the dense point information of the target face model.
In the above method, target coefficients are determined when the dense point information of the original face in the target image is fitted using the dense point information of the first reference faces respectively corresponding to the plurality of reference images; the dense point information of the target face model is then obtained based on the dense point information of the second reference face, which is generated from the first reference faces in the reference images and has a preset style; and the target face model corresponding to the target image is generated based on the dense point information of the target face model. By using the target coefficients as a medium, an association relationship is established between the dense point information of the original face and the dense point information of the plurality of first reference faces. This association relationship can represent the relation between the dense point information of the second reference face, determined based on the dense point information of the first reference faces, and the dense point information of the target face model established based on the dense point information of the original face. Therefore, the generated dense point information of the target face model not only carries the features (such as shape features) of the original face in the target image and has a higher similarity with the original face, but also gives the generated target face model the preset style.
The following describes the details of S101 to S104.
For the above S101, the target image is, for example, an image including a human face acquired in advance, or an image including a human face acquired when a certain object is photographed by a photographing apparatus of a camera. At this time, for example, any one of faces included in the image may be determined as an original face, and the original face may be an object of face reconstruction.
Specifically, when the face reconstruction method provided by the embodiment of the present disclosure is applied to different scenes, the target image acquisition methods are also different.
For example, in the case of applying the face reconstruction method to a game, an image including a face of a game player may be acquired by an image acquisition device installed in the game device, or an image including a face of a game player may be selected from an album in the game device and the acquired image including a face of a game player may be used as a target image.
For another example, when the face reconstruction method is applied to a terminal device such as a mobile phone, a camera of the terminal device may collect an image including a face of a user, select an image including a face of a user from an album of the terminal device, or receive an image including a face of a user from another application installed in the terminal device.
For another example, when the face reconstruction method is applied to a live broadcast scene, a video frame image including a face may be determined from a plurality of frame video frame images included in a video stream acquired by a live broadcast device; and the video frame image containing the human face is taken as a target image. Here, the target image may have, for example, a plurality of frames; the multi-frame target image may be obtained by sampling a plurality of video frame images in a video stream, for example.
When dense point information of an original face included in a target image is acquired, for example, the following manner may be adopted: acquiring a target image comprising an original face; and processing the target image by using a pre-trained neural network to obtain dense point information of the original face in the target image.
Here, when the pre-trained neural network is used to process the target image to obtain the dense point information of the original face, the pre-trained neural network includes at least one of the following: a convolutional neural network (CNN), a back propagation (BP) neural network, and a backbone neural network (Backbone).
When determining the neural network structure, for example, a backbone network of the neural network may be determined first as the main architecture of the neural network. For example, the backbone network may include at least one of the following: an Inception network, a residual network variant (ResNeXt), an Inception variant (Xception), a Squeeze-and-Excitation network (SENet), a lightweight network (MobileNet), and a lightweight network (ShuffleNet).
Illustratively, when the neural network comprises a convolutional neural network, a lightweight network (MobileNet) can be selected as the basic model of the convolutional neural network; on the basis of MobileNet, other network structures are added to form the convolutional neural network, and the formed convolutional neural network is trained. In this process, MobileNet is used as a part of the convolutional neural network; since MobileNet is small and processes data quickly, the training speed is higher. Meanwhile, the trained neural network also has the advantages of small size and high data processing speed, and is thus more suitable for deployment in embedded devices.
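A minimal sketch of such a network is given below, using PyTorch and the torchvision MobileNetV2 implementation; the number of dense points, the single linear regression head and all names are illustrative assumptions and not the structure actually used or claimed by the disclosure.

```python
import torch
import torch.nn as nn
from torchvision import models

class DensePointRegressor(nn.Module):
    """Hypothetical sketch: a MobileNet backbone followed by a regression head
    that outputs 3D coordinates for a fixed number of face dense points."""

    def __init__(self, num_dense_points=20000):   # the point count is an assumption
        super().__init__()
        backbone = models.mobilenet_v2(weights=None)
        self.features = backbone.features                        # lightweight feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(backbone.last_channel, num_dense_points * 3)

    def forward(self, images):                                   # images: (B, 3, H, W)
        x = self.pool(self.features(images)).flatten(1)          # (B, C)
        points = self.head(x)                                    # (B, num_dense_points * 3)
        return points.view(-1, points.shape[1] // 3, 3)          # (B, num_dense_points, 3)

# Example usage (shapes are illustrative):
# model = DensePointRegressor()
# dense_points = model(torch.randn(1, 3, 224, 224))   # -> (1, 20000, 3)
```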
Here, the network structure of the neural network described above is merely an example; the specific construction mode and structure of the network structure may be determined according to actual situations, and are not described herein again, and the above examples also do not constitute limitations on the embodiments of the present disclosure.
Referring to fig. 2, an embodiment of the present disclosure provides a specific method for training a neural network, including:
s201: acquiring a sample image set; the sample image set comprises a plurality of first sample images of a first sample face and a second sample image of a second sample face; the plurality of first sample images are divided into a plurality of first sample image subsets, and each first sample image subset comprises images of first sample faces with the same expression, which are acquired from a plurality of preset acquisition angles respectively.
The method includes the steps that for a plurality of first sample images of a first sample face included in a sample image set, the corresponding first sample face is, for example, a face of at least one individual object which is predetermined and used for acquiring a face image to train a neural network.
When the first sample image subsets are determined by photographing the first sample face, a plurality of first sample image subsets may be determined for a plurality of different expressions, for example. The plurality of different expressions are, for example, happy, excited, dejected, sad, and the like. In the case where the first sample face is photographed to acquire a first sample image subset, taking the "happy" expression as an example, by photographing the first sample face presenting the "happy" expression from different angles, a plurality of first sample images corresponding to the "happy" expression can be acquired as the first sample image subset. Here, when the first sample images are acquired, the background of the first sample face in different first sample images may be the same.
Similarly, a similar method may be adopted to obtain first sample image subsets corresponding to the first sample faces respectively under different expressions, which is not described herein again.
After the first sample image subsets corresponding to the different expressions are determined, a plurality of first sample images of the first sample face can be determined.
For example, when acquiring the first sample image, the image acquisition device may be used to capture and acquire the image, where the image acquisition device includes: at least one of a depth camera and a color camera.
For example, the faces of I individual subjects may be photographed under E expressions to obtain a plurality of first sample images. For example, the face of a certain individual subject A may be determined as a first sample face. While the first sample face presents a "sad" expression, it is photographed from P (P being an integer larger than 1) different angles, and the P images respectively corresponding to the different angles are obtained as the first sample image subset corresponding to the "sad" expression. Then, images corresponding to the P angles are acquired while the first sample face presents each of the other expressions (for example, E-1 further different expressions), and are used as the first sample image subsets corresponding to those expressions. At this point, E different first sample image subsets may be acquired, comprising P × E images corresponding to the individual subject A. Then, the remaining I-1 individual subjects may be photographed to acquire P × E images corresponding to each of them, that is, M = I × E × P images in total are acquired as the plurality of first sample images.
Under the condition of acquiring the first sample image, the faces corresponding to a plurality of shooting individual objects with different expressions are acquired under the condition that the shooting subjects in the image are different in visual angle, and other parts such as backgrounds are fewer, so that the acquisition capability of the neural network on the dense point information corresponding to the faces in the image under multiple angles can be trained by utilizing the first sample image.
The second sample images can be obtained by randomly photographing different individual objects, or a plurality of images containing human faces can be randomly crawled from a preset network platform and the crawled images used as the second sample images.
For example, in the case of randomly photographing second sample images of different individual subjects, a plurality of second sample faces may be photographed by a camera or other image capturing devices to obtain second sample images; alternatively, a plurality of second sample images captured in advance are directly acquired. The second sample images include, for example, H acquired face images containing a background; the background and other parts in the second sample images that interfere with face recognition are used to train the neural network to have anti-interference capability against parts other than the face when the face in the target image is recognized to obtain dense point information.
In another embodiment of the present disclosure, the sample image set further includes a third sample image, and the third sample image can be obtained by performing data enhancement processing on the first sample image, for example. Wherein the data enhancement processing comprises at least one of: random occlusion processing, gaussian noise processing, blurring processing, and color area channel changing processing. By using the data enhancement processing method, the characteristics of the data can be concentrated to reduce the learning of the neural network on irrelevant characteristics, so that the performance of the neural network in acquiring dense point information is improved.
Exemplarily, under the condition of selecting random occlusion processing, occlusion processing can be performed on a partial area in the first sample image to obtain a third sample image; the size of the shielding part can be determined according to the size of the first sample image and the actual requirement, and is not limited herein; in the case of selecting gaussian noise, for example, at least one of fluctuation noise, cosmic noise, thermal noise, and shot noise may be selected to be added to the first sample image, so that at least one gaussian noise is included in the third sample image; under the condition of selecting the fuzzy processing, for example, at least part of pixel points in the image can be subjected to motion fuzzy processing to obtain a third sample image with a motion fuzzy effect; in the case of the color area channel change processing, for example, the first sample image may be processed by global color area channel change processing or random color area channel change processing, so as to obtain a third sample image with global or partial color area channel changed.
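The sketch below (Python with NumPy) illustrates how such data enhancement could be applied to a first sample image to obtain a third sample image; the probabilities, occlusion size, noise strength, blur kernel and channel scaling range are all assumptions chosen for illustration.

```python
import numpy as np

def augment(image, rng=None):
    """Hypothetical sketch of the data enhancement processing; parameter values are illustrative."""
    rng = rng if rng is not None else np.random.default_rng()
    img = image.astype(np.float32)                       # (H, W, 3), values in [0, 255]
    h, w, _ = img.shape

    # Random occlusion: black out a randomly placed rectangle
    if rng.random() < 0.5:
        oh, ow = h // 4, w // 4
        y, x = rng.integers(0, h - oh), rng.integers(0, w - ow)
        img[y:y + oh, x:x + ow] = 0

    # Gaussian noise
    if rng.random() < 0.5:
        img += rng.normal(0.0, 5.0, img.shape)

    # Simple horizontal motion blur: average each row over a sliding window
    if rng.random() < 0.5:
        kernel = np.ones(5) / 5
        for c in range(3):
            img[..., c] = np.apply_along_axis(
                lambda row: np.convolve(row, kernel, mode="same"), 1, img[..., c])

    # Color channel change: scale each channel by a random factor
    if rng.random() < 0.5:
        img *= rng.uniform(0.8, 1.2, size=3)

    return np.clip(img, 0, 255).astype(np.uint8)
```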
Referring to fig. 3, a schematic diagram of a first sample image and a third sample image determined by using the first sample image is provided for the embodiment of the present disclosure; wherein 31 denotes a first sample image; 32, a third sample image obtained by performing data enhancement processing of the blurring processing on the first sample image 31; the third sample image obtained by performing data enhancement processing of the local occlusion on the first sample image 31 is denoted by 33, and the position of the local occlusion is denoted by 34.
In this case, the dense point information of the sample face corresponding to the third sample image is the same as the dense point information of the sample face corresponding to the first sample image from which the third sample image is generated.
Receiving the above S201, the specific method for training the neural network further includes:
s202: and acquiring dense point information of a first sample face of the first sample image and dense point information of a second sample face of the second sample image in the sample image set.
Specifically, when acquiring dense point information of a first sample face of a first sample image and dense point information of a second sample face of a second sample image in a sample image set, for example, the following method may be adopted: and acquiring dense point information of a first sample face of the first sample image, dense point information of a second sample face of the second sample image and dense point information of a third sample face of the third sample image in the sample image set.
For example, for a first sample image in the sample image set, after the first sample face is photographed to obtain the corresponding first sample image, dense point information of the first sample face in each first sample image may be obtained. In the case where a color camera is used to obtain the first sample image, for example, models such as a face 3D Morphable Model (3DMM) can be used to obtain the dense point information of the first sample face; in the case where the first sample image is acquired by a depth camera, for example, the dense point information of the first sample face may be obtained based on the depth image acquired by the depth camera.
For a second sample image in the sample image set, referring to fig. 4, an embodiment of the present disclosure provides a specific method for obtaining dense point information of a second sample face of the second sample image, including:
s401: acquiring face key point information of each second sample image;
s402: and fitting and generating dense point information of the second sample face of the second sample image by using the face key point information of the second sample image and the second sample image.
Here, the second sample images include, for example, H acquired face images including a background, and each of the second sample images includes the determined face key points. The face key points included in the second sample image are used for determining dense point information of the sample face corresponding to the second sample image, and are, for example, key points directly labeled in the second sample image for representing face features, such as a plurality of key points representing the facial features, cheekbones and eyebrows; alternatively, the key points corresponding to the human face may be determined by using a key point detection method. The key point detection method includes at least one of the following: Active Shape Model (ASM), Active Appearance Model (AAM), and Cascaded Pose Regression (CPR).
After the face key point information of the second sample image is obtained, dense point information of the second sample face of the second sample image may be generated using a fitting model. The fitting model includes, for example, a face 3D Morphable Model (3DMM).
Receiving the above S202, the method for training a neural network further includes:
s203: and performing feature learning on the first sample image and the second sample image in the sample image set by using the initial neural network to obtain the predicted dense point information of the first sample face of the first sample image and the predicted dense point information of the second sample face of the second sample image.
Specifically, when determining the predicted dense point information of the first sample face of the first sample image and the predicted dense point information of the second sample face of the second sample image, for example, the following method may be adopted: and performing feature learning on the first sample image, the second sample image and the third sample image in the sample image set by using an initial neural network to obtain the predicted dense point information of the first sample face of the first sample image, the predicted dense point information of the second sample face of the second sample image and the predicted dense point information of the third sample face of the third sample image.
It should be noted here that the step of obtaining at least one of the first sample image, the second sample image and the third sample image may be performed in parallel with the step of performing feature learning on the corresponding image by using the initial neural network; that is, the predicted dense point information corresponding to at least one of the first sample image, the second sample image and the third sample image may be obtained directly once the initial neural network has performed feature learning on that image.
In addition, when feature learning is performed on at least one of the first sample image, the second sample image and the third sample image by using the initial neural network, the feature learning may be performed on these images at the same time; or, according to actual requirements, feature learning may be performed on them in sequence, so as to obtain the predicted dense point information of the first sample face of the first sample image, the predicted dense point information of the second sample face of the second sample image and the predicted dense point information of the third sample face of the third sample image.
The embodiment of the present disclosure does not limit the execution sequence of the sample processing process, and may specifically set according to actual needs.
For example, for a first sample image and a second sample image included in a sample image set, different numbers of the first sample image and the second sample image may be selected according to a preset ratio, and the selected first sample image and the selected second sample image are input into an initial neural network; in the case that the sample image set includes a first sample image, a second sample image, and a third sample image, different numbers of the first sample image, the second sample image, and the third sample image may be selected according to a preset ratio, and the selected first sample image, second sample image, and third sample image may be input into the initial neural network. When the proportion is selected differently, the emphasis on neural network training is also different. When the proportion of the first sample image and/or the third sample image is larger, the trained neural network has stronger acquisition capability on face dense points corresponding to faces at different angles in the image; when the proportion of the second sample image is larger, the trained neural network has stronger anti-interference capability on other background parts except the face in the image, thereby meeting different use requirements.
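As a rough sketch of how samples could be drawn according to such a preset proportion, assuming two already-prepared image pools and the 40%/60% split used later in this section as an example (pool names and numbers are illustrative only):

import random

def sample_by_ratio(first_pool, second_pool, total=100, first_ratio=0.4):
    n_first = int(total * first_ratio)                     # e.g. 40 first sample images
    n_second = total - n_first                             # e.g. 60 second sample images
    batch = random.sample(first_pool, n_first) + random.sample(second_pool, n_second)
    random.shuffle(batch)                                  # mix the two kinds of samples
    return batch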
After the sample images are input into the initial neural network, the initial neural network can perform feature learning on the sample images and output the predicted face dense point information of each sample image; the loss of the neural network is then determined by using the predicted face dense point information and the sample face dense point information corresponding to each sample image, and this loss is used to measure the accuracy of the neural network in generating face dense point information.
The dense point information of the predicted face obtained by using the neural network may include, for example, coordinate values for characterizing dense point position information of the face in a preset coordinate system, or coordinate values for characterizing dense point position information of the face in coordinate axes respectively corresponding to an x axis, a y axis, and a z axis of the preset coordinate system. The preset coordinate system is, for example, a preset face coordinate system.
For example, for any piece of predicted face dense point information that includes the position coordinate values of the face dense points, the predicted face dense point information may include the coordinate values (x, y, z), or include the coordinate value x on the x axis, the coordinate value y on the y axis and the coordinate value z on the z axis; the coordinate values on the x axis, y axis and z axis included in the dense point information of different predicted faces have a preset arrangement order in the output predicted face dense point information.
Referring to fig. 5, an embodiment of the present disclosure further provides a specific example diagram of a neural network structure, where the neural network structure includes: a backbone network 51, a first fully connected layer 52, and three sets of second fully connected layers 53.
When the neural network is used to process a sample image and determine the predicted face dense point information of the face in the sample image, for example, the sample image can be input into the backbone network to obtain the feature data of the sample image, and the feature data, after passing through the first fully connected layer, are input into the three groups of second fully connected layers respectively; the three groups of second fully connected layers are used for predicting the coordinate values of the face dense points of the sample image in the face coordinate system. For example, the first group of second fully connected layers can output the x-axis coordinate values of the dense points in the face coordinate system, the second group can output the y-axis coordinate values, and the third group can output the z-axis coordinate values. The coordinate values of the dense points in the face coordinate system constitute the predicted face dense point information of the face in the sample image.
In the case where a plurality of first sample images and a plurality of second sample images have been determined, the sample images used for training the neural network are selected according to a preset proportion, for example 40% first sample images and 60% second sample images. For instance, when the neural network is trained with 100 sample images, 40 first sample images and 60 second sample images are selected as the plurality of sample images.
A convolutional neural network is selected as the initial neural network for training; the convolutional neural network is constructed, for example, with MobileNet as the base model and a split fully connected network (split FC) as the output layer, and is used to obtain the predicted face dense point information respectively corresponding to the plurality of sample images, where the obtained predicted face dense point information includes a plurality of coordinate values on the x axis, the y axis and the z axis respectively.
At this time, for the predicted face dense point information corresponding to any sample image, in the case where the face dense points include R points corresponding to different face parts, the output predicted face dense point information may be output in the form of matrices, for example represented as [x_1, x_2, …, x_R], [y_1, y_2, …, y_R] and [z_1, z_2, …, z_R]; that is, for any one of the R face parts, the coordinate values (x_i, y_i, z_i), i ∈ [1, R], included in the predicted face dense point information are split into the coordinate values x_i, y_i and z_i on the x axis, y axis and z axis, which are output as the i-th elements of three different matrices respectively.
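A minimal PyTorch sketch of the structure described with reference to fig. 5 — a backbone network, a first fully connected layer, and three groups of second fully connected layers (split FC) emitting the x, y and z coordinate matrices — might look as follows; the toy backbone and all layer sizes are assumptions standing in for a MobileNet-style base model.

import torch
import torch.nn as nn

class DensePointNet(nn.Module):
    def __init__(self, num_points, feat_dim=1280, hidden_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(                     # stand-in for a MobileNet-style backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.first_fc = nn.Linear(feat_dim, hidden_dim)    # first fully connected layer
        # three groups of "second" fully connected layers (split FC), one per coordinate axis
        self.fc_x = nn.Linear(hidden_dim, num_points)
        self.fc_y = nn.Linear(hidden_dim, num_points)
        self.fc_z = nn.Linear(hidden_dim, num_points)

    def forward(self, img):
        feat = self.first_fc(self.backbone(img))
        return self.fc_x(feat), self.fc_y(feat), self.fc_z(feat)   # [x_1..x_R], [y_1..y_R], [z_1..z_R]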
Following the above S203, the method for training the neural network further includes:
S204: training the initial neural network by using the dense point information and the predicted dense point information of the first sample face and the dense point information and the predicted dense point information of the second sample face, and obtaining the neural network after the training is completed.
Specifically, when training the initial neural network to obtain the neural network, the following method may be adopted: training the initial neural network by using the dense point information and the predicted dense point information of the first sample face, the dense point information and the predicted dense point information of the second sample face, and the dense point information and the predicted dense point information of the third sample face, and obtaining the neural network after the training is completed.
Illustratively, the loss of the neural network can be determined based on the difference between the dense point information of the predicted face and the dense point information of the sample face, and the neural network is trained by using the loss, wherein the training direction is the direction in which the loss is reduced, so that the obtained dense point information of the predicted face is close to the dense point information of the real face when the neural network processes the image.
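A hedged sketch of one training step under this scheme, reusing the DensePointNet sketch above and assuming ground-truth dense points prepared as in S202; the L2 loss is one common choice, the disclosure only requires a loss that measures the gap between the predicted and the sample dense point information.

import torch.nn.functional as F

def train_step(model, optimizer, images, gt_x, gt_y, gt_z):
    pred_x, pred_y, pred_z = model(images)
    loss = F.mse_loss(pred_x, gt_x) + F.mse_loss(pred_y, gt_y) + F.mse_loss(pred_z, gt_z)
    optimizer.zero_grad()
    loss.backward()            # training moves in the direction that reduces the loss
    optimizer.step()
    return loss.item()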
Under the condition of obtaining the trained neural network, the target picture can be input into the neural network, and dense point information corresponding to the original face in the target picture is obtained.
For the above S102, the reference images may, for example, contain faces corresponding to different individual objects, with the faces of different individual objects being different from one another; for example, a plurality of persons differing in at least one of sex, age, skin color, degree of fatness or thinness, and the like may be determined, and for each of the plurality of persons a face image is acquired and taken as a reference image. In this way, the dense point information of the first reference faces generated based on the reference images can cover as wide a range of face appearance features as possible.
When dense point information of a plurality of first reference faces corresponding to a plurality of reference pictures is obtained, for example, the following manner may be adopted: acquiring a plurality of reference images including a first reference face; and aiming at each reference image in the multiple reference images, acquiring dense point information of a first reference face of the first reference face in each reference image by using a pre-trained neural network. The way of obtaining the plurality of first reference face dense points by using the pre-trained neural network is similar to the way of obtaining the original face dense points by using the pre-trained neural network, and is not described herein again.
Under the condition that the dense point information of the original face and the dense point information of the first reference face are determined, the dense point information of the original face can be fitted by using the dense point information of the first reference face, so that the target coefficients of the dense point information of the first reference face corresponding to the multiple reference images respectively are obtained. The target coefficient can be used as a medium to establish an association relationship between dense point information of an original face in the target image and dense point information of a first reference face corresponding to each of the plurality of reference pictures.
Referring to fig. 6, an embodiment of the present disclosure provides a method for determining target coefficients corresponding to multiple reference images, including:
S601: performing least squares processing on the dense point information of the original face and the dense point information of the first reference faces to obtain the intermediate coefficients respectively corresponding to the multiple groups of first reference face dense point information.
Illustratively, the dense point information of the original face is denoted IN_mesh, and the dense point information of the first reference faces is denoted BASE_mesh. Since the dense points of the first reference faces are determined from a plurality of reference images, in the case where there are N reference images, BASE_mesh contains N groups of face dense point information and can be written as BASE_mesh = {BASE_mesh^1, BASE_mesh^2, …, BASE_mesh^N}, where BASE_mesh^i, i ∈ [1, N], denotes the face dense point information corresponding to the i-th reference image.

Least squares processing is performed with IN_mesh against BASE_mesh^1 to BASE_mesh^N to obtain N fitting values denoted α_i (i ∈ [1, N]), where α_i represents the fitting value corresponding to the i-th group of first reference face dense point information. From the N fitting values, the target coefficient Alpha can be determined, for example in the form of a coefficient matrix, i.e., Alpha = [α_1, α_2, …, α_N].
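A minimal numpy sketch of this least-squares fit; the array shapes and function name are assumptions, since the disclosure does not prescribe a particular library or data layout.

import numpy as np

def fit_target_coefficients(in_mesh, base_meshes):
    # in_mesh: (V, 3) original face dense points; base_meshes: (N, V, 3) first reference face dense points
    A = base_meshes.reshape(base_meshes.shape[0], -1).T    # (3V, N) one column per reference face
    b = in_mesh.reshape(-1)                                # (3V,) flattened original face dense points
    alpha, *_ = np.linalg.lstsq(A, b, rcond=None)          # (N,) fitted coefficients, Alpha = [α_1, ..., α_N]
    return alpha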
Here, in the process of fitting the dense point information of the original face by the dense point information of the first reference face, data obtained by weighting and summing the dense point information of the first reference face by the target coefficient is as close as possible to the data of the dense point information of the original face.
The target coefficient may also be regarded as an expression coefficient of dense point information of each first reference face when the dense point information of the original face is expressed by the dense point information of the first reference faces corresponding to the plurality of reference images.
S602: determining the target coefficient corresponding to the dense point information of each group of first reference faces based on the intermediate coefficient corresponding to the dense point information of each group of first reference faces among the multiple groups of first reference face dense point information.
Specifically, referring to fig. 7, an embodiment of the present disclosure further provides a specific method for determining a target coefficient corresponding to dense point information of each group of first reference faces, where the specific method includes:
S701: determining, from the dense point information of each group of first reference faces, the first-class dense point information representing the part of the first reference face corresponding to the target face model part.
For any group of dense point information BASE_mesh^i among the multiple groups of first reference face dense point information, since BASE_mesh^i is obtained based on the i-th reference image, BASE_mesh^i contains dense point information corresponding to the face parts in the i-th reference image. The face parts include, for example, the eyebrow part, the nose part, the eye part, the mouth part, the zygomatic bone part and the lower jaw part. The eyebrow part can, for example, be further divided into an eyebrow tip part, an eyebrow center part and an eyebrow peak part.
In a possible implementation manner, in order to enable the target coefficients to more accurately represent the condition that the dense point information of the first reference face fits the dense point information of the original face, adjustment may be made to the target coefficients corresponding to part of the face, so that the dense point information obtained based on the fitting of the target coefficients and the dense point information of the plurality of corresponding first reference faces is close to the dense point information of the original face corresponding to the target image. At this time, the part of the face corresponding to the target coefficient that needs to be adjusted is the target face model part, which may include, for example, the eyes and the mouth. The specific target face model part may be determined according to specific conditions or experience, and will not be described herein again.
When determining the first-class dense point information corresponding to the target face model parts in the first reference face characterized by the first reference face dense point information, take a group of first reference face dense point information BASE_mesh^i as an example: the face may be divided into R face parts, so that BASE_mesh^i can be written as R groups of part-wise dense point information. In the case where the target face model parts among the R face parts include the eye part and the mouth part, the dense point information corresponding to the target face model parts includes, for example, the part-wise dense point information of the eye part and of the mouth part, i.e., the first-class dense point information.
S702: and adjusting an intermediate coefficient corresponding to the first-class dense point information in the dense point information of the first reference face to obtain a first target coefficient.
Once the first-class dense point information is determined, the first target coefficients corresponding to the target face model parts, i.e., the part of the intermediate coefficients that needs to be adjusted to achieve a better fitting effect, can be determined. Meanwhile, for any fitting value α_i among the intermediate coefficients, since α_i is obtained by least squares processing using IN_mesh and BASE_mesh^i, α_i likewise contains several partial coefficients respectively corresponding to the several target face model parts.

Illustratively, in the case where the target face model parts are the eye part and the mouth part, the partial coefficients of α_i corresponding to these target face model parts may be denoted, for example, α_{i-1} and α_{i-2}, i.e., the intermediate coefficients corresponding to the first-class dense point information of the eye part and the mouth part. The values of the intermediate coefficients α_{i-1} and α_{i-2} are adjusted so that, after the dense point information of the original face is fitted based on the adjusted first target coefficients, the fitting result is closer to the dense point information of the original face. The value adjustment includes, for example, increasing and/or decreasing the value.
S703: determining the intermediate coefficient corresponding to the second-class dense point information in the dense point information of the first reference face as a second target coefficient; the second-class dense point information is the dense point information other than the first-class dense point information in the dense point information of the first reference face.
At this point, the intermediate coefficient corresponding to the second-class dense point information in the dense point information of the first reference face may be determined as the second target coefficient; the dense point information other than the first-class dense point information, i.e., other than the target face model parts, in each group of first reference face dense point information can be used as the second-class dense point information. Because the target coefficients corresponding to the second-class dense point information have little influence on the fitting result, or the fitting result for those parts is already good, these target coefficients need not be adjusted, which improves efficiency while ensuring the fitting effect.
S704: obtaining the target coefficient of the dense point information of each group of first reference faces based on the first target coefficient and the second target coefficient.
The first target coefficient corresponds to the target face model part, and the second target coefficient corresponds to other face parts except the target face model part in the plurality of face parts, so that the target coefficients corresponding to the plurality of face parts, namely the target coefficients of the dense point information of each group of the first reference faces, can be determined by combining the first target coefficient and the second target coefficient.
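As an illustrative sketch of S701 to S704, assuming the intermediate coefficients are organised as an (N, P) array over N reference faces and P face parts, with target_parts indexing the target face model parts (for example the eye part and the mouth part); the multiplicative adjust factor is only one example of a value increase, the disclosure merely requires that the adjusted first target coefficients bring the fit closer to the original face.

import numpy as np

def build_target_coefficients(intermediate, target_parts, adjust=1.1):
    # intermediate: (N, P) intermediate coefficients; target_parts: column indices of target face model parts
    target = intermediate.copy()
    target[:, target_parts] *= adjust      # first target coefficients: adjusted values for target parts
    # the remaining columns are kept unchanged: second target coefficients
    return target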
For the above S103, the preset style may be, for example, a cartoon style, an ancient style, an abstract style, or the like, and may be set according to actual needs. For example, in the case where the preset style is the cartoon style, the second reference face with the preset style may be a cartoon face.
When generating dense point information of a second reference face using the second reference face having a preset style, for example, the following manner may be adopted:
adjusting dense point information of a first reference face of the first reference face in the reference image to obtain dense point information of a second reference face with a preset style;
alternatively,
generating a virtual face image comprising a second reference face with a preset style based on a first reference face in the reference image;
and generating dense point information of a second reference face of the second reference face in the virtual face image by using a pre-trained neural network.
In the case that the dense points of the first reference face are adjusted to obtain dense point information of a second reference face with a preset style, for example, all or part of the dense point information in the dense point information of the first reference face may be adjusted according to the preset style, so that a face reflected by the obtained dense point information of the second reference face has the preset style.
Taking the cartoon style as the preset style, in one possible implementation the cartoon style includes, for example, enlarging the eyes. When the dense point information of the first reference face is adjusted, the position coordinates of the dense points corresponding to the upper eyelid are moved upward and/or the position coordinates of the dense points corresponding to the lower eyelid are moved downward in the dense point information corresponding to the eyes, so that the eyes reflected by the eye dense point information in the obtained second reference face dense points are enlarged.
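A small sketch of this eye-enlarging adjustment, assuming the eyelid vertex indices and the offset are known in advance; both are illustrative values, not prescribed by the disclosure.

import numpy as np

def enlarge_eyes(dense_points, upper_eyelid_idx, lower_eyelid_idx, offset=0.01):
    # dense_points: (V, 3) first reference face dense points; column 1 is assumed to be the vertical axis
    styled = dense_points.copy()
    styled[upper_eyelid_idx, 1] += offset   # move the upper eyelid dense points upward
    styled[lower_eyelid_idx, 1] -= offset   # move the lower eyelid dense points downward
    return styled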
In the case that a virtual face image is generated based on a first reference face, and dense point information of a second reference face is generated by using a neural network obtained through pre-training, for example, a graphic image processing may be performed on the first reference face in the reference image according to a preset style to generate a virtual face image of the second reference face having the preset style. The graphic image processing may include, for example, picture editing, picture drawing, and picture designing.
Under the condition of obtaining the virtual face image of the second reference face with the preset style, the dense point information of the corresponding second reference face can be determined by utilizing the neural network obtained by pre-training. The method for determining the dense point information of the second reference face by using the neural network obtained by the pre-training is similar to the method for determining the dense point information of the original face and the dense point information of the first reference face by using the neural network obtained by the pre-training, and is not described herein again.
At this time, the obtained dense point information of the second reference face may be represented, for example, as CART_mesh.
And after determining the dense point information of the second reference face with the preset style and the target coefficients corresponding to the dense point information of the multiple groups of first reference faces respectively, determining the dense point information of the target face model.
Referring to fig. 8, an embodiment of the present disclosure further provides a specific method for determining dense point information of a target face model, including:
S801: generating the mean value information of the multiple groups of second reference face dense point information based on the dense point information of the multiple groups of second reference faces.
In one possible implementation, the mean value information of the second reference face dense point information may be generated by averaging the coordinate values of corresponding positions in the second reference face dense point information CART_mesh. The mean value information is used to represent the average characteristics of the multiple groups of second reference face dense point information. For example, in the case where the second reference face dense point information includes N groups of face dense points, the multiple groups of face dense point information can be represented as CART_mesh = {CART_mesh^1, CART_mesh^2, …, CART_mesh^N}. For any group of second reference face dense point information CART_mesh^i, i ∈ [1, N], a plurality of face dense points corresponding to different part positions can be determined; in the case where W different face dense points are included, they may be represented as P_1, P_2, …, P_W, with corresponding coordinate values (x_1^i, y_1^i, z_1^i), (x_2^i, y_2^i, z_2^i), …, (x_W^i, y_W^i, z_W^i).

The mean of the coordinate values of the corresponding position in each group CART_mesh^i is calculated to obtain the mean information of that position. Taking the first face dense point P_1 as an example, its mean information MEAN_1 can be obtained with the following formula (1):

MEAN_1 = (1/N) · Σ_{i=1}^{N} (x_1^i, y_1^i, z_1^i)    (1)

The mean information of the face dense points at the other positions is determined in a way similar to that of the first face dense point, and is not described here again. The mean information of the different positions, MEAN_1, MEAN_2, …, MEAN_W, is thus obtained, i.e., the mean value information of the second reference face dense point information, which can be expressed as MEAN_mesh = {MEAN_1, MEAN_2, …, MEAN_W}.
S802: and generating dense point information of the target face model based on the dense point information and the mean value information of the multiple groups of second reference faces and the target coefficients corresponding to the dense point information of the multiple groups of first reference faces respectively.
Specifically, referring to fig. 9, an embodiment of the present disclosure further provides a specific method for generating dense point information of a target face model, including:
S901: determining the difference information of each group of second reference face dense point information based on the dense point information of each group of second reference faces among the multiple groups and the mean value information.
For example, the coordinate values of the corresponding positions in the mean value information may be subtracted from the coordinate values of each group of second reference face dense point information to determine the difference information of each group of second reference face dense point information, expressed as Δ_mesh, i.e., Δ_mesh = CART_mesh − MEAN_mesh. The difference information can represent the difference between the dense point information of each group of second reference faces and the average characteristics of the multiple groups of second reference face dense point information.
Since there are multiple groups of second reference face dense point information, the difference information Δ_mesh contains the sub-difference information corresponding to each of those groups. Illustratively, in the case where there are N groups of second reference face dense point information CART_mesh^1, CART_mesh^2, …, CART_mesh^N, the corresponding difference information Δ_mesh includes the sub-difference information Δ_mesh^1, Δ_mesh^2, …, Δ_mesh^N respectively corresponding to these N groups; that is, Δ_mesh contains N pieces of sub-difference information and can be expressed as Δ_mesh = {Δ_mesh^1, Δ_mesh^2, …, Δ_mesh^N}.
S902: and performing interpolation processing on difference value information respectively corresponding to the dense point information of the multiple groups of second reference faces on the basis of target coefficients respectively corresponding to the dense point information of the multiple groups of first reference faces.
For example, the target coefficients may be used as the weights corresponding to the multiple groups of second reference face dense point information, and the difference information corresponding to the multiple groups of second reference face dense point information may be subjected to weighted summation, so as to implement the interpolation processing.
In the case where the target coefficient is Alpha = [α_1, α_2, …, α_N], the weights of the second reference face dense point information corresponding to the plurality of reference images are α_1, α_2, …, α_N respectively. The target coefficients are used to carry out a weighted summation of the difference information respectively corresponding to the multiple groups of second reference face dense point information; the obtained result, denoted AIM_mesh, is used to transfer the association relationship between the original face dense points and the first reference face dense points onto the relationship between the original face dense points and the second reference face dense points, so that the obtained face dense points have both the characteristics of the original face dense points and the style of the second reference face dense points. In this case, the result AIM_mesh obtained by the weighted summation of the difference information respectively corresponding to the multiple groups of second reference face dense point information satisfies the following formula (2):

AIM_mesh = Σ_{i=1}^{N} λ_i · α_i · Δ_mesh^i    (2)

where λ_i, i ∈ [1, N], represents the weight associated with the weights α_1, α_2, …, α_N of the second reference face dense point information and/or with the difference information Δ_mesh^1, Δ_mesh^2, …, Δ_mesh^N respectively corresponding to the multiple groups of second reference face dense point information; these weights are used to represent the contribution and/or importance of the different difference information to the result of the weighted summation, and may be preset or adjusted, the specific setting and adjustment method not being described here again.
S903: generating the dense point information of the target face model based on the interpolation processing result and the mean value information.
In one possible implementation, the result of the weighted summation may be directly superimposed on the mean value information to generate the target face model dense point information, denoted OUT_mesh; that is, the dense point information OUT_mesh of the target face model can be obtained with the following formula (3):

OUT_mesh = AIM_mesh + MEAN_mesh    (3)
At this point, the target face model dense point information OUT_mesh is obtained, which reflects both the face features in the target image and the preset style carried by the second reference face dense point information.
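The computations of S801 to S903, i.e., formulas (1) to (3), can be sketched as follows, assuming the N styled second reference face dense point sets are stacked in an array cart_meshes of shape (N, V, 3) and alpha is the fitted target coefficient vector; the per-set weights λ_i are folded into alpha here for simplicity.

import numpy as np

def blend_target_dense_points(cart_meshes, alpha):
    mean_mesh = cart_meshes.mean(axis=0)               # formula (1): MEAN_mesh, average over the N sets
    deltas = cart_meshes - mean_mesh                   # per-set difference information Δ_mesh^i
    aim_mesh = np.tensordot(alpha, deltas, axes=1)     # formula (2): weighted sum of the differences
    out_mesh = aim_mesh + mean_mesh                    # formula (3): OUT_mesh = AIM_mesh + MEAN_mesh
    return out_mesh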
For the above S104, a target face model corresponding to the target image may be generated by using the dense point information of the target face model.
Illustratively, a corresponding target face model may be generated by rendering based on the target face model dense point information OUT_mesh; or, the covering association relationship of the target face model with the dense point information OUT_mesh may be determined, and the target face model corresponding to the target image generated by using the covering association relationship together with the dense point information OUT_mesh. The specific method can be selected according to the actual situation and is not described here again.
The embodiment of the disclosure further provides a description of the specific process of the face reconstruction method for obtaining the target virtual face model Mod_Aim corresponding to the original face A in the target image Pic_A.

The steps (1) to (4) for determining the target virtual face model Mod_Aim are as follows:
(1) Preparing materials; acquiring a plurality of reference images Pic_cons.
(2) Preparing dense point information; the prepared dense point information includes the following items (2-1) and (2-2).
(2-1) determining, by using the neural network, the dense point information IN_mesh corresponding to the original face A in the target image Pic_A;
(2-2) determining, by using the neural network, the dense point information BASE_mesh of the first reference faces respectively corresponding to the plurality of reference images Pic_cons.
(3) Determining the target coefficient Alpha; the specific process includes the following (3-1) to (3-4).

(3-1) determining the first-class dense point information corresponding to the target face model parts, i.e., the part-wise dense point information of the eye part and the mouth part;
(3-2) fitting the dense point information of the original face by using the corresponding dense point information of the first reference faces, and determining the intermediate coefficients;

(3-3) adjusting the intermediate coefficients α_{i-1} and α_{i-2} corresponding to the first-class dense point information to obtain the first target coefficients;

(3-4) determining the intermediate coefficients corresponding to the second-class dense point information, i.e., the dense point information other than the first-class dense point information in the first reference face dense point information, as the second target coefficients, and determining the target coefficient Alpha by using the first target coefficients and the second target coefficients.
(4) Determining the target face model Mod_Aim; determining the target face model Mod_Aim includes the following (4-1) to (4-3).

(4-1) determining the mean value information MEAN_mesh of the multiple groups of second reference face dense point information;

(4-2) determining the dense point information of the target face model: generating the target face model dense point information AIM_mesh based on the multiple groups of second reference face dense point information CART_mesh, the mean value information MEAN_mesh and the target coefficients Alpha respectively corresponding to the multiple groups of first reference face dense point information;

(4-3) determining the target face model Mod_Aim by using the generated target face model dense point information AIM_mesh.
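Putting steps (1) to (4) together, a hedged end-to-end sketch might look like the following; predict_dense_points stands for the pre-trained neural network and stylize for the preset-style generation of the second reference faces, both assumed to exist, and the helper functions are the ones sketched earlier in this section.

import numpy as np

def reconstruct_target_face(target_image, reference_images, predict_dense_points, stylize):
    in_mesh = predict_dense_points(target_image)                    # (2-1) IN_mesh
    base_meshes = np.stack([predict_dense_points(r)                 # (2-2) BASE_mesh
                            for r in reference_images])
    alpha = fit_target_coefficients(in_mesh, base_meshes)           # (3) target coefficient Alpha
    cart_meshes = np.stack([stylize(m) for m in base_meshes])       # second reference faces CART_mesh
    out_mesh = blend_target_dense_points(cart_meshes, alpha)        # (4-1)-(4-2) target model dense points
    return out_mesh                                                 # used in (4-3) to build Mod_Aim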
Here, it should be noted that the above (1) to (4) are only one specific example of the method for completing face reconstruction, and do not limit the face reconstruction method provided in the embodiment of the present disclosure.
It will be understood by those skilled in the art that, in the above method, the order in which the steps are written does not imply a strict order of execution or constitute any limitation on the implementation process; the specific order of execution of the steps should be determined by their functions and possible inherent logic.
Based on the same inventive concept, the embodiment of the present disclosure further provides a face reconstruction device corresponding to the face reconstruction method, and as the principle of solving the problem of the device in the embodiment of the present disclosure is similar to the face reconstruction method in the embodiment of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 10, a schematic diagram of a face reconstruction apparatus provided in an embodiment of the present disclosure is shown. The apparatus includes: a first acquisition module 10, a first processing module 20, a determination module 30 and a generation module 40; wherein,
the first acquisition module 10 is configured to acquire dense point information of an original face included in a target image;
the first processing module 20 is configured to fit dense point information of the original face with dense point information of a first reference face corresponding to each of the plurality of reference images to obtain multiple sets of target coefficients corresponding to the dense point information of the first reference face;
the determining module 30 is configured to determine dense point information of a target face model based on dense point information of a second reference face with a preset style and target coefficients corresponding to multiple sets of dense point information of the first reference face respectively; the second reference face is generated based on the first reference face in the reference image;
a generating module 40, configured to generate a target face model corresponding to the original face of the target image based on dense point information of the target face model.
In an optional embodiment, when determining the dense point information of the target face model based on the dense point information of the second reference face with the preset style and the target coefficients corresponding to the multiple sets of dense point information of the first reference face, the determining module 30 is configured to: generating average value information of the dense point information of the multiple groups of second reference faces based on the dense point information of the multiple groups of second reference faces; and generating the dense point information of the target face model based on the target coefficients respectively corresponding to the dense point information of the multiple groups of second reference faces, the average value information and the dense point information of the multiple groups of first reference faces.
In an optional embodiment, when the determining module 30 generates the dense point information of the target face model based on target coefficients corresponding to multiple sets of dense point information of the second reference face, the average value information, and multiple sets of dense point information of the first reference face, respectively, the determining module is configured to: determining difference value information of the dense point information of each group of second reference faces based on the dense point information of each group of second reference faces in the dense point information of the groups of second reference faces and the average value information; performing interpolation processing on difference value information respectively corresponding to the dense point information of the multiple groups of second reference faces on the basis of target coefficients respectively corresponding to the dense point information of the multiple groups of first reference faces; and generating dense point information of the target face model based on the interpolation processing result and the mean value information.
In an alternative embodiment, the first obtaining module 10, when obtaining dense point information of an original face included in a target image, is configured to: acquiring the target image comprising the original face; and processing the target image by using a pre-trained neural network to obtain dense point information of the original face in the target image.
In an optional embodiment, the second processing module 50 is further included for: acquiring a plurality of reference images including a first reference face; and processing each reference image by utilizing a pre-trained neural network aiming at each reference image in the plurality of reference images to obtain dense point information of the first reference face in each reference image.
In an optional implementation manner, when the dense point information of the original face is fitted by using dense point information of a first reference face corresponding to each of multiple reference images to obtain multiple sets of target coefficients corresponding to the dense point information of the first reference face, the first processing module 20 is configured to: performing least square processing on the dense point information of the original face and the dense point information of the first reference face to obtain a plurality of groups of intermediate coefficients corresponding to the dense point information of the first reference face respectively; and determining a target coefficient corresponding to the dense point information of each group of the first reference faces based on the intermediate coefficient corresponding to the dense point information of each group of the first reference faces in the dense point information of the groups of the first reference faces.
In an optional implementation manner, when determining, based on the intermediate coefficient corresponding to the dense point information of each group of the first reference faces in the multiple groups of the dense point information of the first reference faces, the first processing module 20 is configured to: determining first dense point information representing the part of the first reference face corresponding to the target face model from the dense point information of each group of the first reference faces; adjusting an intermediate coefficient corresponding to the first type of dense point information in the dense point information of the first reference face to obtain a first target coefficient; determining an intermediate coefficient corresponding to second dense point information in dense point information of the first reference face as a second target coefficient; the second type of dense point information is dense point information except the first type of dense point information in the dense point information of the first reference face; and obtaining the target coefficient of the dense point information of each group of the first reference face based on the first target coefficient and the second target coefficient.
In an alternative embodiment, the adjusting module 60 is further included for: adjusting dense point information of a first reference face of the first reference face in the reference image to obtain dense point information of a second reference face with a preset style; or generating a virtual face image comprising a second reference face with the preset style based on a first reference face in the reference image; and generating dense point information of the second reference face in the virtual face image by utilizing a pre-trained neural network.
In an alternative embodiment, the method further includes a training module 70, when training the neural network, for: acquiring a sample image set; the sample image set comprises a plurality of first sample images of a first sample face and a second sample image of a second sample face; the plurality of first sample images are divided into a plurality of first sample image subsets, and each first sample image subset comprises images of first sample faces with the same expression, which are acquired from a plurality of preset acquisition angles respectively; acquiring dense point information of a first sample face of a first sample image and dense point information of a second sample face of a second sample image in the sample image set; performing feature learning on a first sample image and a second sample image in the sample image set by using an initial neural network to obtain predicted dense point information of a first sample face of the first sample image and predicted dense point information of a second sample face of the second sample image; and training the initial neural network by using the dense point information and the predicted dense point information of the first sample face and the dense point information and the predicted dense point information of the second sample face, and obtaining the neural network after training.
In an alternative embodiment, the training module 70, when obtaining the dense point information of the second sample face of the second sample image, is configured to: acquiring face key point information of each second sample image; and fitting and generating dense point information of a second sample face of a second sample image by using the face key point information of the second sample image and the second sample image.
In an optional embodiment, the sample image set further includes: a third sample image; the third sample image is obtained by performing data enhancement processing on the first sample image;
a second obtaining module 80 is further included for: acquiring dense point information of a third sample face of the third sample image; performing feature learning on a third sample image by using the initial neural network to obtain predicted dense point information of a third sample face of the third sample image;
the training module 70 is further configured to: performing feature learning on a third sample image in the sample image set by using the initial neural network to obtain predicted dense point information of a third sample face of the third sample image;
the training module 70 is configured to train the initial neural network by using the dense point information and the predicted dense point information of the first sample face, and the dense point information and the predicted dense point information of the second sample face, and when the neural network is obtained after training is completed, to: and training the initial neural network by using the dense point information and the predicted dense point information of the first sample face, the dense point information and the predicted dense point information of the second sample face, and the dense point information and the predicted dense point information of the third sample face, and obtaining the neural network after training.
In an alternative embodiment, the data enhancement process includes at least one of: random occlusion processing, gaussian noise processing, motion blur processing, and color area channel change processing.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
An embodiment of the present disclosure further provides a computer device, as shown in fig. 11, which is a schematic structural diagram of the computer device provided in the embodiment of the present disclosure, and includes:
a processor 11 and a memory 12; the memory 12 stores machine-readable instructions executable by the processor 11, the processor 11 being configured to execute the machine-readable instructions stored in the memory 12, the processor 11 performing the following steps when the machine-readable instructions are executed by the processor 11:
acquiring dense point information of an original face included in a target image; fitting dense point information of an original face by utilizing dense point information of first reference faces corresponding to a plurality of reference images respectively to obtain target coefficients corresponding to dense point information of a plurality of groups of first reference faces respectively; determining dense point information of a target face model based on dense point information of a second reference face with a preset style and target coefficients corresponding to the dense point information of multiple groups of first reference faces respectively; the second reference face is generated based on the first reference face in the reference image; and generating a target face model corresponding to the original face of the target image based on the dense point information of the target face model.
The memory 12 includes a memory 121 and an external memory 122; the memory 121 is also referred to as an internal memory, and is used to temporarily store operation data in the processor 11 and data exchanged with the external memory 122 such as a hard disk, and the processor 11 exchanges data with the external memory 122 through the memory 121.
The specific execution process of the instruction may refer to the steps of the face reconstruction method described in the embodiments of the present disclosure, and details are not repeated here.
The embodiments of the present disclosure also provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the face reconstruction method in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the face reconstruction method in the foregoing method embodiments, which may be referred to specifically for the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (15)

1. A face reconstruction method, comprising:
acquiring dense point information of an original face included in a target image;
fitting the dense point information of the original face by utilizing dense point information of a first reference face corresponding to a plurality of reference images respectively to obtain a plurality of groups of target coefficients corresponding to the dense point information of the first reference face respectively;
determining dense point information of a target face model based on dense point information of a second reference face with a preset style and multiple groups of target coefficients corresponding to the dense point information of the first reference face respectively; the second reference face is generated based on the first reference face in the reference image;
and generating a target face model corresponding to the original face of the target image based on the dense point information of the target face model.
2. The method according to claim 1, wherein the determining dense point information of the target face model based on dense point information of a second reference face having a preset style and multiple sets of target coefficients corresponding to the dense point information of the first reference face respectively comprises:
generating average value information of the dense point information of the multiple groups of second reference faces based on the dense point information of the multiple groups of second reference faces;
and generating the dense point information of the target face model based on the target coefficients respectively corresponding to the dense point information of the multiple groups of second reference faces, the average value information and the dense point information of the multiple groups of first reference faces.
3. The method according to claim 2, wherein the generating dense point information of the target face model based on the target coefficients corresponding to the multiple sets of dense point information of the second reference face, the average value information, and the multiple sets of dense point information of the first reference face respectively comprises:
determining difference value information of the dense point information of each group of second reference faces based on the dense point information of each group of second reference faces in the dense point information of the groups of second reference faces and the average value information;
performing interpolation processing on difference value information respectively corresponding to the dense point information of the multiple groups of second reference faces on the basis of target coefficients respectively corresponding to the dense point information of the multiple groups of first reference faces;
and generating dense point information of the target face model based on the interpolation processing result and the mean value information.
4. The method according to any one of claims 1 to 3, wherein the obtaining dense point information of the original face included in the target image comprises:
acquiring the target image comprising the original face;
and processing the target image by using a pre-trained neural network to obtain dense point information of the original face in the target image.
5. The face reconstruction method according to any one of claims 1 to 4, further comprising:
acquiring a plurality of reference images including a first reference face;
and for each reference image in the plurality of reference images, processing the reference image by using a pre-trained neural network to obtain dense point information of the first reference face in the reference image.
6. The method according to any one of claims 1 to 5, wherein the fitting of the dense point information of the original face with the dense point information of the first reference face corresponding to each of the plurality of reference images to obtain a plurality of sets of target coefficients corresponding to the dense point information of the first reference face respectively comprises:
performing least square processing on the dense point information of the original face and the dense point information of the first reference face to obtain a plurality of groups of intermediate coefficients corresponding to the dense point information of the first reference face respectively;
and determining a target coefficient corresponding to the dense point information of each group of the first reference faces based on the intermediate coefficient corresponding to the dense point information of each group of the first reference faces in the dense point information of the groups of the first reference faces.
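A minimal sketch of the least-squares step in claim 6, assuming the dense point information of each face is flattened into one column of a basis matrix and solved with an ordinary linear least-squares routine; this is only one possible implementation, the coefficient adjustment of claim 7 is not shown, and all names are illustrative rather than taken from the disclosure.

```python
import numpy as np

def fit_intermediate_coefficients(original_dense_points, first_ref_dense_points):
    """Least-squares fit of the original face against the first reference faces.

    original_dense_points: (N, 3) dense points of the original face.
    first_ref_dense_points: list of K arrays of shape (N, 3), one per reference image.
    Returns a (K,) array of intermediate coefficients, one per group of
    first-reference-face dense point information.
    """
    b = original_dense_points.reshape(-1)                                       # (3N,)
    A = np.stack([ref.reshape(-1) for ref in first_ref_dense_points], axis=1)   # (3N, K)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

# Toy usage: 3 reference faces, 5 dense points each.
refs = [np.random.rand(5, 3) for _ in range(3)]
original = 0.6 * refs[0] + 0.3 * refs[1] + 0.1 * refs[2]
print(fit_intermediate_coefficients(original, refs))   # roughly [0.6, 0.3, 0.1]
```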
7. The method according to claim 6, wherein the determining the target coefficient corresponding to the dense point information of each group of the first reference faces based on the intermediate coefficient corresponding to the dense point information of each group of the first reference faces in the dense point information of the groups of the first reference faces comprises:
determining, from the dense point information of each group of the first reference faces, first-type dense point information representing the part of the first reference face corresponding to the target face model;
adjusting an intermediate coefficient corresponding to the first-type dense point information in the dense point information of the first reference face to obtain a first target coefficient;
determining an intermediate coefficient corresponding to second-type dense point information in the dense point information of the first reference face as a second target coefficient; the second-type dense point information is the dense point information other than the first-type dense point information in the dense point information of the first reference face;
and obtaining the target coefficient of the dense point information of each group of the first reference face based on the first target coefficient and the second target coefficient.
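One way to picture the split-and-adjust step of claim 7 is as a per-vertex coefficient map: vertices belonging to the part of the face covered by the target face model (first-type dense points) receive an adjusted coefficient, while the remaining vertices (second-type dense points) keep the intermediate coefficient unchanged. The mask and the simple scaling rule below are assumptions for illustration; the disclosure does not fix a particular adjustment rule.

```python
import numpy as np

def split_adjust_coefficients(intermediate_coeff, first_type_mask, adjust_scale=1.2):
    """Expand one group's intermediate coefficient into per-vertex target coefficients.

    intermediate_coeff: scalar intermediate coefficient for this group.
    first_type_mask: boolean array of shape (num_vertices,), True where a vertex
        belongs to the part of the first reference face covered by the target model.
    adjust_scale: illustrative adjustment applied only to first-type vertices.
    """
    coeffs = np.full(first_type_mask.shape, float(intermediate_coeff))  # second target coefficient
    coeffs[first_type_mask] = intermediate_coeff * adjust_scale         # first target coefficient
    return coeffs

# Toy usage: the first 3 of 6 vertices belong to the part covered by the target model.
mask = np.array([True, True, True, False, False, False])
print(split_adjust_coefficients(0.5, mask))   # [0.6 0.6 0.6 0.5 0.5 0.5]
```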
8. The method of any of claims 1-7, wherein the method further comprises:
adjusting dense point information of the first reference face in the reference image to obtain dense point information of a second reference face with a preset style; or,
generating a virtual face image comprising a second reference face with the preset style based on a first reference face in the reference image; and generating dense point information of the second reference face in the virtual face image by utilizing a pre-trained neural network.
9. The method of any of claims 4, 5, and 8, wherein training the neural network comprises:
acquiring a sample image set; the sample image set comprises a plurality of first sample images of a first sample face and a second sample image of a second sample face; the plurality of first sample images are divided into a plurality of first sample image subsets, and each first sample image subset comprises images of first sample faces with the same expression, which are acquired from a plurality of preset acquisition angles respectively;
acquiring dense point information of a first sample face of a first sample image and dense point information of a second sample face of a second sample image in the sample image set;
performing feature learning on a first sample image and a second sample image in the sample image set by using an initial neural network to obtain predicted dense point information of a first sample face of the first sample image and predicted dense point information of a second sample face of the second sample image;
and training the initial neural network by using the dense point information and the predicted dense point information of the first sample face and the dense point information and the predicted dense point information of the second sample face, and obtaining the neural network after training.
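A compact training-loop sketch for claim 9, assuming the network regresses a flattened (N*3) vector of dense point coordinates per image and is supervised with an L2 loss against the ground-truth dense point information; the architecture, loss, and data layout are placeholder assumptions, not specifics of the disclosure.

```python
import torch
import torch.nn as nn

class DensePointNet(nn.Module):
    """Toy regressor from an image to N*3 dense point coordinates."""
    def __init__(self, num_points: int):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, num_points * 3)

    def forward(self, images):
        return self.head(self.backbone(images))

def train_step(model, optimizer, images, gt_dense_points):
    """One supervised step: predicted vs. ground-truth dense point information."""
    optimizer.zero_grad()
    pred = model(images)                                   # predicted dense point information
    loss = nn.functional.mse_loss(pred, gt_dense_points)   # supervision from the sample image set
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: a batch mixing first-sample and second-sample images.
model = DensePointNet(num_points=100)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
images = torch.randn(4, 3, 64, 64)
gt = torch.randn(4, 100 * 3)
train_step(model, opt, images, gt)
```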
10. The method according to claim 9, wherein the obtaining dense point information of the second sample face of the second sample image comprises:
acquiring face key point information of each second sample image;
and fitting and generating dense point information of a second sample face of a second sample image by using the face key point information of the second sample image and the second sample image.
11. The method according to claim 9 or 10, wherein the sample image set further comprises: a third sample image; the third sample image is obtained by performing data enhancement processing on the first sample image;
the face reconstruction method further comprises the following steps:
acquiring dense point information of a third sample face of the third sample image;
performing feature learning on a third sample image by using the initial neural network to obtain predicted dense point information of a third sample face of the third sample image;
the training of the initial neural network by using the dense point information and the predicted dense point information of the first sample face and the dense point information and the predicted dense point information of the second sample face, and obtaining the neural network after the training, includes:
and training the initial neural network by using the dense point information and the predicted dense point information of the first sample face, the dense point information and the predicted dense point information of the second sample face, and the dense point information and the predicted dense point information of the third sample face, and obtaining the neural network after training.
12. The method of claim 11, wherein the data enhancement process comprises at least one of: random occlusion processing, gaussian noise processing, motion blur processing, and color area channel change processing.
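The enhancement operations listed in claim 12 can be approximated with a few lines of array processing; the pure-NumPy sketch below is illustrative only (patch size, noise level, blur length, and channel shifts are arbitrary choices, and the image is assumed to be an 8-bit RGB array larger than the occlusion patch).

```python
import numpy as np

def random_occlusion(img, size=32, rng=np.random.default_rng()):
    """Black out a random square patch (random occlusion processing)."""
    h, w = img.shape[:2]
    y, x = rng.integers(0, h - size), rng.integers(0, w - size)
    out = img.copy()
    out[y:y + size, x:x + size] = 0
    return out

def gaussian_noise(img, sigma=10.0, rng=np.random.default_rng()):
    """Additive Gaussian noise processing."""
    noisy = img.astype(np.float32) + rng.normal(0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(img.dtype)

def motion_blur(img, length=9):
    """Horizontal motion blur processing via a simple 1-D averaging kernel."""
    kernel = np.ones(length) / length
    out = img.astype(np.float32)
    for c in range(out.shape[2]):
        out[..., c] = np.apply_along_axis(
            lambda row: np.convolve(row, kernel, mode="same"), 1, out[..., c])
    return np.clip(out, 0, 255).astype(img.dtype)

def channel_shift(img, shift=(10, -10, 5)):
    """Per-channel color change processing."""
    shifted = img.astype(np.int16) + np.array(shift, dtype=np.int16)
    return np.clip(shifted, 0, 255).astype(img.dtype)

# Toy usage on a dummy 128x128 RGB image.
img = np.full((128, 128, 3), 127, dtype=np.uint8)
augmented = channel_shift(motion_blur(gaussian_noise(random_occlusion(img))))
```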
13. A face reconstruction apparatus, comprising:
the first acquisition module is used for acquiring dense point information of an original face included in a target image;
the first processing module is used for fitting the dense point information of the original face by utilizing the dense point information of the first reference face corresponding to the multiple reference images respectively to obtain multiple groups of target coefficients corresponding to the dense point information of the first reference face respectively;
the determining module is used for determining dense point information of a target face model based on dense point information of a second reference face with a preset style and multiple groups of target coefficients corresponding to the dense point information of the first reference face; the second reference face is generated based on the first reference face in the reference image;
and the generating module is used for generating a target face model corresponding to the original face of the target image based on the dense point information of the target face model.
14. A computer device, comprising: a processor and a memory storing machine-readable instructions executable by the processor, wherein the processor is configured to execute the machine-readable instructions stored in the memory, and when the machine-readable instructions are executed by the processor, the processor performs the steps of the face reconstruction method according to any one of claims 1 to 12.
15. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when executed by a computer device, performs the steps of the face reconstruction method according to any one of claims 1 to 12.
CN202011337942.0A 2020-11-25 2020-11-25 Face reconstruction method, device, computer equipment and storage medium Active CN112396692B (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN202011337942.0A CN112396692B (en) 2020-11-25 2020-11-25 Face reconstruction method, device, computer equipment and storage medium
JP2023531694A JP2023551247A (en) 2020-11-25 2021-07-27 Face reconstruction method, device, computer device and storage medium
KR1020237018454A KR20230098313A (en) 2020-11-25 2021-07-27 Facial reconstruction methods, devices, computer devices and storage media
PCT/CN2021/108629 WO2022110855A1 (en) 2020-11-25 2021-07-27 Face reconstruction method and apparatus, computer device, and storage medium
TW110130613A TW202221645A (en) 2020-11-25 2021-08-19 Face reconstruction method, apparatus, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011337942.0A CN112396692B (en) 2020-11-25 2020-11-25 Face reconstruction method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112396692A true CN112396692A (en) 2021-02-23
CN112396692B CN112396692B (en) 2023-11-28

Family

ID=74607056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011337942.0A Active CN112396692B (en) 2020-11-25 2020-11-25 Face reconstruction method, device, computer equipment and storage medium

Country Status (5)

Country Link
JP (1) JP2023551247A (en)
KR (1) KR20230098313A (en)
CN (1) CN112396692B (en)
TW (1) TW202221645A (en)
WO (1) WO2022110855A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9508197B2 (en) * 2013-11-01 2016-11-29 Microsoft Technology Licensing, Llc Generating an avatar from real time image data
CN112396692B (en) * 2020-11-25 2023-11-28 北京市商汤科技开发有限公司 Face reconstruction method, device, computer equipment and storage medium
CN112396693A (en) * 2020-11-25 2021-02-23 上海商汤智能科技有限公司 Face information processing method and device, electronic equipment and storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007257324A (en) * 2006-03-23 2007-10-04 Space Vision:Kk Face model creating system
CN109087340A (en) * 2018-06-04 2018-12-25 成都通甲优博科技有限责任公司 A kind of face three-dimensional rebuilding method and system comprising dimensional information
CN109285112A (en) * 2018-09-25 2019-01-29 京东方科技集团股份有限公司 Image processing method neural network based, image processing apparatus
CN109376698A (en) * 2018-11-29 2019-02-22 北京市商汤科技开发有限公司 Human face model building and device, electronic equipment, storage medium, product
CN109636886A (en) * 2018-12-19 2019-04-16 网易(杭州)网络有限公司 Processing method, device, storage medium and the electronic device of image
CN110706339A (en) * 2019-09-30 2020-01-17 北京市商汤科技开发有限公司 Three-dimensional face reconstruction method and device, electronic equipment and storage medium
CN110717977A (en) * 2019-10-23 2020-01-21 网易(杭州)网络有限公司 Method and device for processing face of game character, computer equipment and storage medium
CN111402399A (en) * 2020-03-10 2020-07-10 广州虎牙科技有限公司 Face driving and live broadcasting method and device, electronic equipment and storage medium
CN111695471A (en) * 2020-06-02 2020-09-22 北京百度网讯科技有限公司 Virtual image generation method, device, equipment and storage medium
CN111784821A (en) * 2020-06-30 2020-10-16 北京市商汤科技开发有限公司 Three-dimensional model generation method and device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
署光 (Shu Guang) et al.: "3D cartoon face generation based on a sparse deformation model", 《电子学报》 (Acta Electronica Sinica) *
署光 (Shu Guang) et al.: "3D cartoon face generation based on a sparse deformation model", 《电子学报》 (Acta Electronica Sinica), No. 08, 15 August 2010 (2010-08-15), pages 1798-1802 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022110855A1 (en) * 2020-11-25 2022-06-02 北京市商汤科技开发有限公司 Face reconstruction method and apparatus, computer device, and storage medium
CN113343773A (en) * 2021-05-12 2021-09-03 上海大学 Facial expression recognition system based on shallow convolutional neural network

Also Published As

Publication number Publication date
CN112396692B (en) 2023-11-28
TW202221645A (en) 2022-06-01
KR20230098313A (en) 2023-07-03
WO2022110855A1 (en) 2022-06-02
JP2023551247A (en) 2023-12-07

Similar Documents

Publication Publication Date Title
CN111354079B (en) Three-dimensional face reconstruction network training and virtual face image generation method and device
WO2020192568A1 (en) Facial image generation method and apparatus, device and storage medium
CN106778928B (en) Image processing method and device
CN112419454B (en) Face reconstruction method, device, computer equipment and storage medium
CN112419485B (en) Face reconstruction method, device, computer equipment and storage medium
CN106161939B (en) Photo shooting method and terminal
CN111784821B (en) Three-dimensional model generation method and device, computer equipment and storage medium
CN111464834B (en) Video frame processing method and device, computing equipment and storage medium
CN108960020A (en) Information processing method and information processing equipment
CN109308725B (en) System for generating mobile terminal table sentiment picture
CN113012282B (en) Three-dimensional human body reconstruction method, device, equipment and storage medium
WO2022110855A1 (en) Face reconstruction method and apparatus, computer device, and storage medium
CN113507627B (en) Video generation method and device, electronic equipment and storage medium
US20230073340A1 (en) Method for constructing three-dimensional human body model, and electronic device
CN111815768B (en) Three-dimensional face reconstruction method and device
CN111640165A (en) Method and device for acquiring AR group photo image, computer equipment and storage medium
CN112396693A (en) Face information processing method and device, electronic equipment and storage medium
CN113095206A (en) Virtual anchor generation method and device and terminal equipment
CN112184886A (en) Image processing method and device, computer equipment and storage medium
CN114202615A (en) Facial expression reconstruction method, device, equipment and storage medium
Varkarakis et al. A deep learning approach to segmentation of distorted iris regions in head-mounted displays
CN115239857B (en) Image generation method and electronic device
CN114529640B (en) Moving picture generation method, moving picture generation device, computer equipment and storage medium
CN114612614A (en) Human body model reconstruction method and device, computer equipment and storage medium
JP2024503596A (en) Volumetric video from image source

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40039017
Country of ref document: HK

GR01 Patent grant