CN116385615A - Virtual face generation method, device, computer equipment and storage medium - Google Patents

Virtual face generation method, device, computer equipment and storage medium

Info

Publication number
CN116385615A
CN116385615A (application No. CN202310380739.9A)
Authority
CN
China
Prior art keywords
face
image
facial
virtual
key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310380739.9A
Other languages
Chinese (zh)
Inventor
龙玉璞
谷长健
侯杰
刘柏
范长杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202310380739.9A
Publication of CN116385615A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00: 3D [Three Dimensional] image rendering
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Architecture (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the application discloses a virtual face generation method, a virtual face generation device, computer equipment and a storage medium. The method comprises the following steps: acquiring a first facial key point of a first facial image and a second facial key point of a first virtual facial image; aligning the first facial key point with the corresponding second facial key point based on the second facial key point to obtain a second facial image corresponding to the first facial image; determining head pose information according to the second facial image, inputting the head pose information into a renderer, and outputting a second virtual facial image whose head pose information corresponds to that of the second facial image; acquiring a third facial key point of the second virtual facial image, and aligning the first facial key point with the corresponding third facial key point based on the third facial key point to obtain a third facial image corresponding to the first facial image; and determining a target face-pinching parameter according to the third facial image, and generating a three-dimensional virtual avatar corresponding to the first facial image through the target face-pinching parameter.

Description

Virtual face generation method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a virtual face generating method, a virtual face generating device, a computer device, and a storage medium.
Background
With the development of computer technology and network technology, concepts such as the metaverse and virtual characters have emerged. People can "pinch" a face through a computer device according to their real facial information, thereby creating a virtual face of their own.
However, in the prior art, the face is typically pinched from a frontal face image, and when the real facial information input by the user is not a frontal view, the virtual character face pinched by the computer device differs greatly from the user's real face.
Disclosure of Invention
The embodiment of the application provides a virtual face generation method, a virtual face generation device, computer equipment and a storage medium. The virtual face generation method can generate a virtual face image approximating a real face.
In a first aspect, an embodiment of the present application provides a virtual face generating method, including:
acquiring a first facial key point of a first facial image and a second facial key point of a first virtual facial image;
aligning the first facial key point with the second facial key point based on the second facial key point to obtain a second facial image corresponding to the first facial image;
determining head pose information according to the second facial image, inputting the head pose information into a renderer, and outputting a second virtual facial image whose head pose information corresponds to that of the second facial image;
acquiring a third face key point of the second virtual face image, and aligning the first face key point with the third face key point based on the third face key point to obtain a third face image corresponding to the first face image;
and determining a target face-pinching parameter according to the third facial image, and generating a three-dimensional virtual avatar corresponding to the first facial image through the target face-pinching parameter.
In a second aspect, an embodiment of the present application provides a virtual face generating apparatus, including:
the acquisition module is used for acquiring first facial key points of the first facial image and second facial key points of the first virtual facial image;
the first alignment module is used for aligning the first facial key point with the second facial key point based on the second facial key point to obtain a second facial image corresponding to the first facial image;
the pose acquisition module is used for determining head pose information according to the second facial image, inputting the head pose information into the renderer, and outputting a second virtual facial image whose head pose information corresponds to that of the second facial image;
the second alignment module is used for acquiring a third face key point of the second virtual face image, and aligning the first face key point with the third face key point based on the third face key point to obtain a third face image corresponding to the first face image;
and the generating module is used for determining a target face-pinching parameter according to the third facial image and generating a three-dimensional virtual avatar corresponding to the first facial image through the target face-pinching parameter.
In a third aspect, embodiments of the present application provide a computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the virtual face generation method provided in embodiments of the present application.
In a fourth aspect, embodiments of the present application provide a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the virtual face generating method provided by the embodiments of the present application when executing the program.
In the embodiment of the application, the computer equipment acquires a first facial key point of a first facial image and a second facial key point of a first virtual facial image; aligns the first facial key point with the second facial key point based on the second facial key point to obtain a second facial image corresponding to the first facial image; determines head pose information according to the second facial image, inputs the head pose information into a renderer, and outputs a second virtual facial image whose head pose information corresponds to that of the second facial image; acquires a third facial key point of the second virtual facial image, and aligns the first facial key point with the third facial key point based on the third facial key point to obtain a third facial image corresponding to the first facial image; and determines a target face-pinching parameter according to the third facial image, and generates a three-dimensional virtual avatar corresponding to the first facial image through the target face-pinching parameter. A virtual face image approximating the real face is thereby generated.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a virtual face generating method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a first scenario of a virtual face generating method according to an embodiment of the present application.
Fig. 3 is a second flowchart of a virtual face generating method according to an embodiment of the present application.
Fig. 4 is a second scene schematic diagram of a virtual face generating method according to an embodiment of the present application.
Fig. 5 is a schematic diagram of head pose information provided in an embodiment of the present application.
Fig. 6 is a comparison diagram of virtual faces provided in an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a virtual face generating apparatus provided in an embodiment of the present application.
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
With the development of computer technology and network technology, concepts such as the metaverse and virtual characters have emerged. People can "pinch" a face through a computer device according to their real facial information, thereby creating a virtual face of their own.
However, in the prior art, the face is typically pinched from a frontal face image, and when the real facial information input by the user is not a frontal view, the virtual character face pinched by the computer device differs greatly from the user's real face.
In order to solve the technical problem, embodiments of the present application provide a virtual face generating method, a virtual face generating device, a computer device and a storage medium. The virtual face generation method can generate a virtual face image approximating a real face.
Referring to fig. 1, fig. 1 is a schematic flow chart of a virtual face generating method according to an embodiment of the present application. The virtual face generation method may include the steps of:
110. A first facial keypoint of a first facial image and a second facial keypoint of a first virtual facial image are acquired.
In some embodiments, the first facial image may be a real facial image in which the face occupies a large proportion of the frame, for example more than 95% of the area of the first facial image.
In some embodiments, the computer device may acquire an initial image, input the initial image into the face detection model, and output the first facial image before acquiring the first facial keypoints of the first facial image.
For example, the initial image may be an upper-body image of a real person, including the person's head and upper torso. In order to improve the recognition efficiency of the facial key points, the initial image may be input into a face detection model, such as a RetinaFace deep learning model, to crop the initial image and thereby obtain the first facial image.
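For illustration, the following is a minimal sketch of this detect-and-crop step. OpenCV's bundled Haar cascade stands in for the RetinaFace model named above purely to keep the example self-contained, and the margin and largest-face heuristic are assumptions rather than details from the text.

```python
import cv2

def crop_first_face_image(initial_image_path: str, margin: float = 0.1):
    """Detect the largest face in an upper-body photo and crop it out.

    Stand-in sketch: a Haar cascade replaces the RetinaFace model
    named in the text, only so the example runs without extra weights.
    """
    image = cv2.imread(initial_image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise ValueError("no face found in the initial image")
    # Keep the largest detection so the face dominates the crop,
    # matching the "face occupies most of the image" framing above.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    pad_w, pad_h = int(w * margin), int(h * margin)
    return image[max(0, y - pad_h):y + h + pad_h,
                 max(0, x - pad_w):x + w + pad_w]
```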
The computer device may input the first facial image into a keypoint identification model and output the first facial keypoints, such as the distribution of the first facial keypoints in the eyes, nose, mouth, eyebrows, facial edge contours, etc.
The computer device inputs a first virtual face image into the keypoint identification model and outputs second facial keypoints; the first virtual face image is a frontal face image. The first virtual face image is a two-dimensional image in which the front of the virtual character's face is displayed; that is, a viewer sees the virtual character's face straight on, at the same eye level.
The keypoint identification model may be a keypoint-detection deep learning model, such as a 3DDFA V2 model; when an image is input to the keypoint identification model, it may output information such as the number, coordinates, and distribution of the facial keypoints.
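As a sketch of the keypoint-model interface described here, the snippet below uses dlib's 68-point landmark predictor as a stand-in for the 3DDFA V2 model (the model file path is an assumption, and 3DDFA V2 additionally provides dense 3D information that this stand-in does not).

```python
import dlib
import numpy as np

# Assumed local path to dlib's pretrained 68-landmark model file.
PREDICTOR_PATH = "shape_predictor_68_face_landmarks.dat"

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(PREDICTOR_PATH)

def facial_keypoints(image: np.ndarray) -> np.ndarray:
    """Return an (N, 2) array of facial keypoint coordinates covering
    the eyes, nose, mouth, eyebrows, and face contour."""
    rects = detector(image, 1)
    if not rects:
        raise ValueError("no face detected")
    shape = predictor(image, rects[0])
    return np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float32)
```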
120. And aligning the first facial key point with the second facial key point based on the second facial key point to obtain a second facial image corresponding to the first facial image.
In some embodiments, the computer device controls the first facial keypoints to align with the corresponding second facial keypoints based on the second facial keypoints, thereby adjusting the first facial image and obtaining the adjusted second facial image. The facial keypoints in the second facial image at least partially share the spatial distribution of the second facial keypoints.
In the process of aligning the first facial keypoints to the second facial keypoints, the computer device may divide the first facial keypoints into a plurality of sub-keypoint sets, and then align the first facial keypoints in each sub-keypoint set to the corresponding second facial keypoints in sequence, thereby improving the accuracy of the alignment.
In some implementations, the computer device can determine first coordinates of the first facial keypoints and second coordinates of the second facial keypoints, and, based on the second facial keypoints, align the first facial keypoints with the second facial keypoints according to the first coordinates, the second coordinates, and an affine transformation algorithm to obtain the second facial image corresponding to the first facial image. The affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates.
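A minimal sketch of this alignment step, assuming OpenCV: an affine transform is estimated from the first coordinates to the second coordinates and then used to warp the first facial image. The specific OpenCV calls are an illustrative choice; the text only requires an affine (linear 2D-to-2D) transformation.

```python
import cv2
import numpy as np

def align_image_by_keypoints(face_image: np.ndarray,
                             src_keypoints: np.ndarray,  # (N, 2) first facial keypoints
                             dst_keypoints: np.ndarray   # (N, 2) second facial keypoints
                             ) -> np.ndarray:
    """Warp face_image so that src_keypoints land on dst_keypoints.

    estimateAffine2D fits the 2x3 matrix of a linear 2D-to-2D map,
    i.e. the affine transformation described in the text.
    """
    matrix, _inliers = cv2.estimateAffine2D(src_keypoints, dst_keypoints)
    height, width = face_image.shape[:2]
    return cv2.warpAffine(face_image, matrix, (width, height))
```

Splitting the keypoints into sub-sets, as described above, would roughly amount to repeating this estimation once per subset.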
Referring to fig. 2 together, fig. 2 is a schematic view of a first scenario of a virtual face generating method according to an embodiment of the present application. Wherein the first face image is a, the first virtual face image is b1, the second face image is a1, the second virtual face image is b2, and the third face image is a2.
The computer apparatus aligns the first facial key points of the first facial image a with the second facial key points of the first virtual facial image b1, and obtains the second facial image a1 from the alignment result. The coordinates of at least some key points of the second facial image are approximately the same as those of the second facial key points; the final result is that the facial features and face contour of the second facial image are positioned approximately the same as those in the first virtual facial image.
130. Head pose information is determined according to the second facial image, the head pose information is input into the renderer, and a second virtual facial image whose head pose information corresponds to that of the second facial image is output.
In some embodiments, the computer device may input the second facial image into a pose estimation model and output the yaw angle, pitch angle, and roll angle of the head in the second facial image; the head pose information includes the yaw angle (yaw), the pitch angle (pitch), and the roll angle (roll).
For example, the pose estimation model may be an FSANet deep learning model; when the second facial image is input to the model, the model outputs the values corresponding to the yaw angle, pitch angle, and roll angle of the person's head in the second facial image.
The computer device inputs a yaw angle, a pitch angle, and a roll angle of the head in the second face image into the renderer, and outputs a second virtual face image having the yaw angle, the pitch angle, and the roll angle. Wherein the second virtual face image has the same head pose information as the second face image.
For example, the renderer may be a differentiable renderer that can take arbitrary pose information as input and generate a second virtual facial image with the corresponding pose. For example, when the input head pose information is all zeros, the virtual facial image output by the renderer is a frontal face image, i.e., the first virtual facial image. After the head pose information of the second facial image is input, the virtual character in the second virtual facial image output by the renderer has the same pose information as the person in the second facial image.
The second virtual face image is also a two-dimensional image.
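For concreteness, here is a sketch of how the three angles might be assembled into a rotation matrix before being handed to such a renderer. The axis convention and composition order are assumptions (they vary between renderers), and the final render call is hypothetical.

```python
import numpy as np

def euler_to_rotation(yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Build a head rotation matrix from Euler angles in radians.

    Convention assumed here (it differs between renderers): pitch
    about X, yaw about Y, roll about Z, composed as R = Rz @ Ry @ Rx.
    """
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx

# Hypothetical call into the differentiable renderer described above:
# second_virtual_face = renderer.render(pose=euler_to_rotation(yaw, pitch, roll))
```

With all three angles at zero this returns the identity matrix, matching the all-zero pose that produces the frontal first virtual facial image.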
140. And acquiring a third face key point of the second virtual face image, and aligning the first face key point with the third face key point based on the third face key point to obtain a third face image corresponding to the first face image.
The computer device may input the second virtual facial image into the keypoint identification model, thereby obtaining the third facial keypoints and information such as their coordinates, distribution, and number.
In some embodiments, the computer device controls the first facial keypoints to align with the corresponding third facial keypoints based on the third facial keypoints, thereby adjusting the first facial image and obtaining the adjusted third facial image. The facial keypoints in the third facial image at least partially share the spatial distribution of the third facial keypoints.
In the process of aligning the first facial keypoints to the third facial keypoints, the computer device may divide the first facial keypoints into a plurality of sub-keypoint sets, and then align the first facial keypoints in each sub-keypoint set to the corresponding third facial keypoints in turn, thereby improving the accuracy of the alignment.
In some implementations, the computer device can determine the first coordinates of the first facial keypoints and third coordinates of the third facial keypoints, and, based on the third facial keypoints, align the first facial keypoints with the third facial keypoints according to the first coordinates, the third coordinates, and the affine transformation algorithm to obtain the third facial image corresponding to the first facial image. The affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates.
Referring to fig. 2, the computer apparatus aligns the first facial key points of the first facial image with the third facial key points of the second virtual facial image b2, so as to obtain the third facial image a2 from the alignment result. At least some keypoint coordinates of the third facial image are substantially the same as those of the third facial key points; the end result is that the facial features and face contour coordinates of the third facial image are substantially the same as those in the second virtual facial image. The third facial image is a two-dimensional image.
150. And determining a target face-pinching parameter according to the third facial image, and generating a three-dimensional virtual avatar corresponding to the first facial image through the target face-pinching parameter.
In some implementations, the computer device may set the head pose information in the renderer, resulting in a target renderer with the head pose; input face-pinching parameters into the target renderer; output a target virtual facial image similar to the third facial image; and determine the face-pinching parameters of the target virtual facial image as the target face-pinching parameters.
The target renderer may be the renderer after the head pose information has been set; the virtual character in an output image produced by the target renderer has the head pose information.
In some implementations, the computer device may input the face-pinching parameters into the target renderer to obtain an output image; compare each pixel of the output image with the corresponding pixel in the third facial image to obtain a comparison result; and, if the comparison result meets a preset condition, determine the output image as the target virtual facial image.
The face-pinching parameters may be set by a user with reference to the third facial image, or set by the computer device. The face-pinching parameters may include parameters that adjust different portions such as the eyes, nose, mouth, eyebrows, and face shape. The face-pinching parameters are input into the target renderer, which outputs an output image with the head pose information. If the output image has a high similarity to the third facial image after comparison, the output image is determined as the target virtual facial image, and its parameters are determined as the target face-pinching parameters.
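Because the renderer is differentiable, the search for the target face-pinching parameters can be written as gradient descent on an image loss. Below is a minimal sketch of that idea in PyTorch; `render_face` is a hypothetical differentiable renderer with the head pose already set, and the parameter count, learning rate, step count, and plain L1 pixel loss are illustrative choices (the comparison actually described later in this text uses semantic segmentation probabilities).

```python
import torch

def fit_face_parameters(render_face,                     # hypothetical differentiable renderer
                        third_face_image: torch.Tensor,  # target image, (3, H, W) in [0, 1]
                        num_params: int = 64,
                        steps: int = 500) -> torch.Tensor:
    """Optimize face-pinching parameters until the render matches the target."""
    params = torch.zeros(num_params, requires_grad=True)
    optimizer = torch.optim.Adam([params], lr=0.01)
    for _ in range(steps):
        optimizer.zero_grad()
        rendered = render_face(params)   # head pose is already fixed inside the renderer
        loss = torch.nn.functional.l1_loss(rendered, third_face_image)
        loss.backward()                  # gradients flow back through the renderer
        optimizer.step()
    return params.detach()               # candidate target face-pinching parameters
```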
Finally, the computer equipment inputs the target face-pinching parameters into a three-dimensional rendering engine and outputs a three-dimensional virtual avatar corresponding to the first facial image. In the three-dimensional virtual avatar, the virtual character's face closely resembles the face of the person in the first facial image and looks more lifelike.
In the embodiment of the application, the computer equipment acquires a first facial key point of a first facial image and a second facial key point of a first virtual facial image; aligns the first facial key point with the second facial key point based on the second facial key point to obtain a second facial image corresponding to the first facial image; determines head pose information according to the second facial image, inputs the head pose information into a renderer, and outputs a second virtual facial image whose head pose information corresponds to that of the second facial image; acquires a third facial key point of the second virtual facial image, and aligns the first facial key point with the third facial key point based on the third facial key point to obtain a third facial image corresponding to the first facial image; and determines a target face-pinching parameter according to the third facial image, and generates a three-dimensional virtual avatar corresponding to the first facial image through the target face-pinching parameter. A virtual face image approximating the real face is thereby generated.
For a more detailed understanding of the virtual face generation method provided in the present application, please continue to refer to fig. 3, fig. 3 is a second flowchart of the virtual face generation method provided in the embodiment of the present application. The virtual face generation method may include the steps of:
201. A first facial keypoint of a first facial image and a second facial keypoint of a first virtual facial image are acquired.
The computer device may input the first facial image into a keypoint identification model and output the first facial keypoints, such as the distribution of the first facial keypoints in the eyes, nose, mouth, eyebrows, facial edge contours, etc.
The computer device inputs a first virtual face image into the keypoint identification model and outputs second facial keypoints; the first virtual face image is a frontal face image. The first virtual face image is a two-dimensional image in which the front of the virtual character's face is displayed; that is, a viewer sees the virtual character's face straight on, at the same eye level.
The keypoint identification model may be a keypoint-detection deep learning model, such as a 3DDFA V2 model; when an image is input to the keypoint identification model, it may output information such as the number, coordinates, and distribution of the facial keypoints.
202. First coordinates of the first face keypoint and second coordinates of the second face keypoint are determined.
The computer device may determine a first coordinate of the first facial key point and a second coordinate of the second facial key point from the results output by the key point recognition model.
203. And based on the second facial key points, aligning the first facial key points with the second facial key points according to the first coordinates, the second coordinates and an affine transformation algorithm to obtain a second facial image corresponding to the first facial image.
Referring to fig. 4 together, fig. 4 is a schematic diagram of a second scenario of the virtual face generating method according to the embodiment of the present application. Wherein the first virtual face image is B1, the second face image is a1, the second virtual face image is B2, the third face image is a2, the target virtual face image is B3, and the three-dimensional virtual avatar is B.
The computer apparatus aligns the first facial key points of the first facial image with the second facial key points of the first virtual facial image B1, and obtains the second facial image a1 from the alignment result. The coordinates of at least some key points of the second facial image are approximately the same as those of the second facial key points; the final result is that the facial features and face contour of the second facial image are positioned approximately the same as those in the first virtual facial image. The affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates.
204. And inputting the second facial image into a pose estimation model, and outputting the yaw angle, the pitch angle, and the roll angle of the head in the second facial image, wherein the head pose information comprises the yaw angle, the pitch angle, and the roll angle.
Referring to fig. 5 together, fig. 5 is a schematic diagram of head pose information according to an embodiment of the present application.
As shown in fig. 5, a three-dimensional coordinate system may be established on the head, comprising an X-axis, a Y-axis, and a Z-axis, where the pitch angle (pitch) changes around the X-axis, the roll angle (roll) changes around the Z-axis, and the yaw angle (yaw) changes around the Y-axis.
It will be appreciated that when a person's head moves in three-dimensional space, three different angle values of yaw, pitch, and roll are produced. To make the face in the finally generated three-dimensional virtual avatar more similar to the real face, the head pose information of the real person can be incorporated in the process of generating the three-dimensional virtual avatar.
The pose estimation model may be an FSANet deep learning model; after the second facial image is input to the model, the model outputs the values corresponding to the yaw angle, pitch angle, and roll angle of the person's head in the second facial image.
205. The yaw angle, pitch angle and roll angle of the head in the second face image are input into the renderer, and the second virtual face image having the yaw angle, pitch angle and roll angle is output.
The computer device inputs a yaw angle, a pitch angle, and a roll angle of the head in the second face image into the renderer, and outputs a second virtual face image having the yaw angle, the pitch angle, and the roll angle. Wherein the second virtual face image has the same head pose information as the second face image.
For example, the renderer may be a differentiable renderer that can take arbitrary pose information as input and generate a second virtual facial image with the corresponding pose. For example, when the input head pose information is all zeros, the virtual facial image output by the renderer is a frontal face image, i.e., the first virtual facial image. After the head pose information of the second facial image is input, the virtual character in the second virtual facial image output by the renderer has the same pose information as the person in the second facial image.
The second virtual face image is also a two-dimensional image.
206. And acquiring a third face key point of the second virtual face image, and aligning the first face key point with the third face key point based on the third face key point to obtain a third face image corresponding to the first face image.
The computer device may input the second virtual facial image into the keypoint identification model, thereby obtaining the third facial keypoints and information such as their coordinates, distribution, and number.
In some embodiments, the computer device controls the first facial keypoints to align with the corresponding third facial keypoints based on the third facial keypoints, thereby adjusting the first facial image and obtaining the adjusted third facial image. The facial keypoints in the third facial image at least partially share the spatial distribution of the third facial keypoints.
In the process of aligning the first facial keypoints to the third facial keypoints, the computer device may divide the first facial keypoints into a plurality of sub-keypoint sets, and then align the first facial keypoints in each sub-keypoint set to the corresponding third facial keypoints in turn, thereby improving the accuracy of the alignment.
In some implementations, the computer device can determine the first coordinates of the first facial keypoints and third coordinates of the third facial keypoints, and, based on the third facial keypoints, align the first facial keypoints with the third facial keypoints according to the first coordinates, the third coordinates, and the affine transformation algorithm to obtain the third facial image corresponding to the first facial image. The affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates.
Referring to fig. 4, the computer apparatus aligns the first facial key points of the first facial image with the third facial key points of the second virtual facial image B2, so as to obtain the third facial image a2 from the alignment result. At least some keypoint coordinates of the third facial image are substantially the same as those of the third facial key points; the end result is that the facial features and face contour coordinates of the third facial image are substantially the same as those in the second virtual facial image. The third facial image is a two-dimensional image.
207. And setting the head pose information in the renderer to obtain the target renderer with the head pose.
The target renderer may be the renderer after the head pose information has been set, and the virtual character in an output image produced by the target renderer has the head pose information.
208. The face-pinching parameters are input into the target renderer, a target virtual facial image similar to the third facial image is output, and the face-pinching parameters of the target virtual facial image are determined as the target face-pinching parameters.
The face-pinching parameters may be set by a user with reference to the third facial image, or set by the computer device. The face-pinching parameters may include parameters that adjust different portions such as the eyes, nose, mouth, eyebrows, and face shape. The face-pinching parameters are input into the target renderer, which outputs an output image with the head pose information.
In some implementations, the computer device may input the third facial image into a face semantic segmentation model and output a first class probability for each pixel in the third facial image; input the output image into the face semantic segmentation model and output a second class probability for each pixel in the output image; and determine a norm loss between the second class probability of each pixel in the output image and the first class probability of the corresponding pixel in the third facial image, thereby obtaining the comparison result.
When the face-pinching parameters are first input into the target renderer, the output image may have a low similarity to the third facial image; for example, the eyebrow thickness or the mouth size may differ. It is therefore necessary to set a fitting loss function for the target renderer, by which the comparison result between the output image and the third facial image is measured.
The fitting loss function may be built on the result of face semantic segmentation. For example, a facial image is input into a face semantic segmentation model, which classifies each pixel in the image into one of several classes such as background, eyes, nose, mouth, eyebrows, and face shape.
In some embodiments, the third facial image is input into the face semantic segmentation model, and a first class probability is output for each pixel in the third facial image; the output image is input into the face semantic segmentation model, and a second class probability is output for each pixel in the output image; and a norm loss between the second class probability of each pixel in the output image and the first class probability of the corresponding pixel in the third facial image is determined, thereby obtaining the comparison result.
If the norm loss is not within the preset norm loss range, the face-pinching parameters are adjusted until the renderer outputs, according to the adjusted face-pinching parameters, a target virtual facial image similar to the third facial image.
If the norm loss is determined to be within the preset norm loss range, the output image is determined as the target virtual facial image.
For example, for a pixel in the output image, the computer device may obtain the second class probability corresponding to that pixel, and then determine the corresponding target pixel in the third facial image and the first class probability corresponding to that target pixel.
The L1 norm loss or the L2 norm loss between the first class probability and the second class probability is then calculated. If the preset norm loss range is, for example, greater than or equal to zero and smaller than a small positive value, and the norm loss corresponding to the pixel falls within this range, the face-pinching parameter affecting that pixel is suitable and can be taken as a target face-pinching parameter.
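Written out, with $p$ and $q$ the first and second class-probability vectors of a corresponding pixel pair, the two losses named above are:

$$\mathcal{L}_1 = \lVert q - p \rVert_1 = \sum_{c} \lvert q_c - p_c \rvert, \qquad \mathcal{L}_2 = \lVert q - p \rVert_2 = \sqrt{\sum_{c} (q_c - p_c)^2},$$

where $c$ ranges over the segmentation classes (background, eyes, nose, mouth, eyebrows, face shape, and so on).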
If the norm loss corresponding to the pixel is not within this range, the face-pinching parameter affecting that pixel needs to be adjusted, and the adjusted face-pinching parameter is input into the target renderer again until the norm loss corresponding to the pixel falls within the range.
In this way, the face-pinching parameters for the whole virtual character's face can be adjusted so that the output image finally produced by the target renderer is similar to the third facial image; that output image is determined as the target virtual facial image, and the face-pinching parameters corresponding to it are determined as the target face-pinching parameters.
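A sketch of this fitting loss, assuming both images have already been passed through the face semantic segmentation model to give per-pixel class-probability maps of shape (C, H, W); the segmentation model itself and the threshold value eps are not fixed by the text and are assumed here.

```python
import torch

def segmentation_fit_loss(first_probs: torch.Tensor,   # (C, H, W), from the third facial image
                          second_probs: torch.Tensor,  # (C, H, W), from the renderer's output image
                          use_l2: bool = False) -> torch.Tensor:
    """Norm loss between the two class-probability maps.

    L1: mean absolute difference; L2: root-mean-square difference
    (a pixel-averaged L2 norm). The face-pinching parameters keep
    being adjusted until this value falls within the preset range.
    """
    diff = second_probs - first_probs
    return diff.pow(2).mean().sqrt() if use_l2 else diff.abs().mean()

# Hypothetical acceptance test against a preset range [0, eps):
# done = segmentation_fit_loss(first_probs, second_probs) < eps
```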
209. And inputting the target face-pinching parameters into a three-dimensional rendering engine, and outputting a three-dimensional virtual avatar corresponding to the first facial image.
Finally, the computer equipment inputs the target face-pinching parameters into the three-dimensional rendering engine and outputs a three-dimensional virtual avatar corresponding to the first facial image. In the three-dimensional virtual avatar, the virtual character's face closely resembles the face of the person in the first facial image and looks more lifelike.
Referring to fig. 6, fig. 6 is a comparison diagram of virtual faces according to an embodiment of the present application.
The real face image is C, the image generated with head pose information by the virtual face generation method of the present application is d1, the image generated without head pose information is d2, and the virtual face image generated by the prior art is E.
Compared with the real face image, the virtual character face generated by the present method is closer to the real face, a technical effect that the prior art cannot achieve.
In an embodiment of the present application, the computer device obtains a first facial keypoint of the first facial image and a second facial keypoint of the first virtual facial image. First coordinates of the first facial keypoint and second coordinates of the second facial keypoint are determined. Based on the second facial key points, the first facial key points are aligned with the second facial key points according to the first coordinates, the second coordinates, and an affine transformation algorithm to obtain a second facial image corresponding to the first facial image. The second facial image is input into a pose estimation model, and the yaw angle, pitch angle, and roll angle of the head in the second facial image are output, the head pose information including the yaw angle, the pitch angle, and the roll angle. The yaw angle, pitch angle, and roll angle of the head in the second facial image are input into the renderer, and the second virtual facial image having the yaw angle, pitch angle, and roll angle is output.
Then a third facial key point of the second virtual facial image is acquired, and the first facial key point is aligned with the third facial key point based on the third facial key point to obtain a third facial image corresponding to the first facial image.
Finally, the head pose information is set in the renderer to obtain the target renderer with the head pose. The face-pinching parameters are input into the target renderer, a target virtual facial image similar to the third facial image is output, and the face-pinching parameters of the target virtual facial image are determined as the target face-pinching parameters. The target face-pinching parameters are input into a three-dimensional rendering engine, and a three-dimensional virtual avatar corresponding to the first facial image is output. The virtual face generated by the embodiment of the present application is thus closer to the real person's face and looks more lifelike.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a virtual face generating apparatus according to an embodiment of the present application. The virtual face generation apparatus 300 may include:
an acquiring module 310, configured to acquire a first facial key point of the first facial image, and a second facial key point of the first virtual facial image;
the first alignment module 320 is configured to align the first facial key point and the second facial key point based on the second facial key point, so as to obtain a second facial image corresponding to the first facial image;
a pose acquisition module 330, configured to determine head pose information according to the second facial image, input the head pose information into the renderer, and output a second virtual facial image whose head pose information corresponds to that of the second facial image;
a second alignment module 340, configured to obtain a third face key point of the second virtual face image, and align the first face key point and the third face key point based on the third face key point, so as to obtain a third face image corresponding to the first face image;
the generating module 350 is configured to determine target face-pinching parameters according to the third facial image, and generate a three-dimensional virtual avatar corresponding to the first facial image according to the target face-pinching parameters.
In some embodiments, the acquiring module 310 is further configured to acquire an initial image, input the initial image into the face detection model, and output a first facial image.
In some embodiments, the first alignment module 320 is further configured to determine a first coordinate of the first facial key point and a second coordinate of the second facial key point;
and based on the second facial key points, aligning the first facial key points with the second facial key points according to the first coordinates, the second coordinates and an affine transformation algorithm to obtain a second facial image corresponding to the first facial image.
In some embodiments, the obtaining module 310 is further configured to input the first facial image into the keypoint identification model and output the first facial keypoint;
and input the first virtual face image into the keypoint identification model and output a second facial key point, where the first virtual face image is a frontal face image.
In some embodiments, the pose acquisition module 330 is further configured to input the second facial image into the pose estimation model, and output a yaw angle, a pitch angle, and a roll angle of the head in the second facial image, where the head pose information includes the yaw angle, the pitch angle, and the roll angle.
In some embodiments, the pose acquisition module 330 is further configured to input the yaw angle, pitch angle, and roll angle of the head in the second facial image into the renderer, and output a second virtual facial image with the yaw angle, pitch angle, and roll angle.
In some embodiments, the generating module 350 is further configured to set the head pose information in the renderer to obtain the target renderer with the head pose;
input the face-pinching parameters into the target renderer, output a target virtual facial image similar to the third facial image, and determine the face-pinching parameters of the target virtual facial image as the target face-pinching parameters.
In some embodiments, the generating module 350 is further configured to input the face-pinching parameters into the target renderer to obtain an output image;
comparing each pixel of the output image with the corresponding pixel in the third face image to obtain a comparison result;
and if the comparison result meets the preset condition, determining the output image as the target virtual face image.
In some embodiments, the generating module 350 is further configured to input the third facial image into the face semantic segmentation model, and output a first class probability of each pixel in the third facial image;
inputting the output image into a facial semantic segmentation model to obtain a second class probability of each pixel in the output image;
and determining a norm loss between the second class probability of each pixel in the output image and the first class probability of the corresponding pixel in the third face image, and obtaining a comparison result.
In some embodiments, the generating module 350 is further configured to adjust the face-pinching parameters if the norm loss is determined not to be within the preset norm loss range, until the renderer outputs, according to the adjusted face-pinching parameters, a target virtual facial image similar to the third facial image.
In some embodiments, the generating module 350 is further configured to determine the output image as the target virtual face image if the norm loss is determined to be within the preset norm loss range.
In some embodiments, the generating module 350 is further configured to input the target face-pinching parameters into the three-dimensional rendering engine, and output a three-dimensional virtual avatar corresponding to the first facial image.
In the embodiment of the application, the virtual face generating device acquires a first facial key point of a first facial image and a second facial key point of the first virtual facial image; aligns the first facial key point with the second facial key point based on the second facial key point to obtain a second facial image corresponding to the first facial image; determines head pose information according to the second facial image, inputs the head pose information into a renderer, and outputs a second virtual facial image whose head pose information corresponds to that of the second facial image; acquires a third facial key point of the second virtual facial image, and aligns the first facial key point with the third facial key point based on the third facial key point to obtain a third facial image corresponding to the first facial image; and determines a target face-pinching parameter according to the third facial image, and generates a three-dimensional virtual avatar corresponding to the first facial image through the target face-pinching parameter. A virtual face image approximating the real face is thereby generated.
Correspondingly, the embodiment of the application also provides a computer device, which may be a terminal or a server; the terminal may be a device such as a smartphone, a tablet computer, a notebook computer, a touch screen, a game console, a personal computer (PC), or a personal digital assistant (PDA). Fig. 8 shows a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device 400 includes a processor 401 having one or more processing cores, a memory 402 having one or more computer-readable storage media, and a computer program stored on the memory 402 and executable on the processor. The processor 401 is electrically connected to the memory 402. It will be appreciated by those skilled in the art that the computer device structure shown in the figures does not limit the computer device, which may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
The processor 401 is the control center of the computer device 400; it connects the various parts of the entire computer device 400 using various interfaces and lines, and performs the various functions of the computer device 400 and processes data by running or loading software programs and/or modules stored in the memory 402 and invoking data stored in the memory 402, thereby monitoring the computer device 400 as a whole.
In the embodiment of the present application, the processor 401 in the computer device 400 loads the instructions corresponding to the processes of one or more application programs into the memory 402 according to the following steps, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions:
acquiring a first facial key point of a first facial image and a second facial key point of a first virtual facial image;
aligning the first facial key point with the second facial key point based on the second facial key point to obtain a second facial image corresponding to the first facial image;
determining head pose information according to the second facial image, inputting the head pose information into a renderer, and outputting a second virtual facial image whose head pose information corresponds to that of the second facial image;
acquiring a third face key point of the second virtual face image, and aligning the first face key point with the third face key point based on the third face key point to obtain a third face image corresponding to the first face image;
and determining a target face-pinching parameter according to the third facial image, and generating a three-dimensional virtual avatar corresponding to the first facial image through the target face-pinching parameter.
The processor 401 is further configured to implement the functions of:
before the first facial key points of the first facial image are acquired, an initial image is acquired, the initial image is input into a face detection model, and the first facial image is output.
The processor 401 is further configured to implement the functions of:
determining a first coordinate of a first facial key point and a second coordinate of a second facial key point;
and based on the second facial key points, aligning the first facial key points with the second facial key points according to the first coordinates, the second coordinates and an affine transformation algorithm to obtain a second facial image corresponding to the first facial image.
The processor 401 is further configured to implement the functions of:
inputting the first facial image into a key point identification model, and outputting a first facial key point;
and inputting the first virtual face image into the keypoint identification model and outputting a second facial key point, where the first virtual face image is a frontal face image.
The processor 401 is further configured to implement the functions of:
and inputting the second facial image into a gesture calculation model, and outputting the yaw angle, the pitch angle and the roll angle of the head in the second facial image, wherein the head gesture information comprises the yaw angle, the pitch angle and the roll angle.
The processor 401 is further configured to implement the functions of:
The yaw angle, pitch angle and roll angle of the head in the second face image are input into the renderer, and the second virtual face image having the yaw angle, pitch angle and roll angle is output.
The processor 401 is further configured to implement the functions of:
setting the head pose information in the renderer to obtain a target renderer with the head pose;
inputting the face-pinching parameters into the target renderer, outputting a target virtual facial image similar to the third facial image, and determining the face-pinching parameters of the target virtual facial image as the target face-pinching parameters.
The processor 401 is further configured to implement the functions of:
inputting the face-pinching parameters into the target renderer to obtain an output image;
comparing each pixel of the output image with the corresponding pixel in the third face image to obtain a comparison result;
and if the comparison result meets the preset condition, determining the output image as the target virtual face image.
The processor 401 is further configured to implement the functions of:
inputting the third face image into a face semantic segmentation model, and outputting the first class probability of each pixel in the third face image;
inputting the output image into a facial semantic segmentation model to obtain a second class probability of each pixel in the output image;
And determining a norm loss between the second class probability of each pixel in the output image and the first class probability of the corresponding pixel in the third face image, and obtaining a comparison result.
The processor 401 is further configured to implement the functions of:
and if the norm loss is not within the preset norm loss range, adjusting the face-pinching parameters until the renderer outputs, according to the adjusted face-pinching parameters, a target virtual facial image similar to the third facial image.
The processor 401 is further configured to implement the functions of:
and if the norm loss is determined to be within the preset norm loss range, determining the output image as the target virtual face image.
The processor 401 is further configured to implement the functions of:
and inputting the target face-pinching parameters into a three-dimensional rendering engine, and outputting a three-dimensional virtual avatar corresponding to the first facial image.
In the embodiment of the application, the computer equipment acquires a first facial key point of a first facial image and a second facial key point of a first virtual facial image; aligns the first facial key point with the second facial key point based on the second facial key point to obtain a second facial image corresponding to the first facial image; determines head pose information according to the second facial image, inputs the head pose information into a renderer, and outputs a second virtual facial image whose head pose information corresponds to that of the second facial image; acquires a third facial key point of the second virtual facial image, and aligns the first facial key point with the third facial key point based on the third facial key point to obtain a third facial image corresponding to the first facial image; and determines a target face-pinching parameter according to the third facial image, and generates a three-dimensional virtual avatar corresponding to the first facial image through the target face-pinching parameter. A virtual face image approximating the real face is thereby generated.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Optionally, as shown in fig. 8, the computer device 400 further includes: a touch display 403, a radio frequency circuit 404, an audio circuit 405, an input unit 406, and a power supply 407. The processor 401 is electrically connected to the touch display 403, the radio frequency circuit 404, the audio circuit 405, the input unit 406, and the power supply 407, respectively. Those skilled in the art will appreciate that the computer device structure shown in FIG. 8 does not limit the computer device, which may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
The touch display 403 may be used to display a graphical user interface and receive operation instructions generated by a user acting on the graphical user interface. The touch display 403 may include a display panel and a touch panel. The display panel may be used to display information entered by or provided to the user, as well as the various graphical user interfaces of the computer device, which may be composed of graphics, text, icons, video, and any combination thereof. Optionally, the display panel may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. The touch panel may be used to collect the user's touch operations on or near it (such as operations performed by the user on or near the touch panel using a finger, a stylus, or any other suitable object or accessory) and to generate corresponding operation instructions, according to which the corresponding programs are executed. Optionally, the touch panel may include two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 401, and can receive and execute commands sent by the processor 401. The touch panel may overlay the display panel; upon detecting a touch operation on or near it, the touch panel passes the operation to the processor 401 to determine the type of touch event, and the processor 401 then provides a corresponding visual output on the display panel according to the type of touch event. In the embodiment of the present application, the touch panel and the display panel may be integrated into the touch display 403 to implement the input and output functions; in some embodiments, however, they may be implemented as two separate components to perform the input and output functions. That is, the touch display 403 may also implement an input function as part of the input unit 406.
In the embodiment of the present application, the application program is executed by the processor 401 to generate a graphical user interface on the touch display screen 403. The touch display 403 is used for presenting a graphical user interface and receiving an operation instruction generated by a user acting on the graphical user interface.
The radio frequency circuit 404 may be used to transmit and receive radio frequency signals so as to establish wireless communication with a network device or another computer device.
The audio circuit 405 may be used to provide an audio interface between the user and the computer device through a speaker and a microphone. In one direction, the audio circuit 405 converts received audio data into an electrical signal and transmits it to the speaker, which converts it into a sound signal for output; in the other direction, the microphone converts a collected sound signal into an electrical signal, which the audio circuit 405 receives and converts into audio data. The audio data is then processed by the processor 401 and sent via the radio frequency circuit 404 to, for example, another computer device, or output to the memory 402 for further processing. The audio circuit 405 may also include an earphone jack to provide communication between a peripheral earphone and the computer device.
The input unit 406 may be used to receive input numbers, character information, or user characteristic information (e.g., fingerprint, iris, facial information, etc.), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The power supply 407 is used to power the various components of the computer device 400. Optionally, the power supply 407 may be logically connected to the processor 401 through a power management system, so that charging, discharging, and power consumption management are implemented through the power management system. The power supply 407 may also include any one or more of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown in FIG. 8, the computer device 400 may further include a camera, a sensor, a wireless fidelity (Wi-Fi) module, a Bluetooth module, and the like, which are not described herein.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the above embodiments may be completed by instructions, or by instructions controlling the relevant hardware; the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer readable storage medium having stored therein a plurality of computer programs that can be loaded by a processor to perform steps in any of the virtual face generation methods provided by embodiments of the present application. For example, the computer program may perform the steps of:
acquiring a first facial key point of a first facial image and a second facial key point of a first virtual facial image;
aligning the first facial key point with the second facial key point based on the second facial key point to obtain a second facial image corresponding to the first facial image;
determining head pose information according to the second facial image, inputting the head pose information into a renderer, and outputting a second virtual facial image whose head pose information corresponds to that of the second facial image;
acquiring a third face key point of the second virtual face image, and aligning the first face key point with the third face key point based on the third face key point to obtain a third face image corresponding to the first face image;
and determining a target face pinching parameter according to the third face image, and generating a three-dimensional virtual head portrait corresponding to the first face image through the target face pinching parameter.
The computer program may also perform the steps of:
before the first facial key points of the first facial image are acquired, an initial image is acquired, the initial image is input into a face detection model, and the first facial image is output.
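By way of a hedged, non-limiting example (the application does not specify which face detection model is used), the cropping step could look like the following sketch, which substitutes OpenCV's stock Haar cascade detector for the unspecified model:

```python
import cv2

def extract_first_face_image(initial_image):
    """Crop the most prominent detected face from the initial image."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(initial_image, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # no face found in the initial image
    x, y, w, h = max(faces, key=lambda box: box[2] * box[3])  # largest face
    return initial_image[y:y + h, x:x + w]  # cropped first facial image
```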
The computer program may also perform the steps of:
determining a first coordinate of a first facial key point and a second coordinate of a second facial key point;
and based on the second facial key points, aligning the first facial key points with the second facial key points according to the first coordinates, the second coordinates and an affine transformation algorithm to obtain a second facial image corresponding to the first facial image.
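As an illustrative sketch (assuming OpenCV-style APIs; the application does not name a library), the affine transform can be estimated by least squares from the two coordinate sets and then applied to warp the first facial image into alignment:

```python
import cv2
import numpy as np

def align_by_keypoints(first_image, first_coords, second_coords, out_size):
    """Warp first_image so its key points align with the reference key points."""
    # Estimate a partial affine transform (rotation + uniform scale + translation)
    # mapping the first coordinates onto the second coordinates; the matrix may
    # be None if the estimation fails on degenerate input.
    matrix, _ = cv2.estimateAffinePartial2D(
        np.asarray(first_coords, dtype=np.float32),
        np.asarray(second_coords, dtype=np.float32))
    # Warp the first facial image to obtain the aligned second facial image;
    # out_size is a (width, height) tuple.
    return cv2.warpAffine(first_image, matrix, out_size)
```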
The computer program may also perform the steps of:
inputting the first facial image into a key point identification model, and outputting a first facial key point;
and inputting the first virtual face image into the key point identification model and outputting a second face key point, wherein the first virtual face image is a frontal face image.
The computer program may also perform the steps of:
and inputting the second facial image into a pose calculation model, and outputting the yaw angle, the pitch angle and the roll angle of the head in the second facial image, wherein the head pose information comprises the yaw angle, the pitch angle and the roll angle.
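The pose calculation model itself is not specified here; one common approach, shown below purely as a sketch under the assumption that 2D key points, a generic 3D face model, and a camera matrix are available as float arrays, is to solve a perspective-n-point problem and decompose the resulting rotation matrix into the three Euler angles:

```python
import cv2
import numpy as np

def estimate_head_pose(keypoints_2d, model_points_3d, camera_matrix):
    """Estimate yaw, pitch and roll (degrees) from 2D key points and a 3D model."""
    # Solve for the head rotation relative to the generic 3D face model.
    ok, rvec, _ = cv2.solvePnP(model_points_3d, keypoints_2d, camera_matrix, None)
    if not ok:
        return None
    rotation, _ = cv2.Rodrigues(rvec)  # 3x3 rotation matrix
    # Decompose into Euler angles: pitch (x), yaw (y), roll (z).
    sy = np.sqrt(rotation[0, 0] ** 2 + rotation[1, 0] ** 2)
    pitch = np.degrees(np.arctan2(rotation[2, 1], rotation[2, 2]))
    yaw = np.degrees(np.arctan2(-rotation[2, 0], sy))
    roll = np.degrees(np.arctan2(rotation[1, 0], rotation[0, 0]))
    return yaw, pitch, roll
```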
The computer program may also perform the steps of:
inputting the yaw angle, pitch angle and roll angle of the head in the second face image into the renderer, and outputting a second virtual face image having the yaw angle, pitch angle and roll angle.
The computer program may also perform the steps of:
setting the head pose information in the renderer to obtain a target renderer with the head pose;
inputting a pinching face parameter into the target renderer, outputting a target virtual face image similar to the third face image, and determining the pinching face parameter of the target virtual face image as the target pinching face parameter.
The computer program may also perform the steps of:
inputting the pinching face parameters into a target renderer to obtain an output image;
comparing each pixel of the output image with the corresponding pixel in the third face image to obtain a comparison result;
and if the comparison result meets the preset condition, determining the output image as the target virtual face image.
The computer program may also perform the steps of:
inputting the third face image into a face semantic segmentation model, and outputting the first class probability of each pixel in the third face image;
inputting the output image into a facial semantic segmentation model to obtain a second class probability of each pixel in the output image;
and determining a norm loss between the second class probability of each pixel in the output image and the first class probability of the corresponding pixel in the third face image to obtain a comparison result.
The computer program may also perform the steps of:
and if the norm loss is not within the preset norm loss range, adjusting the pinching face parameter until the renderer, according to the adjusted pinching face parameter, outputs a target virtual face image similar to the third face image.
The computer program may also perform the steps of:
and if the norm loss is determined to be within the preset norm loss range, determining the output image as the target virtual face image.
The computer program may also perform the steps of:
and inputting the target face pinching parameters into a three-dimensional rendering engine, and outputting a three-dimensional virtual head portrait corresponding to the first face image.
In the embodiment of the present application, the computer device acquires a first facial key point of a first facial image and a second facial key point of a first virtual facial image; aligns the first facial key point with the second facial key point based on the second facial key point to obtain a second facial image corresponding to the first facial image; determines head pose information according to the second facial image, inputs the head pose information into a renderer, and outputs a second virtual facial image whose head pose information corresponds to that of the second facial image; acquires a third facial key point of the second virtual facial image, and aligns the first facial key point with the third facial key point based on the third facial key point to obtain a third facial image corresponding to the first facial image; and determines a target pinching face parameter according to the third facial image and generates a three-dimensional virtual head portrait corresponding to the first facial image through the target pinching face parameter. A virtual face image approximating the real face is thereby generated.
The storage medium may include: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
Since the computer program stored in the storage medium can execute the steps in any of the virtual face generation methods provided in the embodiments of the present application, it can achieve the beneficial effects achievable by any of those methods; for details, see the previous embodiments, which are not repeated herein.
The virtual face generation method, apparatus, computer device and computer-readable storage medium provided in the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the descriptions of the above embodiments are intended only to help understand the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope according to the ideas of the present application. In view of the above, the content of this specification should not be construed as limiting the present application.

Claims (15)

1. A virtual face generation method, comprising:
acquiring a first facial key point of a first facial image and a second facial key point of a first virtual facial image;
aligning the first facial key points with the second facial key points based on the second facial key points to obtain second facial images corresponding to the first facial images;
determining head pose information according to the second facial image, inputting the head pose information into a renderer, and outputting a second virtual facial image whose head pose information corresponds to that of the second facial image;
acquiring a third face key point of the second virtual face image, and aligning the first face key point with the third face key point based on the third face key point to obtain a third face image corresponding to the first face image;
and determining a target pinching face parameter according to the third facial image, and generating a three-dimensional virtual head portrait corresponding to the first facial image through the target pinching face parameter.
2. The virtual face generation method of claim 1, wherein prior to the acquiring the first face keypoints of the first face image, the method further comprises:
acquiring an initial image, inputting the initial image into a face detection model, and outputting the first facial image.
3. The virtual face generation method according to claim 1, wherein the aligning the first face key point and the second face key point based on the second face key point to obtain a second face image corresponding to the first face image includes:
determining a first coordinate of the first facial key point and a second coordinate of the second facial key point;
and based on the second facial key points, aligning the first facial key points with the second facial key points according to the first coordinates, the second coordinates and an affine transformation algorithm to obtain a second facial image corresponding to the first facial image.
4. The virtual face generation method of claim 1, wherein the acquiring the first face keypoints of the first face image and the second face keypoints of the first virtual face image comprises:
inputting the first facial image into a key point identification model, and outputting the first facial key point;
and inputting the first virtual facial image into the key point recognition model, and outputting the second facial key point, wherein the first virtual facial image is a frontal facial image.
5. The virtual face generation method of claim 1, wherein the determining head pose information from the second face image comprises:
and inputting the second facial image into a pose calculation model, and outputting the yaw angle, the pitch angle and the roll angle of the head in the second facial image, wherein the head pose information comprises the yaw angle, the pitch angle and the roll angle.
6. The virtual face generation method of claim 5, wherein the inputting the head pose information into a renderer, outputting a second virtual face image whose head pose information corresponds to the head pose information of the second face image, comprises:
and inputting a yaw angle, a pitch angle and a roll angle of the head in the second facial image into the renderer, and outputting a second virtual facial image with the yaw angle, the pitch angle and the roll angle.
7. The virtual face generation method of any one of claims 1-6, wherein the determining a target pinching face parameter from the third face image comprises:
setting the head pose information in the renderer to obtain a target renderer with a head pose;
inputting a pinching face parameter into the target renderer, outputting a target virtual face image similar to the third face image, and determining the pinching face parameter of the target virtual face image as the target pinching face parameter.
8. The virtual face generation method of claim 7, wherein the inputting of pinching face parameters into the target renderer, outputting a target virtual face image that is similar to the third face image, comprises:
inputting the pinching face parameters into the target renderer to obtain an output image;
comparing each pixel of the output image with a corresponding pixel in the third face image to obtain a comparison result;
and if the comparison result meets a preset condition, determining the output image as the target virtual face image.
9. The virtual face generation method according to claim 8, wherein the comparing each pixel of the output image with a corresponding pixel in the third face image to obtain a comparison result includes:
inputting the third face image into a face semantic segmentation model, and outputting a first class probability of each pixel in the third face image;
inputting the output image into the facial semantic segmentation model to obtain a second class probability of each pixel in the output image;
and determining a norm loss between the second class probability of each pixel in the output image and the first class probability of the corresponding pixel in the third face image to obtain the comparison result.
10. The virtual face generation method of claim 9, wherein after the determining a norm loss between the second class probability for each pixel in the output image and the first class probability for the corresponding pixel in the third face image, the method further comprises:
and if the norm loss is not within the preset norm loss range, adjusting the pinching face parameter until the renderer, according to the adjusted pinching face parameter, outputs a target virtual face image similar to the third face image.
11. The virtual face generation method according to claim 9, wherein the determining the output image as the target virtual face image if the comparison result meets a preset condition comprises:
and if the norm loss is determined to be within a preset norm loss range, determining the output image as the target virtual face image.
12. The virtual face generation method according to claim 1, wherein the generating the three-dimensional virtual head portrait corresponding to the first face image by the target pinching face parameter includes:
and inputting the target face pinching parameters into a three-dimensional rendering engine, and outputting a three-dimensional virtual head portrait corresponding to the first facial image.
13. A virtual face generation apparatus, comprising:
the acquisition module is used for acquiring first facial key points of the first facial image and second facial key points of the first virtual facial image;
the first alignment module is used for aligning the first facial key points with the second facial key points based on the second facial key points to obtain second facial images corresponding to the first facial images;
the pose acquisition module is used for determining head pose information according to the second facial image, inputting the head pose information into a renderer, and outputting a second virtual facial image whose head pose information corresponds to that of the second facial image;
a second alignment module, configured to obtain a third face key point of the second virtual face image, and align the first face key point and the third face key point based on the third face key point, so as to obtain a third face image corresponding to the first face image;
and the generation module is used for determining a target pinching face parameter according to the third facial image and generating a three-dimensional virtual head portrait corresponding to the first facial image through the target pinching face parameter.
14. A computer device, comprising:
a memory storing executable program code; and a processor coupled to the memory;
the processor invokes the executable program code stored in the memory to perform the virtual face generation method of any one of claims 1 to 12.
15. A computer readable storage medium, characterized in that the storage medium stores a plurality of instructions adapted to be loaded by a processor to perform the virtual face generation method of any of claims 1 to 12.
CN202310380739.9A 2023-04-10 2023-04-10 Virtual face generation method, device, computer equipment and storage medium Pending CN116385615A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310380739.9A CN116385615A (en) 2023-04-10 2023-04-10 Virtual face generation method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310380739.9A CN116385615A (en) 2023-04-10 2023-04-10 Virtual face generation method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116385615A true CN116385615A (en) 2023-07-04

Family

ID=86972963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310380739.9A Pending CN116385615A (en) 2023-04-10 2023-04-10 Virtual face generation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116385615A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117745597A * 2024-02-21 2024-03-22 Honor Device Co., Ltd. Image processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination