CN115239885A - Face reconstruction method and device based on key point recognition


Info

Publication number
CN115239885A
Authority
CN
China
Prior art keywords
face
channel
color
point
image
Prior art date
Legal status
Pending
Application number
CN202210932433.5A
Other languages
Chinese (zh)
Inventor
陈春朋
杨智远
吴连朋
Current Assignee
Juhaokan Technology Co Ltd
Original Assignee
Juhaokan Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Juhaokan Technology Co Ltd
Priority to CN202210932433.5A
Publication of CN115239885A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Processing (AREA)

Abstract

The application relates to the technical field of three-dimensional reconstruction and provides a face reconstruction method and device based on key point identification. The colors of the key points in the non-frontal face images are locally corrected according to the mapping relation of the key points between the frontal face image and the other face images, and the key point corrections are then transferred to the whole image. This solves the problem of wrong matching of subsequent feature points caused by different illumination at different viewing angles and enriches the number of matched feature point pairs, so the number of viewing angles can be reduced, a high-precision face model can be reconstructed from sparse viewpoints, and the hardware cost is lowered. Meanwhile, when the target camera parameters of the plurality of cameras are determined, the face key point pairs having a mapping relation are added to the projection error energy function as a regular term, so that the semantic matching between face key points serves as a constraint condition, which improves the matching accuracy and the reconstruction accuracy of multi-view 3D face model reconstruction.

Description

Face reconstruction method and device based on key point recognition
Technical Field
The application relates to the technical field of three-dimensional reconstruction, and provides a face reconstruction method and device based on key point identification.
Background
With the rise of the concept of the "metaverse", the reconstruction of high-precision digital humans with lifelike appearance is receiving great attention, and face reconstruction, as one of the core technologies of high-precision digital human reconstruction, has become a research hotspot in both academia and industry.
The widespread use of RGB cameras has facilitated the development of vision-based face reconstruction. At present, face reconstruction based on Multi-View Stereo (MVS) is one of the main approaches: it takes face RGB images from multiple dense viewing angles as input, automatically extracts feature points in the images, and completes face reconstruction through feature point matching.
Because the traditional MVS algorithm is a highly generalized reconstruction algorithm that was not designed specifically for face reconstruction, feature matching between face RGB images from different viewing angles is easily affected by the illumination conditions, which causes wrong matches or matching failures of the feature points and reduces the face reconstruction accuracy. Moreover, to ensure the consistency of the reconstruction result, the traditional MVS algorithm requires large enough overlapping regions between the face RGB images of the different viewing angles so that enough matching feature points can be found; that is, it needs to deploy very dense viewpoints (dozens or even hundreds) to ensure that a high-precision face model can be reconstructed, which greatly increases the hardware cost (hundreds of thousands or even millions) and makes large-scale application difficult.
Disclosure of Invention
The application provides a face reconstruction method and device based on key point identification, which are used for improving reconstruction accuracy of a face model under a sparse viewpoint.
In one aspect, the present application provides a face reconstruction method based on keypoint identification, including:
acquiring face images acquired by a camera under multiple viewing angles, taking the face image under the front viewing angle as a reference image, and taking the face images under the other viewing angles as adjacent images;
respectively identifying face key points in the face image under each visual angle;
correcting color values of pixel points in the adjacent images by using the color values of the pixel points in the reference images according to the mapping relation between the key points of the human face in the reference images and the key points of the human face in the adjacent images aiming at each adjacent image;
determining camera parameters of a plurality of cameras according to the reference image and each adjacent image to align each adjacent image with the reference image;
and reconstructing a 3D face model by adopting a multi-view stereoscopic vision algorithm according to the reference image and each adjacent image.
In another aspect, the present application provides a reconstruction device, including a processor, a memory, a communication interface, and a display screen, where the communication interface, the display screen, the memory, and the processor are connected by a bus;
the memory stores a computer program, and the processor performs the following operations according to the computer program:
acquiring face images acquired by a camera under multiple viewing angles through the communication interface, taking the face image under the front viewing angle as a reference image, and taking the face images under the other viewing angles as adjacent images;
respectively identifying face key points in the face image under each visual angle;
correcting color values of pixel points in the adjacent images by using color values of the pixel points in the reference images according to the mapping relation between the key points of the faces in the reference images and the key points of the faces in the adjacent images aiming at each adjacent image;
determining camera parameters of a plurality of cameras according to the reference image and each adjacent image so as to align each adjacent image with the reference image;
and reconstructing a 3D face model by adopting a multi-view stereoscopic vision algorithm according to the reference image and each adjacent image, and displaying through the display screen.
In another aspect, an embodiment of the present application provides a computer-readable storage medium, where computer-executable instructions are stored, and the computer-executable instructions are configured to cause a computer device to execute a face reconstruction method based on keypoint recognition provided in an embodiment of the present application.
According to the face reconstruction method and device based on key point identification provided by the present application, the face image at the frontal viewing angle among a plurality of viewing angles is used as the reference image, the remaining face images are used as adjacent images, and the face key points in each image are extracted separately. For each adjacent image, the color values of its pixel points are corrected by using the color values of the pixel points in the reference image according to the mapping relation between the face key points in the reference image and those in the adjacent image, so that the color values of the face key points in the reference image are transferred to each adjacent image and the colors of the face key points in each adjacent image approach those in the reference image. This solves the problem of wrong matching of subsequent feature points caused by different illumination at different viewing angles, enriches the number of matched feature point pairs, and thus allows the number of viewing angles to be reduced, so that a high-precision face model can be reconstructed from sparse viewpoints (within ten) and the hardware cost is reduced. Furthermore, multi-view 3D face model reconstruction is achieved based on the corrected adjacent images, the reference image, and the camera parameters of the plurality of cameras; by introducing the face key points as auxiliary information, the high-precision characteristic of pixel-by-pixel reconstruction of the multi-view stereoscopic vision algorithm can be retained under sparse viewing angles.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a flowchart of a face reconstruction method based on key point identification according to an embodiment of the present application;
FIG. 2 is a schematic diagram of 98 identified key points of a human face according to an embodiment of the present application;
fig. 3 is a comparison diagram of key points of a face extracted from face images at different viewing angles according to the embodiment of the present application;
FIG. 4 is a flowchart of a method for color correction of an adjacent image according to an embodiment of the present disclosure;
fig. 5 is a flowchart of a method for performing local color correction on key points of a face in an adjacent image according to an embodiment of the present application;
fig. 6 is a flowchart of a method for determining color values of key points of a human face according to an embodiment of the present application;
fig. 7 is a flowchart of a method for performing global color correction on a pixel point in an adjacent image according to an embodiment of the present disclosure;
fig. 8 is an effect diagram after adjusting colors of adjacent images based on a reference image according to an embodiment of the present application;
fig. 9 is a flowchart of a method for determining camera parameters of a multi-view camera according to an embodiment of the present disclosure;
fig. 10 illustrates a reconstruction effect of a geometric face model provided in an embodiment of the present application;
FIG. 11 is a diagram illustrating an effect of a textured 3D face model according to an embodiment of the present disclosure;
fig. 12 is a hardware configuration diagram of a reconstruction device according to an embodiment of the present application;
fig. 13 is a functional block diagram of a reconstruction device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.
Currently, the mainstream methods for three-dimensional face reconstruction include the face reconstruction method based on a parameterized model, the face reconstruction method based on deep learning, and the face reconstruction method based on the MVS algorithm. Wherein:
The face reconstruction method based on a parameterized model guides the optimization of a standard parameterized model (such as the 3DMM, FLAME, or DECA models) with a single input face RGB image, thereby obtaining a personalized 3D face model. However, this is an approximate optimization algorithm; because a single face RGB image contains limited information, the reconstructed 3D face model can only approximate the real person, and high-precision reconstruction is difficult.
Among the face reconstruction methods based on deep learning, the ToFu (Topologically Consistent Multi-View Face Inference Using Volumetric Sampling) model achieves a good reconstruction effect: it takes sparse face RGB images as input and recovers a high-precision geometric model and a texture map with a consistent topology through a deep learning algorithm. However, this method requires an accurate calibration relationship between the RGB cameras arranged at the sparse viewpoints, and the calibration process is cumbersome, which limits its application.
The face reconstruction method based on the MVS algorithm takes face RGB images from a plurality of dense viewing angles as input, automatically extracts feature points in the images, and performs matching by calculating the photometric consistency of feature points between two images, thereby completing face reconstruction. Specifically, for each feature point, information such as its pixel intensity and the mean and variance of the pixel points in its surrounding neighborhood is calculated and used as the feature of that feature point; then the feature difference between feature points in two images is calculated, and when the difference is smaller than a certain threshold, the two feature points are considered to be successfully matched. However, because the traditional MVS algorithm is a highly generalized reconstruction algorithm that was not designed specifically for face reconstruction, the same feature shot from different viewing angles is often under different illumination conditions, which causes wrong matches or matching failures of the feature points and reduces the accuracy of face reconstruction. In addition, to ensure the consistency of the reconstruction result, the traditional MVS algorithm requires large enough overlapping regions between the face RGB images of the different viewing angles so that enough matching feature points can be found; that is, it needs to deploy very dense viewpoints (dozens or even hundreds) to ensure that a high-precision face model can be reconstructed, which greatly increases the hardware cost (hundreds of thousands or even millions) and makes large-scale application difficult.
In view of this, the embodiment of the present application provides a face reconstruction method and device based on key point identification built on the traditional MVS algorithm. Color migration is performed by introducing face key points as auxiliary information, which increases the number of matched point pairs and reduces both the MVS algorithm's requirement for a high overlap ratio between viewing angles and its requirement on the number of viewpoints, so that a high-precision 3D face model can be reconstructed even with sparse viewing-angle input (within ten viewing angles). Moreover, the semantic matching between multi-view face key points is used as a constraint to directly calculate the camera parameters of the multi-view RGB cameras, so that the method can reconstruct the face directly from face RGB images shot by a mobile terminal (such as a mobile phone, a tablet, a television, or VR glasses), which reduces the deployment difficulty and broadens the range of application.
Referring to fig. 1, a flowchart of a face reconstruction method based on keypoint identification provided in the embodiment of the present application mainly includes the following steps:
s101: the method comprises the steps of acquiring face images acquired by a camera under multiple visual angles, taking the face images under the front visual angle as reference images, and taking the face images under the other visual angles as adjacent images.
Usually, the face image at the front view contains abundant face detail information, and the number of identified face key points is the largest; therefore, the face image at the front view is used as the reference image, and the face images at the other views are used as adjacent images.
S102: and respectively identifying the key points of the human face in the human face image under each visual angle.
In the field of computer vision, most face key point recognition algorithms can accurately recognize face key points with semantic features, and the number of key points ranges from 5 to several hundred depending on the algorithm. The face key points mainly include the inner and outer corners of the eyes, the upper and lower eyelids, the mouth corners, the lips, the nose, the face contour, the eyebrows, and the like.
For example, as shown in fig. 2, a schematic diagram of 98 identified key points of a human face is provided for the embodiment of the present application.
Because the face image at the frontal viewing angle contains the most complete face region, more face key points are identified in it than in the face images at the other viewing angles.
Fig. 3 shows a comparison of the face key points extracted from face images at different viewing angles. As can be seen from fig. 3, all face key points can be detected in the face image at the frontal viewing angle, which is used as the reference image, while only part of the face key points can be extracted from the face images at the other viewing angles, which are used as adjacent images of the reference image.
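As an illustration of S102 only (not limiting), an off-the-shelf landmark detector can play the role of the face key point recognition algorithm. The minimal Python sketch below assumes OpenCV and dlib are available and uses dlib's 68-point predictor, with a hypothetical local model-file path, as a stand-in for the 98-point detector described above.

```python
# Illustrative sketch of S102: detect face key points in one image.
# dlib's 68-point predictor is a stand-in for the 98-point detector; the model path is hypothetical.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed local file

def detect_keypoints(image_bgr):
    """Return an (N, 2) array of (x, y) face key points, or None if no face is found."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)          # upsample once to help with smaller faces
    if len(faces) == 0:
        return None
    shape = predictor(gray, faces[0])  # landmarks of the first detected face
    return np.array([(p.x, p.y) for p in shape.parts()], dtype=np.int64)
```

Each index of the returned array carries fixed semantics (for example a particular eye corner), which is what the index-based mapping in S103 relies on.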
S103: and correcting the color values of the pixel points in the adjacent images by using the color values of the pixel points in the reference images according to the mapping relation between the face key points in the reference images and the face key points in the adjacent images aiming at each adjacent image.
Each face key point identified by the face key point identification algorithm carries its corresponding semantic information (such as inner or outer eye corner, upper or lower eyelid, mouth corner, lips, nose, face contour, eyebrows, etc.), and this semantic information can be uniquely identified by the key point number; for example, the semantic information of the face key point numbered 96 in fig. 2 is the right eyeball. Therefore, according to the semantic information of the face key points, the one-to-one mapping relationship between the face key points in the reference image and those in each adjacent image can easily be established; for example, face key points with the same number in the reference image and an adjacent image can be mapped directly.
In step S103, color space correction is performed on each adjacent image to the reference image according to the mapping relationship between the face key points in the reference image and the face key points in the adjacent images.
It should be noted that, because the identified face key points include eyebrow key points, and the eyebrows belong to hair material and do not belong to skin material, when color correction is performed on each adjacent image, the eyebrow key points in the reference image are not suitable as references for color correction, and the color information of the eyebrows needs to be removed, that is, in S103, color correction is performed on the corresponding adjacent image according to the one-to-one mapping relationship between the skin-type face key points in the reference image and the adjacent image.
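For illustration, building the index-based, skin-only mapping described above might look like the sketch below; the eyebrow index range is a hypothetical placeholder, since the exact numbering depends on the 98-point scheme in use.

```python
# Illustrative sketch: map neighbor-image key points to reference-image key points by index,
# dropping eyebrow indices because eyebrows are hair rather than skin material.
EYEBROW_IDS = set(range(33, 51))  # hypothetical eyebrow indices for a 98-point layout

def build_keypoint_mapping(ref_points, nbr_points, nbr_visible_ids):
    """Return {neighbor key point index -> reference key point index}; equal indices share semantics."""
    return {
        i: i
        for i in nbr_visible_ids
        if i not in EYEBROW_IDS and i < len(ref_points) and i < len(nbr_points)
    }
```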
In an alternative embodiment, in S103, a color migration algorithm with a good correction effect may be used when performing color correction on each adjacent image. The color migration algorithm can be understood as follows: a new target image is synthesized based on the reference image and the adjacent image, so that the color-corrected target image takes on the overall color tone of the reference image without changing the color information expressed by the adjacent image. The process of the color migration algorithm can thus be regarded as a color migration and synthesis process for the image.
The traditional color migration algorithm converts the whole reference image and the whole adjacent image from the RGB color space to the Lab color space, then adjusts, through a linear transformation, the mean and variance of all pixel points of the adjacent image in the Lab color space to be consistent with those of all pixel points of the reference image, and then converts both images back to the RGB color space, thereby completing the color migration of the whole image. Because the traditional color migration algorithm forcibly performs global color migration, the migration effect is poor when the colors of the two images deviate greatly.
In order to solve the above problem, in the embodiment of the present application, since the mapping relationship between the key points of the face in the reference image and the key points of the face in the adjacent image has been established according to the semantic information, in S103, when the linear equation of color migration is calculated, only the local area where the key points of the face having the mapping relationship are located needs to be considered, so that the influence of the global color difference is reduced.
For each adjacent image, the specific process of color correction is shown in fig. 4, which mainly includes the following steps:
s1031: the reference image and the neighboring image are converted from the RGB color space to the Lab color space.
A color space, also called a color model or color system, is used to describe colors in a generally accepted way under certain standards. Commonly used color spaces include RGB, CMY, HSV, HSI, Lab, and so on.
Among them, RGB (red, green, blue) is a color space defined according to the colors recognized by the human eye; it can represent most colors, is composed of a red channel (R), a green channel (G), and a blue channel (B), and is the most common hardware-oriented color model, used for example in color monitors and a large class of color video cameras. However, the RGB color space is rarely used in scientific research because it mixes hue, brightness, and saturation across its three components, which are difficult to separate, making fine digital adjustment difficult.
The Lab color space is used for tone adjustment and color correction. It is the device-independent CIE color model, that is, a color system independent of any particular device and based on physiological characteristics: it describes human visual perception numerically. The Lab color space consists of a lightness channel (L) and two color channels (a and b). The L channel carries only the lightness of the image (in short, a black-and-white version of the image), while the a and b channels carry only color information. The a channel represents the range from magenta (white in the channel) to dark green (black in the channel); the b channel represents the range from yellow (white in the channel) to blue (black in the channel); 50% neutral gray in the a and b channels indicates no color, so the closer to gray, the less color, and the colors in the a and b channels carry no brightness. A visual example of this: a red T-shirt has a very clear outline in the a and b components, because red is composed of magenta and yellow.
Based on these characteristics of the Lab color space, matting out the regions with an obvious color difference against any single-tone background by components can be completed quickly in the Lab color space, and any adjustment to the L channel (such as sharpening or blurring) does not affect the hue. Therefore, in S1031, the reference image and the adjacent image can be converted from the RGB color space to the Lab color space.
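For example, with OpenCV the conversions of S1031 and S1034 might be sketched as follows; floating-point arithmetic is used so that the later mean and standard-deviation adjustments are not clipped to 8 bits, and OpenCV's BGR channel order is assumed.

```python
# Illustrative sketch of S1031 / S1034: BGR <-> Lab conversion in floating point.
import cv2
import numpy as np

def bgr_to_lab(image_bgr):
    """8-bit BGR image -> float32 Lab image (L in [0, 100], a/b roughly in [-127, 127])."""
    return cv2.cvtColor(image_bgr.astype(np.float32) / 255.0, cv2.COLOR_BGR2LAB)

def lab_to_bgr(image_lab):
    """Float Lab image -> 8-bit BGR image."""
    bgr = cv2.cvtColor(image_lab.astype(np.float32), cv2.COLOR_LAB2BGR)
    return np.clip(bgr * 255.0, 0.0, 255.0).astype(np.uint8)
```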
S1032: determining the color attribute of each channel of the face key point pair with the mapping relation in the set neighborhood according to the mapping relation between the face key point in the reference image and the face key point in the adjacent image, and correcting the color value of the corresponding face key point in the adjacent image in each channel according to each color attribute.
In the embodiment of the present application, it is considered that each face key point corresponds to a single pixel point whose color value is easily disturbed by noise; therefore, to improve robustness, in S1032 a local neighborhood of a set size around each face key point (for example, a 9×9 pixel neighborhood) is used in place of the single pixel, and then the color attributes (including the color mean and standard deviation) of the face key points within their respective set neighborhoods are determined according to the mapping relationship between the face key points, so as to correct the color value of each face key point in the adjacent image.
For convenience of description, the face key points in the adjacent images are recorded as first face key points, and the face key points in the reference images are recorded as second face key points. In specific implementation, for each first face key point in the adjacent image, the color correction process is as shown in fig. 5, and mainly includes the following steps:
s1032_1: and determining a second face key point corresponding to the first face key point in the reference image according to the mapping relation between the face key point in the reference image and the face key point in the adjacent image.
In S1032_1, the second face key points having a mapping relationship with the first face key points have the same number and have common semantic information.
S1032_2: determining a first color mean value and a first standard deviation of each pixel point in each channel in a set neighborhood of a first face key point, and determining a second color mean value and a second standard deviation of each pixel point in each channel in a set neighborhood of a second face key point.
In S1032_2, the set neighborhoods of the first face key point and the second face key point each contain a plurality of pixel points, so the color mean and standard deviation of each channel within the set neighborhoods of the first and second face key points can be calculated in the Lab color space.
Suppose that, in the set neighborhood of the first face key point, the first color mean values of the pixel points in the L channel, the a channel, and the b channel are recorded as ml, ma, and mb, and the first standard deviations are recorded as nl, na, and nb; and that, in the set neighborhood of the second face key point, the second color mean values of the pixel points in the L channel, the a channel, and the b channel are recorded as ml', ma', and mb', and the second standard deviations are recorded as nl', na', and nb'.
S1032_3: and correcting the color values of the first face key points according to the first color mean, the second color mean, the first standard deviation and the second standard deviation of each channel.
In specific implementation, referring to fig. 6, the process of correcting the color values of the first face key points in S1032_3 mainly includes the following steps:
s1032_31: and acquiring initial color values of the first face key point in an L channel, an a channel and a b channel respectively.
Suppose that the initial color values of the first face key point in the L channel, the a channel, and the b channel are denoted as l0, a0, and b0, respectively.
S1032_32: and respectively subtracting the first color mean value of the corresponding channel from the initial color values of the L channel, the a channel and the b channel to obtain the median color values of the first face key point in the L channel, the a channel and the b channel.
Wherein the formula of median color values is expressed as follows:
l1 = l0 - ml, a1 = a0 - ma, b1 = b0 - mb (Formula 1)

In Formula 1, l1, a1, and b1 respectively represent the median color values of the first face key point in the L channel, the a channel, and the b channel.
S1032_33: and scaling the median color value of the corresponding channel according to the first standard deviation and the second standard deviation of the L channel, the a channel and the b channel respectively.
In S1032_33, the scaling factor of each channel is the ratio of the second standard deviation to the first standard deviation of that channel in the set neighborhoods, and the scaling formula is expressed as follows:

l2 = (nl'/nl)·l1, a2 = (na'/na)·a1, b2 = (nb'/nb)·b1 (Formula 2)

In Formula 2, l2, a2, and b2 respectively represent the scaled color values of the first face key point in the L channel, the a channel, and the b channel.
S1032_34: and adding the scaled color values of the L channel, the a channel and the b channel to the second color mean value of the corresponding channel to obtain the color values of the corrected first face key point in the L channel, the a channel and the b channel.
The formula for the corrected color values is as follows:
l' = l2 + ml', a' = a2 + ma', b' = b2 + mb' (Formula 3)

In Formula 3, l', a', and b' respectively represent the corrected color values of the first face key point in the L channel, the a channel, and the b channel.
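Taken together, Formulas 1 to 3 move the local Lab statistics of the adjacent image onto those of the reference image around each key point pair. A minimal sketch, assuming integer (x, y) key point coordinates and the 9×9 neighborhood used as an example above, is:

```python
# Illustrative sketch of S1032 (Formulas 1-3): correct one key point's Lab color in the
# adjacent image using the statistics of the matching reference key point.
import numpy as np

def local_stats(lab_image, x, y, half=4):
    """Mean and standard deviation of each Lab channel in a (2*half+1) x (2*half+1) window."""
    h, w = lab_image.shape[:2]
    patch = lab_image[max(0, y - half):min(h, y + half + 1),
                      max(0, x - half):min(w, x + half + 1)].reshape(-1, 3)
    return patch.mean(axis=0), patch.std(axis=0) + 1e-6  # epsilon avoids division by zero

def correct_keypoint_color(nbr_lab, ref_lab, nbr_pt, ref_pt):
    """Return the corrected Lab color of one key point of the adjacent image."""
    m_nbr, n_nbr = local_stats(nbr_lab, nbr_pt[0], nbr_pt[1])  # first color mean / standard deviation
    m_ref, n_ref = local_stats(ref_lab, ref_pt[0], ref_pt[1])  # second color mean / standard deviation
    c0 = nbr_lab[nbr_pt[1], nbr_pt[0]].astype(np.float64)      # initial color values (l0, a0, b0)
    return (c0 - m_nbr) * (n_ref / n_nbr) + m_ref               # Formulas 1, 2, and 3 in one step
```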
s1033: and aiming at each pixel point in the adjacent image, determining the target color value of the pixel point according to the color value of each face key point in the corrected adjacent image.
After local color value correction is carried out on each face key point in the adjacent image, in s1033, a linear conversion equation from the adjacent image to the reference image can be solved according to the color value of each corrected face key point, and then the linear conversion equation is applied to the whole adjacent image, so that the color correction of the whole adjacent image is completed.
In specific implementation, referring to fig. 7, for each pixel point in the adjacent image, the following operations are performed:
s1033_1: and determining the color mean value of each channel corresponding to the pixel point in the self-set neighborhood.
In S1033_1, the set neighborhood size of the pixel is the same as the set neighborhood size of the face key point. Similarly, in order to reduce the noise interference, the color value of the pixel point is replaced by the color mean value of the self-set neighborhood.
S1033_2: and calculating the distance from the pixel point to each face key point in the adjacent image.
In face reconstruction, the illumination environment is fixed, and it is generally assumed that the illumination does not change drastically; that is, the illumination change of the face within the same image can be considered linear. Therefore, after the color migration transformation relation between each key point in the adjacent image and the reference image has been solved, the pixel points at other positions in the adjacent image can be linearly interpolated through distance factors, so that the color of the whole adjacent image is transformed to be close to that of the reference image. The farther a face key point is from the pixel point, the smaller its influence on the color value of that pixel point; accordingly, different influence weights can be set according to the distances between the pixel point and the face key points, and the larger the distance, the smaller the weight.
S1033_3: and determining the target color value of the pixel point according to the color mean value of the pixel point in each channel, the weight corresponding to each distance and the color value of each corrected face key point in each channel.
Wherein, the formula of the target color value is expressed as follows:
l' = ml + Σi ωi·(li' - li), a' = ma + Σi ωi·(ai' - ai), b' = mb + Σi ωi·(bi' - bi) (Formula 4)

In Formula 4, ml, ma, and mb respectively represent the color mean values of the pixel point in the L channel, the a channel, and the b channel within its set neighborhood; i denotes the i-th face key point, i = 1, 2, 3, ...; li, ai, bi and li', ai', bi' respectively represent the color values of the i-th face key point in the L channel, the a channel, and the b channel before and after correction; ωi is the weight corresponding to the distance from the pixel point to the i-th face key point; and l', a', and b' are the target color values of the pixel point in the L channel, the a channel, and the b channel.
After color correction is carried out on each adjacent image, all the adjacent images and the reference image have the same Lab distribution, and the problem of inconsistent luminosity caused by different illumination of the same characteristic point under different viewing angles is solved.
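A possible sketch of S1033 is shown below. It assumes inverse-distance weights normalized per pixel, which is only one way of realizing the qualitative rule that closer key points receive larger weights, and it spreads the per-key-point color shifts over the whole Lab image.

```python
# Illustrative sketch of S1033: propagate the key point corrections to every pixel using
# distance-dependent weights (inverse-distance weighting is an assumption of this sketch).
import numpy as np

def correct_whole_image(nbr_lab, keypoints_xy, colors_before, colors_after, eps=1.0):
    """nbr_lab: (h, w, 3) Lab image of the adjacent image; keypoints_xy: (K, 2) coordinates;
    colors_before / colors_after: (K, 3) Lab colors of the key points before / after S1032."""
    h, w = nbr_lab.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    weights = np.stack([1.0 / (np.hypot(xs - kx, ys - ky) + eps) for kx, ky in keypoints_xy])
    weights /= weights.sum(axis=0, keepdims=True)                    # (K, h, w), sums to 1 per pixel
    shifts = np.asarray(colors_after) - np.asarray(colors_before)    # per-key-point color shift
    shift_map = np.einsum('khw,kc->hwc', weights, shifts)            # weighted shift for every pixel
    return nbr_lab.astype(np.float64) + shift_map
```

In this sketch the shift is added to the raw Lab values; the text above instead replaces each pixel with its neighborhood mean to suppress noise, which can be obtained with a box filter before applying the shift.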
S1034: and converting the reference image and the adjusted adjacent image from the Lab color space to the RGB color space.
In face reconstruction, the RGB image is generally used for texture mapping, so that after the colors of the adjacent image and the reference image are close to each other by the color migration algorithm, the reference image and the adjacent image are converted from the Lab color space to the RGB color space.
For example, fig. 8 shows the effect of adjusting the color of an adjacent image based on the reference image according to the embodiment of the present application: after color adjustment, the whole adjacent image retains the original color information it expressed while taking on the overall color tone of the reference image, so that the adjacent image becomes closer to the reference image.
In the embodiment of the application, the mapping relationship between the face key points in the adjacent image and those in the reference image is used to locally migrate the color values of the key points in the reference image to the corresponding face key points in the adjacent image, so that the colors of the face key points in the adjacent image approach those in the reference image and the influence of the global color difference is reduced. Because faces are generally photographed under ordinary daylight conditions, abrupt changes of color and brightness do not occur; therefore, the stretching deformation of each face key point of the adjacent image in the color space can also be applied to the other, non-key-point pixels of the adjacent image, so that the whole adjacent image is color-corrected. This solves the problem of wrong matching of subsequent feature points caused by different illumination at different viewing angles in multi-view stereoscopic face reconstruction, greatly enriches the number of feature points available for matching, and thus enables high-precision face reconstruction from sparse viewpoints.
S104: camera parameters of the plurality of cameras are determined to align the respective adjacent images with the reference image based on the reference image and the respective adjacent images.
The process of determining the intrinsic and extrinsic parameters of the camera is called camera calibration. Generally, the internal parameters of each camera are calibrated in advance, and only external parameters reflecting the relative pose between the cameras need to be determined during face reconstruction.
In the embodiment of the present application, after color correction is performed on each adjacent image, the face images photographed at the different viewing angles already have very similar color distributions; accordingly, in S104, the alignment of the face images can be completed by the conventional Structure from Motion (SFM) method and Bundle Adjustment (BA) method, so as to obtain the intrinsic and extrinsic parameters of the cameras at the different viewing angles.
The SFM algorithm determines matched 2D feature point pairs directly through the photometric consistency among multiple images without any calibration prior, preliminarily estimates the extrinsic parameters among the cameras from the matched 2D feature point pairs to obtain a 3D point cloud, and then optimizes the camera extrinsic parameters through the BA algorithm. The BA algorithm can also be understood as a minimum reprojection error algorithm: its idea is to optimize a projection error energy function so as to minimize the projection error and thereby obtain the optimal camera extrinsic parameters.
Because there is no calibration prior, the requirement on photometric consistency is high in order to guarantee the calculation accuracy of the matched 2D feature point pairs, so the number of matched 2D feature points is small; the obtained 3D point cloud data is therefore sparse and the face reconstruction precision is low, and the only way to improve the reconstruction precision is to increase the number of cameras, which raises the deployment difficulty and hardware cost. Moreover, since the matched 2D feature point pairs are determined only by photometric consistency, mismatched 2D feature point pairs exist; once there are many mismatched feature point pairs, the optimization result of the camera extrinsic parameters is directly affected.
In order to solve the problem in the conventional SFM + BA algorithm, in the embodiment of the present application, when the face images at each view angle are aligned in S104, the key point pairs of the face having a mapping relationship are added to the projection error energy function as regular terms, so as to improve the robustness of the calibration and obtain more accurate camera external parameters.
In S104, the camera coordinate systems of the face images collected from multiple viewing angles are unified into one coordinate system, so that the alignment of the multi-view face images can be achieved. In particular implementation, in S104, for each adjacent image, the operations shown in fig. 9 are performed:
s104l: according to a plurality of matched 2D feature point pairs in the reference image and the adjacent image, determining initial camera parameters and a first 3D point cloud corresponding to the 2D feature point pairs.
In S1041, 2D feature points in the reference image and the adjacent image are respectively extracted, illumination information such as intensity of each 2D feature point, and mean, variance, etc. of pixels in the surrounding neighborhood is calculated, based on the illumination information, a difference between the 2D feature points in the reference image and the adjacent image is calculated, 2D feature points in the two images, the difference of which is smaller than a certain threshold, are used as a pair of matched 2D feature point pairs, and then, according to the matched plurality of 2D feature point pairs, an initial camera parameter and a first 3D point cloud corresponding to the 2D feature point pairs are determined.
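For illustration, the photometric feature matching of S1041 could be sketched with SIFT descriptors and Lowe's ratio test; SIFT is only an example detector here, not one prescribed by the present application.

```python
# Illustrative sketch of S1041: extract and match 2D feature points between the reference
# image and one adjacent image.
import cv2

def match_features(ref_gray, nbr_gray, ratio=0.75):
    """Return a list of ((x1, y1), (x2, y2)) matched 2D feature point pairs."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(ref_gray, None)
    kp2, des2 = sift.detectAndCompute(nbr_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = []
    for candidates in matcher.knnMatch(des1, des2, k=2):
        if len(candidates) < 2:
            continue
        m, n = candidates
        if m.distance < ratio * n.distance:  # Lowe's ratio test rejects ambiguous matches
            pairs.append((kp1[m.queryIdx].pt, kp2[m.trainIdx].pt))
    return pairs
```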
S1042: and determining a second 3D point cloud corresponding to the face key point pairs according to the plurality of face key point pairs with mapping relations in the reference image and the adjacent image.
Considering that the first 3D point cloud obtained based on the 2D feature point pairs is sparse, in S1042, according to a plurality of face key point pairs having a mapping relationship in the reference image and the adjacent image, a second 3D point cloud corresponding to the face key point pairs is obtained to serve as a matching constraint of the 2D feature points, thereby improving matching accuracy.
S1043: and constructing a projection error energy function of the reference image and the adjacent image according to the initial camera parameters, the first 3D point cloud and the second 3D point cloud.
In S1043, adding the second 3D point cloud obtained from the key point pair of the face having the mapping relationship as a regular term into the conventional projection error energy function, and giving a weight to the regular term to adjust the influence of the regular term on the entire projection error energy function. The projection error energy function is formulated as follows:
E(P, M) = Σj Σi∈V(j) ‖Pi(Mj) - mi,j‖² + λ·Σn Σi∈V(n) ‖Pi(Mn) - mi,n‖² (Formula 5)

Wherein Mj represents the j-th 3D point in the first 3D point cloud; V(j) represents the set of all cameras that can see the j-th 3D point in the first 3D point cloud; Pi represents the initial camera parameters of the i-th camera; mi,j represents the pixel point corresponding to the j-th 3D point of the first 3D point cloud in the i-th image; Mn represents the n-th 3D point in the second 3D point cloud; V(n) represents the set of all cameras that can see the n-th 3D point in the second 3D point cloud; mi,n represents the pixel point corresponding to the n-th 3D point of the second 3D point cloud in the i-th image; λ represents the weight of the regular term; and E(P, M) represents the projection error energy function.
When the initial camera parameters of the ith camera are optimized, the projection error energy function shown in the formula 5 considers the sparse first 3D point cloud obtained by the SFM algorithm and also considers semantic matching added by the key points of the human face, so that the matching accuracy is improved. The larger the lambda is, the larger the influence of the result of semantic matching of the face key points on the optimization of the camera parameters is.
S1044: and optimizing a projection error energy function to reduce the projection error, and taking the camera parameter corresponding to the minimum projection error as a target camera parameter.
In S1044, the projection error energy function is optimized to reduce the projection error, so as to obtain the target camera parameters that minimize the projection error.
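A simplified sketch of minimizing Formula 5 for a single camera's extrinsic parameters is given below. It assumes an undistorted pinhole model with known intrinsics K and uses SciPy's least_squares, with λ entering as a scale on the key-point residuals; a full implementation would also jointly refine the 3D points of both point clouds and all cameras.

```python
# Illustrative sketch of S1043/S1044: reprojection error over SFM feature points plus a
# lambda-weighted regular term over face key point pairs, minimized for one camera's pose.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(rvec, tvec, K, pts3d):
    """Pinhole projection of (N, 3) world points with rotation vector rvec and translation tvec."""
    cam = (Rotation.from_rotvec(rvec).apply(pts3d) + tvec) @ K.T
    return cam[:, :2] / cam[:, 2:3]

def residuals(params, K, feat3d, feat2d, kp3d, kp2d, lam):
    rvec, tvec = params[:3], params[3:6]
    r_feat = (project(rvec, tvec, K, feat3d) - feat2d).ravel()   # first 3D point cloud term
    r_kp = (project(rvec, tvec, K, kp3d) - kp2d).ravel()         # face key point regular term
    return np.concatenate([r_feat, np.sqrt(lam) * r_kp])         # least_squares sums the squares

def refine_extrinsics(rvec0, tvec0, K, feat3d, feat2d, kp3d, kp2d, lam=1.0):
    """Refine one camera's extrinsic parameters starting from the SFM initial estimate."""
    x0 = np.concatenate([rvec0, tvec0])
    result = least_squares(residuals, x0, args=(K, feat3d, feat2d, kp3d, kp2d, lam))
    return result.x[:3], result.x[3:6]
```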
s105: and reconstructing a 3D face model by adopting a multi-view stereoscopic vision algorithm according to the reference image and each adjacent image.
After the reference image is aligned with each adjacent image, three-dimensional face reconstruction is performed with the MVS algorithm according to the target camera parameters of the cameras, the color-corrected adjacent images, and the reference image. According to the form of their output, MVS algorithms are mainly divided into those that directly generate point clouds, such as PMVS (Patch-based Multi-View Stereo), and those that generate depth maps, such as MVSNet (Depth Inference for Unstructured Multi-view Stereo).
In S105, because each adjacent image has undergone color correction, the photometric consistency of the same feature points across the viewing angles is ensured; during MVS reconstruction, the number of matched point pairs can therefore be greatly increased and a dense 3D point cloud obtained, which improves the accuracy and density of the reconstructed 3D face model.
As shown in fig. 10 and 11, the effect graphs reconstructed by the face reconstruction method provided in the embodiment of the present application are shown, where fig. 10 is a reconstructed face geometric model, and fig. 11 is a 3D face model obtained by texture mapping on the face geometric model. As can be seen from fig. 10, the high-precision 3D face model can be reconstructed from the sparse viewpoint, and the reconstruction cost and difficulty of the multi-view stereoscopic vision are reduced.
According to the face reconstruction method based on key point identification provided by the embodiment of the present application, the face image at the frontal viewing angle among a plurality of viewing angles is used as the reference image, the remaining face images are used as adjacent images, and the face key points in each image are extracted separately. The colors of the face key points in each adjacent image are locally corrected according to the mapping relation between the face key points in the reference image and those in the adjacent image, and the key point corrections are then transferred to the whole image. This solves the problem of wrong matching of subsequent feature points caused by different illumination at different viewing angles, enriches the number of matched feature point pairs, and thus allows the number of viewing angles to be reduced, so that a high-precision face model can be reconstructed from sparse viewpoints (within ten) and the hardware cost is reduced. Furthermore, when the target camera parameters of the plurality of cameras are determined, the face key point pairs having a mapping relation are added to the projection error energy function as a regular term, so that the semantic matching between face key points serves as a constraint condition; this improves the matching accuracy and yields optimal target camera parameters, and the multi-view 3D face model is then reconstructed by combining the color-corrected adjacent images and the reference image. Because the face key points are introduced as auxiliary information in the reconstruction process, the high-precision characteristic of pixel-by-pixel reconstruction of the multi-view stereoscopic vision algorithm can be retained under sparse viewing angles.
Based on the same technical concept, the present application provides a reconstruction device, which may be a client such as a notebook computer, a desktop computer, a smart phone, a tablet, VR glasses, and AR glasses having data processing capability, or a server for implementing an interaction process, including but not limited to a micro server, a cloud server, a server cluster, and the like. When the reconstruction device is a client, a camera of the client can be directly used, and when the reconstruction device is a server, a plurality of RGB cameras can be independently deployed.
The reconstruction device can implement the steps of the face reconstruction method based on the key point identification in the above embodiment, and can achieve the same technical effects, which are not described herein again.
Referring to fig. 12, the reconstruction device includes a processor 1201, a memory 1202, a communication interface 1203, and a display 1204, the communication interface 1203, the memory 1202, and the processor 1201 are connected by a bus 1205;
the memory 1202 stores a computer program according to which the processor 1201 performs the following operations:
acquiring face images acquired by a camera under multiple viewing angles through the communication interface 1203, taking the face image under the front viewing angle as a reference image, and taking the face images under the other viewing angles as adjacent images;
respectively identifying face key points in the face image under each visual angle;
correcting color values of pixel points in the adjacent images by using the color values of the pixel points in the reference images according to the mapping relation between the key points of the human face in the reference images and the key points of the human face in the adjacent images aiming at each adjacent image;
determining camera parameters of a plurality of cameras according to the reference image and each adjacent image to align each adjacent image with the reference image;
and reconstructing a 3D face model by adopting a multi-view stereoscopic vision algorithm according to the reference image and each adjacent image, and displaying the 3D face model through the display screen 1204.
Optionally, the processor 1201 corrects, according to a mapping relationship between a key point of a face in the reference image and a key point of a face in the adjacent image, a color value of a pixel in the adjacent image by using a color value of the pixel in the reference image, and the specific operation is:
converting the reference image and the neighboring image from an RGB color space to a Lab color space;
determining the color attribute of each channel of the face key point pair with the mapping relation in each set neighborhood according to the mapping relation between the face key point in the reference image and the face key point in the adjacent image, and correcting the color value of the corresponding face key point in the adjacent image in each channel according to each color attribute;
aiming at each pixel point in the adjacent image, determining a target color value of the pixel point according to the color value of each face key point in the adjacent image after correction;
and converting the reference image and the adjusted adjacent image from a Lab color space to an RGB color space.
Optionally, the color attributes include a color mean and a standard deviation, the processor 1201 determines, according to a mapping relationship between a face key point in the reference image and a face key point in the adjacent image, a color attribute of each channel of the face key point pair having the mapping relationship in each set neighborhood, and corrects a color value of a corresponding face key point in the adjacent image in each channel according to each color attribute, specifically:
for each first face keypoint in the neighboring image, performing the following operations:
determining a second face key point corresponding to the first face key point in the reference image according to a mapping relation between the face key point in the reference image and the face key point in the adjacent image;
determining a first color mean value and a first standard deviation of each pixel point in a set neighborhood of the first face key point in each channel, and determining a second color mean value and a second standard deviation of each pixel point in a set neighborhood of the second face key point in each channel;
correcting the color values of the first face key points according to the first color mean, the second color mean, the first standard deviation and the second standard deviation of each channel.
Optionally, the processor 1201 corrects the color value of the first face keypoint according to the first color mean, the second color mean, the first standard deviation and the second standard deviation of each channel, and specifically operates as:
acquiring initial color values of the first face key point in an L channel, an a channel and a b channel respectively;
respectively subtracting the first color mean values of the corresponding channels from the initial color values of the L channel, the a channel and the b channel to obtain median color values of the first face key point in the L channel, the a channel and the b channel;
scaling the median color value of the corresponding channel according to the first standard deviation and the second standard deviation of the L channel, the a channel and the b channel respectively;
and adding the scaled color values of the L channel, the a channel and the b channel to a second color mean value of the corresponding channel to obtain the color values of the first face key point in the L channel, the a channel and the b channel after correction.
Optionally, for each pixel point in the adjacent image, the processor 1201 determines the target color value of the pixel point according to the color values of the face key points in the adjacent image after correction, and the specific operations are as follows:
determining the color mean value of each channel corresponding to the pixel point in the self-set neighborhood;
calculating the distance from the pixel point to each face key point in the adjacent image;
and determining the target color value of the pixel point according to the color mean value of the pixel point in each channel, the weight corresponding to each distance and the color value of each corrected face key point in each channel.
Optionally, the calculation formula of the target color value of the pixel point is as follows:
l' = ml + Σi ωi·(li' - li), a' = ma + Σi ωi·(ai' - ai), b' = mb + Σi ωi·(bi' - bi)

wherein l', a', and b' respectively represent the target color values of the pixel point in the L channel, the a channel, and the b channel; li, ai, bi and li', ai', bi' respectively represent the color values of the i-th face key point in the L channel, the a channel, and the b channel before and after correction; ml, ma, and mb respectively represent the color mean values of the pixel point in the L channel, the a channel, and the b channel within its set neighborhood; i denotes the index of a face key point, i = 1, 2, 3, ...; and ωi represents the weight corresponding to the distance from the pixel point to the i-th face key point.
Optionally, the processor 1201 determines camera parameters of a plurality of cameras according to the reference image and each adjacent image, and specifically operates as follows:
for each adjacent image, the following operations are performed:
determining initial camera parameters and a first 3D point cloud corresponding to the 2D feature point pairs according to the reference image and the plurality of matched 2D feature point pairs in the adjacent image;
determining a second 3D point cloud corresponding to the face key point pairs according to the plurality of face key point pairs with mapping relations in the reference image and the adjacent image;
constructing a projection error energy function of the reference image and the adjacent image according to the initial camera parameters, the first 3D point cloud and the second 3D point cloud;
and optimizing the projection error energy function to reduce the projection error, and taking the camera parameter corresponding to the minimum projection error as a target camera parameter.
Optionally, the projection error energy function formula is as follows:
E(P, M) = Σ_j Σ_{i∈V(j)} ‖P_i(M_j) − m_j^i‖² + λ Σ_n Σ_{i∈V(n)} ‖P_i(M_n) − m_n^i‖²
wherein M_j represents the j-th 3D point in the first 3D point cloud, V(j) represents the set of all cameras that can see the j-th 3D point in the first 3D point cloud, P_i represents the initial camera parameters of the i-th camera, P_i(M_j) represents the projection of M_j by the i-th camera, m_j^i represents the pixel point corresponding to the j-th 3D point of the first 3D point cloud in the i-th image, M_n represents the n-th 3D point in the second 3D point cloud, V(n) represents the set of all cameras that can see the n-th 3D point in the second 3D point cloud, m_n^i represents the pixel point corresponding to the n-th 3D point of the second 3D point cloud in the i-th image, λ represents the weight of the regularization term, and E(P, M) represents the projection error energy function.
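For illustration only, the sketch below minimizes a projection-error objective of this shape for a single adjacent camera using OpenCV and SciPy, with known intrinsics K and the face-key-point term weighted by λ; the single-camera simplification, the parameterization (rotation vector plus translation) and the function names are assumptions, not the patent's implementation.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def projection_residuals(cam_params, pts_feat, obs_feat, pts_kp, obs_kp, K, lam):
    """Residuals of a projection-error objective of the above shape for one camera:
    a 2D-feature-point term plus a face-key-point term weighted by sqrt(lam)."""
    pts_feat = np.asarray(pts_feat, dtype=np.float64)
    pts_kp = np.asarray(pts_kp, dtype=np.float64)
    rvec, tvec = cam_params[:3], cam_params[3:6]
    proj_feat, _ = cv2.projectPoints(pts_feat, rvec, tvec, K, None)
    proj_kp, _ = cv2.projectPoints(pts_kp, rvec, tvec, K, None)
    r_feat = (proj_feat.reshape(-1, 2) - obs_feat).ravel()
    r_kp = np.sqrt(lam) * (proj_kp.reshape(-1, 2) - obs_kp).ravel()
    return np.concatenate([r_feat, r_kp])

def refine_camera(init_rvec, init_tvec, pts_feat, obs_feat, pts_kp, obs_kp, K, lam=1.0):
    """Optimise the camera parameters so the projection error is reduced; the
    parameters at the minimum play the role of the target camera parameters."""
    x0 = np.concatenate([np.ravel(init_rvec), np.ravel(init_tvec)])
    result = least_squares(projection_residuals, x0,
                           args=(pts_feat, obs_feat, pts_kp, obs_kp, K, lam))
    return result.x[:3], result.x[3:6]  # refined rotation vector and translation
```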
Optionally, the mapping relationship between the face key points in the reference image and the face key points in the adjacent images is established based on semantic information of the face key points.
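In practice, "semantic information" usually means that both detectors emit landmarks in a fixed, named order, so the mapping is simply index-to-index; a trivial sketch under that assumption follows (the 68-landmark scheme is illustrative, not from the patent).

```python
def semantic_keypoint_mapping(n_landmarks=68):
    """Key points with the same semantic label (e.g. 'left eye outer corner') share
    an index in a fixed landmark scheme, so the reference-to-adjacent mapping is the
    identity on indices. The 68-landmark scheme is assumed for illustration."""
    return {i: i for i in range(n_landmarks)}
```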
It should be noted that fig. 12 shows, by way of example only, the hardware necessary for the reconstruction device to implement the steps of the face reconstruction method based on key point recognition provided by the embodiments of the present application. In addition to the hardware shown, the reconstruction device may also include conventional components that are not shown, such as a microphone, a speaker, an audio/video processor, and a power supply.
The processor referred to in fig. 12 of the present application may be a Central Processing Unit (CPU), a general-purpose processor, a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
Referring to fig. 13, which is a functional structure diagram of a reconstruction apparatus provided in an embodiment of the present application, the reconstruction apparatus mainly includes an image acquisition module 1301, a key point identification module 1302, a color correction module 1303, an image alignment module 1304, and a reconstruction module 1305, where:
the image acquisition module 1301 is configured to acquire face images acquired by cameras at multiple viewing angles, take the face image at the front viewing angle as a reference image, and take the face images at the other viewing angles as adjacent images;
a key point identification module 1302, configured to identify key points of a face in the face image at each view angle respectively;
the color correction module 1303 is configured to correct, for each adjacent image, color values of pixels in the adjacent image by using color values of pixels in the reference image according to a mapping relationship between key points of a face in the reference image and key points of the face in the adjacent image;
an image alignment module 1304 for determining camera parameters of a plurality of cameras based on the reference image and each adjacent image to align each adjacent image with the reference image;
and a reconstruction module 1305, configured to reconstruct a 3D face model according to the reference image and each of the neighboring images by using a multi-view stereo vision algorithm.
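Read together, the five modules form a pipeline; the following is a hypothetical Python sketch of how they could be chained, with the step functions passed in as callables since the patent does not define their concrete interfaces.

```python
def reconstruct_face(images_by_view, front_view_key,
                     detect_keypoints, correct_colors, estimate_cameras, run_mvs):
    """Hypothetical end-to-end flow of modules 1301-1305: image acquisition,
    key point identification, color correction, image alignment (camera
    parameter estimation) and multi-view stereo reconstruction."""
    reference = images_by_view[front_view_key]                  # front-view reference image
    adjacents = {v: img for v, img in images_by_view.items() if v != front_view_key}

    keypoints = {v: detect_keypoints(img) for v, img in images_by_view.items()}
    corrected = {v: correct_colors(reference, img, keypoints[front_view_key], keypoints[v])
                 for v, img in adjacents.items()}
    cameras = estimate_cameras(reference, corrected, keypoints)
    return run_mvs(reference, corrected, cameras)               # reconstructed 3D face model
```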
For the specific implementation of each functional module, reference is made to the foregoing embodiments, and details are not repeated here.
The embodiments of the present application further provide a computer-readable storage medium for storing instructions which, when executed, perform the face reconstruction method based on key point recognition in the foregoing embodiments.
The embodiment of the present application further provides a computer program product, configured to store a computer program, where the computer program is configured to execute the method for reconstructing a human face based on keypoint recognition in the foregoing embodiment.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A face reconstruction method based on key point recognition is characterized by comprising the following steps:
acquiring face images acquired by cameras under multiple viewing angles, taking the face image under the front viewing angle as a reference image, and taking the face images under the other viewing angles as adjacent images;
respectively identifying face key points in the face image at each view angle;
correcting color values of pixel points in the adjacent images by using the color values of the pixel points in the reference images according to the mapping relation between the key points of the human face in the reference images and the key points of the human face in the adjacent images aiming at each adjacent image;
determining camera parameters of a plurality of cameras according to the reference image and each adjacent image to align each adjacent image with the reference image;
and reconstructing a 3D face model by adopting a multi-view stereoscopic vision algorithm according to the reference image and each adjacent image.
2. The method of claim 1, wherein the correcting the color values of the pixels in the neighboring image according to the mapping relationship between the key points of the face in the reference image and the key points of the face in the neighboring image by using the color values of the pixels in the reference image comprises:
converting the reference image and the neighboring image from an RGB color space to a Lab color space;
determining the color attribute of each channel of the face key point pair with the mapping relation in each set neighborhood according to the mapping relation between the face key point in the reference image and the face key point in the adjacent image, and correcting the color value of the corresponding face key point in the adjacent image in each channel according to each color attribute;
aiming at each pixel point in the adjacent image, determining a target color value of the pixel point according to the color value of each face key point in the adjacent image after correction;
and converting the reference image and the adjusted adjacent image from a Lab color space to an RGB color space.
3. The method of claim 2, wherein the color attributes comprise a color mean and a standard deviation, and wherein determining the color attribute of each channel of the mapped face key point pair in the respective set neighborhood according to the mapping relationship between the face key point in the reference image and the face key point in the neighboring image, and correcting the color value of the corresponding face key point in the neighboring image in each channel according to the color attributes comprises:
for each first face keypoint in the neighboring image, performing the following operations:
determining a second face key point corresponding to the first face key point in the reference image according to a mapping relation between the face key point in the reference image and the face key point in the adjacent image;
determining a first color mean value and a first standard deviation of each pixel point in a set neighborhood of the first face key point in each channel, and determining a second color mean value and a second standard deviation of each pixel point in a set neighborhood of the second face key point in each channel;
correcting color values of the first face key points according to the first color mean, the second color mean, the first standard deviation and the second standard deviation of each channel.
4. The method of claim 3, wherein said rectifying color values of said first face keypoints according to said first color mean, said second color mean, said first standard deviation, and said second standard deviation for each channel comprises:
acquiring initial color values of the first face key point in an L channel, an a channel and a b channel respectively;
respectively subtracting the first color mean values of the corresponding channels from the initial color values of the L channel, the a channel and the b channel to obtain median color values of the first face key point in the L channel, the a channel and the b channel;
scaling the median color value of the corresponding channel according to the first standard deviation and the second standard deviation of the L channel, the a channel and the b channel respectively;
and adding the scaled color values of the L channel, the a channel and the b channel to a second color mean value of the corresponding channel to obtain the color values of the first face key point in the L channel, the a channel and the b channel after correction.
5. The method of claim 2, wherein the determining, for each pixel point in the adjacent image, a target color value of the pixel point according to the color value of each face key point in the adjacent image after correction comprises:
determining the color mean value of the pixel point in each channel within its own set neighborhood;
calculating the distance from the pixel point to each face key point in the adjacent image;
and determining the target color value of the pixel point according to the color mean value of the pixel point in each channel, the weight corresponding to each distance and the color value of each corrected face key point in each channel.
6. The method of claim 5, wherein the target color value of the pixel is calculated as:
[Formula: the target color values L', a', b' are computed from the corrected key-point color values L_i', a_i', b_i', the weights w_i corresponding to the key-point distances, and the pixel's neighborhood color mean values mL, ma, mb.]
wherein L', a', b' respectively represent the target color values of the pixel point in the L channel, the a channel and the b channel; L_i', a_i', b_i' respectively represent the corrected color values of the i-th face key point in the L channel, the a channel and the b channel; mL, ma and mb respectively represent the color mean values of the pixel point in the L channel, the a channel and the b channel within its set neighborhood; i indexes the face key points, i = 1, 2, 3, ...; and w_i represents the weight corresponding to the distance from the pixel point to the i-th face key point.
7. The method of claim 1, wherein determining camera parameters for a plurality of cameras based on the reference image and respective adjacent images comprises:
for each adjacent image, the following operations are performed:
determining initial camera parameters and a first 3D point cloud corresponding to the 2D feature point pairs according to the reference image and the plurality of matched 2D feature point pairs in the adjacent image;
determining a second 3D point cloud corresponding to the face key point pairs according to the plurality of face key point pairs with mapping relations in the reference image and the adjacent image;
constructing a projection error energy function of the reference image and the adjacent image according to the initial camera parameters, the first 3D point cloud and the second 3D point cloud;
and optimizing the projection error energy function to reduce the projection error, and taking the camera parameter corresponding to the minimum projection error as a target camera parameter.
8. The method of claim 7, wherein the projection error energy function is formulated as follows:
E(P, M) = Σ_j Σ_{i∈V(j)} ‖P_i(M_j) − m_j^i‖² + λ Σ_n Σ_{i∈V(n)} ‖P_i(M_n) − m_n^i‖²
wherein M_j represents the j-th 3D point in the first 3D point cloud, V(j) represents the set of all cameras that can see the j-th 3D point in the first 3D point cloud, P_i represents the initial camera parameters of the i-th camera, P_i(M_j) represents the projection of M_j by the i-th camera, m_j^i represents the pixel point corresponding to the j-th 3D point of the first 3D point cloud in the i-th image, M_n represents the n-th 3D point in the second 3D point cloud, V(n) represents the set of all cameras that can see the n-th 3D point in the second 3D point cloud, m_n^i represents the pixel point corresponding to the n-th 3D point of the second 3D point cloud in the i-th image, λ represents the weight of the regularization term, and E(P, M) represents the projection error energy function.
9. The method according to any one of claims 1 to 8, wherein the mapping relationship between the face key points in the reference image and the face key points in the adjacent images is established based on semantic information of the face key points.
10. A reconstruction device comprising a processor, a memory, a communication interface and a display screen, wherein said communication interface, said display screen, said memory and said processor are connected by a bus;
the memory stores a computer program, and the processor performs the following operations according to the computer program:
acquiring, through the communication interface, face images acquired by cameras under multiple viewing angles, taking the face image under the front viewing angle as a reference image, and taking the face images under the other viewing angles as adjacent images;
respectively identifying face key points in the face image under each visual angle;
correcting color values of pixel points in the adjacent images by using color values of the pixel points in the reference images according to the mapping relation between the key points of the faces in the reference images and the key points of the faces in the adjacent images aiming at each adjacent image;
determining camera parameters of a plurality of cameras according to the reference image and each adjacent image to align each adjacent image with the reference image;
and reconstructing a 3D face model by adopting a multi-view stereoscopic vision algorithm according to the reference image and each adjacent image, and displaying through the display screen.
CN202210932433.5A 2022-08-04 2022-08-04 Face reconstruction method and device based on key point recognition Pending CN115239885A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210932433.5A CN115239885A (en) 2022-08-04 2022-08-04 Face reconstruction method and device based on key point recognition

Publications (1)

Publication Number Publication Date
CN115239885A (en) 2022-10-25

Family

ID=83678381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210932433.5A Pending CN115239885A (en) 2022-08-04 2022-08-04 Face reconstruction method and device based on key point recognition

Country Status (1)

Country Link
CN (1) CN115239885A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115409953A (en) * 2022-11-02 2022-11-29 汉斯夫(杭州)医学科技有限公司 Multi-camera color consistency-based maxillofacial reconstruction method, equipment and medium
CN117974902A (en) * 2024-02-26 2024-05-03 杭州万物互云科技有限公司 Digital three-dimensional face modeling method
CN117974902B (en) * 2024-02-26 2024-07-02 杭州万物互云科技有限公司 Digital three-dimensional face modeling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination