WO2024109772A1 - Face posture estimation method and apparatus based on structured light system - Google Patents

Face posture estimation method and apparatus based on structured light system

Info

Publication number
WO2024109772A1
WO2024109772A1 (PCT/CN2023/133069)
Authority
WO
WIPO (PCT)
Prior art keywords
key points
real-time
standard
Prior art date
Application number
PCT/CN2023/133069
Other languages
French (fr)
Chinese (zh)
Inventor
宋展
汪奕鋆
叶于平
赵娟
Original Assignee
中国科学院深圳先进技术研究院 (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2024109772A1 publication Critical patent/WO2024109772A1/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; salient regional features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions

Definitions

  • the present invention belongs to the technical field of computer vision, and in particular relates to a method and device for estimating human face posture based on a structured light system.
  • Face pose estimation is an important subfield of computer vision. Its main purpose is to obtain the orientation of the face, which can be represented by a rotation matrix, a rotation vector, a quaternion or Euler angles, and these representations can be converted into one another. Generally speaking, Euler angles are more intuitive and more commonly used. The Euler angles are pitch, yaw and roll, which generally correspond to raising the head, shaking the head and tilting the head.
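The conversion from a rotation matrix to the Euler angles mentioned above can be sketched as follows. This is an illustrative decomposition under an assumed ZYX (roll·yaw·pitch) convention, not code from the patent; axis conventions vary between systems.

```python
import math

def matrix_to_euler_zyx(R):
    """Decompose a 3x3 rotation matrix R = Rz(roll) @ Ry(yaw) @ Rx(pitch)
    into (pitch, yaw, roll) in degrees."""
    sy = math.sqrt(R[0][0] ** 2 + R[1][0] ** 2)
    if sy > 1e-6:
        pitch = math.atan2(R[2][1], R[2][2])  # rotation about x (raising the head)
        yaw = math.atan2(-R[2][0], sy)        # rotation about y (shaking the head)
        roll = math.atan2(R[1][0], R[0][0])   # rotation about z (tilting the head)
    else:  # gimbal lock: yaw near +/-90 degrees
        pitch = math.atan2(-R[1][2], R[1][1])
        yaw = math.atan2(-R[2][0], sy)
        roll = 0.0
    return tuple(math.degrees(a) for a in (pitch, yaw, roll))
```

For the identity matrix all three angles are zero, and a pure rotation about the y axis yields only a yaw angle.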
  • face pose estimation has been widely used in many fields of production and life, such as biometric recognition and human-computer interaction.
  • In face recognition, because people's facial postures are complex and variable during actual recognition, face recognition technology combined with face pose estimation can perform recognition after pose correction, greatly improving the accuracy of face recognition.
  • face pose estimation can obtain the driver's facial posture, that is, the face orientation, in real time. When the orientation exceeds the threshold, it can be considered as fatigue driving or encountering special circumstances, and an early warning can be issued in time.
  • face pose estimation can provide corresponding information for face reconstruction, which can widely improve people's experience in games, social networking, film and television, and other fields.
  • Face pose estimation is challenging due to the diversity of facial appearance, as well as changes in face angles, facial expressions, facial texture differences, uneven lighting, and face occlusions.
  • the purpose of the embodiments of this specification is to provide a method and device for estimating facial posture based on a structured light system.
  • the present application provides a method for estimating face posture based on a structured light system, the method comprising:
  • Obtain a frontal zero-pose face image of the object to be tested; use the three-dimensional point cloud of the frontal zero-pose face image as a standard model; select 2D key points of the frontal zero-pose face image to obtain standard 2D key points, and obtain standard 3D key points corresponding to the standard 2D key points according to the structured light system;
  • Collect the two-dimensional face image of the object to be tested in real time; select 2D key points of the two-dimensional face image to obtain real-time 2D key points, and obtain the real-time 3D key points corresponding to the real-time 2D key points according to the structured light system;
  • selecting 2D key points from a frontal zero-pose face image or a two-dimensional face image includes:
  • determining a 3D key point cloud based on standard 3D key points and real-time 3D key points includes:
  • determining a 3D key point cloud based on standard 3D key points, real-time 3D key points and initial poses includes:
  • the complete 3D point cloud corresponding to the 2D face image of the object to be tested is used as the search object.
  • the real-time 3D key point is used as the center, and a search is performed within a sphere with a preset radius, and the searched 3D face points are added to a candidate set, which includes the real-time key point and the searched 3D face points;
  • For each point in the candidate set, perform point cloud registration transformation based on the initial pose, and calculate the distance from each transformed point to the nearest point in the complete 3D point cloud of the standard model. When the distance is less than the threshold, add the corresponding point in the candidate set to the first consistent set, and add the corresponding nearest point in the complete 3D point cloud of the standard model to the second consistent set;
  • the first consistent set and the second consistent set constitute a 3D keypoint cloud.
  • determining the precise pose based on the 3D key point cloud includes:
  • the first consistent set and the second consistent set are precisely aligned to determine the precise pose.
  • the Trimmed ICP algorithm is used to perform precise alignment operations on the first consistent set and the second consistent set.
  • the structured light system includes an infrared camera, an infrared projector and a terminal device, and the infrared camera and the infrared projector are respectively connected to the terminal device.
  • the present application provides a face posture estimation device based on a structured light system, the device comprising:
  • the first selection module is used to obtain a frontal zero-pose face image of the object to be tested, use the three-dimensional point cloud of the frontal zero-pose face image as a standard model, select 2D key points of the frontal zero-pose face image to obtain standard 2D key points, and obtain standard 3D key points corresponding to the standard 2D key points according to the structured light system;
  • the second selection module is used to collect the two-dimensional face image of the object to be tested in real time, select 2D key points of the two-dimensional face image, obtain real-time 2D key points, and obtain real-time 3D key points corresponding to the real-time 2D key points according to the structured light system;
  • a first determination module is used to determine a 3D key point cloud according to standard 3D key points and real-time 3D key points;
  • the second determination module is used to determine the precise pose based on the 3D key point cloud.
  • the present application provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, the method for estimating facial posture based on a structured light system as in the first aspect is implemented.
  • the present application provides a readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method for estimating facial posture based on a structured light system as in the first aspect.
  • the solution uses a structured light system to estimate facial posture. This method is fast, real-time, accurate and stable.
  • FIG1 is a schematic diagram of the structure of a structured light system
  • FIG2 is a schematic diagram of a coded pattern projected by a projector in a structured light system
  • FIG3 is a schematic diagram of a mathematical model of a structured light system
  • FIG4 is a specific schematic diagram of using a structured light system
  • FIG5 is a schematic diagram of a flow chart of a method for estimating facial posture based on a structured light system provided in the present application
  • FIG6 is a schematic diagram of the 68 facial feature point model detected by Dlib
  • FIG7 is a schematic diagram of the nose as a 2D key point
  • FIG8 is a schematic diagram of the structure of a face posture estimation device based on a structured light system provided by the present application.
  • FIG. 9 is a schematic diagram of the structure of an electronic device provided in this application.
  • Existing face pose estimation generally includes model-based methods, feature regression-based methods, classification-based methods, methods based on the geometric relationship of facial key points, and also includes emerging methods such as subspace learning-based methods.
  • the model-based method extracts 2D feature points from the face area in the two-dimensional image and establishes a corresponding relationship with the 3D feature points of the three-dimensional face model to estimate the face posture.
  • the face posture obtained by this method is a continuous value with high accuracy, so it has become a commonly used method.
  • However, the three-dimensional average face model commonly used in this type of method usually differs from the actual face in the two-dimensional image, so pose estimation accuracy is low and robustness poor for large-angle deviations and exaggerated expressions of the face.
  • The method based on feature regression obtains the mapping relationship from image space to posture space through machine learning, that is, by building a mathematical model (mainly a neural network) trained on a large number of face images with known postures, a mapping relationship is established to determine the face posture of a sample.
  • this correspondence requires a large number of data sets to verify, and interpolation is often required in the process of image processing, which requires a lot of calculations. It is also greatly affected by the face detection and positioning results and is not robust enough.
  • the classification-based method divides the facial posture into different categories within a certain range, and classifies the samples to be determined.
  • This type of method quantifies the head posture space into several discrete points, and prepares several templates for each posture, and then compares the samples to be determined with the templates one by one.
  • the head posture corresponding to the template with the highest matching score is the classification result.
  • the specific methods can be divided into shape template-based, detection-based or local constraint model-based methods.
  • the physical quantities that need to be compared with the templates are image texture, posture detector, and sub-organ detector set arranged in a certain topology.
  • The results obtained by this type of method are discrete values; it has high time complexity, low efficiency and large errors, and its real-time performance is difficult to guarantee.
  • The method based on the geometric relationship of facial key points first determines the locations of the facial key points, and then estimates the face posture from the relative positions of the key points under certain geometric constraints.
  • This type of method is relatively simple and has low time complexity, but it is very sensitive to occlusion and missing key points and has poor stability.
  • The method based on subspace learning regards the posture space of the face as a natural three-dimensional space, which can be viewed as a three-dimensional posture manifold embedded in the high-dimensional image space. This type of method is relatively new, but currently has high time complexity, low accuracy and poor practicality.
  • the present application proposes a face pose estimation method based on a structured light system, which can perform high-precision and robust estimation of face pose.
  • Structured light technology, as a mature active 3D information acquisition technology, has the advantages of non-contact operation, high precision, good real-time performance, low cost, a large field of view and strong anti-interference ability.
  • 3D reconstruction technology based on structured light system is an active 3D reconstruction method.
  • the structured light system replaces a camera in the traditional binocular vision method with a projector.
  • This application uses an infrared projector, an infrared camera (hereinafter simply the camera) and a terminal device (such as the computer in Figure 1) to build a dynamic structured light system.
  • The structured light system uses the template projected by the coded projector to solve the corresponding-point matching problem that is difficult to solve in binocular vision, and then uses the calibrated camera and projector to obtain the object's three-dimensional information via the triangulation principle.
  • the structured light system of this application is based on the principle of time coding, specifically a stripe binary coding method based on Gray code plus line shift coding.
  • the basic idea is to first make a series of coded patterns according to the coding principle, then use a projector to project the sequence pattern onto the target surface, and then decode the target image with the coded pattern.
  • the point cloud scanned and calculated using this method not only has a high spatial resolution, but also has a high reconstruction accuracy.
  • the Gray code plus line shift coding method adopted in this application achieves the coding purpose by continuously projecting multiple coding patterns to the target to be tested.
  • the general method for finding the coded value of each pixel is to first determine the edge position of the stripe in the image, then determine the stripe condition based on the pixel values on both sides of the stripe edge, and then determine the code value corresponding to each pixel based on the stripe condition.
  • the image stripe edge detection can use, for example, the zero crossing detection method, which can achieve sub-pixel detection accuracy.
  • If grayΔ < 0, the pixel value of the stripe on the left of edge L2 is smaller than that of the stripe on its right, so the left side of L2 corresponds to the black stripe and the right side to the white stripe. If grayΔ > 0, the stripes are reversed, and the stripe condition is determined accordingly.
  • P is the absolute decoding value
  • G is the Gray code decoding value
  • S is the line shift decoding value
  • n is the number of Gray code coding patterns
  • m is the number of line shifts.
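As an illustration of the decoding step, a Gray-code value can be converted to an ordinary binary ordinal with a simple XOR fold. The combination formula below (P = G·m + S) is an assumption for illustration: the patent defines P, G, S, n and m, but its exact combination formula is not reproduced here.

```python
def gray_to_binary(gray):
    """Convert a Gray-code integer to its binary (ordinal) value."""
    binary = gray
    gray >>= 1
    while gray:
        binary ^= gray
        gray >>= 1
    return binary

def decode_pixel(gray_bits, shift_index, m):
    """Hypothetical absolute decoding value: the Gray code gives the
    coarse stripe index G, refined by the line-shift index S among m
    sub-positions, i.e. P = G * m + S."""
    g = int("".join(str(b) for b in gray_bits), 2)
    return gray_to_binary(g) * m + shift_index
```

For example, the Gray code 0b110 decodes to the stripe index 4.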
  • the parameters with superscript c correspond to the parameters of the camera
  • the parameters with superscript p correspond to the parameters of the projector.
  • m c/p is the 2D image coordinate in the digital image coordinate system
  • Mc /p is the coordinate in the camera/projector coordinate system
  • M w is the coordinate in the world coordinate system
  • R c/p and T c/p are the rotation and translation matrices (external parameters) of the camera/projector coordinate system relative to the world coordinate system
  • s is the scale factor
  • fu, fv, u0, v0 and γ are the parameters of the intrinsic parameter matrix (internal parameters).
  • R and T are the rotation and translation matrices between the projector and camera coordinate systems. After calibrating the external parameters R c , R p , T c and T p , R and T can be obtained from the coordinate transformation relationship.
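The coordinate transformation relation mentioned above can be sketched numerically. Assuming the common convention M_cam = Rc·M_world + Tc (and likewise for the projector), the camera-to-projector pose follows from composing the two extrinsic transforms; this is a generic stereo identity, not text from the patent.

```python
import numpy as np

def stereo_extrinsics(Rc, Tc, Rp, Tp):
    """Relative pose (R, T) mapping camera coordinates to projector
    coordinates, given world-to-camera (Rc, Tc) and world-to-projector
    (Rp, Tp) extrinsics: M_proj = R @ M_cam + T."""
    R = Rp @ Rc.T
    T = Tp - R @ Tc
    return R, T
```

A quick consistency check: transforming any world point first into camera coordinates and then by (R, T) must land on its projector coordinates.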
  • The stripe edge point extracted on the camera image plane (a projection of M) and the corresponding point on the projector pattern plane are matching projections of the same scene point M.
  • The two-dimensional position is encoded along the x dimension (horizontally), and the correspondence between uc and up is determined by matching the encoded values.
  • O c O p is the epipolar line.
  • Kc and Kp are the intrinsic parameter matrices of the camera and projector. According to Formula 6, the corresponding relationship between vc and vp in the y dimension (vertical) can be determined, from which the complete coordinates of the corresponding matching points can be obtained.
  • Formula (3) can then be used to obtain the complete three-dimensional point Mc(xc, yc, zc), thereby achieving a one-to-one 2D-3D correspondence from two-dimensional pixel points to three-dimensional space points.
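With matched points in hand, one standard way to recover the 3D point is linear (DLT) triangulation from the two 3x4 projection matrices; this is a common textbook technique assumed here for illustration, since the patent only invokes the triangulation principle.

```python
import numpy as np

def triangulate(P_cam, P_proj, uv_cam, uv_proj):
    """Linear (DLT) triangulation of one scene point from its matched
    pixel in the camera image and point in the projector pattern.
    P_cam and P_proj are 3x4 projection matrices K[R|T]."""
    u, v = uv_cam
    up, vp = uv_proj
    A = np.stack([
        u * P_cam[2] - P_cam[0],
        v * P_cam[2] - P_cam[1],
        up * P_proj[2] - P_proj[0],
        vp * P_proj[2] - P_proj[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # null vector of A is the homogeneous point
    X = Vt[-1]
    return X[:3] / X[3]
```

With two normalized cameras separated by a unit baseline, a point on the optical axis at depth 2 is recovered exactly.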
  • FIG. 4 is a specific schematic diagram of using a structured light system.
  • a DLP4500 infrared projector is used to continuously project 18 coded patterns onto a human face in real time, and an infrared camera is used to capture the facial data of the projected patterns in real time.
  • a high-precision real-time point cloud data sequence can be obtained, realizing a one-to-one correspondence between 2D-3D facial data (i.e., 2D facial data and 3D point cloud data).
  • the structured light system used in this application is based on the principle of time coding, specifically based on Gray code plus line shift coding method.
  • the structured light system can obtain high-precision, high-frame rate three-dimensional point cloud data that corresponds one-to-one to two-dimensional facial pixels for facial posture estimation.
  • FIG. 5 which shows a flow chart of a method for estimating a facial posture based on a structured light system applicable to an embodiment of the present application.
  • a method for estimating a face posture based on a structured light system may include:
  • the 3D point cloud is used as the standard model to select 2D key points of the frontal zero-pose face image to obtain standard 2D key points, and the standard 3D key points corresponding to the standard 2D key points are obtained according to the structured light system.
  • the object to be measured refers to an object whose facial posture is to be estimated, which may be a driver in a driving state, a player playing virtual reality, a person performing face recognition, etc.
  • the frontal zero-pose face image refers to the face pose image obtained when the face of the subject to be tested faces the camera.
  • Using the three-dimensional point cloud of the frontal zero-pose facial image of the object to be tested as a standard model can overcome the diversity of facial appearance and differences in facial texture to a certain extent compared to directly using the 3D average face as the standard model.
  • the two-dimensional face image refers to an image captured by a camera in real time.
  • the infrared structured light system of this application can obtain one-to-one corresponding 2D face data and 3D point cloud data.
  • The calculation over the entire 3D point cloud (the complete point cloud corresponding to the 2D face photo is called the 3D point cloud) is expensive and time-consuming, and is affected by factors such as changes in facial expression, so its stability and robustness are poor. Therefore, this application selects stable 2D key points and 3D key points for face pose estimation.
  • 2D key point selection is performed on a frontal zero-pose face image or a two-dimensional face image, including:
  • For facial feature point detection, when performing key point detection and selection on a frontal zero-pose face image or a two-dimensional face image, this application does not limit the specific algorithm.
  • Commonly used optimization-based methods (such as ASM and AAM), regression-based methods (such as cascaded pose regression and SDM) and deep-learning-based methods can all be used in the present invention to achieve facial feature point detection.
  • As an example, Dlib, based on the cascade regression method, achieves 68-point facial feature detection.
  • Dlib uses the classic histogram of oriented gradients (HOG) features combined with linear classifiers to achieve face detection, and uses ERT cascade regression, that is, a regression-tree method based on gradient boosting, to detect the 68 feature points of the face shown in Figure 6, including the eyebrows, eyes, nose, mouth and facial contour.
  • the coordinates of facial feature points can be obtained in the 2D image obtained by the system.
  • the eyebrows, eyes, mouth and chin have large movements, so the feature points of these parts are not selected.
  • the nose is relatively stable and is selected as the key point for subsequent estimation. Therefore, 9 points of the nose are selected as 2D key points, as shown in Figure 7. It can be understood that different stable areas can also be selected as key points according to actual conditions, such as using key point clouds of the nose, cheeks and corners of the eyes at the same time.
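For reference, in the standard Dlib 68-point annotation the nine nose points occupy indices 27-35 (0-based), so the selection can be sketched as below. The index range follows Dlib's public annotation scheme, not a numbering given in the patent.

```python
# Dlib 68-point scheme: indices 27-30 = nose bridge, 31-35 = lower nose line.
NOSE_IDX = list(range(27, 36))

def select_nose_keypoints(landmarks):
    """Pick the 9 nose points, the stable region used for pose
    estimation, from a full 68-point list of (x, y) tuples."""
    return [landmarks[i] for i in NOSE_IDX]
```

Other stable regions (cheeks, eye corners) could be added by extending `NOSE_IDX` with their indices.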
  • the 3D key point coordinates corresponding to the 2D key points can be obtained, specifically including standard 3D key points corresponding to standard 2D key points and real-time 3D key points corresponding to real-time 2D key points.
  • S530 determining a 3D key point cloud according to standard 3D key points and real-time 3D key points, may include:
  • the initial poses R 0 and t 0 between the object to be measured and the standard model can be calculated, and the initial poses can be used as initial values for subsequent precise registration and can also be used to obtain 3D key point clouds.
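The initial pose (R0, t0) between the matched standard and real-time 3D key points can be obtained in closed form with the SVD-based Kabsch method, a common choice sketched here; the patent does not name the specific solver.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform with dst ~ R0 @ src + t0, solved
    by the Kabsch/SVD method; src and dst are Nx3 matched key points."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)                # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R0 = Vt.T @ D @ U.T
    t0 = cd - R0 @ cs
    return R0, t0
```

Given exact correspondences, the method recovers the generating rotation and translation.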
  • On this basis, this application expands the selected 3D key points into a 3D key point cloud.
  • determining a 3D key point cloud according to standard 3D key points, real-time 3D key points and initial poses may include:
  • the complete 3D point cloud corresponding to the 2D face image of the object to be tested is used as the search object.
  • the search is performed within a sphere with a preset radius, with the acquired real-time 3D key points as the center.
  • The searched 3D face points are added to the candidate set, which includes the real-time key points and the searched 3D face points;
  • For each point in the candidate set perform point cloud registration transformation based on the initial pose, and calculate the distance from each point to the nearest point in the complete 3D point cloud of the standard model after the transformation. When the distance is less than the threshold, add the corresponding point in the candidate set to the first consistent set, and add the nearest point in the complete 3D point cloud of the corresponding standard model to the second consistent set;
  • the first consistent set and the second consistent set constitute a 3D keypoint cloud.
  • the candidate set C is constructed by taking the complete 3D point cloud corresponding to the 2D face photo (i.e., the 2D face image of the object to be measured) as the search object. With the acquired real-time 3D key points as the center, search within a sphere with a radius of r (i.e., the preset radius, which can be set according to actual needs), and add the searched face 3D points to the candidate set C. According to the above operation, the candidate set C of the face to be estimated can be obtained, and the candidate set C includes 3D key points and the newly searched face 3D points.
  • Consistent set S (i.e. 3D key point cloud) construction: Based on the candidate set C, the consistent set S is constructed based on the idea of RANSAC algorithm. Specifically, for each point p in the candidate set C, the point cloud registration transformation is performed based on the initial pose R 0 and T 0 , and after the transformation, the distance from each point to the nearest point in the complete 3D point cloud of the standard face model is calculated, and the point p with a distance less than the threshold ⁇ (which can be set according to actual needs) is added to the consistent set S 0 (i.e. the first consistent set), and the nearest point q corresponding to point p is added to the consistent set S 1 (i.e. the second consistent set). Consistent set S 0 and consistent set S 1 are the obtained 3D key point clouds.
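The two-stage construction described above can be sketched as follows. This is a brute-force illustration (a KD-tree would normally replace the linear nearest-neighbour scans), with `radius` and `tau` as stand-ins for r and the threshold ε.

```python
import numpy as np

def build_consistent_sets(full_cloud, keypoints, model_cloud, R0, t0,
                          radius=0.01, tau=0.005):
    """1) Radius search around each real-time 3D key point builds the
    candidate set C; 2) each candidate is moved by the initial pose
    (R0, t0) and kept only if its nearest neighbour in the standard
    model cloud lies closer than tau, yielding S0 (live) and S1 (model)."""
    full_cloud = np.asarray(full_cloud, float)
    model_cloud = np.asarray(model_cloud, float)
    keypoints = np.asarray(keypoints, float)
    C = [p for p in full_cloud
         if np.linalg.norm(keypoints - p, axis=1).min() <= radius]
    S0, S1 = [], []
    for p in C:
        d = np.linalg.norm(model_cloud - (R0 @ p + t0), axis=1)
        j = int(d.argmin())
        if d[j] < tau:               # RANSAC-style inlier test
            S0.append(p)
            S1.append(model_cloud[j])
    return np.array(S0), np.array(S1)
```

With the identity pose and the model equal to the live cloud, only the points inside the search radius survive into both consistent sets.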
  • the first consistent set and the second consistent set are precisely aligned to determine the precise pose.
  • the Trimmed ICP algorithm is used to perform precise alignment on the first consistent set and the second consistent set.
  • This application adopts the Trimmed ICP algorithm for precise alignment: using the LTS (least trimmed squares) method, the residuals of each group of matching points are sorted in ascending order, only a leading proportion of the corresponding points is kept to fit the error function, and R and T are solved by iteratively minimizing the error function.
  • Npo = ξ · Np (10)
  • where ξ is a preset non-negative number (the trimming ratio), Np is the number of matched point pairs, and Npo is the number of pairs retained after trimming
  • e is the trimmed MSE
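A minimal Trimmed ICP iteration consistent with the description above: nearest-neighbour matching, ascending sort of the residuals (the LTS step), retention of the best Npo = ξ·Np pairs, and a closed-form pose update. Brute-force matching is used for brevity, and the parameter names are illustrative rather than taken from the patent.

```python
import numpy as np

def trimmed_icp(S0, S1, R, t, xi=0.8, iters=20):
    """Refine (R, t) so that R @ S0 + t aligns with S1, keeping only
    the xi fraction of best-matching pairs at each iteration."""
    S0, S1 = np.asarray(S0, float), np.asarray(S1, float)
    n_keep = max(3, int(xi * len(S0)))
    for _ in range(iters):
        moved = S0 @ R.T + t
        nn = ((moved[:, None] - S1[None]) ** 2).sum(-1).argmin(axis=1)
        res = ((moved - S1[nn]) ** 2).sum(-1)
        keep = np.argsort(res)[:n_keep]          # LTS trimming
        src, dst = S0[keep], S1[nn[keep]]
        cs, cd = src.mean(axis=0), dst.mean(axis=0)
        U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                       # closed-form pose update
        t = cd - R @ cs
    return R, t
```

Starting from the identity, a small rigid motion of a point set is recovered within a few iterations.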
  • the 3D key point clouds S 0 and S 1 are registered to obtain the optimal rotation matrix R, which is the final face pose matrix. It can be understood that if the result needs to be better visualized, R can be matrix decomposed to obtain the Euler angle representation of the face pose, namely the pitch angle (pitch), yaw angle (yaw) and roll angle (roll).
  • The face posture estimation method based on the structured light system provided in this embodiment uses the structured light system to estimate face posture.
  • the method has fast speed, strong real-time performance, high accuracy and strong stability.
  • The face posture estimation method based on the structured light system provided in the embodiment of the present application does not impose many restrictions on the acquisition conditions.
  • the present invention selects a stable area for face posture estimation based on the high-precision real-time reconstruction data of the face. Compared with most existing algorithms, it has higher accuracy and better robustness for problems such as changes in facial expressions and large angle changes.
  • FIG. 8 shows a schematic diagram of the structure of a face pose estimation device based on a structured light system according to an embodiment of the present application.
  • a face posture estimation device 800 based on a structured light system may include:
  • the first selection module 810 is used to obtain a frontal zero-pose face image of the object to be tested, use the three-dimensional point cloud of the frontal zero-pose face image as a standard model, select 2D key points of the frontal zero-pose face image to obtain standard 2D key points, and obtain standard 3D key points corresponding to the standard 2D key points according to the structured light system;
  • the second selection module 820 is used to collect a two-dimensional face image of the object to be tested in real time, select 2D key points of the two-dimensional face image to obtain real-time 2D key points, and obtain real-time 3D key points corresponding to the real-time 2D key points according to the structured light system;
  • a first determination module 830 is used to determine a 3D key point cloud according to standard 3D key points and real-time 3D key points;
  • the second determination module 840 is used to determine the precise pose according to the 3D key point cloud.
  • the first selection module 810 or the second selection module 820 is further used for:
  • the first determining module 830 is further configured to:
  • the first determining module 830 is further configured to:
  • the complete 3D point cloud corresponding to the 2D face image of the object to be tested is used as the search object, and the search is performed within a sphere of a preset radius with the acquired real-time 3D key points as the center, and the searched face 3D points are added to the candidate set, which includes the real-time key points and the searched face 3D points;
  • For each point in the candidate set, perform point cloud registration transformation based on the initial pose, and calculate the distance from each transformed point to the nearest point in the complete 3D point cloud of the standard model. When the distance is less than the threshold, add the corresponding point in the candidate set to the first consistent set, and add the corresponding nearest point in the complete 3D point cloud of the standard model to the second consistent set;
  • the first consistent set and the second consistent set constitute a 3D keypoint cloud.
  • the second determining module 840 is further configured to:
  • the first consistent set and the second consistent set are precisely aligned to determine the precise pose.
  • the Trimmed ICP algorithm is used to perform precise alignment operations on the first consistent set and the second consistent set.
  • the structured light system includes an infrared camera, an infrared projector and a terminal device, and the infrared camera and the infrared projector are respectively connected to the terminal device.
  • This embodiment provides a face posture estimation device based on a structured light system, which can execute the embodiment of the above method. Its implementation principle and technical effect are similar and will not be repeated here.
  • FIG. 9 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present invention, showing an electronic device 900 suitable for implementing the embodiment of the present application.
  • the electronic device 900 includes a central processing unit (CPU) 901, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage part 908 into a random access memory (RAM) 903.
  • ROM read-only memory
  • RAM random access memory
  • The RAM 903 also stores various programs and data required for the operation of the device 900.
  • the CPU 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904.
  • An input/output (I/O) interface 905 is also connected to the bus 904.
  • the I/O interface 905 includes an input section 906 including a keyboard, a mouse, etc.; an output section 907 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.;
  • The I/O interface 905 also includes a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, and the like.
  • the communication section 909 performs communication processing via a network such as the Internet.
  • A drive 910 is also connected to the I/O interface 905 as needed.
  • a removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, and the like, is installed on the drive 910 as needed, so that a computer program read therefrom is installed into the storage section 908 as needed.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program tangibly contained on a machine-readable medium, and the computer program includes a program code for executing the above-mentioned method for estimating a face pose based on a structured light system.
  • the computer program can be downloaded and installed from a network through the communication part 909, and/or installed from a removable medium 911.
  • each box in the flow chart or block diagram can represent a module, a program segment or a part of the code, and the aforementioned module, program segment or a part of the code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the box can also occur in a different order from the order marked in the accompanying drawings. For example, two boxes represented in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
  • each box in the block diagram and/or flow chart, and the combination of the boxes in the block diagram and/or flow chart can be implemented with a dedicated hardware-based system that performs the specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the units or modules involved in the embodiments described in the present application may be implemented by software or hardware.
  • the units or modules described may also be arranged in a processor.
  • the names of these units or modules do not constitute limitations on the units or modules themselves in certain circumstances.
  • a typical implementation device is a computer.
  • the computer may be, for example, a personal computer, a laptop computer, a mobile phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a gaming console, a tablet computer, a wearable device, or a combination of any of these devices.
  • the present application further provides a storage medium, which may be the storage medium included in the aforementioned device in the above embodiment; or it may be a storage medium that exists independently and is not assembled into the device.
  • the storage medium stores one or more programs, and the aforementioned programs are used by one or more processors to execute the face pose estimation method based on the structured light system described in the present application.
  • Storage media include permanent and non-permanent, removable and non-removable media implemented by any method or technology for storing information.
  • Information can be computer-readable instructions, data structures, program modules or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device.
  • computer-readable media does not include temporary computer-readable media (transitory media), such as modulated data signals and carrier waves.


Abstract

Provided in the present application are a face posture estimation method and apparatus based on a structured light system. The method comprises: acquiring a head-on zero-posture face image of a subject under test, using a three-dimensional point cloud of the head-on zero-posture face image as a standard model to perform 2D key point selection on the head-on zero-posture face image to obtain standard 2D key points, and acquiring standard 3D key points corresponding to the standard 2D key points according to the structured light system; collecting a two-dimensional face image of the subject under test in real time, performing 2D key point selection on the two-dimensional face image to obtain real-time 2D key points, and acquiring real-time 3D key points corresponding to the real-time 2D key points according to the structured light system; determining a 3D key point cloud according to the standard 3D key points and the real-time 3D key points; and determining a precise pose according to the 3D key point cloud. The solution has high speed, high real-time performance, high precision and high stability.

Description

A Method and Device for Estimating Face Posture Based on a Structured Light System

Technical Field

The present invention belongs to the technical field of computer vision, and in particular relates to a method and device for estimating human face posture based on a structured light system.

Background Art

With the continuous progress and development of artificial intelligence technology, artificial intelligence technology represented by computer vision has been widely applied in many fields of production and life, making people's lives increasingly automated, intelligent and convenient. Face pose estimation is an important subfield of computer vision whose main purpose is to obtain the orientation of the face. The orientation can be represented by a rotation matrix, a rotation vector, a quaternion or Euler angles, and these representations can be converted into one another. In general, Euler angles are the most intuitive and the most commonly used. The Euler angles are pitch, yaw and roll, which colloquially correspond to raising, shaking and turning the head.
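As an illustration of the interconversion mentioned above, the sketch below extracts the three Euler angles from a rotation matrix using the common Z-Y-X (roll-yaw-pitch) convention; the convention and the function name are assumptions for illustration, not fixed by this application:

```python
import math

def rotation_to_euler(Rm):
    """Extract (pitch, yaw, roll) in degrees from a 3x3 rotation matrix,
    assuming the Z-Y-X convention R = Rz(roll) @ Ry(yaw) @ Rx(pitch)."""
    yaw = math.asin(-Rm[2][0])
    pitch = math.atan2(Rm[2][1], Rm[2][2])
    roll = math.atan2(Rm[1][0], Rm[0][0])
    return tuple(math.degrees(a) for a in (pitch, yaw, roll))

# A 30-degree pure head turn (rotation about the vertical axis):
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
Ry = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
print(rotation_to_euler(Ry))  # approximately (0.0, 30.0, 0.0)
```

Other conventions order the three elementary rotations differently, so the same rotation matrix can yield different angle triples; what matters is that the conversion is invertible within a fixed convention.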
As a popular and practical field, face pose estimation has been widely used in many areas of production and life such as biometric recognition and human-computer interaction, for example face recognition, driving state detection, human assistance, virtual reality, and various entertainment devices. In face recognition, since facial poses vary greatly in practice, face recognition combined with face pose estimation can recognize the face after correction, greatly improving recognition accuracy. In driving state detection, face pose estimation can obtain the driver's facial orientation in real time; when the orientation exceeds a threshold, fatigue driving or a special situation can be assumed and a warning issued in time. In virtual reality, face pose estimation can provide the information needed for face reconstruction, improving user experience in games, social networking, film and television, and other fields.

Face pose estimation is challenging because of the diversity of facial appearance, as well as changes in face angle, facial expressions, differences in facial texture, uneven illumination and facial occlusion.

Existing methods estimate facial pose with low accuracy.
Summary of the Invention

The purpose of the embodiments of this specification is to provide a face pose estimation method and device based on a structured light system.

To solve the above technical problem, the embodiments of the present application are implemented as follows:

In a first aspect, the present application provides a face pose estimation method based on a structured light system, the method comprising:

acquiring a frontal zero-pose face image of the object to be tested, taking the three-dimensional point cloud of the frontal zero-pose face image as a standard model, selecting 2D key points from the frontal zero-pose face image to obtain standard 2D key points, and acquiring the standard 3D key points corresponding to the standard 2D key points through the structured light system;

acquiring a two-dimensional face image of the object to be tested in real time, selecting 2D key points from the two-dimensional face image to obtain real-time 2D key points, and acquiring the real-time 3D key points corresponding to the real-time 2D key points through the structured light system;

determining a 3D key point cloud according to the standard 3D key points and the real-time 3D key points;

determining the precise pose according to the 3D key point cloud.
In one embodiment, selecting 2D key points from the frontal zero-pose face image or the two-dimensional face image includes:

performing feature point detection on the frontal zero-pose face image or the two-dimensional face image to obtain face feature points;

selecting the 2D key points from the face feature points.
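As a concrete illustration of this key-point selection step, the sketch below assumes landmarks in the 68-point layout produced by detectors such as Dlib (see Fig. 6), where indices 27-35 cover the nose region (see Fig. 7); the function name and the index range are illustrative assumptions, not part of the claimed method:

```python
# Sketch: pick nose-region 2D key points out of a 68-point landmark set.
# The Dlib-style ordering (indices 27..35 = nose) is an assumption.

NOSE_INDICES = range(27, 36)

def select_nose_keypoints(landmarks):
    """landmarks: list of 68 (x, y) tuples; returns the nose key points."""
    if len(landmarks) != 68:
        raise ValueError("expected a 68-point landmark set")
    return [landmarks[i] for i in NOSE_INDICES]

# Usage with dummy landmarks standing in for a detector's output:
dummy = [(float(i), float(i)) for i in range(68)]
nose = select_nose_keypoints(dummy)
print(len(nose))  # 9 nose key points
```

The nose region is a common choice for pose key points because it is rigid and rarely occluded, unlike the mouth or eyes.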
In one embodiment, determining the 3D key point cloud according to the standard 3D key points and the real-time 3D key points includes:

determining an initial pose between the object to be tested and the standard model according to the standard 3D key points and the real-time 3D key points;

determining the 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose.
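One standard way to estimate such an initial pose from two sets of corresponding 3D key points is rigid registration by SVD (the Kabsch algorithm); the sketch below is an illustrative assumption about how this step could be implemented, not the patent's prescribed procedure:

```python
import numpy as np

def initial_pose(real_pts, std_pts):
    """Estimate the rigid transform (R, t) mapping the real-time 3D key
    points onto the standard 3D key points via the Kabsch/SVD algorithm.
    real_pts, std_pts: (N, 3) arrays with row-wise correspondence."""
    p = np.asarray(real_pts, float)
    q = np.asarray(std_pts, float)
    cp, cq = p.mean(axis=0), q.mean(axis=0)
    H = (p - cp).T @ (q - cq)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

# Usage: recover a known rotation about z plus a translation.
rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 3))
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = pts @ Rz.T + np.array([1.0, 2.0, 3.0])
R, t = initial_pose(pts, moved)
print(np.allclose(R, Rz), np.allclose(t, [1.0, 2.0, 3.0]))  # True True
```

Because only a handful of key points are needed, this closed-form step is fast and gives the later fine registration a good starting value.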
In one embodiment, determining the 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose includes:

taking the complete 3D point cloud corresponding to the two-dimensional face image of the object to be tested as the search object, searching within a sphere of preset radius centered on each acquired real-time 3D key point, and adding the face 3D points found to a candidate set, the candidate set including the real-time key points and the face 3D points found;

for each point in the candidate set, performing the point cloud registration transformation given by the initial pose, and after the transformation computing the distance from the point to the nearest point in the complete 3D point cloud of the standard model; when this distance is less than a threshold, adding the corresponding point of the candidate set to a first consistent set, and adding the corresponding nearest point of the standard model's complete 3D point cloud to a second consistent set;

the first consistent set and the second consistent set constituting the 3D key point cloud.
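The candidate-set and consistent-set construction described above can be sketched as follows. The brute-force nearest-neighbour search and the function names are illustrative assumptions (a practical implementation would use a k-d tree for speed):

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def transform(R, t, p):
    """Apply the initial pose (R, t) to a 3D point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def build_consistent_sets(keypoints, face_cloud, std_cloud, R, t, radius, threshold):
    # Candidate set: the real-time 3D key points plus every face 3D point
    # found inside a sphere of the preset radius around each key point.
    candidates = set(keypoints)
    for k in keypoints:
        candidates.update(p for p in face_cloud if dist(p, k) <= radius)
    # Transform each candidate by the initial pose, find its nearest point in
    # the standard model's cloud, and keep pairs closer than the threshold.
    first, second = [], []
    for p in sorted(candidates):
        q = transform(R, t, p)
        nearest = min(std_cloud, key=lambda s: dist(q, s))
        if dist(q, nearest) < threshold:
            first.append(p)         # first consistent set (real-time side)
            second.append(nearest)  # second consistent set (standard side)
    return first, second

# Toy usage with an identity initial pose:
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0)]
first, second = build_consistent_sets(
    [(0.0, 0.0, 0.0)], cloud, cloud, I3, (0.0, 0.0, 0.0), 0.5, 0.2)
print(first)  # [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
```

Growing the key points into small spherical neighbourhoods makes the later fine registration less sensitive to a single mislocated landmark.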
In one embodiment, determining the precise pose according to the 3D key point cloud includes:

taking the initial pose as the initial value, performing fine registration of the first consistent set and the second consistent set to determine the precise pose.

In one embodiment, the fine registration of the first consistent set and the second consistent set uses the Trimmed ICP algorithm.

In one embodiment, the structured light system includes an infrared camera, an infrared projector and a terminal device, the infrared camera and the infrared projector each being connected to the terminal device.
In a second aspect, the present application provides a face pose estimation device based on a structured light system, the device comprising:

a first selection module, configured to acquire a frontal zero-pose face image of the object to be tested, take the three-dimensional point cloud of the frontal zero-pose face image as a standard model, select 2D key points from the frontal zero-pose face image to obtain standard 2D key points, and acquire the standard 3D key points corresponding to the standard 2D key points through the structured light system;

a second selection module, configured to acquire a two-dimensional face image of the object to be tested in real time, select 2D key points from the two-dimensional face image to obtain real-time 2D key points, and acquire the real-time 3D key points corresponding to the real-time 2D key points through the structured light system;

a first determination module, configured to determine a 3D key point cloud according to the standard 3D key points and the real-time 3D key points;

a second determination module, configured to determine the precise pose according to the 3D key point cloud.

In a third aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the face pose estimation method based on a structured light system according to the first aspect.

In a fourth aspect, the present application provides a readable storage medium on which a computer program is stored, which, when executed by a processor, implements the face pose estimation method based on a structured light system according to the first aspect.

As can be seen from the technical solutions provided by the above embodiments of this specification, the solution estimates the facial pose using a structured light system; the method is fast, strongly real-time, highly accurate and highly stable.
Brief Description of the Drawings

To explain the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in this specification; for a person of ordinary skill in the art, other drawings can be obtained from them without creative effort.

Fig. 1 is a schematic structural diagram of the structured light system;

Fig. 2 is a schematic diagram of the coded patterns projected by the projector in the structured light system;

Fig. 3 is a schematic diagram of the mathematical model of the structured light system;

Fig. 4 is a specific schematic diagram of the use of the structured light system;

Fig. 5 is a schematic flow chart of the face pose estimation method based on a structured light system provided by the present application;

Fig. 6 is a schematic diagram of the 68-point face feature model detected with Dlib;

Fig. 7 is a schematic diagram of the nose region used as 2D key points;

Fig. 8 is a schematic structural diagram of the face pose estimation device based on a structured light system provided by the present application;

Fig. 9 is a schematic structural diagram of the electronic device provided by the present application.
Detailed Description of the Embodiments

In order to enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be described clearly and completely below in conjunction with the drawings in the embodiments of this specification. Obviously, the described embodiments are only a part of the embodiments of this specification, not all of them. Based on the embodiments in this specification, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the scope of protection of this specification.

In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application may also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits and methods are omitted so that unnecessary details do not obstruct the description of the present application.

It will be apparent to those skilled in the art that various modifications and variations can be made to the specific embodiments described herein without departing from the scope or spirit of the present application. Other embodiments obtained from this specification will be apparent to those skilled in the art. The specification and examples of the present application are merely exemplary.

The words "comprise", "include", "have", "contain" and the like used herein are open-ended terms, meaning including but not limited to.

Unless otherwise specified, "parts" in this application are calculated by mass.
Existing face pose estimation methods generally include model-based methods, feature-regression-based methods, classification-based methods and methods based on the geometric relationships of facial key points, as well as emerging methods such as those based on subspace learning.

Model-based methods extract 2D feature points from the face region of a two-dimensional image and establish correspondences with the 3D feature points of a three-dimensional face model to estimate the face pose. The face pose obtained in this way is continuous-valued and relatively accurate, so this has become a commonly used approach. However, the three-dimensional average face model typically used by such methods usually differs from the two-dimensional face image, and pose estimation for large-angle head deviations and exaggerated expressions has low accuracy and poor robustness.

Feature-regression-based methods learn a mapping from image space to pose space through machine learning, i.e., a mathematical model (mainly a neural network) is trained on a large number of face images with known poses to establish the mapping and determine the face pose of a sample. In practice, however, this correspondence requires large data sets for validation, interpolation is often needed during image processing, the computational cost is high, the result is strongly affected by the face detection and localization results, and robustness is insufficient.

Classification-based methods divide face poses into different categories within a certain range and classify the sample to be determined. Such methods quantize the head pose space into several discrete points and prepare several templates for each pose; the sample to be determined is then compared with the templates one by one, and the head pose corresponding to the template with the highest matching score is the classification result. Depending on the templates to be compared, the specific methods can be divided into those based on shape templates, on detection, or on locally constrained models, where the physical quantities compared against the templates are, respectively, image texture, pose detectors, and sets of sub-organ detectors arranged in a certain topology. The results obtained by such methods are discrete values, with high time complexity, low efficiency and large errors, and real-time performance is difficult to guarantee.

Methods based on the geometric relationships of facial key points first locate the facial key points and then estimate the face pose from the relative positions of these key points under certain geometric constraints. Such methods are relatively simple and have low time complexity, but they are very sensitive to occlusion and missing key points and have poor stability.

Subspace-learning-based methods regard the pose space of the face as a natural three-dimensional space, which can be viewed as a three-dimensional pose manifold embedded in the high-dimensional image space. Such methods are relatively new, but they currently have high time complexity, low accuracy and poor practicality.

In view of the above defects, the present application proposes a face pose estimation method based on a structured light system, which can estimate the face pose with high accuracy and robustness.
As a mature active three-dimensional information acquisition technology, structured light has the advantages of being non-contact, highly accurate, real-time, low-cost, wide-field and strongly interference-resistant. Three-dimensional reconstruction based on a structured light system (including face pose estimation) is an active three-dimensional reconstruction method.

A structured light system replaces one camera of the traditional binocular vision method with a projector. As shown in Fig. 1, this application uses an infrared projector, an infrared camera (hereinafter simply the camera) and a terminal device (such as the computer in Fig. 1) to build a dynamic structured light system. The structured light system uses the coded patterns projected by the projector to solve the corresponding-point matching problem that is difficult to solve in binocular vision; then, with the already calibrated camera and projector, the three-dimensional information of the object can be obtained by the triangulation principle.
The structured light system of this application is based on the time-coding principle, specifically a binary fringe coding method combining Gray code with line-shift coding. The basic idea is to first produce a sequence of coded patterns according to the coding principle, then project the pattern sequence onto the target surface with the projector, and finally decode the target images carrying the coded patterns. The point cloud computed by scanning with this method has both high spatial resolution and high reconstruction accuracy.

The Gray code plus line-shift coding method used in this application achieves coding by successively projecting multiple coded patterns onto the target to be measured. A total of 18 patterns are produced according to this method, as shown in Fig. 2: the first 2 are all-white and all-black patterns, used to extract valid pixels and obtain the current texture; the middle 8 are Gray code patterns; and the last 8 are line-shift patterns based on the Gray code patterns.

After the coded patterns have been projected onto the target, the code value of each pixel needs to be recovered from the image sequence captured by the camera. The usual approach is to first determine the positions of the fringe edges in the image, then determine the fringe states from the pixel values on both sides of each fringe edge, and finally determine the code value of each pixel from the fringe states. Fringe edge detection can use, for example, the zero-crossing detection method, which achieves sub-pixel detection accuracy.
Suppose three fringe edges detected in the coded image are, from left to right, L1, L2 and L3; denote the average grayscale of the pixels of a given row between edges L1 and L2 as grayL, and the average grayscale of the pixels of that row between edges L2 and L3 as grayR. The grayscale difference between the two sides of L2 is then

grayΔ = grayL − grayR       (1)

If grayΔ < 0, the fringe pixel values on the left of edge L2 are smaller than those on the right, so the left side of L2 corresponds to a black fringe and the right side to a white fringe. If grayΔ > 0, the fringes are the opposite. The fringe states are determined in this way.
After the fringe states have been determined, the code value of pixels on black fringes is set to 0 and that of pixels on white fringes is set to 1, and the Gray code images and the line-shift images are decoded separately. Subdividing the Gray code decoded value with each line-shift decoded value yields a unique absolute decoded value for each pixel position, as follows:

P = G + S
G ∈ {0, 1, 2, …, 2^n − 1}       (2)
S ∈ {0, 1, 2, …, m − 1}

where P is the absolute decoded value, G is the Gray code decoded value, S is the line-shift decoded value, n is the number of Gray code patterns, and m is the number of line shifts. By encoding and decoding the 18 patterns projected by the projector as above, the matching points of the projector and the camera can be put into one-to-one correspondence.
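As a sketch of the decoding step, the Gray-to-binary conversion below follows the standard Gray-code rule (b0 = g0, bi = b(i−1) XOR gi), and the combination mirrors formula (2); the function names are illustrative assumptions:

```python
def gray_to_binary(bits):
    """Decode a Gray-code bit sequence (MSB first) to its integer value."""
    value = 0
    prev = 0
    for g in bits:
        prev ^= g                  # b_i = b_(i-1) XOR g_i
        value = (value << 1) | prev
    return value

def absolute_code(gray_bits, line_shift):
    """Combine the Gray code decoded value G with the line-shift decoded
    value S into the absolute decoded value P = G + S, as in formula (2)."""
    return gray_to_binary(gray_bits) + line_shift

print(gray_to_binary([1, 1, 0]))    # Gray 110 -> binary 100 -> 4
print(absolute_code([1, 1, 0], 3))  # P = 4 + 3 = 7
```

Gray code is used because adjacent fringe columns differ in exactly one bit, so a single misread bit shifts the decoded column by at most one position.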
Based on the camera imaging model, we can quickly establish the relationship between a three-dimensional point in space and its corresponding image-plane coordinates. In structured light reconstruction, the projector is generally regarded as a camera with a reversed light path, so the camera model can also be used for it. The mathematical model is shown in Fig. 3, with the projector imaging model on the left and the camera imaging model on the right. M is a point on the object to be measured; its corresponding point in the projector image is m^p and in the camera image is m^c. From the camera imaging model we obtain:

s^c m^c = K^c [R^c | T^c] M_w       (3)
s^p m^p = K^p [R^p | T^p] M_w       (4)

with the intrinsic matrix K^(c/p) = [[f_u, γ, u_0], [0, f_v, v_0], [0, 0, 1]]. In these formulas, parameters with superscript c belong to the camera and those with superscript p to the projector; m^(c/p) are the 2D image coordinates in the digital image coordinate system, M^(c/p) the coordinates in the camera/projector coordinate system, M_w the coordinates in the world coordinate system, R^(c/p) and T^(c/p) the rotation and translation matrices of the camera/projector coordinate system relative to the world coordinate system (extrinsic parameters), s the scale factor, and f_u, f_v, u_0, v_0, γ the entries of the intrinsic parameter matrix (intrinsic parameters). Through calibration we can obtain the intrinsic and extrinsic parameters of the camera and the projector.
The pose relationship between the camera and the projector is given by formula (5):

R = R^p (R^c)^(−1),  T = T^p − R^p (R^c)^(−1) T^c       (5)

where R and T are the rotation and translation matrices between the projector and camera coordinate systems. After the extrinsic parameters R^c, R^p, T^c and T^p are obtained by calibration, R and T follow from this coordinate transformation.
Suppose the fringe edge point m^c = (u^c, v^c) extracted on the camera image plane and the point m^p = (u^p, v^p) on the projector pattern-generation plane are matching projections of the same scene point M. Since the patterns are coded along the x dimension (horizontally), the correspondence between u^c and u^p can be determined by matching the code values.

As shown in Fig. 3, O^c O^p defines the epipolar geometry; using the epipolar constraint we obtain

(m^p)^T (K^p)^(−T) [T]_× R (K^c)^(−1) m^c = 0       (6)

where K^c and K^p are the intrinsic matrices of the camera and the projector, and [T]_× is the skew-symmetric matrix of T. From formula (6), the correspondence between v^c and v^p in the y dimension (vertical) can be determined, and thus the complete coordinates of the matching points m^c and m^p.
Finally, using the principle of triangulation, we obtain the depth information z^c:
Using formula (3), the complete three-dimensional spatial point M^c(x^c, y^c, z^c) is obtained, thereby establishing a one-to-one 2D-3D correspondence from two-dimensional pixels to three-dimensional spatial points.
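The 2D-to-3D step can be illustrated with a standard linear (DLT) triangulation. The patent's own formulas (3) and (7) are not reproduced here; the intrinsics and the 10 cm baseline below are invented for the sketch:

```python
import numpy as np

def triangulate(Pc, Pp, mc, mp):
    """Least-squares 3D point from two 3x4 projection matrices and matched pixels."""
    A = np.stack([
        mc[0] * Pc[2] - Pc[0],
        mc[1] * Pc[2] - Pc[1],
        mp[0] * Pp[2] - Pp[0],
        mp[1] * Pp[2] - Pp[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1]                      # null vector = homogeneous 3D point
    return M[:3] / M[3]

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0,   1.0]])
Pc = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                   # camera at origin
Pp = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])   # 10 cm baseline

M_true = np.array([0.2, 0.1, 2.0])
mc = Pc @ np.append(M_true, 1.0); mc = mc[:2] / mc[2]
mp = Pp @ np.append(M_true, 1.0); mp = mp[:2] / mp[2]
M_est = triangulate(Pc, Pp, mc, mp)
```

With noise-free matches the solve recovers the scene point exactly; in practice the matches come from the decoded stripe correspondences described above.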
Figure 4 is a schematic diagram of the structured light system in use. In the present application, a DLP4500 infrared projector continuously projects 18 coded patterns onto the face in real time, and an infrared camera captures the face illuminated by the patterns in real time. By encoding and decoding the patterns, a high-precision, real-time point cloud data sequence is obtained, establishing a one-to-one correspondence between the 2D face data and the 3D point cloud data.
The structured light system used in this application is based on the principle of temporal coding, specifically a Gray-code-plus-line-shift coding method. With this system, high-precision, high-frame-rate 3D point cloud data in one-to-one correspondence with the 2D face pixels can be obtained for face pose estimation.
The present invention is further described in detail below with reference to the accompanying drawings and embodiments.

Referring to FIG. 5, which shows a flow chart of the face pose estimation method based on a structured light system provided by an embodiment of the present application.

As shown in FIG. 5, a face pose estimation method based on a structured light system may include:
S510: obtain a frontal zero-pose face image of the object to be measured, use the three-dimensional point cloud of the frontal zero-pose face image as the standard model, select 2D key points from the frontal zero-pose face image to obtain standard 2D key points, and obtain, according to the structured light system, the standard 3D key points corresponding to the standard 2D key points.
Specifically, the object to be measured is the subject whose face pose is to be estimated, such as a driver while driving, a player using virtual reality, or a person undergoing face recognition.

The frontal zero-pose face image is the face pose image captured when the face of the object to be measured directly faces the camera.

Using the three-dimensional point cloud of the frontal zero-pose face image of the object to be measured as the standard model (also called the standard face, the standard face 3D model, or the standard face model), rather than directly using a 3D average face, mitigates to some extent the diversity of facial appearance and the differences in facial texture.
S520: collect a two-dimensional face image of the object to be measured in real time, select 2D key points from the two-dimensional face image, and obtain, according to the structured light system, the 3D key points corresponding to the 2D key points.

Specifically, the two-dimensional face image is an image captured by the camera in real time.

One-to-one corresponding 2D face data and 3D point cloud data can be obtained through the infrared structured light system of this application. For the face pose estimation task, solving with the entire 3D point cloud (the complete point cloud corresponding to a 2D face photo is called the 3D point cloud) is computationally heavy and very time-consuming; it is also affected by factors such as changes in facial expression, giving poor stability and robustness. This application therefore selects stable 2D key points and 3D key points for face pose estimation.
In one embodiment, selecting 2D key points from the frontal zero-pose face image or the two-dimensional face image includes:

performing feature point detection on the frontal zero-pose face image or the two-dimensional face image to obtain face feature points;

selecting 2D key points from the face feature points.
Specifically, for key point detection and selection on the frontal zero-pose face image or the two-dimensional face image, note that this application does not limit the specific face feature point detection algorithm: commonly used optimization-based methods (such as ASM and AAM), regression-based methods (cascaded pose regression and SDM) and deep-learning-based methods can all be used in the present invention. The following takes 68-point face feature detection with Dlib, which is based on cascaded regression, as an example.
Dlib uses the classic histogram of oriented gradients (HOG) features combined with a linear classifier for face detection, and uses ERT cascaded regression, i.e. a regression-tree method based on gradient boosting, to detect the 68 feature points of the face, which, as shown in Figure 6, cover the eyebrows, eyes, nose, mouth and facial contour.
Through Dlib, the coordinates of the face feature points can be obtained from the 2D image captured by the system. To avoid the influence of facial expression changes and to improve robustness, the feature points are filtered further. When expressions change, the eyebrows, eyes, mouth and chin move considerably, so feature points in these regions are not selected. The nose region is relatively stable and is chosen as the source of key points for the subsequent estimation. Nine points on the nose are therefore selected as 2D key points, as shown in Figure 7. Understandably, other stable regions can also be selected as key points according to the actual situation, for example using the key point clouds of the nose, both cheeks and the eye-corner regions together.
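The selection step can be sketched as below. It assumes landmarks in Dlib's standard 68-point numbering, where indices 27-30 trace the nose bridge and 31-35 the lower nose, which yields exactly the nine nose points used above:

```python
import numpy as np

# In Dlib's 68-point scheme, indices 27-35 are the nine nose landmarks.
NOSE_IDX = np.arange(27, 36)

def select_nose_keypoints(landmarks_68):
    """landmarks_68: (68, 2) array of (x, y) pixel coordinates."""
    assert landmarks_68.shape == (68, 2)
    return landmarks_68[NOSE_IDX]

demo = np.zeros((68, 2))
demo[NOSE_IDX] = 1.0              # mark the nose points for the demo
kp2d = select_nose_keypoints(demo)
```

The returned (9, 2) array is the stable 2D key-point set; swapping `NOSE_IDX` for other index lists covers the cheek or eye-corner variants mentioned above.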
According to the 2D key point coordinates obtained above and the 2D-3D geometric correspondence determined by the infrared structured light system, the 3D key point coordinates corresponding to the 2D key points can be obtained, specifically the standard 3D key points corresponding to the standard 2D key points and the real-time 3D key points corresponding to the real-time 2D key points.
S530: determining a 3D key point cloud according to the standard 3D key points and the real-time 3D key points may include:

determining the initial pose between the object to be measured and the standard model according to the standard 3D key points and the real-time 3D key points;

determining the 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose.
Specifically, from the coordinates of the determined standard 3D key points and real-time 3D key points, the initial pose R0 and T0 between the object to be measured and the standard model can be computed; this initial pose serves as the initial value for the subsequent fine registration and is also used to obtain the 3D key point cloud.
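The application does not name the solver used for the initial pose, so the sketch below substitutes the common SVD-based rigid alignment (Kabsch/Umeyama) over the matched key-point pairs; the demo rotation and translation are invented:

```python
import numpy as np

def rigid_align(src, dst):
    """R, t minimizing sum ||R @ src_i + t - dst_i||^2 over matched point pairs."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = c_dst - R @ c_src
    return R, t

rng = np.random.default_rng(0)
src = rng.normal(size=(9, 3))                 # nine key points, as selected above
a = np.radians(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([0.1, -0.2, 0.05])
dst = (R_true @ src.T).T + t_true
R0, t0 = rigid_align(src, dst)
```

With exact correspondences (the structured light system gives them directly), the closed-form solve recovers the pose in one step, which is why it suits the coarse initialization.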
To guard against large-angle changes of the face and to improve the stability and robustness of the pose estimation, the present application extends the already obtained 3D key points into a 3D key point cloud.
In one embodiment, determining the 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose may include:

taking the complete 3D point cloud corresponding to the two-dimensional face image of the object to be measured as the search object, searching within a sphere of preset radius centered on each acquired real-time 3D key point, and adding the found face 3D points to a candidate set, the candidate set including the real-time key points and the found face 3D points;

for each point in the candidate set, performing the point cloud registration transformation based on the initial pose, and after the transformation computing the distance from each point to the nearest point in the complete 3D point cloud of the standard model; when the distance is smaller than a threshold, adding the corresponding candidate point to a first consistent set, and adding the corresponding nearest point of the standard model's complete 3D point cloud to a second consistent set;

the first consistent set and the second consistent set constituting the 3D key point cloud.
Specifically, construction of candidate set C: the complete 3D point cloud corresponding to the 2D face photo (i.e. the two-dimensional face image of the object to be measured) is taken as the search object. Centered on each acquired real-time 3D key point, a search is performed within a sphere of radius r (the preset radius, which can be set according to actual needs), and the found face 3D points are added to candidate set C. Following this procedure, the candidate set C of the face to be estimated is obtained; C contains the 3D key points and the newly found face 3D points.
Construction of consistent set S (i.e. the 3D key point cloud): on the basis of candidate set C, a consistent set S is built following an idea similar to the RANSAC algorithm. Specifically, for each point p in candidate set C, the point cloud registration transformation is performed with the initial pose R0 and T0; after the transformation, the distance from each point to the nearest point in the complete 3D point cloud of the standard face model is computed. Each point p whose distance is smaller than a threshold δ (settable according to actual needs) is added to consistent set S0 (the first consistent set), and its nearest point q is added to consistent set S1 (the second consistent set). Consistent sets S0 and S1 are the obtained 3D key point cloud.
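The candidate-set and consistent-set construction can be sketched with brute-force distance queries (a real implementation would use a KD-tree for the nearest-neighbor searches); the radius r and threshold δ values here are illustrative, as are the tiny demo clouds:

```python
import numpy as np

def build_consistent_sets(keypoints, face_cloud, model_cloud, R0, T0,
                          r=0.02, delta=0.005):
    # Candidate set C: face points within radius r of any real-time key point.
    d_kp = np.linalg.norm(face_cloud[:, None] - keypoints[None], axis=2)
    C = face_cloud[(d_kp <= r).any(axis=1)]
    # Transform C by the initial pose, then keep points whose nearest model
    # point lies closer than delta (the RANSAC-like consistency test).
    Ct = (R0 @ C.T).T + T0
    d_m = np.linalg.norm(Ct[:, None] - model_cloud[None], axis=2)
    nn = d_m.argmin(axis=1)
    keep = d_m[np.arange(len(C)), nn] < delta
    S0 = C[keep]                  # first consistent set (source)
    S1 = model_cloud[nn[keep]]    # second consistent set (target)
    return S0, S1

face = np.array([[0.0, 0.0, 0.0], [0.01, 0.0, 0.0], [1.0, 1.0, 1.0]])
model = face.copy()               # zero pose for the demo
S0, S1 = build_consistent_sets(face[:1], face, model, np.eye(3), np.zeros(3))
```

In the demo, only the two points near the single key point survive the radius search, and both pass the consistency test, so S0 and S1 come out as matched pairs.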
S540: determining the precise pose according to the 3D key point cloud includes:

taking the initial pose as the initial value, performing fine registration on the first consistent set and the second consistent set to determine the precise pose, where the Trimmed ICP algorithm is used for the fine registration of the first consistent set and the second consistent set.

Specifically, this application adopts the Trimmed ICP algorithm for the fine registration: with the LTS (least trimmed squares) method, the residuals of the matched point pairs are sorted in ascending order, only the leading fraction of correspondences is used to fit the error function, and R and T are solved by iteratively minimizing the error function.
The specific procedure is:

a) For each point of the first consistent set S0 (also called the source point cloud), find its matching point in the second consistent set S1 (also called the target point cloud), and compute the squared residual di(R,T)^2.
Here, suppose the source point cloud S0 contains Np points, i.e. S0 = {pi | i = 1, …, Np}. For each point pi in S0, its transform under R and T is pi(R,T) = R·pi + T; for every pi(R,T) a matching point is sought in the target point cloud S1, as in formula (8):

mcl(i,R,T) = arg min_{m∈S1} |m − pi(R,T)|        (8)
After the closest point has been found as the matching point, the residual di(R,T) of each matched pair is computed:

di(R,T) = |mcl(i,R,T) − pi(R,T)|        (9)
b) Sort the di(R,T)^2 in ascending order, select the squared residuals of the first Npo points, and sum them to obtain S′LTS, where Npo is the number of selected points, given by formula (10):

Npo = ξ·Np        (10)
where ξ is the minimum overlap ratio, which can be obtained adaptively by minimizing the objective function of formula (11):

ψ(ξ) = e(ξ)·ξ^(−(1+λ))        (11)
where λ is a preset non-negative number and e is the trimmed MSE, see formula (12):

e = S′LTS / Npo        (12)
c) If the termination condition is met, exit the iteration; otherwise start a new round: compute the optimal transformation R and T of the Npo selected points by minimizing S′LTS, transform the corresponding points with the obtained R and T, and return to a).

The termination condition is any one of: the set maximum number of iterations has been reached (the maximum number of iterations can be set according to actual needs); the trimmed MSE e = S′LTS/Npo is sufficiently small; or the relative change of the trimmed MSE, |e − e′|/e, is sufficiently small.
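Steps a) to c) can be condensed into the loop below. For brevity it uses a fixed overlap ratio ξ instead of the adaptive minimization of formula (11), a fixed iteration count in place of the termination tests, and an SVD-based (Kabsch) refit for the per-iteration pose update; these simplifications are ours:

```python
import numpy as np

def trimmed_icp(S0, S1, R, T, xi=0.8, iters=30):
    Npo = max(3, int(xi * len(S0)))
    for _ in range(iters):
        P = (R @ S0.T).T + T                       # step a): transform the source
        d = np.linalg.norm(P[:, None] - S1[None], axis=2)
        nn = d.argmin(axis=1)                      # closest-point matches, formula (8)
        res = d[np.arange(len(P)), nn]             # residuals, formula (9)
        sel = np.argsort(res)[:Npo]                # step b): keep the Npo smallest
        src, dst = S0[sel], S1[nn[sel]]            # step c): re-fit R, T on them
        cs, cd = src.mean(axis=0), dst.mean(axis=0)
        U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
        D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
        R = Vt.T @ D @ U.T
        T = cd - R @ cs
    return R, T

rng = np.random.default_rng(1)
S0 = rng.random((50, 3))
a = 0.01                                           # small pose offset for the demo
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
T_true = np.array([0.005, -0.01, 0.002])
S1 = (R_true @ S0.T).T + T_true
R_est, T_est = trimmed_icp(S0, S1, np.eye(3), np.zeros(3))
```

Starting from the coarse pose of S530 rather than the identity makes the closest-point matches reliable from the first iteration, which is why the initial pose matters.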
Registering the 3D key point clouds S0 and S1 with the Trimmed ICP algorithm yields the optimal rotation matrix R, which is the finally determined face pose matrix. Understandably, if the result needs to be visualized more intuitively, R can be decomposed to obtain the Euler-angle representation of the face pose, namely the pitch, yaw and roll angles.
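The Euler-angle readout can be sketched as follows; the application does not fix a rotation convention, so the ZYX order R = Rz(roll)·Ry(yaw)·Rx(pitch) used here is an assumption:

```python
import numpy as np

def euler_from_rotation(R):
    """Pitch (x), yaw (y), roll (z) in degrees for R = Rz @ Ry @ Rx."""
    yaw = np.arcsin(-R[2, 0])
    pitch = np.arctan2(R[2, 1], R[2, 2])
    roll = np.arctan2(R[1, 0], R[0, 0])
    return np.degrees([pitch, yaw, roll])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

p, y, r = np.radians([10.0, 20.0, 30.0])
R = rot_z(r) @ rot_y(y) @ rot_x(p)
angles = euler_from_rotation(R)    # recovers (10, 20, 30) degrees
```

The closed-form readout holds away from the gimbal-lock case yaw = ±90°, which is well outside the head-pose range of interest here.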
Understandably, in this embodiment, other registration methods may also be used to perform the fine registration of the first consistent set and the second consistent set.
The face pose estimation method based on a structured light system provided in the embodiments of the present application uses the structured light system for face pose estimation; the method is fast, strongly real-time, highly accurate and highly stable.

The face pose estimation method based on a structured light system provided in the embodiments of the present application does not require many restrictions. At the same time, the present invention selects stable regions for face pose estimation on the basis of high-precision, real-time face reconstruction data; compared with most existing algorithms, it is more accurate and is more robust to problems such as facial expression changes and large-angle changes.
Referring to FIG. 8, which shows a schematic structural diagram of a face pose estimation apparatus based on a structured light system according to an embodiment of the present application.

As shown in FIG. 8, the face pose estimation apparatus 800 based on a structured light system may include:

a first selection module 810, configured to obtain a frontal zero-pose face image of the object to be measured, use the three-dimensional point cloud of the frontal zero-pose face image as the standard model, select 2D key points from the frontal zero-pose face image to obtain standard 2D key points, and obtain, according to the structured light system, the standard 3D key points corresponding to the standard 2D key points;

a second selection module 820, configured to collect a two-dimensional face image of the object to be measured in real time, select 2D key points from the two-dimensional face image to obtain real-time 2D key points, and obtain, according to the structured light system, the real-time 3D key points corresponding to the real-time 2D key points;

a first determination module 830, configured to determine the 3D key point cloud according to the standard 3D key points and the real-time 3D key points;

a second determination module 840, configured to determine the precise pose according to the 3D key point cloud.
Optionally, the first selection module 810 or the second selection module 820 is further configured to:

perform feature point detection on the frontal zero-pose face image or the two-dimensional face image to obtain face feature points;

select 2D key points from the face feature points.
Optionally, the first determination module 830 is further configured to:

determine the initial pose between the object to be measured and the standard model according to the standard 3D key points and the real-time 3D key points;

determine the 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose.
Optionally, the first determination module 830 is further configured to:

take the complete 3D point cloud corresponding to the two-dimensional face image of the object to be measured as the search object, search within a sphere of preset radius centered on each acquired real-time 3D key point, and add the found face 3D points to a candidate set, the candidate set including the real-time key points and the found face 3D points;

for each point in the candidate set, perform the point cloud registration transformation based on the initial pose, and after the transformation compute the distance from each point to the nearest point in the complete 3D point cloud of the standard model; when the distance is smaller than a threshold, add the corresponding candidate point to a first consistent set, and add the corresponding nearest point of the standard model's complete 3D point cloud to a second consistent set;

the first consistent set and the second consistent set constitute the 3D key point cloud.
Optionally, the second determination module 840 is further configured to:

take the initial pose as the initial value and perform fine registration on the first consistent set and the second consistent set to determine the precise pose.

Optionally, the Trimmed ICP algorithm is used for the fine registration of the first consistent set and the second consistent set.
Optionally, the structured light system includes an infrared camera, an infrared projector and a terminal device; the infrared camera and the infrared projector are each connected to the terminal device.

The face pose estimation apparatus based on a structured light system provided in this embodiment can execute the above method embodiments; its implementation principle and technical effects are similar and are not repeated here.
FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 9, it illustrates the structure of an electronic device 900 suitable for implementing the embodiments of the present application.
As shown in FIG. 9, the electronic device 900 includes a central processing unit (CPU) 901, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage section 908 into a random access memory (RAM) 903. The RAM 903 also stores the various programs and data required for the operation of the device 900. The CPU 901, the ROM 902 and the RAM 903 are connected to one another via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse and the like; an output section 907 including a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card or a modem. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read from it is installed into the storage section 908 as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to FIG. 1 can be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the above face pose estimation method based on a structured light system. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules involved in the embodiments described in the present application may be implemented by software or by hardware. The described units or modules may also be arranged in a processor, and in some cases the names of these units or modules do not constitute a limitation on the units or modules themselves.

The systems, apparatuses, modules or units set forth in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a mobile phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
As another aspect, the present application further provides a storage medium, which may be the storage medium contained in the aforementioned apparatus of the above embodiments, or a storage medium that exists separately and is not assembled into the device. The storage medium stores one or more programs, which are used by one or more processors to execute the face pose estimation method based on a structured light system described in the present application.

Storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should be noted that the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device comprising that element.

The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are basically similar to the method embodiments, they are described relatively simply, and for relevant parts reference may be made to the description of the method embodiments.

Claims (10)

  1. A face pose estimation method based on a structured light system, characterized in that the method comprises:
    acquiring a frontal zero-pose face image of an object to be tested, taking the three-dimensional point cloud of the frontal zero-pose face image as a standard model, selecting 2D key points from the frontal zero-pose face image to obtain standard 2D key points, and obtaining standard 3D key points corresponding to the standard 2D key points by means of the structured light system;
    collecting a two-dimensional face image of the object to be tested in real time, selecting 2D key points from the two-dimensional face image to obtain real-time 2D key points, and obtaining real-time 3D key points corresponding to the real-time 2D key points by means of the structured light system;
    determining a 3D key point cloud according to the standard 3D key points and the real-time 3D key points;
    determining a precise pose according to the 3D key point cloud.
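As a rough illustration of the 2D-to-3D key point lookup in claim 1: the claim does not spell out how the structured light system yields the 3D points, so the sketch below assumes a pinhole camera model and a per-pixel depth map recovered by the system (function name and parameters are illustrative, not taken from the application).

```python
import numpy as np

def keypoints_2d_to_3d(kp_2d, depth_map, fx, fy, cx, cy):
    """Back-project 2D key points into 3D camera coordinates using a
    per-pixel depth map (e.g. one recovered by a structured light system).

    kp_2d     : (N, 2) array of pixel coordinates (u, v)
    depth_map : (H, W) array of depth values (same units as the output)
    fx, fy    : focal lengths in pixels; cx, cy: principal point
    """
    kp_2d = np.asarray(kp_2d)
    u, v = kp_2d[:, 0], kp_2d[:, 1]
    z = depth_map[v.astype(int), u.astype(int)]   # depth at each key point
    x = (u - cx) * z / fx                         # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Toy example: a flat scene at depth z = 2.0
depth = np.full((480, 640), 2.0)
kp = np.array([[320, 240], [400, 240]])           # two pixel key points
pts3d = keypoints_2d_to_3d(kp, depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(pts3d)
```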
  2. The method according to claim 1, characterized in that selecting 2D key points from the frontal zero-pose face image or the two-dimensional face image comprises:
    performing feature point detection on the frontal zero-pose face image or the two-dimensional face image to obtain face feature points;
    selecting 2D key points from the face feature points.
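The selection step in claim 2 is often realized by detecting a dense landmark set and then keeping a sparse, comparatively rigid subset (eye corners, nose, mouth corners). A minimal sketch, assuming a dlib-style 68-point landmark numbering; the specific indices are illustrative, not taken from the application.

```python
import numpy as np

# Hypothetical indices into a 68-point landmark set (dlib-style numbering):
# eye corners (36, 39, 42, 45), nose bridge/tip (27, 30), nostril edges
# (31, 35) and mouth corners (48, 54) move little with expression, which
# makes them reasonable 2D key points for pose estimation.
RIGID_LANDMARK_IDS = [36, 39, 42, 45, 27, 30, 31, 35, 48, 54]

def select_keypoints(landmarks):
    """landmarks: (68, 2) array of detected face feature points."""
    landmarks = np.asarray(landmarks)
    return landmarks[RIGID_LANDMARK_IDS]

# Example with synthetic landmark coordinates
lm = np.stack([np.arange(68), np.arange(68)], axis=1).astype(float)
kp = select_keypoints(lm)
print(kp.shape)   # (10, 2)
```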
  3. The method according to claim 1, characterized in that determining a 3D key point cloud according to the standard 3D key points and the real-time 3D key points comprises:
    determining an initial pose between the object to be tested and the standard model according to the standard 3D key points and the real-time 3D key points;
    determining the 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose.
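The initial pose of claim 3 can be estimated from the corresponding real-time/standard 3D key point pairs with the standard SVD-based (Kabsch) rigid alignment. This is a common choice for 3D-3D correspondences, not necessarily the one used in the application.

```python
import numpy as np

def initial_pose(src, dst):
    """Least-squares rigid transform (R, t) with dst ≈ R @ src + t,
    computed by the SVD-based Kabsch method on corresponding 3D key points."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)             # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Sanity check: recover a known rotation about z plus a translation
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.5])
src = np.random.default_rng(0).normal(size=(10, 3))
dst = src @ R_true.T + t_true
R, t = initial_pose(src, dst)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```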
  4. The method according to claim 3, characterized in that determining a 3D key point cloud according to the standard 3D key points, the real-time 3D key points and the initial pose comprises:
    taking the complete 3D point cloud corresponding to the two-dimensional face image of the object to be tested as the search object, searching within a spherical region of a preset radius centered on each acquired real-time 3D key point, and adding the retrieved face 3D points to a candidate set, the candidate set comprising the real-time 3D key points and the retrieved face 3D points;
    for each point in the candidate set, performing a point cloud registration transformation based on the initial pose, and after the transformation computing the distance from each point to its nearest point in the complete 3D point cloud of the standard model; when the distance is less than a threshold, adding the corresponding point of the candidate set to a first consistent set and adding the corresponding nearest point in the complete 3D point cloud of the standard model to a second consistent set;
    the first consistent set and the second consistent set constituting the 3D key point cloud.
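The candidate-set and consistent-set construction of claim 4 can be sketched with a KD-tree. The radius and distance threshold below are placeholder values; the claim leaves them as presets.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_consistent_sets(keypoints, face_cloud, model_cloud, R, t,
                          radius=0.02, dist_thresh=0.005):
    """Sketch of claim 4: radius search around each real-time 3D key point
    in the real-time face cloud, then keep only candidate/model pairs that
    agree under the initial pose (R, t)."""
    # Step 1: spherical radius search -> candidate set (key points included)
    face_tree = cKDTree(face_cloud)
    ball = face_tree.query_ball_point(keypoints, r=radius)
    idx = sorted({j for lst in ball for j in lst})
    candidates = np.vstack([keypoints, face_cloud[idx]])

    # Step 2: transform by the initial pose, keep pairs whose nearest
    # standard-model point lies within dist_thresh
    model_tree = cKDTree(model_cloud)
    transformed = candidates @ R.T + t
    dist, nn = model_tree.query(transformed)
    mask = dist < dist_thresh
    first_set = candidates[mask]           # points from the real-time cloud
    second_set = model_cloud[nn[mask]]     # their matches in the standard model
    return first_set, second_set

# Toy usage: clouds already aligned, so the identity is a perfect initial pose
rng = np.random.default_rng(1)
model = rng.uniform(size=(200, 3))
face = model.copy()
kps = model[:5]
A, B = build_consistent_sets(kps, face, model, np.eye(3), np.zeros(3),
                             radius=0.1, dist_thresh=1e-6)
print(len(A) == len(B), len(A) > 0)
```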
  5. The method according to claim 4, characterized in that determining a precise pose according to the 3D key point cloud comprises:
    performing fine registration on the first consistent set and the second consistent set with the initial pose as the initial value, so as to determine the precise pose.
  6. The method according to claim 5, characterized in that the fine registration of the first consistent set and the second consistent set is performed using the Trimmed ICP algorithm.
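The claims name Trimmed ICP but do not spell it out. A minimal NumPy/SciPy sketch of the idea, keeping only the best-matching fraction of correspondences at each iteration and re-estimating the pose from that trimmed subset, might look like this (the trim ratio and iteration count are assumptions):

```python
import numpy as np
from scipy.spatial import cKDTree

def trimmed_icp(src, dst, R, t, trim_ratio=0.8, iters=30):
    """Minimal Trimmed ICP sketch: each iteration keeps only the
    trim_ratio fraction of nearest-neighbor pairs with the smallest
    residuals, then refits (R, t) with a Kabsch step.
    src, dst: (N, 3)/(M, 3) point sets; (R, t) is the initial pose."""
    tree = cKDTree(dst)
    n_keep = max(3, int(trim_ratio * len(src)))
    for _ in range(iters):
        moved = src @ R.T + t
        dist, nn = tree.query(moved)
        keep = np.argsort(dist)[:n_keep]          # trimming step
        p, q = src[keep], dst[nn[keep]]
        mu_p, mu_q = p.mean(axis=0), q.mean(axis=0)
        U, _, Vt = np.linalg.svd((p - mu_p).T @ (q - mu_q))
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                        # refit rotation ...
        t = mu_q - R @ mu_p                       # ... and translation
    return R, t

# Toy usage: refine from the identity toward a small known rotation
rng = np.random.default_rng(2)
src = rng.normal(size=(100, 3))
theta = np.deg2rad(5)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.01, 0.02, 0.0])
dst = src @ R_true.T + t_true
R, t = trimmed_icp(src, dst, np.eye(3), np.zeros(3))
print("mean residual:", np.mean(np.linalg.norm(src @ R.T + t - dst, axis=1)))
```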
  7. The method according to any one of claims 1-6, characterized in that the structured light system comprises an infrared camera, an infrared projector and a terminal device, the infrared camera and the infrared projector each being connected to the terminal device.
  8. A face pose estimation apparatus based on a structured light system, characterized in that the apparatus comprises:
    a first selection module, configured to acquire a frontal zero-pose face image of an object to be tested, take the three-dimensional point cloud of the frontal zero-pose face image as a standard model, select 2D key points from the frontal zero-pose face image to obtain standard 2D key points, and obtain standard 3D key points corresponding to the standard 2D key points by means of the structured light system;
    a second selection module, configured to collect a two-dimensional face image of the object to be tested in real time, select 2D key points from the two-dimensional face image to obtain real-time 2D key points, and obtain real-time 3D key points corresponding to the real-time 2D key points by means of the structured light system;
    a first determination module, configured to determine a 3D key point cloud according to the standard 3D key points and the real-time 3D key points;
    a second determination module, configured to determine a precise pose according to the 3D key point cloud.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that, when the processor executes the program, the face pose estimation method based on a structured light system according to any one of claims 1-7 is implemented.
  10. A readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the face pose estimation method based on a structured light system according to any one of claims 1-7 is implemented.
PCT/CN2023/133069 2022-11-23 2023-11-21 Face posture estimation method and apparatus based on structured light system WO2024109772A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211475210.7 2022-11-23
CN202211475210.7A CN115798000A (en) 2022-11-23 2022-11-23 Face pose estimation method and device based on structured light system

Publications (1)

Publication Number Publication Date
WO2024109772A1 true WO2024109772A1 (en) 2024-05-30

Family

ID=85440540

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/133069 WO2024109772A1 (en) 2022-11-23 2023-11-21 Face posture estimation method and apparatus based on structured light system

Country Status (2)

Country Link
CN (1) CN115798000A (en)
WO (1) WO2024109772A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115798000A (en) * 2022-11-23 2023-03-14 中国科学院深圳先进技术研究院 Face pose estimation method and device based on structured light system
CN116311540B (en) * 2023-05-19 2023-08-08 深圳市江元科技(集团)有限公司 Human body posture scanning method, system and medium based on 3D structured light

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113544744A (en) * 2021-06-01 2021-10-22 华为技术有限公司 Head posture measuring method and device
CN113850865A (en) * 2021-09-26 2021-12-28 北京欧比邻科技有限公司 Human body posture positioning method and system based on binocular vision and storage medium
WO2022027912A1 (en) * 2020-08-05 2022-02-10 深圳市优必选科技股份有限公司 Face pose recognition method and apparatus, terminal device, and storage medium.
CN114333034A (en) * 2022-01-04 2022-04-12 广州虎牙科技有限公司 Face pose estimation method and device, electronic equipment and readable storage medium
CN115798000A (en) * 2022-11-23 2023-03-14 中国科学院深圳先进技术研究院 Face pose estimation method and device based on structured light system

Also Published As

Publication number Publication date
CN115798000A (en) 2023-03-14
