WO2022252118A1 - Head posture measurement method and apparatus - Google Patents

Head posture measurement method and apparatus

Info

Publication number
WO2022252118A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
target object
point
parameterized
cloud data
Prior art date
Application number
PCT/CN2021/097701
Other languages
French (fr)
Chinese (zh)
Inventor
刘杨
郭子衡
黄为
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN202180001892.5A priority Critical patent/CN113544744A/en
Priority to PCT/CN2021/097701 priority patent/WO2022252118A1/en
Publication of WO2022252118A1 publication Critical patent/WO2022252118A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Definitions

  • the present application relates to the technical field of artificial intelligence, in particular to a head posture measurement method and device.
  • 3D face reconstruction technology is a research hotspot in the field of computer vision and computer graphics.
  • 3D face reconstruction is one of the core technologies in the fields of virtual reality/augmented reality, automatic driving, robotics, etc., and it has great application value in the Driver Monitoring System (DMS).
  • the driver's head data monitored by the DMS can be used to analyze the driver's driving behavior. By analyzing the driver's driving behavior, dangerous driving can be avoided. Therefore, convenient and accurate monitoring of the driver's head data has great application value.
  • a dedicated measuring device is generally used to measure the driver's head data, for example, the Smarteye system.
  • the Smarteye system marks the 3D key points of the face based on a multi-camera setup to establish the head coordinate system.
  • the head pose measurement part of the Smarteye system consists of 4 high-definition infrared cameras. During the measurement, it is necessary to use the 4 high-definition infrared cameras to simultaneously track the 2D key points of the face, and then project the 2D key points into the 3D space to obtain the 3D key points.
  • this method needs to use a checkerboard calibration board for geometric calibration in the system configuration stage, and the operation is complicated. In addition, the system is expensive and cannot be widely used.
  • Another method in the prior art is to monitor the driver's head data based on an optical tracker.
  • the method requires the driver to wear a marking device on the head, and establishes the transformation relationship from the marking point of the marking device to the head coordinate system.
  • this method relies on the stability of the marking equipment and complex coordinate transformation, etc., which is not only easy to introduce errors, but also complicated to operate.
  • the present application provides a head posture measurement method and device, which can obtain head data conveniently and accurately.
  • the first aspect of the present application provides a head posture measurement method, the method comprising:
  • obtaining the face point cloud data of the target object; based on the face point cloud data of the target object and the face two-dimensional key point data of the target object, obtaining the point cloud data of the face key points of the target object; registering the point cloud data of the face key points of the target object with the point cloud data of the face key points in the parameterized face model to obtain first similarity transformation parameters; optimizing the first similarity transformation parameters according to an objective function to obtain second similarity transformation parameters; and determining the head pose of the target object according to the second similarity transformation parameters.
  • The head posture measurement method provided by the present application obtains the first similarity transformation parameters by registering the point cloud data of the face key points in the parameterized face model with the point cloud data of the face key points of the target object, and then uses the objective function to optimize the first similarity transformation parameters. This improves the accuracy of the similarity transformation parameters between the two, makes the fit between the two closer, obtains higher fitting accuracy, and thus yields more accurate head monitoring data.
  • In the technical solution of the present application, there is no need to introduce additional expensive equipment; therefore, the effect of cost saving can also be achieved.
  • the objective function includes a point-to-plane distance function; wherein the point-to-plane distance function is the distance function from a point in the face point cloud data of the target object to the nearest triangular patch in the parameterized face model; the triangular patch is a triangle formed by three adjacent points in the parameterized face model.
  • the point-to-plane distance function includes:
  • D_p2f = Σ_i dist(s_i, f_c(i))²
  • where D_p2f is the point-to-plane distance function; s_i is a point in the face point cloud data of the target object; f_c(i) is the triangular patch in the parameterized face model nearest to s_i; p(s_i, f_c(i)) is the projection point of s_i on f_c(i); dist(s_i, f_c(i)) is taken as ‖s_i − p(s_i, f_c(i))‖ when the projection point lies inside the patch, and otherwise as the distance from s_i to e_j^(i), the point closest to s_i on the jth edge of f_c(i), or to the nearest vertex of f_c(i).
  • In this way, the degree of fit between the face point cloud data of the target object and the point cloud data of the face key points in the parameterized face model can be improved, and higher fitting accuracy can be obtained, thereby obtaining accurate head data.
  • the objective function also includes: a key point projection distance function; wherein the key point projection distance function is a function of the distance from the projection points of the face key points of the parameterized face model on the two-dimensional face image to the face two-dimensional key points on the two-dimensional face image of the target object.
  • the key point projection distance function includes:
  • D_proj = Σ_{i=1}^{n} ‖u_i − v_i‖²
  • where D_proj is the key point projection distance function; u_i is the projection point of the ith face key point of the parameterized face model on the two-dimensional face image; v_i is the corresponding face two-dimensional key point on the two-dimensional face image; n is the number of face key points in the parameterized face model.
  • By minimizing the distance from the projection points to the face two-dimensional key points on the two-dimensional face image of the target object, the degree of fit between the subtle parts of the face (such as the edges of the lips) and the parameterized face model can be improved, making the fitting more accurate.
  • the objective function further includes: a penalty term function of the parameterized face model coefficients; wherein the penalty term is used to constrain the magnitude of the coefficients.
  • the penalty term function of the parameterized face model coefficients includes:
  • E_pri = λ_S·‖S‖² + λ_E·‖E‖² + λ_P·‖P‖²
  • where E_pri is the penalty term function of the parameterized face model coefficients; S is the shape coefficient in the parameterized face model; E is the expression coefficient in the parameterized face model; P is the pose coefficient in the parameterized face model; λ_S is the penalty coefficient of the shape coefficient; λ_E is the penalty coefficient of the expression coefficient; λ_P is the penalty coefficient of the pose coefficient.
  • In this way, the deformation ability of the parameterized face model can be constrained, which reduces the distortions that easily occur when only distance is used as the constraint.
  • obtaining the face point cloud data of the target object includes: obtaining point cloud data of the target object based on the two-dimensional image and the depth image of the target object; extracting a two-dimensional face image of the target object from the two-dimensional image of the target object; and extracting, according to the extracted two-dimensional face image, the point cloud data corresponding to the two-dimensional face image from the point cloud data of the target object.
  • the face two-dimensional image can be obtained simply and quickly.
  • the point cloud data corresponding to the two-dimensional face image can be obtained simply and quickly.
  • the two-dimensional image and the depth image of the target object are obtained by a TOF camera.
  • the point cloud data of the face key points of the target object is obtained by indexing the two-dimensional coordinates corresponding to the face two-dimensional key point data of the target object in the face point cloud data of the target object, so as to obtain the point cloud data of the face key points of the target object.
  • the facial key points of the target object are 51 facial key points.
  • the facial key points of the target object are 68 facial key points.
  • the process of registering the point cloud data of the facial key points of the target object with the point cloud data of the facial key points in the parameterized face model is a process of rigid body transformation.
  • the process of optimizing the first similarity transformation parameters according to the objective function with the first similarity transformation parameters as initial values is a non-rigid body transformation process.
  • the gradient descent method is used to optimize the first similarity transformation parameters according to the objective function with the first similarity transformation parameters as initial values to obtain the second similarity transformation parameters.
  • the quasi-Newton method is used, with the first similarity transformation parameters as initial values, and the first similarity transformation parameters are optimized according to the objective function to obtain the second similarity transformation parameters.
  • determining the head pose of the target object according to the second similarity transformation parameters includes: performing a Rodrigues transformation on the second similarity transformation parameters to obtain Euler angles used to represent the head pose of the target object.
  • the method further includes: determining the concentration of the target object according to the head posture of the target object; and sending an alarm to the target object based on the concentration of the target object.
  • In this way, false alarms can be reduced. A false alarm can be, for example: when the target object's concentration is low, no alarm is issued, which may cause a safety hazard; or, when the target object is highly focused, an alarm is issued, which disturbs the target object's concentration.
  • the second aspect of the present application provides a head posture measurement device, including:
  • the first obtaining module is used to obtain the facial point cloud data of the target object
  • the second acquisition module is used to obtain point cloud data of the face key points of the target object based on the face point cloud data of the target object and the two-dimensional key point data of the face of the target object;
  • the third acquisition module is used to register the point cloud data of the key points of the face of the target object with the point cloud data of the key points of the face in the parameterized face model to obtain the first similar transformation parameters;
  • a fourth acquisition module configured to optimize the first similarity transformation parameters according to an objective function, and obtain second similarity transformation parameters
  • the first determination module is configured to determine the head posture of the target object according to the second similarity transformation parameters.
  • the objective function in the fourth acquisition module includes a point-plane distance function
  • the point-to-plane distance function is the distance function from a point in the face point cloud data of the target object to the nearest triangular patch in the parameterized face model; the triangular patch is a triangle formed by three adjacent vertices in the parameterized face model.
  • the point-to-plane distance function is specifically used for:
  • D_p2f = Σ_i dist(s_i, f_c(i))²
  • where D_p2f is the point-to-plane distance function; s_i is a point in the face point cloud data of the target object; f_c(i) is the triangular patch in the parameterized face model nearest to s_i; p(s_i, f_c(i)) is the projection point of s_i on f_c(i); dist(s_i, f_c(i)) is taken as ‖s_i − p(s_i, f_c(i))‖ when the projection point lies inside the patch, and otherwise as the distance from s_i to e_j^(i), the point closest to s_i on the jth edge of f_c(i), or to the nearest vertex of f_c(i).
  • the objective function in the fourth acquisition module further includes: a key point projection distance function
  • the key point projection distance function is a function of the distance from the projection points of the face key points of the parameterized face model on the two-dimensional face image to the face two-dimensional key points on the two-dimensional face image of the target object.
  • the key point projection distance function is specifically used for:
  • D_proj = Σ_{i=1}^{n} ‖u_i − v_i‖²
  • where D_proj is the key point projection distance function; u_i is the projection point of the ith face key point of the parameterized face model on the two-dimensional face image; v_i is the corresponding face two-dimensional key point on the two-dimensional face image; n is the number of face key points in the parameterized face model.
  • the objective function in the fourth acquisition module also includes:
  • the penalty term function of parameterized face model coefficients is specifically used for:
  • E_pri = λ_S·‖S‖² + λ_E·‖E‖² + λ_P·‖P‖²
  • where E_pri is the penalty term function of the parameterized face model coefficients; S is the shape coefficient in the parameterized face model; E is the expression coefficient in the parameterized face model; P is the pose coefficient in the parameterized face model; λ_S is the penalty coefficient of the shape coefficient; λ_E is the penalty coefficient of the expression coefficient; λ_P is the penalty coefficient of the pose coefficient.
  • the first acquisition module includes:
  • the first acquisition submodule is used to obtain point cloud data of the target object based on the two-dimensional image and the depth image of the target object;
  • the first extraction submodule is used to extract the face two-dimensional image of the target object from the two-dimensional image of the target object;
  • the second extraction sub-module is used to extract the point cloud data corresponding to the two-dimensional face image from the point cloud data of the target object according to the extracted two-dimensional face image.
  • the two-dimensional image and the depth image of the target object are obtained by a TOF camera.
  • the facial key points of the target object are 51 facial key points.
  • the facial key points of the target object are 68 facial key points.
  • the first determining module is specifically configured to: perform a Rodrigues transformation on the second similarity transformation parameters to obtain Euler angles used to represent the head pose of the target object.
  • the second aspect also includes:
  • the second determination module is used to determine the concentration of the target object according to the head posture of the target object
  • the alarm module is configured to send an alarm to the target object based on the concentration of the target object.
  • a third aspect of the present application provides a computing device, including:
  • at least one processor; and at least one memory connected to the processor and storing program instructions which, when executed by the at least one processor, cause the at least one processor to execute the head posture measurement method of any one of the above first aspects.
  • a fourth aspect of the present application provides a computer-readable storage medium, on which program instructions are stored.
  • the program instructions are executed by a computer, the computer executes the head posture measurement method of any one of the above-mentioned first aspects.
  • a fifth aspect of the present application provides a computer program product which, when run on a computing device, causes the computing device to execute the head posture measurement method of any one of the above first aspects.
  • FIG. 1 is a schematic diagram of an application scenario of a head posture measurement method provided in an embodiment of the present application
  • FIG. 2 is a flow chart of a head posture measurement method provided by an embodiment of the present application.
  • FIG. 3 is a flow chart of a method for determining point cloud data of a face provided by an embodiment of the present application
  • FIG. 4 is an example diagram of a method for determining a point-to-plane distance function provided in an embodiment of the present application
  • FIG. 5 is a flowchart of a specific method of the head posture measurement method provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram of 68 two-dimensional key points of the face provided by the embodiment of the present application.
  • FIG. 7 is a schematic diagram of the face coordinate system of the FLAME model face key points provided by the embodiment of the present application;
  • FIG. 8 is a structural schematic diagram of a driving assistance device provided in an embodiment of the present application.
  • FIG. 9 is a structural schematic diagram of a head posture measurement device provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computing device provided by an embodiment of the present application.
  • Image data with depth: it includes ordinary RGB color image information and depth information (depth map); the RGB image information and the depth image information are registered, that is, there is a one-to-one correspondence between their pixels.
  • the acquisition of image data with depth can be realized through the RGB-D camera, and the collected image data with depth can be presented in the form of an RGB image frame and a depth image frame, or can be presented in the form of integrated image data. According to the internal parameters of the camera, the transformation between depth information and point cloud coordinates can be realized.
  • z_c·[u, v, 1]ᵀ = [[f_x, 0, u_0], [0, f_y, v_0], [0, 0, 1]] · [R | T] · [x_w, y_w, z_w, 1]ᵀ
  • where u, v are arbitrary pixel coordinates in the image coordinate system, and u_0, v_0 are the center coordinates of the image; f_x, f_y are the focal lengths in pixels from the camera intrinsics; x_w, y_w, z_w represent a three-dimensional coordinate point in the world coordinate system; z_c represents the z-axis value of the camera coordinates, that is, the distance from the target to the camera, which for a TOF camera is the depth value at the [u, v] point; R and T are the 3x3 rotation matrix and the 3x1 translation matrix of the external parameter matrix, respectively.
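  • For illustration, the following minimal Python sketch back-projects a registered depth map into an organized point map using the pinhole relation above; the intrinsics f_x, f_y, u_0, v_0 are assumed to come from the TOF camera calibration, and the extrinsics R, T are taken as identity (i.e., points are expressed in camera coordinates).

```python
import numpy as np

def depth_to_point_map(depth, fx, fy, u0, v0):
    """Back-project a registered depth map (e.g. from a TOF camera) into
    an organized (h, w, 3) point map in camera coordinates (R, T identity)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z_c = depth                      # z_c is the depth value at pixel (u, v)
    x = (u - u0) * z_c / fx          # inverts u = f_x * x / z_c + u_0
    y = (v - v0) * z_c / fy          # inverts v = f_y * y / z_c + v_0
    return np.stack([x, y, z_c], axis=-1)
```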
  • Parametric face model: a face is represented through a standard face (also called an average face, reference face, basic shape face, or statistical face), combined with a shape feature vector, a pose feature vector, or an expression feature vector.
  • the FLAME model is constructed based on the real human body point cloud data in a 3D human body scanning database (for example, the CAESAR database); each real human head mesh is obtained by registering the head data of these real human bodies, and the head mesh contains the entire area of the face and head, thus establishing a real face and head database.
  • the human head mesh is composed of several (such as 5023) vertices and several (such as 9976) triangular faces, together with several (such as 300) shape, several (such as 100) expression, and several (such as 15) pose principal components, so that a parameterized 3D human head model can be determined accordingly.
  • the shape T of FLAME is defined by the coordinates of the vertices constituting the mesh, which can be described as the following formula (1):
  • T = (x_1, y_1, z_1, x_2, ..., x_n, y_n, z_n)  (1)
  • FLAME models the shape and expression separately: the FLAME face model can be described as the combination of a shape part and an expression part (formula (2)). Here, T_0 is the standard face, that is, the average shape part of the face; S_i are the eigenvectors of the covariance matrix, that is, the face shape vector parameters (the above-mentioned shape principal components); q_i is the coefficient corresponding to the ith face shape vector parameter.
  • the modeling of the face shape part (which can be recorded as T(S) in this application) can be expressed as a linear combination of the basic shape T_0 plus n shape vectors S_i, which can be described as the following formula (3):
  • T(S) = T_0 + Σ_{i=1}^{n} q_i·S_i  (3)
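  • As a minimal sketch of formula (3), the shape part is just a linear combination in NumPy; the basis below is random placeholder data (not the actual FLAME model), with sizes matching the description above:

```python
import numpy as np

def shape_blend(T0, S, q):
    """Formula (3): T(S) = T0 + sum_i q_i * S_i.
    T0: flattened mean face (3n,); S: shape basis, one principal
    component per column (3n, k); q: shape coefficients (k,)."""
    return T0 + S @ q

# Illustrative sizes from the FLAME description: 5023 vertices, 300 shapes.
T0 = np.zeros(5023 * 3)
S = 1e-3 * np.random.randn(5023 * 3, 300)   # placeholder basis only
q = np.zeros(300)
vertices = shape_blend(T0, S, q).reshape(-1, 3)   # (5023, 3) mesh vertices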
  • Geometric registration of the 3D face model that is, transforming the 3D face model to the target position, also known as rigid body transformation, or optimization of angle and posture.
  • the 3D position of each vertex of the model is determined.
  • (w_x,k, w_y,k, w_z,k) represents the target position; here, the target position is the 3D coordinates of each key point in the face area. The registration can be written as the similarity transformation w_k = s·R·v_k + t_w, by which the vertices v_k of the entire 3D face model are initially aligned with the point cloud in the camera coordinate system; R indicates the rotation parameters of the three axes, s indicates the scaling parameter, and t_w indicates the translation parameters.
  • a point cloud matching algorithm can be used (point cloud matching is to solve the transformation relationship between two piles of point clouds, that is, to solve the above-mentioned rotation parameters and translation parameters) for geometric registration.
  • Common point cloud matching algorithms include the iterative closest point algorithm (Iterative Closest Point, ICP), the normal distribution transform algorithm (Normal Distribution Transform, NDT), and the iterative dual correspondence algorithm (Iterative Dual Correspondences, IDC); the ICP algorithm is used in the embodiment of this application.
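  • Below is a compact point-to-point ICP sketch with a closed-form similarity step (the Umeyama solution), yielding the scaling, rotation, and translation parameters mentioned above; it is an illustrative implementation of the idea, not the exact solver of the embodiment. With known keypoint correspondences, a single `umeyama` call suffices.

```python
import numpy as np

def umeyama(src, dst):
    """Closed-form similarity transform (s, R, t) minimizing
    sum ||dst_i - (s * R @ src_i + t)||^2 (Umeyama's method)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    sign = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ sign @ Vt                                  # rotation, det +1
    s = np.trace(np.diag(D) @ sign) / xs.var(0).sum()  # isotropic scale
    t = mu_d - s * R @ mu_s                            # translation
    return s, R, t

def icp(src, dst, iters=20):
    """Point-to-point ICP: alternate brute-force nearest-neighbour
    matching with the closed-form alignment above."""
    s, R, t = 1.0, np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        s, R, t = umeyama(src, dst[d2.argmin(1)])
        cur = s * src @ R.T + t
    return s, R, t
```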
  • the head pose measurement method provided in the embodiment of the present application can be applied to any scene requiring high-precision head pose data.
  • the application scenario may be in an autonomous vehicle (Autonomous Vehicle, AV) or an intelligent driving vehicle.
  • the driver's head posture is obtained through the head posture measurement method provided in the embodiment of the present application, and then analyzed to determine whether the driver's driving behavior is dangerous driving behavior; reminding and warning the driver in time can effectively avoid dangerous driving behavior.
  • the application scenario may also be students attending a class in online teaching.
  • In this scenario, the head pose measurement method provided in the embodiment of the present application can likewise be applied.
  • the collected RGB image and head depth image are transmitted to the local device 30; after receiving the images, the local device 30 processes them to obtain the driver's head pose and stores the obtained head pose in local memory.
  • the image acquisition device 10 includes but not limited to a camera.
  • the local device 30 may include a local computer, a local processing chip, or the like.
  • the RGB image and the head depth image of the driver 20 are collected by the image acquisition device 10.
  • the collected RGB image and head depth image can also be transmitted to the remote server 40; the remote server 40 processes the images to obtain the driver's head posture and stores it in remote memory. The obtained head posture can also be sent back to a local terminal (such as a mobile phone or computer) or returned to a local storage device and the like.
  • the point cloud data of the face key points of the target object is the three-dimensional coordinates corresponding to the face key points of the target object, and may also be called the three-dimensional face key points of the target object;
  • the key points of the face in the parametric face model may also be called the three-dimensional key points in the parametric face model.
  • FIG. 2 is a flow chart of the head posture measurement method provided by the embodiment of the present application.
  • the process mainly includes steps S110-S150, each step will be introduced in sequence below:
  • the process may include steps S111-S113, and each step will be introduced in turn below:
  • S111 Obtain point cloud data of the target object based on the two-dimensional image and the depth image of the target object.
  • the two-dimensional image of the target object may be an RGB image of the target object, or may be a grayscale image of the target object, or the like. It should be noted here that, in the embodiment of the present application, both the 2D image and the depth image of the target object should at least include the face area of the target object.
  • a time-of-flight camera (Time of Flight Camera, TOF camera) can be used to obtain the RGB image and the depth image of the target object.
  • S112 Extract a two-dimensional face image of the target object from the two-dimensional image of the target object.
  • the two-dimensional face image of the target object may be extracted by performing semantic segmentation on the two-dimensional image of the target object.
  • the semantic segmentation is to remove the regions other than the face of the target object in the two-dimensional image, such as the background, hair, and torso.
  • For example, a convolutional neural network (CNN) or a fully convolutional network (FCN) can be used for the semantic segmentation, or a mask region-based convolutional neural network (Mask R-CNN) can be used to perform semantic segmentation on the RGB image of the target object.
  • In step S113, the pixels in the two-dimensional face image of the target object extracted in step S112 are registered with the pixels of the depth image to obtain point cloud data representing the face of the target object, as sketched below.
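  • Assuming the organized point map from the earlier back-projection sketch and a boolean face mask produced by the segmentation step, extracting the face point cloud reduces to a masked lookup:

```python
import numpy as np

def face_point_cloud(point_map, face_mask):
    """Step S113 sketch: keep the 3D points whose pixels fall inside the
    segmented face region and have a valid (positive) depth reading.
    point_map: (h, w, 3) organized point map; face_mask: boolean (h, w)."""
    valid = face_mask & (point_map[..., 2] > 0)
    return point_map[valid]          # (m, 3) face point cloud
```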
  • S120 Acquire point cloud data of facial key points of the target object based on the face point cloud data of the target object and the two-dimensional key point data of the target object's face.
  • the face point cloud data of the target object can be indexed to obtain the point cloud data of the face key points corresponding to the two-dimensional key points of the face of the target object (the 3D key points of the face of the target object).
  • the two-dimensional key points of the face may be extracted from the two-dimensional face image of the target object.
  • the two-dimensional key points of the face may also be extracted from the two-dimensional image of the target object, which is not limited in this embodiment of the present application.
  • For example, an Active Shape Model (ASM) or a cascaded Deep Alignment Network (DAN) can be used to extract the two-dimensional key points of the face.
  • S130 Register the point cloud data of the facial key points of the target object with the point cloud data of the facial key points in the parameterized face model to obtain a first similarity transformation parameter.
  • the parameterized face model can be a FLAME model; the registration can also be called pose fitting, that is, a rigid body transformation is performed on the entire parameterized face model so that the point cloud data of the face key points in the parameterized face model is registered with the point cloud data of the face key points of the target object.
  • iterative closest point algorithm (Iterative Closest Point, ICP) can be used to compare the point cloud data of the face key points of the target object with the point cloud data of the face key points in the parameterized face model Registration, and obtain the first similarity transformation parameters.
  • the first similarity transformation parameters include scaling factor, rotation matrix and translation vector.
  • S140 Optimizing the first similarity transformation parameters according to the objective function to obtain second similarity transformation parameters.
  • the second similarity transformation parameters include scaling factor, rotation matrix and translation vector.
  • the objective function includes a point-plane distance function.
  • the point-plane distance function is a distance function from a point in the face point cloud data of the target object to the nearest triangular patch in the parameterized face model.
  • the triangular patch is a triangle formed by three adjacent vertices in the parameterized face model. For example, referring to Figure 4, point P is a point in the face point cloud data of the target object, and each triangle shown in Figure 4 is a triangular patch formed between adjacent points in the parameterized face model; for example, points a, b, and c form a triangle.
  • Alternatively, the distances from a point in the face point cloud data of the target object to its neighboring triangular patches can be calculated and compared, and the minimum value taken as the distance from that point to the nearest triangular patch in the parameterized face model.
  • When the nearest triangular patch to point P in Figure 4 is not easy to determine, the distance from point P to triangular patch abc and the distance from point P to triangular patch abd can both be calculated, and by comparing the two distance values, the point-to-plane distance function value corresponding to point P is determined.
  • the point-to-plane distance function can be determined as follows:
  • D_p2f = Σ_i dist(s_i, f_c(i))²
  • where D_p2f is the point-to-plane distance function; s_i is a point in the face point cloud data of the target object; f_c(i) is the triangular patch in the parameterized face model nearest to s_i; p(s_i, f_c(i)) is the projection point of s_i on f_c(i); dist(s_i, f_c(i)) is taken as ‖s_i − p(s_i, f_c(i))‖ when the projection point lies inside the patch, and otherwise as the distance from s_i to e_j^(i), the point closest to s_i on the jth edge of f_c(i), or to the nearest vertex of f_c(i).
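  • The sketch below computes the distance from one point to one triangular patch by exactly these cases (projection inside the patch, otherwise the nearest edge point or vertex); evaluating D_p2f then means summing this over all points against their nearest patches:

```python
import numpy as np

def point_triangle_dist(p, a, b, c):
    """Distance from point p to triangle (a, b, c): the projection onto
    the triangle's plane when it falls inside the patch, otherwise the
    closest point on an edge (which covers the vertices via clamping)."""
    n = np.cross(b - a, c - a)
    n = n / np.linalg.norm(n)
    proj = p - np.dot(p - a, n) * n              # projection onto the plane
    def inside(q):
        # q is inside iff it lies on the inner side of all three edges
        for u, v in ((a, b), (b, c), (c, a)):
            if np.dot(np.cross(v - u, q - u), n) < 0:
                return False
        return True
    if inside(proj):
        return np.linalg.norm(p - proj)
    best = np.inf
    for u, v in ((a, b), (b, c), (c, a)):
        t = np.clip(np.dot(p - u, v - u) / np.dot(v - u, v - u), 0.0, 1.0)
        best = min(best, np.linalg.norm(p - (u + t * (v - u))))
    return best
```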
  • the objective function may also include a key point projection distance function.
  • the key point projection distance function is the projection point of the face key point in the parameterized face model on the face two-dimensional image, to the face two-dimensional key point on the face two-dimensional image of the target object function of the distance.
  • When 51 facial feature key points are selected, the key point projection distance function is obtained by projecting these 51 facial feature key points onto the two-dimensional face image of the target object, and calculating the distances between these projection points and the corresponding two-dimensional face key points on the two-dimensional face image of the target object.
  • other representative points may be selected for the face key points, for example, 68 face key points are selected.
  • the key point projection distance function can be determined as follows:
  • D_proj = Σ_{i=1}^{n} ‖u_i − v_i‖²
  • where D_proj is the key point projection distance function; u_i is the projection point of the ith face key point of the parameterized face model on the two-dimensional face image; v_i is the corresponding face two-dimensional key point on the two-dimensional face image; n is the number of face key points in the parameterized face model.
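  • Assuming pinhole intrinsics and model key points already expressed in camera coordinates, D_proj can be evaluated as in this sketch:

```python
import numpy as np

def keypoint_projection_distance(X_model, v_2d, fx, fy, u0, v0):
    """D_proj sketch: project the model's 3D key points X_model (n, 3)
    with the pinhole intrinsics and sum squared distances to the detected
    2D key points v_2d (n, 2)."""
    u = fx * X_model[:, 0] / X_model[:, 2] + u0
    v = fy * X_model[:, 1] / X_model[:, 2] + v0
    proj = np.stack([u, v], axis=1)      # the u_i of the formula above
    return ((proj - v_2d) ** 2).sum()
```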
  • the objective function may also include a penalty term function that parameterizes coefficients of the face model.
  • the penalty term is used to constrain the size of the coefficient.
  • the penalty term function of the parameterized face model coefficients can be determined by the following formula:
  • E_pri = λ_S·‖S‖² + λ_E·‖E‖² + λ_P·‖P‖²
  • where E_pri is the penalty term function of the parameterized face model coefficients; S is the shape coefficient in the parameterized face model; E is the expression coefficient in the parameterized face model; P is the pose coefficient in the parameterized face model; λ_S, λ_E, and λ_P are the penalty coefficients of the shape, expression, and pose coefficients respectively.
  • the gradient descent method can be used, taking the first similarity transformation parameters in step S140 as initial values, to optimize the first similarity transformation parameters based on the objective function in this step until the objective function converges; the similarity transformation parameters corresponding to the convergence of the objective function are used as the second similarity transformation parameters.
  • the quasi-Newton method can also be used, taking the first similarity transformation parameters in step S140 as initial values, to optimize the first similarity transformation parameters based on the objective function in this step until the objective function converges; the similarity transformation parameters corresponding to the convergence of the objective function are used as the second similarity transformation parameters.
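  • As an illustration of the quasi-Newton refinement, the sketch below optimizes only the similarity parameters (rotation vector, translation, log-scale) with SciPy's L-BFGS-B, starting from the ICP result; a plain squared key-point distance stands in for the full objective E = D_p2f + λ_1·D_proj + λ_2·E_pri so the example stays self-contained:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def refine(src_kpts, dst_kpts, s0, R0, t0):
    """Quasi-Newton refinement of (s, R, t) from the ICP initial values."""
    theta0 = np.concatenate([Rotation.from_matrix(R0).as_rotvec(),
                             t0, [np.log(s0)]])

    def objective(theta):
        R = Rotation.from_rotvec(theta[:3]).as_matrix()
        t, s = theta[3:6], np.exp(theta[6])
        moved = s * src_kpts @ R.T + t
        return ((moved - dst_kpts) ** 2).sum()   # stand-in for E

    res = minimize(objective, theta0, method="L-BFGS-B")  # quasi-Newton
    R = Rotation.from_rotvec(res.x[:3]).as_matrix()
    return np.exp(res.x[6]), R, res.x[3:6]   # refined similarity parameters
```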
  • S150 Determine the head pose of the target object according to the second similarity transformation parameters.
  • As mentioned above, the second similarity transformation parameters include a scaling factor, a rotation matrix, and a translation vector.
  • the rotation matrix in the second similarity transformation parameters may be selected to represent the head pose of the target object.
  • Euler angles may also be obtained by performing a Rodrigues transformation on the rotation matrix in the second similarity transformation parameter, and using the Euler angles to represent the head pose of the target object.
  • the Euler angles may include a pitch angle (pitch), a yaw angle (yaw), a roll angle (roll) and the like.
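  • A sketch of the conversion with OpenCV and SciPy follows; the "xyz" Euler order is an assumption for illustration, since axis conventions vary by system:

```python
import cv2
import numpy as np
from scipy.spatial.transform import Rotation

def head_pose_euler(R):
    """Rodrigues rotation vector plus pitch/yaw/roll Euler angles
    extracted from the optimized rotation matrix R (3x3)."""
    rvec, _ = cv2.Rodrigues(R)                   # rotation vector
    pitch, yaw, roll = Rotation.from_matrix(R).as_euler("xyz", degrees=True)
    return rvec.ravel(), (pitch, yaw, roll)
```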
  • Another embodiment of the present application provides a head posture measurement method, which is basically the same as the head posture measurement method provided in the above-mentioned embodiments, so this embodiment will not repeat the similarities.
  • the difference is that after step S150, it also includes:
  • S160 Determine the concentration of the target object according to the head posture of the target object.
  • the head posture of the target object can be compared with a preset head posture of the target object: when the difference between the two is within a preset range, it means that the concentration of the target object is high; when the difference between the two exceeds the preset range, it means that the concentration of the target object is low.
  • S170 Send an alarm to the target object based on the concentration level of the target object.
  • an alarm may be sent to the target object to prompt the target object to concentrate.
  • the manner of issuing the alarm may be playing music, playing prompt quotations, etc., which are not specifically limited in this embodiment of the present application.
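  • A toy sketch of steps S160-S170 follows; the preset pose and threshold are illustrative assumptions, not values from the embodiment:

```python
# Attentive head pose (degrees) and allowed deviation -- illustrative only.
PRESET = {"pitch": 0.0, "yaw": 0.0, "roll": 0.0}
RANGE = 20.0

def is_focused(pitch, yaw, roll):
    """S160 sketch: concentration is high when every Euler angle stays
    within RANGE of the preset attentive pose."""
    angles = {"pitch": pitch, "yaw": yaw, "roll": roll}
    return all(abs(v - PRESET[k]) <= RANGE for k, v in angles.items())

def monitor(pitch, yaw, roll, alarm=lambda: print("Please stay focused")):
    """S170 sketch: issue an alarm (e.g. play music or a prompt) when
    concentration is low."""
    if not is_focused(pitch, yaw, roll):
        alarm()
```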
  • the head posture measurement method provided by the embodiment of the present application is introduced in detail.
  • the method mainly includes steps S210-S270, each step will be introduced in turn below.
  • S210 Acquire the RGB image and the depth image of the target object. Wherein, both the RGB image and the depth image should at least include the face area of the target object.
  • S220 Perform semantic segmentation on the RGB image of the target object obtained in step S210 to obtain a two-dimensional face image of the target object.
  • S230 Determine face point cloud data of the target object based on the two-dimensional face image of the target object, the depth image of the target object, and the internal reference of the TOF camera.
  • Since the two-dimensional image acquired by the TOF camera has a one-to-one correspondence with each pixel in the depth image, after obtaining the coordinates (u, v) of each pixel and the depth value z_c of that pixel, the point cloud coordinates (x_w, y_w, z_w) of the pixel can be obtained, and the face point cloud data of the target object can be obtained accordingly.
  • S240 Extract the face two-dimensional key points from the RGB image obtained in step S210.
  • FIG. 6 is a schematic diagram of 68 facial two-dimensional key points extracted in this step.
  • the selection of the 68 facial two-dimensional key points follows the face key point annotation convention established by dlib or OpenCV.
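  • For example, the 68 key points can be obtained with dlib's landmark predictor as in this sketch (the model file path is an example and the file must be obtained separately; it is not part of this application):

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def face_keypoints_2d(gray_image):
    """Detect the first face and return its 68 landmarks as (68, 2) pixels."""
    faces = detector(gray_image, 1)
    if not faces:
        return None
    shape = predictor(gray_image, faces[0])
    return np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])
```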
  • S250 Based on the face point cloud data of the target object obtained in step S230 and the two-dimensional key points of the face obtained in step S240, determine the three-dimensional coordinates corresponding to the two-dimensional key points of the face of the target object. For convenience of description, these 3D coordinates are called 3D key points here.
  • each of the 68 face two-dimensional key points obtained in step S240 is indexed in the face point cloud data of the target object to obtain the three-dimensional key points corresponding to the 68 face two-dimensional key points, that is, to obtain the 3D key points of the face of the target object.
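  • Because the RGB and depth images are registered pixel-to-pixel, the indexing of step S250 reduces to a direct array lookup into the organized point map, as in this sketch:

```python
import numpy as np

def keypoints_3d(point_map, keypoints_2d):
    """Look up the 3D point for each 2D key point (u, v).
    point_map: (h, w, 3) organized point map from step S230;
    keypoints_2d: (68, 2) integer pixel coordinates."""
    u = keypoints_2d[:, 0].astype(int)
    v = keypoints_2d[:, 1].astype(int)
    return point_map[v, u]    # rows indexed by v (image row), then u
```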
  • step S260 Register the 3D face key point set of the target object obtained in step S250 with the FLAME model face key point set to obtain a first similarity transformation parameter.
  • the face coordinate system of the key points of the face of the FLAME model needs to be established.
  • l 1 is the three-dimensional coordinates of the left corner of the left eye
  • l 2 is the three-dimensional coordinates of the right corner of the left eye
  • l 3 is the three-dimensional coordinates of the left corner of the right eye
  • l 4 is the three-dimensional coordinates of the right corner of the right eye
  • l 5 is the three-dimensional coordinates of the left nose
  • l 6 is the three-dimensional coordinates of the right nose
  • l 7 is the three-dimensional coordinates of the left corner of the mouth
  • l 8 is the three-dimensional coordinates of the right mouth corner
  • o is the coordinate origin.
  • the ICP algorithm can be used to initially register the 3D face key point set of the target object and the FLAME model face key point set to obtain the first similarity transformation parameters (s, R, t); Among them, s is the scaling factor, R is the rotation matrix, and t is the translation vector.
  • step S270 Optimizing the first similarity transformation parameters in step S260 to obtain optimized similarity transformation parameters, so as to realize accurate registration of the 3D facial key points of the target object and the facial key points of the FLAME model.
  • an objective function may be used to optimize the first similarity transformation parameters.
  • the objective function E can be established as follows:
  • E = D_p2f + λ_1·D_proj + λ_2·E_pri, with E_pri = λ_S·‖S‖² + λ_E·‖E‖² + λ_P·‖P‖²
  • where D_p2f is the distance between each point in the face point cloud data of the target object in step S230 and the nearest triangular patch of the face formed by the FLAME model; D_proj is the distance between the projections of the face three-dimensional key points of the FLAME model on the two-dimensional image and the detected face two-dimensional key points; E_pri is the penalty term of the FLAME model coefficients; λ_1 is the first weight coefficient; λ_2 is the second weight coefficient.
  • s i is the point in the face point cloud data of the target object
  • p(s_i, f_c(i)) is the projection point of point s_i on the nearest triangular patch f_c(i) in the parameterized face model; e_j^(i) is the point closest to s_i on the jth edge of f_c(i); v_w^(i) is the wth vertex of f_c(i).
  • u_i is the projection point of a three-dimensional key point of the parameterized face model on the two-dimensional face image; v_i is the corresponding two-dimensional key point of the face on the two-dimensional face image; n is the number of three-dimensional key points in the parameterized face model.
  • S is the shape coefficient in the parametric face model
  • E is the expression coefficient in the parametric face model
  • P is the pose coefficient in the parametric face model
  • λ_S is the penalty coefficient of the shape coefficient in the parameterized face model
  • λ_E is the penalty coefficient of the expression coefficient in the parameterized face model
  • λ_P is the penalty coefficient of the pose coefficient in the parameterized face model.
  • A Rodrigues transformation can be performed on the rotation matrix in the similarity transformation parameters obtained by fitting the parameterized face model to the face point cloud data, to obtain the Euler angles used to represent the head pose of the target object; the Euler angles at least include a pitch angle (pitch), a yaw angle (yaw), and a roll angle (roll).
  • the head posture measurement method further includes: determining the concentration of the target object according to the head posture of the target object. Wherein, this step is the same as step S160 in the above embodiment, so it will not be repeated here.
  • S170 An alert is sent to the target object based on the concentration level of the target object.
  • this step is the same as step S170 in the above embodiment, so it will not be repeated here.
  • The embodiment of the present application further provides a driving assistance device, which may be realized by a software system, by a hardware device, or by a combination of a software system and a hardware device.
  • FIG. 8 is only an exemplary structural diagram showing a driving assistance device.
  • the driving assistance device includes a driver's head posture detection module 410 and a driving assistance module 420 .
  • the driver's head posture detection module 410 is used to obtain the driver's head posture using the head posture detection method provided in the above-mentioned embodiments, which will not be described in detail again here.
  • the driving assistance module 420 is used for determining the concentration of the driver based on the posture of the driver's head, and issuing a warning to the driver based on the concentration.
  • the device may be implemented by a software system, may also be implemented by a hardware device, and may also be implemented by a combination of a software system and a hardware device.
  • FIG. 9 is only an exemplary structural diagram showing a head posture measurement device, and the present application does not limit the division of functional modules in the head posture measurement device.
  • the head posture measurement device can be logically divided into multiple modules, each module can have different functions, and the function of each module can be implemented by the processor in the computing device reading and executing instructions stored in the memory.
  • the head posture measurement device includes a first acquisition module 510 , a second acquisition module 520 , a third acquisition module 530 , a fourth acquisition module 540 and a first determination module 550 .
  • the head posture measurement device is used to execute the content described in steps S110-S150 shown in FIG. 2 .
  • a first acquiring module 510 configured to acquire facial point cloud data of the target object.
  • the second acquisition module 520 is configured to acquire point cloud data of key points of the face of the target object based on the point cloud data of the face of the target object and the two-dimensional key point data of the face of the target object.
  • the third acquisition module 530 is configured to register the point cloud data of the facial key points of the target object with the point cloud data of the facial key points in the parameterized face model to obtain the first similarity transformation parameters.
  • the fourth obtaining module 540 is configured to optimize the first similarity transformation parameters according to the objective function to obtain second similarity transformation parameters.
  • the first determining module 550 is configured to determine the head pose of the target object according to the second similarity transformation parameters.
  • the objective function in the fourth acquisition module 540 includes a point-to-plane distance function
  • the point-plane distance function is a distance function from a point in the face point cloud data of the target object to the nearest triangular patch in the parameterized face model;
  • the triangular patch is a triangle formed by three adjacent vertices in the parameterized face model.
  • the point-to-plane distance function includes:
  • D_p2f = Σ_i dist(s_i, f_c(i))²
  • where D_p2f is the point-to-plane distance function; s_i is a point in the face point cloud data of the target object; f_c(i) is the triangular patch in the parameterized face model nearest to s_i; p(s_i, f_c(i)) is the projection point of s_i on f_c(i); dist(s_i, f_c(i)) is taken as ‖s_i − p(s_i, f_c(i))‖ when the projection point lies inside the patch, and otherwise as the distance from s_i to e_j^(i), the point closest to s_i on the jth edge of f_c(i), or to the nearest vertex of f_c(i).
  • the objective function in the fourth acquisition module 540 further includes: a key point projection distance function
  • the key point projection distance function is a function of the distance from the projection points of the face key points of the parameterized face model on the two-dimensional face image to the face two-dimensional key points on the two-dimensional face image of the target object.
  • the key point projection distance function includes:
  • D_proj = Σ_{i=1}^{n} ‖u_i − v_i‖²
  • where D_proj is the key point projection distance function; u_i is the projection point of the ith face key point of the parameterized face model on the two-dimensional face image; v_i is the corresponding face two-dimensional key point on the two-dimensional face image; n is the number of face key points in the parameterized face model.
  • the objective function in the fourth acquisition module 540 also includes:
  • a penalty term function of the parameterized face model coefficients, wherein the penalty term is used to constrain the magnitude of the coefficients.
  • the penalty term function of the parameterized face model coefficients includes:
  • E_pri = λ_S·‖S‖² + λ_E·‖E‖² + λ_P·‖P‖²
  • where E_pri is the penalty term function of the parameterized face model coefficients; S is the shape coefficient in the parameterized face model; E is the expression coefficient in the parameterized face model; P is the pose coefficient in the parameterized face model; λ_S, λ_E, and λ_P are the penalty coefficients of the shape, expression, and pose coefficients respectively.
  • the first acquiring module 510 includes:
  • the first acquisition submodule is used to obtain the point cloud data of the target object based on the two-dimensional image and the depth image of the target object;
  • the first extraction submodule is used to extract the face two-dimensional image of the target object from the two-dimensional image of the target object;
  • the second extraction sub-module is configured to extract point cloud data corresponding to the two-dimensional face image from the point cloud data of the target object according to the extracted two-dimensional face image.
  • the two-dimensional image and the depth image of the target object are obtained by a TOF camera.
  • the facial key points of the target object are 51 facial key points.
  • the facial key points of the target object are 68 facial key points.
  • the first determination module 550 is specifically configured to: perform a Rodrigues transformation on the second similarity transformation parameters to obtain Euler angles used to represent the head pose of the target object.
  • the head posture measurement device also includes:
  • the second determination module is used to determine the concentration of the target object according to the head posture of the target object
  • An alarm module configured to issue an alarm to the target object based on the concentration of the target object.
  • FIG. 10 is a schematic structural diagram of a computing device 900 provided by an embodiment of the present application.
  • the computing device 900 includes: a processor 910 , a memory 920 , and a communication interface 930 .
  • the communication interface 930 in the computing device 900 shown in FIG. 10 can be used to communicate with other devices.
  • the processor 910 may be connected to the memory 920 .
  • the memory 920 can be used to store program codes and data. Therefore, the memory 920 may be a storage unit inside the processor 910, an external storage unit independent of the processor 910, or a combination of a storage unit inside the processor 910 and an external storage unit independent of the processor 910.
  • computing device 900 may further include a bus.
  • the memory 920 and the communication interface 930 may be connected to the processor 910 through a bus.
  • the bus may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus or the like.
  • PCI Peripheral Component Interconnect
  • EISA Extended Industry Standard Architecture
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the processor 910 may be a central processing unit (central processing unit, CPU).
  • the processor can also be other general-purpose processors, digital signal processors (digital signal processors, DSPs), application specific integrated circuits (Application specific integrated circuits, ASICs), off-the-shelf programmable gate arrays (field programmable gate arrays, FPGAs) or other Programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the processor 910 adopts one or more integrated circuits for executing related programs, so as to implement the technical solutions provided by the embodiments of the present application.
  • the memory 920 may include read-only memory and random-access memory, and provides instructions and data to the processor 910.
  • a portion of processor 910 may also include non-volatile random access memory.
  • processor 910 may also store device type information.
  • the processor 910 executes the computer-executed instructions in the memory 920 to perform the operation steps of the above method.
  • the computing device 900 may correspond to a corresponding body executing the methods according to the various embodiments of the present application, and the above-mentioned and other operations and/or functions of the modules in the computing device 900 are for realizing the corresponding processes of the methods in the embodiments; for the sake of brevity, they are not repeated here.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions described above are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application is essentially or the part that contributes to the prior art or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or other media that can store program codes.
  • the embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, it is used to perform a head posture measurement method, and the method includes the methods described in the above-mentioned embodiments. at least one of the options.
  • the computer storage medium in the embodiments of the present application may use any combination of one or more computer-readable media.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination thereof. More specific examples (non-exhaustive list) of computer readable storage media include: electrical connections with one or more leads, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), Erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer readable signal medium may include a data signal carrying computer readable program code in baseband or as part of a carrier wave. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. .
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present application may be written in one or more programming languages or combinations thereof, including object-oriented programming languages—such as Java, Smalltalk, C++, and conventional Procedural Programming Language - such as "C" or a similar programming language.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (such as through the Internet using an Internet service provider). connect).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A head posture measurement method, comprising: acquiring face point cloud data of a target object (S110); acquiring point cloud data of face keypoints of the target object on the basis of the face point cloud data of the target object and face two-dimensional keypoint data of the target object (S120); aligning the point cloud data of the face keypoints of the target object with point cloud data of face keypoints in a parameterized face model to obtain a first similarity transform parameter (S130); optimizing the first similarity transform parameter according to an objective function to obtain a second similarity transform parameter (S140); and determining the head posture of the target object according to the second similarity transform parameter (S150).

Description

Head posture measurement method and apparatus

Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a head posture measurement method and apparatus.
Background Art
3D face reconstruction is a research hotspot in the fields of computer vision and computer graphics. It is one of the core technologies of virtual/augmented reality, autonomous driving, robotics and related fields, and has great application value in driver monitoring systems (Driver Monitoring System, DMS). The driver head data monitored by a DMS can be used to analyze the driver's driving behavior, and such analysis can help prevent dangerous driving. Convenient and accurate monitoring of the driver's head data therefore has great application value.
In the prior art, a dedicated measuring device is generally used to measure the driver's head data, for example the Smarteye system. The Smarteye system marks 3D face keypoints based on a multi-camera rig in order to establish a head coordinate system. Its head pose measurement part consists of four high-definition infrared cameras, which must simultaneously track the 2D face keypoints during measurement; the 2D keypoints are then projected into 3D space to obtain 3D keypoints. However, this method requires geometric calibration with a checkerboard calibration board during system configuration, which is a complicated operation. In addition, the system is expensive and cannot be deployed at scale.
Another prior-art method monitors the driver's head data with an optical tracker. This method requires the driver to wear a marker device on the head and requires establishing the transformation from the marker points of that device to the head coordinate system. However, it depends on the stability of the marker device and on complex coordinate transformations, so it both introduces errors easily and is complicated to operate.
Summary of the Invention
In view of the above problems in the prior art, the present application provides a head posture measurement method and apparatus that can obtain head data conveniently and accurately.
To achieve the above purpose, a first aspect of the present application provides a head posture measurement method, the method comprising:
acquiring face point cloud data of a target object; acquiring point cloud data of face keypoints of the target object based on the face point cloud data of the target object and face two-dimensional keypoint data of the target object; registering the point cloud data of the face keypoints of the target object with point cloud data of face keypoints in a parameterized face model to obtain a first similarity transform parameter; optimizing the first similarity transform parameter according to an objective function to obtain a second similarity transform parameter; and determining the head posture of the target object according to the second similarity transform parameter.
In the head posture measurement method provided by the present application, the point cloud data of the face keypoints in the parameterized face model is registered with the point cloud data of the face keypoints of the target object to obtain a first similarity transform parameter, and the first similarity transform parameter is then optimized with an objective function. This improves the accuracy of the similarity transform between the two point sets, yields a tighter fit and higher fitting precision, and thus produces more accurate head monitoring data. The technical solution of the present application requires no additional expensive equipment, so it also saves cost.
In a possible implementation of the first aspect, the objective function includes a point-to-face distance function, where the point-to-face distance function is the distance from a point in the face point cloud data of the target object to the nearest triangular patch in the parameterized face model; a triangular patch is the triangle formed by three adjacent points in the parameterized face model.
In a possible implementation of the first aspect, the point-to-face distance function includes:

$$D_{2pf} = \sum_{i} \min\Big( \big\| s_i - p(s_i, f_{c(i)}) \big\|^2,\; \min_{j} \big\| s_i - \hat{e}_j^{\,c(i)} \big\|^2,\; \min_{w} \big\| s_i - \hat{v}_w^{\,c(i)} \big\|^2 \Big)$$

where $D_{2pf}$ is the point-to-face distance function, $s_i$ is a point in the face point cloud data of the target object, $p(s_i, f_{c(i)})$ is the projection of the point $s_i$ onto its nearest triangular face $f_{c(i)}$ in the parameterized face model, $\hat{e}_j^{\,c(i)}$ is the point on the $j$-th edge of the nearest triangular face $f_{c(i)}$ that is closest to $s_i$, and $\hat{v}_w^{\,c(i)}$ is the $w$-th vertex of the nearest triangular face $f_{c(i)}$.
By computing the above point-to-face distance, the fit between the face point cloud data of the target object and the parameterized face model can be improved, yielding higher fitting precision and thus accurate head data.
In a possible implementation of the first aspect, the objective function further includes a keypoint projection distance function, where the keypoint projection distance function is a function of the distance from the projections of the face keypoints of the parameterized face model onto the two-dimensional face image to the two-dimensional face keypoints on the two-dimensional face image of the target object.
In a possible implementation of the first aspect, the keypoint projection distance function includes:

$$D_{proj} = \sum_{i=1}^{n} \big\| u_i - v_i \big\|^2$$

where $D_{proj}$ is the keypoint projection distance function, $u_i$ is the projection of a face keypoint of the parameterized face model onto the two-dimensional face image, $v_i$ is the corresponding two-dimensional face keypoint on the two-dimensional face image, and $n$ is the number of face keypoints in the parameterized face model.
As described above, computing the distance from the projections of the model's face keypoints onto the two-dimensional face image to the two-dimensional face keypoints of the target object improves the fit of fine facial regions (for example, the lip contour) to the parameterized face model, making the fitting more precise.
In a possible implementation of the first aspect, the objective function further includes a penalty term function on the coefficients of the parameterized face model, where the penalty term is used to constrain the magnitude of the coefficients.
In a possible implementation of the first aspect, the penalty term function on the coefficients of the parameterized face model includes:

$$E_{pri} = \lambda_S \|S\|^2 + \lambda_E \|E\|^2 + \lambda_P \|P\|^2$$

where $E_{pri}$ is the penalty term function on the coefficients of the parameterized face model, $S$, $E$ and $P$ are the shape, expression and pose coefficients of the parameterized face model, and $\lambda_S$, $\lambda_E$ and $\lambda_P$ are the corresponding penalty coefficients for the shape, expression and pose coefficients.
Adding a penalty term on each coefficient of the parameterized face model constrains the model's deformation capacity and reduces the distortions that easily arise when the model is constrained by distance alone.
In a possible implementation of the first aspect, acquiring the face point cloud data of the target object includes: obtaining point cloud data of the target object based on a two-dimensional image and a depth image of the target object; extracting a two-dimensional face image of the target object from the two-dimensional image of the target object; and, according to the extracted two-dimensional face image, extracting the point cloud data corresponding to the two-dimensional face image from the point cloud data of the target object.
As described above, by obtaining the point cloud data of the target object and the two-dimensional face image of the target object, and then extracting from the point cloud data of the target object the point cloud data corresponding to the two-dimensional face image, the face point cloud data can be obtained simply and quickly.
In a possible implementation of the first aspect, the two-dimensional image and the depth image of the target object are obtained with a TOF camera.
In a possible implementation of the first aspect, acquiring the point cloud data of the face keypoints of the target object based on the face point cloud data of the target object and the two-dimensional face keypoint data of the target object includes: indexing into the face point cloud data of the target object with the two-dimensional coordinates corresponding to the two-dimensional face keypoint data, so as to obtain the point cloud data of the face keypoints of the target object. In a possible implementation of the first aspect, the face keypoints of the target object are 51 face keypoints.
In a possible implementation of the first aspect, the face keypoints of the target object are 68 face keypoints.
In a possible implementation of the first aspect, the process of registering the point cloud data of the face keypoints of the target object with the point cloud data of the face keypoints in the parameterized face model is a rigid transformation process.
In a possible implementation of the first aspect, the process of optimizing the first similarity transform parameter according to the objective function, with the first similarity transform parameter as the initial value, is a non-rigid transformation process.
In a possible implementation of the first aspect, a gradient descent method is used, with the first similarity transform parameter as the initial value, to optimize the first similarity transform parameter according to the objective function and obtain the second similarity transform parameter.
In a possible implementation of the first aspect, a quasi-Newton method is used, with the first similarity transform parameter as the initial value, to optimize the first similarity transform parameter according to the objective function and obtain the second similarity transform parameter. In a possible implementation of the first aspect, determining the head posture of the target object according to the second similarity transform parameter includes: applying a Rodrigues transformation to the second similarity transform parameter to obtain the Euler angles representing the head posture of the target object.
In a possible implementation of the first aspect, the method further includes: determining the concentration level of the target object according to the head posture of the target object; and issuing an alert to the target object based on the concentration level of the target object.
As described above, the head posture obtained with the solution provided by the present application is more accurate, so the concentration level derived from that head posture is likewise more accurate, which makes alerts based on that concentration level better match reality and reduces false alarms. For example, a false alarm may be: no alert is issued when the target object's concentration is low, creating a potential safety hazard; or an alert is issued when the target object's concentration is high, disturbing the target object's concentration.
A second aspect of the present application provides a head posture measurement apparatus, including:

a first acquisition module, configured to acquire face point cloud data of a target object;

a second acquisition module, configured to acquire point cloud data of face keypoints of the target object based on the face point cloud data of the target object and two-dimensional face keypoint data of the target object;

a third acquisition module, configured to register the point cloud data of the face keypoints of the target object with point cloud data of face keypoints in a parameterized face model to obtain a first similarity transform parameter;

a fourth acquisition module, configured to optimize the first similarity transform parameter according to an objective function to obtain a second similarity transform parameter;

a first determination module, configured to determine the head posture of the target object according to the second similarity transform parameter.
In a possible implementation of the second aspect, the objective function in the fourth acquisition module includes a point-to-face distance function, where the point-to-face distance function is the distance function from a point in the face point cloud data of the target object to the nearest triangular patch in the parameterized face model; a triangular patch is the triangle formed by three adjacent vertices in the parameterized face model.
In a possible implementation of the second aspect, the point-to-face distance function is specifically:

$$D_{2pf} = \sum_{i} \min\Big( \big\| s_i - p(s_i, f_{c(i)}) \big\|^2,\; \min_{j} \big\| s_i - \hat{e}_j^{\,c(i)} \big\|^2,\; \min_{w} \big\| s_i - \hat{v}_w^{\,c(i)} \big\|^2 \Big)$$

where $D_{2pf}$ is the point-to-face distance function, $s_i$ is a point in the face point cloud data of the target object, $p(s_i, f_{c(i)})$ is the projection of the point $s_i$ onto its nearest triangular face $f_{c(i)}$ in the parameterized face model, $\hat{e}_j^{\,c(i)}$ is the point on the $j$-th edge of the nearest triangular face $f_{c(i)}$ that is closest to $s_i$, and $\hat{v}_w^{\,c(i)}$ is the $w$-th vertex of the nearest triangular face $f_{c(i)}$.
In a possible implementation of the second aspect, the objective function in the fourth acquisition module further includes a keypoint projection distance function, where the keypoint projection distance function is a function of the distance from the projections of the face keypoints of the parameterized face model onto the two-dimensional face image to the two-dimensional face keypoints on the two-dimensional face image of the target object.
In a possible implementation of the second aspect, the keypoint projection distance function is specifically:

$$D_{proj} = \sum_{i=1}^{n} \big\| u_i - v_i \big\|^2$$

where $D_{proj}$ is the keypoint projection distance function, $u_i$ is the projection of a face keypoint of the parameterized face model onto the two-dimensional face image, $v_i$ is the corresponding two-dimensional face keypoint on the two-dimensional face image, and $n$ is the number of face keypoints in the parameterized face model.
In a possible implementation of the second aspect, the objective function in the fourth acquisition module further includes a penalty term function on the coefficients of the parameterized face model, where the penalty term is used to constrain the magnitude of the coefficients.
In a possible implementation of the second aspect, the penalty term function on the coefficients of the parameterized face model is specifically:

$$E_{pri} = \lambda_S \|S\|^2 + \lambda_E \|E\|^2 + \lambda_P \|P\|^2$$

where $E_{pri}$ is the penalty term function on the coefficients of the parameterized face model, $S$, $E$ and $P$ are the shape, expression and pose coefficients of the parameterized face model, and $\lambda_S$, $\lambda_E$ and $\lambda_P$ are the corresponding penalty coefficients.
In a possible implementation of the second aspect, the first acquisition module includes:

a first acquisition submodule, configured to obtain point cloud data of the target object based on a two-dimensional image and a depth image of the target object;

a first extraction submodule, configured to extract a two-dimensional face image of the target object from the two-dimensional image of the target object;

a second extraction submodule, configured to extract, according to the extracted two-dimensional face image, the point cloud data corresponding to the two-dimensional face image from the point cloud data of the target object.
In a possible implementation of the second aspect, the two-dimensional image and the depth image of the target object are obtained with a TOF camera.
In a possible implementation of the second aspect, the face keypoints of the target object are 51 face keypoints.

In a possible implementation of the second aspect, the face keypoints of the target object are 68 face keypoints.
In a possible implementation of the second aspect, the first determination module is specifically configured to apply a Rodrigues transformation to the second similarity transform parameter to obtain the Euler angles representing the head posture of the target object.
In a possible implementation of the second aspect, the apparatus further includes:

a second determination module, configured to determine the concentration level of the target object according to the head posture of the target object;

an alert module, configured to issue an alert to the target object based on the concentration level of the target object.
A third aspect of the present application provides a computing device, including:

a communication interface;

at least one processor connected to the communication interface; and

at least one memory connected to the processor and storing program instructions that, when executed by the at least one processor, cause the at least one processor to perform the head posture measurement method of any implementation of the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium on which program instructions are stored; the program instructions, when executed by a computer, cause the computer to perform the head posture measurement method of any implementation of the first aspect.
A fifth aspect of the present application provides a computer program product; when the computer program product runs on a computing device, it causes the computing device to perform the head posture measurement method of any implementation of the first aspect.
These and other aspects of the present application will be more clearly understood from the following description of the embodiment(s).
Brief Description of the Drawings
The features of the present application and the relationships between them are further described below with reference to the accompanying drawings. The drawings are exemplary; some features are not shown to scale, and some drawings may omit features that are customary in the field and not essential to the present application, or may additionally show features that are not essential to the present application. The combinations of features shown in the drawings are not intended to limit the present application. Throughout this specification, the same reference numerals denote the same items. The drawings are described as follows:
Fig. 1 is a schematic diagram of an application scenario of the head posture measurement method provided by an embodiment of the present application;

Fig. 2 is a flowchart of a head posture measurement method provided by an embodiment of the present application;

Fig. 3 is a flowchart of a method for determining face point cloud data provided by an embodiment of the present application;

Fig. 4 is an example diagram of the method for determining the point-to-face distance function provided by an embodiment of the present application;

Fig. 5 is a flowchart of a specific implementation of the head posture measurement method provided by an embodiment of the present application;

Fig. 6 is a schematic diagram of the 68 two-dimensional face keypoints provided by an embodiment of the present application;

Fig. 7 is a schematic diagram of the face coordinate system of the face keypoints of the FLAME model provided by an embodiment of the present application;

Fig. 8 is a structural schematic diagram of a driving assistance apparatus provided by an embodiment of the present application;

Fig. 9 is a structural schematic diagram of a head posture measurement apparatus provided by an embodiment of the present application;

Fig. 10 is a schematic diagram of a computing device provided by an embodiment of the present application.
Detailed Description
The words "first", "second", "third", etc., or similar terms such as module A, module B and module C in the specification and claims are used only to distinguish similar objects and do not denote any particular ordering of objects. It should be understood that, where permitted, specific orders or sequences may be interchanged so that the embodiments of the present application described here can be implemented in orders other than those illustrated or described here.
In the following description, the reference numerals denoting steps, such as S110, S120, etc., do not mean that the steps must be executed in that order; where permitted, the order of steps may be interchanged, or steps may be executed simultaneously.
The term "comprising" used in the specification and claims should not be interpreted as being restricted to what is listed thereafter; it does not exclude other elements or steps. It should therefore be interpreted as specifying the presence of the mentioned features, integers, steps or components, without excluding the presence or addition of one or more other features, integers, steps, components or groups thereof. Thus, the expression "a device comprising means A and B" should not be limited to a device consisting of components A and B only.
Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. The appearances of the phrases "in one embodiment" or "in an embodiment" in various places in this specification therefore do not necessarily all refer to the same embodiment, although they may. Furthermore, in one or more embodiments, the particular features, structures or characteristics may be combined in any suitable manner, as will be apparent to those of ordinary skill in the art from this disclosure.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. In case of inconsistency, the meaning given in this specification, or the meaning derived from its content, shall prevail. In addition, the terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.
To describe the technical content of the present application accurately and to facilitate an accurate understanding of the invention, the terms used in this specification are explained or defined as follows before the specific embodiments are described:
1) Image data with depth: data that includes ordinary RGB color image information and depth information (a depth map), where the RGB image information and the depth image information are registered, i.e., their pixels correspond one to one. Image data with depth can be captured with an RGB-D camera, and the captured data can be presented as one RGB image frame plus one depth image frame, or integrated into a single image. Given the camera intrinsics, depth information can be converted to point cloud coordinates.
2) Mapping relationship between the RGB image data and the point cloud data of a TOF camera:
For a point cloud coordinate, i.e. a world coordinate point $M(x_w, y_w, z_w)$, the mapping to an image point $m(u, v)$ is expressed by formula (1):

$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R & T \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \tag{1}$$

where the 3x3 matrix is the intrinsic matrix of the TOF camera (written here in the standard pinhole form with focal lengths $f_x$, $f_y$), $(u, v)$ is an arbitrary point in the image coordinate system, and $(u_0, v_0)$ are the center coordinates of the image. $(x_w, y_w, z_w)$ is a three-dimensional point in the world coordinate system. $z_c$ is the z-axis value in camera coordinates, i.e. the distance from the target to the camera, which for a TOF camera is the depth value at the point $[u, v]$. $R$ and $T$ are the 3x3 rotation matrix and the 3x1 translation matrix of the extrinsic matrix, respectively.
Setting the extrinsic matrix: since the origin of the world coordinates and the origin of the camera coincide, i.e. there is no rotation or translation, $R$ and $T$ in the extrinsic matrix are given by formula (2):

$$R = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad T = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{2}$$
Since the coordinate origins of the camera coordinate system and the world coordinate system coincide, the same point has the same depth under camera coordinates and world coordinates, i.e. $z_c = z_w$, so formula (1) can be further simplified to formula (3):

$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} \tag{3}$$
From the transformation formula (3) above, the mapping from an image point $[u, v]$ with depth value $z_c$ to the world coordinate point $[x_w, y_w, z_w]$ can be computed as formula (4):

$$x_w = \frac{(u - u_0)\, z_c}{f_x}, \qquad y_w = \frac{(v - v_0)\, z_c}{f_y}, \qquad z_w = z_c \tag{4}$$
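As an illustrative sketch (not part of the original disclosure), formula (4) can be applied to a whole depth map to obtain a per-pixel point cloud; the focal lengths $f_x$, $f_y$ and image center $(u_0, v_0)$ are assumed to be the TOF camera's pinhole intrinsics, and NumPy is assumed:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, u0, v0):
    """Back-project a depth map (z_c per pixel) into world coordinates
    using formula (4); the camera and world frames are assumed to
    coincide, so z_w = z_c."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x_w = (u - u0) * depth / fx
    y_w = (v - v0) * depth / fy
    return np.stack([x_w, y_w, depth], axis=-1)  # shape (h, w, 3)
```

In practice, pixels without a valid depth reading (depth value 0) would be filtered out of the resulting point cloud.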
3) Parameterized face model: a way of representing a face through a standard face (also called the average face, reference face, base shape face, or statistical face) combined with shape feature vectors, pose feature vectors, or expression feature vectors; examples include the 3DMM model and the FLAME model.
4) FLAME model: the FLAME model is built from real human body point cloud data in a 3D human body scan database (for example the Caesar database). The head data of these real bodies is registered to obtain a head mesh for each person; the mesh covers the whole face and head region, thereby establishing a database of real faces and heads. The head mesh consists of a number of vertices (e.g. 5023) and triangular faces (e.g. 9976), and principal component analysis (PCA) is used to obtain a number of shape components (e.g. 300), expression components (e.g. 100) and pose components (e.g. 15), from which a parameterized 3D head model can be determined.
Specifically, in terms of vertex positions, the shape $T$ of FLAME is defined by the coordinates of the vertices $k$ that make up the mesh, which can be written as formula (1):

$$T = (x_1, y_1, z_1, x_2, \ldots, x_n, y_n, z_n) \tag{1}$$
FLAME models shape and expression separately, and the FLAME face model can be written as formula (2):

$$T = T(v; p, q) = T_0 + B_S(q; S) + B_E(p; E) \tag{2}$$

where $T_0$ is the standard face, i.e. the average shape of the face; $B_S(q; S)$ is the face shape blend term, for example $\sum_i q_i S_i$, $i = 1$ to $n$, where the $S_i$ are eigenvectors of the covariance matrix, i.e. the face shape vector parameters (the shape principal components above), and $q$ are the coefficients of the face shape vector parameters; and $B_E(p; E)$ is the facial expression blend term, for example $\sum_i p_i E_i$, $i = 1$ to $I$, where the $E_i$ are eigenvectors of the covariance matrix, i.e. the facial expression vector parameters (the expression principal components above), and $p$ are the coefficients of the facial expression vector parameters.
Thus, the face shape part (denoted $T(S)$ in the present application) is modeled as the base shape $T_0$ plus a linear combination of $n$ shape vectors $S_i$, as in formula (3):

$$T(S) = T_0 + \sum_{i=1}^{n} q_i S_i \tag{3}$$

where $i \in [1, n]$. Since $T_0$ and $S_i$ are provided by FLAME, once the initialized $q_i$ are input, substituting them into formula (3) generates the 3D face model for the face shape part.
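For illustration only, a minimal sketch of formula (3): the base shape plus a coefficient-weighted linear combination of shape components. The array dimensions follow the figures quoted above for FLAME (5023 vertices, 300 shape components); the random values stand in for the actual FLAME data, which is not reproduced here.

```python
import numpy as np

def shape_model(T0, S, q):
    """Formula (3): T(S) = T0 + sum_i q_i * S_i.
    T0: (3N,) flattened vertex coordinates of the standard face.
    S:  (n, 3N) shape principal components.
    q:  (n,) shape coefficients."""
    return T0 + q @ S

T0 = np.zeros(3 * 5023)                     # placeholder standard face
S = 0.01 * np.random.randn(300, 3 * 5023)   # placeholder shape components
q = np.random.randn(300)                    # initialized shape coefficients
face_vertices = shape_model(T0, S, q).reshape(5023, 3)
```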
5) Geometric registration of a 3D face model, i.e. transforming the 3D face model to a target position, also called rigid transformation, or optimization of angle and pose. After the above 3D face model is established, the 3D position of each of its vertices is determined. The coordinates $X_k = (x_k, y_k, z_k)$ of a model vertex $k$ can then be transformed to the target position by a rigid transformation, which can be written as formula (4):
$$\begin{bmatrix} w_{x,k} \\ w_{y,k} \\ w_{z,k} \end{bmatrix} = s\, R \begin{bmatrix} x_k \\ y_k \\ z_k \end{bmatrix} + t_w \tag{4}$$

where $(w_{x,k}, w_{y,k}, w_{z,k})$ is the target position; in the embodiments of the present application the target positions are the 3D coordinates of the keypoints of the face region, and through a number of keypoints the vertices of the whole 3D face model are initially aligned with the point cloud in the camera coordinate system. $R$ denotes the rotation about the three axes (the rotation matrix built from the three-axis rotation parameters), $s$ denotes the scale parameter, and $t_w$ denotes the translation parameter.
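A minimal sketch of applying formula (4) to all model vertices at once (NumPy assumed; the function name is illustrative):

```python
import numpy as np

def apply_similarity(X, s, R, t):
    """Formula (4): map model vertices X (N, 3) to the target positions
    with scale s, rotation matrix R (3, 3) and translation t (3,)."""
    return s * (X @ R.T) + t
```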
A point cloud matching algorithm can be used for the geometric registration (point cloud matching solves the transformation between two point sets, i.e. the above rotation and translation parameters). Common point cloud matching algorithms include the iterative closest point (ICP) algorithm, the normal distribution transform (NDT) algorithm, and the iterative dual correspondences (IDC) algorithm; the embodiments of the present application use the ICP algorithm.
The embodiments of the present application are described in detail below with reference to the accompanying drawings. First, the scenarios to which the head posture measurement method provided by the embodiments of the present application applies are introduced.
The head posture measurement method provided by the embodiments of the present application can be applied in any scenario that requires high-precision head posture data. For example, an application scenario may be an autonomous vehicle (AV) or an intelligent driving vehicle: the driver's head posture is obtained with the head posture measurement method provided by the embodiments of the present application, and the head posture is then analyzed to judge whether the driver's behavior constitutes dangerous driving; reminding and alerting the driver in time can effectively prevent dangerous driving. As another example, an application scenario may be students attending online classes: the students' head postures are obtained with the method provided by the embodiments of the present application, and analyzing them makes it possible to judge the students' attention, alert the students in time, or adjust the teaching content and style so as to improve how attentively the students follow the class.
By way of example, Fig. 1 shows one scenario to which the head posture measurement method provided by the embodiments of the present application applies. An image acquisition apparatus 10 captures an RGB head image and a depth head image of a driver 20 and transmits them to a local device 30; after receiving the images, the local device 30 processes them to obtain the driver's head posture and stores the obtained head posture in local memory. The image acquisition apparatus 10 includes but is not limited to a camera. The local device 30 may include a local computer, a local processing chip, or the like.
As shown in Fig. 1, after the image acquisition apparatus 10 captures the RGB head image and the depth head image of the driver 20, the captured RGB image and depth image may also be transmitted to a remote server 40. After receiving the images, the remote server 40 processes them to obtain the driver's head posture and stores it in remote storage; the obtained head posture may also be sent back to a local terminal (for example a mobile phone or a computer) or to local memory.
Before the head posture measurement method provided by the embodiments of the present application is described in detail, the relationships between the technical terms used in the embodiments are first explained: in the embodiments, the point cloud data of the face keypoints of the target object is the set of three-dimensional coordinates corresponding to the face keypoints of the target object, which may also be called the three-dimensional face keypoints of the target object; the face keypoints in the parameterized face model may also be called the three-dimensional keypoints in the parameterized face model.
A head posture measurement method provided by an embodiment of the present application is described in detail below with reference to the figures.
Fig. 2 is a flowchart of the head posture measurement method provided by an embodiment of the present application. The process mainly includes steps S110-S150, which are introduced in turn below:
S110: Acquire face point cloud data of the target object.
In an optional implementation, as shown in Fig. 3, this process may include steps S111-S113, which are introduced in turn below:
S111: Obtain point cloud data of the target object based on a two-dimensional image and a depth image of the target object.
The two-dimensional image of the target object may be an RGB image of the target object, a grayscale image of the target object, or the like. Note that, in the embodiments of the present application, both the two-dimensional image and the depth image of the target object should at least include the face region of the target object.
In an optional implementation, a time-of-flight camera (TOF camera) may be used to obtain the RGB image and the depth image of the target object.
S112: Extract a two-dimensional face image of the target object from the two-dimensional image of the target object.
In an optional implementation, the two-dimensional face image of the target object may be extracted by performing semantic segmentation on the two-dimensional image of the target object, where the semantic segmentation removes the regions outside the target object's face in the two-dimensional image, such as the background, hair and torso.
In the embodiments of the present application, the semantic segmentation of the RGB image of the target object may be performed with, but is not limited to, a convolutional neural network (CNN), a fully convolutional network (FCN), or a Mask R-CNN.
S113: According to the extracted two-dimensional face image, extract the point cloud data corresponding to the two-dimensional face image from the point cloud data of the target object.
Since there is a one-to-one correspondence between the pixels of the two-dimensional image and the pixels of the depth image, in an optional implementation the pixels of the two-dimensional face image of the target object extracted in step S112 can be registered with the pixels of the depth image, thereby obtaining the point cloud data representing the face of the target object.
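As an illustrative sketch of step S113 (assuming the pixel-wise RGB/depth registration described above, pinhole intrinsics f_x, f_y, u_0, v_0, and a binary face mask produced by the semantic segmentation of step S112):

```python
import numpy as np

def face_point_cloud(depth, face_mask, fx, fy, u0, v0):
    """Keep only pixels labeled as face by the segmentation, then
    back-project them with the camera intrinsics (formula (4) of the
    TOF mapping above)."""
    v, u = np.nonzero(face_mask)                   # face pixel coordinates
    z = depth[v, u]
    u, v, z = u[z > 0], v[z > 0], z[z > 0]         # drop pixels without depth
    x = (u - u0) * z / fx
    y = (v - v0) * z / fy
    return np.stack([x, y, z], axis=-1)            # (M, 3) face point cloud
```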
S120: Acquire the point cloud data of the face keypoints of the target object based on the face point cloud data of the target object and the two-dimensional face keypoint data of the target object.
Specifically, the two-dimensional face keypoints of the target object can be used to index into the face point cloud data of the target object to obtain the point cloud data of the face keypoints corresponding to the two-dimensional face keypoints (the three-dimensional face keypoints of the target object).
In an optional implementation, the two-dimensional face keypoints may be extracted from the two-dimensional face image of the target object. In another optional implementation, the two-dimensional face keypoints may also be extracted from the two-dimensional image of the target object; the embodiments of the present application do not limit this.
In an optional implementation, the two-dimensional face keypoints of the RGB image (two-dimensional image) may be extracted with an active shape model (ASM) method, with a deep learning method, or with a cascaded deep alignment network (DAN).
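A minimal sketch of step S120, assuming the face point cloud is kept in its per-pixel (organized) layout so that the pixel coordinates of a 2D keypoint index its 3D point directly:

```python
import numpy as np

def index_keypoints(point_map, kp_2d):
    """point_map: (h, w, 3) per-pixel point cloud registered to the 2D image.
    kp_2d: (n, 2) detected keypoint pixel coordinates (u, v).
    Returns the (n, 3) 3D face keypoints by direct lookup."""
    u = np.round(kp_2d[:, 0]).astype(int)
    v = np.round(kp_2d[:, 1]).astype(int)
    return point_map[v, u]
```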
S130: Register the point cloud data of the face keypoints of the target object with the point cloud data of the face keypoints in the parameterized face model to obtain the first similarity transform parameter.
In this step, the parameterized face model may be a FLAME model. The registration may also be called pose fitting, i.e. a rigid transformation applied to the whole parameterized face model so that the point cloud data of the face keypoints in the parameterized face model is registered with the point cloud data of the face keypoints of the target object.
In an optional implementation, the iterative closest point (ICP) algorithm may be used to register the point cloud data of the face keypoints of the target object with the point cloud data of the face keypoints in the parameterized face model and obtain the first similarity transform parameter, where the first similarity transform parameter includes a scale factor, a rotation matrix and a translation vector.
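Since the 51 (or 68) keypoints of the target object and of the model correspond one to one, the core alignment that ICP iterates can be solved in closed form. The sketch below uses the Umeyama/Kabsch SVD solution, one standard way to estimate a similarity transform (scale, rotation, translation) from corresponding point sets; it is an illustration, not necessarily the exact solver of the embodiment.

```python
import numpy as np

def estimate_similarity(src, dst):
    """Closed-form (s, R, t) with dst ~ s * R @ src + t, for one-to-one
    corresponding 3D point sets src, dst of shape (n, 3)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    sign = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    S = np.diag([1.0, 1.0, sign])              # guard against reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / (src_c ** 2).sum() * len(src)
    t = mu_d - s * R @ mu_s
    return s, R, t
```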
S140: Optimize the first similarity transform parameter according to the objective function to obtain the second similarity transform parameter, where the second similarity transform parameter includes a scale factor, a rotation matrix and a translation vector.
In an optional implementation, the objective function includes a point-to-face distance function, where the point-to-face distance function is the distance function from a point in the face point cloud data of the target object to the nearest triangular patch in the parameterized face model. A triangular patch is the triangle formed by three adjacent vertices in the parameterized face model. For example, referring to Fig. 4, point P is a point in the face point cloud data of the target object, and each triangle shown in Fig. 4 is a triangular patch formed between adjacent points in the parameterized face model, such as the triangle formed by points a, b and c.
When the nearest triangular patch is not easy to determine, the distances from a point of the target object's face point cloud data to its neighboring triangular patches can be computed and compared, and the minimum taken as the distance from that point to the nearest triangular patch in the parameterized face model. For example, when the triangular patch nearest to point P in Fig. 4 is not easy to determine, the distance from point P to triangular patch abc and the distance from point P to triangular patch abd can be computed, and the point-to-face distance value of point P is determined by comparing the two.
In the embodiments of the present application, the point-to-face distance function can be determined as:

$$D_{2pf} = \sum_{i} \min\Big( \big\| s_i - p(s_i, f_{c(i)}) \big\|^2,\; \min_{j} \big\| s_i - \hat{e}_j^{\,c(i)} \big\|^2,\; \min_{w} \big\| s_i - \hat{v}_w^{\,c(i)} \big\|^2 \Big)$$

where $D_{2pf}$ is the point-to-face distance function, $s_i$ is a point in the face point cloud data of the target object, $p(s_i, f_{c(i)})$ is the projection of the point $s_i$ onto its nearest triangular face $f_{c(i)}$ in the parameterized face model, $\hat{e}_j^{\,c(i)}$ is the point on the $j$-th edge of the nearest triangular face $f_{c(i)}$ that is closest to $s_i$, and $\hat{v}_w^{\,c(i)}$ is the $w$-th vertex of the nearest triangular face $f_{c(i)}$.
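A minimal sketch of the per-point term of D_2pf (NumPy assumed): the squared distance from a point to one triangular patch is the minimum over the in-plane projection (when it falls inside the triangle), the nearest point on each edge, and the vertices, matching the candidates named above.

```python
import numpy as np

def point_triangle_dist2(p, a, b, c):
    """Squared distance from point p to the triangle with vertices a, b, c."""
    candidates = [a, b, c]                       # the vertices themselves
    for s0, s1 in ((a, b), (b, c), (c, a)):      # nearest point on each edge
        d = s1 - s0
        lam = np.clip(np.dot(p - s0, d) / np.dot(d, d), 0.0, 1.0)
        candidates.append(s0 + lam * d)
    n = np.cross(b - a, c - a)                   # plane normal
    n = n / np.linalg.norm(n)
    proj = p - np.dot(p - a, n) * n              # projection onto the plane
    v0, v1, v2 = b - a, c - a, proj - a          # barycentric inside test
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    w1 = (d11 * d20 - d01 * d21) / denom
    w2 = (d00 * d21 - d01 * d20) / denom
    if w1 >= 0 and w2 >= 0 and w1 + w2 <= 1:
        candidates.append(proj)
    return min(float(np.sum((p - q) ** 2)) for q in candidates)
```

D_2pf is then the sum of this quantity over all points s_i, each paired with its nearest patch f_c(i).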
In an optional implementation, the objective function may further include a keypoint projection distance function, where the keypoint projection distance function is a function of the distance from the projections of the face keypoints of the parameterized face model onto the two-dimensional face image to the two-dimensional face keypoints on the two-dimensional face image of the target object.
For example, among the 5023 vertices of the parameterized face model there are 51 face keypoints (facial-feature keypoints) corresponding to the face. The keypoint projection distance function projects these 51 facial-feature keypoints onto the two-dimensional face image of the target object and computes the distances from the projected points to the corresponding two-dimensional face keypoints on that image. In other embodiments, other representative points may be selected as the face keypoints, for example 68 face keypoints.
Specifically, the keypoint projection distance function can be determined as follows:

$$D_{proj}=\frac{1}{n}\sum_{i=1}^{n}\big\|u_i-v_i\big\|^2$$

where D_proj is the keypoint projection distance function, u_i is the projection of the i-th face keypoint of the parameterized face model onto the two-dimensional face image, v_i is the corresponding two-dimensional face keypoint on that image, and n is the number of face keypoints in the parameterized face model. Note that if there are 51 facial-feature keypoints, then n = 51.
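A small sketch of how such a projection distance can be evaluated is shown below, assuming a pinhole camera with a known intrinsic matrix K; both the calibration source and the mean-of-squared-distances convention are assumptions of this sketch.

```python
import numpy as np

def keypoint_projection_distance(X3d, v2d, K):
    """D_proj: mean squared distance between projected model keypoints
    and detected 2D keypoints.

    X3d: (n, 3) face keypoints of the parameterized model, camera coords.
    v2d: (n, 2) detected 2D keypoints on the face image.
    K:   (3, 3) camera intrinsic matrix (assumed known from calibration).
    """
    proj = (K @ X3d.T).T                  # pinhole projection
    u = proj[:, :2] / proj[:, 2:3]        # divide by depth to get pixels
    return np.mean(np.sum((u - v2d) ** 2, axis=1))
```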
As an optional implementation, the objective function may further include a penalty term function on the parameterized face model coefficients, where the penalty term constrains the magnitude of those coefficients.
Specifically, the penalty term function on the parameterized face model coefficients can be determined as follows:

$$E_{pri}=\lambda_S\|S\|^2+\lambda_E\|E\|^2+\lambda_P\|P\|^2$$

where E_pri is the penalty term function on the parameterized face model coefficients; S, E and P are respectively the shape, expression and pose coefficients of the parameterized face model; and λ_S, λ_E and λ_P are the penalty coefficients of the shape, expression and pose coefficients, respectively.
As an optional implementation, gradient descent can be used: taking the first similarity transformation parameters from step S140 as the initial value, the parameters are optimized against the objective function of this step until the objective function converges, and the similarity transformation parameters at convergence are taken as the second similarity transformation parameters.
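The following sketch illustrates this refinement loop with plain gradient descent and a numerical gradient. The packing of the similarity parameters into a flat vector and all step-size and tolerance values are illustrative assumptions rather than values from the text.

```python
import numpy as np

def refine_similarity(theta0, objective, lr=1e-3, tol=1e-8, max_iter=500):
    """Gradient-descent refinement of the first similarity-transform
    parameters against the objective function of this step.

    theta0 packs the similarity parameters as a flat vector (e.g. scale,
    rotation vector, translation) -- the packing is an assumption here.
    objective maps a parameter vector to the scalar objective value E.
    """
    def num_grad(f, x, eps=1e-6):
        # Central-difference numerical gradient, for simplicity
        g = np.zeros_like(x)
        for k in range(x.size):
            d = np.zeros_like(x)
            d[k] = eps
            g[k] = (f(x + d) - f(x - d)) / (2 * eps)
        return g

    theta, prev = theta0.astype(float), np.inf
    for _ in range(max_iter):
        val = objective(theta)
        if abs(prev - val) < tol:          # objective has converged
            break
        theta -= lr * num_grad(objective, theta)
        prev = val
    return theta                           # second similarity parameters
```

A quasi-Newton solver (for example scipy.optimize.minimize with method="BFGS") can be dropped in for the update step, which is the alternative described next.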
As another optional implementation, a quasi-Newton method can be used instead: again taking the first similarity transformation parameters from step S140 as the initial value, the parameters are optimized against the objective function of this step until it converges, and the similarity transformation parameters at convergence are taken as the second similarity transformation parameters.

S150: Determine the head pose of the target object according to the second similarity transformation parameters.
In this step, the second similarity transformation parameters include a scaling factor, a rotation matrix and a translation vector.
As an optional implementation, the rotation matrix in the second similarity transformation parameters may be used to represent the head pose of the target object.
As another optional implementation, a Rodrigues transformation may be applied to the rotation matrix in the second similarity transformation parameters to obtain Euler angles, which are then used to represent the head pose of the target object. The Euler angles may include a pitch angle, a yaw angle and a roll angle, among others.
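As a sketch, Euler angles can be extracted from the rotation matrix as follows. The ZYX angle convention is an assumption of this sketch (the text does not fix one), and OpenCV's cv2.Rodrigues can be used for the rotation-matrix/rotation-vector conversion itself.

```python
import numpy as np

def rotation_to_euler(R):
    """Head pose as Euler angles (pitch, yaw, roll), in degrees,
    extracted from a 3x3 rotation matrix under a ZYX convention."""
    sy = np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
    if sy > 1e-6:
        pitch = np.arctan2(R[2, 1], R[2, 2])
        yaw = np.arctan2(-R[2, 0], sy)
        roll = np.arctan2(R[1, 0], R[0, 0])
    else:                                  # gimbal-lock case
        pitch = np.arctan2(-R[1, 2], R[1, 1])
        yaw = np.arctan2(-R[2, 0], sy)
        roll = 0.0
    return np.degrees([pitch, yaw, roll])
```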
Another embodiment of the present application provides a head pose measurement method that is essentially the same as the method of the embodiment above, so the common parts are not repeated here. The difference is that, after step S150, the method further includes:
S160: Determine the concentration level of the target object according to the head pose of the target object.
As an optional implementation, the head pose of the target object can be compared with a preset head pose for that object: when the difference between the two is within a preset range, the concentration of the target object is high; when the difference exceeds the preset range, the concentration is low.
S170: Issue an alert to the target object based on the concentration level of the target object.
As an optional implementation, when the concentration of the target object is low, an alert can be issued to prompt the target object to focus. The alert may take the form of playing music, playing a spoken prompt, and so on; the embodiments of this application place no particular restriction on it.
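A minimal sketch of steps S160-S170 might look as follows; the 15-degree threshold and the use of the maximum per-axis deviation are illustrative assumptions, since the text leaves the preset range unspecified.

```python
def check_attention(euler, reference, max_dev_deg=15.0):
    """Compare the measured head pose with the preset pose and alert
    when the deviation leaves the preset range.

    euler, reference: (pitch, yaw, roll) in degrees.
    max_dev_deg: illustrative threshold, not a value from the patent.
    """
    deviation = max(abs(e - r) for e, r in zip(euler, reference))
    focused = deviation <= max_dev_deg
    if not focused:
        print("Attention alert: please focus")  # e.g. play music or a prompt
    return focused
```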
A specific implementation of a head pose measurement method provided by another embodiment of the present application is described in detail below with reference to Figures 5-7.
Referring to the flowchart shown in Figure 5, the head pose measurement method provided by this embodiment mainly includes steps S210-S280, introduced in turn below.
S210: Acquire an RGB image and a depth image of the target object, each of which should include at least the face region of the target object.
S220: Perform semantic segmentation on the RGB image of the target object obtained in step S210 to obtain a two-dimensional face image of the target object.
S230: Determine the face point cloud data of the target object based on the two-dimensional face image of the target object, the depth image of the target object, and the intrinsic parameters of the TOF camera.
In this step, the two-dimensional image acquired by the TOF camera has a one-to-one pixel correspondence with the depth image. That is, once each pixel's coordinates (u, v) and depth value z_c are obtained, the pixel's point cloud coordinates (x_w, y_w, z_w) can be computed with reference to formula (4) above, and the face point cloud data of the target object obtained accordingly.
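The following is a minimal sketch of this back-projection, assuming an undistorted pinhole model with intrinsic matrix K; it stands in for formula (4) without reproducing it exactly.

```python
import numpy as np

def depth_to_point_cloud(depth, K, mask=None):
    """Back-project a depth image into camera-frame 3D points.

    depth: (H, W) depth values from the TOF camera.
    K: 3x3 intrinsic matrix holding fx, fy, cx, cy.
    mask: optional (H, W) boolean face mask from semantic segmentation.
    Lens distortion is ignored in this sketch.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    v, u = np.indices(depth.shape)        # per-pixel row (v) and column (u)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1)
    keep = (z > 0) if mask is None else (mask & (z > 0))
    return pts[keep].reshape(-1, 3)
```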
S240: Extract the two-dimensional face keypoints from the RGB image of step S210.
Exemplarily, Figure 6 shows the 68 two-dimensional face keypoints extracted in this step, selected following the face keypoint extraction standard established by dlib or OpenCV. In other embodiments, only the 51 facial-feature keypoints may be extracted, omitting the face contour keypoints, i.e. only points 18-68 in Figure 6.
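As an illustration, the 68 landmarks can be obtained with dlib roughly as follows; the predictor model file must be supplied separately, and its path here is an assumption.

```python
import dlib
import cv2

# Landmark extraction following the dlib 68-point standard mentioned above.
# The .dat path is an assumption; the predictor is downloaded separately.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def face_landmarks_68(rgb_image):
    gray = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2GRAY)
    faces = detector(gray, 1)             # upsample once to catch small faces
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    # For the 51 facial-feature points only, keep indices 17-67
    # (points 18-68 in Figure 6's 1-based numbering).
```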
S250: Based on the face point cloud data obtained in step S230 and the two-dimensional face keypoints obtained in step S240, determine the three-dimensional coordinates corresponding to the two-dimensional face keypoints. For convenience of description, these three-dimensional coordinates are referred to as three-dimensional keypoints.
Specifically, each of the 68 two-dimensional face keypoints obtained in step S240 is indexed into the face point cloud data of the target object to obtain the corresponding three-dimensional keypoint, yielding the three-dimensional face keypoints of the target object.
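Here is a sketch of this indexing step, assuming the point cloud is kept organized on the same pixel grid as the depth image so that a 2D keypoint maps to a 3D point by direct lookup; marking invalid depths as NaN is an assumption of this sketch.

```python
import numpy as np

def lift_keypoints_to_3d(kps_2d, point_map):
    """Index 2D face keypoints into an organized point cloud.

    kps_2d: list of (u, v) pixel coordinates from the landmark detector.
    point_map: (H, W, 3) per-pixel 3D coordinates on the depth-image grid,
    with NaN where no valid depth exists (assumed convention).
    """
    kps_3d = np.array([point_map[v, u] for (u, v) in kps_2d])
    valid = ~np.isnan(kps_3d).any(axis=1)  # drop keypoints without depth
    return kps_3d, valid
```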
S260: Register the set of three-dimensional face keypoints of the target object obtained in step S250 with the set of FLAME model face keypoints to obtain the first similarity transformation parameters.
In this step, the face coordinate system of the FLAME model face keypoints must first be established.
As shown in Figure 7, the head coordinate system is defined over the landmarks l_1-l_8: the origin o is defined as [formula], the x-axis direction as [formula], the y-axis direction as [formula], and the z-axis direction as p_z = x × y.

Here, l_1 is the three-dimensional coordinate of the left corner of the left eye, l_2 that of the right corner of the left eye, l_3 that of the left corner of the right eye, l_4 that of the right corner of the right eye, l_5 that of the left nose wing, l_6 that of the right nose wing, l_7 that of the left mouth corner, l_8 that of the right mouth corner, and o is the coordinate origin.
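Since the exact formulas for the origin and the x/y directions are not recoverable here, the following sketch only illustrates one plausible construction consistent with the stated z = x × y; the origin as the landmark centroid and the axes from eye/mouth directions are assumptions of this sketch, not the patent's definitions.

```python
import numpy as np

def head_coordinate_frame(l):
    """Build a face coordinate frame from the eight landmarks l1..l8.

    l: (8, 3) array ordered as eye corners, nose wings, mouth corners.
    Origin and x/y choices below are assumptions; only z = x cross y
    is stated explicitly in the text.
    """
    l = np.asarray(l)
    o = l.mean(axis=0)                              # assumed origin
    x = l[2:4].mean(axis=0) - l[0:2].mean(axis=0)   # left eye -> right eye (assumption)
    x /= np.linalg.norm(x)
    up = l[0:4].mean(axis=0) - l[6:8].mean(axis=0)  # mouth -> eyes (assumption)
    y = up - (up @ x) * x                           # orthogonalize against x
    y /= np.linalg.norm(y)
    z = np.cross(x, y)                              # as stated: p_z = x × y
    return o, x, y, z
```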
Next, the FLAME model face keypoints are registered with the three-dimensional face keypoints of the target object.
As an optional implementation, the ICP algorithm can be used to coarsely register the three-dimensional face keypoint set of the target object with the FLAME model face keypoint set, yielding the first similarity transformation parameters (s, R, t), where s is the scaling factor, R the rotation matrix and t the translation vector.
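With known keypoint correspondences, a single ICP iteration reduces to a closed-form similarity alignment (Umeyama's method). The sketch below shows that closed form as one way to obtain the coarse (s, R, t); this is an implementation choice, not the patent's prescribed algorithm.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Closed-form similarity alignment between matched keypoint sets.

    src: (n, 3) FLAME model face keypoints; dst: (n, 3) target keypoints.
    Returns s, R, t such that dst is approximately s * R @ src_i + t.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                       # keep R a proper rotation
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t
```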
S270: Optimize the first similarity transformation parameters of step S260 to obtain optimized similarity transformation parameters, achieving accurate registration between the three-dimensional face keypoints of the target object and the FLAME model face keypoints.
As an optional implementation, an objective function may be used to optimize the first similarity transformation parameters.
The objective function E can be established as follows:

$$E=D_{p2f}+\alpha D_{proj}+\beta E_{pri}$$

Specifically,

$$D_{p2f}=\sum_{i}\min\Big(\big\|s_i-p(s_i,f_{c(i)})\big\|^2,\ \min_{j}\big\|s_i-e^{c(i)}_{j}\big\|^2,\ \min_{w}\big\|s_i-v^{c(i)}_{w}\big\|^2\Big)$$

$$D_{proj}=\frac{1}{n}\sum_{i=1}^{n}\big\|u_i-v_i\big\|^2$$

$$E_{pri}=\lambda_S\|S\|^2+\lambda_E\|E\|^2+\lambda_P\|P\|^2$$

In the formulas above, D_p2f is the distance between each point of the face point cloud data of the target object from step S230 and its nearest triangular face of the FLAME model; D_proj is the distance from the projections of the FLAME model's three-dimensional face keypoints onto the two-dimensional face image (the RGB image of the target object in S210) to the two-dimensional face keypoints on that image; E_pri is the penalty term on the FLAME model coefficients; α is the first weight coefficient and β the second weight coefficient.

s_i is a point in the face point cloud data of the target object; p(s_i, f_c(i)) is the projection of s_i onto its nearest triangular face f_c(i) in the parameterized face model; e^{c(i)}_j is the point on the j-th edge of f_c(i) closest to s_i; and v^{c(i)}_w is the w-th vertex of f_c(i).

u_i is the projection of the i-th three-dimensional keypoint of the parameterized face model onto the two-dimensional face image; v_i is the corresponding two-dimensional face keypoint on that image; and n is the number of three-dimensional keypoints in the parameterized face model.

S, E and P are respectively the shape, expression and pose coefficients of the parameterized face model, and λ_S, λ_E and λ_P are their respective penalty coefficients.
S280: Obtain the Euler angles representing the head pose of the target object from the similarity transformation parameters optimized in step S270.
As an optional implementation, a Rodrigues transformation can be applied to the rotation matrix among the similarity transformation parameters obtained by fitting the parameterized face model to the face point cloud data, yielding the Euler angles that represent the head pose of the target object. These Euler angles include at least a pitch angle, a yaw angle and a roll angle.
In this embodiment, the head pose measurement method further includes determining the concentration level of the target object according to the head pose of the target object. This step is the same as step S160 of the embodiment above and is not repeated here.
It further includes issuing an alert to the target object based on the concentration level of the target object. This step is the same as step S170 of the embodiment above and is not repeated here.
Another embodiment of the present application provides a driving assistance apparatus, which may be implemented by a software system, by a hardware device, or by a combination of the two.
It should be understood that Figure 8 is only an exemplary structural diagram of a driving assistance apparatus. As shown in Figure 8, the driving assistance apparatus includes a driver head pose detection module 410 and a driving assistance module 420.
Specifically, the driver head pose detection module 410 is configured to obtain the driver's head pose using the head pose detection method provided in the embodiments above; for the specific implementation of this module's function, see the embodiments above, which are not repeated here. The driving assistance module 420 is configured to determine the driver's concentration level from the driver's head pose and to issue an alert to the driver based on that concentration level.
Another embodiment of the present application provides a head pose measurement apparatus, which may be implemented by a software system, by a hardware device, or by a combination of the two.
It should be understood that Figure 9 is only an exemplary structural diagram of a head pose measurement apparatus; this application does not limit the division of functional modules within it. As shown in Figure 9, the head pose measurement apparatus can be logically divided into multiple modules, each with different functions, realized by a processor of a computing device reading and executing instructions in a memory. Exemplarily, the head pose measurement apparatus includes a first acquisition module 510, a second acquisition module 520, a third acquisition module 530, a fourth acquisition module 540 and a first determination module 550. In an optional implementation, the apparatus is configured to perform the content described in steps S110-S150 shown in Figure 2. Specifically: the first acquisition module 510 is configured to acquire the face point cloud data of the target object; the second acquisition module 520 is configured to acquire the point cloud data of the face keypoints of the target object based on the face point cloud data and the two-dimensional face keypoint data of the target object; the third acquisition module 530 is configured to register the point cloud data of the face keypoints of the target object with the point cloud data of the face keypoints of the parameterized face model to obtain the first similarity transformation parameters; the fourth acquisition module 540 is configured to optimize the first similarity transformation parameters according to the objective function to obtain the second similarity transformation parameters; and the first determination module 550 is configured to determine the head pose of the target object according to the second similarity transformation parameters.
Optionally, the objective function in the fourth acquisition module 540 includes a point-to-plane distance function: the distance from a point in the face point cloud data of the target object to the nearest triangular patch of the parameterized face model, a triangular patch being the triangle formed by three adjacent vertices of the parameterized face model.
As an optional implementation, the point-to-plane distance function includes:

$$D_{p2f}=\sum_{i}\min\Big(\big\|s_i-p(s_i,f_{c(i)})\big\|^2,\ \min_{j}\big\|s_i-e^{c(i)}_{j}\big\|^2,\ \min_{w}\big\|s_i-v^{c(i)}_{w}\big\|^2\Big)$$

where D_p2f is the point-to-plane distance function, s_i is a point in the face point cloud data of the target object, p(s_i, f_c(i)) is the projection of s_i onto its nearest triangular face f_c(i) in the parameterized face model, e^{c(i)}_j is the point on the j-th edge of f_c(i) closest to s_i, and v^{c(i)}_w is the w-th vertex of f_c(i).
In some embodiments, the objective function in the fourth acquisition module 540 further includes a keypoint projection distance function: a function of the distance from the projection of a face keypoint of the parameterized face model onto the two-dimensional face image to the corresponding two-dimensional face keypoint on the two-dimensional face image of the target object.
As an optional implementation, the keypoint projection distance function includes:

$$D_{proj}=\frac{1}{n}\sum_{i=1}^{n}\big\|u_i-v_i\big\|^2$$

where D_proj is the keypoint projection distance function, u_i is the projection of a face keypoint of the parameterized face model onto the two-dimensional face image, v_i is the corresponding two-dimensional face keypoint on that image, and n is the number of face keypoints in the parameterized face model.
In some embodiments, the objective function in the fourth acquisition module 540 further includes a penalty term function on the parameterized face model coefficients, where the penalty term constrains the magnitude of those coefficients.
As an optional implementation, the penalty term function on the parameterized face model coefficients includes:

$$E_{pri}=\lambda_S\|S\|^2+\lambda_E\|E\|^2+\lambda_P\|P\|^2$$

where E_pri is the penalty term function on the parameterized face model coefficients; S, E and P are respectively the shape, expression and pose coefficients of the parameterized face model; and λ_S, λ_E and λ_P are their respective penalty coefficients.
As an optional implementation, the first acquisition module 510 includes:

a first acquisition submodule, configured to obtain the point cloud data of the target object based on a two-dimensional image and a depth image of the target object;

a first extraction submodule, configured to extract a two-dimensional face image of the target object from the two-dimensional image of the target object;

a second extraction submodule, configured to extract, from the point cloud data of the target object, the point cloud data corresponding to the extracted two-dimensional face image. In some embodiments, the two-dimensional image and the depth image of the target object are obtained by a TOF camera.
As an optional implementation, the face keypoints of the target object are 51 face keypoints.
As an optional implementation, the face keypoints of the target object are 68 face keypoints.
In some embodiments, the first determination module 550 is specifically configured to perform a Rodrigues transformation on the similarity transformation parameters to obtain the Euler angles representing the head pose of the target object.
In some embodiments, the head pose measurement apparatus further includes:

a second determination module, configured to determine the concentration level of the target object according to the head pose of the target object;

an alert module, configured to issue an alert to the target object based on the concentration level of the target object.
The specific implementation of each functional module in this embodiment can be found in the method embodiments above and is not repeated here.
Figure 10 is a schematic structural diagram of a computing device 900 provided by an embodiment of the present application. The computing device 900 includes a processor 910, a memory 920 and a communication interface 930.
It should be understood that the communication interface 930 of the computing device 900 shown in Figure 10 can be used to communicate with other devices.
The processor 910 may be connected to the memory 920, which can be used to store program code and data. The memory 920 may therefore be a storage unit internal to the processor 910, an external storage unit independent of the processor 910, or a component comprising both an internal storage unit and an independent external storage unit.
Optionally, the computing device 900 may further include a bus, through which the memory 920 and the communication interface 930 connect to the processor 910. The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on.
It should be understood that, in the embodiments of the present application, the processor 910 may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor. Alternatively, the processor 910 may adopt one or more integrated circuits for executing related programs, so as to implement the technical solutions provided by the embodiments of the present application.
The memory 920 may include read-only memory and random-access memory, and provides instructions and data to the processor 910. A portion of the processor 910 may also include non-volatile random-access memory; for example, the processor 910 may also store device type information.
When the computing device 900 runs, the processor 910 executes the computer-executable instructions in the memory 920 to perform the operational steps of the methods above.
It should be understood that the computing device 900 according to the embodiments of the present application may correspond to the respective body executing the methods according to the embodiments of the present application, and that the above and other operations and/or functions of the modules in the computing device 900 respectively implement the corresponding flows of the methods of these embodiments; for brevity, they are not repeated here.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementations should not be regarded as beyond the scope of the present application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units is only a logical functional division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated in one processing unit, may exist physically separately, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk or an optical disc.
The embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs a head pose measurement method comprising at least one of the solutions described in the embodiments above.
The computer storage medium of the embodiments of the present application may adopt any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for performing the operations of the present application may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present application and the technical principles applied. Those skilled in the art will understand that the present application is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present application. Therefore, although the present application has been described in considerable detail through the above embodiments, it is not limited to them and may include more other equivalent embodiments without departing from its concept, all of which fall within the protection scope of the present application.

Claims (29)

  1. A head pose measurement method, comprising:
    acquiring face point cloud data of a target object;
    acquiring point cloud data of face keypoints of the target object based on the face point cloud data and two-dimensional face keypoint data of the target object;
    registering the point cloud data of the face keypoints of the target object with point cloud data of face keypoints of a parameterized face model to obtain first similarity transformation parameters;
    optimizing the first similarity transformation parameters according to an objective function to obtain second similarity transformation parameters;
    determining a head pose of the target object according to the second similarity transformation parameters.
  2. The method according to claim 1, wherein the objective function comprises a point-to-plane distance function;
    wherein the point-to-plane distance function is the distance from a point in the face point cloud data of the target object to the nearest triangular patch of the parameterized face model, a triangular patch being the triangle formed by three adjacent points of the parameterized face model.
  3. The method according to claim 2, wherein the point-to-plane distance function comprises:

    $$D_{p2f}=\sum_{i}\min\Big(\big\|s_i-p(s_i,f_{c(i)})\big\|^2,\ \min_{j}\big\|s_i-e^{c(i)}_{j}\big\|^2,\ \min_{w}\big\|s_i-v^{c(i)}_{w}\big\|^2\Big)$$

    wherein D_p2f is the point-to-plane distance function, s_i is a point in the face point cloud data of the target object, p(s_i, f_c(i)) is the projection of s_i onto its nearest triangular face f_c(i) in the parameterized face model, e^{c(i)}_j is the point on the j-th edge of f_c(i) closest to s_i, and v^{c(i)}_w is the w-th vertex of f_c(i).
  4. The method according to any one of claims 1-3, wherein the objective function further comprises:
    a keypoint projection distance function;
    wherein the keypoint projection distance function is a function of the distance from the projection of a face keypoint of the parameterized face model onto the two-dimensional face image to the corresponding two-dimensional face keypoint on the two-dimensional face image of the target object.
  5. The method according to claim 4, wherein the keypoint projection distance function comprises:

    $$D_{proj}=\frac{1}{n}\sum_{i=1}^{n}\big\|u_i-v_i\big\|^2$$

    wherein D_proj is the keypoint projection distance function, u_i is the projection of a face keypoint of the parameterized face model onto the two-dimensional face image, v_i is the corresponding two-dimensional face keypoint on that image, and n is the number of face keypoints in the parameterized face model.
  6. The method according to any one of claims 1-5, wherein the objective function further comprises:
    a penalty term function on the parameterized face model coefficients, wherein the penalty term is used to constrain the magnitude of the coefficients.
  7. The method according to claim 6, wherein the penalty term function on the parameterized face model coefficients comprises:

    $$E_{pri}=\lambda_S\|S\|^2+\lambda_E\|E\|^2+\lambda_P\|P\|^2$$

    wherein E_pri is the penalty term function on the parameterized face model coefficients; S, E and P are respectively the shape, expression and pose coefficients of the parameterized face model; and λ_S, λ_E and λ_P are the penalty coefficients of the shape, expression and pose coefficients, respectively.
  8. The method according to claim 1, wherein acquiring the face point cloud data of the target object comprises:
    obtaining point cloud data of the target object based on a two-dimensional image and a depth image of the target object;
    extracting a two-dimensional face image of the target object from the two-dimensional image of the target object;
    extracting, from the point cloud data of the target object, the point cloud data corresponding to the extracted two-dimensional face image.
  9. The method according to any one of claims 1-8, wherein the two-dimensional image and the depth image of the target object are obtained by a TOF camera.
  10. The method according to claim 1, wherein the face keypoints of the target object are 51 face keypoints.
  11. The method according to claim 1, wherein the face keypoints of the target object are 68 face keypoints.
  12. The method according to claim 1, wherein determining the head pose of the target object according to the second similarity transformation parameters comprises:
    performing a Rodrigues transformation on the second similarity transformation parameters to obtain Euler angles representing the head pose of the target object.
  13. The method according to any one of claims 1-12, further comprising: determining a concentration level of the target object according to the head pose of the target object;
    issuing an alert to the target object based on the concentration level of the target object.
  14. A head pose measurement apparatus, comprising:
    a first acquisition module, configured to acquire face point cloud data of a target object;
    a second acquisition module, configured to acquire point cloud data of face keypoints of the target object based on the face point cloud data and two-dimensional face keypoint data of the target object;
    a third acquisition module, configured to register the point cloud data of the face keypoints of the target object with point cloud data of face keypoints of a parameterized face model to obtain first similarity transformation parameters;
    a fourth acquisition module, configured to optimize the first similarity transformation parameters according to an objective function to obtain second similarity transformation parameters;
    a first determination module, configured to determine a head pose of the target object according to the second similarity transformation parameters.
  15. The apparatus according to claim 14, wherein the objective function in the fourth acquisition module comprises a point-to-plane distance function;
    wherein the point-to-plane distance function is the distance from a point in the face point cloud data of the target object to the nearest triangular patch of the parameterized face model, a triangular patch being the triangle formed by three adjacent vertices of the parameterized face model.
  16. The apparatus according to claim 15, wherein the point-to-plane distance function is specifically:

    $$D_{p2f}=\sum_{i}\min\Big(\big\|s_i-p(s_i,f_{c(i)})\big\|^2,\ \min_{j}\big\|s_i-e^{c(i)}_{j}\big\|^2,\ \min_{w}\big\|s_i-v^{c(i)}_{w}\big\|^2\Big)$$

    wherein D_p2f is the point-to-plane distance function, s_i is a point in the face point cloud data of the target object, p(s_i, f_c(i)) is the projection of s_i onto its nearest triangular face f_c(i) in the parameterized face model, e^{c(i)}_j is the point on the j-th edge of f_c(i) closest to s_i, and v^{c(i)}_w is the w-th vertex of f_c(i).
  17. The apparatus according to any one of claims 14-16, wherein the objective function in the fourth acquisition module further comprises:
    a keypoint projection distance function;
    wherein the keypoint projection distance function is a function of the distance from the projection of a face keypoint of the parameterized face model onto the two-dimensional face image to the corresponding two-dimensional face keypoint on the two-dimensional face image of the target object.
  18. The apparatus according to claim 17, wherein the keypoint projection distance function is specifically:

    $$D_{proj}=\frac{1}{n}\sum_{i=1}^{n}\big\|u_i-v_i\big\|^2$$

    wherein D_proj is the keypoint projection distance function, u_i is the projection of a face keypoint of the parameterized face model onto the two-dimensional face image, v_i is the corresponding two-dimensional face keypoint on that image, and n is the number of face keypoints in the parameterized face model.
  19. The apparatus according to any one of claims 14-18, wherein the objective function in the fourth acquisition module further comprises:
    a penalty term function on the parameterized face model coefficients, wherein the penalty term is used to constrain the magnitude of the coefficients.
  20. The apparatus according to claim 19, wherein the penalty term function on the parameterized face model coefficients is specifically:

    $$E_{pri}=\lambda_S\|S\|^2+\lambda_E\|E\|^2+\lambda_P\|P\|^2$$

    wherein E_pri is the penalty term function on the parameterized face model coefficients; S, E and P are respectively the shape, expression and pose coefficients of the parameterized face model; and λ_S, λ_E and λ_P are the penalty coefficients of the shape, expression and pose coefficients, respectively.
  21. The apparatus according to claim 14, wherein the first acquisition module comprises:
    a first acquisition submodule, configured to obtain point cloud data of the target object based on a two-dimensional image and a depth image of the target object;
    a first extraction submodule, configured to extract a two-dimensional face image of the target object from the two-dimensional image of the target object;
    a second extraction submodule, configured to extract, from the point cloud data of the target object, the point cloud data corresponding to the extracted two-dimensional face image.
  22. The apparatus according to any one of claims 14-21, wherein the two-dimensional image and the depth image of the target object are obtained by a TOF camera.
  23. The apparatus according to claim 14, wherein the face keypoints of the target object are 51 face keypoints.
  24. The apparatus according to claim 14, wherein the face keypoints of the target object are 68 face keypoints.
  25. The apparatus according to claim 14, wherein the first determination module is specifically configured to perform a Rodrigues transformation on the second similarity transformation parameters to obtain Euler angles representing the head pose of the target object.
  26. The apparatus according to any one of claims 14-25, further comprising:
    a second determination module, configured to determine a concentration level of the target object according to the head pose of the target object;
    an alert module, configured to issue an alert to the target object based on the concentration level of the target object.
  27. A computing device, comprising:
    a communication interface;
    at least one processor connected to the communication interface; and
    at least one memory connected to the processor and storing program instructions which, when executed by the at least one processor, cause the at least one processor to perform the head pose measurement method of any one of claims 1-13.
  28. A computer-readable storage medium having program instructions stored thereon, wherein the program instructions, when executed by a computer, cause the computer to perform the head pose measurement method of any one of claims 1-13.
  29. A computer program product which, when run on a computing device, causes the computing device to perform the head pose measurement method of any one of claims 1-13.
PCT/CN2021/097701 2021-06-01 2021-06-01 Head posture measurement method and apparatus WO2022252118A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180001892.5A CN113544744A (en) 2021-06-01 2021-06-01 Head posture measuring method and device
PCT/CN2021/097701 WO2022252118A1 (en) 2021-06-01 2021-06-01 Head posture measurement method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/097701 WO2022252118A1 (en) 2021-06-01 2021-06-01 Head posture measurement method and apparatus

Publications (1)

Publication Number Publication Date
WO2022252118A1 true WO2022252118A1 (en) 2022-12-08

Family

ID=78092850

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097701 WO2022252118A1 (en) 2021-06-01 2021-06-01 Head posture measurement method and apparatus

Country Status (2)

Country Link
CN (1) CN113544744A (en)
WO (1) WO2022252118A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117057165B (en) * 2023-10-11 2023-12-22 南京气象科技创新研究院 Model parameter optimization method based on ground meteorological data cluster

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150031085A (en) * 2013-09-13 2015-03-23 인하대학교 산학협력단 3D face-modeling device, system and method using Multiple cameras
CN105760809A (en) * 2014-12-19 2016-07-13 联想(北京)有限公司 Method and apparatus for head pose estimation
CN109035329A (en) * 2018-08-03 2018-12-18 厦门大学 Camera Attitude estimation optimization method based on depth characteristic
CN110371132A (en) * 2019-06-18 2019-10-25 华为技术有限公司 Driver's adapter tube appraisal procedure and device
CN110909582A (en) * 2018-09-18 2020-03-24 华为技术有限公司 Face recognition method and device
CN111243093A (en) * 2020-01-07 2020-06-05 腾讯科技(深圳)有限公司 Three-dimensional face grid generation method, device, equipment and storage medium
CN111414798A (en) * 2019-02-03 2020-07-14 沈阳工业大学 Head posture detection method and system based on RGB-D image
CN111985384A (en) * 2020-08-14 2020-11-24 深圳地平线机器人科技有限公司 Method and device for acquiring 3D coordinates of face key points and 3D face model

Also Published As

Publication number Publication date
CN113544744A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
Liu et al. Autoshape: Real-time shape-aware monocular 3d object detection
CN108549873B (en) Three-dimensional face recognition method and three-dimensional face recognition system
CN109859305B (en) Three-dimensional face modeling and recognizing method and device based on multi-angle two-dimensional face
JP6681729B2 (en) Method for determining 3D pose of object and 3D location of landmark point of object, and system for determining 3D pose of object and 3D location of landmark of object
US9128530B2 (en) Hand pointing estimation for human computer interaction
WO2017219391A1 (en) Face recognition system based on three-dimensional data
CN112861653A (en) Detection method, system, equipment and storage medium for fusing image and point cloud information
US11195064B2 (en) Cross-modal sensor data alignment
EP3300025A1 (en) Image processing device and image processing method
US20200226392A1 (en) Computer vision-based thin object detection
Song et al. Robust 3D face landmark localization based on local coordinate coding
CN112200056A (en) Face living body detection method and device, electronic equipment and storage medium
WO2022252118A1 (en) Head posture measurement method and apparatus
Huang et al. Vision pose estimation from planar dual circles in a single image
Sahin et al. A learning-based variable size part extraction architecture for 6D object pose recovery in depth images
Wang et al. Fusion Algorithm of Laser-Point Clouds and Optical Images
WO2022246605A1 (en) Key point calibration method and apparatus
CN109166172B (en) Clothing model construction method and device, server and storage medium
CN111709269B (en) Human hand segmentation method and device based on two-dimensional joint information in depth image
Han et al. Deformed landmark fitting for sequential faces
Jin et al. DOPE++: 6D pose estimation algorithm for weakly textured objects based on deep neural networks
CN111145081B (en) Three-dimensional model view projection method and system based on spatial volume characteristics
Königshof et al. Learning-based shape estimation with grid map patches for realtime 3d object detection for automated driving
CN111612912A (en) Rapid three-dimensional reconstruction and optimization method based on Kinect2 camera face contour point cloud model
CN112016495A (en) Face recognition method and device and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21943487
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE