WO2023272725A1 - Facial image processing method and apparatus, and vehicle - Google Patents

Facial image processing method and apparatus, and vehicle

Info

Publication number
WO2023272725A1
WO2023272725A1 (PCT/CN2021/104294)
Authority
WO
WIPO (PCT)
Prior art keywords
face
area
local
acquiring
partial
Prior art date
Application number
PCT/CN2021/104294
Other languages
French (fr)
Chinese (zh)
Inventor
崔贤娟
刘杨
黄为
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to CN202180002021.5A priority Critical patent/CN113632098A/en
Priority to PCT/CN2021/104294 priority patent/WO2023272725A1/en
Publication of WO2023272725A1 publication Critical patent/WO2023272725A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T2200/00: Indexing scheme for image data processing or generation, in general
    • G06T2200/08: Indexing scheme for image data processing or generation, in general, involving all processing steps from image acquisition to 3D model generation

Definitions

  • the invention relates to the technical field of machine vision, in particular to a face image processing method, device and vehicle.
  • Three-dimensional (3D) face reconstruction is a research hotspot in the fields of machine vision, computer vision and computer graphics.
  • 3D face reconstruction is one of the core technologies in the fields of virtual reality/augmented reality, automatic driving, robotics, etc., and has great application value in the driver monitoring system (DMS) in the field of smart cars.
  • 3D face reconstruction is one of the basic technologies for monitoring the driver's head posture and gaze direction, which directly affects the performance of human-computer interaction and DMS.
  • Occluded face images can be divided into unintentional occlusion and intentional occlusion.
  • Common unintentional occlusions include glasses, steering wheels, and others blocking the face of the monitored person, while intentional occlusions usually include sunglasses, masks, or other objects blocking facial features.
  • Intentional occlusion usually results in the failure of 3D face reconstruction due to excessive feature changes, while unintentional occlusion usually only covers a small part of facial features, which can easily lead to the introduction of too many interference features in the feature extraction process, resulting in distortion of 3D face reconstruction.
  • The embodiments of the present application provide a face image processing solution, including a face image processing method, an apparatus, a vehicle, a computing device, a computer-readable storage medium, and a computer program product, which can implement 3D face reconstruction when the face is partially occluded.
  • A first aspect of the present application provides a face image processing method, including: acquiring local face shape features in a face image; acquiring a face sample matching the local face shape features, and acquiring face shape parameters of the face sample; and generating a 3D face model using the face shape parameters.
  • The embodiment of the present application generates the 3D face model based on the information of a face sample that matches the local face shape, which reduces the interference of occluders and is therefore suited to 3D face reconstruction in face occlusion scenarios, with high robustness.
  • When the face image processing method of the embodiment of the present application is applied to a smart vehicle (for example, to a driver status monitoring system in a smart vehicle), 3D face reconstruction can be achieved when the face of an occupant (a driver or a passenger) is occluded, and the head posture and/or gaze direction can further be recognized based on the reconstructed 3D face, which improves the robustness and stability of the driver status monitoring system.
  • the first aspect also includes: fitting the generated 3D face model to point cloud data of a local face area, the local face area being included in the face image.
  • the fitting may include fitting parameters such as lips and eyes, so that the reconstructed 3D face is closer to the real appearance of the user.
  • In some embodiments, the above fitting includes: obtaining key points of the local face area and obtaining the three-dimensional coordinates of the key points; performing a pose transformation on the three-dimensional face model according to the three-dimensional coordinates of the key points; and fitting the transformed 3D face model to the point cloud data of the local face area.
  • the rigid body transformation of the 3D face model is realized with the 3D coordinates of the key points as the target, and the preliminary alignment with the point cloud data of the local face area is realized.
  • This process can be realized using the ICP algorithm. Since the number of key points is small, the calculation amount of the preliminary alignment is small and the alignment speed is fast.
  • For the subsequent fitting, the quasi-Newton algorithm can be used, taking advantage of its fast convergence and low computational complexity.
  • the point cloud data of the local face area is obtained according to the pixel depth value of the local face area and camera parameters.
  • this application can be implemented by using binocular cameras, RGB-D cameras, infrared cameras, etc., and the cost of implementation is lower than that of other image perception and acquisition devices.
  • The cameras here are not limited to traditional cameras, but also include other image acquisition devices such as camera modules.
  • the three-dimensional coordinates of the key points in the local face area are obtained according to the depth values of the key points and camera parameters.
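  • As a minimal sketch of this back-projection (the standard pinhole camera model; fx, fy, cx, cy denote the camera intrinsics, and all names are illustrative rather than from the patent):

```python
import numpy as np

def backproject_keypoint(u, v, depth, fx, fy, cx, cy):
    """Back-project a key point at pixel (u, v) with the given depth value
    into 3D camera coordinates using the pinhole model."""
    z = float(depth)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```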
  • this application can be realized by using binocular cameras and RGB-D cameras, and the cost of implementation is lower than that of other image perception and acquisition devices.
  • In some embodiments, obtaining the local face shape features in the face image includes: obtaining a target area in the face image, where the target area includes a local face area; obtaining the local face area in the target area; and obtaining the local face shape features of the local face area.
  • Compared with directly obtaining the local face area from the image, this two-step method of first obtaining the target area from the image and then obtaining the local face area from the target area keeps the overall complexity of the neural network used for this process low and makes it easy to train.
  • In some embodiments, obtaining the local face area in the target area includes at least one of the following: obtaining the local face area according to the color values of the pixels in the target area; obtaining the local face area according to the depth values of the pixels in the target area.
  • The local face area can be obtained according to the color values of the pixels in the target area. This is because the skin color of the face can be distinguished from non-face parts, such as the background and some occluders (e.g., masks, cups, water bottles), so the local face area in the target area can be extracted based on pixel color values.
  • the local face area can also be obtained according to the depth value of the pixel in the target area.
  • Alternatively, the occluded area of the face can be extracted from the image of the target area first, and then, for the difference area between the target area and the occluded area, local face regions can be extracted according to the color values and/or depth values of the pixels as described above.
  • obtaining the face samples matching the local face shape features includes: retrieving the face samples matching the local face shape features in a face database.
  • a face database can be constructed based on real faces in advance, and massive data can improve the accuracy of matched face samples in terms of probability. Moreover, obtaining a complete real face sample based on the face database will make the 3D face reconstructed in the embodiment of the present application more realistic.
  • using face shape parameters to generate a 3D face model includes: generating a 3D face model based on a parameterized 3D face model using face shape parameters.
  • A second aspect of the present application provides a face image processing device, including: an acquisition module, used to acquire the local face shape features in a face image, acquire a face sample matching the local face shape features, and acquire the face shape parameters of the face sample; and a generation module, used to generate a three-dimensional face model using the face shape parameters.
  • the generating module is further configured to: fit the generated 3D face model with point cloud data of a local face area.
  • In some embodiments, when used for fitting, the generation module is specifically used to: obtain the key points of the local face area and obtain the three-dimensional coordinates of the key points; perform a pose transformation on the three-dimensional face model according to the three-dimensional coordinates of the key points; and fit the pose-transformed 3D face model to the point cloud data of the local face area.
  • the point cloud data of the local face area is obtained according to the pixel depth value of the local face area and camera parameters.
  • the three-dimensional coordinates of the key points in the local face area are obtained according to the depth values of the key points and camera parameters.
  • In some embodiments, the acquisition module is specifically used to: acquire the target area in the image, where the target area includes a partial face area; acquire the partial face area in the target area; and acquire the local face shape features of the partial face area.
  • In some embodiments, when used to acquire the partial face area in the target area, the acquisition module is specifically used in at least one of the following ways: acquiring the partial face area according to the color values of the pixels in the target area; acquiring the partial face area according to the depth values of the pixels in the target area.
  • In some embodiments, when used to acquire face samples that match the local face shape features, the acquisition module is specifically used to: retrieve, in a face database, face samples that match the local face shape features.
  • the generation module is specifically configured to: generate a three-dimensional face model based on a parameterized three-dimensional face model by using face shape parameters.
  • A third aspect of the present application provides an electronic device, including: a processor, and a memory on which program instructions are stored; when the program instructions are executed by the processor, any one of the face image processing methods provided in the first aspect above is implemented.
  • A fourth aspect of the present application provides an electronic device, including: a processor, and an interface circuit, where the processor accesses a memory through the interface circuit, and the memory stores program instructions; when the program instructions are executed by the processor, any one of the face image processing methods provided in the first aspect above is implemented.
  • A fifth aspect of the present application provides a vehicle, including: an image acquisition device for acquiring face images, and any one of the face image processing devices provided in the second aspect above, or the electronic device of the third or fourth aspect above.
  • A sixth aspect of the present application provides a computer-readable storage medium storing program instructions; when the program instructions are executed by a computer, the computer implements any one of the face image processing methods provided in the first aspect above.
  • A seventh aspect of the present application provides a computer program product, including program instructions; when the program instructions are executed by a computer, the computer implements any one of the face image processing methods provided in the first aspect above.
  • The face image processing scheme adopted in the embodiments of the present application extracts local face features from the local face area (that is, the unoccluded part of the face) and matches them to a similar face sample.
  • A 3D face model is then established according to the shape parameters of the face sample, which solves 3D face reconstruction when the face is occluded.
  • the complexity of the neural network for local face feature extraction can be reduced, and the operating efficiency can be improved during the matching process.
  • the reconstructed 3D face is closer to the real appearance of the user.
  • FIG. 1A is a schematic diagram of a scene where an embodiment of the present application is applied to a vehicle
  • FIG. 1B is a schematic diagram of a scene where the embodiment of the present application is applied to a vehicle
  • Fig. 2A is the flowchart of the face image processing method of the embodiment of the present application.
  • FIG. 2B is a schematic diagram of a face image processing method according to an embodiment of the present application.
  • Fig. 3 is the flow chart of the partial human face area extraction in an embodiment of the present application.
  • Fig. 4 is the fitting flow chart of 3D face model and local face region point cloud in one embodiment of the present application
  • Fig. 5 is a flow chart of a specific embodiment of the face image processing method of the present application
  • FIG. 6 is a schematic diagram of an embodiment of the 3D face reconstruction device of the present application.
  • FIG. 7 is a schematic diagram of an electronic device provided in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another electronic device provided by an embodiment of the present application.
  • FIG. 9A is a schematic diagram of a vehicle provided in an embodiment of the present application.
  • FIG. 9B is a schematic diagram of a vehicle provided in the embodiment of the present application.
  • FIG. 10 is a schematic diagram of an embodiment of a computing device of the present application.
  • The face image processing solution provided in the embodiments of the present application includes a face image processing method and device, a computing device, a computer-readable storage medium, and a computer program product. Since these technical solutions solve the problem on the same or similar principles, some repeated content may not be described again in the following specific embodiments; these specific embodiments should be regarded as referencing each other and may be combined with each other.
  • Image data with depth: includes ordinary red, green, blue (RGB) color image information and depth information, where the RGB image information and the depth information are registered, that is, the pixels are in one-to-one correspondence.
  • The collection of image data with depth can be realized with an RGB-depth (RGB-D) camera; the collected image data can be presented as an RGB image frame plus a depth image frame, or presented as one integrated piece of image data. According to the internal parameters of the camera, depth information can be converted into point cloud coordinates and vice versa.
  • Region of interest: in the embodiments of this application, it refers to the face target frame area in the image to be recognized.
  • The region can be either an image containing an occluded face or a cropped face region image.
  • Partial face area: in the embodiments of this application, it refers to the visible area of the face in the target area, that is, the unoccluded area.
  • Occluded area: in the embodiments of this application, it refers to the area where the face is occluded.
  • Parametric face model: a way to represent a face by combining a standard face (also called an average face, a reference face, a basic shape face, or a statistical face) with shape feature vectors, pose feature vectors, or expression feature vectors.
  • Examples include the 3D Morphable Face Model (3DMM), the FLAME model, etc.
  • The FLAME model is based on real human body point clouds from CAESAR data: each real head mesh is obtained by registering the head data of these real human bodies, and the head mesh covers the entire face and head area; a database of real faces and heads is thereby established.
  • The human head mesh is composed of several (e.g., 5023) vertices and several (e.g., 9976) triangular faces, together with several (e.g., 300) shape, several (e.g., 100) expression, and several (e.g., 15) pose principal components, from which a parameterized 3D human head model can be determined.
  • The shape T of FLAME is defined by the coordinates of each vertex k constituting the mesh, which can be described as the following formula (1):

  T = (x_1, y_1, z_1, x_2, y_2, z_2, \ldots, x_n, y_n, z_n) \quad (1)
  • FLAME models the shape and the expression separately, and the FLAME face model can be described as the following formula (2):

  T = T_0 + \sum_{i=1}^{n} q_i S_i + \sum_{j=1}^{m} p_j E_j \quad (2)

  • where T_0 is a standard face, that is, the average shape part of the face; S_i is an eigenvector of the covariance matrix, i.e., a face shape vector parameter (the shape principal component mentioned above); q_i is the coefficient corresponding to the face shape vector parameter; and E_j and p_j are, analogously, the expression vectors and their corresponding coefficients.
  • The modeling of the face shape part (denoted T(S) in the embodiments of the present application) can be expressed as a linear combination of the basic shape T_0 plus n shape vectors S_i, which can be described as the following formula (3):

  T(S) = T_0 + \sum_{i=1}^{n} q_i S_i \quad (3)
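  • A minimal numerical sketch of formula (3), assuming T_0 is flattened to a (3n,) vector and the n shape vectors are stacked as rows of a matrix S (the array layout is an assumption, not specified by the patent):

```python
import numpy as np

def shape_from_parameters(T0, S, q):
    """Evaluate T(S) = T0 + sum_i q_i * S_i (formula (3)).
    T0: (3n,) mean face vertex coordinates,
    S:  (n_shapes, 3n) stacked shape vectors S_i,
    q:  (n_shapes,) shape coefficients q_i."""
    return T0 + q @ S  # linear combination of the shape principal components
```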
  • Here, (w_{x,k}, w_{y,k}, w_{z,k}) represents the target position; in the embodiments of the present application, the target position is the 3D coordinates of each key point in the local face area.
  • Point cloud matching solves the transformation relationship between two point clouds, that is, it solves the rotation parameters R and translation parameters t mentioned above so as to optimize the angle and pose, which can be described as the following formula (4):

  (R^*, t^*) = \arg\min_{R,t} \sum_k \| R T_k + t - (w_{x,k}, w_{y,k}, w_{z,k})^{\top} \|^2 \quad (4)
  • Common point cloud matching algorithms include Iterative Closest Point (ICP), Normal Distributions Transform (NDT), Iterative Dual Correspondences (IDC), and so on.
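  • For the rigid alignment step, a minimal sketch of the closed-form least-squares rotation and translation for known point correspondences (the Kabsch solution, which is also the inner step of ICP; a full ICP would alternate this with nearest-neighbour matching):

```python
import numpy as np

def rigid_align(src, dst):
    """Find R, t minimizing sum ||R @ src_k + t - dst_k||^2 for (n, 3)
    arrays of corresponding points (Kabsch algorithm)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t
```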
  • The quasi-Newton algorithm is an iterative algorithm with second-order convergence; compared with the conventional gradient descent method it converges faster, and its computational complexity is lower than that of the Newton method.
  • Given the point cloud data of the local face area, the face shape coefficients are further optimized by minimizing an objective function, formula (5), defined as the sum of squared differences between the 3D coordinates of the face point cloud and the reconstructed model vertices:

  E = \sum_{i} I_i \left[ (P_{x,i} - V_{x,i})^2 + (P_{y,i} - V_{y,i})^2 + (P_{z,i} - V_{z,i})^2 \right] \quad (5)

  • where (P_{x,i}, P_{y,i}, P_{z,i}) is a point in the point cloud data of the local face area, (V_{x,i}, V_{y,i}, V_{z,i}) is the corresponding vertex of the generated 3D face model, i denotes the i-th vertex, and I_i indicates whether model vertex i is included in the calculation of the objective function. The objective function is convex and is solved iteratively with the quasi-Newton algorithm, a classic convex optimization method.
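  • A hedged sketch of the shape-coefficient optimization of formula (5) using a quasi-Newton method (SciPy's L-BFGS-B, a limited-memory quasi-Newton solver); fixed vertex correspondences and the array layout of formula (3) above are simplifying assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def fit_shape_coefficients(q0, T0, S, P, I):
    """Minimize formula (5) over the shape coefficients q.
    P: (n, 3) point cloud of the local face area,
    I: (n,) 0/1 mask selecting the model vertices used in the objective."""
    def objective(q):
        V = (T0 + q @ S).reshape(-1, 3)  # model vertices via formula (3)
        return np.sum(I[:, None] * (P - V) ** 2)
    res = minimize(objective, q0, method="L-BFGS-B")  # quasi-Newton step
    return res.x
```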
  • a technical scheme that can be adopted is to reconstruct the 3D face based on the key points of the two-dimensional (2D) face image.
  • the technical scheme first extracts the key points on the 2D face image.
  • the key points can be 17 points on the facial contour, 5 points on the left eyebrow, 5 points on the right eyebrow, 6 points on the left eye, 6 points on the right eye, 4 points on the bridge of the nose, 5 points on the wing of the nose, 20 points on the mouth contour, etc.
  • Each key point is used to represent the contour of the face; then, through the correspondence of the key points on a general standard 3D face model, the positions of the corresponding feature points in the standard 3D face model are adjusted; other non-feature points are then interpolated to deform the standard 3D face model and obtain a reconstructed 3D face model. As can be seen, this method needs to extract 2D key points from the input image; however, when 3D face reconstruction is performed on a side-view image, the 2D key point information is self-occluded, and the localization of the 2D face key points becomes inaccurate.
  • Another technical solution that can be adopted is to use a consumer-grade RGB-D depth camera to perform 3D reconstruction of the face.
  • This solution performs geometric registration on the point cloud corresponding to the input image of the current frame to perform 3D face reconstruction.
  • Most of the steps of this solution are based on the ICP algorithm.
  • This process is a fitting optimization problem that requires a large number of iterative calculations.
  • The main problem of this method is that, when the face is occluded, the face point cloud data of the occluded part of the image frame is unknown, so it is difficult to determine the information to be fitted, causing the reconstructed 3D object to fail.
  • the embodiment of the present application provides a face image processing method, which is an improved 3D face reconstruction method.
  • First, shape features of the local face area (the unoccluded part of the face) are extracted and matched for similarity against the sparse features of each 3D face sample in a face database, so as to obtain the shape parameters corresponding to the matched 3D face sample.
  • Then, a 3D head model is established by combining the shape parameters with a parametric face model; each key point of the local face area is identified and the 3D data of each key point is obtained; a rigid body transformation is performed on the head model to achieve initial alignment between the 3D head model and the point cloud of the local face area in the camera coordinate system; finally, fitting optimization is performed between the local face area and the 3D head model, with the selection of optimization targets limited to the point cloud of the local face area, to complete the 3D face reconstruction.
  • The method of the embodiments of the present application thus achieves a better 3D face reconstruction effect when the face is partially occluded, intentionally or unintentionally, or when a large-angle head posture causes self-occlusion, which makes the information to be fitted difficult to obtain.
  • The embodiments of the present application can be applied to 3D reconstruction of the face of a person in a vehicle, an airplane, etc., such as a driver, so that the driver's head posture, gaze direction, etc. can be judged based on the reconstructed 3D face in order to identify the driver's state. They can also be applied to 3D face reconstruction of audiences in front of a TV, students in a classroom, etc., so that the head posture and gaze direction of these people can be judged based on the reconstructed 3D faces, their attention direction and degree of attention can be further determined, and the TV content, teaching methods, etc. can then be adjusted.
  • FIG. 1A and FIG. 1B show examples of scenarios where the embodiment of the present application is applied to a vehicle.
  • The vehicle in this embodiment includes general motor vehicles, such as cars, sport utility vehicles (SUV), and utility vehicles, land transportation devices including multi-purpose vehicles (MPV), buses, trucks, and other cargo or passenger vehicles, as well as water vehicles including various ships and boats, and aircraft.
  • a hybrid vehicle refers to a vehicle having two or more power sources
  • an electric vehicle includes a pure electric vehicle, an extended-range electric vehicle, etc., which is not specifically limited in this application.
  • the vehicle 10 may include an image acquisition device 11 and a processor 12 .
  • the image acquisition device 11 is used to acquire images including faces of occupants (the occupants include drivers or passengers).
  • the image acquisition device 11 is a camera, where the camera may be a binocular camera, an RGB-D camera, or the like.
  • the camera can be installed on the vehicle as required, for example, in the cockpit of the vehicle.
  • In the example shown in FIG. 1B, a binocular camera composed of two independent cameras is adopted; these two cameras are the first camera 111 and the second camera 112, arranged on the left and right A-pillars of the vehicle cockpit.
  • In other embodiments, the camera can also be installed on the side of the rearview mirror in the vehicle cockpit facing the occupant, on the steering wheel, in the area near the center console, or above a display screen behind a seat, and is used to collect facial images of the driver or passengers in the vehicle cockpit.
  • the image acquisition device 11 can also be an electronic device that receives the occupant image data transmitted by the camera, such as a data transmission chip, such as a bus data transceiver chip, a network interface chip, etc., and the data transmission chip can also be Wireless transmission chips, such as Bluetooth chips or WiFi chips.
  • the image acquisition device 11 may also be integrated into the processor, and become an interface circuit or a data transmission module integrated into the processor.
  • the processor 12 can be used to reconstruct the 3D face according to the face in the image.
  • Specifically, the processor 12 uses the local face area (that is, the unoccluded face area) in the image for 3D face reconstruction.
  • The processor 12 can also be used to identify the occupant's head posture and/or gaze direction according to the reconstructed 3D face, and can further determine the occupant's attention direction, degree of attention, etc. according to the recognized head posture and gaze direction. When the embodiment of the present application is applied to the vehicle 10, the processor 12 may be an electronic device, such as the processor of a head unit, a domain controller, a mobile data center (MDC), or a vehicle-mounted computer, or it may be a conventional chip such as a central processing unit (CPU) or a microcontroller unit (MCU).
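  • As an illustrative sketch of deriving head-posture angles from the pose of the reconstructed 3D face (ZYX Euler convention; the patent does not prescribe a particular parameterization):

```python
import numpy as np

def head_pose_angles(R):
    """Extract yaw, pitch, roll (radians) from a rotation matrix R,
    assuming R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = np.arcsin(-np.clip(R[2, 0], -1.0, 1.0))
    roll = np.arctan2(R[2, 1], R[2, 2])
    return yaw, pitch, roll
```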
  • Figure 2A shows an embodiment of the face image processing method of the present application
  • Fig. 2B is a schematic diagram of the face image processing method of this embodiment, and the embodiment of the face image processing method includes the following steps:
  • S10: Acquire local face shape features in the face image. In this embodiment, the face image is a 2D image with depth information.
  • the face image includes a partial face area
  • the partial face area includes an unoccluded partial face.
  • the face image may be collected by a binocular camera or an RGB-D camera, or may be received by a data transmission chip.
  • For a binocular camera, the depth information of a pixel can be calculated according to the collected image pairs (an image pair refers to a pair of images collected by the binocular camera) and the camera parameters.
  • For an RGB-D camera, the depth information of the pixels can be obtained directly.
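  • For the binocular case, the standard rectified-stereo relation (not specific to the patent) gives depth from disparity:

```python
def depth_from_disparity(disparity_px, fx_px, baseline_m):
    """Z = fx * B / d for a rectified binocular pair; disparity in pixels,
    focal length in pixels, baseline in metres, depth in metres."""
    return fx_px * baseline_m / disparity_px
```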
  • S20: Acquire a face sample that matches the local face shape features according to the local face shape features, and then acquire the face shape parameters of the face sample.
  • Specifically, a search can be performed in the face database to match face samples similar to the local face shape features, for example, to match the face sample with the highest similarity, and then obtain the face shape parameters of that face sample.
  • When matching a face sample similar to the local face shape features, the matching goal may be to make the correlation between the face shape features of the local face area and the face shape features of the matched face sample the largest, that is, the correlation residual between the face shape features of the local face area and the face shape features of the matched face sample is smaller than that for any other face sample.
  • Since the face samples in the face database are unoccluded face samples, a similar complete face can be obtained through this step, together with the face shape parameters of that similar face.
  • the face database can be formed based on real faces, and massive data can improve the accuracy of matched face samples in terms of probability. Moreover, obtaining a complete real face sample based on the face database will make the 3D face reconstructed by the method of the embodiment of the present application more realistic.
  • S30: Generate a 3D face model using the face shape parameters. The 3D face model can be generated based on the parameterized 3D face model: referring to formula (3) above, the obtained face shape parameters are substituted into formula (3) to generate the 3D face model.
  • a step S40 may also be included: fitting the 3D face model based on the point cloud data of the local face area, the fitting including pose transformation and shape fitting.
  • the point cloud data of the local face area can be obtained according to the depth value of each pixel and camera parameters.
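  • A minimal sketch of this conversion (pinhole back-projection applied to every pixel of the local face area; `mask` marks the local face pixels and is an illustrative parameter):

```python
import numpy as np

def local_face_point_cloud(depth, mask, fx, fy, cx, cy):
    """Back-project the masked pixels of a depth image (h, w) into an
    (m, 3) point cloud in camera coordinates."""
    v, u = np.nonzero(mask & (depth > 0))  # pixels of the local face area
    z = depth[v, u].astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```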
  • step S10 may include the following steps S11-S13:
  • S11 Acquire a target area in the face image, where the target area includes the partial face area.
  • In addition to the face to be reconstructed, the image usually also includes other content. Therefore, the target area is first extracted from the image through face detection; the target area can also be called a region of interest (ROI).
  • The target area is an area that includes the partial face area; it may be a rectangular area, a circular area, or an area of any shape, and in this embodiment it may be a rectangular area. The local face area included in this area refers to the area where the face is not occluded, and the non-face part of this area includes the occluder area and the background area.
  • In this embodiment, a convolutional neural network (CNN), a region proposal network (RPN), a region-based convolutional network (Regions with CNN features, RCNN), a Faster-RCNN network, a MobileV2 network (a lightweight neural network), another network, or a combination of multiple networks can be used to extract the target area.
  • S12: Acquire the partial face area in the target area. The partial face area may be acquired according to the color values of the pixels in the target area. This is because the skin color of the face can be distinguished from non-face parts, such as the background and some occluders (e.g., masks, water cups, water bottles, glasses), so the local face area in the target area can be extracted based on pixel color values.
  • The partial face area may also be acquired according to the depth values of the pixels in the target area. This is because the spatial position (or depth) of the face can be distinguished from some non-face parts, such as the background and some non-sheet-like occluders (e.g., water cups, water bottles, hands, arms), so local face regions can be extracted based on pixel depth values.
  • In some embodiments, the occluded area of the face may be extracted from the image of the target area first, and then, for the difference area between the target area and the occluded area, local face regions are extracted according to the color values and/or depth values of the pixels as described above.
  • The average reference color may be the average of these pixel colors. It may also be a weighted mean of the colors, weighted by pixel position (e.g., the closer to the edge of the difference area, the lower the weight) or by the difference between each pixel and a normal face color (e.g., the greater the difference, the lower the weight).
  • The mean can be calculated by fitting a Gaussian function. Specifically: for the pixels in the difference region, a Gaussian function is fitted with the three RGB channel values as color coordinates, and the mean and standard deviation of the Gaussian function are obtained; the mean is used as the average reference color and the standard deviation as the threshold; with the mean as the color center point, the distance from the color coordinates of each pixel to the center point is calculated, and the pixels whose distance is less than the standard deviation are kept.
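  • A simplifying sketch of this color-based selection, approximating the per-channel Gaussian fit with per-channel means and standard deviations and a single radial threshold (an assumption, not the patent's exact procedure):

```python
import numpy as np

def color_face_mask(region_rgb):
    """region_rgb: (h, w, 3) float array of the difference region.
    Returns a boolean mask keeping pixels within one 'standard deviation'
    of the fitted color mean (the average reference color)."""
    pixels = region_rgb.reshape(-1, 3)
    mean = pixels.mean(axis=0)          # Gaussian mean: average reference color
    sigma = pixels.std(axis=0)          # per-channel standard deviation
    thresh = np.linalg.norm(sigma)      # one standard deviation as a radius
    dist = np.linalg.norm(region_rgb - mean, axis=2)
    return dist < thresh                # boolean mask of retained pixels
```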
  • The average reference depth value may be the average depth of the pixels in the area; it may also be a weighted mean of the depths, weighted by pixel position (for example, the closer to the edge of the occluded area, the lower the weight).
  • In this embodiment, networks such as a CNN, Mask RCNN, fully convolutional networks (FCN), or the YoloV2 network (a neural network for small target detection), or a combination of multiple networks, can also be used to extract local face regions.
  • S13: Acquire the local face shape features of the local face area. The extracted face shape features may be regular features or sparse features.
  • the sparsity of sparse features can reduce the complexity of neural network training, reduce the redundant features of neural networks, and reduce the required storage space.
  • Feature extraction from the local face area, whether of conventional features or sparse features, can be implemented with a feature extraction network, such as a CNN network or a residual network (e.g., Resnet50).
  • the above-mentioned step S40 includes the following steps S41-S43:
  • S41: Acquire the key points of the local face area and obtain the three-dimensional coordinates of the key points. This step can proceed as follows: the target area is input into a key point detection network, such as a CNN network or an Hourglass network, which identifies the key points among the pixels of the target area and determines the 2D position information of the key points. Then, the depth information of each key point is determined from the depth information of the image, and the 3D coordinates of the key point are calculated according to the internal parameters of the depth camera.
  • the image of the local face area extracted in step S10 may also be input into the key point detection network.
  • the 3D coordinates of each pixel in the local face area can also be calculated according to the depth information of the local face area and the internal parameters of the camera, that is, the point cloud information of the local face area can be calculated.
  • S42 Change the pose of the 3D face model according to the 3D coordinates of the key points, so as to realize preliminary alignment of the 3D face model with the point cloud of the local face area.
  • The pose change can be a rigid body transformation; that is, the 3D coordinates of the face key points are used as the target positions for the 3D shape fitting of the face, and the established 3D face model is transformed to the target positions by a rigid body transformation.
  • the rigid body transformation of the 3D face model can be implemented by point cloud matching algorithms, such as ICP, NDT, IDC and other algorithms.
  • S43: Fit the pose-transformed 3D face model to the point cloud of the local face area. The fitting of the shape parameters may be performed with iterative algorithms, such as the quasi-Newton method and the Newton method.
  • the fitting may include fitting parameters such as lips and eyes, that is, optimizing the shape factor of the 3D face model. After the fitting is completed, the reconstruction of the details of the 3D face for the face is completed.
  • In a specific embodiment, a monocular depth camera (RGB-D camera) is used to collect image data with depth, and the image data includes ordinary RGB color 2D image information and depth information (a depth map).
  • this embodiment includes the following steps:
  • S110: Acquire RGB-D image data. The RGB-D image data may come from a depth camera; it is image data containing an occluded face and, as described above, the image data (image frame) consists of a 2D image (color image frame) and a depth image (depth image frame), so the pixel at each position has 2D information (such as RGB data) and depth information.
  • the 2D image has color and texture, so the key point position of the human face in the unoccluded area of the object to be built can be identified through the 2D image.
  • S112: Perform occluded face detection on the image data, and identify and extract the partial face area of the object to be reconstructed in the color 2D image. This specifically includes:
  • First, the MobileV2 network is used to perform occluded face detection on the color image frame and determine the target region (ROI) where the object to be reconstructed is located in the color image; the target region includes the partial face area and the occluded area of the face.
  • The target area is then obtained by cropping, and the obtained target area is scaled to a target pixel size for subsequent processing; for example, the target area is scaled to a size of 512x512 (pixels).
  • S114: Input the scaled target area into the YoloV2 network to detect occluding objects, so as to determine the occluded area of the face.
  • S116: Extract the partial face area from the target area; this specifically includes the following steps, which proceed in two ways. In addition, after the local face area is extracted, the point cloud information of the local face area can also be calculated.
  • Face extraction based on color: according to the aforementioned target area and occlusion area, the difference area between them is used as the RGB color reference area. All pixels in the reference area, with their three RGB channel values as coordinates, are used to fit a Gaussian function, and the mean and standard deviation are obtained. The mean is then used as the center point, the distance from each pixel to the center point is calculated, and the RGB information of a pixel is kept if the distance is less than one standard deviation; otherwise, the RGB information of the pixel is removed, for example by setting the pixel's RGB value to zero.
  • Since most of the difference area between the target area and the occlusion area is the local face area, the mean of the Gaussian function fitted to the three-channel colors corresponds to the face color; pixels whose color difference is within the threshold (that is, pixels with similar colors) are kept, and pixels whose color difference is greater than the threshold (that is, pixels with large color differences) are removed, thereby extracting the local face area.
  • In addition, this part of the pixel area can be further removed based on the depth values, in a manner similar to the depth-based extraction described above.
  • After the local face area is extracted, the 3D coordinates of each pixel in the local face area can be calculated; that is, the point cloud information of the local face area is obtained for use in the later steps.
  • S118: Extract the local face shape features. In this embodiment, sparse features are extracted, and the sparse features of the local face shape are output by regression through the feature extraction network.
  • The data A is a sparse representation of the face features, where A_i is the dimensionality-reduced representation of D_i, that is, the sparse feature of the i-th face sample: A_i = [A_{i1}, A_{i2}, \ldots, A_{im}] \in R^m.
  • S120: The sparse feature A_i of each face sample is compared with the sparse feature X of the current partial face shape, a face sample is matched according to similarity, and the 3D face shape feature parameters of that face sample are used as the parameters of the face to be reconstructed.
  • That is, the face sample i that maximizes the sparse feature similarity S_i is found, and the face feature vector D_i corresponding to that face sample describes the 3D face to be reconstructed.
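  • A hedged sketch of this retrieval step; cosine similarity is used here as the similarity measure S_i, which is an assumption (the patent does not fix the measure):

```python
import numpy as np

def match_face_sample(x, A):
    """x: (m,) sparse feature of the current partial face shape.
    A: (N, m) sparse features A_i of the N database face samples.
    Returns the index i maximizing the similarity S_i."""
    sims = (A @ x) / (np.linalg.norm(A, axis=1) * np.linalg.norm(x) + 1e-12)
    return int(np.argmax(sims))
```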
  • S122: The basic shape T_0 and the shape vectors S_i are provided by the parameterized 3D face model; the shape feature parameters q of the 3D face of the above face sample are input into the parameterized 3D face model to obtain the initialized 3D face model.
  • S124: On the other hand, the image data of the target area scaled in step S112 is used as the input of the Hourglass network; the Hourglass network outputs, for each pixel of the target area image data, the probability that it is a key point, the probability maxima are determined to be key points, and the 2D position information of each key point in the local face area is then obtained.
  • Since the key points are detected over the whole target area, the determined key points of the local face area may include key points of the occluded area. Therefore, according to the occluded area determined in step S114, the key points of the occluded area are eliminated, that is, this part of the noise is removed, so as to obtain more accurate key point information of the local face area for use in the subsequent steps.
  • S130: Align the 3D face model established in step S122 with the 3D coordinates of the key points of the local face area; that is, using the 3D coordinates of the key points of the local face area as the target positions, the initialized 3D face model is transformed to the target positions.
  • Specifically, the ICP algorithm is used to perform a rigid body transformation on the 3D face model, that is, to move and rotate the 3D face model to the corresponding target positions, so as to achieve preliminary alignment with the position of the face point cloud.
  • S132: Fit the shape parameters of the aligned 3D face model. The parameter fitting algorithm may be a quasi-Newton algorithm, a Newton algorithm, a gradient descent method, etc.; owing to the fast convergence and low computational complexity of the quasi-Newton algorithm, the quasi-Newton algorithm is used in this embodiment.
  • The fitting process takes the 3D coordinates of the point cloud of the local face area, calculated in step S116, as the fitting target. Since the fitting is performed on the point cloud of the local face area, the selection of the 3D face model optimization targets is limited to the point cloud of the local face area.
  • the fitting process refer to the aforementioned introduction of shape fitting by quasi-Newton algorithm, and will not be repeated here.
  • A driver monitoring system (DMS) can monitor the state of the driver, such as fatigue monitoring, distraction monitoring (or attention monitoring), eye tracking, and dangerous behavior monitoring (such as using a mobile phone, eating, etc.).
  • DMS collects the driver's image through the camera in the vehicle cockpit.
  • the camera can be a binocular camera, RGB-D camera, etc.
  • The camera can be installed on the vehicle as required, for example, at the position of the rearview mirror in the cockpit, on the steering wheel, or in the area around the center console.
  • the camera adopts a binocular camera composed of two cameras arranged on the left and right A-pillars of the vehicle cockpit as shown in FIG. 1B .
  • the face image processing method provided in the embodiment of the present application can be used to process the collected image to reconstruct the driver's 3D face.
  • In particular, when the driver's face in the collected image is incomplete, the face image processing method provided in the embodiment of the present application can be used for the image processing.
  • The situation where the face is incomplete includes the situation where the face is partially occluded, for example, when the driver wears sunglasses, or when the driver drinks water or makes a phone call and the water cup, mobile phone, hand, or arm partially occludes the face.
  • Incomplete faces also include situations where part of the face is not captured, for example, when the driver's head moves over a large range or rotates at a large angle so that part of the face moves outside the image capture area of the camera, or when the driver turns the head (such as turning the head backwards) so that part of the face cannot be captured by the camera.
  • the state of the driver can be further detected based on the reconstructed 3D face.
  • For example, the head posture is detected based on the reconstructed 3D face, including whether the head moves over a large range or rotates at a large angle; head posture changes over a period of time can also be combined to detect whether the head is in an abnormal state. Abnormal states include frequently lowering the head (which can, for example, be used as a basis for judging whether the driver is dozing off or looking at a mobile phone) and keeping the head raised or turned to one side for a certain period of time (which can, for example, be used as a basis for judging whether the driver is asleep).
  • Another example is to detect the facial state of the driver based on the reconstructed 3D face.
  • The facial state includes the degree of eye opening (for example, whether the eye opening is below a threshold can be used as a basis for judging whether the driver is sleepy), the degree of mouth opening (for example, it can be used as a basis for judging whether the driver is dozing off), and the gaze direction (for example, it can be used as a basis for judging the driver's attention).
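  • A toy illustration of such threshold-based judgments (all threshold values are placeholders, not values from the patent):

```python
def driver_state(eye_opening, mouth_opening, eye_thresh=0.2, mouth_thresh=0.6):
    """Flag possible drowsiness when the eye opening falls below a threshold
    or the mouth opening exceeds a threshold, as described above."""
    return {"sleepy": eye_opening < eye_thresh,
            "dozing": mouth_opening > mouth_thresh}
```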
  • the DMS determines whether to warn the driver based on the detected driver's state and combined with the current driving scene.
  • The driving scene here may refer to the driving scene combined with the current automatic driving level (automatic driving levels L0 to L5).
  • For example, in a driving scenario with a lower automatic driving level, the threshold for triggering a warning to the driver is relatively low; in a driving scenario with a higher automatic driving level, the threshold for triggering a warning to the driver is relatively high.
  • the DMS can also provide the detected state of the driver to the vehicle control device, and the vehicle control device can judge whether to take over the driving control of the vehicle.
  • The driving control after takeover includes automatically controlling the vehicle to slow down and park at the side of the road, and also includes carrying out automatic driving (for L4 and L5 automatic driving), such as driving automatically through a certain area (e.g., driving out of an expressway) or for a period of time until the vehicle reaches a safe place and stops.
  • the present application also provides a corresponding embodiment of a face image processing device.
  • The face image processing device 600 includes:
  • the acquiring module 610 is configured to acquire local face shape features in the face image, and to acquire face samples matching the local face shape features, and acquire face shape parameters of the face samples. Specifically, this module can be used to execute steps S10-S20 and examples thereof in the above-mentioned face image processing method, or to execute steps S110-S120 and examples therein in the specific implementation of the above-mentioned face image processing method.
  • a generating module 620 configured to generate a three-dimensional human face model using the human face shape parameters. Specifically, this module can be used to execute steps S30 and S40 and the examples thereof in the above-mentioned face image processing method, or to execute steps S122-S132 and the examples thereof in the specific implementation manner of the above-mentioned face image processing method.
  • In some embodiments, the generating module 620 is further configured to: fit the generated 3D face model to the point cloud data of the partial face area, the partial face area being included in the face image.
  • the generating module 620 when used for the fitting, it is specifically used to: obtain the key points of the local face area, and obtain the three-dimensional coordinates of the key points; the three-dimensional face model performing pose transformation according to the three-dimensional coordinates of the key points; and fitting the three-dimensional face model after the pose transformation to the point cloud data of the local face area.
  • the point cloud data of the partial face area is obtained according to the pixel depth value of the partial face area and camera parameters.
  • the three-dimensional coordinates of the key points of the partial human face area are obtained according to the depth values of the key points and camera parameters.
  • the acquiring module 610 is specifically configured to: acquire a target area in the face image, where the target area includes the partial face area; acquire the partial face in the target area area; acquiring the local face shape features of the local face area.
  • In some embodiments, when used to acquire the partial face area in the target area, the acquisition module 610 is specifically used for at least one of the following: acquiring the partial face area according to the color values of the pixels in the target area; acquiring the partial face area according to the depth values of the pixels in the target area.
  • In some embodiments, when used to acquire the face samples matching the local face shape features, the acquisition module 610 is specifically used to: search the face database for face samples matching the local face shape features.
  • the generation module 620 is specifically configured to: generate the 3D face model based on the parameterized 3D face model using the face shape parameters.
  • Table 1 shows the failure ratios of traditional 3D face reconstruction and of 3D face reconstruction using the method of the embodiment of the present application, estimated by discrete random event model simulation for different proportions of face occlusion. Each experiment is carried out 1000 times, and the proportion of deformed face structures (that is, the failure ratio of 3D face reconstruction) is calculated; the results show the following:
  • the method of the embodiment of the present application has better robustness in realizing 3D face reconstruction.
  • The proportion of reconstruction failures is low, and the proportion of reconstruction failures remains low even with large face occlusions.
  • FIG. 7 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application.
  • the electronic device 700 includes: a processor 710 and a memory 720 .
  • the electronic device 700 shown in FIG. 7 may further include a communication interface 730, which may be used for communication with other devices.
  • the processor 710 may be connected to the memory 720 .
  • The memory 720 can be used to store program code and data. The memory 720 may be a storage unit inside the processor 710, an external storage unit independent of the processor 710, or a combination of a storage unit inside the processor 710 and an external storage unit independent of the processor 710.
  • the electronic device 700 may also include a bus.
  • the memory 720 and the communication interface 730 may be connected to the processor 710 through a bus.
  • The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the processor 710 may be a central processing unit (central processing unit, CPU).
  • The processor can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the processor 710 adopts one or more integrated circuits for executing related programs, so as to realize the technical solutions provided by the embodiments of the present application.
  • the memory 720 may include read-only memory and random-access memory, and provides instructions and data to the processor 710 .
  • a portion of processor 710 may also include non-volatile random access memory.
  • processor 710 may also store device type information.
  • the processor 710 executes the computer-executed instructions in the memory 720 to execute the operation steps of the above-mentioned face image processing method, for example, execute the methods of the above-mentioned embodiments corresponding to FIGS. 2A-5 , or Each of the optional embodiments.
  • It should be understood that the electronic device 700 may correspond to a corresponding subject performing the methods according to the various embodiments of the present application, and the above-mentioned and other operations and/or functions of the modules in the electronic device 700 are intended to implement the corresponding processes of the methods in those embodiments; for the sake of brevity, they are not repeated here.
  • FIG. 8 is a schematic structural diagram of another electronic device 800 provided in this embodiment, including: a processor 810 and an interface circuit 820, where the processor 810 accesses a memory through the interface circuit 820, and the memory stores program instructions; when the program instructions are executed by the processor, the processor executes the methods of the above embodiments corresponding to FIGS. 2A-5, or the optional embodiments therein.
  • the electronic device may further include a communication interface, a bus, etc. For details, refer to the introduction in the embodiment shown in FIG. 7 , and details are not repeated here.
  • The embodiment of the present application also provides a vehicle 100, including an image acquisition device 110 for collecting face images, and the face image processing device 120 described above (including its various embodiments), or the electronic device 130.
  • The face image collected by the image acquisition device 110 is provided to the face image processing device 120 or to the electronic device 130, and the face image processing device 120 or the electronic device 130 implements the above face image processing method and its various embodiments according to the face image, so as to realize the reconstruction of the 3D face.
  • the image acquisition device 110 may be a camera, such as a camera, where the camera may be a binocular camera, an RGB-D camera, an IR camera, etc., and the camera may be installed on the vehicle as required.
  • the example shown in FIG. 1A and FIG. 1B can be a binocular camera composed of two cameras arranged on the left and right A-pillars of the vehicle cockpit. In some other embodiments, it can also be installed on the passenger side of the rearview mirror in the cockpit of the vehicle, and can also be installed on the steering wheel, the area near the center console, and the like.
  • the image acquisition device 110 can also be an electronic device that receives the occupant image data transmitted by a camera, such as a data transmission chip, for example a bus data transceiver chip or a network interface chip; the data transmission chip can also be a wireless transmission chip, such as a Bluetooth chip or a WiFi chip.
  • FIG. 10 is a schematic structural diagram of a computing device 900 provided by an embodiment of the present application.
  • the computing device 900 includes: a processor 910 , a memory 920 , and may also include a communication interface 930 .
  • the communication interface 930 in the computing device 900 shown in FIG. 10 can be used to communicate with other devices.
  • the processor 910 may be connected to the memory 920 .
  • the memory 920 can be used to store program code and data. The memory 920 may be a storage unit inside the processor 910, an external storage unit independent of the processor 910, or a combination of both.
  • computing device 900 may further include a bus.
  • the memory 920 and the communication interface 930 may be connected to the processor 910 through a bus.
  • the bus can be a PCI bus, an EISA bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the processor 910 executes the computer-executed instructions in the memory 920 to perform the operation steps of the above method.
  • the computing device 900 may correspond to a subject executing the methods according to the various embodiments of the present application, and the above-mentioned and other operations and/or functions of the modules in the computing device 900 are intended to implement the corresponding processes of the methods in those embodiments; for the sake of brevity, they are not repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored; when the program is executed by a processor, it executes the above-mentioned face image processing method, and the method includes at least one of the solutions described in the above-mentioned embodiments.
  • the embodiment of the present application also provides a computer program product, including program instructions.
  • when the program instructions are executed by a computer, the above-mentioned face image processing method is implemented, and the method includes at least one of the solutions described in the above-mentioned embodiments.
  • the disclosed methods and devices may be implemented in other ways.
  • the device embodiments described above are only illustrative, and the division of the units is only a logical functional division; in actual implementation there may be other division methods, for example, multiple units or components can be combined or integrated into another system, or some features may be omitted or not implemented.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present application belongs to the field of machine vision. Provided is a facial image processing method. The method comprises: firstly, acquiring a local facial shape feature from a facial image; then acquiring, from a face database, a face sample matching the local facial shape feature, and acquiring a facial shape parameter of the facial sample; then, on the basis of a parameterized facial model, generating a three-dimensional facial model by using the facial shape parameter; and performing fitting on the basis of point cloud data of a local facial area, so as to complete three-dimensional facial reconstruction. On the basis of a local facial shape feature, three-dimensional facial reconstruction when a face is partially covered or cropped can be realized; and the present application can be applied to the field of intelligent vehicles, for example, a driver monitoring system.

Description

Face image processing method, device and vehicle

Technical Field

The present invention relates to the technical field of machine vision, and in particular to a face image processing method, device and vehicle.

Background Art
Three-dimensional (3D) face reconstruction is a research hotspot in the fields of machine vision, computer vision and computer graphics. 3D face reconstruction is one of the core technologies in fields such as virtual reality/augmented reality, autonomous driving and robotics, and has great application value in the driver monitoring system (DMS) of smart vehicles.

For example, when a vehicle is in an assisted or automated driving state, the driver is allowed to be freed from some tasks but is required to be ready to take over the vehicle at any time, which makes real-time monitoring of the driver's attention state critical. 3D face reconstruction is one of the basic technologies for monitoring the driver's head posture and gaze direction, and it directly affects the performance of human-computer interaction and the DMS.

When monitoring the state of vehicle occupants, the face image is often occluded by hands, the steering wheel, mobile phones or food, and occlusion has a great impact on the performance of 3D face reconstruction. Occluded face images can be divided into unintentional occlusion and intentional occlusion. Common unintentional occlusions include glasses, the steering wheel, or other people blocking the face of the monitored person, while intentional occlusions usually include sunglasses, masks or other objects blocking the facial features. Intentional occlusion usually causes 3D face reconstruction to fail due to excessive feature changes, while unintentional occlusion usually covers only a small part of the facial features, which easily introduces too many interference features during feature extraction and distorts the 3D face reconstruction.

The uncertainty of the occluding object and of the occluded area makes the inherent features of the face image appear as the absence of various local features, which limits the application scenarios of 3D face reconstruction. Therefore, there is a need for a new 3D face reconstruction method that is robust under occlusion conditions.
发明内容Contents of the invention
鉴于以上问题,本申请实施例提供了一种人脸图像处理方案,该方案包括人脸图像处理方法、装置、车辆、计算设备、计算机可读存储介质和计算机程序产品,可以实现在人脸部分被遮挡条件下的三维人脸重建。In view of the above problems, the embodiment of the present application provides a face image processing solution, which includes a face image processing method, device, vehicle, computing device, computer-readable storage medium and computer program product, which can be implemented in the face part 3D face reconstruction under occluded conditions.
为达到上述目的,本申请第一方面,提供了一种人脸图像处理方法,包括:获取人脸图像中的局部人脸形状特征;获取与局部人脸形状特征匹配的人脸样本,获取人脸样本的人脸形状参数;使用人脸形状参数生成三维人脸模型。In order to achieve the above object, the first aspect of the present application provides a face image processing method, including: acquiring the local face shape features in the face image; Facial shape parameters of the face sample; use the facial shape parameters to generate a 3D face model.
本申请实施例根据与局部人脸形状相匹配的人脸样本的信息生成三维人脸模型,降低了遮挡物的干扰,适用于人脸遮挡场景下的3D人脸重建,具有高鲁棒性的特点。The embodiment of the present application generates a 3D face model based on the information of the face sample that matches the local face shape, which reduces the interference of occluders, and is suitable for 3D face reconstruction in face occlusion scenarios with high robustness. features.
当基于本申请实施例的人脸图像处理方法应用于智能车辆(例如智能车辆中的驾驶员状态监测系统)时,可以实现在乘员(乘员指驾驶员或乘客)脸部有遮挡的情况下进行三维人脸重建,并进一步可基于重建的三维人脸进行头部姿态和/或视线方向进行识别,由此可以提升驾驶员状态监测系统的鲁棒性与稳定性。When the face image processing method based on the embodiment of the present application is applied to a smart vehicle (such as a driver status monitoring system in a smart vehicle), it can be realized that the face image processing method of the occupant (the occupant refers to the driver or passenger) has a blocked face. 3D face reconstruction, and further recognition of head posture and/or gaze direction based on the reconstructed 3D face, which can improve the robustness and stability of the driver status monitoring system.
As a possible implementation of the first aspect, the method further includes: fitting the generated 3D face model to the point cloud data of a partial face area, the partial face area being included in the face image.

Through the above fitting process, a more detailed 3D face reconstruction, also called 3D face optimization, can be realized. The fitting may include fitting parameters such as the lips and the eyes, so that the reconstructed 3D face is closer to the user's real appearance.

As a possible implementation of the first aspect, the fitting includes: acquiring key points of the partial face area and acquiring the 3D coordinates of the key points; performing a pose transformation on the 3D face model according to the 3D coordinates of the key points; and fitting the pose-transformed 3D face model to the point cloud data of the partial face area.

In this way, the rigid body transformation of the 3D face model is first performed with the 3D coordinates of the key points as the target, achieving a preliminary alignment with the point cloud data of the partial face area. This process can be realized with the ICP algorithm; since the target data, i.e., the key points, are small in number, the preliminary alignment requires little computation and is fast. The further shape fitting with the point cloud data of the partial face area, which completes the shape optimization, can be realized with the quasi-Newton algorithm, which features fast convergence and low computational complexity.
As a possible implementation of the first aspect, the point cloud data of the partial face area is obtained according to the pixel depth values of the partial face area and the camera parameters.

Accordingly, the present application can be implemented with a binocular camera, an RGB-D camera, an infrared camera, etc., whose cost is relatively low compared with other image perception and acquisition devices. The cameras here are not limited to traditional cameras and also include image capture devices such as camera modules.

As a possible implementation of the first aspect, the 3D coordinates of the key points of the partial face area are obtained according to the depth values of the key points and the camera parameters.

Accordingly, the present application can be implemented with a binocular camera or an RGB-D camera, whose cost is relatively low compared with other image perception and acquisition devices.
As a possible implementation of the first aspect, acquiring the local face shape features in the face image includes: acquiring a target area in the face image, the target area including the partial face area; acquiring the partial face area in the target area; and acquiring the local face shape features of the partial face area.

The two-step approach of first obtaining the target area from the image and then obtaining the partial face area from the target area keeps the overall complexity of the neural network implementing this process low, compared with obtaining the partial face area directly from the image, and also makes the network easy to train.

As a possible implementation of the first aspect, acquiring the partial face area in the target area includes at least one of the following: acquiring the partial face area according to the color values of the pixels of the target area; and acquiring the partial face area according to the depth values of the pixels of the target area.

The partial face area can be acquired according to the color values of the pixels of the target area. This is because facial skin color can be distinguished from non-face parts, such as the background and certain occluders (e.g., masks, cups, water bottles), so the partial face area within the target area can be extracted based on pixel color values. The partial face area can also be acquired according to the depth values of the pixels of the target area. This is because the spatial position (or depth) of the face differs from that of certain non-face parts, such as the background and certain non-sheet-like occluders (e.g., cups, water bottles, hands, arms), so the partial face area can be extracted based on pixel depth values. In some possible implementations, the occluded face area may first be extracted from the target area image, and the partial face area may then be extracted from the difference area between the target area and the occluded face area according to the color values and/or depth values of the pixels.
As a possible implementation of the first aspect, acquiring the face sample matching the local face shape features includes: retrieving the face sample matching the local face shape features from a face database.

A face database can be built in advance from real faces; a massive amount of data can probabilistically improve the accuracy of the matched face samples. Moreover, obtaining complete real-face samples from the face database makes the 3D face reconstructed by the embodiments of the present application more realistic.

As a possible implementation of the first aspect, generating the 3D face model by using the face shape parameters includes: generating the 3D face model by using the face shape parameters based on a parameterized 3D face model.

Generating the 3D face model based on a parameterized 3D face model makes full use of the obtained face shape parameters.
A second aspect of the present application provides a face image processing device, including: an acquisition module configured to acquire local face shape features in a face image, acquire a face sample matching the local face shape features, and acquire face shape parameters of the face sample; and a generation module configured to generate a 3D face model.

As a possible implementation of the second aspect, the generation module is further configured to: fit the generated 3D face model to the point cloud data of a partial face area.

As a possible implementation of the second aspect, when performing the fitting, the generation module is specifically configured to: acquire key points of the partial face area and acquire the 3D coordinates of the key points; perform a pose transformation on the 3D face model according to the 3D coordinates of the key points; and fit the pose-transformed 3D face model to the point cloud data of the partial face area.

As a possible implementation of the second aspect, the point cloud data of the partial face area is obtained according to the pixel depth values of the partial face area and the camera parameters.

As a possible implementation of the second aspect, the 3D coordinates of the key points of the partial face area are obtained according to the depth values of the key points and the camera parameters.

As a possible implementation of the second aspect, the acquisition module is specifically configured to: acquire a target area in the image, the target area including a partial face area; acquire the partial face area in the target area; and acquire the local face shape features of the partial face area.

As a possible implementation of the second aspect, when acquiring the partial face area in the target area, the acquisition module is specifically configured to use at least one of the following: acquiring the partial face area according to the color values of the pixels of the target area; and acquiring the partial face area according to the depth values of the pixels of the target area.

As a possible implementation of the second aspect, when acquiring the face sample matching the local face shape features, the acquisition module is specifically configured to: retrieve the face sample matching the local face shape features from a face database.

As a possible implementation of the second aspect, the generation module is specifically configured to: generate the 3D face model by using the face shape parameters based on a parameterized 3D face model.
A third aspect of the present application provides an electronic device, including: a processor, and a memory storing program instructions which, when executed by the processor, implement any one of the face image processing methods provided in the first aspect.

A fourth aspect of the present application provides an electronic device, including: a processor, and an interface circuit, where the processor accesses a memory through the interface circuit, the memory storing program instructions which, when executed by the processor, implement any one of the face image processing methods provided in the first aspect.

A fifth aspect of the present application provides a vehicle, including: an image acquisition device for collecting face images, and any one of the face image processing devices provided in the second aspect, or the electronic device provided in the third or fourth aspect.

A sixth aspect of the present application provides a computer-readable storage medium storing program instructions which, when executed by a computer, cause the computer to implement any one of the face image processing methods provided in the first aspect.

A seventh aspect of the present application provides a computer program product including program instructions which, when executed by a computer, cause the computer to implement any one of the face image processing methods provided in the first aspect.

In summary, with the face image processing solution of the embodiments of the present application, for an occluded face, local face features are extracted from the partial face area (i.e., the unoccluded part of the face) and matched to a similar face sample, and a 3D face model is established according to the shape parameters of that face sample, which solves 3D face reconstruction when the face is occluded. Furthermore, by using sparse features in local face feature extraction and in matching similar face samples, the complexity of the neural network performing the feature extraction can be reduced and the efficiency of the matching process can be improved. In addition, through fitting optimization between the partial face area and the 3D head model, the reconstructed 3D face is made closer to the user's real appearance.
Description of Drawings

FIG. 1A is a schematic diagram of a scenario where an embodiment of the present application is applied to a vehicle;

FIG. 1B is a schematic diagram of a scenario where an embodiment of the present application is applied to a vehicle;

FIG. 2A is a flowchart of a face image processing method according to an embodiment of the present application;

FIG. 2B is a schematic diagram of a face image processing method according to an embodiment of the present application;

FIG. 3 is a flowchart of partial face area extraction in an embodiment of the present application;

FIG. 4 is a flowchart of fitting a 3D face model to the point cloud of a partial face area in an embodiment of the present application;

FIG. 5 is a flowchart of a specific implementation of the face image processing method of the present application;

FIG. 6 is a schematic diagram of an embodiment of the 3D face reconstruction device of the present application;

FIG. 7 is a schematic diagram of an electronic device provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of another electronic device provided by an embodiment of the present application;

FIG. 9A is a schematic diagram of a vehicle provided by an embodiment of the present application;

FIG. 9B is a schematic diagram of a vehicle provided by an embodiment of the present application;

FIG. 10 is a schematic diagram of an embodiment of a computing device of the present application.

It should be understood that, in the above structural diagrams, the size and shape of each block are for reference only and should not constitute an exclusive interpretation of the embodiments of the present application. The relative positions and containment relationships between the blocks only schematically represent the structural associations between them, rather than limiting the physical connection of the embodiments of the present application.
Detailed Description

The technical solutions provided by the present application are further described below with reference to the accompanying drawings and embodiments. It should be understood that the system structures and business scenarios provided in the embodiments of the present application are mainly intended to illustrate possible implementations of the technical solutions of the present application, and should not be interpreted as the only limitation on them. Those of ordinary skill in the art know that, as system structures evolve and new business scenarios emerge, the technical solutions provided by the present application remain applicable to similar technical problems.

It should be understood that the face image processing solution provided by the embodiments of the present application includes a face image processing method and device, a computing device, a computer-readable storage medium and a computer program product. Since these technical solutions solve problems on the same or similar principles, some repetitions may not be restated in the introduction of the following specific embodiments, but these specific embodiments should be considered as mutually referenced and combinable.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which the present application belongs. In case of any inconsistency, the meaning stated in this specification, or the meaning derived from the content recorded in this specification, shall prevail. In addition, the terms used herein are intended to describe the embodiments of the present application, not to limit the present application.

In order to accurately describe the technical content of the present application and to accurately understand the present invention, before the specific embodiments are described, the following explanations or definitions are given for the terms used in this specification:
1) Image data with depth: it includes ordinary red-green-blue (RGB) color image information and depth information, and the RGB image information and the depth information are registered, i.e., there is a one-to-one correspondence between their pixels. Image data with depth can be collected by an RGB-Depth (RGB-D) camera; the collected data can be presented as an RGB image frame plus a depth image frame, or integrated and presented as a single image data. According to the intrinsic parameters of the camera, the transformation between depth information and point cloud coordinates can be realized.
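To make this depth-to-point-cloud transformation concrete, the following is a minimal Python sketch of unprojecting a registered depth map into camera-space 3D points under a standard pinhole model; the function and the intrinsic parameter names (fx, fy, cx, cy) are illustrative, not taken from the original:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Unproject a registered depth image (meters) into camera-space 3D points.

    A sketch under the pinhole camera model: for a pixel (u, v) with depth Z,
    X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinate grids
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth
```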
2) Definitions of the areas mentioned in the embodiments of the present application:

Target area (region of interest, ROI): in the embodiments of the present application, the face target-box area in the image to be recognized. This area may be an image containing the occluded face, or a cropped face-area image.

Partial face area: in the embodiments of the present application, the visible part of the face in the target area, i.e., the unoccluded area.

Occluded area: in the embodiments of the present application, the area where the face is occluded.
3) Parameterized face model: a way of representing a face by a standard face (also called an average face, reference face, base shape face, or statistical face) combined with shape feature vectors, pose feature vectors or expression feature vectors, for example the 3D Morphable Face Model (3DMM), the FLAME model, etc.

4) FLAME model: the FLAME model is based on real human body point clouds from the CAESAR data; the head data of these real human bodies are registered to obtain a head mesh for each real head, which covers the entire face and head area, thereby establishing a database of real faces and heads. The head mesh is composed of a number of vertices (e.g., 5023) and a number of triangular faces (e.g., 9976), and principal component analysis (PCA) is used to obtain a number of shape principal components (e.g., 300), expression principal components (e.g., 100) and pose principal components (e.g., 15), from which a parameterized 3D head model can be determined.
Specifically, defined by the vertex positions of the mesh, the FLAME shape $T$ is defined as the coordinates of the vertices $k$ constituting the mesh, which can be described as the following formula (1):

$T = (x_1, y_1, z_1, x_2, \ldots, x_n, y_n, z_n)$  (1)
FLAME models shape and expression separately, and the FLAME face model can be described as the following formula (2):

$T(V; p, q) = T_0 + B_s(q; S) + B_p(p; E)$  (2)

where $T_0$ is the standard face, i.e., the average shape part of the face; $B_s(q; S)$ is the face shape blending term, for example $\sum_i q_i S_i$, $i = 1$ to $n$, where $S_i$ denotes the eigenvectors of the covariance matrix, i.e., the face shape vector parameters (the shape principal components above), and $q$ is the coefficient vector corresponding to the face shape vector parameters; $B_p(p; E)$ is the facial expression blending term, for example $\sum_i p_i E_i$, $i = 1$ to $l$, where $E_i$ denotes the eigenvectors of the covariance matrix, i.e., the facial expression vector parameters (the expression principal components above), and $p$ is the coefficient vector corresponding to the facial expression vector parameters.
Accordingly, the modeling of the face shape part (denoted $T(S)$ in the embodiments of the present application) can be expressed as the base shape $T_0$ plus a linear combination of the $n$ shape vectors $S_i$, described as the following formula (3):

$T(S) = T_0 + B_s(q; S) = T_0 + \sum_{i=1}^{n} q_i S_i$  (3)

Since $T_0$ and $S_i$ are provided by FLAME, once each $q_i$ is obtained, substituting the $q_i$ into formula (3) generates a 3D face model for the face shape part.
5) Optimizing the pose angle of the 3D face model, i.e., transforming the 3D face model to a target position, also called rigid body transformation or geometric registration. After the above 3D face model is established, the 3D positions of its vertices are determined; in other words, the 3D positions of the vertices are determined by the coefficients $q_i$ and the 3D face model given by formula (3). The coordinates $X_k = (x_k, y_k, z_k)$ of each model vertex $k$ can then be transformed to the target position through a rigid body transformation, which can be described as the following formula (4):

$(w_{x,k}, w_{y,k}, w_{z,k})^{T} = R \, X_k + t_w$  (4)

where $(w_{x,k}, w_{y,k}, w_{z,k})$ denotes the target position; in the embodiments of the present application, the target positions are the 3D coordinates of the key points of the partial face area, and through several key points the vertices of the entire 3D face model are preliminarily aligned with the point cloud in the camera coordinate system; $R$ denotes the rotation parameters about the three axes, and $t_w$ denotes the translation parameters.
A point cloud matching algorithm can be used to optimize the angle and pose (point cloud matching solves the transformation between two point clouds, i.e., the above rotation and translation parameters). Common point cloud matching algorithms include the iterative closest point (ICP) algorithm, the normal distribution transform (NDT) algorithm, the iterative dual correspondences (IDC) algorithm, and so on.
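As one concrete way to obtain the rotation and translation of formula (4) from a small set of matched key points (the closed-form step used inside each ICP iteration), the following sketch uses the SVD-based Kabsch solution; this is an illustrative solver choice, not mandated by the original:

```python
import numpy as np

def rigid_transform_from_keypoints(model_pts, target_pts):
    """Solve R, t minimizing sum_k ||R @ X_k + t - w_k||^2 (cf. formula (4)).

    model_pts:  (K, 3) key point positions on the generated 3D face model.
    target_pts: (K, 3) 3D coordinates of the same key points from the image.
    """
    mu_m = model_pts.mean(axis=0)
    mu_t = target_pts.mean(axis=0)
    H = (model_pts - mu_m).T @ (target_pts - mu_t)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T  # proper rotation (det = +1, no reflection)
    t = mu_t - R @ mu_m
    return R, t

# Applying the pose transform to all model vertices would then be:
# aligned_vertices = (R @ vertices.T).T + t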
6) Shape fitting by the quasi-Newton algorithm: the quasi-Newton algorithm is one of the iterative algorithms. It has second-order convergence, so it converges faster than the conventional gradient descent method, and its computational complexity is lower than that of Newton's method.

In the embodiments of the present application, given the point cloud data of the partial face area, the face shape coefficients are further optimized by minimizing an objective function, where the objective function, formula (5) below, is the sum of the squared differences between the 3D coordinates of the face point cloud and the reconstructed model vertices. This objective function is convex and is solved iteratively by the quasi-Newton algorithm, one of the classical convex optimization algorithms.
Objective function: $L_S = \sum_i \left\| (P_{x,i}, P_{y,i}, P_{z,i}) - I_i \, (V_{x,i}, V_{y,i}, V_{z,i}) \right\|^2$  (5)

where $(P_{x,i}, P_{y,i}, P_{z,i})$ is a point in the point cloud data of the partial face area, $(V_{x,i}, V_{y,i}, V_{z,i})$ is a point of the generated 3D face model, $i$ denotes the $i$-th vertex, and $I_i$ indicates whether model vertex $i$ is included in the calculation of the objective function.
For 3D face reconstruction, one available technical solution reconstructs the 3D face based on the key points of a two-dimensional (2D) face image. This solution first extracts key points from the 2D face image; the key points may be located at, for example, 17 points on the facial contour, 5 on the left eyebrow, 5 on the right eyebrow, 6 on the left eye, 6 on the right eye, 4 on the bridge of the nose, 5 on the nose wings and 20 on the mouth contour, each key point characterizing the facial contour. Then, through the correspondence of the key points on a generic standard 3D face model, the positions of the corresponding feature points in the standard 3D face model are adjusted, and the standard 3D face model is deformed by interpolating the other, non-feature points, yielding the reconstructed 3D face model. As can be seen, this method requires extracting 2D key points from the input image. However, when 3D face reconstruction is performed from a profile image, the 2D key point information is self-occluded, so the 2D key points of the face are located inaccurately (i.e., cannot be extracted accurately), the reconstructed 3D object is poor, and the method adapts badly to the face pose in profile input images. For input images in which part of the face is invisible due to occlusion, the inaccurate 2D key point extraction makes the reconstruction extremely poor and may even cause it to fail.

Another available technical solution uses a consumer-grade RGB-D depth camera for 3D face reconstruction; it performs geometric registration on the point cloud corresponding to the current input image frame. The geometric registration step is mostly based on the ICP algorithm; this process is a fitting optimization problem requiring a large number of loop iterations. The main problem of this method is that, when the face is occluded, the face point cloud data of the occluded part of the image frame is unknown, so it is difficult to determine the information to be fitted, causing the reconstructed 3D object to fail.
In order to realize 3D face reconstruction when the face is partially occluded, the embodiments of the present application provide a face image processing method, which is an improved 3D face reconstruction method. Its basic principle is as follows: first, sparse local face features are extracted from the partial face area in the image and matched for similarity against the sparse features of the 3D face samples in a face database, obtaining the shape parameters corresponding to the matched 3D face sample, and a 3D head model is established from these shape parameters combined with a parameterized face model; meanwhile, the key points of the partial face area are identified and their 3D data obtained; then, taking the 3D data of the key points as the target, the established 3D head model undergoes a rigid body transformation, achieving a preliminary alignment between the 3D head model and the point cloud of the partial face area in the camera coordinate system; finally, fitting optimization is performed between the partial face area and the 3D head model, with the selection of optimization targets limited to the point cloud of the partial face area, completing the 3D face reconstruction. The method of the embodiments of the present application achieves a good 3D face reconstruction effect in scenarios where the face is intentionally or unintentionally partially occluded, or where a large-angle head pose causes self-occlusion, making it difficult to obtain the information to be fitted.
The embodiments of the present application can be applied to 3D reconstruction of the faces of people in vehicles, aircraft and other means of transport, such as drivers, so that the driver's head posture, gaze direction, etc. can be determined from the reconstructed 3D face to recognize the driver's state. They can also be applied to 3D face reconstruction of viewers in front of a television, students in a class, etc., so that these people's head postures, gaze directions, etc. can be determined from the reconstructed 3D faces to further determine their attention direction, attention level, etc., and then adjust the television content, the teaching method, and so on. They can also be applied to 3D face reconstruction from facial images collected by mobile terminals such as mobile phones, tablets and portable computers, as well as to technical fields such as face image restoration, for example restoring face images that are partially stained or incomplete, such as incomplete face images caused by cropping.

FIG. 1A and FIG. 1B show an example of applying an embodiment of the present application to a vehicle. The vehicle of this embodiment includes general motor vehicles, for example land transport devices including cars, sport utility vehicles (SUV), multi-purpose vehicles (MPV), buses, trucks and other cargo or passenger vehicles, water vehicles including various ships and boats, as well as aircraft. Motor vehicles further include hybrid vehicles, electric vehicles, fuel vehicles, plug-in hybrid vehicles, fuel cell vehicles and other alternative-fuel vehicles, where a hybrid vehicle refers to a vehicle with two or more power sources and electric vehicles include pure electric vehicles, extended-range electric vehicles, etc., which is not specifically limited in the present application. When the embodiment of the present application is applied to a vehicle 10, the vehicle 10 may include an image acquisition device 11 and a processor 12.

The image acquisition device 11 is configured to acquire an image including the face of an occupant (a driver or a passenger). In this embodiment, the image acquisition device 11 is a camera, which may be a binocular camera, an RGB-D camera, etc. The camera can be installed on the vehicle as required, for example in the cockpit of the vehicle. In this embodiment, specifically as shown in FIG. 1B, a binocular camera composed of two independent cameras is used: a first camera 111 and a second camera 112 arranged on the left and right A-pillars of the vehicle cockpit. In other examples, the camera may also be installed on the occupant-facing side of the rearview mirror in the vehicle cockpit, on the steering wheel, in the area near the center console, or above the display screen behind a seat, mainly to collect facial images of the driver or passengers in the vehicle cockpit.

In some other embodiments, the image acquisition device 11 may also be an electronic device that receives occupant image data transmitted by a camera, such as a data transmission chip, for example a bus data transceiver chip or a network interface chip; the data transmission chip may also be a wireless transmission chip, such as a Bluetooth chip or a WiFi chip. In other embodiments, the image acquisition device 11 may also be integrated into the processor, as an interface circuit or data transmission module integrated into the processor.

The processor 12 can be used to reconstruct a 3D face from the face in the image; in the embodiments of the present application, when the face in the image is partially occluded, the reconstruction can be performed from the partial face area of the image (e.g., the unoccluded face area). In some other embodiments, the processor 12 can also be used to recognize the occupant's head posture and/or gaze direction from the reconstructed 3D face, and can further determine the occupant's attention direction, attention level, etc. from the recognized head posture and gaze direction. When the embodiment of the present application is applied to the vehicle 10, the processor 12 may be an electronic device, for example the processor of an in-vehicle processing device such as a head unit, a domain controller, a mobile data center (MDC) or an on-board computer, or a conventional chip such as a central processing unit (CPU) or a microcontroller unit (MCU).
FIG. 2A shows an embodiment of the face image processing method of the present application, and FIG. 2B is a schematic diagram of the face image processing method of this embodiment. The embodiment of the face image processing method includes the following steps:

S10: For the image, acquire the local face shape features in the face image. In this embodiment, the face image is a 2D image with depth information; the face image contains a partial face area, and the partial face area contains an unoccluded partial face.
In some embodiments, the face image may be collected by a binocular camera or an RGB-D camera, or received through a data transmission chip. When a binocular camera is used, the depth information of the pixels can be calculated from the collected image pairs (an image pair being a pair of images collected by the binocular camera) and the camera parameters. When an RGB-D camera is used, the depth information of the pixels can be obtained directly.
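For the binocular case, depth is typically recovered from disparity on a rectified stereo pair via Z = f·B/d (focal length f in pixels, baseline B in meters, disparity d in pixels); the following sketch assumes the disparity map has already been computed, for example by stereo block matching, and the names are illustrative:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Depth for a rectified stereo pair: Z = f * B / d.

    disparity: (H, W) disparity map in pixels.
    Returns an (H, W) depth map in meters; non-positive disparities are
    treated as invalid and mapped to zero depth.
    """
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > eps
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```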
S20: According to the local face shape features, acquire a face sample matching the local face shape features, and then acquire the face shape parameters of the face sample.

In some embodiments, retrieval may be performed in a face database according to the extracted local face shape features to match a face sample similar to the local face shape features, for example the face sample with the highest similarity, and the face shape parameters of that face sample are then acquired.

In some embodiments, when matching a face sample similar to the local face shape features, the matching target may be: maximizing the correlation between the face shape features of the partial face area and the face shape features of the face sample, while minimizing the correlation residuals between the face shape features of the partial face area and the face shape features of the other face samples.

The face samples in the face database are unoccluded face samples; through this step, a similar face can be obtained, together with its face shape parameters.

In some embodiments, the face database can be built from real faces; a massive amount of data can probabilistically improve the accuracy of the matched face samples. Moreover, obtaining complete real-face samples from the face database makes the 3D face reconstructed by the method of the embodiments of the present application more realistic.
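A minimal sketch of the retrieval step follows, assuming each database sample stores a precomputed shape-feature vector together with its shape coefficients; cosine similarity is used here as one possible similarity measure, since the original does not fix a particular metric:

```python
import numpy as np

def match_face_sample(query_feat, db_feats, db_shape_params):
    """Return the shape parameters of the most similar database face sample.

    query_feat:      (d,) local face shape feature from the partial face area.
    db_feats:        (N, d) features of the unoccluded face samples.
    db_shape_params: (N, n) shape coefficients stored for each sample.
    """
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    D = db_feats / (np.linalg.norm(db_feats, axis=1, keepdims=True) + 1e-12)
    sims = D @ q                 # cosine similarity to every sample
    best = int(np.argmax(sims))  # highest-similarity sample wins
    return db_shape_params[best]
```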
S30: Generate a 3D face model by using the obtained face shape parameters.

In some embodiments, the 3D face model can be generated based on a parameterized 3D face model. For this step, refer to formula (3) above: the obtained face shape parameters are substituted into formula (3) to generate the 3D face model.

In some embodiments, after the above step S30, a step S40 may further be included: fitting the 3D face model based on the point cloud data of the partial face area, the fitting including pose transformation and shape fitting. The point cloud data of the partial face area can be obtained from the depth values of the pixels and the camera parameters.
In some embodiments, as shown in the flowchart of FIG. 3, the above step S10 may include the following steps S11-S13:

S11: Acquire the target area in the face image, the target area including the partial face area.

The original image includes not only the face to be reconstructed but also other content. To avoid unnecessary key point determination on other parts of the image in subsequent steps and to improve processing efficiency, the target area is first extracted from the image through face detection; the target area may also be called a region of interest (ROI).

In some embodiments, the target area is an area that includes the partial face area; it may be a rectangular area, a circular area or an area of any shape, and in this embodiment it may be a rectangular area. The partial face area included in this area refers to the unoccluded area of the face, and the non-face area within it includes the occluder area and the background area.

In some embodiments, the target area can be extracted using networks such as convolutional neural networks (CNN), a region proposal network (RPN), regions with CNN features (RCNN), Faster-RCNN, the MobileV2 network (a lightweight neural network), or a combination of several networks.
S12: Extract the partial face area from the target area image.

In some embodiments, the partial face area can be acquired according to the color values of the pixels of the target area. This is because facial skin color can be distinguished from non-face parts, such as the background and certain occluders (e.g., masks, cups, water bottles, glasses), so the partial face area within the target area can be extracted based on pixel color values.

In some embodiments, the partial face area can be acquired according to the depth values of the pixels of the target area. This is because the spatial position (or depth) of the face differs from that of certain non-face parts, such as the background and certain non-sheet-like occluders (e.g., cups, water bottles, hands, arms), so the partial face area can be extracted based on pixel depth values.

In some embodiments, the occluded face area may first be extracted from the target area image, and the partial face area may then be extracted from the difference area between the target area and the occluded face area according to the color values and/or depth values of the pixels. For this embodiment, the following two specific implementations are possible:
1. Calculate an average reference color of the pixel colors in the difference area; retain the pixels in the difference area whose color differs from the average reference color by less than a threshold, and remove the pixels whose difference exceeds the threshold. The main principle is that most pixels in the difference area belong to the partial face area, while pixels whose color differs greatly from the partial face area are likely non-face pixels such as the background, so these pixels are removed.

In some implementations, the average reference color may be the mean of these pixel colors. It may also be a mean color computed with weights based on each pixel's position (e.g., the closer to the edge of the difference area, the lower the weight), or with weights based on how much each pixel differs from typical face colors (e.g., the greater the difference, the lower the weight).

In other implementations, the mean can be computed by fitting a Gaussian function, specifically: for the pixels in the difference area, fit a Gaussian function using the RGB three-channel colors as color coordinates, obtaining the mean and standard deviation of the Gaussian; use the mean as the average reference color and the standard deviation as the threshold; taking the mean as the color center coordinates, compute the distance from each pixel's color coordinates to the color center coordinates, and retain the pixels whose distance is less than the standard deviation.
2. Compute an average reference depth of the depth values of the pixels in the occluded area; retain the pixels in the difference area whose depth value is lower than the average reference depth, and remove the pixels whose depth value is greater than or equal to it. The underlying principle is that the depth values of the occluded area clearly differ from those of the partial face area: a pixel in the difference area whose depth is not smaller than that of the occluded area is very unlikely to belong to the face, and is therefore removed.

In some implementations, the average reference depth may be the mean depth of the pixels in the occluded area. It may also be a weighted mean that accounts for the position of each pixel (for example, a lower weight closer to the edge of the occluded area).
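A corresponding minimal sketch of the depth-based filter, assuming a depth map `depth` aligned with the color image and a boolean mask `occ_mask` marking the occluded area (names again illustrative):

```python
import numpy as np

def filter_face_pixels_by_depth(depth, diff_mask, occ_mask):
    """Remove difference-area pixels that are at least as far away as
    the occluder, since such pixels are very unlikely to be face."""
    ref_depth = depth[occ_mask].mean()           # average occluder depth

    face_mask = diff_mask & (depth < ref_depth)  # keep only closer pixels
    return face_mask
```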
For this step S12, in some embodiments, networks such as a CNN, Mask RCNN, fully convolutional networks (FCN), or the YoloV2 network (a neural network for small-object detection), or a combination of several networks, may also be used to extract the partial face area.

S13: For the extracted partial face area, extract face shape features from the partial face area.

In some embodiments, the extracted face shape features may be regular features or sparse features. The sparsity of sparse features can reduce the complexity of training the neural network, reduce redundant features in the network, and reduce the required storage space.

In some embodiments, the extraction of features from the partial face area, for example the extraction of regular or sparse features, may be implemented with a feature extraction network, such as a CNN or a residual network (e.g., ResNet50).
In some embodiments, as shown in the flowchart of FIG. 4 for fitting the 3D face model to the point cloud of the partial face area, the above step S40 includes the following steps S41-S43:

S41: Extract key points of the partial face area, and obtain the 3D coordinates of the key points.

In some embodiments, this step may proceed as follows: the target area is input to a keypoint detection network, such as a CNN or an Hourglass network, which identifies the key points among the pixels of the target area and determines their 2D position information. The depth information of the key points can then be determined from the depth information of the image, and the 3D coordinates of the key points are computed from the intrinsic parameters of the depth camera.

In other embodiments, during keypoint extraction, instead of inputting the image of the target area into the keypoint detection network, the image of the partial face area extracted in step S10 may be input.

When computing the 3D coordinates of the key points, the 3D coordinates of every pixel of the partial face area may also be computed from the depth information of the partial face area and the camera intrinsics, i.e., the point cloud information of the partial face area may be computed.
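A minimal sketch of this back-projection under the standard pinhole camera model, assuming `fx, fy, cx, cy` are the intrinsics of the depth camera (names are assumptions):

```python
import numpy as np

def backproject_to_3d(us, vs, depth, fx, fy, cx, cy):
    """Lift pixel coordinates (us, vs) to camera-frame 3D points using
    their depth values and the pinhole intrinsics."""
    z = depth[vs, us]                 # depth at each pixel
    x = (us - cx) * z / fx            # X = (u - cx) * Z / fx
    y = (vs - cy) * z / fy            # Y = (v - cy) * Z / fy
    return np.stack([x, y, z], axis=-1)

# Usage: keypoint 3D coordinates, or the whole face point cloud when
# (us, vs) enumerate every pixel of the partial face area.
```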
S42: Transform the pose of the 3D face model according to the 3D coordinates of the key points, achieving a preliminary alignment between the 3D face model and the point cloud of the partial face area.

The pose change may be a rigid transformation: the 3D coordinates of the face key points are used as the target position for fitting the three-dimensional face shape, and the established 3D face model is transformed to that target position through a rigid transformation.

In some embodiments, the rigid transformation of the 3D face model may be implemented with a point cloud matching algorithm, such as ICP, NDT, or IDC.
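For a given set of correspondences between model key points and detected 3D key points, a single rigid-alignment step has a closed-form solution; ICP iterates such a step with nearest-neighbor correspondence search. A minimal sketch of the closed-form step (the Kabsch solution, shown as an illustration rather than the full ICP loop):

```python
import numpy as np

def rigid_align(src, dst):
    """Closed-form rigid transform (R, t) minimizing ||R @ src + t - dst||,
    via SVD of the cross-covariance matrix (the Kabsch solution)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Usage: R, t = rigid_align(model_keypoints, detected_keypoints_3d)
#        aligned_vertices = model_vertices @ R.T + t
```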
S43: After the 3D face model has been preliminarily aligned with the point cloud of the partial face area, fit the shape of the 3D face model to the point cloud to complete the 3D face reconstruction for this face.

When performing the shape fitting, the shape parameters may be fitted with an iterative algorithm, such as the quasi-Newton method or Newton's method. The fitting may include fitting parameters such as the lips and the eyes, i.e., optimizing the shape coefficients of the 3D face model; once the fitting is completed, the reconstruction of the details of the 3D face is completed.
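A minimal sketch of such quasi-Newton shape fitting, assuming a linear parameterized model whose flattened vertices are `T + S @ q` (base shape `T`, shape basis `S`) and using L-BFGS from SciPy; the model form and all names are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial import cKDTree

def fit_shape(T, S, point_cloud, q0):
    """Optimize shape coefficients q so that the model vertices T + S @ q
    stay close to the partial-face point cloud (quasi-Newton L-BFGS)."""
    tree = cKDTree(point_cloud)

    def residual(q):
        verts = (T + S @ q).reshape(-1, 3)    # model vertices for q
        d, _ = tree.query(verts)              # nearest point-cloud distance
        return np.mean(d ** 2)

    res = minimize(residual, q0, method="L-BFGS-B")
    return res.x
```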
To facilitate a better understanding of the present invention, a specific implementation of applying the present application to 3D face reconstruction is described in detail below. In this implementation, a monocular depth camera (also referred to as an RGB-D camera) is used to collect image data with depth; the image data includes ordinary RGB color 2D image information and depth information (a depth map). Referring to the flowchart shown in FIG. 5, this implementation includes the following steps:

S110: Acquire RGB-D image data, which may come from a depth camera. The RGB-D image data contains an occluded face and, as described above, each image frame consists of a 2D image (color image frame) and a depth image (depth image frame); it can be understood that the pixel at each position carries both 2D information (such as RGB data) and depth information. The 2D image has color and texture, so the keypoint positions of the face in the unoccluded area of the object to be reconstructed can be identified from the 2D image.
S112: Perform occluded-face detection on the image data, and identify and extract the partial face area of the object to be reconstructed in the color 2D image. This specifically includes:

First, the MobileV2 network is used to perform occluded-face detection on the color image frame and to determine the target region (ROI) of the color image in which the object to be reconstructed is located; the target region contains the partial face area and the occluded area of the face.

After the target region is determined, it is obtained by cropping and then scaled to a target pixel size for subsequent processing, for example to a size of 512x512 pixels.

S114: Input the scaled target region into the YoloV2 network to detect occluders, thereby determining the occluded area of the face.

S116: Extract the partial face area from the target region; this specifically includes the following two approaches. In addition, once the partial face area has been extracted, the point cloud information of the partial face area can be computed.
1. Color-based face extraction: the difference area between the aforementioned target region and the occluded area is used as the RGB color reference area. For all pixels in this reference area, a Gaussian function is fitted with the RGB three-channel colors as coordinates, yielding a mean and a standard deviation. The mean is then taken as the center point coordinates, the distance from each pixel to the center point is computed, and the RGB information of a pixel is retained if the distance is within one standard deviation; otherwise its RGB information is removed, for example by setting the pixel's RGB values to zero.

In this specific implementation, most of the difference area between the target region and the occluded area is the partial face area, so the mean of the Gaussian fitted to the three-channel colors corresponds to the face color. Pixels whose color difference from this mean is within the threshold (i.e., pixels of similar color) are retained, and pixels whose color difference exceeds the threshold (i.e., pixels with a large color difference) are removed, thereby extracting the partial face area.

2. Within the reference area there may remain pixel regions that are close to the face in color and were therefore not removed, but whose depth differs substantially from that of the face; such regions are unlikely to belong to the face. Therefore, in some implementations, these pixel regions may further be removed based on depth values, as follows:

The mean depth of the occluded area is used as a threshold, and the pixels in the reference area whose depth values are greater than or equal to the threshold are removed, for example by setting their RGB values to zero.

Through the above color-based and depth-based processing, a fairly accurate partial face area can be extracted.

After the partial face area is extracted, the 3D coordinates of each of its pixels can be computed from the depth information of the corresponding pixels of the input depth image and the intrinsic matrix of the depth camera, i.e., the point cloud information of the partial face area is obtained for use in the later steps.

S118: Pass the partial face area through a feature extraction network, such as a ResNet50 network, to extract partial face shape features. Sparse features are extracted here: the feature extraction network regresses and outputs the sparse features of the partial face shape, here an m-dimensional sparse feature $X$, $X=[X_1, X_2, \dots, X_m]$.
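A minimal sketch of such a sparse-feature regressor, assuming PyTorch/torchvision; the regression head and the choice of m are assumptions, not the disclosed architecture:

```python
import torch.nn as nn
from torchvision.models import resnet50

class SparseFaceFeatureNet(nn.Module):
    """ResNet50 backbone regressing an m-dimensional sparse feature X."""
    def __init__(self, m=128):
        super().__init__()
        backbone = resnet50(weights=None)
        # Replace the classification head with an m-dimensional regressor.
        backbone.fc = nn.Linear(backbone.fc.in_features, m)
        self.backbone = backbone

    def forward(self, face_crop):        # face_crop: B x 3 x H x W
        return self.backbone(face_crop)  # B x m sparse feature X
```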
S120: Match the sparse feature $X$ of the partial face shape against the sparse features $A$ of the face samples in the 3D face database based on similarity, match a similar face sample, and obtain the face shape parameters of the matched sample as the shape parameters $q$ of the 3D face model to be built. The specific implementation of this step is detailed below:

Suppose $D=[D_1, D_2, \dots, D_k]\in\mathbb{R}^{n\times k}$ is the sample set of the 3D face database, where $D_i=[d_{i1}, d_{i2}, \dots, d_{in}]\in\mathbb{R}^n$ is the n-dimensional feature vector formed by the shape parameters of the i-th face sample.

First, the sparse features of each face sample are extracted. Since the 3D faces in the 3D face database vary considerably due to differences in expression, illumination, and shooting angle, a projection matrix $W$ is first obtained by a normally distributed random sampling process, and $W$ is then used to perform sparse feature extraction on the training samples $D$, yielding the corresponding face sparse feature matrix $A$, where $A=W^{T}D$.

Compared with the original data $D$, the data $A$ is a sparse representation of the face features, where $A_i$ is the reduced-dimension representation of $D_i$, i.e., the sparse feature of the i-th face sample. When the sparse feature dimension is $m$, it can be expressed as $A_i=[A_{i1}, A_{i2}, \dots, A_{im}]\in\mathbb{R}^m$.

Then, the sparse feature $A$ of each face sample is compared with the sparse feature $X$ of the current partial face shape, a face sample is matched according to the similarity, and the 3D face shape feature parameters of that face sample are taken as the shape feature parameters $q$ of the 3D face model to be built. The matching process is as follows:
The correlation between the sparse feature $X$ of the partial face shape and the sparse feature $A_b$ of the b-th face sample is computed by the formula $\delta_i(x)=(X_i-A_b)$, where $i=1,\dots,m$.

The association residual between the sparse feature $X$ of the partial face shape and the sparse features $A_j$ of the remaining $k-1$ face samples is computed by the formula $\sigma_i(x)=\sum_{j\neq b}(X_i-A_j)$, where $j=1,\dots,k$ and $j\neq b$.
Combining the above correlation $\delta_i(x)$ and association residual $\sigma_i(x)$, a formula (rendered only as an image in the original, Figure PCTCN2021104294-appb-000003) gives the similarity $S_i$ between the visible area of the face and the sparse features of the i-th face sample.
By cyclically comparing all face samples in the 3D face database, the face sample $i$ that maximizes the sparse feature similarity $S_i$ is found; the face feature vector $D_i$ corresponding to that face sample is the shape feature parameter $q$ of the 3D face model to be built, i.e., $q=D_i=[d_{i1}, d_{i2}, \dots, d_{in}]$.
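A minimal sketch of this matching, using the random projection $A=W^{T}D$ described above; since the exact similarity formula is rendered only as an image in the original, the combination below (association residual to the other samples divided by the residual to the candidate) is an assumption for illustration:

```python
import numpy as np

def match_face_sample(X, D, m, rng=np.random.default_rng(0)):
    """Return shape parameters q = D_i of the database sample whose
    sparse feature best matches X (similarity form is assumed)."""
    n, k = D.shape
    W = rng.standard_normal((n, m))      # normally sampled projection
    A = W.T @ D                          # m x k sparse feature matrix

    best_i, best_S = -1, -np.inf
    for b in range(k):
        # Aggregate per-dimension differences to scalars (an assumption).
        delta = np.abs(X - A[:, b]).sum()
        sigma = sum(np.abs(X - A[:, j]).sum() for j in range(k) if j != b)
        S = sigma / (delta + 1e-12)      # assumed combination of delta, sigma
        if S > best_S:
            best_i, best_S = b, S
    return D[:, best_i]                  # q = D_i
```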
S122: Using the 3D face shape feature parameters $q$ (i.e., $[d_{i1}, d_{i2}, \dots, d_{in}]$) of the matched face sample, preliminarily establish a 3D face model through the above formula (3) in combination with the parameterized face model; this is also referred to as completing the initialization of the 3D face model.

In the above formula (3), the base shape $T$ and the shape vectors $S_i$ are provided by the parameterized 3D face model; inputting the 3D face shape feature parameters $q$ of the matched face sample into the parameterized 3D face model yields the initialized 3D face model.

S124: In parallel, the image data of the target region scaled in step S112 is used as the input to an Hourglass network, which outputs, for each pixel of the target region, the probability that the pixel is a key point. The maxima of this probability are determined to be the key points, and the 2D position information of each key point of the partial face area is then obtained.

S126: Due to occlusion interference, the performance of the Hourglass network is affected to a certain extent, and the key points it determines for the partial face area may include key points in the occluded area. Therefore, the key points falling within the occluded area determined in step S114 can be further removed, i.e., this part of the noise is eliminated, yielding more accurate keypoint information of the partial face area for use in the subsequent steps.
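A minimal sketch of steps S124-S126 together, assuming per-keypoint probability maps `heatmaps` (K×H×W) from the Hourglass network and the occluded-area mask `occ_mask` from step S114 (names illustrative):

```python
import numpy as np

def extract_visible_keypoints(heatmaps, occ_mask):
    """Take the probability maximum of each heatmap as a 2D key point,
    then drop key points that fall inside the occluded area."""
    keypoints = []
    for hm in heatmaps:                        # one heatmap per key point
        v, u = np.unravel_index(np.argmax(hm), hm.shape)
        if not occ_mask[v, u]:                 # keep only unoccluded points
            keypoints.append((u, v))
    return np.array(keypoints)
```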
S128: After the position information of the key points is determined, the 3D coordinates of each key point of the partial face area can be computed, from the depth information corresponding to each pixel position of the input depth image, through the intrinsic matrix of the depth camera.

S130: Align the 3D face model established in step S122 with the 3D coordinates of the key points of the partial face area, i.e., use the 3D coordinates of these key points as the target position and initialize the 3D face model established in step S122 to that target position.

In this implementation, the ICP algorithm is used to apply a rigid transformation to the 3D face model, i.e., to translate and rotate the 3D face model to the corresponding target position, achieving a preliminary positional alignment with the face point cloud.

S132: For the aligned 3D face model, a parameter fitting algorithm is further used to fit the shape parameters, so as to minimize the positional difference between the shape of the 3D face model and the point cloud of the partial face area and complete the reconstruction of the 3D face model. The parameter fitting algorithm may be a quasi-Newton algorithm, Newton's algorithm, gradient descent, and so on; owing to the fast convergence and low computational complexity of the quasi-Newton algorithm, it is adopted in this embodiment.

This fitting process takes the 3D coordinates of the point cloud of the partial face area, computed in step S116, as the fitting target. Since the fitting is performed against the point cloud of the partial face area, the selection of the optimization target of the 3D face model is restricted to that point cloud. For the fitting process, refer to the earlier description of shape fitting with the quasi-Newton algorithm, which is not repeated here.
The following further describes a scenario in which the embodiments of the present application are applied to a driver monitor system (DMS) on a vehicle. A DMS can monitor the driver's state, for example fatigue monitoring, distraction monitoring (attention monitoring), gaze tracking, and dangerous behavior monitoring (such as using a mobile phone or eating). An example follows:

First, the DMS collects images of the driver through a camera in the vehicle cockpit. The camera may be a binocular camera, an RGB-D camera, and so on, and may be installed on the vehicle as required, for example at the position of the interior rearview mirror in the cockpit, or on the steering wheel, or in the area near the center console. In this embodiment, the camera is the binocular camera shown in FIG. 1B, composed of two cameras arranged on the left and right A-pillars of the vehicle cockpit.

After the DMS collects the driver's images through the camera, the face image processing method provided by the embodiments of the present application can be used to process the collected images and reconstruct the driver's 3D face. This applies in particular when the face in the collected image is incomplete. The incomplete-face cases include partial occlusion of the face, for example when the driver wears sunglasses, or when a cup, mobile phone, hand, or arm partially occludes the face while the driver drinks water or makes a phone call. They also include cases in which part of the face is not captured, for example when a large movement or large-angle rotation of the driver's head moves part of the face outside the image capture area of the camera, or when a head turn (such as turning the head backward) prevents part of the face from being captured by the camera.

After the driver's 3D face is reconstructed, the driver's state can be further detected based on the reconstructed 3D face. For example, the driver's head pose can be detected from the 3D face, including whether the head has moved over a large range or rotated through a large angle; or head pose changes over a period of time can be combined to detect whether the head is in an abnormal state, including frequently lowering the head (which may serve, for example, as a basis for judging whether the driver is dozing off or looking at a mobile phone) and tilting the head up or to one side for a certain period of time (which may serve, for example, as a basis for judging whether the driver is asleep). As another example, the driver's facial state can be detected from the 3D face, including eye openness (for example, whether the eye openness falls below a threshold may serve as a basis for judging whether the driver is drowsy), mouth openness (which may serve, for example, as a basis for judging whether the driver is dozing off), and gaze direction (which may serve, for example, as a basis for judging the driver's attention).
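A minimal sketch of such an eye-openness check, assuming openness is measured from reconstructed 3D eyelid landmarks and normalized by eye width; the landmark names, threshold, and frame count are assumptions:

```python
import numpy as np

def eye_openness(upper_lid, lower_lid, eye_left, eye_right):
    """Normalized eye openness from reconstructed 3D eyelid landmarks."""
    return (np.linalg.norm(upper_lid - lower_lid)
            / np.linalg.norm(eye_left - eye_right))

def is_drowsy(openness_history, threshold=0.15, min_frames=30):
    """Flag drowsiness when openness stays below threshold for a while."""
    recent = openness_history[-min_frames:]
    return len(recent) == min_frames and all(o < threshold for o in recent)
```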
Based on the detected driver state and the current driving scenario, the DMS determines whether to warn the driver. Here, the driving scenario may refer to the driving scenario in combination with the current autonomous driving level (levels L0-L5): for example, when the autonomous driving level is low, the threshold for triggering a driver warning is relatively low, and when the autonomous driving level is high, the threshold for triggering a driver warning is relatively high.

In some embodiments, the DMS may also provide the detected driver state to a vehicle control apparatus, which judges whether to take over the driving control of the vehicle. The driving control after takeover includes automatically decelerating the vehicle and pulling over to the roadside, and also includes performing autonomous driving (for level L4 and L5 autonomous driving), for example autonomously driving through an area (such as exiting an expressway) or for a period of time, until the vehicle reaches a safe position and stops.
As shown in FIG. 6, the present application further provides a corresponding embodiment of a face image processing apparatus. For the beneficial effects of the apparatus or the technical problems it solves, reference may be made to the descriptions of the methods corresponding to each apparatus, or to the description in the summary of the invention, which are not repeated here.

In this embodiment, the face image processing apparatus 600 includes:

An acquisition module 610, configured to acquire partial face shape features in a face image, to acquire a face sample matching the partial face shape features, and to acquire the face shape parameters of the face sample. Specifically, this module may be configured to execute steps S10-S20 of the above face image processing method and the examples therein, or steps S110-S120 of the specific implementation of the above face image processing method and the examples therein.

A generation module 620, configured to generate a three-dimensional face model using the face shape parameters. Specifically, this module may be configured to execute steps S30 and S40 of the above face image processing method and the examples therein, or steps S122-S132 of the specific implementation of the above face image processing method and the examples therein.

In some embodiments, the generation module 620 is further configured to fit the generated three-dimensional face model to the point cloud data of the partial face area, the partial face area being contained in the face image.

In some embodiments, when used for the fitting, the generation module 620 is specifically configured to: acquire key points of the partial face area and obtain the three-dimensional coordinates of the key points; transform the pose of the three-dimensional face model according to the three-dimensional coordinates of the key points; and fit the pose-transformed three-dimensional face model to the point cloud data of the partial face area.

In some embodiments, the point cloud data of the partial face area is obtained according to the pixel depth values of the partial face area and camera parameters.

In some embodiments, the three-dimensional coordinates of the key points of the partial face area are obtained according to the depth values of the key points and camera parameters.

In some embodiments, the acquisition module 610 is specifically configured to: acquire a target area in the face image, the target area including the partial face area; acquire the partial face area in the target area; and acquire the partial face shape features of the partial face area.

In some embodiments, when used to acquire the partial face area in the target area, the acquisition module 610 is specifically configured for at least one of the following: acquiring the partial face area according to the color values of the pixels in the target area; acquiring the partial face area according to the depth values of the pixels in the target area.

In some embodiments, when used to acquire the face sample matching the partial face shape features, the acquisition module 610 is specifically configured to retrieve, from a face database, a face sample matching the partial face shape features.

In some embodiments, the generation module 620 is specifically configured to generate the three-dimensional face model using the face shape parameters, based on a parameterized three-dimensional face model.
Table 1 shows the failure rates of traditional 3D face reconstruction and of 3D face reconstruction using the method of the embodiments of the present application, estimated by discrete random event model simulation for faces occluded in different proportions. Each experiment was run 1000 times, and the proportion of deformed face structures (i.e., the 3D face reconstruction failure rate) was calculated, with the results shown below:
Table 1
[Table 1 data is rendered as images in the original (Figure PCTCN2021104294-appb-000004 and Figure PCTCN2021104294-appb-000005): 3D face reconstruction failure rates of the traditional method and of the method of the present application at different face occlusion ratios.]
It can be seen from the above that the method of the embodiments of the present application achieves better robustness in 3D face reconstruction: compared with traditional 3D face reconstruction, the reconstruction failure rate is low under all face occlusion ratios, and the failure rate remains low even when the face occlusion is large.
An embodiment of the present application further provides an electronic apparatus, including a processor and a memory storing program instructions which, when executed by the processor, cause the processor to execute the methods of the embodiments corresponding to FIG. 2A to FIG. 5, or the optional embodiments therein. FIG. 7 is a schematic structural diagram of an electronic apparatus 700 provided by an embodiment of the present application. The electronic apparatus 700 includes a processor 710 and a memory 720.

It should be understood that the electronic apparatus 700 shown in FIG. 7 may further include a communication interface 730, which may be used for communication with other devices.

The processor 710 may be connected to the memory 720. The memory 720 may be used to store the program code and data. Therefore, the memory 720 may be a storage unit inside the processor 710, an external storage unit independent of the processor 710, or a component including both a storage unit inside the processor 710 and an external storage unit independent of the processor 710.

Optionally, the electronic apparatus 700 may further include a bus, through which the memory 720 and the communication interface 730 may be connected to the processor 710. The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on.

It should be understood that, in the embodiments of the present application, the processor 710 may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. Alternatively, the processor 710 uses one or more integrated circuits to execute related programs so as to implement the technical solutions provided by the embodiments of the present application.

The memory 720 may include read-only memory and random access memory, and provides instructions and data to the processor 710. A part of the processor 710 may also include non-volatile random access memory. For example, the processor 710 may also store device type information.

When the electronic apparatus 700 runs, the processor 710 executes the computer-executable instructions in the memory 720 to perform the operational steps of the above face image processing method, for example the methods of the embodiments corresponding to FIG. 2A to FIG. 5, or the optional embodiments therein.

It should be understood that the electronic apparatus 700 according to the embodiments of the present application may correspond to a corresponding subject executing the methods according to the embodiments of the present application, and the above and other operations and/or functions of the modules in the electronic apparatus 700 are respectively intended to implement the corresponding procedures of the methods of the embodiments; for brevity, they are not repeated here.

An embodiment of the present application further provides another electronic apparatus. FIG. 8 is a schematic structural diagram of another electronic apparatus 800 provided by this embodiment, including a processor 810 and an interface circuit 820, where the processor 810 accesses a memory through the interface circuit 820, the memory stores program instructions, and the program instructions, when executed by the processor, cause the processor to execute the methods of the embodiments corresponding to FIG. 2A to FIG. 5, or the optional embodiments therein. In addition, the electronic apparatus may further include a communication interface, a bus, and so on; for details, refer to the description of the embodiment shown in FIG. 7, which is not repeated here.
As shown in FIG. 9A or 9B, an embodiment of the present application further provides a vehicle 100, including an image acquisition apparatus 110 for acquiring face images, and the face image processing apparatus 120 with its included embodiments, or an electronic apparatus 130. The image acquisition apparatus 110 provides the acquired face images to the face image processing apparatus 120 or to the electronic apparatus 130, which implements the above face image processing method and its embodiments according to the face images, thereby achieving 3D face reconstruction. The image acquisition apparatus 110 may be a camera, such as a binocular camera, an RGB-D camera, or an IR camera, which may be installed on the vehicle as required. In some embodiments, as in the examples shown in FIG. 1A and FIG. 1B, it may be a binocular camera composed of two cameras arranged on the left and right A-pillars of the vehicle cockpit. In some other embodiments, it may also be installed on the occupant-facing side of the interior rearview mirror in the vehicle cockpit, or on the steering wheel, in the area near the center console, and so on. In some other embodiments, the image acquisition apparatus 110 may also be an electronic device that receives occupant image data transmitted by a camera, such as a data transmission chip, for example a bus data transceiver chip or a network interface chip; the data transmission chip may also be a wireless transmission chip, such as a Bluetooth chip or a WiFi chip.
FIG. 10 is a schematic structural diagram of a computing device 900 provided by an embodiment of the present application. The computing device 900 includes a processor 910 and a memory 920, and may further include a communication interface 930.

It should be understood that the communication interface 930 of the computing device 900 shown in FIG. 10 may be used for communication with other devices.

The processor 910 may be connected to the memory 920. The memory 920 may be used to store the program code and data. Therefore, the memory 920 may be a storage unit inside the processor 910, an external storage unit independent of the processor 910, or a component including both a storage unit inside the processor 910 and an external storage unit independent of the processor 910.

Optionally, the computing device 900 may further include a bus, through which the memory 920 and the communication interface 930 may be connected to the processor 910. The bus may be a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on.

When the computing device 900 runs, the processor 910 executes the computer-executable instructions in the memory 920 to perform the operational steps of the above method.

It should be understood that the computing device 900 according to the embodiments of the present application may correspond to a corresponding subject executing the methods according to the embodiments of the present application, and the above and other operations and/or functions of the modules in the computing device 900 are respectively intended to implement the corresponding procedures of the methods of the embodiments; for brevity, they are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs the above face image processing method, which includes at least one of the solutions described in the above embodiments.

An embodiment of the present application further provides a computer program product including program instructions which, when executed by a computer, implement the above face image processing method, which includes at least one of the solutions described in the above embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application. If the aforementioned functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium, including various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

In the several embodiments provided in this application, the disclosed methods and apparatuses may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a logical functional division, and there may be other division methods in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.

Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present application. Therefore, although the present application has been described in considerable detail through the above embodiments, it is not limited to them and may include more other equivalent embodiments without departing from the concept of the present application, all of which fall within the protection scope of the present application.

Claims (23)

  1. A face image processing method, characterized by comprising:
    acquiring a partial face shape feature in a face image;
    acquiring a face sample matching the partial face shape feature, and acquiring face shape parameters of the face sample;
    generating a three-dimensional face model.
  2. The method according to claim 1, characterized by further comprising:
    fitting the generated three-dimensional face model to point cloud data of a partial face area, the partial face area being contained in the face image.
  3. The method according to claim 2, characterized in that the fitting comprises:
    acquiring key points of the partial face area, and acquiring three-dimensional coordinates of the key points;
    performing pose transformation on the three-dimensional face model according to the three-dimensional coordinates of the key points;
    fitting the pose-transformed three-dimensional face model to the point cloud data of the partial face area.
  4. The method according to claim 2 or 3, characterized in that the point cloud data of the partial face area is obtained according to pixel depth values of the partial face area and camera parameters.
  5. The method according to claim 3, characterized in that the three-dimensional coordinates of the key points of the partial face area are obtained according to depth values of the key points and camera parameters.
  6. The method according to any one of claims 1-5, characterized in that the acquiring a partial face shape feature in a face image comprises:
    acquiring a target area in the face image, the target area comprising the partial face area;
    acquiring the partial face area in the target area;
    acquiring the partial face shape feature of the partial face area.
  7. The method according to claim 6, characterized in that the acquiring the partial face area in the target area comprises at least one of the following:
    acquiring the partial face area according to color values of pixels in the target area;
    acquiring the partial face area according to depth values of pixels in the target area.
  8. The method according to any one of claims 1-7, characterized in that the acquiring a face sample matching the partial face shape feature comprises: retrieving, from a face database, a face sample matching the partial face shape feature.
  9. The method according to any one of claims 1-8, characterized in that the generating a three-dimensional face model comprises: generating the three-dimensional face model using the face shape parameters, based on a parameterized three-dimensional face model.
  10. A face image processing apparatus, characterized by comprising:
    an acquisition module, configured to acquire a partial face shape feature in a face image, acquire a face sample matching the partial face shape feature, and acquire face shape parameters of the face sample;
    a generation module, configured to generate a three-dimensional face model.
  11. The apparatus according to claim 10, characterized in that the generation module is further configured to fit the generated three-dimensional face model to point cloud data of a partial face area, the partial face area being contained in the face image.
  12. The apparatus according to claim 11, characterized in that, when used for the fitting, the generation module is specifically configured to:
    acquire key points of the partial face area, and acquire three-dimensional coordinates of the key points;
    perform pose transformation on the three-dimensional face model according to the three-dimensional coordinates of the key points;
    fit the pose-transformed three-dimensional face model to the point cloud data of the partial face area.
  13. The apparatus according to claim 11 or 12, characterized in that the point cloud data of the partial face area is obtained according to pixel depth values of the partial face area and camera parameters.
  14. The apparatus according to claim 12, characterized in that the three-dimensional coordinates of the key points of the partial face area are obtained according to depth values of the key points and camera parameters.
  15. The apparatus according to any one of claims 10-13, characterized in that the acquisition module is specifically configured to:
    acquire a target area in the face image, the target area comprising the partial face area;
    acquire the partial face area in the target area;
    acquire the partial face shape feature of the partial face area.
  16. The apparatus according to claim 15, characterized in that, when used to acquire the partial face area in the target area, the acquisition module is specifically configured for at least one of the following:
    acquiring the partial face area according to color values of pixels in the target area;
    acquiring the partial face area according to depth values of pixels in the target area.
  17. The apparatus according to any one of claims 10-16, characterized in that, when used to acquire the face sample matching the partial face shape feature, the acquisition module is specifically configured to retrieve, from a face database, a face sample matching the partial face shape feature.
  18. The apparatus according to any one of claims 10-17, characterized in that the generation module is specifically configured to generate the three-dimensional face model using the face shape parameters, based on a parameterized three-dimensional face model.
  19. An electronic apparatus, characterized by comprising:
    a processor, and
    a memory on which program instructions are stored, the program instructions, when executed by the processor, implementing the face image processing method according to any one of claims 1-9.
  20. An electronic apparatus, characterized by comprising:
    a processor, and an interface circuit,
    wherein the processor accesses a memory through the interface circuit, the memory stores program instructions, and the program instructions, when executed by the processor, implement the face image processing method according to any one of claims 1-9.
  21. A vehicle, characterized by comprising:
    an image acquisition apparatus, configured to acquire face images, and
    the face image processing apparatus according to any one of claims 10-18.
  22. A computer-readable storage medium, characterized in that program instructions are stored in the computer-readable storage medium, the program instructions, when executed by a computer, implementing the face image processing method according to any one of claims 1-9.
  23. A computer program product, characterized by comprising program instructions which, when executed by a computer, cause the computer to implement the face image processing method according to any one of claims 1-9.
PCT/CN2021/104294 2021-07-02 2021-07-02 Facial image processing method and apparatus, and vehicle WO2023272725A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180002021.5A CN113632098A (en) 2021-07-02 2021-07-02 Face image processing method and device and vehicle
PCT/CN2021/104294 WO2023272725A1 (en) 2021-07-02 2021-07-02 Facial image processing method and apparatus, and vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/104294 WO2023272725A1 (en) 2021-07-02 2021-07-02 Facial image processing method and apparatus, and vehicle

Publications (1)

Publication Number Publication Date
WO2023272725A1

Family

ID=78391353

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/104294 WO2023272725A1 (en) 2021-07-02 2021-07-02 Facial image processing method and apparatus, and vehicle

Country Status (2)

Country Link
CN (1) CN113632098A (en)
WO (1) WO2023272725A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721194A (en) * 2023-08-09 2023-09-08 瀚博半导体(上海)有限公司 Face rendering method and device based on generation model

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511911A (en) * 2022-02-25 2022-05-17 支付宝(杭州)信息技术有限公司 Face recognition method, device and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104966316A (en) * 2015-05-22 2015-10-07 腾讯科技(深圳)有限公司 3D face reconstruction method, apparatus and server
CN108492373A (en) * 2018-03-13 2018-09-04 齐鲁工业大学 A kind of face embossment Geometric Modeling Method
CN109087340A (en) * 2018-06-04 2018-12-25 成都通甲优博科技有限责任公司 A kind of face three-dimensional rebuilding method and system comprising dimensional information
CN110414394A (en) * 2019-07-16 2019-11-05 公安部第一研究所 A kind of face blocks face image method and the model for face occlusion detection

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062791A (en) * 2018-01-12 2018-05-22 北京奇虎科技有限公司 A kind of method and apparatus for rebuilding human face three-dimensional model
CN110136243B (en) * 2019-04-09 2023-03-17 五邑大学 Three-dimensional face reconstruction method, system, device and storage medium thereof
CN110610127B (en) * 2019-08-01 2023-10-27 平安科技(深圳)有限公司 Face recognition method and device, storage medium and electronic equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104966316A (en) * 2015-05-22 2015-10-07 腾讯科技(深圳)有限公司 3D face reconstruction method, apparatus and server
CN108492373A (en) * 2018-03-13 2018-09-04 齐鲁工业大学 A kind of face embossment Geometric Modeling Method
CN109087340A (en) * 2018-06-04 2018-12-25 成都通甲优博科技有限责任公司 A kind of face three-dimensional rebuilding method and system comprising dimensional information
CN110414394A (en) * 2019-07-16 2019-11-05 公安部第一研究所 A kind of face blocks face image method and the model for face occlusion detection

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721194A (en) * 2023-08-09 2023-09-08 瀚博半导体(上海)有限公司 Face rendering method and device based on generation model
CN116721194B (en) * 2023-08-09 2023-10-24 瀚博半导体(上海)有限公司 Face rendering method and device based on generation model

Also Published As

Publication number Publication date
CN113632098A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
JP6695503B2 (en) Method and system for monitoring the condition of a vehicle driver
CN107818310B (en) Driver attention detection method based on sight
EP3033999B1 (en) Apparatus and method for determining the state of a driver
CN108638999B (en) Anti-collision early warning system and method based on 360-degree look-around input
WO2023272725A1 (en) Facial image processing method and apparatus, and vehicle
CN113366491B (en) Eyeball tracking method, device and storage medium
Murphy-Chutorian et al. Hyhope: Hybrid head orientation and position estimation for vision-based driver head tracking
CN109359514B (en) DeskVR-oriented gesture tracking and recognition combined strategy method
CN109875568A (en) A kind of head pose detection method for fatigue driving detection
JP2003015816A (en) Face/visual line recognizing device using stereo camera
CN111144207B (en) Human body detection and tracking method based on multi-mode information perception
CN103632129A (en) Facial feature point positioning method and device
CN114041175A (en) Neural network for estimating head pose and gaze using photorealistic synthetic data
WO2023272453A1 (en) Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle
WO2013074153A1 (en) Generating three dimensional models from range sensor data
WO2020063000A1 (en) Neural network training and line of sight detection methods and apparatuses, and electronic device
US20230116638A1 (en) Method for eye gaze tracking
CN115376113A (en) Driver distraction detection method, driver monitoring system and storage medium
CN115331205A (en) Driver fatigue detection system with cloud edge cooperation
CN108268858A (en) A kind of real-time method for detecting sight line of high robust
CN113361441B (en) Sight line area estimation method and system based on head posture and space attention
Ribas et al. In-Cabin vehicle synthetic data to test Deep Learning based human pose estimation models
CN113780125A (en) Fatigue state detection method and device for multi-feature fusion of driver
US11417063B2 (en) Determining a three-dimensional representation of a scene
Sui et al. A-pillar blind spot display algorithm based on line of sight

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE