WO2023272725A1 - Facial image processing method and apparatus, and vehicle - Google Patents

Facial image processing method and apparatus, and vehicle

Info

Publication number
WO2023272725A1
WO2023272725A1 (PCT/CN2021/104294)
Authority
WO
WIPO (PCT)
Prior art keywords
face
area
local
acquiring
partial
Prior art date
Application number
PCT/CN2021/104294
Other languages
French (fr)
Chinese (zh)
Inventor
崔贤娟
刘杨
黄为
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to CN202180002021.5A priority Critical patent/CN113632098A/en
Priority to PCT/CN2021/104294 priority patent/WO2023272725A1/en
Publication of WO2023272725A1 publication Critical patent/WO2023272725A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T2200/00: Indexing scheme for image data processing or generation, in general
    • G06T2200/08: Indexing scheme for image data processing or generation, in general, involving all processing steps from image acquisition to 3D model generation

Definitions

  • the invention relates to the technical field of machine vision, in particular to a face image processing method, device and vehicle.
  • Three-dimensional (3D) face reconstruction is a research hotspot in the fields of machine vision, computer vision and computer graphics.
  • 3D face reconstruction is one of the core technologies in the fields of virtual reality/augmented reality, automatic driving, robotics, etc., and has great application value in the driver monitoring system (DMS) in the field of smart cars.
  • 3D face reconstruction is one of the basic technologies for monitoring the driver's head posture and gaze direction, which directly affects the performance of human-computer interaction and DMS.
  • Occluded face images can be divided into unintentional occlusion and intentional occlusion.
  • Common unintentional occlusions include glasses, steering wheels, and others blocking the face of the monitored person, while intentional occlusions usually include sunglasses, masks, or other objects blocking facial features.
  • Intentional occlusion usually results in the failure of 3D face reconstruction due to excessive feature changes, while unintentional occlusion usually only covers a small part of facial features, which can easily lead to the introduction of too many interference features in the feature extraction process, resulting in distortion of 3D face reconstruction.
  • The embodiments of the present application provide a face image processing solution, including a face image processing method, an apparatus, a vehicle, a computing device, a computer-readable storage medium, and a computer program product, which can implement 3D face reconstruction when the face is partially occluded.
  • A first aspect of the present application provides a face image processing method, including: acquiring local face shape features in a face image; acquiring a face sample matching the local face shape features, and acquiring face shape parameters of the face sample; and generating a 3D face model using the face shape parameters.
  • The embodiment of the present application generates the 3D face model based on the information of a face sample that matches the local face shape, which reduces the interference of occluders and is therefore suited to 3D face reconstruction in face occlusion scenarios, with high robustness.
  • When the face image processing method of the embodiment of the present application is applied to a smart vehicle (for example, to a driver status monitoring system in a smart vehicle), 3D face reconstruction can be achieved when the face of an occupant (a driver or a passenger) is occluded, and the head posture and/or gaze direction can further be recognized based on the reconstructed 3D face, which improves the robustness and stability of the driver status monitoring system.
  • the first aspect also includes: fitting the generated 3D face model to point cloud data of a local face area, the local face area being included in the face image.
  • the fitting may include fitting parameters such as lips and eyes, so that the reconstructed 3D face is closer to the real appearance of the user.
  • In some embodiments, the above fitting includes: obtaining key points of the local face area and obtaining the three-dimensional coordinates of the key points; performing a pose transformation on the three-dimensional face model according to the three-dimensional coordinates of the key points; and fitting the transformed 3D face model to the point cloud data of the local face area.
  • the rigid body transformation of the 3D face model is realized with the 3D coordinates of the key points as the target, and the preliminary alignment with the point cloud data of the local face area is realized.
  • This process can be realized using the ICP algorithm. Since the number of key points is small, the calculation amount of the preliminary alignment is small and the alignment speed is fast.
  • For the subsequent fitting, the quasi-Newton algorithm can be used, taking advantage of its fast convergence and low computational complexity.
  • the point cloud data of the local face area is obtained according to the pixel depth value of the local face area and camera parameters.
  • this application can be implemented by using binocular cameras, RGB-D cameras, infrared cameras, etc., and the cost of implementation is lower than that of other image perception and acquisition devices.
  • The cameras here are not limited to traditional cameras, but also include other image acquisition devices such as camera modules.
  • the three-dimensional coordinates of the key points in the local face area are obtained according to the depth values of the key points and camera parameters.
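  • As a minimal sketch of this back-projection (the standard pinhole camera model; fx, fy, cx, cy denote the camera intrinsics, and all names are illustrative rather than from the patent):

```python
import numpy as np

def backproject_keypoint(u, v, depth, fx, fy, cx, cy):
    """Back-project a key point at pixel (u, v) with the given depth value
    into 3D camera coordinates using the pinhole model."""
    z = float(depth)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```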
  • this application can be realized by using binocular cameras and RGB-D cameras, and the cost of implementation is lower than that of other image perception and acquisition devices.
  • In some embodiments, obtaining the local face shape features in the face image includes: obtaining a target area in the face image, where the target area includes a local face area; obtaining the local face area in the target area; and obtaining the local face shape features of the local face area.
  • Compared with directly obtaining the local face area from the image, this two-step method of first obtaining the target area from the image and then obtaining the local face area from the target area keeps the overall complexity of the neural network used for this process low and makes it easy to train.
  • In some embodiments, obtaining the local face area in the target area includes at least one of the following: obtaining the local face area according to the color values of the pixels in the target area; obtaining the local face area according to the depth values of the pixels in the target area.
  • The local face area can be obtained according to the color values of the pixels in the target area. This is because the skin color of the face can be distinguished from non-face parts, such as the background and some occluders (e.g., masks, cups, water bottles), so the local face area in the target area can be extracted based on pixel color values.
  • the local face area can also be obtained according to the depth value of the pixel in the target area.
  • Alternatively, the occluded area of the face can be extracted from the image of the target area first, and then, for the difference area between the target area and the occluded area, local face regions can be extracted according to the color values and/or depth values of the pixels as described above.
  • obtaining the face samples matching the local face shape features includes: retrieving the face samples matching the local face shape features in a face database.
  • a face database can be constructed based on real faces in advance, and massive data can improve the accuracy of matched face samples in terms of probability. Moreover, obtaining a complete real face sample based on the face database will make the 3D face reconstructed in the embodiment of the present application more realistic.
  • using face shape parameters to generate a 3D face model includes: generating a 3D face model based on a parameterized 3D face model using face shape parameters.
  • A second aspect of the present application provides a face image processing device, including: an acquisition module, used to acquire the local face shape features in a face image, acquire a face sample matching the local face shape features, and acquire the face shape parameters of the face sample; and a generation module, used to generate a three-dimensional face model using the face shape parameters.
  • the generating module is further configured to: fit the generated 3D face model with point cloud data of a local face area.
  • In some embodiments, when used for fitting, the generation module is specifically used to: obtain the key points of the local face area and obtain the three-dimensional coordinates of the key points; perform a pose transformation on the three-dimensional face model according to the three-dimensional coordinates of the key points; and fit the pose-transformed 3D face model to the point cloud data of the local face area.
  • the point cloud data of the local face area is obtained according to the pixel depth value of the local face area and camera parameters.
  • the three-dimensional coordinates of the key points in the local face area are obtained according to the depth values of the key points and camera parameters.
  • In some embodiments, the acquisition module is specifically used to: acquire the target area in the image, where the target area includes a partial face area; acquire the partial face area in the target area; and acquire the local face shape features of the partial face area.
  • In some embodiments, when used to acquire the partial face area in the target area, the acquisition module is specifically used in at least one of the following ways: acquiring the partial face area according to the color values of the pixels in the target area; acquiring the partial face area according to the depth values of the pixels in the target area.
  • In some embodiments, when used to acquire face samples that match the local face shape features, the acquisition module is specifically used to: retrieve, in a face database, face samples that match the local face shape features.
  • the generation module is specifically configured to: generate a three-dimensional face model based on a parameterized three-dimensional face model by using face shape parameters.
  • A third aspect of the present application provides an electronic device, including: a processor, and a memory on which program instructions are stored; when the program instructions are executed by the processor, any one of the face image processing methods provided in the first aspect above is implemented.
  • A fourth aspect of the present application provides an electronic device, including: a processor, and an interface circuit, where the processor accesses a memory through the interface circuit, and the memory stores program instructions; when the program instructions are executed by the processor, any one of the face image processing methods provided in the first aspect above is implemented.
  • A fifth aspect of the present application provides a vehicle, including: an image acquisition device for acquiring face images, and any one of the face image processing devices provided in the second aspect above, or the electronic device of the third or fourth aspect above.
  • A sixth aspect of the present application provides a computer-readable storage medium storing program instructions; when the program instructions are executed by a computer, the computer implements any one of the face image processing methods provided in the first aspect above.
  • A seventh aspect of the present application provides a computer program product, including program instructions; when the program instructions are executed by a computer, the computer implements any one of the face image processing methods provided in the first aspect above.
  • The face image processing scheme adopted in the embodiments of the present application extracts local face features from the local face area (that is, the unoccluded part of the face) and matches them to a similar face sample.
  • A 3D face model is then established according to the shape parameters of the face sample, which solves 3D face reconstruction when the face is occluded.
  • the complexity of the neural network for local face feature extraction can be reduced, and the operating efficiency can be improved during the matching process.
  • the reconstructed 3D face is closer to the real appearance of the user.
  • FIG. 1A is a schematic diagram of a scene where an embodiment of the present application is applied to a vehicle
  • FIG. 1B is a schematic diagram of a scene where the embodiment of the present application is applied to a vehicle
  • Fig. 2A is the flowchart of the face image processing method of the embodiment of the present application.
  • FIG. 2B is a schematic diagram of a face image processing method according to an embodiment of the present application.
  • Fig. 3 is the flow chart of the partial human face area extraction in an embodiment of the present application.
  • Fig. 4 is the fitting flow chart of 3D face model and local face region point cloud in one embodiment of the present application
  • Fig. 5 is a flow chart of a specific embodiment of the face image processing method of the present application
  • FIG. 6 is a schematic diagram of an embodiment of the 3D face reconstruction device of the present application.
  • FIG. 7 is a schematic diagram of an electronic device provided in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another electronic device provided by an embodiment of the present application.
  • FIG. 9A is a schematic diagram of a vehicle provided in an embodiment of the present application.
  • FIG. 9B is a schematic diagram of a vehicle provided in the embodiment of the present application.
  • FIG. 10 is a schematic diagram of an embodiment of a computing device of the present application.
  • The face image processing solution provided in the embodiments of the present application includes a face image processing method and device, a computing device, a computer-readable storage medium, and a computer program product. Since these technical solutions solve the problem on the same or similar principles, some repeated content may not be described again in the following specific embodiments; these specific embodiments should be regarded as referencing each other and may be combined with each other.
  • Image data with depth: includes ordinary red, green, blue (RGB) color image information and depth information, where the RGB image information and the depth information are registered, that is, the pixels are in one-to-one correspondence.
  • The collection of image data with depth can be realized with an RGB-depth (RGB-D) camera; the collected image data can be presented as an RGB image frame plus a depth image frame, or presented as one integrated piece of image data. According to the internal parameters of the camera, depth information can be converted into point cloud coordinates and vice versa.
  • Region of interest: in the embodiments of this application, it refers to the face target frame area in the image to be recognized.
  • The region can be either an image containing an occluded face or a cropped face region image.
  • Partial face area: in the embodiments of this application, it refers to the visible area of the face in the target area, that is, the unoccluded area.
  • Occluded area: in the embodiments of this application, it refers to the area where the face is occluded.
  • Parametric face model: a way to represent a face by combining a standard face (also called an average face, a reference face, a basic shape face, or a statistical face) with shape feature vectors, pose feature vectors, or expression feature vectors.
  • Examples include the 3D Morphable Face Model (3DMM), the FLAME model, etc.
  • The FLAME model is based on real human body point clouds from CAESAR data: each real head mesh is obtained by registering the head data of these real human bodies, and the head mesh covers the entire face and head area; a database of real faces and heads is thereby established.
  • The human head mesh is composed of several (e.g., 5023) vertices and several (e.g., 9976) triangular faces, together with several (e.g., 300) shape, several (e.g., 100) expression, and several (e.g., 15) pose principal components, from which a parameterized 3D human head model can be determined.
  • The shape T of FLAME is defined by the coordinates of each vertex k constituting the mesh, which can be described as the following formula (1):

  T = (x_1, y_1, z_1, x_2, y_2, z_2, \ldots, x_n, y_n, z_n) \quad (1)
  • FLAME models the shape and the expression separately, and the FLAME face model can be described as the following formula (2):

  T = T_0 + \sum_{i=1}^{n} q_i S_i + \sum_{j=1}^{m} p_j E_j \quad (2)

  • where T_0 is a standard face, that is, the average shape part of the face; S_i is an eigenvector of the covariance matrix, i.e., a face shape vector parameter (the shape principal component mentioned above); q_i is the coefficient corresponding to the face shape vector parameter; and E_j and p_j are, analogously, the expression vectors and their corresponding coefficients.
  • The modeling of the face shape part (denoted T(S) in the embodiments of the present application) can be expressed as a linear combination of the basic shape T_0 plus n shape vectors S_i, which can be described as the following formula (3):

  T(S) = T_0 + \sum_{i=1}^{n} q_i S_i \quad (3)
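  • A minimal numerical sketch of formula (3), assuming T_0 is flattened to a (3n,) vector and the n shape vectors are stacked as rows of a matrix S (the array layout is an assumption, not specified by the patent):

```python
import numpy as np

def shape_from_parameters(T0, S, q):
    """Evaluate T(S) = T0 + sum_i q_i * S_i (formula (3)).
    T0: (3n,) mean face vertex coordinates,
    S:  (n_shapes, 3n) stacked shape vectors S_i,
    q:  (n_shapes,) shape coefficients q_i."""
    return T0 + q @ S  # linear combination of the shape principal components
```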
  • Here, (w_{x,k}, w_{y,k}, w_{z,k}) represents the target position; in the embodiments of the present application, the target position is the 3D coordinates of each key point in the local face area.
  • Point cloud matching solves the transformation relationship between two point clouds, that is, it solves the rotation parameters R and translation parameters t mentioned above so as to optimize the angle and pose, which can be described as the following formula (4):

  (R^*, t^*) = \arg\min_{R,t} \sum_k \| R T_k + t - (w_{x,k}, w_{y,k}, w_{z,k})^{\top} \|^2 \quad (4)
  • Common point cloud matching algorithms include Iterative Closest Point (ICP), Normal Distributions Transform (NDT), Iterative Dual Correspondences (IDC), and so on.
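  • For the rigid alignment step, a minimal sketch of the closed-form least-squares rotation and translation for known point correspondences (the Kabsch solution, which is also the inner step of ICP; a full ICP would alternate this with nearest-neighbour matching):

```python
import numpy as np

def rigid_align(src, dst):
    """Find R, t minimizing sum ||R @ src_k + t - dst_k||^2 for (n, 3)
    arrays of corresponding points (Kabsch algorithm)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t
```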
  • The quasi-Newton algorithm is an iterative algorithm with second-order convergence; compared with the conventional gradient descent method it converges faster, and its computational complexity is lower than that of the Newton method.
  • Given the point cloud data of the local face area, the face shape coefficients are further optimized by minimizing an objective function, formula (5), defined as the sum of squared differences between the 3D coordinates of the face point cloud and the reconstructed model vertices:

  E = \sum_{i} I_i \left[ (P_{x,i} - V_{x,i})^2 + (P_{y,i} - V_{y,i})^2 + (P_{z,i} - V_{z,i})^2 \right] \quad (5)

  • where (P_{x,i}, P_{y,i}, P_{z,i}) is a point in the point cloud data of the local face area, (V_{x,i}, V_{y,i}, V_{z,i}) is the corresponding vertex of the generated 3D face model, i denotes the i-th vertex, and I_i indicates whether model vertex i is included in the calculation of the objective function. The objective function is convex and is solved iteratively with the quasi-Newton algorithm, a classic convex optimization method.
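  • A hedged sketch of the shape-coefficient optimization of formula (5) using a quasi-Newton method (SciPy's L-BFGS-B, a limited-memory quasi-Newton solver); fixed vertex correspondences and the array layout of formula (3) above are simplifying assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def fit_shape_coefficients(q0, T0, S, P, I):
    """Minimize formula (5) over the shape coefficients q.
    P: (n, 3) point cloud of the local face area,
    I: (n,) 0/1 mask selecting the model vertices used in the objective."""
    def objective(q):
        V = (T0 + q @ S).reshape(-1, 3)  # model vertices via formula (3)
        return np.sum(I[:, None] * (P - V) ** 2)
    res = minimize(objective, q0, method="L-BFGS-B")  # quasi-Newton step
    return res.x
```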
  • a technical scheme that can be adopted is to reconstruct the 3D face based on the key points of the two-dimensional (2D) face image.
  • the technical scheme first extracts the key points on the 2D face image.
  • the key points can be 17 points on the facial contour, 5 points on the left eyebrow, 5 points on the right eyebrow, 6 points on the left eye, 6 points on the right eye, 4 points on the bridge of the nose, 5 points on the wing of the nose, 20 points on the mouth contour, etc.
  • Each key point is used to represent the contour of the face; then, through the correspondence of the key points on a general standard 3D face model, the positions of the corresponding feature points in the standard 3D face model are adjusted; other non-feature points are then interpolated to deform the standard 3D face model and obtain a reconstructed 3D face model. As can be seen, this method needs to extract 2D key points from the input image; however, when 3D face reconstruction is performed on a side-view image, the 2D key point information is self-occluded, and the localization of the 2D face key points becomes inaccurate.
  • Another technical solution that can be adopted is to use a consumer-grade RGB-D depth camera to perform 3D reconstruction of the face.
  • This solution performs geometric registration on the point cloud corresponding to the input image of the current frame to perform 3D face reconstruction.
  • Most of the steps of this solution are based on the ICP algorithm.
  • This process is a fitting optimization problem that requires a large number of iterative calculations.
  • The main problem of this method is that, when the face is occluded, the face point cloud data of the occluded part of the image frame is unknown, so it is difficult to determine the information to be fitted, causing the reconstructed 3D object to fail.
  • the embodiment of the present application provides a face image processing method, which is an improved 3D face reconstruction method.
  • First, shape features of the local face area (the unoccluded part of the face) are extracted and matched for similarity against the sparse features of each 3D face sample in a face database, so as to obtain the shape parameters corresponding to the matched 3D face sample.
  • Then, a 3D head model is established by combining the shape parameters with a parametric face model; each key point of the local face area is identified and the 3D data of each key point is obtained; a rigid body transformation is performed on the head model to achieve initial alignment between the 3D head model and the point cloud of the local face area in the camera coordinate system; finally, fitting optimization is performed between the local face area and the 3D head model, with the selection of optimization targets limited to the point cloud of the local face area, to complete the 3D face reconstruction.
  • The method of the embodiments of the present application thus achieves a better 3D face reconstruction effect when the face is partially occluded, intentionally or unintentionally, or when a large-angle head posture causes self-occlusion, which makes the information to be fitted difficult to obtain.
  • The embodiments of the present application can be applied to 3D reconstruction of the face of a person in a vehicle, an airplane, etc., such as a driver, so that the driver's head posture, gaze direction, etc. can be judged based on the reconstructed 3D face in order to identify the driver's state. They can also be applied to 3D face reconstruction of audiences in front of a TV, students in a classroom, etc., so that the head posture and gaze direction of these people can be judged based on the reconstructed 3D faces, their attention direction and degree of attention can be further determined, and the TV content, teaching methods, etc. can then be adjusted.
  • FIG. 1A and FIG. 1B show examples of scenarios where the embodiment of the present application is applied to a vehicle.
  • The vehicle in this embodiment includes general motor vehicles, such as cars, sport utility vehicles (SUV), and utility vehicles, land transportation devices including multi-purpose vehicles (MPV), buses, trucks, and other cargo or passenger vehicles, as well as water vehicles including various ships and boats, and aircraft.
  • a hybrid vehicle refers to a vehicle having two or more power sources
  • an electric vehicle includes a pure electric vehicle, an extended-range electric vehicle, etc., which is not specifically limited in this application.
  • the vehicle 10 may include an image acquisition device 11 and a processor 12 .
  • the image acquisition device 11 is used to acquire images including faces of occupants (the occupants include drivers or passengers).
  • the image acquisition device 11 is a camera, where the camera may be a binocular camera, an RGB-D camera, or the like.
  • the camera can be installed on the vehicle as required, for example, in the cockpit of the vehicle.
  • In the example shown in FIG. 1B, a binocular camera composed of two independent cameras is adopted; these two cameras are the first camera 111 and the second camera 112, arranged on the left and right A-pillars of the vehicle cockpit.
  • In other embodiments, the camera can also be installed on the side of the rearview mirror in the vehicle cockpit facing the occupant, on the steering wheel, in the area near the center console, or above a display screen behind a seat, and is used to collect facial images of the driver or passengers in the vehicle cockpit.
  • the image acquisition device 11 can also be an electronic device that receives the occupant image data transmitted by the camera, such as a data transmission chip, such as a bus data transceiver chip, a network interface chip, etc., and the data transmission chip can also be Wireless transmission chips, such as Bluetooth chips or WiFi chips.
  • the image acquisition device 11 may also be integrated into the processor, and become an interface circuit or a data transmission module integrated into the processor.
  • the processor 12 can be used to reconstruct the 3D face according to the face in the image.
  • Specifically, the processor 12 uses the local face area (that is, the unoccluded face area) in the image for 3D face reconstruction.
  • The processor 12 can also be used to identify the occupant's head posture and/or gaze direction according to the reconstructed 3D face, and can further determine the occupant's attention direction, degree of attention, etc. according to the recognized head posture and gaze direction. When the embodiment of the present application is applied to the vehicle 10, the processor 12 may be an electronic device, such as the processor of a head unit, a domain controller, a mobile data center (MDC), or a vehicle-mounted computer, or it may be a conventional chip such as a central processing unit (CPU) or a microcontroller unit (MCU).
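  • As an illustrative sketch of deriving head-posture angles from the pose of the reconstructed 3D face (ZYX Euler convention; the patent does not prescribe a particular parameterization):

```python
import numpy as np

def head_pose_angles(R):
    """Extract yaw, pitch, roll (radians) from a rotation matrix R,
    assuming R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = np.arcsin(-np.clip(R[2, 0], -1.0, 1.0))
    roll = np.arctan2(R[2, 1], R[2, 2])
    return yaw, pitch, roll
```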
  • Figure 2A shows an embodiment of the face image processing method of the present application
  • Fig. 2B is a schematic diagram of the face image processing method of this embodiment, and the embodiment of the face image processing method includes the following steps:
  • S10: Acquire local face shape features in the face image. In this embodiment, the face image is a 2D image with depth information.
  • the face image includes a partial face area
  • the partial face area includes an unoccluded partial face.
  • the face image may be collected by a binocular camera or an RGB-D camera, or may be received by a data transmission chip.
  • For a binocular camera, the depth information of a pixel can be calculated according to the collected image pairs (an image pair refers to a pair of images collected by the binocular camera) and the camera parameters.
  • For an RGB-D camera, the depth information of the pixels can be obtained directly.
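  • For the binocular case, the standard rectified-stereo relation (not specific to the patent) gives depth from disparity:

```python
def depth_from_disparity(disparity_px, fx_px, baseline_m):
    """Z = fx * B / d for a rectified binocular pair; disparity in pixels,
    focal length in pixels, baseline in metres, depth in metres."""
    return fx_px * baseline_m / disparity_px
```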
  • S20: Acquire a face sample that matches the local face shape features according to the local face shape features, and then acquire the face shape parameters of the face sample.
  • Specifically, a search can be performed in the face database to match face samples similar to the local face shape features, for example, to match the face sample with the highest similarity, and then obtain the face shape parameters of that face sample.
  • When matching a face sample similar to the local face shape features, the matching goal may be to make the correlation between the face shape features of the local face area and the face shape features of the matched face sample the largest, that is, the correlation residual between the face shape features of the local face area and the face shape features of the matched face sample is smaller than that for any other face sample.
  • Since the face samples in the face database are unoccluded face samples, a similar complete face can be obtained through this step, together with the face shape parameters of that similar face.
  • the face database can be formed based on real faces, and massive data can improve the accuracy of matched face samples in terms of probability. Moreover, obtaining a complete real face sample based on the face database will make the 3D face reconstructed by the method of the embodiment of the present application more realistic.
  • S30: Generate a 3D face model using the face shape parameters. The 3D face model can be generated based on the parameterized 3D face model: referring to formula (3) above, the obtained face shape parameters are substituted into formula (3) to generate the 3D face model.
  • a step S40 may also be included: fitting the 3D face model based on the point cloud data of the local face area, the fitting including pose transformation and shape fitting.
  • the point cloud data of the local face area can be obtained according to the depth value of each pixel and camera parameters.
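  • A minimal sketch of this conversion (pinhole back-projection applied to every pixel of the local face area; `mask` marks the local face pixels and is an illustrative parameter):

```python
import numpy as np

def local_face_point_cloud(depth, mask, fx, fy, cx, cy):
    """Back-project the masked pixels of a depth image (h, w) into an
    (m, 3) point cloud in camera coordinates."""
    v, u = np.nonzero(mask & (depth > 0))  # pixels of the local face area
    z = depth[v, u].astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```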
  • step S10 may include the following steps S11-S13:
  • S11 Acquire a target area in the face image, where the target area includes the partial face area.
  • In addition to the face to be reconstructed, the image usually also includes other content. Therefore, the target area is first extracted from the image through face detection; the target area can also be called a region of interest (ROI).
  • The target area is an area that includes the partial face area; it may be a rectangular area, a circular area, or an area of any shape, and in this embodiment it may be a rectangular area. The local face area included in this area refers to the area where the face is not occluded, and the non-face part of this area includes the occluder area and the background area.
  • In this embodiment, a convolutional neural network (CNN), a region proposal network (RPN), a region-based convolutional network (Regions with CNN features, RCNN), a Faster-RCNN network, a MobileV2 network (a lightweight neural network), another network, or a combination of multiple networks can be used to extract the target area.
  • S12: Acquire the partial face area in the target area. The partial face area may be acquired according to the color values of the pixels in the target area. This is because the skin color of the face can be distinguished from non-face parts, such as the background and some occluders (e.g., masks, water cups, water bottles, glasses), so the local face area in the target area can be extracted based on pixel color values.
  • The partial face area may also be acquired according to the depth values of the pixels in the target area. This is because the spatial position (or depth) of the face can be distinguished from some non-face parts, such as the background and some non-sheet-like occluders (e.g., water cups, water bottles, hands, arms), so local face regions can be extracted based on pixel depth values.
  • In some embodiments, the occluded area of the face may be extracted from the image of the target area first, and then, for the difference area between the target area and the occluded area, local face regions are extracted according to the color values and/or depth values of the pixels as described above.
  • The average reference color may be the average of these pixel colors. It may also be a weighted mean of the colors, weighted by pixel position (e.g., the closer to the edge of the difference area, the lower the weight) or by the difference between each pixel and a normal face color (e.g., the greater the difference, the lower the weight).
  • The mean can be calculated by fitting a Gaussian function. Specifically: for the pixels in the difference region, a Gaussian function is fitted with the three RGB channel values as color coordinates, and the mean and standard deviation of the Gaussian function are obtained; the mean is used as the average reference color and the standard deviation as the threshold; with the mean as the color center point, the distance from the color coordinates of each pixel to the center point is calculated, and the pixels whose distance is less than the standard deviation are kept.
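  • A simplifying sketch of this color-based selection, approximating the per-channel Gaussian fit with per-channel means and standard deviations and a single radial threshold (an assumption, not the patent's exact procedure):

```python
import numpy as np

def color_face_mask(region_rgb):
    """region_rgb: (h, w, 3) float array of the difference region.
    Returns a boolean mask keeping pixels within one 'standard deviation'
    of the fitted color mean (the average reference color)."""
    pixels = region_rgb.reshape(-1, 3)
    mean = pixels.mean(axis=0)          # Gaussian mean: average reference color
    sigma = pixels.std(axis=0)          # per-channel standard deviation
    thresh = np.linalg.norm(sigma)      # one standard deviation as a radius
    dist = np.linalg.norm(region_rgb - mean, axis=2)
    return dist < thresh                # boolean mask of retained pixels
```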
  • The average reference depth value may be the average depth of the pixels in the area; it may also be a weighted mean of the depths, weighted by pixel position (for example, the closer to the edge of the occluded area, the lower the weight).
  • In this embodiment, networks such as a CNN, Mask RCNN, fully convolutional networks (FCN), or the YoloV2 network (a neural network for small target detection), or a combination of multiple networks, can also be used to extract local face regions.
  • S13: Acquire the local face shape features of the local face area. The extracted face shape features may be regular features or sparse features.
  • the sparsity of sparse features can reduce the complexity of neural network training, reduce the redundant features of neural networks, and reduce the required storage space.
  • Feature extraction from the local face area, whether of conventional features or sparse features, can be implemented with a feature extraction network, such as a CNN network or a residual network (e.g., Resnet50).
  • the above-mentioned step S40 includes the following steps S41-S43:
  • S41: Acquire the key points of the local face area and obtain the three-dimensional coordinates of the key points. This step can proceed as follows: the target area is input into a key point detection network, such as a CNN network or an Hourglass network, which identifies the key points among the pixels of the target area and determines the 2D position information of the key points. Then, the depth information of each key point is determined from the depth information of the image, and the 3D coordinates of the key point are calculated according to the internal parameters of the depth camera.
  • the image of the local face area extracted in step S10 may also be input into the key point detection network.
  • the 3D coordinates of each pixel in the local face area can also be calculated according to the depth information of the local face area and the internal parameters of the camera, that is, the point cloud information of the local face area can be calculated.
  • S42 Change the pose of the 3D face model according to the 3D coordinates of the key points, so as to realize preliminary alignment of the 3D face model with the point cloud of the local face area.
  • The pose change can be a rigid body transformation; that is, the 3D coordinates of the face key points are used as the target positions for the 3D shape fitting of the face, and the established 3D face model is transformed to the target positions by a rigid body transformation.
  • the rigid body transformation of the 3D face model can be implemented by point cloud matching algorithms, such as ICP, NDT, IDC and other algorithms.
  • S43: Fit the pose-transformed 3D face model to the point cloud of the local face area. The fitting of the shape parameters may be performed with iterative algorithms, such as the quasi-Newton method and the Newton method.
  • the fitting may include fitting parameters such as lips and eyes, that is, optimizing the shape factor of the 3D face model. After the fitting is completed, the reconstruction of the details of the 3D face for the face is completed.
  • In a specific embodiment, a monocular depth camera (RGB-D camera) is used to collect image data with depth, and the image data includes ordinary RGB color 2D image information and depth information (a depth map).
  • this embodiment includes the following steps:
  • S110: Acquire RGB-D image data. The RGB-D image data may come from a depth camera; it is image data containing an occluded face and, as described above, the image data (image frame) consists of a 2D image (color image frame) and a depth image (depth image frame), so the pixel at each position has 2D information (such as RGB data) and depth information.
  • the 2D image has color and texture, so the key point position of the human face in the unoccluded area of the object to be built can be identified through the 2D image.
  • S112: Perform occluded face detection on the image data, and identify and extract the partial face area of the object to be reconstructed in the color 2D image. This specifically includes:
  • First, the MobileV2 network is used to perform occluded face detection on the color image frame and determine the target region (ROI) where the object to be reconstructed is located in the color image; the target region includes the partial face area and the occluded area of the face.
  • The target area is then obtained by cropping, and the obtained target area is scaled to a target pixel size for subsequent processing; for example, the target area is scaled to a size of 512x512 (pixels).
  • S114: Input the scaled target area into the YoloV2 network to detect occluding objects, so as to determine the occluded area of the face.
  • S116: Extract the partial face area from the target area; this specifically includes the following steps, which proceed in two ways. In addition, after the local face area is extracted, the point cloud information of the local face area can also be calculated.
  • Face extraction based on color: according to the aforementioned target area and occlusion area, the difference area between them is used as the RGB color reference area. All pixels in the reference area, with their three RGB channel values as coordinates, are used to fit a Gaussian function, and the mean and standard deviation are obtained. The mean is then used as the center point, the distance from each pixel to the center point is calculated, and the RGB information of a pixel is kept if the distance is less than one standard deviation; otherwise, the RGB information of the pixel is removed, for example by setting the pixel's RGB value to zero.
  • Since most of the difference area between the target area and the occlusion area is the local face area, the mean of the Gaussian function fitted to the three-channel colors corresponds to the face color; pixels whose color difference is within the threshold (that is, pixels with similar colors) are kept, and pixels whose color difference is greater than the threshold (that is, pixels with large color differences) are removed, thereby extracting the local face area.
  • In addition, this part of the pixel area can be further removed based on the depth values, in a manner similar to the depth-based extraction described above.
  • After the local face area is extracted, the 3D coordinates of each pixel in the local face area can be calculated; that is, the point cloud information of the local face area is obtained for use in the later steps.
  • S118: Extract the local face shape features. In this embodiment, sparse features are extracted, and the sparse features of the local face shape are output by regression through the feature extraction network.
  • The data A is a sparse representation of the face features, where A_i is the dimensionality-reduced representation of D_i, that is, the sparse feature of the i-th face sample: A_i = [A_{i1}, A_{i2}, \ldots, A_{im}] \in R^m.
  • S120: The sparse feature A_i of each face sample is compared with the sparse feature X of the current partial face shape, a face sample is matched according to similarity, and the 3D face shape feature parameters of that face sample are used as the parameters of the face to be reconstructed.
  • That is, the face sample i that maximizes the sparse feature similarity S_i is found, and the face feature vector D_i corresponding to that face sample describes the 3D face to be reconstructed.
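  • A hedged sketch of this retrieval step; cosine similarity is used here as the similarity measure S_i, which is an assumption (the patent does not fix the measure):

```python
import numpy as np

def match_face_sample(x, A):
    """x: (m,) sparse feature of the current partial face shape.
    A: (N, m) sparse features A_i of the N database face samples.
    Returns the index i maximizing the similarity S_i."""
    sims = (A @ x) / (np.linalg.norm(A, axis=1) * np.linalg.norm(x) + 1e-12)
    return int(np.argmax(sims))
```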
  • S122: The basic shape T_0 and the shape vectors S_i are provided by the parameterized 3D face model; the shape feature parameters q of the 3D face of the above face sample are input into the parameterized 3D face model to obtain the initialized 3D face model.
  • S124: On the other hand, the image data of the target area scaled in step S112 is used as the input of the Hourglass network; the Hourglass network outputs, for each pixel of the target area image data, the probability that it is a key point, the probability maxima are determined to be key points, and the 2D position information of each key point in the local face area is then obtained.
  • Since the key points are detected over the whole target area, the determined key points of the local face area may include key points of the occluded area. Therefore, according to the occluded area determined in step S114, the key points of the occluded area are eliminated, that is, this part of the noise is removed, so as to obtain more accurate key point information of the local face area for use in the subsequent steps.
  • S130: Align the 3D face model established in step S122 with the 3D coordinates of the key points of the local face area; that is, using the 3D coordinates of the key points of the local face area as the target positions, the initialized 3D face model is transformed to the target positions.
  • Specifically, the ICP algorithm is used to perform a rigid body transformation on the 3D face model, that is, to move and rotate the 3D face model to the corresponding target positions, so as to achieve preliminary alignment with the position of the face point cloud.
  • S132: Fit the shape parameters of the aligned 3D face model. The parameter fitting algorithm may be a quasi-Newton algorithm, a Newton algorithm, a gradient descent method, etc.; owing to the fast convergence and low computational complexity of the quasi-Newton algorithm, the quasi-Newton algorithm is used in this embodiment.
  • The fitting process takes the 3D coordinates of the point cloud of the local face area, calculated in step S116, as the fitting target. Since the fitting is performed on the point cloud of the local face area, the selection of the 3D face model optimization targets is limited to the point cloud of the local face area.
  • the fitting process refer to the aforementioned introduction of shape fitting by quasi-Newton algorithm, and will not be repeated here.
  • A driver monitoring system (DMS) can monitor the state of the driver, such as fatigue monitoring, distraction monitoring (or attention monitoring), eye tracking, and dangerous behavior monitoring (such as using a mobile phone, eating, etc.).
  • DMS collects the driver's image through the camera in the vehicle cockpit.
  • the camera can be a binocular camera, RGB-D camera, etc.
  • The camera can be installed on the vehicle as required, for example, at the position of the rearview mirror in the cockpit, on the steering wheel, or in the area around the center console.
  • the camera adopts a binocular camera composed of two cameras arranged on the left and right A-pillars of the vehicle cockpit as shown in FIG. 1B .
  • the face image processing method provided in the embodiment of the present application can be used to process the collected image to reconstruct the driver's 3D face.
  • In particular, when the driver's face in the collected image is incomplete, the face image processing method provided in the embodiment of the present application can be used for the image processing.
  • The situation where the face is incomplete includes the situation where the face is partially occluded, for example, when the driver wears sunglasses, or when the driver drinks water or makes a phone call and the water cup, mobile phone, hand, or arm partially occludes the face.
  • Incomplete faces also include situations where part of the face is not captured, for example, when the driver's head moves over a large range or rotates at a large angle so that part of the face moves outside the image capture area of the camera, or when the driver turns the head (such as turning the head backwards) so that part of the face cannot be captured by the camera.
  • the state of the driver can be further detected based on the reconstructed 3D face.
  • For example, the head posture is detected based on the reconstructed 3D face, including whether the head moves over a large range or rotates at a large angle; head posture changes over a period of time can also be combined to detect whether the head is in an abnormal state. Abnormal states include frequently lowering the head (which can, for example, be used as a basis for judging whether the driver is dozing off or looking at a mobile phone) and keeping the head raised or turned to one side for a certain period of time (which can, for example, be used as a basis for judging whether the driver is asleep).
  • Another example is to detect the facial state of the driver based on the reconstructed 3D face.
  • The facial state includes the degree of eye opening (for example, whether the eye opening is below a threshold can be used as a basis for judging whether the driver is sleepy), the degree of mouth opening (for example, it can be used as a basis for judging whether the driver is dozing off), and the gaze direction (for example, it can be used as a basis for judging the driver's attention).
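  • A toy illustration of such threshold-based judgments (all threshold values are placeholders, not values from the patent):

```python
def driver_state(eye_opening, mouth_opening, eye_thresh=0.2, mouth_thresh=0.6):
    """Flag possible drowsiness when the eye opening falls below a threshold
    or the mouth opening exceeds a threshold, as described above."""
    return {"sleepy": eye_opening < eye_thresh,
            "dozing": mouth_opening > mouth_thresh}
```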
  • the DMS determines whether to warn the driver based on the detected driver's state and combined with the current driving scene.
  • The driving scene here may refer to the driving scene combined with the current automatic driving level (automatic driving levels L0 to L5).
  • For example, in a driving scenario with a lower automatic driving level, the threshold for triggering a warning to the driver is relatively low; in a driving scenario with a higher automatic driving level, the threshold for triggering a warning to the driver is relatively high.
  • the DMS can also provide the detected state of the driver to the vehicle control device, and the vehicle control device can judge whether to take over the driving control of the vehicle.
  • The driving control after takeover includes automatically controlling the vehicle to slow down and park at the side of the road, and also includes carrying out automatic driving (for L4 and L5 automatic driving), such as driving automatically through a certain area (e.g., driving out of an expressway) or for a period of time until the vehicle reaches a safe place and stops.
  • the present application also provides a corresponding embodiment of a face image processing device.
  • The face image processing device 600 includes:
  • the acquiring module 610 is configured to acquire local face shape features in the face image, and to acquire face samples matching the local face shape features, and acquire face shape parameters of the face samples. Specifically, this module can be used to execute steps S10-S20 and examples thereof in the above-mentioned face image processing method, or to execute steps S110-S120 and examples therein in the specific implementation of the above-mentioned face image processing method.
  • a generating module 620 configured to generate a three-dimensional human face model using the human face shape parameters. Specifically, this module can be used to execute steps S30 and S40 and the examples thereof in the above-mentioned face image processing method, or to execute steps S122-S132 and the examples thereof in the specific implementation manner of the above-mentioned face image processing method.
  • In some embodiments, the generating module 620 is further configured to: fit the generated 3D face model to the point cloud data of the partial face area, the partial face area being included in the face image.
  • the generating module 620 when used for the fitting, it is specifically used to: obtain the key points of the local face area, and obtain the three-dimensional coordinates of the key points; the three-dimensional face model performing pose transformation according to the three-dimensional coordinates of the key points; and fitting the three-dimensional face model after the pose transformation to the point cloud data of the local face area.
  • the point cloud data of the partial face area is obtained according to the pixel depth value of the partial face area and camera parameters.
  • the three-dimensional coordinates of the key points of the partial human face area are obtained according to the depth values of the key points and camera parameters.
  • the acquiring module 610 is specifically configured to: acquire a target area in the face image, where the target area includes the partial face area; acquire the partial face in the target area area; acquiring the local face shape features of the local face area.
  • In some embodiments, when used to acquire the partial face area in the target area, the acquisition module 610 is specifically used for at least one of the following: acquiring the partial face area according to the color values of the pixels in the target area; acquiring the partial face area according to the depth values of the pixels in the target area.
  • In some embodiments, when used to acquire the face samples matching the local face shape features, the acquisition module 610 is specifically used to: search the face database for face samples matching the local face shape features.
  • the generation module 620 is specifically configured to: generate the 3D face model based on the parameterized 3D face model using the face shape parameters.
  • Table 1 shows the failure ratios of traditional 3D face reconstruction and of 3D face reconstruction using the method of the embodiment of the present application, estimated by discrete random event model simulation for different proportions of face occlusion. Each experiment is carried out 1000 times, and the proportion of deformed face structures (that is, the failure ratio of 3D face reconstruction) is calculated; the results show the following:
  • the method of the embodiment of the present application has better robustness in realizing 3D face reconstruction.
  • The proportion of reconstruction failures is low, and the proportion of reconstruction failures remains low even with large face occlusions.
  • FIG. 7 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application.
  • the electronic device 700 includes: a processor 710 and a memory 720 .
  • the electronic device 700 shown in FIG. 7 may further include a communication interface 730, which may be used for communication with other devices.
  • the processor 710 may be connected to the memory 720 .
  • The memory 720 can be used to store program code and data. The memory 720 may be a storage unit inside the processor 710, an external storage unit independent of the processor 710, or a combination of a storage unit inside the processor 710 and an external storage unit independent of the processor 710.
  • the electronic device 700 may also include a bus.
  • the memory 720 and the communication interface 730 may be connected to the processor 710 through a bus.
  • The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the processor 710 may be a central processing unit (central processing unit, CPU).
  • The processor can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the processor 710 adopts one or more integrated circuits for executing related programs, so as to realize the technical solutions provided by the embodiments of the present application.
  • the memory 720 may include read-only memory and random-access memory, and provides instructions and data to the processor 710 .
  • a portion of processor 710 may also include non-volatile random access memory.
  • processor 710 may also store device type information.
  • the processor 710 executes the computer-executed instructions in the memory 720 to execute the operation steps of the above-mentioned face image processing method, for example, execute the methods of the above-mentioned embodiments corresponding to FIGS. 2A-5 , or Each of the optional embodiments.
  • It should be understood that the electronic device 700 may correspond to a corresponding subject performing the methods according to the various embodiments of the present application, and the above-mentioned and other operations and/or functions of the modules in the electronic device 700 are intended to implement the corresponding processes of the methods in those embodiments; for the sake of brevity, they are not repeated here.
  • FIG. 8 is a schematic structural diagram of another electronic device 800 provided in this embodiment, including: a processor 810 and an interface circuit 820, where the processor 810 accesses a memory through the interface circuit 820, and the memory stores program instructions; when the program instructions are executed by the processor, the processor executes the methods of the above embodiments corresponding to FIGS. 2A-5, or the optional embodiments therein.
  • the electronic device may further include a communication interface, a bus, etc. For details, refer to the introduction in the embodiment shown in FIG. 7 , and details are not repeated here.
  • The embodiment of the present application also provides a vehicle 100, including an image acquisition device 110 for collecting face images, and the face image processing device 120 described above (including its various embodiments), or the electronic device 130.
  • The face image collected by the image acquisition device 110 is provided to the face image processing device 120 or to the electronic device 130, and the face image processing device 120 or the electronic device 130 implements the above face image processing method and its various embodiments according to the face image, so as to realize the reconstruction of the 3D face.
  • the image acquisition device 110 may be a camera, such as a camera, where the camera may be a binocular camera, an RGB-D camera, an IR camera, etc., and the camera may be installed on the vehicle as required.
  • the example shown in FIG. 1A and FIG. 1B can be a binocular camera composed of two cameras arranged on the left and right A-pillars of the vehicle cockpit. In some other embodiments, it can also be installed on the passenger side of the rearview mirror in the cockpit of the vehicle, and can also be installed on the steering wheel, the area near the center console, and the like.
  • the image acquisition device 110 can also be an electronic device that receives the occupant image data transmitted by a camera, such as a data transmission chip, for example a bus data transceiver chip or a network interface chip; the data transmission chip can also be a wireless transmission chip, such as a Bluetooth chip or a WiFi chip.
  • FIG. 10 is a schematic structural diagram of a computing device 900 provided by an embodiment of the present application.
  • the computing device 900 includes: a processor 910 , a memory 920 , and may also include a communication interface 930 .
  • the communication interface 930 in the computing device 900 shown in FIG. 10 can be used to communicate with other devices.
  • the processor 910 may be connected to the memory 920 .
  • the memory 920 can be used to store program code and data. The memory 920 may be a storage unit inside the processor 910, an external storage unit independent of the processor 910, or a combination of both.
  • computing device 900 may further include a bus.
  • the memory 920 and the communication interface 930 may be connected to the processor 910 through a bus.
  • the bus can be a PCI bus, an EISA bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the processor 910 executes the computer-executed instructions in the memory 920 to perform the operation steps of the above method.
  • the computing device 900 may correspond to a subject executing the methods according to the various embodiments of the present application, and the above-mentioned and other operations and/or functions of the modules in the computing device 900 are intended to implement the corresponding processes of the methods in those embodiments; for the sake of brevity, they are not repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored; when the program is executed by a processor, it executes the above-mentioned face image processing method, and the method includes at least one of the solutions described in the above-mentioned embodiments.
  • the embodiment of the present application also provides a computer program product, including program instructions.
  • when the program instructions are executed by a computer, the above-mentioned face image processing method is implemented, and the method includes at least one of the solutions described in the above-mentioned embodiments.
  • the disclosed methods and devices may be implemented in other ways.
  • the device embodiments described above are only illustrative, and the division of the units is only a logical functional division; in actual implementation there may be other division methods, for example, multiple units or components can be combined or integrated into another system, or some features may be omitted or not implemented.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present application belongs to the field of machine vision. Provided is a facial image processing method. The method comprises: firstly, acquiring a local facial shape feature from a facial image; then acquiring, from a face database, a face sample matching the local facial shape feature, and acquiring a facial shape parameter of the facial sample; then, on the basis of a parameterized facial model, generating a three-dimensional facial model by using the facial shape parameter; and performing fitting on the basis of point cloud data of a local facial area, so as to complete three-dimensional facial reconstruction. On the basis of a local facial shape feature, three-dimensional facial reconstruction when a face is partially covered or cropped can be realized; and the present application can be applied to the field of intelligent vehicles, for example, a driver monitoring system.

Description

Face image processing method, device and vehicle

Technical Field

The present invention relates to the technical field of machine vision, and in particular to a face image processing method, device and vehicle.

Background Art
Three-dimensional (3D) face reconstruction is a research hotspot in the fields of machine vision, computer vision and computer graphics. 3D face reconstruction is one of the core technologies in fields such as virtual reality/augmented reality, autonomous driving and robotics, and has great application value in the driver monitoring system (DMS) of smart vehicles.

For example, when a vehicle is in an assisted or automated driving state, the driver is allowed to be freed from some tasks but is required to be ready to take over the vehicle at any time, which makes real-time monitoring of the driver's attention state critical. 3D face reconstruction is one of the basic technologies for monitoring the driver's head posture and gaze direction, and it directly affects the performance of human-computer interaction and the DMS.

When monitoring the state of vehicle occupants, the face image is often occluded by hands, the steering wheel, mobile phones or food, and occlusion has a great impact on the performance of 3D face reconstruction. Occluded face images can be divided into unintentional occlusion and intentional occlusion. Common unintentional occlusions include glasses, the steering wheel, or other people blocking the face of the monitored person, while intentional occlusions usually include sunglasses, masks or other objects blocking the facial features. Intentional occlusion usually causes 3D face reconstruction to fail due to excessive feature changes, while unintentional occlusion usually covers only a small part of the facial features, which easily introduces too many interference features during feature extraction and distorts the 3D face reconstruction.

The uncertainty of the occluding object and of the occluded area makes the inherent features of the face image appear as the absence of various local features, which limits the application scenarios of 3D face reconstruction. Therefore, there is a need for a new 3D face reconstruction method that is robust under occlusion conditions.
发明内容Contents of the invention
鉴于以上问题,本申请实施例提供了一种人脸图像处理方案,该方案包括人脸图像处理方法、装置、车辆、计算设备、计算机可读存储介质和计算机程序产品,可以实现在人脸部分被遮挡条件下的三维人脸重建。In view of the above problems, the embodiment of the present application provides a face image processing solution, which includes a face image processing method, device, vehicle, computing device, computer-readable storage medium and computer program product, which can be implemented in the face part 3D face reconstruction under occluded conditions.
为达到上述目的,本申请第一方面,提供了一种人脸图像处理方法,包括:获取人脸图像中的局部人脸形状特征;获取与局部人脸形状特征匹配的人脸样本,获取人脸样本的人脸形状参数;使用人脸形状参数生成三维人脸模型。In order to achieve the above object, the first aspect of the present application provides a face image processing method, including: acquiring the local face shape features in the face image; Facial shape parameters of the face sample; use the facial shape parameters to generate a 3D face model.
本申请实施例根据与局部人脸形状相匹配的人脸样本的信息生成三维人脸模型,降低了遮挡物的干扰,适用于人脸遮挡场景下的3D人脸重建,具有高鲁棒性的特点。The embodiment of the present application generates a 3D face model based on the information of the face sample that matches the local face shape, which reduces the interference of occluders, and is suitable for 3D face reconstruction in face occlusion scenarios with high robustness. features.
当基于本申请实施例的人脸图像处理方法应用于智能车辆(例如智能车辆中的驾驶员状态监测系统)时,可以实现在乘员(乘员指驾驶员或乘客)脸部有遮挡的情况下进行三维人脸重建,并进一步可基于重建的三维人脸进行头部姿态和/或视线方向进行识别,由此可以提升驾驶员状态监测系统的鲁棒性与稳定性。When the face image processing method based on the embodiment of the present application is applied to a smart vehicle (such as a driver status monitoring system in a smart vehicle), it can be realized that the face image processing method of the occupant (the occupant refers to the driver or passenger) has a blocked face. 3D face reconstruction, and further recognition of head posture and/or gaze direction based on the reconstructed 3D face, which can improve the robustness and stability of the driver status monitoring system.
As a possible implementation of the first aspect, the method further includes: fitting the generated 3D face model to the point cloud data of a partial face area, the partial face area being included in the face image.

Through the above fitting process, a more detailed 3D face reconstruction, also called 3D face optimization, can be realized. The fitting may include fitting parameters such as the lips and the eyes, so that the reconstructed 3D face is closer to the user's real appearance.

As a possible implementation of the first aspect, the fitting includes: acquiring key points of the partial face area and acquiring the 3D coordinates of the key points; performing a pose transformation on the 3D face model according to the 3D coordinates of the key points; and fitting the pose-transformed 3D face model to the point cloud data of the partial face area.

In this way, the rigid body transformation of the 3D face model is first performed with the 3D coordinates of the key points as the target, achieving a preliminary alignment with the point cloud data of the partial face area. This process can be realized with the ICP algorithm; since the target data, i.e., the key points, are small in number, the preliminary alignment requires little computation and is fast. The further shape fitting with the point cloud data of the partial face area, which completes the shape optimization, can be realized with the quasi-Newton algorithm, which features fast convergence and low computational complexity.
As a possible implementation of the first aspect, the point cloud data of the partial face area is obtained according to the pixel depth values of the partial face area and the camera parameters.

Accordingly, the present application can be implemented with a binocular camera, an RGB-D camera, an infrared camera, etc., whose cost is relatively low compared with other image perception and acquisition devices. The cameras here are not limited to traditional cameras and also include image capture devices such as camera modules.

As a possible implementation of the first aspect, the 3D coordinates of the key points of the partial face area are obtained according to the depth values of the key points and the camera parameters.

Accordingly, the present application can be implemented with a binocular camera or an RGB-D camera, whose cost is relatively low compared with other image perception and acquisition devices.
As a possible implementation of the first aspect, acquiring the local face shape features in the face image includes: acquiring a target area in the face image, the target area including the partial face area; acquiring the partial face area in the target area; and acquiring the local face shape features of the partial face area.

The two-step approach of first obtaining the target area from the image and then obtaining the partial face area from the target area keeps the overall complexity of the neural network implementing this process low, compared with obtaining the partial face area directly from the image, and also makes the network easy to train.

As a possible implementation of the first aspect, acquiring the partial face area in the target area includes at least one of the following: acquiring the partial face area according to the color values of the pixels of the target area; and acquiring the partial face area according to the depth values of the pixels of the target area.

The partial face area can be acquired according to the color values of the pixels of the target area. This is because facial skin color can be distinguished from non-face parts, such as the background and certain occluders (e.g., masks, cups, water bottles), so the partial face area within the target area can be extracted based on pixel color values. The partial face area can also be acquired according to the depth values of the pixels of the target area. This is because the spatial position (or depth) of the face differs from that of certain non-face parts, such as the background and certain non-sheet-like occluders (e.g., cups, water bottles, hands, arms), so the partial face area can be extracted based on pixel depth values. In some possible implementations, the occluded face area may first be extracted from the target area image, and the partial face area may then be extracted from the difference area between the target area and the occluded face area according to the color values and/or depth values of the pixels.
As a possible implementation of the first aspect, acquiring the face sample matching the local face shape features includes: retrieving the face sample matching the local face shape features from a face database.

A face database can be built in advance from real faces; a massive amount of data can probabilistically improve the accuracy of the matched face samples. Moreover, obtaining complete real-face samples from the face database makes the 3D face reconstructed by the embodiments of the present application more realistic.

As a possible implementation of the first aspect, generating the 3D face model by using the face shape parameters includes: generating the 3D face model by using the face shape parameters based on a parameterized 3D face model.

Generating the 3D face model based on a parameterized 3D face model makes full use of the obtained face shape parameters.
A second aspect of the present application provides a face image processing device, including: an acquisition module configured to acquire local face shape features in a face image, acquire a face sample matching the local face shape features, and acquire face shape parameters of the face sample; and a generation module configured to generate a 3D face model.

As a possible implementation of the second aspect, the generation module is further configured to: fit the generated 3D face model to the point cloud data of a partial face area.

As a possible implementation of the second aspect, when performing the fitting, the generation module is specifically configured to: acquire key points of the partial face area and acquire the 3D coordinates of the key points; perform a pose transformation on the 3D face model according to the 3D coordinates of the key points; and fit the pose-transformed 3D face model to the point cloud data of the partial face area.

As a possible implementation of the second aspect, the point cloud data of the partial face area is obtained according to the pixel depth values of the partial face area and the camera parameters.

As a possible implementation of the second aspect, the 3D coordinates of the key points of the partial face area are obtained according to the depth values of the key points and the camera parameters.

As a possible implementation of the second aspect, the acquisition module is specifically configured to: acquire a target area in the image, the target area including a partial face area; acquire the partial face area in the target area; and acquire the local face shape features of the partial face area.

As a possible implementation of the second aspect, when acquiring the partial face area in the target area, the acquisition module is specifically configured to use at least one of the following: acquiring the partial face area according to the color values of the pixels of the target area; and acquiring the partial face area according to the depth values of the pixels of the target area.

As a possible implementation of the second aspect, when acquiring the face sample matching the local face shape features, the acquisition module is specifically configured to: retrieve the face sample matching the local face shape features from a face database.

As a possible implementation of the second aspect, the generation module is specifically configured to: generate the 3D face model by using the face shape parameters based on a parameterized 3D face model.
A third aspect of the present application provides an electronic device, including: a processor, and a memory storing program instructions which, when executed by the processor, implement any one of the face image processing methods provided in the first aspect.

A fourth aspect of the present application provides an electronic device, including: a processor, and an interface circuit, where the processor accesses a memory through the interface circuit, the memory storing program instructions which, when executed by the processor, implement any one of the face image processing methods provided in the first aspect.

A fifth aspect of the present application provides a vehicle, including: an image acquisition device for collecting face images, and any one of the face image processing devices provided in the second aspect, or the electronic device provided in the third or fourth aspect.

A sixth aspect of the present application provides a computer-readable storage medium storing program instructions which, when executed by a computer, cause the computer to implement any one of the face image processing methods provided in the first aspect.

A seventh aspect of the present application provides a computer program product including program instructions which, when executed by a computer, cause the computer to implement any one of the face image processing methods provided in the first aspect.

In summary, with the face image processing solution of the embodiments of the present application, for an occluded face, local face features are extracted from the partial face area (i.e., the unoccluded part of the face) and matched to a similar face sample, and a 3D face model is established according to the shape parameters of that face sample, which solves 3D face reconstruction when the face is occluded. Furthermore, by using sparse features in local face feature extraction and in matching similar face samples, the complexity of the neural network performing the feature extraction can be reduced and the efficiency of the matching process can be improved. In addition, through fitting optimization between the partial face area and the 3D head model, the reconstructed 3D face is made closer to the user's real appearance.
Description of Drawings

FIG. 1A is a schematic diagram of a scenario where an embodiment of the present application is applied to a vehicle;

FIG. 1B is a schematic diagram of a scenario where an embodiment of the present application is applied to a vehicle;

FIG. 2A is a flowchart of a face image processing method according to an embodiment of the present application;

FIG. 2B is a schematic diagram of a face image processing method according to an embodiment of the present application;

FIG. 3 is a flowchart of partial face area extraction in an embodiment of the present application;

FIG. 4 is a flowchart of fitting a 3D face model to the point cloud of a partial face area in an embodiment of the present application;

FIG. 5 is a flowchart of a specific implementation of the face image processing method of the present application;

FIG. 6 is a schematic diagram of an embodiment of the 3D face reconstruction device of the present application;

FIG. 7 is a schematic diagram of an electronic device provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of another electronic device provided by an embodiment of the present application;

FIG. 9A is a schematic diagram of a vehicle provided by an embodiment of the present application;

FIG. 9B is a schematic diagram of a vehicle provided by an embodiment of the present application;

FIG. 10 is a schematic diagram of an embodiment of a computing device of the present application.

It should be understood that, in the above structural diagrams, the size and shape of each block are for reference only and should not constitute an exclusive interpretation of the embodiments of the present application. The relative positions and containment relationships between the blocks only schematically represent the structural associations between them, rather than limiting the physical connection of the embodiments of the present application.
Detailed Description

The technical solutions provided by the present application are further described below with reference to the accompanying drawings and embodiments. It should be understood that the system structures and business scenarios provided in the embodiments of the present application are mainly intended to illustrate possible implementations of the technical solutions of the present application, and should not be interpreted as the only limitation on them. Those of ordinary skill in the art know that, as system structures evolve and new business scenarios emerge, the technical solutions provided by the present application remain applicable to similar technical problems.

It should be understood that the face image processing solution provided by the embodiments of the present application includes a face image processing method and device, a computing device, a computer-readable storage medium and a computer program product. Since these technical solutions solve problems on the same or similar principles, some repetitions may not be restated in the introduction of the following specific embodiments, but these specific embodiments should be considered as mutually referenced and combinable.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which the present application belongs. In case of any inconsistency, the meaning stated in this specification, or the meaning derived from the content recorded in this specification, shall prevail. In addition, the terms used herein are intended to describe the embodiments of the present application, not to limit the present application.

In order to accurately describe the technical content of the present application and to accurately understand the present invention, before the specific embodiments are described, the following explanations or definitions are given for the terms used in this specification:
1) Image data with depth: it includes ordinary red-green-blue (RGB) color image information and depth information, and the RGB image information and the depth information are registered, i.e., there is a one-to-one correspondence between their pixels. Image data with depth can be collected by an RGB-Depth (RGB-D) camera; the collected data can be presented as an RGB image frame plus a depth image frame, or integrated and presented as a single image data. According to the intrinsic parameters of the camera, the transformation between depth information and point cloud coordinates can be realized.
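To make this depth-to-point-cloud transformation concrete, the following is a minimal Python sketch of unprojecting a registered depth map into camera-space 3D points under a standard pinhole model; the function and the intrinsic parameter names (fx, fy, cx, cy) are illustrative, not taken from the original:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Unproject a registered depth image (meters) into camera-space 3D points.

    A sketch under the pinhole camera model: for a pixel (u, v) with depth Z,
    X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinate grids
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth
```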
2) Definitions of the areas mentioned in the embodiments of the present application:

Target area (region of interest, ROI): in the embodiments of the present application, the face target-box area in the image to be recognized. This area may be an image containing the occluded face, or a cropped face-area image.

Partial face area: in the embodiments of the present application, the visible part of the face in the target area, i.e., the unoccluded area.

Occluded area: in the embodiments of the present application, the area where the face is occluded.
3) Parameterized face model: a way of representing a face by a standard face (also called an average face, reference face, base shape face, or statistical face) combined with shape feature vectors, pose feature vectors or expression feature vectors, for example the 3D Morphable Face Model (3DMM), the FLAME model, etc.

4) FLAME model: the FLAME model is based on real human body point clouds from the CAESAR data; the head data of these real human bodies are registered to obtain a head mesh for each real head, which covers the entire face and head area, thereby establishing a database of real faces and heads. The head mesh is composed of a number of vertices (e.g., 5023) and a number of triangular faces (e.g., 9976), and principal component analysis (PCA) is used to obtain a number of shape principal components (e.g., 300), expression principal components (e.g., 100) and pose principal components (e.g., 15), from which a parameterized 3D head model can be determined.
Specifically, defined by the vertex positions of the mesh, the FLAME shape $T$ is defined as the coordinates of the vertices $k$ constituting the mesh, which can be described as the following formula (1):

$T = (x_1, y_1, z_1, x_2, \ldots, x_n, y_n, z_n)$  (1)
FLAME models shape and expression separately, and the FLAME face model can be described as the following formula (2):

$T(V; p, q) = T_0 + B_s(q; S) + B_p(p; E)$  (2)

where $T_0$ is the standard face, i.e., the average shape part of the face; $B_s(q; S)$ is the face shape blending term, for example $\sum_i q_i S_i$, $i = 1$ to $n$, where $S_i$ denotes the eigenvectors of the covariance matrix, i.e., the face shape vector parameters (the shape principal components above), and $q$ is the coefficient vector corresponding to the face shape vector parameters; $B_p(p; E)$ is the facial expression blending term, for example $\sum_i p_i E_i$, $i = 1$ to $l$, where $E_i$ denotes the eigenvectors of the covariance matrix, i.e., the facial expression vector parameters (the expression principal components above), and $p$ is the coefficient vector corresponding to the facial expression vector parameters.
Accordingly, the modeling of the face shape part (denoted $T(S)$ in the embodiments of the present application) can be expressed as the base shape $T_0$ plus a linear combination of the $n$ shape vectors $S_i$, described as the following formula (3):

$T(S) = T_0 + B_s(q; S) = T_0 + \sum_{i=1}^{n} q_i S_i$  (3)

Since $T_0$ and $S_i$ are provided by FLAME, once each $q_i$ is obtained, substituting the $q_i$ into formula (3) generates a 3D face model for the face shape part.
5) Optimizing the pose angle of the 3D face model, i.e., transforming the 3D face model to a target position, also called rigid body transformation or geometric registration. After the above 3D face model is established, the 3D positions of its vertices are determined; in other words, the 3D positions of the vertices are determined by the coefficients $q_i$ and the 3D face model given by formula (3). The coordinates $X_k = (x_k, y_k, z_k)$ of each model vertex $k$ can then be transformed to the target position through a rigid body transformation, which can be described as the following formula (4):

$(w_{x,k}, w_{y,k}, w_{z,k})^{T} = R \, X_k + t_w$  (4)

where $(w_{x,k}, w_{y,k}, w_{z,k})$ denotes the target position; in the embodiments of the present application, the target positions are the 3D coordinates of the key points of the partial face area, and through several key points the vertices of the entire 3D face model are preliminarily aligned with the point cloud in the camera coordinate system; $R$ denotes the rotation parameters about the three axes, and $t_w$ denotes the translation parameters.
A point cloud matching algorithm can be used to optimize the angle and pose (point cloud matching solves the transformation between two point clouds, i.e., the above rotation and translation parameters). Common point cloud matching algorithms include the iterative closest point (ICP) algorithm, the normal distribution transform (NDT) algorithm, the iterative dual correspondences (IDC) algorithm, and so on.
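As one concrete way to obtain the rotation and translation of formula (4) from a small set of matched key points (the closed-form step used inside each ICP iteration), the following sketch uses the SVD-based Kabsch solution; this is an illustrative solver choice, not mandated by the original:

```python
import numpy as np

def rigid_transform_from_keypoints(model_pts, target_pts):
    """Solve R, t minimizing sum_k ||R @ X_k + t - w_k||^2 (cf. formula (4)).

    model_pts:  (K, 3) key point positions on the generated 3D face model.
    target_pts: (K, 3) 3D coordinates of the same key points from the image.
    """
    mu_m = model_pts.mean(axis=0)
    mu_t = target_pts.mean(axis=0)
    H = (model_pts - mu_m).T @ (target_pts - mu_t)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T  # proper rotation (det = +1, no reflection)
    t = mu_t - R @ mu_m
    return R, t

# Applying the pose transform to all model vertices would then be:
# aligned_vertices = (R @ vertices.T).T + t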
6) Shape fitting by the quasi-Newton algorithm: the quasi-Newton algorithm is one of the iterative algorithms. It has second-order convergence, so it converges faster than the conventional gradient descent method, and its computational complexity is lower than that of Newton's method.

In the embodiments of the present application, given the point cloud data of the partial face area, the face shape coefficients are further optimized by minimizing an objective function, where the objective function, formula (5) below, is the sum of the squared differences between the 3D coordinates of the face point cloud and the reconstructed model vertices. This objective function is convex and is solved iteratively by the quasi-Newton algorithm, one of the classical convex optimization algorithms.
Objective function: $L_S = \sum_i \left\| (P_{x,i}, P_{y,i}, P_{z,i}) - I_i \, (V_{x,i}, V_{y,i}, V_{z,i}) \right\|^2$  (5)

where $(P_{x,i}, P_{y,i}, P_{z,i})$ is a point in the point cloud data of the partial face area, $(V_{x,i}, V_{y,i}, V_{z,i})$ is a point of the generated 3D face model, $i$ denotes the $i$-th vertex, and $I_i$ indicates whether model vertex $i$ is included in the calculation of the objective function.
For 3D face reconstruction, one available technical solution reconstructs the 3D face based on the key points of a two-dimensional (2D) face image. This solution first extracts key points from the 2D face image; the key points may be located at, for example, 17 points on the facial contour, 5 on the left eyebrow, 5 on the right eyebrow, 6 on the left eye, 6 on the right eye, 4 on the bridge of the nose, 5 on the nose wings and 20 on the mouth contour, each key point characterizing the facial contour. Then, through the correspondence of the key points on a generic standard 3D face model, the positions of the corresponding feature points in the standard 3D face model are adjusted, and the standard 3D face model is deformed by interpolating the other, non-feature points, yielding the reconstructed 3D face model. As can be seen, this method requires extracting 2D key points from the input image. However, when 3D face reconstruction is performed from a profile image, the 2D key point information is self-occluded, so the 2D key points of the face are located inaccurately (i.e., cannot be extracted accurately), the reconstructed 3D object is poor, and the method adapts badly to the face pose in profile input images. For input images in which part of the face is invisible due to occlusion, the inaccurate 2D key point extraction makes the reconstruction extremely poor and may even cause it to fail.

Another available technical solution uses a consumer-grade RGB-D depth camera for 3D face reconstruction; it performs geometric registration on the point cloud corresponding to the current input image frame. The geometric registration step is mostly based on the ICP algorithm; this process is a fitting optimization problem requiring a large number of loop iterations. The main problem of this method is that, when the face is occluded, the face point cloud data of the occluded part of the image frame is unknown, so it is difficult to determine the information to be fitted, causing the reconstructed 3D object to fail.
In order to realize 3D face reconstruction when the face is partially occluded, the embodiments of the present application provide a face image processing method, which is an improved 3D face reconstruction method. Its basic principle is as follows: first, sparse local face features are extracted from the partial face area in the image and matched for similarity against the sparse features of the 3D face samples in a face database, obtaining the shape parameters corresponding to the matched 3D face sample, and a 3D head model is established from these shape parameters combined with a parameterized face model; meanwhile, the key points of the partial face area are identified and their 3D data obtained; then, taking the 3D data of the key points as the target, the established 3D head model undergoes a rigid body transformation, achieving a preliminary alignment between the 3D head model and the point cloud of the partial face area in the camera coordinate system; finally, fitting optimization is performed between the partial face area and the 3D head model, with the selection of optimization targets limited to the point cloud of the partial face area, completing the 3D face reconstruction. The method of the embodiments of the present application achieves a good 3D face reconstruction effect in scenarios where the face is intentionally or unintentionally partially occluded, or where a large-angle head pose causes self-occlusion, making it difficult to obtain the information to be fitted.
The embodiments of the present application can be applied to 3D reconstruction of the faces of people in vehicles, aircraft and other means of transport, such as drivers, so that the driver's head posture, gaze direction, etc. can be determined from the reconstructed 3D face to recognize the driver's state. They can also be applied to 3D face reconstruction of viewers in front of a television, students in a class, etc., so that these people's head postures, gaze directions, etc. can be determined from the reconstructed 3D faces to further determine their attention direction, attention level, etc., and then adjust the television content, the teaching method, and so on. They can also be applied to 3D face reconstruction from facial images collected by mobile terminals such as mobile phones, tablets and portable computers, as well as to technical fields such as face image restoration, for example restoring face images that are partially stained or incomplete, such as incomplete face images caused by cropping.

FIG. 1A and FIG. 1B show an example of applying an embodiment of the present application to a vehicle. The vehicle of this embodiment includes general motor vehicles, for example land transport devices including cars, sport utility vehicles (SUV), multi-purpose vehicles (MPV), buses, trucks and other cargo or passenger vehicles, water vehicles including various ships and boats, as well as aircraft. Motor vehicles further include hybrid vehicles, electric vehicles, fuel vehicles, plug-in hybrid vehicles, fuel cell vehicles and other alternative-fuel vehicles, where a hybrid vehicle refers to a vehicle with two or more power sources and electric vehicles include pure electric vehicles, extended-range electric vehicles, etc., which is not specifically limited in the present application. When the embodiment of the present application is applied to a vehicle 10, the vehicle 10 may include an image acquisition device 11 and a processor 12.

The image acquisition device 11 is configured to acquire an image including the face of an occupant (a driver or a passenger). In this embodiment, the image acquisition device 11 is a camera, which may be a binocular camera, an RGB-D camera, etc. The camera can be installed on the vehicle as required, for example in the cockpit of the vehicle. In this embodiment, specifically as shown in FIG. 1B, a binocular camera composed of two independent cameras is used: a first camera 111 and a second camera 112 arranged on the left and right A-pillars of the vehicle cockpit. In other examples, the camera may also be installed on the occupant-facing side of the rearview mirror in the vehicle cockpit, on the steering wheel, in the area near the center console, or above the display screen behind a seat, mainly to collect facial images of the driver or passengers in the vehicle cockpit.

In some other embodiments, the image acquisition device 11 may also be an electronic device that receives occupant image data transmitted by a camera, such as a data transmission chip, for example a bus data transceiver chip or a network interface chip; the data transmission chip may also be a wireless transmission chip, such as a Bluetooth chip or a WiFi chip. In other embodiments, the image acquisition device 11 may also be integrated into the processor, as an interface circuit or data transmission module integrated into the processor.

The processor 12 can be used to reconstruct a 3D face from the face in the image; in the embodiments of the present application, when the face in the image is partially occluded, the reconstruction can be performed from the partial face area of the image (e.g., the unoccluded face area). In some other embodiments, the processor 12 can also be used to recognize the occupant's head posture and/or gaze direction from the reconstructed 3D face, and can further determine the occupant's attention direction, attention level, etc. from the recognized head posture and gaze direction. When the embodiment of the present application is applied to the vehicle 10, the processor 12 may be an electronic device, for example the processor of an in-vehicle processing device such as a head unit, a domain controller, a mobile data center (MDC) or an on-board computer, or a conventional chip such as a central processing unit (CPU) or a microcontroller unit (MCU).
FIG. 2A shows an embodiment of the face image processing method of the present application, and FIG. 2B is a schematic diagram of the face image processing method of this embodiment. The embodiment of the face image processing method includes the following steps:

S10: For the image, acquire the local face shape features in the face image. In this embodiment, the face image is a 2D image with depth information; the face image contains a partial face area, and the partial face area contains an unoccluded partial face.
In some embodiments, the face image may be collected by a binocular camera or an RGB-D camera, or received through a data transmission chip. When a binocular camera is used, the depth information of the pixels can be calculated from the collected image pairs (an image pair being a pair of images collected by the binocular camera) and the camera parameters. When an RGB-D camera is used, the depth information of the pixels can be obtained directly.
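For the binocular case, depth is typically recovered from disparity on a rectified stereo pair via Z = f·B/d (focal length f in pixels, baseline B in meters, disparity d in pixels); the following sketch assumes the disparity map has already been computed, for example by stereo block matching, and the names are illustrative:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Depth for a rectified stereo pair: Z = f * B / d.

    disparity: (H, W) disparity map in pixels.
    Returns an (H, W) depth map in meters; non-positive disparities are
    treated as invalid and mapped to zero depth.
    """
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > eps
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```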
S20: According to the local face shape features, acquire a face sample matching the local face shape features, and then acquire the face shape parameters of the face sample.

In some embodiments, retrieval may be performed in a face database according to the extracted local face shape features to match a face sample similar to the local face shape features, for example the face sample with the highest similarity, and the face shape parameters of that face sample are then acquired.

In some embodiments, when matching a face sample similar to the local face shape features, the matching target may be: maximizing the correlation between the face shape features of the partial face area and the face shape features of the face sample, while minimizing the correlation residuals between the face shape features of the partial face area and the face shape features of the other face samples.

The face samples in the face database are unoccluded face samples; through this step, a similar face can be obtained, together with its face shape parameters.

In some embodiments, the face database can be built from real faces; a massive amount of data can probabilistically improve the accuracy of the matched face samples. Moreover, obtaining complete real-face samples from the face database makes the 3D face reconstructed by the method of the embodiments of the present application more realistic.
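A minimal sketch of the retrieval step follows, assuming each database sample stores a precomputed shape-feature vector together with its shape coefficients; cosine similarity is used here as one possible similarity measure, since the original does not fix a particular metric:

```python
import numpy as np

def match_face_sample(query_feat, db_feats, db_shape_params):
    """Return the shape parameters of the most similar database face sample.

    query_feat:      (d,) local face shape feature from the partial face area.
    db_feats:        (N, d) features of the unoccluded face samples.
    db_shape_params: (N, n) shape coefficients stored for each sample.
    """
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    D = db_feats / (np.linalg.norm(db_feats, axis=1, keepdims=True) + 1e-12)
    sims = D @ q                 # cosine similarity to every sample
    best = int(np.argmax(sims))  # highest-similarity sample wins
    return db_shape_params[best]
```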
S30: Generate a 3D face model by using the obtained face shape parameters.

In some embodiments, the 3D face model can be generated based on a parameterized 3D face model. For this step, refer to formula (3) above: the obtained face shape parameters are substituted into formula (3) to generate the 3D face model.

In some embodiments, after the above step S30, a step S40 may further be included: fitting the 3D face model based on the point cloud data of the partial face area, the fitting including pose transformation and shape fitting. The point cloud data of the partial face area can be obtained from the depth values of the pixels and the camera parameters.
In some embodiments, as shown in the flowchart of FIG. 3, the above step S10 may include the following steps S11-S13:

S11: Acquire the target area in the face image, the target area including the partial face area.

The original image includes not only the face to be reconstructed but also other content. To avoid unnecessary key point determination on other parts of the image in subsequent steps and to improve processing efficiency, the target area is first extracted from the image through face detection; the target area may also be called a region of interest (ROI).

In some embodiments, the target area is an area that includes the partial face area; it may be a rectangular area, a circular area or an area of any shape, and in this embodiment it may be a rectangular area. The partial face area included in this area refers to the unoccluded area of the face, and the non-face area within it includes the occluder area and the background area.

In some embodiments, the target area can be extracted using networks such as convolutional neural networks (CNN), a region proposal network (RPN), regions with CNN features (RCNN), Faster-RCNN, the MobileV2 network (a lightweight neural network), or a combination of several networks.
S12: Extract the partial face area from the target area image.

In some embodiments, the partial face area can be acquired according to the color values of the pixels of the target area. This is because facial skin color can be distinguished from non-face parts, such as the background and certain occluders (e.g., masks, cups, water bottles, glasses), so the partial face area within the target area can be extracted based on pixel color values.

In some embodiments, the partial face area can be acquired according to the depth values of the pixels of the target area. This is because the spatial position (or depth) of the face differs from that of certain non-face parts, such as the background and certain non-sheet-like occluders (e.g., cups, water bottles, hands, arms), so the partial face area can be extracted based on pixel depth values.

In some embodiments, the occluded face area may first be extracted from the target area image, and the partial face area may then be extracted from the difference area between the target area and the occluded face area according to the color values and/or depth values of the pixels. For this embodiment, the following two specific implementations are possible:
1. Calculate an average reference color of the pixel colors in the difference area; retain the pixels in the difference area whose color differs from the average reference color by less than a threshold, and remove the pixels whose difference exceeds the threshold. The main principle is that most pixels in the difference area belong to the partial face area, while pixels whose color differs greatly from the partial face area are likely non-face pixels such as the background, so these pixels are removed.

In some implementations, the average reference color may be the mean of these pixel colors. It may also be a mean color computed with weights based on each pixel's position (e.g., the closer to the edge of the difference area, the lower the weight), or with weights based on how much each pixel differs from typical face colors (e.g., the greater the difference, the lower the weight).

In other implementations, the mean can be computed by fitting a Gaussian function, specifically: for the pixels in the difference area, fit a Gaussian function using the RGB three-channel colors as color coordinates, obtaining the mean and standard deviation of the Gaussian; use the mean as the average reference color and the standard deviation as the threshold; taking the mean as the color center coordinates, compute the distance from each pixel's color coordinates to the color center coordinates, and retain the pixels whose distance is less than the standard deviation.
2. Compute an average reference depth of the depth values of the pixels in the occluded area; retain the pixels in the difference area whose depth value is lower than the average reference depth, and remove the pixels whose depth value is greater than or equal to it. The underlying principle is that the depth values of the occluded area clearly differ from those of the partial face area: a pixel in the difference area whose depth is not smaller than that of the occluded area is very unlikely to belong to the face, and is therefore removed.

In some implementations, the average reference depth may be the mean depth of the pixels in the occluded area. It may also be a weighted mean that accounts for the position of each pixel (for example, a lower weight closer to the edge of the occluded area).
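A corresponding minimal sketch of the depth-based filter, assuming a depth map `depth` aligned with the color image and a boolean mask `occ_mask` marking the occluded area (names again illustrative):

```python
import numpy as np

def filter_face_pixels_by_depth(depth, diff_mask, occ_mask):
    """Remove difference-area pixels that are at least as far away as
    the occluder, since such pixels are very unlikely to be face."""
    ref_depth = depth[occ_mask].mean()           # average occluder depth

    face_mask = diff_mask & (depth < ref_depth)  # keep only closer pixels
    return face_mask
```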
For this step S12, in some embodiments, networks such as a CNN, Mask RCNN, fully convolutional networks (FCN), or the YoloV2 network (a neural network for small-object detection), or a combination of several networks, may also be used to extract the partial face area.

S13: For the extracted partial face area, extract face shape features from the partial face area.

In some embodiments, the extracted face shape features may be regular features or sparse features. The sparsity of sparse features can reduce the complexity of training the neural network, reduce redundant features in the network, and reduce the required storage space.

In some embodiments, the extraction of features from the partial face area, for example the extraction of regular or sparse features, may be implemented with a feature extraction network, such as a CNN or a residual network (e.g., ResNet50).
In some embodiments, as shown in the flowchart of FIG. 4 for fitting the 3D face model to the point cloud of the partial face area, the above step S40 includes the following steps S41-S43:

S41: Extract key points of the partial face area, and obtain the 3D coordinates of the key points.

In some embodiments, this step may proceed as follows: the target area is input to a keypoint detection network, such as a CNN or an Hourglass network, which identifies the key points among the pixels of the target area and determines their 2D position information. The depth information of the key points can then be determined from the depth information of the image, and the 3D coordinates of the key points are computed from the intrinsic parameters of the depth camera.

In other embodiments, during keypoint extraction, instead of inputting the image of the target area into the keypoint detection network, the image of the partial face area extracted in step S10 may be input.

When computing the 3D coordinates of the key points, the 3D coordinates of every pixel of the partial face area may also be computed from the depth information of the partial face area and the camera intrinsics, i.e., the point cloud information of the partial face area may be computed.
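A minimal sketch of this back-projection under the standard pinhole camera model, assuming `fx, fy, cx, cy` are the intrinsics of the depth camera (names are assumptions):

```python
import numpy as np

def backproject_to_3d(us, vs, depth, fx, fy, cx, cy):
    """Lift pixel coordinates (us, vs) to camera-frame 3D points using
    their depth values and the pinhole intrinsics."""
    z = depth[vs, us]                 # depth at each pixel
    x = (us - cx) * z / fx            # X = (u - cx) * Z / fx
    y = (vs - cy) * z / fy            # Y = (v - cy) * Z / fy
    return np.stack([x, y, z], axis=-1)

# Usage: keypoint 3D coordinates, or the whole face point cloud when
# (us, vs) enumerate every pixel of the partial face area.
```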
S42: Transform the pose of the 3D face model according to the 3D coordinates of the key points, achieving a preliminary alignment between the 3D face model and the point cloud of the partial face area.

The pose change may be a rigid transformation: the 3D coordinates of the face key points are used as the target position for fitting the three-dimensional face shape, and the established 3D face model is transformed to that target position through a rigid transformation.

In some embodiments, the rigid transformation of the 3D face model may be implemented with a point cloud matching algorithm, such as ICP, NDT, or IDC.
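For a given set of correspondences between model key points and detected 3D key points, a single rigid-alignment step has a closed-form solution; ICP iterates such a step with nearest-neighbor correspondence search. A minimal sketch of the closed-form step (the Kabsch solution, shown as an illustration rather than the full ICP loop):

```python
import numpy as np

def rigid_align(src, dst):
    """Closed-form rigid transform (R, t) minimizing ||R @ src + t - dst||,
    via SVD of the cross-covariance matrix (the Kabsch solution)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Usage: R, t = rigid_align(model_keypoints, detected_keypoints_3d)
#        aligned_vertices = model_vertices @ R.T + t
```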
S43: After the 3D face model has been preliminarily aligned with the point cloud of the partial face area, fit the shape of the 3D face model to the point cloud to complete the 3D face reconstruction for this face.

When performing the shape fitting, the shape parameters may be fitted with an iterative algorithm, such as the quasi-Newton method or Newton's method. The fitting may include fitting parameters such as the lips and the eyes, i.e., optimizing the shape coefficients of the 3D face model; once the fitting is completed, the reconstruction of the details of the 3D face is completed.
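A minimal sketch of such quasi-Newton shape fitting, assuming a linear parameterized model whose flattened vertices are `T + S @ q` (base shape `T`, shape basis `S`) and using L-BFGS from SciPy; the model form and all names are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial import cKDTree

def fit_shape(T, S, point_cloud, q0):
    """Optimize shape coefficients q so that the model vertices T + S @ q
    stay close to the partial-face point cloud (quasi-Newton L-BFGS)."""
    tree = cKDTree(point_cloud)

    def residual(q):
        verts = (T + S @ q).reshape(-1, 3)    # model vertices for q
        d, _ = tree.query(verts)              # nearest point-cloud distance
        return np.mean(d ** 2)

    res = minimize(residual, q0, method="L-BFGS-B")
    return res.x
```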
To facilitate a better understanding of the present invention, a specific implementation of applying the present application to 3D face reconstruction is described in detail below. In this implementation, a monocular depth camera (also referred to as an RGB-D camera) is used to collect image data with depth; the image data includes ordinary RGB color 2D image information and depth information (a depth map). Referring to the flowchart shown in FIG. 5, this implementation includes the following steps:

S110: Acquire RGB-D image data, which may come from a depth camera. The RGB-D image data contains an occluded face and, as described above, each image frame consists of a 2D image (color image frame) and a depth image (depth image frame); it can be understood that the pixel at each position carries both 2D information (such as RGB data) and depth information. The 2D image has color and texture, so the keypoint positions of the face in the unoccluded area of the object to be reconstructed can be identified from the 2D image.
S112: Perform occluded-face detection on the image data, and identify and extract the partial face area of the object to be reconstructed in the color 2D image. This specifically includes:

First, the MobileV2 network is used to perform occluded-face detection on the color image frame and to determine the target region (ROI) of the color image in which the object to be reconstructed is located; the target region contains the partial face area and the occluded area of the face.

After the target region is determined, it is obtained by cropping and then scaled to a target pixel size for subsequent processing, for example to a size of 512x512 pixels.

S114: Input the scaled target region into the YoloV2 network to detect occluders, thereby determining the occluded area of the face.

S116: Extract the partial face area from the target region; this specifically includes the following two approaches. In addition, once the partial face area has been extracted, the point cloud information of the partial face area can be computed.
1. Color-based face extraction: the difference area between the aforementioned target region and the occluded area is used as the RGB color reference area. For all pixels in this reference area, a Gaussian function is fitted with the RGB three-channel colors as coordinates, yielding a mean and a standard deviation. The mean is then taken as the center point coordinates, the distance from each pixel to the center point is computed, and the RGB information of a pixel is retained if the distance is within one standard deviation; otherwise its RGB information is removed, for example by setting the pixel's RGB values to zero.

In this specific implementation, most of the difference area between the target region and the occluded area is the partial face area, so the mean of the Gaussian fitted to the three-channel colors corresponds to the face color. Pixels whose color difference from this mean is within the threshold (i.e., pixels of similar color) are retained, and pixels whose color difference exceeds the threshold (i.e., pixels with a large color difference) are removed, thereby extracting the partial face area.

2. Within the reference area there may remain pixel regions that are close to the face in color and were therefore not removed, but whose depth differs substantially from that of the face; such regions are unlikely to belong to the face. Therefore, in some implementations, these pixel regions may further be removed based on depth values, as follows:

The mean depth of the occluded area is used as a threshold, and the pixels in the reference area whose depth values are greater than or equal to the threshold are removed, for example by setting their RGB values to zero.

Through the above color-based and depth-based processing, a fairly accurate partial face area can be extracted.

After the partial face area is extracted, the 3D coordinates of each of its pixels can be computed from the depth information of the corresponding pixels of the input depth image and the intrinsic matrix of the depth camera, i.e., the point cloud information of the partial face area is obtained for use in the later steps.

S118: Pass the partial face area through a feature extraction network, such as a ResNet50 network, to extract partial face shape features. Sparse features are extracted here: the feature extraction network regresses and outputs the sparse features of the partial face shape, here an m-dimensional sparse feature $X$, $X=[X_1, X_2, \dots, X_m]$.
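A minimal sketch of such a sparse-feature regressor, assuming PyTorch/torchvision; the regression head and the choice of m are assumptions, not the disclosed architecture:

```python
import torch.nn as nn
from torchvision.models import resnet50

class SparseFaceFeatureNet(nn.Module):
    """ResNet50 backbone regressing an m-dimensional sparse feature X."""
    def __init__(self, m=128):
        super().__init__()
        backbone = resnet50(weights=None)
        # Replace the classification head with an m-dimensional regressor.
        backbone.fc = nn.Linear(backbone.fc.in_features, m)
        self.backbone = backbone

    def forward(self, face_crop):        # face_crop: B x 3 x H x W
        return self.backbone(face_crop)  # B x m sparse feature X
```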
S120: Match the sparse feature $X$ of the partial face shape against the sparse features $A$ of the face samples in the 3D face database based on similarity, match a similar face sample, and obtain the face shape parameters of the matched sample as the shape parameters $q$ of the 3D face model to be built. The specific implementation of this step is detailed below:

Suppose $D=[D_1, D_2, \dots, D_k]\in\mathbb{R}^{n\times k}$ is the sample set of the 3D face database, where $D_i=[d_{i1}, d_{i2}, \dots, d_{in}]\in\mathbb{R}^n$ is the n-dimensional feature vector formed by the shape parameters of the i-th face sample.

First, the sparse features of each face sample are extracted. Since the 3D faces in the 3D face database vary considerably due to differences in expression, illumination, and shooting angle, a projection matrix $W$ is first obtained by a normally distributed random sampling process, and $W$ is then used to perform sparse feature extraction on the training samples $D$, yielding the corresponding face sparse feature matrix $A$, where $A=W^{T}D$.

Compared with the original data $D$, the data $A$ is a sparse representation of the face features, where $A_i$ is the reduced-dimension representation of $D_i$, i.e., the sparse feature of the i-th face sample. When the sparse feature dimension is $m$, it can be expressed as $A_i=[A_{i1}, A_{i2}, \dots, A_{im}]\in\mathbb{R}^m$.

Then, the sparse feature $A$ of each face sample is compared with the sparse feature $X$ of the current partial face shape, a face sample is matched according to the similarity, and the 3D face shape feature parameters of that face sample are taken as the shape feature parameters $q$ of the 3D face model to be built. The matching process is as follows:
The correlation between the sparse feature $X$ of the partial face shape and the sparse feature $A_b$ of the b-th face sample is computed by the formula $\delta_i(x)=(X_i-A_b)$, where $i=1,\dots,m$.

The association residual between the sparse feature $X$ of the partial face shape and the sparse features $A_j$ of the remaining $k-1$ face samples is computed by the formula $\sigma_i(x)=\sum_{j\neq b}(X_i-A_j)$, where $j=1,\dots,k$ and $j\neq b$.
Combining the above correlation $\delta_i(x)$ and association residual $\sigma_i(x)$, a formula (rendered only as an image in the original, Figure PCTCN2021104294-appb-000003) gives the similarity $S_i$ between the visible area of the face and the sparse features of the i-th face sample.
By cyclically comparing all face samples in the 3D face database, the face sample $i$ that maximizes the sparse feature similarity $S_i$ is found; the face feature vector $D_i$ corresponding to that face sample is the shape feature parameter $q$ of the 3D face model to be built, i.e., $q=D_i=[d_{i1}, d_{i2}, \dots, d_{in}]$.
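A minimal sketch of this matching, using the random projection $A=W^{T}D$ described above; since the exact similarity formula is rendered only as an image in the original, the combination below (association residual to the other samples divided by the residual to the candidate) is an assumption for illustration:

```python
import numpy as np

def match_face_sample(X, D, m, rng=np.random.default_rng(0)):
    """Return shape parameters q = D_i of the database sample whose
    sparse feature best matches X (similarity form is assumed)."""
    n, k = D.shape
    W = rng.standard_normal((n, m))      # normally sampled projection
    A = W.T @ D                          # m x k sparse feature matrix

    best_i, best_S = -1, -np.inf
    for b in range(k):
        # Aggregate per-dimension differences to scalars (an assumption).
        delta = np.abs(X - A[:, b]).sum()
        sigma = sum(np.abs(X - A[:, j]).sum() for j in range(k) if j != b)
        S = sigma / (delta + 1e-12)      # assumed combination of delta, sigma
        if S > best_S:
            best_i, best_S = b, S
    return D[:, best_i]                  # q = D_i
```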
S122: Using the 3D face shape feature parameters $q$ (i.e., $[d_{i1}, d_{i2}, \dots, d_{in}]$) of the matched face sample, preliminarily establish a 3D face model through the above formula (3) in combination with the parameterized face model; this is also referred to as completing the initialization of the 3D face model.

In the above formula (3), the base shape $T$ and the shape vectors $S_i$ are provided by the parameterized 3D face model; inputting the 3D face shape feature parameters $q$ of the matched face sample into the parameterized 3D face model yields the initialized 3D face model.

S124: In parallel, the image data of the target region scaled in step S112 is used as the input to an Hourglass network, which outputs, for each pixel of the target region, the probability that the pixel is a key point. The maxima of this probability are determined to be the key points, and the 2D position information of each key point of the partial face area is then obtained.

S126: Due to occlusion interference, the performance of the Hourglass network is affected to a certain extent, and the key points it determines for the partial face area may include key points in the occluded area. Therefore, the key points falling within the occluded area determined in step S114 can be further removed, i.e., this part of the noise is eliminated, yielding more accurate keypoint information of the partial face area for use in the subsequent steps.
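A minimal sketch of steps S124-S126 together, assuming per-keypoint probability maps `heatmaps` (K×H×W) from the Hourglass network and the occluded-area mask `occ_mask` from step S114 (names illustrative):

```python
import numpy as np

def extract_visible_keypoints(heatmaps, occ_mask):
    """Take the probability maximum of each heatmap as a 2D key point,
    then drop key points that fall inside the occluded area."""
    keypoints = []
    for hm in heatmaps:                        # one heatmap per key point
        v, u = np.unravel_index(np.argmax(hm), hm.shape)
        if not occ_mask[v, u]:                 # keep only unoccluded points
            keypoints.append((u, v))
    return np.array(keypoints)
```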
S128: After the position information of the key points is determined, the 3D coordinates of each key point of the partial face area can be computed, from the depth information corresponding to each pixel position of the input depth image, through the intrinsic matrix of the depth camera.

S130: Align the 3D face model established in step S122 with the 3D coordinates of the key points of the partial face area, i.e., use the 3D coordinates of these key points as the target position and initialize the 3D face model established in step S122 to that target position.

In this implementation, the ICP algorithm is used to apply a rigid transformation to the 3D face model, i.e., to translate and rotate the 3D face model to the corresponding target position, achieving a preliminary positional alignment with the face point cloud.

S132: For the aligned 3D face model, a parameter fitting algorithm is further used to fit the shape parameters, so as to minimize the positional difference between the shape of the 3D face model and the point cloud of the partial face area and complete the reconstruction of the 3D face model. The parameter fitting algorithm may be a quasi-Newton algorithm, Newton's algorithm, gradient descent, and so on; owing to the fast convergence and low computational complexity of the quasi-Newton algorithm, it is adopted in this embodiment.

This fitting process takes the 3D coordinates of the point cloud of the partial face area, computed in step S116, as the fitting target. Since the fitting is performed against the point cloud of the partial face area, the selection of the optimization target of the 3D face model is restricted to that point cloud. For the fitting process, refer to the earlier description of shape fitting with the quasi-Newton algorithm, which is not repeated here.
The following further describes a scenario in which the embodiments of the present application are applied to a driver monitor system (DMS) on a vehicle. A DMS can monitor the driver's state, for example fatigue monitoring, distraction monitoring (attention monitoring), gaze tracking, and dangerous behavior monitoring (such as using a mobile phone or eating). An example follows:

First, the DMS collects images of the driver through a camera in the vehicle cockpit. The camera may be a binocular camera, an RGB-D camera, and so on, and may be installed on the vehicle as required, for example at the position of the interior rearview mirror in the cockpit, or on the steering wheel, or in the area near the center console. In this embodiment, the camera is the binocular camera shown in FIG. 1B, composed of two cameras arranged on the left and right A-pillars of the vehicle cockpit.

After the DMS collects the driver's images through the camera, the face image processing method provided by the embodiments of the present application can be used to process the collected images and reconstruct the driver's 3D face. This applies in particular when the face in the collected image is incomplete. The incomplete-face cases include partial occlusion of the face, for example when the driver wears sunglasses, or when a cup, mobile phone, hand, or arm partially occludes the face while the driver drinks water or makes a phone call. They also include cases in which part of the face is not captured, for example when a large movement or large-angle rotation of the driver's head moves part of the face outside the image capture area of the camera, or when a head turn (such as turning the head backward) prevents part of the face from being captured by the camera.

After the driver's 3D face is reconstructed, the driver's state can be further detected based on the reconstructed 3D face. For example, the driver's head pose can be detected from the 3D face, including whether the head has moved over a large range or rotated through a large angle; or head pose changes over a period of time can be combined to detect whether the head is in an abnormal state, including frequently lowering the head (which may serve, for example, as a basis for judging whether the driver is dozing off or looking at a mobile phone) and tilting the head up or to one side for a certain period of time (which may serve, for example, as a basis for judging whether the driver is asleep). As another example, the driver's facial state can be detected from the 3D face, including eye openness (for example, whether the eye openness falls below a threshold may serve as a basis for judging whether the driver is drowsy), mouth openness (which may serve, for example, as a basis for judging whether the driver is dozing off), and gaze direction (which may serve, for example, as a basis for judging the driver's attention).
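A minimal sketch of such an eye-openness check, assuming openness is measured from reconstructed 3D eyelid landmarks and normalized by eye width; the landmark names, threshold, and frame count are assumptions:

```python
import numpy as np

def eye_openness(upper_lid, lower_lid, eye_left, eye_right):
    """Normalized eye openness from reconstructed 3D eyelid landmarks."""
    return (np.linalg.norm(upper_lid - lower_lid)
            / np.linalg.norm(eye_left - eye_right))

def is_drowsy(openness_history, threshold=0.15, min_frames=30):
    """Flag drowsiness when openness stays below threshold for a while."""
    recent = openness_history[-min_frames:]
    return len(recent) == min_frames and all(o < threshold for o in recent)
```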
Based on the detected driver state and the current driving scenario, the DMS determines whether to warn the driver. Here, the driving scenario may refer to the driving scenario in combination with the current autonomous driving level (levels L0-L5): for example, when the autonomous driving level is low, the threshold for triggering a driver warning is relatively low, and when the autonomous driving level is high, the threshold for triggering a driver warning is relatively high.

In some embodiments, the DMS may also provide the detected driver state to a vehicle control apparatus, which judges whether to take over the driving control of the vehicle. The driving control after takeover includes automatically decelerating the vehicle and pulling over to the roadside, and also includes performing autonomous driving (for level L4 and L5 autonomous driving), for example autonomously driving through an area (such as exiting an expressway) or for a period of time, until the vehicle reaches a safe position and stops.
As shown in FIG. 6, the present application further provides a corresponding embodiment of a face image processing apparatus. For the beneficial effects of the apparatus or the technical problems it solves, reference may be made to the descriptions of the methods corresponding to each apparatus, or to the description in the summary of the invention, which are not repeated here.

In this embodiment, the face image processing apparatus 600 includes:

An acquisition module 610, configured to acquire partial face shape features in a face image, to acquire a face sample matching the partial face shape features, and to acquire the face shape parameters of the face sample. Specifically, this module may be configured to execute steps S10-S20 of the above face image processing method and the examples therein, or steps S110-S120 of the specific implementation of the above face image processing method and the examples therein.

A generation module 620, configured to generate a three-dimensional face model using the face shape parameters. Specifically, this module may be configured to execute steps S30 and S40 of the above face image processing method and the examples therein, or steps S122-S132 of the specific implementation of the above face image processing method and the examples therein.

In some embodiments, the generation module 620 is further configured to fit the generated three-dimensional face model to the point cloud data of the partial face area, the partial face area being contained in the face image.

In some embodiments, when used for the fitting, the generation module 620 is specifically configured to: acquire key points of the partial face area and obtain the three-dimensional coordinates of the key points; transform the pose of the three-dimensional face model according to the three-dimensional coordinates of the key points; and fit the pose-transformed three-dimensional face model to the point cloud data of the partial face area.

In some embodiments, the point cloud data of the partial face area is obtained according to the pixel depth values of the partial face area and camera parameters.

In some embodiments, the three-dimensional coordinates of the key points of the partial face area are obtained according to the depth values of the key points and camera parameters.

In some embodiments, the acquisition module 610 is specifically configured to: acquire a target area in the face image, the target area including the partial face area; acquire the partial face area in the target area; and acquire the partial face shape features of the partial face area.

In some embodiments, when used to acquire the partial face area in the target area, the acquisition module 610 is specifically configured for at least one of the following: acquiring the partial face area according to the color values of the pixels in the target area; acquiring the partial face area according to the depth values of the pixels in the target area.

In some embodiments, when used to acquire the face sample matching the partial face shape features, the acquisition module 610 is specifically configured to retrieve, from a face database, a face sample matching the partial face shape features.

In some embodiments, the generation module 620 is specifically configured to generate the three-dimensional face model using the face shape parameters, based on a parameterized three-dimensional face model.
Table 1 shows the failure rates of traditional 3D face reconstruction and of 3D face reconstruction using the method of the embodiments of the present application, estimated by discrete random event model simulation for faces occluded in different proportions. Each experiment was run 1000 times, and the proportion of deformed face structures (i.e., the 3D face reconstruction failure rate) was calculated, with the results shown below:
Table 1
[Table 1 data is rendered as images in the original (Figure PCTCN2021104294-appb-000004 and Figure PCTCN2021104294-appb-000005): 3D face reconstruction failure rates of the traditional method and of the method of the present application at different face occlusion ratios.]
It can be seen from the above that the method of the embodiments of the present application achieves better robustness in 3D face reconstruction: compared with traditional 3D face reconstruction, the reconstruction failure rate is low under all face occlusion ratios, and the failure rate remains low even when the face occlusion is large.
An embodiment of the present application further provides an electronic apparatus, including a processor and a memory storing program instructions which, when executed by the processor, cause the processor to execute the methods of the embodiments corresponding to FIG. 2A to FIG. 5, or the optional embodiments therein. FIG. 7 is a schematic structural diagram of an electronic apparatus 700 provided by an embodiment of the present application. The electronic apparatus 700 includes a processor 710 and a memory 720.

It should be understood that the electronic apparatus 700 shown in FIG. 7 may further include a communication interface 730, which may be used for communication with other devices.

The processor 710 may be connected to the memory 720. The memory 720 may be used to store the program code and data. Therefore, the memory 720 may be a storage unit inside the processor 710, an external storage unit independent of the processor 710, or a component including both a storage unit inside the processor 710 and an external storage unit independent of the processor 710.

Optionally, the electronic apparatus 700 may further include a bus, through which the memory 720 and the communication interface 730 may be connected to the processor 710. The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on.

It should be understood that, in the embodiments of the present application, the processor 710 may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. Alternatively, the processor 710 uses one or more integrated circuits to execute related programs so as to implement the technical solutions provided by the embodiments of the present application.

The memory 720 may include read-only memory and random access memory, and provides instructions and data to the processor 710. A part of the processor 710 may also include non-volatile random access memory. For example, the processor 710 may also store device type information.

When the electronic apparatus 700 runs, the processor 710 executes the computer-executable instructions in the memory 720 to perform the operational steps of the above face image processing method, for example the methods of the embodiments corresponding to FIG. 2A to FIG. 5, or the optional embodiments therein.

It should be understood that the electronic apparatus 700 according to the embodiments of the present application may correspond to a corresponding subject executing the methods according to the embodiments of the present application, and the above and other operations and/or functions of the modules in the electronic apparatus 700 are respectively intended to implement the corresponding procedures of the methods of the embodiments; for brevity, they are not repeated here.

An embodiment of the present application further provides another electronic apparatus. FIG. 8 is a schematic structural diagram of another electronic apparatus 800 provided by this embodiment, including a processor 810 and an interface circuit 820, where the processor 810 accesses a memory through the interface circuit 820, the memory stores program instructions, and the program instructions, when executed by the processor, cause the processor to execute the methods of the embodiments corresponding to FIG. 2A to FIG. 5, or the optional embodiments therein. In addition, the electronic apparatus may further include a communication interface, a bus, and so on; for details, refer to the description of the embodiment shown in FIG. 7, which is not repeated here.
As shown in FIG. 9A or 9B, an embodiment of the present application further provides a vehicle 100, including an image acquisition apparatus 110 for acquiring face images, and the face image processing apparatus 120 with its included embodiments, or an electronic apparatus 130. The image acquisition apparatus 110 provides the acquired face images to the face image processing apparatus 120 or to the electronic apparatus 130, which implements the above face image processing method and its embodiments according to the face images, thereby achieving 3D face reconstruction. The image acquisition apparatus 110 may be a camera, such as a binocular camera, an RGB-D camera, or an IR camera, which may be installed on the vehicle as required. In some embodiments, as in the examples shown in FIG. 1A and FIG. 1B, it may be a binocular camera composed of two cameras arranged on the left and right A-pillars of the vehicle cockpit. In some other embodiments, it may also be installed on the occupant-facing side of the interior rearview mirror in the vehicle cockpit, or on the steering wheel, in the area near the center console, and so on. In some other embodiments, the image acquisition apparatus 110 may also be an electronic device that receives occupant image data transmitted by a camera, such as a data transmission chip, for example a bus data transceiver chip or a network interface chip; the data transmission chip may also be a wireless transmission chip, such as a Bluetooth chip or a WiFi chip.
FIG. 10 is a schematic structural diagram of a computing device 900 provided by an embodiment of the present application. The computing device 900 includes a processor 910 and a memory 920, and may further include a communication interface 930.

It should be understood that the communication interface 930 of the computing device 900 shown in FIG. 10 may be used for communication with other devices.

The processor 910 may be connected to the memory 920. The memory 920 may be used to store the program code and data. Therefore, the memory 920 may be a storage unit inside the processor 910, an external storage unit independent of the processor 910, or a component including both a storage unit inside the processor 910 and an external storage unit independent of the processor 910.

Optionally, the computing device 900 may further include a bus, through which the memory 920 and the communication interface 930 may be connected to the processor 910. The bus may be a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on.

When the computing device 900 runs, the processor 910 executes the computer-executable instructions in the memory 920 to perform the operational steps of the above method.

It should be understood that the computing device 900 according to the embodiments of the present application may correspond to a corresponding subject executing the methods according to the embodiments of the present application, and the above and other operations and/or functions of the modules in the computing device 900 are respectively intended to implement the corresponding procedures of the methods of the embodiments; for brevity, they are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs the above face image processing method, which includes at least one of the solutions described in the above embodiments.

An embodiment of the present application further provides a computer program product including program instructions which, when executed by a computer, implement the above face image processing method, which includes at least one of the solutions described in the above embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application. If the aforementioned functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium, including various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

In the several embodiments provided in this application, the disclosed methods and apparatuses may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a logical functional division, and there may be other division methods in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.

Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present application. Therefore, although the present application has been described in considerable detail through the above embodiments, it is not limited to them and may include more other equivalent embodiments without departing from the concept of the present application, all of which fall within the protection scope of the present application.

Claims (23)

  1. A face image processing method, characterized by comprising:
    acquiring a partial face shape feature in a face image;
    acquiring a face sample matching the partial face shape feature, and acquiring face shape parameters of the face sample;
    generating a three-dimensional face model.
  2. The method according to claim 1, characterized by further comprising:
    fitting the generated three-dimensional face model to point cloud data of a partial face area, the partial face area being contained in the face image.
  3. The method according to claim 2, characterized in that the fitting comprises:
    acquiring key points of the partial face area, and acquiring three-dimensional coordinates of the key points;
    performing pose transformation on the three-dimensional face model according to the three-dimensional coordinates of the key points;
    fitting the pose-transformed three-dimensional face model to the point cloud data of the partial face area.
  4. The method according to claim 2 or 3, characterized in that the point cloud data of the partial face area is obtained according to pixel depth values of the partial face area and camera parameters.
  5. The method according to claim 3, characterized in that the three-dimensional coordinates of the key points of the partial face area are obtained according to depth values of the key points and camera parameters.
  6. The method according to any one of claims 1-5, characterized in that the acquiring a partial face shape feature in a face image comprises:
    acquiring a target area in the face image, the target area comprising the partial face area;
    acquiring the partial face area in the target area;
    acquiring the partial face shape feature of the partial face area.
  7. The method according to claim 6, characterized in that the acquiring the partial face area in the target area comprises at least one of the following:
    acquiring the partial face area according to color values of pixels in the target area;
    acquiring the partial face area according to depth values of pixels in the target area.
  8. The method according to any one of claims 1-7, characterized in that the acquiring a face sample matching the partial face shape feature comprises: retrieving, from a face database, a face sample matching the partial face shape feature.
  9. The method according to any one of claims 1-8, characterized in that the generating a three-dimensional face model comprises: generating the three-dimensional face model using the face shape parameters, based on a parameterized three-dimensional face model.
  10. A face image processing apparatus, characterized by comprising:
    an acquisition module, configured to acquire a partial face shape feature in a face image, acquire a face sample matching the partial face shape feature, and acquire face shape parameters of the face sample;
    a generation module, configured to generate a three-dimensional face model.
  11. The apparatus according to claim 10, characterized in that the generation module is further configured to fit the generated three-dimensional face model to point cloud data of a partial face area, the partial face area being contained in the face image.
  12. The apparatus according to claim 11, characterized in that, when used for the fitting, the generation module is specifically configured to:
    acquire key points of the partial face area, and acquire three-dimensional coordinates of the key points;
    perform pose transformation on the three-dimensional face model according to the three-dimensional coordinates of the key points;
    fit the pose-transformed three-dimensional face model to the point cloud data of the partial face area.
  13. The apparatus according to claim 11 or 12, characterized in that the point cloud data of the partial face area is obtained according to pixel depth values of the partial face area and camera parameters.
  14. The apparatus according to claim 12, characterized in that the three-dimensional coordinates of the key points of the partial face area are obtained according to depth values of the key points and camera parameters.
  15. The apparatus according to any one of claims 10-13, characterized in that the acquisition module is specifically configured to:
    acquire a target area in the face image, the target area comprising the partial face area;
    acquire the partial face area in the target area;
    acquire the partial face shape feature of the partial face area.
  16. The apparatus according to claim 15, characterized in that, when used to acquire the partial face area in the target area, the acquisition module is specifically configured for at least one of the following:
    acquiring the partial face area according to color values of pixels in the target area;
    acquiring the partial face area according to depth values of pixels in the target area.
  17. The apparatus according to any one of claims 10-16, characterized in that, when used to acquire the face sample matching the partial face shape feature, the acquisition module is specifically configured to retrieve, from a face database, a face sample matching the partial face shape feature.
  18. The apparatus according to any one of claims 10-17, characterized in that the generation module is specifically configured to generate the three-dimensional face model using the face shape parameters, based on a parameterized three-dimensional face model.
  19. An electronic apparatus, characterized by comprising:
    a processor, and
    a memory on which program instructions are stored, the program instructions, when executed by the processor, implementing the face image processing method according to any one of claims 1-9.
  20. An electronic apparatus, characterized by comprising:
    a processor, and an interface circuit,
    wherein the processor accesses a memory through the interface circuit, the memory stores program instructions, and the program instructions, when executed by the processor, implement the face image processing method according to any one of claims 1-9.
  21. A vehicle, characterized by comprising:
    an image acquisition apparatus, configured to acquire face images, and
    the face image processing apparatus according to any one of claims 10-18.
  22. A computer-readable storage medium, characterized in that program instructions are stored in the computer-readable storage medium, the program instructions, when executed by a computer, implementing the face image processing method according to any one of claims 1-9.
  23. A computer program product, characterized by comprising program instructions which, when executed by a computer, cause the computer to implement the face image processing method according to any one of claims 1-9.
PCT/CN2021/104294 2021-07-02 2021-07-02 Facial image processing method and apparatus, and vehicle WO2023272725A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180002021.5A CN113632098A (en) 2021-07-02 2021-07-02 Face image processing method and device and vehicle
PCT/CN2021/104294 WO2023272725A1 (en) 2021-07-02 2021-07-02 Facial image processing method and apparatus, and vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/104294 WO2023272725A1 (en) 2021-07-02 2021-07-02 Facial image processing method and apparatus, and vehicle

Publications (1)

Publication Number Publication Date
WO2023272725A1

Family

ID=78391353

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/104294 WO2023272725A1 (en) 2021-07-02 2021-07-02 Facial image processing method and apparatus, and vehicle

Country Status (2)

Country Link
CN (1) CN113632098A (en)
WO (1) WO2023272725A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721194A (en) * 2023-08-09 2023-09-08 瀚博半导体(上海)有限公司 Face rendering method and device based on generation model

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511911A (en) * 2022-02-25 2022-05-17 支付宝(杭州)信息技术有限公司 Face recognition method, device and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104966316A (en) * 2015-05-22 2015-10-07 腾讯科技(深圳)有限公司 3D face reconstruction method, apparatus and server
CN108492373A (en) * 2018-03-13 2018-09-04 齐鲁工业大学 A kind of face embossment Geometric Modeling Method
CN109087340A (en) * 2018-06-04 2018-12-25 成都通甲优博科技有限责任公司 A kind of face three-dimensional rebuilding method and system comprising dimensional information
CN110414394A (en) * 2019-07-16 2019-11-05 公安部第一研究所 A kind of face blocks face image method and the model for face occlusion detection

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062791A (en) * 2018-01-12 2018-05-22 北京奇虎科技有限公司 A kind of method and apparatus for rebuilding human face three-dimensional model
CN110136243B (en) * 2019-04-09 2023-03-17 五邑大学 Three-dimensional face reconstruction method, system, device and storage medium thereof
CN110610127B (en) * 2019-08-01 2023-10-27 平安科技(深圳)有限公司 Face recognition method and device, storage medium and electronic equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104966316A (en) * 2015-05-22 2015-10-07 腾讯科技(深圳)有限公司 3D face reconstruction method, apparatus and server
CN108492373A (en) * 2018-03-13 2018-09-04 齐鲁工业大学 A kind of face embossment Geometric Modeling Method
CN109087340A (en) * 2018-06-04 2018-12-25 成都通甲优博科技有限责任公司 A kind of face three-dimensional rebuilding method and system comprising dimensional information
CN110414394A (en) * 2019-07-16 2019-11-05 公安部第一研究所 A kind of face blocks face image method and the model for face occlusion detection

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721194A (en) * 2023-08-09 2023-09-08 瀚博半导体(上海)有限公司 Face rendering method and device based on generation model
CN116721194B (en) * 2023-08-09 2023-10-24 瀚博半导体(上海)有限公司 Face rendering method and device based on generation model

Also Published As

Publication number Publication date
CN113632098A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
JP6695503B2 (en) Method and system for monitoring the condition of a vehicle driver
CN107818310B (en) Driver attention detection method based on sight
EP3033999B1 (en) Apparatus and method for determining the state of a driver
CN108638999B (en) Anti-collision early warning system and method based on 360-degree look-around input
WO2023272725A1 (en) Facial image processing method and apparatus, and vehicle
CN113366491B (en) Eyeball tracking method, device and storage medium
Murphy-Chutorian et al. Hyhope: Hybrid head orientation and position estimation for vision-based driver head tracking
CN109359514B (en) DeskVR-oriented gesture tracking and recognition combined strategy method
CN109875568A (en) A kind of head pose detection method for fatigue driving detection
JP2003015816A (en) Face/visual line recognizing device using stereo camera
CN111144207B (en) Human body detection and tracking method based on multi-mode information perception
CN103632129A (en) Facial feature point positioning method and device
CN114041175A (en) Neural network for estimating head pose and gaze using photorealistic synthetic data
WO2023272453A1 (en) Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle
WO2013074153A1 (en) Generating three dimensional models from range sensor data
WO2020063000A1 (en) Neural network training and line of sight detection methods and apparatuses, and electronic device
US20230116638A1 (en) Method for eye gaze tracking
CN115376113A (en) Driver distraction detection method, driver monitoring system and storage medium
CN115331205A (en) Driver fatigue detection system with cloud edge cooperation
CN108268858A (en) A kind of real-time method for detecting sight line of high robust
CN113361441B (en) Sight line area estimation method and system based on head posture and space attention
Ribas et al. In-Cabin vehicle synthetic data to test Deep Learning based human pose estimation models
CN113780125A (en) Fatigue state detection method and device for multi-feature fusion of driver
US11417063B2 (en) Determining a three-dimensional representation of a scene
Sui et al. A-pillar blind spot display algorithm based on line of sight

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE