CN116597488A - Face recognition method based on Kinect database - Google Patents

Face recognition method based on Kinect database

Info

Publication number
CN116597488A
CN116597488A (application CN202310561028.1A)
Authority
CN
China
Prior art keywords
rgb
kinect
depth
dimensional
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310561028.1A
Other languages
Chinese (zh)
Inventor
张斌
骞志彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shiyuan Shanghai Transportation Technology Co ltd
Original Assignee
Shiyuan Shanghai Transportation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shiyuan Shanghai Transportation Technology Co ltd filed Critical Shiyuan Shanghai Transportation Technology Co ltd
Priority to CN202310561028.1A priority Critical patent/CN116597488A/en
Publication of CN116597488A publication Critical patent/CN116597488A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/55 Clustering; Classification
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata automatically derived from the content
    • G06F 16/5838 Retrieval characterised by using metadata automatically derived from the content using colour
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/24 Aligning, centring, orientation detection or correction of the image
    • G06V 10/30 Noise filtering
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/174 Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Library & Information Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face recognition method based on a Kinect database and relates to the technical field of face recognition. First, an RGB and depth imaging environment for the Kinect is set up and RGB and depth images are acquired; spatial point-cloud information of the face is then obtained. The three-dimensional coordinates are next converted into the RGB camera coordinate system and mapped onto an ideal, undistorted RGB camera plane, and the true spatial position of each three-dimensional point in the RGB camera plane is recovered by correcting the lens distortion and mapping to the RGB image origin. Finally, RGB color information is mapped onto the spatial points through the correspondence between the RGB image and the depth image, yielding a three-dimensional point cloud with color information. The invention addresses the relatively low quality of the 3D data captured by the Kinect, namely missing data at blind spots, relatively low depth resolution, heavy noise in the depth conversion, and the spatial calibration/mapping of the RGB and depth images.

Description

Face recognition method based on Kinect database
Technical Field
The invention belongs to the technical field of face recognition, in particular to Kinect-based face recognition, and more particularly relates to a method for building a Kinect sensor database and evaluating face recognition algorithms that use it.
Background
In biometric recognition, face databases are the standard platform for quantitatively evaluating different face recognition algorithms and the basis for developing stable and reliable face recognition systems. Compared with the large number of two-dimensional face databases, the number of three-dimensional face databases is relatively small.
Most existing databases employ high-quality laser scanners for data acquisition, which leads to a mismatch between the two-dimensional and three-dimensional data in acquisition efficiency and data accuracy: capturing a high-resolution RGB image takes far less time than laser-scanning a face, so integrating 3D significantly slows down otherwise non-cooperative 2D face recognition, and high-quality 3D face scanning requires careful user cooperation. Kinect sensors overcome these problems by providing 2D and 3D data simultaneously at interactive rates. However, the quality of the 3D data captured by the Kinect is relatively low: it suffers from missing data at "blind spots", relatively low depth resolution, heavy noise in the depth conversion, and the need for spatial calibration/mapping between the RGB and depth images. It is therefore important, for face recognition, to evaluate the quality of Kinect three-dimensional data against high-quality laser scans. On the other hand, no existing three-dimensional face database provides three-dimensional video sequences, because conventional 3D scanners cannot acquire 3D data in real time; this lack of 3D video data restricts three-dimensional face recognition to still three-dimensional images.
In summary, existing laser-scanner-based databases are limited in data acquisition, and there is as yet no standard Kinect database that overcomes problems such as blind spots and provides three-dimensional video sequences. The invention therefore provides a face recognition method based on a Kinect database and supplies a standard database for evaluating face recognition algorithms that use the Kinect sensor.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a Kinect database method for face recognition that compensates for the relative scarcity of three-dimensional face databases, which has prevented stable and reliable face recognition systems from being developed effectively, and that can be used to evaluate face recognition algorithms based on the Kinect sensor.
In order to solve the technical problems, the invention is realized by the following technical scheme:
The invention discloses a face recognition method based on a Kinect database, which comprises the following steps:
S1, establishing a controlled indoor environment;
the controlled indoor environment includes four controls: (1) the Kinect is mounted and stabilized on top of a notebook computer, parallel to the ground; (2) the person to be collected sits at a distance of 0.7 m to 0.9 m in front of the Kinect sensor; (3) a simple background is placed behind each participant, fixed at a distance of 1.25 m from the Kinect; (4) an LED lamp is arranged in front of the person to be collected;
S2, performing 3D imaging of the Kinect; the method specifically comprises the following steps:
projecting a pre-designed speckle pattern into the scene with an IR (infrared) laser emitter, capturing the reflection of the pattern with an IR camera, and comparing the captured pattern with a reference pattern to produce a disparity map I_Disparity with a disparity value d at each point; wherein the depth is recovered from the disparity value d by the triangulation of the Kinect;
S3, converting the 3D face data; the method specifically comprises the following steps:
from the obtained disparity map I_Disparity, the Kinect simultaneously outputs, by triangulation, an RGB image and a depth map I_Depth; based on the obtained RGB image and depth image, the three-dimensional point coordinates are calculated, wherein:
z_world = I_Depth(x, y)    (formula IV)
S4, aligning RGB and depth images; the method specifically comprises the following steps:
projecting depth values from the IR camera plane to the RGB camera plane;
then, according to the focal length of the RGB camera, mapping the three-dimensional coordinate based on the RGB camera to an ideal undistorted RGB camera plane;
finally, restoring the real position of the three-dimensional point in the RGB camera plane by correcting the distortion and mapping to the origin of the RGB image;
S5, post-processing: noise removal, facial marking; the method specifically comprises the following steps:
cropping and normalizing using the facial coordinates, and downsampling the 2D and 2.5D faces to 96 × 96; cropping the 3D surface by retaining the vertices inside a sphere of radius 100 mm whose center lies 20 mm from the nose tip in the +z direction; removing spikes by thresholding, performing a hole-filling process, and removing white noise while preserving edges with a bilateral smoothing filter;
defining 6 anchor points on the face, namely the left eye center, right eye center, nose tip, left mouth corner, right mouth corner and chin, manually marking them on the RGB image, and then directly finding the corresponding positions and three-dimensional points on the depth map according to the established point correspondence.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method addresses the problem that high-quality three-dimensional scanning causes a mismatch between two-dimensional and three-dimensional data in acquisition efficiency and data accuracy, which makes an integrated 2D and 3D face recognition system difficult to deploy effectively in real scenes. Because the quality of the 3D data captured by the Kinect is relatively low (missing data at blind spots, relatively low depth resolution, heavy noise in the depth conversion, and the need for spatial calibration/mapping between RGB and depth images), a standard database is used to evaluate the quality of the Kinect three-dimensional data and to solve the problem of effectively combining the two-dimensional and three-dimensional data obtained from the Kinect;
(2) By aligning the RGB and depth images, the invention obtains video sequences of aligned RGB and depth frames, overcoming the limitation that the lack of 3D video data places on three-dimensional face recognition from three-dimensional images;
(3) The method addresses the relatively low quality of the 3D data captured by the Kinect, namely missing data at blind spots, relatively low depth resolution, heavy noise in the depth conversion, and the spatial calibration/mapping of the RGB and depth images.
Of course, it is not necessary for any one product to practice the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a face recognition method based on a Kinect database of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a standard database and a method that address two problems: high-quality three-dimensional scanning leads to a mismatch between two-dimensional and three-dimensional data in acquisition efficiency and data accuracy, making an integrated 2D and 3D face recognition system difficult to deploy effectively in real scenes; and the quality of the 3D data captured by the Kinect is relatively low (missing data at blind spots, relatively low depth resolution, heavy noise in the depth conversion, and the need for spatial calibration/mapping between RGB and depth images). A standard database is therefore used to evaluate the quality of the Kinect three-dimensional data and to effectively combine the two-dimensional and three-dimensional data acquired from the Kinect. In the present invention, information including gender, year of birth, ethnicity, whether glasses are worn, capture time and session is associated with each identity. In each session, four types of data are captured for each identity: 1) a 2D RGB image; 2) a 2.5D depth map; 3) a 3D point cloud; 4) an RGB-D video sequence. Nine facial variations over two sessions are designed, namely neutral expression, smile, open mouth, strong illumination, occlusion by sunglasses, occlusion by hand, occlusion by paper, right profile and left profile; all photographs are taken under controlled conditions, with no restriction on the participants' clothing, make-up or hairstyle. A protocol is also designed to record an RGB-D video sequence for each person in the two sessions; the protocol includes slow head movements in the horizontal (yaw) and vertical (pitch) directions. The video sequences allow frames of different poses to be extracted (in addition to the left/right profiles recorded as still images), which can be used to test the robustness of two-dimensional/three-dimensional face recognition algorithms, and video-based face recognition can be studied on this dataset.
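For illustration only, the metadata and the four data modalities described above could be organized per identity and session roughly as in the sketch below; every class and field name here is hypothetical and is not specified by the patent:

    from dataclasses import dataclass, field
    from typing import List

    # Nine facial variations captured per session, as listed above.
    VARIATIONS = [
        "neutral", "smile", "open_mouth", "strong_illumination",
        "occlusion_sunglasses", "occlusion_hand", "occlusion_paper",
        "right_profile", "left_profile",
    ]

    @dataclass
    class KinectFaceRecord:
        """One identity/session entry of the database (illustrative field names)."""
        subject_id: str
        session: int                 # 1 or 2
        gender: str
        birth_year: int
        ethnicity: str
        wears_glasses: bool
        capture_time: str
        rgb_images: List[str] = field(default_factory=list)    # 2D RGB images
        depth_maps: List[str] = field(default_factory=list)    # 2.5D depth maps
        point_clouds: List[str] = field(default_factory=list)  # 3D point clouds
        rgbd_video: str = ""                                    # RGB-D video sequence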
The following further describes the scheme of the invention, comprising the following steps:
s1, establishing an RGB and depth imaging environment for obtaining Kinect:
the Kinect sensor, which is mounted on top of the notebook computer and stabilizes the Kinect by adjusting the tilt to be parallel to the ground, contains three main components of RGB-D sensing, an Infrared (IR) laser transmitter, an RGB camera, and an IR camera. Participants were asked to sit (0.7 m to 0.9 m) in front of the remote Kinect sensor and follow a pre-set acquisition protocol that involved slow head movements in the horizontal (yaw) and vertical (pitch) directions. A whiteboard was placed behind each participant, fixed a distance of 1.25m from Kinect, to create a simple unified background for filtering. An LED light is placed in front of the participant to create a change in illumination. Different sunglasses and a piece of paper are used to create the shade change. The faces of the participants are automatically captured, processed and organized for database recording according to a predefined database (OpenNI library) structure. The captured RGB image and depth image (256 x 256 in size) are cropped using the predefined ROI.
S2, RGB and depth imaging from the Kinect: the RGB camera directly captures the RGB image I_RGB, while the laser emitter and the IR camera together act as an active depth sensor that acquires distance information from the scene. A pre-designed speckle pattern, created by shining the IR laser through a diffraction grating, is projected into the scene, and the reflection of the pattern is captured with the IR camera. The captured pattern is then compared with a reference pattern (recorded at a predefined plane at a known distance) to produce a disparity map I_Disparity with a disparity value d at each point. From the obtained disparity map I_Disparity, the depth map I_Depth is deduced directly by a simple triangulation. This triangulation of the Kinect takes the standard form
z = Z_0 / (1 + (Z_0/(f×b))×d)    (1)
where z is the distance (i.e., the depth, in mm) between the Kinect and the real-world point; d' is the normalized disparity value, an integer between 0 and 2047, related to the original disparity value d by
d=m×d'+n (2)
wherein m and n are denormalization parameters;
b and f are the baseline length and focal length, respectively, and Z_0 is the distance between the Kinect and the predefined reference pattern. The calibration parameters, including b, f and Z_0, are estimated and provided by the device manufacturer.
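A minimal sketch of the disparity-to-depth conversion of formulas (1) and (2); the numeric values below are placeholders, and the real m, n, b, f and Z_0 must come from the factory calibration:

    import numpy as np

    M, N = 1.0, 0.0          # denormalization parameters m, n (placeholders)
    BASELINE_MM = 75.0       # b: emitter-to-IR-camera baseline (placeholder)
    FOCAL_PX = 580.0         # f: IR camera focal length in pixels (placeholder)
    Z0_MM = 1000.0           # Z_0: distance to the reference pattern (placeholder)

    def depth_from_disparity(d_raw: np.ndarray) -> np.ndarray:
        """Convert the raw Kinect disparity map (values 0..2047) to depth in mm."""
        d = M * d_raw.astype(np.float64) + N                          # formula (2)
        return Z0_MM / (1.0 + (Z0_MM / (FOCAL_PX * BASELINE_MM)) * d) # formula (1)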
By this triangulation, the Kinect simultaneously outputs an RGB image I_RGB and a depth image I_Depth, both of size 640 × 480, where I_RGB(x, y) = {v_R, v_G, v_B}, with v_R, v_G, v_B the values of the R, G and B channels at image position (x, y); and I_Depth(x, y) = z_world, with z_world the depth value at image position (x, y).
S3, converting into 3D face data: given the depth map obtained in step S2,
I_Depth(x, y) = z_world    (3)
the three-dimensional coordinates (x_world, y_world, z_world) of each point can be determined from its image position (x, y) by back-projection through the pinhole model:
x_world = (x - x_0 + δx)×z_world/f
y_world = (y - y_0 + δy)×z_world/f    (4)
where (x_0, y_0) is the principal point of the depth image, and δx, δy are the corrections for lens distortion; δx and δy are estimated in advance and supplied by the device manufacturer.
Based on the above projection, the three-dimensional coordinates of each pre-cropped face depth image are calculated and stored in the three-dimensional format of KinectFaceDB.
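A sketch of the back-projection of formula (4) over a whole depth map; the principal point, focal length and distortion corrections are placeholder values standing in for the factory calibration:

    import numpy as np

    X0, Y0 = 320.0, 240.0    # principal point (x_0, y_0) of the depth image (placeholder)
    F_IR = 580.0             # depth/IR camera focal length in pixels (placeholder)
    DX, DY = 0.0, 0.0        # lens-distortion corrections δx, δy (placeholder)

    def depth_to_pointcloud(depth_mm: np.ndarray) -> np.ndarray:
        """Back-project a depth map (in mm) to an N x 3 point cloud in the IR frame."""
        h, w = depth_mm.shape
        xs, ys = np.meshgrid(np.arange(w), np.arange(h))
        z = depth_mm.astype(np.float64)
        x = (xs - X0 + DX) * z / F_IR        # formula (4), x component
        y = (ys - Y0 + DY) * z / F_IR        # formula (4), y component
        pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        return pts[pts[:, 2] > 0]            # drop pixels with no valid depth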
S4, RGB-D alignment of the face data: the three-dimensional coordinates defined in the IR camera frame are converted into the three-dimensional coordinate system defined by the RGB camera through an affine transformation:
(x', y', z')^T = R×(x_world, y_world, z_world)^T + T    (5)
where R ∈ R^(3×3) is the rotation matrix and T ∈ R^(3×1) is the translation vector.
Then, according to the focal length f_RGB of the RGB camera, the three-dimensional coordinates (x', y', z') in the RGB camera frame are mapped onto an ideal, undistorted RGB camera plane:
x_u = f_RGB×x'/z',  y_u = f_RGB×y'/z'    (6)
finally, the true position (x) of the three-dimensional point in the RGB camera plane is restored by correcting the distortion and mapping to the RGB image origin RGB ,y RGB ):
where D ∈ R^(3×3) and V ∈ R^(3×3) are taken directly from the Kinect factory calibration parameters.
The RGB image and the aligned depth map (keeping the original values in mm) are then stored separately. Using the established correspondence between the RGB image and the depth image, the RGB color is mapped directly onto each three-dimensional point, and the three-dimensional point cloud with the corresponding color mapping is recorded. Finally, the protocol of step S1 is used to store video sequences of aligned RGB and depth frames from the RGB camera and the IR camera.
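The affine transform, projection onto the RGB plane and color mapping of step S4 can be sketched as follows; R, T and the RGB intrinsics are placeholders for the factory calibration, and the D/V distortion correction is omitted:

    import numpy as np

    R = np.eye(3)                    # rotation IR -> RGB camera frame (placeholder)
    T = np.zeros(3)                  # translation IR -> RGB camera frame (placeholder)
    F_RGB = 525.0                    # RGB camera focal length in pixels (placeholder)
    CX_RGB, CY_RGB = 320.0, 240.0    # RGB image origin / principal point (placeholder)

    def colorize_pointcloud(points_ir: np.ndarray, rgb: np.ndarray) -> np.ndarray:
        """Attach an RGB color to each 3D point (points with z > 0, in the IR frame)."""
        pts = points_ir @ R.T + T                       # affine transform, formula (5)
        u = np.round(F_RGB * pts[:, 0] / pts[:, 2] + CX_RGB).astype(int)
        v = np.round(F_RGB * pts[:, 1] / pts[:, 2] + CY_RGB).astype(int)
        h, w = rgb.shape[:2]
        ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)    # keep points inside the image
        colors = np.zeros((len(points_ir), 3), dtype=np.float64)
        colors[ok] = rgb[v[ok], u[ok]]
        return np.hstack([points_ir, colors])           # N x 6 array: x, y, z, r, g, b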
S5, post-processing:
noise removal: clipping, normalization, and reduction of 2D and 2.5D face sampling to 96 x 96 dimensions using facial coordinates. The 3D surface clipping is realized by preserving the vertexes in a sphere with the radius of 100mm, and the center of the circle is 20mm away from the nose tip in the +z direction. The spike is removed by a thresholding method, and a hole filling process is performed, using a bilateral smoothing filter, white noise is removed while retaining the edges.
Facial marking: first, six anchor points are defined on the face, namely the left eye center, right eye center, nose tip, left mouth corner, right mouth corner and chin. They are manually marked on the RGB image, and the corresponding positions on the depth map and the corresponding three-dimensional points are then found directly through the previously established point correspondence.
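A sketch of transferring the six manually marked RGB anchor points to depth-map positions and three-dimensional points, assuming an RGB-to-depth pixel correspondence map built during the alignment of step S4; the map, the intrinsics and all names are assumptions:

    import numpy as np

    LANDMARKS = ["left_eye", "right_eye", "nose_tip",
                 "left_mouth_corner", "right_mouth_corner", "chin"]

    X0, Y0, F_IR, DX, DY = 320.0, 240.0, 580.0, 0.0, 0.0   # placeholder depth intrinsics

    def landmarks_to_3d(landmarks_rgb_xy, rgb_to_depth_map, depth_mm):
        """Map six (x, y) RGB landmarks to depth pixels and back-project them to 3D.

        rgb_to_depth_map[y, x] holds the (x, y) depth-image pixel matching RGB pixel (x, y).
        """
        pts3d = []
        for x, y in np.asarray(landmarks_rgb_xy, dtype=int):
            dx_px, dy_px = rgb_to_depth_map[y, x]
            z = float(depth_mm[dy_px, dx_px])
            pts3d.append(((dx_px - X0 + DX) * z / F_IR,
                          (dy_px - Y0 + DY) * z / F_IR,
                          z))
        return np.array(pts3d)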
The method can be applied to the fields of facial demographic analysis, three-dimensional surface modeling, facial expression analysis, three-dimensional face registration, RGB-D feature extraction, occlusion detection, plastic surgery and the like.
The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims (1)

1. The face recognition method based on the Kinect database is characterized by comprising the following steps:
S1, establishing a controlled indoor environment;
the controlled indoor environment includes four controls: (1) the Kinect is mounted and stabilized on top of a notebook computer, parallel to the ground; (2) the person to be collected sits at a distance of 0.7 m to 0.9 m in front of the Kinect sensor; (3) a simple background is placed behind each participant, fixed at a distance of 1.25 m from the Kinect; (4) an LED lamp is arranged in front of the person to be collected;
S2, performing 3D imaging of the Kinect; the method specifically comprises the following steps:
projecting a pre-designed speckle pattern into the scene with an IR (infrared) laser emitter, capturing the reflection of the pattern with an IR camera, and comparing the captured pattern with a reference pattern to produce a disparity map I_Disparity with a disparity value d at each point; wherein the depth is recovered from the disparity value d by the triangulation of the Kinect;
S3, converting the 3D face data; the method specifically comprises the following steps:
from the obtained disparity map I_Disparity, the Kinect simultaneously outputs, by triangulation, an RGB image and a depth map I_Depth; based on the obtained RGB image and depth image, the three-dimensional point coordinates are calculated, wherein:
z_world = I_Depth(x, y)    (formula IV)
S4, aligning RGB and depth images; the method specifically comprises the following steps:
projecting depth values from the IR camera plane to the RGB camera plane;
then, according to the focal length of the RGB camera, mapping the three-dimensional coordinate based on the RGB camera to an ideal undistorted RGB camera plane;
finally, restoring the real position of the three-dimensional point in the RGB camera plane by correcting the distortion and mapping to the origin of the RGB image;
S5, post-processing: noise removal, facial marking; the method specifically comprises the following steps:
cropping and normalizing using the facial coordinates, and downsampling the 2D and 2.5D faces to 96 × 96; cropping the 3D surface by retaining the vertices inside a sphere of radius 100 mm whose center lies 20 mm from the nose tip in the +z direction; removing spikes by thresholding, performing a hole-filling process, and removing white noise while preserving edges with a bilateral smoothing filter;
defining 6 anchor points on the face, namely the left eye center, right eye center, nose tip, left mouth corner, right mouth corner and chin, manually marking them on the RGB image, and then directly finding the corresponding positions and three-dimensional points on the depth map according to the established point correspondence.
CN202310561028.1A 2023-05-18 2023-05-18 Face recognition method based on Kinect database Pending CN116597488A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310561028.1A CN116597488A (en) 2023-05-18 2023-05-18 Face recognition method based on Kinect database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310561028.1A CN116597488A (en) 2023-05-18 2023-05-18 Face recognition method based on Kinect database

Publications (1)

Publication Number Publication Date
CN116597488A true CN116597488A (en) 2023-08-15

Family

ID=87604112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310561028.1A Pending CN116597488A (en) 2023-05-18 2023-05-18 Face recognition method based on Kinect database

Country Status (1)

Country Link
CN (1) CN116597488A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117061720A (en) * 2023-10-11 2023-11-14 广州市大湾区虚拟现实研究院 Stereo image pair generation method based on monocular image and depth image rendering
CN117061720B (en) * 2023-10-11 2024-03-01 广州市大湾区虚拟现实研究院 Stereo image pair generation method based on monocular image and depth image rendering

Similar Documents

Publication Publication Date Title
US11995902B2 (en) Facial signature methods, systems and software
US10353465B2 (en) Iris and pupil-based gaze estimation method for head-mounted device
US10269177B2 (en) Headset removal in virtual, augmented, and mixed reality using an eye gaze database
KR102097016B1 (en) Apparatus and methdo for analayzing motion
Schodl et al. Head tracking using a textured polygonal model
JP2005500757A (en) 3D video conferencing system
KR101510312B1 (en) 3D face-modeling device, system and method using Multiple cameras
WO2018188277A1 (en) Sight correction method and device, intelligent conference terminal and storage medium
Yang et al. Eye gaze correction with stereovision for video-teleconferencing
TWM364920U (en) 3D human face identification device with infrared light source
CN103034330A (en) Eye interaction method and system for video conference
WO2018032841A1 (en) Method, device and system for drawing three-dimensional image
CN116597488A (en) Face recognition method based on Kinect database
CN111079470A (en) Method and device for detecting living human face
CN110059537A (en) A kind of three-dimensional face data acquisition methods and device based on Kinect sensor
KR20170029365A (en) Apparatus and method for depth map generation
CN110909571B (en) High-precision face recognition space positioning method
KR101053253B1 (en) Apparatus and method for face recognition using 3D information
CN111914790B (en) Real-time human body rotation angle identification method based on double cameras under different scenes
JP2011209916A (en) Face image synthesizing apparatus
EP1410332B1 (en) Method for automatic tracking of a moving body
JP7326965B2 (en) Image processing device, image processing program, and image processing method
Shen et al. Virtual mirror by fusing multiple RGB-D cameras
D’Apuzzo et al. Three-dimensional human face feature extraction from multi images
Xu et al. Head pose recovery using 3D cross model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination