CN111523398A - Method and device for fusing 2D face detection and 3D face recognition - Google Patents


Info

Publication number
CN111523398A
Authority
CN
China
Prior art keywords
face
image
depth
dimensional
point cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010241057.6A
Other languages
Chinese (zh)
Inventor
葛晨阳
邓鹏超
卢泳冲
屈渝立
乔欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Jiaotong University
Original Assignee
Xi'an Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Jiaotong University
Priority to CN202010241057.6A
Publication of CN111523398A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Abstract

A method for fusing 2D face detection and 3D face recognition comprises the following steps: collecting an RGB image and a depth image, or an IR image and a depth image, with a 3D depth camera; preprocessing the RGB image or the IR image; performing face detection on the preprocessed RGB or IR image with the RetinaFace 2D detection method, and framing the face bounding box and 5 facial key points: the nose tip, the left and right eyes, and the left and right mouth corners; mapping the 5 facial key points to the corresponding 5 key points in the XYZ point cloud, and standardizing the detected three-dimensional face point cloud with respect to the XYZ point cloud data set; orthogonally projecting the standardized three-dimensional face point cloud onto a specified plane, performing grid fitting, and converting the grid-fitted three-dimensional face point cloud into a two-dimensional face depth map; and recognizing the two-dimensional face depth map with a face recognition method to obtain a face recognition result.

Description

Method and device for fusing 2D face detection and 3D face recognition
Technical Field
The disclosure belongs to the technical field of computer vision and image recognition, and particularly relates to a method and a device for fusing 2D face detection and 3D face recognition.
Background
Face recognition, an effective non-intrusive biometric identification technique, has rapidly become a tool of choice in surveillance, security and entertainment since the rise of computer vision. Compared with other biometric technologies such as iris, fingerprint and palm-print recognition, face recognition is contactless, highly interactive and easy to capture; it has become the basis of technologies such as city security, device unlocking, human-computer interaction, virtual reality and digital media, and plays an increasingly important role.
2D and 3D face recognition belong to the field of computer vision. Because a 2D image is the projection of a three-dimensional object onto a plane, 2D face recognition is disturbed by external factors such as illumination, pose and expression, which affects its accuracy. How to make full use of the two-dimensional and three-dimensional information of a face and how to improve the recognition rate, especially in scenes with face occlusion, poor lighting or poor shooting angles, is therefore one of the key problems to be solved.
Disclosure of Invention
In order to solve the above problem, the present disclosure provides a method for fusing 2D face detection and 3D face recognition, the method comprising the steps of:
S100: collecting an RGB image and a depth image, or an IR image and a depth image, with a 3D depth camera;
S200: preprocessing the RGB image or the IR image;
S300: performing face detection on the preprocessed RGB or IR image with the RetinaFace 2D detection method, and framing the face bounding box and 5 facial key points: the nose tip, the left and right eyes, and the left and right mouth corners;
S400: mapping the obtained 5 facial key points to the corresponding 5 key points in the XYZ point cloud, and standardizing the detected three-dimensional face point cloud with respect to the XYZ point cloud data set;
S500: orthogonally projecting the standardized three-dimensional face point cloud onto a specified plane, performing grid fitting, and converting the grid-fitted three-dimensional face point cloud into a two-dimensional face depth map;
S600: recognizing the two-dimensional face depth map with a face recognition method to obtain a face recognition result.
The present disclosure also provides a device for fusing 2D face detection and 3D face recognition, including:
means for collecting an RGB image and a depth image, or an IR image and a depth image, with a 3D depth camera;
means for preprocessing the RGB image or the IR image;
means for performing face detection on the preprocessed RGB or IR image with the RetinaFace 2D detection method and framing the face bounding box and 5 facial key points: the nose tip, the left and right eyes, and the left and right mouth corners;
means for mapping the obtained 5 facial key points to the corresponding 5 key points in the XYZ point cloud and standardizing the detected three-dimensional face point cloud with respect to the XYZ point cloud data set;
means for orthogonally projecting the standardized three-dimensional face point cloud onto a specified plane, performing grid fitting, and converting the grid-fitted three-dimensional face point cloud into a two-dimensional face depth map;
and means for recognizing the two-dimensional face depth map with a face recognition method to obtain a face recognition result.
With this technical scheme, fast 2D face detection is performed on the RGB or IR image, the facial key points are mapped onto the 3D face point cloud for standardization, the standardized 3D face point cloud is mapped to a 2D face depth map, and a face recognition result is obtained quickly with a face recognition method. The face recognition process is fast, efficient, safe and reliable, making the method suitable for embedded 3D face recognition applications.
Drawings
FIG. 1 is a flow chart of a method of fusing 2D face detection and 3D face recognition provided in an embodiment of the present disclosure;
FIG. 2 is a schematic representation of a grid in one embodiment of the present disclosure;
FIG. 3 is a schematic diagram of bilinear interpolation in one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of weight calculation in one embodiment of the present disclosure;
FIG. 5 is a schematic flow chart of a face recognition method in an embodiment of the present disclosure.
Detailed Description
In one embodiment, as shown in fig. 1, a method for fusing 2D face detection and 3D face recognition is disclosed, the method comprising the steps of:
S100: collecting an RGB image and a depth image, or an IR image and a depth image, with a 3D depth camera;
S200: preprocessing the RGB image or the IR image;
S300: performing face detection on the preprocessed RGB or IR image with the RetinaFace 2D detection method, and framing the face bounding box and 5 facial key points: the nose tip, the left and right eyes, and the left and right mouth corners;
S400: mapping the obtained 5 facial key points to the corresponding 5 key points in the XYZ point cloud, and standardizing the detected three-dimensional face point cloud with respect to the XYZ point cloud data set;
S500: orthogonally projecting the standardized three-dimensional face point cloud onto a specified plane, performing grid fitting, and converting the grid-fitted three-dimensional face point cloud into a two-dimensional face depth map;
S600: recognizing the two-dimensional face depth map with a face recognition method to obtain a face recognition result.
In this embodiment, the method uses 2D data (an IR or RGB image) only as an aid for face detection: the face bounding box and key points are extracted in 2D, this information is mapped onto the depth data, and face recognition is performed on the depth data. The 2D data thus only assists, and recognition relies on the depth data; having a single device run both full 2D recognition and full 3D recognition would increase complexity.
The three-dimensional face feature vector obtained by the method identifies faces accurately, and the recognition method can be ported to an embedded system platform with good reliability and convenience. The face recognition process is fast, efficient, safe and reliable: the speed comes from using a lightweight network, and the accuracy comes from extracting not only global features but also local features.
In another embodiment, the 3D depth camera in step S100 is a structured light depth camera or a ToF depth camera.
In this embodiment, the 3D depth camera may be a structured light depth camera or a ToF depth camera, and mainly comprises a projector, an RGB camera, an IR camera and a depth calculation module.
For the structured light depth camera, the projector is an infrared laser speckle projector. The working process is as follows: the infrared laser speckle projector projects dense infrared laser beams outwards, and after coherent interference and diffuse reflection on the object surface the beams form a coded pattern. The coded pattern is a speckle pattern consisting of randomly distributed speckle points; the pattern itself is fixed, but it shifts in the horizontal or vertical direction as the distance changes. The IR camera collects the speckle pattern, and the depth calculation module performs block-matching disparity estimation between the collected speckle pattern and an internally stored reference speckle pattern of known distance, treating them as the left and right images of a binocular pair, to generate a disparity vector map; the depth image of the projection space or target object is finally obtained by a depth calculation method.
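The patent does not give the disparity-to-depth formula; assuming the standard triangulation relation depth = focal length × baseline / disparity, a minimal Python/numpy sketch of that final depth calculation step could look like this (function and parameter names are illustrative, not from the patent):

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=0.1):
    """Convert a block-matching disparity map (in pixels) to a depth map (in metres)
    using the standard triangulation relation depth = f * B / disparity.
    Pixels with near-zero disparity are left at 0 (invalid)."""
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > min_disp
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```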
For the ToF depth camera, the projector is a flood illuminator or a regular speckle projector, and the IR camera is a ToF receiving camera. The flood illuminator emits a uniform light source, or the regular speckle projector emits a laser speckle pattern; the ToF receiving camera synchronously receives the phase-shifted images reflected after the projection hits the object surface, and the depth calculation module decodes depth from the RAW phase-shift data with a phase-shift method.
In another embodiment, step S200 further comprises: converting the depth image into an XYZ point cloud data set according to the internal and external parameters of the RGB camera and the IR depth camera, and obtaining the pixel mapping relation between the RGB image and the IR image.
In this embodiment, preprocessing addresses three aspects: image contrast, image layering, and image detail.
Image contrast: the dynamic range of the image is enlarged by increasing its contrast; for an 8-bit image, stretching the dynamic range to occupy the full 0-255 gray levels gives a clear contrast gain over an image confined to a narrow local gray range. Dynamic-range stretching methods include (1) linear mapping, where stretching in equal proportion may lose gray levels through saturation truncation depending on the parameter setting; (2) non-linear Gamma mapping, where the mapping curve can be chosen as required to expand the dynamic range of high gray levels and compress that of low gray levels; and (3) an improved Gamma transformation that stretches from the middle gray levels of the dynamic range towards both ends.
Image layering: a contrast-limited adaptive histogram equalization (CLAHE) algorithm is adopted; the image is divided into blocks and a histogram is computed and mapped for each block separately, which enhances the local sense of layering.
Image detail: an image sharpening algorithm first separates the high-frequency information in the image, multiplies it by an enhancement coefficient, and superimposes it back onto the original image. The sharpening kernel may use a 4-neighbourhood or an 8-neighbourhood. Extending this idea, changing the size and shape of the low-pass or high-pass filtering window used for high/low frequency separation yields different detail-enhancement effects.
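A minimal OpenCV/numpy sketch of the three preprocessing aspects described above (dynamic-range stretch with Gamma mapping, CLAHE for local layering, Laplacian sharpening); the percentile bounds, gamma value and sharpening gain are illustrative assumptions, not values given in the patent:

```python
import cv2
import numpy as np

def preprocess_ir_or_gray(img_u8, gamma=0.6, sharpen_gain=1.0):
    """Illustrative preprocessing chain for a single-channel 8-bit image:
    contrast stretch, Gamma mapping, CLAHE, and Laplacian-based sharpening."""
    # 1) Linear dynamic-range stretch to the full 0-255 range.
    lo, hi = np.percentile(img_u8, (1, 99))
    stretched = np.clip((img_u8.astype(np.float32) - lo) * 255.0 / max(hi - lo, 1), 0, 255)

    # 2) Non-linear Gamma mapping (gamma < 1 expands the dark gray levels).
    gamma_mapped = 255.0 * (stretched / 255.0) ** gamma

    # 3) Contrast-limited adaptive histogram equalization on image tiles.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalized = clahe.apply(gamma_mapped.astype(np.uint8))

    # 4) Sharpening: separate the high-frequency detail with a 4-neighbourhood
    #    Laplacian, scale it, and add it back to the image.
    lap = cv2.Laplacian(equalized, cv2.CV_32F, ksize=1)
    sharpened = np.clip(equalized.astype(np.float32) - sharpen_gain * lap, 0, 255)
    return sharpened.astype(np.uint8)
```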
In another embodiment, the RetinaFace 2D detection method in step S300 is specifically: the preprocessed IR or RGB image is input into a backbone network; in the multi-task learning of face detection, in addition to the conventional face classification loss function and face box loss function, the 5 facial key points are additionally annotated, an extra supervised loss function for face alignment is introduced from the key-point information, and a self-supervised decoding branch is introduced to predict 3D face information.
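A hedged PyTorch sketch of how such a multi-task detection loss could be composed from the classification, box and 5-landmark terms; the loss weights are illustrative and the self-supervised 3D branch is omitted, so this is not the patent's or RetinaFace's exact implementation:

```python
import torch
import torch.nn.functional as F

def multitask_detection_loss(cls_logits, cls_targets, box_pred, box_targets,
                             pts_pred, pts_targets, w_box=0.25, w_pts=0.1):
    """Weighted sum of face classification, box regression and 5-landmark
    regression losses; regression terms use positive anchors only."""
    loss_cls = F.cross_entropy(cls_logits, cls_targets)   # face vs. background per anchor
    pos = cls_targets > 0
    if pos.any():
        loss_box = F.smooth_l1_loss(box_pred[pos], box_targets[pos])   # box offsets
        loss_pts = F.smooth_l1_loss(pts_pred[pos], pts_targets[pos])   # 5 (x, y) landmarks
    else:
        loss_box = loss_pts = cls_logits.new_zeros(())
    return loss_cls + w_box * loss_box + w_pts * loss_pts
```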
In another embodiment, the three-dimensional face point cloud standardization in step S400 is specifically: the nose tip is taken as the origin of a new coordinate system, the direction from the left eye to the right eye as the new x-axis, and the direction from the midpoint of the two eyes to the midpoint of the two mouth corners as the new y-axis; the new z-axis is determined by the cross product of the x-axis and the y-axis. The coordinates of each three-dimensional face in the new coordinate system are obtained through coordinate transformation, so that after transformation all three-dimensional face data have the same orientation and posture.
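A minimal numpy sketch of building this face-aligned coordinate frame from the 5 key points; the explicit orthogonalization of the y-axis against the x-axis is an added assumption not spelled out in the patent:

```python
import numpy as np

def face_frame_from_keypoints(nose, l_eye, r_eye, l_mouth, r_mouth):
    """Build the face-aligned frame: origin at the nose tip, x from left eye to
    right eye, y from the eye midpoint towards the mouth-corner midpoint,
    z = x cross y. Returns (R, origin) so that (points - origin) @ R.T maps a
    face point cloud into this frame."""
    x = r_eye - l_eye
    x = x / np.linalg.norm(x)
    y_raw = 0.5 * (l_mouth + r_mouth) - 0.5 * (l_eye + r_eye)
    y = y_raw - np.dot(y_raw, x) * x      # remove the component along x (assumed step)
    y = y / np.linalg.norm(y)
    z = np.cross(x, y)
    R = np.stack([x, y, z])               # rows are the new basis vectors
    return R, nose
```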
In this embodiment, consider two point sets whose points are in one-to-one correspondence:
A = (A1, A2, ..., An), with Ai = (XAi, YAi, ZAi)^T
B = (B1, B2, ..., Bn), with Bi = (XBi, YBi, ZBi)^T
where A and B each correspond to a face point cloud data set. The centroid of each point set is computed:
uA = (1/n) Σ_i Ai
uB = (1/n) Σ_i Bi
and each point set is translated by its centroid uA or uB:
A' = {Ai - uA}
B' = {Bi - uB}
The covariance matrix H between the point sets is then computed and factorized by singular value decomposition:
H = Σ_i A'i (B'i)^T
SVD(H) = U S V^T
where U and V^T are unitary matrices and S is a rectangular diagonal matrix. The rotation matrix R = V U^T and the translation vector t = -R uA + uB are obtained, which gives the transformation matrix that maps a depth point cloud at an arbitrary position to the standardized position.
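The SVD-based alignment above corresponds to the standard Kabsch procedure; a short numpy sketch under that assumption (the function name and the reflection guard are additions beyond the patent text):

```python
import numpy as np

def rigid_transform_svd(A, B):
    """Estimate rotation R and translation t aligning point set A (n x 3) to the
    corresponding reference set B (n x 3): centre both sets, build the covariance
    H, and take its SVD, as in the standardization step above."""
    uA, uB = A.mean(axis=0), B.mean(axis=0)
    H = (A - uA).T @ (B - uB)                 # 3 x 3 covariance between the sets
    U, S, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T                            # R = V U^T
    if np.linalg.det(R) < 0:                  # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = uB - R @ uA                           # t = -R uA + uB
    return R, t                               # aligned = (R @ A.T).T + t
```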
In a specific embodiment, the conversion of the depth image into the XYZ point cloud data set is obtained from the intrinsic parameters of the IR camera, with the specific formulas:
X(i,j) = depth(i,j) * (j - cx) / fx
Y(i,j) = depth(i,j) * (i - cy) / fy
Z(i,j) = depth(i,j)
where (i, j) is the pixel position in the depth map, depth(i, j) is the depth value at that pixel, and fx, fy, cx and cy are the intrinsic parameters of the IR camera.
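A minimal numpy sketch of this back-projection (the function name and the H x W x 3 output layout are assumptions for illustration, not from the patent):

```python
import numpy as np

def depth_to_xyz(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W) into an XYZ point cloud using the IR
    camera intrinsics: X = depth*(j-cx)/fx, Y = depth*(i-cy)/fy, Z = depth."""
    h, w = depth.shape
    i, j = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    X = depth * (j - cx) / fx
    Y = depth * (i - cy) / fy
    Z = depth
    return np.stack([X, Y, Z], axis=-1)       # H x W x 3 point cloud
```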
In another embodiment, the mesh fitting in step S500 refers to converting from a three-dimensional format to a two-dimensional format, the pixel values of the two-dimensional format representing depth values.
In this embodiment, a depth map is a lossy conversion of three-dimensional data to two-dimensional data: the horizontal and vertical coordinates are replaced by pixel coordinates and only the depth values remain, which is why it is often called 2.5D data. Here, a mesh surface fitting algorithm converts the input unordered three-dimensional point cloud into a mesh surface without loss, preserving the physical meaning of x, y and z. For a given point cloud set {(x1, y1, z1), (x2, y2, z2), ..., (xn, yn, zn)}, whose data values range over (-∞, +∞), the corresponding mesh surface z = f(x, y) must be found, i.e. the data are fitted onto a grid of width and length nx × ny with spacing d. The value at each grid vertex represents the depth value interpolated at that point. The grid is illustrated in FIG. 2, where the numbers shown next to the grid points along the x-axis and y-axis are the grid numbers.
As shown in FIG. 3, for an input point P falling inside the grid cell formed by the four grid vertices numbered J, K, L and M, the depth value at the point obtained by bilinear interpolation is:
Depth(P) = Σ_{Q ∈ {J, K, L, M}} (len_x,Q / d) * (len_y,Q / d) * Depth(Q)
where Depth(·) denotes a depth value, P is the position of the input point, Q ranges over the four surrounding grid vertices, d is the grid spacing, and len_x,Q and len_y,Q are the distances along the x-axis and the y-axis between P and the vertex diagonally opposite Q. The four products of these proportional distances along the x-axis and y-axis act as the interpolation weights.
All points in the point cloud are mapped into the grid as input points and, as shown in FIG. 4, the bilinear interpolation weights of each input point with respect to its grid vertices are computed. In the example, 7 points are fitted onto a grid of width and length 5 × 5 with spacing 1; the left table lists the spatial coordinates of the 7 input points, the middle table shows the bilinear interpolation computation, and the rightmost table gives the interpolation weights of each point. For example, the point (0, 4, 1.5) lies exactly on grid vertex No. 21, so by the formula its weights are (0, 0, 1, 0); for the point (3.9, 0.1, 0.2), which lies between vertices 4, 5, 9 and 10, the interpolation weights are (0.09, 0.81, 0.01, 0.09).
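The weight example above can be checked with a few lines of Python; the function name and the corner ordering (x0,y0), (x1,y0), (x0,y1), (x1,y1) are illustrative assumptions:

```python
def bilinear_weights(x, y, d=1.0):
    """Bilinear interpolation weights of a point (x, y) with respect to the four
    surrounding grid vertices of spacing d, ordered (x0,y0), (x1,y0), (x0,y1), (x1,y1).
    Reproduces the example above: (3.9, 0.1) -> (0.09, 0.81, 0.01, 0.09)."""
    x0, y0 = int(x // d), int(y // d)
    tx, ty = (x - x0 * d) / d, (y - y0 * d) / d
    return ((1 - tx) * (1 - ty), tx * (1 - ty), (1 - tx) * ty, tx * ty)
```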
The weight vector of each input point is a sparse vector of dimension nx × ny, so all weight vectors together form a sparse weight matrix A_F; the true depth values of the input points form b_F, and the depth value of the fitted surface at each grid vertex is z. According to the equation:
A_F · z = b_F
the fitted mesh surface can be solved. Because this is an underdetermined system of equations, it has infinitely many solutions, and the fitted surface would be non-smooth and oscillate severely. A penalty term on the first derivative therefore has to be added, requiring the first derivative to be approximately 0; with I denoting the depth value at a point, each point should satisfy:
I(x,y)-I(x+1,y)=0
I(x,y)-I(x,y+1)=0
I(x,y)-I(x+1,y+1)=0
I(x+1,y)-I(x,y+1)=0
after fitting the mesh, the point cloud may be converted from a three-dimensional format to a two-dimensional format.
In another embodiment, the face recognition method in step S600 mainly comprises feature point detection, a global feature extraction network GFE, a local feature extraction network LFE and a fully connected layer.
In this embodiment, the face recognition method has two main characteristics: first, it is based on MobileFaceNet, giving high speed, a deep network and good performance; second, a local feature extraction network is adopted to extract the relational features between pairs of local features.
The flow of the face recognition method is shown in FIG. 5; it mainly comprises feature point detection, a global feature extraction network (GFE), a local feature extraction network (LFE) and a fully connected layer. The input to the network is the grid-fitted surface. The two-dimensional face depth map with detected facial key points is input and enters the global feature extraction stage. The global feature extraction network has two roles: it serves as the main network framework for feature extraction, and it provides a multi-channel feature map for local feature extraction; a modified MobileFaceNet is used as the basic network structure (Base CNN) for global feature extraction. In local feature extraction, Region-Of-Interest (ROI) selection is performed according to the mapping of the facial key points onto the feature map, so that the features at the corresponding key-point positions are extracted. Finally, the global and local features are fused, and a feature vector representing the three-dimensional face is output through a fully connected layer.
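A loose PyTorch sketch of the global-plus-local fusion idea only: the backbone here is a two-layer placeholder standing in for the modified MobileFaceNet, and the ROI pooling window, channel counts and embedding size are assumptions, not the patent's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalFusion(nn.Module):
    """Sketch: a base CNN yields a global descriptor and a feature map; small ROIs
    around the mapped key points are pooled into local features; global and local
    features are concatenated and passed through a fully connected layer."""
    def __init__(self, feat_ch=64, embed_dim=128, n_points=5, roi=3):
        super().__init__()
        self.backbone = nn.Sequential(                     # placeholder base CNN
            nn.Conv2d(1, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU())
        self.roi = roi
        self.fc = nn.Linear(feat_ch + n_points * feat_ch, embed_dim)

    def forward(self, depth_map, keypoints):
        # depth_map: B x 1 x H x W, keypoints: B x 5 x 2 in feature-map coordinates
        fmap = self.backbone(depth_map)                    # B x C x h x w
        g = F.adaptive_avg_pool2d(fmap, 1).flatten(1)      # global feature, B x C
        locs = []
        for k in range(keypoints.shape[1]):                # pool a small ROI per key point
            xs = keypoints[:, k, 0].long().clamp(self.roi, fmap.shape[3] - self.roi - 1)
            ys = keypoints[:, k, 1].long().clamp(self.roi, fmap.shape[2] - self.roi - 1)
            patches = [fmap[b, :, ys[b] - self.roi:ys[b] + self.roi + 1,
                            xs[b] - self.roi:xs[b] + self.roi + 1].mean(dim=(1, 2))
                       for b in range(fmap.shape[0])]
            locs.append(torch.stack(patches))              # local feature, B x C
        return self.fc(torch.cat([g] + locs, dim=1))       # fused face embedding
```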
On the three data sets IAS-Lab RGBD, RGBD and ND-2006, the recognition accuracy of global feature extraction alone is 99.0%, 98.9% and 96.0% respectively, and the recognition accuracy after fusing the global and local features rises to 99.4%, 99.3% and 96.8%.
In another embodiment, the internal and external parameters of the RGB camera and the IR depth camera in step S200 are obtained by 3D depth camera calibration.
In this embodiment, because of the positional difference between the IR camera and the RGB camera, the pixels of the finally output RGB image and depth image cannot be aligned one to one. Since the method uses the color image as an aid for face detection, the internal and external parameters of the RGB camera and the IR camera must be calibrated, so that a depth map and a color map captured at the same moment are calibrated such that pixels at the same position correspond to the same point in space. Pixel coordinates, image coordinates, camera coordinates and world coordinates are related through the camera intrinsics and extrinsics, so the pixel coordinates of the depth image and the color image can be unified. The specific conversion formula is:
[X_rgb, Y_rgb, Z_rgb]^T = R · [X_d, Y_d, Z_d]^T + t
where [X_rgb, Y_rgb, Z_rgb]^T are the three-dimensional coordinates of a point in the RGB camera coordinate system, [X_d, Y_d, Z_d]^T are the three-dimensional coordinates of the same point in the IR camera coordinate system, and R and t are the rotation and translation from the IR camera coordinate system to the RGB camera coordinate system, calibrated with Zhang's calibration method. The relative position of the RGB camera and the IR camera is fixed, so the calibration only needs to be performed once.
After the coordinate conversion relation between the RGB camera and the IR camera is obtained, the pixel correspondence between the IR image and the RGB image follows from the intrinsic parameters:
u_rgb = fx * X_rgb / Z_rgb + cx
v_rgb = fy * Y_rgb / Z_rgb + cy
where [u_rgb, v_rgb]^T are the pixel coordinates in the RGB image, and fx, fy, cx and cy are the intrinsic parameters of the RGB camera. Because the depth image and the IR image correspond pixel for pixel, given a pixel in the depth image, its three-dimensional coordinates in the IR camera coordinate system can be obtained from the IR camera intrinsics, its coordinates in the RGB camera coordinate system then follow from the extrinsics, and the corresponding RGB pixel coordinates are obtained from the RGB camera intrinsics. This establishes the pixel correspondence between the RGB image and the depth (IR) image and yields RGB and depth images under a unified viewing angle.
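A minimal numpy sketch of this depth-to-RGB pixel mapping; passing the intrinsics as (fx, fy, cx, cy) tuples and the single-pixel interface are assumptions for illustration:

```python
import numpy as np

def depth_pixel_to_rgb_pixel(i, j, depth_val, K_ir, K_rgb, R, t):
    """Map one depth/IR pixel (row i, col j) with depth value depth_val to the
    corresponding RGB pixel: back-project with the IR intrinsics, transform with
    the extrinsics (R, t), and re-project with the RGB intrinsics."""
    fx_ir, fy_ir, cx_ir, cy_ir = K_ir
    fx_rgb, fy_rgb, cx_rgb, cy_rgb = K_rgb
    # 3D point in the IR camera coordinate system.
    P_ir = np.array([depth_val * (j - cx_ir) / fx_ir,
                     depth_val * (i - cy_ir) / fy_ir,
                     depth_val])
    # Same point in the RGB camera coordinate system.
    X, Y, Z = R @ P_ir + t
    # Project with the RGB intrinsics to get the RGB pixel coordinates.
    u = fx_rgb * X / Z + cx_rgb
    v = fy_rgb * Y / Z + cy_rgb
    return u, v
```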
In another embodiment, the RGB image pixel mapping relation in step S200 is obtained as follows: the pixels of the RGB image and the depth image are aligned using the internal and external parameters obtained by calibration.
In this embodiment, the bounding box and facial key points detected on the IR image can correspond directly to the depth image without any pixel alignment operation, whereas detection results from the RGB image require mapping the pixels of the RGB image onto the pixels of the depth image through the internal and external parameters of the RGB camera and the IR camera.
In this embodiment, the pose of face data obtained by actual scanning inevitably deviates somewhat at acquisition time, and the spatial positions of different data sets differ even more. Such positional inconsistency strongly interferes with the related vision tasks, so the spatial positions of the face point clouds must be transformed to make the positions of different faces in the world coordinate system roughly consistent.
In another embodiment, an apparatus for fusing 2D face detection and 3D face recognition includes:
means for collecting an RGB image and a depth image, or an IR image and a depth image, with a 3D depth camera;
means for preprocessing the RGB image or the IR image, converting the depth image into an XYZ point cloud data set according to the internal and external parameters of the RGB camera and the IR depth camera, and obtaining the pixel mapping relation between the RGB image and the IR image;
means for performing face detection on the preprocessed RGB or IR image with the RetinaFace 2D detection method and framing the face bounding box and 5 facial key points: the nose tip, the left and right eyes, and the left and right mouth corners;
means for mapping the obtained 5 facial key points to the corresponding 5 key points in the XYZ point cloud and standardizing the detected three-dimensional face point cloud with respect to the XYZ point cloud data set;
means for orthogonally projecting the standardized three-dimensional face point cloud onto a specified plane, performing grid fitting, and converting the grid-fitted three-dimensional face point cloud into a two-dimensional face depth map;
and means for recognizing the two-dimensional face depth map with a face recognition method to obtain a face recognition result.
In this embodiment, the RetinaFace 2D detection method is built on a one-stage object detection network and directly regresses the class probability and position coordinates of the object, which makes it fast and accurate.
In summary, the above embodiments are only used for illustrating the technical solutions of the present disclosure, and not for limiting the same; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present disclosure.

Claims (10)

1. A method for fusing 2D face detection and 3D face recognition, the method comprising the steps of:
S100: collecting an RGB image and a depth image, or an IR image and a depth image, with a 3D depth camera;
S200: preprocessing the RGB image or the IR image;
S300: performing face detection on the preprocessed RGB or IR image with the RetinaFace 2D detection method, and framing the face bounding box and 5 facial key points: the nose tip, the left and right eyes, and the left and right mouth corners;
S400: mapping the obtained 5 facial key points to the corresponding 5 key points in the XYZ point cloud, and standardizing the detected three-dimensional face point cloud with respect to the XYZ point cloud data set;
S500: orthogonally projecting the standardized three-dimensional face point cloud onto a specified plane, performing grid fitting, and converting the grid-fitted three-dimensional face point cloud into a two-dimensional face depth map;
S600: recognizing the two-dimensional face depth map with a face recognition method to obtain a face recognition result.
2. The method according to claim 1, wherein the 3D depth camera in step S100 is a structured light depth camera or a ToF depth camera.
3. The method according to claim 1, wherein step S200 further comprises: converting the depth image into an XYZ point cloud data set according to the internal and external parameters of the RGB camera and the IR depth camera, and obtaining the pixel mapping relation between the RGB image and the IR image.
4. The method according to claim 1, wherein the RetinaFace 2D detection method in step S300 is specifically: the preprocessed IR or RGB image is input into a backbone network; in the multi-task learning of face detection, in addition to the conventional face classification loss function and face box loss function, the 5 facial key points are additionally annotated, an extra supervised loss function for face alignment is introduced from the key-point information, and a self-supervised decoding branch is introduced to predict 3D face information.
5. The method according to claim 1, wherein the three-dimensional face point cloud standardization in step S400 comprises: taking the nose tip as the origin of a new coordinate system, the direction from the left eye to the right eye as the new x-axis, and the direction from the midpoint of the two eyes to the midpoint of the two mouth corners as the new y-axis, the new z-axis being determined by the cross product of the x-axis and the y-axis; and obtaining the coordinates of each three-dimensional face in the new coordinate system through coordinate transformation, so that after transformation all three-dimensional face data have the same orientation and posture.
6. The method of claim 1, wherein the mesh fitting in step S500 refers to converting from a three-dimensional format to a two-dimensional format, the pixel values of the two-dimensional format representing depth values.
7. The method according to claim 1, wherein the face recognition method in step S600 mainly comprises feature point detection, a global feature extraction network GFE, a local feature extraction network LFE and a fully connected layer.
8. The method of claim 3, wherein the RGB camera and IR depth camera internal and external parameters are obtained by 3D depth camera calibration.
9. The method according to claim 3, wherein the RGB image pixel mapping relation is obtained by aligning the pixels of the RGB image and the depth image using the internal and external parameters obtained by calibration.
10. An apparatus for fusing 2D face detection and 3D face recognition, comprising:
means for capturing RGB images and depth images, or IR images and depth images, by a 3D depth camera;
means for pre-processing the RGB image or the IR image;
means for performing face detection on the preprocessed RGB or IR image with the RetinaFace 2D detection method and framing the face bounding box and 5 facial key points: the nose tip, the left and right eyes, and the left and right mouth corners;
means for mapping the obtained 5 facial key points to the corresponding 5 key points in the XYZ point cloud and standardizing the detected three-dimensional face point cloud with respect to the XYZ point cloud data set;
means for orthogonally projecting the standardized three-dimensional face point cloud onto a specified plane, performing grid fitting, and converting the grid-fitted three-dimensional face point cloud into a two-dimensional face depth map;
and means for recognizing the two-dimensional face depth map with a face recognition method to obtain a face recognition result.
CN202010241057.6A 2020-03-30 2020-03-30 Method and device for fusing 2D face detection and 3D face recognition Pending CN111523398A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010241057.6A CN111523398A (en) 2020-03-30 2020-03-30 Method and device for fusing 2D face detection and 3D face recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010241057.6A CN111523398A (en) 2020-03-30 2020-03-30 Method and device for fusing 2D face detection and 3D face recognition

Publications (1)

Publication Number Publication Date
CN111523398A true CN111523398A (en) 2020-08-11

Family

ID=71901235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010241057.6A Pending CN111523398A (en) 2020-03-30 2020-03-30 Method and device for fusing 2D face detection and 3D face recognition

Country Status (1)

Country Link
CN (1) CN111523398A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183481A (en) * 2020-10-29 2021-01-05 中国科学院计算技术研究所厦门数据智能研究院 3D face recognition method based on structured light camera
CN112215174A (en) * 2020-10-19 2021-01-12 江苏中讯通物联网技术有限公司 Sanitation vehicle state analysis method based on computer vision
CN112434647A (en) * 2020-12-09 2021-03-02 浙江光珀智能科技有限公司 Human face living body detection method
CN112597823A (en) * 2020-12-07 2021-04-02 深延科技(北京)有限公司 Attention recognition method and device, electronic equipment and storage medium
CN112651279A (en) * 2020-09-24 2021-04-13 深圳福鸽科技有限公司 3D face recognition method and system based on short-distance application
CN113139465A (en) * 2021-04-23 2021-07-20 北京华捷艾米科技有限公司 Face recognition method and device
CN113392763A (en) * 2021-06-15 2021-09-14 支付宝(杭州)信息技术有限公司 Face recognition method, device and equipment
CN113469043A (en) * 2021-06-30 2021-10-01 南方科技大学 Method and device for detecting wearing state of safety helmet, computer equipment and storage medium
CN113886477A (en) * 2021-09-28 2022-01-04 北京三快在线科技有限公司 Face recognition method and device
CN115050149A (en) * 2022-06-17 2022-09-13 郑州铁路职业技术学院 Automatic teller machine based on face recognition and automatic teller method thereof
CN116645299A (en) * 2023-07-26 2023-08-25 中国人民解放军国防科技大学 Method and device for enhancing depth fake video data and computer equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104091162A (en) * 2014-07-17 2014-10-08 东南大学 Three-dimensional face recognition method based on feature points
CN105243374A (en) * 2015-11-02 2016-01-13 湖南拓视觉信息技术有限公司 Three-dimensional human face recognition method and system, and data processing device applying same
CN107944435A (en) * 2017-12-27 2018-04-20 广州图语信息科技有限公司 A kind of three-dimensional face identification method, device and processing terminal
CN108197587A (en) * 2018-01-18 2018-06-22 中科视拓(北京)科技有限公司 A kind of method that multi-modal recognition of face is carried out by face depth prediction
CN108520204A (en) * 2018-03-16 2018-09-11 西北大学 A kind of face identification method
CN108549873A (en) * 2018-04-19 2018-09-18 北京华捷艾米科技有限公司 Three-dimensional face identification method and three-dimensional face recognition system
CN109101871A (en) * 2018-08-07 2018-12-28 北京华捷艾米科技有限公司 A kind of living body detection device based on depth and Near Infrared Information, detection method and its application
CN109753875A (en) * 2018-11-28 2019-05-14 北京的卢深视科技有限公司 Face identification method, device and electronic equipment based on face character perception loss
CN110458041A (en) * 2019-07-19 2019-11-15 国网安徽省电力有限公司建设分公司 A kind of face identification method and system based on RGB-D camera
CN110852310A (en) * 2020-01-14 2020-02-28 长沙小钴科技有限公司 Three-dimensional face recognition method and device, terminal equipment and computer readable medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104091162A (en) * 2014-07-17 2014-10-08 东南大学 Three-dimensional face recognition method based on feature points
CN105243374A (en) * 2015-11-02 2016-01-13 湖南拓视觉信息技术有限公司 Three-dimensional human face recognition method and system, and data processing device applying same
CN107944435A (en) * 2017-12-27 2018-04-20 广州图语信息科技有限公司 A kind of three-dimensional face identification method, device and processing terminal
CN108197587A (en) * 2018-01-18 2018-06-22 中科视拓(北京)科技有限公司 A kind of method that multi-modal recognition of face is carried out by face depth prediction
CN108520204A (en) * 2018-03-16 2018-09-11 西北大学 A kind of face identification method
CN108549873A (en) * 2018-04-19 2018-09-18 北京华捷艾米科技有限公司 Three-dimensional face identification method and three-dimensional face recognition system
CN109101871A (en) * 2018-08-07 2018-12-28 北京华捷艾米科技有限公司 A kind of living body detection device based on depth and Near Infrared Information, detection method and its application
CN109753875A (en) * 2018-11-28 2019-05-14 北京的卢深视科技有限公司 Face identification method, device and electronic equipment based on face character perception loss
CN110458041A (en) * 2019-07-19 2019-11-15 国网安徽省电力有限公司建设分公司 A kind of face identification method and system based on RGB-D camera
CN110852310A (en) * 2020-01-14 2020-02-28 长沙小钴科技有限公司 Three-dimensional face recognition method and device, terminal equipment and computer readable medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
JIANKANG DENG: "RetinaFace: Single-stage Dense Face Localisation in the Wild", Computer Vision and Pattern Recognition *
Zhang Huisen: "Face Recognition Technology", Computer Engineering and Design *
Ming Yue: "Audio-Visual Media Perception and Recognition", 31 August 2015, Beijing University of Posts and Telecommunications Press *
Li Kefeng: "Face Image Processing and Recognition Technology", 31 August 2018, Yellow River Water Conservancy Press *
Zhao Lixin: "Analysis of Smart Hardware Security in the Mobile Internet Era", 31 July 2019, China Fortune Press *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651279A (en) * 2020-09-24 2021-04-13 深圳福鸽科技有限公司 3D face recognition method and system based on short-distance application
CN112215174A (en) * 2020-10-19 2021-01-12 江苏中讯通物联网技术有限公司 Sanitation vehicle state analysis method based on computer vision
CN112183481B (en) * 2020-10-29 2022-05-20 中科(厦门)数据智能研究院 3D face recognition method based on structured light camera
CN112183481A (en) * 2020-10-29 2021-01-05 中国科学院计算技术研究所厦门数据智能研究院 3D face recognition method based on structured light camera
CN112597823A (en) * 2020-12-07 2021-04-02 深延科技(北京)有限公司 Attention recognition method and device, electronic equipment and storage medium
CN112434647A (en) * 2020-12-09 2021-03-02 浙江光珀智能科技有限公司 Human face living body detection method
CN113139465A (en) * 2021-04-23 2021-07-20 北京华捷艾米科技有限公司 Face recognition method and device
CN113392763A (en) * 2021-06-15 2021-09-14 支付宝(杭州)信息技术有限公司 Face recognition method, device and equipment
CN113469043A (en) * 2021-06-30 2021-10-01 南方科技大学 Method and device for detecting wearing state of safety helmet, computer equipment and storage medium
CN113886477A (en) * 2021-09-28 2022-01-04 北京三快在线科技有限公司 Face recognition method and device
CN113886477B (en) * 2021-09-28 2023-01-06 北京三快在线科技有限公司 Face recognition method and device
CN115050149A (en) * 2022-06-17 2022-09-13 郑州铁路职业技术学院 Automatic teller machine based on face recognition and automatic teller method thereof
CN115050149B (en) * 2022-06-17 2023-08-04 郑州铁路职业技术学院 Face recognition-based self-service cash dispenser and cash withdrawal method thereof
CN116645299A (en) * 2023-07-26 2023-08-25 中国人民解放军国防科技大学 Method and device for enhancing depth fake video data and computer equipment
CN116645299B (en) * 2023-07-26 2023-10-10 中国人民解放军国防科技大学 Method and device for enhancing depth fake video data and computer equipment

Similar Documents

Publication Publication Date Title
CN111523398A (en) Method and device for fusing 2D face detection and 3D face recognition
CN111066065B (en) System and method for hybrid depth regularization
TWI455062B (en) Method for 3d video content generation
CN101443817B (en) Method and device for determining correspondence, preferably for the three-dimensional reconstruction of a scene
Cao et al. Sparse photometric 3D face reconstruction guided by morphable models
KR100681320B1 (en) Method for modelling three dimensional shape of objects using level set solutions on partial difference equation derived from helmholtz reciprocity condition
CN114666564B (en) Method for synthesizing virtual viewpoint image based on implicit neural scene representation
CN106919257B (en) Haptic interactive texture force reproduction method based on image brightness information force
CN110910437A (en) Depth prediction method for complex indoor scene
CN113012293A (en) Stone carving model construction method, device, equipment and storage medium
CN113850865A (en) Human body posture positioning method and system based on binocular vision and storage medium
CN107767358B (en) Method and device for determining ambiguity of object in image
Kang et al. Competitive learning of facial fitting and synthesis using uv energy
CN109345570B (en) Multi-channel three-dimensional color point cloud registration method based on geometric shape
Zhuang et al. A dense stereo matching method based on optimized direction-information images for the real underwater measurement environment
Yin et al. Virtual reconstruction method of regional 3D image based on visual transmission effect
KR101673144B1 (en) Stereoscopic image registration method based on a partial linear method
CN113129348B (en) Monocular vision-based three-dimensional reconstruction method for vehicle target in road scene
Ervan et al. Downsampling of a 3D LiDAR point cloud by a tensor voting based method
CN115147577A (en) VR scene generation method, device, equipment and storage medium
CN106056599B (en) A kind of object recognition algorithm and device based on Object Depth data
Villa-Uriol et al. Automatic creation of three-dimensional avatars
Chang et al. Pixel-based adaptive normalized cross correlation for illumination invariant stereo matching
Drap et al. Underwater multimodal survey: Merging optical and acoustic data
CN117593618B (en) Point cloud generation method based on nerve radiation field and depth map

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200811)