CN113139465A - Face recognition method and device - Google Patents

Face recognition method and device

Info

  • Publication number: CN113139465A
  • Application number: CN202110443405.2A
  • Authority: CN (China)
  • Prior art keywords: frame, face, depth, color, score
  • Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
  • Other languages: Chinese (zh)
  • Inventors: 司苏沛, 李骊
  • Current / original assignee: Beijing HJIMI Technology Co Ltd
  • Events: application filed by Beijing HJIMI Technology Co Ltd; priority to CN202110443405.2A; publication of CN113139465A; legal status pending

Classifications

    • G06V 40/161: Detection; Localisation; Normalisation (G Physics › G06 Computing; calculating or counting › G06V Image or video recognition or understanding › G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data › G06V 40/10 Human or animal bodies › G06V 40/16 Human faces, e.g. facial parts, sketches or expressions)
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting (G Physics › G06 Computing › G06F Electric digital data processing › G06F 18/00 Pattern recognition › G06F 18/20 Analysing › G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space)
    • G06F 18/25: Fusion techniques (G Physics › G06 Computing › G06F Electric digital data processing › G06F 18/00 Pattern recognition › G06F 18/20 Analysing)
    • G06N 3/045: Combinations of networks (G Physics › G06 Computing › G06N Computing arrangements based on specific computational models › G06N 3/00 Computing arrangements based on biological models › G06N 3/02 Neural networks › G06N 3/04 Architecture, e.g. interconnection topology)
    • G06V 40/168: Feature extraction; Face representation (G Physics › G06 Computing › G06V Image or video recognition or understanding › G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data › G06V 40/10 Human or animal bodies › G06V 40/16 Human faces)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a face recognition method and device. A pair of T-frame image frames transmitted by a user terminal is obtained and face detection is performed; if a face is detected, a T-frame face pair consisting of T-frame color face data and T-frame depth face data is obtained. The T-frame depth face data is fused with the n depth face data preceding the T frame. Similarity calculation is then performed on the extracted color face feature vector and depth face feature vector, and if the resulting final decision score exceeds a threshold, the T-frame face is determined to pass face recognition as the corresponding face. In this scheme, the T-frame depth face data is fused with the n depth face data preceding the T frame, and whether the T-frame face passes face recognition is decided from the final decision score. Because the depth image is used both for depth face data fusion and in the final decision score, the face recognition accuracy obtained from the depth image is improved, and the overall face recognition accuracy is improved accordingly.

Description

Face recognition method and device
Technical Field
The invention relates to the technical field of face recognition, in particular to a face recognition method and a face recognition device.
Background
With the development of depth detection technology, the depth data acquired by structured-light depth cameras has become increasingly complete and precise, gradually reaching a practical standard.
In the prior art, it is possible to extract 3D (three-dimensional) shape features of a face from depth data and perform face recognition, but because the imaging quality of depth camera devices varies, the accuracy of pure 3D face recognition is lower than that of 2D (two-dimensional) color RGB face recognition. The 2D image, however, carries no depth information, so existing face recognition systems are generally weak at distinguishing similar faces and cannot reliably recognize the same face when its texture changes (for example through face paint or tattoos) or when the illumination changes.
Disclosure of Invention
In view of this, embodiments of the present invention provide a face recognition method and apparatus, so as to achieve the purpose of improving the face recognition accuracy.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
the first aspect of the embodiment of the invention discloses a face recognition method, which comprises the following steps:
acquiring a pair of T-frame image frames transmitted by a user terminal, wherein the pair of T-frame image frames comprises a T-frame color image frame and a T-frame depth image frame, and T is an integer greater than 1;
taking the T frame color image as the input of a face detection model to carry out face detection;
if the face is detected, acquiring a T-frame face pair consisting of T-frame color face data and T-frame depth face data;
fusing the T frame depth face data and n depth face data before the T frame to obtain fused T frame depth face data, wherein the value of n is a positive integer smaller than T;
respectively extracting color face feature vectors corresponding to the T-frame color face data and depth face feature vectors corresponding to the fused T-frame depth face data;
performing similarity calculation according to the color face feature vector and the depth face feature vector to obtain a final decision score final_score;
and if the final decision score final_score is larger than a threshold value, determining that the T-frame face passes face recognition as the corresponding face.
Optionally, the performing face detection by using the T-frame color image as an input of a face detection model includes:
inputting the T frame color images in the pair of T frame image frames into a pre-constructed face detection model;
acquiring, from the output of the face detection model, the face data in the T-frame color image that meets the requirements, wherein the face data comprises the coordinates of the upper-left and lower-right corners of the rectangle in which the face is located; the face meeting the requirements refers to the single face when one face exists, and to the face with the largest area among the faces when a plurality of faces exist;
correspondingly, if a face is detected, acquiring a T-frame face pair composed of T-frame color face data and T-frame depth face data, including:
and cutting the T frame color image frame and the T frame depth image frame in the pair of T frame image frames according to the coordinates to obtain a T frame face pair consisting of T frame color face data and T frame depth face data.
Optionally, the fusing the T-frame depth face data and n depth face data before the T frame to obtain fused T-frame depth face data includes:
converting the T-frame depth face data and n depth face data before the T frame into corresponding T-frame three-dimensional point cloud data based on camera parameters;
rotationally transforming the n three-dimensional point cloud data before the T frame to the T-frame camera coordinate system where the T-frame three-dimensional point cloud data is located through an iterative closest point (ICP) registration algorithm, to obtain the transformed n three-dimensional point cloud data before the T frame;
fusing the T-frame three-dimensional point cloud data and n three-dimensional point cloud data before the transformed T frame into a Truncated Signed Distance Function (TSDF) model;
and based on camera internal parameters, projecting the TSDF model to a T-frame camera imaging plane to obtain fused T-frame depth face data.
Optionally, the respectively extracting the color face feature vectors corresponding to the T-frame color face data and the depth face feature vectors corresponding to the fused T-frame depth face data includes:
inputting the T-frame color face data into a color face recognition model, and extracting color face feature vectors corresponding to the T-frame color face data;
inputting the fused T-frame depth face data into a depth face recognition model, and extracting a depth face feature vector corresponding to the fused T-frame depth face data;
the color face recognition model and the depth face recognition model are obtained by training deep convolutional neural network models.
Optionally, the performing similarity calculation according to the color face feature vector and the depth face feature vector to obtain a final decision score final_score includes:
comparing the extracted color face feature vector with the color face feature vectors in a pre-constructed sample library, and calculating a color similarity score color_score;
comparing the extracted depth face feature vector with the depth face feature vectors in the pre-constructed sample library, and calculating a depth similarity score depth_score;
based on a weighted average method, multiplying the color similarity score color_score and the depth similarity score depth_score by their respective weight coefficients and taking the sum as the final decision score final_score;
wherein final_score = w1*color_score + w2*depth_score, w1 is the weight coefficient of the color similarity score color_score, w2 is the weight coefficient of the depth similarity score depth_score, and w1 + w2 = 1.
Optionally, the calculating the color similarity score color_score includes:
calculating the cosine similarity or Euclidean distance between the color face feature vector and the color face feature vectors in the sample library, and obtaining the color similarity score color_score from the cosine similarity or the Euclidean distance;
accordingly, calculating the depth similarity score depth_score includes:
calculating the cosine similarity or Euclidean distance between the depth face feature vector and the depth face feature vectors in the sample library, and obtaining the depth similarity score depth_score from the cosine similarity or the Euclidean distance.
The second aspect of the embodiments of the present invention discloses a face recognition apparatus, the apparatus comprising:
the acquisition module is used for acquiring a pair of T-frame image frames transmitted by a user terminal, wherein the pair of T-frame image frames comprises a T-frame color image frame and a T-frame depth image frame, and T is an integer greater than 1;
the processing module is used for performing face detection by taking the pair of T-frame image frames as the input of a face detection model, and acquiring a T-frame face pair consisting of T-frame color face data and T-frame depth face data if a face is detected;
the fusion module is used for fusing the T frame depth face data and n depth face data before the T frame to obtain fused T frame depth face data, wherein the value of n is a positive integer smaller than T;
the extraction module is used for respectively extracting color face feature vectors corresponding to the T-frame color face data and depth face feature vectors corresponding to the fused T-frame depth face data;
the calculation module is used for carrying out similarity calculation according to the color face feature vector and the depth face feature vector to obtain a final decision score final_score;
and the determining module is used for determining that the T-frame face passes face recognition as the corresponding face if the final decision score final_score is larger than a threshold value.
Optionally, the processing module includes:
the input unit is used for inputting the T frame color images in the pair of T frame image frames into a pre-constructed human face detection model;
the acquisition unit is used for acquiring, from the output of the face detection model, the face data in the T-frame color image that meets the requirements, wherein the face data comprises the coordinates of the upper-left and lower-right corners of the rectangle in which the face is located; the face meeting the requirements refers to the single face when one face exists, and to the face with the largest area among the faces when a plurality of faces exist;
and the processing unit is used for cutting the T frame color image frame and the T frame depth image frame in the pair of T frame image frames according to the coordinates and acquiring a T frame face pair consisting of T frame color face data and T frame depth face data.
Optionally, the fusion module includes:
the conversion unit is used for converting the T-frame depth face data and n depth face data before the T frame into corresponding T-frame three-dimensional point cloud data based on camera internal parameters;
the first processing unit is used for rotationally transforming the n three-dimensional point cloud data before the T frame to the T-frame camera coordinate system where the T-frame three-dimensional point cloud data is located through an iterative closest point (ICP) registration algorithm, to obtain the transformed n three-dimensional point cloud data before the T frame;
the fusion unit is used for fusing the T-frame three-dimensional point cloud data and n three-dimensional point cloud data before the transformed T frame into a truncated signed distance function model (TSDF);
and the second processing unit is used for projecting the TSDF model to a T-frame camera imaging plane based on camera internal parameters to obtain fused T-frame depth face data.
Optionally, the extracting module includes:
the first extraction unit is used for inputting the T-frame color face data into a color face recognition model and extracting color face feature vectors corresponding to the T-frame color face data;
the second extraction unit is used for inputting the fused T-frame depth face data into a depth face recognition model and extracting a depth face feature vector corresponding to the fused T-frame depth face data;
the color face recognition model and the depth face recognition model are obtained by training deep convolutional neural network models.
Based on the face recognition method and device provided by the embodiments of the present invention, a pair of T-frame image frames transmitted by a user terminal is acquired, wherein the pair of T-frame image frames comprises a T-frame color image frame and a T-frame depth image frame, and T is an integer greater than 1; the T-frame color image is taken as the input of a face detection model to carry out face detection; if a face is detected, a T-frame face pair consisting of T-frame color face data and T-frame depth face data is acquired; the T-frame depth face data and the n depth face data before the T frame are fused to obtain fused T-frame depth face data, wherein n is a positive integer smaller than T; the color face feature vector corresponding to the T-frame color face data and the depth face feature vector corresponding to the fused T-frame depth face data are respectively extracted; similarity calculation is performed according to the color face feature vector and the depth face feature vector to obtain a final decision score final_score; and if the final decision score final_score is larger than a threshold value, it is determined that the T-frame face passes face recognition as the corresponding face. In this scheme, the T-frame depth face data and the n depth face data before the T frame are fused to obtain the fused T-frame depth face data, and whether the T-frame face passes face recognition as the corresponding face is determined according to the final decision score. Because the depth image is used for depth face data fusion and for obtaining the final decision score, and the depth image contains the shape information of the face, the integrity, precision and signal-to-noise ratio of the depth data are improved, the face recognition accuracy obtained from the depth image is improved, the gap to the face recognition accuracy of the color image is reduced, and the overall face recognition accuracy is further improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flow chart of a face recognition method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of acquiring a T-frame face pair according to an embodiment of the present invention;
fig. 3 is a schematic flow chart of obtaining fused T-frame depth face data according to an embodiment of the present invention;
fig. 4 is a schematic view of a process for extracting a face feature vector according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of obtaining a final decision score final _ score according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a face recognition apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of another face recognition apparatus according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of another face recognition apparatus according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of another face recognition apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As can be seen from the background art, the existing face recognition system has a generally insufficient ability to distinguish similar faces, and cannot reliably recognize the same face when its texture changes (for example through face paint or tattoos) or when the illumination changes.
Therefore, the embodiments of the invention provide a face recognition method and a face recognition apparatus, which improve the accuracy of face recognition by combining the texture information of the 2D color image with the shape information of the 3D depth image. The depth image is used both for fusing the depth face data and for obtaining the decision score; since the depth image contains the shape information of the face, the integrity, precision and signal-to-noise ratio of the depth data are improved, the face recognition accuracy of the depth image is improved, the gap to the face recognition accuracy of the color image is reduced, and the overall face recognition accuracy is improved.
As shown in fig. 1, which is a schematic flow chart of a face recognition method according to an embodiment of the present invention, the method includes the following steps:
step S101: a pair of T frame image frames transmitted by a user terminal is obtained.
In step S101, a pair of T-frame image frames includes a T-frame color image frame and a T-frame depth image frame, where T is an integer greater than 1. For example, if T is 2, the T frame is the 2nd frame.
The T-frame color image frame and the T-frame depth image frame represent the color image and the depth image captured at the same moment; that is, pixels at the same position in the two frames represent, respectively, the appearance of the same scene point (expressed in RGB) and its depth from the camera optical center (the z-axis value in the camera coordinate system).
The user terminal may be a device with a camera function, such as a camera and a smart phone.
In the process of implementing step S101, a pair of T frame image frames composed of T frame color image frames and T frame depth image frames is intercepted from the color image video stream information and the depth image video stream information transmitted from the user terminal.
In the embodiment of the present invention, the interception mode is configurable: pairs may be taken sequentially, or one pair of T-frame image frames may be taken every N frames, where N is an integer greater than or equal to 2.
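As a purely illustrative sketch of this interception strategy (the stream objects and function names below are hypothetical, not part of the disclosure), the following Python code pairs time-aligned color and depth frames and takes one pair every N frames:

    def paired_frames(color_stream, depth_stream, every_n=1):
        """Yield (T, color frame, depth frame) pairs from two synchronized streams,
        taking one pair every `every_n` frames (every_n = 1 means sequential)."""
        # color_stream / depth_stream are assumed to be iterables of time-aligned frames.
        for t, (color, depth) in enumerate(zip(color_stream, depth_stream), start=1):
            if (t - 1) % every_n == 0:
                yield t, color, depth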
Step S102: performing face detection by taking the T-frame color image as the input of a face detection model; executing step S103 if a face is detected, and executing step S101 if no face is detected.
In step S102, the face detection model may be a PCN (Progressive Calibration Network).
Optionally, another face detection model such as YOLO (You Only Look Once) may also be used.
In the process of implementing step S102 specifically, the acquired T-frame color image is input to the face detection model as an input parameter of the face detection model, and face detection is performed based on the face detection model.
If the face detection model detects that a face exists in the T frame color image, step S103 is executed, and if the face detection model does not detect that a face exists in the T frame color image, step S101 is returned to and continuously executed.
Step S103: and acquiring a T-frame face pair consisting of T-frame color face data and T-frame depth face data.
In the process of implementing step S103 specifically, based on the position coordinates of the face detected in the T-frame color image, the T-frame color face data and the T-frame depth face data are captured from the T-frame image frame corresponding to the T-frame color image, and the pair of T-frame color face data and T-frame depth face data is referred to as a T-frame face pair.
Step S104: and fusing the T-frame depth face data and n depth face data before the T frame to obtain fused T-frame depth face data.
In step S104, the n depth face data before the T frame refer to the T-1 frame depth face data, ..., the T-n+1 frame depth face data, and the T-n frame depth face data, where n is a positive integer smaller than T.
Similarly, the n color face data before the T frame refer to the T-1 frame color face data, ..., the T-n+1 frame color face data, and the T-n frame color face data. That is, the T-1 frame color face data and the T-1 frame depth face data may constitute a T-1 frame face pair, and the T-n frame color face data and the T-n frame depth face data may constitute a T-n frame face pair.
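One simple way (an illustrative assumption, not something prescribed by the disclosure) to keep the n depth face data before the T frame available for fusion is a fixed-length rolling buffer:

    from collections import deque

    n = 5                                # illustrative value; see the note on choosing n below
    depth_history = deque(maxlen=n)      # holds the depth face data of frames T-n ... T-1

    def previous_n_depth_faces(t_frame_depth_face):
        """Return the depth face data of the n frames before the T frame, then record the T frame."""
        prev = list(depth_history)       # oldest (T-n) first, newest (T-1) last
        depth_history.append(t_frame_depth_face)
        return prev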
In the process of specifically implementing the step S104, the acquired T-frame depth face data and n depth face data before the T frame are fused to obtain fused T-frame depth face data.
Step S105: and respectively extracting color face characteristic vectors corresponding to the T-frame color face data and depth face characteristic vectors corresponding to the fused T-frame depth face data.
In the process of implementing step S105 specifically, based on the T frame color face data and the fused T frame depth face data, a color face feature vector corresponding to the T frame color face data is extracted from the T frame color face data, and a depth face feature vector corresponding to the fused T frame depth face data is extracted from the fused T frame depth face data.
It should be noted that, when the operations of extracting the color face feature vectors corresponding to the T-frame color face data and extracting the depth face feature vectors corresponding to the fused T-frame depth face data are performed, the order of the operations is not distinguished.
The color face feature vector corresponding to the T-frame color face data can be extracted first, and then the depth face feature vector corresponding to the T-frame depth face data after fusion is extracted.
Or extracting the depth face feature vector corresponding to the T-frame depth face data after fusion, and then extracting the color face feature vector corresponding to the T-frame color face data.
Step S106: performing similarity calculation according to the color face feature vector and the depth face feature vector to obtain a final decision score final_score.
In step S106, the similarity may be calculated with a cosine similarity algorithm or a Euclidean distance algorithm; the cosine similarity algorithm is generally adopted for face similarity, and the higher the obtained similarity value, the more similar the two faces are and the more likely they belong to the same person.
In the process of specifically implementing step S106, similarity calculation is performed on the extracted color face feature vector to obtain a color similarity score color_score, similarity calculation is performed on the extracted depth face feature vector to obtain a depth similarity score depth_score, and the final decision score final_score is then obtained from color_score and depth_score based on formula (1):
final_score = w1*color_score + w2*depth_score, (1)
where w1 is the weight coefficient of the color similarity score color_score, w2 is the weight coefficient of the depth similarity score depth_score, and w1 + w2 = 1.
Step S107: judging whether the final decision score final_score is larger than a threshold value; if so, executing step S108, and if not, executing step S101.
In the process of specifically implementing step S107, it is judged whether the final decision score final_score is larger than the threshold value. If so, the face corresponding to the T-frame face pair and the matched face in the sample library belong to the same person, and step S108 is executed; otherwise, they do not belong to the same person, and step S101 is executed.
Step S108: determining that the T-frame face passes face recognition as the corresponding face.
In the process of specifically implementing step S108, on the premise that the final decision score final_score is determined to be larger than the threshold value, it is determined that the T-frame face passes face recognition as the corresponding face.
Based on the face recognition method provided by the embodiment of the invention, a pair of T-frame image frames transmitted by a user terminal is acquired, wherein the pair of T-frame image frames comprises a T-frame color image frame and a T-frame depth image frame, and T is an integer greater than 1; the T-frame color image is taken as the input of a face detection model to carry out face detection; if a face is detected, a T-frame face pair consisting of T-frame color face data and T-frame depth face data is acquired; the T-frame depth face data and the n depth face data before the T frame are fused to obtain fused T-frame depth face data, wherein n is a positive integer smaller than T; the color face feature vector corresponding to the T-frame color face data and the depth face feature vector corresponding to the fused T-frame depth face data are respectively extracted; similarity calculation is performed according to the color face feature vector and the depth face feature vector to obtain a final decision score final_score; and if the final decision score final_score is larger than the threshold value, it is determined that the T-frame face passes face recognition as the corresponding face. In this scheme, the T-frame depth face data and the n depth face data before the T frame are fused to obtain the fused T-frame depth face data, and whether the T-frame face passes face recognition as the corresponding face is determined according to the final decision score. Because the depth image is used for depth face data fusion and for obtaining the final decision score, and the depth image contains the shape information of the face, the integrity, precision and signal-to-noise ratio of the depth data are improved, the face recognition accuracy obtained from the depth image is improved, the gap to the face recognition accuracy of the color image is reduced, and the overall face recognition accuracy is further improved.
Based on the face recognition method provided by the embodiment of the present invention, the process of executing step S102, in which face detection is performed by taking the T-frame color image as the input of the face detection model, and step S103, in which a T-frame face pair composed of T-frame color face data and T-frame depth face data is obtained, is described below. As shown in fig. 2, the schematic flow chart for obtaining a T-frame face pair according to the embodiment of the present invention mainly includes the following steps:
step S201: and inputting the T frame color images in a pair of T frame image frames into a pre-constructed human face detection model.
Step S202: and acquiring the human face data which meets the requirements in the T-frame color image output by the human face detection model.
In step S202, when there is one face, the satisfactory face refers to the face data of the one face, and when there are a plurality of faces, the satisfactory face refers to the face data corresponding to the face with the largest area among the plurality of faces.
Specifically, the face data includes coordinates of the top left corner and the bottom right corner of the rectangle where the face is located.
In the process of implementing step S202 specifically, a face detection result output by the face detection model is obtained, where the face detection result is face data meeting requirements in the T-frame color image.
Step S203: and cutting a T frame color image frame and a T frame depth image frame in the pair of T frame image frames according to the coordinates to obtain a T frame face pair consisting of T frame color face data and T frame depth face data.
In the process of the specific implementation step S203, according to the coordinates of the upper left corner and the lower right corner of the rectangle in which the face is located in the face data, a T-frame color image frame and a T-frame depth image frame in a pair of T-frame image frames are cut, and a T-frame face pair composed of T-frame color face data and T-frame depth face data is obtained.
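A minimal sketch of this cropping step, assuming the detector returns rectangles as (x1, y1, x2, y2) pixel coordinates and that the color and depth frames are pixel-aligned numpy arrays (all names are illustrative only):

    import numpy as np

    def crop_t_frame_face_pair(color_frame, depth_frame, face_boxes):
        """Select the face that meets the requirements (largest area) and crop the
        aligned color/depth frames to the same rectangle."""
        if not face_boxes:
            return None                              # no face detected; go back to step S101
        x1, y1, x2, y2 = max(face_boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
        color_face = color_frame[y1:y2, x1:x2]       # T-frame color face data
        depth_face = depth_frame[y1:y2, x1:x2]       # T-frame depth face data
        return color_face, depth_face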
Based on the face recognition method provided by the embodiment of the invention, the coordinates of the detected face are obtained by carrying out face detection on the T-frame color image, the T-frame color image frame and the T-frame depth image frame are cut according to these coordinates, and the T-frame face pair consisting of the T-frame color face data and the T-frame depth face data is obtained; the face is recognized more accurately by applying both the color image and the depth image, and the accuracy of face recognition is improved.
Based on the face recognition method provided by the embodiment of the present invention, the process of executing step S104, in which the T-frame depth face data and the n depth face data before the T frame are fused to obtain fused T-frame depth face data, is described below. As shown in fig. 3, the schematic flow chart for obtaining fused T-frame depth face data according to the embodiment of the present invention mainly includes the following steps:
step S301: and converting the T-frame depth face data and n depth face data before the T frame into corresponding T-frame three-dimensional point cloud data based on camera parameters.
In the process of implementing step S301 specifically, the T-frame depth face data and the T-1 frame depth face data, ..., the T-n+1 frame depth face data, and the T-n frame depth face data are converted into corresponding three-dimensional point cloud data by using the camera internal parameters.
It should be noted that the camera internal parameters are parameters related to the characteristics of the camera itself, such as the focal length and the rotation direction of the camera. In the embodiment of the invention, the camera internal reference specifically refers to internal reference of a shooting device or a shooting module used when a face image is shot or collected.
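The back-projection of a depth face patch into a three-dimensional point cloud can be sketched with a standard pinhole camera model; the parameter names and the depth unit conversion below are assumptions, since the disclosure only states that the camera internal parameters are used:

    import numpy as np

    def depth_face_to_point_cloud(depth_face, fx, fy, cx, cy, u0=0, v0=0, depth_scale=0.001):
        """Back-project a (cropped) depth face patch into 3D points in the camera frame.
        (u0, v0) is the crop origin in the full image, so the intrinsics stay valid;
        depth_scale converts the stored depth units to metres. Zero-depth pixels are dropped."""
        h, w = depth_face.shape
        u, v = np.meshgrid(np.arange(w) + u0, np.arange(h) + v0)
        z = depth_face.astype(np.float64) * depth_scale
        valid = z > 0
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.stack([x[valid], y[valid], z[valid]], axis=-1)   # (N, 3) point cloud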
Step S302: rotationally transforming the n three-dimensional point cloud data before the T frame to the T-frame camera coordinate system where the T-frame three-dimensional point cloud data is located through an ICP (iterative closest point) registration algorithm, to obtain the transformed n three-dimensional point cloud data before the T frame.
In step S302, the ICP registration algorithm, also called iterative closest point method, is an algorithm for registering a pair of point clouds to the same coordinate system.
The n three-dimensional point cloud data before the T frame refer to the T-1 frame three-dimensional point cloud data, ..., the T-n+1 frame three-dimensional point cloud data, and the T-n frame three-dimensional point cloud data, where n is a positive integer smaller than T.
In the specific implementation process of step S302, the T-1 frame three-dimensional point cloud data, ..., the T-n+1 frame three-dimensional point cloud data, and the T-n frame three-dimensional point cloud data are rotationally transformed to the T-frame camera coordinate system where the T-frame three-dimensional point cloud data is located by the ICP registration algorithm, so as to obtain the transformed n three-dimensional point cloud data before the T frame.
It should be noted that, in addition to the ICP registration algorithm, the NDT (Normal Distributions Transform) registration algorithm or the RANSAC (RANdom SAmple Consensus) registration algorithm may also be used to rotationally transform the n three-dimensional point cloud data before the T frame to the T-frame camera coordinate system where the T-frame three-dimensional point cloud data is located, so as to obtain the transformed n three-dimensional point cloud data before the T frame.
It should be noted that the registration algorithm is not limited to the ICP registration algorithm disclosed above, and may be set by a technician based on technical requirements.
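As one possible realization of this registration step (the Open3D library is an implementation choice assumed here, not something named by the disclosure), point-to-point ICP can be used to bring a previous frame's face point cloud into the T-frame camera coordinate system:

    import numpy as np
    import open3d as o3d

    def align_to_t_frame(prev_points, t_points, max_corr_dist=0.005):
        """Register one previous frame's (N, 3) face point cloud to the T-frame point
        cloud with point-to-point ICP and return the transformed points."""
        src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(prev_points))
        dst = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(t_points))
        result = o3d.pipelines.registration.registration_icp(
            src, dst, max_corr_dist, np.eye(4),
            o3d.pipelines.registration.TransformationEstimationPointToPoint())
        src.transform(result.transformation)       # rotate/translate into the T-frame system
        return np.asarray(src.points)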
Step S303: fusing the T-frame three-dimensional point cloud data and n three-dimensional point cloud data before the transformed T frame into a truncated signed distance function model (TSDF) model.
In step S303, the TSDF model, also called a truncated signed distance function model, is a surface reconstruction algorithm that uses structured point cloud data and expresses a surface by parameters, and the core is to map the point cloud data into a predefined three-dimensional space, and to express an area near the surface of a real scene by a truncated signed distance function, thereby establishing a surface model.
In the specific implementation process of step S303, the T-frame three-dimensional point cloud data and the transformed T-1 frame three-dimensional point cloud data, ..., T-n+1 frame three-dimensional point cloud data, and T-n frame three-dimensional point cloud data are fused into the TSDF model.
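For illustration, the core of TSDF fusion, mapping each depth observation into a voxel grid and keeping a weighted running average of the truncated signed distance, can be sketched as follows (a simplified dense-grid version under assumed parameter names; practical systems usually use a scalable or hashed TSDF volume):

    import numpy as np

    def integrate_into_tsdf(tsdf, weight, voxel_xyz, depth_map, fx, fy, cx, cy,
                            trunc=0.01, depth_scale=0.001):
        """Integrate one registered depth map into a TSDF volume.
        tsdf, weight: flat per-voxel arrays (updated in place);
        voxel_xyz: (V, 3) voxel centres already expressed in the T-frame camera system;
        depth_map: HxW depth image in the same coordinate system."""
        h, w = depth_map.shape
        z = voxel_xyz[:, 2]
        front = z > 1e-6
        u = np.full(z.shape, -1, dtype=int)
        v = np.full(z.shape, -1, dtype=int)
        u[front] = np.round(voxel_xyz[front, 0] * fx / z[front] + cx).astype(int)
        v[front] = np.round(voxel_xyz[front, 1] * fy / z[front] + cy).astype(int)
        ok = front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        d = depth_map[v[ok], u[ok]].astype(np.float64) * depth_scale
        sdf = d - z[ok]
        keep = (d > 0) & (sdf > -trunc)              # discard voxels far behind the surface
        idx = np.flatnonzero(ok)[keep]
        s = np.clip(sdf[keep] / trunc, -1.0, 1.0)    # truncated signed distance
        weight[idx] += 1.0
        tsdf[idx] += (s - tsdf[idx]) / weight[idx]   # weighted running average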
Step S304: and based on camera internal parameters, projecting the TSDF model to a T-frame camera imaging plane to obtain fused T-frame depth face data.
In the process of implementing step S304 specifically, the TSDF model is projected into the T-frame camera imaging plane according to the camera parameters, so as to obtain the fused T-frame depth face data.
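The projection back onto the T-frame imaging plane can likewise be illustrated with a plain pinhole projection of the fused surface points (for example, points extracted from the TSDF model); this is a simplified stand-in for ray-casting the TSDF volume itself, and all names are illustrative:

    import numpy as np

    def project_points_to_depth(points, fx, fy, cx, cy, height, width, depth_scale=0.001):
        """Project fused 3D surface points (T-frame camera coordinates, metres) onto the
        T-frame imaging plane, keeping the nearest depth per pixel (simple z-buffer)."""
        depth = np.zeros((height, width), dtype=np.float64)
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        keep = z > 0
        u = np.round(x[keep] * fx / z[keep] + cx).astype(int)
        v = np.round(y[keep] * fy / z[keep] + cy).astype(int)
        z = z[keep]
        inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
        for ui, vi, zi in zip(u[inside], v[inside], z[inside]):
            if depth[vi, ui] == 0 or zi < depth[vi, ui]:
                depth[vi, ui] = zi
        return np.round(depth / depth_scale).astype(np.uint16)    # fused T-frame depth face data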
It should be noted that the larger n is, the more depth images are fused, and the higher the integrity, accuracy and signal-to-noise ratio of T frame face depth data is.
In the embodiment of the present invention, the size of n needs to be determined in consideration of the imaging quality of the specific depth camera device, the computing power of the chip and the face recognition accuracy requirement, and should generally be in the range [2, 30), that is, n is greater than or equal to 2 and less than 30.
Based on the face recognition method provided by the embodiment of the invention, the T-frame depth face data and the n depth face data before the T frame are converted into the corresponding T-frame three-dimensional point cloud data, the n three-dimensional point cloud data before the T frame are rotationally converted into a T-frame camera coordinate system where the T-frame three-dimensional point cloud data is located, the n three-dimensional point cloud data before the converted T frame are obtained, the T-frame three-dimensional point cloud data and the n three-dimensional point cloud data before the converted T frame are fused, the fused T-frame depth face data are obtained, and the depth face data are fused by applying the depth image, so that the integrity and the precision of the face depth data are improved, and the accuracy of face recognition is improved.
Based on the above-mentioned face recognition method provided by the embodiment of the present invention, step S105 is executed to respectively extract the color face feature vectors corresponding to the T-frame color face data and the depth face feature vectors corresponding to the fused T-frame depth face data. As shown in fig. 4, a schematic flow chart for extracting a face feature vector according to an embodiment of the present invention mainly includes the following steps:
step S401: and inputting the T-frame color face data into a color face recognition model, and extracting color face characteristic vectors corresponding to the T-frame color face data.
In the process of implementing step S401 specifically, T-frame color face data is input into the trained color face recognition model, and color face feature vectors corresponding to the T-frame color face data are extracted.
Step S402: and inputting the fused T-frame depth face data into a depth face recognition model, and extracting a depth face feature vector corresponding to the fused T-frame depth face data.
In the process of implementing step S402 specifically, the fused T-frame depth face data is input into the trained depth face recognition model, and the depth face feature vector corresponding to the fused T-frame depth face data is extracted.
In the embodiment of the invention, the color face recognition model and the depth face recognition model are obtained by training deep convolutional neural network models.
It should be noted that the training of the color face recognition model and the depth face recognition model is not limited to the deep convolutional neural network model disclosed above, and may be set by the skilled person based on technical requirements.
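A hedged sketch of this extraction step with PyTorch (the framework, the preprocessing and the output shape are assumptions; the disclosure only states that trained recognition models are used):

    import torch
    import torch.nn.functional as F

    def extract_feature(face_array, model, device="cpu"):
        """Run a trained recognition network (the color model or the depth model) on one
        face patch and return an L2-normalised feature vector as a numpy array.
        face_array: HxWxC colour patch or HxW fused depth patch, already resized and
        normalised to whatever the network expects."""
        model.eval()
        x = torch.as_tensor(face_array, dtype=torch.float32)
        if x.ndim == 2:                                   # depth patch: add a channel axis
            x = x.unsqueeze(-1)
        x = x.permute(2, 0, 1).unsqueeze(0).to(device)    # -> 1 x C x H x W
        with torch.no_grad():
            feature = model(x)                            # assumed to return a 1 x D embedding
        return F.normalize(feature, dim=1).squeeze(0).cpu().numpy()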
Based on the face recognition method provided by the embodiment of the invention, the subsequent similarity calculation process is simplified and the accuracy of face recognition is improved by extracting the color face feature vector and the depth face feature vector.
Based on the above-mentioned face recognition method provided by the embodiment of the present invention, step S106 is executed to perform similarity calculation according to the color face feature vector and the depth face feature vector, so as to obtain the final decision score final_score. As shown in fig. 5, the schematic flow chart for obtaining the final decision score final_score according to the embodiment of the present invention mainly includes the following steps:
step S501: and comparing the extracted color face feature vector with color face feature vectors in a pre-constructed sample library, and calculating a color similarity score color _ score.
In the process of implementing step S501 specifically, based on the extracted color face feature vector, the color face feature vector is compared with color face feature vectors in a pre-constructed sample library, and a color similarity score color _ score is calculated.
Optionally, the process of calculating the color similarity score color_score specifically includes:
calculating the cosine similarity or Euclidean distance between the color face feature vector and the color face feature vectors in the sample library, and obtaining the color similarity score color_score from the cosine similarity or the Euclidean distance.
For example: given the color face feature vector A-RGB of user A and the color face feature vector B-RGB of target user B in the sample library, the color similarity score color_score between them can be obtained through cosine similarity or Euclidean distance calculation. When cosine similarity is used, the color similarity score between the face corresponding to the T-frame face pair and the target face in the sample library is: color_score = cosine(A-RGB, B-RGB).
Step S502: comparing the extracted depth face feature vector with the depth face feature vectors in the pre-constructed sample library, and calculating a depth similarity score depth_score.
In the process of implementing step S502 specifically, based on the extracted depth face feature vector, the depth face feature vector is compared with the depth face feature vectors in the pre-constructed sample library, and the depth similarity score depth_score is calculated.
Optionally, the process of calculating the depth similarity score depth_score specifically includes:
calculating the cosine similarity or Euclidean distance between the depth face feature vector and the depth face feature vectors in the sample library, and obtaining the depth similarity score depth_score from the cosine similarity or the Euclidean distance.
Following the example in step S501, given the depth face feature vector A-DEP of user A and the depth face feature vector B-DEP of target user B in the sample library, the depth similarity score between the two can be calculated from the cosine similarity between A-DEP and B-DEP: depth_score = cosine(A-DEP, B-DEP).
It should be noted that, when the operations of comparing the extracted color face feature vector with the color face feature vectors in the pre-constructed sample library to calculate the color similarity score color_score, and comparing the extracted depth face feature vector with the depth face feature vectors in the pre-constructed sample library to calculate the depth similarity score depth_score, are performed, the order of the operations is not distinguished.
The extracted color face feature vector may be compared with the color face feature vectors in the pre-constructed sample library to calculate the color similarity score color_score first, and the extracted depth face feature vector may then be compared with the depth face feature vectors in the pre-constructed sample library to calculate the depth similarity score depth_score.
Alternatively, the extracted depth face feature vector may be compared with the depth face feature vectors in the pre-constructed sample library to calculate the depth similarity score depth_score first, and the extracted color face feature vector may then be compared with the color face feature vectors in the pre-constructed sample library to calculate the color similarity score color_score.
Step S503: based on a weighted average method, multiplying the color similarity score color_score and the depth similarity score depth_score by their respective weight coefficients and taking the sum as the final decision score final_score.
In step S503, the final decision score final_score is calculated using formula (1): final_score = w1*color_score + w2*depth_score.
In the process of specifically implementing step S503, based on the weighted average method, the color similarity score color_score and the depth similarity score depth_score are respectively multiplied by their corresponding weight coefficients and then added, and the resulting value is taken as the final decision score final_score.
Based on the examples in step S501 and step S502: if color_score = cosine(A-RGB, B-RGB) = 0.96, depth_score = cosine(A-DEP, B-DEP) = 0.956, w2 = 0.5 and w1 = 1 - w2 = 0.5, then final_score = w1*color_score + w2*depth_score = 0.5×0.96 + 0.5×0.956 = 0.958.
Another example: if color_score = cosine(A-RGB, B-RGB) = 0.96, depth_score = cosine(A-DEP, B-DEP) = 0.93, w2 = 0.4 and w1 = 1 - w2 = 0.6, then final_score = w1*color_score + w2*depth_score = 0.6×0.96 + 0.4×0.93 = 0.948.
For another example: if color_score = cosine(A-RGB, B-RGB) = 0.96, depth_score = cosine(A-DEP, B-DEP) = 0.83, w2 = 0.1 and w1 = 1 - w2 = 0.9, then final_score = w1*color_score + w2*depth_score = 0.9×0.96 + 0.1×0.83 = 0.947.
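The weighted-average decision fusion and the threshold check of steps S106 to S108 can be reproduced in a few lines; the threshold value used here is illustrative only, since the disclosure does not fix one:

    def final_decision(color_score, depth_score, w2=0.5, threshold=0.8):
        """Fuse the two similarity scores with weights w1 + w2 = 1 and compare against a
        threshold; returns (final_score, passed)."""
        w1 = 1.0 - w2
        final_score = w1 * color_score + w2 * depth_score
        return final_score, final_score > threshold

    # Reproducing the first worked example above:
    # final_decision(0.96, 0.956, w2=0.5)  ->  (0.958, True) with the illustrative threshold 0.8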
From the above example, the values of w1 and w2 will vary according to the level of imaging quality of the depth camera device, and if the level of imaging quality of the depth camera device is high, the face recognition accuracy obtained by the individual depth face data is high, and can approach the level of color face recognition accuracy; as the imaging quality level of the depth camera device decreases, the weight w2 must decrease to achieve the maximum improvement in the accuracy of the final face recognition.
It should be noted that, instead of using the weighted average method to calculate the final decision score final_score, the calculation can also be implemented using a softmax function.
It should be noted that the method for calculating the final decision score final_score is not limited to the weighted average method disclosed above, and may be set by a skilled person based on technical requirements.
Based on the face recognition method provided by the embodiment of the invention, the final decision score final_score is obtained by performing similarity calculation on the extracted color face feature vector and the extracted depth face feature vector, which simplifies the calculation process, improves the calculation efficiency, and improves the accuracy of face recognition.
Corresponding to the face recognition method shown in the above embodiment of the present invention, the embodiment of the present invention further provides a face recognition apparatus, as shown in fig. 6, where the face recognition apparatus includes: an acquisition module 61, a processing module 62, a fusion module 63, an extraction module 64, a calculation module 65 and a determination module 66.
The acquiring module 61 is configured to acquire a pair of T frame image frames transmitted by the user terminal.
The pair of T-frame image frames comprises a T-frame color image frame and a T-frame depth image frame, and T is an integer larger than 1.
And the processing module 62 is configured to perform face detection by using a pair of T-frame image frames as input of a face detection model, and if a face is detected, obtain a T-frame face pair formed by T-frame color face data and T-frame depth face data.
And the fusion module 63 is configured to fuse the T-frame depth face data and n depth face data before the T frame to obtain fused T-frame depth face data.
Wherein the value of n is a positive integer less than T.
And the extraction module 64 is configured to extract color face feature vectors corresponding to the T-frame color face data and depth face feature vectors corresponding to the fused T-frame depth face data, respectively.
And the calculation module 65 is used for carrying out similarity calculation according to the color face feature vector and the depth face feature vector to obtain a final decision score final_score.
And the determining module 66 is used for determining that the T-frame face passes face recognition as the corresponding face if the final decision score final_score is greater than a threshold value.
Based on the face recognition device provided by the embodiment of the invention, a pair of T-frame image frames transmitted by a user terminal is acquired, wherein the pair of T-frame image frames comprises a T-frame color image frame and a T-frame depth image frame, and T is an integer greater than 1; the T-frame color image is taken as the input of a face detection model to carry out face detection, and if a face is detected, a T-frame face pair consisting of T-frame color face data and T-frame depth face data is acquired; the T-frame depth face data and the n depth face data before the T frame are fused to obtain fused T-frame depth face data, wherein n is a positive integer smaller than T; the color face feature vector corresponding to the T-frame color face data and the depth face feature vector corresponding to the fused T-frame depth face data are respectively extracted; similarity calculation is performed according to the color face feature vector and the depth face feature vector to obtain a final decision score final_score; and if the final decision score final_score is larger than the threshold value, it is determined that the T-frame face passes face recognition as the corresponding face. In this scheme, the T-frame depth face data and the n depth face data before the T frame are fused to obtain the fused T-frame depth face data, and whether the T-frame face passes face recognition as the corresponding face is determined according to the final decision score. Because the depth image is used for depth face data fusion and for obtaining the final decision score, and the depth image contains the shape information of the face, the integrity, precision and signal-to-noise ratio of the depth data are improved, the face recognition accuracy obtained from the depth image is improved, the gap to the face recognition accuracy of the color image is reduced, and the overall face recognition accuracy is further improved.
Referring to fig. 6, as shown in fig. 7, a schematic structural diagram of another face recognition apparatus provided in the embodiment of the present invention is shown, wherein the processing module 62 includes: an input unit 621, an acquisition unit 622, and a processing unit 623.
An input unit 621, configured to input a T-frame color image in a pair of T-frame image frames into a pre-constructed face detection model.
An obtaining unit 622, configured to obtain the face data meeting the requirements in the T-frame color image output by the face detection model.
The face data comprises coordinates of the upper left corner and the lower right corner of a rectangle at the position of the face, the face meeting the requirements refers to the face data of the face when one face exists, and the face data corresponding to the face with the largest area in a plurality of faces when a plurality of faces exist.
And the processing unit 623 is configured to cut a T frame color image frame and a T frame depth image frame in the pair of T frame image frames according to the coordinates, and obtain a T frame face pair formed by T frame color face data and T frame depth face data.
Based on the face recognition device provided by the embodiment of the invention, the T-frame depth face data and the n depth face data before the T frame are fused to obtain the fused T-frame depth face data, and whether the T-frame face passes face recognition as the corresponding face is determined according to the final decision score. The depth image is applied both to fuse the depth face data and to obtain the final decision score, and the depth image contains the shape information of the face, so the integrity, precision and signal-to-noise ratio of the depth data are improved, the face recognition accuracy of the depth image is further improved, the difference between it and the face recognition accuracy of the color image is reduced, and the face recognition accuracy is further improved.
With reference to fig. 6, as shown in fig. 8, a schematic structural diagram of another face recognition apparatus provided in the embodiment of the present invention is shown, wherein the fusion module 63 includes: a conversion unit 631, a first processing unit 632, a fusion unit 633, and a second processing unit 634.
And a conversion unit 631, configured to convert the T-frame depth face data and n depth face data before the T frame into corresponding T-frame three-dimensional point cloud data based on the camera parameters.
The first processing unit 632 is configured to transform, through an iterative closest point (ICP) registration algorithm, the n three-dimensional point cloud data before the T frame into the T-frame camera coordinate system in which the T-frame three-dimensional point cloud data is located, so as to obtain the transformed n three-dimensional point cloud data before the T frame.
The fusion unit 633 is configured to fuse the T-frame three-dimensional point cloud data and the transformed n three-dimensional point cloud data before the T frame into a truncated signed distance function (TSDF) model.
The second processing unit 634 is configured to project the TSDF model to a T-frame camera imaging plane based on the camera internal parameters, so as to obtain fused T-frame depth face data.
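The fusion pipeline above alternates between image space and 3-D space. The ICP registration and TSDF integration steps would typically be delegated to a 3-D library (for example Open3D's registration_icp and ScalableTSDFVolume), so the sketch below covers only the two pinhole-camera conversions that bookend them; fx, fy, cx, cy stand for the camera internal parameters referred to above, and the function names are illustrative assumptions, not part of the patent.

```python
import numpy as np

def depth_to_points(depth_m, fx, fy, cx, cy):
    """Back-project a depth face crop (in metres) to 3-D points in the camera frame."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                      # drop pixels with no depth

def points_to_depth(points, fx, fy, cx, cy, h, w):
    """Project fused 3-D points back onto the T-frame imaging plane as a depth map."""
    pts = points[points[:, 2] > 0]                 # only points in front of the camera
    u = np.round(pts[:, 0] * fx / pts[:, 2] + cx).astype(int)
    v = np.round(pts[:, 1] * fy / pts[:, 2] + cy).astype(int)
    keep = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = np.zeros((h, w), dtype=np.float32)
    # write farthest points first so that the nearest surface wins at each pixel
    order = np.argsort(-pts[keep, 2])
    depth[v[keep][order], u[keep][order]] = pts[keep, 2][order]
    return depth
```

In practice the fused surface would be extracted from the TSDF volume by the library itself; this sketch only illustrates the pinhole geometry that maps a depth crop to a point cloud and a fused point set back to a T-frame depth map.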
Based on the face recognition device provided by the embodiment of the invention, the T-frame depth face data and the n depth face data before the T frame are fused to obtain fused T-frame depth face data, and whether the T-frame face passes face recognition for the corresponding face is determined according to the final decision score. Because the depth image is used both in the depth face data fusion and in obtaining the final decision score, and the depth image carries the shape information of the face, the completeness, precision and signal-to-noise ratio of the depth image are improved, the face recognition accuracy on the depth image is further improved, the gap between it and the face recognition accuracy on the color image is narrowed, and the overall face recognition accuracy is improved.
With reference to fig. 6, as shown in fig. 9, a schematic structural diagram of another face recognition apparatus provided in the embodiment of the present invention is shown, wherein the extracting module 64 includes: a first extraction unit 641 and a second extraction unit 642.
The first extraction unit 641 is configured to input the T-frame color face data into the color face recognition model and extract the color face feature vector corresponding to the T-frame color face data.
The second extraction unit 642 is configured to input the fused T-frame depth face data into the depth face recognition model and extract the depth face feature vector corresponding to the fused T-frame depth face data.
The color face recognition model and the depth face recognition model are each obtained by training a deep convolutional neural network model.
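A hedged sketch of the two-branch feature extraction, assuming PyTorch; color_net and depth_net stand for the already-trained color and depth recognition networks, and the 512-dimensional, L2-normalised embedding is a common choice assumed here, not a requirement of the scheme. Resizing the crops to whatever input size the networks expect is omitted.

```python
import torch
import torch.nn.functional as F

def extract_features(color_face, depth_face, color_net, depth_net):
    """Run each face crop through its own recognition network and return unit-length feature vectors."""
    color_net.eval()
    depth_net.eval()
    with torch.no_grad():
        # color crop: H x W x 3 uint8 -> 1 x 3 x H x W float in [0, 1]
        c = torch.from_numpy(color_face).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        # fused depth crop: H x W float (metres) -> 1 x 1 x H x W
        d = torch.from_numpy(depth_face).float().unsqueeze(0).unsqueeze(0)
        color_feat = F.normalize(color_net(c), dim=1).squeeze(0)   # e.g. 512-D
        depth_feat = F.normalize(depth_net(d), dim=1).squeeze(0)   # e.g. 512-D
    return color_feat, depth_feat
```

L2 normalisation makes the later cosine-similarity comparison against the sample library a simple dot product.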
Based on the face recognition device provided by the embodiment of the invention, the T-frame depth face data and the n depth face data before the T frame are fused to obtain fused T-frame depth face data, and whether the T-frame face passes face recognition for the corresponding face is determined according to the final decision score. Because the depth image is used both in the depth face data fusion and in obtaining the final decision score, and the depth image carries the shape information of the face, the completeness, precision and signal-to-noise ratio of the depth image are improved, the face recognition accuracy on the depth image is further improved, the gap between it and the face recognition accuracy on the color image is narrowed, and the overall face recognition accuracy is improved.
Optionally, the calculation module 65 shown in fig. 6 includes a first calculation unit, a second calculation unit, and a third calculation unit.
The first calculation unit is configured to compare the extracted color face feature vector with the color face feature vectors in a pre-constructed sample library and calculate a color similarity score color_score.
The second calculation unit is configured to compare the extracted depth face feature vector with the depth face feature vectors in the pre-constructed sample library and calculate a depth similarity score depth_score.
The third calculation unit is configured to obtain, based on a weighted average method, the sum of the color similarity score color_score and the depth similarity score depth_score after each is multiplied by its corresponding weight coefficient, and take the sum as the final decision score final_score.
Here final_score = w1 × color_score + w2 × depth_score, where w1 is the weight coefficient of the color similarity score color_score, w2 is the weight coefficient of the depth similarity score depth_score, and w1 + w2 = 1.
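A minimal sketch of this decision rule, assuming the pre-constructed sample library stores one enrolled color feature vector and one enrolled depth feature vector per identity; the example weights w1 = 0.6, w2 = 0.4 and the threshold 0.8 are illustrative placeholders, not values given in this document.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def decide(color_feat, depth_feat, sample, w1=0.6, w2=0.4, threshold=0.8):
    """Weighted fusion of the two similarity scores; w1 + w2 must equal 1."""
    color_score = cosine_similarity(color_feat, sample["color"])
    depth_score = cosine_similarity(depth_feat, sample["depth"])
    final_score = w1 * color_score + w2 * depth_score
    return final_score, final_score > threshold   # True means the T-frame face passes recognition
```

Here `sample` stands for one entry of the sample library holding the enrolled "color" and "depth" feature vectors for the face being compared against.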
Based on the face recognition device provided by the embodiment of the invention, the T-frame depth face data and the n depth face data before the T frame are fused to obtain fused T-frame depth face data, and whether the T-frame face passes face recognition for the corresponding face is determined according to the final decision score. Because the depth image is used both in the depth face data fusion and in obtaining the final decision score, and the depth image carries the shape information of the face, the completeness, precision and signal-to-noise ratio of the depth image are improved, the face recognition accuracy on the depth image is further improved, the gap between it and the face recognition accuracy on the color image is narrowed, and the overall face recognition accuracy is improved.
Optionally, the first calculation unit is specifically configured to:
calculate the cosine similarity or Euclidean distance between the extracted color face feature vector and the color face feature vectors in the sample library, and obtain the color similarity score color_score according to the cosine similarity or the Euclidean distance.
Based on the face recognition device provided by the embodiment of the invention, the T-frame depth face data and the n depth face data before the T frame are fused to obtain fused T-frame depth face data, and whether the T-frame face passes face recognition for the corresponding face is determined according to the final decision score. Because the depth image is used both in the depth face data fusion and in obtaining the final decision score, and the depth image carries the shape information of the face, the completeness, precision and signal-to-noise ratio of the depth image are improved, the face recognition accuracy on the depth image is further improved, the gap between it and the face recognition accuracy on the color image is narrowed, and the overall face recognition accuracy is improved.
Optionally, the second calculation unit is specifically configured to:
calculate the cosine similarity or Euclidean distance between the extracted depth face feature vector and the depth face feature vectors in the sample library, and obtain the depth similarity score depth_score according to the cosine similarity or the Euclidean distance.
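When Euclidean distance is used instead of cosine similarity, the distance shrinks as faces become more alike, so it has to be mapped to a score that grows with similarity before it can be fused with the color score. The 1 / (1 + d) mapping below is one common convention and is an assumption, not something prescribed by this document.

```python
import numpy as np

def euclidean_similarity(a, b):
    """Map Euclidean distance to a similarity score in (0, 1]; identical vectors score 1."""
    d = np.linalg.norm(np.asarray(a) - np.asarray(b))
    return 1.0 / (1.0 + d)
```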
Based on the face recognition device provided by the embodiment of the invention, the T-frame depth face data and the n depth face data before the T frame are fused to obtain fused T-frame depth face data, and whether the T-frame face passes face recognition for the corresponding face is determined according to the final decision score. Because the depth image is used both in the depth face data fusion and in obtaining the final decision score, and the depth image carries the shape information of the face, the completeness, precision and signal-to-noise ratio of the depth image are improved, the face recognition accuracy on the depth image is further improved, the gap between it and the face recognition accuracy on the color image is narrowed, and the overall face recognition accuracy is improved.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described system and system embodiments are only illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A face recognition method, comprising:
acquiring a pair of T-frame image frames transmitted by a user terminal, wherein the pair of T-frame image frames comprises a T-frame color image frame and a T-frame depth image frame, and the value of T is an integer greater than 1;
taking the T frame color image as the input of a face detection model to carry out face detection;
if the face is detected, acquiring a T-frame face pair consisting of T-frame color face data and T-frame depth face data;
fusing the T frame depth face data and n depth face data before the T frame to obtain fused T frame depth face data, wherein the value of n is a positive integer smaller than T;
respectively extracting color face feature vectors corresponding to the T-frame color face data and depth face feature vectors corresponding to the fused T-frame depth face data;
performing similarity calculation according to the color face feature vector and the depth face feature vector to obtain a final decision score final _ score;
and if the final decision score final_score is larger than a threshold value, determining that the T-frame face passes face recognition for the corresponding face.
2. The method according to claim 1, wherein the performing face detection by using the T-frame color image as an input of a face detection model comprises:
inputting the T frame color images in the pair of T frame image frames into a pre-constructed face detection model;
acquiring face data meeting the requirements in the T-frame color image output by the face detection model, wherein the face data comprises the coordinates of the upper-left and lower-right corners of the rectangle in which the face is located, and the face data meeting the requirements refers to the face data of that face when a single face is present, and to the face data of the face with the largest area when a plurality of faces are present;
correspondingly, if a face is detected, acquiring a T-frame face pair composed of T-frame color face data and T-frame depth face data, including:
and cutting the T frame color image frame and the T frame depth image frame in the pair of T frame image frames according to the coordinates to obtain a T frame face pair consisting of T frame color face data and T frame depth face data.
3. The method according to claim 1, wherein the fusing the T-frame depth face data with n depth face data before the T-frame to obtain fused T-frame depth face data comprises:
converting the T-frame depth face data and n depth face data before the T frame into corresponding T-frame three-dimensional point cloud data based on camera parameters;
transforming, through an iterative closest point (ICP) registration algorithm, the n three-dimensional point cloud data before the T frame into the T-frame camera coordinate system in which the T-frame three-dimensional point cloud data is located, to obtain the transformed n three-dimensional point cloud data before the T frame;
fusing the T-frame three-dimensional point cloud data and the transformed n three-dimensional point cloud data before the T frame into a truncated signed distance function (TSDF) model;
and based on camera internal parameters, projecting the TSDF model to a T-frame camera imaging plane to obtain fused T-frame depth face data.
4. The method according to claim 1, wherein the extracting color face feature vectors corresponding to the T-frame color face data and the depth face feature vectors corresponding to the fused T-frame depth face data respectively comprises:
inputting the T-frame color face data into a color face recognition model, and extracting color face feature vectors corresponding to the T-frame color face data;
inputting the fused T-frame depth face data into a depth face recognition model, and extracting a depth face feature vector corresponding to the fused T-frame depth face data;
the color face recognition model and the depth face recognition model are each obtained by training a deep convolutional neural network model.
5. The method according to claim 1, wherein the performing similarity calculation according to the color face feature vector and the depth face feature vector to obtain a final decision score final _ score comprises:
comparing the extracted color face feature vector with the color face feature vectors in a pre-constructed sample library, and calculating a color similarity score color_score;
comparing the extracted depth face feature vector with the depth face feature vectors in the pre-constructed sample library, and calculating a depth similarity score depth_score;
based on a weighted average method, obtaining the sum of the color similarity score color_score and the depth similarity score depth_score after each is multiplied by its corresponding weight coefficient, and taking the sum as the final decision score final_score;
wherein final_score = w1 × color_score + w2 × depth_score, w1 is the weight coefficient of the color similarity score color_score, w2 is the weight coefficient of the depth similarity score depth_score, and w1 + w2 = 1.
6. The method of claim 5, wherein said calculating a color similarity score color score comprises:
calculating the cosine similarity or Euclidean distance between the extracted color face feature vector and the color face feature vectors in the sample library, and obtaining the color similarity score color_score according to the cosine similarity or the Euclidean distance;
accordingly, calculating the depth similarity score depth _ score includes:
and calculating the cosine similarity or Euclidean distance between the extracted depth face feature vector and the depth face feature vectors in the sample library, and obtaining the depth similarity score depth_score according to the cosine similarity or the Euclidean distance.
7. An apparatus for face recognition, the apparatus comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a pair of T-frame image frames transmitted by a user terminal, the pair of T-frame image frames comprises a T-frame color image frame and a T-frame depth image frame, and the value of T is an integer greater than 1;
the processing module is used for performing face detection by taking the pair of T-frame image frames as the input of a face detection model, and acquiring a T-frame face pair consisting of T-frame color face data and T-frame depth face data if a face is detected;
the fusion module is used for fusing the T frame depth face data and n depth face data before the T frame to obtain fused T frame depth face data, wherein the value of n is a positive integer smaller than T;
the extraction module is used for respectively extracting color face feature vectors corresponding to the T-frame color face data and depth face feature vectors corresponding to the fused T-frame depth face data;
the calculation module is used for carrying out similarity calculation according to the color face feature vector and the depth face feature vector to obtain a final decision score final _ score;
and the determining module is used for determining that the T-frame face passes face recognition for the corresponding face if the final decision score final_score is larger than a threshold value.
8. The apparatus of claim 7, wherein the processing module comprises:
the input unit is used for inputting the T frame color images in the pair of T frame image frames into a pre-constructed human face detection model;
the acquisition unit is used for acquiring face data meeting the requirements in the T-frame color image output by the face detection model, wherein the face data comprises the coordinates of the upper-left and lower-right corners of the rectangle in which the face is located, and the face data meeting the requirements refers to the face data of that face when a single face is present, and to the face data of the face with the largest area when a plurality of faces are present;
and the processing unit is used for cutting the T frame color image frame and the T frame depth image frame in the pair of T frame image frames according to the coordinates and acquiring a T frame face pair consisting of T frame color face data and T frame depth face data.
9. The apparatus of claim 7, wherein the fusion module comprises:
the conversion unit is used for converting the T-frame depth face data and n depth face data before the T frame into corresponding T-frame three-dimensional point cloud data based on camera internal parameters;
the first processing unit is used for transforming, through an iterative closest point (ICP) registration algorithm, the n three-dimensional point cloud data before the T frame into the T-frame camera coordinate system in which the T-frame three-dimensional point cloud data is located, to obtain the transformed n three-dimensional point cloud data before the T frame;
the fusion unit is used for fusing the T-frame three-dimensional point cloud data and the transformed n three-dimensional point cloud data before the T frame into a truncated signed distance function (TSDF) model;
and the second processing unit is used for projecting the TSDF model to a T-frame camera imaging plane based on camera internal parameters to obtain fused T-frame depth face data.
10. The apparatus of claim 7, wherein the extraction module comprises:
the first extraction unit is used for inputting the T-frame color face data into a color face recognition model and extracting color face feature vectors corresponding to the T-frame color face data;
the second extraction unit is used for inputting the fused T-frame depth face data into a depth face recognition model and extracting a depth face feature vector corresponding to the fused T-frame depth face data;
the color face recognition model and the depth face recognition model are each obtained by training a deep convolutional neural network model.
CN202110443405.2A 2021-04-23 2021-04-23 Face recognition method and device Pending CN113139465A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110443405.2A CN113139465A (en) 2021-04-23 2021-04-23 Face recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110443405.2A CN113139465A (en) 2021-04-23 2021-04-23 Face recognition method and device

Publications (1)

Publication Number Publication Date
CN113139465A true CN113139465A (en) 2021-07-20

Family

ID=76812259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110443405.2A Pending CN113139465A (en) 2021-04-23 2021-04-23 Face recognition method and device

Country Status (1)

Country Link
CN (1) CN113139465A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564652A (en) * 2018-03-12 2018-09-21 中国科学院自动化研究所 Efficiently utilize the high-precision three-dimensional method for reconstructing of memory and system and equipment
CN109558814A (en) * 2018-11-14 2019-04-02 常州大学 A kind of three-dimensional correction and weighting similarity measurement study without constraint face verification method
CN110060336A (en) * 2019-04-24 2019-07-26 北京华捷艾米科技有限公司 Three-dimensional facial reconstruction method, device, medium and equipment
CN110458041A (en) * 2019-07-19 2019-11-15 国网安徽省电力有限公司建设分公司 A kind of face identification method and system based on RGB-D camera
CN111242097A (en) * 2020-02-27 2020-06-05 腾讯科技(深圳)有限公司 Face recognition method and device, computer readable medium and electronic equipment
CN111523398A (en) * 2020-03-30 2020-08-11 西安交通大学 Method and device for fusing 2D face detection and 3D face recognition
CN112232109A (en) * 2020-08-31 2021-01-15 深圳奥比中光科技有限公司 Living body face detection method and system
CN112434576A (en) * 2020-11-12 2021-03-02 合肥的卢深视科技有限公司 Face recognition method and system based on depth camera

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
查红彬 (Zha Hongbin): "视觉信息处理研究前沿" [Frontiers of Visual Information Processing Research], vol. 1, 31 December 2019, Shanghai Jiao Tong University Press (上海交通大学出版社), pages 305-306 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination