Disclosure of Invention
Accordingly, the present invention aims to provide a three-dimensional face recognition method based on deep learning, which can fully utilize depth information of a face to effectively improve reliability of a face recognition result.
In order to achieve the above purpose, the present invention provides a three-dimensional face recognition method based on deep learning, comprising:
Step one, constructing a three-dimensional face deep learning network, calculating a high-frequency-depth map of each training sample, and then inputting the two-dimensional color face image and the high-frequency-depth map of each training sample into the three-dimensional face deep learning network together for training;
Step two, capturing a two-dimensional color face image of the face to be detected with a color camera, inputting that image into the trained three-dimensional face deep learning network, and obtaining as output the three-dimensional point cloud coordinates of the face to be detected;
Step three, calculating a three-dimensional face feature vector from the two-dimensional color face image and the three-dimensional point cloud coordinates of the face to be detected, and comparing the three-dimensional face feature vector of the face to be detected with the three-dimensional face feature vectors of the registered faces in the registration library, thereby identifying the personnel information of the face to be detected,
wherein calculating the high-frequency-depth map of any training sample X further comprises:
Step 11, applying a Fourier transform to the two-dimensional color face image of the training sample X, thereby obtaining a spectrogram of the training sample X;
Step 12, extracting a plurality of key points from the two-dimensional color face image of the training sample X, then reading the spectrum value of each key point from the spectrogram of the training sample X, and calculating the cutoff value $D_0$ of the high-pass filter function, $D_0$ being the average of the spectrum values of all the key points;
Step 13, setting a high-pass filter function, and passing the spectrogram of the training sample X through the high-pass filter function to obtain a filtered high-frequency spectrogram, wherein the high-pass filter function is set as follows: $H(u, v) = \dfrac{1}{1 + \left[ D_0 / D(u, v) \right]^{2n}}$, where $D(u, v)$ is the spectrum value at coordinate $(u, v)$ on the spectrogram of the training sample X, $n$ is an order constant taking the value 2 or 4, and $H(u, v)$ is the spectrum value obtained after filtering;
Step 14, applying an inverse Fourier transform to the high-frequency spectrogram of the training sample X, thereby obtaining a high-frequency two-dimensional color face image;
Step 15, comparing the brightness value of each point in the high-frequency two-dimensional color face image with a threshold and replacing each brightness value higher than the threshold with a depth value, the image obtained after all points have been compared being the high-frequency-depth map of the training sample X.
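By way of non-limiting illustration, the cutoff value of Step 12 can be written out as a plain average. The notation is introduced here for convenience and is not taken from the source: K is the number of key points, (u_k, v_k) is the coordinate of the k-th key point, and S(u, v) is the spectrum value read from the spectrogram at (u, v).

```latex
D_0 = \frac{1}{K} \sum_{k=1}^{K} S(u_k, v_k)
```

For instance, three key points with illustrative spectrum values 30, 40 and 50 would give D_0 = 40.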
Compared with the prior art, the invention has the following beneficial effects: face depth information is added to the training process of the neural network, which compensates for the lack of depth information and thereby effectively improves the reliability of the face recognition result; only one camera is used and only a single image of the detected face needs to be acquired; and, by combining deep learning, machine learning and related techniques, the detection speed, the pass rate and the anti-spoofing rate for live faces are greatly improved.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
As shown in fig. 1, the three-dimensional face recognition method based on deep learning of the present invention includes:
Step one, constructing a three-dimensional face deep learning network, calculating a high-frequency-depth map of each training sample, and then inputting the two-dimensional color face image and the high-frequency-depth map of each training sample into the three-dimensional face deep learning network together for training;
Step two, capturing a two-dimensional color face image of the face to be detected with a color camera, inputting that image into the trained three-dimensional face deep learning network, and obtaining as output the three-dimensional point cloud coordinates of the face to be detected;
Step three, calculating a three-dimensional face feature vector from the two-dimensional color face image and the three-dimensional point cloud coordinates of the face to be detected, and comparing the three-dimensional face feature vector of the face to be detected with the three-dimensional face feature vectors of the registered faces in the registration library, thereby identifying the personnel information of the face to be detected.
When a face is registered, a 3D sensor can be used to accurately record the three-dimensional information of the face and generate a three-dimensional face feature vector, which is then stored in the registration library.
The three-dimensional face deep learning network in step one is constructed as follows:
The network is implemented on the TensorFlow framework and adopts an encoder-decoder structure. The encoder comprises 1 convolution layer and 7 residual layers, and converts an input 250×250×3 two-dimensional face image into a 16 x 1024 feature map. The decoder comprises 9 deconvolution layers and converts the feature map into a 250×250×1 three-dimensional depth image, which is then converted into a three-dimensional point cloud through a fixed conversion relation between the depth image and the point cloud. The kernel size of the convolution layers and deconvolution layers is 4, and the activation function is Softplus. Thus, when a 250×250×3 two-dimensional color face image is input, the output of the three-dimensional face deep learning network is 250×250 three-dimensional point cloud coordinates (62500 in total).
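The source states only that the depth image is converted into a point cloud through a fixed relation. As a non-limiting illustration, one common choice is pinhole back-projection, sketched below. The camera intrinsics `fx`, `fy`, `cx`, `cy` and the function name `depth_to_point_cloud` are hypothetical and do not come from the source.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project an H x W depth map into H*W (x, y, z) points.

    Assumption: a pinhole camera model with hypothetical intrinsics
    (fx, fy, cx, cy); the source does not specify the conversion.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):      # v: pixel row index
        for u, z in enumerate(row):      # u: pixel column index, z: depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# A flat 250 x 250 depth map yields one point per pixel, 62500 in total.
depth = [[1.0] * 250 for _ in range(250)]
cloud = depth_to_point_cloud(depth, fx=250.0, fy=250.0, cx=125.0, cy=125.0)
```

Because every pixel maps to exactly one point, the point count matches the 62500 coordinates mentioned above.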
It is worth noting that, when training the three-dimensional face deep learning network, the invention calculates the high-frequency-depth map of each training sample and inputs the two-dimensional color face image together with the high-frequency-depth map of each training sample into the network for training. As a result, in the model parameters obtained after training, the weight of general features (features that are common across faces and of little use for recognition) is reduced and the weight of individual features (features that are distinctive to a face and helpful for accurate recognition) is increased, which markedly improves the accuracy of face recognition. As shown in fig. 2, calculating the high-frequency-depth map of any training sample X may further include:
Step 11, applying a Fourier transform to the two-dimensional color face image of the training sample X, thereby obtaining a spectrogram of the training sample X;
Step 12, extracting a plurality of key points from the two-dimensional color face image of the training sample X, wherein the key points can be the eyebrows, eyes, nose, mouth, facial contour and the like, then reading the spectrum value of each key point from the spectrogram of the training sample X, and calculating the cutoff value $D_0$ of the high-pass filter function, $D_0$ being the average of the spectrum values of all the key points;
Step 13, setting a high-pass filter function, and passing the spectrogram of the training sample X through the high-pass filter function to obtain a filtered high-frequency spectrogram, wherein the high-pass filter function is set as follows: $H(u, v) = \dfrac{1}{1 + \left[ D_0 / D(u, v) \right]^{2n}}$, where $D(u, v)$ is the spectrum value at coordinate $(u, v)$ on the spectrogram of the training sample X, $n$ is an order constant taking the value 2 or 4, and $H(u, v)$ is the spectrum value obtained after filtering, so that the low-frequency information can be effectively removed and the required high-frequency information obtained;
Step 14, applying an inverse Fourier transform to the high-frequency spectrogram of the training sample X, thereby obtaining a high-frequency two-dimensional color face image;
Step 15, comparing the brightness value of each point in the high-frequency two-dimensional color face image with a threshold and replacing each brightness value higher than the threshold with a depth value. That is, for each point it is judged whether its brightness value is higher than the threshold; if so, the three-dimensional coordinates of the point are read from the three-dimensional point cloud of the training sample X and converted into a depth value, which replaces the brightness value of the point; if not, the next point is judged. Once all points have been compared, the resulting image is the high-frequency-depth map of the training sample X. The three-dimensional point cloud of the training sample X may be obtained by capturing the training sample with a depth camera, and the threshold may be set according to actual service needs, for example 20.
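As a non-limiting illustration, steps 11 and 13 to 15 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: it operates on a single luminance channel rather than a color image; it assumes the standard Butterworth high-pass form with D(u, v) measured as the distance from the centre of the shifted spectrum; it takes the cutoff `d0` as an argument instead of deriving it from key points as in step 12; and it uses a dense `depth` array in place of depth values converted from the point cloud. The function names are hypothetical.

```python
import numpy as np


def butterworth_highpass(shape, d0, n=2):
    """Butterworth high-pass gain on a centred spectrum (assumed form).

    Assumes d0 > 0. D(u, v) is taken as distance from the spectrum centre.
    """
    rows, cols = shape
    r = np.arange(rows) - rows // 2
    c = np.arange(cols) - cols // 2
    dist = np.hypot(r[:, None], c[None, :])
    # Algebraically equal to 1 / (1 + (d0 / dist)**(2n)), but well defined
    # at dist == 0, where the gain is 0.
    return dist ** (2 * n) / (dist ** (2 * n) + d0 ** (2 * n))


def high_frequency_depth_map(gray, depth, d0, n=2, threshold=20.0):
    """Sketch of steps 11 and 13-15 for one single-channel sample."""
    gray = np.asarray(gray, dtype=float)
    depth = np.asarray(depth, dtype=float)
    # Step 11: Fourier transform with the zero frequency moved to the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    # Step 13: attenuate low frequencies.
    filtered = spectrum * butterworth_highpass(gray.shape, d0, n)
    # Step 14: inverse transform back to the image domain.
    high_freq = np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))
    # Step 15: where brightness exceeds the threshold, substitute the depth.
    return np.where(high_freq > threshold, depth, high_freq)


# Toy input: an 8 x 8 checkerboard (pure high frequency) and constant depth.
i, j = np.indices((8, 8))
gray = 100.0 * ((i + j) % 2 == 0)
depth = np.full((8, 8), 7.0)
hf_depth = high_frequency_depth_map(gray, depth, d0=2.0, n=2, threshold=20.0)
```

In this toy case the filter removes the constant background, so bright squares exceed the threshold and take the depth value while dark squares keep their (negative) high-frequency brightness.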
As shown in fig. 3, in step three, calculating the three-dimensional face feature vector from the two-dimensional color face image and the three-dimensional point cloud coordinates of the face to be detected may further include:
Step 31, performing face detection, cropping and alignment on the two-dimensional image of the face to be detected, converting the image into a floating-point matrix, and then calculating the corresponding two-dimensional image feature vector, wherein the two-dimensional image feature vector is a one-dimensional vector of size 512;
Step 32, according to the three-dimensional point cloud coordinates of the face to be detected, establishing a Cartesian xyz rectangular coordinate system with the nose tip as the coordinate origin, the line connecting the two eyes as the horizontal axis direction x, and the line connecting the nose tip and the lips as the vertical axis direction y, thereby obtaining a three-dimensional point cloud matrix of the face, and then calculating the corresponding three-dimensional point cloud feature vector, wherein the three-dimensional point cloud feature vector is a three-dimensional vector of size 3 x 512;
Step 33, setting weight factors and fusing the two-dimensional image feature vector with the three-dimensional point cloud feature vector, the feature vector obtained after fusion being the three-dimensional face feature vector, wherein the weight factors of the two-dimensional face feature, the x-direction feature, the y-direction feature and the z-direction feature can be respectively set to 0.5, 0.17 and 0.16.
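As a non-limiting illustration, the fusion of step 33 can be sketched as below. Two points are assumptions made here and are not stated in the source. First, fusion is taken to mean weighted concatenation, that is, each block is scaled by its weight factor and the blocks are joined into one vector. Second, because the source lists three values for four weight factors, the sketch uses 0.17 for both the x and y directions so that the weights sum to 1. The function name `fuse_features` is hypothetical.

```python
from typing import List, Sequence


def fuse_features(
    feat_2d: Sequence[float],
    feat_3d: Sequence[Sequence[float]],
    weights: Sequence[float] = (0.5, 0.17, 0.17, 0.16),
) -> List[float]:
    """Fuse a 512-vector and a 3 x 512 array into one 2048-vector.

    Assumption: weighted concatenation; the source does not state the
    fusion rule. weights = (2D, x, y, z); the repeated 0.17 is assumed.
    """
    w_2d, w_x, w_y, w_z = weights
    blocks = [(w_2d, feat_2d)] + list(zip((w_x, w_y, w_z), feat_3d))
    fused: List[float] = []
    for weight, vector in blocks:
        # Scale every component of the block by its weight factor.
        fused.extend(weight * value for value in vector)
    return fused


# Toy feature vectors with constant entries, for illustration only.
feat_2d = [1.0] * 512
feat_3d = [[1.0] * 512, [2.0] * 512, [3.0] * 512]
fused = fuse_features(feat_2d, feat_3d)
```

Scaling before concatenation means each block contributes to a later dot product in proportion to the square of its weight, which is one way for the two-dimensional feature to dominate the comparison.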
In step three, comparing the three-dimensional face feature vector of the face to be detected with the three-dimensional face feature vector of any registered face in the registration library may further include:
Step A1, calculating the cosine similarity between the three-dimensional face feature vector A of the face to be detected and the three-dimensional face feature vector B of the registered face, wherein the calculation formula can be as follows: $\cos(A, B) = \dfrac{A \cdot B}{\|A\|_2 \, \|B\|_2}$, where $A \cdot B$ denotes the dot product of vectors A and B, $\|A\|_2$ denotes the L2 norm of vector A, and $\|B\|_2$ denotes the L2 norm of vector B;
Step A2, calculating the cosine distance between the face to be detected and the registered face: $\mathrm{dist}(A, B) = 1 - \cos(A, B)$, wherein the larger the cosine distance, the lower the similarity between the two faces, and judging whether the face to be detected is the registered face according to the cosine distance.
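As a non-limiting illustration, steps A1 and A2 can be sketched in plain Python as follows. The decision threshold `max_distance`, the `identify` helper, and the registry contents are hypothetical; the source does not specify a threshold value or a data structure for the registration library.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Step A1: dot product divided by the product of the L2 norms.

    Assumes both vectors are non-zero and of equal length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Step A2: dist(A, B) = 1 - cos(A, B)."""
    return 1.0 - cosine_similarity(a, b)


def identify(
    probe: Sequence[float],
    registry: Dict[str, Sequence[float]],
    max_distance: float = 0.4,  # hypothetical threshold, not from the source
) -> Optional[str]:
    """Return the closest registered identity, or None if none is close."""
    best_name, best_dist = None, float("inf")
    for name, registered in registry.items():
        dist = cosine_distance(probe, registered)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


# Toy three-component vectors stand in for real face feature vectors.
registry = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
match = identify([0.9, 0.1, 0.0], registry)
```

A probe that is far from every registered vector returns `None`, which corresponds to judging that the face to be detected is not a registered face.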
The foregoing is merely a description of preferred embodiments of the invention and is not intended to limit the invention; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the invention shall be included within the scope of protection of the invention.