CN111881841A - Face detection and recognition method based on binocular vision - Google Patents
- Publication number
- CN111881841A (application number CN202010748989.XA)
- Authority
- CN
- China
- Prior art keywords
- face
- points
- binocular vision
- detection
- dimensional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
- G06V20/653—Three-dimensional objects by matching three-dimensional models, e.g. conformal mapping of Riemann surfaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/166—Detection; Localisation; Normalisation using acquisition arrangements
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a face detection and recognition method based on binocular vision, which comprises the following steps: (1) acquiring left and right face pictures through a binocular camera; (2) detecting faces in the two pictures via HOG features to find the two corresponding face images; (3) extracting face feature points from the two acquired face images; (4) obtaining the depth information of the face feature points by a binocular vision ranging method, thereby solving a three-dimensional face model; (5) analyzing the solved result with a statistical method to realize identification. Specifically, the three-dimensional face model is classified through a support vector machine, so that face recognition is realized. Face detection, three-dimensional reconstruction and identification are achieved through the shooting, detection, modeling and classification techniques; the method is efficient and fast in processing, safe and reliable, and provides more complete and richer detection information.
Description
Technical Field
The invention belongs to the technical field of digital image processing, and particularly relates to a face detection and identification method based on binocular vision.
Background
In 2001, the public security departments of China began using face detection technology to combat serious criminal offences, with national support. The technology was applied at the 2008 Beijing Olympic Games, marking the entry of Chinese face detection into a practical stage. It was then applied even more widely at the Shanghai World Expo, and as more companies entered the field in succession, the large-scale application of face detection technology in China accelerated. With continuing technical progress in this field in China, the "three-izations and two fusions" will be the inevitable trend of face detection development, where the "three-izations" refer to mainstream adoption, chip integration and standardization, and the "two fusions" refer to the fusion of multiple biometrics and the fusion of re-identification (ReID) with other biometrics.
At present, research on face detection at home and abroad is mainly based on two-dimensional images. Although the detection methods used in the literature vary, most focus on face detection in a single two-dimensional image. The mainstream face recognition approaches currently include:
face recognition based on geometric features;
face recognition based on the characteristic face;
face recognition based on template matching;
face recognition based on a neural network;
Hidden Markov Model (HMM) based face recognition;
face recognition based on an elastic matching method;
face recognition based on Bayesian decision;
face recognition based on a support vector machine;
Three-dimensional face detection systems based on binocular vision, by contrast, remain rare.
Current face recognition research mainly targets two-dimensional images or two-dimensional dynamic video sequences. Two-dimensional image recognition technology has many applications in other fields, but because the human face is a deformable (plastic) body, it poses particular difficulty for two-dimensional techniques. In addition, face recognition based on two-dimensional images is inevitably affected by ambient light, background and viewing angle, as well as by the pose, expression and occlusion of the face, so recognition accuracy is difficult to improve further.
A two-dimensional face image is only the planar projection of the three-dimensional face, so part of the information is necessarily lost in the projection. Combined with the influence of factors such as illumination conditions, background, pose and expression, these problems are difficult to solve with face detection methods based on a monocular camera.
Disclosure of Invention
In order to overcome the defects of face recognition technology based on two-dimensional images, the invention provides a face detection and recognition method based on binocular vision, which collects face features for three-dimensional detection and can greatly improve detection efficiency and accuracy. Compared with the two-dimensional face image obtained by a common monocular camera, the three-dimensional face image formed by a binocular camera carries more information, in particular the depth information of the face. The method is therefore suited to special scenes in which a traditional monocular camera has difficulty detecting faces.
In order to solve the above technical problem, the invention adopts the following technical scheme:
a face detection and identification method based on binocular vision comprises the following steps:
step (1) acquiring a left picture and a right picture containing human faces;
step (2) performing HOG feature detection on the two pictures containing human faces respectively, to obtain the face images corresponding to the two pictures;
step (3) performing feature extraction on the two obtained face images to obtain face feature points;
step (4) calculating four-dimensional coordinate point information of each face feature point through a binocular ranging algorithm based on the extracted face feature points, and fitting a three-dimensional face model based on the four-dimensional coordinate point information of the face feature points;
and (5) identifying the three-dimensional model of the face through a support vector machine algorithm to obtain an identification result.
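The HOG detection in step (2) can be illustrated with a minimal sketch. The patent does not specify an implementation, so the cell size, bin count and absence of block normalization below are assumptions; the sketch only shows how histogram-of-oriented-gradients features are computed from an image:

```python
import numpy as np

def hog_descriptor(image, cell=8, bins=9):
    """Minimal HOG sketch: per-cell histograms of gradient orientations.
    Illustrative only -- a real detector adds block normalization and a
    sliding-window classifier on top of these features."""
    gy, gx = np.gradient(image.astype(float))          # image gradients
    mag = np.hypot(gx, gy)                             # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0       # unsigned orientation
    h, w = image.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            idx = (a / (180.0 / bins)).astype(int) % bins
            for b in range(bins):
                hist[i, j, b] = m[idx == b].sum()      # magnitude-weighted bins
    return hist.ravel()

# Synthetic 32x32 vertical ramp: all gradient energy falls in one bin.
desc = hog_descriptor(np.outer(np.arange(32), np.ones(32)))
```

A real detector would slide a window over each picture and feed such descriptors to a face / non-face classifier.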
In some embodiments, in step (1), two pictures containing human faces are obtained through a binocular camera.
In some embodiments, in step (3), feature extraction is performed on the two obtained face images by using a convolutional neural network to obtain face feature points.
In some embodiments, in step (3), 9 feature points are selected as the face feature points, namely 2 eyeball center points, 4 eye corner points, the nostril center point and 2 mouth corner points.
In some embodiments, the convolutional neural network is an improved P_Net network: the input layer size is set to 1202 × 1202 × 3, the second layer network is 600 × 600 × 10 with a 2 × 2 pooling layer, the third layer network is 300 × 300 × 16 with a 2 × 2 pooling layer, the fourth layer network is 1 × 1 × 1000, and the last output layer size is 2 × 9.
In some embodiments, in step (4), four-dimensional coordinate point information of each feature point is calculated by a binocular ranging algorithm, and the calculation formula is as follows:
P is the target point whose depth is to be calculated; O_l and O_r are the corresponding points of the left and right images respectively; f_l and f_r are the distances from the corresponding image points to the lenses; (X_l, Y_l) is the position of P in the left photograph and (X_r, Y_r) its position in the right photograph; r_1 to r_9 are the coordinate positions of the 9 feature points in the two photographs. The relationship between the camera coordinate system and the world coordinate system can be represented by a rotation matrix R together with a translation matrix T, where Z is the depth of the target relative to the world coordinate system, and R converts (x_l, y_l, z_l) and (x_r, y_r, z_r) into positions relative to the world coordinate system.
In some embodiments, in step (4), the three-dimensional model of the human face is fitted by cubic spline interpolation based on the four-dimensional coordinate point information of the characteristic points of the human face.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a binocular vision-based face detection and recognition method, which mainly aims at the problem that the face recognition of a monocular camera cannot be well performed due to the influence of factors such as illumination conditions, backgrounds, postures, expressions and the like. Aiming at the defect, a binocular camera is used for acquiring images of the human face to establish a three-dimensional model of the human face, and finally the recognition of the human face is realized through a support vector machine algorithm, so that the problem that the monocular camera cannot work well in the face recognition under the condition of large external interference is effectively solved.
Drawings
FIG. 1 is a general flow diagram of a method involved in an embodiment of the invention;
FIG. 2 is a structure diagram of the improved P_Net network;
fig. 3 is a schematic view of binocular vision computed depth.
Detailed Description
The invention is described in further detail below with reference to the following figures and detailed description:
as shown in fig. 1, a face detection and recognition method of the present invention includes the following steps:
(1) Two face images are obtained through the binocular camera. During acquisition the face should directly face the binocular camera as far as possible, so that a more reliable face image can be obtained. The resolution of each camera should be no less than 5 million pixels, to facilitate obtaining more accurate face feature point positions. The distance between the face and the camera should ensure that the face occupies as much of the picture area as possible.
(2) After the left and right images are obtained, they are processed with the improved P_Net network shown in fig. 2: the input size is set to 1202 × 1202 × 3, the second layer network to 600 × 600 × 10 with a 2 × 2 pooling layer, the third layer network to 300 × 300 × 16 with a 2 × 2 pooling layer, the fourth layer network to 1 × 1 × 1000, and the last layer output to 2 × 9. The output is 2 × 9 because the network is designed to find the positions of 9 corner points in the face (2 eyeball center points, 4 eye corner points, the nostril center point and 2 mouth corner points), and each point has coordinates in two directions in a picture, so the output is 2 × 9.
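The feature-map sizes quoted above (1202 → 600 → 300) are consistent with stride-2 operations under the standard convolution output-size formula. The kernel sizes and strides below are assumptions for illustration, since the patent states only the resulting map sizes:

```python
def conv_out(n: int, k: int, s: int, p: int = 0) -> int:
    """Spatial size after a convolution or pooling: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# Assumed 3x3 stride-2 convolution and 2x2 stride-2 pooling; the patent
# gives only the resulting sizes (1202 -> 600 -> 300).
second = conv_out(1202, k=3, s=2)    # second-layer map size
third = conv_out(second, k=2, s=2)   # size after 2x2 pooling
```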
(3) After the coordinate positions of the 9 points in the two photographs are obtained, the depth coordinate information of each corner point is obtained through the binocular ranging algorithm. The ranging algorithm is shown in fig. 3: P is the target point whose depth is to be calculated; Z_l and Z_r are the depths of the target in the left and right images respectively; O_l and O_r are the corresponding points of the left and right images respectively; f_l and f_r are the distances from the corresponding image points to the lenses; (X_l, Y_l) is the position of P in the left photograph and (X_r, Y_r) its position in the right photograph; r_1 to r_9 are the coordinate positions of the 9 feature points (2 eyeball center points, 4 eye corner points, the nostril center point and 2 mouth corner points) in the two photographs;
the calculation formula is as follows:
the relationship between the camera coordinate system and the world coordinate system can be represented by a rotation matrix R, whereinTime matrixRepresenting the time conversion relation of two coordinate systems, wherein Z is the depth of the target to be measured relative to the world coordinate system, and x is converted into X through Rl、yl、zl、xr、yr、zrConverted to a position relative to the world coordinate system.
(4) After the depth information of 9 points is obtained, specific positions of the 9 corner points in a three-dimensional coordinate space are obtained, and then the specific shape of the human face is fitted through cubic spline interpolation to obtain model parameters of the human face.
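The cubic spline fit can be sketched with SciPy's `CubicSpline` (an assumed implementation choice; the patent does not name a library). The landmark coordinates below are made-up values standing in for recovered 3-D positions along one face profile:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical profile: x-positions and measured depths of a few landmarks.
x = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
z = np.array([0.10, 0.14, 0.20, 0.13, 0.09])

profile = CubicSpline(x, z)               # piecewise-cubic, C2-continuous fit
dense = profile(np.linspace(0, 5, 101))   # resampled face-profile curve
```

The spline passes exactly through the landmark points while smoothly interpolating the face shape between them.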
(5) Finally, the obtained face model parameters are taken as input, and recognition of the face is realized through a support vector machine algorithm. The input of the support vector machine is the parametric face model and the output size is set to 1. Different faces given as input produce outputs that differ greatly, while the same face produces outputs that differ only slightly; two input faces with similar output values can therefore be considered the same person, and two with very different values considered two different persons.
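The support-vector-machine step can be sketched with scikit-learn (an assumed implementation choice). The 9-dimensional "model parameter" vectors below are synthetic stand-ins for the fitted face-shape parameters; the sketch shows training a classifier that separates two hypothetical identities:

```python
import numpy as np
from sklearn.svm import SVC

# Toy "face model parameter" vectors: two clusters of hypothetical shapes.
rng = np.random.default_rng(0)
person_a = rng.normal(loc=0.0, scale=0.1, size=(20, 9))
person_b = rng.normal(loc=1.0, scale=0.1, size=(20, 9))
X = np.vstack([person_a, person_b])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="rbf").fit(X, y)            # train the support vector machine
pred = clf.predict([[0.0] * 9, [1.0] * 9])   # classify two query "faces"
```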
The positions of 9 corner points in the face, namely 2 eyeball center points, 4 eye corner points, the nostril center point and 2 mouth corner points, are obtained through the P_Net network. After the positions of the 9 corner points are obtained (for the two images captured by the left and right cameras, 9 points are obtained per face image, 18 points in total), the three-dimensional coordinate information of the 9 corner points is obtained through the binocular vision ranging algorithm.
After the positions of the 9 points are obtained, the shape of the face is restored by cubic spline interpolation.
The information of the face shape is used as the feature, and it is classified through the support vector machine algorithm.
In some embodiments, the input of the improved P_Net network is the face images obtained by the left and right cameras, and the output is the positions of the 9 corner points in the face: 2 eyeball center points, 4 eye corner points, the nostril center point and 2 mouth corner points. The output can be expanded accordingly to add further point types, such as the positions of the ears, the chin and so on.
The input of the support vector machine algorithm is the face shape obtained through cubic spline interpolation, and only one output is set, i.e. the output dimension is 1 × 1. For the same face the output values are very close; for different faces the output values differ greatly.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.
Claims (7)
1. A face detection and identification method based on binocular vision is characterized by comprising the following steps:
step (1) acquiring a left picture and a right picture containing human faces;
step (2) performing HOG feature detection on the two pictures containing human faces respectively, to obtain the face images corresponding to the two pictures;
step (3) performing feature extraction on the two obtained face images to obtain face feature points;
step (4) calculating four-dimensional coordinate point information of each face feature point through a binocular ranging algorithm based on the extracted face feature points, and fitting a three-dimensional face model based on the four-dimensional coordinate point information of the face feature points;
and (5) identifying the three-dimensional model of the face through a support vector machine algorithm to obtain an identification result.
2. The binocular vision based face detection and recognition method of claim 1, wherein: in the step (1), two pictures containing human faces are obtained through a binocular camera.
3. The binocular vision based face detection and recognition method of claim 1, wherein: in step (3), feature extraction is performed on the two obtained face images by using a convolutional neural network to obtain face feature points.
4. The binocular vision based face detection and recognition method of claim 1, wherein: in step (3), 9 feature points are selected as the face feature points, namely 2 eyeball center points, 4 eye corner points, the nostril center point and 2 mouth corner points.
5. The binocular vision based face detection and recognition method of claim 3, wherein: the convolutional neural network adopts an improved P_Net network; the input layer size is set to 1202 × 1202 × 3, the second layer network is 600 × 600 × 10 with a 2 × 2 pooling layer, the third layer network is 300 × 300 × 16 with a 2 × 2 pooling layer, the fourth layer network is 1 × 1 × 1000, and the last output layer size is 2 × 9.
6. The binocular vision based face detection and recognition method of claim 1, wherein: in the step (4), four-dimensional coordinate point information of each characteristic point is calculated through a binocular ranging algorithm, and a calculation formula is as follows:
P is the target point whose depth is to be calculated; Z is the depth of the target relative to the world coordinate system; O_l and O_r are the corresponding points of the left and right images respectively; f_l and f_r are the distances from the corresponding image points to the lenses; (X_l, Y_l) is the position of P in the left photograph and (X_r, Y_r) its position in the right photograph; r_1 to r_9 are the coordinate positions of the 9 feature points in the two photographs; the relationship between the camera coordinate system and the world coordinate system can be represented by a rotation matrix R together with a translation matrix T.
7. The binocular vision based face detection and recognition method of claim 1, wherein: in step (4), a three-dimensional face model is fitted through cubic spline interpolation based on the four-dimensional coordinate point information of the face feature points.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010748989.XA CN111881841B (en) | 2020-07-30 | 2020-07-30 | Face detection and recognition method based on binocular vision |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111881841A true CN111881841A (en) | 2020-11-03 |
CN111881841B CN111881841B (en) | 2022-09-13 |
Family
ID=73204262
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010748989.XA Active CN111881841B (en) | 2020-07-30 | 2020-07-30 | Face detection and recognition method based on binocular vision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111881841B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112396694A (en) * | 2020-12-08 | 2021-02-23 | 北京工商大学 | 3D face video generation method based on monocular camera |
CN112597901A (en) * | 2020-12-23 | 2021-04-02 | 艾体威尔电子技术(北京)有限公司 | Multi-face scene effective face recognition device and method based on three-dimensional distance measurement |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060081778A1 (en) * | 1998-12-11 | 2006-04-20 | Warner Charles C | Portable radiometry and imaging apparatus |
CN105913013A (en) * | 2016-04-08 | 2016-08-31 | 青岛万龙智控科技有限公司 | Binocular vision face recognition algorithm |
CN110188699A (en) * | 2019-05-31 | 2019-08-30 | 安徽柏络智能科技有限公司 | A kind of face identification method and system of binocular camera |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112396694A (en) * | 2020-12-08 | 2021-02-23 | 北京工商大学 | 3D face video generation method based on monocular camera |
CN112396694B (en) * | 2020-12-08 | 2023-05-05 | 北京工商大学 | 3D face video generation method based on monocular camera |
CN112597901A (en) * | 2020-12-23 | 2021-04-02 | 艾体威尔电子技术(北京)有限公司 | Multi-face scene effective face recognition device and method based on three-dimensional distance measurement |
CN112597901B (en) * | 2020-12-23 | 2023-12-29 | 艾体威尔电子技术(北京)有限公司 | Device and method for effectively recognizing human face in multiple human face scenes based on three-dimensional ranging |
Also Published As
Publication number | Publication date |
---|---|
CN111881841B (en) | 2022-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shao et al. | Real-time and accurate UAV pedestrian detection for social distancing monitoring in COVID-19 pandemic | |
CN109344701B (en) | Kinect-based dynamic gesture recognition method | |
CN108764071B (en) | Real face detection method and device based on infrared and visible light images | |
WO2019056988A1 (en) | Face recognition method and apparatus, and computer device | |
CN113139479B (en) | Micro-expression recognition method and system based on optical flow and RGB modal contrast learning | |
CN109598242B (en) | Living body detection method | |
CN111680588A (en) | Human face gate living body detection method based on visible light and infrared light | |
CN112801015B (en) | Multi-mode face recognition method based on attention mechanism | |
CN106909890B (en) | Human behavior recognition method based on part clustering characteristics | |
CN111639580B (en) | Gait recognition method combining feature separation model and visual angle conversion model | |
CN111881841B (en) | Face detection and recognition method based on binocular vision | |
CN109993103A (en) | A kind of Human bodys' response method based on point cloud data | |
CN112232204B (en) | Living body detection method based on infrared image | |
CN110458235B (en) | Motion posture similarity comparison method in video | |
CN109858433B (en) | Method and device for identifying two-dimensional face picture based on three-dimensional face model | |
CN106599806A (en) | Local curved-surface geometric feature-based human body action recognition method | |
Galiyawala et al. | Person retrieval in surveillance video using height, color and gender | |
CN112257641A (en) | Face recognition living body detection method | |
CN111914643A (en) | Human body action recognition method based on skeleton key point detection | |
Baek et al. | Multimodal camera-based gender recognition using human-body image with two-step reconstruction network | |
CN111582036B (en) | Cross-view-angle person identification method based on shape and posture under wearable device | |
CN105469042A (en) | Improved face image comparison method | |
CN112668550A (en) | Double-person interaction behavior recognition method based on joint point-depth joint attention RGB modal data | |
CN112528902A (en) | Video monitoring dynamic face recognition method and device based on 3D face model | |
CN111767879A (en) | Living body detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||