CN109344714B - Sight estimation method based on key point matching - Google Patents
- Publication number
- CN109344714B CN109344714B CN201811011543.8A CN201811011543A CN109344714B CN 109344714 B CN109344714 B CN 109344714B CN 201811011543 A CN201811011543 A CN 201811011543A CN 109344714 B CN109344714 B CN 109344714B
- Authority
- CN
- China
- Prior art keywords
- face
- key points
- pupil
- matching
- eye
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/168—Feature extraction; Face representation
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Eye Examination Apparatus (AREA)
Abstract
The invention discloses a sight estimation method based on key point matching, belonging to sight estimation in the field of computer vision. After the pupil key points are initially located by a deep network, the pupil center position is further corrected with an SGBM template-matching step. Compared with existing sight estimation methods, the pupil center can be located more accurately, especially when the head or eyeball offset is large. The invention can effectively improve the accuracy of sight estimation; compared with the pupil corneal reflection method, only a single webcam is used, greatly reducing the equipment cost. Compared with existing single-image methods, the head pose need not be restricted, which greatly increases the robustness of the algorithm. By matching against a 3D face model, the limitation that existing databases cannot represent all poses is avoided, improving the practicality of the method.
Description
Technical Field
The invention provides a sight estimation method based on key point matching, a novel sight estimation technique in the field of computer vision.
Background
With the development of computer science, human-computer interaction has gradually become a popular field. The human line of sight reflects a person's attention and is an important information input source in human-computer interaction. Human-computer interaction based on sight estimation has broad prospects in fields such as the military, medicine, and entertainment.
The sight estimation technology in practical use today is mainly based on pupil center corneal reflection (PCCR): a near-infrared light source produces reflections on the user's cornea and pupil, an image sensor collects images of the eye and the reflections, and the position of the eye in space and the sight line are then calculated from a three-dimensional eyeball model. Although this method has high accuracy, it is difficult to popularize because the sensor equipment is expensive.
To address this problem, sight estimation methods based on a 3D face model have appeared. Such a method needs only the pictures collected by a camera as input data: it locates key points in the collected picture, estimates the head pose and eyeball center position by fitting a known model, and then obtains the sight angle by combining the detected pupil center position.
However, when the existing 3D-face-model-based sight estimation methods calculate the pupil center position, the databases they rely on cannot cover all real situations, and large errors occur under large head pose or eye offset, which causes a great deviation in the final sight estimate.
Disclosure of Invention
The aim of the invention: to address the above problems, a method combining a deep network with template matching is provided, which locates the pupil center accurately and increases the feasibility of the scheme.
The sight line estimation method based on key point matching comprises the following steps:
step one, detecting a target face:
inputting a video stream acquired by a camera into a trained face detection network model (a well-known face detection network such as MobileNet-SSD may be selected) to perform face detection, and cropping the largest detected face as the target face image for sight detection;
performing size normalization on the target face image, and using the normalized image as the input of a face key point detection network model (a corresponding conventional detection network, such as SE-Net, may be selected) to obtain the face key points and pupil centers of the target face image.
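As a rough illustration of step one, the sketch below picks the largest face box, crops it, and normalizes the crop to a fixed size. The MobileNet-SSD inference itself is omitted (detections are assumed to already be available as bounding boxes), the helper names are hypothetical, and nearest-neighbor resampling stands in for a proper resize:

```python
import numpy as np

def pick_largest_face(boxes):
    """Return the bounding box (x, y, w, h) with the largest area."""
    return max(boxes, key=lambda b: b[2] * b[3])

def resize_nearest(img, size=300):
    """Nearest-neighbor resize of an HxW(xC) image to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def crop_target_face(frame, boxes, size=300):
    """Crop the largest detected face and normalize it to size x size."""
    x, y, w, h = pick_largest_face(boxes)
    face = frame[y:y + h, x:x + w]
    return resize_nearest(face, size)
```

In practice the resize would use proper interpolation (e.g. bilinear) and the boxes would come directly from the face detector's output.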
Step two, detecting key points of the human face and initially positioning the pupil center:
based on the selected face key point detection network model, inputting the size-normalized target face image to obtain the face key points and the coordinates of the 2 initial pupil centers on the current target face image, and converting these coordinates into coordinates on the (pre-normalization) video image, wherein the face key points include 4 eye key points, the left eye and the right eye each contributing two key points (the two end points of the eye);
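The coordinate conversion at the end of step two can be sketched as follows. This is a minimal illustration with hypothetical names; it assumes the face crop's origin and size in the video frame are known from step one:

```python
def to_frame_coords(points, crop_x, crop_y, crop_w, crop_h, net_size=300):
    """Map (x, y) key points predicted on the net_size x net_size
    normalized crop back to coordinates on the original video frame."""
    return [(crop_x + x * crop_w / net_size,
             crop_y + y * crop_h / net_size) for x, y in points]
```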
thirdly, estimating the head posture and positioning the eyeballs:
matching the detected face key points with the key points of a standard three-dimensional face through a perspective-n-point (PnP) algorithm to obtain the spatial position and rotation angle of the face relative to the camera;
thereby obtaining three-dimensional coordinates of 2 initial pupil centers and three-dimensional coordinates of 4 eye key points;
in three-dimensional coordinates, taking the point 12 mm behind the midpoint of the two eye key points of each eye, along the head pose direction, as the center position of the left and the right eyeball respectively;
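A minimal sketch of the eyeball-center computation in step three. It assumes the head rotation matrix R from the PnP step is available (e.g. via OpenCV's solvePnP plus a Rodrigues conversion, not shown), that coordinates are in millimetres, and that the head's forward axis maps from +z; the sign convention for "behind" is an assumption:

```python
import numpy as np

def eyeball_center(corner_a, corner_b, R, depth_mm=12.0):
    """Place the eyeball center 12 mm behind the midpoint of the two
    eye-corner key points, along the head's pose (forward) direction.
    The +z forward convention and offset sign are assumptions."""
    midpoint = (np.asarray(corner_a, float) + np.asarray(corner_b, float)) / 2.0
    forward = np.asarray(R, float) @ np.array([0.0, 0.0, 1.0])
    return midpoint + depth_mm * forward / np.linalg.norm(forward)
```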
step four, correcting the pupil center position:
in three-dimensional coordinates, cropping left-eye and right-eye pictures according to the 4 detected eye key points and relocating the pupil center point with the semi-global block matching (SGBM) method; if the confidence of the currently obtained matching point (the relocated pupil center point) is greater than 0.7, the matching point is considered credible; the median of the two credible positioning results is taken as the final pupil center position;
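Step four can be illustrated with a simple normalized cross-correlation template search over the eye crop. This is a stand-in sketch, not the patent's SGBM matcher: the function name is hypothetical, NCC emulates the matcher's confidence score, and only the 0.7 threshold is taken from the text:

```python
import numpy as np

def match_pupil(eye_img, template, thresh=0.7):
    """Slide the pupil template over the eye image; return the best
    match center, its NCC score, and whether it clears the threshold."""
    H, W = eye_img.shape
    h, w = template.shape
    t = template - template.mean()
    best_score, best_pos = -1.0, None
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            patch = eye_img[i:i + h, j:j + w]
            p = patch - patch.mean()
            denom = np.linalg.norm(p) * np.linalg.norm(t)
            score = float((p * t).sum() / denom) if denom else 0.0
            if score > best_score:
                best_score, best_pos = score, (i + h // 2, j + w // 2)
    return best_pos, best_score, best_score > thresh
```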
step five, estimating the sight direction:
in three-dimensional coordinates, calculating the optical axis information from the eyeball center to the pupil center to obtain the current sight direction.
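Step five reduces to a vector computation; a minimal sketch (optical axis only, ignoring the kappa-angle offset to the visual axis, which the method does not model):

```python
import numpy as np

def gaze_direction(eyeball_center, pupil_center):
    """Unit vector of the optical axis, from eyeball center to pupil center."""
    v = np.asarray(pupil_center, float) - np.asarray(eyeball_center, float)
    return v / np.linalg.norm(v)
```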
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
the sight line estimation method based on key point matching can effectively improve the precision of sight line estimation, and compared with a pupil corneal reflection method, only a single network camera is adopted, so that the equipment cost is greatly reduced. Compared with the existing method based on single image processing, the method does not need to limit the posture of the head, and the robustness of the algorithm is greatly increased. By matching the 3D face model, the limitation that the existing database can not represent all postures at present is avoided, so that the practicability of the method is improved.
Drawings
FIG. 1 is a schematic view of the process of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
Existing sight estimation methods show a large pupil-center positioning error, especially under large head pose. The invention first preliminarily locates the face key points and pupil centers with SE-Net (Squeeze-and-Excitation Networks), and then corrects the result with the pupil center obtained by an SGBM (semi-global block matching) algorithm, further improving pupil positioning accuracy.
First, face detection is performed on the picture read from the camera; the largest face is cropped as the target whose sight line is to be estimated and normalized to a standard size. The face feature points (the 68 common key points) and the 2 pupil center positions are then detected with the SE-Net network.
Then, the detected 68 face key points are matched with the standard 3D face key points using the perspective-n-point (PnP) algorithm to obtain the position and rotation angle of the face relative to the camera in space.
Next, using the method provided by the invention, pictures of the left and right eyes are cropped according to the obtained eye key points, each eye picture is matched against a standard pupil picture with the semi-global block matching (SGBM) algorithm, and the point with the highest confidence in the matching result is taken as the pupil center. If the matching confidence is greater than 0.7, the matched position is considered credible, and the final positioning result is then calculated by combining the two pupil positioning results according to the following formula:
where P_SeNet is the pupil detection result from SE-Net, P_SGBM is the pupil detection result from SGBM, and T is the pupil-center confidence obtained from SGBM; the larger T is, the more accurate the detection result.
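Because the combination formula itself appears only as an image in the source, the sketch below is one plausible reading of the surrounding text rather than the patent's actual formula: a confidence-gated combination in which the midpoint rule follows the "median of the two credible results" wording, with all names hypothetical:

```python
import numpy as np

def fuse_pupil(p_senet, p_sgbm, T, thresh=0.7):
    """Combine the SE-Net and SGBM pupil estimates. When the SGBM
    confidence T clears the threshold, take the midpoint of the two
    estimates; otherwise fall back to the SE-Net result alone."""
    p_senet = np.asarray(p_senet, float)
    p_sgbm = np.asarray(p_sgbm, float)
    if T > thresh:
        return (p_senet + p_sgbm) / 2.0
    return p_senet
```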
Finally, the point 12 mm from the center of the eye key points along the head offset direction is taken as the eyeball center, and the vector from the eyeball center to the pupil center gives the final sight direction.
After the pupil key points are initially located by the deep network, the pupil center position is further corrected with the SGBM template-matching step. Compared with existing sight estimation methods, the pupil center can be located more accurately, especially when the head or eyeball offset is large.
Examples
Referring to FIG. 1, the invention mainly comprises the following steps: detecting the target face, detecting the face key points and initially locating the pupil centers, estimating the head pose and locating the eyeballs, correcting the pupil center positions, and estimating the sight direction.
Step one, detecting a target face.
The video stream acquired by the camera is input into a trained face detection network (MobileNet-SSD) for face detection; the largest detected face is cropped as the target face for sight detection and normalized to 300 × 300 as the input of the key point detection network.
And step two, detecting key points of the human face and initially positioning the pupil center.
SE-Net is used as the base network to train the model for face key point and pupil center detection, with the L1 loss as the loss function during training to further improve positioning accuracy. The 300 × 300 face picture is fed into the trained model to obtain the coordinates of the 68 key points and the 2 pupil centers on the face picture, which are then converted into coordinates on the original image. The L1 loss is expressed as

L1 = (1/m) * Σ_{i=1}^{m} |f(x_i) − y_i|

where f(x_i) denotes the model prediction for the i-th input datum, y_i the corresponding label, and m the number of data input to the model each time.
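The L1 loss used in training can be written directly in code; a minimal numpy sketch matching the stated definition:

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error: (1/m) * sum_i |f(x_i) - y_i|."""
    pred = np.asarray(pred, float)
    target = np.asarray(target, float)
    return float(np.abs(pred - target).mean())
```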
And thirdly, estimating the head posture and positioning the eyeballs.
Using the obtained two-dimensional coordinates of the 68 key points on the video picture and an existing 68-point three-dimensional face coordinate model, the spatial position and rotation angle of the face relative to the camera are estimated with the PnP algorithm. Then the point 12 mm behind the midpoint of the two eye key points of each eye, along the head pose direction, is taken as the eyeball center.
And step four, correcting the center position of the pupil.
The left-eye and right-eye pictures are cropped according to the 4 detected eye key points, and the pupil center is searched for with the SGBM method, which yields a center position and a corresponding confidence; the higher the confidence, the higher the accuracy. If the confidence is greater than 0.7, the two positioning results are combined to obtain the final pupil center position.
And fifthly, estimating the sight direction.
The three-dimensional coordinates of the eyeball center and the pupil center are taken, and the optical axis information is calculated to finally obtain the sight direction.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.
Claims (2)
1. A sight line estimation method based on key point matching is characterized by comprising the following steps:
step one, detecting a target face:
inputting a video stream collected by a camera into a trained face detection network model for face detection, and intercepting a face with the largest size as a target face image for sight detection;
carrying out size normalization processing on the target face image;
step two, detecting key points of the human face and initially positioning the pupil center:
inputting the size-normalized target face image into a selected face key point detection network model to obtain the face key points and the coordinates of 2 initial pupil centers on the current target face image, and converting these coordinates into coordinates on the video image, wherein the face key points comprise 4 eye key points, the left eye and the right eye each comprising two key points;
thirdly, estimating the head posture and positioning the eyeballs:
matching the detected face key points with standard three-dimensional face key points through a perspective n-point algorithm to obtain the spatial position and the rotation angle of the face relative to the camera;
thereby obtaining three-dimensional coordinates of 2 initial pupil centers and three-dimensional coordinates of 4 eye key points;
in three-dimensional coordinates, taking the point 12 mm behind the midpoint of the two eye key points of each eye, along the head pose direction, as the center position of the left and the right eyeball respectively;
step four, correcting the pupil center position:
in three-dimensional coordinates, cropping left-eye and right-eye pictures according to the 4 detected eye key points and relocating the pupil center point with the semi-global block matching SGBM method; if the confidence of the currently obtained matching point is greater than 0.7, the matching point is considered credible; the median of the two credible positioning results is taken as the final pupil center position;
step five, estimating the sight direction:
and calculating the optical axis information from the center of the eyeball to the center of the pupil under the three-dimensional coordinate to obtain the current sight direction.
2. The method of claim 1, wherein the target face image has a normalized size of 300 x 300.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201811011543.8A | 2018-08-31 | 2018-08-31 | Sight estimation method based on key point matching
Publications (2)
Publication Number | Publication Date
---|---
CN109344714A (en) | 2019-02-15
CN109344714B (en) | 2022-03-15
Family
ID=65291973
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201811011543.8A (granted as CN109344714B, active) | Sight estimation method based on key point matching | 2018-08-31 | 2018-08-31
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109344714B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109901716B (en) * | 2019-03-04 | 2022-08-26 | Xiamen Meitu Zhijia Technology Co., Ltd. | Sight point prediction model establishing method and device and sight point prediction method |
CN110051319A (en) * | 2019-04-23 | 2019-07-26 | 7Invensun (Shenzhen) Technology Co., Ltd. | Adjusting method, device, equipment and storage medium of eyeball tracking sensor |
CN110414419A (en) * | 2019-07-25 | 2019-11-05 | Sichuan Changhong Electric Co., Ltd. | A posture detecting system and method based on mobile terminal viewer |
CN110503068A (en) * | 2019-08-28 | 2019-11-26 | Guangdong Oppo Mobile Telecommunications Co., Ltd. | Gaze estimation method, terminal and storage medium |
CN111291701B (en) * | 2020-02-20 | 2022-12-13 | Harbin University of Science and Technology | Sight tracking method based on image gradient and ellipse fitting algorithm |
CN113780164B (en) * | 2021-09-09 | 2023-04-28 | Fujian Tianquan Education Technology Co., Ltd. | Head gesture recognition method and terminal |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1787012A (en) * | 2004-12-08 | 2006-06-14 | Sony Corporation | Method, apparatus and computer program for processing image |
CN102799888A (en) * | 2011-05-27 | 2012-11-28 | Ricoh Company, Ltd. | Eye detection method and eye detection equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102402467B1 (en) * | 2016-10-05 | 2022-05-25 | Magic Leap, Inc. | Periocular test for mixed reality calibration |
- 2018-08-31: application CN201811011543.8A filed in China (CN), granted as CN109344714B (active)
Non-Patent Citations (1)
Title |
---|
Research on a method for measuring the degree of eyeball protrusion based on binocular stereo vision; Zhang Shuai; China Masters' Theses Full-text Database (Information Science and Technology); 2017-02-15 (No. 02); I138-2639 *
Also Published As
Publication number | Publication date |
---|---|
CN109344714A (en) | 2019-02-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109344714B (en) | Sight estimation method based on key point matching | |
AU2021240222B2 (en) | Eye pose identification using eye features | |
CN104317391B (en) | A kind of three-dimensional palm gesture recognition exchange method and system based on stereoscopic vision | |
Valenti et al. | Combining head pose and eye location information for gaze estimation | |
US9286694B2 (en) | Apparatus and method for detecting multiple arms and hands by using three-dimensional image | |
KR20160138062A (en) | Eye gaze tracking based upon adaptive homography mapping | |
CN109359514B (en) | DeskVR-oriented gesture tracking and recognition combined strategy method | |
CN109785373B (en) | Speckle-based six-degree-of-freedom pose estimation system and method | |
CN111768449B (en) | Object grabbing method combining binocular vision with deep learning | |
WO2019136588A1 (en) | Cloud computing-based calibration method, device, electronic device, and computer program product | |
CN110794963A (en) | Depth camera-based eye control auxiliary input method | |
CN111259739A (en) | Human face pose estimation method based on 3D human face key points and geometric projection | |
CN115830675B (en) | Gaze point tracking method and device, intelligent glasses and storage medium | |
Liu et al. | Robust 3-D gaze estimation via data optimization and saliency aggregation for mobile eye-tracking systems | |
CN108694348B (en) | Tracking registration method and device based on natural features | |
Liu et al. | Towards robust auto-calibration for head-mounted gaze tracking systems | |
JP2017227687A (en) | Camera assembly, finger shape detection system using camera assembly, finger shape detection method using camera assembly, program implementing detection method, and recording medium of program | |
WO2024113275A1 (en) | Gaze point acquisition method and apparatus, electronic device, and storage medium | |
JP2012212325A (en) | Visual axis measuring system, method and program | |
WO2024116253A1 (en) | Information processing method, information processing program, and information processing device | |
WO2024059927A1 (en) | Methods and systems for gaze tracking using one corneal reflection | |
CN113643362A (en) | Limb corrector based on human body measurement in 2D human body posture estimation system | |
TW201301204A (en) | Method of tracking real-time motion of head | |
KR20170141018A (en) | System and Method for Detecting Face Landmark considering Occlusion Landmark |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |