CN110647800B - Eye contact communication detection method based on deep learning - Google Patents

Eye contact communication detection method based on deep learning

Info

Publication number
CN110647800B
CN110647800B
Authority
CN
China
Prior art keywords
training
sight
model
regression model
deep learning
Prior art date
Legal status
Active
Application number
CN201910720061.8A
Other languages
Chinese (zh)
Other versions
CN110647800A (en)
Inventor
张宏
李碧蓉
何力
管贻生
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201910720061.8A
Publication of CN110647800A
Application granted
Publication of CN110647800B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18: Eye characteristics, e.g. of the iris
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation

Abstract

The invention discloses an eye contact detection method based on deep learning. First, a deep-learning gaze regression model, the VH model, is trained; the model outputs deflection angles in the horizontal and vertical directions to represent the gaze direction of the eyes. The resulting gaze direction is then passed to a binary classifier to judge whether eye contact occurs. Because the Columbia Gaze dataset provides fine-grained gaze directions and head poses, it was chosen for training and testing, and the method reaches an MCC of 0.92 on Columbia Gaze. The training process comprises four key stages: acquiring face images, image enhancement, training the gaze regression model, and training the eye contact detector. The testing process is divided into three stages: acquiring face images, gaze direction regression, and eye contact detection.

Description

Eye contact communication detection method based on deep learning
Technical Field
The invention relates to the technical field of computer vision detection, and in particular to an eye contact detection method based on deep learning.
Background
Eye contact is a primary form of non-verbal communication and plays a vital role in people's daily social interaction. It is often used to receive information, adjust the atmosphere of an interaction, and express emotion. While it is difficult for a person to determine the exact angle at which someone else is looking at them, it is easy to tell whether, and when, someone is looking at them. Eye contact detection therefore has applications in human-computer interaction, driver distraction detection, life logging, screening for autism in children, user interfaces, and eye tracking on terminal devices.
Today, most gaze-based interactive systems rely on gaze tracking. They can obtain the user's gaze direction but cannot directly perceive whether eye contact is occurring. Although eye tracking has been widely studied and some methods achieve relatively high accuracy, they have limitations, such as the need for infrared devices, head-mounted devices, or complicated calibration, which restrict practical applications.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an eye contact detection method based on deep learning that, by combining computer vision, is free of constraints, calibration, and any external equipment.
The purpose of the invention is realized by the following technical scheme:
An eye contact detection method based on deep learning comprises the following steps:
Step one, face image processing;
First, the human face needs to be located in each image to avoid interference. All input images are processed with the facial landmark detector of OpenFace 2.0; the detector is robust to illumination and background and maintains accurate localization of feature points. During training, once a face is detected, it is aligned by affine transformation, and the face region is then cropped to a fixed-size face image with a resolution of 224×224.
Considering that the positive and negative samples of the dataset are severely imbalanced, with far more negative samples than positive ones, the Synthetic Minority Oversampling Technique (SMOTE) is adopted to increase the positive samples, reduce the gap between the two sample groups, and avoid model overfitting. Meanwhile, to improve the generalization ability of the model, the training dataset is expanded during training using Gaussian blur and brightness adjustment.
Step two, gaze direction regression;
A deep-learning gaze regression model is designed and trained on the fixed-size 224×224 face images obtained in step one to obtain the VH gaze regression model. The residual network ResNet-50 is selected to train the model, with the learning rate set to 0.01 and mean squared error (MSE) as the loss function, finally yielding a VH gaze regression model trained with a batch size of 64 for 10000 steps. The VH gaze regression model outputs the deflection angles (V, H) of the gaze in the horizontal and vertical directions; an eye contact binary classifier is trained on (V, H) to produce the final eye contact detection result.
Step three, eye contact detection;
Eye contact is detected using the gaze direction (V, H) obtained from the VH gaze regression model as input. A binary classifier is trained using a random forest; the random forest comprises 28 decision trees, each trained on a bootstrap sample, and the binary output of the classifier is determined by a vote over all the trees. The resulting binary classifier completes the eye contact detection system.
Due to the imbalance of the dataset, the test results are evaluated using the Matthews correlation coefficient (MCC), whose value lies in [-1, 1], with larger values indicating better performance.
Compared with the prior art, the invention has the following beneficial effects:
the method has strong generalization capability and outstanding cross-data set performance; the MCC value is greatly improved; no extra debugging and equipment are needed, and the running environment is not limited.
Drawings
FIG. 1 is a flow chart of the system of the present invention.
Detailed Description
The present invention is described in further detail below with reference to examples and the drawings, but is not limited thereto.
As shown in fig. 1, an eye contact detection method based on deep learning includes the following steps:
Step one, face image processing;
First, the human face needs to be located in each image to avoid interference. All input images are processed with the facial landmark detector of OpenFace 2.0; the detector is robust to illumination and background and maintains accurate localization of feature points. During training, once a face is detected, it is aligned by affine transformation, and the face region is then cropped to a fixed-size face image with a resolution of 224×224.
Considering that the positive and negative samples of the dataset are severely imbalanced, with far more negative samples than positive ones, the Synthetic Minority Oversampling Technique (SMOTE) is adopted to increase the positive samples, reduce the gap between the two sample groups, and avoid model overfitting. Meanwhile, to improve the generalization ability of the model, the training dataset is expanded during training using Gaussian blur and brightness adjustment.
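The oversampling step above can be sketched as a minimal SMOTE implementation in NumPy: each synthetic positive sample is an interpolation between a minority-class sample and one of its k nearest minority-class neighbors. The feature dimensionality, sample counts, and helper name below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating
    each chosen sample toward one of its k nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each point from its own neighbors
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbors
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))         # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

# toy imbalanced setting: 20 positives to be brought up to 100
rng = np.random.default_rng(0)
X_pos = rng.normal(0.0, 1.0, (20, 2))
X_syn = smote(X_pos, n_new=80, rng=1)
X_balanced_pos = np.vstack([X_pos, X_syn])
print(X_balanced_pos.shape)  # (100, 2)
```

Because every synthetic point is a convex combination of two real positives, the new samples stay inside the minority class's region of feature space, which is what reduces the gap between the two sample groups without duplicating data.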
Step two, gaze direction regression;
A deep-learning gaze regression model is designed and trained on the fixed-size 224×224 face images obtained in step one to obtain the VH gaze regression model. The residual network ResNet-50 is selected to train the model, with the learning rate set to 0.01 and mean squared error (MSE) as the loss function, finally yielding a VH gaze regression model trained with a batch size of 64 for 10000 steps. The VH gaze regression model outputs the deflection angles (V, H) of the gaze in the horizontal and vertical directions; an eye contact binary classifier is trained on (V, H) to produce the final eye contact detection result.
Step three, eye contact detection;
Eye contact is detected using the gaze direction (V, H) obtained from the VH gaze regression model as input. A binary classifier is trained using a random forest; the random forest comprises 28 decision trees, each trained on a bootstrap sample, and the binary output of the classifier is determined by a vote over all the trees. The resulting binary classifier completes the eye contact detection system.
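The classifier stage can be sketched with scikit-learn (an assumed library; the patent specifies only a 28-tree random forest with bootstrap sampling and majority voting). The two gaze angles (V, H) are the only features; the synthetic labeling rule below, eye contact when both deflection angles are small, is purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# synthetic (V, H) gaze angles in degrees; label 1 ("eye contact")
# when both deflection angles are small -- an illustrative rule only
X = rng.uniform(-30, 30, size=(1000, 2))
y = (np.abs(X) < 8).all(axis=1).astype(int)

# 28 decision trees, each fit on a bootstrap sample; the prediction
# is decided by a vote over all trees
clf = RandomForestClassifier(n_estimators=28, bootstrap=True, random_state=0)
clf.fit(X, y)

print(clf.predict([[1.0, -2.0], [25.0, 20.0]]))
```

Feeding only the two regressed angles into the forest keeps the classifier tiny and fast, since the heavy lifting (image to gaze direction) has already been done by the ResNet-50 regressor.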
Due to the imbalance of the dataset, the test results are evaluated using the Matthews correlation coefficient (MCC), whose value lies in [-1, 1], with larger values indicating better performance. In the tests, the average MCC of the system reaches 0.92, much higher than that of other systems. To examine performance in detail, results are given for head pose deflections of 0°, ±15°, and ±30°; head poses of 0° and ±15° occur frequently in social interaction, while ±30° is generally considered an extreme pose in gaze studies. The system achieves MCC values as high as 0.94 at both 0° and ±15°, and reaches 0.84 at the extreme pose of ±30°.
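The Matthews correlation coefficient used for evaluation is computed directly from the binary confusion matrix, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal NumPy version (the helper name is ours, not from the patent):

```python
import numpy as np

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # conventionally 0 when any confusion-matrix margin is empty
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

print(mcc([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0  (perfect agreement)
print(mcc([1, 1, 0, 0], [0, 0, 1, 1]))  # -1.0 (total disagreement)
```

Unlike accuracy, MCC accounts for all four cells of the confusion matrix, which is why it is the appropriate metric when negatives far outnumber positives, as in this dataset.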
In general, the eye contact detection method based on deep learning disclosed by the invention first trains a deep-learning gaze regression model, the VH model, which outputs deflection angles in the horizontal and vertical directions to represent the gaze direction of the eyes; the gaze direction result is then passed to a binary classifier to judge whether eye contact occurs. Because Columbia Gaze provides fine-grained gaze directions and head poses, it was chosen as the dataset for training and testing, and the method reaches an MCC of 0.92 on Columbia Gaze. The training process includes four key stages: acquiring face images, image enhancement, training the gaze regression model, and training the eye contact detector. The testing process is divided into three stages: acquiring face images, gaze direction regression, and eye contact detection. In this example, 80% of the dataset was used for training and 20% for testing.
The method has strong generalization ability and outstanding cross-dataset performance; the MCC value is greatly improved; and no extra calibration or equipment is needed, with no restriction on the running environment.
The present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents and are included in the scope of the present invention.

Claims (1)

1. An eye contact detection method based on deep learning, characterized by comprising the following steps:
Step one, face image processing;
First, the human face needs to be located in each image to avoid interference. All input images are processed with the facial landmark detector of OpenFace 2.0; the detector is robust to illumination and background and maintains accurate localization of feature points. During training, once a face is detected, it is aligned by affine transformation, and the face region is then cropped to a fixed-size face image with a resolution of 224×224.
Considering that the positive and negative samples of the dataset are severely imbalanced, with far more negative samples than positive ones, the Synthetic Minority Oversampling Technique (SMOTE) is adopted to increase the positive samples, reduce the gap between the two sample groups, and avoid model overfitting. Meanwhile, to improve the generalization ability of the model, the training dataset is expanded during training using Gaussian blur and brightness adjustment.
Step two, gaze direction regression;
A deep-learning gaze regression model is designed and trained on the fixed-size 224×224 face images obtained in step one to obtain the VH gaze regression model. The residual network ResNet-50 is selected to train the model, with the learning rate set to 0.01 and mean squared error (MSE) as the loss function, finally yielding a VH gaze regression model trained with a batch size of 64 for 10000 steps. The VH gaze regression model outputs the horizontal deflection angle V and the vertical deflection angle H of the gaze, and a binary classifier for eye contact is trained on V and H to produce the final eye contact detection result.
Step three, eye contact detection;
Eye contact is detected using the horizontal deflection angle V and the vertical deflection angle H of the gaze obtained from the VH gaze regression model as inputs. A binary classifier is trained using a random forest; the random forest comprises 28 decision trees, each trained on a bootstrap sample, and the binary output of the classifier is determined by a vote over all the trees. The resulting binary classifier completes the eye contact detection system.
Due to the imbalance of the dataset, the test results are evaluated using the Matthews correlation coefficient, whose value lies in [-1, 1], with larger values indicating better performance.
CN201910720061.8A 2019-08-06 2019-08-06 Eye contact communication detection method based on deep learning Active CN110647800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910720061.8A CN110647800B (en) 2019-08-06 2019-08-06 Eye contact communication detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910720061.8A CN110647800B (en) 2019-08-06 2019-08-06 Eye contact communication detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN110647800A CN110647800A (en) 2020-01-03
CN110647800B true CN110647800B (en) 2022-06-03

Family

ID=68990009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910720061.8A Active CN110647800B (en) 2019-08-06 2019-08-06 Eye contact communication detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN110647800B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116311481B (en) * 2023-05-19 2023-08-25 广州视景医疗软件有限公司 Construction method, device and storage medium of enhanced vision estimation model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106133648A (en) * 2014-03-26 2016-11-16 Microsoft Technology Licensing, LLC Eye gaze tracking based on adaptive homography
CN109389037A (en) * 2018-08-30 2019-02-26 China University of Geosciences (Wuhan) Sentiment classification method based on deep forest and transfer learning
CN109583338A (en) * 2018-11-19 2019-04-05 Shandong Paimeng Electromechanical Technology Co., Ltd. Driver visual distraction detection method based on deep fusion neural network
KR20190063582A (en) * 2017-11-30 2019-06-10 Daegu Gyeongbuk Institute of Science and Technology Method for Estimating Driver's Gaze Zone by Transfer Learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7762665B2 (en) * 2003-03-21 2010-07-27 Queen's University At Kingston Method and apparatus for communication between humans and devices

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Ye, Zhefan, et al.; Detecting eye contact using wearable eye-tracking glasses; ACM Conference on Ubiquitous Computing; 2012 *
Zhang, XC, et al.; Everyday Eye Contact Detection Using Unsupervised Gaze Target Discovery; Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology; 2017 *
Li, Bin; Research on real-time eye detection and tracking in random scenes and its application technology; China Doctoral Dissertations Full-text Database; May 2018 *

Also Published As

Publication number Publication date
CN110647800A (en) 2020-01-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant