CN110647800B - Eye contact communication detection method based on deep learning - Google Patents

Eye contact communication detection method based on deep learning

Info

Publication number
CN110647800B
CN110647800B
Authority
CN
China
Prior art keywords
training
sight
model
regression model
deep learning
Prior art date
Legal status
Active
Application number
CN201910720061.8A
Other languages
Chinese (zh)
Other versions
CN110647800A (en)
Inventor
张宏
李碧蓉
何力
管贻生
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201910720061.8A
Publication of CN110647800A
Application granted
Publication of CN110647800B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18: Eye characteristics, e.g. of the iris
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation

Abstract

The invention discloses an eye contact detection method based on deep learning. First, a deep-learning gaze regression model, the VH model, is trained; the model outputs deflection angles in the horizontal and vertical directions to represent the gaze direction of the eyes. The resulting gaze direction is then passed to a binary classifier to judge whether eye contact occurs. Because the Columbia Gaze dataset provides fine-grained gaze directions and head poses, it was chosen for training and testing, and the method reaches an MCC of 0.92 on Columbia Gaze. The training process comprises four key stages: acquiring face images, image enhancement, training the gaze regression model, and training the eye contact detector. The testing process is divided into three stages: acquiring face images, gaze direction regression, and eye contact detection.

Description

Eye contact communication detection method based on deep learning
Technical Field
The invention relates to the technical field of computer vision detection, and in particular to an eye contact detection method based on deep learning.
Background
Eye contact is a primary form of non-verbal communication and plays a vital role in people's daily social interaction. It is often used to receive information, adjust the atmosphere of an interaction, and express emotion. While it is difficult for a person to determine the exact angle at which someone else is looking at them, it is easy to tell whether, and when, someone is looking at them. Eye contact detection therefore has applications in human-computer interaction, driver distraction detection, life logging, screening for autism in children, user interfaces, and eye tracking on terminal devices.
Today, most gaze-based interactive systems rely on gaze tracking. They can obtain the user's gaze direction but cannot directly perceive whether eye contact is occurring. Although eye tracking has been widely studied and some methods achieve relatively high accuracy, they have limitations, such as the need for infrared devices, head-mounted devices, or complicated calibration, which restrict practical applications.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an eye contact detection method based on deep learning that, by combining computer vision, is free of constraints, calibration, and any external equipment.
The purpose of the invention is realized by the following technical scheme:
An eye contact detection method based on deep learning comprises the following steps:
Step one, face image processing;
First, the human face needs to be located in each image to avoid interference. All input images are processed with the facial landmark detector of OpenFace 2.0; the detector is robust to illumination and background and maintains accurate localization of feature points. During training, once a face is detected, it is aligned by affine transformation, and the face region is then cropped to a fixed-size face image with a resolution of 224×224.
Considering that the positive and negative samples of the dataset are severely imbalanced, with far more negative samples than positive ones, the Synthetic Minority Oversampling Technique (SMOTE) is adopted to increase the positive samples, reduce the gap between the two sample groups, and avoid model overfitting. Meanwhile, to improve the generalization ability of the model, the training dataset is expanded during training using Gaussian blur and brightness adjustment.
Step two, gaze direction regression;
A deep-learning gaze regression model is designed and trained on the fixed-size 224×224 face images obtained in step one to obtain the VH gaze regression model. The residual network ResNet-50 is selected to train the model, with the learning rate set to 0.01 and mean squared error (MSE) as the loss function, finally yielding a VH gaze regression model trained with a batch size of 64 for 10000 steps. The VH gaze regression model outputs the deflection angles (V, H) of the gaze in the horizontal and vertical directions; an eye contact binary classifier is trained on (V, H) to produce the final eye contact detection result.
Step three, eye contact detection;
Eye contact is detected using the gaze direction (V, H) obtained from the VH gaze regression model as input. A binary classifier is trained using a random forest; the random forest comprises 28 decision trees, each trained on a bootstrap sample, and the binary output of the classifier is determined by a vote over all the trees. The resulting binary classifier completes the eye contact detection system.
Due to the imbalance of the dataset, the test results are evaluated using the Matthews correlation coefficient (MCC), whose value lies in [-1, 1], with larger values indicating better performance.
Compared with the prior art, the invention has the following beneficial effects:
the method has strong generalization capability and outstanding cross-data set performance; the MCC value is greatly improved; no extra debugging and equipment are needed, and the running environment is not limited.
Drawings
FIG. 1 is a flow chart of the system of the present invention.
Detailed Description
The present invention is described in further detail below with reference to examples and the drawings, but is not limited thereto.
As shown in fig. 1, an eye contact detection method based on deep learning includes the following steps:
Step one, face image processing;
First, the human face needs to be located in each image to avoid interference. All input images are processed with the facial landmark detector of OpenFace 2.0; the detector is robust to illumination and background and maintains accurate localization of feature points. During training, once a face is detected, it is aligned by affine transformation, and the face region is then cropped to a fixed-size face image with a resolution of 224×224.
Considering that the positive and negative samples of the dataset are severely imbalanced, with far more negative samples than positive ones, the Synthetic Minority Oversampling Technique (SMOTE) is adopted to increase the positive samples, reduce the gap between the two sample groups, and avoid model overfitting. Meanwhile, to improve the generalization ability of the model, the training dataset is expanded during training using Gaussian blur and brightness adjustment.
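The oversampling step above can be sketched as a minimal SMOTE implementation in NumPy: each synthetic positive sample is an interpolation between a minority-class sample and one of its k nearest minority-class neighbors. The feature dimensionality, sample counts, and helper name below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating
    each chosen sample toward one of its k nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each point from its own neighbors
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbors
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))         # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

# toy imbalanced setting: 20 positives to be brought up to 100
rng = np.random.default_rng(0)
X_pos = rng.normal(0.0, 1.0, (20, 2))
X_syn = smote(X_pos, n_new=80, rng=1)
X_balanced_pos = np.vstack([X_pos, X_syn])
print(X_balanced_pos.shape)  # (100, 2)
```

Because every synthetic point is a convex combination of two real positives, the new samples stay inside the minority class's region of feature space, which is what reduces the gap between the two sample groups without duplicating data.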
Step two, gaze direction regression;
A deep-learning gaze regression model is designed and trained on the fixed-size 224×224 face images obtained in step one to obtain the VH gaze regression model. The residual network ResNet-50 is selected to train the model, with the learning rate set to 0.01 and mean squared error (MSE) as the loss function, finally yielding a VH gaze regression model trained with a batch size of 64 for 10000 steps. The VH gaze regression model outputs the deflection angles (V, H) of the gaze in the horizontal and vertical directions; an eye contact binary classifier is trained on (V, H) to produce the final eye contact detection result.
Step three, eye contact detection;
Eye contact is detected using the gaze direction (V, H) obtained from the VH gaze regression model as input. A binary classifier is trained using a random forest; the random forest comprises 28 decision trees, each trained on a bootstrap sample, and the binary output of the classifier is determined by a vote over all the trees. The resulting binary classifier completes the eye contact detection system.
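The classifier stage can be sketched with scikit-learn (an assumed library; the patent specifies only a 28-tree random forest with bootstrap sampling and majority voting). The two gaze angles (V, H) are the only features; the synthetic labeling rule below, eye contact when both deflection angles are small, is purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# synthetic (V, H) gaze angles in degrees; label 1 ("eye contact")
# when both deflection angles are small -- an illustrative rule only
X = rng.uniform(-30, 30, size=(1000, 2))
y = (np.abs(X) < 8).all(axis=1).astype(int)

# 28 decision trees, each fit on a bootstrap sample; the prediction
# is decided by a vote over all trees
clf = RandomForestClassifier(n_estimators=28, bootstrap=True, random_state=0)
clf.fit(X, y)

print(clf.predict([[1.0, -2.0], [25.0, 20.0]]))
```

Feeding only the two regressed angles into the forest keeps the classifier tiny and fast, since the heavy lifting (image to gaze direction) has already been done by the ResNet-50 regressor.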
Due to the imbalance of the dataset, the test results are evaluated using the Matthews correlation coefficient (MCC), whose value lies in [-1, 1], with larger values indicating better performance. In the tests, the average MCC of the system reaches 0.92, much higher than that of other systems. To examine performance in detail, results are given for head pose deflections of 0°, ±15°, and ±30°; head poses of 0° and ±15° occur frequently in social interaction, while ±30° is generally considered an extreme pose in gaze studies. The system achieves MCC values as high as 0.94 at both 0° and ±15°, and reaches 0.84 at the extreme pose of ±30°.
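The Matthews correlation coefficient used for evaluation is computed directly from the binary confusion matrix, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal NumPy version (the helper name is ours, not from the patent):

```python
import numpy as np

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # conventionally 0 when any confusion-matrix margin is empty
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

print(mcc([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0  (perfect agreement)
print(mcc([1, 1, 0, 0], [0, 0, 1, 1]))  # -1.0 (total disagreement)
```

Unlike accuracy, MCC accounts for all four cells of the confusion matrix, which is why it is the appropriate metric when negatives far outnumber positives, as in this dataset.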
In general, the eye contact detection method based on deep learning disclosed by the invention first trains a deep-learning gaze regression model, the VH model, which outputs deflection angles in the horizontal and vertical directions to represent the gaze direction of the eyes; the gaze direction result is then passed to a binary classifier to judge whether eye contact occurs. Because Columbia Gaze provides fine-grained gaze directions and head poses, it was chosen as the dataset for training and testing, and the method reaches an MCC of 0.92 on Columbia Gaze. The training process includes four key stages: acquiring face images, image enhancement, training the gaze regression model, and training the eye contact detector. The testing process is divided into three stages: acquiring face images, gaze direction regression, and eye contact detection. In this example, 80% of the dataset was used for training and 20% for testing.
The method has strong generalization ability and outstanding cross-dataset performance; the MCC value is greatly improved; and no extra calibration or equipment is needed, with no restriction on the running environment.
The present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents and are included in the scope of the present invention.

Claims (1)

1. An eye contact detection method based on deep learning, characterized by comprising the following steps:
Step one, face image processing;
First, the human face needs to be located in each image to avoid interference. All input images are processed with the facial landmark detector of OpenFace 2.0; the detector is robust to illumination and background and maintains accurate localization of feature points. During training, once a face is detected, it is aligned by affine transformation, and the face region is then cropped to a fixed-size face image with a resolution of 224×224.
Considering that the positive and negative samples of the dataset are severely imbalanced, with far more negative samples than positive ones, the Synthetic Minority Oversampling Technique (SMOTE) is adopted to increase the positive samples, reduce the gap between the two sample groups, and avoid model overfitting. Meanwhile, to improve the generalization ability of the model, the training dataset is expanded during training using Gaussian blur and brightness adjustment.
Step two, gaze direction regression;
A deep-learning gaze regression model is designed and trained on the fixed-size 224×224 face images obtained in step one to obtain the VH gaze regression model. The residual network ResNet-50 is selected to train the model, with the learning rate set to 0.01 and mean squared error (MSE) as the loss function, finally yielding a VH gaze regression model trained with a batch size of 64 for 10000 steps. The VH gaze regression model outputs the horizontal deflection angle V and the vertical deflection angle H of the gaze, and a binary classifier for eye contact is trained on V and H to produce the final eye contact detection result.
Step three, eye contact detection;
Eye contact is detected using the horizontal deflection angle V and the vertical deflection angle H of the gaze obtained from the VH gaze regression model as inputs. A binary classifier is trained using a random forest; the random forest comprises 28 decision trees, each trained on a bootstrap sample, and the binary output of the classifier is determined by a vote over all the trees. The resulting binary classifier completes the eye contact detection system.
Due to the imbalance of the dataset, the test results are evaluated using the Matthews correlation coefficient, whose value lies in [-1, 1], with larger values indicating better performance.
CN201910720061.8A 2019-08-06 2019-08-06 Eye contact communication detection method based on deep learning Active CN110647800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910720061.8A CN110647800B (en) 2019-08-06 2019-08-06 Eye contact communication detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910720061.8A CN110647800B (en) 2019-08-06 2019-08-06 Eye contact communication detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN110647800A CN110647800A (en) 2020-01-03
CN110647800B true CN110647800B (en) 2022-06-03

Family

ID=68990009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910720061.8A Active CN110647800B (en) 2019-08-06 2019-08-06 Eye contact communication detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN110647800B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116311481B (en) * 2023-05-19 2023-08-25 广州视景医疗软件有限公司 Construction method, device and storage medium of enhanced vision estimation model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106133648A (en) * 2014-03-26 2016-11-16 Microsoft Technology Licensing, LLC Eye gaze tracking based on adaptive homography
CN109389037A (en) * 2018-08-30 2019-02-26 China University of Geosciences (Wuhan) Sentiment classification method based on deep forest and transfer learning
CN109583338A (en) * 2018-11-19 2019-04-05 Shandong Paimeng Electromechanical Technology Co., Ltd. Driver visual distraction detection method based on deep fusion neural network
KR20190063582A (en) * 2017-11-30 2019-06-10 Daegu Gyeongbuk Institute of Science and Technology Method for Estimating Driver's Gaze Zone by Transfer Learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7762665B2 (en) * 2003-03-21 2010-07-27 Queen's University At Kingston Method and apparatus for communication between humans and devices

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Ye, Zhefan, et al.; Detecting eye contact using wearable eye-tracking glasses; ACM Conference on Ubiquitous Computing; 2012 *
Zhang, XC, et al.; Everyday Eye Contact Detection Using Unsupervised Gaze Target Discovery; Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology; 2017 *
Li, Bin; Research on real-time eye detection and tracking in random scenes and its application technology; China Doctoral Dissertations Full-text Database; May 2018 *

Also Published As

Publication number Publication date
CN110647800A (en) 2020-01-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant