CN110458001A - Attention-mechanism-based convolutional neural network gaze estimation method and system - Google Patents


Info

Publication number
CN110458001A
Authority
CN
China
Prior art keywords
camera
axis
image
convolutional neural
neural networks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910578161.1A
Other languages
Chinese (zh)
Inventor
李菁
钟艺豪
陈则金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang University
Original Assignee
Nanchang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-06-28
Filing date
2019-06-28
Publication date
2019-11-15
Application filed by Nanchang University
Priority to CN201910578161.1A
Publication of CN110458001A
Status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/19 Sensors therefor
    • G06V40/193 Preprocessing; Feature extraction
    • G06V40/197 Matching; Classification

Abstract

The invention discloses a convolutional neural network gaze estimation method based on an attention mechanism, comprising the following steps: Step 1: localize the facial landmarks using Constrained Local Neural Fields (CLNF); Step 2: crop the eye image using the landmark coordinates detected in Step 1; Step 3: normalize the cropped image; Step 4: feed the normalized image into the attention-mechanism convolutional neural network for regression, obtaining the estimated gaze angle coordinates. By using an attention network, the invention makes the high-level features come mainly from the pupil region, which improves accuracy and reduces error; and because landmark detection allows a lower-resolution crop, processing speed is also improved.

Description

Attention-mechanism-based convolutional neural network gaze estimation method and system
Technical field
The present invention relates to the fields of image processing and pattern recognition, and in particular to a convolutional neural network gaze estimation method and system based on an attention mechanism.
Background art
Gaze estimation is a classic problem in computer vision research. Existing methods that estimate gaze from eye images include: (1) the pupil-corneal reflection method; (2) the iris-corneoscleral limbus method; and (3) appearance-based methods using convolutional neural networks.
The main problems of these methods are: (1) head movement makes the gaze estimate inaccurate; (2) the camera must be calibrated and distances in the environment must be measured; (3) specialized, expensive hardware is required; and (4) accuracy is not high enough.
Summary of the invention
The purpose of the present invention is to provide a convolutional neural network gaze estimation method based on an attention mechanism that estimates a person's gaze simply, conveniently, and accurately.
To achieve this purpose, the invention provides the following technical solution: a convolutional neural network gaze estimation method based on an attention mechanism, comprising the following steps:
Step 1: localize the facial landmarks using Constrained Local Neural Fields (CLNF);
Step 2: crop the eye image using the landmark coordinates detected in Step 1;
Step 3: normalize the cropped image;
Step 4: feed the normalized image into the attention-mechanism convolutional neural network for regression, obtaining the estimated gaze angle coordinates.
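Step 2 only says that the eye image is cut out around the detected landmarks. The following is a minimal sketch of that crop. It assumes the eye landmarks arrive as an N × 2 array of (x, y) pixel coordinates, for example points 36-41 or 42-47 of the 68-point annotation scheme that CLNF/OpenFace outputs. It also assumes a fractional margin `pad`. The helper name `crop_eye_region` and the margin are illustrative and are not taken from the patent.

```python
import numpy as np

def crop_eye_region(image: np.ndarray, eye_points: np.ndarray, pad: float = 0.25) -> np.ndarray:
    """Crop an axis-aligned box around eye landmarks (hypothetical helper).

    image:      H x W (x C) array.
    eye_points: N x 2 array of (x, y) pixel coordinates of one eye.
    pad:        fractional margin added on each side of the tight bounding box.
    """
    x_min, y_min = eye_points.min(axis=0)
    x_max, y_max = eye_points.max(axis=0)
    w, h = x_max - x_min, y_max - y_min
    x0 = int(np.floor(x_min - pad * w))
    x1 = int(np.ceil(x_max + pad * w))
    y0 = int(np.floor(y_min - pad * h))
    y1 = int(np.ceil(y_max + pad * h))
    # Clamp to the image bounds so the slice is always valid.
    H, W = image.shape[:2]
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, W), min(y1, H)
    return image[y0:y1, x0:x1]

img = np.zeros((100, 200))
pts = np.array([[50, 40], [70, 35], [90, 40], [70, 45]])
print(crop_eye_region(img, pts).shape)  # (16, 60)
```

The size of such a crop changes with head pose and shooting distance. Step 3 exists to remove that variation.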
Preferably, the image normalization applies an affine transformation that maps the image into a normalized camera space. In this normalized camera space, the distance between the person's head and the camera is the same in every image, and so is the head pose.
Preferably, the image normalization comprises three steps:
Step 1: take the camera coordinate system as the world coordinate system. Given the eye center coordinate $e_c$ and the head pose rotation matrix $R$, first rotate the camera so that its z-axis points at the center of the two eyes. This step only requires aligning the camera z-axis with $e_c$, so the rotated camera z-axis is $r_z = e_c / \lVert e_c \rVert$;
Step 2: rotate the camera about its z-axis so that the head-pose x-axis lies in the plane spanned by the camera's x- and z-axes. The head-pose x-axis is known: it is the first row $R_x$ of the head pose rotation matrix $R$. For the rotated camera x-axis $r_x$ and $R_x$ to lie in this plane, the rotated camera y-axis $r_y$ must be perpendicular to it. $r_y$ is also perpendicular to the rotated camera z-axis $r_z$. Therefore $r_y$ is obtained from the cross product of $R_x$ and $r_z$, $r_y = R_x \times r_z$, and $r_x$ from the cross product of $r_y$ and $r_z$, $r_x = r_y \times r_z$. This gives the camera rotation matrix $R_c = [r_x, r_y, r_z]$;
Step 3: normalize the distance from the eye center to the camera center. This is done by scaling the camera z-axis, i.e., by defining the scaling matrix $S = \mathrm{diag}(1, 1, d / \lVert e_c \rVert)$, where $d$ is the normalized distance from the eye center to the camera center.
Preferably, the attention module of the attention-mechanism convolutional neural network consists of two branches:
the upper branch, called the main (trunk) branch, is composed of CNN modules;
the lower branch, called the mask branch, is a bottom-up, top-down hourglass network.
Preferably, for an input image I, denote the output of the main branch by F(I) and the output of the mask branch by A(I). The output M(I) of the attention module is then obtained from the element-wise product of F(I) and A(I): $M_c(I) = F_c(I) + F_c(I) \cdot A_c(I)$;
where $F_c(I)$ denotes the c-th channel of F(I), $A_c(I)$ denotes the c-th channel of A(I), and the symbol $\cdot$ denotes element-wise multiplication of matrices.
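Because the product in this formula is element-wise, the module output takes one line to compute. The following is a minimal NumPy sketch under three assumptions: features are stored as (C, H, W) arrays, the mask has already been squashed into [0, 1], and the function name `attention_combine` is hypothetical. The patent does not name the activation that produces the mask. A sigmoid is the usual choice in residual attention networks.

```python
import numpy as np

def attention_combine(F: np.ndarray, A: np.ndarray) -> np.ndarray:
    """M_c(I) = F_c(I) + F_c(I) * A_c(I), applied to every channel c at once.

    F: main-branch (trunk) features, shape (C, H, W).
    A: mask-branch output, same shape, values expected in [0, 1].
    The product is element-wise (Hadamard), not a matrix product.
    """
    if F.shape != A.shape:
        raise ValueError("trunk and mask outputs must have the same shape")
    return F + F * A  # equivalently (1 + A) * F

F = np.arange(8.0).reshape(2, 2, 2)
print(attention_combine(F, np.ones_like(F))[1, 1, 1])  # 14.0 (= 2 * 7)
```

Writing the output as $(1 + A) \cdot F$ shows the design choice. When the mask is zero, the module passes the trunk features through unchanged. The mask can therefore only reweight features, by a factor between 1 and 2, and never erases them. This keeps deep stacks of such modules trainable.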
Beneficial effects:
(1) The attention-mechanism convolutional neural network gaze estimation method and system of the invention use an attention network so that the high-level features are extracted mainly from the pupil region, which improves accuracy and reduces error. Landmark detection also allows a lower-resolution crop, which improves speed.
(2) The method and system of the invention are accurate (the error can be as low as 4.8°), objective, convenient (no strict laboratory environment or special equipment is needed, only an ordinary camera or smartphone), and fast.
Description of the drawings
Fig. 1 is a flow diagram of the method of the present invention.
Fig. 2 is a structural diagram of the attention-mechanism convolutional neural network of the present invention.
Fig. 3 is a structural diagram of the attention module of the present invention.
Detailed description of the embodiments
Embodiments of the present invention will be further described below with reference to the accompanying drawings.
As shown in Figs. 1 to 3, the facial landmarks are first localized using Constrained Local Neural Fields (CLNF).
However, when we crop eye images using the landmark coordinates detected by CLNF, different head poses and different shooting distances all produce different image sizes. A convolutional neural network (CNN), by contrast, usually expects inputs of a fixed size. The usual approach is to resize the image to a fixed size, but this distorts the image. In gaze estimation in particular, such scaling can seriously degrade network performance and introduce bias. To solve this problem, we introduce image normalization: an affine transformation maps the image into a normalized camera space in which the distance between the person's head and the camera is the same in every image, and so is the head pose. Specifically, the normalization consists of the following three steps:
1) Take the camera coordinate system as the world coordinate system. Given the eye center coordinate $e_c$ and the head pose rotation matrix $R$, first rotate the camera so that its z-axis points at the center of the two eyes. This step only requires aligning the camera z-axis with $e_c$, so the rotated camera z-axis is $r_z = e_c / \lVert e_c \rVert$.
2) Next, rotate the camera about its z-axis so that the head-pose x-axis lies in the plane spanned by the camera's x- and z-axes. The head-pose x-axis is known: it is the first row $R_x$ of the head pose rotation matrix $R$. For the rotated camera x-axis $r_x$ and $R_x$ to lie in this plane, the rotated camera y-axis $r_y$ must be perpendicular to it. $r_y$ is also perpendicular to the rotated camera z-axis $r_z$. Therefore $r_y$ is obtained from the cross product of $R_x$ and $r_z$:
$$r_y = R_x \times r_z \tag{1}$$
and $r_x$ from the cross product of $r_y$ and $r_z$:
$$r_x = r_y \times r_z \tag{2}$$
This gives the camera rotation matrix $R_c = [r_x, r_y, r_z]$.
3) Normalize the distance from the eye center to the camera center. This is done by scaling the camera z-axis, i.e., by defining the scaling matrix $S = \mathrm{diag}(1, 1, d / \lVert e_c \rVert)$, where $d$ is the normalized distance from the eye center to the camera center. In this application, $d = 600$ mm.
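The three normalization steps can be sketched in NumPy as follows. The sketch rests on several assumptions, and the function name `normalization_matrices` is hypothetical.

- The head x-axis is taken as the first column of `R_head`, which holds if `R_head` maps head coordinates to camera coordinates. The text says "first row", which corresponds to the transposed storage convention.
- `R_c` is stored with the new axes as rows, so that `R_c @ p` maps a camera-frame point into the normalized frame.
- The sketch computes $r_y = r_z \times R_x$, with the operands in the opposite order to Eq. (1), and normalizes it, because $R_x$ is generally not perpendicular to $r_z$.

The reason for the operand order is the frontal case. When the eyes lie on the optical axis and the head x-axis coincides with the camera x-axis, this order gives $R_c = I$. The order written in Eq. (1) gives $\mathrm{diag}(-1, -1, 1)$ in that case, which is a 180° roll of the normalized image. Which order is correct depends on the head-pose axis convention, and the text does not spell that convention out.

```python
import numpy as np

def normalization_matrices(e_c, R_head, d=600.0):
    """Return (M, R_c, S) with M = S @ R_c (sketch under stated assumptions).

    e_c:    3-vector, eye center in camera coordinates (mm).
    R_head: 3x3 head pose rotation; its first column is taken as the head
            x-axis in camera coordinates (assumed convention).
    d:      normalized eye-to-camera distance (600 mm in the text).
    """
    e_c = np.asarray(e_c, dtype=float)
    dist = np.linalg.norm(e_c)
    r_z = e_c / dist                          # step 1: z-axis points at the eye center
    h_x = np.asarray(R_head, dtype=float)[:, 0]
    r_y = np.cross(r_z, h_x)                  # step 2: perpendicular to r_z and the head x-axis
    r_y /= np.linalg.norm(r_y)                # h_x is generally not perpendicular to r_z
    r_x = np.cross(r_y, r_z)                  # unit length already; completes a right-handed frame
    R_c = np.vstack([r_x, r_y, r_z])          # rows = new axes: camera -> normalized coordinates
    S = np.diag([1.0, 1.0, d / dist])         # step 3: rescale depth so the eye sits at distance d
    return S @ R_c, R_c, S

e = np.array([100.0, 50.0, 800.0])
M, R_c, S = normalization_matrices(e, np.eye(3))
print(np.round(M @ e, 6))  # the eye center lands on the z-axis at depth 600
```

The check in the last line is a quick sanity test of the construction. Because $r_x$ and $r_y$ are orthogonal to $e_c$, `R_c @ e` equals $(0, 0, \lVert e_c \rVert)$. The scaling matrix then moves that point to depth $d$.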
These three steps yield the camera transformation matrix $M = S R_c$. In practice, the image to be normalized is warped with the transformation matrix $W = C_s M C_r^{-1}$, where $C_r$ is the intrinsic matrix of the real camera and $C_s$ is the intrinsic matrix of the virtual camera in the normalized space. After normalization, the head pose rotation matrix $R$ and the gaze vector $g$ are likewise transformed into the normalized space. In addition, the gaze vector can be converted from three-dimensional Cartesian coordinates into spherical coordinates (two angles). This reduces the problem from predicting three variables to predicting two.
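The following minimal NumPy sketch covers two of the operations described above, under stated assumptions.

- (a) The warp is taken to be $W = C_s M C_r^{-1}$ applied to homogeneous pixel coordinates. A real pipeline would resample the whole image with a perspective warp, for example OpenCV's `cv2.warpPerspective`.
- (b) The gaze vector is converted to (pitch, yaw) with the convention used in the MPIIGaze line of work: pitch = arcsin(-g_y) and yaw = arctan2(-g_x, -g_z). The patent does not give its own formula, so this convention is an assumption.

All function names below are hypothetical.

```python
import numpy as np

def warp_matrix(C_s, M, C_r):
    """W = C_s @ M @ inv(C_r): maps real-image pixels to normalized-image pixels."""
    return C_s @ M @ np.linalg.inv(C_r)

def apply_homography(W, uv):
    """Map an (N, 2) array of pixel coordinates through the 3x3 matrix W."""
    uv1 = np.hstack([uv, np.ones((uv.shape[0], 1))])  # to homogeneous coordinates
    out = uv1 @ W.T
    return out[:, :2] / out[:, 2:3]                   # perspective divide

def gaze_to_angles(g):
    """3D gaze vector -> (pitch, yaw) in radians; assumed MPIIGaze-style convention."""
    g = np.asarray(g, dtype=float)
    g = g / np.linalg.norm(g)
    return np.arcsin(-g[1]), np.arctan2(-g[0], -g[2])

def angles_to_gaze(pitch, yaw):
    """Inverse of gaze_to_angles: (pitch, yaw) -> unit 3D gaze vector."""
    return np.array([-np.cos(pitch) * np.sin(yaw),
                     -np.sin(pitch),
                     -np.cos(pitch) * np.cos(yaw)])

pitch, yaw = gaze_to_angles(angles_to_gaze(0.2, -0.3))
print(round(float(pitch), 6), round(float(yaw), 6))  # 0.2 -0.3
```

Under this convention, a gaze straight back at the camera, $g = (0, 0, -1)$, maps to pitch = yaw = 0. The network then regresses only these two angles.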
Finally, the normalized image is fed into the attention-mechanism convolutional neural network.
The attention module of this network consists of two branches. The upper branch, called the main (trunk) branch, is composed of popular CNN modules such as ResNet residual units. The lower branch, called the mask branch, is a bottom-up, top-down hourglass network. For an input image I, denote the output of the main branch by F(I) and the output of the mask branch by A(I). The output M(I) of the attention module is then obtained from the element-wise product of F(I) and A(I):
$$M_c(I) = F_c(I) + F_c(I) \cdot A_c(I) \tag{3}$$
where $F_c(I)$ denotes the c-th channel of F(I), $A_c(I)$ denotes the c-th channel of A(I), and the symbol $\cdot$ denotes element-wise multiplication of matrices.
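The mask branch is described only as a bottom-up, top-down hourglass. The following deliberately stripped-down NumPy sketch shows its shape flow. It assumes three design choices:

- 2 × 2 max pooling on the way down;
- nearest-neighbour upsampling with same-scale skip additions on the way up;
- a final sigmoid, so that A(I) lies in (0, 1).

The learned convolution and residual units that a real mask branch and trunk contain are omitted, and the trunk is replaced by the identity. The sketch therefore shows the data flow of Eq. (3), not a trained module. All names are hypothetical.

```python
import numpy as np

def max_pool2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) array; H and W must be even."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

def upsample2(x):
    """Nearest-neighbour 2x upsampling on a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_mask(x, depth=2):
    """Bottom-up (pooling) then top-down (upsampling) pass, squashed to (0, 1).

    H and W must be divisible by 2**depth.
    """
    skips, h = [], x
    for _ in range(depth):            # bottom-up: shrink resolution, widen receptive field
        skips.append(h)
        h = max_pool2(h)
    for skip in reversed(skips):      # top-down: restore resolution, add same-scale skip
        h = upsample2(h) + skip
    return sigmoid(h)

def attention_module(x):
    """Eq. (3) with an identity trunk: M = F + F * A."""
    F = x                             # placeholder for the trunk's residual units
    A = soft_mask(x)
    return F + F * A

x = np.random.default_rng(0).standard_normal((4, 8, 12))
print(attention_module(x).shape)  # (4, 8, 12)
```

The pooling path lets each mask value depend on a wide image region. This is what allows the mask to highlight where the pupil is, while the trunk keeps the full-resolution features.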
Stacking such attention modules forms a deep attention CNN. In the gaze estimation task, the attention modules begin locating the eye pupil in the image from the lowest layers. They progressively increase the weight of that location and decrease the weight of irrelevant locations, so that by the high layers the extracted features come mainly from the pupil region. Feeding these features into the output layer yields high accuracy.
In our experiments we tested ResNet-50. We changed the kernel size of the first convolutional layer from 7 × 7 to 5 × 5 to suit our small input images (36 × 224). We also replaced the final softmax layer with a fully connected layer that regresses the two gaze angles. Because gaze estimation depends on eye position, we consider that a network insensitive to position would perform worse. We call the attention network based on ResNet-50 AttentionGazeNet-Res. Loss function:
With this attention-mechanism convolutional neural network, our gaze estimation error is only 4.8°.
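To see what a 36 × 224 input means for the feature maps, the following arithmetic traces the spatial size through ResNet-50's downsampling stages. It assumes torchvision-style ResNet-50 strides: stride 2 in the stem convolution, in the 3 × 3 max pool, and in the first block of layer2 to layer4. It also assumes a padding of k // 2 for the modified 5 × 5 stem. The text states only the kernel size, so the stride and padding are assumptions, and the function names are hypothetical.

```python
def conv_out(n, k, s, p):
    """Output length along one axis of a conv/pool layer (kernel k, stride s, padding p)."""
    return (n + 2 * p - k) // s + 1

def resnet50_feature_sizes(h, w, stem_kernel=5):
    """Spatial size after each ResNet-50 stage, under the stride assumptions above."""
    p = stem_kernel // 2
    sizes = {}
    h, w = conv_out(h, stem_kernel, 2, p), conv_out(w, stem_kernel, 2, p)
    sizes["stem_conv"] = (h, w)
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)   # 3x3 max pool, stride 2
    sizes["maxpool"] = (h, w)
    sizes["layer1"] = (h, w)                              # layer1 keeps the resolution
    for name in ("layer2", "layer3", "layer4"):           # each halves it (rounding up)
        h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
        sizes[name] = (h, w)
    return sizes

for stage, size in resnet50_feature_sizes(36, 224).items():
    print(stage, size)
# stem_conv (18, 112), maxpool (9, 56), layer1 (9, 56),
# layer2 (5, 28), layer3 (3, 14), layer4 (2, 7)
```

Under these assumptions, the last stage produces a 2 × 7 map with 2048 channels. Global average pooling reduces it to a 2048-dimensional vector for the fully connected layer that regresses the two angles. With the original 7 × 7 kernel and padding 3 the sizes come out the same, so the change mainly shrinks the receptive field of the first layer.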
Specific embodiments of the present invention have been described in detail above, but they are merely examples, and the present invention is not limited to the specific embodiments described. For those skilled in the art, any equivalent modification or substitution of the present invention also falls within the scope of the invention. Therefore, equivalent changes and modifications made without departing from the spirit and scope of the invention are all covered by the scope of the present invention.

Claims (5)

1. A convolutional neural network gaze estimation method based on an attention mechanism, characterized by comprising the following steps:
Step 1: localize the facial landmarks using Constrained Local Neural Fields (CLNF);
Step 2: crop the eye image using the landmark coordinates detected in Step 1;
Step 3: normalize the cropped image;
Step 4: feed the normalized image into the attention-mechanism convolutional neural network for regression, obtaining the estimated gaze angle coordinates.
2. The convolutional neural network gaze estimation method based on an attention mechanism according to claim 1, characterized in that:
the image normalization applies an affine transformation that maps the image into a normalized camera space. In this normalized camera space, the distance between the person's head and the camera is the same in every image, and so is the head pose.
3. The convolutional neural network gaze estimation method based on an attention mechanism according to claim 2, characterized in that:
the image normalization comprises three steps:
Step 1: take the camera coordinate system as the world coordinate system. Given the eye center coordinate $e_c$ and the head pose rotation matrix $R$, first rotate the camera so that its z-axis points at the center of the two eyes. This step only requires aligning the camera z-axis with $e_c$, so the rotated camera z-axis is $r_z = e_c / \lVert e_c \rVert$;
Step 2: rotate the camera about its z-axis so that the head-pose x-axis lies in the plane spanned by the camera's x- and z-axes. The head-pose x-axis is known: it is the first row $R_x$ of the head pose rotation matrix $R$. For the rotated camera x-axis $r_x$ and $R_x$ to lie in this plane, the rotated camera y-axis $r_y$ must be perpendicular to it. $r_y$ is also perpendicular to the rotated camera z-axis $r_z$. Therefore $r_y$ is obtained from the cross product of $R_x$ and $r_z$, $r_y = R_x \times r_z$, and $r_x$ from the cross product of $r_y$ and $r_z$, $r_x = r_y \times r_z$. This gives the camera rotation matrix $R_c = [r_x, r_y, r_z]$;
Step 3: normalize the distance from the eye center to the camera center. This is done by scaling the camera z-axis, i.e., by defining the scaling matrix $S = \mathrm{diag}(1, 1, d / \lVert e_c \rVert)$, where $d$ is the normalized distance from the eye center to the camera center.
4. The convolutional neural network gaze estimation method based on an attention mechanism according to claim 1, characterized in that:
the attention module of the attention-mechanism convolutional neural network consists of two branches;
the upper branch, called the main (trunk) branch, is composed of CNN modules;
the lower branch, called the mask branch, is a bottom-up, top-down hourglass network.
5. The convolutional neural network gaze estimation method based on an attention mechanism according to claim 4, characterized in that:
for an input image I, the output of the main branch is denoted F(I) and the output of the mask branch is denoted A(I). The output M(I) of the attention module is obtained from the element-wise product of F(I) and A(I): $M_c(I) = F_c(I) + F_c(I) \cdot A_c(I)$;
where $F_c(I)$ denotes the c-th channel of F(I), $A_c(I)$ denotes the c-th channel of A(I), and the symbol $\cdot$ denotes element-wise multiplication of matrices.
CN201910578161.1A 2019-06-28 2019-06-28 Attention-mechanism-based convolutional neural network gaze estimation method and system Pending CN110458001A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910578161.1A (CN110458001A) | 2019-06-28 | 2019-06-28 | Attention-mechanism-based convolutional neural network gaze estimation method and system

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910578161.1A (CN110458001A) | 2019-06-28 | 2019-06-28 | Attention-mechanism-based convolutional neural network gaze estimation method and system

Publications (1)

Publication Number | Publication Date
CN110458001A | 2019-11-15

Family

ID=68481733

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910578161.1A (CN110458001A, Pending) | Attention-mechanism-based convolutional neural network gaze estimation method and system | 2019-06-28 | 2019-06-28

Country Status (1)

Country | Link
CN (1) | CN110458001A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523480A (en) * 2020-04-24 2020-08-11 北京嘀嘀无限科技发展有限公司 Method and device for detecting face obstruction, electronic equipment and storage medium
CN112259119A (en) * 2020-10-19 2021-01-22 成都明杰科技有限公司 Music source separation method based on stacked hourglass network
CN112417991A (en) * 2020-11-02 2021-02-26 武汉大学 Double-attention face alignment method based on hourglass capsule network
CN113095274A (en) * 2021-04-26 2021-07-09 中山大学 Sight estimation method, system, device and storage medium
CN113468971A (en) * 2021-06-04 2021-10-01 南昌大学 Variational fixation estimation method based on appearance
CN113505694A (en) * 2021-07-09 2021-10-15 南开大学 Human-computer interaction method and device based on sight tracking and computer equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN106951875A * 2017-03-24 2017-07-14 深圳市唯特视科技有限公司 Method for human pose estimation and face alignment based on binary convolution
EP3203416A1 (en) * 2016-02-05 2017-08-09 IDscan Biometrics Limited Method computer program and system for facial recognition
CN108564016A * 2018-04-04 2018-09-21 北京红云智胜科技有限公司 AU classification system and method based on computer vision
US20190147607A1 (en) * 2017-11-15 2019-05-16 Toyota Research Institute, Inc. Systems and methods for gaze tracking from arbitrary viewpoints

Non-Patent Citations (5)

Title
Fei Wang, "Residual Attention Network for Image Classification," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
Tadas Baltrusaitis, "OpenFace 2.0: Facial Behavior Analysis Toolkit," 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition *
Tadas Baltrusaitis, "Constrained Local Neural Fields for Robust Facial Landmark Detection in the Wild," 2013 IEEE International Conference on Computer Vision Workshops *
Xucong Zhang et al., "Appearance-Based Gaze Estimation in the Wild," IEEE *
Yusuke Sugano et al., "Learning by Synthesis for Appearance-Based 3D Gaze Estimation," in Proc. CVPR *

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN111523480A (en) * 2020-04-24 2020-08-11 北京嘀嘀无限科技发展有限公司 Method and device for detecting face obstruction, electronic equipment and storage medium
CN111523480B (en) * 2020-04-24 2021-06-18 北京嘀嘀无限科技发展有限公司 Method and device for detecting face obstruction, electronic equipment and storage medium
CN112259119A (en) * 2020-10-19 2021-01-22 成都明杰科技有限公司 Music source separation method based on stacked hourglass network
CN112417991A (en) * 2020-11-02 2021-02-26 武汉大学 Double-attention face alignment method based on hourglass capsule network
CN112417991B (en) * 2020-11-02 2022-04-29 武汉大学 Double-attention face alignment method based on hourglass capsule network
CN113095274A (en) * 2021-04-26 2021-07-09 中山大学 Sight estimation method, system, device and storage medium
CN113095274B (en) * 2021-04-26 2024-02-09 中山大学 Sight estimation method, system, device and storage medium
CN113468971A (en) * 2021-06-04 2021-10-01 南昌大学 Variational fixation estimation method based on appearance
CN113505694A (en) * 2021-07-09 2021-10-15 南开大学 Human-computer interaction method and device based on sight tracking and computer equipment
CN113505694B (en) * 2021-07-09 2024-03-26 南开大学 Man-machine interaction method and device based on sight tracking and computer equipment

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191115