CN109508679A - Method, apparatus, device and storage medium for realizing three-dimensional gaze tracking of the eyeball - Google Patents
- Publication number
- CN109508679A
- Authority
- CN
- China
- Prior art keywords
- eyeball
- facial image
- head pose
- network
- dimensional
- Legal status: Granted (as listed by Google Patents, which notes that the status is an assumption and not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
Abstract
The invention discloses a method, apparatus, device and computer-readable storage medium for realizing three-dimensional gaze tracking of the eyeball. The method comprises: inputting a face image to be detected into a pre-built head pose detection network to obtain the head pose in the face image; inputting the face image into a pre-built eye movement detection network to obtain the eye movement in the face image; and inputting the head pose and the eye movement into a pre-built three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image. The method, apparatus, device and computer-readable storage medium provided by the present invention can extract the three-dimensional gaze direction vector of the photographed subject's eyes from a two-dimensional face image and have a wide range of application scenarios.
Description
Technical field
The present invention relates to the technical field of eye tracking, and in particular to a method, apparatus, device and computer-readable storage medium for realizing three-dimensional gaze tracking of the eyeball.
Background art
Research on eye-tracking algorithms has already produced fairly mature results, and eye tracking has been successfully deployed in many commercial applications such as VR/AR. Although traditional eye-tracking technology can achieve high precision, current eye-tracking algorithms are essentially based on traditional image processing methods: they depend on expensive infrared equipment and require a special detection device mounted on the head to detect eyeball features. The detection accuracy of traditional image processing methods is affected by lighting changes, and the detection distance is severely constrained. An algorithm that can perform eye tracking from RGB images captured by an ordinary camera is therefore urgently needed. In computer vision, deep convolutional neural networks have achieved great success in many areas, such as object detection and instance segmentation.
The prior art also includes deep-learning-based eye-tracking technology, with the following steps: acquire retinopathy image data; annotate the retinopathy image data to obtain labeled data; build an initial deep learning network; input the retinopathy image data into the initial deep learning network and output the corresponding prediction data; use a loss function to compare the labeled data of the retinopathy image data with the prediction data to obtain a comparison result; adjust the parameters of the initial deep learning network according to the comparison result until the comparison result reaches a preset threshold, yielding the final deep learning network model; and process retinopathy image data to be measured with the deep learning network model to obtain the corresponding eyeball center coordinates and eyeball diameter.
Existing eye-tracking technology therefore falls into two categories. The first is based on traditional image processing algorithms. Although such algorithms already have fairly mature commercial applications, their detection accuracy is affected by lighting changes, they depend on expensive head-mounted infrared equipment that is inconvenient to wear, and their detection distance is constrained. The second is based on deep learning; however, existing deep-learning-based eye-tracking algorithms can only detect the eyeball center position and eyeball diameter, which contain only two-dimensional information about eye movement, so their application scenarios are limited.
In summary, obtaining the three-dimensional gaze direction vector of the eyeball from a two-dimensional face image is a problem that currently needs to be solved.
Summary of the invention
The object of the present invention is to provide a method, apparatus, device and computer-readable storage medium for realizing three-dimensional gaze tracking of the eyeball, so as to solve the problem that deep-learning-based eye-tracking algorithms in the prior art can only detect two-dimensional information about the eyeball.
To solve the above technical problem, the present invention provides a method for realizing three-dimensional gaze tracking of the eyeball, comprising: inputting a face image to be detected into a pre-built head pose detection network to obtain the head pose in the face image; inputting the face image into a pre-built eye movement detection network to obtain the eye movement in the face image; and inputting the head pose and the eye movement into a pre-built three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image.
Preferably, before inputting the face image to be detected into the pre-built head pose detection network to obtain the head pose in the face image, the method further comprises:
Collecting a number of face images carrying three-dimensional labels of head pose and eye gaze to construct a face image data set, wherein the face images are RGB images;
Constructing an initial head pose detection network and an initial eye movement detection network;
Training the initial head pose detection network and the initial eye movement detection network, respectively, with the face image data set, to obtain the trained head pose detection network and eye movement detection network.
Preferably, collecting a number of face images carrying three-dimensional labels of head pose and eye gaze to construct the face image data set comprises:
Capturing face images of a data provider with each camera of a planar camera array, respectively, to obtain a first face image subset;
The face images captured by each row of cameras in the planar camera array represent different head poses of the data provider in the y (yaw) direction;
The face images captured by each column of cameras in the planar camera array represent different head poses of the data provider in the p (pitch) direction;
Rotating the face images captured by the planar camera array clockwise and counterclockwise, respectively, to obtain a second face image subset representing different head poses of the data provider in the r (roll) direction;
Merging the first face image subset and the second face image subset to obtain the face image data set.
Preferably, capturing face images of the data provider with each camera of the planar camera array comprises:
When each face image is captured, recording the dynamic point on the display screen at which the data provider's eyes are looking, thereby determining the three-dimensional vector label of the data provider's gaze, and simultaneously recording the head pose in each face image.
Preferably, constructing the initial head pose detection network comprises:
Constructing the initial head pose detection network with the AlexNet model as the basic structure, the network structure of the initial head pose detection network being:
C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3);
where C(k,s,c) denotes a convolutional layer with kernel size k, stride s and c channels; P(k,s) denotes a max-pooling layer with kernel size k and stride s; BN denotes batch normalization; PReLU denotes the PReLU activation function; and FC(n) denotes a fully connected layer with n neurons.
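The layer notation above is regular enough to parse mechanically, which is a quick way to check a structure string for consistency. The token `PReLU(3,1,16)` in the listed structure gives PReLU parameters that the notation does not define; this sketch assumes it stands for `PReLU-C(3,1,16)`, i.e. a fifth 3×3 convolution with 16 channels, matching AlexNet's five convolutions. The names `HEAD_NET_SPEC` and `parse_spec` are hypothetical.

```python
import re
from typing import List, Tuple

# Head pose network as listed, with the fifth convolution read as C(3,1,16) (assumption).
HEAD_NET_SPEC = (
    "C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-"
    "PReLU-C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3)"
)

# Number of integer parameters each layer type takes in the C/P/BN/PReLU/FC notation.
ARITY = {"C": 3, "P": 2, "FC": 1, "BN": 0, "PReLU": 0}
# Longer names first so that "PReLU" is not mistaken for "P".
TOKEN = re.compile(r"(PReLU|FC|BN|C|P)(?:\((\d+(?:,\d+)*)\))?$")


def parse_spec(spec: str) -> List[Tuple[str, Tuple[int, ...]]]:
    """Split a spec such as 'C(3,1,6)-BN-PReLU' into (layer_type, params) pairs."""
    layers = []
    for token in spec.split("-"):
        match = TOKEN.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognised layer token: {token!r}")
        kind, args = match.group(1), match.group(2)
        params = tuple(int(a) for a in args.split(",")) if args else ()
        if len(params) != ARITY[kind]:
            raise ValueError(f"{kind} takes {ARITY[kind]} parameters, got {token!r}")
        layers.append((kind, params))
    return layers
```

For example, `parse_spec(HEAD_NET_SPEC)` yields 21 layers, five of them convolutions with 6, 16, 24, 24 and 16 channels, while `parse_spec("PReLU(3,1,16)")` raises `ValueError`.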
Preferably, training the initial head pose detection network and the initial eye movement detection network, respectively, with the face image data set comprises:
Training the initial head pose detection network and the initial eye movement detection network with the face image data set;
wherein the loss function $\mathrm{Loss}_1 = \mathrm{Loss}_h + \mathrm{Loss}_e$ is the sum of the loss function $\mathrm{Loss}_h$ of the initial head pose detection network and the loss function $\mathrm{Loss}_e$ of the initial eye movement detection network.
Preferably, before inputting the head pose and the eye movement into the pre-built three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image, the method further comprises:
Detecting the face images in the face image data set with the head pose detection network and the eye movement detection network, respectively, to obtain the head pose and eye movement of each face image;
Training a pre-established initial three-dimensional gaze vector detection network with the head pose and eye movement of each face image, to obtain the trained three-dimensional gaze vector detection network;
wherein the current loss function $\mathrm{Loss}_2 = \mathrm{Loss}_1 + \mathrm{Loss}_g = \mathrm{Loss}_h + \mathrm{Loss}_e + \mathrm{Loss}_g$ is the sum of the loss function $\mathrm{Loss}_1$ and the loss function $\mathrm{Loss}_g$ of the initial three-dimensional gaze vector detection network.
The present invention also provides an apparatus for realizing three-dimensional gaze tracking of the eyeball, comprising:
A head pose detection module, configured to input a face image to be detected into the pre-built head pose detection network to obtain the head pose in the face image;
An eye movement detection module, configured to input the face image into the pre-built eye movement detection network to obtain the eye movement in the face image;
A three-dimensional gaze detection module, configured to input the head pose and the eye movement into the pre-built three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image.
The present invention also provides a device for realizing three-dimensional gaze tracking of the eyeball, comprising:
A memory, configured to store a computer program; and a processor, configured to implement the steps of the above method for realizing three-dimensional gaze tracking of the eyeball when executing the computer program.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for realizing three-dimensional gaze tracking of the eyeball.
In the method for realizing three-dimensional gaze tracking of the eyeball provided by the present invention, a face image to be detected is input into the pre-built head pose detection network to obtain the head pose in the face image, and the face image is input into the pre-built eye movement detection network to obtain the eye movement in the face image. The head pose and the eye movement are then input into the pre-built three-dimensional gaze vector detection network, which obtains the three-dimensional gaze direction vector of the eyeball in the face image from geometric constraints through a gaze conversion network. The eye-tracking method provided by the present invention is based on deep learning networks: it extracts the head pose and eye movement of the photographed subject from a two-dimensional face image and inputs them into the pre-trained three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the subject's eyes in the face image. The method therefore has wide application: the three-dimensional gaze direction obtained from a face image can be used in safe-driving monitoring, human-computer interaction, psychological research and other fields. This solves the problem that prior-art eye tracking based on deep neural networks can only detect the eyeball center position and eyeball diameter and therefore lacks broad application scenarios. Correspondingly, the apparatus, device and computer-readable storage medium provided by the present invention have the same beneficial effects.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a first specific embodiment of the method for realizing three-dimensional gaze tracking of the eyeball provided by the present invention;
Fig. 2 is a flowchart of a second specific embodiment of the method for realizing three-dimensional gaze tracking of the eyeball provided by the present invention;
Fig. 3 is a structural block diagram of an apparatus for realizing three-dimensional gaze tracking of the eyeball provided by an embodiment of the present invention.
Detailed description of the embodiments
The core of the present invention is to provide a method, apparatus, device and computer-readable storage medium for realizing three-dimensional gaze tracking of the eyeball, which can obtain the three-dimensional gaze vector of the eyeball from a two-dimensional face image and have wide application scenarios.
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a flowchart of the first specific embodiment of the method for realizing three-dimensional gaze tracking of the eyeball provided by the present invention; the specific steps are as follows:
Step S101: input the face image to be detected into the pre-built head pose detection network to obtain the head pose in the face image;
Before the face image to be detected is input into the pre-built head pose detection network to obtain its head pose, a number of face images carrying three-dimensional labels of head pose and eye gaze are first collected to construct a face image data set; an initial head pose detection network and an initial eye movement detection network are constructed; and the two initial networks are trained, respectively, with the face image data set to obtain the trained head pose detection network and eye movement detection network.
For the initial head pose detection network and the initial eye movement detection network to generalize well, the face image data set collected in this embodiment should have the following characteristics: a) a broad distribution that covers as many head poses and eye movements as possible, with images taken under different light intensities and even with reflections from glasses; b) three-dimensional labels of head pose and eye gaze; c) the face images in the data set are preferably ordinary RGB images rather than depending on a specific camera device.
To give the face image data set a broad distribution, this embodiment uses a 3 × 4 camera array, in which different camera viewpoints represent different head poses. However, a planar camera array can only represent head pose differences in two directions, (y, p). To obtain differences in the r direction, the captured face images are rotated clockwise and counterclockwise, respectively, to represent the head tilting sideways. For each head pose, the label $(y_{GT}, p_{GT}, r_{GT})$ is given by the position of the camera in the array and the angle of image rotation.
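The labeling scheme for the two image subsets can be sketched as follows. Only the 3 × 4 layout comes from the text; the camera viewing angles and rotation angles below are hypothetical placeholders, the sign convention (counterclockwise positive) is assumed, and only the labels are generated, not the rotated images. As described, the in-plane rotation angle is used directly as the roll label.

```python
from itertools import product
from typing import Dict, List

# Hypothetical viewing angles in degrees for the 3 x 4 array: cameras within a
# row differ in yaw y, cameras within a column differ in pitch p.
COLUMN_YAWS = [-30.0, -10.0, 10.0, 30.0]
ROW_PITCHES = [-20.0, 0.0, 20.0]
# Hypothetical in-plane rotation angles; positive means counterclockwise.
ROTATIONS = [-20.0, -10.0, 10.0, 20.0]

Sample = Dict[str, object]


def first_subset() -> List[Sample]:
    """One unrotated sample per camera; the label is (y_GT, p_GT, r_GT) with r_GT = 0."""
    return [
        {"camera": (row, col), "rotation": 0.0, "label": (yaw, pitch, 0.0)}
        for (row, pitch), (col, yaw) in product(enumerate(ROW_PITCHES), enumerate(COLUMN_YAWS))
    ]


def second_subset(base: List[Sample]) -> List[Sample]:
    """Rotated copies of each base sample; the rotation angle becomes the roll label."""
    out = []
    for sample in base:
        yaw, pitch, _ = sample["label"]
        for angle in ROTATIONS:
            out.append({"camera": sample["camera"], "rotation": angle,
                        "label": (yaw, pitch, angle)})
    return out


def build_dataset() -> List[Sample]:
    """Merge the first and second subsets into one labeled data set."""
    base = first_subset()
    return base + second_subset(base)
```

With these placeholder values the merged set holds 12 unrotated and 48 rotated samples per capture session.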
To obtain richer eye movements, while the face image data set is being collected the data provider follows a dynamic point on the display screen with their eyes. The dynamic point shows random letters, and the data provider must identify the letters; this ensures that the data provider's eyes are actually fixating the dynamic point and thus that the labels are accurate. Different eye movements are obtained in this way, and for each fixation the gaze vector label $(\phi_{GT}, \theta_{GT})$ is recorded. The head pose and the three-dimensional vector label of the corresponding eye gaze in every face image are recorded while the face image data set is collected.
In this embodiment, collecting the face image data set requires only RGB face images and no other special equipment. Compared with prior art that depends on expensive head-mounted infrared equipment, this reduces the application cost and, because the head is free and unconstrained, is more convenient.
Before constructing the initial head pose detection network, the initial eye movement detection network and the initial three-dimensional gaze vector detection network, the geometric analysis and coordinate systems used in this embodiment are described first. This embodiment uses two coordinate systems, the head coordinate system $(X_h, Y_h, Z_h)$ and the camera coordinate system $(X_c, Y_c, Z_c)$; $g$ denotes the gaze vector. To simplify the representation of head pose, the embodiment of the present invention represents it with three rotation angles $(y, p, r)$, where $y$ is the yaw angle (rotation about the $Y_h$ axis), $p$ is the pitch angle (rotation about the $X_h$ axis), and $r$ is the roll angle (rotation about the $Z_h$ axis). Eye movement is represented in a two-dimensional spherical coordinate system $(\theta, \phi)$, where $\theta$ and $\phi$ are the angles of the gaze vector relative to the head coordinate system in the horizontal and vertical directions, respectively.
In the head coordinate system, the eye movement is described by the gaze vector as follows:
$$g_h = \left[-\cos(\phi)\sin(\theta),\ \sin(\phi),\ -\cos(\phi)\cos(\theta)\right]^T$$
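The formula for $g_h$ can be checked in a few lines. Setting $\theta = \phi = 0$ gives $g_h = (0, 0, -1)$, so a straight-ahead gaze points along $-Z_h$, and the three components always form a unit vector. The inverse, $\phi = \arcsin(g_y)$ and $\theta = \operatorname{atan2}(-g_x, -g_z)$, follows algebraically from the formula when $\cos\phi > 0$; using it to turn a recorded 3D gaze direction into a $(\theta, \phi)$ label is an assumption, not something the text specifies. The function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def angles_to_gaze(theta: float, phi: float) -> Vec3:
    """Unit gaze vector g_h in head coordinates from (theta, phi) in radians."""
    return (-math.cos(phi) * math.sin(theta),
            math.sin(phi),
            -math.cos(phi) * math.cos(theta))


def gaze_to_angles(g: Vec3) -> Tuple[float, float]:
    """Invert angles_to_gaze after normalising g; assumes |phi| < pi / 2."""
    x, y, z = g
    n = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / n, y / n, z / n
    phi = math.asin(max(-1.0, min(1.0, y)))  # clamp guards against rounding
    theta = math.atan2(-x, -z)
    return theta, phi
```

Round-tripping `gaze_to_angles(angles_to_gaze(0.3, -0.2))` recovers `(0.3, -0.2)` up to floating-point error.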
The camera coordinate system $(X_c, Y_c, Z_c)$ is defined with the camera center as origin, the camera depth direction as the $Z_c$ axis, and the two directions in the plane perpendicular to the depth direction as the $X_c$ and $Y_c$ axes. Because the three-dimensional gaze vector finally output by the network is expressed in the camera coordinate system, the embodiment of the present invention defines $g_c$ as the three-dimensional gaze vector in the camera coordinate system. From geometry, $g_c$ depends on $g_h$, and $g_h$ is defined in the head coordinate system, so the overall mapping relationship of the embodiment of the present invention is obtained:
Step S102: input the face image into the pre-built eye movement detection network to obtain the eye movement in the face image;
Step S103: input the head pose and the eye movement into the pre-built three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image.
The head pose and the eye movement are input into the pre-built three-dimensional gaze vector detection network to obtain the three-dimensional gaze vector of the eyeball in the face image.
To make it possible to reuse existing data sets, the network in this embodiment uses an end-to-end structure: an initial head pose detection network and an eye movement detection network are first established separately, and the detection results of the two sub-networks are then input into a fully connected network to obtain the final three-dimensional gaze vector. The network is divided into two branches: the upper branch detects the head pose and the lower branch detects the eye movement, and a geometrically constrained gaze conversion layer then yields the three-dimensional gaze vector in the camera coordinate system.
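The gaze conversion layer is described only as geometrically constrained, and in this embodiment the conversion is learned by a fully connected network. As a geometric reference point, and purely as an assumption, the sketch below rotates $g_h$ into the camera frame with $g_c = R(y, p, r)\,g_h$, where $R = R_Y(y)\,R_X(p)\,R_Z(r)$ uses the axis assignment defined for the head pose (yaw about Y, pitch about X, roll about Z). The composition order, the sign conventions, and the premise that head and camera axes coincide at zero pose are all assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Mat3 = List[List[float]]


def rot_x(a: float) -> Mat3:
    """Rotation by angle a (radians) about the X axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a: float) -> Mat3:
    """Rotation by angle a (radians) about the Y axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a: float) -> Mat3:
    """Rotation by angle a (radians) about the Z axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    """3 x 3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def head_to_camera(g_h: Sequence[float], yaw: float, pitch: float,
                   roll: float) -> Tuple[float, ...]:
    """Rotate a head-frame gaze vector into the camera frame (angles in radians).

    Assumes R = R_y(yaw) @ R_x(pitch) @ R_z(roll) and that the head and camera
    axes coincide when (yaw, pitch, roll) = (0, 0, 0).
    """
    r = matmul(rot_y(yaw), matmul(rot_x(pitch), rot_z(roll)))
    return tuple(sum(r[i][k] * g_h[k] for k in range(3)) for i in range(3))
```

Because $R$ is a rotation, the output keeps the length of the input; for example, a 90° yaw turns the forward gaze $(0, 0, -1)$ into approximately $(-1, 0, 0)$ under these conventions.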
Based on the above embodiment, in order to reuse the collected face image data set, this embodiment also uses an end-to-end structure: the head pose detection network and the eye movement detection network are first established separately, and the detection results of the two sub-networks are then input into a fully connected network to obtain the final three-dimensional gaze vector. The network is divided into two branches: the upper branch detects the head pose and the lower branch detects the eye movement, and a geometrically constrained gaze conversion layer then yields the three-dimensional gaze vector in the camera coordinate system. Referring to Fig. 2, Fig. 2 is a flowchart of the second specific embodiment of the method for realizing three-dimensional gaze tracking of the eyeball provided by the present invention; the specific steps are as follows:
Step S201: capture face images of several data providers with the planar camera array, and record the three-dimensional vector labels of head pose and eye movement in each face image, to obtain a first face image subset;
Step S202: rotate the face images in the first face image subset clockwise and counterclockwise, respectively, to obtain a second face image subset;
Step S203: merge the first face image subset and the second face image subset to obtain the face image data set;
Step S204: train the pre-built initial head pose detection network and initial eye movement detection network, respectively, with the face image data set, to obtain a target head pose detection network and a target eye movement detection network;
The basic topology of the initial head pose detection network follows the AlexNet structure with corresponding simplifications and modifications: the number of layers is unchanged, the number of channels in each layer is reduced appropriately, local response normalization is replaced with batch normalization, and PReLU is used as the activation function. The structure of the initial head pose detection network is as follows:
C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3)
where C(k,s,c) denotes a convolutional layer with kernel size k, stride s and c channels; P(k,s) denotes a max-pooling layer with kernel size k and stride s; BN denotes batch normalization; PReLU denotes the PReLU activation function; and FC(n) denotes a fully connected layer with n neurons.
The input to the eye movement detection network is the eye region cropped from the original face image, split into a left-eye part and a right-eye part. Because the two parts of the network are fully symmetric, only one is described in detail. The eye image patch is resized to a uniform 36x36 and then passed through a convolutional neural network and a fully connected network. The structure of the initial eye movement detection network is as follows: C(11,2,96)-BN-PReLU-P(2,2)-C(5,1,256)-BN-PReLU-P(2,2)-C(3,1,384)-BN-PReLU-P(2,2)-C(1,1,64)-BN-PReLU-P(2,2)-FC(128)-FC(2).
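Tracing feature-map sizes through the eye network shows how the 36x36 patch is reduced. The text does not state the padding: with unpadded convolutions the map is already 1×1 (36 → 13 → 6 → 2 → 1) when it reaches the 3×3 convolution C(3,1,384), so this sketch assumes "same" padding for convolutions (output side ⌈n/s⌉) and unpadded 2×2 max pooling with stride 2. The 3 input channels (RGB) and the helper name `trace_shapes` are also assumptions.

```python
import math
from typing import List, Tuple

Layer = Tuple[str, Tuple[int, ...]]

# Eye movement network from the text, one (type, params) pair per layer.
EYE_NET: List[Layer] = [
    ("C", (11, 2, 96)), ("BN", ()), ("PReLU", ()), ("P", (2, 2)),
    ("C", (5, 1, 256)), ("BN", ()), ("PReLU", ()), ("P", (2, 2)),
    ("C", (3, 1, 384)), ("BN", ()), ("PReLU", ()), ("P", (2, 2)),
    ("C", (1, 1, 64)), ("BN", ()), ("PReLU", ()), ("P", (2, 2)),
    ("FC", (128,)), ("FC", (2,)),
]


def trace_shapes(layers: List[Layer], size: int,
                 channels: int = 3) -> List[Tuple[str, Tuple[int, ...]]]:
    """Output shape after each layer, as (channels, height, width) or (features,)."""
    shape: Tuple[int, ...] = (channels, size, size)
    trace = []
    for kind, params in layers:
        if kind == "C":
            _kernel, stride, out_channels = params
            side = math.ceil(shape[1] / stride)  # 'same' padding (assumption)
            shape = (out_channels, side, side)
        elif kind == "P":
            kernel, stride = params
            side = (shape[1] - kernel) // stride + 1  # unpadded max pooling
            if side < 1:
                raise ValueError("feature map smaller than pooling kernel")
            shape = (shape[0], side, side)
        elif kind == "FC":
            shape = (params[0],)  # flattens the previous output
        # BN and PReLU leave the shape unchanged.
        trace.append((kind, shape))
    return trace
```

Under these assumptions each eye branch ends in a 1×1×64 map, i.e. 64 features entering FC(128), and FC(2) outputs the two eye angles.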
Step S205: detect each face image in the face image data set with the target head pose detection network and the target eye movement detection network to obtain the head pose and eye movement of each face image;
Step S206: input the head pose and eye movement of each face image in the face image data set into the pre-built initial three-dimensional gaze vector detection network for training, to obtain the target three-dimensional gaze vector detection network;
The initial three-dimensional gaze vector detection network takes as input the $(y, p, r)$ obtained by the target head pose detection network and the $(\theta, \phi)$ obtained by the target eye movement detection network. It is a two-layer fully connected network: the first layer has 128 neurons and the final layer has 3 neurons, corresponding to the three-dimensional gaze vector.
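A forward pass of this 5 → 128 → 3 network fits in a short pure-Python sketch. The activation between the two layers is not stated; PReLU is assumed because the other networks use it, with a single shared slope initialized to 0.25. The input order (y, p, r, θ, φ), the random initialization standing in for trained weights, and the class name `GazeMLP` are illustrative assumptions; training is not shown.

```python
import random
from typing import List, Sequence


class GazeMLP:
    """Two fully connected layers, 5 -> hidden -> 3, with a PReLU in between (assumed)."""

    def __init__(self, hidden: int = 128, seed: int = 0, prelu_slope: float = 0.25) -> None:
        rng = random.Random(seed)
        # Small random weights and zero biases; a stand-in for trained parameters.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(5)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(3)]
        self.b2 = [0.0] * 3
        self.slope = prelu_slope  # one shared PReLU slope

    @staticmethod
    def _dense(w: List[List[float]], b: List[float], x: Sequence[float]) -> List[float]:
        """Affine layer: w @ x + b."""
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

    def forward(self, head_pose: Sequence[float], eye: Sequence[float]) -> List[float]:
        """Map (y, p, r) and (theta, phi) to an unnormalised 3D gaze vector."""
        x = list(head_pose) + list(eye)
        if len(x) != 5:
            raise ValueError("expected 3 head pose angles and 2 eye angles")
        hidden = [v if v > 0 else self.slope * v for v in self._dense(self.w1, self.b1, x)]
        return self._dense(self.w2, self.b2, hidden)
```

For example, `GazeMLP().forward((0.1, 0.2, 0.0), (0.3, -0.1))` returns a list of three floats; a deep learning framework would replace the hand-written layers in practice.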
When the initial head pose detection network and the initial eye movement detection network are trained, the loss function $\mathrm{Loss}_1 = \mathrm{Loss}_h + \mathrm{Loss}_e$ is the sum of the loss function $\mathrm{Loss}_h$ of the initial head pose detection network and the loss function $\mathrm{Loss}_e$ of the initial eye movement detection network.
When the pre-established initial three-dimensional gaze vector detection network is trained, the current loss function $\mathrm{Loss}_2 = \mathrm{Loss}_1 + \mathrm{Loss}_g = \mathrm{Loss}_h + \mathrm{Loss}_e + \mathrm{Loss}_g$ is the sum of the loss function $\mathrm{Loss}_1$ and the loss function $\mathrm{Loss}_g$ of the initial three-dimensional gaze vector detection network, where:
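$$\mathrm{Loss}_h = \lVert h - h_{GT} \rVert_2, \quad h = \{y, p, r\}$$
$$\mathrm{Loss}_e = \lVert e - e_{GT} \rVert_2, \quad e = \{\phi, \theta\}$$
$$\mathrm{Loss}_g = \lVert g_c - g_c^{GT} \rVert_2, \quad g_c = \{x, y, z\}$$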
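The two training objectives can be written per sample as below. The sketch reads ||·||_2 as the Euclidean norm (if the squared norm is intended, drop the square root), and averaging over a batch is left out. The function names are hypothetical.

```python
import math
from typing import Sequence


def l2(pred: Sequence[float], target: Sequence[float]) -> float:
    """Euclidean distance ||pred - target||_2 between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("prediction and label must have the same length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))


def stage1_loss(h: Sequence[float], h_gt: Sequence[float],
                e: Sequence[float], e_gt: Sequence[float]) -> float:
    """Loss_1 = Loss_h + Loss_e, used when training the head pose and eye networks."""
    return l2(h, h_gt) + l2(e, e_gt)


def stage2_loss(h: Sequence[float], h_gt: Sequence[float],
                e: Sequence[float], e_gt: Sequence[float],
                g_c: Sequence[float], g_c_gt: Sequence[float]) -> float:
    """Loss_2 = Loss_1 + Loss_g, adding the 3D gaze vector term."""
    return stage1_loss(h, h_gt, e, e_gt) + l2(g_c, g_c_gt)
```

Keeping `stage1_loss` as a separate term mirrors the staged training: the first stage optimizes only the head pose and eye terms, and the second adds the gaze term.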
Step S207: input the face image to be detected into the target head pose detection network to obtain the head pose in the face image to be detected;
Step S208: input the facial image to be detected to the target eyeball motion detection network to obtain the eyeball movement in that facial image;
Step S209: input the head pose and the eyeball movement of the facial image to be detected to the target three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in that facial image.
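Steps S207 to S209 chain the three trained networks. The sketch below shows only that data flow. The three networks are passed in as plain callables (hypothetical stand-ins for trained models), so any model that maps an image to (y, p, r), an image to (θ, φ), and those two outputs to (x, y, z) can be plugged in; `GazePipeline` is an illustrative name, not part of the source.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

HeadPose = Tuple[float, float, float]  # (y, p, r)
EyeAngles = Tuple[float, float]        # (theta, phi)
Gaze = Tuple[float, float, float]      # (x, y, z)


@dataclass
class GazePipeline:
    """Hypothetical wrapper composing the three networks of steps S207 to S209."""

    head_pose_net: Callable[[object], HeadPose]
    eye_net: Callable[[object], EyeAngles]
    gaze_net: Callable[[HeadPose, EyeAngles], Gaze]

    def __call__(self, face_image: object) -> Gaze:
        head_pose = self.head_pose_net(face_image)   # Step S207: head pose from the image
        eye_angles = self.eye_net(face_image)         # Step S208: eyeball movement from the same image
        return self.gaze_net(head_pose, eye_angles)   # Step S209: 3D gaze vector from both outputs
```

With stub networks, `GazePipeline(lambda img: (0.0, 0.0, 0.0), lambda img: (0.0, 0.0), lambda h, e: (0.0, 0.0, -1.0))("face")` returns `(0.0, 0.0, -1.0)`. Keeping the networks as separate callables mirrors the separate modules 100, 200 and 300 of the device.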
In the prior art, eyeball recognition only annotates the eyeball center in two dimensions, so only two-dimensional eyeball information can be obtained, which limits its applications. The method provided by this embodiment is likewise based on deep neural networks, but it not only processes the eyeball movement information; it also predicts the head pose and, at the same time, the three-dimensional gaze vector of the eyeball. It therefore yields higher-level information and has greater application value. Network training in this embodiment is end-to-end but carried out step by step: in the first training step, existing head pose data sets and eyeball movement data sets can be fully exploited, which considerably enlarges the training data and gives the deep networks of this embodiment better generalization ability.
Referring to FIG. 3, FIG. 3 is a structural block diagram of a device for realizing three-dimensional eye gaze tracking according to an embodiment of the present invention; the device may specifically include:
Head pose detection module 100, configured to input a facial image to be detected to the pre-constructed head pose detection network to obtain the head pose in the facial image;
Eyeball motion detection module 200, configured to input the facial image to the pre-constructed eyeball motion detection network to obtain the eyeball movement in the facial image;
Three-dimensional gaze detection module 300, configured to input the head pose and the eyeball movement to the pre-constructed three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the facial image.
The device for realizing three-dimensional eye gaze tracking of this embodiment is used to implement the method for realizing three-dimensional eye gaze tracking described above, so its specific implementation can be found in the method embodiments above. For example, the head pose detection module 100, the eyeball motion detection module 200 and the three-dimensional gaze detection module 300 implement steps S101, S102 and S103 of that method, respectively; their specific implementation can therefore refer to the description of the corresponding embodiments and is not repeated here.
A specific embodiment of the present invention further provides equipment for realizing three-dimensional eye gaze tracking, comprising: a memory for storing a computer program; and a processor for implementing the steps of the above method for realizing three-dimensional eye gaze tracking when executing the computer program.
A specific embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for realizing three-dimensional eye gaze tracking.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The method, device, equipment and computer-readable storage medium for realizing three-dimensional eye gaze tracking provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principle and embodiments of the present invention; the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the present invention without departing from its principle, and these improvements and modifications also fall within the scope of protection of the claims of the present invention.
Claims (10)
1. A method for realizing three-dimensional eye gaze tracking, characterized by comprising:
Inputting a facial image to be detected to a pre-constructed head pose detection network to obtain the head pose in the facial image;
Inputting the facial image to a pre-constructed eyeball motion detection network to obtain the eyeball movement in the facial image;
Inputting the head pose and the eyeball movement to a pre-constructed three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the facial image.
2. The method according to claim 1, characterized in that, before inputting the facial image to be detected to the pre-constructed head pose detection network to obtain the head pose in the facial image, the method comprises:
Acquiring a number of facial images carrying three-dimensional labels of head pose and eyeball gaze, and constructing a face image data set, wherein the facial images are RGB images;
Constructing an initial head pose detection network and an initial eyeball motion detection network;
Training the initial head pose detection network and the initial eyeball motion detection network respectively with the face image data set, to obtain the trained head pose detection network and the trained eyeball motion detection network.
3. The method according to claim 2, characterized in that acquiring a number of facial images carrying three-dimensional labels of head pose and eyeball gaze and constructing the face image data set comprises:
Capturing facial images of a data provider with each camera of a planar camera array, to obtain a first facial image subset;
The facial images captured by each row of cameras in the planar camera array represent different head poses of the data provider in the y direction;
The facial images captured by each column of cameras in the planar camera array represent different head poses of the data provider in the p direction;
Rotating the facial images captured by the planar camera array clockwise and counterclockwise respectively, to obtain a second facial image subset representing different head poses of the data provider in the r direction;
Merging the first facial image subset and the second facial image subset to obtain the face image data set.
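The labelling scheme of claim 3 can be sketched as bookkeeping over the camera grid. The angle values, the rotation sign convention and the function name `build_pose_labels` are hypothetical. The sketch tracks only the head pose labels; it does not rotate pixels or update the gaze label. Treating an in-plane image rotation as a pure change of roll assumes an Euler-angle convention in which roll is the rotation about the camera's optical axis.

```python
from itertools import product
from typing import Dict, List, Sequence


def build_pose_labels(yaw_per_column: Sequence[float],
                      pitch_per_row: Sequence[float],
                      roll_rotations: Sequence[float]) -> List[Dict[str, float]]:
    """Hypothetical head pose labels (degrees) for a planar camera array.

    Cameras along a row differ in yaw (y); cameras along a column differ in pitch (p).
    Each captured image is then rotated in-plane by every angle in `roll_rotations`
    (sign convention assumed) to produce roll (r) variants.
    """
    # First subset: one label per camera, roll fixed at zero.
    first_subset = [{"y": yaw, "p": pitch, "r": 0.0}
                    for pitch, yaw in product(pitch_per_row, yaw_per_column)]
    # Second subset: rotated copies of every first-subset image.
    second_subset = [{"y": lab["y"], "p": lab["p"], "r": angle}
                     for lab in first_subset for angle in roll_rotations]
    # Merged labels of the face image data set.
    return first_subset + second_subset
```

With a hypothetical 3 × 3 grid and two rotations, `build_pose_labels([-30, 0, 30], [-15, 0, 15], [-20, 20])` yields 9 first-subset labels plus 18 rotated ones.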
4. The method according to claim 3, characterized in that capturing facial images of the data provider with each camera of the planar camera array comprises:
When each facial image is captured, recording the moving dot on the display screen at which the data provider's eyes are looking, thereby determining the three-dimensional vector label of the data provider's eyeball gaze, and simultaneously recording the head pose in each facial image.
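Claim 4 does not spell out how the three-dimensional gaze label is computed from the recorded dot. One common construction, assumed here, takes the unit vector from the eye centre to the dot, with both positions expressed in the same calibrated 3D coordinate frame. Obtaining those positions (camera calibration, screen pose) is outside this sketch, and `gaze_label` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def gaze_label(eye_center: Sequence[float], screen_point: Sequence[float]) -> Tuple[float, float, float]:
    """Unit 3D gaze vector from the eye centre to the fixated screen dot.

    Assumption: both points are in the same 3D frame (e.g. camera coordinates, mm).
    """
    d = [s - e for s, e in zip(screen_point, eye_center)]
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0:
        raise ValueError("eye centre and screen point coincide")
    return (d[0] / norm, d[1] / norm, d[2] / norm)
```

For an eye 600 mm in front of a dot at the origin, `gaze_label((0, 0, 600), (0, 0, 0))` gives `(0.0, 0.0, -1.0)`.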
5. The method according to claim 2, characterized in that constructing the initial head pose detection network comprises:
Using the AlexNet model as the basic structure to construct the initial head pose detection network, whose network structure is:
C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-C(3,1,24)-PReLU(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3);
Wherein C(k, s, c) denotes a convolutional layer with kernel size k, stride s and c channels; P(k, s) denotes a max-pooling layer with kernel size k and stride s; BN denotes batch normalization; PReLU denotes the activation function; and FC(n) denotes a fully connected layer with n neurons.
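The layer notation defined above can be expanded mechanically. The sketch below (hypothetical helper names) splits a structure string on `-` and turns each token into a readable layer description. It only interprets the notation; it does not build a network. A token that does not fit the defined notation, such as `PReLU(3,1,16)` in the string above, is parsed, but its arguments are not interpreted.

```python
import re
from typing import List, Tuple

# One token: a layer name optionally followed by comma-separated integer arguments.
TOKEN = re.compile(r"^([A-Za-z]+)(?:\(([\d,]+)\))?$")


def parse_architecture(spec: str) -> List[Tuple[str, Tuple[int, ...]]]:
    """Split e.g. 'C(3,1,6)-BN-PReLU-P(2,2)-FC(3)' into (layer name, integer args) pairs."""
    layers = []
    for token in spec.replace(" ", "").rstrip(";.").split("-"):
        m = TOKEN.match(token)
        if m is None:
            raise ValueError(f"unrecognised layer token: {token!r}")
        name, args = m.group(1), m.group(2)
        layers.append((name, tuple(int(a) for a in args.split(",")) if args else ()))
    return layers


def describe(name: str, args: Tuple[int, ...]) -> str:
    """Human-readable description following the C / P / BN / PReLU / FC definitions."""
    if name == "C":
        k, s, c = args
        return f"conv {k}x{k}, stride {s}, {c} channels"
    if name == "P":
        k, s = args
        return f"max-pool {k}x{k}, stride {s}"
    if name == "FC":
        return f"fully connected, {args[0]} neurons"
    return {"BN": "batch normalization", "PReLU": "PReLU activation"}.get(name, name)
```

Usage: `for name, args in parse_architecture("C(3,1,6)-BN-PReLU-P(2,2)-FC(3)"): print(describe(name, args))` prints one line per layer, starting with `conv 3x3, stride 1, 6 channels`.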
6. The method according to claim 2, characterized in that training the initial head pose detection network and the initial eyeball motion detection network respectively with the face image data set comprises:
Training the initial head pose detection network and the initial eyeball motion detection network with the face image data set;
Wherein the loss function $\mathrm{Loss}_1 = \mathrm{Loss}_h + \mathrm{Loss}_e$ is the sum of the loss function $\mathrm{Loss}_h$ of the initial head pose detection network and the loss function $\mathrm{Loss}_e$ of the initial eyeball motion detection network.
7. The method according to claim 6, characterized in that, before inputting the head pose and the eyeball movement to the pre-constructed three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the facial image, the method comprises:
Processing the facial images in the face image data set with the head pose detection network and the eyeball motion detection network respectively, to obtain the head pose and eyeball movement of each facial image;
Training a pre-established initial three-dimensional gaze vector detection network with the head pose and eyeball movement of each facial image, to obtain the trained three-dimensional gaze vector detection network;
Wherein the current loss function $\mathrm{Loss}_2 = \mathrm{Loss}_1 + \mathrm{Loss}_g = \mathrm{Loss}_h + \mathrm{Loss}_e + \mathrm{Loss}_g$ is the sum of the loss function $\mathrm{Loss}_1$ and the loss function $\mathrm{Loss}_g$ of the initial three-dimensional gaze vector detection network.
8. A device for realizing three-dimensional eye gaze tracking, characterized by comprising:
A head pose detection module, configured to input a facial image to be detected to a pre-constructed head pose detection network to obtain the head pose in the facial image;
An eyeball motion detection module, configured to input the facial image to a pre-constructed eyeball motion detection network to obtain the eyeball movement in the facial image;
A three-dimensional gaze detection module, configured to input the head pose and the eyeball movement to a pre-constructed three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the facial image.
9. Equipment for realizing three-dimensional eye gaze tracking, characterized by comprising:
A memory for storing a computer program;
A processor for implementing the steps of the method for realizing three-dimensional eye gaze tracking according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for realizing three-dimensional eye gaze tracking according to any one of claims 1 to 7 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811375929.7A CN109508679B (en) | 2018-11-19 | 2018-11-19 | Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811375929.7A CN109508679B (en) | 2018-11-19 | 2018-11-19 | Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109508679A | 2019-03-22 |
CN109508679B CN109508679B (en) | 2023-02-10 |
Family ID: 65749029
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811375929.7A Active CN109508679B (en) | 2018-11-19 | 2018-11-19 | Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109508679B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100026803A1 (en) * | 2006-03-27 | 2010-02-04 | Fujifilm Corporation | Image recording apparatus, image recording method and image recording program |
CN104391574A (en) * | 2014-11-14 | 2015-03-04 | 京东方科技集团股份有限公司 | Sight processing method, sight processing system, terminal equipment and wearable equipment |
US20150109204A1 (en) * | 2012-11-13 | 2015-04-23 | Huawei Technologies Co., Ltd. | Human-machine interaction method and apparatus |
CN105740846A (en) * | 2016-03-02 | 2016-07-06 | 河海大学常州校区 | Horizontal visual angle estimation and calibration method based on depth camera |
CN106598221A (en) * | 2016-11-17 | 2017-04-26 | 电子科技大学 | Eye key point detection-based 3D sight line direction estimation method |
JP2017213191A (en) * | 2016-05-31 | 2017-12-07 | 富士通株式会社 | Sight line detection device, sight line detection method and sight line detection program |
CN107818310A (en) * | 2017-11-03 | 2018-03-20 | 电子科技大学 | A kind of driver attention's detection method based on sight |
US20180140187A1 (en) * | 2015-07-17 | 2018-05-24 | Sony Corporation | Eyeball observation device, eyewear terminal, line-of-sight detection method, and program |
CN108171218A (en) * | 2018-01-29 | 2018-06-15 | 深圳市唯特视科技有限公司 | A kind of gaze estimation method for watching network attentively based on appearance of depth |
CN108229284A (en) * | 2017-05-26 | 2018-06-29 | 北京市商汤科技开发有限公司 | Eye-controlling focus and training method and device, system, electronic equipment and storage medium |
2018-11-19: Application CN201811375929.7A filed in CN (CN109508679B, Active)
Non-Patent Citations (1)
Title |
---|
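Zhou Xiaolong, Tang Fanyang, Guan Qiu, Hua Min (周小龙, 汤帆扬, 管秋, 华敏): "A Survey of Gaze Tracking Techniques Based on 3D Eye Models" (基于3D人眼模型的视线跟踪技术综述), Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》) * |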
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020216054A1 (en) * | 2019-04-24 | 2020-10-29 | 腾讯科技(深圳)有限公司 | Sight line tracking model training method, and sight line tracking method and device |
US11797084B2 (en) | 2019-04-24 | 2023-10-24 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for training gaze tracking model, and method and apparatus for gaze tracking |
CN110191234A (en) * | 2019-06-21 | 2019-08-30 | 中山大学 | It is a kind of based on the intelligent terminal unlocking method for watching point analysis attentively |
CN110555426A (en) * | 2019-09-11 | 2019-12-10 | 北京儒博科技有限公司 | Sight line detection method, device, equipment and storage medium |
CN110909611A (en) * | 2019-10-29 | 2020-03-24 | 深圳云天励飞技术有限公司 | Method and device for detecting attention area, readable storage medium and terminal equipment |
CN110909611B (en) * | 2019-10-29 | 2021-03-05 | 深圳云天励飞技术有限公司 | Method and device for detecting attention area, readable storage medium and terminal equipment |
WO2021135827A1 (en) * | 2019-12-30 | 2021-07-08 | 上海商汤临港智能科技有限公司 | Line-of-sight direction determination method and apparatus, electronic device, and storage medium |
CN111847147A (en) * | 2020-06-18 | 2020-10-30 | 闽江学院 | Non-contact eye-movement type elevator floor input method and device |
CN111847147B (en) * | 2020-06-18 | 2023-04-18 | 闽江学院 | Non-contact eye-movement type elevator floor input method and device |
CN112114671A (en) * | 2020-09-22 | 2020-12-22 | 上海汽车集团股份有限公司 | Human-vehicle interaction method and device based on human eye sight and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109508679B (en) | 2023-02-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |