CN109508679B - Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium

Info

Publication number
CN109508679B
CN109508679B
Authority
CN
China
Prior art keywords
detection network
eyeball
dimensional
face image
PReLU
Prior art date
Legal status
Active
Application number
CN201811375929.7A
Other languages
Chinese (zh)
Other versions
CN109508679A (en)
Inventor
张国生
李东
冯广
章云
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201811375929.7A
Publication of CN109508679A
Application granted
Publication of CN109508679B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships


Abstract

The invention discloses a method, an apparatus, a device and a computer-readable storage medium for realizing eyeball three-dimensional gaze tracking. The method comprises the following steps: inputting a face image to be detected into a pre-constructed head pose detection network to obtain the head pose in the face image; inputting the face image into a pre-constructed eye movement detection network to obtain the eye movement of the face image; and inputting the head pose and the eye movement into a pre-constructed three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image. The method, apparatus, device and computer-readable storage medium provided by the invention can extract the three-dimensional gaze direction vector of the photographed person's eyeball from a two-dimensional face image and have wide application scenarios.

Description

Method, device and equipment for realizing eyeball three-dimensional gaze tracking, and storage medium
Technical Field
The present invention relates to the field of eye tracking technologies, and in particular to a method, an apparatus, a device and a computer-readable storage medium for realizing three-dimensional eye gaze tracking.
Background
Research on eye tracking algorithms is well developed and has been successfully applied in many commercial products, such as VR/AR technology. Although traditional eye tracking techniques can achieve high precision, current eye tracking algorithms are mostly based on traditional image processing methods: they depend on expensive infrared equipment and require special detection devices mounted on the head to detect eyeball features. The detection precision of traditional image processing methods is affected by lighting changes, and the detection distance is severely restricted. Therefore, an algorithm that realizes eye tracking from RGB images captured by an ordinary camera is urgently needed. In the field of computer vision, deep convolutional neural networks have achieved remarkable results in many tasks, such as object detection and instance segmentation.
The prior art also includes an eye tracking technique based on deep learning, with the following specific steps: acquire retinopathy image data; annotate the retinopathy image data to obtain labeled data; build an initial deep learning network; input the retinopathy image data into the initial deep learning network and output the corresponding prediction data; compare the labeled data and the prediction data of the retinopathy image data using a loss function to obtain a comparison result; adjust the parameters of the initial deep learning network according to the comparison result until the comparison result reaches a preset threshold, obtaining the final deep learning network model; and process the retinopathy image data to be detected with the deep learning network model to obtain the corresponding eyeball center coordinates and eyeball diameter.
Thus, among existing eye tracking techniques, one class is based on traditional image processing algorithms. Although such algorithms already have relatively mature commercial applications, their detection accuracy is affected by lighting changes, and they depend on expensive head-mounted infrared equipment, which gives a poor wearing experience and also restricts the detection distance. The other class is based on deep learning; however, existing deep-learning eye tracking algorithms can only detect the center position and the diameter of the eyeball, i.e., only two-dimensional information of the eye movement, which restricts the application scenarios.
In summary, how to obtain the three-dimensional gaze direction vector of an eyeball from a two-dimensional face image is a problem to be solved at present.
Disclosure of Invention
The object of the present invention is to provide a method, an apparatus, a device and a computer-readable storage medium for realizing eyeball three-dimensional gaze tracking, so as to solve the problem that eye tracking algorithms based on deep learning in the prior art can only detect two-dimensional information of the eyeball.
In order to solve the above technical problem, the present invention provides a method for realizing three-dimensional eye gaze tracking, comprising: inputting a face image to be detected into a pre-constructed head pose detection network to obtain the head pose in the face image; inputting the face image into a pre-constructed eye movement detection network to obtain the eye movement of the face image; and inputting the head pose and the eye movement into a pre-constructed three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image.
Preferably, before the inputting the face image to be detected into the pre-constructed head pose detection network to obtain the head pose in the face image, the method comprises:
acquiring a plurality of face images with three-dimensional labels of head pose and eye gaze to construct a face image dataset, wherein the face images are RGB images;
constructing an initial head pose detection network and an initial eye movement detection network;
and training the initial head pose detection network and the initial eye movement detection network respectively with the face image dataset to obtain the trained head pose detection network and eye movement detection network.
Preferably, the acquiring a plurality of face images with three-dimensional labels of head pose and eye gaze to construct a face image dataset comprises:
acquiring face images of a data provider with each camera in an area-array camera array respectively, to obtain a first subset of face images;
wherein the face images acquired by each row of cameras in the area-array camera array represent different head poses of the data provider in the y direction;
the face images acquired by each column of cameras in the area-array camera array represent different head poses of the data provider in the p direction;
rotating the face images acquired by the area-array camera array clockwise and counterclockwise respectively, to obtain a second subset of face images representing different head poses of the data provider in the r direction;
and merging the first subset of face images and the second subset of face images to obtain the face image dataset.
Preferably, the acquiring face images of the data provider with each camera in the area-array camera array respectively comprises:
recording, while each face image is collected, the moving point on the display screen at which the data provider's eyes are gazing, so as to determine the three-dimensional gaze vector label of the data provider's eye gaze, and recording the head pose in each face image at the same time.
Preferably, the constructing an initial head pose detection network comprises:
constructing the initial head pose detection network with the AlexNet model as the basic structure, wherein the network structure of the initial head pose detection network is as follows:
C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3);
wherein C(k,s,c) denotes a convolution layer with kernel size k, stride s and c channels, P(k,s) denotes a max pooling layer with kernel size k and stride s, BN denotes batch normalization, PReLU denotes the activation function, and FC(n) denotes a fully connected layer with n neurons.
Preferably, the training the initial head pose detection network and the initial eye movement detection network respectively with the face image dataset comprises:
training the initial head pose detection network and the initial eye movement detection network with the face image dataset;
wherein the loss function Loss_1 = Loss_h + Loss_e is the sum of the loss function of the initial head pose detection network, Loss_h = ||h - h_GT||^2, and the loss function of the initial eye movement detection network, Loss_e = ||e - e_GT||^2.
Preferably, the inputting the head pose and the eye movement into a pre-constructed three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image comprises:
detecting the face images in the face image dataset with the head pose detection network and the eye movement detection network respectively, to obtain the head pose and the eye movement of each face image;
training a pre-established initial three-dimensional gaze vector detection network with the head poses and eye movements of the face images, to obtain the trained three-dimensional gaze vector detection network;
wherein the loss function Loss_2 = Loss_1 + Loss_g = Loss_h + Loss_e + Loss_g is the sum of the loss function Loss_1 and the loss function of the initial three-dimensional gaze vector detection network, Loss_g = ||g_c - g_c^GT||^2.
The present invention also provides an apparatus for realizing eyeball three-dimensional gaze tracking, comprising:
a head pose detection module, configured to input a face image to be detected into a pre-constructed head pose detection network to obtain the head pose in the face image;
an eye movement detection module, configured to input the face image into a pre-constructed eye movement detection network to obtain the eye movement of the face image;
and a three-dimensional gaze detection module, configured to input the head pose and the eye movement into a pre-constructed three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image.
The present invention also provides a device for realizing three-dimensional eye gaze tracking, comprising:
a memory for storing a computer program; and a processor for implementing the steps of the above method for realizing three-dimensional eye gaze tracking when executing the computer program.
The present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above method for realizing three-dimensional eye gaze tracking.
According to the method for realizing eyeball three-dimensional gaze tracking provided by the present invention, the face image to be detected is input into the pre-constructed head pose detection network to obtain the head pose in the face image; the face image is input into the pre-constructed eye movement detection network to obtain the eye movement in the face image; and the head pose and the eye movement are input into the pre-constructed three-dimensional gaze vector detection network, so that the three-dimensional gaze direction vector of the eyeball in the face image is obtained through a gaze conversion network with geometric constraints. The eye tracking method provided by the invention is based on deep learning networks: it extracts the head pose and the eye movement of the photographed person from a two-dimensional face image and inputs them into a pre-trained three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the photographed person's eyeball. The method has a wide range of applications; the three-dimensional gaze direction of the eyeball obtained from the face image can be used in fields such as safe-driving monitoring, human-computer interaction and psychological research. It solves the problem in the prior art that eye tracking realized with deep neural networks can only detect the eyeball center position and eyeball diameter and therefore lacks wide application scenarios. Correspondingly, the apparatus, device and computer-readable storage medium provided by the invention have the same beneficial effects.
Drawings
In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a method for realizing eyeball three-dimensional gaze tracking according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a method for realizing eyeball three-dimensional gaze tracking according to a second embodiment of the present invention;
FIG. 3 is a structural block diagram of an apparatus for realizing eyeball three-dimensional gaze tracking according to an embodiment of the present invention.
Detailed Description
The core of the present invention is to provide a method, an apparatus, a device and a computer-readable storage medium for realizing eyeball three-dimensional gaze tracking, which can obtain the three-dimensional gaze vector of the eyeball from a two-dimensional face image and have wide application scenarios.
In order that those skilled in the art may better understand the solutions of the present invention, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is a flowchart of a method for realizing eyeball three-dimensional gaze tracking according to a first embodiment of the present invention; the specific operation steps are as follows:
step S101: inputting a human face image to be detected into a pre-constructed head posture detection network to obtain a head posture in the human face image;
inputting a human face image to be detected into a pre-constructed head posture detection network, firstly collecting a plurality of human face images with three-dimensional labels of head postures and eyeball sight before obtaining the head postures in the human face images, and constructing a human face image data set; constructing an initial head posture detection network and an initial eyeball action detection network; and respectively training the initial head posture detection network and the initial eyeball action detection network by using the facial image data set to obtain the trained head posture detection network and the trained eyeball action detection network.
In order to give the initial head pose detection network and the initial eye movement detection network good generalization ability, the face image dataset collected in this embodiment needs to have the following features: (a) the image data are widely distributed, covering as many head poses and eye movements as possible, and include different light intensities and even reflection interference from glasses; (b) the dataset has three-dimensional labels of head pose and eye gaze; (c) the face images are preferably ordinary RGB images rather than depending on a special camera device.
In order to make the face image dataset more widely distributed, this embodiment employs a 3×4 camera array, where different camera angles represent different head poses. However, the area-array camera array can only represent head pose differences in two directions (y, p); therefore, in order to obtain head pose differences in the r direction, the collected face images are rotated clockwise and counterclockwise respectively to represent changes of the head-roll action. For each head pose, the position of the camera in the array and the image rotation angle together correspond to one head pose label (y_GT, p_GT, r_GT).
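As an illustration of the roll augmentation described above, the following sketch (assuming OpenCV and a roll label measured in degrees; neither is specified in the patent) rotates a face image and derives the corresponding r label:

    import cv2

    def roll_augment(image, y_gt, p_gt, angle_deg):
        # Rotate around the image center; positive angle = counterclockwise
        # in OpenCV's convention (an assumption, as is the label sign below).
        h, w = image.shape[:2]
        m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, 1.0)
        rotated = cv2.warpAffine(image, m, (w, h))
        # An image rotated by +angle corresponds to a head roll of -angle
        # relative to the camera; the sign convention here is hypothetical.
        return rotated, (y_gt, p_gt, -angle_deg)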
In order to obtain richer eye movements, while the face image dataset is collected, the data provider's eyes track a moving point on a display screen. The moving point contains random letters, and the data provider is required to recognize the letters to ensure that his or her eyes are indeed gazing at the moving point, thereby guaranteeing the accuracy of the data labels. Different eye movements are thus obtained, and the eye gaze vector label (φ_GT, θ_GT) is recorded for each position tracked by the eyes. While the face image dataset is acquired, the head pose and the corresponding three-dimensional eye gaze vector label in each face image are recorded.
In this embodiment, only RGB face images need to be acquired when the face image dataset is collected, without relying on other special devices, which reduces the application cost; moreover, the head remains free and unconstrained, which is more convenient than the prior art in which expensive infrared equipment is worn on the head.
Before the initial head pose detection network, the initial eye movement detection network and the initial three-dimensional gaze vector detection network are constructed, the geometric analysis and the coordinate systems adopted in this embodiment are first described. This embodiment employs two coordinate systems: the head coordinate system (X_h, Y_h, Z_h) and the camera coordinate system (X_c, Y_c, Z_c), with g denoting the gaze vector. To further simplify the representation of the head pose, the embodiment of the present invention adopts a three-dimensional representation of spherical rotation angles (y, p, r), where y denotes the yaw angle (rotation about the Y_h axis), p denotes the pitch angle (rotation about the X_h axis), and r denotes the roll angle (rotation about the Z_h axis). The eye movement is represented in a two-dimensional spherical coordinate system (θ, φ), where θ and φ denote the angles between the gaze vector and the horizontal and vertical directions of the head coordinate system, respectively.
The gaze vector expressed by the eye movement in the head coordinate system is:
g_h = [-cos(φ)sin(θ), sin(φ), -cos(φ)cos(θ)]^T
In the camera coordinate system (X_c, Y_c, Z_c), the camera center is defined as the origin, the depth direction of the camera is the Z_c axis, and the two directions of the plane perpendicular to the depth direction are the X_c and Y_c axes. Since the three-dimensional gaze vector finally output by the network is expressed in the camera coordinate system, the embodiment of the present invention defines g_c as the three-dimensional gaze vector in the camera coordinate system. From geometric knowledge, g_c depends on g_h, and g_h is defined in the head coordinate system, so the overall mapping relation of the embodiment of the present invention is obtained:
g_c = R(y, p, r) · g_h
where R(y, p, r) is the rotation from the head coordinate system to the camera coordinate system determined by the head pose.
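To make the geometry above concrete, the following sketch computes g_h from (θ, φ) and maps it into the camera coordinate system. The decomposition of R(y, p, r) into per-axis rotations and their composition order are assumptions, since the patent does not spell them out:

    import numpy as np

    def gaze_head(theta, phi):
        # g_h = [-cos(phi)sin(theta), sin(phi), -cos(phi)cos(theta)]^T
        return np.array([-np.cos(phi) * np.sin(theta),
                         np.sin(phi),
                         -np.cos(phi) * np.cos(theta)])

    def head_rotation(y, p, r):
        # Rotations about Y_h (yaw), X_h (pitch) and Z_h (roll).
        cy, sy = np.cos(y), np.sin(y)
        cp, sp = np.cos(p), np.sin(p)
        cr, sr = np.cos(r), np.sin(r)
        Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
        Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
        Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])
        return Rz @ Ry @ Rx  # composition order is an assumption

    def gaze_camera(y, p, r, theta, phi):
        # g_c = R(y, p, r) . g_h
        return head_rotation(y, p, r) @ gaze_head(theta, phi)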
Step S102: inputting the face image into a pre-constructed eye movement detection network to obtain the eye movement of the face image;
step S103: and inputting the head posture and the eyeball action into a pre-constructed three-dimensional sight line vector detection network to obtain a three-dimensional sight line direction vector of the eyeball in the face image.
And inputting the head posture and the eyeball action into a pre-constructed three-dimensional sight line vector detection network to obtain a three-dimensional sight line vector of the eyeball in the face image.
In order to reuse existing datasets, the network in this embodiment adopts an end-to-end structure: an initial head pose detection network and an initial eye movement detection network are established respectively, and the detection results of the two sub-networks are then input into a fully connected network to obtain the final three-dimensional gaze vector. The network is divided into two branches, the upper branch detecting the head pose and the lower branch detecting the eye movement; the three-dimensional gaze direction vector in the camera coordinate system is then obtained through a gaze conversion layer with geometric constraints.
Based on the above embodiment, referring to FIG. 2, FIG. 2 is a flowchart of a method for realizing eyeball three-dimensional gaze tracking according to a second embodiment of the present invention; the specific operation steps are as follows:
step S201: acquiring a plurality of face images of a data provider by using an area array camera array, and recording three-dimensional vector labels of head gestures and eyeball actions in each face image to obtain a first subset of the face images;
step S202: respectively rotating the face images in the first subset of the face images in the clockwise direction and the anticlockwise direction to obtain a second subset of the face images;
step S203: combining the first subset of the face images and the second subset of the face images to obtain a face image data set;
step S204: respectively training a pre-constructed initial head posture detection network and an initial eyeball action detection network by utilizing the face image data set to obtain a target head posture detection network and a target eyeball detection network;
the basic network structure of the initial head posture detection network adopts an Alex Net structure, and the Alex Net structure is simplified and modified correspondingly. The number of layers of the network is unchanged, but the number of channels of each layer is reduced properly, meanwhile, the local response normalization is changed into batch normalization, and the PReLU is adopted as the activation function. The network structure of the initial head pose detection network is as follows: c (3, 1, 6) -BN-PReLU-P (2, 2) -C (3, 1, 16) -BN-PReLU-P (2, 2) -C (3, 1, 24) -BN-PReLU-C (3, 1, 24) -PReLU (3, 1, 16) -BN-PReLU-P (2, 2) -FC (256) -FC (128) -PReLU-FC (3)
Wherein, C (k, s, C) represents convolution layer with convolution kernel size k, convolution step size s and channel number C, P (k, s) represents maximum pooling layer with kernel size k and step size s, BN represents batch normalization, PReLU represents activation function, FC (n) represents full connection layer, and number of neurons is n.
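A minimal PyTorch sketch of this head-pose branch follows. PyTorch itself, the padding of 1 for the 3×3 convolutions, and the use of LazyLinear to sidestep the unspecified input resolution are all assumptions rather than details fixed by the patent:

    import torch.nn as nn

    class HeadPoseNet(nn.Module):
        """C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU
        -C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3)."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 6, 3, 1, 1), nn.BatchNorm2d(6), nn.PReLU(),
                nn.MaxPool2d(2, 2),
                nn.Conv2d(6, 16, 3, 1, 1), nn.BatchNorm2d(16), nn.PReLU(),
                nn.MaxPool2d(2, 2),
                nn.Conv2d(16, 24, 3, 1, 1), nn.BatchNorm2d(24), nn.PReLU(),
                nn.Conv2d(24, 24, 3, 1, 1), nn.PReLU(),
                nn.Conv2d(24, 16, 3, 1, 1), nn.BatchNorm2d(16), nn.PReLU(),
                nn.MaxPool2d(2, 2),
            )
            self.regressor = nn.Sequential(
                nn.Flatten(),
                nn.LazyLinear(256),           # FC(256); input size left open
                nn.Linear(256, 128), nn.PReLU(),
                nn.Linear(128, 3),            # (y, p, r)
            )
        def forward(self, x):
            return self.regressor(self.features(x))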
The input of the eye movement detection network is the eye region cropped from the original face image, divided into a left-eye part and a right-eye part. Since the two sub-networks are completely symmetric, only one of them is described in detail below. The eye image blocks are resized to the same size of 36×36 and then passed through a convolutional neural network and a fully connected network. The structure of the initial eye movement detection network is as follows: C(11,2,96)-BN-PReLU-P(2,2)-C(5,1,256)-BN-PReLU-P(2,2)-C(3,1,384)-BN-PReLU-P(2,2)-C(1,1,64)-BN-PReLU-P(2,2)-FC(128)-FC(2).
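Under the same assumptions, a sketch of one eye branch (left and right are symmetric); the padding values are chosen so that a 36×36 patch survives the four pooling stages and are not given in the patent:

    import torch.nn as nn

    class EyeNet(nn.Module):
        """C(11,2,96)-BN-PReLU-P(2,2)-C(5,1,256)-BN-PReLU-P(2,2)-C(3,1,384)
        -BN-PReLU-P(2,2)-C(1,1,64)-BN-PReLU-P(2,2)-FC(128)-FC(2)."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 96, 11, 2, 5), nn.BatchNorm2d(96), nn.PReLU(),
                nn.MaxPool2d(2, 2),
                nn.Conv2d(96, 256, 5, 1, 2), nn.BatchNorm2d(256), nn.PReLU(),
                nn.MaxPool2d(2, 2),
                nn.Conv2d(256, 384, 3, 1, 1), nn.BatchNorm2d(384), nn.PReLU(),
                nn.MaxPool2d(2, 2),
                nn.Conv2d(384, 64, 1, 1, 0), nn.BatchNorm2d(64), nn.PReLU(),
                nn.MaxPool2d(2, 2),
            )
            self.regressor = nn.Sequential(
                nn.Flatten(), nn.LazyLinear(128), nn.Linear(128, 2))  # (θ, φ)
        def forward(self, x):  # x: (N, 3, 36, 36) eye patch
            return self.regressor(self.features(x))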
Step S205: detecting each face image in the face image dataset with the target head pose detection network and the target eye movement detection network, to obtain the head pose and the eye movement of each face image;
Step S206: inputting the head pose and the eye movement of each face image in the face image dataset into a pre-constructed initial three-dimensional gaze vector detection network for training, to obtain the target three-dimensional gaze vector detection network;
The initial three-dimensional gaze vector detection network takes the (y, p, r) obtained by the target head pose detection network and the (θ, φ) obtained by the target eye movement detection network as its input. It is a two-layer fully connected network: the first layer has 128 neurons, and the last layer has 3 neurons, corresponding to the three-dimensional gaze vector.
When the initial head pose detection network and the initial eye movement detection network are trained, the loss function Loss_1 = Loss_h + Loss_e is the sum of the loss function Loss_h of the initial head pose detection network and the loss function Loss_e of the initial eye movement detection network.
When the pre-established initial three-dimensional gaze vector detection network is trained, the loss function Loss_2 = Loss_1 + Loss_g = Loss_h + Loss_e + Loss_g is the sum of the loss function Loss_1 and the loss function Loss_g of the initial three-dimensional gaze vector detection network.
Loss_h = ||h - h_GT||^2, h = {y, p, r}
Loss_e = ||e - e_GT||^2, e = {φ, θ}
Loss_g = ||g_c - g_c^GT||^2, g_c = {x, y, z}
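Expressed as code under the same assumptions (inputs are batched torch tensors; the batch-mean reduction is an assumption, since the patent only gives the per-sample squared errors):

    def gaze_losses(h, h_gt, e, e_gt, g_c, g_c_gt):
        # Squared-error terms over batched tensors of shape (N, d).
        loss_h = ((h - h_gt) ** 2).sum(dim=1).mean()      # h = (y, p, r)
        loss_e = ((e - e_gt) ** 2).sum(dim=1).mean()      # e = (φ, θ)
        loss_g = ((g_c - g_c_gt) ** 2).sum(dim=1).mean()  # g_c = (x, y, z)
        loss_1 = loss_h + loss_e   # stage 1: train the two branches
        loss_2 = loss_1 + loss_g   # stage 2: end-to-end training
        return loss_1, loss_2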
Step S207: inputting a face image to be detected into the target head pose detection network to obtain the head pose in the face image to be detected;
Step S208: inputting the face image to be detected into the target eye movement detection network to obtain the eye movement of the face image to be detected;
Step S209: inputting the head pose of the face image to be detected and the eye movement of the face image to be detected into the target three-dimensional gaze vector detection network, to obtain the three-dimensional gaze direction vector of the eyeball in the face image to be detected.
In the prior art, eyeball recognition only performs two-dimensional labeling of the eyeball center position, and finally only two-dimensional information of the eyeball can be obtained, which limits its applications. In this embodiment, the network is trained end-to-end in stages; in the first training stage, existing head pose datasets and eye movement datasets can be fully utilized, which greatly enlarges the training data and gives the deep network in this embodiment better generalization ability.
Referring to FIG. 3, FIG. 3 is a structural block diagram of an apparatus for realizing eyeball three-dimensional gaze tracking according to an embodiment of the present invention; the apparatus may specifically include:
the head pose detection module 100 is configured to input a human face image to be detected into a pre-constructed head pose detection network, so as to obtain a head pose in the human face image;
an eyeball motion detection module 200, configured to input the face image into a pre-constructed eyeball motion detection network, so as to obtain an eyeball motion of the face image;
the three-dimensional sight line detection module 300 is configured to input the head pose and the eyeball motion to a pre-constructed three-dimensional sight line vector detection network to obtain a three-dimensional sight line direction vector of the eyeball in the face image.
The apparatus for realizing three-dimensional eye gaze tracking of this embodiment is used to implement the aforementioned method for realizing three-dimensional eye gaze tracking. Therefore, for the specific implementation of the apparatus, reference may be made to the foregoing embodiments of the method. For example, the head pose detection module 100, the eye movement detection module 200 and the three-dimensional gaze detection module 300 are used to implement steps S101, S102 and S103 of the method respectively, so their specific implementations can be found in the descriptions of the corresponding embodiments and are not repeated here.
An embodiment of the present invention further provides a device for realizing three-dimensional eye gaze tracking, comprising: a memory for storing a computer program; and a processor for implementing the steps of the above method for realizing three-dimensional eye gaze tracking when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above method for realizing three-dimensional eye gaze tracking.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to each other. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively simple, and the relevant points can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The method, apparatus, device and computer-readable storage medium for realizing three-dimensional eye gaze tracking provided by the present invention have been described in detail above. The principles and embodiments of the present invention are explained herein with specific examples, which are provided only to help understand the method and its core idea. It should be noted that, for those of ordinary skill in the art, various improvements and modifications can be made to the present invention without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (8)

1. A method for realizing three-dimensional eye gaze tracking, characterized by comprising the following steps:
inputting a face image to be detected into a pre-constructed head pose detection network to obtain the head pose in the face image;
inputting the face image into a pre-constructed eye movement detection network to obtain the eye movement of the face image;
inputting the head pose and the eye movement into a pre-constructed three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image;
wherein, before the inputting the face image to be detected into the pre-constructed head pose detection network to obtain the head pose in the face image, the method comprises: collecting a plurality of face images with three-dimensional labels of head pose and eye gaze to construct a face image dataset, the face images being RGB images; constructing an initial head pose detection network and an initial eye movement detection network; and training the initial head pose detection network and the initial eye movement detection network respectively with the face image dataset to obtain the trained head pose detection network and eye movement detection network;
the network structure of the initial head pose detection network is as follows:
C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3);
the structure of the initial eye movement detection network is as follows:
C(11,2,96)-BN-PReLU-P(2,2)-C(5,1,256)-BN-PReLU-P(2,2)-C(3,1,384)-BN-PReLU-P(2,2)-C(1,1,64)-BN-PReLU-P(2,2)-FC(128)-FC(2);
wherein C(k,s,c) denotes a convolution layer with kernel size k, stride s and c channels, P(k,s) denotes a max pooling layer with kernel size k and stride s, BN denotes batch normalization, PReLU denotes the activation function, and FC(n) denotes a fully connected layer with n neurons;
the inputting the head pose and the eye movement into the pre-constructed three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image comprises:
detecting the face images in the face image dataset with the head pose detection network and the eye movement detection network respectively, to obtain the head pose and the eye movement of each face image;
training a pre-established initial three-dimensional gaze vector detection network with the head poses and eye movements of the face images, to obtain the trained three-dimensional gaze vector detection network;
wherein the initial three-dimensional gaze vector detection network is a two-layer fully connected network, the first layer having 128 neurons and the last layer having 3 neurons, corresponding to the three-dimensional gaze vector.
2. The method according to claim 1, wherein the collecting a plurality of face images with three-dimensional labels of head pose and eye gaze to construct a face image dataset comprises:
acquiring face images of a data provider with each camera in an area-array camera array respectively, to obtain a first subset of face images;
wherein the face images acquired by each row of cameras in the area-array camera array represent different head poses of the data provider in the y direction;
the face images acquired by each column of cameras in the area-array camera array represent different head poses of the data provider in the p direction;
rotating the face images acquired by the area-array camera array clockwise and counterclockwise respectively, to obtain a second subset of face images representing different head poses of the data provider in the r direction;
and merging the first subset of face images and the second subset of face images to obtain the face image dataset.
3. The method according to claim 2, wherein the acquiring face images of the data provider with each camera in the area-array camera array respectively comprises:
recording, while the face images of the data provider are collected, the moving point on the display screen at which the data provider's eyes are gazing, so as to determine the three-dimensional gaze vector label of the data provider's eye gaze, and recording the head pose in each face image at the same time.
4. The method according to claim 1, wherein the training the initial head pose detection network and the initial eye movement detection network respectively with the face image dataset comprises:
training the initial head pose detection network and the initial eye movement detection network with the face image dataset;
wherein the loss function Loss_1 = Loss_h + Loss_e is the sum of the loss function Loss_h of the initial head pose detection network and the loss function Loss_e of the initial eye movement detection network.
5. The method according to claim 4, wherein, in the inputting the head pose and the eye movement into the pre-constructed three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image:
the loss function Loss_2 = Loss_1 + Loss_g = Loss_h + Loss_e + Loss_g is the sum of the loss function Loss_1 and the loss function Loss_g of the initial three-dimensional gaze vector detection network.
6. An apparatus for realizing three-dimensional eye gaze tracking, characterized by comprising:
a head pose detection module, configured to input a face image to be detected into a pre-constructed head pose detection network to obtain the head pose in the face image; wherein, before the face image to be detected is input into the pre-constructed head pose detection network to obtain the head pose in the face image: a plurality of face images with three-dimensional labels of head pose and eye gaze are acquired to construct a face image dataset, the face images being RGB images; an initial head pose detection network and an initial eye movement detection network are constructed; and the initial head pose detection network and the initial eye movement detection network are trained respectively with the face image dataset to obtain the trained head pose detection network and eye movement detection network; the network structure of the initial head pose detection network is: C(3,1,6)-BN-PReLU-P(2,2)-C(3,1,16)-BN-PReLU-P(2,2)-C(3,1,24)-BN-PReLU-C(3,1,24)-PReLU-C(3,1,16)-BN-PReLU-P(2,2)-FC(256)-FC(128)-PReLU-FC(3); the structure of the initial eye movement detection network is: C(11,2,96)-BN-PReLU-P(2,2)-C(5,1,256)-BN-PReLU-P(2,2)-C(3,1,384)-BN-PReLU-P(2,2)-C(1,1,64)-BN-PReLU-P(2,2)-FC(128)-FC(2); wherein C(k,s,c) denotes a convolution layer with kernel size k, stride s and c channels, P(k,s) denotes a max pooling layer with kernel size k and stride s, BN denotes batch normalization, PReLU denotes the activation function, and FC(n) denotes a fully connected layer with n neurons;
an eye movement detection module, configured to input the face image into a pre-constructed eye movement detection network to obtain the eye movement of the face image;
a three-dimensional gaze detection module, configured to input the head pose and the eye movement into a pre-constructed three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image; wherein the inputting the head pose and the eye movement into the pre-constructed three-dimensional gaze vector detection network to obtain the three-dimensional gaze direction vector of the eyeball in the face image comprises: detecting the face images in the face image dataset with the head pose detection network and the eye movement detection network respectively, to obtain the head pose and the eye movement of each face image; and training a pre-established initial three-dimensional gaze vector detection network with the head poses and eye movements of the face images, to obtain the trained three-dimensional gaze vector detection network; the initial three-dimensional gaze vector detection network is a two-layer fully connected network, the first layer having 128 neurons and the last layer having 3 neurons, corresponding to the three-dimensional gaze vector.
7. A device for realizing three-dimensional eye gaze tracking, characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method for realizing three-dimensional eye gaze tracking according to any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the method for realizing three-dimensional eye gaze tracking according to any one of claims 1 to 5.
CN201811375929.7A 2018-11-19 2018-11-19 Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium Active CN109508679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811375929.7A CN109508679B (en) 2018-11-19 2018-11-19 Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium


Publications (2)

Publication Number Publication Date
CN109508679A CN109508679A (en) 2019-03-22
CN109508679B true CN109508679B (en) 2023-02-10

Family

ID=65749029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811375929.7A Active CN109508679B (en) 2018-11-19 2018-11-19 Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium

Country Status (1)

Country Link
CN (1) CN109508679B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110058694B (en) 2019-04-24 2022-03-25 腾讯科技(深圳)有限公司 Sight tracking model training method, sight tracking method and sight tracking device
CN110191234B (en) * 2019-06-21 2021-03-26 中山大学 Intelligent terminal unlocking method based on fixation point analysis
CN110555426A (en) * 2019-09-11 2019-12-10 北京儒博科技有限公司 Sight line detection method, device, equipment and storage medium
CN110909611B (en) * 2019-10-29 2021-03-05 深圳云天励飞技术有限公司 Method and device for detecting attention area, readable storage medium and terminal equipment
CN111178278B (en) * 2019-12-30 2022-04-08 上海商汤临港智能科技有限公司 Sight direction determining method and device, electronic equipment and storage medium
CN111847147B (en) * 2020-06-18 2023-04-18 闽江学院 Non-contact eye-movement type elevator floor input method and device
CN112114671A (en) * 2020-09-22 2020-12-22 上海汽车集团股份有限公司 Human-vehicle interaction method and device based on human eye sight and storage medium
CN114529731B (en) * 2020-10-30 2024-07-12 北京眼神智能科技有限公司 Face feature point positioning and attribute analysis method, device, storage medium and equipment
CN112465862B (en) * 2020-11-24 2024-05-24 西北工业大学 Visual target tracking method based on cross-domain depth convolution neural network


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5044237B2 (en) * 2006-03-27 2012-10-10 富士フイルム株式会社 Image recording apparatus, image recording method, and image recording program
CN103809737A (en) * 2012-11-13 2014-05-21 华为技术有限公司 Method and device for human-computer interaction
WO2017013913A1 (en) * 2015-07-17 2017-01-26 ソニー株式会社 Gaze detection device, eyewear terminal, gaze detection method, and program

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391574A (en) * 2014-11-14 2015-03-04 京东方科技集团股份有限公司 Sight processing method, sight processing system, terminal equipment and wearable equipment
CN105740846A (en) * 2016-03-02 2016-07-06 河海大学常州校区 Horizontal visual angle estimation and calibration method based on depth camera
JP2017213191A (en) * 2016-05-31 2017-12-07 富士通株式会社 Sight line detection device, sight line detection method and sight line detection program
CN106598221A (en) * 2016-11-17 2017-04-26 电子科技大学 Eye key point detection-based 3D sight line direction estimation method
CN108229284A (en) * 2017-05-26 2018-06-29 北京市商汤科技开发有限公司 Eye-controlling focus and training method and device, system, electronic equipment and storage medium
CN107818310A (en) * 2017-11-03 2018-03-20 电子科技大学 A kind of driver attention's detection method based on sight
CN108171218A (en) * 2018-01-29 2018-06-15 深圳市唯特视科技有限公司 A kind of gaze estimation method for watching network attentively based on appearance of depth

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周小龙, 汤帆扬, 管秋, 华敏. 基于3D人眼模型的视线跟踪技术综述 [A survey of gaze tracking techniques based on 3D eye models]. 计算机辅助设计与图形学学报 (Journal of Computer-Aided Design & Computer Graphics), Vol. 29, No. 9, 2017-09-15, pp. 1579-1589. *

Also Published As

Publication number Publication date
CN109508679A (en) 2019-03-22

Similar Documents

Publication Publication Date Title
CN109508679B (en) Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium
CN108229284B (en) Sight tracking and training method and device, system, electronic equipment and storage medium
WO2020125499A1 (en) Operation prompting method and glasses
US10157477B2 (en) Robust head pose estimation with a depth camera
CN110363133B (en) Method, device, equipment and storage medium for sight line detection and video processing
Gorodnichy et al. Nouse 'use your nose as a mouse' perceptual vision technology for hands-free games and interfaces
Qiao et al. Viewport-dependent saliency prediction in 360 video
US11703949B2 (en) Directional assistance for centering a face in a camera field of view
WO2022156640A1 (en) Gaze correction method and apparatus for image, electronic device, computer-readable storage medium, and computer program product
WO2020091891A1 (en) Cross-domain image translation
US20200105013A1 (en) Robust Head Pose Estimation with a Depth Camera
US11574424B2 (en) Augmented reality map curation
CN111710036A (en) Method, device and equipment for constructing three-dimensional face model and storage medium
Schauerte et al. Saliency-based identification and recognition of pointed-at objects
US20170316610A1 (en) Assembly instruction system and assembly instruction method
JP2023545190A (en) Image line-of-sight correction method, device, electronic device, and computer program
CN111353336B (en) Image processing method, device and equipment
CN111046734A (en) Multi-modal fusion sight line estimation method based on expansion convolution
Sidenko et al. Eye-tracking technology for the analysis of dynamic data
Perra et al. Adaptive eye-camera calibration for head-worn devices
JP2022095332A (en) Learning model generation method, computer program and information processing device
Funes Mora et al. Eyediap database: Data description and gaze tracking evaluation benchmarks
CN118076984A (en) Method and apparatus for line of sight estimation
Li et al. Estimating gaze points from facial landmarks by a remote spherical camera
Kumano et al. Automatic gaze analysis in multiparty conversations based on collective first-person vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant