CN108171218A - Gaze estimation method based on a deep appearance-based gaze network - Google Patents
Gaze estimation method based on a deep appearance-based gaze network
- Publication number
- CN108171218A (application CN201810081808.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- facial
- dimensional
- camera
- eye
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/19—Sensors therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/193—Preprocessing; Feature extraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/197—Matching; Classification
Abstract
The present invention proposes a gaze estimation method based on a deep appearance-based gaze network. Its main contents include: a gaze dataset, the gaze network, and cross-dataset evaluation. The process is as follows: a large number of images from different participants is collected as the gaze dataset, and facial landmarks are manually annotated in a subset of the dataset; face calibration is performed on input images captured by a monocular RGB camera, and landmarks are located with face-detection and facial-landmark-detection methods; a generic three-dimensional face shape is fitted to estimate the three-dimensional pose of the detected face; a spatial normalization technique warps the head pose and eye images into a normalized training space; and a convolutional neural network learns the mapping from head pose and eye images to the three-dimensional gaze direction in the camera coordinate system. The present invention detects facial landmarks with a continuous conditional neural network model and performs three-dimensional pose estimation with a mean facial shape; it is suitable for gaze estimation under varying environments and improves the accuracy of the estimation results.
Description
Technical field
The present invention relates to the field of gaze estimation, and more particularly to a gaze estimation method based on a deep appearance-based gaze network.
Background technology
Gaze estimation infers the direction of the human gaze in an image, computing and returning the coordinates of the eyeball centers and the gaze direction vector of both eyes. It enables real-time tracking of the human gaze in video and is widely used in fields such as public security, traffic, medicine and military reconnaissance. Specifically, in the police field, estimating the gaze direction makes it possible to infer a person's area of interest or target of attention and to further study the person's psychological activity, and it can be used for lie detection during the interrogation of suspects. In the traffic field, gaze estimation automatically monitors the driver's gaze direction; once the driver looks at a mobile phone, dozes off, or otherwise looks away from the road ahead, the system issues an alarm to remind the driver, thereby providing driving assistance. In the medical field, gaze estimation can on the one hand be mounted on a robot so that devices are controlled by estimating the user's gaze, providing convenience to the lives of disabled people; on the other hand, estimating a patient's gaze can support the diagnosis of cognitive disorders. In the military field, the motor behavior of the eyes can be used to control external equipment and systems, further improving the human-computer interaction level of military equipment. Because face images captured in daily life have complex backgrounds and are affected by conditions such as illumination, pose, gaze direction and personal appearance, most existing methods are only applicable to datasets with a specific background; once estimation crosses datasets, the results easily incur error, so accurate gaze estimation still presents certain challenges.
The present invention proposes a gaze estimation method based on a deep appearance-based gaze network. A large number of images from different participants is collected as the gaze dataset, and facial landmarks are manually annotated in a provided subset of the dataset. Face calibration is performed on input images captured by a monocular RGB camera, and landmarks are located with face-detection and facial-landmark-detection methods. A generic three-dimensional face shape is fitted to estimate the three-dimensional pose of the detected face; a spatial normalization technique warps the head pose and eye images into a normalized training space; and a convolutional neural network learns the mapping from head pose and eye images to the three-dimensional gaze direction in the camera coordinate system. The present invention detects facial landmarks with a continuous conditional neural network model, performs three-dimensional pose estimation with a mean facial shape, and evaluates the whole gaze estimation pipeline in real environments; it is suitable for gaze estimation under varying environments and improves the accuracy of the estimation results.
Invention content
For gaze estimation, the present invention proposes a gaze estimation method based on a deep appearance-based gaze network: a large number of images from different participants is collected as the gaze dataset, and facial landmarks are manually annotated in a provided subset of the dataset; face calibration is performed on input images captured by a monocular RGB camera, and landmarks are located with face-detection and facial-landmark-detection methods; a generic three-dimensional face shape is fitted to estimate the three-dimensional pose of the detected face; a spatial normalization technique warps the head pose and eye images into a normalized training space; and a convolutional neural network learns the mapping from head pose and eye images to the three-dimensional gaze direction in the camera coordinate system.
To solve the above problems, a gaze estimation method based on a deep appearance-based gaze network is proposed, whose main contents include:
(1) gaze dataset;
(2) gaze network;
(3) cross-dataset evaluation.
Regarding the gaze dataset: in order to evaluate unconstrained gaze estimation methods, the dataset needs to contain different illumination conditions, head poses, gaze directions and personal appearances. A large number of images from different participants is collected as the gaze dataset; each image includes a three-dimensional annotation of the fixation target and of the detected eye or head position. Facial landmarks are then manually annotated in a provided subset of the dataset to assess gaze estimation performance.
Further, for image acquisition, laptops are used as the collection devices: acquisition software runs on each participant's computer and collects calibration annotations using points moving on the screen, with a collection every 10 minutes. The acquisition software automatically asks the participant to look at 20 random positions shown as gray circles; the participant fixates on each acquisition position and presses the space bar to confirm just as the circle is about to disappear, and if a position is missed, the same screen position is recorded again. Because the computer models used for data collection differ, the on-screen gaze positions are converted to three-dimensional positions in the camera coordinate system. Before data collection, six three-dimensional facial landmarks of each participant are recorded with the camera in order to build a three-dimensional face model.
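The screen-to-camera conversion mentioned above can be sketched as follows; this is a minimal illustration assuming the screen plane has been calibrated against the camera once per laptop (the function name, origin offset and per-pixel pitch below are hypothetical values, not taken from the patent):

```python
# Hypothetical screen calibration: position of the screen's top-left pixel
# in camera coordinates (mm) and the physical size of one pixel (mm).
def screen_to_camera_mm(px, py, origin_mm=(-160.0, 10.0, 0.0),
                        pitch_mm=(0.2, 0.2)):
    """Convert an on-screen gaze target (pixels) to a 3D point in the
    camera coordinate system, assuming the screen lies in a known plane."""
    x = origin_mm[0] + px * pitch_mm[0]
    y = origin_mm[1] + py * pitch_mm[1]
    return (x, y, origin_mm[2])
```

Per-laptop calibration of the screen plane is what makes gaze targets from different computer models comparable in a common camera coordinate system.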
Further, for facial landmark annotation, an image subset is manually annotated with facial landmarks to assess the influence of facial alignment error on gaze estimation performance. 1500 left-eye images and 1500 right-eye images are randomly selected as the assessment subset. A facial landmark detection method generates six landmarks on each face image, including four eye-corner landmarks and two mouth-corner landmarks; the eye images are cropped from the manual facial landmarks, and the pupil centers are annotated.
Regarding the gaze network: face calibration is performed on input images captured by a monocular RGB camera, and landmarks are located with face-detection and facial-landmark-detection methods. A generic three-dimensional face shape is fitted to estimate the three-dimensional pose of the detected face. A spatial normalization technique warps the head pose and eye images into a normalized training space, and a convolutional neural network (CNN) learns the mapping from head pose and eye images to the three-dimensional gaze direction in the camera coordinate system.
Further, for face calibration, facial landmarks are detected with a continuous conditional neural network model, and three-dimensional pose estimation is performed with a mean facial shape F; the whole gaze estimation pipeline is evaluated in real environments. F is the average facial shape of all participants and consists of the three-dimensional positions of six facial landmarks. The head coordinate system is defined from the triangle connecting the midpoints of the eyes and the mouth:
(1) the x-axis is given by the line from the right-eye annotation center to the left-eye annotation center;
(2) the y-axis is perpendicular to the x-axis, along the line from the eyes to the mouth;
(3) the z-axis is perpendicular to the triangle plane and points away from the face.
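The head coordinate system defined by steps (1)-(3) above can be sketched as follows; this is a minimal numpy illustration of the stated construction (the function name, axis-sign conventions and toy landmark values are assumptions):

```python
import numpy as np

def head_coordinate_system(right_eye, left_eye, mouth):
    """Build the head rotation frame from the eye midpoints and the mouth
    midpoint: x along the eye line, z normal to the eye-mouth triangle,
    y completing the frame toward the mouth. Columns of the result are
    the x, y, z axes."""
    right_eye, left_eye, mouth = map(np.asarray, (right_eye, left_eye, mouth))
    x_axis = left_eye - right_eye
    x_axis = x_axis / np.linalg.norm(x_axis)
    eye_mid = 0.5 * (right_eye + left_eye)
    # z-axis: perpendicular to the triangle plane
    z_axis = np.cross(x_axis, mouth - eye_mid)
    z_axis = z_axis / np.linalg.norm(z_axis)
    # y-axis: perpendicular to x within the triangle plane, toward the mouth
    y_axis = np.cross(z_axis, x_axis)
    return np.stack([x_axis, y_axis, z_axis], axis=1)

# Toy landmark positions in millimetres (hypothetical).
R = head_coordinate_system([-30.0, 0.0, 0.0], [30.0, 0.0, 0.0], [0.0, 60.0, 10.0])
```

By construction the result is an orthonormal rotation matrix, which is what the pose-estimation step below expects.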
The three-dimensional rotation matrix R_r and translation vector t_r of the face model are obtained from the detected two-dimensional facial landmark points p, where p is a perspective projection. Given the three-dimensional model, the three-dimensional pose of the target is estimated from the corresponding two-dimensional projections in the image: a pose estimation algorithm provides an initial solution that fits F to the detected facial landmarks, and the pose is further refined by minimizing the projection distance.
Further, eye image normalization normalizes the eye images together with the head pose. Because the object pose has six degrees of freedom, an appearance-based gaze estimator would have to handle eye appearance variation in a six-dimensional space; but if the eye region is assumed to be a plane, arbitrary scaling and rotation of the camera can be compensated by a corresponding image warp, so the appearance-based estimation function only has to handle two degrees of freedom of appearance variation.
Further, the rotation steps are as follows:
(1) Given the head rotation matrix R_r, the eye position in the camera coordinate system is e_r = t_r + e_h, where e_h is the midpoint of the two eye corners in the head coordinate system and e_r represents the eye position. The normalization conversion matrix is M = SR, where M represents the three-dimensional scaling and rotation of the eye center in the camera coordinate system, R is the inverse of the camera rotation matrix, and S is the scaling that makes the camera face e_r.
(2) The scaling matrix is S = diag(1, 1, d_n/||e_r||), where d_n is the normalized distance of e_r from the camera coordinate system origin. The original camera projection matrix obtained from camera calibration is denoted C_r, and C_n is the normalized camera projection matrix.
(3) The same conversion is applied to the original image pixels through the perspective warp of the image transformation matrix W, where C_n = [f_x, 0, c_x; 0, f_y, c_y; 0, 0, 1], f is the focal length of the normalized camera and c is its principal point.
(4) The whole normalization process is applied to the right and left eyes in the same way, with e_r defined from the corresponding eye position, producing a set of eye images I. The head rotation matrix R_n = M R_r and the gaze angle vector g_n = M g_r lie in the normalized space, where g_r is the three-dimensional gaze vector originating at e_r in the original camera coordinate system; the normalized head rotation matrix R_n is converted to a three-dimensional rotation angle h_n.
(5) Because the rotation angle around the z-axis after normalization is zero, h_n is represented as a two-dimensional rotation vector h, and g_n, assuming unit length, as a two-dimensional rotation vector g. d_n is defined as 600 mm, the focal lengths f_x and f_y of the normalized camera projection matrix C_n are set to 960, the normalized eye image resolution in I is set to 60 × 36, and c_x and c_y are set to 30 and 18, respectively. After normalization, the eye images I are converted to grayscale and histogram-equalized, making the normalized eye images compatible across different datasets and facilitating cross-dataset evaluation.
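Steps (1)-(2) above can be sketched as follows: a minimal numpy illustration of building M = SR so that the normalized camera faces the eye center at the fixed distance d_n = 600 mm. The up-vector used to complete the rotation, the function name, and the toy eye position are assumptions, not specified by the patent:

```python
import numpy as np

def normalization_matrix(e_r, d_n=600.0):
    """Build M = S R: R rotates the camera so its z-axis points at the eye
    center e_r, and S = diag(1, 1, d_n / ||e_r||) rescales depth so the
    normalized camera sits at the fixed distance d_n."""
    e_r = np.asarray(e_r, dtype=float)
    z = e_r / np.linalg.norm(e_r)          # new z-axis looks at the eye center
    # complete the rotation with an assumed world up-vector (0, 1, 0)
    x = np.cross(np.array([0.0, 1.0, 0.0]), z)
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])                # rows are the new camera axes
    S = np.diag([1.0, 1.0, d_n / np.linalg.norm(e_r)])
    return S @ R

# Toy eye position in camera coordinates (mm, hypothetical).
M = normalization_matrix([50.0, -20.0, 500.0])
```

Applying M to e_r itself maps the eye center onto the normalized optical axis at depth 600 mm, which is the invariant the warp W then enforces on the image pixels.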
Further, regarding the network structure: the task of the CNN is to learn, in the normalized space, the mapping from the input features to the gaze angle g. To vary the distance to the target fixation plane under unconstrained conditions, the eye images are flipped horizontally and mirrored copies of h and g are created in the horizontal direction. The deep convolutional neural network architecture comprises 13 convolutional layers, two fully connected layers and one classification layer, among which there are five max-pooling layers. It takes a single-channel grayscale image with a resolution of 60 × 36 pixels as input, and the strides of the first and second pooling layers are changed from 2 to 1 to reflect the smaller input resolution. The output is the two-dimensional gaze angle vector composed of the yaw and pitch angles. The head pose information h is fed into the first fully connected layer, and the summed L2-norm losses of the distances between the predicted vectors and the ground-truth vectors g are used as the loss function.
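As a sanity check on the architecture just described, the spatial size of the feature maps can be traced through the five pooling layers. This assumes a VGG-16-style layout (3 × 3 pad-1 convolutions, which preserve size, interleaved with 2 × 2 max pools); the layout is an assumption consistent with "13 convolutional layers and 5 max-pooling layers", not stated explicitly by the patent:

```python
def pool(size, stride):
    """Output spatial size of a 2x2 max pool: (n - 2) // stride + 1 per axis."""
    return tuple((s - 2) // stride + 1 for s in size)

def vgg_gaze_spatial(size=(60, 36)):
    """Trace the 60 x 36 input through the five max pools; the first two
    use stride 1 (as the patent specifies), the rest stride 2. The 3x3
    pad-1 convolutions in between do not change the spatial size."""
    for stride in (1, 1, 2, 2, 2):
        size = pool(size, stride)
    return size
```

With the reduced strides, the small input still survives all five pools with a non-degenerate feature map, which is presumably why the strides were changed.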
Regarding cross-dataset evaluation: the network is initialized with the weights of a 16-layer deep convolutional neural network trained on the ImageNet dataset and then evaluated. 15000 iterations are performed over the entire network, with a batch size of 256 on the training set; the two momentum values of the solver are set to β_1 = 0.9 and β_2 = 0.9, and the initial learning rate of 0.00001 is multiplied by 0.1 after 5000 iterations.
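The training schedule above can be written down as a small step-decay helper. This is only an illustration of the stated hyper-parameters; reading "multiplied by 0.1 after 5000 iterations" as a decay applied every 5000 steps is an assumption, as are the constant names:

```python
# Stated hyper-parameters: batch size 256, Adam-style momentum terms
# beta1 = beta2 = 0.9, 15000 iterations in total.
BATCH_SIZE = 256
BETA1, BETA2 = 0.9, 0.9
TOTAL_ITERS = 15000

def learning_rate(iteration, base_lr=1e-5, decay=0.1, step=5000):
    """Step decay: multiply the base rate by `decay` every `step` iterations."""
    return base_lr * decay ** (iteration // step)
```

Under this reading, the rate is 1e-5 for the first 5000 iterations, 1e-6 for the next 5000, and 1e-7 for the last 5000.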
Description of the drawings
Fig. 1 is a framework diagram of the gaze estimation method based on a deep appearance-based gaze network of the present invention.
Fig. 2 is a network structure diagram of the gaze estimation method based on a deep appearance-based gaze network of the present invention.
Fig. 3 shows eye sample images of the gaze estimation method based on a deep appearance-based gaze network of the present invention.
Fig. 4 is a head coordinate system diagram of the gaze estimation method based on a deep appearance-based gaze network of the present invention.
Specific embodiment
It should be noted that in the absence of conflict, the feature in embodiment and embodiment in the application can phase
It mutually combines, the present invention is described in further detail in the following with reference to the drawings and specific embodiments.
Fig. 1 is a framework diagram of the gaze estimation method based on a deep appearance-based gaze network of the present invention. It mainly includes the gaze dataset, the gaze network, and cross-dataset evaluation.
Fig. 2 is a network structure diagram of the gaze estimation method based on a deep appearance-based gaze network of the present invention. Face calibration is performed on input images captured by a monocular RGB camera, and landmarks are located with face-detection and facial-landmark-detection methods; a generic three-dimensional face shape is fitted to estimate the three-dimensional pose of the detected face; a spatial normalization technique warps the head pose and eye images into a normalized training space; and a convolutional neural network (CNN) learns the mapping from head pose and eye images to the three-dimensional gaze direction in the camera coordinate system.
Fig. 3 shows eye sample images of the gaze estimation method based on a deep appearance-based gaze network of the present invention, i.e. normalized eye sample images from each dataset. Figures (a) and (b) are images from the gaze dataset, while figures (c) and (d) are images from other datasets; each group of images is randomly selected with roughly the same gaze direction. Compared with (c) and (d), figure (b) shows that the gaze dataset contains larger appearance variation in the eye region, and figure (a) shows that the image variation of participants wearing glasses is the most apparent.
Fig. 4 is a head coordinate system diagram of the gaze estimation method based on a deep appearance-based gaze network of the present invention. The head coordinate system is defined from the triangle connecting the three midpoints of the eye and mouth annotations: the x-axis passes through the two eye midpoints, the y-axis is perpendicular to the x-axis within the triangle plane, and the z-axis is perpendicular to the triangle plane.
For those skilled in the art, the present invention is not limited to the details of the above embodiments, and the present invention can be realized in other specific forms without departing from its spirit and scope. In addition, those skilled in the art may make various modifications and variations to the present invention without departing from its spirit and scope, and these improvements and modifications should also be regarded as within the protection scope of the present invention. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Claims (10)
1. A gaze estimation method based on a deep appearance-based gaze network, characterized in that it mainly includes a gaze dataset (1); a gaze network (2); and cross-dataset evaluation (3).
2. The gaze dataset (1) according to claim 1, characterized in that, in order to evaluate unconstrained gaze estimation methods, the dataset needs to contain different illumination conditions, head poses, gaze directions and personal appearances; a large number of images from different participants is collected as the gaze dataset, each image including a three-dimensional annotation of the fixation target and of the detected eye or head position; facial landmarks are then manually annotated in a provided subset of the dataset to assess gaze estimation performance.
3. Image acquisition according to claim 2, characterized in that laptops are used as the collection devices: acquisition software runs on the participant's computer and collects calibration annotations using points moving on the screen, with a collection every 10 minutes; the acquisition software automatically asks the participant to look at 20 random positions shown as gray circles, the participant fixates on each acquisition position and presses the space bar to confirm just as the circle is about to disappear, and if a position is missed, the same screen position is recorded again; because the computer models used for data collection differ, the on-screen gaze positions are converted to three-dimensional positions in the camera coordinate system; before data collection, six three-dimensional facial landmarks of the participant are recorded with the camera in order to build a three-dimensional face model.
4. Facial landmark annotation according to claim 2, characterized in that an image subset is manually annotated with facial landmarks to assess the influence of facial alignment error on gaze estimation performance; 1500 left-eye images and 1500 right-eye images are randomly selected as the assessment subset; a facial landmark detection method generates six landmarks on each face image, including four eye-corner landmarks and two mouth-corner landmarks; the eye images are cropped from the manual facial landmarks, and the pupil centers are annotated.
5. The gaze network (2) according to claim 1, characterized in that face calibration is performed on input images captured by a monocular RGB camera, and landmarks are located with face-detection and facial-landmark-detection methods; a generic three-dimensional face shape is fitted to estimate the three-dimensional pose of the detected face; a spatial normalization technique warps the head pose and eye images into a normalized training space; and a convolutional neural network (CNN) learns the mapping from head pose and eye images to the three-dimensional gaze direction in the camera coordinate system.
6. Face calibration according to claim 5, characterized in that facial landmarks are detected with a continuous conditional neural network model and three-dimensional pose estimation is performed with a mean facial shape F, the whole gaze estimation pipeline being evaluated in real environments; F is the average facial shape of all participants and consists of the three-dimensional positions of six facial landmarks; the head coordinate system is defined from the triangle connecting the midpoints of the eyes and the mouth:
(1) the x-axis is given by the line from the right-eye annotation center to the left-eye annotation center;
(2) the y-axis is perpendicular to the x-axis, along the line from the eyes to the mouth;
(3) the z-axis is perpendicular to the triangle plane and points away from the face;
the three-dimensional rotation matrix R_r and translation vector t_r of the face model are obtained from the detected two-dimensional facial landmark points p, where p is a perspective projection; given the three-dimensional model, the three-dimensional pose of the target is estimated from the corresponding two-dimensional projections in the image; a pose estimation algorithm provides an initial solution that fits F to the detected facial landmarks, and the pose is further refined by minimizing the projection distance.
7. Eye image normalization according to claim 5, characterized in that the eye images and the head pose are normalized; because the object pose has six degrees of freedom, an appearance-based gaze estimator would have to handle eye appearance variation in a six-dimensional space, but if the eye region is assumed to be a plane, arbitrary scaling and rotation of the camera can be compensated by a corresponding image warp, so the appearance-based estimation function only has to handle two degrees of freedom of appearance variation.
8. The rotation according to claim 7, characterized in that the rotation proceeds as follows:
(1) given the head rotation matrix Rr, the eye position in the camera coordinate system is er = tr + eh, where eh is the midpoint of the two eye corners in the head coordinate system and er denotes the eye position; the normalization transformation matrix is M = SR, where M represents the three-dimensional scaling and rotation of the eye center in the camera coordinate system, R is the inverse of the camera rotation matrix, and S is the scaling matrix, so that the camera is rotated to look at er;
(2) the scaling matrix is S = diag(1, 1, dn/||er||), where dn is the normalized distance of er from the origin of the camera coordinate system; the original camera projection matrix obtained from camera calibration is denoted Cr, and Cn is the normalized camera projection matrix;
(3) the same transformation is applied to the original image pixels as a perspective warp with the image transformation matrix W = Cn·M·Cr⁻¹, where Cn = [fx, 0, cx; 0, fy, cy; 0, 0, 1], f is the focal length of the normalized camera and c its principal point;
(4) the whole normalization process is applied to the right eye and the left eye in the same way, with er defined from the corresponding eye position, producing a set of eye images I; the head rotation matrix Rn = M·Rr and the gaze angle vector gn = M·gr lie in the normalized space, where gr is the original three-dimensional gaze vector originating at er in the camera coordinate system; the normalized head rotation matrix Rn is converted to a three-dimensional rotation angle hn;
(5) since the rotation about the z-axis is zero after normalization, hn is represented by a two-dimensional rotation vector h, and gn, assumed to be of unit length, by a two-dimensional gaze angle vector g; dn is defined as 600 mm, the focal lengths fx and fy of the normalized camera projection matrix Cn are 960, the resolution of the normalized eye images in I is set to 60 × 36, and cx and cy are set to 30 and 18, respectively; after normalization the eye images I are converted to grayscale and histogram-equalized, which makes the normalized eye images compatible across different datasets and facilitates cross-dataset evaluation.
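The scaling-and-rotation normalization of steps (1)–(2) can be sketched as below. The rotation R is constructed here so that the camera z-axis points at er; the x-axis choice is arbitrary in this sketch, and all names and values are illustrative rather than from the patent:

```python
import numpy as np

def normalization_matrix(er, dn=600.0):
    """M = S @ R: rotate the camera to look at the eye center er,
    then scale along z so that er ends up at distance dn."""
    dist = np.linalg.norm(er)
    z = er / dist                              # new optical axis
    x = np.cross([0.0, 1.0, 0.0], z)           # any axis orthogonal to z
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])                    # inverse camera rotation
    S = np.diag([1.0, 1.0, dn / dist])         # S = diag(1, 1, dn/||er||)
    return S @ R

er = np.array([50.0, -30.0, 550.0])            # eye center, millimetres
M = normalization_matrix(er)
print(np.allclose(M @ er, [0.0, 0.0, 600.0]))  # → True
```

The check at the end confirms the defining property of M: the eye center is moved onto the normalized camera's optical axis at the fixed distance dn = 600 mm.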
9. The network structure according to claim 5, characterized in that the task of the CNN is to learn, in the normalized space, the mapping from the input features to the gaze angle vector g under unconstrained conditions with varying distance to the target gaze plane; the eye images are flipped horizontally and mirrored versions of h and g are created accordingly. The deep convolutional neural network architecture comprises 13 convolutional layers, two fully connected layers and one classification layer, with five max-pooling layers; a grayscale single-channel image with a resolution of 60 × 36 pixels is used as input; the stride of the first and second pooling layers is changed from 2 to 1 to reflect the smaller input resolution; the two-dimensional gaze angle vector g = (φ, θ), composed of the yaw angle φ and the pitch angle θ, is the output; the head pose information h is fed into the first fully connected layer; and the loss function is the sum of the L2 distances between the predicted vectors ĝ and the ground-truth vectors g.
10. The cross-dataset evaluation (III) according to claim 1, characterized in that the weights of a 16-layer deep convolutional neural network pre-trained on the ImageNet dataset are used; 15000 training iterations are performed over the entire network; the batch size on the training set is set to 256; the two momentum values of the solver are set to β1 = 0.9 and β2 = 0.9; and the initial learning rate is 0.00001, multiplied by 0.1 every 5000 iterations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810081808.5A CN108171218A (en) | 2018-01-29 | 2018-01-29 | A gaze estimation method based on a deep appearance-based gaze network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108171218A true CN108171218A (en) | 2018-06-15 |
Family
ID=62515678
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810081808.5A Withdrawn CN108171218A (en) | 2018-01-29 | 2018-01-29 | A gaze estimation method based on a deep appearance-based gaze network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108171218A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106547341A (en) * | 2015-09-21 | 2017-03-29 | 现代自动车株式会社 | The method of gaze tracker and its tracing fixation |
CN107368774A (en) * | 2016-03-31 | 2017-11-21 | 富士通株式会社 | Gaze detection equipment and gaze detection method |
2018-01-29: CN CN201810081808.5A patent/CN108171218A/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
XUCONG ZHANG, YUSUKE SUGANO ET AL.: "MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation", arXiv * |
Cited By (59)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109033957A (en) * | 2018-06-20 | 2018-12-18 | 同济大学 | A kind of gaze estimation method based on quadratic polynomial |
CN108960302B (en) * | 2018-06-20 | 2021-06-04 | 同济大学 | Head attitude estimation method based on random forest |
CN109033957B (en) * | 2018-06-20 | 2021-05-11 | 同济大学 | Sight estimation method based on quadratic polynomial |
CN108960302A (en) * | 2018-06-20 | 2018-12-07 | 同济大学 | A kind of head pose estimation method based on random forest |
CN110826374B (en) * | 2018-08-10 | 2023-07-14 | Oppo广东移动通信有限公司 | Method and device for monitoring eye gazing time, storage medium and electronic equipment |
CN110826367A (en) * | 2018-08-10 | 2020-02-21 | 北京魔门塔科技有限公司 | Human face image acquisition system for data analysis |
CN110826374A (en) * | 2018-08-10 | 2020-02-21 | Oppo广东移动通信有限公司 | Method and device for monitoring human eye fixation time, storage medium and electronic equipment |
CN109492514A (en) * | 2018-08-28 | 2019-03-19 | 初速度(苏州)科技有限公司 | A kind of method and system in one camera acquisition human eye sight direction |
WO2020042345A1 (en) * | 2018-08-28 | 2020-03-05 | 初速度(苏州)科技有限公司 | Method and system for acquiring line-of-sight direction of human eyes by means of single camera |
CN109359512A (en) * | 2018-08-28 | 2019-02-19 | 深圳壹账通智能科技有限公司 | Eyeball position method for tracing, device, terminal and computer readable storage medium |
CN113227878A (en) * | 2018-08-31 | 2021-08-06 | 目察科技股份有限公司 | Method and system for gaze estimation |
CN109298786A (en) * | 2018-09-13 | 2019-02-01 | 北京旷视科技有限公司 | Mark accuracy rate appraisal procedure and device |
JP2021530823A (en) * | 2018-09-29 | 2021-11-11 | ベイジン センスタイム テクノロジー デベロップメント カンパニー, リミテッド | Neural network training methods, line-of-sight tracking methods and devices, and electronic devices |
JP7146087B2 (en) | 2018-09-29 | 2022-10-03 | ベイジン・センスタイム・テクノロジー・デベロップメント・カンパニー・リミテッド | Neural network training method, line-of-sight tracking method and device, and electronic equipment |
US10769803B2 (en) | 2018-11-16 | 2020-09-08 | Industrial Technology Research Institute | Sight vector detecting method and device |
TWI704473B (en) * | 2018-11-16 | 2020-09-11 | 財團法人工業技術研究院 | Vision vector detecting method and device |
CN109583338A (en) * | 2018-11-19 | 2019-04-05 | 山东派蒙机电技术有限公司 | Driver Vision decentralized detection method based on depth integration neural network |
CN109508679B (en) * | 2018-11-19 | 2023-02-10 | 广东工业大学 | Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium |
CN109508679A (en) * | 2018-11-19 | 2019-03-22 | 广东工业大学 | Realize method, apparatus, equipment and the storage medium of eyeball three-dimensional eye tracking |
CN109684969B (en) * | 2018-12-18 | 2022-09-30 | 上海科技大学 | Gaze position estimation method, computer device, and storage medium |
CN109684969A (en) * | 2018-12-18 | 2019-04-26 | 上海科技大学 | Stare location estimation method, computer equipment and storage medium |
CN111488775B (en) * | 2019-01-29 | 2023-04-28 | 财团法人资讯工业策进会 | Device and method for judging degree of visibility |
CN111488775A (en) * | 2019-01-29 | 2020-08-04 | 财团法人资讯工业策进会 | Device and method for judging degree of fixation |
CN111723828A (en) * | 2019-03-18 | 2020-09-29 | 北京市商汤科技开发有限公司 | Watching region detection method and device and electronic equipment |
CN110032278A (en) * | 2019-03-29 | 2019-07-19 | 华中科技大学 | A kind of method for recognizing position and attitude, the apparatus and system of human eye attention object |
CN109949374B (en) * | 2019-04-26 | 2020-12-25 | 清华大学深圳研究生院 | Reverse camera calibration system and method based on mirror image |
CN109949374A (en) * | 2019-04-26 | 2019-06-28 | 清华大学深圳研究生院 | A kind of reversed camera calibration system and method based on mirror image |
CN110647790A (en) * | 2019-04-26 | 2020-01-03 | 北京七鑫易维信息技术有限公司 | Method and device for determining gazing information |
WO2020231401A1 (en) * | 2019-05-13 | 2020-11-19 | Huawei Technologies Co., Ltd. | A neural network for head pose and gaze estimation using photorealistic synthetic data |
CN110147163B (en) * | 2019-05-20 | 2022-06-21 | 浙江工业大学 | Eye movement tracking method and system driven by multi-model fusion for mobile equipment |
CN110147163A (en) * | 2019-05-20 | 2019-08-20 | 浙江工业大学 | The eye-tracking method and system of the multi-model fusion driving of facing mobile apparatus |
CN110191234A (en) * | 2019-06-21 | 2019-08-30 | 中山大学 | It is a kind of based on the intelligent terminal unlocking method for watching point analysis attentively |
CN110191234B (en) * | 2019-06-21 | 2021-03-26 | 中山大学 | Intelligent terminal unlocking method based on fixation point analysis |
CN112183160A (en) * | 2019-07-04 | 2021-01-05 | 北京七鑫易维科技有限公司 | Sight estimation method and device |
CN110795982A (en) * | 2019-07-04 | 2020-02-14 | 哈尔滨工业大学(深圳) | Apparent sight estimation method based on human body posture analysis |
WO2021042277A1 (en) * | 2019-09-03 | 2021-03-11 | 浙江大学 | Method for acquiring normal vector, geometry and material of three-dimensional object employing neural network |
US11748618B2 (en) | 2019-09-03 | 2023-09-05 | Zhejiang University | Methods for obtaining normal vector, geometry and material of three-dimensional objects based on neural network |
CN111259713A (en) * | 2019-09-16 | 2020-06-09 | 浙江工业大学 | Sight tracking method based on self-adaptive weighting |
CN111259713B (en) * | 2019-09-16 | 2023-07-21 | 浙江工业大学 | Sight tracking method based on self-adaptive weighting |
CN110909611B (en) * | 2019-10-29 | 2021-03-05 | 深圳云天励飞技术有限公司 | Method and device for detecting attention area, readable storage medium and terminal equipment |
CN110909611A (en) * | 2019-10-29 | 2020-03-24 | 深圳云天励飞技术有限公司 | Method and device for detecting attention area, readable storage medium and terminal equipment |
US11934955B2 (en) | 2019-12-16 | 2024-03-19 | Nvidia Corporation | Neural network based facial analysis using facial landmarks and associated confidence values |
CN112989907A (en) * | 2019-12-16 | 2021-06-18 | 辉达公司 | Neural network based gaze direction determination using spatial models |
CN111626152B (en) * | 2020-05-13 | 2023-05-30 | 闽江学院 | Space-time line-of-sight direction estimation prototype design method based on Few-shot |
CN111626152A (en) * | 2020-05-13 | 2020-09-04 | 闽江学院 | Space-time sight direction estimation prototype design based on Few-shot |
CN112541400A (en) * | 2020-11-20 | 2021-03-23 | 小米科技(武汉)有限公司 | Behavior recognition method and device based on sight estimation, electronic equipment and storage medium |
CN113095274B (en) * | 2021-04-26 | 2024-02-09 | 中山大学 | Sight estimation method, system, device and storage medium |
CN113095274A (en) * | 2021-04-26 | 2021-07-09 | 中山大学 | Sight estimation method, system, device and storage medium |
CN113505694B (en) * | 2021-07-09 | 2024-03-26 | 南开大学 | Man-machine interaction method and device based on sight tracking and computer equipment |
CN113505694A (en) * | 2021-07-09 | 2021-10-15 | 南开大学 | Human-computer interaction method and device based on sight tracking and computer equipment |
CN113627267A (en) * | 2021-07-15 | 2021-11-09 | 中汽创智科技有限公司 | Sight line detection method, device, equipment and medium |
CN113822174A (en) * | 2021-09-02 | 2021-12-21 | 北京的卢深视科技有限公司 | Gaze estimation method, electronic device, and storage medium |
CN113807251A (en) * | 2021-09-17 | 2021-12-17 | 哈尔滨理工大学 | Sight estimation method based on appearance |
CN114546112B (en) * | 2022-02-11 | 2023-10-17 | 清华大学深圳国际研究生院 | Gaze point estimation method, gaze point estimation device and storage medium |
CN114546112A (en) * | 2022-02-11 | 2022-05-27 | 清华大学深圳国际研究生院 | Method, device and storage medium for estimating fixation point |
CN116052261A (en) * | 2022-05-31 | 2023-05-02 | 荣耀终端有限公司 | Sight estimation method and electronic equipment |
CN114967128A (en) * | 2022-06-20 | 2022-08-30 | 深圳市新联优品科技有限公司 | Sight tracking system and method applied to VR glasses |
CN115482574B (en) * | 2022-09-29 | 2023-07-21 | 珠海视熙科技有限公司 | Screen gaze point estimation method, device, medium and equipment based on deep learning |
CN115482574A (en) * | 2022-09-29 | 2022-12-16 | 珠海视熙科技有限公司 | Screen fixation point estimation method, device, medium and equipment based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108171218A (en) | A gaze estimation method based on a deep appearance-based gaze network | |
US20220214747A1 (en) | Interactive Motion-Based Eye Tracking Calibration | |
CN104978548B (en) | A kind of gaze estimation method and device based on three-dimensional active shape model | |
Zhu et al. | Novel eye gaze tracking techniques under natural head movement | |
JP4692526B2 (en) | Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method | |
CN109690553A (en) | The system and method for executing eye gaze tracking | |
CN108229284A (en) | Eye-controlling focus and training method and device, system, electronic equipment and storage medium | |
Hennessey et al. | Noncontact binocular eye-gaze tracking for point-of-gaze estimation in three dimensions | |
CN106796449A (en) | Eye-controlling focus method and device | |
US20140111630A1 (en) | Systems and methods for iris detection and gaze estimation | |
WO2016014718A1 (en) | Systems, devices, and methods for tracking and compensating for patient motion during a medical imaging scan | |
JP4936491B2 (en) | Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method | |
EP3154407B1 (en) | A gaze estimation method and apparatus | |
US11181978B2 (en) | System and method for gaze estimation | |
CN109344714A (en) | A gaze estimation method based on key point matching | |
Xiong et al. | Eye control system base on ameliorated hough transform algorithm | |
Al-Rahayfeh et al. | Enhanced frame rate for real-time eye tracking using circular hough transform | |
CN109815913B (en) | Visual enhancement perception system and method based on eye movement information | |
CN104679222A (en) | Medical office system based on human-computer interaction, medical information sharing system and method | |
CN114022514A (en) | Real-time sight line inference method integrating head posture and eyeball tracking | |
Kaminski et al. | Single image face orientation and gaze detection | |
Weidenbacher et al. | Detection of head pose and gaze direction for human-computer interaction | |
Nitschke | Image-based eye pose and reflection analysis for advanced interaction techniques and scene understanding | |
Parada et al. | ExpertEyes: Open-source, high-definition eyetracking | |
Lanillos et al. | A Bayesian hierarchy for robust gaze estimation in human–robot interaction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20180615 |