CN108171218A - Gaze estimation method based on a deep appearance-based gaze network - Google Patents

Gaze estimation method based on a deep appearance-based gaze network Download PDF

Info

Publication number
CN108171218A
Authority
CN
China
Prior art keywords
image
facial
dimensional
camera
eye
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810081808.5A
Other languages
Chinese (zh)
Inventor
Xia Chunqiu (夏春秋)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vision Technology Co Ltd
Original Assignee
Shenzhen Vision Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Vision Technology Co Ltd filed Critical Shenzhen Vision Technology Co Ltd
Priority to CN201810081808.5A priority Critical patent/CN108171218A/en
Publication of CN108171218A publication Critical patent/CN108171218A/en
Withdrawn legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/19Sensors therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/193Preprocessing; Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/197Matching; Classification

Abstract

The present invention proposes a gaze estimation method based on a deep appearance-based gaze network. Its main components are: a gaze dataset, a gaze network, and cross-dataset evaluation. The procedure is as follows: a large number of images from different participants is collected as the gaze dataset, and facial landmarks are manually annotated in a subset of the dataset; face calibration is performed on input images obtained from a monocular RGB camera, with landmarks located by face detection and facial landmark detection methods; a generic 3D face shape is fitted to estimate the 3D pose of the detected face; a space normalization technique warps the head pose and eye images into a normalized training space; and a convolutional neural network learns the mapping from head pose and eye images to the 3D gaze direction in the camera coordinate system. The invention detects facial landmarks with a continuous conditional neural network model and performs 3D pose estimation with an average face shape; it is suitable for gaze estimation under varying environments and improves the accuracy of the estimation results.

Description

Gaze estimation method based on a deep appearance-based gaze network
Technical field
The present invention relates to the field of gaze estimation, and more particularly to a gaze estimation method based on a deep appearance-based gaze network.
Background technology
Gaze estimation infers the direction of a person's gaze in an image, computing and returning the coordinates of the eyeball centers and the gaze direction vectors of both eyes. It enables real-time tracking of human gaze in video and is commonly used in fields such as public security, traffic, medicine, and military reconnaissance. In the policing field, estimating a subject's gaze direction makes it possible to infer the subject's area of interest or target object, and thus aspects of the subject's mental activity; it can also be used for lie detection during the interrogation of suspects. In the traffic field, gaze estimation automatically monitors a driver's gaze direction; once the driver looks at a phone, dozes off, or looks away from the road ahead, the system issues an alarm to remind the driver, thereby providing driving assistance. In the medical field, gaze estimation can be mounted on a robot so that devices are controlled by the user's gaze, bringing convenience to the lives of disabled people; it can also support the diagnosis of cognitive disorders by estimating a patient's gaze. In the military field, eye movements can be used to control external equipment and systems, further improving the human-computer interaction level of military equipment. Because face images collected in daily life have complex backgrounds and are affected by conditions such as illumination, pose, gaze direction, and personal appearance, most existing methods apply only to datasets with specific backgrounds; once evaluation crosses datasets, the estimation results easily produce errors. Accurate gaze estimation therefore remains a challenge.
The present invention proposes a gaze estimation method based on a deep appearance-based gaze network. A large number of images from different participants is collected as the gaze dataset, and facial landmarks are manually annotated in a subset provided within the dataset. Face calibration is performed on input images obtained from a monocular RGB camera, with landmarks located by face detection and facial landmark detection methods. A generic 3D face shape is fitted to estimate the 3D pose of the detected face. A space normalization technique warps the head pose and eye images into a normalized training space, and a convolutional neural network learns the mapping from head pose and eye images to the 3D gaze direction in the camera coordinate system. The invention detects facial landmarks with a continuous conditional neural network model, performs 3D pose estimation with an average face shape, and evaluates the entire gaze estimation pipeline in real environments; it is suitable for gaze estimation under varying environments and improves the accuracy of the estimation results.
Summary of the invention
For gaze estimation, the present invention proposes a gaze estimation method based on a deep appearance-based gaze network: a large number of images from different participants is collected as the gaze dataset; facial landmarks are manually annotated in a subset provided within the dataset; face calibration is performed on input images obtained from a monocular RGB camera, with landmarks located by face detection and facial landmark detection methods; a generic 3D face shape is fitted to estimate the 3D pose of the detected face; a space normalization technique warps the head pose and eye images into a normalized training space; and a convolutional neural network learns the mapping from head pose and eye images to the 3D gaze direction in the camera coordinate system.
To solve the above problems, a gaze estimation method based on a deep appearance-based gaze network is proposed, whose main components include:
(1) gaze dataset;
(2) gaze network;
(3) cross-dataset evaluation.
Regarding the gaze dataset: to evaluate unconstrained gaze estimation methods, the dataset needs to cover different illumination conditions, head poses, gaze directions, and personal appearances. A large number of images from different participants is collected as the gaze dataset; each image includes a 3D annotation of the gaze target and a 3D annotation of the detected eye or head position. Facial landmarks are then manually annotated in a subset provided with the dataset to assess gaze estimation performance.
Further, for image acquisition, laptops are used as collection devices: acquisition software runs on each participant's computer and collects calibration annotations using points moving on the screen, with one collection every 10 minutes. The acquisition software automatically asks the participant to look at 20 random positions shown as gray circles; the participant fixates on each position and presses the space bar to confirm just before the circle disappears. If a position is missed, the same screen position is recorded again. Because the computer models used for data collection differ, gaze positions on the screen are converted to 3D positions in the camera coordinate system. Before data collection, six 3D facial landmarks of each participant are recorded by the camera to build a 3D face model.
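The screen-to-camera conversion above can be sketched as follows. This is a minimal illustration only: the function and parameter names are our own, and in practice the screen's rotation and translation relative to the camera would come from a per-device calibration, which the patent does not detail.

```python
import numpy as np

def screen_to_camera(gaze_px, screen_px, screen_mm, R_screen, t_screen):
    """Map an on-screen gaze point (pixels) to a 3D point in camera coordinates.

    gaze_px   : (u, v) gaze position in screen pixels
    screen_px : (width, height) of the screen in pixels
    screen_mm : (width, height) of the screen in millimetres
    R_screen  : 3x3 rotation of the screen plane w.r.t. the camera
    t_screen  : 3-vector from camera origin to the screen's top-left corner (mm)
    """
    u, v = gaze_px
    # Convert pixel coordinates to millimetres on the physical screen plane.
    x_mm = u / screen_px[0] * screen_mm[0]
    y_mm = v / screen_px[1] * screen_mm[1]
    p_screen = np.array([x_mm, y_mm, 0.0])   # point in screen-plane coordinates
    return R_screen @ p_screen + t_screen    # point in camera coordinates

# Example: screen parallel to the camera's image plane, top-left corner
# 160 mm left of and 50 mm below the camera origin (assumed values).
R = np.eye(3)
t = np.array([-160.0, 50.0, 0.0])
p = screen_to_camera((960, 540), (1920, 1080), (320.0, 180.0), R, t)
```

Because each laptop model has a different screen geometry, storing `screen_px`, `screen_mm`, and the screen pose per device is what makes the recorded gaze positions comparable across machines.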
Further, for annotating facial landmarks, a subset of images with manually annotated facial landmarks is used to assess the influence of face-detection error on gaze estimation performance. 1,500 left-eye images and 1,500 right-eye images are randomly selected as the evaluation subset. A facial landmark detection method generates six landmarks on each face image, comprising four eye-corner landmarks and two mouth-corner landmarks; the eye images are cropped from the facial landmarks manually, and the pupil centers are annotated.
Regarding the gaze network: face calibration is performed on input images obtained from a monocular RGB camera, with landmarks located by face detection and facial landmark detection methods. A generic 3D face shape is fitted to estimate the 3D pose of the detected face. A space normalization technique warps the head pose and eye images into a normalized training space, and a convolutional neural network (CNN) learns the mapping from head pose and eye images to the 3D gaze direction in the camera coordinate system.
Further, for face calibration, facial landmarks are detected with a continuous conditional neural network model, and 3D pose estimation is performed with an average face shape F; the entire gaze estimation pipeline is evaluated in real environments. F is the average face shape over all participants, composed of the 3D positions of six facial landmarks. The head coordinate system is defined from the triangle connecting the eyes and the midpoint of the mouth:
(1) the x-axis is given by the line joining the center of the right-eye annotation points and the center of the left-eye annotation points;
(2) the y-axis is perpendicular to the x-axis, along the line from the eyes to the mouth;
(3) the z-axis is perpendicular to the triangle plane and points away from the face.
The 3D rotation matrix R_r and translation vector t_r of the face model are obtained from the detected 2D facial landmark points p, where p is a perspective projection. Given the 3D model, the 3D pose of the target image and the corresponding 2D projection of the image are estimated: a pose estimation algorithm provides an initial solution that fits F to the detected landmarks, and the pose is then further refined by minimizing the distance.
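The head coordinate system defined in steps (1)-(3) can be sketched directly from three 3D landmark centers. This is a minimal illustration; the patent's full pipeline fits the average shape F with a pose estimation algorithm and distance-minimizing refinement, which is not reproduced here.

```python
import numpy as np

def head_coordinate_system(right_eye, left_eye, mouth):
    """Build the head coordinate axes from three 3D landmark centers:
    x along the eye line, y from the eyes toward the mouth (orthogonalized
    against x), z normal to the face triangle and pointing away from it."""
    right_eye, left_eye, mouth = (np.asarray(v, dtype=float)
                                  for v in (right_eye, left_eye, mouth))
    x = left_eye - right_eye
    x /= np.linalg.norm(x)
    eyes_mid = 0.5 * (right_eye + left_eye)
    y = mouth - eyes_mid
    y -= x * np.dot(y, x)          # make y perpendicular to x
    y /= np.linalg.norm(y)
    z = np.cross(x, y)             # unit normal to the triangle plane
    return np.stack([x, y, z])     # rows are the head axes in camera coords

# Example: frontal face, eyes 60 mm apart, mouth 70 mm below their midpoint.
axes = head_coordinate_system([-30.0, 0.0, 0.0], [30.0, 0.0, 0.0], [0.0, 70.0, 0.0])
```

For a perfectly frontal face as in the example, the head axes coincide with the camera axes, which is a useful sanity check for the convention.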
Further, eye image normalization normalizes the eye images and head pose. Because object pose has six degrees of freedom, a gaze estimator would have to handle eye-appearance variation across a six-dimensional space; but if the eye region is assumed to be a plane, arbitrary camera scaling and rotation can be compensated by a corresponding image warp, so the appearance-based estimation function only has to handle two degrees of freedom of appearance variation.
Further, the steps of the rotation are as follows:
(1) Given the head rotation matrix R_r, the eye position in the camera coordinate system is e_r = t_r + e_h, where e_h is the midpoint of the two eye corners in the head coordinate system and e_r represents the eye position. The normalization transformation matrix is M = SR, where M represents the 3D scaling and rotation of the eye center in the camera coordinate system, R is the inverse of the camera rotation matrix, and S is the scaling that makes the camera point at e_r.
(2) The scaling matrix is S = diag(1, 1, d_n/||e_r||), where d_n is the normalized distance of e_r from the origin of the camera coordinate system; the original camera projection matrix obtained from camera calibration is denoted C_r, and C_n is the normalized camera projection matrix.
(3) The same conversion is applied to the original image pixels through the perspective warp given by the image transformation matrix W = C_n M C_r^(-1), where C_n = [f_x, 0, c_x; 0, f_y, c_y; 0, 0, 1], f is the focal length of the normalized camera, and c is its principal point.
(4) The entire normalization process is applied to the right eye and the left eye in the same way, with e_r defined from the corresponding eye position, producing a set of eye images I. The head rotation matrix R_n = M R_r and the gaze angle vector g_n = M g_r lie in the normalized space, where g_r is the 3D gaze vector originating at e_r in the original camera coordinate system; the normalized head rotation matrix R_n is converted to the 3D rotation angles h_n.
(5) Because the rotation angle around the z-axis is zero after normalization, h_n is represented by the two-dimensional rotation vector h, and g_n, assumed to have unit length, is represented by the two-dimensional rotation vector g. d_n is defined as 600 mm; the focal lengths f_x and f_y of the normalized camera projection matrix C_n are set to 960; the normalized eye image resolution in I is set to 60 × 36; and c_x and c_y are set to 30 and 18 respectively. After normalization, the eye images I are converted to grayscale and histogram equalization is applied, making the normalized eye images compatible across different datasets and facilitating cross-dataset evaluation.
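The normalization matrix M = SR from steps (1)-(2) can be sketched as below. The construction of the rotated y-axis from the head's x-axis (so that roll around the new z-axis is cancelled) is a common choice in space normalization; the patent does not spell this detail out, so treat it as an assumption.

```python
import numpy as np

D_NORM = 600.0  # normalized eye-camera distance d_n in mm

def normalization_matrix(e_r, R_r):
    """Sketch of M = S R: R rotates the camera so its z-axis points at the
    eye center e_r (in mm, camera coordinates), and S scales depth so the
    eye sits at the fixed distance D_NORM. R_r is the head rotation matrix."""
    e_r = np.asarray(e_r, dtype=float)
    z = e_r / np.linalg.norm(e_r)      # new camera z-axis: toward the eye
    head_x = R_r[:, 0]                 # head x-axis in camera coordinates
    y = np.cross(z, head_x)
    y /= np.linalg.norm(y)
    x = np.cross(y, z)
    R = np.stack([x, y, z])            # rotation aligning the camera with e_r
    S = np.diag([1.0, 1.0, D_NORM / np.linalg.norm(e_r)])
    return S @ R

# Eye straight ahead at 500 mm: R is the identity, S rescales depth to 600 mm.
M = normalization_matrix([0.0, 0.0, 500.0], np.eye(3))
e_n = M @ np.array([0.0, 0.0, 500.0])  # eye center after normalization
```

The same M is then applied to the head rotation (R_n = M R_r) and gaze vector (g_n = M g_r), and its image-space counterpart W = C_n M C_r^(-1) warps the pixels.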
Further, for the network structure, the task of the CNN is to learn, in the normalized space, the mapping from the input features to the gaze angle vector g. Under unconstrained conditions the distance to the target fixation plane varies; for augmentation, the eye images are flipped horizontally and mirrored versions of h and g are created in the horizontal direction. The deep convolutional neural network architecture comprises 13 convolutional layers, two fully connected layers, and a classification layer, with five max-pooling layers. It takes a 60 × 36 pixel single-channel grayscale image as input, and the strides of the first and second pooling layers are changed from 2 to 1 to reflect the smaller input resolution. The output is the two-dimensional gaze angle vector g = (φ, θ) composed of the yaw angle φ and the pitch angle θ. The head pose information h is fed into the first fully connected layer. The sum of the L2-norm losses of the distances between the predicted vectors ĝ and the ground-truth vectors g is used as the loss function.
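The 2D gaze angle vector g = (φ, θ) and the L2 loss can be illustrated as follows. The yaw/pitch sign convention in `angles_to_vector` is an assumption for illustration, since the patent does not state one.

```python
import numpy as np

def angles_to_vector(phi, theta):
    """Convert yaw (phi) and pitch (theta) gaze angles to a 3D unit gaze
    vector, using one common convention (assumed, not from the patent)."""
    return np.array([np.cos(theta) * np.sin(phi),
                     np.sin(theta),
                     np.cos(theta) * np.cos(phi)])

def l2_gaze_loss(pred, true):
    """Sum over the batch of L2 distances between predicted and
    ground-truth 2D gaze angle vectors."""
    pred, true = np.asarray(pred), np.asarray(true)
    return float(np.sum(np.linalg.norm(pred - true, axis=-1)))

g = angles_to_vector(0.0, 0.0)                    # gaze straight along z
loss = l2_gaze_loss([[0.1, 0.0]], [[0.1, 0.0]])   # perfect prediction
```

The horizontal-flip augmentation mentioned above corresponds to negating the yaw component φ of both h and g while mirroring the eye image.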
Regarding cross-dataset evaluation: the 16-layer deep convolutional neural network is initialized with weights pretrained on the ImageNet dataset and then evaluated. 15,000 iterations are performed over the entire network, with the batch size set to 256. On the training set, the solver's two momentum values are set to β_1 = 0.9 and β_2 = 0.9, the initial learning rate is 0.00001, and it is multiplied by 0.1 after 5,000 iterations.
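The learning-rate schedule can be sketched as a step decay. Reading "multiplied by 0.1 after 5,000 iterations" as a decay applied every 5,000 iterations is one plausible interpretation of the training setup, not a certainty.

```python
def learning_rate(iteration, base_lr=1e-5, drop=0.1, step=5000):
    """Step learning-rate schedule: start at base_lr and multiply by
    `drop` after every `step` iterations (assumed periodic decay)."""
    return base_lr * drop ** (iteration // step)

# Learning rate at a few points over the 15,000-iteration run.
schedule = [learning_rate(i) for i in (0, 4999, 5000, 10000)]
```

With these constants the rate stays at 1e-5 for the first 5,000 iterations, then drops to 1e-6 and 1e-7 for the later phases of training.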
Description of the drawings
Fig. 1 is a framework diagram of the gaze estimation method based on a deep appearance-based gaze network of the present invention.
Fig. 2 is a network structure diagram of the gaze estimation method based on a deep appearance-based gaze network of the present invention.
Fig. 3 shows eye sample images of the gaze estimation method based on a deep appearance-based gaze network of the present invention.
Fig. 4 is a head coordinate system diagram of the gaze estimation method based on a deep appearance-based gaze network of the present invention.
Detailed description
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments may be combined with one another. The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a framework diagram of the gaze estimation method based on a deep appearance-based gaze network of the present invention. It mainly comprises the gaze dataset, the gaze network, and cross-dataset evaluation.
Fig. 2 is a network structure diagram of the gaze estimation method based on a deep appearance-based gaze network of the present invention. Face calibration is performed on input images obtained from a monocular RGB camera, with landmarks located by face detection and facial landmark detection methods; a generic 3D face shape is fitted to estimate the 3D pose of the detected face; a space normalization technique warps the head pose and eye images into a normalized training space; and a convolutional neural network (CNN) learns the mapping from head pose and eye images to the 3D gaze direction in the camera coordinate system.
Fig. 3 shows eye sample images of the gaze estimation method based on a deep appearance-based gaze network of the present invention, namely normalized eye samples from each dataset. Panels (a) and (b) are images from the gaze dataset; panels (c) and (d) are images from other datasets; each group is selected at random for a roughly identical gaze direction. Compared with (c) and (d), panel (b) shows that the gaze dataset contains larger appearance variation in the eye region, and panel (a) shows that the images of participants wearing glasses vary the most.
Fig. 4 is a head coordinate system diagram of the gaze estimation method based on a deep appearance-based gaze network of the present invention. The head coordinate system is defined from the triangle connecting the three midpoints of the eye and mouth annotations: the x-axis passes through the two eye midpoints, the y-axis is perpendicular to the x-axis within the triangle plane, and the z-axis is perpendicular to the triangle plane.
For those skilled in the art, the present invention is not limited to the details of the above embodiments; without departing from the spirit and scope of the present invention, it can be realized in other specific forms. Moreover, those skilled in the art may make various modifications and variations to the present invention without departing from its spirit and scope, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention. The appended claims are therefore intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the invention.

Claims (10)

1. A gaze estimation method based on a deep appearance-based gaze network, characterized by mainly comprising: a gaze dataset (1); a gaze network (2); and cross-dataset evaluation (3).
2. The gaze dataset (1) according to claim 1, characterized in that, to evaluate unconstrained gaze estimation methods, the dataset needs to cover different illumination conditions, head poses, gaze directions, and personal appearances; a large number of images from different participants is collected as the gaze dataset; each image includes a 3D annotation of the gaze target and a 3D annotation of the detected eye or head position; facial landmarks are then manually annotated in a subset provided with the dataset to assess gaze estimation performance.
3. The image acquisition according to claim 2, characterized in that laptops are used as collection devices: acquisition software runs on each participant's computer and collects calibration annotations using points moving on the screen, with one collection every 10 minutes; the acquisition software automatically asks the participant to look at 20 random positions shown as gray circles; the participant fixates on each position and presses the space bar to confirm just before the circle disappears; if a position is missed, the same screen position is recorded again; because the computer models used for data collection differ, gaze positions on the screen are converted to 3D positions in the camera coordinate system; before data collection, six 3D facial landmarks of each participant are recorded by the camera to build a 3D face model.
4. The facial landmark annotation according to claim 2, characterized in that a subset of images with manually annotated facial landmarks is used to assess the influence of face-detection error on gaze estimation performance; 1,500 left-eye images and 1,500 right-eye images are randomly selected as the evaluation subset; a facial landmark detection method generates six landmarks on each face image, comprising four eye-corner landmarks and two mouth-corner landmarks; the eye images are cropped from the facial landmarks manually, and the pupil centers are annotated.
5. The gaze network (2) according to claim 1, characterized in that face calibration is performed on input images obtained from a monocular RGB camera, with landmarks located by face detection and facial landmark detection methods; a generic 3D face shape is fitted to estimate the 3D pose of the detected face; a space normalization technique warps the head pose and eye images into a normalized training space; and a convolutional neural network (CNN) learns the mapping from head pose and eye images to the 3D gaze direction in the camera coordinate system.
6. based on the facial calibration described in claims 5, which is characterized in that use condition of continuity neural network model detection faces Facial marker carries out 3 d pose estimation using average facial contours F, assesses entire sight estimation in the actual environment, F is all The facial average shape of participant, is made of the three-dimensional position of six facial marks, according to connection eyes and face midpoint three It is angular to define head coordinate system:
(1) the x-axis is given by the line connecting the centre of the right-eye annotation points with the centre of the left-eye annotation points;
(2) the y-axis is perpendicular to the x-axis and points from the eyes towards the mouth;
(3) the z-axis is perpendicular to the triangle plane and points away from the face.
The three-dimensional rotation matrix Rr and translation vector tr of the face model are obtained from the detected two-dimensional facial landmark points p, where p is a perspective projection. Given the three-dimensional model and the corresponding two-dimensional projections in the image, the three-dimensional pose of the target image is estimated: a pose-estimation algorithm provides an initial solution, and the pose is further refined by minimizing the distance between the detected facial landmarks and the fitted shape F.
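The head coordinate system defined above can be constructed directly from the six three-dimensional landmark positions. A numpy sketch; the landmark ordering and the sample coordinates are assumptions for illustration, not values from the patent:

```python
import numpy as np

def head_coordinate_axes(landmarks):
    """Build the head coordinate system from six 3D facial landmarks.

    landmarks : (6, 3) array ordered as [right-eye outer, right-eye inner,
                left-eye inner, left-eye outer, right mouth, left mouth]
                (this ordering is an assumption of the sketch).
    Returns a 3x3 matrix whose columns are the x, y, z axes.
    """
    right_eye = landmarks[0:2].mean(axis=0)   # centre of right-eye corners
    left_eye  = landmarks[2:4].mean(axis=0)   # centre of left-eye corners
    mouth     = landmarks[4:6].mean(axis=0)   # mouth midpoint

    # x-axis: line connecting the two eye centres.
    x = left_eye - right_eye
    x /= np.linalg.norm(x)

    # y-axis: from the eyes towards the mouth, made perpendicular to x.
    v = mouth - 0.5 * (right_eye + left_eye)
    y = v - np.dot(v, x) * x
    y /= np.linalg.norm(y)

    # z-axis: perpendicular to the triangle plane (away from the face).
    z = np.cross(x, y)
    return np.stack([x, y, z], axis=1)

axes = head_coordinate_axes(np.array([
    [-45, -30, 0], [-15, -30, 0],   # right-eye corners (mm, hypothetical)
    [ 15, -30, 0], [ 45, -30, 0],   # left-eye corners
    [-25,  40, 0], [ 25,  40, 0],   # mouth corners
], dtype=float))
```

For these planar sample landmarks the three axes come out orthonormal by construction, which is what the subsequent pose fitting relies on.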
7. The eye-image normalization according to claim 5, characterized in that the eye images and head poses are normalized. Since object pose has six degrees of freedom, a gaze estimator would have to handle eye-appearance variation in a six-dimensional space; if the eye region is assumed to be planar, however, arbitrary scaling and rotation of the camera can be compensated by a corresponding image warp, so the appearance-based estimation function only needs to handle two degrees of freedom of appearance variation.
8. The rotation according to claim 7, characterized in that the rotation steps are as follows:
(1) given the head rotation matrix Rr, the eye position in the camera coordinate system is er = tr + eh, where eh is the midpoint of the two eye corners in the head coordinate system and er denotes the eye position; the normalization transformation matrix M = SR describes a three-dimensional scaling and rotation of the eye centre in the camera coordinate system, where R is the inverse of the camera rotation matrix and S is a scaling matrix, so that the camera is aimed at er;
(2) the scaling matrix is S = diag(1, 1, dn/||er||), which places er at the fixed distance dn from the origin of the camera coordinate system; the original camera projection matrix obtained from camera calibration is denoted Cr, and Cn is the normalized camera projection matrix;
(3) the same transformation is applied to the original image pixels as a perspective warp with the image transformation matrix W = CnMCr⁻¹, where Cn = [fx, 0, cx; 0, fy, cy; 0, 0, 1], f is the focal length of the normalized camera and c is its principal point;
(4) the whole normalization procedure is applied to the right eye and the left eye in the same way: er is defined from the corresponding eye position and a set of eye images I is generated; the head rotation matrix Rn = MRr and the gaze-angle vector gn = Mgr lie in the normalized space, where gr is the three-dimensional gaze vector originating at er in the original camera coordinate system; the normalized head rotation matrix Rn is converted into a three-dimensional rotation angle hn;
(5) since the rotation angle around the z-axis is zero after normalization, hn is represented by a two-dimensional rotation vector h, and gn, assumed to have unit length, is represented by a two-dimensional rotation vector g; dn is defined as 600 mm, the focal lengths fx and fy of the normalized camera projection matrix Cn are both set to 960, the resolution of the normalized eye images in I is set to 60 × 36, and cx and cy are set to 30 and 18 respectively; after normalization the eye images I are converted to grey-scale and histogram equalization is applied, which makes the normalized eye images compatible across different datasets and facilitates cross-dataset evaluation.
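Steps (1)–(5) can be sketched in numpy. The construction of the rotation R (camera z-axis pointed at er, head x-axis kept as horizontal as possible) is the usual choice in appearance-based normalization and is stated here as an assumption, as are the sample camera parameters:

```python
import numpy as np

def normalization(R_r, e_r, C_r, d_n=600.0, f=960.0, c=(30.0, 18.0)):
    """Compute the normalization matrix M = S @ R and the image warp W.

    R_r : 3x3 head rotation matrix.
    e_r : 3D eye position (mm) in the camera coordinate system.
    C_r : original camera projection matrix from calibration.
    """
    # Rotation R: z-axis looks at the eye; x-axis follows the head x-axis.
    z = e_r / np.linalg.norm(e_r)
    x_head = R_r[:, 0]
    y = np.cross(z, x_head); y /= np.linalg.norm(y)
    x = np.cross(y, z)
    R = np.stack([x, y, z])                # rows; inverse camera rotation

    # Scaling S places the eye at the fixed distance d_n = 600 mm.
    S = np.diag([1.0, 1.0, d_n / np.linalg.norm(e_r)])
    M = S @ R

    # Normalized camera projection matrix (fx = fy = 960, c = (30, 18)).
    C_n = np.array([[f, 0, c[0]], [0, f, c[1]], [0, 0, 1.0]])
    W = C_n @ M @ np.linalg.inv(C_r)       # perspective warp for the pixels
    return M, W

# Hypothetical original camera and an eye 450 mm in front of it.
C_r = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
M, W = normalization(np.eye(3), np.array([0.0, 0.0, 450.0]), C_r)
```

M then maps the eye position to the fixed normalized distance, and the same M is applied to the gaze vector (gn = M gr) and head rotation (Rn = M Rr) as in step (4).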
9. The network structure according to claim 5, characterized in that the task of the CNN is to learn, in the normalized space, the mapping from the input features to the gaze angle g under unconstrained conditions with varying distances to the target fixation plane; the eye images are flipped horizontally and mirror images of h and g are created in the horizontal direction. The deep convolutional neural network architecture comprises 13 convolutional layers, two fully connected layers and one output layer, with five max-pooling layers among them; grey-scale single-channel images with a resolution of 60 × 36 pixels are used as input; the strides of the first and second pooling layers are changed from 2 to 1 to reflect the smaller input resolution; the two-dimensional gaze-angle vector g, composed of a yaw angle and a pitch angle, is used as output; the head-pose information h is fed into the first fully connected layer; and the sum of the L2-norm distances between the predicted vectors ĝ and the ground-truth vectors g is used as the loss function.
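A PyTorch sketch of such an architecture. The per-block channel widths follow VGG-16 and the 4096-unit fully connected layers are assumptions, since the patent only fixes the layer counts, the input size, and the stride change in the first two pooling layers:

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """13 conv layers, 5 max-pools (first two with stride 1 for the small
    60x36 input), head pose h injected into the first fully connected layer.
    Layer widths are a VGG-16-style assumption, not the patent's values."""
    def __init__(self):
        super().__init__()
        # (in_channels, out_channels, num_convs, pool_stride) per block
        cfg = [(1, 64, 2, 1), (64, 128, 2, 1), (128, 256, 3, 2),
               (256, 512, 3, 2), (512, 512, 3, 2)]
        layers = []
        for c_in, c_out, n_conv, pool_stride in cfg:
            for i in range(n_conv):
                layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out,
                                     kernel_size=3, padding=1),
                           nn.ReLU(inplace=True)]
            layers.append(nn.MaxPool2d(2, stride=pool_stride))
        self.features = nn.Sequential(*layers)
        # A 36x60 input shrinks to 4x7 spatially; +2 for the head pose h.
        self.fc1 = nn.Linear(512 * 4 * 7 + 2, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.out = nn.Linear(4096, 2)   # yaw and pitch angles

    def forward(self, eye, h):
        x = self.features(eye).flatten(1)
        x = torch.relu(self.fc1(torch.cat([x, h], dim=1)))
        x = torch.relu(self.fc2(x))
        return self.out(x)              # train with summed L2 distance to g

net = GazeNet()
pred = net(torch.zeros(1, 1, 36, 60), torch.zeros(1, 2))
```

The output is the two-dimensional gaze-angle vector; the loss would be the sum of L2 distances between `pred` and the ground-truth g, as stated in the claim.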
10. The cross-dataset evaluation (III) according to claim 1, characterized in that the weights of the 16-layer deep convolutional neural network are pre-trained on the ImageNet dataset; 15,000 iterations are carried out over the entire network with a batch size of 256 on the training set; the solver's two momentum values are set to β1 = 0.9 and β2 = 0.9, the initial learning rate is 0.00001, and it is multiplied by 0.1 after every 5,000 iterations.
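The stated schedule (initial learning rate 0.00001, multiplied by 0.1 after every 5,000 of the 15,000 iterations) is a standard step decay and can be written as a small helper; the solver itself is assumed to come from the training framework:

```python
def step_decay_lr(iteration, base_lr=1e-5, step=5000, gamma=0.1):
    """Learning rate after `iteration` steps: the base rate is
    multiplied by 0.1 every 5,000 iterations, as stated in the claim."""
    return base_lr * gamma ** (iteration // step)

# Rates seen over the 15,000 training iterations (batch size 256,
# solver momentum values beta1 = beta2 = 0.9 per the claim):
schedule = [step_decay_lr(i) for i in (0, 4999, 5000, 10000, 14999)]
```

So the network trains at 1e-5 for the first 5,000 iterations, 1e-6 for the next 5,000, and 1e-7 for the last 5,000.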
CN201810081808.5A 2018-01-29 2018-01-29 A gaze estimation method based on a deep appearance-based gaze network Withdrawn CN108171218A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810081808.5A CN108171218A (en) 2018-01-29 2018-01-29 A gaze estimation method based on a deep appearance-based gaze network

Publications (1)

Publication Number Publication Date
CN108171218A 2018-06-15

Family

ID=62515678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810081808.5A Withdrawn CN108171218A (en) 2018-01-29 2018-01-29 A kind of gaze estimation method for watching network attentively based on appearance of depth

Country Status (1)

Country Link
CN (1) CN108171218A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106547341A (en) * 2015-09-21 2017-03-29 现代自动车株式会社 The method of gaze tracker and its tracing fixation
CN107368774A (en) * 2016-03-31 2017-11-21 富士通株式会社 Gaze detection equipment and gaze detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xucong Zhang, Yusuke Sugano et al.: "MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation", arXiv *

Cited By (59)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033957A (en) * 2018-06-20 2018-12-18 同济大学 A kind of gaze estimation method based on quadratic polynomial
CN108960302B (en) * 2018-06-20 2021-06-04 同济大学 Head attitude estimation method based on random forest
CN109033957B (en) * 2018-06-20 2021-05-11 同济大学 Sight estimation method based on quadratic polynomial
CN108960302A (en) * 2018-06-20 2018-12-07 同济大学 A kind of head pose estimation method based on random forest
CN110826374B (en) * 2018-08-10 2023-07-14 Oppo广东移动通信有限公司 Method and device for monitoring eye gazing time, storage medium and electronic equipment
CN110826367A (en) * 2018-08-10 2020-02-21 北京魔门塔科技有限公司 Human face image acquisition system for data analysis
CN110826374A (en) * 2018-08-10 2020-02-21 Oppo广东移动通信有限公司 Method and device for monitoring human eye fixation time, storage medium and electronic equipment
CN109492514A (en) * 2018-08-28 2019-03-19 初速度(苏州)科技有限公司 A kind of method and system in one camera acquisition human eye sight direction
WO2020042345A1 (en) * 2018-08-28 2020-03-05 初速度(苏州)科技有限公司 Method and system for acquiring line-of-sight direction of human eyes by means of single camera
CN109359512A (en) * 2018-08-28 2019-02-19 深圳壹账通智能科技有限公司 Eyeball position method for tracing, device, terminal and computer readable storage medium
CN113227878A (en) * 2018-08-31 2021-08-06 目察科技股份有限公司 Method and system for gaze estimation
CN109298786A (en) * 2018-09-13 2019-02-01 北京旷视科技有限公司 Mark accuracy rate appraisal procedure and device
JP2021530823A (en) * 2018-09-29 2021-11-11 ベイジン センスタイム テクノロジー デベロップメント カンパニー, リミテッド Neural network training methods, line-of-sight tracking methods and devices, and electronic devices
JP7146087B2 (en) 2018-09-29 2022-10-03 ベイジン・センスタイム・テクノロジー・デベロップメント・カンパニー・リミテッド Neural network training method, line-of-sight tracking method and device, and electronic equipment
US10769803B2 (en) 2018-11-16 2020-09-08 Industrial Technology Research Institute Sight vector detecting method and device
TWI704473B (en) * 2018-11-16 2020-09-11 財團法人工業技術研究院 Vision vector detecting method and device
CN109583338A (en) * 2018-11-19 2019-04-05 山东派蒙机电技术有限公司 Driver Vision decentralized detection method based on depth integration neural network
CN109508679B (en) * 2018-11-19 2023-02-10 广东工业大学 Method, device and equipment for realizing three-dimensional eye gaze tracking and storage medium
CN109508679A (en) * 2018-11-19 2019-03-22 广东工业大学 Realize method, apparatus, equipment and the storage medium of eyeball three-dimensional eye tracking
CN109684969B (en) * 2018-12-18 2022-09-30 上海科技大学 Gaze position estimation method, computer device, and storage medium
CN109684969A (en) * 2018-12-18 2019-04-26 上海科技大学 Stare location estimation method, computer equipment and storage medium
CN111488775B (en) * 2019-01-29 2023-04-28 财团法人资讯工业策进会 Device and method for judging degree of visibility
CN111488775A (en) * 2019-01-29 2020-08-04 财团法人资讯工业策进会 Device and method for judging degree of fixation
CN111723828A (en) * 2019-03-18 2020-09-29 北京市商汤科技开发有限公司 Watching region detection method and device and electronic equipment
CN110032278A (en) * 2019-03-29 2019-07-19 华中科技大学 A kind of method for recognizing position and attitude, the apparatus and system of human eye attention object
CN109949374B (en) * 2019-04-26 2020-12-25 清华大学深圳研究生院 Reverse camera calibration system and method based on mirror image
CN109949374A (en) * 2019-04-26 2019-06-28 清华大学深圳研究生院 A kind of reversed camera calibration system and method based on mirror image
CN110647790A (en) * 2019-04-26 2020-01-03 北京七鑫易维信息技术有限公司 Method and device for determining gazing information
WO2020231401A1 (en) * 2019-05-13 2020-11-19 Huawei Technologies Co., Ltd. A neural network for head pose and gaze estimation using photorealistic synthetic data
CN110147163B (en) * 2019-05-20 2022-06-21 浙江工业大学 Eye movement tracking method and system driven by multi-model fusion for mobile equipment
CN110147163A (en) * 2019-05-20 2019-08-20 浙江工业大学 The eye-tracking method and system of the multi-model fusion driving of facing mobile apparatus
CN110191234A (en) * 2019-06-21 2019-08-30 中山大学 It is a kind of based on the intelligent terminal unlocking method for watching point analysis attentively
CN110191234B (en) * 2019-06-21 2021-03-26 中山大学 Intelligent terminal unlocking method based on fixation point analysis
CN112183160A (en) * 2019-07-04 2021-01-05 北京七鑫易维科技有限公司 Sight estimation method and device
CN110795982A (en) * 2019-07-04 2020-02-14 哈尔滨工业大学(深圳) Apparent sight estimation method based on human body posture analysis
WO2021042277A1 (en) * 2019-09-03 2021-03-11 浙江大学 Method for acquiring normal vector, geometry and material of three-dimensional object employing neural network
US11748618B2 (en) 2019-09-03 2023-09-05 Zhejiang University Methods for obtaining normal vector, geometry and material of three-dimensional objects based on neural network
CN111259713A (en) * 2019-09-16 2020-06-09 浙江工业大学 Sight tracking method based on self-adaptive weighting
CN111259713B (en) * 2019-09-16 2023-07-21 浙江工业大学 Sight tracking method based on self-adaptive weighting
CN110909611B (en) * 2019-10-29 2021-03-05 深圳云天励飞技术有限公司 Method and device for detecting attention area, readable storage medium and terminal equipment
CN110909611A (en) * 2019-10-29 2020-03-24 深圳云天励飞技术有限公司 Method and device for detecting attention area, readable storage medium and terminal equipment
US11934955B2 (en) 2019-12-16 2024-03-19 Nvidia Corporation Neural network based facial analysis using facial landmarks and associated confidence values
CN112989907A (en) * 2019-12-16 2021-06-18 辉达公司 Neural network based gaze direction determination using spatial models
CN111626152B (en) * 2020-05-13 2023-05-30 闽江学院 Space-time line-of-sight direction estimation prototype design method based on Few-shot
CN111626152A (en) * 2020-05-13 2020-09-04 闽江学院 Space-time sight direction estimation prototype design based on Few-shot
CN112541400A (en) * 2020-11-20 2021-03-23 小米科技(武汉)有限公司 Behavior recognition method and device based on sight estimation, electronic equipment and storage medium
CN113095274B (en) * 2021-04-26 2024-02-09 中山大学 Sight estimation method, system, device and storage medium
CN113095274A (en) * 2021-04-26 2021-07-09 中山大学 Sight estimation method, system, device and storage medium
CN113505694B (en) * 2021-07-09 2024-03-26 南开大学 Man-machine interaction method and device based on sight tracking and computer equipment
CN113505694A (en) * 2021-07-09 2021-10-15 南开大学 Human-computer interaction method and device based on sight tracking and computer equipment
CN113627267A (en) * 2021-07-15 2021-11-09 中汽创智科技有限公司 Sight line detection method, device, equipment and medium
CN113822174A (en) * 2021-09-02 2021-12-21 北京的卢深视科技有限公司 Gaze estimation method, electronic device, and storage medium
CN113807251A (en) * 2021-09-17 2021-12-17 哈尔滨理工大学 Sight estimation method based on appearance
CN114546112B (en) * 2022-02-11 2023-10-17 清华大学深圳国际研究生院 Gaze point estimation method, gaze point estimation device and storage medium
CN114546112A (en) * 2022-02-11 2022-05-27 清华大学深圳国际研究生院 Method, device and storage medium for estimating fixation point
CN116052261A (en) * 2022-05-31 2023-05-02 荣耀终端有限公司 Sight estimation method and electronic equipment
CN114967128A (en) * 2022-06-20 2022-08-30 深圳市新联优品科技有限公司 Sight tracking system and method applied to VR glasses
CN115482574B (en) * 2022-09-29 2023-07-21 珠海视熙科技有限公司 Screen gaze point estimation method, device, medium and equipment based on deep learning
CN115482574A (en) * 2022-09-29 2022-12-16 珠海视熙科技有限公司 Screen fixation point estimation method, device, medium and equipment based on deep learning

Similar Documents

Publication Publication Date Title
CN108171218A (en) A gaze estimation method based on a deep appearance-based gaze network
US20220214747A1 (en) Interactive Motion-Based Eye Tracking Calibration
CN104978548B (en) A kind of gaze estimation method and device based on three-dimensional active shape model
Zhu et al. Novel eye gaze tracking techniques under natural head movement
JP4692526B2 (en) Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method
CN109690553A (en) The system and method for executing eye gaze tracking
CN108229284A (en) Eye-controlling focus and training method and device, system, electronic equipment and storage medium
Hennessey et al. Noncontact binocular eye-gaze tracking for point-of-gaze estimation in three dimensions
CN106796449A (en) Eye-controlling focus method and device
US20140111630A1 (en) Systems and methods for iris detection and gaze estimation
WO2016014718A1 (en) Systems, devices, and methods for tracking and compensating for patient motion during a medical imaging scan
JP4936491B2 (en) Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method
EP3154407B1 (en) A gaze estimation method and apparatus
US11181978B2 (en) System and method for gaze estimation
CN109344714A (en) One kind being based on the matched gaze estimation method of key point
Xiong et al. Eye control system base on ameliorated hough transform algorithm
Al-Rahayfeh et al. Enhanced frame rate for real-time eye tracking using circular hough transform
CN109815913B (en) Visual enhancement perception system and method based on eye movement information
CN104679222A (en) Medical office system based on human-computer interaction, medical information sharing system and method
CN114022514A (en) Real-time sight line inference method integrating head posture and eyeball tracking
Kaminski et al. Single image face orientation and gaze detection
Weidenbacher et al. Detection of head pose and gaze direction for human-computer interaction
Nitschke Image-based eye pose and reflection analysis for advanced interaction techniques and scene understanding
Parada et al. ExpertEyes: Open-source, high-definition eyetracking
Lanillos et al. A Bayesian hierarchy for robust gaze estimation in human–robot interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20180615