US10007336B2 - Apparatus, system, and method for mobile, low-cost headset for 3D point of gaze estimation - Google Patents


Info

Publication number
US10007336B2
Authority
US
United States
Prior art keywords
user
gaze
eye
point
identifying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US14/482,109
Other versions
US20150070470A1 (en)
Inventor
Christopher D. McMURROUGH
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Texas System
Original Assignee
University of Texas System
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Texas System filed Critical University of Texas System
Priority to US14/482,109
Publication of US20150070470A1
Assigned to THE BOARD OF REGENTS OF THE UNIVERSITY OF TEXAS SYSTEM (Assignors: MCMURROUGH, CHRISTOPHER D.)
Application granted
Publication of US10007336B2
Legal status: Active; expiration adjusted


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • G06K9/00604
    • G06K9/6211
    • G06K9/6212
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/758Involving statistics of pixels or of feature values, e.g. histogram matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/19Sensors therefor

Definitions

  • The knowledge base used for image comparison and identification consisted of fifteen objects that varied in size from a baseball to a musical keyboard. Each object had two previously-collected training images from different angles and distances, which had been obtained using the same headset and automatically cropped via the method described above.
  • Table 1 shows the object identification accuracy for the various classifiers in the system, both individually and in combination.
  • The systems and methods disclosed herein illustrate the impact of combining PoG estimation techniques with low-cost 3D scanning devices such as RGB-D cameras.
  • The data modalities provided by the headset can be analyzed in such a way that user intent and visual attention can be detected and utilized by other environment actors, such as caregivers or robotic agents.
  • FIG. 6 illustrates one embodiment of a method 600 for use with a mobile, low-cost headset for 3D point of gaze estimation.
  • The method 600 includes the step 602 of tracking the movement of a user's eye with an eye tracking camera.
  • The eye tracking camera may be a USB camera mounted on a headset.
  • The method includes the step of obtaining a three-dimensional image and a two-dimensional image of the user's field of view. The two images may be obtained using an RGB-D camera.
  • The method may include the step of identifying an object of interest.
  • The object of interest may be a Euclidean cluster in the 3D point cloud or a cropped image in the 2D image.
  • The method may include the step of creating a geometric classification of the object of interest.
  • The object of interest may be identified as a sphere or a cylinder.
  • The method may include creating a histogram of the object of interest. The histogram may describe the colors exhibited by the object.
  • The method may include the step of creating a keypoint match score for the object of interest. As described above, the keypoint match score may be computed using the SURF algorithm.
  • The method may include the step of using the geometric classification, histogram, and keypoint match score to identify the object of interest. In some embodiments, the geometric classification, histogram, and keypoint match score may be weighted to increase the accuracy of the method in identifying the object.

Abstract

An apparatus, system, and method for a mobile, low-cost headset for 3D point of gaze estimation. A point of gaze apparatus may include an eye tracking camera configured to track the movements of a user's eye and a scene camera configured to create a three-dimensional image and a two-dimensional image in the direction of the user's gaze. The point of gaze apparatus may include an image processing module configured to identify a point of gaze of the user and identify an object located at the user's point of gaze by using information from the eye tracking camera and the scene camera.

Description

RELATED APPLICATION
This application claims priority to U.S. Provisional Patent Application 61/876,038 entitled “Apparatuses, System, and Method for Mobile, Low-Cost Headset for 3D Point of Gaze Estimation,” and filed on Sep. 10, 2013, the entire contents of which are incorporated herein by reference without disclaimer.
STATEMENT REGARDING FEDERALLY-SPONSORED RESEARCH
This invention was made with government support under grant numbers CNS 0923494, CNS 1035913, and IIS 1238660 awarded by the National Science Foundation. The government has certain rights in the invention.
BACKGROUND OF THE INVENTION
1. Field of Invention
This invention relates to 3D point of gaze apparatuses and more particularly relates to an apparatus, system, and method for mobile, low-cost, head-mounted, 3D point of gaze estimation.
2. Description of Related Art
Eye gaze based interaction has many useful applications in human-machine interfaces, assistive technologies, and multimodal systems. Traditional input methods, such as the keyboard and mouse, are not practical in many situations and can be ineffective for some users with physical impairments. Knowledge of a user's point of gaze (PoG) can be a powerful data modality in intelligent systems by facilitating intuitive control, perception of user intent, and enhanced interactive experiences.
Gaze tracking devices have proven to be extremely beneficial to impaired users. In one case study presented (V. Pasian, F. Corno, I. Signorile, and L. Farinetti. The Impact of Gaze Controlled Technology on Quality of Life. In Gaze Interaction and Applications of Eye Tracking: Advances in Assistive Technologies, chapter 6, pages 48-54. IGI Global, 2012.) sixteen amyotrophic lateral sclerosis (ALS) patients with severe motor impairments (loss of mobility, unable to speak, etc.) were introduced to eye tracking devices during a 1-2 week period. The patients were assessed by a psychologist during an initial meeting in order to evaluate their general quality of life. Eye tracking devices and proper training, as well as access to a speech and language therapist and a computer engineer, were provided for the duration of the study. Patients completed questionnaires related to their experiences with the equipment several times during the study. Several patients reported a clear positive impact on their quality of life during the study, resulting from the enhanced communication facilitated by the eye tracking devices over other non-gaze based assistive devices.
While the utility of gaze interaction in a variety of applications has been demonstrated, the availability of the technology has been a limiting factor in more widespread use. Due to the relatively high monetary cost and proprietary nature associated with commercial eye tracking equipment and software, several low-cost solutions have been developed using inexpensive off-the-shelf components. Many of these designs have been made publicly available through the open source community. The openEyes project (D. Li, J. Babcock, and D. J. Parkhurst. openEyes: a low-cost head-mounted eye-tracking solution. In Proceedings of the 2006 symposium on Eye tracking research & applications—ETRA '06, page 95, New York, N.Y., USA, 2006. ACM Press.) presents a low-cost head-mounted eye tracker that uses a pair of inexpensive IEEE-1394 cameras to capture images of both the eye and scene. This hardware device, coupled with the open source Starburst algorithm, facilitates estimation of the user PoG in the 2D scene image. A similar open source project, the EyeWriter, provides detailed build instructions for creating a head-mounted eye tracker from a modified Playstation Eye USB camera. The project was designed to enable digital drawing by eye gaze control for artists with ALS while using the device with the accompanying open source software. Interestingly, in J. San Agustin, H. Skovsgaard, J. P. Hansen, and D. W. Hansen. Low-cost gaze interaction: ready to deliver the promises. In Proceedings of the 27th international conference extended abstracts on Human factors in computing systems—CHI EA '09, page 4453, New York, N.Y., USA, 2009. ACM Press., the effectiveness of a low-cost eye tracker is shown to be comparable to that of commercial devices for target acquisition and eye-typing activities.
The head-mounted eye gaze systems mentioned above facilitate effective interactive experiences with some limiting constraints. In general, these solutions are designed for interaction with fixed computer displays or 2D scene images. These types of systems provide a 2D PoG, which does not directly translate into the 3D world. An accurate estimate of the 3D user PoG can be especially useful in mobile applications, human-robot interaction, and in designing intelligent assistive environments. Knowledge of the 3D PoG within an environment can be used to detect user attention and intention to interact, leading to multimodal attentive systems able to adapt to the user state.
Some mobile 3D PoG tracking systems have been proposed in the literature. For example, a head-mounted multi-camera system has been presented that estimates the 3D PoG by computing the intersection of the optical axes of both eyes. This approach gives the 3D PoG relative to the user's frame of reference, but does not provide a mapping of this point to the environment in which the user is present. A similar stereo camera approach is presented in K. Takemura, Y. Kohashi, T. Suenaga, J. Takamatsu, and T. Ogasawara. Estimating 3D point-of-regard and visualizing gaze trajectories under natural head movements. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications—ETRA '10, volume 1, page 157, New York, N.Y., USA, 2010. ACM Press., which also includes a forward-facing scene camera for mapping of the 3D PoG to scene coordinates. While multi-camera approaches such as these provide a 3D PoG, their use is limited by increased uncertainty at increasing PoG depths. Another limiting factor is the scene camera, which is generally a standard 2D camera that does not provide any 3D information of the environment itself.
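For concreteness, the optical-axis intersection used by such multi-camera systems can be approximated as the midpoint of the shortest segment between the two gaze rays. The sketch below illustrates only that geometric step; the ray origins and directions are hypothetical inputs and are not taken from the cited systems.

```python
import numpy as np

def gaze_ray_intersection(o1, d1, o2, d2):
    """Approximate the 3D PoG as the midpoint of the shortest segment between
    two gaze rays, one per eye (origin o, direction d, in the head frame)."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w0 = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b
    if abs(denom) < 1e-9:               # near-parallel optical axes
        return None
    s = (b * e - c * d) / denom         # closest-approach parameter on ray 1
    t = (a * e - b * d) / denom         # closest-approach parameter on ray 2
    return 0.5 * ((o1 + s * d1) + (o2 + t * d2))

# Hypothetical example: eyes ~6.5 cm apart, both verging on a point ~1 m ahead.
pog = gaze_ray_intersection(np.array([-0.0325, 0.0, 0.0]), np.array([0.0325, 0.0, 1.0]),
                            np.array([0.0325, 0.0, 0.0]), np.array([-0.0325, 0.0, 1.0]))
print(pog)   # approximately [0, 0, 1]
```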
SUMMARY OF THE INVENTION
A point of gaze apparatus is presented. In one embodiment, the apparatus includes an eye tracking camera configured to track the movements of a user's eye. In some embodiments, a scene camera may be configured to create a three-dimensional image and a two-dimensional image in the direction of the user's gaze. In addition, in some embodiments, the point of gaze apparatus may include an image processing module that is configured to identify a point of gaze of the user and identify an object located at the user's point of gaze. The point of gaze apparatus may identify the object by using information from the eye tracking camera and the scene camera.
In some embodiments, the apparatus may include an illumination source configured to illuminate the user's eye. For example, the illumination source may be an infrared light emitting diode. In some embodiments, the eye tracking camera may include an infrared pass filter.
In some embodiments, the eye tracking camera and scene camera of the point of gaze apparatus may be mounted on a wearable headset. Furthermore, the scene camera may be an RGB-D camera.
In some embodiments, a point of gaze apparatus may include a means for tracking the movement of an eye. The means for tracking may be a USB camera, for example. The point of gaze apparatus may include a means for imaging a scene. The means for imaging the scene may be an RGB-D camera, for example. Furthermore, the point of gaze apparatus may include a means for using information gathered by the means for tracking and information from the means for imaging to identify an object seen by the eye. The means for using the information may be a general purpose computer programmed to perform the steps disclosed in the flow chart of FIG. 6. Furthermore, in some embodiments, the point of gaze apparatus may include a means for mounting the means for tracking and means for imaging to a user's head. For example, the means for mounting may be a pair of goggles or glasses that a user can wear.
A method is also presented for estimating a point of gaze. The method in the disclosed embodiments substantially includes the steps necessary to carry out the functions presented above with respect to the operation of the described apparatus and system. In one embodiment, the method includes tracking the movement of a user's eye with an eye tracking camera. In addition, in one embodiment, the method may include obtaining a three-dimensional image and a two-dimensional image in the direction of the user's gaze. Furthermore, the method may include identifying an object in a point of gaze of the user using the eye tracking camera, three-dimensional image, and two-dimensional image.
In some embodiments, tracking the movement of the user's eye may include measuring a corneal reflection of the user's eye. In some embodiments, the method may include calibrating the eye tracking camera before tracking the movement of the user's eye. Furthermore, according to the disclosed methods, the user's point of gaze may be calculated using a pupil tracking algorithm. In some embodiments, identifying the object may include identifying a Euclidean cluster in the three-dimensional image closest to the user's point of gaze. Furthermore, the method may include identifying a region of interest in the Euclidean cluster and identifying a shape of the object from points in the region of interest. For example, identification of the shape of the object may be performed using the RANSAC algorithm.
In some embodiments, the method may include using a region of the two-dimensional image corresponding to the image cluster to identify the object. In addition, the region of the two-dimensional image may be compared to a reference image. For example, the comparison may be performed using the SURF method.
In some embodiments, identifying the object may include comparing a histogram of a region of the two-dimensional image near the point of gaze to a reference histogram.
In some embodiments, the method may include calculating a plurality of geometric classification match scores between the object and a plurality of reference objects. For example, the method may include calculating a plurality of keypoint match scores between the object and the plurality of reference objects. In addition, the method may include calculating a plurality of histogram comparison scores between the object and the plurality of reference objects. Also, the method may include identifying a reference object based on the sum of the geometric classification match score, keypoint match score, and histogram comparison score. In some embodiments, the sum is a weighted sum.
The term “coupled” is defined as connected, although not necessarily directly, and not necessarily mechanically.
The terms “a” and “an” are defined as one or more unless this disclosure explicitly requires otherwise.
The term “substantially” and its variations are defined as being largely but not necessarily wholly what is specified as understood by one of ordinary skill in the art, and in one non-limiting embodiment “substantially” refers to ranges within 10%, preferably within 5%, more preferably within 1%, and most preferably within 0.5% of what is specified.
The terms “comprise” (and any form of comprise, such as “comprises” and “comprising”), “have” (and any form of have, such as “has” and “having”), “include” (and any form of include, such as “includes” and “including”) and “contain” (and any form of contain, such as “contains” and “containing”) are open-ended linking verbs. As a result, a method or device that “comprises,” “has,” “includes” or “contains” one or more steps or elements possesses those one or more steps or elements, but is not limited to possessing only those one or more elements. Likewise, a step of a method or an element of a device that “comprises,” “has,” “includes” or “contains” one or more features possesses those one or more features, but is not limited to possessing only those one or more features. Furthermore, a device or structure that is configured in a certain way is configured in at least that way, but may also be configured in ways that are not listed.
Other features and associated advantages will become apparent with reference to the following detailed description of specific embodiments in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. The invention may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
FIG. 1 is a headset hardware solution for a 3D Point of Gaze apparatus.
FIG. 2A is an image of an eye to illustrate calculations made to determine a user's point of gaze.
FIG. 2B shows a user's gaze as he or she scans a table with objects.
FIGS. 3A-3F show the results of a disclosed method for identifying an object at a user's point of gaze.
FIG. 4 shows an example of using SURF keypoint matches to identify an object.
FIG. 5 shows an experimental setup for using a point of gaze apparatus.
FIG. 6 is a flow chart for a method of using a point of gaze apparatus.
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
Various features and advantageous details are explained more fully with reference to the nonlimiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known starting materials, processing techniques, components, and equipment are omitted so as not to unnecessarily obscure the invention in detail. It should be understood, however, that the detailed description and the specific examples, while indicating embodiments of the invention, are given by way of illustration only, and not by way of limitation. Various substitutions, modifications, additions, and/or rearrangements within the spirit and/or scope of the underlying inventive concept will become apparent to those skilled in the art from this disclosure.
This application discloses a novel head-mounted system that provides additional data modalities that are not present in previous solutions. We show that the effective integration of these modalities can provide knowledge of gaze interaction with environmental objects to aid the development of intelligent human spaces. The solution considers three key data modalities for 3D PoG estimation and environment interaction in real-time. First, an eye tracking camera is used to estimate the 2D PoG. Next, an RGB-D scene camera is used to acquire two additional modalities: a 3D representation of the environment structure and a color image in the direction of the user's gaze. Then, according to methods disclosed herein, the 2D PoG is transformed to 3D coordinates, and the objects are identified using a combination of computer vision techniques and 3D processing. The disclosed experimental results show that accurate classification results are achieved by combining the multiple data modalities.
The solution presented in this disclosure is designed to provide information about the environment existing around the user, together with the points or areas within the environment that the user interacts with visually. In order to realize these goals, a wearable headset was developed that provides a 3D scan of the area in front of the user, a color image of this area, and an estimate of the user's visual PoG. These three data modalities are provided by an eye tracking camera, which observes the user's eye motions, and a forward facing RGB-D camera, providing the scene image and 3D representation. These two components are mounted on rigid eyeglass frames such that their position remains fixed relative to the user's head during movement. An example of a complete headset hardware solution is shown in FIG. 1.
Eye Tracking Camera
In one embodiment, the system eye tracking feature is accomplished using an eye tracking camera 102 (such as an embedded USB camera module) equipped with an infrared pass filter 104. The user's eye is illuminated with a single infrared LED 106 to provide consistent image data in various ambient lighting conditions. The LED 106 also produces a corneal reflection on the user's eye, which can be seen by the eye tracking camera 102 and exploited to enhance tracking accuracy. The LED 106 may be chosen according to particular guidelines to ensure that the device can be used safely for indefinite periods of time.
The eye tracking camera 102 is positioned such that the image frame is centered in front of one of the user's eyes. The module can be easily moved to either the left or right side of the headset frame so that either eye may be used (to take advantage of user preference or eye dominance), while fine adjustments to the camera position and orientation are possible by manipulating the flexible mounting arm 108. In some embodiments, streaming video frames from the eye tracking camera 102 are provided with a resolution of 640×480 at a rate of 30 Hz, which facilitates accurate tracking of the pupil and corneal reflection using computer vision techniques.
Scene RGB-D Camera
Information about the user's environment may be provided, for example, by a forward-facing RGB-D camera, such as the Asus XtionPRO Live. This device provides a 640×480 color image of the environment along with a 640×480 depth range image at a rate of 30 Hz. The two images are obtained from individual imaging sensors and registered by the device such that each color pixel value is assigned actual 3D coordinates in space. This provides a complete scanning solution for the environment in the form of 3D “point clouds”, which can be further processed in software.
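For readers unfamiliar with organized point clouds, the registered depth image can be back-projected into per-pixel 3D coordinates with the pinhole camera model. The sketch below assumes nominal intrinsics for a 640×480 depth sensor (the real values come from the device calibration) and does not rely on any vendor SDK.

```python
import numpy as np

def depth_to_organized_cloud(depth_m, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Reproject a 480x640 depth image (meters) into an organized point cloud.

    Returns a 480x640x3 array so that cloud[v, u] is the 3D point (in the
    camera frame) behind color pixel (u, v), mirroring the registered RGB-D
    output described above. Intrinsics here are nominal placeholders."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    cloud = np.dstack([x, y, z])
    cloud[z == 0] = np.nan          # mark invalid depth readings
    return cloud

# usage: cloud = depth_to_organized_cloud(depth); xyz = cloud[240, 320]
```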
Computational Approach
This section describes the computational approach that may be used for object of interest identification and classification. In general, the four steps of the process are to: 1) estimate the PoG using the eye and scene cameras, 2) assign a geometric classification based on the 3D object of interest structure, 3) perform visual classification using SURF feature matching and color histograms, and 4) fuse the multimodal data for a final result.
Point of Gaze Estimation
An estimate of the user PoG may be computed using a pupil tracking algorithm. For example, a modified version of the starburst algorithm presented in D. Winfield and D. Parkhurst. Starburst: A hybrid algorithm for video-based eye tracking combining feature-based and model-based approaches. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)—Workshops, 3:79-79, 2005 may be used. This algorithm creates a mapping between pupil positions and 2D scene image coordinates after a simple calibration routine is performed. During the pupil detection phase of the algorithm, an ellipse is fitted to the pupil such that the ellipse center provides an accurate estimate of the pupil center. The center of the infrared corneal reflection is detected during the next phase of the algorithm, which can then be used together with the pupil center coordinates to create the calibration mapping. Another pupil tracking algorithm that may be used is described in Robust Real-Time Pupil Tracking in Highly Off-Axis Images, ETRA '12 Proceedings of the Symposium on Eye Tracking Research and Applications, pp. 173-176, 2012. FIG. 2A shows a graphical representation of a fitted pupil ellipse 202 around a pupil 204 computed from a single eye tracking camera 102 image frame.
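The following is a minimal OpenCV sketch of the ellipse-fitting idea only, not the Starburst implementation itself; the fixed threshold and the use of the brightest pixel as the corneal reflection are simplifying assumptions.

```python
import cv2

def pupil_and_glint(eye_gray):
    """Sketch: the IR-illuminated eye image has a dark pupil and a bright
    corneal reflection (glint); fit an ellipse to the pupil blob."""
    blur = cv2.GaussianBlur(eye_gray, (7, 7), 0)

    # Dark pupil: threshold low intensities, keep the largest blob.
    # The threshold value 40 is an assumption and would normally be tuned.
    _, dark = cv2.threshold(blur, 40, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(dark, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    pupil = max(contours, key=cv2.contourArea)
    ellipse = cv2.fitEllipse(pupil)          # ((cx, cy), (major, minor), angle)
    pupil_center = ellipse[0]

    # Corneal reflection: brightest spot in the blurred image.
    _, _, _, glint = cv2.minMaxLoc(blur)

    return pupil_center, glint, ellipse
```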
The mapping from pupil coordinates 208 to 2D scene image coordinates may be accomplished, in one embodiment, by a nine-point calibration procedure. During calibration, the user sequentially gazes upon nine different points in the scene image. The pupil coordinates for each calibration point are saved, and the nine-point mapping is used to interpolate a 2D PoG from future eye tracking camera frames. The 3D PoG can be obtained from the 2D points by looking up the 3D coordinates of the pixel in the point cloud data structure provided by the RGB-D camera. Exploitation of the RGB-D point cloud structure removes the need for stereo eye tracking during 3D PoG estimation as used in other methods.
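The exact form of the nine-point mapping is an implementation choice; one common option, sketched below under that assumption (function names are illustrative), is a second-order polynomial fit from pupil coordinates to scene pixel coordinates, followed by a direct lookup into the organized point cloud.

```python
import numpy as np

def design(p):
    """Second-order polynomial terms of pupil coordinates (x, y)."""
    x, y = p[:, 0], p[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_calibration(pupil_pts, scene_pts):
    """pupil_pts, scene_pts: (9, 2) arrays collected during the nine-point routine."""
    A = design(np.asarray(pupil_pts, float))
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(scene_pts, float), rcond=None)
    return coeffs                                    # (6, 2) mapping matrix

def pupil_to_scene(coeffs, pupil_xy):
    """Interpolate a 2D PoG (scene pixel) from a new pupil measurement."""
    return (design(np.asarray([pupil_xy], float)) @ coeffs)[0]

def scene_to_3d(cloud, scene_xy):
    """Look up the 3D point behind the 2D PoG in the organized cloud (H x W x 3)."""
    u = int(np.clip(round(scene_xy[0]), 0, cloud.shape[1] - 1))
    v = int(np.clip(round(scene_xy[1]), 0, cloud.shape[0] - 1))
    return cloud[v, u]
```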
FIG. 2B shows a user's gaze as he or she scans a table with objects.
Geometric Classification
Point cloud manipulation may be performed using the Point Cloud Library (PCL). While PCL provides the methods necessary to extract information from point clouds, the contribution presented in this section is the overall process by which those methods are applied.
Instead of applying the model segmentation on the initial point cloud, a series of operations may be performed on the point cloud to remove points that are not of interest. A large portion of the point cloud consists of these points, which include the points that correspond to the floor, walls, or ceiling, and the points that lie outside the area of interest. One can assume these points are not of interest because points of interest must provide interactivity and lie within a reasonable distance of the user's PoG.
Planar models may be quicker to detect than more complex models, such as cylinders or spheres, so it may be beneficial to remove large planes from the point cloud prior to detecting the models belonging to the more interactive geometries. Planes corresponding to tables, walls, the ceiling, or floor will span a large portion of the point cloud. Because of this, it is not necessary to perform the planar segmentation on the full point cloud, and downsampling of the point cloud can be performed. This provides a performance increase, since the fidelity of the point cloud is reduced while large models maintain their structure within the point cloud. The removal of these large planes from the point cloud is useful in reducing the point cloud size, as they do not provide valuable interaction for the user.
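The pipeline uses PCL's voxel-grid downsampling and planar RANSAC for this step; the NumPy sketch below illustrates the same idea without a PCL dependency (leaf size, distance threshold, and iteration count are assumed values).

```python
import numpy as np

def voxel_downsample(points, leaf=0.03):
    """Keep one representative point per 3 cm voxel (cf. PCL's VoxelGrid)."""
    keys = np.floor(points / leaf).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[idx]

def remove_dominant_plane(points, dist_thresh=0.02, iters=200):
    """RANSAC a single large plane (table/wall/floor) and drop its inliers."""
    rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(points), bool)
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue                       # degenerate (collinear) sample
        normal /= norm
        dist = np.abs((points - sample[0]) @ normal)
        inliers = dist < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return points[~best_inliers]           # cloud with the large plane removed

# usage: pts = cloud.reshape(-1, 3); pts = pts[~np.isnan(pts).any(axis=1)]
#        objects_only = remove_dominant_plane(voxel_downsample(pts))
```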
Objects that are of interest consist of several points that are relatively close together and are not disjoint. PCL provides a method to detect the Euclidean clusters within a point cloud. These clusters are found by linking together points that are within a defined distance threshold, which further emphasizes the importance of removing large planes, since they would otherwise connect clusters that should be disjoint. After the clusters are identified, the PoG is combined with the point cloud to determine the cluster closest to the PoG. This cluster is extracted from the point cloud. The extracted cluster provides a region of interest within the original point cloud, and the final model segmentation is performed on the subset of points from the initial point cloud that lie inside the area of the extracted cluster region. When segmenting smaller objects, higher point cloud fidelity is needed, which is why the region must be taken from the original high-fidelity point cloud. When model segmentation is performed on this final point cloud, cylinder and sphere models are detected. Model parameter estimation may be done using the RANSAC algorithm, in similar fashion to the estimation of the planar coefficients discussed previously. Final model classification is assigned based on the results of the segmentation over each of the specified models. The currently-available geometric classifications belong to the set {cylinder, sphere, other}.
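As an illustration of the clustering and cluster-selection steps (PCL's Euclidean cluster extraction is what the pipeline actually uses), the sketch below grows clusters with a KD-tree radius search and keeps the cluster whose centroid is nearest the 3D PoG; the tolerance and minimum size are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def euclidean_clusters(points, tol=0.05, min_size=50):
    """Link points closer than `tol` (5 cm) into connected clusters."""
    tree = cKDTree(points)
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, cluster = [seed], [seed]
        while queue:
            idx = queue.pop()
            for nb in tree.query_ball_point(points[idx], tol):
                if nb in unvisited:
                    unvisited.discard(nb)
                    queue.append(nb)
                    cluster.append(nb)
        if len(cluster) >= min_size:
            clusters.append(np.array(cluster))
    return clusters

def cluster_at_pog(points, clusters, pog_xyz):
    """Select the cluster whose centroid lies closest to the 3D point of gaze."""
    centroids = [points[c].mean(axis=0) for c in clusters]
    best = np.argmin([np.linalg.norm(c - pog_xyz) for c in centroids])
    return points[clusters[best]]
```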
FIGS. 3A-3F show the results of manipulating a point cloud to identify an object of interest. FIG. 3A shows an original point cloud of a scene. In this example there are three potential objects of interest: oatmeal 304, a basketball 306, and raisins 308, all set on a table 310. In FIG. 3B, the planes of the table 310 and walls have been removed, leaving only the three potential objects of interest. In FIG. 3C, Euclidean clustering is performed to identify the point cloud clusters around the objects of interest. In FIG. 3D, the Euclidean cluster belonging to the basketball is selected as being in the user's PoG. In FIG. 3E, segmentation is performed to detect the shape of the model of interest (cylinder, sphere, or other). In FIG. 3F, a portion of the 2D image corresponding to the object of interest is cropped to include only the object of interest.
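The geometric classification illustrated in FIG. 3E is performed with PCL's RANSAC model segmentation; as a hedged stand-in, the sketch below scores only the sphere hypothesis by counting how many cluster points a RANSAC-fitted sphere explains. The thresholds are assumptions, and a cylinder model would be scored analogously.

```python
import numpy as np

def fit_sphere(pts):
    """Least-squares sphere |p - c|^2 = r^2 through four or more points."""
    A = np.column_stack([2.0 * pts, np.ones(len(pts))])
    b = (pts ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center, d = sol[:3], sol[3]
    r_sq = d + center @ center
    radius = np.sqrt(r_sq) if r_sq > 0 else 0.0
    return center, radius

def ransac_sphere_score(points, dist_thresh=0.01, iters=300):
    """Fraction of cluster points within 1 cm of the best RANSAC sphere."""
    rng = np.random.default_rng(0)
    best = 0
    for _ in range(iters):
        sample = points[rng.choice(len(points), 4, replace=False)]
        center, radius = fit_sphere(sample)
        if radius <= 0:
            continue                     # degenerate sample
        err = np.abs(np.linalg.norm(points - center, axis=1) - radius)
        best = max(best, int((err < dist_thresh).sum()))
    return best / len(points)

# e.g. label the cluster "sphere" if ransac_sphere_score(cluster) exceeds a chosen
# ratio, score a cylinder model analogously, and otherwise fall back to "other".
```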
Following the geometric classification, analysis is performed on the cropped RGB data to further classify the object. The input for these methods consists of the geometric classification and a cropped 2D RGB image representing the final extracted point cloud. The cropped image is obtained by creating a bounding box in the 2D RGB image around the region of interest containing the extracted cluster.
SURF Feature Matching
In order to reliably identify a query object by image comparison, there needs to be similarity between image features. Since it is unlikely that the object being identified will be in the same orientation and position relative to the reference image, it is important to calculate features that are reproducible at different scales and viewing angles. Speeded Up Robust Features (SURF) is an efficient method to find such features, called keypoints, and calculate their descriptors, which contain information about the grayscale pixel intensity distribution around the keypoints.
The system maintains a knowledge base of SURF features and descriptors for all reference object images. For these images, the keypoints and descriptors are precomputed and stored to avoid recalculation each time an object is to be identified. The feature/descriptor calculations for the query object images, on the other hand, are necessarily performed on-the-fly as object identifications are requested.
In the SURF-based object identification we perform, the query object image keypoints are compared to those of each reference object image to determine similarity. One method is to use a modified version of the robust feature matching approach described in R. Laganiere, OpenCV 2 Computer Vision Application Programming Cookbook, Packt Publishing, June 2011. A k-nearest-neighbors search is performed to match each keypoint descriptor in the query image with the two most similar descriptors in the reference image, and vice versa. These matches enter a series of tests to narrow down the list of those that are accepted. First, if the two nearest-neighbor matches are too similar to reliably determine which is the better match, neither is used. Otherwise, the best match is tentatively accepted. FIG. 4 shows several keypoint matches at this stage. Second, if a keypoint match from the query image to the reference image is not also a match from the reference image to the query image, it is rejected. The surviving keypoint matches are validated using the epipolar constraint so that any matched points not lying on corresponding epipolar lines are rejected, and the number of remaining matches is stored for each image in the knowledge base.
FIG. 4 shows an example of using SURF keypoint matches to identify an object. The algorithm compares and matches keypoints 406 in a query image 402 to keypoints in a reference image 404.
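A possible OpenCV rendering of this matching chain (ratio test, symmetry test, epipolar validation) is sketched below. SURF is only available in the non-free opencv-contrib module, and the ratio and RANSAC thresholds are assumptions rather than values taken from this disclosure.

```python
import cv2
import numpy as np

def surf_match_count(query_gray, ref_gray, ratio=0.8):
    """Count keypoint matches surviving the ratio, symmetry, and epipolar tests."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # ORB/SIFT could be swapped in
    kq, dq = surf.detectAndCompute(query_gray, None)
    kr, dr = surf.detectAndCompute(ref_gray, None)
    if dq is None or dr is None:
        return 0

    bf = cv2.BFMatcher(cv2.NORM_L2)

    def ratio_test(knn_matches):
        # If the two nearest neighbours are too similar, trust neither.
        return [m for m, n in (p for p in knn_matches if len(p) == 2)
                if m.distance < ratio * n.distance]

    fwd = ratio_test(bf.knnMatch(dq, dr, k=2))   # query -> reference
    bwd = ratio_test(bf.knnMatch(dr, dq, k=2))   # reference -> query

    # Symmetry test: keep only mutual matches.
    bwd_pairs = {(m.trainIdx, m.queryIdx) for m in bwd}
    mutual = [m for m in fwd if (m.queryIdx, m.trainIdx) in bwd_pairs]
    if len(mutual) < 8:
        return len(mutual)

    # Epipolar test: discard matches inconsistent with one fundamental matrix.
    pts_q = np.float32([kq[m.queryIdx].pt for m in mutual])
    pts_r = np.float32([kr[m.trainIdx].pt for m in mutual])
    _, mask = cv2.findFundamentalMat(pts_q, pts_r, cv2.FM_RANSAC, 3.0, 0.99)
    return int(mask.sum()) if mask is not None else len(mutual)
```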
Histogram Matching
Since multiple objects can produce similar features in SURF calculations, it may be beneficial to incorporate color information into object identification. One may use color histograms to do so, since they provide a convenient way to represent the distribution of colors in an image and can easily and efficiently be compared. To minimize the effect on histogram matching of potential differences in brightness and contrast between reference and query images, a normalized red-green (RG) color space may be used for the calculations.
The histograms we used contain eight bins in each dimension. So, for the normalized RG color space, we used 2-dimensional 8×8 histograms for a total of sixty-four bins. As with the SURF keypoints/descriptors, the histograms for the reference object images are computed and stored in the knowledge base for easy comparison later, while the histograms for the test images are calculated at identification time. To identify a query object by histogram matching, the similarity between the query image histogram and each reference image histogram is calculated using normalized cross-correlation to obtain a value in the range [−1, 1].
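The following sketch illustrates one way to compute and compare such 8×8 normalized-RG histograms with NumPy and OpenCV. The bin count matches the text; the chromaticity computation and the small epsilon guard are assumptions made for the example.

```python
import cv2
import numpy as np

def rg_histogram(bgr_image, bins=8):
    img = bgr_image.astype(np.float32)
    b, g, r = cv2.split(img)
    total = b + g + r + 1e-6                    # avoid division by zero on black pixels
    r_norm, g_norm = r / total, g / total       # normalized (chromaticity) coordinates
    hist, _, _ = np.histogram2d(r_norm.ravel(), g_norm.ravel(),
                                bins=bins, range=[[0, 1], [0, 1]])
    return hist.astype(np.float32).ravel()      # 8 x 8 = 64 bins, flattened

def histogram_score(query_hist, reference_hist):
    # Normalized cross-correlation in [-1, 1], used as the raw histogram score h.
    return cv2.compareHist(query_hist, reference_hist, cv2.HISTCMP_CORREL)
```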
Data Fusion and Object Identification
To most reliably identify the object of interest, one may effectively incorporate the data from SURF feature matching, geometric classification, and histogram comparison into a single score for each object in the reference set.
For example, after SURF keypoint match calculations, the number of keypoints matched from the query object image to each reference object image is stored as a raw score n_i for that particular reference object. A final, normalized SURF score α_i ∈ [0, 1] is calculated for each reference object i:

$$\alpha_i = \frac{n_i}{m}, \qquad m = \max_i(n_i)$$
Similarly, the normalized cross-correlation values obtained from the histogram comparisons are stored for each reference object image as a raw histogram score h_i ∈ [−1, 1]. A final normalized histogram score β_i ∈ [−1, 1] is calculated for each object i:

$$\beta_i = \frac{h_i}{k}, \qquad k = \max_i(h_i)$$
The third score we calculate is a simple geometric classification match score γ_i for each reference object image i. To determine γ_i, the query image's detected classification c is compared to the reference classification d_i:

$$\gamma_i = \begin{cases} 1 & \text{if } c = d_i \\ 0 & \text{if } c \neq d_i \end{cases}$$
A final score S_i is calculated for each object i as a linear combination of the three scores. To do so, the SURF, histogram, and geometric scores are assigned weights w_α, w_β, and w_γ, respectively:

$$S_i = w_\alpha \alpha_i + w_\beta \beta_i + w_\gamma \gamma_i$$
The object O can now be identified as:
$$O = \arg\max_i S_i$$
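A worked sketch of this fusion step is given below. The weight values shown are placeholders only, since the disclosure determines the weights experimentally.

```python
import numpy as np

def fuse_scores(n, h, reference_classes, query_class,
                w_alpha=0.5, w_beta=0.2, w_gamma=0.3):
    n = np.asarray(n, dtype=float)               # raw SURF match counts n_i
    h = np.asarray(h, dtype=float)               # raw histogram correlations h_i
    alpha = n / n.max() if n.max() > 0 else np.zeros_like(n)   # normalized SURF scores
    beta = h / h.max() if h.max() > 0 else np.zeros_like(h)    # normalized histogram scores
    gamma = np.array([1.0 if c == query_class else 0.0
                      for c in reference_classes])              # geometric match scores
    scores = w_alpha * alpha + w_beta * beta + w_gamma * gamma  # S_i
    return int(np.argmax(scores)), scores                       # index of identified object O
```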
EXAMPLE
Referring to FIG. 5, to assess the ability of the system to identify the object gazed upon by the user 502, we created an experiment to reproduce a typical usage application in which the user is seated at a table and desires assistance with an item 504 on the table. The user might, for example, desire some water from a pitcher on the table, but be unable to reach for the object or request assistance through verbal means or gesturing.
To this end, we used the system software to create a knowledge base of known objects and placed an assortment of test items on the table to evaluate the system's ability to estimate the user's point of gaze, use that information to isolate the object of interest, and perform successful identification.
Experimental Setup
During our experiment, a participant 502 sat in multiple positions in front of a table with an assortment of objects placed on top. The participant was free to move their head, eyes, and body. We instructed the participant to focus their gaze on an object and to notify us with a verbal cue when this was accomplished. On this cue, a trigger event for the system to identify the object was issued. The PoG calibration was performed prior to system use, and the calibration result was checked for validity. In the experiment, the participant focused his gaze on each of the objects from three different locations at distances of up to 2 meters. Calibration may be done by looking at known positions in a set order. For example, one can place a red dot on a wall and collect gaze points as the user moves his or her head slightly (so that the pupils move while following the dot). In addition, a "calibration wand" may be used to give the user a point on which to focus during the calibration routine.
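By way of illustration only, one common calibration approach (the disclosure does not prescribe a particular mapping) fits a low-order polynomial from pupil-center coordinates to scene-image coordinates over the samples collected while the user fixates known targets:

```python
import numpy as np

def fit_gaze_mapping(pupil_xy, target_xy):
    # pupil_xy: (N, 2) pupil-center positions; target_xy: (N, 2) known gaze targets
    # in scene-image coordinates collected during the calibration routine.
    x, y = pupil_xy[:, 0], pupil_xy[:, 1]
    A = np.column_stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2])
    coeffs, *_ = np.linalg.lstsq(A, target_xy, rcond=None)
    return coeffs                                   # (6, 2) polynomial coefficients

def map_pupil_to_gaze(coeffs, pupil_point):
    x, y = pupil_point
    features = np.array([1.0, x, y, x * y, x ** 2, y ** 2])
    return features @ coeffs                        # estimated (u, v) gaze point
```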
Data was acquired using the headset described above, while computations were performed in real-time on a Lenovo Ideapad Y560 laptop running the Linux operating system. The laptop was equipped with a 2.20 GHz Core i7 processor with 4 GB DDR3 1333 memory.
The knowledge base used for image comparison and identification consisted of fifteen objects that varied in size from a baseball to a musical keyboard. Each object had two previously-collected training images from different angles and distances, which had been obtained using the same headset and automatically cropped via the method described above.
Experimental Results
After running the experiments, the raw scores of the image comparisons were processed to determine the optimal values for the three weights discussed above. Once the score weights were adjusted, the results were collected and analyzed. Table 1 shows the object identification accuracy for the various classifiers in the system, both individually and in combination.
TABLE 1
Object identification results
Classifier Accuracy
SURF Matching 0.711
Histogram Matching 0.622
SURF + Histograms 0.756
SURF + Histograms + Geometry 0.844
As can be seen from the results, the ability to identify the object of a user's gaze improves significantly as additional classifiers are added. Since SURF feature matching is a widely used object matching method, we use its accuracy as a baseline for our analysis. Incorporating color histogram and geometric classification data with SURF matching yields an 18.7% relative increase in correct object identifications ((0.844 − 0.711)/0.711 ≈ 0.187). These results clearly illustrate the benefit of fusing multiple data modalities. The average execution times, in seconds, for each step in the identification method are presented in Table 2.
TABLE 2
Table of average runtimes
Classifier Execution time (s)
Geometric Classification 0.329
SURF Matching 0.201
Histogram Matching 0.001
The systems and methods disclosed herein illustrate the impact of combining PoG estimation techniques with low-cost 3D scanning devices such as RGB-D cameras. The data modalities provided by the headset can be analyzed in such a way that user intent and visual attention can be detected and utilized by other environment actors, such as caregivers or robotic agents.
The results of the experiment show that the combination of classification methods using multiple data modalities increases overall accuracy. Weighting the individual classification methods in the final data fusion step allows for a higher emphasis to be placed on different modalities at different times, which could facilitate dynamic adjustment of weights based on external factors such as lighting conditions.
While the experimental portion of this work focused mainly on 3D object recognition, the 3D PoG estimation provided by the combination of eye tracking and RGB-D modalities is extremely useful by itself. The utility of this approach warrants further investigation and comparison with existing 3D PoG methods, such as stereo eye tracking. Given that the inclusion of the RGB-D scene camera removes the need for multiple eye tracking cameras, it follows that the area obstructed by optical devices in the user's field of vision would be minimized. The trade-off between multiple eye tracking cameras and a bulkier RGB-D scene camera will likely improve significantly with time as the technology matures and miniaturizes.
The schematic flow chart diagrams that follow are generally set forth as logical flow chart diagrams. As such, the depicted order and labeled steps are indicative of one embodiment of the presented method. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated method. Additionally, the format and symbols employed are provided to explain the logical steps of the method and are understood not to limit the scope of the method. Although various arrow types and line types may be employed in the flow chart diagrams, they are understood not to limit the scope of the corresponding method. Indeed, some arrows or other connectors may be used to indicate only the logical flow of the method. For instance, an arrow may indicate a waiting or monitoring period of unspecified duration between enumerated steps of the depicted method. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.
FIG. 6 illustrates one embodiment of a method 600 for use with a mobile, low-cost headset for 3D point of gaze estimation. In one embodiment, the method 600 includes the step 602 of tracking the movement of a user's eye with an eye tracking camera. As discussed above, the eye tracking camera may be a USB camera mounted on a headset. At step 604, the method includes the step of obtaining a three-dimensional image and a two-dimensional image of the user's field of view. The two images may be obtained using an RGB-D camera. At step 606 the method may include the step of identifying an object of interest. The object of interest may be a Euclidean cluster in the 3D point cloud or a cropped image in the 2D image. At step 608, the method may include the step of creating a geometric classification of the object of interest. For example, the object of interest may be identified as a sphere or a cylinder. At step 610 the method may include creating a histogram of the object of interest. The histogram may describe the colors exhibited by the object. At step 612, the method may include the step of creating a keypoint match score for the object of interest. As described above, the keypoint match score may be computed using the SURF algorithm. Finally, the method may include the step of using the geometric classification, histogram, and keypoint match score to identify the object of interest. In some embodiments, the geometric classification, histogram, and keypoint match score may be weighted to increase the accuracy of the method in identifying the object.
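The hypothetical glue code below ties the earlier sketches to steps 608-612 of method 600. It assumes the cropped RGB image, the query object's geometric class, and a knowledge base of per-reference (keypoints, descriptors, histogram, shape class) tuples have already been produced by the preceding steps; that tuple layout is an assumption made for the example.

```python
import cv2

def score_against_knowledge_base(crop_bgr, query_class, knowledge_base):
    # Reuses surf_features, rg_histogram, histogram_score, robust_match_count,
    # and fuse_scores from the earlier sketches.
    gray = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2GRAY)
    kp_q, des_q = surf_features(gray)            # step 612: SURF keypoints of the query crop
    query_hist = rg_histogram(crop_bgr)          # step 610: color histogram of the query crop
    n = [robust_match_count(kp_q, des_q, kp_r, des_r)
         for kp_r, des_r, _, _ in knowledge_base]
    h = [histogram_score(query_hist, hist_r)
         for _, _, hist_r, _ in knowledge_base]
    classes = [cls_r for _, _, _, cls_r in knowledge_base]
    return fuse_scores(n, h, classes, query_class)
```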
All of the methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the apparatus and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. For example, in some embodiments, a histogram may be particularly helpful (and therefore more heavily weighted) if the objects of interest are color-coded. In addition, modifications may be made to the disclosed apparatus and components may be eliminated or substituted for the components described herein where the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope, and concept of the invention as defined by the appended claims.

Claims (19)

The invention claimed is:
1. A mobile point of gaze apparatus capable of being mounted to the head of a user, comprising:
(a) an eye tracking camera configured to generate information associated with movements of a user's eye;
(b) a scene camera configured to create a three-dimensional image and a two-dimensional image in a direction of a user's gaze; and
(c) an image processing module configured to identify a point of gaze of the user and identify an object located at the user's point of gaze based on information from the eye tracking camera and the scene camera,
wherein the image processing module is configured to identify the object by:
generating an initial model of a scene using the information from the scene camera;
generating a reduced model of the scene that omits one or more portions of the initial model, wherein the one or more portions of the initial model that are omitted from the reduced model correspond to portions that are not used to identify objects within the initial model;
identifying one or more clusters within the reduced model; and
identifying, within the reduced model, a cluster of interest corresponding to the user's point of gaze;
identifying a region within the initial model corresponding to the identified cluster of interest; and
identifying the object based at least in part on the region identified within the initial model and the point of gaze of the user.
2. The apparatus of claim 1, further comprising an illumination source configured to illuminate the user's eye.
3. The apparatus of claim 2, where the illumination source is an infrared light emitting diode.
4. The apparatus of claim 3, where the eye tracking camera further comprises an infrared pass filter.
5. The apparatus of claim 1, where the eye tracking camera and scene camera are mounted on a wearable headset.
6. The apparatus of claim 1, where the scene camera is an RGB-D camera.
7. A mobile point of gaze apparatus capable of being mounted to the head of a user, the apparatus comprising:
a means for tracking movement of an eye;
a means for imaging a scene; and
a means for using information gathered by the means for tracking and information from the means for imaging to identify an object seen by the eye, wherein the object seen by the eye is identified by:
generating an initial model of the scene using the information from the means for imaging the scene;
generating a reduced model of the scene that omits one or more portions of the initial model, wherein the one or more portions of the initial model that are omitted from the reduced model correspond to portions that are not used to identify objects within the initial model;
identifying one or more clusters within the reduced model; and
identifying, within the reduced model, a cluster of interest corresponding to the object seen by the eye;
identifying a region within the initial model corresponding to the identified cluster of interest; and
identifying the object based at least in part on the region identified within the initial model.
8. The apparatus of claim 7, further comprising a means for mounting the means for tracking and means for imaging to a user's head.
9. A method for estimating a point of gaze, the method comprising:
tracking movement of a user's eye with an eye tracking camera;
obtaining a three-dimensional image and a two-dimensional image in a direction of a user's gaze; and
identifying an object in a point of gaze of the user using the eye tracking camera, three-dimensional image, and two dimensional image, where identifying the object comprises:
calculating a plurality of geometric classification match scores between the object and a plurality of reference objects;
calculating a plurality of keypoint match scores between the object and the plurality of reference objects;
calculating a plurality of histogram comparison scores between the object and the plurality of reference objects; and
identifying the object based on a sum of a geometric classification match score, a keypoint match score, and a histogram comparison score between the object and each reference object.
10. The method of claim 9, where tracking the movement of the user's eye comprises measuring a corneal reflection of the user's eye.
11. The method of claim 9, further comprising calibrating the eye tracking camera before tracking the movement of the user's eye.
12. The method of claim 9, where the user's point of gaze is calculated using a pupil tracking algorithm.
13. The method of claim 9, where identifying the object comprises:
identifying a Euclidean cluster in the three-dimensional image closest to the user's point of gaze;
identifying a region of interest in the Euclidean cluster; and
identifying a shape of the object from points in the region of interest.
14. The method of claim 13, where the identification of the shape of the object is performed using a random sample consensus (RANSAC) algorithm.
15. The method of claim 13, further comprising using a region of the two-dimensional image corresponding to the Euclidean cluster to identify the object.
16. The method of claim 15, where the region of the two-dimensional image is compared to a reference image.
17. The method of claim 16, where the comparison is performed using a speeded up robust features (SURF) method.
18. The method of claim 9, where identifying the object further comprises comparing a histogram of a region of the two-dimensional image near the point of gaze to a reference histogram.
19. The method of claim 9, where the sum is a weighted sum.
US14/482,109 2013-09-10 2014-09-10 Apparatus, system, and method for mobile, low-cost headset for 3D point of gaze estimation Active 2035-09-03 US10007336B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/482,109 US10007336B2 (en) 2013-09-10 2014-09-10 Apparatus, system, and method for mobile, low-cost headset for 3D point of gaze estimation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361876038P 2013-09-10 2013-09-10
US14/482,109 US10007336B2 (en) 2013-09-10 2014-09-10 Apparatus, system, and method for mobile, low-cost headset for 3D point of gaze estimation

Publications (2)

Publication Number Publication Date
US20150070470A1 US20150070470A1 (en) 2015-03-12
US10007336B2 true US10007336B2 (en) 2018-06-26

Family

ID=52625213

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/482,109 Active 2035-09-03 US10007336B2 (en) 2013-09-10 2014-09-10 Apparatus, system, and method for mobile, low-cost headset for 3D point of gaze estimation

Country Status (1)

Country Link
US (1) US10007336B2 (en)


Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015066332A1 (en) * 2013-10-30 2015-05-07 Technology Against Als Communication and control system and method
US9454806B2 (en) * 2014-01-21 2016-09-27 Nvidia Corporation Efficient approximate-nearest-neighbor (ANN) search for high-quality collaborative filtering
US9558712B2 (en) 2014-01-21 2017-01-31 Nvidia Corporation Unified optimization method for end-to-end camera image processing for translating a sensor captured image to a display image
KR101909006B1 (en) * 2014-07-24 2018-10-17 고쿠리츠켄큐카이하츠호진 카가쿠기쥬츠신코키코 Image registration device, image registration method, and image registration program
US9888843B2 (en) * 2015-06-03 2018-02-13 Microsoft Technology Licensing, Llc Capacitive sensors for determining eye gaze direction
CN105488509A (en) * 2015-11-19 2016-04-13 Tcl集团股份有限公司 Image clustering method and system based on local chromatic features
US10444972B2 (en) 2015-11-28 2019-10-15 International Business Machines Corporation Assisting a user with efficient navigation between a selection of entries with elements of interest to the user within a stream of entries
US10068134B2 (en) * 2016-05-03 2018-09-04 Microsoft Technology Licensing, Llc Identification of objects in a scene using gaze tracking techniques
US20170323149A1 (en) * 2016-05-05 2017-11-09 International Business Machines Corporation Rotation invariant object detection
US10776661B2 (en) * 2016-08-19 2020-09-15 Symbol Technologies, Llc Methods, systems and apparatus for segmenting and dimensioning objects
US9972158B2 (en) * 2016-10-01 2018-05-15 Cantaloupe Systems, Inc. Method and device of automatically determining a planogram in vending
US10175650B2 (en) * 2017-01-16 2019-01-08 International Business Machines Corporation Dynamic hologram parameter control
CN110325818B (en) * 2017-03-17 2021-11-26 本田技研工业株式会社 Joint 3D object detection and orientation estimation via multimodal fusion
US10432913B2 (en) 2017-05-31 2019-10-01 Proximie, Inc. Systems and methods for determining three dimensional measurements in telemedicine application
US10872246B2 (en) * 2017-09-07 2020-12-22 Regents Of The University Of Minnesota Vehicle lane detection system
WO2019154511A1 (en) 2018-02-09 2019-08-15 Pupil Labs Gmbh Devices, systems and methods for predicting gaze-related parameters using a neural network
US11393251B2 (en) 2018-02-09 2022-07-19 Pupil Labs Gmbh Devices, systems and methods for predicting gaze-related parameters
WO2019154509A1 (en) 2018-02-09 2019-08-15 Pupil Labs Gmbh Devices, systems and methods for predicting gaze-related parameters
CN108962182A (en) * 2018-06-15 2018-12-07 广东康云多维视觉智能科技有限公司 3-D image display device and its implementation based on eyeball tracking
WO2020042843A1 (en) 2018-08-27 2020-03-05 Shenzhen GOODIX Technology Co., Ltd. Eye tracking based on imaging eye features and assistance of structured illumination probe light
US11537202B2 (en) 2019-01-16 2022-12-27 Pupil Labs Gmbh Methods for generating calibration data for head-wearable devices and eye tracking system
US10997232B2 (en) * 2019-01-23 2021-05-04 Syracuse University System and method for automated detection of figure element reuse
CN109885169B (en) * 2019-02-25 2020-04-24 清华大学 Eyeball parameter calibration and sight direction tracking method based on three-dimensional eyeball model
US11676422B2 (en) 2019-06-05 2023-06-13 Pupil Labs Gmbh Devices, systems and methods for predicting gaze-related parameters
CN110889349A (en) * 2019-11-18 2020-03-17 哈尔滨工业大学 VSLAM-based visual positioning method for sparse three-dimensional point cloud chart
CN111797810B (en) * 2020-07-20 2022-11-29 吉林大学 Method for acquiring forward-looking preview area of driver in driving process
US11601706B2 (en) * 2020-11-12 2023-03-07 Smart Science Technology, LLC Wearable eye tracking headset apparatus and system


Patent Citations (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3678283A (en) 1970-10-22 1972-07-18 Us Navy Radiation sensitive optical tracker
US3806725A (en) 1972-01-18 1974-04-23 Leitz Ernst Gmbh Apparatus for automatic tracking of pupil of eye
US4109145A (en) 1974-05-20 1978-08-22 Honeywell Inc. Apparatus being controlled by movement of the eye
US4595990A (en) 1980-12-31 1986-06-17 International Business Machines Corporation Eye controlled information transfer
US4648052A (en) 1983-11-14 1987-03-03 Sentient Systems Technology, Inc. Eye-tracker communication system
US4676611A (en) 1984-11-14 1987-06-30 New York University Method and apparatus for visual-evoked responses
US4789235A (en) 1986-04-04 1988-12-06 Applied Science Group, Inc. Method and system for generating a description of the distribution of looking time as people watch television commercials
US4836670A (en) 1987-08-19 1989-06-06 Center For Innovative Technology Eye movement detector
US4852988A (en) 1988-09-12 1989-08-01 Applied Science Laboratories Visor and camera providing a parallax-free field-of-view image for a head-mounted eye movement measurement system
US4950069A (en) 1988-11-04 1990-08-21 University Of Virginia Eye movement detector with improved calibration and speed
US5331149A (en) 1990-12-31 1994-07-19 Kopin Corporation Eye tracking system having an array of photodetectors aligned respectively with an array of pixels
US5583335A (en) 1990-12-31 1996-12-10 Kopin Corporation Method of making an eye tracking system having an active matrix display
US5204703A (en) 1991-06-11 1993-04-20 The Center For Innovative Technology Eye movement and pupil diameter apparatus and method
US5585813A (en) 1992-10-05 1996-12-17 Rockwell International Corporation All aspect head aiming display
US6359601B1 (en) 1993-09-14 2002-03-19 Francis J. Maguire, Jr. Method and apparatus for eye tracking
US5481622A (en) 1994-03-01 1996-01-02 Rensselaer Polytechnic Institute Eye tracking apparatus and method employing grayscale threshold values
US6120461A (en) 1999-08-09 2000-09-19 The United States Of America As Represented By The Secretary Of The Army Apparatus for tracking the human eye with a retinal scanning display, and method thereof
US6943754B2 (en) 2002-09-27 2005-09-13 The Boeing Company Gaze tracking system, eye-tracking assembly and an associated method of calibration
US7130447B2 (en) 2002-09-27 2006-10-31 The Boeing Company Gaze tracking system, eye-tracking assembly and an associated method of calibration
US7866818B2 (en) 2003-11-07 2011-01-11 Neuro Kinetics, Inc Portable modular video oculography system and video occulography system with head position sensor and video occulography system with animated eye display
US7963652B2 (en) 2003-11-14 2011-06-21 Queen's University At Kingston Method and apparatus for calibration-free eye tracking
US20050286767A1 (en) * 2004-06-23 2005-12-29 Hager Gregory D System and method for 3D object recognition using range and intensity
US8433612B1 (en) 2008-03-27 2013-04-30 Videomining Corporation Method and system for measuring packaging effectiveness using video-based analysis of in-store shopper response
US7736000B2 (en) 2008-08-27 2010-06-15 Locarna Systems, Inc. Method and apparatus for tracking eye movement
US8342687B2 (en) 2009-10-08 2013-01-01 Tobii Technology Ab Eye-tracking using a GPU
US20120133891A1 (en) * 2010-05-29 2012-05-31 Wenyu Jiang Systems, methods and apparatus for making and using eyeglasses with adaptive lens driven by gaze distance and low power gaze tracking
US9342610B2 (en) * 2011-08-25 2016-05-17 Microsoft Technology Licensing, Llc Portals: registered objects as virtualized, personalized displays
US9072929B1 (en) * 2011-12-01 2015-07-07 Nebraska Global Investment Company, LLC Image capture system
US20130321772A1 (en) * 2012-05-31 2013-12-05 Nokia Corporation Medical Diagnostic Gaze Tracker
US9164580B2 (en) * 2012-08-24 2015-10-20 Microsoft Technology Licensing, Llc Calibration of eye tracking system
US20140192050A1 (en) * 2012-10-05 2014-07-10 University Of Southern California Three-dimensional point processing and model generation
US20140336781A1 (en) * 2013-05-13 2014-11-13 The Johns Hopkins University Hybrid augmented reality multimodal operation neural integration environment

Non-Patent Citations (21)

* Cited by examiner, † Cited by third party
Title
Bay et al. (Speeded-up robust features (SURF), Comput. Vis. Image Underst., 110(3) (2008), pp. 346-359). *
Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-Up Robust Features (SURF). Computer Vision and Image Understanding, 110(3):346-359, Jun. 2008.
Fischler and R. C. Bolles. Random sample consensus: a paradigm for model tting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381-395, Jun. 1981.
Laganiere. OpenCV 2 Computer Vision Application Programming Cookbook. Packt Publishing, Jun. 2011.
Li, J. Babcock, and D. J. Parkhurst. openEyes: a low-cost head-mounted eye-tracking solution. In Proceedings of the 2006 symposium on Eye tracking research & applications-ETRA '06, p. 95, New York, New York, USA, 2006. ACM Press.
Lieberman, C. Sugrue, T. Watson, J. Powderly, E. Roth, and T. Quan. The EyeWriter, 2009.
McMurrough, J. Rich, C. Conly, V. Athitsos, and F. Makedon. Multi-Modal Object of Interest Detection Using Eye Gaze and RGB-D Cameras. In Proceedings of the 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction-Gaze-In '12, New York, New York, USA, 2012. ACM Press.
Milner and M. Goodale. The Visual Brain in Action. Oxford University Press, Oxford, UK, 2nd edition, 2006.
Mulvey, A. Villanueva, D. Sliney, R. Lange, S. Cotmore, and M. Donegan. D5 . 4 Exploration of safety issues in Eyetracking. Technical report, Communication by Gaze Interaction (COGAIN), 2008.
Pasian, F. Corno, I. Signorile, and L. Farinetti. The Impact of Gaze Controlled Technology on Quality of Life. In Gaze Interaction and Applications of Eye Tracking: Advances in Assistive Technologies, chapter 6, pp. 48-54. IGI Global, 2012.
Pirri, M. Pizzoli, and A. Rudi. A general method for the point of regard estimation in 3D space. In CVPR 2011, pp. 921-928. IEEE, Jun. 2011.
Rusu and S. Cousins. 3D is here: Point Cloud Library (PCL). In 2011 IEEE International Conference on Robotics and Automation, pp. 1-4. IEEE, May 2011.
San Agustin, H. Skovsgaard, J. P. Hansen, and D. W. Hansen. Low-cost gaze interaction: ready to deliver the promises. In Proceedings of the 27th international conference extended abstracts on Human factors in computing systems-CHI EA '09, p. 4453, New York, New York, USA, 2009. ACM Press.
Swirski, Bulling, and Dodgson, Robust Real-Time Pupil Tracking in Highly Off-Axis Images, ETRA '12 Proceedings of the Symposium on Eye Tracking Research and Applications, pp. 173-176, 2012.
Takemura, Y. Kohashi, T. Suenaga, J. Takamatsu, and T. Ogasawara. Estimating 3D point-of-regard and visualizing gaze trajectories under natural head movements. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications-ETRA '10, vol. 1, p. 157, New York, New York, USA, 2010. ACM Press.
Winfield and D. Parkhurst. Starburst: A hybrid algorithm for video-based eye tracking combining feature-based and model-based approaches. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)-Workshops, 3:79-79, 2005.

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10983593B2 (en) * 2014-07-31 2021-04-20 Samsung Electronics Co., Ltd. Wearable glasses and method of displaying image via the wearable glasses
CN110032278A (en) * 2019-03-29 2019-07-19 华中科技大学 A kind of method for recognizing position and attitude, the apparatus and system of human eye attention object
CN110032278B (en) * 2019-03-29 2020-07-14 华中科技大学 Pose identification method, device and system for human eye interested object
US11030455B2 (en) 2019-03-29 2021-06-08 Huazhong University Of Science And Technology Pose recognition method, device and system for an object of interest to human eyes

Also Published As

Publication number Publication date
US20150070470A1 (en) 2015-03-12

Similar Documents

Publication Publication Date Title
US10007336B2 (en) Apparatus, system, and method for mobile, low-cost headset for 3D point of gaze estimation
EP3284011B1 (en) Two-dimensional infrared depth sensing
Fischer et al. Rt-gene: Real-time eye gaze estimation in natural environments
US10394334B2 (en) Gesture-based control system
US9750420B1 (en) Facial feature selection for heart rate detection
Akinyelu et al. Convolutional neural network-based methods for eye gaze estimation: A survey
JP2024045273A (en) System and method for detecting human gaze and gestures in unconstrained environments
Sugano et al. Aggregaze: Collective estimation of audience attention on public displays
US9305206B2 (en) Method for enhancing depth maps
KR101471488B1 (en) Device and Method for Tracking Sight Line
JP6571108B2 (en) Real-time 3D gesture recognition and tracking system for mobile devices
Xu et al. Integrated approach of skin-color detection and depth information for hand and face localization
Jabnoun et al. Object recognition for blind people based on features extraction
Reale et al. Pointing with the eyes: Gaze estimation using a static/active camera system and 3D iris disk model
CN112185515A (en) Patient auxiliary system based on action recognition
KR20130051319A (en) Apparatus for signal input and method thereof
Alnaim Hand gesture recognition using deep learning neural networks
McMurrough et al. Multi-modal object of interest detection using eye gaze and rgb-d cameras
Niu et al. Real-time localization and matching of corneal reflections for eye gaze estimation via a lightweight network
Mesbahi et al. Hand Gesture Recognition Based on Various Deep Learning YOLO Models
Paletta et al. An integrated system for 3D gaze recovery and semantic analysis of human attention
Fihl et al. Invariant gait continuum based on the duty-factor
US11675428B2 (en) Determining a gaze direction using depth information
Jain et al. Low-cost gaze detection with real-time ocular movements using coordinate-convolutional neural networks
Vasantrao et al. Improved HCI using face detection and speech recognition

Legal Events

Date Code Title Description
AS Assignment

Owner name: THE BOARD OF REGENTS OF THE UNIVERSITY OF TEXAS SY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MCMURROUGH, CHRISTOPHER D.;REEL/FRAME:035815/0926

Effective date: 20150123

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 4