WO2021146177A1 - Systems and methods for eye tracking using machine learning techniques - Google Patents

Systems and methods for eye tracking using machine learning techniques

Info

Publication number
WO2021146177A1
Authority
WO
WIPO (PCT)
Prior art keywords
eye
finder module
user
digital image
gaze
Prior art date
Application number
PCT/US2021/013058
Other languages
French (fr)
Other versions
WO2021146177A8 (en)
Inventor
Robert C. CHAPPELL
Zachary Sharp MICKELSON
Tai Chan
Original Assignee
Eye Tech Digital Systems, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eye Tech Digital Systems, Inc. filed Critical Eye Tech Digital Systems, Inc.
Priority to EP21741279.0A priority Critical patent/EP4091095A1/en
Priority to CN202180007615.5A priority patent/CN114930410A/en
Publication of WO2021146177A1 publication Critical patent/WO2021146177A1/en
Publication of WO2021146177A8 publication Critical patent/WO2021146177A8/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/193Preprocessing; Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30041Eye; Retina; Ophthalmic
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/255Detecting or recognising potential candidate objects based on visual cues, e.g. shapes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Ophthalmology & Optometry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)
  • Image Processing (AREA)

Abstract

An eye tracking system includes an eye finder module configured to receive a digital image of a user's face and produce eye region data specifying first and second locations of the user's eyes. A pupil finder module receives the eye region data and determines, using the digital image and the eye region data, the locations of first and second pupil centers within the digital image. A gaze finder module determines a user gaze point based in part on the locations of the first and second pupil centers. At least one of the eye finder module, pupil finder module, and gaze finder module are implemented as a previously trained machine learning model.

Description

SYSTEMS AND METHODS FOR EYE TRACKING USING
MACHINE LEARNING TECHNIQUES
TECHNICAL FIELD
[0001] The present invention relates, generally, to eye-tracking systems and methods and, more particularly, to the application of machine learning techniques to such eye-tracking systems.
BACKGROUND
[0002] Eye-tracking systems, such as those used in conjunction with desktop computers, laptops, tablets, virtual reality headsets, and other computing devices that include a display, generally include one or more illuminators configured to direct infrared light to the user’s eyes and an image sensor that captures the resulting images for further processing. By determining the relative locations of the user’s pupils and the corneal reflections produced by the illuminators, the eye-tracking system can accurately predict the user’s gaze point on the display.
[0003] While currently known eye-tracking systems are reasonably accurate and responsive for gaming purposes, there are a number of ways in which such systems might be improved. For example, there is a need for improved robustness in eye-tracking systems to address partial occlusions and circumstances in which a user’s eyeglasses present image-processing challenges.
[0004] Systems and methods are therefore needed that overcome these and other limitations of the prior art.
SUMMARY OF THE INVENTION
[0005] Various embodiments of the present invention relate to systems and methods for, inter alia: i) providing improved eye-tracking using previously trained machine learning models; ii) providing improved eye-tracking calibration through frequent retraining of a machine learning model during normal use of the system; iii) providing improved eye-tracking using machine learning models configured to perform eye finding and/or pupil finding; iv) providing improved eye-tracking using a combination of shallow artificial neural networks (ANNs) and convolutional neural networks (CNNs); and v) providing improved eye-tracking functionality using a hybrid approach including both traditional and machine learning models.
BRIEF DESCRIPTION OF THE DRAWING FIGURES
[0006] The present invention will hereinafter be described in conjunction with the appended drawing figures, wherein like numerals denote like elements, and:
[0007] FIG. 1 is a conceptual block diagram illustrating an eye-tracking system in accordance with various embodiments;
[0008] FIGS. 2 and 3 present schematic block diagrams of eye-tracking systems in accordance with various embodiments;
[0009] FIG. 4 is a flowchart illustrating an eye-tracking method in accordance with various embodiments;
[0010] FIGS. 5A-5C illustrate the determination of eye regions in accordance with various embodiments;
[0011] FIGS. 6A-6B illustrate the imaging of a user’s corneal reflections (CRs) and pupil center (PC) in accordance with various embodiments; and
[0012] FIG. 7 illustrates a shallow neural network in accordance with various embodiments; and
[0013] FIG. 8 illustrates a convolutional neural network (CNN) in accordance with various embodiments.
DETAILED DESCRIPTION OF PREFERRED
EXEMPLARY EMBODIMENTS
[0014] The present subject matter relates to improved systems and methods for performing eye-tracking using artificial intelligence (AI) techniques and machine learning (ML) models in place of, or in conjunction with, traditional eye-tracking techniques. In that regard, the following detailed description is merely exemplary in nature and is not intended to limit the inventions or the application and uses of the inventions described herein. Furthermore, there is no intention to be bound by any theory presented in the preceding background or the following detailed description. In the interest of brevity, conventional techniques and components related to eye-tracking algorithms, image sensors, machine learning systems, and digital image processing may not be described in detail herein.
[0015] Referring first to FIG. 1, the present invention is generally implemented in the context of a system 100 including a computing device 110 (e.g., a desktop computer, tablet computer, laptop, or the like) having an eye-tracking assembly 120 coupled to, integrated into, or otherwise associated with device 110. The eye-tracking assembly 120 is configured to observe the facial region 181 of a user 180 within a field of view 170 and, through various techniques described in detail below, track the location and movement of the user’s gaze (or "gaze point”) 113 on display 112 of computing device 110. The gaze point 113 may be characterized, for example, by a tuple (x, y) specifying linear coordinates (in pixels, centimeters, or other suitable unit) relative to an arbitrary reference point on display screen 112 (e.g., the upper left corner, as shown).
[0016] In the illustrated embodiment, eye-tracking assembly 120 includes one or more infrared (IR) light emitting diodes (LEDs) 121 positioned to illuminate facial region 181 of user 180. Assembly 120 further includes one or more cameras 125 configured to acquire (at a suitable frame-rate) digital images corresponding to region 181 of the user’s face (generally referred to as "image data”).
[0017] In some embodiments, the image data may be processed locally (i.e., within computing device 110) to determine gaze point 113. In some embodiments, however, eye tracking is accomplished using an image processing module or modules 162 that are remote from computing device 110, e.g., hosted within a cloud computing system 160 communicatively coupled to computing device 110 over a network 150 (e.g., the Internet). In such embodiments, image processing module 162 performs the computationally complex operations necessary to determine the gaze point, which is then transmitted back (as eye and gaze data) over the network to computing device 110. An example cloud-based eye-tracking system that may be employed in the context of the present invention may be found, for example, in U.S. Pat. App. No. 16/434,830, entitled "Devices and Methods for Reducing Computational and Transmission Latencies in Cloud Based Eye Tracking Systems,” filed June 7, 2019, the contents of which are hereby incorporated by reference.
[0018] Referring now to the block diagram illustrated in FIG. 2 and with continued reference to FIG. 1, an eye-tracking system 200 in accordance with various embodiments includes an eye finder module 210 configured to receive an image 201 (i.e., an image acquired of the user’s facial region 181) and determine, as described in further detail below, the location within image 201 of the user’s eyes. This eye location data 211 is then provided to a pupil finder module (or simply "pupil finder”) 220 and a corneal reflection (CR) finder (or simply "CR finder”) 230. In parallel, image 201 may also be directly provided (in raw form) to pupil finder 220 and CR finder 230, as shown.
[0019] The output 221 of pupil finder 220 (e.g., data specifying the predicted location of the pupil centers (PCs) in image 201) is provided to gaze finder module (or simply "gaze finder”) 240. Similarly, the output 231 of CR finder 230 is provided to gaze finder 240. Gaze finder 240 then takes the received pupil and CR information and produces gaze data 241, which in one embodiment includes the gaze coordinates (x, y) (113 in FIG. 1) along with other optional information, such as a value specifying the user’s distance from eye tracking assembly 120.
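For readers who prefer code, the dataflow of FIG. 2 can be summarized with the following minimal Python sketch; the module interfaces, data types, and the function name track_gaze are illustrative assumptions rather than the implementation described in this disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class EyeRegions:
    """Output 211 of the eye finder: one bounding box per eye (x, y, w, h)."""
    left_box: Tuple[int, int, int, int]
    right_box: Tuple[int, int, int, int]

@dataclass
class GazeData:
    """Output 241 of the gaze finder: gaze coordinates plus optional extras."""
    x: float
    y: float
    user_distance: float = 0.0

def track_gaze(image, eye_finder, pupil_finder, cr_finder, gaze_finder) -> GazeData:
    """Wire the FIG. 2 modules together (hypothetical callable interfaces)."""
    regions: EyeRegions = eye_finder(image)                     # 210 -> 211
    pupil_centers: List[Point] = pupil_finder(image, regions)   # 220 -> 221
    corneal_refls: List[Point] = cr_finder(image, regions)      # 230 -> 231
    return gaze_finder(pupil_centers, corneal_refls)            # 240 -> 241
```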
[0020] In accordance with the present invention, one or more of modules 210, 220, 230, and 240 as illustrated in FIG. 2 are implemented using previously-trained machine learning models, rather than traditional eye tracking techniques (e.g., conventional geometric models). For example, in one embodiment, eye finder 210, pupil finder 220, and/or CR finder 230 are implemented as convolutional neural networks (CNNs) that perform object detection. In one embodiment, one or more of modules 210, 220, 230, and 240 implement a You Only Look Once (YOLO) algorithm (e.g., YOLO v3) configured to produce a regression output that includes predicted coordinates (i.e., of the CRs and PCs of the user’s eyes). As is known in the art, the YOLO algorithm may be implemented using a variety of programming languages and libraries. In one embodiment, for example, YOLO object detection is implemented on a cloud computing platform using a TensorFlow library.
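As a hedged illustration of how a YOLO-style detection output might be reduced to the point estimates used downstream, the sketch below converts assumed detection boxes into pupil-center and corneal-reflection coordinates; the class indices and array layout are assumptions about how such a detector could be configured, not the configuration of any particular network described here.

```python
import numpy as np

def extract_pcs_and_crs(boxes: np.ndarray, classes: np.ndarray,
                        scores: np.ndarray, threshold: float = 0.5):
    """Convert YOLO-style detections on one eye crop into point estimates.

    boxes:   (N, 4) array of (x1, y1, x2, y2) box corners
    classes: (N,) array; class 0 = pupil, class 1 = corneal reflection
             (an illustrative labeling convention)
    scores:  (N,) array of detection confidences
    """
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2.0,
                        (boxes[:, 1] + boxes[:, 3]) / 2.0], axis=1)
    keep = scores >= threshold
    pupil_centers = centers[keep & (classes == 0)]
    corneal_refls = centers[keep & (classes == 1)]
    return pupil_centers, corneal_refls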
[0021] In accordance with various embodiments, gaze finder 240 is implemented as a shallow ANN (e.g., an ANN with a single hidden layer) that takes as its input a vector of integers produced by pupil finder 220 and CR finder 230 and produces a regression output including the predicted gaze point coordinates along with confidence levels associated with that prediction.
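A minimal sketch of such a shallow gaze-finder network, written with the TensorFlow/Keras API, is shown below; the input length, hidden-layer width, activation, and loss function are illustrative choices rather than values taken from this disclosure.

```python
import tensorflow as tf

def build_gaze_finder(n_inputs: int = 12, n_hidden: int = 64) -> tf.keras.Model:
    """Shallow ANN: one hidden layer mapping PC/CR coordinates to a gaze point."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_inputs,)),            # PC and CR coordinates
        tf.keras.layers.Dense(n_hidden, activation="tanh"),  # single hidden layer
        tf.keras.layers.Dense(2, activation="linear"),       # regression output (x, y)
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```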
[0022] Referring now to FIG. 3, an eye tracking system 300 in accordance with an alternate embodiment includes an eye finder module (or simply "eye finder”) 310 (which, as above, may be implemented using a machine learning model or conventional eye-finding techniques) and a gaze finder module (or simply "gaze finder”) 340. In this embodiment, gaze finder 340 is implemented as a full CNN and receives as its input the data 311 from eye finder 310 as well as the raw user image 301. The result is a gaze point output that may correspond to a regression output (e.g., integer coordinate data) or classification output (a discrete region on display screen 112). Stated another way, while system 200 of FIG. 2 implements a hybrid approach to perform eye tracking (including both machine learning and conventional techniques), system 300 of FIG. 3 performs eye tracking primarily through a single, properly trained CNN.
[0023] In accordance with one embodiment, the gaze point output 341 of gaze finder 340 is further processed to improve the accuracy of the predicted gaze point. The present inventors have determined that such an embodiment is particularly advantageous in accounting for differences between the appearance of the given user and the appearance of the users used for supervised training of the CNN. In one embodiment, for example, numeric x and y offsets are added to the gaze point output 341. In other embodiments, the gaze point output values 341 are multiplied by one or more constants.
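A hedged sketch of such post-processing, with the per-user offsets and scale factors treated as illustrative calibration constants, might look like this:

```python
def correct_gaze(x_raw: float, y_raw: float,
                 x_scale: float = 1.0, y_scale: float = 1.0,
                 x_offset: float = 0.0, y_offset: float = 0.0):
    """Apply per-user scaling and offsets to the raw gaze point output 341."""
    return (x_raw * x_scale + x_offset, y_raw * y_scale + y_offset)
```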
[0024] FIG. 4 is a flowchart illustrating an eye-tracking method 400 in accordance with various embodiments that might be performed, for example, by the eye tracking system illustrated in FIG. 2. More particularly, referring to FIG. 4 together with FIGS. 5A-5C and FIGS. 6A-6B, the method 400 begins with capturing a first image (510) that includes at least a portion of the user’s face 511 (step 401). In one embodiment, the first image 510 is a high resolution image produced by camera 125 of FIG. 1.
[0025] Next, at step 402, a second image 520 of the user’s facial region 521 is produced by decimating (or otherwise down-sampling or reducing the resolution of) the first image 510. In one embodiment, the second image 520 is a 416 x 416 pixel image. Next, at 403, the eye regions 531, 532 are determined from second image 520 or a transformed/decimated third image 530 based on image 520. In one embodiment, as described above, the eye region determination is made by eye finder module 210 using, for example, a YOLO machine learning model.
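The decimation in step 402 can be sketched as follows, assuming OpenCV for the down-sampling; the interpolation mode is an illustrative choice, and the fixed 416 x 416 size is taken from the example above.

```python
import cv2
import numpy as np

def decimate(first_image: np.ndarray, size=(416, 416)) -> np.ndarray:
    """Produce the second, lower-resolution image used for coarse eye finding."""
    return cv2.resize(first_image, size, interpolation=cv2.INTER_AREA)
```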
[0026] After the general eye regions 531 and 532 are determined, the system then crops out, from the first image (i.e., the high resolution image 510), a pair of close-up images of the respective eye regions at those locations (step 404). In one embodiment, these close-up images are 416 x 416 pixel images.
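Step 404 involves mapping an eye-region location found in the decimated image back into the coordinate frame of the high-resolution first image and cropping a fixed-size close-up there. A sketch under those assumptions (the rescaling assumes the decimated image covers the full camera frame):

```python
import numpy as np

def crop_eye(first_image: np.ndarray, center_lowres, lowres_size: int = 416,
             crop_size: int = 416) -> np.ndarray:
    """Crop one close-up eye image from the high-resolution first image.

    center_lowres is the (x, y) eye-region center found in the 416 x 416 image.
    """
    h, w = first_image.shape[:2]
    cx = int(center_lowres[0] * w / lowres_size)
    cy = int(center_lowres[1] * h / lowres_size)
    half = crop_size // 2
    x0 = int(np.clip(cx - half, 0, max(w - crop_size, 0)))
    y0 = int(np.clip(cy - half, 0, max(h - crop_size, 0)))
    return first_image[y0:y0 + crop_size, x0:x0 + crop_size]
```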
[0027] Subsequently, at step 405, the system determines (e.g., using a YOLO machine learning model as described above) the PCs and CRs for each eye. This is illustrated in FIGS. 6A and 6B as a first image 601 including a first eye 531 having a PC 542 and CRs 552; and a second image 602 including a second eye having a PC 543 and CRs 553. Given this data, the system (e.g., gaze finder 240) determines (at step 406) the predicted gaze point (x, y).
[0028] FIG. 7 is a schematic block diagram of an artificial neural network (ANN) 700 in accordance with various embodiments that may be used to implement, for example, the gaze finder 240 of FIG. 2.
[0029] In general, ANN 700 includes an input layer 701 with a number of input nodes (e.g., 701-1 to 701-n), an output layer 703 with a number of output nodes (e.g., 703-1 to 703-j), and one or more interconnected hidden layers 702 (in this example, a single hidden layer 702 including nodes 702-1 to 702-k). The number of nodes in each layer (n, k, and j) may vary depending upon the application, and in fact may be modified dynamically by the system itself to optimize its performance. In some embodiments (e.g., deep learning systems), multiple hidden layers 702 may be incorporated into ANN 700.
[0030] Each of the layers 702 and 703 receives input from a previous layer via a network of weighted connections (illustrated as arrows in FIG. 7). That is, the arrows in Fig. 7 may be represented as a matrix of floating point values representing weights between pairs of interconnected nodes. Each of the nodes implements an "activation function” (e.g., sigmoid, tanh, or linear) that will generally vary depending upon the particular application, and which produces an output that is based on the sum of the inputs at each node.
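In code, a single layer of FIG. 7 reduces to a weighted sum followed by an activation function; the tanh choice below is purely an example.

```python
import numpy as np

def layer_forward(inputs: np.ndarray, weights: np.ndarray,
                  bias: np.ndarray, activation=np.tanh) -> np.ndarray:
    """One ANN layer: each node outputs the activation of its weighted input sum."""
    return activation(inputs @ weights + bias)
```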
[0031] ANN 700 is trained via a learning rule and "cost function” that are used to modify the weights of the connections in response to the input patterns (i.e., eye tracking data) provided to input layer 701 and the training set provided at output layer 703, thereby allowing ANN 700 to learn by example through a combination of backpropagation and gradient descent optimization. Such learning may be supervised (with previously acquired eye tracking data provided as input layer 701 and known gaze point information provided as output layer 703), unsupervised (with uncategorized examples provided to input layer 701), or involve reinforcement learning, where some notion of "reward” is provided during training on the eye-tracking data.
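For the supervised case, a hedged training sketch using the shallow gaze-finder defined in the earlier example might look like the following; the file names, array shapes, and hyperparameters are placeholders, and Keras handles the backpropagation and gradient-descent updates internally.

```python
import numpy as np

# Hypothetical training set: rows of PC/CR coordinates paired with known gaze points.
features = np.load("eye_features.npy")   # shape (num_samples, 12) -- assumed file
targets = np.load("gaze_points.npy")     # shape (num_samples, 2)  -- assumed file

model = build_gaze_finder(n_inputs=features.shape[1])
model.fit(features, targets, epochs=50, batch_size=32, validation_split=0.1)
```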
[0032] Once ANN 700 is trained to a satisfactory level, it may be used as an analytical tool to make predictions and perform classification or regression based on the input 701. That is, new inputs are presented to input layer 701, where they are processed by the middle layer 702 and, via forward propagation through the weights associated with each of the edges, produce an output 703. As described above, output layer 703 will typically include a set of confidence levels or probabilities associated with a corresponding number of different classes, such as the location of the gaze point.
[0033] FIG. 8 is a block diagram of an exemplary convolutional neural network (CNN) in accordance with various embodiments, and which may be used, for example, to implement the gaze finder 340 of FIG. 3.
[0034] As shown in FIG. 8, CNN 800 generally receives an input image 810 (e.g., an image of a user’s facial region) and produces an output 840 comprising a vector of gaze point data.
[0035] In general, CNN 800 implements a convolutional phase 822, followed by feature extraction 820 and classification 830. Convolutional phase 822 uses an appropriately sized convolutional filter that produces a set of feature maps 821 corresponding to smaller tilings of input image 810. As is known, convolution as a process is translationally invariant - i.e., features of interest (e.g., nose, eyes, mouth) can be identified regardless of their location within image 810.
[0036] Subsampling 824 is then performed to produce a set of smaller feature maps 823 that are effectively "smoothed” to reduce sensitivity of the convolutional filters to noise and other variations. Subsampling might involve taking an average or a maximum value over a sample of the inputs 821. Feature maps 823 then undergo another convolution 826, as is known in the art, to produce a large set of smaller feature maps 825. Feature maps 825 are then subsampled 828 to produce feature maps 827.
[0037] During the classification phase (830), the feature maps 827 are processed to produce a first layer 831, followed by a fully-connected layer 833, from which outputs 840 are produced. For example, outputs 841 and 842 might correspond to the likelihood that particular features have been recognized.
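An illustrative Keras definition mirroring the phases of FIG. 8 is given below; the filter counts, kernel sizes, and layer widths are arbitrary example values, not the architecture of any network described in this disclosure.

```python
import tensorflow as tf

def build_gaze_cnn(input_shape=(416, 416, 1)) -> tf.keras.Model:
    """CNN sketch following FIG. 8: two convolution/subsampling stages, then
    fully-connected layers producing a regression gaze-point output."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),          # input image 810
        tf.keras.layers.Conv2D(16, 5, activation="relu"),  # convolution 822 -> maps 821
        tf.keras.layers.MaxPooling2D(2),                   # subsampling 824 -> maps 823
        tf.keras.layers.Conv2D(32, 3, activation="relu"),  # convolution 826 -> maps 825
        tf.keras.layers.MaxPooling2D(2),                   # subsampling 828 -> maps 827
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),     # layer 831
        tf.keras.layers.Dense(64, activation="relu"),      # fully-connected layer 833
        tf.keras.layers.Dense(2, activation="linear"),     # outputs 840: (x, y)
    ])
```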
[0038] In general, the CNN illustrated in FIG. 8 is trained in a supervised mode by presenting it with a large number (i.e., a "corpus”) of input images of users’ faces, and "clamping” outputs 840 based on the known, ground truth location of the user’s gaze. Backpropagation, as is known in the art, is then used to refine the training of CNN 800. Subsequently, during normal operation, the trained CNN is used to process images 810 as described above.
[0039] In accordance with various embodiments, training of the machine learning models and consequently the eye-tracking calibration takes place in the background in a way that is largely transparent to the user. That is, the user is not prompted to enter a specified "calibration” mode. Rather, the system, during normal operation, continuously updates and trains the models based on the acquired images.
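One way such transparent recalibration could be organized, sketched here under the assumption that reliably labeled frames are buffered during normal use and periodically consumed in a brief fine-tuning pass, is:

```python
import numpy as np

def background_update(model, feature_buffer: list, target_buffer: list,
                      min_samples: int = 256) -> None:
    """Fine-tune the gaze model in the background once enough new samples exist."""
    if len(feature_buffer) >= min_samples:
        model.fit(np.array(feature_buffer), np.array(target_buffer),
                  epochs=1, batch_size=32, verbose=0)
        feature_buffer.clear()
        target_buffer.clear()
```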
[0040] While the above discussion often focuses on the use of artificial neural networks, the range of embodiments is not so limited. Any of the various modules described herein (e.g., in FIGS. 2 and 3) may be implemented as one or more machine learning models that undergo supervised, unsupervised, semi-supervised, or reinforcement learning and perform classification (e.g., binary or multiclass classification), regression, clustering, dimensionality reduction, and/or similar tasks. Examples of such models include, without limitation, artificial neural networks (ANN) (such as recurrent neural networks (RNN) and convolutional neural networks (CNN)), decision tree models (such as classification and regression trees (CART)), ensemble learning models (such as boosting, bootstrapped aggregation, gradient boosting machines, and random forests), Bayesian network models (e.g., naive Bayes), principal component analysis (PCA), support vector machines (SVM), clustering models (such as K-nearest-neighbor, K-means, expectation maximization, hierarchical clustering, etc.), and linear discriminant analysis models.
[0041] In summary, what have been described are various eye-tracking systems and methods utilizing novel machine learning techniques. In accordance with one embodiment, an eye tracking system includes: an eye finder module configured to receive a digital image of a user’s face and produce eye region data specifying first and second locations of the user’s eyes; a pupil finder module configured to receive the eye region data and to determine, using the digital image and the eye region data, the locations of first and second pupil centers within the digital image; and a gaze finder module configured to determine a user gaze point based in part on the locations of the first and second pupil centers; wherein at least one of the eye finder module, pupil finder module, and gaze finder module are implemented as a previously trained machine learning model.
[0042] In accordance with one embodiment, the gaze finder module is implemented as a shallow artificial neural network (ANN).
[0043] In accordance with one embodiment, the pupil finder module is implemented as a convolutional neural network (CNN).
[0044] In accordance with one embodiment, the CNN is implemented using YOLO object detection.
[0045] In accordance with one embodiment, the system further includes a corneal reflection finder module configured to receive the eye region data and to determine, using the digital image and the eye region data, a plurality of corneal reflections within the digital image, wherein the gaze finder determines the user gaze point based in part on the locations of the plurality of corneal reflections.
[0046] In accordance with one embodiment, the machine learning model is trained by acquiring images of the user during normal operation.
[0047] In accordance with one embodiment, at least one of the eye finder module and gaze finder module is implemented on a cloud computing platform remote from the user.
[0048] An eye tracking system in accordance with another embodiment includes: an eye finder module configured to receive a digital image of a user’s face and produce eye region data specifying first and second locations of the user’s eyes; and a gaze finder module, including a previously trained convolutional machine learning model, configured to determine a user gaze point based in part on the eye region data.
[0049] In accordance with one embodiment, the previously trained machine learning model is a convolutional neural network (CNN).
[0050] In accordance with one embodiment, the eye finder module is implemented using YOLO object detection.
[0051] In accordance with one embodiment, the previously trained machine learning model is trained by acquiring images of the user during normal operation.
[0052] In accordance with one embodiment, at least one of the gaze finder module and eye finder module is implemented using a cloud computing platform remote from the user.
[0053] An eye tracking method in accordance with one embodiment includes: receiving a digital image of a user’s face; producing eye region data specifying first and second locations of the user’s eyes; determining, using the digital image and the eye region data, the locations of first and second pupil centers within the digital image; and determining, using a previously trained shallow neural network model, a user gaze point based in part on the locations of the first and second pupil centers.
[0054] In accordance with one embodiment, the eye region data is determined using a shallow artificial neural network (ANN).
[0055] In accordance with one embodiment, the locations of the first and second pupil centers are determined using a convolutional neural network (CNN).
[0056] In accordance with one embodiment, the CNN is implemented using YOLO object detection.
[0057] In accordance with one embodiment, the method includes receiving the eye region data and determining, using the digital image and the eye region data, a plurality of corneal reflections within the digital image.
[0058] In accordance with another embodiment, the previously trained shallow neural network model is trained by acquiring images of the user during normal operation.
[0059] An eye tracking system in accordance with one embodiment includes: an eye-tracking assembly including at least one infrared (IR) light emitting diode (LED) positioned to illuminate a user’s facial region, and at least one camera configured to acquire a digital image of the user’s facial region; an eye finder module configured to receive a digital image of a user’s facial region from the eye-tracking assembly and produce eye region data specifying first and second locations of the user’s eyes; a pupil finder module implemented as a YOLO convolutional neural network (CNN) configured to receive the eye region data and to determine, using the digital image and the eye region data, the locations of first and second pupil centers within the digital image; a corneal reflection finder module implemented as a YOLO CNN configured to receive the eye region data and to determine, using the digital image and the eye region data, a plurality of corneal reflections within the digital image; and a gaze finder module implemented as a shallow artificial neural network (ANN) configured to determine a user gaze point based in part on the locations of the first and second pupil centers and the locations of the plurality of corneal reflections. At least one of the eye finder module, pupil finder module, corneal reflection finder module, and gaze finder module are trained by acquiring images of the user during normal operation. At least one of the gaze finder module, corneal reflection module, and eye finder module may be implemented using a cloud computing platform that is remote from the user and/or the eye-tracking assembly and/or the computing device with which the eye-tracking assembly is used.
[0060] Embodiments of the present disclosure may be described herein in terms of functional and/or logical block components and various processing steps. It should be appreciated that such block components may be realized by any number of hardware, software, and/or firmware components configured to perform the specified functions. For example, an embodiment of the present disclosure may employ various integrated circuit components, e.g., memory elements, digital signal processing elements, logic elements, look up tables, or the like, which may carry out a variety of functions under the control of one or more microprocessors or other control devices.
[0061] In addition, those skilled in the art will appreciate that embodiments of the present disclosure may be practiced in conjunction with any number of systems, and that the systems described herein are merely exemplary embodiments of the present disclosure. Further, the connecting lines shown in the various figures contained herein are intended to represent example functional relationships and/or physical couplings between the various elements. It should be noted that many alternative or additional functional relationships or physical connections may be present in an embodiment of the present disclosure.

[0062] As used herein, the terms “module” or “controller” refer to any hardware, software, firmware, electronic control component, processing logic, and/or processor device, individually or in any combination, including without limitation: application specific integrated circuits (ASICs), field-programmable gate-arrays (FPGAs), dedicated neural network devices (e.g., Google Tensor Processing Units), electronic circuits, processors (shared, dedicated, or group) configured to execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.
[0063] As used herein, the word “exemplary” means “serving as an example, instance, or illustration.” Any implementation described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other implementations, nor is it intended to be construed as a model that must be literally duplicated.
[0064] While the foregoing detailed description will provide those skilled in the art with a convenient road map for implementing various embodiments of the invention, it should be appreciated that the particular embodiments described above are only examples, and are not intended to limit the scope, applicability, or configuration of the invention in any way. To the contrary, various changes may be made in the function and arrangement of elements described without departing from the scope of the invention.

Claims

1. An eye tracking system comprising: an eye finder module configured to receive a digital image of a user’s face and produce eye region data specifying first and second locations of the user’s eyes; a pupil finder module configured to receive the eye region data and to determine, using the digital image and the eye region data, the locations of first and second pupil centers within the digital image; and a gaze finder module configured to determine a user gaze point based in part on the locations of the first and second pupil centers; wherein at least one of the eye finder module, pupil finder module, and gaze finder module is implemented as a previously trained machine learning model.
2. The eye tracking system of claim 1, wherein the gaze finder module is implemented as an artificial neural network (ANN).
3. The eye tracking system of claim 1, wherein the pupil finder module is implemented as a convolutional neural network (CNN).
4. The eye tracking system of claim 3, wherein the CNN is implemented using YOLO object detection.
5. The eye tracking system of claim 1, further including a corneal reflection finder module configured to receive the eye region data and to determine, using the digital image and the eye region data, at least one corneal reflection within the digital image, wherein the gaze finder module determines the user gaze point based in part on the location of the at least one corneal reflection.
6. The eye tracking system of claim 1, wherein the machine learning model is trained by acquiring images of the user during normal operation.
7. The eye tracking system of claim 1, wherein at least one of the eye finder module and gaze finder module is implemented on a cloud computing platform remote from the user.
8. An eye tracking system comprising: an eye finder module configured to receive a digital image of a user’s face and produce eye region data specifying first and second locations of the user’s eyes; and a gaze finder module, including a previously trained machine learning model, configured to determine a user gaze point based in part on the eye region data.
9. The eye tracking system of claim 8, wherein the previously trained machine learning model is a convolutional neural network (CNN).
10. The eye tracking system of claim 8, wherein the eye finder module is implemented using YOLO object detection.
11. The eye tracking system of claim 8, wherein the previously trained machine learning model is trained by acquiring images of the user during normal operation.
12. The eye tracking system of claim 8, wherein at least one of the gaze finder module and eye finder module is implemented using a cloud computing platform remote from the user.
13. An eye tracking method comprising: receiving a digital image of a user’s face; producing eye region data specifying first and second locations of the user’s eyes; determining, using the digital image and the eye region data, the locations of first and second pupil centers within the digital image; and determining, using a previously trained shallow neural network model, a user gaze point based in part on the locations of the first and second pupil centers.
14. The eye tracking method of claim 13, wherein the eye region data is determined using a shallow artificial neural network (ANN).
15. The eye tracking method of claim 13, wherein the locations of the first and second pupil centers are determined using a convolutional neural network (CNN).
16. The eye tracking method of claim 15, wherein the CNN is implemented using YOLO object detection.
17. The eye tracking method of claim 13, further including: receiving the eye region data; and determining, using the digital image and the eye region data, at least one corneal reflection within the digital image.
18. The eye tracking method of claim 13, wherein the previously trained shallow neural network model is trained by acquiring images of the user during normal operation.
19. An eye tracking system comprising: an eye-tracking assembly including at least one infrared (IR) light emitting diode (LED) positioned to illuminate a user’s facial region, and at least one camera configured to acquire a digital image of the user’s facial region; an eye finder module configured to receive a digital image of a user’s facial region from the eye-tracking assembly and produce eye region data specifying first and second locations of the user’s eyes; a pupil finder module implemented as a YOLO convolutional neural network (CNN) configured to receive the eye region data and to determine, using the digital image and the eye region data, the locations of first and second pupil centers within the digital image; a corneal reflection finder module implemented as a YOLO CNN configured to receive the eye region data and to determine, using the digital image and the eye region data, at least one corneal reflection within the digital image; and a gaze finder module implemented as a shallow artificial neural network (ANN) configured to determine a user gaze point based in part on the respective locations of the first and second pupil centers and the at least one corneal reflection; wherein at least one of the eye finder module, pupil finder module, corneal reflection finder module, and gaze finder module is trained by acquiring images of the user during normal operation.
20. The eye tracking system of claim 19, wherein at least one of the gaze finder module, corneal reflection finder module, and eye finder module is implemented using a cloud computing platform remote from the user.
PCT/US2021/013058 2020-01-13 2021-01-12 Systems and methods for eye tracking using machine learning techniques WO2021146177A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21741279.0A EP4091095A1 (en) 2020-01-13 2021-01-12 Systems and methods for eye tracking using machine learning techniques
CN202180007615.5A CN114930410A (en) 2020-01-13 2021-01-12 System and method for eye tracking using machine learning techniques

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202016741081A 2020-01-13 2020-01-13
US16/741,081 2020-01-13

Publications (2)

Publication Number Publication Date
WO2021146177A1 true WO2021146177A1 (en) 2021-07-22
WO2021146177A8 WO2021146177A8 (en) 2022-07-21

Family

ID=76864141

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/013058 WO2021146177A1 (en) 2020-01-13 2021-01-12 Systems and methods for eye tracking using machine learning techniques

Country Status (3)

Country Link
EP (1) EP4091095A1 (en)
CN (1) CN114930410A (en)
WO (1) WO2021146177A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2626136A (en) * 2023-01-10 2024-07-17 Mercedes Benz Group Ag System and method for estimation of eye gaze direction of a user with or without eyeglasses

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190303724A1 (en) * 2018-03-30 2019-10-03 Tobii Ab Neural Network Training For Three Dimensional (3D) Gaze Prediction With Calibration Parameters

Also Published As

Publication number Publication date
CN114930410A (en) 2022-08-19
EP4091095A1 (en) 2022-11-23
WO2021146177A8 (en) 2022-07-21

Similar Documents

Publication Publication Date Title
US20240013526A1 (en) Depth and motion estimations in machine learning environments
Akinyelu et al. Convolutional neural network-based methods for eye gaze estimation: A survey
US11436437B2 (en) Three-dimension (3D) assisted personalized home object detection
KR102526700B1 (en) Electronic device and method for displaying three dimensions image
WO2020015752A1 (en) Object attribute identification method, apparatus and system, and computing device
WO2021190296A1 (en) Dynamic gesture recognition method and device
US10325184B2 (en) Depth-value classification using forests
US20210319585A1 (en) Method and system for gaze estimation
US20220198836A1 (en) Gesture recognition method, electronic device, computer-readable storage medium, and chip
US11698529B2 (en) Systems and methods for distributing a neural network across multiple computing devices
US20230214458A1 (en) Hand Pose Estimation for Machine Learning Based Gesture Recognition
US20230137337A1 (en) Enhanced machine learning model for joint detection and multi person pose estimation
CN112419326B (en) Image segmentation data processing method, device, equipment and storage medium
CN110738650B (en) Infectious disease infection identification method, terminal device and storage medium
US20230020965A1 (en) Method and apparatus for updating object recognition model
EP4091095A1 (en) Systems and methods for eye tracking using machine learning techniques
Geisler et al. Real-time 3d glint detection in remote eye tracking based on bayesian inference
Uke et al. Optimal video processing and soft computing algorithms for human hand gesture recognition from real-time video
CN115205806A (en) Method and device for generating target detection model and automatic driving vehicle
US12039630B2 (en) Three-dimensional pose detection based on two-dimensional signature matching
El-Baz et al. Robust boosted parameter based combined classifier for rotation invariant texture classification
CN112766063B (en) Micro-expression fitting method and system based on displacement compensation
US20240070892A1 (en) Stereovision annotation tool
Sandoval et al. On the Use of a Low-Cost Embedded System for Face Detection and Recognition
Saleh et al. Reliable switching mechanism for low cost multi-screen eye tracking devices via deep recurrent neural networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
  Ref document number: 21741279
  Country of ref document: EP
  Kind code of ref document: A1
NENP Non-entry into the national phase
  Ref country code: DE
ENP Entry into the national phase
  Ref document number: 2021741279
  Country of ref document: EP
  Effective date: 20220816