WO2016142489A1 - Eye tracking using a depth sensor - Google Patents


Info

Publication number
WO2016142489A1
Authority
WO
WIPO (PCT)
Prior art keywords
eye
image
depth data
tracking device
eye tracking
Prior art date
Application number
PCT/EP2016/055190
Other languages
French (fr)
Inventor
Matthias NIESER
Fabian WANNER
Walter Nistico
Original Assignee
SensoMotoric Instruments Gesellschaft für innovative Sensorik mbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SensoMotoric Instruments Gesellschaft für innovative Sensorik mbH filed Critical SensoMotoric Instruments Gesellschaft für innovative Sensorik mbH
Publication of WO2016142489A1 publication Critical patent/WO2016142489A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris

Definitions

  • the invention relates to an eye tracking device for determining a position of at least one predefined feature of at least one eye of a user and a method for determining a position of at least one predefined feature of at least one eye of a user by means of an eye tracking device.
  • This invention applies in the context of an eye tracking device, which is an apparatus to detect and track the position, orientation or other properties, like pupil dilation, intraocular distance, etc., of the eyes of a user.
  • Eye tracking measures the spatial direction where the eyes are pointing, especially gaze direction and eye fixation. Many different techniques have evolved so far.
  • A very common, robust and accurate technique is the use of infrared (IR) illuminators and an IR camera which captures images of the user's eye.
  • Eye features like the pupil and/or iris and the reflections of the IR illuminators, called corneal reflections (CR), are detected in the image. These features are used to reconstruct properties of the eye, e.g. the position of the eye, the gaze direction or the point where the eye is looking at, also called point of regard.
  • LI ET AL "Robust depth camera based multi-user eye tracking for autostereoscopic displays", SYSTEMS, SIGNALS AND DEVICES (SSD), 2012 9TH INTERNATIONAL MULTI- CONFERENCE ON, IEEE, 20 March 2012 (2012-03-20), pages 1 - 6, XP032180264, ISBN: 978-1 -4673-1590-6, DOI: 10.1 109/SSD.2012.6198039, and by CIONG ET AL: "Eye gaze tracking using an RGBD camera: a comparison with a RGB solution", Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, Pages 1 1 13-1 121 , ISBN: 978-1 -4503-3047- 3, DOI: 10.1 145/2638728.2641694.
  • WO 2014/209816 A1 describes eye tracking via a depth camera. Thereby the gaze direction of a user is determined with conventional techniques on the basis of images captured from the eyes of a user. Furthermore, by means of a depth sensor distance information about the distance of the user from a screen is determined and on the basis of this distance information and the calculated gaze direction the intersection point of the gaze direction with a screen is determined.
  • A major limitation of these approaches is that the depth information is calculated and used only for a very limited set of points, which can be relatively far from the eyes; this limits the accuracy and robustness with regard to noisy measurements.
  • Though methods based on detecting corneal reflections on the eye are at the moment the most robust and accurate techniques, they suffer from the drawback that these corneal reflections may overlay important eye features, like the pupil, which makes it more difficult to detect the pupil in the image.
  • At least two corneal reflections must be visible on the cornea to determine the distance of the cornea from the camera, but fewer than two corneal reflections might be visible at a given time due to a number of factors, like corneal reflections disappearing from the cornea due to eye dryness or irregularities on the cornea surface, temporary occlusion of the light source beams, or a cornea reflection falling outside of the cornea due to the eye being oriented at a larger angle with respect to the camera. Therefore, it is an object of the present invention to provide an eye tracking device and a method for determining a position of at least one predefined feature of at least one eye of a user, which allow for a more reliable determination of the position of the eye feature.
  • the eye tracking device for determining a position of at least one predefined feature of at least one eye of a user comprises a 3D capturing device configured to capture depth data of different regions of the at least one eye of the user and a processing unit configured to determine the 3D position of the at least one predefined feature of the at least one eye in dependency of information derived from the captured depth data.
  • the orientation of the eye or the gaze direction can be determined on the basis of the 3D position of the predefined feature and therefore the captured depth data can be used even for determining the orientation and/or gaze direction of the eye. This has the great advantage that one does not necessarily have to rely on the detection of corneal reflections, which are usually used for determining the 3D positions of eye features.
  • the 3D capturing device can comprise a depth sensor.
  • a depth sensor (or depth camera) in general is capable of perceiving reflectance and distance information of objects in a scene at real-time video frame rates.
  • Depth cameras can be classified by the underlying technologies in three main categories, i.e., time-of-flight, active and passive triangulation.
  • Depth information can also be acquired by different techniques, like e.g. stereo imaging. So the 3D capturing device can comprise e.g. a stereo camera, a laser scanner, a time-of-flight camera, a light coding system, a sonar/ultrasound sensor or any other depth sensor.
  • embodiments of the invention will be presented, by means of which the gaze direction can be determined without needing any corneal reflections at all or using only one single corneal reflection at the most. Also embodiments will be presented by means of which the gaze direction can be calculated only on the basis of the captured depth data without needing a camera providing color or brightness information.
  • the depth data referring to different regions of the eye allow e.g. for a 3D reconstruction or modeling of the eye and/or relevant eye features in 3D space and facilitates very accurate and robust eye tracking.
  • the at least one predefined feature is at least one of an iris, a pupil, a cornea, an eyeball and/or sclera of the at least one eye.
  • the optical axis of the eye for example can be determined as a straight line through the iris/pupil center and through the cornea center or eyeball center.
  • the gaze direction can be determined.
  • the gaze direction could also be determined as a normal vector of the iris/pupil plane in the iris/pupil center. So advantageously the depth data can be used to determine the 3D position of only one, several or all of these predefined eye features, and in the end for determining the gaze direction and/or orientation of the eye.
  • the processing unit is configured to determine the 3D position of the at least one predefined feature as a 3D position of a center of the at least one feature and/or as a 3D position of a shape of the at least one feature.
  • the 3D coordinates of the center of the iris, pupil, cornea or eyeball can be calculated as the 3D position of the respective predefined features.
  • the 3D position of the whole shape of the at least one feature can be determined.
  • the iris can be modelled as a circular plate and the 3D position and/or orientation of the whole plate can be determined in 3D space.
  • the cornea for example can be modelled as a sphere and a position of this whole sphere can be determined in 3D space.
  • this provides many possibilities of determining the gaze direction and/or orientation of the eye, as this can be done in dependency of one or more 3D positions of centers of predefined features of the eye or alternatively or additionally in dependency of the calculated shapes of one or more features in 3D space.
  • the 3D capturing device is configured to determine a 3D model of the captured eye on the basis of the captured depth data and to provide the 3D model to the processing unit.
  • the 3D capturing device can comprise another, especially second, processing unit for post processing the captured data, or the 3D model can be provided or derived directly from the captured depth data. So advantageously the whole eye and its surface structure can be modelled in 3D space so that easily the 3D positions of certain features of the eye can be determined on the basis of this 3D model.
  • the 3D capturing device is configured to represent the captured depth data as a depth map and/or a point cloud, and especially to create a mesh model and/or a patch model and/or a CAD model from the depth data. This way a 3D model of the captured eye on the basis of the captured depth data can be provided.
  • the processing unit is configured to identify the depth data relating to the predefined feature on the basis of a predefined geometrical surface model of the predefined feature, especially by searching for surface structures in the depth data fitting the predefined geometrical surface model. If for example the depth data relating to the iris and the pupil shall be identified or part of a 3D model of the eye relating to the iris or the pupil shall be identified, this can easily be done based on the knowledge or assumption that the iris is flat or planar and the pupil is a hole in it. So a suitable predefined geometric surface model for the iris and the pupil would be a plane, maybe even a circle-shaped plane, with a circle-shaped hole in it.
  • this predefined geometrical surface model for the pupil and the iris can easily be identified in the depth data of the eye.
  • the depth data relating to the iris and pupil can easily be found on the basis of this predefined geometrical surface model, and on the basis of the found depth data describing the iris and pupil a normal vector in the center of the iris or pupil can be calculated which represents the gaze direction.
  • it is very advantageous to additionally use color or brightness information, e.g. from a 2D picture or image of the eye, because this additional color or brightness information can e.g. be used to find and locate the eye and/or features of the eye in the depth data faster and more accurately.
  • the 3D capturing device is configured to capture at least one 2D image of the at least one eye of the user, especially wherein the processing unit is configured to determine the 3D position of the at least one predefined feature of the at least one eye additionally in dependency of information derived from the at least one 2D image.
  • the predefined feature or certain regions of the eye do not have to be identified only on the basis of the depth data but also with the help of the 2D image, which additionally provides color information or at least brightness information. So, for example, the iris or pupil can first be identified in the 2D image of the eye. Then the corresponding depth information, which corresponds to the 2D coordinates of the identified iris and pupil, can be derived from the depth data.
  • the captured depth data, which can be seen as a depth image, and the 2D image do not necessarily have to be separate images but can also form one single image, e.g. an image of the eye, wherein each pixel of the image contains a depth value and a color or brightness value.
  • Such images can be captured e.g. by a time of flight camera or a stereo camera system.
  • Such images also can be captured separately, e.g. the depth data can be captured by a laser scanner or a sonar/ultrasound sensor, and the 2D image can be captured by a normal camera or image sensor.
  • the processing unit is configured to identify the depth data relating to a specific part of the at least one eye in dependency of the at least one 2D image. This way the captured depth data of the different regions of the at least one eye of the user can easily be assigned to certain parts or features of the eye, like the sclera, the pupil, the iris, by identification of these features in the 2D image.
  • the processing unit is configured to map the captured depth data of the different regions of the at least one eye to corresponding regions of the eye in the 2D image.
  • This mapping can be done simply on the basis of the common 2D coordinates of the 2D image and the 3D depth data. This way one can e.g. locate the iris/pupil in the 2D image, determine an iris/pupil center and then assign the corresponding depth value from the depth data to the determined iris/pupil center.
  • This mapping also works the other way round, one can e.g. identify the iris/pupil in the 2D image, map the identified region to the corresponding region in the depth data and therefore determine the position of the iris/pupil center and/or iris/pupil shape in 3D space.
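As an illustration of this mapping, the following minimal sketch looks up the depth value stored at the pixel of the iris/pupil center found in the 2D image and back-projects it to a 3D point. It assumes a pinhole camera model whose intrinsic parameters fx, fy, cx, cy are known for the 3D capturing device and that the depth map is pixel-aligned with the 2D image; the numeric values are placeholders.

```python
import numpy as np

def pixel_to_3d(u, v, depth_map, fx, fy, cx, cy):
    """Back-project an image pixel (u, v) to a 3D point using the
    depth value stored at that pixel (pinhole camera model)."""
    z = depth_map[v, u]                 # depth in metres at the pixel
    x = (u - cx) * z / fx               # lateral offset from the optical axis
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Example: the iris/pupil centre located in the 2D image at pixel (u, v)
depth_map = np.full((480, 640), 0.65)   # dummy depth map, constant 65 cm
iris_center_2d = (320, 240)
iris_center_3d = pixel_to_3d(*iris_center_2d, depth_map,
                             fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(iris_center_3d)                   # -> [0. 0. 0.65]
```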
  • the processing unit is configured to determine the orientation of the at least one eye and/or the gaze direction in dependency on the information derived from the depth data, and in particular additionally in dependency on the information derived from the at least one 2D image.
  • the ray through the 3D pupil/iris center and the eye position, which is the eyeball center, or the cornea center determines the optical axis of the eye, which, especially after calibration, provides the gaze direction.
  • alternatively, the gaze direction can also be calculated as the normal vector of the iris plane in the iris/pupil center.
  • There are also further different possibilities of determining the orientation and/or gaze direction of the eye wherein all of them advantageously make use of the captured depth data and optionally of the 2D image data.
  • the gaze direction and/or orientation of the eye can therefore be determined very precisely and with high accuracy, and especially not necessarily having to rely on the detection of corneal reflections.
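A minimal sketch of this step, assuming the 3D centers of the eyeball (or cornea) and of the pupil/iris have already been reconstructed from the depth data; the coordinate values are purely illustrative.

```python
import numpy as np

def optical_axis(eyeball_center, pupil_center):
    """Unit vector along the optical axis, pointing from the eyeball
    centre through the pupil/iris centre (out of the eye)."""
    d = np.asarray(pupil_center, float) - np.asarray(eyeball_center, float)
    return d / np.linalg.norm(d)

# Illustrative 3D positions (metres) in the capturing device's coordinate system
eyeball_center_3d = np.array([0.020, 0.010, 0.662])
pupil_center_3d   = np.array([0.018, 0.008, 0.650])
gaze = optical_axis(eyeball_center_3d, pupil_center_3d)  # direction of the optical axis
print(gaze)
```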
  • the processing unit is configured to fit a predefined geometrical form to the predefined feature of the eye in the 2D image and/or to the depth data corresponding to the predefined feature. For example, for finding the 3D position of the iris center one can identify the iris in the 2D image and fit a circle along the iris boundary. From this fitted circle one can calculate the center of the circle as 2D coordinates and assign the corresponding depth value to this calculated iris center from the depth data. On the other hand one could also identify the iris in the 3D depth data by fitting a circle shaped plate to the depth data and then calculate the center point of this plate, which provides the 3D position of the iris center.
  • the eye and different features of the eye can be modelled in 3D space so that the 3D positions of centers of these features or also the 3D positions of the whole shape of these features can be calculated and used for example for determining the gaze direction or orientation of the eye.
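For the 2D part of this procedure, a simple algebraic least-squares circle fit along detected iris boundary pixels could look like the sketch below; the boundary points are synthetic and stand in for points found by image segmentation.

```python
import numpy as np

def fit_circle_2d(points):
    """Algebraic least-squares circle fit: returns centre (cx, cy) and radius.
    Uses the identity 2*cx*x + 2*cy*y + (r^2 - cx^2 - cy^2) = x^2 + y^2."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones(len(x))])
    b = x ** 2 + y ** 2
    cx, cy, c = np.linalg.lstsq(A, b, rcond=None)[0]
    r = np.sqrt(c + cx ** 2 + cy ** 2)
    return (cx, cy), r

# Synthetic iris boundary pixels around (320, 240) with radius 42 px
t = np.linspace(0, 2 * np.pi, 50)
boundary = np.column_stack([320 + 42 * np.cos(t), 240 + 42 * np.sin(t)])
center, radius = fit_circle_2d(boundary)
print(center, radius)   # ~ (320, 240), 42
```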
  • the eye tracking device comprises at least one light source to illuminate the eye and to produce a reflection of the eye capturable by the 3D capturing device, wherein the processing unit is configured to identify the reflection in the 2D image and to assign a depth value to the identified reflection based on the depth data corresponding to the region of the eye, in which the reflection was identified in the 2D image.
  • the eye tracking device comprises at least one illumination unit configured to illuminate the at least one eye for capturing the depth data.
  • the light source for producing the cornea reflection mentioned before can be separate from this illumination unit for capturing the depth data.
  • Depth sensors using such an illumination unit are, for example, light coding systems which combine a camera or an image sensor with a light emitter which projects a structured light pattern or code on the scene, wherein light returns distorted depending upon the 3D structure of the illuminated objects in the scene.
  • the scene is illuminated with light and the camera measures for each pixel in the image the travelling time of the light from the light source to the scene and back onto the sensor.
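The distance then follows directly from the measured round-trip time of the light, as in this one-line sketch:

```python
C = 299_792_458.0                    # speed of light in m/s

def tof_distance(round_trip_time_s):
    """Distance to the scene point: the light travels out and back, so halve it."""
    return C * round_trip_time_s / 2.0

print(tof_distance(4.0e-9))          # a 4 ns round trip corresponds to ~0.6 m
```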
  • 3D laser scanning can be used as a depth sensor, where the laser scanner scans the scene with a laser beam and computes a 3D model of it.
  • the processing unit is configured to determine whether the light captured by the capturing unit originated from the illumination unit, or from somewhere else, like environmental light, or from the light source for producing the corneal reflection. This can be done for example by using a predefined illumination pattern and/or a predefined time multiplexing scheme and/or a predefined spectral multiplexing scheme. This way it can advantageously be avoided that the illumination unit of the depth sensor causes noise in the 2D image. So for example if a certain illumination pattern is used for illuminating the eye when capturing the depth data, this known illumination pattern can again be subtracted from the 2D image.
  • the 2D image and the depth data can be captured for example in an alternating fashion so that the 2D image can be captured when the illumination unit of the depth sensor does not illuminate the eye.
  • the illumination unit can illuminate the eye in a predefined spectral range, in which the image sensor for capturing the 2D image is not sensitive.
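A toy sketch of the time multiplexing option mentioned above is given below. The driver functions for the pattern projector, the glint light source and the camera are hypothetical placeholders; the point is only that each captured frame is tagged with the illumination that was active, so later processing knows where the captured light originated.

```python
# Hypothetical hardware hooks -- placeholders, not a real driver API.
def set_pattern_projector(on: bool): ...
def set_glint_leds(on: bool): ...
def grab_frame(): return object()

def capture_time_multiplexed(n_pairs):
    """Alternate depth-pattern frames and glint frames so the two
    illumination systems never light the eye in the same exposure."""
    frames = []
    for _ in range(n_pairs):
        set_pattern_projector(True);  set_glint_leds(False)
        frames.append(("depth_pattern", grab_frame()))
        set_pattern_projector(False); set_glint_leds(True)
        frames.append(("glints", grab_frame()))
    return frames
```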
  • the eye tracking device comprises above named at least one light source for producing a reflection on the eye, so that this reflection does not cause noise when capturing the depth data and the light originating from the illumination unit of the depth sensor does not disturb the detection of the corneal reflection in the 2D image.
  • the processing unit is configured to determine whether light captured by the 3D capturing device originated from the illumination unit or the at least one light source, especially again by means of a predefined illumination pattern and/or a predefined time multiplexing scheme and/or a predefined spectral multiplexing scheme.
  • the illumination unit is configured to create a light pattern for capturing the depth data and the 3D imaging device is configured to capture the 2D image while the illumination unit creates the light pattern, and the processing unit is configured to decode the light pattern and/or to subtract the light pattern from the 2D image, especially to improve the signal to noise ratio of the eye tracking data.
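A minimal sketch of the subtraction step, assuming the decoded light pattern has already been rendered as an intensity image registered to the 2D image (the frames here are dummies):

```python
import numpy as np

def remove_pattern(image, decoded_pattern, gain=1.0):
    """Subtract the reconstructed structured-light pattern from the 2D image,
    clipping so pixel values stay valid; improves SNR for eye-feature detection."""
    cleaned = image.astype(np.float32) - gain * decoded_pattern.astype(np.float32)
    return np.clip(cleaned, 0, 255).astype(np.uint8)

# Both frames would be 8-bit grayscale images of the eye region
image = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
decoded_pattern = np.zeros_like(image)
print(remove_pattern(image, decoded_pattern).shape)
```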
  • the 3D imaging device is configured to capture the depth data while the at least one light source for producing a corneal reflection illuminates the eye, wherein the processing unit is configured to compensate for the light caused by the at least one light source when creating the depth image.
  • the invention also relates to a method for determining a position of at least one predefined feature of at least one eye of a user by means of an eye tracking device, wherein the eye tracking device comprises a 3D capturing device that captures depth data of different parts of the at least one eye of the user and a processing unit that determines the 3D position of the at least one predefined feature of the at least one eye in dependency of information derived from the captured depth data.
  • the 3D capturing device may also be configured to capture the depth data of different regions of each eye of the user, respectively, and the processing unit is configured to determine the 3D position of the at least one predefined feature of each eye in dependency of information derived from the captured respective depth data.
  • the eye tracking device can comprise two 3D capturing devices, one for each eye, or the 3D capturing device can comprise two cameras and/or depth sensors and/or illumination units, one for each eye.
  • the eye tracking device can be configured as a remote eye tracking device or as a head mounted eye tracking device.
  • the eye tracking device is preferably configured to perform the described steps, especially the data and image acquisition and analysis and the determination of the position of the eye features, and in particular the determination of the orientation of the eye and/or gaze direction, repeatedly, to be able to track the orientation and/or gaze direction.
  • Fig. 1 shows a schematic illustration of an eye tracking device comprising a 3D capturing device according to a first embodiment of the invention; Fig. 2 a schematic illustration of an eye tracking device comprising a 3D capturing device according to a second embodiment of the invention; Fig. 3 a schematic illustration of an eye tracking device comprising a 3D capturing device according to a third embodiment of the invention; Fig. 4 a schematic illustration of a 3D capturing device providing depth data in the form of a 3D model of an eye according to an embodiment of the invention; Fig. 5 a schematic illustration of a 2D image of an eye captured by the 3D capturing device according to an embodiment of the invention;
  • Fig. 6 a schematic illustration of a method for determining the cornea center of an eye on the basis of a detected single cornea reflection and by means of the depth data captured by means of the 3D capturing device according to an embodiment of the invention; and Fig. 7 a schematic illustration of an eye and the most relevant eye features for determining the orientation of the eye and/or the gaze direction.
  • The target of the invention is to compute features of the user's eye, like the position of the eyes or pupils, the gaze direction or the point of regard.
  • the introduced technique does not necessarily require any active illumination for producing corneal reflections, which is usually used especially to measure the distance of the eyes to the eye tracker.
  • eye tracking can thereby be done more robustly and the results can be computed more precisely, which will be shown in more detail in the following for some example embodiments of the invention.
  • Fig. 7 shows a schematic illustration of an eye 10 and the most relevant eye features.
  • the eye 10 comprises an eyeball 12, the form or shape of which can be assumed to be similar to a sphere.
  • the eyeball 12 comprises an eyeball center 12a, which is the center of this sphere.
  • the sclera 14, the white part of the eye 10, is located on the sphere of the eyeball 12.
  • the eye 10 comprises an iris 16, which is approximately planar and circular shaped, and the pupil 18 is the hole in the iris 16.
  • the center of the pupil and the iris is denoted by 16a.
  • the cornea 20 is a transparent and convex front part of the eye 10 that covers the iris 16 and the pupil 18.
  • the shape of the cornea 20, especially the cornea surface, can be approximated by a sphere as well, which is denoted by 20a and which is drawn with a dashed line in Fig. 7, except for the part that forms the surface of the cornea 20.
  • the center of this sphere, also called cornea center, is denoted by 20b.
  • the optical axis 22 of the eye 10 passes through the eyeball center 12a, the cornea center 20b and the iris/pupil center 16a.
  • the optical axis 22 defines the orientation of the eye 10. So by determining the position of the optical axis 22 in 3D space, the orientation of the eye 10 in 3D space is known as well.
  • the gaze direction or line of sight can be derived from the determination of the optical axis 22.
  • the line of sight and the optical axis 22 differ from each other by a (user specific) angle, which can be determined e.g. in a calibration procedure of the eye tracking device.
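One common way to express this user-specific offset is as a pair of horizontal/vertical angles applied to the optical axis; whether the eye tracking device parameterizes the calibration exactly this way is an assumption of this sketch, and the angle values are placeholders that a calibration procedure would estimate.

```python
import numpy as np

def apply_calibration_offset(optical_axis, kappa_h_deg, kappa_v_deg):
    """Rotate the optical axis by user-specific horizontal/vertical offset
    angles to obtain the visual axis (line of sight)."""
    h, v = np.radians(kappa_h_deg), np.radians(kappa_v_deg)
    rot_y = np.array([[np.cos(h), 0, np.sin(h)],       # rotation about the vertical axis
                      [0, 1, 0],
                      [-np.sin(h), 0, np.cos(h)]])
    rot_x = np.array([[1, 0, 0],                        # rotation about the horizontal axis
                      [0, np.cos(v), -np.sin(v)],
                      [0, np.sin(v), np.cos(v)]])
    return rot_y @ rot_x @ np.asarray(optical_axis, float)

line_of_sight = apply_calibration_offset([0.0, 0.0, -1.0], kappa_h_deg=5.0, kappa_v_deg=1.5)
print(line_of_sight)
```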
  • The aim of eye tracking is in general to obtain at any given time the best achievable quality in terms of a low accuracy error, a low imprecision and a robust detection. So far, very good quality is achieved by eye trackers which use active infrared (IR) light sources and an IR camera.
  • the reflections, especially corneal reflections, of the light from the light sources on the cornea of the user's eyes are detected in the camera image and allow the reconstruction of the eyes' positions. Without using light sources, it would not be possible to reconstruct the absolute position of the eyes from only the camera image, since the global scale factor cannot be estimated.
  • the corneal reflections from the light sources provide this missing information and allow estimating the distance of the eyes to the eye tracker.
  • Fig. 1, Fig. 2 and Fig. 3 each show a schematic illustration of an eye tracking device 30 according to different embodiments of the invention.
  • the eye tracking device 30 comprises a 3D capturing device 32 configured to capture depth data 34 (compare Fig. 4) of different regions of the at least one eye 10 of the user 36.
  • the 3D capturing device 32 can also be configured to capture 2D images 40 (compare Fig. 5) of the at least one eye 10 of the user 36.
  • the eye tracking device 30 of these embodiments of this invention can be separated into two units: a capturing unit, which is the 3D capturing device 32, and an eye tracking processing unit 38. Both are described in detail in the following.
  • the 3D capturing device 32 is able to take a picture of the scene in a (given) spectrum, e.g. visible light or infra-red, or both. This picture constitutes the 2D image 40, which is shown schematically in Fig. 5.
  • the 3D capturing device 32 is also able to capture depth-information and outputs any kind of 3D reconstruction of the scene. This 3D reconstruction is schematically shown as a 3D model in Fig. 4.
  • the 3D capturing device 32 can be implemented in different ways as shown in Fig. 1 , Fig. 2 and Fig. 3. It may comprise one or more cameras 42 or image sensors, may have an illumination unit 44, e.g. comprising active illuminators like a light emitter 44a, a light source 44b, laser emitters or a laser scanner 44c, and can include a second processing unit 46 to post-process the acquired data.
  • the following list shows some state-of-the-art techniques to acquire such data.
  • the herein described eye tracking device 30 may use one or more of these standard techniques or any other hardware which provides an image 40 of the scene together with a 3D model 34.
  • the 3D capturing device 32 can comprise:
  • a light coding system as illustrated in Fig. 2, which combines a camera 42 or an image sensor with a light emitter 44a which projects a structured light pattern or code on the scene, especially the eye 10 of the user 36.
  • State of the art algorithms are then used for triangulating the 3D data of the scene to provide the 3D model 34.
  • the camera 42 may provide the 2D image 40 of the eye 10.
  • a time-of-flight camera which is also illustrated by Fig. 2.
  • the scene is illuminated with light by means of a light source 44b, and the camera 42 measures for each pixel in the captured image the travelling time of the light from the light source 44b to the scene and back onto the sensor of the camera 42.
  • the depth data for providing the 3D model 34 and the 2D image 40 can be captured.
  • the 3D capturing device 32 may comprise a separate camera or image sensor for capturing the 2D image
  • a stereo camera system which is schematically illustrated in Fig. 1. It takes at least two images of the scene from different viewpoints, preferably captured at the same time. These at least two images are captured by at least two cameras 42 or image sensors at predefined and separated positions. The objects seen in the images are matched and used to reconstruct the 3D model 34 of the scene (a disparity-to-depth sketch is given after this list). Also one of these cameras or both can be used to capture the 2D image 40, either separately or as one of the images used for constructing the 3D model 34.
  • a 3D laser scanner 44c together with a camera 42, which is also represented by Fig. 2.
  • the laser scanner 44c scans the scene with a laser beam, like a dot- laser, a laser line, stripe pattern or other patterns, and computes a 3D model 34 of it.
  • the camera 42 or image sensor takes an image 40 with the necessary color information.
  • a sonar and/or ultrasound sensor. Such a sensor is slower compared to the previous types of sensors, but it is completely unaffected by environmental light, and additionally has the great advantage that it is able to capture depth data of the cornea 20.
  • any other camera system which delivers RGBD images, i.e. images where each pixel has color (RGB) and depth (D) information.
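For the stereo camera system listed above, the depth of a matched point follows from its disparity in a rectified image pair; a minimal sketch, with the focal length and baseline as assumed example values:

```python
def stereo_depth(disparity_px, focal_px=600.0, baseline_m=0.06):
    """Depth of a matched point from its horizontal disparity in a rectified
    stereo pair: Z = f * B / d (pinhole model, parallel cameras)."""
    return focal_px * baseline_m / disparity_px

# A pupil centre matched at x_left = 352 px and x_right = 300 px
print(stereo_depth(352 - 300))   # ~ 0.69 m
```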
  • the eye tracking device 30 may further comprise one or more additional light sources 45 for producing one or more reflections on the eye, especially on the cornea 20, as illustrated in Fig. 3.
  • any of the above described depth sensors, or 3D capturing devices 32 in general, can be combined with such a light source 45 for producing a corneal reflection.
  • the 3D capturing device 32 outputs a 3D model 34 of the captured scene with the eye 10 of the user 36, as exemplarily shown in Fig. 4.
  • the representation of the 3D data may be different, depending on the used system, namely the depth sensor, to acquire the data:
  • a z-map is a rasterized image where each pixel contains distance information of the object represented by that pixel. Usually, the distance information is a scalar number and expresses the distance between the object and the measuring device in any metric.
  • Post-processing of point-cloud or z-map data allows the creation of a 3D surface mesh. It usually consists of vertices (points in 3D) and faces/patches which connect the vertices.
  • the faces may be triangles, quadrilaterals, spline surfaces, other polynomial surfaces, CAD primitives or any other surface patches which describe the surface.
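A minimal sketch of building such a surface mesh from a z-map: each pixel is back-projected to a vertex (using assumed pinhole intrinsics fx, fy, cx, cy) and each 2x2 block of pixels is connected by two triangular faces.

```python
import numpy as np

def zmap_to_mesh(zmap, fx, fy, cx, cy):
    """Turn a rasterized z-map into a simple triangle mesh:
    one vertex per pixel, two triangles per 2x2 pixel block."""
    h, w = zmap.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    verts = np.stack([(u - cx) * zmap / fx,       # back-projected x
                      (v - cy) * zmap / fy,       # back-projected y
                      zmap], axis=-1).reshape(-1, 3)
    idx = np.arange(h * w).reshape(h, w)
    a, b = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    c, d = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    faces = np.concatenate([np.stack([a, b, c], 1), np.stack([b, d, c], 1)])
    return verts, faces

verts, faces = zmap_to_mesh(np.full((4, 4), 0.6), fx=600.0, fy=600.0, cx=2.0, cy=2.0)
print(verts.shape, faces.shape)   # (16, 3) vertices, (18, 3) triangles
```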
  • Fig. 4 shows a schematical illustration of such a 3D model 34 as a mesh model, provided by the 3D capturing device 32 of the eye tracking device 30.
  • the 3D capturing device 32 is configured to capture depth data, on the basis of which this 3D model 34 is provided.
  • Two points P1(x1, y1, z1) and P2(x2, y2, z2) are shown at different positions, and therefore in different regions of the eye 10, with the corresponding 3D coordinates x1, y1, z1 and x2, y2, z2, respectively.
  • the 3D capturing device acquires plenty of such depth data for different points or regions of the eye 10. So for each point or region of the eye 10 the respective 3D coordinates with regard to the reference coordinate system 48 of the 3D capturing device 32 can be determined.
  • the 3D capturing device 32 also captures a 2D image 40 as shown in Fig. 5.
  • This image 40 provides for different points or regions, here exemplarily for the two points P1(x1, y1) and P2(x2, y2) corresponding to the two points P1(x1, y1, z1) and P2(x2, y2, z2) of the 3D model 34, the corresponding 2D positions x1, y1 and x2, y2, respectively, with regard to the reference coordinate system of the 3D capturing device 32, as well as additional brightness and/or color information.
  • the depth image constituted by the 3D model 34 and the 2D image 40 are here shown for illustrative purpose separately, but do not have to be separate images. These images can also be provided as a combined image, representing for each captured point or region 3D coordinates as well as a corresponding color and/or brightness information. For example by means of a time-of-flight camera or stereo cameras an image can be provided containing for different regions, e.g. different pixels of the image, the depth data in form of the 3D coordinates as well as the corresponding color information for that region or pixel.
  • the 3D model 34 together with the image 40 of the scene is then passed to the eye tracking processing unit 38.
  • eye tracking uses reflections of light sources on the cornea 20 in order to reconstruct the position of the eyeball 12/cornea 20 in 3D space.
  • the image of the pupil is then used to find the 3D position of the pupil on the 3D eye model which gives information about the gaze direction.
  • the main idea of this invention is to replace the need for (more than one) corneal reflections by the depth information.
  • the depth sensor gives enough information to reconstruct 3D locations of (parts of) the eye 10. This depth information together with e.g. the 2D location of the pupil 18 in the image 40 is enough to reconstruct the position of the eye 10, the gaze direction or the point of regard, or combinations of them.
  • the corneal reflections (CRs) of the light sources are used to determine the position of the eye 10/cornea 20.
  • the gaze direction is estimated by finding the pupil 18/iris 16 in the image and projecting it onto the eyeball 12 or cornea 20 of the eye 10.
  • the ray through the 3D pupil/iris center 16a and the eye position provides the gaze direction (i.e. the optical axis 22).
  • the 3D coordinates of the pupil/iris contour points are determined. This can be done in a number of different ways. For example a 2D image segmentation of the 2D image 40 can be performed, considering image brightness, texture, color (if available), local gradient etc. Thereby the contour points of the pupil 18 and/or iris 16 can be identified in the 2D image 40 together with their corresponding coordinates x, y. Then the part in the 3D data representation 34 which corresponds to the found contour points of the pupil 18/iris 16 in the 2D image 40 is identified.
  • the processing unit 38 is configured to identify the depth data relating to the pupil 18 and/or iris 16 on the basis of a predefined geometrical surface model of the pupil 18 and/or iris 16 by searching for surface structures in the depth data, i.e. the 3D model 34, fitting the predefined geometrical surface model.
  • This way the shape of the pupil 18 and iris 16 and their 3D positions can be determined solely on the basis of the 3D model, without the need of the 2D image 40 or any brightness or color information. But also depth information and color and/or brightness information can be combined.
  • the identification of the pupil 18 and/or iris 16 in the 3D model 34 can be verified or corrected by using the color information of the 2D image 40, etc..
  • a first possibility and embodiment is to reconstruct the pupil 18/iris 16 as a circle or ellipse in 3D by a model fitting technique and obtain the pupil and/or iris center 16a and a pupil and/or iris orientation vector, which denotes the direction where the pupil is facing towards, e.g. the surface normal in the pupil center 16a.
  • a second possibility and embodiment is to find some center point of the surface patch representing the pupil 18/iris 16 without fitting a geometrical model, but instead by finding a center of gravity of the contour points of the pupil 18/iris 16.
  • the pupil/iris orientation vector which represents the eye direction can be calculated as some form of weighted average of the orientation vectors of the individual contour points, possibly detecting and filtering out outliers. It can be either the normal vector in some centered point of the patch, an Eigen-direction computed by a singular-value-decomposition (SVD), the normal vector of a fitted surface patch in the center point 16a, or any other point-wise or averaged direction vector computed from the pupil/iris surface patch.
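A sketch of this variant: the center of gravity of the 3D contour points serves as the pupil/iris center 16a, and the singular vector belonging to the smallest singular value of the centered points gives the plane normal, i.e. the orientation vector. The sign convention (normal pointing towards the camera along negative z) is an assumption, and the contour points are synthetic.

```python
import numpy as np

def contour_center_and_normal(points_3d):
    """Centre of gravity of the 3D pupil/iris contour points and the
    normal of the best-fitting plane (smallest-variance direction via SVD)."""
    pts = np.asarray(points_3d, float)
    center = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - center)
    normal = vt[-1]                    # right singular vector with smallest singular value
    if normal[2] > 0:                  # assumed convention: camera looks along +z
        normal = -normal
    return center, normal

# Synthetic iris contour lying roughly in a plane at z = 0.65 m
t = np.linspace(0, 2 * np.pi, 36)
contour = np.column_stack([0.006 * np.cos(t), 0.006 * np.sin(t), np.full_like(t, 0.65)])
center, gaze = contour_center_and_normal(contour)
print(center, gaze)
```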
  • the herein computed pupil center point 16a and the pupil orientation vector describe the gaze direction (optical axis 22) of the reconstructed eye 10.
  • the depth information can be used to detect and reconstruct the cornea shape. This technique can only be used if the depth sensor detects the cornea 20 which is difficult since the cornea 20 is made of transparent and specular reflecting material. However, some techniques are able to detect those materials, like e.g. acoustic-based (sonar) detectors.
  • the reconstruction of the cornea comprises the following steps:
  • Determining the position of the eye 10 in the 2D image 40 could include searching for a face and then guessing the eye's positions from the position of the face. Also, features of eyes 10 can directly be searched, e.g. by searching for specific patterns like circles/ellipses which could represent a pupil 18 or iris 16, searching for especially dark regions which could be a pupil 18 or finding matches of a template eye image.
  • the reconstruction of the 3D eyeball center 12a, and optionally also the radius of the eyeball 12 can be performed by the following steps, executed by the processing unit 38:
  • the white part of the eye 10 can be determined on the basis of the 2D image 40, and by mapping the identified region to the 3D depth data the 3D depth data relating to the sclera 14 can be identified.
  • the result is a sphere in 3D space representing the eyeball 12.
  • the center of the sphere which corresponds to the eyeball center 12a, is determined in 3D space.
  • the radius of the sphere then corresponds to the eyeball radius, which can therefore be determined as well.
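A sketch of such a sphere fit to the depth points identified as sclera 14, using a linear least-squares formulation; the sample points are synthetic and the 12 mm radius is only an illustrative value.

```python
import numpy as np

def fit_sphere(points_3d):
    """Least-squares sphere fit to 3D points: returns centre and radius.
    Uses |p|^2 = 2*c.p + (r^2 - |c|^2), which is linear in the unknowns."""
    p = np.asarray(points_3d, float)
    A = np.column_stack([2 * p, np.ones(len(p))])
    b = (p ** 2).sum(axis=1)
    cx, cy, cz, k = np.linalg.lstsq(A, b, rcond=None)[0]
    center = np.array([cx, cy, cz])
    radius = np.sqrt(k + center @ center)
    return center, radius

# Synthetic sclera samples on a sphere of radius 12 mm around (0, 0, 0.66)
rng = np.random.default_rng(0)
d = rng.normal(size=(200, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
sclera_pts = np.array([0.0, 0.0, 0.66]) + 0.012 * d
print(fit_sphere(sclera_pts))   # ~ centre (0, 0, 0.66), radius 0.012
```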
  • the reconstruction of the 3D boundary of the pupil 18 and/or iris 16 can be performed by the following steps executed by the processing unit 38:
  • the eye tracking device 30 comprises the 3D capturing device 32 and the light source 45, as well as the processing unit 38 (not shown in Fig. 6).
  • the processing unit is configured to reconstruct the 3D cornea 20 and the 3D cornea center 20b.
  • This reconstruction of the cornea 20 from a single glint, which is the cornea reflection 50, comprises the following steps:
  • If the depth sensor is not able to measure distances to translucent surfaces such as the cornea 20, a distance can be measured to a nearby non-translucent surface, such as the limbus, the sclera 14, or the iris plane 16 itself;
  • Such a distance will in some embodiments be a sufficient approximation to the true distance of the cornea reflection 50, especially in those embodiments where the distance between the eye tracking device 30 and the eye 10 is much bigger than the cornea radius itself (for example, for so called "remote" eye tracking systems the distance between the eye tracking device 30 and the eye 10 is usually larger than 30 cm, while the cornea radius is on average 7.8 mm, so for said systems the approximation would introduce only a very small error).
  • Such estimation can be further improved by using a geometrical model of the eye which considers the actual position of the cornea surface with respect to the portion of the eye for which the distance can be measured by the depth camera, so for example considering the expected distance between the iris plane and the detected reflection point on the cornea surface.
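A sketch of one way to carry out this reconstruction, under the following assumptions: the 3D position of the cornea reflection 50 has been obtained from the approximated depth as described above, the camera sits at the origin of the device coordinate system, the position of the light source 45 is known from the device geometry, and a population-average cornea radius (about 7.8 mm, as mentioned above) is used. By the law of reflection the surface normal at the glint bisects the directions towards the camera and the light source, and the cornea center 20b then lies one cornea radius behind the glint along that normal.

```python
import numpy as np

def cornea_center_from_glint(glint_3d, light_pos, cam_pos=(0.0, 0.0, 0.0),
                             cornea_radius=0.0078):
    """Cornea centre from a single corneal reflection: the surface normal at
    the reflection point bisects the camera and light-source directions, and
    the centre lies cornea_radius behind the surface along -normal."""
    g = np.asarray(glint_3d, float)
    to_cam = np.asarray(cam_pos, float) - g
    to_cam /= np.linalg.norm(to_cam)
    to_light = np.asarray(light_pos, float) - g
    to_light /= np.linalg.norm(to_light)
    normal = to_cam + to_light                 # bisector of the two directions
    normal /= np.linalg.norm(normal)
    return g - cornea_radius * normal

center_20b = cornea_center_from_glint(glint_3d=[0.01, 0.0, 0.35], light_pos=[0.08, 0.0, 0.0])
print(center_20b)
```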
  • the orientation of the eye 10 and/or the gaze direction can be determined in 3D space.
  • eye tracking can be highly improved by using a depth sensor.
  • These advantageous embodiments of the invention make it possible to determine and reconstruct the position/orientation and shape of the eye directly from the depth data, so that a direct reconstruction/tracking of the user's eyes using 2D and 3D data without the need of reconstructing any other objects like the head or the body of the user is made possible.
  • the result can be various eye properties, like the eye's position, orientation, pupil size, iris size or the gaze direction. Therefore the invention provides possibilities to improve the accuracy and precision of eye tracking and makes the determination of eye features more reliable.
  • If a depth sensor using an illumination unit for acquiring the depth data is used, the illumination of the eye by means of this illumination unit might disturb the analysis of the 2D image.
  • the 3D capturing device 32 comprises one or more cameras 42 and an illumination unit 44 to generate a depth image.
  • the illumination unit 44 can comprise e.g. a structured illumination emitter, also called structured light pattern projector in the following, which projects a structured light pattern.
  • the illumination unit 44 can also be configured as the illumination unit 44 of a time-of-flight camera.
  • the eye tracking device 30 comprises at least one light source 45, like one or more lamps and/or LEDs, to generate cornea reflections on the eye 10.
  • the one or more cameras 42 also acquire brightness and/or color information in form of a 2D image 40.
  • the structured light pattern projector or the active illumination unit of a time-of-flight system is used in combination with the camera(s) 42 to create a depth image 34, and the lamp(s)/LED(s) 45 produce one or more glints, which can be used to determine the 3D position of the eye and/or one or more eye features.
  • the two illumination systems namely the illumination unit 44 and the at least one light source 45 (the projector and the LEDs) can be activated in time multiplexing, for example in alternating fashion, or with any other type of temporal sequence.
  • the camera(s) image acquisition can be synchronized with either illumination system (the projector or the LEDs) so that the two do not interfere with each other or with the detection of specific features: for example in some cases it might be simpler to detect the contour of the pupil 18 or the iris 16 when the structured illumination pattern is not active.
  • As an alternative, spectral multiplexing can be used: one illumination group, e.g. the depth projector, can emit in one sub-range of the camera spectrum, while the other illumination group, e.g. the LED(s) for producing the cornea reflection(s), can emit in another sub-range of the camera spectrum, for example between 830 nm and 900 nm.
  • the image sensor can then distinguish between the two illumination groups in a similar fashion as a color RGB sensor distinguishes among different colors: a first subset of the pixels in the image sensor will be equipped with an optical bandpass filter which lets in only the light from one illumination group, e.g. the illumination unit 44, and a second subset of the pixels will be equipped with a different optical bandpass filter which lets in only the light from the other illumination group, e.g. the light source 45.
  • multiple image sensors/cameras can be used, for example with one camera being a time-of-flight camera and the other a standard camera. It would then be possible to use different optical filters for the different cameras, each filter letting in only the light of the corresponding illumination subsystem (depth illumination unit for one camera, CR producing LEDs for the other camera) and blocking the light of the other, interfering subsystem or subsystems.
  • The data of each system (the depth image and the glint positions) are then combined by the processing unit 38 to calculate the eye position, orientation and other features as already described.
  • If the system uses a (structured) light coding technique to obtain distance information, the light coding pattern superimposed on the image of the eye will be perceived as noise or disturbance by the eye tracker; vice versa, the eye tracker's own illumination, e.g. the glints, can be perceived as noise by the depth imaging system.
  • the depth imaging algorithm executed by the processing unit 38 decodes the light pattern on the image, creating a depth map 34. Based on said decoding it is then possible to subtract the detected light pattern from the image, effectively reducing or cancelling the effects of said illumination in the regions where the eyes 10 are detected, improving the signal to noise ratio of the eye tracking device.
  • the eye tracking algorithm executed by the processing unit 38 decodes and recognizes its own light pattern (if it uses additional eye tracking specific illumination, for example glint generating lamps) and said detected eye tracking illumination can be subtracted from the image which is to be processed by the depth imaging algorithm, again improving signal to noise ratio of the depth image reconstruction.
  • the processing unit 38 is configured to recognize the light pattern produced by the illumination unit 44 for acquiring the depth data and to subtract this light pattern from the 2D image 40 for evaluating the brightness and/or color information and to determine eye features like the pupil 18, iris 16, sclera 14 or corneal reflections 50 in that image 40.
  • the processing unit 38 is also configured to recognize light originating from the light source 45 for producing cornea reflections 50 when acquiring and evaluating the depth data for providing the 3D model 34.
  • All of these methods can be combined in an arbitrary way to further enhance accuracy, for example a stereo camera system with a structured light projector.
  • additional light sources may be used to produce cornea reflections 50 on the cornea 20.
  • the reconstruction of the eye/cornea position using the cornea reflections can be used in conjunction with the above described methods.
  • the main advantage of the present invention and its embodiments is that it is not necessary to rely on the presence of glints (cornea reflections) to determine eye distance information.
  • a depth sensor provides a lot more useful information, such as position of other eye and facial features (iris/pupil plane, eye lids, sclera, eye corners, etc), and in case the user is wearing corrective glasses/spectacles it is possible to use the depth sensor to measure the position of said glasses on the user's face with respect to the eye which can be used e.g. to compensate the error in the measured eye position and distance due to the refractive effect of said glasses' lenses.
  • Glints are destructive elements of the image, as they completely saturate the brightness of the pixels which they hit, so that glints may completely or partially occlude the pupil. Even in the case of partial occlusions this results in less accurate detection and measurement of said pupil's contour and center.
  • Another important point is that, to achieve sufficiently accurate eye distance measurements from glints/cornea reflections, it is typically necessary to position the light sources (LEDs) far apart from each other and from the camera, because the distance measurement error grows quickly as the cornea reflections appear closer to each other in the image, to the point where it becomes impossible to measure any distance when they end up merging/fusing. This happens when the user moves away from the eye tracking device and is a strong limiting factor for the maximum tracking range of an eye tracking device.
  • a further advantage is the possibility to provide both the functionality of an eye tracking system as well as a depth camera/3D sensor in a single package, sharing the same hardware.
  • the depth camera can then be used besides the eye tracker to provide additional functionality, for example to measure objects or a 3D space in front of the device, or as an additional input mechanism which detects for example hand gestures or body movements.
  • An obvious advantage is the reduced cost compared to a system which would simply integrate an independent eye tracker and a depth camera side by side. Another obvious benefit is that in many cases the eye tracker illumination and the depth camera illumination (for active sensor systems such as time-of-flight cameras or light-coding/structured illumination sensors) would normally interfere with each other, compromising both systems' performance or making their operation impossible; this would not happen with embodiments of this invention.
  • Eye tracking can be done in any spectrum, e.g. in visible light, with high precision.
  • The eye tracking device can be much smaller, since no light source at a fixed distance to the camera is needed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The invention relates to an eye tracking device (30) for determining a position of at least one predefined feature (12, 14, 16, 18, 20, 22) of at least one eye (10) of a user (36). The eye tracking device (30) comprises a 3D capturing device (32) configured to capture depth data (34) of different regions of the at least one eye (10) of the user (36), and a processing unit (38) configured to determine the 3D position of the at least one predefined feature (12, 14, 16, 18, 20, 22) of the at least one eye (10) in dependency of information derived from the captured depth data (P1(x1, y1, z1), P2(x2, y2, z2)).

Description

Eye tracking using a depth sensor
The invention relates to an eye tracking device for determining a position of at least one predefined feature of at least one eye of a user and a method for determining a position of at least one predefined feature of at least one eye of a user by means of an eye tracking device.
This invention applies in the context of an eye tracking device, which is an apparatus to detect and track the position, orientation or other properties, like pupil dilation, intraocular distance, etc., of the eyes of a user.
Eye tracking measures the spatial direction where the eyes are pointing, especially gaze direction and eye fixation. Many different techniques have evolved so far. A very common, robust and accurate technique is the use of infrared (IR) illuminators and an IR camera which captures images of the user's eye. Eye features like the pupil and/or iris and the reflections of the IR illuminators, called corneal reflections (CR), are detected in the image. These features are used to reconstruct properties of the eye, e.g. the position of the eye, the gaze direction or the point where the eye is looking at, also called point of regard.
Other techniques which are not using external illuminators and which are not detecting corneal reflections are currently much less accurate. Examples are techniques which use cameras operating in the visual light spectrum, typically RGB cameras. From those images it is much harder to estimate the exact position of the eye since a single image does not contain the information about a global scale factor. Furthermore, there are depth sensors known from the prior art, which can measure distance information of objects and persons. Also the use of such depth sensors in connection with eye tracking is known. Such systems are e.g. described by LI ET AL: "Robust depth camera based multi-user eye tracking for autostereoscopic displays", SYSTEMS, SIGNALS AND DEVICES (SSD), 2012 9TH INTERNATIONAL MULTI-CONFERENCE ON, IEEE, 20 March 2012 (2012-03-20), pages 1-6, XP032180264, ISBN: 978-1-4673-1590-6, DOI: 10.1109/SSD.2012.6198039, and by XIONG ET AL: "Eye gaze tracking using an RGBD camera: a comparison with an RGB solution", Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pages 1113-1121, ISBN: 978-1-4503-3047-3, DOI: 10.1145/2638728.2641694. However, therein depth data are only used to determine face or head position and to make crude estimates about the distance of an eye from a screen. Furthermore, WO 2014/209816 A1 describes eye tracking via a depth camera. Thereby the gaze direction of a user is determined with conventional techniques on the basis of images captured from the eyes of a user. Furthermore, by means of a depth sensor distance information about the distance of the user from a screen is determined, and on the basis of this distance information and the calculated gaze direction the intersection point of the gaze direction with a screen is determined.
A major limitation of these approaches is that the depth information is calculated and used only for a very limited set of points, which can be relatively far from the eyes, and this limits the accuracy and robustness with regard to noisy measurements. Though methods based on detecting corneal reflections on the eye are at the moment the most robust and accurate techniques, those methods suffer from the drawback that these corneal reflections may overlay important eye features, like the pupil, which makes it more difficult to detect the pupil in the image. Also at least two corneal reflections must be visible on the cornea to determine the distance of the cornea from the camera, but fewer than two corneal reflections might be visible at a given time due to a number of factors, like corneal reflections disappearing from the cornea due to eye dryness or irregularities on the cornea surface, temporary occlusion of the light source beams, or a cornea reflection falling outside of the cornea due to the eye being oriented at a larger angle with respect to the camera. Therefore, it is an object of the present invention to provide an eye tracking device and a method for determining a position of at least one predefined feature of at least one eye of a user, which allow for a more reliable determination of the position of the eye feature.
This object is solved by an eye tracking device with the features of claim 1 and a method with the features of claim 16. Advantageous embodiments of the invention are presented by the dependent claims.
According to the invention the eye tracking device for determining a position of at least one predefined feature of at least one eye of a user comprises a 3D capturing device configured to capture depth data of different regions of the at least one eye of the user and a processing unit configured to determine the 3D position of the at least one predefined feature of the at least one eye in dependency of information derived from the captured depth data.
By capturing depth data of different regions of the at least one eye it is possible not only to determine the distance of the eye from the 3D capturing device or a screen, but also to determine the orientation of the eye and therefore the gaze direction in dependency of the captured depth data. Especially, the orientation of the eye or the gaze direction can be determined on the basis of the 3D position of the predefined feature and therefore the captured depth data can be used even for determining the orientation and/or gaze direction of the eye. This has the great advantage that one does not necessarily have to rely on the detection of corneal reflections, which are usually used for determining the 3D positions of eye features.
For capturing the depth data the 3D capturing device can comprise a depth sensor. A depth sensor (or depth camera) in general is capable of perceiving reflectance and distance information of objects in a scene at real-time video frame rates. Depth cameras can be classified by the underlying technologies in three main categories, i.e., time-of-flight, active and passive triangulation. Depth information can also be acquired by different techniques, like e.g. stereo imaging. So the 3D capturing device can comprise e.g. a stereo camera, a laser scanner, a time-of-flight camera, a light coding system, a sonar/ultrasound sensor or any other depth sensor.
In the following, embodiments of the invention will be presented, by means of which the gaze direction can be determined without needing any corneal reflections at all, or using at most one single corneal reflection. Also embodiments will be presented by means of which the gaze direction can be calculated solely on the basis of the captured depth data, without needing a camera providing color or brightness information. In general, the depth data referring to different regions of the eye allow e.g. for a 3D reconstruction or modeling of the eye and/or relevant eye features in 3D space and facilitate very accurate and robust eye tracking.
According to a preferred embodiment of the invention the at least one predefined feature is at least one of an iris, a pupil, a cornea, an eyeball and/or sclera of the at least one eye. The optical axis of the eye for example can be determined as a straight line through the iris/pupil center and through the cornea center or eyeball center. Thereby also the gaze direction can be determined. Furthermore, the gaze direction could also be determined as a normal vector of the iris/pupil plane in the iris/pupil center. So advantageously the depth data can be used to determine the 3D position of only one, several or all of these predefined eye features, and in the end for determining the gaze direction and/or orientation of the eye.
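Purely as an illustration of this geometric relation, and not as part of the claimed device, the optical axis can be expressed as the normalized difference vector between two reconstructed 3D centers. The sketch below assumes that the cornea (or eyeball) center and the pupil/iris center are already available as 3D coordinates in the reference coordinate system of the 3D capturing device; the numeric values in the usage example are hypothetical.

```python
import numpy as np

def optical_axis(rear_center, pupil_center):
    """Unit vector along the optical axis, pointing from the cornea or eyeball
    center (rear_center) through the pupil/iris center (both 3D points)."""
    axis = np.asarray(pupil_center, dtype=float) - np.asarray(rear_center, dtype=float)
    return axis / np.linalg.norm(axis)

# Usage with hypothetical centers given in metres relative to the depth sensor:
gaze = optical_axis([0.010, 0.020, 0.600], [0.012, 0.021, 0.594])
```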
In another advantageous embodiment of the invention the processing unit is configured to determine the 3D position of the at least one predefined feature as a 3D position of a center of the at least one feature and/or as a 3D position of a shape of the at least one feature. For example, the 3D coordinates of the center of the iris, pupil, cornea or eyeball can be calculated as the 3D position of the respective predefined features. Alternatively or additionally also the 3D position of the whole shape of the at least one feature can be determined. For example the iris can be modelled as a circular plate and the 3D position and/or orientation of the whole plate can be determined in 3D space. Also the cornea for example can be modelled as a sphere and a position of this whole sphere can be determined in 3D space. Advantageously this provides many possibilities of determining the gaze direction and/or orientation of the eye, as this can be done in dependency of one or more 3D positions of centers of predefined features of the eye or alternatively or additionally in dependency of the calculated shapes of one or more features in 3D space.
According to another advantageous embodiment the 3D capturing device is configured to determine a 3D model of the captured eye on the basis of the captured depth data and to provide the 3D model to the processing unit. The 3D capturing device can comprise another, especially second, processing unit for post processing the captured data, or the 3D model can be provided or derived directly from the captured depth data. So advantageously the whole eye and its surface structure can be modelled in 3D space so that easily the 3D positions of certain features of the eye can be determined on the basis of this 3D model.
According to a further advantageous embodiment of the invention the 3D capturing device is configured to represent the captured depth data as a depth map and/or a point cloud, and especially to create a mesh model and/or a patch model and/or a CAD model from the depth data. This way a 3D model of the captured eye on the basis of the captured depth data can be provided.
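As a hedged illustration of such a representation (not taken verbatim from the description), the sketch below converts a dense depth map (z-map) into a point cloud using a simple pinhole camera model; the intrinsic parameters fx, fy, cx and cy are assumed calibration values.

```python
import numpy as np

def depth_map_to_point_cloud(z_map, fx, fy, cx, cy):
    """Back-project every valid pixel of a z-map into 3D camera coordinates."""
    h, w = z_map.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = z_map.astype(float)
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels without a depth measurement
```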
According to another advantageous embodiment of the invention the processing unit is configured to identify the depth data relating to the predefined feature on the basis of a predefined geometrical surface model of the predefined feature, especially by searching for surface structures in the depth data fitting the predefined geometrical surface model. If for example the depth data relating to the iris and the pupil shall be identified, or the part of a 3D model of the eye relating to the iris or the pupil shall be identified, this can easily be done based on the knowledge or assumption that the iris is flat or planar and the pupil is a hole in it. So a suitable predefined geometric surface model for the iris and the pupil would be a plane, maybe even a circle-shaped plane, with a circle-shaped hole in it. So, on the basis of this predefined geometrical surface model for the pupil and the iris these eye features can easily be identified in the depth data of the eye. This makes it for example possible to determine the gaze direction only on the basis of 3D data provided by the 3D capturing device, especially without needing any image with color or brightness information, but solely on the basis of an eye image comprising the depth data. As described, the depth data relating to the iris and pupil can easily be found on the basis of this predefined geometrical surface model, and on the basis of the found depth data describing the iris and pupil a normal vector in the center of the iris or pupil can be calculated which represents the gaze direction. However, it is very advantageous to additionally use color or brightness information, e.g. from a 2D picture or image of the eye, because this additional color or brightness information can e.g. be used to find and locate the eye and/or features of the eye in the depth data faster and more accurately.
Therefore, according to a further very advantageous embodiment of the invention the 3D capturing device is configured to capture at least one 2D image of the at least one eye of the user, especially wherein the processing unit is configured to determine the 3D position of the at least one predefined feature of the at least one eye additionally in dependency of information derived from the at least one 2D image. This has the great advantage that the predefined feature or certain regions of the eye do not have to be identified only on the basis of the depth data but also with the help of the 2D image, which additionally provides color information or at least brightness information. So, for example, the iris or pupil can first be identified in the 2D image of the eye. Then the corresponding depth information, which corresponds to the 2D coordinates of the identified iris and pupil, can be derived from the depth data. Thereby a much higher accuracy can be achieved in identifying certain features or regions of the eye when using the 2D brightness or color image in combination with the captured depth data. The captured depth data, which can be seen as a depth image, and the 2D image do not necessarily have to be separate image but can also form one single image, e.g. like an image of the eye, wherein each pixel of the image contains a depth value and a color or brightness value. Such images can be captured e.g. by a time of flight camera or a stereo camera system. Such images also can be captured separately, e.g. the depth data can be captured by a laser scanner or a sonar/ultrasound sensor, and the 2D image can be captured by a normal camera or image sensor. So the captured depth data for the different regions or different points of the eye provide for each of these regions or points 3D coordinates, whereas the 2D image provides for different regions or points of the eye 2D coordinates and an additional color or brightness. If the depth image and the 2D image form a single image then in total for each of these regions and/or points 3D coordinates and additional color and/or brightness information is provided. According to another advantageous embodiment of the invention the processing unit is configured to identify the depth data relating to a specific part of the at least one eye in dependency of the at least one 2D image. This way the captured depth data of the different regions of the at least one eye of the user can easily be assigned to certain parts or features of the eye, like the sclera, the pupil, the iris, by identification of these features in the 2D image.
According to another advantageous embodiment of the invention the processing unit is configured to map the captured depth data of the different regions of the at least one eye to corresponding regions of the eye in the 2D image. This mapping can be done simply on the basis of the common 2D coordinates of the 2D image and the 3D depth data. This way one can e.g. locate the iris/pupil in the 2D image, determine an iris/pupil center and then assign the corresponding depth value from the depth data to the determined iris/pupil center. This mapping also works the other way round: one can e.g. identify the iris/pupil in the 2D image, map the identified region to the corresponding region in the depth data and therefore determine the position of the iris/pupil center and/or iris/pupil shape in 3D space.
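A minimal sketch of this mapping, assuming the depth data and the 2D image are pixel-aligned and that a binary mask of the feature (e.g. the iris/pupil region) has already been segmented in the 2D image; the per-pixel 3D coordinate map could be obtained as in the back-projection sketch above.

```python
import numpy as np

def region_center_3d(xyz_map, region_mask):
    """xyz_map: HxWx3 array with the 3D coordinates of each pixel;
    region_mask: HxW boolean mask of the feature found in the 2D image.
    Returns the mean 3D position of the masked region, ignoring invalid depth."""
    pts = xyz_map[region_mask]
    pts = pts[pts[:, 2] > 0]
    return pts.mean(axis=0)
```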
According to another advantageous embodiment of the invention the processing unit is configured to determine the orientation of the at least one eye and/or the gaze direction in dependency on the information derived from the depth data, and in particular additionally in dependency on the information derived from the at least one 2D image. The ray through the 3D pupil/iris center and the eye position, which is the eyeball center, or the cornea center determines the optical axis of the eye, which, especially after calibration, provides the gaze direction. The gaze direction can also be calculated as the normal vector of the iris plane in the iris/pupil center. There are also further different possibilities of determining the orientation and/or gaze direction of the eye, wherein all of them advantageously make use of the captured depth data and optionally of the 2D image data. The gaze direction and/or orientation of the eye can therefore be determined very precisely and with high accuracy, especially without necessarily having to rely on the detection of corneal reflections.
According to another advantageous embodiment of the invention the processing unit is configured to fit a predefined geometrical form to the predefined feature of the eye in the 2D image and/or to the depth data corresponding to the predefined feature. For example, for finding the 3D position of the iris center one can identify the iris in the 2D image and fit a circle along the iris boundary. From this fitted circle one can calculate the center of the circle as 2D coordinates and assign the corresponding depth value to this calculated iris center from the depth data. On the other hand one could also identify the iris in the 3D depth data by fitting a circle shaped plate to the depth data and then calculate the center point of this plate, which provides the 3D position of the iris center. Also one could identify the sclera, which is the white part of the eye, in the 2D image and assign the corresponding depth values to the found sclera and then fit the sphere to the depth data corresponding to the sclera, which gives the sphere of the eyeball. Again from this 3D eyeball sphere the center of the eyeball may be calculated. So advantageously on the basis of the depth data and optionally on the basis of the 2D image the eye and different features of the eye can be modelled in 3D space so that the 3D positions of centers of these features or also the 3D positions of the whole shape of these features can be calculated and used for example for determining the gaze direction or orientation of the eye.
According to another advantageous embodiment of the invention the eye tracking device comprises at least one light source to illuminate the eye and to produce a reflection on the eye capturable by the 3D capturing device, wherein the processing unit is configured to identify the reflection in the 2D image and to assign a depth value to the identified reflection based on the depth data corresponding to the region of the eye, in which the reflection was identified in the 2D image. This has the great advantage that for example the cornea center or even the gaze direction can be calculated on the basis of one single corneal reflection instead of using two corneal reflections. In the prior art at least two corneal reflections are needed to determine the 3D cornea position. Instead, according to this embodiment of the invention it is possible to use the 3D position of the light source, the 3D position of the capturing device and the determined 3D position of the reflection on the cornea of the eye to determine the 3D position of the cornea center, e.g. by assuming an average cornea radius, and maybe to determine the gaze direction therefrom. As corneal reflections cannot always be detected reliably it is very advantageous to reduce the number of necessary detectable reflections to one single reflection, on the basis of which the 3D cornea position and even the gaze direction or orientation of the eye can be calculated very accurately. According to another advantageous embodiment of the invention the eye tracking device comprises at least one illumination unit configured to illuminate the at least one eye for capturing the depth data. The light source for producing the cornea reflection mentioned before can be separate from this illumination unit for capturing the depth data. Depth sensors using such an illumination unit are, for example, light coding systems which combine a camera or an image sensor with a light emitter which projects a structured light pattern or code on the scene, wherein light returns distorted depending upon the 3D structure of the illuminated objects in the scene. Also when using a time of flight camera the scene is illuminated with light and the camera measures for each pixel in the image the travelling time of the light from the light source to the scene and back onto the sensor. Also 3D laser scanning can be used as a depth sensor, where the laser scanner scans the scene with a laser beam and computes a 3D model of it. If a depth sensor is used for capturing the depth data, which comprises such an illumination unit, it is a very advantageous embodiment of the invention that the processing unit is configured to determine whether the light captured by the capturing unit originated from the illumination unit, or from somewhere else, like environmental light, or from the light source for producing the corneal reflection. This can be done for example by using a predefined illumination pattern and/or a predefined time multiplexing scheme and/or a predefined spectral multiplexing scheme. This way it can advantageously be avoided that the illumination unit of the depth sensor causes noise in the 2D image. So for example if a certain illumination pattern is used for illuminating the eye when capturing the depth data, this known illumination pattern can again be subtracted from the 2D image.
Furthermore, if the depth data and the 2D image are captured separately, the 2D image and the depth data can be captured for example in an alternating fashion, so that the 2D image can be captured when the illumination unit of the depth sensor does not illuminate the eye. Also the illumination unit can illuminate the eye in a predefined spectral range, in which the image sensor for capturing the 2D image is not sensitive. The same principles can advantageously be used if the eye tracking device comprises the above-named at least one light source for producing a reflection on the eye, so that this reflection does not cause noise when capturing the depth data and the light originating from the illumination unit of the depth sensor does not disturb the detection of the corneal reflection in the 2D image. Therefore it is also a very advantageous embodiment of the invention when the processing unit is configured to determine whether light captured by the 3D capturing device originated from the illumination unit or the at least one light source, especially again by means of a predefined illumination pattern and/or a predefined time multiplexing scheme and/or a predefined spectral multiplexing scheme.
According to an especially advantageous embodiment of the invention the illumination unit is configured to create a light pattern for capturing the depth data and the 3D imaging device is configured to capture the 2D image while the illumination unit creates the light pattern, and the processing unit is configured to decode the light pattern and/or to subtract the light pattern from the 2D image, especially to improve the signal to noise ratio of the eye tracking data. Thereby it can advantageously be avoided that the illumination pattern has any negative influence on deriving information from the 2D image and moreover the depth data and the 2D image can be captured synchronously without negatively influencing each other.
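A minimal sketch of such a subtraction, assuming the decoded structured-light pattern has been rendered into an image that is pixel-aligned with the captured 2D image; the gain factor is a hypothetical calibration parameter, not something specified by the description.

```python
import numpy as np

def subtract_light_pattern(image, decoded_pattern, gain=1.0):
    """Remove the contribution of the depth-sensing illumination pattern from
    the 2D image before eye features are detected (pixel values in [0, 1])."""
    cleaned = image.astype(float) - gain * decoded_pattern.astype(float)
    return np.clip(cleaned, 0.0, 1.0)
```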
For the same reasons it is also very advantageous when, according to another embodiment of the invention, the 3D imaging device is configured to capture the depth data while the at least one light source for producing a corneal reflection illuminates the eye, wherein the processing unit is configured to compensate for the light caused by the at least one light source when creating the depth image.
This way, according to these embodiments, light produced for capturing the 2D image does not disturb the capturing of the depth data, and light produced for capturing the depth data effectively does not influence the 2D image, which especially improves the depth image signal to noise ratio.
Furthermore, the invention also relates to a method for determining a position of at least one predefined feature of at least one eye of a user by means of an eye tracking device, wherein the eye tracking device comprises a 3D capturing device that captures depth data of different parts of the at least one eye of the user and a processing unit that determines the 3D position of the at least one predefined feature of the at least one eye in dependency of information derived from the captured depth data.
The preferred embodiments and advantages thereof described with regard to the eye tracking device according to the invention correspondingly apply to the method according to the invention, wherein in particular the embodiments of the eye tracking device constitute further steps of preferred embodiments of the method according to the invention.
Furthermore, though most embodiments of the invention are described for the sake of simplicity with regard to one eye of the user, all embodiments and features described also apply for the other eye of the user as well. Especially, the 3D capturing device may also be configured to capture the depth data of different regions of each eye of the user, respectively, and the processing unit is configured to determine the 3D position of the at least one predefined feature of each eye in dependency of information derived from the captured respective depth data. Also, the eye tracking device can comprise two 3D capturing devices, one for each eye, or the 3D capturing device can comprise two cameras and/or depth sensors and/or illumination units, one for each eye. Moreover, the eye tracking device can be configured as a remote eye tracking device or as a head mounted eye tracking device. Furthermore, the eye tracking device is preferably configured to perform the described steps, especially the data and image acquisition and analysis and the determination of the position of the eye features, and in particular the determination of the orientation of the eye and/or gaze direction, repeatedly, to be able to track the orientation and/or gaze direction.
Further features of the invention and advantages thereof derive from the claims, the figures, and the description of the figures. All features and feature combinations previously mentioned in the description as well as the features and feature combinations mentioned further along in the description of the figures and/or shown solely in the figures are not only usable in the combination indicated in each case, but also in different combinations or on their own. The invention is now explained in more detail with reference to individual preferred embodiments and with reference to the attached drawings. These show in:
Fig. 1 a schematic illustration of an eye tracking device comprising a 3D capturing device according to a first embodiment of the invention;
Fig. 2 a schematic illustration of an eye tracking device comprising a 3D capturing device according to a second embodiment of the invention;
Fig. 3 a schematic illustration of an eye tracking device comprising a 3D capturing device according to a third embodiment of the invention;
Fig. 4 a schematic illustration of a 3D capturing device providing depth data in form of a 3D model of an eye according to an embodiment of the invention;
Fig. 5 a schematic illustration of a 2D image of an eye captured by the 3D capturing device according to an embodiment of the invention;
Fig. 6 a schematic illustration of a method for determining the cornea center of an eye on the basis of a detected single cornea reflection and by means of the depth data captured by means of the 3D capturing device according to an embodiment of the invention; and
Fig. 7 a schematic illustration of an eye and the most relevant eye features for determining the orientation of the eye and/or the gaze direction.
The target of the invention is to compute features of the user's eye, like the position of the eyes or pupils, the gaze direction or the point of regard. The introduced technique does not necessarily require any active illumination for producing corneal reflections, which is usually used especially to measure the distance of the eyes to the eye tracker. With the use of a depth sensor instead, eye tracking can be done more robustly and the results can be computed more precisely, which will be shown in more detail in the following for some example embodiments of the invention.
First of all, the eye, eye features and methods for determining the gaze direction or the orientation of the eye shall be explained in general. For this purpose, Fig. 7 shows a schematic illustration of an eye 10 and the most relevant eye features. The eye 10 comprises an eyeball 12, the form or shape of which can be assumed to be similar to a sphere. The eyeball 12 comprises an eyeball center 12a, which is the center of this sphere. The sclera 14, the white part of the eye 10, is located on the sphere of the eyeball 12. Furthermore the eye 10 comprises an iris 16, which is approximately planar and circular shaped, and the pupil 18 is the hole in the iris 16. The center of the pupil and the iris is denoted by 16a. The cornea 20 is a transparent and convex front part of the eye 10 that covers the iris 16 and the pupil 18. The shape of the cornea 20, especially the cornea surface, can be approximated by a sphere as well, which is denoted by 20a and which is, except for the part which is the surface of the cornea 20, drawn with a dashed line in Fig. 7. The center of this sphere, also called cornea center, is denoted by 20b. The optical axis 22 of the eye 10 passes through the eyeball center 12a, the cornea center 20b and the iris/pupil center 16a. The optical axis 22 defines the orientation of the eye 10. So by determining the position of the optical axis 22 in 3D space, the orientation of the eye 10 in 3D space is known as well. Also the gaze direction or line of sight can be derived from the determination of the optical axis 22. Usually, the line of sight and the optical axis 22 differ from each other by a (user specific) angle, which can be determined e.g. in a calibration procedure of the eye tracking device.
Aim of eye tracking is in general obtaining at any given time the best quality achievable in terms of a low accuracy error, a low imprecision and being robust in the detection. So far, very good quality is achieved by eye trackers which use active infrared (IR) light sources and an IR camera. The reflections, especially corneal reflections, of the light from the light sources on the cornea of the user's eyes are detected in the camera image and allow the reconstruction of the eyes' positions. Without using light sources, it would not be possible to reconstruct the absolute position of the eyes from only the camera image, since the global scale factor cannot be estimated. The corneal reflections from the light sources provide this missing information and allow estimating the distance of the eyes to the eye tracker. In the following, very advantageous embodiments of the invention will be presented, by means of which the 3D position of one or more of the above described eye features can be determined in a very accurate and reliable way without needing to detect two corneal reflections, and according to some embodiments without needing any corneal reflections, therefore providing the possibility of calculating the orientation of the eye 10 and/or the gaze direction also very accurately and reliably. This is achieved by using a 3D capturing device with a depth sensor for providing depth data of different regions of the eye 10. Fig. 1, Fig. 2 and Fig. 3 each show a schematic illustration of an eye tracking device 30 according to different embodiments of the invention. The eye tracking device 30 comprises a 3D capturing device 32 configured to capture depth data 34 (compare Fig. 4) of different regions of the at least one eye 10 of the user 36 and a processing unit 38 configured to determine the 3D position of the at least one predefined feature, like the eyeball 12, sclera 14, iris 16, pupil 18 and/or cornea 20 (compare Fig. 7) of the eye 10 in dependency of information derived from the captured depth data 34. Additionally, the 3D capturing device 32 can also be configured to capture 2D images 40 (compare Fig. 5) in form of a picture containing brightness and/or color information about the eye 10 and/or face of the user 36.
So the eye tracking device 30 of these embodiments of this invention can be separated into two units: a capturing unit, which is the 3D capturing device 32, and an eye tracking processing unit 38. Both are described in detail in the following. The 3D capturing device 32 is able to take a picture of the scene in a (given) spectrum, e.g. visible light or infra-red, or both. This picture constitutes the 2D image 40, which is shown schematically in Fig. 5. The 3D capturing device 32 is also able to capture depth-information and outputs any kind of 3D reconstruction of the scene. This 3D reconstruction is schematically shown as a 3D model in Fig. 4.
The 3D capturing device 32 can be implemented in different ways as shown in Fig. 1, Fig. 2 and Fig. 3. It may comprise one or more cameras 42 or image sensors, may have an illumination unit 44, e.g. comprising active illuminators like a light emitter 44a, a light source 44b, laser emitters or a laser scanner 44c, and can include a second processing unit 46 to post-process the acquired data. The following list shows some state-of-the-art techniques to acquire such data. The herein described eye tracking device 30 may use one or more of these standard techniques or any other hardware which provides an image 40 of the scene together with a 3D model 34. The 3D capturing device 32 can comprise:
• a light coding system, as illustrated in Fig. 2, which combines a camera 42 or an image sensor with a light emitter 44a which projects a structured light pattern or code on the scene, especially the eye 10 of the user 36. Light returns distorted depending upon where things are and is captured by the camera 42 or image sensor. State of the art algorithms are then used for triangulating the 3D data of the scene to provide the 3D model 34. At the same time, the camera 42 may provide the 2D image 40 of the eye 10.
• a time-of-flight camera, which is also illustrated by Fig. 2. The scene is illuminated with light by means of a light source 44b, and the camera 42 measures for each pixel in the captured image the travelling time of the light from the light source 44b to the scene and back onto the sensor of the camera 42. By means of a time-of-flight camera the depth data for providing the 3D model 34 and the 2D image 40 can be captured. Optionally, the 3D capturing device 32 may comprise a separate camera or image sensor for capturing the 2D image 40.
• a stereo camera system, which is schematically illustrated in Fig. 1 . It takes at least two images of the scene from different viewpoints, preferably captured at the same time. These at least two images are captured by at least two cameras 42 or image sensors at predefined and separated positions. The objects seen in the images are matched and used to reconstruct the 3D model 34 of the scene. Also one of these cameras or both can be used to capture the 2D image 40, either separately or as one of the images used for constructing the 3D model 34.
• a 3D laser scanner 44c together with a camera 42, which is also represented by Fig. 2. The laser scanner 44c scans the scene with a laser beam, like a dot- laser, a laser line, stripe pattern or other patterns, and computes a 3D model 34 of it. The camera 42 or image sensor takes an image 40 with the necessary color information.
• a sonar and/or ultrasound sensor: Such a sensor is slower compared to the previous type of sensors, but it is completely unaffected by environmental light, and additionally has the great advantage that it is able to capture depth data of the cornea 20.
• Any other camera system which delivers RGBD images, i.e. images where each pixel has color (RGB) and depth (D) information.
Optionally the eye tracking device 30 may further comprise one or more additional light sources 45 for producing one or more reflections on the eye, especially on the cornea 20, as illustrated in Fig. 3. Especially any of the above described depth sensors, or 3D capturing devices 32 in general, can be combined with such a light source 45 for producing a corneal reflection.
The 3D capturing device 32 outputs a 3D model 34 of the captured scene with the eye 10 of the user 36, as exemplarily shown in Fig. 4. The representation of the 3D data may be different, depending on the used system, namely the depth sensor, to acquire the data:
• As z-map: A z-map is a rasterized image where each pixel contains distance information of the object represented by that pixel. Usually, the distance information is a scalar number and expresses the distance between the object and the measuring device in any metric.
• As point cloud: If only sparse points on the surface of the object are measured, a possible representation is the point cloud. It consists of coordinates of each measured point in 3D space.
• As mesh, patch model or CAD model: Post-processing of point-cloud or z-map data allows the creation of a 3D surface mesh. It usually consists of vertices (points in 3D) and faces/patches which connect the vertices. The faces may be triangles, quadrilaterals, spline surfaces, other polynomial surfaces, CAD primitives or any other surface patches which describe the surface.
Fig. 4 shows a schematic illustration of such a 3D model 34 as a mesh model, provided by the 3D capturing device 32 of the eye tracking device 30. The 3D capturing device 32 is configured to capture depth data, on the basis of which this 3D model 34 is provided. Here, exemplarily two points P1(x1, y1, z1) and P2(x2, y2, z2) are shown in different positions, and therefore in different regions of the eye 10, with the corresponding 3D coordinates x1, y1, z1 and x2, y2, z2, respectively. The 3D capturing device acquires plenty of such depth data for different points or regions of the eye 10. So for each point or region of the eye 10 the respective 3D coordinates with regard to the reference coordinate system 48 of the 3D capturing device 32 can be determined.
The 3D capturing device 32 also captures a 2D image 40 as shown in Fig. 5. This image 40 provides for different points or regions, here exemplarily for the two points P1 (x1 , y1 ) and P2(x2, y2) corresponding to the two points P1 (x1 , y1 , z1 ) and P2(x2, y2, z2) of the 3D model 34, the corresponding 2D position x1 , y1 and x2, y2 respectively with regard to the reference coordinate system of the 3D capturing device 32, and additional brightness and/or color information.
The depth image constituted by the 3D model 34 and the 2D image 40 are here shown for illustrative purposes separately, but do not have to be separate images. These images can also be provided as one combined image, representing for each captured point or region 3D coordinates as well as corresponding color and/or brightness information. For example, by means of a time-of-flight camera or stereo cameras an image can be provided containing for different regions, e.g. different pixels of the image, the depth data in form of the 3D coordinates as well as the corresponding color information for that region or pixel. The 3D model 34 together with the image 40 of the scene is then passed to the eye tracking processing unit 38.
Usually, eye tracking uses reflections of light sources on the cornea 20 in order to reconstruct the position of the eyeball 12/cornea 20 in 3D space. The image of the pupil is then used to find the 3D position of the pupil on the 3D eye model which gives information about the gaze direction. The main idea of this invention is to replace the need for (more than one) corneal reflections by the depth information. The depth sensor gives enough information to reconstruct 3D locations of (parts of) the eye 10. This depth-information together with e.g the 2D location of the pupil 18 in the image 40 is enough to reconstruct the position of the eye 10, the gaze direction or the "Point Of Regard" or combinations of them.
In 3D-model-based eye tracking techniques, the corneal reflections (CRs) of the light sources are used to determine the position of the eye 10/cornea 20. The gaze direction is estimated by finding the pupil 18/iris 16 in the image and projecting it onto the eyeball 12 or cornea 20 of the eye 10. The ray through the 3D pupil/iris center 16a and the eye position provides the gaze direction (i.e. the optical axis 22). In contrast thereto according to this invention, one can also detect the pupil 18/iris 16 in the image 40, but advantageously one does not have to rely on having corneal reflections available to estimate the eye/cornea position.
In the following, several possibilities for reconstructing eye properties using depth information are presented:
At first the reconstruction of the 3D shape of the pupil and iris is described: First, the 3D coordinates of the pupil/iris contour points are determined. This can be done in a number of different ways. For example a 2D image segmentation of the 2D image 40 can be performed, considering image brightness, texture, color (if available), local gradient etc. Thereby the contour points of the pupil 18 and/or iris 16 can be identified in the 2D image 40 together with their corresponding coordinates x, y. Then the part in the 3D data representation 34, which corresponds to the found contour points of the pupil 18/iris 16 in the 2D image 40, is identified. This can easily be done by assigning to the contour points, whose 2D coordinates x, y are known from the 2D image, the respective z-coordinates corresponding to these x, y coordinates on the basis of the 3D model. Alternatively, one can also use, in particular only, the depth information, i.e. the 3D model 34, to find the pupil/iris region, considering that the iris 16 is flat/planar and the pupil 18 is a hole in it. So, the processing unit 38 is configured to identify the depth data relating to the pupil 18 and/or iris 16 on the basis of a predefined geometrical surface model of the pupil 18 and/or iris 16 by searching for surface structures in the depth data, i.e. the 3D model 34, fitting the predefined geometrical surface model. This way the shape of the pupil 18 and iris 16 and their 3D positions can be determined solely on the basis of the 3D model, without the need of the 2D image 40 or any brightness or color information. But also depth information and color and/or brightness information can be combined. E.g. the identification of the pupil 18 and/or iris 16 in the 3D model 34 can be verified or corrected by using the color information of the 2D image 40, etc. A first possibility and embodiment is to reconstruct the pupil 18/iris 16 as a circle or ellipse in 3D by a model fitting technique and obtain the pupil and/or iris center 16a and a pupil and/or iris orientation vector, which denotes the direction the pupil is facing towards, e.g. the surface normal in the pupil center 16a.
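As a hedged sketch of this first possibility, the points assigned to the iris can be reduced to a center and an orientation vector by a simple least-squares plane fit; the SVD-based fit below is a generic illustrative choice, not a specific method prescribed by the description.

```python
import numpy as np

def fit_iris_plane(iris_points_3d):
    """Least-squares plane fit to Nx3 iris points.
    Returns the centroid (approximating the iris/pupil center 16a) and the
    unit plane normal (approximating the orientation vector)."""
    pts = np.asarray(iris_points_3d, dtype=float)
    center = pts.mean(axis=0)
    # The right singular vector with the smallest singular value spans the
    # direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(pts - center)
    normal = vt[-1]
    return center, normal / np.linalg.norm(normal)
```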
A second possibility and embodiment is to find some center point of the surface patch representing the pupil 18/iris 16 without fitting a geometrical model, but instead by finding a center of gravity of the contour points of the pupil 18/iris 16. Also the pupil/iris orientation vector which represents the eye direction can be calculated as some form of weighted average of the orientation vectors of the individual contour points, possibly detecting and filtering out outliers. It can be either the normal vector in some centered point of the patch, an Eigen-direction computed by a singular-value-decomposition (SVD), the normal vector of a fitted surface patch in the center point 16a, or any other point-wise or averaged direction vector computed from the pupil/iris surface patch.
In all of the above possibilities, it is advantageous to consider that all the iris contour or the pupil contour (or both) must be lying close to each other - this condition can be used to filter out outliers, and to determine the coordinates of the pupil or iris center 16a with greater accuracy.
The herein computed pupil center point 16a and the pupil orientation vector describe the gaze direction (optical axis 22) of the reconstructed eye 10.
Next with regard to another embodiment of the invention the reconstruction of the 3D cornea 20 will be described: The depth information can be used to detect and reconstruct the cornea shape. This technique can only be used if the depth sensor detects the cornea 20 which is difficult since the cornea 20 is made of transparent and specular reflecting material. However, some techniques are able to detect those materials, like e.g. acoustic-based (sonar) detectors. The reconstruction of the cornea comprises the following steps:
• Determining the position of the eye 10 in the 2D image 40. That could include searching for a face and then guessing the eye's positions from the position of the face. Also, features of eyes 10 can directly be searched, e.g. by searching for specific patterns like circles/ellipses which could represent a pupil 18 or iris 16, searching for especially dark regions which could be a pupil 18 or finding matches of a template eye image.
• Determining the part in the 3D data, e.g. the 3D model 34, which corresponds to the eye 10 and/or parts of the eye 10.
• Fitting a model of the cornea 20 to the found shape. The easiest possibility would be to fit a sphere assuming that the cornea curvature is nearly constant.
• Determining the pupil 18 in the 2D image 40.
• Projecting the 2D pupil 18 or pupil center 16a onto the 3D cornea 20. This technique is similar to current state-of-the-art model based eye reconstruction techniques, but the reconstruction of the cornea 20 is here done via the 3D data 34 instead of the observed CRs.
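The last, projecting step can be sketched as a simple ray-sphere intersection; refraction at the cornea surface is neglected here, the pinhole intrinsics fx, fy, cx, cy are assumed calibration values, and the sphere is the cornea model fitted to the depth data as described above.

```python
import numpy as np

def project_pixel_onto_sphere(u, v, fx, fy, cx, cy, sphere_center, radius):
    """Intersect the viewing ray through pixel (u, v) with a sphere; the camera
    is assumed to sit at the origin of the reference coordinate system 48."""
    d = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    d /= np.linalg.norm(d)
    c = np.asarray(sphere_center, dtype=float)
    b = -2.0 * d.dot(c)
    disc = b * b - 4.0 * (c.dot(c) - radius ** 2)
    if disc < 0:
        return None  # the viewing ray misses the fitted cornea sphere
    t = (-b - np.sqrt(disc)) / 2.0  # nearer root = front surface of the sphere
    return t * d
```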
According to another embodiment of the invention the reconstruction of the 3D eyeball center 12a, and optionally also the radius of the eyeball 12 can be performed by the following steps, executed by the processing unit 38:
• Determining the position of the eye 10 in the 2D image 40.
• Determining the part in the 3D data 34 which lies on the surface of the sclera 14 (the white part of the eye). The white part of the eye 10 can be determined on the basis of the 2D image 40, and by mapping the identified region to the 3D depth data the 3D depth data relating to the sclera 14 can be identified.
• Performing a spherical fitting on the 3D patch corresponding to the sclera 14.
Therefore the result is a sphere in 3D space representing the eyeball 12. By this determined sphere also the center of the sphere, which corresponds to the eyeball center 12a, is determined in 3D space. The radius of the sphere then corresponds to the eyeball radius, which can therefore be determined as well.
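A minimal sketch of such a spherical fitting, using a standard algebraic least-squares formulation on the 3D points assigned to the sclera; this particular formulation is an illustrative choice, not a method prescribed by the description.

```python
import numpy as np

def fit_sphere(points_3d):
    """Algebraic least-squares sphere fit to Nx3 points, e.g. sclera points.
    Returns (center, radius), approximating eyeball center 12a and radius."""
    p = np.asarray(points_3d, dtype=float)
    # Sphere |x - c|^2 = r^2 rewritten as 2*x.c + (r^2 - c.c) = x.x,
    # which is linear in the unknowns c and d = r^2 - c.c.
    a = np.hstack([2.0 * p, np.ones((len(p), 1))])
    f = (p ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(a, f, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center.dot(center))
    return center, radius
```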
According to another embodiment of the invention the reconstruction of the 3D boundary of the pupil 18 and/or iris 16 can be performed by the following steps executed by the processing unit 38:
• Determining the boundary of the pupil 18 or iris 16 in the 2D image 40.
• Fitting an ellipse to that determined boundary.
• From the center of that ellipse and the depth value at the center point, derived from the captured depth data of the eye 10, directly reconstruct the 3D pupil/iris center 16a.
• From the minor and major ellipse axis radii, reconstruct the 3D pupil/iris circle.
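A hedged sketch of these last steps, using OpenCV's generic ellipse fit as a stand-in for whatever fitting routine an implementation would actually use; the pinhole intrinsics are assumed calibration values and the depth lookup presumes a depth map that is pixel-aligned with the 2D image 40.

```python
import numpy as np
import cv2

def pupil_center_3d(boundary_pixels, z_map, fx, fy, cx, cy):
    """boundary_pixels: Nx2 array of pupil/iris boundary points in the 2D image.
    Returns the reconstructed 3D pupil/iris center 16a."""
    (u, v), _axes, _angle = cv2.fitEllipse(boundary_pixels.astype(np.float32))
    z = float(z_map[int(round(v)), int(round(u))])  # depth at the ellipse center
    return np.array([(u - cx) / fx * z, (v - cy) / fy * z, z])
```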
In the following embodiment of the invention, which is schematically illustrated in Fig. 6, a technique is presented that uses one light source 45 for producing a visible reflection, especially a cornea reflection 50, in the user's eyes 10. So according to this embodiment shown in Fig. 6, the eye tracking device 30 comprises the 3D capturing device 32 and the light source 45, as well as the processing unit 38 (not shown in Fig. 6). In this embodiment the processing unit is configured to reconstruct the 3D cornea 20 and the 3D cornea center 20b.
This reconstruction of the cornea 20 from a single glint, which is the cornea reflection 50, comprises the following steps:
• Detecting the cornea reflection 50 for each eye 10 in the 2D image 40.
• Use the depth-information at that point to locate the 3D position of the cornea reflection 50, i.e. the point where the ray from the light source 45 reflects at the cornea surface 20 and goes through the camera of the 3D capturing device 32.
o If the depth information at that point is not available, for example in some embodiments the depth sensor will not be able to measure distances to translucent surfaces such as the cornea 20, then a distance can be measured to a nearby non-translucent surface, such as the limbus, sclera 14, or the iris plane 16 itself;
o Such a distance will in some embodiments be a sufficient approximation to the true distance of the cornea reflection 50, especially those where the distance between the eye tracking device 30 and the eye 10 is much bigger than the cornea radius itself (for example, for so called "remote" eye tracking systems the distance between the eye tracking device 30 and the eye 10 is usually larger than 30 cm, while the cornea radius is on average 7.8 mm, so for said systems the approximation would introduce only a very small error).
o Such an estimation can be further improved by using a geometrical model of the eye which considers the actual position of the cornea surface with respect to the portion of the eye for which the distance can be measured by the depth camera, so for example considering the expected distance between the iris plane and the detected reflection point on the cornea surface.
• Bisecting the angle a of the incoming and outgoing ray in the 3D reflection point 50.
• Following the line 52, which is the bisection line lying in the plane of the incoming and outgoing ray in the 3D reflection point 50, through the reflection point 50 in direction away from the eye tracking device 30 by an estimated predefined length which approximates the radius R of the cornea curvature. The end point will be an approximation for the cornea center 20b.
• Projecting the pupil or pupil center 16a from the 2D image 40 onto the cornea sphere 20a and therefore obtain a 3D pupil position.
On the basis of the cornea center 20b and the 3D pupil position the orientation of the eye 10 and/or the gaze direction can be determined in 3D space.
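A minimal geometric sketch of these steps, assuming the 3D position of the reflection point 50 has been obtained from (or approximated via) the depth data, the camera of the 3D capturing device 32 sits at the origin of the reference coordinate system 48, and the cornea radius R is taken as a population average of roughly 7.8 mm:

```python
import numpy as np

def cornea_center_from_single_glint(glint_3d, light_source_3d,
                                    camera_3d=np.zeros(3), cornea_radius=0.0078):
    """Estimate the cornea center 20b from a single corneal reflection 50."""
    g = np.asarray(glint_3d, dtype=float)
    to_light = np.asarray(light_source_3d, dtype=float) - g
    to_camera = np.asarray(camera_3d, dtype=float) - g
    # At a specular reflection, the outward surface normal bisects the angle
    # between the directions towards the light source and towards the camera.
    n = to_light / np.linalg.norm(to_light) + to_camera / np.linalg.norm(to_camera)
    n /= np.linalg.norm(n)
    # Step from the reflection point away from the device by the cornea radius.
    return g - cornea_radius * n
```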
So according to the invention and its embodiments eye tracking can be highly improved by using a depth sensor. These advantageous embodiments of the invention make it possible to determine and reconstruct the position/orientation and shape of the eye directly from the depth data, so that a direct reconstruction/tracking of the user's eyes using 2D and 3D data, without the need of reconstructing any other objects like the head or the body of the user, is made possible. The result can be various eye properties, like the eye's position, orientation, pupil size, iris size or the gaze direction. Therefore the invention provides possibilities to improve the accuracy and precision of eye tracking and makes the determination of eye features more reliable. However, if a depth sensor using an illumination unit for acquiring the depth data is used, the illumination of the eye by means of this illumination unit might disturb the analysis of the 2D image. On the other hand, if e.g. a light source for producing a cornea reflection on the eye is used, this light might disturb the capturing and evaluation of the depth data. To avoid such mutual influences, very advantageous embodiments of the invention are presented in the following, which further enhance the eye tracking accuracy.
In another embodiment, like schematically shown in Fig. 3, the 3D capturing device 32 comprises one or more cameras 42 and an illumination unit 44 to generate a depth image. The illumination unit 44 can comprise e.g. a structured illumination emitter, also called structured light pattern projector in the following, which projects a structured light pattern. The illumination unit 44 can also be configured as the illumination unit 44 of a time-of-flight camera. Furthermore the eye tracking device 30 comprises at least one light source 45, like one or more lamps and/or LEDs, to generate cornea reflections on the eye 10. The one or more cameras 42 also acquire brightness and/or color information in form of a 2D image 40.
The structured light pattern projector or the active illumination unit of a time-of-flight system is used in combination with the camera(s) 42 to create a depth image 34, and the lamp(s)/LED(s) 45 produce one or more glints, which can be used to determine the 3D position of the eye and/or one or more eye features.
It is then possible, as in the previous embodiments, to simultaneously activate the depth image projector and the lamp(s)/LED(s), or, when it is advantageous, the two illumination systems, namely the illumination unit 44 and the at least one light source 45 (the projector and the LEDs), can be activated in time multiplexing, for example in alternating fashion, or with any other type of temporal sequence. The advantage of such an approach is that the camera(s) image acquisition can then be synchronized with either illumination system (the projector or the LEDs) so that the two do not interfere with each other or with the detection of specific features: for example, in some cases it might be simpler to detect the contour of the pupil 18 or the iris 16 when the structured illumination pattern is not active.
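Purely as an illustrative sketch, and not as a control scheme specified by the description, such an alternating time multiplexing could be driven frame by frame as follows; the illuminator and camera objects and their methods are hypothetical.

```python
def acquire_frame(frame_count, depth_projector, glint_leds, camera):
    """Alternate the structured-light projector and the glint LEDs between
    frames so that each exposure sees only one illumination group."""
    use_depth = (frame_count % 2 == 0)
    depth_projector.set_enabled(use_depth)
    glint_leds.set_enabled(not use_depth)
    frame = camera.capture()
    return ("depth", frame) if use_depth else ("glints", frame)
```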
It is furthermore possible to adopt a spectral multiplexing scheme, where one illumination group (e.g. the depth multiplexing projector) is emitting a specified subrange of the spectrum where the image sensor 42 is sensitive (for example, between 750 nm and 830 nm) and the other illumination group, e.g. the LED(s) for producing the cornea reflection(s), emit in another sub-range of the camera spectrum (for example, between 830 nm and 900 nm).
The image sensor 42 can then distinguish between the two illumination groups in a similar fashion as a color RGB sensor distinguishes among different colors: a first subset of the pixels in the image sensor 42 will be equipped with an optical bandpass filter which lets in only the light from one illumination group, e.g. the illumination unit 44, and a second subset of the pixels will be equipped with a different optical bandpass filter which lets in only the light from the other illumination group, e.g. the light source 45.
Alternatively, multiple image sensors/cameras 42 can be used, for example with one camera being a time-of-flight camera and the other a standard camera. It would then be possible to use different optical filters for the different cameras, each filter letting in only the light of the corresponding illumination subsystem (depth illumination unit for one camera, CR producing LEDs for the other camera) and blocking the light of the other, interfering subsystem or subsystems.
It is furthermore possible to time-multiplex the image acquisition windows of said camera systems, and synchronize the emissions of the illumination units with said windows of the corresponding cameras.
The information provided by each system (the depth image, and the glint positions) is then combined by the processing unit 38 to calculate the eye position, orientation and other features as already described.
In embodiments where the system uses a (structured) light coding technique to obtain distance information, in a system which simply combines a depth camera and an eye tracking system, the light coding pattern superimposed on the image of the eye will be perceived as noise or disturbance from the eye tracker. Likewise, the eye tracker's own illumination (e.g. glints) will be a disturbance for the depth camera.
By combining both techniques in one system, the above mentioned problems can be solved by the following features:
• The depth imaging algorithm, executed by the processing unit 38, decodes the light pattern on the image creating a depth map 34. Based on said decoding it is then possible to subtract the detected light pattern from the image, effectively reducing or cancelling the effects of said illumination from the regions where the eyes 10 are detected, improving the signal to noise ratio of the eye tracking device 30.
• Likewise, the eye tracking algorithm executed by the processing unit 38 decodes and recognizes its own light pattern (if it uses additional eye tracking specific illumination, for example glint generating lamps) and said detected eye tracking illumination can be subtracted from the image which is to be processed by the depth imaging algorithm, again improving signal to noise ratio of the depth image reconstruction.
Therefore, the processing unit 38 is configured to recognize the light pattern produced by the illumination unit 44 for acquiring the depth data and to subtract this light pattern from the 2D image 40 for evaluating the brightness and/or color information and to determine eye features like the pupil 18, iris 16, sclera 14 or corneal reflections 50 in that image 40. On the other hand the processing unit 38 is also configured to recognize light originating from the light source 45 for producing cornea reflections 50 when acquiring and evaluating the depth data for providing the 3D model 34.
All of these methods can be combined in an arbitrary way to further enhance accuracy, for example a stereo camera system with a structured light projector. Also, additional light sources may be used to produce cornea reflections 50 on the cornea 20. The reconstruction of the eye/cornea position using the cornea reflections can be used in conjunction with the above described methods.
To conclude, compared to traditional IR based eye tracking, the main advantage of the present invention and its embodiments is that it is not necessary to rely on the presence of glints (cornea reflections) to determine eye distance information.
This has the following advantages, since it is known from the state of the art that at least two cornea reflections must be visible on the cornea to determine the cornea distance from the camera, however even in the presence of two illuminators, less than two cornea reflections might be visible at a given time due to a number of factors:
• Cornea reflections disappearing from the cornea due to eye dryness or irregularity on the cornea surface;
• Temporary occlusion of the light source beams;
• Cornea reflections disappearing due to occlusion from the eyelids;
• Cornea reflections falling outside of the cornea due to the eye being oriented at a larger angle with respect to the camera.
In addition, while cornea reflections provide exclusively information about the center of the cornea position, a depth sensor provides a lot more useful information, such as the position of other eye and facial features (iris/pupil plane, eye lids, sclera, eye corners, etc.), and in case the user is wearing corrective glasses/spectacles it is possible to use the depth sensor to measure the position of said glasses on the user's face with respect to the eye, which can be used e.g. to compensate for the error in the measured eye position and distance due to the refractive effect of said glasses' lenses.
Also, glints are destructive elements of the image as they completely saturate the brightness of the pixels which they hit, so that glints may completely or partially occlude the pupil. Even in case of partial occlusions this results in less accurate detection and measurement of said pupil's contour and center.
Another important point is that, to achieve sufficiently accurate eye distance measurements from glints/cornea reflections, it is typically necessary to position the light sources (LEDs) far apart from each other and from the camera. This is because the distance measurement error grows quickly as the cornea reflections appear closer to each other in the image, to the point where it becomes impossible to measure any distance when they end up merging/fusing - this happens when the user moves away from the eye tracking device and it is a strong limiting factor in the maximum tracking range of an eye tracking device.
By the use of a depth sensor it is not necessary anymore to position light sources far away from the camera of the eye tracking device, so that it is possible to realize very small, compact and portable eye tracking devices with a large tracking range. A further advantage is the possibility to provide both the functionality of an eye tracking system as well as a depth camera/3D sensor in a single package, sharing the same hardware. The depth camera can then be used besides the eye tracker to provide additional functionality, for example to measure objects or a 3D space in front of the device, or as an additional input mechanism which detects for example hand gestures or body movements.
An obvious advantage is the reduced costs compared to a system which would simply integrate an independent eye tracker and a depth camera side by side. Another obvious benefit is that in many cases the eye tracker illumination and the depth camera illumination (for active sensor systems such as time of flight cameras or light- coding/structured illumination sensors) would normally interfere with each other making both systems' performance compromised or impossible; this would not happen with embodiments of this invention.
Summarized, the usage of a depth-sensor has the following benefits:
• Increased robustness since no corneal reflection needs to be visible in the user's eyes.
• Increased accuracy/precision of the results since the depth sensor adds additional information to the observed data and some of the problems of calculating the position from corneal reflections can be overcome.
• No need to use infrared spectrum anymore since no active illumination is used. Eye tracking can be done in any spectrum, e.g. visible light with high precision.
• Eye tracking device can be much smaller since no light source in a fixed distance to the camera is needed.
List of reference signs:
10 eye
12 eyeball
12a eyeball center
14 sclera
16 iris
16a pupil/iris center
18 pupil
20 cornea
20a cornea sphere
20b cornea center
22 optical axis
30 eye tracking device
32 3D capturing device
34 3D model
36 user
38 processing unit
40 2D image
42 camera
44 illumination unit
44a light emitter
44b light source
44c laser scanner
46 second processing unit
48 reference coordinate system
50 cornea reflection
52 bisection line
a angle
R cornea radius
P1 (x1 , y1 , z1 ); P2(x2, y2, z2) 3D points
P1 (x1 , y1 ); P2(x2, y2) 2D points


CLAIMS:
1. Eye tracking device (30) for determining a position of at least one predefined feature (12, 14, 16, 18, 20, 22) of at least one eye (10) of a user (36), characterized in that
the eye tracking device (30) comprises:
- a 3D capturing device (32) configured to capture depth data (34) of different regions of the at least one eye (10) of the user (36); and
- a processing unit (38) configured to determine the 3D position of the at least one predefined feature (12, 14, 16, 18, 20, 22) of the at least one eye (10) in dependency of information derived from the captured depth data (P1 (x1 , y1 , z1 ), P2(x2, y2, z2)).
2. Eye tracking device (30) according to claim 1 ,
characterized in that
the at least one predefined feature (12, 14, 16, 18, 20, 22) is at least one of
- iris (16);
- pupil (18);
- cornea (20);
- eyeball (12) and/or sclera (14).
3. Eye tracking device (30) according to one of the preceding claims,
characterized in that the processing unit (38) is configured to determine the 3D position of the at least one predefined feature (12, 14, 16, 18, 20, 22) as a 3D position of a center (12a, 16a, 20b) of the at least one feature (12, 14, 16, 18, 20, 22) and/or as a 3D position of a shape of the at least one feature (12, 14, 16, 18, 20, 22).
4. Eye tracking device (30) according to one of the preceding claims,
characterized in that
the 3D capturing device (32) is configured to determine a 3D model (34) of the captured eye (10) on the basis of the captured depth data (P1(x1, y1, z1), P2(x2, y2, z2)) and to provide the 3D model (34) to the processing unit (38).
5. Eye tracking device (30) according to one of the preceding claims,
characterized in that
the 3D capturing device (32) is configured to represent the captured depth data (P1(x1, y1, z1), P2(x2, y2, z2)) as a depth map and/or a point cloud, and especially to create a mesh model and/or a patch model and/or a CAD model from the depth data (P1(x1, y1, z1), P2(x2, y2, z2)).
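As an editorial illustration of the depth-map/point-cloud representation referred to in this claim (not part of the claims themselves), the following minimal sketch converts a depth map into a point cloud under an assumed pinhole camera model; the intrinsic parameters and the synthetic depth values are placeholders.

```python
# Sketch: converting a depth map into a point cloud with an assumed pinhole
# camera model.  fx, fy, cx, cy are placeholder intrinsics; depth is in metres.
import numpy as np

def depth_map_to_point_cloud(depth, fx, fy, cx, cy):
    """Return an (N, 3) array of 3D points; invalid (zero) depths are dropped."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx           # back-project pixel column to metric x
    y = (v - cy) * z / fy           # back-project pixel row to metric y
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]

# Usage with a synthetic 4x4 depth patch at roughly 0.6 m
depth = np.full((4, 4), 0.6)
cloud = depth_map_to_point_cloud(depth, fx=600.0, fy=600.0, cx=2.0, cy=2.0)
print(cloud.shape)                   # (16, 3)
```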
6. Eye tracking device (30) according to one of the preceding claims,
characterized in that
the processing unit (38) is configured to identify the depth data (P1(x1, y1, z1), P2(x2, y2, z2)) relating to the predefined feature (12, 14, 16, 18, 20, 22) on the basis of a predefined geometrical surface model of the predefined feature (12, 14, 16, 18, 20, 22), especially by searching for surface structures in the depth data (P1(x1, y1, z1), P2(x2, y2, z2)) fitting the predefined geometrical surface model.
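A minimal sketch of how such a surface-model search could look in practice (editorial illustration only): a sphere, standing in for a predefined geometrical surface model of the cornea, is fitted to depth points by linear least squares; the synthetic data and the assumed radius of about 7.8 mm are placeholders.

```python
# Sketch: fitting a sphere (the predefined surface model) to depth points.
import numpy as np

def fit_sphere(points):
    """Algebraic least-squares sphere fit; returns (center, radius)."""
    A = np.column_stack([2 * points, np.ones(len(points))])
    b = (points ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius

# Synthetic cornea-like samples: radius ~7.8 mm around an assumed center.
rng = np.random.default_rng(0)
true_center = np.array([0.01, -0.02, 0.55])
dirs = rng.normal(size=(200, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
points = true_center + 0.0078 * dirs + rng.normal(scale=1e-4, size=(200, 3))

center, radius = fit_sphere(points)
print(center, radius)                # close to true_center and 0.0078
```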
7. Eye tracking device (30) according to one of the preceding claims,
characterized in that
the 3D capturing device (32) is configured to capture at least one 2D image (40) of the at least one eye (10) of the user (36) and the processing unit (38) is configured to determine the 3D position of the at least one predefined feature (12, 14, 16, 18, 20, 22) of the at least one eye (10) additionally in dependency of information derived from the at least one 2D image (40).
8. Eye tracking device (30) according to claim 7,
characterized in that
the processing unit (38) is configured to identify the depth data (P1(x1, y1, z1), P2(x2, y2, z2)) relating to a specific part of the at least one eye (10) in dependency of the at least one 2D image (40).
9. Eye tracking device (30) according to one of the claims 7 to 8,
characterized in that
the processing unit (38) is configured to map the captured depth data (P1(x1, y1, z1), P2(x2, y2, z2)) of the different regions of the at least one eye (10) to corresponding regions (P1(x1, y1), P2(x2, y2)) of the eye (10) in the 2D image (40).
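As an editorial illustration of this mapping (not part of the claims), the sketch below projects depth points P(x, y, z) to pixel coordinates P(x, y) with an assumed pinhole model; it presumes the depth data are already expressed in the 2D camera's coordinate frame (otherwise an extrinsic transform would be applied first), and the intrinsics are placeholders.

```python
# Sketch: mapping 3D depth points to the corresponding 2D image regions.
import numpy as np

def project_to_image(points, fx, fy, cx, cy):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.column_stack([u, v])

points_3d = np.array([[0.01, -0.02, 0.55],
                      [0.02, -0.02, 0.56]])
pixels = project_to_image(points_3d, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(pixels)                        # pixel regions corresponding to the points
```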
10. Eye tracking device (30) according to one of the preceding claims,
characterized in that
the processing unit (38) is configured to determine the orientation of the at least one eye (10) and/or the gaze direction in dependency of the information derived from the depth data (P1(x1, y1, z1), P2(x2, y2, z2)), and in particular additionally in dependency of the information derived from the at least one 2D image (40).
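One possible, simplified realization (editorial illustration only): once an eyeball center and a cornea or pupil center have been estimated from the depth data, the optical axis can be taken as the unit vector through the two centers. The numerical values below are invented for the example.

```python
# Sketch: optical axis from two feature centers estimated from depth data.
import numpy as np

eyeball_center = np.array([0.010, -0.020, 0.562])   # assumed, e.g. from a sphere fit
cornea_center  = np.array([0.011, -0.019, 0.551])   # assumed, e.g. from a sphere fit

optical_axis = cornea_center - eyeball_center
optical_axis /= np.linalg.norm(optical_axis)        # unit direction of the optical axis
print(optical_axis)
```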
11. Eye tracking device (30) according to one of the preceding claims,
characterized in that
the processing unit (38) is configured to fit a predefined geometrical form to the predefined feature (12, 14, 16, 18, 20, 22) of the eye (10) in the 2D image (40) and/or to the depth data (P1(x1, y1, z1), P2(x2, y2, z2)) corresponding to the predefined feature (12, 14, 16, 18, 20, 22).
12. Eye tracking device (30) according to one of the claims 7 to 11,
characterized in that the eye tracking device (30) comprises at least one light source (45) configured to illuminate the eye (10) and to produce a reflection (50) on the eye (10) capturable by the 3D capturing device (32), wherein the processing unit (38) is configured to identify the reflection (50) in the 2D image (40) and to assign a depth value (z) to the identified reflection based on the depth data (P1(x1, y1, z1), P2(x2, y2, z2)) corresponding to the region (P1(x1, y1), P2(x2, y2)) of the eye (10) in which the reflection (50) was identified in the 2D image (40).
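A minimal sketch of such a depth lookup (editorial illustration, not part of the claims): the reflection is detected at a pixel of the 2D image, and a robust depth value is read from the registered depth map around that pixel. The window size, the detected pixel and the synthetic depth map are assumptions.

```python
# Sketch: assigning a depth value to a corneal reflection found in the 2D image.
import numpy as np

def depth_at_glint(depth_map, u, v, window=3):
    """Median of valid depth samples in a small window around the glint pixel."""
    h, w = depth_map.shape
    r = window // 2
    patch = depth_map[max(0, v - r):min(h, v + r + 1),
                      max(0, u - r):min(w, u + r + 1)]
    valid = patch[patch > 0]
    return float(np.median(valid)) if valid.size else None

depth_map = np.full((480, 640), 0.55)        # synthetic registered depth map
z = depth_at_glint(depth_map, u=321, v=244)  # glint assumed detected at (321, 244)
print(z)                                     # 0.55
```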
13. Eye tracking device (30) according to one of the preceding claims,
characterized in that
the eye tracking device (30) comprises at least one illumination unit (44) configured to illuminate the at least one eye (10) for capturing the depth data (P1(x1, y1, z1), P2(x2, y2, z2)).
14. Eye tracking device (30) according to one of the preceding claims,
characterized in that
the processing unit (38) is configured to determine whether light captured by the 3D capturing device (32) originated from the illumination unit (44) or the at least one light source (45), especially by means of a predefined illumination pattern and/or a predefined time multiplexing scheme and/or a predefined spectral multiplexing scheme.
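As an editorial illustration of one of the named options (not part of the claims), a trivial time-multiplexing sketch: alternating frames are attributed to the illumination unit and to the light source, so light captured in a given frame can be assigned to its emitter. The even/odd scheme and the function name are assumptions.

```python
# Sketch: a simple time-multiplexing scheme for attributing captured light
# to either the depth-sensing illumination unit or the eye-tracking light source.
def active_emitter(frame_index):
    """Return which emitter is assumed to be on for a given frame index."""
    return "illumination_unit" if frame_index % 2 == 0 else "light_source"

for i in range(4):
    print(i, active_emitter(i))      # 0/2 -> illumination_unit, 1/3 -> light_source
```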
15. Eye tracking device according to one of the claims 13 or 14,
characterized in that
the illumination unit (44) is configured to create a light pattern for capturing the depth data (P1(x1, y1, z1), P2(x2, y2, z2)) and the 3D capturing device (32) is configured to capture the 2D image (40) while the illumination unit (44) creates the light pattern, and the processing unit (38) is configured to decode the light pattern and/or to subtract the light pattern from the 2D image (40), especially to improve the signal-to-noise ratio of the eye tracking data.
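A minimal sketch of the pattern-subtraction idea (editorial illustration only): if the projected light pattern is known or can be predicted, subtracting it from the captured 2D image recovers the eye image for feature detection. The synthetic image and stripe pattern below are placeholders.

```python
# Sketch: subtracting a known structured light pattern from the captured image.
import numpy as np

rng = np.random.default_rng(1)
eye_image = rng.uniform(80, 120, size=(480, 640))          # scene/eye content
pattern   = 40.0 * (np.arange(640) % 8 < 4).astype(float)  # assumed stripe pattern
captured  = eye_image + pattern                            # image with pattern overlaid

cleaned = np.clip(captured - pattern, 0, 255)              # pattern removed
print(np.allclose(cleaned, eye_image))                     # True in this synthetic case
```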
16. Method for determining a position of at least one predefined feature (12, 14, 16, 18, 20, 22) of at least one eye (10) of a user (36) by means of an eye tracking device (30), characterized in that
the eye tracking device (30) comprises:
- a 3D capturing device (32) that captures depth data (P1(x1, y1, z1), P2(x2, y2, z2)) of different parts of the at least one eye (10) of the user (36); and
- a processing unit (38) that determines the 3D position of the at least one predefined feature (12, 14, 16, 18, 20, 22) of the at least one eye (10) in dependency of information derived from the captured depth data (P1(x1, y1, z1), P2(x2, y2, z2)).
PCT/EP2016/055190 2015-03-11 2016-03-10 Eye tracking using a depth sensor WO2016142489A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP15158679.9 2015-03-11
EP15158679 2015-03-11

Publications (1)

Publication Number Publication Date
WO2016142489A1 true WO2016142489A1 (en) 2016-09-15

Family

ID=52726968

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2016/055190 WO2016142489A1 (en) 2015-03-11 2016-03-10 Eye tracking using a depth sensor

Country Status (1)

Country Link
WO (1) WO2016142489A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019070405A1 (en) * 2017-10-02 2019-04-11 Facebook Technologies, Llc Eye tracking system using dense structured light patterns
CN110263657A (en) * 2019-05-24 2019-09-20 亿信科技发展有限公司 A kind of human eye method for tracing, device, system, equipment and storage medium
CN110458104A (en) * 2019-08-12 2019-11-15 广州小鹏汽车科技有限公司 The human eye sight direction of human eye sight detection system determines method and system
CN110472582A (en) * 2019-08-16 2019-11-19 腾讯科技(深圳)有限公司 3D face identification method, device and terminal based on eye recognition
WO2020068459A1 (en) * 2018-09-28 2020-04-02 Apple Inc. Sensor fusion eye tracking
CN111279353A (en) * 2017-11-16 2020-06-12 斯玛特艾公司 Detection of eye pose

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014209816A1 (en) 2013-06-25 2014-12-31 Microsoft Corporation Eye tracking via depth camera

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014209816A1 (en) 2013-06-25 2014-12-31 Microsoft Corporation Eye tracking via depth camera

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
BEYMER D ET AL: "Eye gaze tracking using an active stereo head", PROCEEDINGS / 2003 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 18 - 20 JUNE 2003, MADISON, WISCONSIN; [PROCEEDINGS OF THE IEEE COMPUTER CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION], LOS ALAMITOS, CALIF. [U.A, vol. 2, 18 June 2003 (2003-06-18), pages 451 - 458, XP010644705, ISBN: 978-0-7695-1900-5, DOI: 10.1109/CVPR.2003.1211502 *
XIONG ET AL.: "Eye gaze tracking using an RGBD camera: a comparison with a RGB solution", PROCEEDINGS OF THE 2014 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING: ADJUNCT PUBLICATION, pages 1113 - 1121
HANSEN D W ET AL: "In the Eye of the Beholder: A Survey of Models for Eyes and Gaze", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE COMPUTER SOCIETY, USA, vol. 32, no. 3, 1 March 2010 (2010-03-01), pages 478 - 500, XP011280658, ISSN: 0162-8828, DOI: 10.1109/TPAMI.2009.30 *
JAVIER OROZCO ET AL: "Real-time gaze tracking with appearance-based models", MACHINE VISION AND APPLICATIONS, SPRINGER, BERLIN, DE, vol. 20, no. 6, 4 April 2008 (2008-04-04), pages 353 - 364, XP019735644, ISSN: 1432-1769 *
LAI CHIH-CHUAN ET AL: "Hybrid Method for 3-D Gaze Tracking Using Glint and Contour Features", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 25, no. 1, 1 January 2015 (2015-01-01), pages 24 - 37, XP011569356, ISSN: 1051-8215, [retrieved on 20150106], DOI: 10.1109/TCSVT.2014.2329362 *
LI ET AL.: "SYSTEMS, SIGNALS AND DEVICES (SSD), 2012 9TH INTERNATIONAL MULTI-CONFERENCE ON", 20 March 2012, IEEE, article "Robust depth camera based multi-user eye tracking for autostereoscopic displays", pages: 1 - 6
MATSUMOTO Y ET AL: "An algorithm for real-time stereo vision implementation of head pose and gaze direction measurement", PROCEEDINGS / FOURTH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION : MARCH 28 - 30, 2000, GRENOBLE, FRANCE, IEEE COMPUTER SOCIETY, LOS ALAMITOS, CALIF. [U.A.], 28 March 2000 (2000-03-28), pages 499 - 504, XP010378305, ISBN: 978-0-7695-0580-0, DOI: 10.1109/AFGR.2000.840680 *
RUIGANG YANG ET AL: "Model-based head pose tracking with stereovision", AUTOMATIC FACE AND GESTURE RECOGNITION, 2002. PROCEEDINGS. FIFTH IEEE INTERNATIONAL CONFERENCE ON, 1 January 2002 (2002-01-01), Piscataway, NJ, USA, pages 255 - 260, XP055272914, ISBN: 978-0-7695-1602-8, DOI: 10.1109/AFGR.2002.1004163 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019070405A1 (en) * 2017-10-02 2019-04-11 Facebook Technologies, Llc Eye tracking system using dense structured light patterns
US10725537B2 (en) 2017-10-02 2020-07-28 Facebook Technologies, Llc Eye tracking system using dense structured light patterns
CN111279353A (en) * 2017-11-16 2020-06-12 斯玛特艾公司 Detection of eye pose
CN111279353B (en) * 2017-11-16 2023-10-24 斯玛特艾公司 Method and system for determining the pose of an eye
US11100323B2 (en) * 2017-11-16 2021-08-24 Smart Eye Ab Detection of a pose of an eye
US11170212B2 (en) 2018-09-28 2021-11-09 Apple Inc. Sensor fusion eye tracking
WO2020068459A1 (en) * 2018-09-28 2020-04-02 Apple Inc. Sensor fusion eye tracking
CN112753037A (en) * 2018-09-28 2021-05-04 苹果公司 Sensor fusion eye tracking
US11710350B2 (en) 2018-09-28 2023-07-25 Apple Inc. Sensor fusion eye tracking
CN110263657B (en) * 2019-05-24 2023-04-18 亿信科技发展有限公司 Human eye tracking method, device, system, equipment and storage medium
CN110263657A (en) * 2019-05-24 2019-09-20 亿信科技发展有限公司 A kind of human eye method for tracing, device, system, equipment and storage medium
CN110458104A (en) * 2019-08-12 2019-11-15 广州小鹏汽车科技有限公司 The human eye sight direction of human eye sight detection system determines method and system
CN110472582A (en) * 2019-08-16 2019-11-19 腾讯科技(深圳)有限公司 3D face identification method, device and terminal based on eye recognition
CN110472582B (en) * 2019-08-16 2023-07-21 腾讯科技(深圳)有限公司 3D face recognition method and device based on eye recognition and terminal

Similar Documents

Publication Publication Date Title
US7682026B2 (en) Eye location and gaze detection system and method
US10878237B2 (en) Systems and methods for performing eye gaze tracking
CN107004275B (en) Method and system for determining spatial coordinates of a 3D reconstruction of at least a part of a physical object
US7533988B2 (en) Eyeshot detection device using distance image sensor
WO2016142489A1 (en) Eye tracking using a depth sensor
JP2020034919A (en) Eye tracking using structured light
US6578962B1 (en) Calibration-free eye gaze tracking
KR101550474B1 (en) Method and device for finding and tracking pairs of eyes
JP6377863B2 (en) Enhancement of depth map representation by reflection map representation
US10552675B2 (en) Method and apparatus for eye detection from glints
WO2014021169A1 (en) Point of gaze detection device, point of gaze detection method, individual parameter computation device, individual parameter computation method, program, and computer-readable recording medium
US20160004303A1 (en) Eye gaze tracking system and method
KR101471488B1 (en) Device and Method for Tracking Sight Line
US20180336720A1 (en) Systems and Methods For Generating and Using Three-Dimensional Images
CN108354585B (en) Computer-implemented method for detecting corneal vertex
WO2005063114A1 (en) Sight-line detection method and device, and three- dimensional view-point measurement device
JP7030317B2 (en) Pupil detection device and pupil detection method
US11624907B2 (en) Method and device for eye metric acquisition
KR20110038568A (en) Apparatus and mehtod for tracking eye
CN108537103B (en) Living body face detection method and device based on pupil axis measurement
WO2015027289A1 (en) Method and apparatus for eye detection from glints
CN113260299A (en) System and method for eye tracking
WO2018164104A1 (en) Eye image processing device
US10485420B2 (en) Eye gaze tracking
KR20100090457A (en) Non-intrusive 3d face data acquisition system and method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16712255

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16712255

Country of ref document: EP

Kind code of ref document: A1