NL2004878C2 - System and method for detecting a person's direction of interest, such as a person's gaze direction.


Info

Publication number
NL2004878C2
NL2004878C2
Authority
NL
Netherlands
Prior art keywords
interest
processor
real time
person
determined
Prior art date
Application number
NL2004878A
Other languages
Dutch (nl)
Inventor
Vladimir Nedovic
Roberto Valenti
Original Assignee
Univ Amsterdam
Priority date
Filing date
Publication date
Application filed by Univ Amsterdam filed Critical Univ Amsterdam
Priority to NL2004878A priority Critical patent/NL2004878C2/en
Priority to PCT/NL2011/050423 priority patent/WO2012008827A1/en
Application granted granted Critical
Publication of NL2004878C2 publication Critical patent/NL2004878C2/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/013: Eye tracking input arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Description

SYSTEM AND METHOD FOR DETECTING A PERSON'S DIRECTION OF INTEREST, SUCH AS A PERSON'S GAZE DIRECTION
The invention relates to a system for detecting a person's direction of interest, such as a person's gaze direction.
The system can also be used to detect other visual directions of interest of a person, such as the direction of a person's head, the person's eye, the person's arm and/or finger pointing, or the person's whole body, or a combination thereof.
Visual gaze estimation is the process which determines the 3D line of sight of a person in order to analyze the location of interest. The estimation of the direction or the location of interest of a user is key for many applications, spanning from gaze-based human-computer interaction and advertisement [see: Smith, K., Ba, S.O., Odobez, J.M., Gatica-Perez, D.: Tracking the visual focus of attention for a varying number of wandering people. PAMI 30 (2008)] to human cognitive state analysis, attentive interfaces (e.g. a gaze-controlled mouse) and human behavior analysis. Gaze direction can also provide high-level semantic cues such as who is speaking to whom, information on non-verbal communication (e.g. interest, pointing with the head or the eyes) and the mental state or attention of a user (e.g. a driver).
Overall, visual gaze estimation is important for understanding someone's attention, motivation and intentions.
Typically, the pipeline of estimating visual gaze consists of two steps (see Figure 2): (1) analyze and transform pixel-based image features obtained from sensory devices into a higher-level representation (e.g. the position of the head or the location of the eyes), and (2) map these features to estimate the visual gaze vector (line of sight), hence finding the area of interest in the scene.
There is an abundance of research in the literature concerning the first component of the pipeline, which principally covers methods to estimate the head position and the eye location, as both are contributing factors to the final estimation of the visual gaze [see: Langton, S.R., Honeyman, H., Tessler, E.: The influence of head contour and nose angle on the perception of eye-gaze direction. Perception & Psychophysics 66 (2004)].
Nowadays, commercial eye gaze trackers are among the most successful visual gaze devices. However, to achieve good detection accuracy, they have the drawback of using intrusive or expensive sensors (pointed infrared cameras) which cannot be used in daylight and often limit the possible movement of the head, or require the user to wear the device [see: Bates, R., Istance, H., Oosthuizen, L., Majaranta, P.: Survey of de-facto standards in eye tracking. In: COGAIN Conf. on Comm. by Gaze Inter. (2005)]. Therefore, eye center locators based solely on appearance have recently been proposed [see: Cristinacce, D., Cootes, T., Scott, I.: A multi-stage approach to facial feature detection. In: BMVC. (2004) 277-286; Kroon, B., Boughorbel, S., Hanjalic, A.: Accurate eye localization in webcam content. In: FG. (2008); and Valenti, R., Gevers, T.: Accurate eye center location and tracking using isophote curvature. In: CVPR. (2008)], which reach reasonable accuracy for roughly estimating the area of attention on a screen in the second step of the pipeline.
A recent survey [Hansen, D.W., Ji, Q.: In the eye of the beholder: A survey of models for eyes and gaze. PAMI 32 (2010)] discusses the different methodologies to obtain the eye location information through video-based devices. Some of the methods can also be used to estimate the face location and the head pose in geometric head pose estimation methods. Other methods in this category track the appearance between video frames, or treat the problem as an image classification one, often interpolating the results between known poses. The survey in [Murphy-Chutorian, E., Trivedi, M.: Head pose estimation in computer vision: A survey. PAMI 31 (2009)] gives a good overview of appearance-based head pose estimation methods.
Once the correct features are determined using one of the methods and devices discussed above, the second step in gaze estimation (see Figure 2) is to map the obtained information to the 3D scene in front of the user. In eye gaze trackers, this is often achieved by direct mapping of the eye center position to the screen location. This requires the system to be calibrated and often limits the possible position of the user (e.g. using chinrests). In case of 3D visual gaze estimation, this often requires the intrinsic camera parameters to be known. Failure to correctly calibrate or comply with the restrictions of the gaze estimation device may result in wrong estimations of the gaze.
The invention aims at a more accurate, user-friendly and/or cheaper system for detecting a person's direction of interest.
To that end, the system comprises: a processor; at least one video camera connected to said processor for capturing video data; electronic memory connected to said processor; wherein said processor is arranged to determine in real time an interest vector of a person from said video data; wherein said processor is arranged to determine in real time a salient peak closest to the determined interest vector; wherein said processor is arranged to determine in real time a saliency-corrected interest vector between said person and said closest salient peak; wherein said processor is arranged to determine in real time the deviation between the determined interest vector and the determined saliency-corrected interest vector; wherein said processor is arranged to determine in real time further interest vectors of said person from said video data; and wherein said processor is arranged to calculate in real time recalibrated interest vectors by using a calibration error value calculated from said determined deviation.
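By way of illustration only (this is not the patented implementation, and all names below are hypothetical), the data flow just summarized can be sketched minimally in Python, with 2D fixation points standing in for interest vectors and the calibration error value taken, for simplicity, as the mean deviation:

```python
import numpy as np

def recalibrate(fixations, salient_peaks):
    """Hypothetical sketch of the claimed flow: estimated fixations are
    pulled to their nearest salient peak, the deviations yield a
    calibration error value, and that value recalibrates the estimates."""
    fixations = np.asarray(fixations, dtype=float)   # (N, 2) estimated gaze points
    peaks = np.asarray(salient_peaks, dtype=float)   # (M, 2) salient peak locations

    # Saliency-corrected fixation: the salient peak closest to each estimate.
    dists = np.linalg.norm(fixations[:, None, :] - peaks[None, :, :], axis=2)
    corrected = peaks[np.argmin(dists, axis=1)]

    # Deviations between determined and saliency-corrected fixations.
    deviations = corrected - fixations

    # Simplest possible calibration error value: the mean deviation.
    calibration_error = deviations.mean(axis=0)

    # Recalibrated fixations compensate for the estimated error.
    return fixations + calibration_error, calibration_error
```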
Preferably said processor is arranged to determine in real time the salient peaks closest to a multitude of determined interest vectors; said processor is arranged to determine in real time a multitude of saliency-corrected interest vectors between the person and said closest salient peaks; wherein said processor is arranged to determine in real time the deviations between the multitude of determined interest vectors and the multitude of saliency-corrected interest vectors; wherein said calibration error value is calculated from said multitude of determined deviations.
Preferably said processor is arranged to iterate in real time said process of calculating said calibration error value by replacing previously determined interest vectors with interest vectors which are corrected using a previous calibration error value, for calculating a current calibration error value.
Preferably said processor is arranged to calculate in real time said calibration error value by minimizing the difference between the multitude of determined deviations and said calibration error value, for instance by using a weighted least-squares error minimization method.
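A minimal sketch of such a minimization, under the assumption that the calibration error value is modeled as a constant 2D offset (names hypothetical): the offset minimizing the weighted squared differences to the deviations is simply their weighted mean.

```python
import numpy as np

def weighted_calibration_error(deviations, weights):
    """Sketch: the constant offset e minimizing sum_i w_i * ||d_i - e||^2
    is the weighted mean of the observed deviations d_i."""
    d = np.asarray(deviations, dtype=float)  # (N, 2) determined deviations
    w = np.asarray(weights, dtype=float)     # (N,) confidence weights
    return (w[:, None] * d).sum(axis=0) / w.sum()
```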
Preferably said salient peaks in the region around the determined interest vector are determined using saliency data about the area which the person is expected to look at, such as video data, screen capture data or manually input data, such as annotated saliency data.
In one preferred embodiment said processor is arranged to determine in real time salient peaks in the region around the determined interest vector from video data before determining said salient peak closest to the determined interest vector.
In a further preferred embodiment said system comprises at least two video cameras connected to said processor, one camera for capturing video data of a person's face and/or body, and one camera for capturing said video data.
In an alternative preferred embodiment the processor, electronic memory and said at least two video cameras are combined in one device. The device may for instance be a smartphone, having a video camera in the back aimed at an area of interest, and a webcam in the front aimed at the user's face and eyes. A smartphone with gaze detection capabilities is described in US 2010/0079508, wherein gaze detection is used to determine if a person is looking at the screen of the smartphone. By using the teaching of the current invention, the smartphone can be used to detect which objects behind the smartphone the person is looking at.
The invention furthermore relates to a method for detecting a person's interest direction, wherein a processor performs the steps of: determining in real time an interest vector of a person from video data captured by a video camera; determining in real time a salient peak closest to the determined interest vector; determining in real time a saliency-corrected interest vector between said person and said closest salient peak; determining in real time the deviation between the determined interest vector and the determined saliency-corrected interest vector; determining in real time further interest vectors of said person from said video data; and calculating in real time recalibrated interest vectors by using a calibration error value calculated from said determined deviation.
The invention also relates to a computer software program arranged to run on a processor to perform the steps of the method of the invention, a computer-readable data carrier comprising a computer software program arranged to run on a processor to perform the steps of the method of the invention, and a computer comprising a processor and electronic memory connected thereto loaded with a computer software program arranged to perform the steps of the method of the invention.
A preferred embodiment of the invention is described in more detail below with reference to the drawings, in which:
Figure 1 is a perspective view of a system in accordance with the invention; and
Figure 2 is a flow chart of the system in accordance with the invention.
According to figure 1, a system for detecting a person's gaze direction comprises a computer 1 with, amongst others, a processor unit, system memory and a hard drive, and a video camera 2 aimed at the face of a person 6, connected to, for instance, a USB port of the computer 1. A second camera behind the person, which is aimed at the area where the person 6 is looking, is also connected to a USB port of the computer 1. A software program is loaded from the hard drive into the system memory of the computer 1 in order to perform the steps of the gaze detection method.
An image 4 having several (salient) objects (in this example a car and its components) that may be of interest to the person is present in front of the person 6. Alternatively, the system may be used to determine the gaze direction in a physical environment where (salient) real-world objects of interest are present.
According to figure 2, a visual gaze vector can be resolved from a combination of body/head pose and eye location obtained from imaging device 2 in component I (box 10). As this is a rough estimation, the obtained gaze line 13 in component II (box 11) is then followed until an uncertain location in the gazed area. The area of interest, in this example obtained from imaging device 5, is analyzed in component III (box 12). In the proposed framework, the gaze vector 13 will be steered (arrow 14) to the most probable (salient) object which is close to the previously estimated point of interest. It has been shown that salient objects attract eye fixations [see: Spain, M., Perona, P.: Some objects are more equal than others: Measuring and predicting importance. In: ECCV. (2009); and Einhauser, W., Spain, M., Perona, P.: Objects predict fixations better than early saliency. J. Vis. 8 (2008) 1-26], and this property is extensively used in the literature to create saliency maps (probability maps which represent the likelihood of receiving an eye fixation) to automate the generation of fixation maps [see: Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: ICCV. (2009); and Peters, R.J., Iyer, A., Koch, C., Itti, L.: Components of bottom-up gaze allocation in natural scenes. J. Vis. 5 (2005) 692-692].
The prior art on saliency predicts where the interesting parts of the scene are, and thereby tries to predict where a person will look. However, now that accurate saliency algorithms are available [see: Valenti, R., Sebe, N., Gevers, T.: Image saliency by isocentric curvedness and color. In: ICCV. (2009); Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. PAMI 20 (1998) 1254-1259; and Ma, Y.F., Zhang, H.J.: Contrast-based image attention analysis by using fuzzy growing. In: ACM MM. (2003)], the invention proposes to reverse the problem by using saliency maps to aid the uncertain fixations.
In the system according to the invention, the gaze vector 13 obtained by an existing visual gaze estimation system is used to estimate the possible interest area in the scene.
The size of this area will depend on device capabilities and on the scenario. This area is evaluated for salient regions, and filtered so that salient regions which are far away from the center of interest will be less relevant for the final estimation. The obtained probability landscape is then explored to find the best candidate for the location of the adjusted fixation. This process is repeated for every estimated fixation in the image. After all the fixations and respective adjustments are obtained, the least-squares error between them is minimized in order to find the best transformation from the estimated set of fixations to the adjusted ones.
This transformation is then applied to the original fixations and future ones, in order to compensate for the found error. When a sequence of estimations is available, the obtained improvement is used to correct the previously erroneous estimates. The found error is used to adjust and recalibrate the gaze estimation devices at runtime, in order to improve future estimations. The method may be used to fix the shortcomings of low-quality monocular head and eye trackers, improving their overall accuracy.
Visual gaze estimators have inherent errors which may occur in each of the components of the visual gaze pipeline. From these errors the size of the area where interesting locations may be found can be derived. To this end, three errors which should be taken into account when estimating visual gaze (one for each of the components of the pipeline) can be identified: the device error, the calibration error and the foveating error. Depending on the scenario, the actual size of the area of interest will be computed by cumulating these three errors and mapping them to the distance of the gazed scene.
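As a hedged illustration of this cumulation (the pseudo code near the end of the description writes the projection as tan(d * total error); the sketch below uses the standard projection d * tan(total error), which is an assumption about the intended geometry; angular errors in radians, names hypothetical):

```python
import math

def area_of_interest_radius(device_error, calibration_error, foveating_error, d):
    """Sketch: cumulate the three angular error terms (radians) and
    project the total onto the gazed scene at distance d (any length unit)."""
    total_error = device_error + calibration_error + foveating_error
    return d * math.tan(total_error)
```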
The device error:
This error is attributed to the first component of the visual gaze estimation pipeline. As imaging devices are limited in resolution, there are a discrete number of states in which image features can be detected and recognized. The variables defining this error are often the maximum level of detail which the device can achieve while interpreting pixels as the location of the eye or the position of the head. Therefore, this error mainly depends on the scenario (e.g. the distance of the subject from the imaging device) and on the device that is being used.
The calibration error:
This error is attributed to the resolution of the visual gaze starting from the features extracted in the first component. Eye gaze trackers often use a mapping between the position of the eye and the corresponding locations on the screen. Therefore, the tracking system needs to be calibrated. In case the subject moves from his original location, this mapping will be inconsistent and the system may erroneously estimate the visual gaze. Chinrests are often required in these situations to limit the movements of the users to a minimum. Muscular distress, the length of the session and the tiredness of the subject may all influence the calibration error. As the calibration error cannot be known a priori, it cannot be modeled. Therefore, the aim is to estimate it, so that it can be compensated.
The foveating error:
As this error is associated with the new component proposed in the pipeline, the properties of the fovea need to be analyzed to define it. The fovea is the part of the retina responsible for accurate central vision in the direction in which it is pointed, and is necessary for any activity that requires a high level of visual detail. The human fovea has a diameter of about 1.0 mm with a high concentration of cone photoreceptors, which account for the high visual acuity capability. Through saccades (more than 10,000 per hour [see: Geisler, W.S., Banks, M.S.: Handbook of Optics, 2nd Ed. Volume I: Fundamentals, Techniques and Design. McGraw-Hill, Inc., New York, NY, USA (1995)]), the fovea is moved to the regions of interest, generating eye fixations. In fact, if the gazed object is large, the eyes constantly shift their gaze to subsequently bring images into the fovea. For this reason, fixations obtained by analyzing the location of the center of the cornea are widely used in the literature as an indication of the gaze and interest of the user.
However, it is generally assumed that the fixation obtained by analyzing the center of the cornea corresponds to the exact location of interest. While this is a valid assumption in most scenarios, the size of the fovea actually permits seeing the central two degrees of the visual field. For instance, when reading a text, humans do not fixate on each of the letters; a single fixation permits reading and seeing multiple words at once.
Another important aspect to be taken into account is the decrease in visual resolution as we move away from the center of the fovea. The fovea is surrounded by the parafovea belt, which extends up to 1.25 mm away from the center, followed by the perifovea (2.75 mm away), which in turn is surrounded by a larger area that delivers low-resolution information. Starting at the outskirts of the fovea, the density of receptors progressively decreases, hence the visual resolution decreases rapidly away from the foveal center [see: Rossi, E.A., Roorda, A.: The relationship between visual resolution and cone spacing in the human fovea. Nature Neuroscience 13 (2009)]. This is modeled by using a Gaussian kernel centered on the area of interest, with a standard deviation of a quarter of the estimated area of interest. In this way, areas which are close to the border of the area of interest are of lesser importance. In our model, we consider this region as the possible location for the interest point. As the area of interest is increased by the projection of the total error, the tail of the Gaussian of the area of interest will help to balance the importance of a fixation point against the distance from the original fixation point. As the point of interest could be anywhere in this limited area, the next step is to use saliency to extract potential fixation candidates.
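A minimal sketch of this foveal weighting, assuming a square area of interest whose saliency map is multiplied element-wise by a Gaussian with a standard deviation of a quarter of the area's size (names hypothetical):

```python
import numpy as np

def foveal_mask(size, sigma=None):
    """Sketch: Gaussian kernel centered on a size x size area of interest,
    standard deviation a quarter of the area's size, so saliency near the
    border weighs less than saliency near the estimated fixation."""
    sigma = size / 4.0 if sigma is None else sigma
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))

# Usage on a square saliency patch of the same size:
# masked_saliency = saliency_patch * foveal_mask(saliency_patch.shape[0])
```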
The saliency is evaluated on the interest area by using a customized version of the saliency framework proposed in [Valenti, R., Sebe, N., Gevers, T.: Image saliency by isocentric curvedness and color. In: ICCV. (2009)]. The framework uses isophote curvature to extract the displacement vectors, which indicate the center of the osculating circle at each point of the image. In Cartesian coordinates, the isophote curvature is defined as:

$$\kappa = -\frac{L_y^2 L_{xx} - 2 L_x L_{xy} L_y + L_x^2 L_{yy}}{\left(L_x^2 + L_y^2\right)^{3/2}}$$
where $L_x$ represents the first-order derivative of the luminance function in the x direction, $L_{xx}$ the second-order derivative in the x direction, and so on. The isophote curvature is used to estimate points which are closer to the center of the structure they belong to; therefore, the isophote curvature is inverted and multiplied by the gradient. The displacement coordinates D(x, y) to the estimated centers are then obtained by:

$$D(x, y) = -\frac{\{L_x, L_y\}\left(L_x^2 + L_y^2\right)}{L_y^2 L_{xx} - 2 L_x L_{xy} L_y + L_x^2 L_{yy}}$$
In this way, every pixel in the image gives an estimate of the potential structure it belongs to. To collect and reinforce this information and to deduce the location of the objects, the D(x, y)'s are mapped into an accumulator, weighted according to their local importance, defined as the amount of image curvature and color edges. The accumulator is then convolved with a Gaussian kernel so that each cluster of votes forms a single estimate. This clustering of votes in the accumulator gives an indication of where the centers of interesting or structured objects are in the image.
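A sketch of this voting scheme under stated assumptions (derivatives via Gaussian derivative filters at a single scale; local importance approximated by curvedness only, since the text below discards the color-edge term anyway; names hypothetical):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def isocenter_saliency(image, scale=2.0):
    """Sketch of the voting scheme described above: every pixel votes for
    the estimated center of its isophote, and the smoothed accumulator
    serves as the saliency map."""
    L = np.asarray(image, dtype=float)
    # Gaussian derivative filters; order=(row, col), so (0, 1) is d/dx.
    Lx = gaussian_filter(L, scale, order=(0, 1))
    Ly = gaussian_filter(L, scale, order=(1, 0))
    Lxx = gaussian_filter(L, scale, order=(0, 2))
    Lxy = gaussian_filter(L, scale, order=(1, 1))
    Lyy = gaussian_filter(L, scale, order=(2, 0))

    grad_sq = Lx**2 + Ly**2
    denom = Ly**2 * Lxx - 2.0 * Lx * Lxy * Ly + Lx**2 * Lyy
    flat = np.abs(denom) < 1e-9                 # no curvature: no vote
    safe = np.where(flat, 1.0, denom)

    # Displacement D(x, y) toward the center of the osculating circle.
    dx = -Lx * grad_sq / safe
    dy = -Ly * grad_sq / safe

    h, w = L.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx = np.rint(xs + dx).astype(int)
    cy = np.rint(ys + dy).astype(int)
    valid = ~flat & (cx >= 0) & (cx < w) & (cy >= 0) & (cy < h)

    # Votes weighted by local curvedness as a proxy for local importance.
    weight = np.sqrt(Lxx**2 + 2.0 * Lxy**2 + Lyy**2)
    acc = np.zeros_like(L)
    np.add.at(acc, (cy[valid], cx[valid]), weight[valid])

    # Smooth the accumulator so each cluster of votes forms one estimate.
    return gaussian_filter(acc, scale)
```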
In [Valenti, R., Sebe, N., Gevers, T.: Image saliency by isocentric curvedness and color. In: ICCV. (2009)], multiple scales are used. Here, since the scale is directly related to the size of the area of interest, the optimal scale can be determined once and then linked to the area of interest itself. Furthermore, in the abovementioned document the color and curvature information is added to the final saliency map, while here this information is discarded. The reasoning behind this choice is that this information is mainly useful to enhance objects on their edges, while the isocentric saliency is fit to locate the adjusted fixations closer to the centers of the objects rather than on their edges. While removing this information from the saliency map might reduce the overall response of salient objects in the scene, it brings the ability to use the saliency maps as smooth probability density functions.
Once the saliency of the area of interest is obtained, it is masked by the area of interest model as defined before. Hence, the Gaussian kernel in the middle of the area of interest will aid in suppressing saliency peaks in its outskirts. However, there may still be uncertainties about multiple optimal fixation candidates.
Therefore, a meanshift window with a size corresponding to the standard deviation of the Gaussian kernel is initialized on the location of the estimated fixation point (corresponding to the center of the area of interest). The meanshift algorithm then iterates from that point towards the point of highest energy. After convergence, the saliency peak in the area of interest which is closest to the center of the converged meanshift window is selected as the new (adjusted) fixation point. This process is repeated for all fixation points in an image, obtaining a set of corrections. An analysis of a number of these corrections holds information about the overall calibration error. This allows for estimation of the current calibration error of the gaze estimation system, which can thereafter be used to compensate it. The highest peaks in the saliency maps are used to align fixation points with the salient points discovered in the area of interest.
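A minimal meanshift sketch on the masked saliency map, assuming a square window and a simple center-of-mass update (names hypothetical):

```python
import numpy as np

def meanshift_fixation(saliency, start, window, max_iter=20, eps=0.5):
    """Sketch: slide a window from the estimated fixation (center of the
    area of interest) toward the point of highest energy on the masked
    saliency map; the nearby saliency peak becomes the adjusted fixation."""
    pos = np.asarray(start, dtype=float)        # (row, col) start position
    h, w = saliency.shape
    for _ in range(max_iter):
        r0, r1 = max(int(pos[0]) - window, 0), min(int(pos[0]) + window + 1, h)
        c0, c1 = max(int(pos[1]) - window, 0), min(int(pos[1]) + window + 1, w)
        patch = saliency[r0:r1, c0:c1]
        if patch.sum() <= 0:
            break                               # no energy to climb
        rows, cols = np.mgrid[r0:r1, c0:c1]
        new_pos = np.array([(rows * patch).sum(), (cols * patch).sum()]) / patch.sum()
        if np.linalg.norm(new_pos - pos) < eps: # shift below threshold: converged
            return new_pos
        pos = new_pos
    return pos
```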
A weighted least-squares error minimization between the estimated gaze locations and the corrected ones is performed. In this way, the affine transformation matrix T is derived. The weight is retrieved as the confidence of the adjustment, which considers both the distance from the original fixation and the saliency value sampled at the same location. The obtained transformation matrix T is thereafter applied to the original fixations to obtain the final fixation estimates. These new fixations should have minimized the calibration error.
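A sketch of this weighted least-squares fit, assuming fixations are 2D points in homogeneous coordinates and the weights express adjustment confidence (names hypothetical):

```python
import numpy as np

def fit_affine(estimated, adjusted, weights):
    """Sketch: weighted least-squares fit of an affine matrix T mapping
    estimated fixations onto their saliency-adjusted counterparts."""
    src = np.asarray(estimated, dtype=float)          # (N, 2)
    dst = np.asarray(adjusted, dtype=float)           # (N, 2)
    A = np.hstack([src, np.ones((len(src), 1))])      # homogeneous coordinates
    sw = np.sqrt(np.asarray(weights, dtype=float))[:, None]
    T, *_ = np.linalg.lstsq(sw * A, sw * dst, rcond=None)
    return T                                          # (3, 2)

def apply_affine(T, fixations):
    """Apply T to original (and future) fixations to compensate the error."""
    f = np.asarray(fixations, dtype=float)
    return np.hstack([f, np.ones((len(f), 1))]) @ T
```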
The pseudo code of the proposed system is as follows:

Initialize scenario parameters
  - Calculate the total error = foveating error + device error + calibration error
  - Calculate the size of the area of interest by projecting the total error at distance d as tan(d * total error)
for (each new fixation point p) do
  - Retrieve the estimated gaze point from the device
  - Extract the area of interest around the fixation p
  - Inspect the area of interest for salient objects
  - Filter the result by the Gaussian kernel
  - Initialize a meanshift window on the center of the area of interest
  while (maximum iterations not reached or Δp < threshold) do
    - climb the distribution to the point of maximum energy
  end while
  - Select the saliency peak closest to the center of the converged meanshift window as being the correct adjusted fixation
  - Store the original fixation and the adjusted fixation, with the weight w found at the same location on the saliency map
  - Calculate the weighted least-squares solution between all the stored points to derive the transformation matrix T
  - Transform all original fixations with the obtained transformation matrix
  - Use the transformation matrix T to compensate the calibration error in the device
end for
It will be appreciated by those skilled in the art that changes could be made to the embodiments described above without departing from the broad inventive concept thereof. It is understood, therefore, that this invention is not limited to the particular embodiments disclosed, but it is intended to cover modifications within the spirit and scope of the present invention as defined by the appended claims.

Claims (13)

1. A system for detecting the direction of interest of a person, such as the person's gaze direction, eye direction, head direction, body direction or finger-pointing direction, comprising: a processor; at least one video camera connected to the processor for capturing video data; electronic memory connected to the processor; wherein the processor is arranged to determine in real time an interest vector of a person from the video data; wherein the processor is arranged to determine in real time a salient peak closest to the interest vector; wherein the processor is arranged to determine in real time a saliency-corrected interest vector between the person and the closest salient peak; wherein the processor is arranged to determine in real time the deviation between the determined interest vector and the determined saliency-corrected interest vector; wherein the processor is arranged to determine in real time further interest vectors of the person from the video data; and wherein the processor is arranged to calculate in real time calibrated interest vectors by using an error value calculated from the determined deviation.

2. The system according to claim 1, wherein the processor is arranged to determine in real time the salient peaks closest to a multitude of determined interest vectors; the processor is arranged to determine in real time a multitude of saliency-corrected interest vectors between the person and the closest salient peaks; wherein the processor is arranged to determine in real time the deviations between the multitude of determined interest vectors and the multitude of saliency-corrected vectors; wherein the calibration error value is calculated from the multitude of determined deviations.

3. The system according to claim 1 or 2, wherein the processor is arranged to iterate in real time the process of calculating the calibration error value by replacing previously determined interest vectors with interest vectors that have been corrected using a previously determined calibration error value, for calculating a current calibration error value.

4. The system according to claim 2 or 3, wherein the processor is arranged to calculate in real time the calibration error value by minimizing the difference between the multitude of determined deviations and the calibration error value, for instance by using a weighted least-squares method.

5. The system according to any of claims 1-4, wherein the salient peaks are determined using saliency data about the area the person is expected to look at, such as video data, screen capture data or manually entered data.

6. The system according to any of claims 1-5, wherein the processor is arranged to determine in real time salient peaks in the region around the determined interest vector from the video data before determining the salient peak closest to the determined interest vector.

7. The system according to claim 6, wherein the system comprises at least two video cameras connected to the processor, one camera for capturing video data of a person's face and/or body, and one camera for capturing said video data.

8. The system according to claim 7, wherein the processor, the electronic memory and the at least two video cameras are combined in one device, for instance a smartphone.

9. The system according to any of claims 1-8, wherein the direction of interest is a gaze direction.

10. A method for detecting a person's direction of interest, wherein a processor performs the following steps: determining in real time an interest vector of a person from video data captured by a video camera; determining in real time a salient peak closest to the interest vector; determining in real time a saliency-corrected interest vector between the person and the closest salient peak; determining in real time the deviation between the determined interest vector and the determined saliency-corrected interest vector; determining in real time further interest vectors of the person from the video data; and calculating in real time calibrated interest vectors by using an error value calculated from the determined deviation.

11. A computer software program arranged to run on a processor to perform the steps of the method according to claim 10.

12. A computer-readable carrier comprising a computer software program arranged to run on a processor to perform the steps of the method according to claim 10.

13. A computer comprising a processor and electronic memory connected thereto, loaded with a computer software program arranged to perform the steps of the method according to claim 10.
NL2004878A 2010-06-11 2010-06-11 System and method for detecting a person's direction of interest, such as a person's gaze direction. NL2004878C2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
NL2004878A NL2004878C2 (en) 2010-06-11 2010-06-11 System and method for detecting a person's direction of interest, such as a person's gaze direction.
PCT/NL2011/050423 WO2012008827A1 (en) 2010-06-11 2011-06-10 System and method for detecting a person's direction of interest, such as a person's gaze direction

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NL2004878 2010-06-11
NL2004878A NL2004878C2 (en) 2010-06-11 2010-06-11 System and method for detecting a person's direction of interest, such as a person's gaze direction.

Publications (1)

Publication Number Publication Date
NL2004878C2 (en) 2011-12-13

Family

ID=43589565

Family Applications (1)

Application Number Title Priority Date Filing Date
NL2004878A NL2004878C2 (en) 2010-06-11 2010-06-11 System and method for detecting a person's direction of interest, such as a person's gaze direction.

Country Status (2)

Country Link
NL (1) NL2004878C2 (en)
WO (1) WO2012008827A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106415442A (en) 2014-05-08 2017-02-15 索尼公司 Portable electronic equipment and method of controlling a portable electronic equipment
US10248280B2 (en) 2015-08-18 2019-04-02 International Business Machines Corporation Controlling input to a plurality of computer windows

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100079508A1 (en) 2008-09-30 2010-04-01 Andrew Hodge Electronic devices with gaze detection capabilities

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Hillaire, S., Breton, G., Ouarti, N., Cozot, R., Lécuyer, A.: "Using a Visual Attention Model to Improve Gaze Tracking Systems in Interactive 3D Applications", Computer Graphics Forum, vol. 29, no. 6, 22 March 2010, p. 1830, XP002624749, DOI: 10.1111/j.1467-8659.2010.01651.x *
Itti, L., Koch, C., Niebur, E.: "A Model of Saliency-Based Visual Attention for Rapid Scene Analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, 1 November 1998, pp. 1254-1259, XP001203933, ISSN: 0162-8828, DOI: 10.1109/34.730558 *
Valenti, R., Sebe, N., Gevers, T.: "Image saliency by isocentric curvedness and color", 2009 IEEE 12th International Conference on Computer Vision (ICCV), 29 September 2009, pp. 2185-2192, XP031672570, ISBN: 978-1-4244-4420-5 *
Valenti, R., Gevers, T.: "Accurate eye center location and tracking using isophote curvature", IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), 23 June 2008, pp. 1-8, XP031297087, ISBN: 978-1-4244-2242-5 *
Sugano, Y. et al.: "Calibration-free gaze sensing using saliency maps", 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 13-18 June 2010, San Francisco, CA, USA, pp. 2667-2674, XP031725813, ISBN: 978-1-4244-6984-0 *

Also Published As

Publication number Publication date
WO2012008827A1 (en) 2012-01-19

Similar Documents

Publication Publication Date Title
US20180211104A1 (en) Method and device for target tracking
US10048749B2 (en) Gaze detection offset for gaze tracking models
JP6411510B2 (en) System and method for identifying faces in unconstrained media
US10109056B2 (en) Method for calibration free gaze tracking using low cost camera
US9075453B2 (en) Human eye controlled computer mouse interface
US9405364B2 (en) Method of determining reflections of light
CN110807427B (en) Sight tracking method and device, computer equipment and storage medium
US20160162673A1 (en) Technologies for learning body part geometry for use in biometric authentication
JP2016515242A (en) Method and apparatus for gazing point estimation without calibration
Valenti et al. What are you looking at? Improving visual gaze estimation by saliency
JP6822482B2 (en) Line-of-sight estimation device, line-of-sight estimation method, and program recording medium
WO2012126844A1 (en) Method and apparatus for gaze point mapping
JP5001930B2 (en) Motion recognition apparatus and method
KR101288447B1 (en) Gaze tracking apparatus, display apparatus and method therof
Valenti et al. Webcam-based visual gaze estimation
NL2004878C2 (en) System and method for detecting a person's direction of interest, such as a person's gaze direction.
Wu et al. NIR-based gaze tracking with fast pupil ellipse fitting for real-time wearable eye trackers
CN112396654A (en) Method and device for determining pose of tracking object in image tracking process
Strupczewski Commodity camera eye gaze tracking
EP2685351A1 (en) Method for calibration free gaze tracking using low cost camera
CN112114659A (en) Method and system for determining a fine point of regard for a user
Kim et al. Gaze tracking based on pupil estimation using multilayer perception
Huang et al. Robust feature extraction for non-contact gaze tracking with eyeglasses
Ratnayake et al. Head Movement Invariant Eye Tracking System
García-Dopico et al. Precise Non-Intrusive Real-Time Gaze Tracking System for Embedded Setups.