GB2384959A - Eye tracking using image data with model data - Google Patents

Eye tracking using image data with model data

Info

Publication number
GB2384959A
Authority
GB
United Kingdom
Prior art keywords
image
search
positions
dimensional
dimensional space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0229264A
Other versions
GB2384959B (en)
GB0229264D0 (en)
Inventor
Matthew David Ryan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Publication of GB0229264D0
Priority to US10/358,067 (US7197165B2)
Publication of GB2384959A
Application granted
Publication of GB2384959B
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 Eye tracking input arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04815 Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/164 Detection; Localisation; Normalisation using holistic features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/167 Detection; Localisation; Normalisation using comparisons between temporally consecutive images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

In a computer processing apparatus, frames of image data received from a camera 4 are processed to track the eyes of a user in each image. A 3D computer model of a head is stored, and search regions are defined in the 3D space corresponding to the eyes and eyebrows. For each image, pixels within the projection of the search regions from the 3D space to the 2D image space are sampled to determine a representative intensity value for each of the search regions. Positions for the eyes in the 3D space are then calculated based on the determined values. The 3D computer model and search bands are moved within the 3D space to align the eyes with the calculated eye positions. In this way, when the next image is processed, the search bands project into the image from a head configuration determined from the previous image.

Description

EYE TRACKING USING IMAGE DATA
The present invention relates to the field of eye tracking.
Tracking the eyes of a real-life person is important for many applications. In particular, it allows the person's gaze direction (and therefore what the person is looking at) to be determined.
The present invention is concerned with the processing of images showing the face of a user as the user moves his head, to track the movement of the user's eyes in an accurate and reliable manner.
According to the present invention, eye tracking is carried out using a stored three-dimensional (3D) computer model of at least the eyes and eyebrows of a head in a three-dimensional space. Search regions are defined in the three-dimensional space corresponding to the eyes and eyebrows. For each image in a sequence of images of a user's head, pixels within the projection of the predefined search regions from the three-dimensional space to the two-dimensional image are sampled to determine pixel image values (intensity, or one or more
colour component values, or hue, etc) for each search region. Positions for the eyes in the three-dimensional space are then calculated in dependence upon the determined pixel image values. The 3D computer model and search bands are then moved within the three-dimensional space to align the eyes with the calculated eye positions. In this way, when the next image in the sequence is processed, the search bands project into the image from a head configuration in three-dimensional space determined from the previous image. This facilitates reliable and accurate eye tracking for all of the images in the sequence.
In one embodiment, each search region is a two-dimensional area in the three-dimensional space, and positions for the eyes in the three-dimensional space are calculated on the basis that eyes and eyebrows are recorded as dark bands in the images relative to the other parts of the face, and the dark band caused by an eye will be below the dark band caused by an eyebrow.
In another embodiment, each search region is a line in the three-dimensional space, and positions for the eyes in the three-dimensional space are calculated by comparing the pixel image values for the search regions with prestored reference image values.
The present invention provides apparatus and methods for use in performing the processing, and computer program products for enabling a programmable apparatus to become operable to perform the processing.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which like reference numbers are used to designate like parts, and in which:
Figure 1 illustrates the recording of images of a user as the user changes the position and orientation of his head, the image data subsequently being processed to track the user's eyes;
Figures 2a and 2b schematically show the components of a first embodiment of the invention, together with the notional functional processing units into which the processing apparatus component may be thought of as being configured when programmed by programming instructions;
Figures 3a and 3b show the processing operations performed by the processing apparatus in Figure 2 to process input data;
Figure 4 schematically illustrates the transformation defined at step S3-2 in Figure 3;
Figure 5 illustrates the search regions and sample points defined at step S3-4 in Figure 3 for use in determining the y-axis positions of the user's eyes and eyebrows;
Figures 6a to 6f illustrate the search regions and sample points defined at step S3-4 in Figure 3 for use in determining the x-axis positions of the user's eyes and eyebrows;
Figure 7 shows the processing operations performed at step S3-14 in Figure 3;
Figures 8a and 8b show the processing operations performed at step S7-2 in Figure 7;
Figure 9 illustrates the processing to calculate a refined y-position at step S8-22 in Figure 8;
Figures 10a and 10b show the processing operations performed at step S7-4 in Figure 7;
Figure 11 shows the processing operations performed at step S3-18 in Figure 3;
Figure 12 schematically shows the components of a second embodiment of the invention, together with the notional functional processing units into which the processing apparatus component may be thought of as being configured when programmed by programming instructions;
Figures 13a and 13b show the processing operations performed by the processing apparatus in Figure 12;
Figures 14a and 14b illustrate the search bands and sample points defined at step S13-4 in Figure 13 for use in determining the y-axis positions of the user's eyes;
Figures 15a and 15b show the processing operations performed at step S13-5 in Figure 13;
Figures 16a to 16d illustrate the projection of keypoints and the calculation of sample points at steps S15-12 and S15-14 of Figure 15;
Figure 17 illustrates the reference intensity profiles generated by the processing at step S15-18 of Figure 15;
Figure 18 shows the processing operations performed at step S13-14 in Figure 13;
Figures 19a and 19b show the processing operations performed at step S15-2 in Figure 15; and
Figures 20a to 20e show examples to illustrate the processing performed in step S19-12 in Figure 19.
First Embodiment
Referring to Figure 1, in an embodiment of the invention, images of a user 2 are recorded by a single video camera 4, of conventional design, as the user 2 changes the position and orientation of his head, which, in this embodiment, occurs as the user operates a personal computer 6.
The digital image data recorded by camera 4 is input to the processing apparatus component 8 of the personal computer 6, which processes the data to track the eyes of the user 2 in a way which will be described below.
The processing apparatus component 8 of the personal computer 6 contains, in a conventional manner, one or more processors, memories, graphics cards, etc.
Referring to Figures 2a and 2b, the processing apparatus component 8 is programmed to operate in accordance with programming instructions input, for example, as instructions stored on a storage medium, such as disc 16, and/or as a signal 18 input to the processing apparatus 8, for example from a remote database, by transmission over a communication network (not shown) such as the Internet or by transmission through the atmosphere, and/or entered by the user via a user input device 10 such as a keyboard.
As will be described in more detail below, the programming instructions comprise instructions to cause the processing apparatus 8 to become configured to process input data defining a three-dimensional (3D) computer model of the head of the user 2, input data defining parameters of the camera 4, and image data recorded by camera 4 of the user 2, to track the eyes of the user 2 through each frame of image data received from camera 4. More particularly, search regions are defined in the three-dimensional co-ordinate space of the 3D head model to cover the eyes and eyebrows therein, and sample points are defined within each search region. For each
frame of image data received from camera 4, each sample point defined in the three-dimensional space of the 3D head model is projected from its current position (resulting from the processing of the previous frame of image data received from camera 4) into the 2D image space, and the value of the corresponding pixel is determined. An average is then calculated for all of the sample pixel values corresponding to the sample points in each search region. This defines a respective average intensity for each search band. Based on the positions of the intensity minima within the search regions, the positions of the user's eyes and eyebrows in the three-dimensional coordinate space of the head model are determined using the principle that the user's eyes and eyebrows will both generate dark bands (minima) in the image and the eye dark bands will always be below the eyebrow dark bands. Based on these positions, the 3D head model and associated search regions are moved in 3D space so that the positions of the eyes in the 3D model correspond to the positions determined from the 2D image recorded by camera 4. This processing is repeated for each frame of image data received from camera 4 to track the user's eyes in an accurate and reliable way.
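By way of illustration only, the averaging of sample pixel values and the selection of the eyebrow and eye bands might be sketched as follows. The function names, the list-based image representation and the rule of taking the two darkest local minima are assumptions made for this sketch; the detailed processing of the embodiment is described later with reference to Figures 7 to 10.

```python
from statistics import mean


def band_averages(image, projected_bands):
    """Average the pixel intensities sampled for each search band.

    image: 2D list of intensities indexed as image[row][col] (assumed layout).
    projected_bands: bands ordered from top to bottom, each a list of
    (col, row) pixel positions obtained by projecting the sample points.
    """
    return [mean(image[row][col] for col, row in band) for band in projected_bands]


def local_minima(values):
    """Indices of bands that are darker than the band above and no brighter than the band below."""
    inf = float("inf")
    found = []
    for i, value in enumerate(values):
        above = values[i - 1] if i > 0 else inf
        below = values[i + 1] if i + 1 < len(values) else inf
        if value < above and value <= below:
            found.append(i)
    return found


def eyebrow_and_eye_bands(averages):
    """Pick the two darkest minima; the upper one is taken as the eyebrow,
    the lower one as the eye. Returns None if fewer than two minima exist."""
    darkest = sorted(local_minima(averages), key=lambda i: averages[i])[:2]
    if len(darkest) < 2:
        return None
    eyebrow, eye = sorted(darkest)
    return eyebrow, eye
```

In this sketch the band index stands in for a y-axis position; a refined position within a band is outside its scope.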
When programmed by the programming instructions,
processing apparatus 8 can be thought of as being configured as a number of functional units for performing processing operations. Examples of such functional units and their interconnections are shown in Figures 2a and 2b. The units and interconnections illustrated in Figures 2a and 2b are, however, notional, and are shown for illustration purposes only to assist understanding; they do not necessarily represent units and connections into which the processor, memory etc of the processing apparatus 8 actually become configured.
Referring to the functional units shown in Figures 2a and 2b, a central controller 20 is arranged to process inputs from the user input devices 10 and to control the storage of data input by the user on a data storage medium, such as disc 12, or as a signal 14. Central controller 20 is also arranged to provide control and processing for the other functional units.
Working memory 22 is provided for use by central controller 20 and the other functional units.
Head model store 24 is configured to store data input to the processing apparatus 8, for example as data stored on storage medium 12 or as a signal 14, defining a 3D
computer model of the head of the user 2. The input data defining the 3D computer model of the user's head may be generated, for example, in a conventional manner by a laser scanner, 3D touch sensing probe, etc.
Camera parameter store 26 is configured to store data input to the processing apparatus 8, for example as data stored on a storage medium 12, as a signal 14, or using a user input device 10, defining parameters of the camera 4 which records the image data for processing. More particularly, in this embodiment, camera parameter store 26 is configured to store data defining a scale factor of the camera 4. In this embodiment, the image data for each image recorded by camera 4 is defined in a two-dimensional coordinate system having x and y axes parallel to the x and y axes of the three-dimensional imaging coordinate system of camera 4. The z-axis of the camera coordinate system is defined to lie along the optical axis of the camera 4, so that the origin of the camera coordinate system is at the optical centre of the camera 4. The scale factor stored in camera parameter store 26 is a scalar constant defining how the size of a unit-size object changes in the recorded image as the object moves in the z-axis direction of the camera coordinate system. Consequently, as will be understood
by the skilled person, the scale factor will depend upon a number of internal camera parameters and is determined by calibration in a conventional manner.
Image data store 28 is configured to store frames of image data received from camera 4. In this embodiment, image data store 28 has a capacity to store one frame of image data, which is then processed in real-time and discarded before the next frame of image data is received from camera 4.
Head tracker 30 is arranged to process the pixel data defining each image frame stored in image data store 28, to calculate a position for the centre of the 3D computer head model in three-dimensional space and positions for the eyes of the 3D computer head model in three-dimensional space. In addition, head tracker 30 is arranged to translate and rotate the 3D computer head model in three-dimensional space in accordance with the calculated positions. In this way, the position and orientation of the 3D computer head model is updated after the processing of each frame of image data.
In this embodiment, head tracker 30 comprises a position, z-orientation and size calculator 32, and a head model processor 34.
Position, z-orientation and size calculator 32 is arranged to process each frame of image data to identify pixels therein relating to the face of the user 2 and, in dependence upon the identified pixels, to determine the position and size of the user's head in the image, and the orientation of the user's head about an axis perpendicular to the 2D image data plane (which corresponds to an axis parallel to the z-axis of the three-dimensional coordinate space of the 3D computer head model in this embodiment).
Referring to Figure 2b, the head model processor 34 comprises a transformation calculator 36, a search parameter definer 38, an eye position calculator 40 and a rotator and translator 42.
Transformation calculator 36 is arranged to define a transformation equation to map points from the three-dimensional coordinate space in which the 3D head model of the user is defined into the two-dimensional image space in which a frame of image data received from camera 4 is defined.
Search parameter definer 38 is arranged to define search regions in the three-dimensional space of the 3D head model so as to cover specific parts of the 3D head model, which, as will be described in detail below, comprise the eyes and eyebrows of the 3D head model. In addition, search parameter definer 38 is arranged to define sample points within each search region for subsequent projection into the 2D coordinate space of a frame of image data received from camera 4.
Eye position calculator 40 is arranged to determine positions for the eyes of the 3D computer head model using each frame of image data received from camera 4. This is done by projecting the sample points defined by search parameter definer 38 from the three-dimensional space of the head model into the two-dimensional space of the image data, and processing the intensity values of the image pixels corresponding to the projected points. This processing will be described in detail below.
Rotator and translator 42 is arranged to rotate and translate the 3D computer head model within its three-dimensional coordinate system in dependence upon the results of the processing by position, z-orientation and
size calculator 32 and eye position calculator 40. In practice, this is done by changing the values of the rotation and translation components of the transformation equation defined by transformation calculator 36. Again, this processing will be described in detail below.
Referring again to Figure 2a, gaze direction calculator 44 is arranged to determine the user's eye orientation and to calculate the direction in which the user 2 is looking for each frame of image data based on the determined eye orientation and the position and orientation of the 3D computer head model after it has been rotated and translated by rotator and translator 42.
Data defining the calculated gaze direction may be processed further within processing apparatus 8 by an additional application (not shown) or may be output from the apparatus, for example as data on a data storage medium, such as disc 46, and/or as a signal 48. A recording of the output data may be made by recording the output signal 48 either directly or indirectly using recording apparatus (not shown). More particularly, the data may be output from processing apparatus 8 and stored directly on storage medium 46. Alternatively, the data output from processing apparatus 8 may be stored on a
"master" storage medium and then further storage media storing the code may be generated therefrom. In this way, an indirect recording is made of the data.
Similarly, the signal 48 carrying output data may be the direct output of processing apparatus 8 or a signal generated indirectly therefrom.
Referring now to Figure 3, the processing operations performed by the functional components within processing apparatus 8 in this embodiment will now be described.
At step S3-2, transformation calculator 36 defines a transformation equation to map points in the three-dimensional coordinate system in which the 3D computer head model of the user 2 is defined into points in the two-dimensional coordinate system in which each frame of image data received from camera 4 is defined. More particularly, in this embodiment, transformation calculator 36 defines a projective transformation equation in accordance with the camera parameters stored in camera parameter store 26. As explained previously, in this embodiment, the image data for each image recorded by camera 4 is defined in a two-dimensional coordinate system having x and y axes parallel to the x and y axes of the three-dimensional imaging coordinate
system of camera 4. The z-axis of the camera coordinate system is defined to lie along the optical axis of the camera 4, so that the origin of the camera coordinate system is at the optical centre of the camera 4. Thus, referring to Figure 4, a transformation equation is defined which maps points in the 3D head coordinate system "H" to points in the 2D image coordinate system "I" in accordance with the following equation:
where:
Pimage is a point in 2D image space;
Phead is a point in 3D head model space;
T is a translation matrix defining translation in the 3D coordinate system of the 3D computer head model;
Rx, Ry and Rz are rotations about the x, y and z axes respectively in the 3D coordinate system of the 3D computer head model; and
S is a scale matrix defined by S = (s/Pz) I, where s is the scale factor previously stored in camera parameter store 26, Pz is the z-coordinate of Phead and I is the identity matrix.
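The mapping from head model space to image space built from the quantities T, Rx, Ry, Rz and S defined above might be sketched as follows. This is an illustrative sketch only: the order in which the rotations and the translation are composed, the use of the z-coordinate after rotation and translation for Pz, and all function names are assumptions and are not taken from the transformation equation itself.

```python
import math


def rotate_x(p, angle):
    """Rotate point p = (x, y, z) about the x axis."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)


def rotate_y(p, angle):
    """Rotate point p about the y axis."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def rotate_z(p, angle):
    """Rotate point p about the z axis."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def project(p_head, translation, angles, scale_factor):
    """Map a 3D head-space point to a 2D image point.

    Assumed order: rotate about x, then y, then z, then translate, then
    scale x and y by scale_factor / z (the role played by S = (s/Pz) I).
    """
    angle_x, angle_y, angle_z = angles
    p = rotate_z(rotate_y(rotate_x(p_head, angle_x), angle_y), angle_z)
    x = p[0] + translation[0]
    y = p[1] + translation[1]
    z = p[2] + translation[2]
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    k = scale_factor / z
    return (k * x, k * y)
```

With no rotation, a point one unit to the right of the head origin placed two units along the optical axis maps to x = s/2, so doubling the distance halves its image offset, which is the behaviour attributed to the scale factor.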
At step S3-2, transformation calculator 36 defines initial values for T, Rz, Ry and Rx which map the 3D computer head model into 2D image space such that the head is at the centre of the image, is front-facing, and has a predetermined size. As will be explained below, T, Rz, Ry and Rx are updated to define movement of the 3D computer head model representing real world movement of the head of user 2 determined from image data received from camera 4.
Referring again to Figure 3, at step S3-4, search parameter definer 38 defines search regions within the three-dimensional coordinate system of the 3D computer head model together with a plurality of sample points within each search region. In this embodiment, search parameter definer 38 defines a set of search regions for use in calculating y-coordinates of the eyes and eyebrows in the three-dimensional coordinate system and a set of search regions for use in calculating x-coordinates of the eyes and eyebrows in the three-dimensional coordinate system.
Figures 5 and 6 show the search regions defined at step S3-4 by search parameter definer 38 in this embodiment.
Referring to Figure 5, the set of search regions defined by search parameter definer 38 within the three-dimensional coordinate system of the 3D computer head model for use in determining the y-axis positions of the head model's eyes and eyebrows in the 3D space is shown.
More particularly, search parameter definer 38 defines a left group 60 of search bands 80 and a right group 70 of search bands 80 arranged on either side of the head symmetry line (which, in this embodiment, is defined by the y-axis of the three-dimensional coordinate space).
In this embodiment, the search bands 80 within each group 60, 70 are adjacent and non-overlapping. Each search band 80 is defined relative to the 3D computer head model so that the search bands move with the head model when it is translated and rotated within the three-dimensional coordinate system.
Search parameter definer 38 defines the bands 80 within each group 60, 70 to lie between y-axis limits of Ymin and Ymax. These limits are defined to ensure that the search bands 80 within each group cover fully the left eye and left eyebrow and the right eye and right eyebrow, respectively, of the 3D computer head model.
The centre, Sl, of the left group 60 of search bands and the centre, Sr, of the right group 70 of search bands are defined as follows:
$$S_l = \frac{E_l + B_l}{2} \qquad (2)$$
$$S_r = \frac{E_r + B_r}{2} \qquad (3)$$
where El and Bl are the x, y-positions of the centres of the left eye and left eyebrow, respectively, in the 3D coordinate system of the head model, and Er and Br are the x, y-positions of the centres of the right eye and right eyebrow, respectively, in the 3D coordinate space.
Thus, the centre of each group 60, 70 is the mid-point in the y-axis direction between the centre of an eye and the corresponding eyebrow.
Each search band 80 is defined, in this embodiment, to be a rectangular region centred on a point lying on a perpendicular line between Ymax and Ymin which passes through the point Si or Sr. Accordingly, each search band lies in a plane perpendicular to the z-axis of the
3D coordinate system. In this embodiment, the z-coordinate value of the plane containing the search bands is set such that it is equal to the z-coordinate value of the eyes of the 3D computer head model (with the result that the plane passes through the 3D computer head model).
The width of each search band 80 in the x-axis direction is Rx. Search parameter definer 38 defines Rx to ensure that each search band 80 is sufficiently wide in the x-axis direction to enclose fully an eye or eyebrow.
The width of each search band 80 in the y-axis direction is Ry. Search parameter definer 38 defines Ry to ensure that each band is sufficiently wide to enclose an eye pupil or eyebrow, but is sufficiently narrow to ensure that each search band 80 cannot enclose both a part of an eye pupil and a part of an eyebrow. In this embodiment, Ymax, Ymin and Ry are such that there are 10 search bands 80 in each group 60, 70 as shown in Figure 5 (although a different number of search bands 80 could, of course, be set).
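As a minimal sketch of how adjacent, non-overlapping bands of height Ry might be stacked between Ymin and Ymax, the y-coordinates of the band centres can be generated as follows. The function name and the top-to-bottom ordering are assumptions made for illustration.

```python
def band_centres_y(y_min, y_max, band_height):
    """Return the y-coordinates of the band centres, ordered from top to bottom.

    Bands are adjacent and non-overlapping, so the number of bands is
    (y_max - y_min) / band_height and successive centres are band_height apart.
    """
    count = round((y_max - y_min) / band_height)
    return [y_max - band_height * (i + 0.5) for i in range(count)]
```

For example, limits ten band heights apart give the 10 bands per group used in this embodiment.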
Within each respective search band 80, search parameter definer 38 defines a plurality of sample points, "q" (shown for only some of the search bands 80 in Figure 5 but, in practice, defined for each search band 80). In this embodiment, the number of sample points "q" is defined to be the same for each search band 80, and the number for each search band is selected to be in the range 20 to 50 points, which are evenly spaced throughout the search band.
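One way of placing evenly spaced sample points within a rectangular search band is a regular grid, sketched below. The grid arrangement, the default of 10 columns by 3 rows (30 points, within the stated range of 20 to 50) and the function name are assumptions for illustration; the embodiment states only that the points are evenly spaced.

```python
def sample_points(centre, width, height, columns=10, rows=3):
    """Return columns * rows points on a regular grid inside a band.

    centre: (x, y) of the band centre in the plane of the search bands.
    width, height: band extents in the x-axis and y-axis directions (Rx, Ry).
    Points sit at the centres of equal grid cells, so none lies on the band edge.
    """
    cx, cy = centre
    left = cx - width / 2
    bottom = cy - height / 2
    return [
        (left + width * (i + 0.5) / columns, bottom + height * (j + 0.5) / rows)
        for j in range(rows)
        for i in range(columns)
    ]
```

Because the points are defined relative to the band centre, they move with the band when the head model is translated and rotated.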
Figures 6a to 6f show the set of search regions defined at step S3-4 by search parameter definer 38 within the three-dimensional coordinate system of the 3D computer head model for subsequent use in determining the x-axis positions of the head model's eyes and eyebrows in the three-dimensional space.
Referring to Figures 6a to 6f, search parameter definer 38 defines a plurality of groups 100, 110, 120, 130, 140 of search bands 90. Each group 100-140 comprises four search bands 90, which, in this embodiment, have the same dimensions Rx and Ry as the search bands 80 described above.
Each search band 90 lies in the same plane perpendicular to the z-axis as the search bands 80, and is defined relative to the 3D computer head model so that the search
bands move with the head model when it is translated and rotated within the three-dimensional coordinate system.
Within each group 100-140, the relative positions of the four search bands 90 are the same. More particularly, the centres of the top two search bands 90 are constrained to lie on a line YB parallel to the x-axis which passes through the centre of the eyebrows in the 3D computer head model. Similarly, the centres of the lower search bands 90 are constrained to lie on line YE parallel to the x-axis which passes through the centre of the eye pupils in the 3D computer head model. Thus, the distance between the centres of the upper and lower search bands is YB-YE, as shown in Figure 6a. The distance between the centres of the search bands 90 in the x-axis direction is defined, in this embodiment, to be equal to the distance in the x-axis direction between the points Sl and Sr, where Sl and Sr are defined by equations (2) and (3) above.
Search parameter definer 38 defines each group 100-140 of search bands so that each respective group 100-140 is displaced in the x-axis direction relative to all of the other groups 100-140, as shown in Figures 6b to 6f (although, as noted above, the y-axis positions of each
group 100-140 are the same). Accordingly, referring to Figures 6b to 6f, this can be thought of as sliding the group 100 parallel to the x-axis while keeping the relative positions of the search bands 90 within the group fixed to define new groups 110, 120, 130 and 140 at different positions in the x-axis direction.
Although only five groups 100-140 are shown in Figures 6b to 6f, in this embodiment, search parameter definer 38 defines eight groups of search bands evenly spaced between a minimum x-coordinate limit Xmin and a maximum x-coordinate limit Xmax (although a different number of groups could, of course, be set). Consequently, although not shown in Figures 6b to 6f, the search bands 90 from the different groups overlap each other.
The x-coordinate limits Xmin and Xmax are defined so that the eyes and eyebrows in the 3D computer head model lie fully between these limits. As shown in Figure 6b and Figure 6f, the outermost groups 100 and 140 are defined so that the centres of the outermost search bands 90 have x-axis coordinates corresponding to Xmin and Xmax, respectively.
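By way of illustration only, the placement of the groups described above might be sketched as follows. The function names, the argument `band_separation` (standing for the x-axis distance between band centres within a group) and the choice of measuring each group by its left-hand band centres are assumptions made for this sketch and do not appear in the description.

```python
def group_left_centres(x_min, x_max, band_separation, n_groups=8):
    """Return the x-coordinate of the left-hand band centres of each group.

    The first group's left-hand centres sit at x_min and the last group's
    right-hand centres sit at x_max, so the left-hand centres are spread
    evenly from x_min to x_max - band_separation.
    """
    if n_groups < 2:
        raise ValueError("need at least two groups")
    last = x_max - band_separation
    step = (last - x_min) / (n_groups - 1)
    return [x_min + i * step for i in range(n_groups)]


def group_band_centres(left_x, band_separation, y_brow, y_eye):
    """Centres of the four bands in one group: two on line YB, two on line YE."""
    right_x = left_x + band_separation
    return [(left_x, y_brow), (right_x, y_brow),
            (left_x, y_eye), (right_x, y_eye)]
```

Sliding a group along the x-axis then amounts to calling `group_band_centres` once for each value returned by `group_left_centres`, with the y-coordinates of lines YB and YE held fixed.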
Within each search band 90, search parameter definer 38 defines a plurality of sample points, "q", (shown for one search band 90 only in Figures 6b to 6f but, in practice, defined for each search band 90). In this embodiment, the number of sample points "q" is defined to be the same for each search band 90, and the number for each search band is selected to be in the range of 20 to 50 points, which are evenly spaced throughout the search band.
Referring again to Figure 3, at steps S3-6 to S3-22, processing apparatus 8 processes each frame of image data received from camera 4 to calculate the position, size and z-axis orientation of the head of the user 2 in each frame, to calculate eye positions in three-dimensional space for the 3D computer head model, and to translate and rotate the 3D computer head model of the user in accordance with the results of the calculations. In addition, in this embodiment, processing apparatus 8 calculates the user's gaze direction for each image frame in dependence upon the user's eyeball orientation and the translated and rotated 3D computer head model. This processing will now be described in detail. It should be noted, however, that steps S3-2 and S3-4 described above are performed only once and are not repeated as part of the processing for each frame of image data described below.
At step S3-6, the next frame of image data received from camera 4 is stored in image data store 28 as digital pixel data under the control of central controller 20.
At step S3-8, position, z-orientation and size calculator 32 processes the image data stored at step S3-6 to identify pixels therein relating to the head of the user 2, and to process the identified pixel data to determine: (i) the orientation of the user's head about an axis through the centre of the head perpendicular to the 2D image plane (which corresponds to an axis parallel to the z-axis of the 3D coordinate system in which the 3D computer head model is defined in this embodiment); (ii) the position of the centre of the user's head in the 2D image space; and (iii) the size of the user's head.
In this embodiment, the processing at step S3-8 is performed in a conventional manner, for example as described in "Computer Vision Face Tracking for Use in a Perceptual User Interface" by Bradski, Intel Technology Journal Q2 1998 available at http://developer.intel.com/technology/ITJ/Q21998/articles/art-2.htm.
At step S3-10, rotator and translator 42 changes the position of the 3D computer head model within its 3D coordinate system in accordance with the size and centre positions calculated at step S3-8. More particularly, rotator and translator 42 moves the 3D computer head model in the z-axis direction in accordance with the determined size, and moves the 3D computer head model in the xy plane in accordance with the determined centre position. This is achieved by changing the translation matrix T in equation (1) above.
At step S3-12, rotator and translator 42 rotates the 3D computer head model about an axis through the centre of the head parallel to the z-axis in the 3D model space coordinate system in accordance with the head orientation calculated at step S3-8. This is achieved by changing the rotation Rz in equation (1) above.
At steps S3-14 to S3-18, head model processor 34 performs processing to determine the positions for the eyes of the 3D computer head model in three-dimensional space using the image data stored in image data store 28, and to rotate the 3D computer head model about respective axes through the centre of the head parallel to the x-axis and y-axis in dependence upon the determined eye positions.
More particularly, at step S3-14, eye position calculator 40 processes the stored frame of image data to determine the positions of the eyes of the user 2 within the three-dimensional coordinate system of the 3D computer head model.
Figure 7 shows the processing operations performed by eye position calculator 40 at step S3-14.
Referring to Figure 7, at step S7-2, eye position calculator 40 performs processing to calculate the y-coordinate positions of the eyes and eyebrows of the user 2 within the three-dimensional coordinate system of the 3D computer head model.
Figure 8 shows the processing operations performed by eye position calculator 40 at step S7-2.
Referring to Figure 8, at step S8-2, eye position calculator 40 considers the next search band 80 (Figure 5) previously defined at step S3-4 for the y-coordinate eye and eyebrow calculations (this being the first such search band 80 the first time step S8-2 is performed).
At step S8-4, eye position calculator 40 projects the next sample point "q" defined at step S3-4 for the search band currently being considered from the three-dimensional coordinate system of the 3D computer head model to the two-dimensional coordinate system of the image data. More particularly, eye position calculator 40 projects the sample point "q" using the transformation equation previously defined at step S3-2 and updated at steps S3-10 and S3-12 (as well as step S3-18 if step S3-18 has previously been performed), which maps points from 3D model space to 2D image space.
It should be noted that the search band 80 selected at step S8-2 and the sample point "q" selected at step S8-4 have a position and orientation within the three-dimensional coordinate space of the 3D computer head model determined by the translation and rotation of the 3D computer head model at steps S3-10 and S3-12, as well as step S3-18 (described below) if step S3-18 has already been performed for a previous frame of image data. Accordingly, each search band 80 will map with representative distortion onto an area in the image corresponding to approximately the same position on the face in the image as the position of the search band relative to the 3D computer head model. This facilitates reliable tracking of the eyes of the user 2 through the sequence of the frames of image data received from camera 4.
At step S8-6, eye position calculator 40 reads an image value of the image pixel onto which the sample point "q" was projected at step S8-4. In this embodiment, eye position calculator 40 reads the grey-scale intensity value of the pixel.
At step S8-8, eye position calculator 40 determines whether there is another sample point "q" in the search band 80 currently being considered. Steps S8-4 to S8-8 are repeated until each sample point "q" in the search band has been processed in the way described above.
At step S8-10, eye position calculator 40 calculates the average of the pixel intensity values read at step S8-6 for the search band currently being considered, to give a single average value for the search band.
At step S8-12, eye position calculator 40 determines whether there is another search band 80 to be processed. Steps S8-2 to S8-12 are repeated until each search band 80 within the left group of bands 60 and within the right group of bands 70 (Figure 5) has been processed in the way described above. Consequently, as a result of this processing, eye position calculator 40 calculates a respective average pixel intensity value for each search band 80.
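By way of illustration only, the sampling loop of steps S8-4 to S8-10 for a single search band might be sketched as follows. The function name, the `project` callable (standing in for the transformation defined at step S3-2, whose exact form is not reproduced here), the nearest-pixel rounding and the treatment of points that fall outside the frame are all assumptions made for this sketch.

```python
def band_average(sample_points, project, image):
    """Average grey-scale intensity under the projected sample points.

    sample_points: iterable of (x, y, z) points in 3D model space.
    project: callable mapping an (x, y, z) point to (column, row) in the image.
    image: list of rows, each row a list of grey-scale intensity values.
    """
    total = 0.0
    count = 0
    for point in sample_points:
        col, row = project(point)
        col, row = int(round(col)), int(round(row))
        # Assumption: points projecting outside the frame are skipped; the
        # description does not say how such points are handled.
        if 0 <= row < len(image) and 0 <= col < len(image[0]):
            total += image[row][col]
            count += 1
    return total / count if count else None
```

Calling `band_average` once per search band 80 would give the respective average pixel intensity value for each band in the left group 60 and the right group 70.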
At step S8-14, eye position calculator 40 performs processing to spatially smooth the average intensities for the search bands 80 in the left group 60 and performs processing to spatially smooth the average intensities for the search bands 80 in the right group 70.
More particularly, in this embodiment, eye position calculator 40 performs the smoothing processing by convolving the average intensity values for the search bands in the left group with a rectangle function (of size 3 in this embodiment) and convolving the average pixel intensity values for the search bands in the right group with a rectangle function (again of size 3 in this embodiment). This spatial smoothing removes artefacts in the average intensities caused by the search bands 80 within the respective left and right groups 60, 70 being defined without an overlap.
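By way of illustration only, convolution with a rectangle function of size 3 might be sketched as the moving average below. The function name, the normalisation of the window to an average (rather than a sum) and the use of a shortened window at the two ends of the group are assumptions made for this sketch; the description does not specify either detail.

```python
def smooth_box3(values):
    """Smooth a sequence of band averages with a size-3 rectangle function."""
    smoothed = []
    n = len(values)
    for i in range(n):
        # Take the band and its immediate neighbours; the window is
        # truncated at the first and last bands (an assumption).
        window = values[max(0, i - 1):min(n, i + 2)]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Away from the two ends, every window has the same length, so normalising to an average scales all values equally and leaves the locations of the intensity minima there unchanged.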
As a result of the processing at steps S8-2 to S8-14, eye position calculator 40 has calculated values representative of the intensity of the pixels in the stored image data for search bands arranged in left and right groups in the y-axis direction of the three-dimensional coordinate space of the 3D computer head model and covering the eyes and eyebrows of the head.
At steps S8-16 to S8-22, eye position calculator 40 processes the calculated intensity values to determine the y-coordinate positions of the user's eyes in the three-dimensional coordinate space of the 3D computer head model based on the positions of the minima in the intensity values which represent dark regions on the user's face corresponding to his eyebrows and eyes.
More particularly, at step S8-16, eye position calculator 40 identifies minima within the smoothed intensity values of the left group 60 and minima within the smoothed intensity values of the right group 70, where a minima is defined to be a search band 80 having a smoothed average intensity value for which the smoothed average intensity values of the search bands on both sides of it are larger.
At step S8-18, eye position calculator 40 identifies the least two minima from those identified at step S8-16 for the left group 60 and also identifies the least two minima for the right group 70 (the least two minima in each case being the minima of lowest intensity value). The two identified least minima in each group correspond to the y-positions of the user's eyebrow and eye.
At step S8-20, using the principle that an eye of the user will always be below an eyebrow, eye position calculator 40 selects the left eye y-position in the three-dimensional coordinate system of the 3D computer head model as the y-coordinate of the centre of the search band 80 having the minima in the left group 60 identified at step S8-18 which has the lowest y-position.
In addition, eye position calculator 40 selects the left eyebrow y-position in the three-dimensional coordinate system of the 3D head model as the y-coordinate of the centre of the search band 80 having the minima in the left group 60 identified at step S8-18 having the highest y-position. Eye position calculator 40 selects the right eye y-position and right eyebrow y-position in the three-dimensional coordinate system of the 3D head model in the same way using the relative y-positions of the minima identified at step S8-18 in the right group 70.
At this stage in the processing, the y-positions of the left eyebrow, right eyebrow, left eye and right eye are those of the centre of the search bands 80 containing the minima identified at step S8-18 and selected at step S8-20.
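By way of illustration only, the selection performed at steps S8-16 to S8-20 for one group of search bands might be sketched as follows. The function names, the dictionary return value and the convention that a larger y-coordinate is higher on the face are assumptions made for this sketch.

```python
def local_minima(values):
    """Indices of bands whose value is lower than both neighbouring bands."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] > values[i] < values[i + 1]]


def eye_and_eyebrow_y(smoothed, band_centre_y):
    """Pick the eye and eyebrow y-positions for one group of bands.

    smoothed: smoothed average intensity per band.
    band_centre_y: y-coordinate of the centre of each band.
    Returns None when fewer than two minima are found.
    """
    minima = local_minima(smoothed)
    # Keep the two minima of lowest intensity (the darkest two bands).
    darkest = sorted(minima, key=lambda i: smoothed[i])[:2]
    if len(darkest) < 2:
        return None
    ys = sorted(band_centre_y[i] for i in darkest)
    # The eye is always below the eyebrow, so it takes the lower y-position.
    return {"eye": ys[0], "eyebrow": ys[1]}
```

The `None` return corresponds to the case, handled later by the verification processing, in which only one minima or no minima is identified in a group.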
To calculate more accurate y-position values, eye position calculator 40 performs additional processing in this embodiment at step S8-22.
More particularly, at step S8-22, eye position calculator 40 calculates a respective refined y-position for each of the user's left eyebrow, left eye, right eyebrow and right eye using a parabolic curve technique.
The processing performed at step S8-22 is illustrated in Figure 9.
Referring to Figure 9, eye position calculator 40 considers the smoothed averaged intensity values 150, 152 calculated at step S8-14 for the search bands 80 on each side of the search band 80 containing the minima 154 selected at step S8-20 having the y-coordinate value 156 which is to be refined. Eye position calculator 40 then defines a parabola 160 passing through the points 162 and 164 defined by coordinate pairs comprising an x-coordinate of the respective smoothed average intensity value and a y-coordinate of the centre of the search band 80 for which the smoothed average intensity value was calculated. Eye position calculator 40 then identifies the minima 170 of the parabola 160 and defines the y-coordinate value 180 of the parabola minima to be the refined y-coordinate for the eye or eyebrow for which the calculation has been performed. This processing is repeated for each of the left eyebrow, left eye, right eyebrow and right eye to calculate respective refined y-coordinates therefor.
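By way of illustration only, a parabolic refinement of this kind might be sketched using the standard three-point vertex formula. The sketch assumes that the parabola is fitted through the minima band and its two neighbouring bands, that the bands are equally spaced, and that the coarse position is kept when the three values do not form a proper minimum; the function and argument names are likewise assumptions.

```python
def refine_minimum(y_centre, spacing, i_below, i_min, i_above):
    """Refine a minimum using a parabola through three equally spaced bands.

    y_centre: y-coordinate of the centre of the band holding the minimum.
    spacing: distance between the centres of adjacent bands.
    i_below, i_min, i_above: smoothed intensities of the band at
    y_centre - spacing, the minimum band, and the band at y_centre + spacing.
    """
    curvature = i_below - 2.0 * i_min + i_above
    if curvature <= 0:
        # Not a proper minimum; fall back to the coarse position (assumption).
        return y_centre
    return y_centre + spacing * (i_below - i_above) / (2.0 * curvature)
```

For example, intensities 4, 1 and 2 at y = -1, 0 and 1 lie on the parabola 2y² - y + 1, whose vertex is at y = 0.25, so the refined position moves a quarter of a band towards the darker neighbour.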
Referring again to Figure 7, at step S7-4, eye position calculator 40 performs processing to calculate respective x-positions in the three-dimensional coordinate space of the 3D computer head model for the left eye, left eyebrow, right eye and right eyebrow.
Figure 10 shows the processing operations performed by eye position calculator 40 at step S7-4.
Referring to Figure 10, at step S10-2, eye position calculator 40 considers the next group 100-140 of search bands 90 (Figure 6) previously defined at step S3-4 for the calculation of x-coordinates (this being the first such group the first time step S10-2 is performed).
At step S10-4, eye position calculator 40 projects the next sample point "q" for the group 100-140 of search bands currently being considered from the three-dimensional coordinate space of the 3D computer head model to the two-dimensional coordinate space of the image data (this being the first such sample point "q" the first time step S10-4 is performed). This projection is carried out using the transformation equation previously defined at step S3-2 and updated at steps S3-10 and S3-12 (as well as step S3-18 if step S3-18 has previously been performed).
At step S10-6, eye position calculator 40 reads an image value of the pixel in the image data onto which the sample point "q" was projected at step S10-4. In this embodiment, eye position calculator 40 reads the grey-scale intensity value of the pixel.
At step S10-8, eye position calculator 40 determines whether there is another sample point "q" within the group of four search bands currently being considered, and steps S10-4 to S10-8 are repeated until each such sample point has been processed in the way described above.
At step S10-10, eye position calculator 40 calculates the average of the sampled pixel intensity values read at step S10-6 to give a single average intensity value for the group of four search bands 90 currently being considered.
At step S10-12, eye position calculator 40 determines whether there is another group 100-140 of four search bands 90 to be processed. Steps S10-2 to S10-12 are repeated until each group of four search bands has been processed in the way described above.
At step S10-14, eye position calculator 40 identifies minima within the average intensity values calculated at step S10-10 (a minima being defined to be a value where the average intensity values for the group of four search bands on both sides are larger).
Eye position calculator 40 then selects the smallest minima (that is, the average intensity minima having the lowest value). The four centres of the search bands 90 for the group having the selected smallest minima define the x-coordinates in the three-dimensional coordinate system of the user's eyes and eyebrows. This is because the eyes and eyebrows produce dark regions in the image data received from camera 4 and these regions are identified by selecting the smallest minima at step S10-14.
At step S10-16, eye position calculator 40 calculates a refined x-coordinate value of the user's eyes and eyebrows in the three-dimensional coordinate space of the head model. This processing is performed using the parabola technique described previously with reference to step S8-22. However, the processing is carried out once to calculate a refinement applicable to the x-coordinates of both eyes and both eyebrows, rather than carrying out separate processing to calculate a respective refinement for each of the eyes and eyebrows.
This is because the x-coordinate values of the eyes and eyebrows determined at step S10-14 are all derived from a single group 100-140 of four search bands.
As a result of the processing described above with reference to Figures 8 to 10, eye position calculator 40 has calculated a respective x, y coordinate pair defining the centre of each of the user's eyes and eyebrows in the three-dimensional coordinate space of the 3D computer head model.
Referring again to Figure 3, at step S3-16, eye position calculator 40 performs processing to verify the x, y positions calculated at step S3-14 and to amend the positions should it be determined that this is necessary.
More particularly, eye position calculator 40 determines whether an eye and eyebrow have been identified for the left-hand and right-hand side of the user's face (because it may be the case that the processing at step S8-16 identifies only one minima, or no minima at all, in each of the left and right groups 60, 70 of search bands 80), compares the relative calculated positions of the eyes and eyebrows in the three-dimensional space of the 3D computer head model, and performs the verification and amendment processing in accordance with the following rules:
1. If both the eye and eyebrow have been identified for both the left and right sides of the face, then:
   (a) If the identified eye positions lie on a line perpendicular to the y-axis in the 3D coordinate system of the 3D computer head model, then the eye positions are considered to be accurate and no amendment is necessary; else
   (b) If any of the identified feature positions lie on a line perpendicular to the y-axis in the 3D coordinate system of the 3D computer head model (for example, left eye and right eyebrow), then these are determined to be the eye positions, and the eye position coordinates are amended accordingly.
2. If only one feature (eye or eyebrow) has been identified on each of the left and right sides of the face, then:
   (a) If the positions of the features lie on a line perpendicular to the y-axis in the 3D coordinate system of the 3D computer head model, then these are set to be the positions of the user's eyes; else
   (b) The position of the feature having the lowest y-coordinate value is set to be the position of the first eye, and the second eye is defined to be at a position having the same y-coordinate value as the first eye and having an x-coordinate value the same as that originally calculated for the feature identified on the same side of the face as the second eye.
3. If only one feature (eye or eyebrow) has been identified on one side of the face and two features have been identified on the other side of the face, then the positions of the eyes are set to be the positions of the features on opposite sides of the face for which the straight line therebetween is closest to parallel to the x-axis.
4. If only one feature (eye or eyebrow) has been identified on one side of the face and no feature has been identified on the other side of the face, then the position of the identified feature is defined to be the position of the first eye, and the position of the second eye is defined to have the same y-coordinate value and an equal and opposite x-coordinate value.
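By way of illustration only, rules 3 and 4 above might be sketched as follows; rules 1 and 2 are omitted for brevity. The function names, the representation of each feature as an (x, y) pair and the use of the angle between the connecting line and the x-axis as the measure of "closest to parallel" are assumptions made for this sketch. Rule 4 is sketched on the assumption that the head symmetry line lies at x = 0.

```python
import math


def rule3_pick_eyes(single_feature, two_features):
    """Rule 3: pair the lone feature with whichever opposite-side feature
    gives a connecting line closest to parallel to the x-axis."""
    def tilt(p, q):
        # Angle between the line p-q and the x-axis, in the range 0..pi/2.
        return math.atan2(abs(q[1] - p[1]), abs(q[0] - p[0]))
    partner = min(two_features, key=lambda q: tilt(single_feature, q))
    return single_feature, partner


def rule4_mirror(feature):
    """Rule 4: the second eye takes the same y and the opposite x."""
    x, y = feature
    return (x, y), (-x, y)
```

For instance, a lone feature at (-3.0, 1.0) facing candidates at (3.0, 1.2) and (3.0, 4.0) would be paired with (3.0, 1.2), since that line is the more nearly horizontal of the two.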
At step S3-18, head model processor 34 rotates the 3D computer head model within its three-dimensional coordinate system in accordance with the eye positions calculated at steps S3-14 and S3-16. The eyebrow positions calculated at steps S3-14 and S3-16 are not used in this embodiment to rotate the 3D computer head model at step S3-18 because the user can move his eyebrows on his head, and it has been found that this can lead to the 3D computer head model being rotated inaccurately if the eyebrow positions are used at step S3-18 as well as, or instead of, the eye positions.
Figure 11 shows the processing operations performed at step S3-18.
Referring to Figure 11, at step S11-2, if the calculated eye positions were changed during the verification processing at step S3-16, then rotator and translator 42 rotates the 3D computer head model about an axis parallel to the z-axis of the three-dimensional coordinate system in accordance with the change in the calculated eye positions. More particularly, rotator and translator 42 rotates the 3D head model about an axis through the centre of the head and parallel to the z-axis so that the symmetry plane of the 3D computer head model (that is, the notional "vertical" plane through the centre of the head model which divides the head model into left-hand and right-hand sides) is perpendicular to the straight line connecting the amended eye positions. This is achieved by changing the rotation Rz in equation (1) above.
At step S11-4, head model processor 34 projects (using the transformation defined at step S3-2 and subsequently updated as described above) the eye positions calculated as a result of the processing at step S3-14 (and amended, if necessary, during the processing at step S3-16) from the three-dimensional coordinate system of the 3D computer head model into the 2D coordinate system of the frame of image data stored in image data store 28 and currently being processed. These projected positions define the calculated eye positions in 2D image space.
At step S11-6, head model processor 34 performs processing to determine the rotation of the 3D computer head model about an axis through the centre of the head model parallel to the x-axis (Rx in equation (1)) and an axis through the centre of the head model parallel to the y-axis (Ry in equation (1)) of the three-dimensional coordinate system which orientates the head so that the eyes thereof project to the same positions in the 2D image space as the calculated eye positions generated at step S11-4.
More particularly, in this embodiment, head model processor 34 projects the current eye positions (that is, the positions of the eyes in the 3D computer head model in its current orientation RxRy) from the three-dimensional coordinate system of the 3D computer head model into the 2D coordinate system of the image data. Each point is projected using the transformation previously calculated at step S3-2 and subsequently updated as described above.
Head model processor 34 then iteratively rotates the 3D computer head model in its three-dimensional space about axes through the centre of the head parallel to the x-axis and y-axis (by iteratively changing Rx and Ry in equation (1)) to minimise the error between the projected calculated eye positions and the projected current eye positions in 2D image space. In this embodiment, this processing is carried out using a spring-mass-damper type numerical technique.
More particularly, head model processor 34 performs the following operations a predetermined number of times (set to 8 times in this embodiment):

CALCULATE $\dot{v}_n = \frac{K}{m}(E_l + E_r) - \frac{B}{m} v_{n-1}$
CALCULATE $v_n = v_{n-1} + \dot{v}_n \, \Delta t$
ROTATE $\theta = \theta + V$ (applied about each rotation axis using the components $V_x$ and $V_y$)
REPROJECT CURRENT EYE POSITIONS INTO 2D IMAGE SPACE

where:
"K" is a spring constant (set to 0.002 in this embodiment).
"B" is a damper constant (set to 2 in this embodiment).
"m" is a mass (set to 1 in this embodiment).
$E_l$ is the error vector between the projected calculated position and projected current position of the left eye.
$E_r$ is the error vector between the projected calculated position and projected current position of the right eye.
"n" indicates a value for the current iteration.
"n-1" indicates a value for the previous iteration ($v_{n-1}$ being set to 0 for n=1).
$\Delta t$ is a constant (set to 0.1 in this embodiment).
$V_x$ is the x component of $v_n$; $V_y$ is the y component of $v_n$.
As a result of this processing, the 3D computer head model is rotated about axes through the centre of the head parallel to the x-axis and y-axis of the three-dimensional coordinate system to minimise the error vectors in a stable manner. That is, values of Rx and Ry in equation (1) are calculated which give a minimum error.
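By way of illustration only, a spring-mass-damper iteration of this general type might be sketched as follows, using the constants given for this embodiment (K = 0.002, B = 2, m = 1, time step 0.1, 8 iterations). The function name, the `error_fn` callable (which stands in for reprojecting the current eye positions and summing the left-eye and right-eye error vectors) and, in particular, the pairing of the x error component with rotation about the y-axis and the y error component with rotation about the x-axis are assumptions made for this sketch.

```python
def settle_rotation(theta_x, theta_y, error_fn,
                    k=0.002, b=2.0, m=1.0, dt=0.1, iterations=8):
    """Nudge the head rotation towards the pose that minimises eye error.

    error_fn(theta_x, theta_y) returns the summed 2D error vector (ex, ey)
    between the calculated and current projected eye positions.
    """
    vx = vy = 0.0  # the previous velocity is zero on the first iteration
    for _ in range(iterations):
        ex, ey = error_fn(theta_x, theta_y)  # reproject and measure error
        # Spring term pulls towards zero error; damper term resists velocity.
        ax = (k / m) * ex - (b / m) * vx
        ay = (k / m) * ey - (b / m) * vy
        vx += ax * dt
        vy += ay * dt
        # Assumption: horizontal error drives rotation about the y-axis and
        # vertical error drives rotation about the x-axis.
        theta_y += vx
        theta_x += vy
    return theta_x, theta_y
```

Because the damper term grows with the velocity, each update is bounded and the rotation approaches the minimum-error pose gradually over successive iterations and frames rather than overshooting it.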
Referring again to Figure 3, as a result of the processing at step S3-18, head model processor 34 has changed the orientation of the 3D computer head model defined by the data stored in head model store 24 so that its orientation corresponds to the orientation of the head of user 2 determined from the image data received from camera 4. In other words, the values of T, Rx, Ry and Rz in equation (1) define the position and orientation of the 3D computer head model calculated from the current frame of image data.
At step S3-20, gaze direction calculator 44 determines the direction in which the user 2 is looking. In this embodiment, gaze direction calculator 44 performs the calculation by processing the image data stored in image data store 28 to determine the positions of the irises and pupils of the eyes of the user 2 relative to the rest of the eyes, and then uses the calculated positions together with the determined position and orientation of the 3D computer head model (determined at steps S3-10, S3-12 and S3-18) to calculate the direction in which the user 2 is looking.
The processing to calculate the positions of the irises and pupils within the eyes of the user 2 is carried out by gaze direction calculator 44 in a conventional manner, for example by extracting pixels from the image data relating to the eyes, scaling the pixel data to give an image of each eye of a predetermined size (so that an image is obtained at the same size regardless of the distance of the user 2 from the camera 4), and processing the generated image of each eye to identify a dark circle surrounded by a white region, and the position of the dark circle relative to the white region.
As a result of performing the processing described above, processing apparatus 8 has calculated the gaze direction of the user 2 for the current frame of the image data received from camera 4.
At step S3-22, central controller 20 determines whether another frame of image data has been received from camera 4. Steps S3-6 to S3-22 are repeated for each frame of image data received from the camera 4.
Second Embodiment
A second embodiment of the invention will now be described.
In the first embodiment, search bands 80 are defined in the three-dimensional coordinate space of the 3D head model covering regions corresponding to the left and right eyes and eyebrows. For each frame of image data received from camera 4, sample points q for each search band 80 are projected from the three-dimensional space of the 3D computer head model into the two-dimensional space of the image, and the intensity value of each pixel onto which a sample point projects is read. An average intensity value for each search band 80 is then calculated from the intensity values read for the sample points within the search band. The positions of the user's eyes and eyebrows in the three-dimensional coordinate space of the head model are then determined in dependence upon the positions of intensity minima within the left group 60 of search bands 80 and right group 70 of search bands 80.
In the second embodiment, on the other hand, the programming instructions input on storage medium 16 and/or signal 18 comprise instructions to cause the processing apparatus 8 to become configured to define search bands in the three-dimensional coordinate space of the 3D computer head model such that each band comprises a line (instead of an area having a width Ry as in the first embodiment) with sample points q defined on the line. For each frame of image data received from camera 4, the intensity value of each pixel in the image data corresponding to a sample point q is read and an average intensity value is calculated for each search band in the three-dimensional coordinate space of the 3D computer head model. As a result, an intensity profile is generated as a function of position over the left and right eyes and eyebrows in the three-dimensional coordinate space of the 3D head model. Each generated profile is compared with a pre-generated intensity profile for the user 2 to calculate a respective Y-coordinate value in the three-dimensional coordinate space of the 3D computer head model for the left eye and the right eye of the 3D computer head model.
Referring to Figure 12, when programmed by the programming instructions, the functional units in the second embodiment are the same as those in the first embodiment, with the exception that the functional units further comprise a reference intensity profile generator 35 operable to calculate and store a reference intensity profile for the left eye and left eyebrow and a reference intensity profile for the right eye and right eyebrow of each user of the apparatus (these being the reference profiles against which profiles determined for each frame of image data during eye tracking are compared). In addition, the processing operations performed by search parameter definer 38 and eye position calculator 40 are different to the processing operations in the first embodiment. These differences will be described below.
Figure 13 shows the processing operations performed by the functional components within processing apparatus 8 in the second embodiment.
Referring to Figure 13, the processing operations performed at steps S13-2, S13-6 to S13-12, and S13-18 to S13-22 are the same as the processing operations performed at steps S3-2, S3-6 to S3-12, and S3-18 to S3-22 in the first embodiment. Accordingly, these processing operations will not be described again here.
On the other hand, the processing operations performed at steps S13-4, S13-5 and S13-14 are different to the processing operations performed in the first embodiment.
Accordingly, these processing operations will now be described in detail.
The processing by search parameter definer 38 at step S13-4 will be described first.
Search parameter definer 38 defines search bands for use in calculating Y-coordinates of the eyes of the 3D computer head model and search bands for use in calculating X-coordinates of the eyes of the 3D computer head model. The search bands defined by search parameter definer 38 at step S13-4 for use in calculating the X-coordinates of the eyes of the 3D computer model are the same as those defined in the first embodiment and described above with reference to Figures 6a to 6f.
However, the search bands defined for use in calculating the Y-coordinates of the eyes of the 3D computer head model are different to those in the first embodiment.
Figure 14a shows the search bands for use in calculating the Y-coordinates of the eyes of the 3D computer head model defined by search parameter definer 38 at step S13-4 in the second embodiment.
Referring to Figure 14a, search parameter definer 38 defines a left group 60 of search bands 200 and a right group 70 of search bands 200 arranged on either side of the head symmetry line (which, in this embodiment, is defined by the Y-axis of the three-dimensional coordinate space, as in the first embodiment).
Each search band 200 is defined relative to the 3D
computer head model so that the search bands move with the head model when it is translated and rotated within the three-dimensional coordinate system.
Search parameter definer 38 defines the search bands 200 within each group 60, 70 to lie between Y-axis limits of Ymin and Ymax. These limits are the same as those in the first embodiment and are defined to ensure that the search bands 200 within each group 60, 70 cover fully the left eye and left eyebrow and the right eye and right eyebrow respectively of the 3D computer head model.
Each search band 200 is defined by search parameter definer 38 in the second embodiment to be a straight line having a centre point lying on a perpendicular line 202 between Ymax and Ymin which passes through the centre of the left eye and left eyebrow or the centre of the right eye and right eyebrow. Each search band 200 is perpendicular to the lines 202 and lies in a plane perpendicular to the Z-axis of the 3D coordinate system. In this embodiment, the Z-coordinate value of the plane containing the search bands 200 is set such that it is equal to the Z-coordinate value of the eyes of the 3D computer head model (with the result that the plane passes through the 3D computer head model).
Within each group 60, 70, the search bands 200 are equally spaced along the line 202 between the limits Ymax and Ymin.
In this embodiment, the spacing between adjacent search bands 200 is defined so that, when the search bands 200 are transformed from the three-dimensional coordinate system of the 3D computer head model to the two-dimensional coordinate system of an image, the transformed bands have a spacing in the image equal to the spacing of adjacent lines of pixels in the image. In this way, the search bands 200 correspond to adjacent lines of pixels in the image. Accordingly, although only eleven search bands 200 are shown in Figure 14a in each of the groups 60, 70, in practice 40 search bands 200 are typically defined in each group 60, 70.
The width of each search band 200 in the X-axis direction is Rx (this being the same width as the search bands 80 in the first embodiment). Search parameter definer 38 defines Rx to ensure that each search band 200 is sufficiently wide in the X-axis direction to enclose fully an eye or eyebrow.
Within each respective search band 200, search parameter definer 38 defines a plurality of sample points "q". In this embodiment, the number of sample points q is defined
to be the same for each search band 200, and the sample points are evenly spaced throughout the width of the search band 200. In this embodiment, the number of sample points q for each search band 200 is set such that, when the sample points q are projected from the three-dimensional coordinate system of the 3D computer head model to the two-dimensional coordinate system of an image, the spacing between the projected sample points is equal to the spacing between adjacent pixels on a row in the image. Accordingly, although only nine sample points q are shown in Figure 14a for each search band 200, in practice, 30 sample points q are typically defined for each search band 200.
Although each search band 200 and each sample point q is illustrated in Figure 14a, in this embodiment, search parameter definer 38 does not actually perform processing at step S13-4 to calculate the position of each search band 200 and each sample point q in the three-dimensional space of the 3D computer head model. Instead, referring to Figure 14b, search parameter definer 38 defines the positions of eight sample points q 210-224, referred to as "keypoints".
Keypoints 210, 212, 214 and 216 define the positions of the
sample points q at the ends of the two lines 202. Keypoints 218 and 220 define the positions of the sample points q at the ends of one of the search bands 200 in the right group 70 (labelled 226 in Figure 14b). In this embodiment, the keypoints 218, 220 define positions of the sample points q at the ends of the search band 200 which passes through the centre of the line 202 defined by keypoints 210 and 212. Similarly, keypoints 222 and 224 define the positions of sample points q at the ends of a search band 200 in the left group 60 (in this embodiment, the search band 200 which passes through the centre of the line 202 defined by keypoints 214 and 216 and labelled 228 in Figure 14b).
To define the positions of the other search bands 200 and sample points q, search parameter definer 38 stores data defining the number of search bands 200 in each of the left and right groups 60, 70, and data defining the number of sample points q in each respective search band 200.
As will be explained below, instead of projecting every one of the sample points q shown in Figure 14a from the three-dimensional space of the 3D computer head model to the two-dimensional space of an image in order to calculate Y-coordinates for the eyes of the 3D computer
model, in this embodiment only the eight keypoints 210-224 shown in Figure 14b are projected and the positions to which the other sample points q would be projected are calculated in the two-dimensional space of the image based upon the positions of the projected keypoints 210-224. It has been found that performing processing in this way reduces the amount of computation required compared to processing which projects every individual sample point q from the three-dimensional space of the 3D computer head model to the two-dimensional space of an image.
Referring again to Figure 13, the processing performed at step S13-5 in the second embodiment to generate and store left and right eye reference intensity profiles for the user will now be described. This processing is performed by the reference intensity profile generator 35. The purpose of the processing is to generate a respective reference intensity profile for the left and right eyes of the current user 2 of processing apparatus 8 for use in subsequent processing to calculate the Y-coordinates of the user's eyes when the user's eyes are being tracked. Each reference intensity profile defines the intensity (determined from an image of the user 2) at each position corresponding to the position of a search
band 200 along a respective one of the lines 202. Accordingly, each reference profile defines how the intensity varies along a line 202 through positions corresponding to the pupil of the user's eye and the user's eyebrow.
Figure 15 shows the processing operations performed by reference intensity profile generator 35 at step S13-5.
Referring to Figure 15, at step S15-2, reference intensity profile generator 35 processes frames of image data received from camera 4 to determine the 2D positions of the user's eyes within a frame of the image data. In this embodiment, the processing at step S15-2 is performed in a conventional manner to process frames of image data from camera 4 to detect a frame in which the user blinks and to determine the positions of the user's eyes in this frame of image data, for example as described in "Coordination of Perceptual Processes for Computer-Mediated Communication" by Coutaz et al in 1996 Proceedings of 2nd International Conference on Automatic Face and Gesture Recognition, IEEE Computer Society Press, pages 106-111, and "Eigen Space Coding as a Means to Support Privacy in Computer-Mediated Communication" by Coutaz et al in 1997 Proceedings of INTERACT'97,
published by Chapman and Hall. Of course, other processing techniques could be used at step S15-2 to determine the positions of the user's eyes in one of the frames of image data.
At steps S15-4 to S15-10, reference intensity profile generator 35 controls position, Z-orientation and size calculator 32 and head model processor 34 to perform processing to translate and rotate the 3D computer head model within its three-dimensional space to position the eyes of the 3D computer model at locations corresponding to the positions in the frame of image data determined at step S15-2. That is, reference intensity profile generator 35 rotates and translates the 3D computer head model to position the eyes such that, when the transformation previously generated at step S13-2 is applied, the eyes project to the positions in the frame of image data calculated at step S15-2.
The processing performed at steps S15-4 to S15-10 is the same as the processing performed in the first embodiment at steps S3-8, S3-10, S3-12 and S11-6 (the only difference being that the processing at step S15-10 is performed to minimise the error between the eye positions determined at step S15-2 and the projected current eye
positions rather than to minimise the error between the projected calculated eye positions and the projected current eye positions as in the processing at step S11-6). Since these processing operations have been described above, they will not be described again here.
When rotating and translating the 3D computer head model at steps S15-4 to S15-10, the positions of the eight keypoints 210-224 shown in Figure 14b will move with the 3D computer head model since they are defined relative to the 3D computer head model.
At steps S15-12 to S15-18, reference intensity profile generator 35 controls head model processor 34 to generate a respective reference intensity profile for each of the lines 202 by reading pixel data from the frame of image data in which the positions of the user's eyes were determined at step S15-2.
More particularly, at step S15-12, reference intensity profile generator 35 projects the eight keypoints 210-224 defined at step S13-4 from the three-dimensional space of the 3D computer head model to the two-dimensional space of the frame of image data in which the positions of the user's eyes were determined at step S15-2. This
projection is carried out using the transformation equation previously defined at step S13-2 and updated at steps S15-6, S15-8 and S15-10.
At step S15-14, reference intensity profile generator 35 calculates the position of each sample point q in the two-dimensional space of the frame of image data using the positions of the projected keypoints generated at step S15-12.
The transformation applied at step S15-12 defines a linear projection from the three-dimensional space of the 3D computer head model to the two-dimensional space of the frame of image data. Therefore, referring to Figures 16a and 16b (which show an example when the keypoints 214, 216, 222 and 224 for the left eye and left eyebrow are projected into a frame of image data), when the keypoints 214, 216, 222, 224 are projected from the three-dimensional coordinate system of the 3D computer head model to the two-dimensional coordinate system of the frame of image data, the straight lines 202, 228 defined by the keypoints remain as straight lines 202', 228' in the two-dimensional space of the image data. Accordingly, the straight line 202' connecting the points 214' and 216' (which are the projections of the points 214, 216) in the two-dimensional
coordinate system of the image data defines the projection of the line 202, and the straight line 228' connecting the points 222' and 224' (which are the projections of the points 222, 224) defines the projection of the line 228 in the two-dimensional coordinate system of the image data.
It should be noted, however, that the lengths of the lines 202' and 228' are not necessarily the same as the lengths of the lines 202, 228, and that the angle between the lines 202' and 228' is not necessarily the same as the angle between the lines 202 and 228.
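The property relied on here can be checked numerically. The following is a minimal sketch, assuming that the "linear projection" is an affine map of the form p' = M p + t; the matrix `M`, the offset `t`, the keypoint coordinates and the helper names are all hypothetical values chosen for illustration, not taken from the embodiment. It shows that the midpoint of a 3D line projects to the midpoint of the projected line, which is why evenly spaced positions can be recovered in the image from the projected end points alone.

```python
def project(point3d, matrix, offset):
    """Apply an assumed affine 3D-to-2D map: p' = M p + t."""
    x, y, z = point3d
    return tuple(
        row[0] * x + row[1] * y + row[2] * z + shift
        for row, shift in zip(matrix, offset)
    )


def lerp(a, b, s):
    """Point a fraction s of the way from a to b (any dimension)."""
    return tuple(ai + s * (bi - ai) for ai, bi in zip(a, b))


# Hypothetical projection parameters and keypoint positions.
M = ((120.0, -15.0, 8.0), (10.0, 110.0, -20.0))
t = (320.0, 240.0)
k214 = (-0.3, 0.5, 0.9)  # one end of line 202 (illustrative)
k216 = (-0.3, 0.1, 0.9)  # other end of line 202 (illustrative)

# Route 1: take the midpoint in 3D, then project it.
via_3d = project(lerp(k214, k216, 0.5), M, t)
# Route 2: project only the end points, then take the midpoint in 2D.
via_2d = lerp(project(k214, M, t), project(k216, M, t), 0.5)
```

Both routes give the same image position, although the projected line itself will generally differ in length from the 3D line.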
Referring to Figures 16c and 16d, at step S15-14, based on the data stored at step S13-4 defining the number of search bands 200, reference intensity profile generator 35 defines a plurality of straight lines 240 in the two-dimensional coordinate system of the frame of image data on each side of the line 228'. A respective line 240 is defined for each search band 200 in the left group 60 of search bands in the three-dimensional coordinate system of the 3D computer head model (other than the search band corresponding to line 228, for which line 228' is already defined). Each line 240 is defined at a position such that the lines 240 and 228' are evenly spaced along the
line 202'. Each line 240 has its mid-point on the line 202', and is defined to be parallel to the line 228' with a length equal to the length of the line 228'.
Having defined the lines 228' and 240, reference intensity profile generator 35 defines sample points q' along each line. For each respective line, the sample points q' are evenly spaced along the line and the number of sample points is set to be the same as the number of sample points for each search band 200 previously defined at step S13-4.
In this way, a respective sample point q' is defined for each sample point position q shown in Figure 14a.
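The construction of the sample points q' from the projected keypoints can be sketched as follows. This is an illustrative sketch only: the function and parameter names are hypothetical, and it assumes that the bands run from one projected end of line 202' to the other (both ends included) and that only the direction and length of line 228' are taken from its two projected end points.

```python
def lerp(a, b, s):
    """Point a fraction s of the way from 2D point a to 2D point b."""
    return (a[0] + s * (b[0] - a[0]), a[1] + s * (b[1] - a[1]))


def sample_points_2d(end_a, end_b, band_a, band_b, n_bands, n_samples):
    """Return n_bands lists of n_samples (x, y) image positions.

    end_a, end_b   -- projected ends of line 202'
    band_a, band_b -- projected ends of line 228'
    """
    # Half of line 228' as a vector: every band is parallel to it
    # and has the same length.
    half = ((band_b[0] - band_a[0]) / 2.0, (band_b[1] - band_a[1]) / 2.0)
    bands = []
    for b in range(n_bands):
        # Band mid-points are evenly spaced along line 202'.
        s = b / (n_bands - 1) if n_bands > 1 else 0.5
        centre = lerp(end_a, end_b, s)
        row = []
        for k in range(n_samples):
            # u runs evenly from -1 (one band end) to +1 (the other).
            u = 2.0 * k / (n_samples - 1) - 1.0 if n_samples > 1 else 0.0
            row.append((centre[0] + u * half[0], centre[1] + u * half[1]))
        bands.append(row)
    return bands


# Eleven bands of nine sample points, as drawn in Figure 14a.
bands = sample_points_2d(
    (100.0, 50.0), (100.0, 150.0), (80.0, 100.0), (120.0, 100.0), 11, 9
)
```

With these illustrative inputs the middle band coincides with line 228' and the first and last bands pass through the ends of line 202'.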
Referring again to Figure 15, at step S15-16, for each calculated sample point q', reference intensity profile generator 35 reads an image value of the image pixel within which the sample point lies. In this embodiment, reference intensity profile generator 35 reads the greyscale intensity value of the pixel.
At step S15-18, for each search band 200, reference intensity profile generator 35 calculates and stores the average of the pixel intensity values read at step S15-16
for the sample points q' on the line 228' or 240 corresponding to the search band 200. Consequently, as a result of this processing, reference intensity profile generator 35 calculates a respective average pixel intensity value for each search band 200 in the left and right groups 60, 70 of search bands.
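Steps S15-16 and S15-18 can be sketched as below. The sketch assumes a grey-scale image stored as a list of rows, that pixel (column, row) covers the square from (column, row) up to but not including (column + 1, row + 1), and that every sample point lies inside the image; the function name `band_profile` is hypothetical.

```python
import math


def band_profile(image, band_points):
    """Average the grey-scale value under the sample points of each band.

    image       -- list of rows, each a list of intensity values
    band_points -- one list of (x, y) sample positions per search band
    """
    profile = []
    for row in band_points:
        # Read the pixel within which each sample point lies.
        values = [image[math.floor(y)][math.floor(x)] for x, y in row]
        profile.append(sum(values) / len(values))
    return profile


# Tiny illustrative image: the second row is dark (for example a pupil).
image = [
    [200, 200, 200, 200],
    [50, 50, 50, 50],
    [200, 200, 200, 200],
    [120, 120, 120, 120],
]
profile = band_profile(
    image,
    [[(0.5, 0.5), (2.5, 0.5)], [(0.5, 1.2), (3.9, 1.9)]],
)
```

The resulting list holds one average per band, ordered by position along the line, which is the form of profile shown in Figure 17.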
Referring to Figure 17, the average intensity values calculated at step S15-18 define an intensity profile 250 for the left eye and left eyebrow and an intensity profile 260 for the right eye and right eyebrow of the user. Each intensity profile 250,260 defines how the calculated average intensity value varies with position along the line 202. Consequently, each intensity profile 250,260 includes a relative reduction in intensity at positions corresponding to the pupil of an eye and positions corresponding to an eyebrow.
Referring again to Figure 15, at step S15-20, reference intensity profile generator 35 performs processing to normalise the left and right intensity profiles generated at step S15-18 to make the respective mean value of each profile equal to zero. More particularly, in this embodiment, reference intensity profile generator 35 normalises each of the left and right intensity profiles
to make it zero-mean by setting each value, xi, in the profile to be:
[equation not reproduced in the source text]
where: x̄ is the mean value of the values in the intensity profile before normalisation; xmax is the maximum value in the intensity profile before normalisation.
The normalised profiles are stored as left and right reference intensity profiles for the user.
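A minimal sketch of the zero-mean step follows. The exact normalisation expression is not reproduced in the text, so the sketch makes an assumption: it only subtracts the mean and applies no scaling by the maximum value. The function name `zero_mean` is hypothetical.

```python
def zero_mean(profile):
    """Shift a profile so its mean is zero (assumed form: value - mean)."""
    mean = sum(profile) / len(profile)
    return [value - mean for value in profile]


normalised = zero_mean([120.0, 80.0, 100.0, 60.0])
```

Removing the mean makes the later profile comparison insensitive to a uniform change in brightness between the reference frame and the current frame.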
Referring again to Figure 13, the processing performed by eye position calculator 40 at step S13-14 will now be described.
Figure 18 shows the processing operations performed at step S13-14.
Referring to Figure 18, at step S18-2, eye position calculator 40 performs processing to calculate the Y-coordinate positions of the left and right eyes within the three-dimensional coordinate system of the 3D computer head model, and at step S18-4 performs processing to calculate the X-coordinate positions of the eyes within the three-dimensional coordinate system of the 3D computer head model.
The processing performed at step S18-4 is the same as the processing performed at step S7-4 in the first embodiment. Accordingly, this processing will not be described again here, and only the processing performed at step S18-2 (which is different to that in the first embodiment) will be described.
Figure 19 shows the processing operations performed by eye position calculator 40 for a frame of image data at step S18-2 in the second embodiment.
Referring to Figure 19, at steps S19-2 to S19-10, eye position calculator 40 performs processing to project the eight keypoints 210-224 from their current positions in the three-dimensional space of the 3D computer head model (determined by the translation and rotation of the 3D computer head model at steps S13-10, S13-12 and S13-18) into the two-dimensional coordinate system of the image
data, to define sample point positions q' in the image data, to read the intensity values of the pixels in which the sample points q' lie, to generate a respective intensity profile for the left and right eyes of the user for the current frame of image data, and to normalise the left and right intensity profiles to make each profile zero-mean.
The processing performed at steps S19-2 to S19-10 is the same as the processing performed at steps S15-12 to S15-20, described above. Accordingly, this processing will not be described again here.
At step S19-12, eye position calculator 40 compares the normalised left eye intensity profile for the current frame of image data (generated at step S19-10) with the stored reference intensity profile 250 for the user's left eye (generated at step S15-20) to calculate the Y-position in the three-dimensional space of the 3D computer model for the left eye for the current frame.
More particularly, in this embodiment, eye position calculator 40 compares the intensity profile for the current frame with the reference intensity profile 250 by performing a correlation-type operation on the two
intensity profiles in accordance with the following equation:
[equation (5) not reproduced in the source text]
where: n is the number of intensity values in a profile (that is, the number of search bands 200 in the left or right group 60, 70, which is 40 in this embodiment, with the intensity value number running from 0 to n-1, giving n values in total in the profile); Ci is the "i"th correlation value; i runs from -(n-1) to +(n-1) so that 2n-1 correlation values Ci are calculated; Rj is the "j"th intensity value in the reference intensity profile 250; Fj+i is the (j+i)th intensity value in the current frame intensity profile; valid intensity values are values for which 0 ≤ j+i ≤ n-1.
Referring to Figures 20a to 20e, an explanation will now be given of the correlation operation performed at step S19-12.
As explained above, a correlation value C is calculated in accordance with equation (5) for each value of i between -(n-1) and +(n-1), so that 2n-1 correlation values are calculated.
To generate each correlation value C, one or more intensity values in the intensity profile for the current frame are subtracted from corresponding intensity values in the reference intensity profile 250, with the number and identity of the intensity values from the image frame profile and reference profile used in the subtraction being determined in accordance with the values of i and j in equation (5).
Referring to Figures 20a to 20e, this can be thought of as sliding the intensity profile 270 for the current
frame of image data to different positions relative to the reference intensity profile 250 (each position spaced apart from the previous position by a distance equal to the distance between adjacent search bands 200) and performing the subtraction for each different relative position.
Figure 20a illustrates the calculation of the correlation value C-n+1. In this case, the intensity value F0 from the intensity profile 270 for the current frame of image data is subtracted from the intensity value Rn-1 from the reference intensity profile 250, but no other subtractions are performed because no other points in the reference intensity profile 250 and the intensity profile 270 for the current frame of image data align.
In Figure 20b, the intensity profile 270 for the current frame of image data has been moved to the next position relative to the reference intensity profile 250 (that is, the intensity profile 270 has been moved by a distance corresponding to the distance between adjacent points in the profile, which is equal to the distance between adjacent search bands 200). The example shown in Figure 20b corresponds to the calculation of the correlation value C-n+2. In this case, the intensity value F0 from the
intensity profile 270 for the current frame of image data is subtracted from the intensity value Rn-2 for the reference intensity profile 250, and the intensity value F1 from the intensity profile 270 for the current frame of image data is subtracted from the intensity value Rn-1 for the reference intensity profile 250.
The intensity profile 270 for the current frame of image data is repeatedly moved by a distance equal to the distance between adjacent points therein to different positions relative to the reference intensity profile 250 to generate each new correlation value Ci.
Figure 20c shows the situation where the intensity profile 270 for the current frame of image data has been moved relative to the reference intensity profile 250 such that every point in the intensity profile 270 is aligned to correspond to a point in the reference intensity profile 250. In this case, each intensity value F0 to Fn-1 in the intensity profile 270 for the current frame of image data is subtracted from the corresponding intensity value R0 to Rn-1 in the reference intensity profile 250 to generate the correlation value C0.
The relative positions between the intensity profiles 250
and 270 for the two final correlation values Cn-2 and Cn-1 are shown in Figures 20d and 20e, respectively, with the subtractions R0-Fn-2 and R1-Fn-1 being performed for Cn-2 and the subtraction R0-Fn-1 being performed for Cn-1.
The correlation value Ci which has the highest value of all the correlation values calculated in accordance with equation (5) defines the alignment of the intensity profile 270 for the current frame of image data relative to the reference intensity profile 250 for which the two profiles 250,270 are maximally similar.
Accordingly, eye position calculator 40 selects the highest correlation value Ci generated in accordance with equation (5) and calculates the Y-coordinate of the position of the left eye of the 3D computer model in dependence upon the relative displacements of the intensity profiles 270 and 250 to which the maximum correlation value corresponds. More particularly, the relative displacement of the intensity profile 270 for the current frame of image data and the reference intensity profile 250 is defined by the value of i of the selected highest correlation value. That is, the relative displacement is zero for correlation value C0, the relative displacement is equal to the distance
between two adjacent search bands 200 in the positive Y-axis direction for correlation value C1 (and increases in the positive Y-axis direction by a distance equal to the distance between two adjacent search bands 200 for each increase by one of the index i to the correlation value), and the relative displacement is equal to the distance between two adjacent search bands 200 in the negative Y-axis direction for correlation value C-1 (and increases in the negative Y-axis direction by a distance equal to the distance between two adjacent search bands 200 for each decrease by one of the index i to the correlation value).
Accordingly, eye position calculator 40 calculates the Y-coordinate of the left eye by adding or subtracting from the Y-coordinate value of the current eye position an amount defined by the index i to the highest correlation value calculated in accordance with equation (5).
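The sliding comparison and the resulting Y update can be sketched as follows. Equation (5) itself is not reproduced in the text, so the similarity score used here (the negated mean absolute difference over the valid overlapping terms) is an assumption chosen only so that a higher value means a closer match; `min_overlap`, `band_spacing` and the function names are likewise hypothetical.

```python
def best_shift(reference, frame, min_overlap=1):
    """Return (i, score) for the shift i at which the profiles match best.

    For each i from -(n-1) to +(n-1), the terms R[j] - F[j+i] are formed
    for every j with 0 <= j+i <= n-1. The score is an ASSUMED measure:
    the negated mean absolute difference of those terms.
    """
    n = len(reference)
    best_i, best_score = None, None
    for i in range(-(n - 1), n):
        diffs = [
            abs(reference[j] - frame[j + i])
            for j in range(n)
            if 0 <= j + i <= n - 1
        ]
        if len(diffs) < min_overlap:
            continue  # too few aligned points to be meaningful
        score = -sum(diffs) / len(diffs)
        if best_score is None or score > best_score:
            best_i, best_score = i, score
    return best_i, best_score


def updated_y(current_y, shift, band_spacing):
    """Move the eye by `shift` band spacings along the Y-axis."""
    return current_y + shift * band_spacing


# Illustrative zero-mean-like profiles: the frame profile is the
# reference profile displaced by two bands.
reference = [0, 0, -5, 0, 0, 3, 0, 0]
frame = [0, 0, 0, 0, -5, 0, 0, 3]
shift, score = best_shift(reference, frame, min_overlap=4)
new_y = updated_y(10.0, shift, 0.5)
```

Requiring a minimum overlap is a design choice of this sketch: with only one or two aligned points a near-perfect score can arise by chance at the extreme shifts.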
Referring again to Figure 19, at step S19-14, eye position calculator 40 repeats the processing performed at step S19-12 to calculate the Y-coordinate of the right eye of the 3D computer model. This processing is the same as that performed at step S19-12 and will therefore
not be described again here.
At step S19-16, eye position calculator 40 performs processing to calculate refined Y-positions for the left and right eyes respectively of the 3D computer head model.
More particularly, the Y-positions calculated at steps S19-12 and S19-14 are "quantized" positions because each position corresponds to the position of a search band 200 along one of the left or right lines 202. Accordingly, eye position calculator 40 performs processing corresponding to that performed at step S8-22 in the first embodiment (described above with reference to Figure 9) to calculate a respective refined Y-position for each of the left and right eyes, this processing allowing each refined Y-position to lie between the positions of search bands 200 along a line 202.
By performing the processing in this way, eye position calculator 40 calculates accurate Y-coordinate values for the eyes of the 3D computer head model in three-dimensional space.
Modifications and Variations
Many modifications and variations can be made to the embodiments described above within the scope of the claims.
For example, the search regions can be defined at steps S3-4 and S13-4 to have different properties to those described above. For example, referring to Figure 5 and Figure 14a, the number of search bands 80, 200 within the left group 60 and right group 70 does not need to be the same. In addition, the search bands 80, 200 within the left group do not need to align with the search bands 80, 200 in the right group.
In the first embodiment described above, the dimensions Rx and Ry of each search band 80 (Figure 5) are the same as the dimensions of each search band 90 (Figure 6). However, the search bands 90 can be defined to have different dimensions to the search bands 80. Similarly, the width Rx of each search band 200 in the second embodiment may be different to the width of each search band 90.
In the first embodiment described above, the search bands 80 within each of the left and right groups 60, 70 (Figure 5) are defined so that they are adjacent but non-overlapping in the y-axis direction. However, instead, the search bands 80 within a given group 60, 70 could be defined to overlap in the y-axis direction. By defining the search bands 80 with sufficient overlap, it becomes unnecessary to perform the spatial smoothing processing at step S8-14 because such smoothing is inherently provided by the way in which the sample points "q" relate to overlapping search bands.
In the embodiments described above, refined Y-positions for the eyes are calculated at step S8-22 and step S19-16. However, in the first embodiment each search band 80 (Figure 5) could be defined with a very small dimension Ry, and in the second embodiment the search bands 200 could be defined with sub-pixel spacing between adjacent bands 200, so that it becomes unnecessary to calculate refined positions. Similarly, the dimensions Rx of each search band 90 (Figure 6) could be defined such that the processing to calculate refined positions at step S10-16 becomes unnecessary.
In the embodiments described above, the grey-scale intensity values of pixels are read at steps S8-6, S10-6, S15-16 and S19-6, and are subsequently used to determine the y-position and x-position of the eyes in 3D model
space. However, different image values of the pixels may be used instead of the intensity values. For example, in a colour system, image values comprising the red, green and blue component values or hue value may be read at steps S8-6, S10-6, S15-16 and S19-6, and used to determine the y-position and x-position of the eyes and eyebrows in 3D model space. In particular, an image value comprising the green component value may be used.
In the embodiments described above, at steps S8-10, S15-18 and S19-8, the average of the pixel image values (intensity values in the embodiments) in a given band is calculated. Similarly, at step S10-10, the average of the pixel image values in a group of search bands is calculated. However, rather than calculating an average value at steps S8-10, S10-10, S15-18 and S19-8, a different statistical measure of the pixel image values may be calculated. For example, the median, mode, geometric mean or variance may be calculated.
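As an illustration of swapping the statistical measure, the following sketch selects between the measures named above using Python's standard `statistics` module. The dictionary `MEASURES` and the function `band_measure` are hypothetical names, and the choice of the population variance (rather than the sample variance) is an assumption.

```python
import statistics

# Map each measure named in the text to a standard-library function.
MEASURES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
    # Note: the geometric mean is only meaningful for positive values,
    # so zero-valued pixels would need special handling.
    "geometric_mean": statistics.geometric_mean,
    "variance": statistics.pvariance,  # assumed: population variance
}


def band_measure(values, measure="mean"):
    """Summarise the pixel values read for one search band."""
    return MEASURES[measure](values)


values = [40, 40, 60, 100]
summary = {name: band_measure(values, name) for name in ("mean", "median", "mode", "variance")}
```

Keeping the measure behind a single function means the rest of the profile processing is unchanged whichever statistic is used.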
As part of the processing in the first embodiment at step S7-2 to calculate the eye and eyebrow y-positions in 3D model space, a test may be added, for example, between steps S8-18 and S8-20 to ensure that the least two minima in each search group 60,70 are sufficiently spaced apart
in the y-axis direction to ensure that minima have not been selected which represent features closer together than the minimum separation of the user's eye and eyebrow, thereby improving the reliability of the y-position calculations.
In the first and second embodiments above, processing may be performed, for example between steps S10-14 and S10-16, to ensure that a region of skin-coloured pixels exists in the 2D image data between the identified eyebrow positions and between the identified eye positions, thereby improving the reliability of the x-position calculations.
In the embodiments above, a spring-mass-damper technique is used at steps S3-18, S13-18 and S15-10 to rotate the 3D computer head model about axes through the centre of the head parallel to the x-axis and y-axis (that is, to update Rx and Ry in equation (1)). Similarly, the same technique could be used to rotate the 3D computer head model about an axis through the centre of the head parallel to the z-axis (that is, to update Rz), for example if the processing previously performed at step S3-8, S13-8 or S15-4 did not provide head orientation information (with the result that step S3-12, S13-12 or S15-8 could not be
performed).
Instead of using a spring-mass-damper technique at step S3-18, S13-18 and S15-10 to rotate the 3D computer head model, other techniques could be used. For example, the 3D computer head model could be rotated about axes through the centre of the head parallel to the x and y axes (and also about an axis through the centre of the head parallel to the z-axis if required) to minimise the average pixel intensity in the group 100-140 of the search bands 90 previously determined at step S10-14 to contain the eyes and eyebrows of the user. More particularly, the 3D computer head model may be rotated to a new orientation (new Rx and/or Ry), and the processing of steps S10-4 to S10-10 repeated for the group of search bands previously selected at step S10-4. The calculated average intensity would then be stored and the processing repeated a number of times, each time following a new rotation of the 3D computer head model. The orientation which generates the lowest average intensity would then be selected as the correct orientation of the 3D computer head model because this most correctly aligns the 3D computer head model with the darkest regions (representing the eyes and eyebrows of the user) in the image data received from camera 4.
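The search over orientations described above can be sketched as a simple exhaustive comparison. In this sketch the callback `average_intensity` stands in for "rotate the model, repeat steps S10-4 to S10-10 and return the average intensity"; it, the candidate angle lists and the function name `best_orientation` are hypothetical, and the text does not say how the trial orientations are chosen.

```python
import itertools


def best_orientation(rx_candidates, ry_candidates, average_intensity):
    """Return the (rx, ry) pair that gives the lowest average intensity."""
    return min(
        itertools.product(rx_candidates, ry_candidates),
        key=lambda angles: average_intensity(*angles),
    )


def fake_average_intensity(rx, ry):
    """Stand-in measurement with its darkest response at rx=5, ry=-10."""
    return 100.0 + (rx - 5.0) ** 2 + (ry + 10.0) ** 2


angles = best_orientation(
    [0.0, 5.0, 10.0], [-10.0, 0.0, 10.0], fake_average_intensity
)
```

In a real system the callback would read pixels from the current frame, so each candidate orientation costs one pass over the selected group of search bands.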
In the first embodiment described above, at step S3-4, sample points "q" are defined within each of the search bands 80, 90, and at steps S8-4 and S10-4, each sample point is projected from 3D model space into 2D image space to define a pixel in the 2D image whose intensity value is to be read. However, instead of defining sample points at step S3-4, the corner points of each search band 80, 90 may be projected from 3D model space into 2D image space at steps S8-4 and S10-4 to define a search region within the 2D image, and the intensity value of every pixel (or a sub-set, such as every other pixel) within the projected search band in the 2D image may be read. Similarly, the four corner points of the left group 60 of search bands 80 and the four corner points of the right group 70 of search bands 80 may be projected from 3D model space into 2D image space to define a search region within the 2D image, the search region in the 2D image space may be divided into strips corresponding to the search bands 80, and the intensity value of every pixel (or a sub-set thereof) within each strip defined in the 2D image may be read. Because the projection from 3D model space to 2D image space is a linear projection, instead of projecting the four corner points of a group 60, 70 of search bands 80 from 3D model space into 2D image space, it is only necessary to
<Desc/Clms Page number 80>
project three such points to uniquely define a projection of a search band within a 2D image.
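The three-point property may be illustrated by the following sketch. It assumes that the "linear projection" is an affine mapping, so that a rectangular group of search bands projects to a parallelogram; the function names and the numeric projection used in the usage note are hypothetical and are not taken from the embodiments.

```python
def fourth_corner(p_a, p_b, p_c):
    """Recover corner D of a projected parallelogram ABCD.

    p_a, p_b and p_c are the 2D image positions of three consecutive
    corners A, B and C. Under an affine projection D = A + C - B.
    """
    return (p_a[0] + p_c[0] - p_b[0], p_a[1] + p_c[1] - p_b[1])


def project_affine(point):
    """Hypothetical affine projection from 3D model space to 2D image space."""
    x, y, z = point
    return (2 * x + y + 3, -x + 3 * y + z + 7)
```

For example, with model-space corners A = (0, 0, 0), B = (4, 0, 0), C = (4, 2, 0) and D = (0, 2, 0), calling `fourth_corner` on the projections of A, B and C gives the same image position as projecting D directly. Under a full perspective projection this relation would hold only approximately.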
In the second embodiment described above, instead of projecting four keypoints 210, 212, 218, 220 for the right group 70 of search bands 200 or four keypoints 214, 216, 222, 224 for the left group 60 of search bands 200 from the three-dimensional space of the 3D computer head model into 2D image space, only three points need to be projected for each group (such as three of the four keypoints for each of the left and right groups 60, 70) in order to uniquely define the projected sample point positions q'. This is because the projection from the three-dimensional space of the 3D computer head model into 2D image space is a linear projection.
In the second embodiment described above, eight keypoints 210-224 are defined at step S13-4 and are projected from the three-dimensional space of the 3D computer head model into 2D image space at steps S15-12 and S19-2. Sample point positions q' are then calculated in the 2D image at steps S15-14 and S19-4 in dependence upon the positions of the projected keypoints. However, instead, the position of each sample point q in the three-dimensional space of the 3D computer head model may be defined at step S3-4 (as illustrated in Figure 14a), and instead of performing the steps at S15-12, S15-14, S19-2 and S19-4, each sample point q may be projected from the three-dimensional space of the 3D computer head model into the 2D image space to define a sample point position q'.
In the second embodiment described above, the number of search bands 200 in each of groups 60, 70 is fixed, and similarly, the number of sample points q for each search band 200 is fixed. However, the number of search bands 200 in each group 60, 70 and/or the number of sample points q for each search band 200 may be varied during processing. For example, the number of search bands and/or number of sample points may be calculated for each frame of image data in dependence upon the size and z-axis orientation of the user's head within the image calculated at step S13-8. In this way, the number of search bands 200 and/or sample points q can be varied in dependence upon the head within the image to ensure that an unnecessarily large number of sample points q are not defined and processed, thereby saving processing resources.
In the second embodiment described above, the left and right eye reference intensity profiles 250, 260 for the user are generated at step S13-5 using one frame of image data (that is, the frame from which the positions of the user's eyes were determined at step S15-2). However, each reference intensity profile may be generated using image data from more than one frame, thereby producing left and right eye reference intensity profiles for the user which take account of lighting changes, for example due to movements of the user. Thus, for example, the processing at steps S15-2 to S15-18 may be performed a number of times so that intensity profiles 250, 260 are generated at step S15-18 using different frames of image data. The generated intensity profiles for the left eye may then be averaged and normalised to give a reference profile for the left eye, and similarly the generated intensity profiles for the right eye may be averaged and normalised to give a reference profile for the right eye.
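The averaging and normalising of intensity profiles from several frames may be sketched as follows. The text does not specify the form of normalisation, so the sketch assumes, purely for illustration, that the averaged profile is shifted to zero mean and scaled to unit length; the function name `reference_profile` is likewise hypothetical.

```python
import math


def reference_profile(profiles):
    """Average per-frame intensity profiles and normalise the result.

    profiles is a list of equal-length lists, one per frame, each holding
    one intensity value per search band. The normalisation used here
    (zero mean, unit length) is an assumption, not taken from the text.
    """
    length = len(profiles[0])
    if any(len(p) != length for p in profiles):
        raise ValueError("all profiles must have the same length")
    count = len(profiles)
    # Average the value of each search band over all frames.
    mean_profile = [sum(p[i] for p in profiles) / count for i in range(length)]
    centre = sum(mean_profile) / length
    centred = [v - centre for v in mean_profile]
    norm = math.sqrt(sum(v * v for v in centred))
    if norm == 0:
        return centred  # flat profile: nothing to scale
    return [v / norm for v in centred]
```

The same function would be applied separately to the left eye profiles and to the right eye profiles.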
In the second embodiment described above, the correlation values Ci calculated in accordance with equation (5) may be thought of as a probability density function, with the Y-coordinate of the eye being defined by the peak (maximum likelihood) of the function. Accordingly, other processing operations conventionally applied when using probability density functions may be used. For example, the probability density function defined by the correlation values Ci may be multiplied by a Gaussian function representing the expected position of the eye (determined, for example, in dependence upon how far the user's eye may move given the likely speed of movement of a user's head and the time between the recording of consecutive frames of image data). Other operations could, of course, be performed instead or as well.
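The multiplication by a Gaussian function may be sketched as follows. The sketch assumes non-negative correlation values indexed by search band position; the function name `most_likely_band` and the parameters `expected_index` and `sigma` (the expected band position and the spread allowed for head movement between frames) are hypothetical names introduced for the example.

```python
import math


def most_likely_band(correlations, expected_index, sigma):
    """Weight correlation values by a Gaussian prior and return the peak.

    correlations holds one non-negative value Ci per search band position.
    Returns the index of the largest weighted value and the weighted list.
    """
    weighted = [
        c * math.exp(-0.5 * ((i - expected_index) / sigma) ** 2)
        for i, c in enumerate(correlations)
    ]
    peak = max(range(len(weighted)), key=weighted.__getitem__)
    return peak, weighted
```

With correlation values such as `[0.1, 0.9, 0.2, 0.3, 0.85, 0.1]`, an expected position of 4 and a spread of 1, the weighted peak falls at position 4 although the unweighted maximum is at position 1, so a strong but implausibly distant match is suppressed.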
In the second embodiment described above, the search bands 200 are defined in the three-dimensional space of the 3D computer head model such that, when the search bands are transformed into a two-dimensional image, the spacing between adjacent transformed search bands is equal to one pixel in the image. However, instead, the search bands 200 may be defined in the three-dimensional space with different spacings therebetween. For example, the search bands 200 may be defined in the three-dimensional space of the 3D computer head model such that, when the search bands are transformed into a 2D image, the spacing between adjacent transformed search bands is less than one pixel in the image.
In the second embodiment described above, each search band 200 is a line. However, instead, the search bands 80 from the first embodiment may be used in the second embodiment instead of the lines 200, and the processing from the first embodiment to calculate a respective average pixel intensity value for each search band 80 may be used in the second embodiment. In this way, left and right eye reference intensity profiles 250, 260 would be generated using the search bands and associated processing from the first embodiment, but the number of values within each reference profile would be smaller than if the reference profiles were generated using lines 200 (because each search band 80 has a width Ry greater than the width of each line 200).
In the embodiments described above, the 3D computer model stored in head model store 24 is a 3D computer model of the head of the user 2, generated, for example, by laser scanning etc. However, the three-dimensional computer model need not be a model of the head of the user 2, and may instead be a 3D computer model of a representative head.
In the embodiments described above, the three-dimensional computer model defined by the data stored in head model store 24 comprises a 3D computer model of a complete head. However, instead, the 3D computer model may comprise a model of just the relevant parts of the head, that is the parts containing the eyes and eyebrows.
In the embodiments described above, the processing to calculate the eye y-positions in 3D model space (step S7-2 and step S18-2) is performed before the processing to calculate the eye x-positions in 3D model space (step S7-4 and step S18-4). However, instead, the processing to calculate the eye x-positions in 3D model space may be performed before the processing to calculate the eye y-positions in 3D model space.
In the embodiments described above, the search bands 80, 200 for determining the eye and eyebrow y-coordinates are arranged (Figure 5 and Figure 14a) in groups 60, 70 to divide the three-dimensional model space into bands extending from above the left eyebrow to below the left eye and bands extending from above the right eyebrow to below the right eye. On the other hand, the search bands 90 for determining eye and eyebrow x-coordinates are arranged (Figure 6) in groups of four bands, and the x-coordinates are calculated in dependence upon the group having the smallest minima in the average pixel intensity values (step S10-14). However, instead, the search bands 90 for determining the eye and eyebrow x-coordinates may be treated as separate, independent search bands 90 arranged in two groups: the first group aligned on the y-coordinate line YB corresponding to the eyebrows in the 3D computer model and the second group aligned on the y-coordinate line YE corresponding to the eyes in the 3D computer model. The x-coordinates of the eyes and eyebrows may then be calculated by identifying the search bands 90 corresponding to the least two minima within each of the first and second groups. The least two minima in the first group identify the search bands corresponding to the left and right eyebrows, while the least two minima in the second group identify the search bands corresponding to the left and right eyes.
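The identification of the least two minima within one group of search bands may be sketched as follows. The sketch assumes that a minimum is a band whose average intensity is lower than that of its left neighbour and no higher than that of its right neighbour, and that the bands at either end of the group are ignored; these choices, and the function name `two_darkest_minima`, are assumptions made for the example.

```python
def two_darkest_minima(values):
    """Return the indices of the two lowest local minima, left to right.

    values holds the average pixel intensity of each search band in one
    group, ordered by x-position. Fewer than two indices are returned if
    the group contains fewer than two local minima.
    """
    minima = [
        i
        for i in range(1, len(values) - 1)
        if values[i] < values[i - 1] and values[i] <= values[i + 1]
    ]
    # Darkest minima first, then restore left-to-right order for the pair.
    minima.sort(key=lambda i: values[i])
    return sorted(minima[:2])
```

Applied to the group aligned on line YB, the two returned indices would identify the bands for the two eyebrows; applied to the group aligned on line YE, they would identify the bands for the two eyes.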
Similarly, in the first embodiment, the search bands 80 for determining the eye and eyebrow y-coordinates may be arranged in groups of four bands having relative positions set in dependence upon the spacing between the eyes and eyebrows in the 3D computer model (in the same way that the relative spacing of the bands 90 in the embodiment described above is set) and the eye and eyebrow y-coordinates may be calculated by identifying the group of search bands having the smallest minima in the average pixel intensity values (in the same way that this was performed for the x-coordinate calculation in the embodiment described above).
In the embodiments described above, data is input by a user defining intrinsic parameters of the camera 4. However, instead, default values may be assumed for some, or all, of the intrinsic camera parameters, or processing may be performed to calculate the intrinsic parameter values in a conventional manner, for example as described in "Euclidean Reconstruction From Uncalibrated Views" by Hartley in Applications of Invariance in Computer Vision, Mundy, Zisserman and Forsyth eds, pages 237-256, Azores 1993.
In the embodiments described above, processing is performed by a computer using processing routines defined by programming instructions. However, some, or all, of the processing could, of course, be performed using hardware.
Other modifications are, of course, possible.

Claims (74)

  1. A method of processing data in a processing apparatus to perform eye tracking across a sequence of images of a moving head, the method comprising processing each of at least some of the images in the sequence by: identifying pixels in the image corresponding to search regions defined relative to a three-dimensional computer model representative of at least part of a head in a three-dimensional space, the search regions being defined with different positions in a portion of the three-dimensional space corresponding to the eyes and eyebrows of the model, and the identification of pixels being performed in dependence upon data defining a transformation between the three-dimensional space and the two-dimensional image; determining image values of the identified pixels; determining eye positions in the three-dimensional space in dependence upon the determined image values; and rotating the three-dimensional computer head model in dependence upon the determined positions for the eyes, so as to move the search regions within the three-dimensional space for subsequent processing of the next image in the sequence to be processed.
  2. A method according to claim 1, wherein the processing to determine eye positions in the three-dimensional space comprises: generating at least one respective representative image value for each search region using the read image values; and determining eye positions in the three-dimensional space in dependence upon the generated representative image values for the search regions.
  3. A method according to claim 1 or claim 2, wherein the identification of pixels is carried out by projecting sample points defined in the three-dimensional space within the search regions into the image, and identifying the pixels onto which the sample points project.
  4. A method according to claim 1 or claim 2, wherein the identification of pixels is carried out by projecting points from the three-dimensional space into the image to generate projected points, defining projected search regions in the image in dependence upon the projected points, and identifying pixels in the projected search regions.
  5. A method according to any preceding claim, wherein each search region is a two-dimensional area of the three-dimensional space.
  6. A method according to any of claims 1 to 4, wherein each search region is a line in the three-dimensional space.
  7. A method of processing data in a processing apparatus defining a sequence of images of a person as the person's head moves, to track the eyes of the person in the images, the method comprising processing each of at least some of the images in the sequence by: determining image values of pixels located in the image at positions defined by the projection of search bands or points therein from a three-dimensional space into the two-dimensional image, wherein the search bands have different positions in the three-dimensional space, and wherein the projection is dependent upon the position and orientation of the search bands in the three-dimensional space resulting from the processing of the previous image in the sequence to be processed; calculating positions for the eyes in the three-dimensional space in dependence upon the determined image values; and moving the search bands within the three-dimensional space in dependence upon the calculated positions for the eyes, so as to move the search bands for subsequent processing of the next image in the sequence to be processed.
  8. A method of processing data in a processing apparatus to perform eye tracking across a sequence of images of a moving head, the method comprising processing each of at least some of the images in the sequence by: identifying pixels in the image corresponding to search regions defined relative to a three-dimensional computer model representative of at least part of a head in a three-dimensional space, the search regions being defined with different positions in a portion of the three-dimensional space corresponding to the eyes and eyebrows of the model, and the identification of pixels being performed in dependence upon data defining a transformation between the three-dimensional space and the two-dimensional image; determining image values of the identified pixels; generating an image value profile from the image values representative of the variation in image value with variation in search region position; comparing the generated image value profile with a reference profile to determine eye positions in the three-dimensional space; and rotating the three-dimensional computer head model in dependence upon the determined positions for the eyes, so as to move the search regions within the three-dimensional space for subsequent processing of the next image in the sequence to be processed.
  9. A method according to claim 8, wherein the identification of pixels is carried out by projecting sample points defined in the three-dimensional space within the search regions into the image, and identifying the pixels onto which the sample points project.
  10. A method according to claim 8, wherein the identification of pixels is carried out by projecting points from the three-dimensional space into the image to generate projected points, defining projected search regions in the image in dependence upon the projected points, and identifying pixels in the projected search regions.
  11. A method according to any of claims 8 to 10, wherein each search region comprises a line in the three-dimensional space of the computer model.
  12. A method according to claim 11, wherein the spacing between adjacent search regions in the three-dimensional space is such that, when the search regions are transformed into a two-dimensional image in accordance with the transformation data, the spacing between adjacent transformed search regions is less than or equal to one pixel in the image.
  13. A method according to any of claims 8 to 12, wherein the search regions extend in a transverse direction to a line through an eye and an eyebrow of the three-dimensional computer model.
  14. A method according to claim 13, wherein the search regions extend in a perpendicular direction to the line through the eye and the eyebrow.
  15. A method according to claim 13 or claim 14, wherein the search regions intersect the line through the eye and the eyebrow at different respective positions therealong.
  16. A method according to any of claims 8 to 15, wherein: the processing to identify pixels in the image comprises identifying a plurality of pixels in the image for each respective search region; and the processing to generate an image value profile comprises processing the plurality of read image values for each respective search region to calculate a respective representative image value for each search region.
  17. A method according to claim 16, wherein a respective average value of the plurality of image values is calculated as the representative value for each search region.
  18. A method according to any of claims 8 to 17, wherein the processing to compare the generated image value profile with the reference profile comprises processing to compare the generated profile and reference profile for different relative positions thereof.
  19. A method according to claim 18, wherein the processing to compare the generated image value profile with the reference profile comprises processing to compare the generated profile and reference profile for different relative positions thereof to determine the match therebetween for each relative position and to determine the relative position for which the match between the generated profile and reference profile is greatest.
  20. A method of processing data in a processing apparatus defining a sequence of images of a person as the person's head moves, to track the eyes of the person in the images, the method comprising processing each of at least some of the images in the sequence by: determining image values of pixels located in the image at positions defined by the projection of search bands or positions therein from a three-dimensional space into the two-dimensional image, wherein the search bands have different positions in the three-dimensional space, and wherein the projection is dependent upon the position and orientation of the search bands in the three-dimensional space resulting from the processing of the previous image in the sequence to be processed; generating at least one representative image value for each search band using the determined pixel image values; comparing the generated representative image values with reference image values representative of image values for an eye and an eyebrow to calculate positions for the eyes in the three-dimensional space; and moving the search bands within the three-dimensional space in dependence upon the calculated positions for the eyes, so as to move the search bands for subsequent processing of the next image in the sequence to be processed.
  21. A method of processing data in a processing apparatus to perform eye tracking across a sequence of images of a moving head, the method comprising processing each of at least some of the images in the sequence by: identifying pixels in the image in dependence upon search regions defined relative to a three-dimensional computer model representative of at least part of a head in a three-dimensional space, the search regions dividing into parts a portion of the three-dimensional space corresponding to the eyes and eyebrows of the model, and in dependence upon data defining a transformation between the three-dimensional space and the two-dimensional image; determining image values of the identified pixels; determining which of the search regions project onto dark regions in the image in dependence upon the image values; determining eye positions in the three-dimensional space in dependence upon the relative positions of search regions determined to project onto dark regions of the image; and rotating the three-dimensional computer head model in dependence upon the determined positions for the eyes, so as to move the search regions within the three-dimensional space for subsequent processing of the next image in the sequence to be processed.
  22. A method according to claim 21, wherein the identification of pixels is carried out by projecting sample points defined in the three-dimensional space within the search regions into the image, and identifying the pixels onto which the sample points project.
  23. A method according to claim 21 or claim 22, wherein: the search regions include first and second groups of search regions, the first group dividing into parts a portion of the three-dimensional space extending over a first two of the left eyebrow, left eye, right eyebrow and right eye of the three-dimensional computer model, and the second group dividing into parts a portion of the three-dimensional space extending over a second two of the left eyebrow, left eye, right eyebrow and right eye of the three-dimensional computer model; first and second search regions which correspond to dark regions in the image are identified within each group; and positions for the eyes are determined in dependence upon the relative positions of the identified search regions within each group.
  24. A method according to claim 23, wherein: the search regions include first and second groups of search regions, the first group dividing into parts a portion of the three-dimensional space extending from above the left eyebrow to below the left eye of the three-dimensional computer model, and the second group dividing into parts a portion of the three-dimensional space extending from above the right eyebrow to below the right eye of the three-dimensional computer model; first and second search regions corresponding to dark regions in the image are identified within each group; and positions for the eyes are determined in dependence upon the position of the identified search region within each group which corresponds to a lower position on the three-dimensional computer head model.
  25. A method according to claim 23 or claim 24, wherein: the search regions include first and second sets of search regions; the first set of search regions contains first and second groups of search regions, the first group dividing into parts a portion of the three-dimensional space extending over a first two of the left eyebrow, left eye, right eyebrow and right eye of the three-dimensional computer model and the second group dividing into parts a portion of the three-dimensional space extending over a second two of the left eyebrow, left eye, right eyebrow and right eye of the three-dimensional computer model; within the second set of search regions, each search region comprises a plurality of spatially separated parts having relative positions dependent upon the spacing of the eyes and eyebrows in the three-dimensional computer model; values of a first coordinate of the eye positions are determined by: identifying first and second search regions within each group of the first set corresponding to dark regions in the image; and determining values of the first coordinate for the eye positions in dependence upon the relative positions of the identified search regions within each group in the first set; and values for a second coordinate of the eye positions are determined by: identifying a search region in the second set corresponding to dark regions in the image; and determining values of the second coordinate for the eye positions in dependence upon the positions of the spatially separated parts within the identified search region of the second set.
  26. A method according to claim 25, wherein: the plurality of search regions in the second set divide into parts a portion of the three-dimensional space extending from outside the left eye to outside the right eye of the three-dimensional computer model; and the processing to determine the values of the second coordinate includes: selecting from among the search regions in the second set the search region which corresponds to the darkest region in the image; and determining values of the second coordinate for the eye positions in dependence upon the positions of the spatially separated areas within the identified search region of the second set.
  27. A method according to any of claims 21 to 26, wherein the search regions are defined with overlaps in the three-dimensional space.
  28. A method according to any of claims 21 to 27, wherein each search region is a two-dimensional area of the three-dimensional space.
  29. A method according to any of claims 21 to 28, wherein the processing to determine which search regions project onto dark regions in the image includes: using the read image values to calculate a respective representative image value for each search region; spatially smoothing the representative image values; and identifying search regions which project onto dark regions in the image using the smoothed representative image values.
  30. A method according to claim 29, wherein an average value of the pixel image values is calculated as the representative value for each search region.
  31. A method of processing data in a processing apparatus defining a sequence of images of a person as the person's head moves, to track the eyes of the person in the images, the method comprising processing each of at least some of the images in the sequence by: determining the image values of pixels located in the image at positions defined by the projection of search bands or points therein from a three-dimensional space into the two-dimensional image, wherein the search bands have different positions in the three-dimensional space so as to divide into parts a portion of the three-dimensional space, and wherein the projection is dependent upon the position and orientation of the search bands in the three-dimensional space resulting from the processing of the previous image in the sequence to be processed; calculating a respective representative image value for each search band using the determined pixel image values; calculating positions for the eyes in the three-dimensional space by using the representative values to identify search bands corresponding to dark regions in the image and in dependence upon the relative positions of the identified search bands; and moving the search bands within the three-dimensional space in dependence upon the calculated positions for the eyes, so as to move the search bands for subsequent processing of the next image in the sequence to be processed.
  32. A method according to any preceding claim, wherein the processing of each image further comprises: processing the image data to determine a position of the head in the image; and moving the three-dimensional computer model of the head and the associated search regions within the three-dimensional space in dependence upon the determined position.
  33. A method according to any preceding claim, wherein the processing of each image further comprises: processing the image data to determine a size of the head in the image; and moving the three-dimensional computer model of the head and the associated search regions within the three-dimensional space in dependence upon the determined size.
  34. A method according to any preceding claim, further comprising the step of defining the search regions in the three-dimensional space.
  35. A method according to any preceding claim, further comprising calculating a gaze direction for each of at least some of the images.
  36. A method according to claim 35, further comprising transmitting a signal conveying the determined gaze direction.
  37. A method according to claim 36, further comprising making a recording of the signal either directly or indirectly.
  38. Image processing apparatus for processing image data of at least some images in a sequence of images of a moving head to perform eye tracking across the sequence, the apparatus comprising: means for storing data defining a three-dimensional computer model representative of at least part of a head in a three-dimensional space, and data defining search regions having different positions in a portion of the three-dimensional space corresponding to the eyes and eyebrows of the model; pixel identifying means for identifying pixels in an image corresponding to the search regions, the pixel identifying means being operable to identify pixels in dependence upon a transformation between the three-dimensional space and the two-dimensional image; means for determining image values of the identified pixels; means for determining eye positions in the three-dimensional space in dependence upon the determined image values; and means for rotating the three-dimensional computer head model in dependence upon the determined positions for the eyes, so as to move the search regions within the three-dimensional space for subsequent processing of the next image in the sequence to be processed.
  39. Apparatus according to claim 38, wherein the means for determining eye positions in the three-dimensional space comprises: means for generating at least one respective representative image value for each search region using the read image values; and means for determining eye positions in the three-dimensional space in dependence upon the generated representative image values for the search regions.
  40. Apparatus according to claim 38 or claim 39, wherein the pixel identifying means is operable to project sample points defined in the three-dimensional space within the search regions into the image, and to identify the pixels onto which the sample points project.
  41. Apparatus according to claim 38 or claim 39, wherein the pixel identifying means is operable to project points from the three-dimensional space into the image to generate projected points, to define projected search regions in the image in dependence upon the projected points, and to identify pixels in the projected search regions.
  42. Apparatus according to any of claims 38 to 41, wherein each search region is a two-dimensional area of the three-dimensional space.
  43. Apparatus according to any of claims 38 to 41, wherein each search region is a line in the three-dimensional space.
  44. Apparatus for processing image data of each of at least some images in a sequence of images of a person as the person's head moves, to track the eyes of the person in the images, the apparatus comprising: means for identifying pixels located in the image at positions defined by the projection of search bands or points therein from a three-dimensional space into the two-dimensional image, wherein the search bands have different positions in the three-dimensional space, and wherein the projection is dependent upon the position and orientation of the search bands in the three-dimensional space resulting from the processing of the previous image in the sequence to be processed; means for determining image values of the identified pixels; means for calculating positions for the eyes in the three-dimensional space in dependence upon the determined image values; and means for moving the search bands within the three-dimensional space in dependence upon the calculated positions for the eyes, so as to move the search bands for subsequent processing of the next image in the sequence to be processed.
  45. Apparatus for processing image data of each of at least some images in a sequence of images of a moving head to perform eye tracking across the sequence, the apparatus comprising: means for storing data defining a three-dimensional computer model representative of at least part of a head in a three-dimensional space, and data defining search regions having different positions in a portion of the three-dimensional space corresponding to the eyes and eyebrows of the model; pixel identifying means for identifying pixels in the image corresponding to the search regions, the pixel identifying means being operable to identify pixels in dependence upon a transformation between the three-dimensional space and the two-dimensional image; means for determining image values of the identified pixels; means for generating an image value profile from the image values representative of the variation in image value with variation in search region position; means for comparing the generated image value profile with a reference profile to determine eye positions in the three-dimensional space; and means for rotating the three-dimensional computer head model in dependence upon the determined positions for the eyes, so as to move the search regions within the three-dimensional space for subsequent processing of the next image in the sequence to be processed.
  46. Apparatus according to claim 45, wherein the pixel identifying means is operable to identify pixels by projecting sample points defined in the three-dimensional space within the search regions into the image, and identifying the pixels onto which the sample points project.
  47. Apparatus according to claim 45, wherein the pixel identifying means is operable to identify pixels by projecting points from the three-dimensional space into the image to generate projected points, defining projected search regions in the image in dependence upon the projected points, and identifying pixels in the projected search regions.
  48. Apparatus according to any of claims 45 to 47, wherein each search region comprises a line in the three-dimensional space of the computer model.
  49. Apparatus according to claim 48, wherein the spacing between adjacent search regions in the three-dimensional space is such that, when the search regions are transformed into a two-dimensional image in accordance with the transformation, the spacing between adjacent transformed search regions is less than or equal to one pixel in the image.
  50. Apparatus according to any of claims 45 to 49, wherein the search regions extend in a transverse direction to a line through an eye and an eyebrow of the three-dimensional computer model.
  51. Apparatus according to claim 50, wherein the search regions extend in a perpendicular direction to the line through the eye and the eyebrow.
  52. Apparatus according to claim 50 or claim 51, wherein the search regions intersect the line through the eye and the eyebrow at different respective positions therealong.
  53. Apparatus according to any of claims 45 to 52, wherein: the pixel identifying means is operable to identify a plurality of pixels in the image for each respective search region; and the means for generating an image value profile is operable to process the plurality of read image values for each respective search region to calculate a respective representative image value for each search region.
  54. Apparatus according to claim 53, wherein the means for generating an image value profile is operable to calculate a respective average value of the plurality of image values as the representative value for each search region.
  55. Apparatus according to any of claims 45 to 54, wherein the means for comparing the generated image value profile with the reference profile is operable to compare the generated profile and reference profile for different relative positions thereof.
  56. Apparatus according to claim 55, wherein the means for comparing the generated image value profile with the reference profile is operable to compare the generated profile and reference profile for different relative positions thereof to determine the match therebetween for each relative position and to determine the relative position for which the match between the generated profile and reference profile is greatest.
  57. Apparatus for processing image data of each of at least some images in a sequence of images of a person as the person's head moves, to track the eyes of the person in the images, the apparatus comprising: means for identifying pixels located in the image at positions defined by the projection of search bands or positions therein from a three-dimensional space into the two-dimensional image, wherein the search bands have different positions in the three-dimensional space, and wherein the projection is dependent upon the position and orientation of the search bands in the three-dimensional space resulting from the processing of the previous image in the sequence to be processed; means for determining image values of the identified pixels; means for generating at least one representative image value for each search band using the determined pixel image values; means for comparing the generated representative image values with reference image values representative of image values for an eye and an eyebrow to calculate positions for the eyes in the three-dimensional space; and means for moving the search bands within the three-dimensional space in dependence upon the calculated positions for the eyes, so as to move the search bands for subsequent processing of the next image in the sequence to be processed.
  58. Apparatus for processing image data of at least some images in a sequence of images of a moving head to perform eye tracking across the sequence, the apparatus comprising: means for storing data defining a three-dimensional computer model representative of at least part of a head in a three-dimensional space, and data defining search regions dividing into parts a portion of the three-dimensional space corresponding to the eyes and eyebrows of the model; pixel identifying means for identifying pixels in the image in dependence upon the search regions and in dependence upon data defining a transformation between the three-dimensional space and the two-dimensional image; means for determining image values of the identified pixels; search region selection means for determining which of the search regions project onto dark regions in the image in dependence upon the image values; eye position determining means for determining eye positions in the three-dimensional space in dependence upon the relative positions of search regions determined to project onto dark regions of the image; and means for rotating the three-dimensional computer head model in dependence upon the determined positions for the eyes, so as to move the search regions within the three-dimensional space for subsequent processing of the next image in the sequence to be processed.
  59. Apparatus according to claim 58, wherein the pixel identifying means is operable to identify pixels by projecting sample points defined in the three-dimensional space within the search regions into the image, and identifying the pixels onto which the sample points project.
  60. Apparatus according to claim 58 or claim 59, wherein: the search regions include first and second groups of search regions, the first group dividing into parts a portion of the three-dimensional space extending over a first two of the left eyebrow, left eye, right eyebrow and right eye of the three-dimensional computer model, and the second group dividing into parts a portion of the three-dimensional space extending over a second two of the left eyebrow, left eye, right eyebrow and right eye of the three-dimensional computer model; the search region selection means is operable to identify first and second search regions which correspond to dark regions in the image within each group; and the eye position determining means is operable to determine positions for the eyes in dependence upon the relative positions of the identified search regions within each group.
  61. Apparatus according to claim 60, wherein: the search regions include first and second groups of search regions, the first group dividing into parts a portion of the three-dimensional space extending from above the left eyebrow to below the left eye of the three-dimensional computer model, and the second group dividing into parts a portion of the three-dimensional space extending from above the right eyebrow to below the right eye of the three-dimensional computer model; the search region selection means is operable to identify first and second search regions corresponding to dark regions in the image within each group; and the eye position determining means is operable to determine positions for the eyes in dependence upon the position of the identified search region within each group which corresponds to a lower position on the three-dimensional computer head model.
  62. Apparatus according to claim 60 or claim 61, wherein: the search regions include first and second sets of search regions; the first set of search regions contains first and second groups of search regions, the first group dividing into parts a portion of the three-dimensional space extending over a first two of the left eyebrow, left eye, right eyebrow and right eye of the three-dimensional computer model and the second group dividing into parts a portion of the three-dimensional space extending over a second two of the left eyebrow, left eye, right eyebrow and right eye of the three-dimensional computer model; within the second set of search regions, each search region comprises a plurality of spatially separated parts having relative positions dependent upon the spacing of the eyes and eyebrows in the three-dimensional computer model; the apparatus is operable to perform processing to determine values of a first coordinate of the eye positions by: identifying first and second search regions within each group of the first set corresponding to dark regions in the image; and determining values of the first coordinate for the eye positions in dependence upon the relative positions of the identified search regions within each group in the first set; and the apparatus is operable to perform processing to determine values for a second coordinate of the eye positions by: identifying a search region in the second set corresponding to dark regions in the image; and determining values of the second coordinate for the eye positions in dependence upon the positions of the spatially separated parts within the identified search region of the second set.
  63. Apparatus according to claim 62, wherein: the plurality of search regions in the second set divide into parts a portion of the three-dimensional space extending from outside the left eye to outside the right eye of the three-dimensional computer model; and the apparatus is operable to perform processing to determine the values of the second coordinate by: selecting from among the search regions in the second set the search region which corresponds to the darkest region in the image; and determining values of the second coordinate for the eye positions in dependence upon the positions of the spatially separated areas within the identified search region of the second set.
  64. Apparatus according to any of claims 58 to 63, wherein the search regions are defined with overlaps in the three-dimensional space.
  65. Apparatus according to any of claims 58 to 64, wherein each search region is a two-dimensional area of the three-dimensional space.
  66. Apparatus according to any of claims 58 to 65, wherein the search region selection means includes: representative-value calculating means for calculating a respective representative image value for each search region using the read image values; means for spatially smoothing the representative image values; and means for identifying search regions which project onto dark regions in the image using the smoothed representative image values.
  67. Apparatus according to claim 66, wherein the representative-value calculating means is operable to calculate an average value of the pixel image values as the representative value for each search region.
  68. Apparatus for processing image data of each of at least some images in a sequence of images of a person as the person's head moves, to track the eyes of the person in the images, the apparatus comprising: means for identifying pixels located in the image at positions defined by the projection of search bands or points therein from a three-dimensional space into the two-dimensional image, wherein the search bands have different positions in the three-dimensional space so as to divide into parts a portion of the three-dimensional space, and wherein the projection is dependent upon the position and orientation of the search bands in the three-dimensional space resulting from the processing of the previous image in the sequence to be processed; means for determining image values of the identified pixels; means for calculating a respective representative image value for each search band using the determined pixel image values; means for calculating positions for the eyes in the three-dimensional space by using the representative values to identify search bands corresponding to dark regions in the image and in dependence upon the relative positions of the identified search bands; and means for moving the search bands within the three-dimensional space in dependence upon the calculated positions for the eyes, so as to move the search bands for subsequent processing of the next image in the sequence to be processed.
  69. Apparatus according to any of claims 38 to 68, further comprising: means for processing the image data to determine a position of the head in the image; and means for moving the three-dimensional computer model of the head and the associated search regions within the three-dimensional space in dependence upon the determined position.
  70. Apparatus according to any of claims 38 to 69, further comprising: means for processing the image data to determine a size of the head in the image; and means for moving the three-dimensional computer model of the head and the associated search regions within the three-dimensional space in dependence upon the determined size.
  71. Apparatus according to any of claims 38 to 70, further comprising means for defining the search regions in the three-dimensional space.
  72. Apparatus according to any of claims 38 to 71, further comprising means for calculating a gaze direction for each of at least some of the images.
  73. A storage medium storing instructions for programming a programmable processing apparatus to become operable to perform a method as set out in at least one of claims 1 to 35.
  74. A signal carrying instructions for programming a programmable processing apparatus to become operable to perform a method as set out in at least one of claims 1 to 35.
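
Illustrative sketch (not part of the claims): the processing recited in claims 45, 46 and 53 to 56 can be pictured as projecting sample points from each search line into the image, averaging the pixel values under each line to form a profile, and sliding a reference profile along it to find the best match. The pinhole projection, the parameters `focal`, `cx` and `cy`, the function names, and the sum-of-squared-differences match measure are all assumptions made for this example; the claims do not specify them.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Image = Sequence[Sequence[float]]  # image[row][col] holds a grey value


def project(point: Point3D, focal: float, cx: float, cy: float) -> Tuple[int, int]:
    """Assumed pinhole projection of a 3D point to the nearest pixel (col, row)."""
    x, y, z = point
    return round(focal * x / z + cx), round(focal * y / z + cy)


def band_profile(image: Image, bands: Sequence[Sequence[Point3D]],
                 focal: float, cx: float, cy: float) -> List[float]:
    """Average the pixel values under each search line's projected sample points."""
    profile = []
    for samples in bands:
        values = []
        for p in samples:
            col, row = project(p, focal, cx, cy)
            # Ignore sample points that project outside the image.
            if 0 <= row < len(image) and 0 <= col < len(image[0]):
                values.append(image[row][col])
        profile.append(sum(values) / len(values) if values else 0.0)
    return profile


def best_offset(profile: Sequence[float], reference: Sequence[float]) -> int:
    """Slide the reference along the profile and return the offset with the
    smallest sum of squared differences (the assumed measure of best match)."""
    best, best_cost = 0, float("inf")
    for offset in range(len(profile) - len(reference) + 1):
        cost = sum((profile[offset + i] - r) ** 2 for i, r in enumerate(reference))
        if cost < best_cost:
            best, best_cost = offset, cost
    return best
```

In this sketch the returned offset identifies which search line aligns with the start of the reference profile; a tracker along the lines of claim 45 would then move or rotate the head model so that the search lines follow the eyes into the next image.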
GB0229264A 2002-02-04 2002-12-16 Eye tracking using image data Expired - Fee Related GB2384959B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/358,067 US7197165B2 (en) 2002-02-04 2003-02-04 Eye tracking using image data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GBGB0202520.3A GB0202520D0 (en) 2002-02-04 2002-02-04 Eye tracking in image data

Publications (3)

Publication Number Publication Date
GB0229264D0 GB0229264D0 (en) 2003-01-22
GB2384959A true GB2384959A (en) 2003-08-06
GB2384959B GB2384959B (en) 2005-10-26

Family

ID=9930341

Family Applications (2)

Application Number Title Priority Date Filing Date
GBGB0202520.3A Ceased GB0202520D0 (en) 2002-02-04 2002-02-04 Eye tracking in image data
GB0229264A Expired - Fee Related GB2384959B (en) 2002-02-04 2002-12-16 Eye tracking using image data

Country Status (1)

Country Link
GB (2) GB0202520D0 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015110852A1 (en) * 2014-01-24 2015-07-30 Sony Corporation Face tracking for a mobile device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115798019B (en) * 2023-01-06 2023-04-28 山东星科智能科技股份有限公司 Computer vision-based intelligent early warning method for practical training driving platform

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002009025A1 (en) * 2000-07-24 2002-01-31 Seeing Machines Pty Ltd Facial image processing system
WO2002089064A1 (en) * 2001-04-27 2002-11-07 Smart Eye Ab Method for automatic tracking of a moving body

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015110852A1 (en) * 2014-01-24 2015-07-30 Sony Corporation Face tracking for a mobile device
US9958938B2 (en) 2014-01-24 2018-05-01 Sony Corporation Gaze tracking for a mobile device

Also Published As

Publication number Publication date
GB0202520D0 (en) 2002-03-20
GB2384959B (en) 2005-10-26
GB0229264D0 (en) 2003-01-22

Similar Documents

Publication Publication Date Title
US7197165B2 (en) Eye tracking using image data
Yenamandra et al. i3dmm: Deep implicit 3d morphable model of human heads
CN112766160B (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
CN111428586B (en) Three-dimensional human body posture estimation method based on feature fusion and sample enhancement
US6072903A (en) Image processing apparatus and image processing method
KR100682889B1 (en) Method and Apparatus for image-based photorealistic 3D face modeling
US5805745A (en) Method for locating a subject's lips in a facial image
US7369687B2 (en) Method for extracting face position, program for causing computer to execute the method for extracting face position and apparatus for extracting face position
EP1223540A2 (en) Method and apparatus for collating object
US20240037852A1 (en) Method and device for reconstructing three-dimensional faces and storage medium
Brunelli Estimation of pose and illuminant direction for face processing
CN112950775A (en) Three-dimensional face model reconstruction method and system based on self-supervision learning
Jones et al. Model-based matching by linear combinations of prototypes
CN113570684A (en) Image processing method, image processing device, computer equipment and storage medium
CN113160275B (en) Automatic target tracking and track calculating method based on multiple videos
CN115239861A (en) Face data enhancement method and device, computer equipment and storage medium
CN108615256A (en) A kind of face three-dimensional rebuilding method and device
CN112101195A (en) Crowd density estimation method and device, computer equipment and storage medium
Khan et al. An efficient encoder–decoder model for portrait depth estimation from single images trained on pixel-accurate synthetic data
CN115147261A (en) Image processing method, device, storage medium, equipment and product
Phillips et al. Bayesian faces via hierarchical template modeling
Kar Mastering Computer Vision with TensorFlow 2.x: Build advanced computer vision applications using machine learning and deep learning techniques
GB2384959A (en) Eye tracking using image data with model data
Tous Pictonaut: movie cartoonization using 3D human pose estimation and GANs
KR20040042501A (en) Face detection based on template matching

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20141216