GB2396003A - System for comparing colour images


Info

Publication number
GB2396003A
GB2396003A
Authority
GB
United Kingdom
Prior art keywords
image
data
colour
images
stored
Prior art date
Legal status
Granted
Application number
GB0223761A
Other versions
GB0223761D0
GB2396003B
Inventor
Richard Ian Taylor
Current Assignee
Canon Inc
Original Assignee
Canon Inc
Priority date
Filing date
Publication date
Application filed by Canon Inc
Publication of GB0223761D0
Publication of GB2396003A
Application granted
Publication of GB2396003B
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/90 Determination of colour characteristics
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/24 Aligning, centring, orientation detection or correction of the image
    • G06V 10/245 Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G06V 40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A method and apparatus for comparing colour images independently of the illumination of a subject object in said images comprises: means for receiving image data comprising colour data for a plurality of pixels representing colours in an image; a processing unit for generating a colour reflectance image independent of illumination colour; a comparator operable to compare generated colour reflectance images with stored image data. Said correspondence is determined on the basis of the comparison of ratios of colour data for pixels in the generated colour reflectance image and ratios of colour data in said stored images. The reflectance image may be generated by a Retinex processing algorithm. The colour comparison system may be used in a system for determining gaze direction of a computer operator.

Description

GAZE TRACKING SYSTEM
The present application relates to gaze tracking using computer processing of image data.
There are many applications which require information about where an operator is looking. Two examples are using the determined gaze of an operator to effect cursor control or icon selection.
In a further example, if the orientation and direction of an operator's head can be determined, data representing this orientation and direction can be transmitted and used to animate a virtual representation of that operator. The virtual representation of the operator can then be shown to interact with other virtual representations, with the representation of the operator being made to look at appropriate portions of the other virtual representations. In this way the interaction of groups of people may be more accurately represented.
An example of a system in which an operator's gaze is tracked and used to animate virtual representations of an operator is described in GB-A-2351216. In GB-A-2351216 the gaze of an operator is tracked through the use of markers placed on the operator's head. With the markers in place, images of the operator are then recorded. These images are then processed to identify the relative positions of the markers in each image. These relative positions are then used to calculate the orientation of the operator's head and hence the location of the operator's gaze, which is used to determine how a virtual representation of the operator is animated and interacts with other virtual representations. Although using markers simplifies the tracking of gaze, requiring an operator to place markers on their head is undesirable.
An alternative image processing system which avoids the need for such markers is therefore required. In such a system, it is desirable that the tracking system can be used in circumstances where the operating environment does not have to be strictly controlled. That is to say, the tracking system does not require lighting levels to remain substantially fixed and does not overly limit the allowed movement of an operator whose gaze is being tracked. It is also desirable that the system is able to track the gaze of a variety of users and that the tracking of gaze can be achieved without excessive demands for training data.
In accordance with one aspect of the present invention there is provided an apparatus for comparing colour images independently of the illumination of a subject object in said images, comprising: a receiver operable to receive image data representative of images, said image data comprising colour data for a plurality of pixels indicative of colours of a subject object appearing in said images; a processing unit operable to derive from image data received by said receiver a colour reflectance image, said colour reflectance image comprising colour data for said plurality of pixels indicative of the contribution to the colour of a subject object in an image not arising due to the colour of the illumination of the subject object in said image, said processing unit being operable to derive said colour reflectance image such that the ratios of colour data of pixels in said colour reflectance image for a said subject object are independent of the colour of the illumination of said subject object in said image; and a comparator operable to compare generated colour reflectance images with stored image data to determine the correspondence between said stored data and said generated images;
wherein said correspondence is determined on the basis of the comparison of the ratios of colour data for pixels in a said generated colour reflectance image and corresponding ratios of colour data for pixels of said stored image data.
In accordance with a further aspect of the present invention there is provided a method of comparing colour images independently of the illumination of a subject object in said images, comprising the steps of: receiving image data representative of images, said image data comprising colour data for a plurality of pixels indicative of the colours of a subject object appearing in said images; processing a received image to derive from received image data a colour reflectance image, said colour reflectance image comprising colour data for said plurality of pixels indicative of the contribution to the colour of a subject object in an image not arising due to the colour of the illumination of the subject object in said image, said processing being such as to derive said colour reflectance image such that the ratios of colour data of pixels in said colour reflectance image for a said subject object are independent of the colour of the illumination of said subject object in said image; and
comparing generated colour reflectance images with stored image data to determine the correspondence between said stored data and said generated images; wherein said correspondence is determined on the basis of the comparison of the ratios of colour data for pixels in a said generated colour reflectance image and corresponding ratios of colour data for pixels of said stored image data.
Embodiments and applications of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a schematic perspective view of a user operating a computer system incorporating a gaze tracking system embodying the present invention;
Figure 2 is a schematic block diagram of the functional components of the computer system of Figure 1;
Figure 3 is an illustration of areas of an image utilised to generate a set of image patches for processing by the classification unit of Figure 2;
Figure 4 is an exemplary illustration of a feature vector passed from the first self-organising map to the second self-organising map of Figure 2;
Figures 5A and 5B are schematic illustrations for explaining the generation of tables representing feature vectors received by the second self-organising map of Figure 2;
Figure 6 is a flow diagram of the processing operations performed by the image processing module of Figure 2;
Figure 7 is a flow diagram of the processing operations performed by the first self-organising map of Figure 2;
Figure 8 is a schematic illustration of a self-organising map for explaining the interaction of the update module, the patch matching module and the feature image store of Figure 2;
Figure 9 is a flow diagram of the processing operations performed by the second self-organising map of Figure 2;
Figures 10A, 10B, 10C and 10D are schematic illustrations for explaining the relative ordering of points using data identifying adjacent points;
Figure 11 is a flow diagram of the processing operations performed by the calibration unit of Figure 2;
Figure 12 is a schematic illustration of data stored in the conversion table of Figure 2; and
Figure 13 is a flow diagram of the processing operations performed by the co-ordinate update unit of Figure 2 for updating data stored in the conversion table of Figure 2.
First Embodiment
Referring to Figure 1, a computer 1 is provided which is connected to a display 3, a keyboard 5 and a mouse 7.
Also connected to the computer 1 is a camera 8 that is oriented to obtain images of an operator 9 inputting data into the computer using the mouse 7 and the keyboard 5.
The computer 1 is programmed to operate in accordance with programming instructions input, for example, as data stored on a storage medium such as a disc 10 or downloaded as a signal from a network such as the Internet.
As will be described in more detail below, the programming instructions comprise instructions to cause the computer 1 to be configured to process image frames obtained via the camera 8 showing the operator 9, so as to associate the image frames with estimated gaze co-ordinates of the location at which the operator 9 was looking when the images were recorded by the camera 8.
These calculated co-ordinates can then be utilised by an application program to, for example, animate a virtual representation of the operator 9.
As will be described in detail later, the processing of the computer 1 associates image frames with estimated gaze co-ordinates primarily on the basis of the content of the recorded images and the relative timing at which the images were recorded. Occasionally, images of the operator 9 will be obtained when the operator 9 uses a pointer 11 under the control of the mouse 7 to select a specific portion of the display 3. In this embodiment, gaze co-ordinates generated by the computer 1 are further refined using data identifying such operator-selected portions of the display 3 when this is available.
System Overview
Referring to Figure 2, when programmed by the programming instructions by being read from a disc 10 or by being downloaded as signal 15 over a network or the Internet, the computer 1 effectively becomes configured into a number of functional units for performing processing operations. Examples of such functional units and their interconnections are shown in Figure 2. The units and the interconnections illustrated in Figure 2 are, however, notional and are shown for illustrative purposes only to assist understanding; they do not necessarily represent the exact units and connections into which the processors, memory etc. of the computer 1 become configured.
Referring to the functional units shown in Figure 2, in this embodiment the programming instructions cause the computer to be configured into three main functional units: a classification unit 20, a calibration unit 22 and an application program 23.
The classification unit 20 receives a stream of camera images from the camera 8, each of which comprises, in this embodiment, 176 by 144 pixels. Each image is processed in turn by the classification unit 20 and assigned a classification number. As soon as an image has been processed its classification number is passed to the calibration unit 22. Once a classification number has been passed to the calibration unit 22, the classification unit 20 then proceeds to process the next image in the stream of images received from the camera 8. Meanwhile, the calibration unit 22 utilises the received classification number to generate gaze co-ordinates indicative of a gaze location. When gaze co-ordinates have been determined by the calibration unit 22, these are passed on to the application program 23.
The application program 23 then utilises the gaze co-ordinates whilst the calibration unit 22 proceeds to process the classification number for the next image frame.
The processing of the classification unit 20 is such that, after an initial set up period, images corresponding to a user looking in the same area are assigned the same classification number. This initial set up period does not correspond to the processing of a fixed number of frames. Rather, after processing increasing numbers of frames, the classification unit 20 will increasingly assign the same classification numbers to images corresponding to a user looking in the same area. The exact number of images which the classification unit 20 requires to achieve this consistent classification of images will depend upon the content of the images being processed.
In this embodiment, the classification unit 20 is arranged to associate each image with one of fifty different classification numbers. The classification of an image frame by the classification unit 20 therefore identifies that the image frame corresponds to a user looking at one of fifty distinct areas.
As will be described in detail later, the processing of the classification unit 20 is not such that the relative locations of the areas represented by the fifty different classifications can be determined in advance. However, after the initial set up period, the calibration unit 22 is able to determine the relative locations represented by the different classifications utilising the classification numbers assigned to consecutive images in the stream of camera images from the camera 8.
This determination of relative gaze position is achieved because an operator 9 is only able to move his head a certain distance in the time between image frames obtained by the camera 8 (because one frame is recorded by the camera 8 every 1/25th of a second in this embodiment). By ensuring that the classification unit 20 classifies an operator's gaze in terms of areas which a user's gaze will not cross in 1/25th of a second, the calibration unit 22 is then able to assume that consecutive numbers in the stream of numbers received from the classification unit 20 correspond to a user looking at adjacent areas. As will be described in detail later, by monitoring which numbers correspond to adjacent areas, the calibration unit 22 generates records associating each number output by the classification unit 20 with a gaze position which is correct relative to the positions for the other classification numbers.
In order for the calibration unit 22 to match the numbers received from the classification unit 20 to absolute co-ordinate positions, in this embodiment the calibration unit 22 is also arranged to monitor the mouse 7 to identify image frames which correspond to points in time when the operator 9 clicks on the mouse 7. In general, when an operator 9 clicks on the mouse 7, the operator 9 will be looking at the point on the screen corresponding to the pointer 11 controlled by the mouse 7. This therefore provides absolute information about the operator's gaze position on the screen at that point in time. The calibration unit 22 then utilises this data to associate absolute positions with the fifty different image classification numbers received from the classification unit 20.
The constitution of classification unit 20 and calibration unit 22 will now each be described.
CLASSIFICATION UNIT
In order for the images received from the camera 8 to be processed by the classification unit 20 in a consistent manner, a number of problems must be overcome.
Firstly, received images must be processed by the classification unit 20 in a way which accounts for changes in images which arise due to variation in lighting and the movement of an operator 9 towards and away from the display 3, changes which do not indicate that the operator 9 is looking at a different part of the screen but which will result in differences in the images recorded by the camera 8.
Further, the classification unit 20 must be able to process images received from the camera 8 to identify similar images which correspond to the operator looking at the same part of the display 3. Information in the form of mouse click data about the actual location of an operator's gaze is only available for relatively few frames. That is to say, very few images are directly associated with information relating to an operator's gaze which could be used as training examples to ensure that the classifications of images by the classification unit 20 are accurate. It is therefore important that the classification unit 20 can achieve classification of images utilising all the available images and not just the few images associated with mouse click data.
To this end, in this embodiment, the classification unit 20 comprises an image processing module 24, a first self-organising map 26 and a second self-organising map 28.
In this embodiment, the first and second self-organising maps 26, 28 comprise unsupervised learning systems of the type described in detail in 'Self-Organizing Maps', T. Kohonen, 3rd Edition, Springer, 2001, ISBN 3-540-67921-9, which is hereby incorporated by reference.
An overview of the processing of the image processing module 24 and the first and second self-organising maps 26, 28 of the classification unit 20 will now be described.
a) Image Processing Module
In this embodiment, the image processing module 24 comprises: a lighting normalization module 30 for substantially removing lighting effects from the images received from the camera 8, and a patch generation module 32 for identifying portions of each image which are likely to indicate changes resulting from the change of gaze of an operator 9 and for outputting image patches based on areas of an image frame corresponding to these portions. The image patches extracted by the image processing module 24 for each image frame are then passed to the first self-organising map 26.
Specifically, and as will be described in detail later, the patch generation module 32 initially compares the value of each pixel in the current image frame from which lighting effects have been removed with the value of the corresponding pixel in the previous image frame from which lighting effects were removed. Where these pixel values differ by more than a predetermined threshold, the patch generation module 32 then proceeds to generate a set of image patches for the pixel using differently sized areas of the image centred on that pixel.
Thus, for example, as is illustrated in Figure 3, after determining that a pixel 55 has varied compared with the corresponding pixel in the previous image frame, the patch generation module 32 proceeds to utilise three differently sized areas 56, 57 and 58 to generate three image patches for subsequent processing. In this embodiment each generated image patch comprises 31 by 31 pixels, where one of the image patches is a copy of the 31 by 31 pixel area 57 centred on the identified pixel 55, one of the image patches is generated from the pixels in a smaller area 56 and the other is generated from the pixels in a larger area 58. The effect of generating three image patches from three differently sized areas of the image is to generate patches at different scales, to take account of the different sizes at which the face of the operator 9 may appear in an image because of the operator's movement towards or away from the camera 8.
The processing by the image processing module 24 therefore generates three image patches associated with each of the identified pixels, which normally will correspond to points representing parts of the operator 9. Normally, at least one patch generated for each pixel will correspond to the scale utilised by the first self-organising map 26 for performing an initial classification of an image. This processing therefore allows the effects of an operator 9 moving towards or away from the camera 8 to be removed.
b) First Self-Organising Map
Returning to Figure 2, the first self-organising map 26 comprises: a patch matching module 34, a feature image store 36 and an update module 38.
Specifically, the feature image store 36 comprises a data store storing a 15 by 15 array of feature images, each feature image being a 31 by 31 pixel image (so that each stored feature image is the same size as an image patch received from the image processing module 24). The feature images initially each comprise random pixel values but, as will be described in detail later, the stored feature images are modified based upon the content of the image patches received from the image processing module 24 and, after an initial set up period, become images representative of a selection of different image patches received from the image processing module 24.
The patch matching module 34 is arranged to process each image patch received from the image processing module 24 to identify the feature image in the feature image store 36 which most closely corresponds to that image patch, and to determine a match score indicative of the closeness of the correspondence between the image patch and the identified closest feature image.
Each of the pixels for which image patches were generated is then classified by associating the pixel with the feature image which most closely matches an image patch generated for that pixel. The patch matching module 34 then makes a selection of the pixels utilising the match scores and uses the selection to generate a feature vector which is passed to the second self-organising map 28.
Figure 4 is an illustrative example of the data structure of a feature vector passed by the first self-organising map 26 to the second self-organising map 28. In this embodiment each feature vector passed by the first self-organising map 26 for each image frame comprises data defining each of the selected pixels in the form of a pixel number 60 identifying the pixel in the image, and data 62 comprising a feature number for each selected pixel identifying the feature image within the array of feature images in the feature image store 36 to which an image patch generated from an area centred on the identified pixel most closely corresponds.
Thus, for example, the following feature vector might be passed to the second self-organising map 28 for a particular image frame:

Pixel No.    Feature No.
3,572        35
3,625        14
4,945        102
10,674       200
12,745       107
20,982       40
24,003       225
where 1 is the pixel number for the top left corner of an image frame, 25,344 is the pixel number for the bottom right hand corner of the image frame (25,344 = 176 x 144), and the feature number for an image patch matched to the feature image stored as the i,jth image in the 15 by 15 array of feature images stored in the feature image store 36 is i + 15(j - 1).
Thus, for example, the first entry in the above feature vector indicates that an image patch centred on pixel number 3,572, that is the 52nd pixel on the 21st line of a 176 by 144 pixel image, was for this example determined to most closely correspond to the 35th feature image, that is the feature image stored as the fifth image in the third row of the array of feature images in the feature image store 36.
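As an illustration of the indexing conventions just described, the following is a minimal sketch of the conversion between pixel numbers, image lines and columns, and feature numbers. The helper names are illustrative only and do not appear in the patent.

```python
# Minimal sketch of the pixel and feature numbering used in the text.
IMAGE_WIDTH, IMAGE_HEIGHT = 176, 144   # camera frame size in this embodiment
GRID_SIZE = 15                          # 15 x 15 array of feature images

def pixel_number_to_row_col(pixel_number: int) -> tuple[int, int]:
    """Convert a 1-based pixel number (1 = top left, 25,344 = bottom right)
    into a 1-based (line, column) pair."""
    index = pixel_number - 1
    return index // IMAGE_WIDTH + 1, index % IMAGE_WIDTH + 1

def feature_number(i: int, j: int) -> int:
    """Feature number of the i,jth feature image, i.e. i + 15(j - 1)."""
    return i + GRID_SIZE * (j - 1)

# The worked example from the text: pixel 3,572 is the 52nd pixel on the
# 21st line, and feature image (i=5, j=3) has feature number 35.
assert pixel_number_to_row_col(3572) == (21, 52)
assert feature_number(5, 3) == 35
```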
For each frame, immediately after a feature vector for the frame has been output, a selection of the generated image patches received from the image processing module 24 for the frame is utilised by the update module 38 to update the feature images in the feature image store 36.
As will be described in detail later, this processing is such that the random images initially stored in the feature image store 36 are updated so that the feature images in the feature image store 36 become increasingly representative of a selection of the image patches received from the image processing module 24. The first self-organising map 26 is therefore able to classify image frames in terms of areas of an image frame resembling particular representative feature images in the feature image store 36. It is this initial classification of images, in terms of areas identified by pixel numbers 60 and feature images identified by feature numbers 62, which is output from the first self-organising map 26 as a feature vector for processing by the second self-organising map 28.
c) Second Self-Organising Map
Returning to Figure 2, the second self-organising map 28 in this embodiment comprises a table generation module 40, a matching module 42, a table store 44 and an update module 46.
Initially, as will be described in detail later, the table generation module 40 processes each received feature vector to generate a table representation of that feature vector. Specifically, as is illustrated in Figure 5A, for each received feature vector the table generation module 40 initially generates a table comprising a set of 0's, where each row in the table comprises a number of 0's corresponding to the number of feature images in the feature image store 36 and the number of rows in the table is equal to the number of pixels in an original image received from the camera 8.
Thus, in the present embodiment, where the classification unit 20 is arranged to process images comprising 176 x 144 pixels and the feature image store 36 is arranged to store a 15 x 15 array of feature images, the initial table generated by the table generation module 40 comprises an m x n array of 0's where m = 176 x 144 and n = 15 x 15.
The table generation module 40 then takes the first pixel number 60 and feature number 62 of the received feature vector and changes the jth entry of the kth row of the generated table to 1, where j is equal to the feature number 62 and k is equal to the pixel number 60 taken from the feature vector.
Figure 5B is a schematic illustration of the table generated by the table generation module 40 after the jth entry of the kth row of the table has been changed to a 1.
This process is then repeated for each of the other pairs of pixel number 60 and feature number 62 forming the rest of the feature vector received from the first self-organising map 26. Thus, as a result, a sparsely filled table is generated comprising a large number of entries set to 0 and a small number of entries set to 1, where the entries identified by a 1 are at points in the table the co-ordinates of which correspond to the pixel number 60 and feature number 62 pairs of the feature vector.
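As a concrete illustration of this table generation step, the sketch below builds the m x n table from a feature vector, assuming the 176 x 144 image size and 15 x 15 feature image array of this embodiment; the names are illustrative.

```python
import numpy as np

M = 176 * 144   # one row per pixel of the camera image
N = 15 * 15     # one column per feature image in the feature image store

def generate_table(feature_vector):
    """Build the sparse table of Figure 5B from a feature vector.

    `feature_vector` is a list of (pixel_number, feature_number) pairs,
    both 1-based as in the worked example in the text.
    """
    table = np.zeros((M, N), dtype=np.uint8)
    for pixel_number, feat_number in feature_vector:
        k = pixel_number - 1    # row index (pixel)
        j = feat_number - 1     # column index (feature image)
        table[k, j] = 1
    return table

# Example: the feature vector listed earlier in the text.
example = [(3572, 35), (3625, 14), (4945, 102), (10674, 200),
           (12745, 107), (20982, 40), (24003, 225)]
table = generate_table(example)
assert table.sum() == len(example)   # sparsely filled: only 7 entries are 1
```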
When a table has been generated by the table generation module 40, the matching module 42 then determines which of the tables stored in the table store 44 most closely corresponds to the generated table.
In this embodiment, the table store 44 is arranged to store a 10 x 5 array of tables, each of the tables comprising a table having m by n entries, where the entries in each m by n table comprise fractional values ranging between 0 and 1. A classification number identifying the most closely corresponding table is then passed to the calibration unit 22.
After a generated table has been matched to one of the tables in the table store 44, the update module 46 utilises the generated table to update the tables in the table store 44 to make the stored tables more closely resemble the generated table.
Specifically, as in the case of the feature images in the feature image store 36, initially the entries of the tables in the table store 44 are set to different random values, in this case fractional values between 0 and 1.
As will be described later, the update module 46 updates the entries of the tables in the table store 44 after processing each image frame. This processing is such that, as the second self-organising map 28 processes feature vectors output from the first self-organising map 26, the stored tables become increasingly representative of tables generated by the table generation module 40. In this way, the classification number for a frame output from the second self-organising map 28 is caused to identify that a feature vector received by the second self-organising map 28 is similar to an earlier feature vector. This is because the stored tables will have been updated to be representative of the generated tables by the update module 46. This processing therefore enables similar images recorded by the camera 8 to be classified with the same classification number.
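The excerpt does not spell out the matching measure used by the matching module 42 or the exact update rule of the update module 46. The sketch below assumes a simple dot-product match between the generated binary table and each stored table, and a single-winner weighted-average update analogous to the feature image update described later; these choices, and all names, are assumptions made purely for illustration.

```python
import numpy as np

class TableSOM:
    """Illustrative sketch of the second self-organising map's table store.

    The dot-product match and the single-winner update are assumptions; the
    patent states only that the classification number of the most closely
    corresponding stored table is output and that the stored tables are made
    to more closely resemble the generated tables."""

    def __init__(self, num_classes=50, m=176 * 144, n=15 * 15, seed=0):
        rng = np.random.default_rng(seed)
        # Initial fractional values between 0 and 1 (float32 to limit memory).
        self.tables = rng.random((num_classes, m, n), dtype=np.float32)

    def classify_and_update(self, generated_table, learning_rate=0.05):
        scores = np.tensordot(self.tables, generated_table.astype(np.float32),
                              axes=([1, 2], [0, 1]))
        winner = int(np.argmax(scores))
        self.tables[winner] += learning_rate * (generated_table - self.tables[winner])
        return winner   # classification number passed to the calibration unit

# Toy usage (the full-size store of 50 tables of 25,344 x 225 entries is large):
# som = TableSOM(num_classes=50, m=6, n=4)
# classification = som.classify_and_update(np.zeros((6, 4), dtype=np.float32))
```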
As stated previously, the processing of the first self-organising map 26 is such that, after an initial set up period, an image frame is assigned a feature vector on the basis of the presence and location of features in the image corresponding to feature images in the feature image store 36. The subsequent processing of the table corresponding to a feature vector by the second self-organising map 28 is such as to assign the same classification number to images where similar features are found in similar areas of an image frame, and hence assign the same classification number to image frames of an operator 9 where the operator is looking at the same location.
CALIBRATION UNIT
The calibration unit 22 processes each classification number output by the classification unit 20 to determine a pair of gaze co-ordinates identifying the location at which the operator 9 appearing in the image assigned that classification number was looking. This is then utilised by the application program 23 to, for example, animate a representation of the operator 9.
In this embodiment, the calibration unit 22 comprises: a conversion table 48, a link table 49 and a co-ordinate update unit 50.
The conversion table 48 stores data associating each classification number which can be output by the second self-organising map 28 with a set of gaze co-ordinates identifying an area where an operator may be directing his gaze. Initially, the gaze co-ordinates associated with each classification number are set to random values.
The link table 49 is initially empty. When image classifications are received from the classification unit 20, the link table 49 is updated to store data identifying which pairs of classifications have been assigned to consecutive image frames. Thus, for example, if consecutive images were to be assigned the classification numbers 40 and 37, data identifying the pairs of numbers 40-37 and 37-40 would be stored in the link table 49.
Each time a classification number for a new frame is received by the calibration unit 22, it is first compared with the classification number received for the previous frame. If the two classification numbers are different, the link table 49 is then checked to determine whether data identifying that pair of classification numbers has been stored in the link table 49. If this is not the case, the pair of classification numbers for the current frame and previous frame is added to the data stored in the link table 49.
As consecutive frames matched to different classification numbers represent images of the operator 9 looking at the screen 3 separated by a time period corresponding to the frame rate of the camera 8, it can be assumed that the points on the screen 3 at which an operator 9 is looking at those times must be adjacent. In contrast, if consecutive image frames are never matched to a particular pair of classification numbers, it is probable that the classification numbers represent an operator 9 looking at portions of the screen 3 distant from one another.
The co-ordinate update unit 50 utilises the data stored in the link table 49 to set the gaze co-ordinates associated with each classification number in the conversion table 48 so that co-ordinates associated with pairs of classification numbers which are assigned to consecutive image frames are made indicative of adjacent areas, and co-ordinates associated with pairs of classification numbers which are never assigned to consecutive image frames are made indicative of areas further apart.
More specifically, after the link table 49 has been updated, the co-ordinate update unit 50 considers each possible pair of classification numbers in turn. Where a pair of classification numbers is identified by data in the link table 49, and the gaze co-ordinates for the classification numbers identify areas far apart, the gaze co-ordinates are updated by the co-ordinate update unit 50 to identify areas closer together. Conversely, where a pair of classification numbers is not identified by data in the link table 49 and the gaze co-ordinates for the classification numbers identify adjacent areas, the gaze co-ordinates are updated by the co-ordinate update unit 50 to identify areas further apart. After processing a number of image frame classifications to update the gaze co-ordinates in the above manner, the relative positions of the areas identified by gaze co-ordinates associated with different classification numbers are correctly assigned.
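The patent does not give the exact update rule applied by the co-ordinate update unit 50; the sketch below assumes a simple spring-like adjustment (linked pairs pulled together when far apart, unlinked pairs pushed apart when too close) purely to illustrate the idea. All names, distance thresholds and step sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 50
coords = rng.random((NUM_CLASSES, 2))   # conversion table: random initial gaze co-ordinates
links = set()                           # link table: pairs seen on consecutive frames

def record_classification(prev_class, curr_class):
    """Update the link table when consecutive frames receive different classes."""
    if prev_class is not None and prev_class != curr_class:
        links.add((prev_class, curr_class))
        links.add((curr_class, prev_class))

def update_coordinates(near=0.05, far=0.3, step=0.01):
    """Pull linked pairs together if far apart; push unlinked pairs apart if close."""
    for a in range(NUM_CLASSES):
        for b in range(a + 1, NUM_CLASSES):
            delta = coords[b] - coords[a]
            dist = np.linalg.norm(delta) + 1e-9
            if (a, b) in links and dist > near:
                coords[a] += step * delta
                coords[b] -= step * delta
            elif (a, b) not in links and dist < far:
                coords[a] -= step * delta
                coords[b] += step * delta
```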
The setting of gaze co-ordinates utilising the link data enables the relative locations of areas associated with different classification numbers to be determined.
However, the determined co-ordinates for the areas can differ from the absolute co-ordinate locations of the areas by being a reflection or rotation of the absolute positions, or by being displaced. To ensure that the relative co-ordinates correspond to actual positions, further information is required to determine whether the relative co-ordinates have been reflected, rotated and/or displaced.
In this embodiment, the co-ordinate update unit 50 identifies image frames obtained when a position on the screen of the display 3 is selected using the pointer 11 under the control of the mouse 7. When this occurs, it is reasonable to assume that the image frame recorded when the mouse button was pressed is an image of the operator 9 looking at the selected position on the display 3. The click co-ordinates identifying the selected position on the screen of the display 3 therefore identify an absolute co-ordinate position for the operator's gaze for the frame obtained when the mouse 7 was clicked. This absolute co-ordinate position is, in this embodiment, used to fix the gaze co-ordinates associated with the classification number for that frame.
The relative positions determined by the co-ordinate update unit 50 using data in the link table 49 and the absolute positions identified by mouse click data together enable all of the classification numbers to be associated with gaze co-ordinates indicative of the location of an operator's gaze when an image assigned that classification is recorded.
Once the conversion table 48 has been updated, the conversion table 48 is used to convert the classification number for the image frame being processed into a pair of absolute gaze co-ordinates for the frame, which are passed to the application program 23 which utilises the gaze co-ordinates to, for example, animate a virtual representation of the operator 9.
Processing by Image Processing Module
The processing operations performed by the image processing module 24 will now be described in detail.
Referring to Figure 6, which is a flow diagram of the processing operations performed by the image processing module 24, initially (S6-1) the image processing module 24 receives a frame of image data from the camera 8 directed towards the operator 9 of the computer 1.
The image processing module 24 then passes the received image frame to the lighting normalization module 30, which (S6-2) processes the received image to account for variation in illumination of the operator 9 of the computer 1.
Identifying colour features (e.g. the position of eyes, hairline, edge of face, mouth etc.) in a sequence of colour images obtained via the camera 8 is difficult because the assigned colour of pixels in an image is dependent upon a combination of the actual colour of the surfaces the pixels represent and the colour of the illumination of those surfaces. Unless illumination is carefully controlled, utilising colour image data to identify features in images is therefore unreliable. One solution to this problem is to ignore the colour information and use grey scale, or edge/line information obtained by processing the image. Although this is more reliable than relying directly upon colour information, such processing is less powerful as much of the data within received images is ignored.
In this embodiment, in order to remove the contribution to the apparent colour of portions of an image which arises due to illumination, the lighting normalization module 30 processes the entire image utilising the Retinex model as is described in E. Land and J. McCann, "Lightness and Retinex theory", Journal of the Optical Society of America, 61: 1-11, 1971, and J. McCann, "Lessons Learned from Mondrians Applied to Real Images and Colour Gamuts", Imaging Science and Technology Reporter, Volume 14, No. 6, 1999, which are both herein incorporated by reference.
The processing of an image frame utilising the Retinex algorithm substantially removes from the image frame variations in apparent colour arising due to changes in illumination. This is because the effect of the processing is to scale the colour information relative to the brightest and darkest points on the image whilst retaining the ratios of the red, green and blue for each pixel. As a result of the processing, the skin colour of a particular operator 9 appearing in images is made substantially consistent for images obtained under different lighting conditions, as the resultant processed images effectively correspond to reflectances rather than colours and hence are not lighting dependent. However, as reflectance information is retained, variations in reflectance can still be utilised to identify features in images, which is not possible where images are processed by being converted to grayscale data.
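The full Retinex computation is not reproduced here. Purely to illustrate the idea of scaling colour data relative to the brightest points of an image, the sketch below uses a much-simplified "white patch" style normalization; it is not the Retinex algorithm used in this embodiment, and the names are illustrative.

```python
import numpy as np

def white_patch_normalize(image: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Very rough stand-in for the lighting normalization step.

    `image` is an H x W x 3 float array of R, G, B values. Each channel is
    rescaled relative to the brightest value found in that channel, so that
    under a uniform coloured illuminant the output approximates reflectance.
    This is NOT the Retinex processing described in the patent; it only
    illustrates scaling colour data relative to the brightest image points."""
    image = image.astype(np.float64)
    channel_max = image.reshape(-1, 3).max(axis=0)   # brightest R, G, B in the frame
    return image / (channel_max + eps)
```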
After the image received from the camera 8 has been processed by the lighting normalization module 30 to generate a reflectance image, the image processing module 24 then passes the generated reflectance image to the patch generation module 32. The patch generation module 32 then (S6-3) compares the generated reflectance image with the reflectance image which was the result of processing the previous image frame received from the camera 8.
Specifically, the patch generation module 32 compares each pixel in the newly generated reflectance image with the corresponding pixel in the reflectance image generated for the immediately previous image frame obtained by the camera 8. The red, green and blue values for the corresponding pixels are compared. When the difference in the R, G or B pixel values for the corresponding pixels is greater than a threshold value (for example, where the R, G and B values each range between 0 and 255, a threshold value of 5 for any colour), the pixel number for the pixel which differs from the pixel in the previous image is added to a list by the patch generation module 32.
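A minimal sketch of this change-detection step follows, assuming the two reflectance frames are held as H x W x 3 arrays with values in the 0-255 range; the threshold of 5 follows the example above, and the function name is illustrative.

```python
import numpy as np

def changed_pixel_numbers(current: np.ndarray, previous: np.ndarray,
                          threshold: int = 5) -> list[int]:
    """Return 1-based pixel numbers whose R, G or B value differs from the
    previous reflectance frame by more than `threshold`."""
    diff = np.abs(current.astype(np.int16) - previous.astype(np.int16))
    changed = (diff > threshold).any(axis=2)          # H x W boolean mask
    rows, cols = np.nonzero(changed)
    width = current.shape[1]                          # 176 in this embodiment
    return (rows * width + cols + 1).tolist()         # pixel 1 = top left corner
```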
Differences in pixel values between consecutive frames obtained by the camera 8 will primarily arise due to the motion of the operator 9 as viewed by the camera 8, because the background will largely be static. The effect of processing successive video images received from the camera 8 in this way therefore causes the patch generation module 32 to generate a list of pixel numbers identifying the positions of pixels in an image which are most likely to represent portions of the operator 9. Typically, the pixels identified as changing will be pixels representing the edges of an operator's head and facial features of distinct colour, e.g. the edges of an operator's lips, eyes and hair line, as these pixels will change colour as an operator 9 moves.
When a list of pixels in the image which vary from the previously processed image has been generated, the patch generation module 32 then, for each of the pixels identified in the list, generates (S6-4) from the image a number of image patches. Specifically, in this embodiment, which is arranged to process camera images comprising 176 by 144 pixels, the patch generation module 32 generates for each pixel in the list three 31 by 31 image patches derived from three differently sized areas of the image centred on each pixel on the list.
Thus, for example, as is illustrated in Figure 3, for the pixel 55 identified in the figure, the patch generation module 32 would generate three image patches, one derived from each of the differently sized areas identified as 56, 57 and 58 in the figure, each of the differently sized areas being centred on the pixel 55.
In this embodiment the three differently sized areas 56, 57, 58 utilised to generate 31 x 31 pixel image patches comprise a 25 x 25 pixel area 56 of the image centred on the identified pixel 55, a 31 x 31 pixel area 57 of the image centred on the identified pixel 55, and a 37 x 37 pixel area 58 centred on the identified pixel 55.
For the 31 x 31 pixel area 57, a 31 x 31 pixel image patch is generated by copying the pixel data for the area 57. For the 25 x 25 pixel area 56, a 31 x 31 pixel image patch representative of the area 56 is calculated utilising the pixel data for the area 56 and interpolating to obtain pixel data for a 31 by 31 pixel image patch. For the 37 x 37 pixel area 58, a 31 x 31 pixel image patch representative of the area 58 is calculated utilising the pixel data for the area 58 and averaging to obtain pixel data for a 31 x 31 pixel image patch.
In this way, three 31 by 31 image patches are generated for each of the pixels identified in the list generated by the patch generation module 32. The effect of generating 31 by 31 pixel image patches from three differently sized areas centred on a pixel is to scale the images contained in those areas to a lesser or greater extent. Where an operator 9 moves towards or away from the camera 8, the size of the portion of the image corresponding to the operator 9 will change. The scaling which results from using three differently sized areas to generate three image patches for each identified pixel in this embodiment attempts to counteract these variations in apparent size of an operator 9. Specifically, generating three differently scaled image patches centred on each pixel usually ensures that at least one of the image patches corresponds to a scale which the first self-organising map 26 is arranged to process, as will now be described.
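Before turning to that processing, the following is a minimal sketch of the three-scale patch extraction just described, assuming the reflectance image is an H x W x 3 array. For brevity it uses nearest-neighbour resampling in place of the interpolation and averaging described above, and it does not handle pixels near the image border; the names are illustrative.

```python
import numpy as np

PATCH = 31
AREA_SIZES = (25, 31, 37)   # areas 56, 57 and 58 of Figure 3

def _resize_nearest(block: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resample of an s x s x 3 block to size x size x 3.
    The patent interpolates (25 -> 31) and averages (37 -> 31); nearest-
    neighbour resampling is used here only to keep the sketch short."""
    s = block.shape[0]
    idx = np.clip(np.round(np.linspace(0, s - 1, size)).astype(int), 0, s - 1)
    return block[np.ix_(idx, idx)]

def extract_patches(image: np.ndarray, row: int, col: int) -> list[np.ndarray]:
    """Return three 31 x 31 patches centred on (row, col), one per area size.
    Pixels too close to the image border are assumed to be skipped by the caller."""
    patches = []
    for size in AREA_SIZES:
        half = size // 2
        block = image[row - half:row + half + 1, col - half:col + half + 1]
        patches.append(block if size == PATCH else _resize_nearest(block, PATCH))
    return patches
```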
Processing by First Self-Organising Map
Referring to Figure 7, which is a flow diagram of the processing performed by the first self-organising map 26, initially (S7-1) the first self-organising map 26 receives from the image processing module 24 a generated list of pixel numbers identifying pixels in the reflectance image which have varied for a frame compared to the reflectance image for the previous frame, together with a set of image patches for each pixel number in the list (each set comprising three 31 by 31 pixel images generated from portions of the image centred on the corresponding pixel identified in the list).
The patch matching module 34 then (S7-2) selects the first of the received image patches and determines which feature image in the feature image store 36 most closely corresponds to the first image patch.
As has been stated previously, the feature image store 36 comprises a data store storing a 15 by 15 array of 31 by 31 pixel images. In order to determine which stored feature image most closely corresponds to the image patch being processed, the patch matching module 34 determines a match score for each of the stored feature images indicative of the correspondence between the stored feature image and the image patch being processed.
In this embodiment, the match score for each feature image is determined by calculating, for each pixel in the image patch, the normalised dot product of a vector characterizing the colour of the pixel in the image patch and a vector characterizing the colour of the corresponding pixel in the feature image stored in the feature image store 36.
Thus, for example, where pixel (i) in the image patch being processed has red, green and blue values Rp(i), Gp(i) and Bp(i), and the corresponding pixel (i) in the feature image being compared with the image patch has red, green and blue values RF(i), GF(i) and BF(i), a normalised dot product for pixel (i) is calculated using the following equation, where Σ denotes a summing operation over all pixels in an image patch:

DotProduct(i) = [Rp(i)RF(i) + Gp(i)GF(i) + Bp(i)BF(i)] / [ sqrt(Σ(Rp(j))² + Σ(Gp(j))² + Σ(Bp(j))²) × sqrt(Σ(RF(j))² + Σ(GF(j))² + Σ(BF(j))²) ]

The total of the sum of the dot products for each of the pixels in the image patch and the feature image is then determined. This match score is representative of the correspondence between the image patch and the feature image.
When an image is processed utilising the Retinex algorithm, the contribution to apparent colour arising from illumination of objects in an image is substantially removed. However, although the hue of a pixel in a processed image is substantially independent of the illumination of objects in an image, the brightness of a processed pixel will not necessarily be constant.
However, the match scores for the comparison between image patches and feature images generated in the above manner are independent of the brightness of a pixel. This is because, for each pixel in an image patch, the normalised dot product for a pixel in the image patch and the corresponding pixel in a feature image is dependent upon only the relative ratios of the red, green and blue values for the two pixels.
Thus, for example, if a pixel in an image represents an object having a true colour represented by red, green and blue values RI(i), GI(i) and BI(i), the processing of an image utilising the Retinex algorithm will generate red, green and blue values Rp(i), Gp(i) and Bp(i) such that Rp(i) = A·RI(i), Gp(i) = A·GI(i) and Bp(i) = A·BI(i), where A is a scaling factor dependent upon the illumination of the object in the image. The normalised dot product for that pixel and a corresponding pixel in a feature image having red, green and blue values RF(i), GF(i) and BF(i) is then:

A[RI(i)RF(i) + GI(i)GF(i) + BI(i)BF(i)] / [ sqrt(Σ(A·RI(j))² + (A·GI(j))² + (A·BI(j))²) × sqrt(Σ(RF(j))² + (GF(j))² + (BF(j))²) ]

The match score for a feature image and an image patch, as the sum of the normalised dot products, is then equal to:

Σ A[RI(i)RF(i) + GI(i)GF(i) + BI(i)BF(i)] / [ A·sqrt(Σ(RI(j))² + (GI(j))² + (BI(j))²) × sqrt(Σ(RF(j))² + (GF(j))² + (BF(j))²) ]
= Σ [RI(i)RF(i) + GI(i)GF(i) + BI(i)BF(i)] / [ sqrt(Σ(RI(j))² + (GI(j))² + (BI(j))²) × sqrt(Σ(RF(j))² + (GF(j))² + (BF(j))²) ]

The match score is therefore independent of any scaling factor introduced by the processing of the Retinex algorithm.
When match scores have been determined for the comparison of the image patch being processed and each of the feature images in the feature image store 36, the patch matching module 34 records for the image patch data identifying the feature image in the feature image store 36 which resulted in the greatest match score for the image patch, together with data defining the greatest match score.
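A minimal sketch of this match-score computation follows, treating an image patch and a feature image as 31 x 31 x 3 arrays; it implements the normalised-dot-product formulation above, and the names are illustrative.

```python
import numpy as np

def match_score(patch: np.ndarray, feature: np.ndarray, eps: float = 1e-12) -> float:
    """Sum over pixels of the per-pixel RGB dot products, normalised by the
    overall magnitudes of the patch and the feature image.  Scaling every
    colour value of the patch by a common factor leaves the score unchanged."""
    num = np.sum(patch * feature)                    # Σ_i Rp·RF + Gp·GF + Bp·BF
    den = np.linalg.norm(patch) * np.linalg.norm(feature)
    return float(num / (den + eps))

def best_matching_feature(patch: np.ndarray, feature_images: np.ndarray):
    """Return (feature index, score) of the closest stored feature image,
    assuming the store is held as an array of shape (225, 31, 31, 3)."""
    scores = [match_score(patch, f) for f in feature_images]
    best = int(np.argmax(scores))
    return best, scores[best]
```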
The patch matching module 34 then (S7-3) determines whether all of the image patches received for a frame from the image processing module 24 have been processed. If this is not the case, the next image patch is selected (S7-2) and the greatest match score and best matching feature image from the feature image store 36 are determined for that next image patch.
Thus, in this way, each of the image patches passed by the image processing module 24 to the first self-organising map 26 is compared with each of the feature images stored in the feature image store 36, and data identifying the feature images within the feature image store 36 which most closely correspond to these image patches is recorded. When all of the image patches have been processed and a best match score and closest matching feature image within the feature image store 36 have been determined for each image patch, the image patches are then filtered (S7-4).
Specifically, for each of the three image patches generated from areas of an image centred on the same pixel, the image patch associated with the greatest match score is determined. The remaining two image patches generated from areas centred on that pixel are then discarded.
As discussed previously, the generation of three 31 by 31 pixel image patches from three differently sized areas of an image centred on the same pixel has the effect of generating three image patches where parts of an image have been scaled to a lesser or greater extent. The determination of which of the three image patches is associated with the highest match score indicates which of the three scales causes the selected portion of an image to most closely correspond to one of the stored feature images. By discarding the remaining two image patches, the patch matching module 34 ensures that the image patch corresponding to the scale of images stored in the feature image store 36, for which matching should be most accurate, is used for subsequent processing.
The patch matching module 34 then determines, for each of the remaining image patches, the extent to which the area of the image used to generate each image patch overlaps any of the areas of the image used to generate any of the other remaining image patches. Where the area for an image patch is determined to overlap the area for another image patch by at least 50% of the area of a patch, the patch matching module 34 identifies the image patch with the greater match score and discards the other patch.
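A short sketch of this overlap filter follows, assuming each remaining patch is described by the centre of its source area, the side length of that area and its match score. The 50% criterion follows the text; the greedy keep-the-best ordering and the choice of measuring overlap against the smaller area are assumptions, and the names are illustrative.

```python
def filter_overlapping(patches):
    """`patches` is a list of dicts with keys 'row', 'col', 'size', 'score'.
    Patches whose source areas overlap a higher-scoring patch's area by at
    least 50% are discarded (greedy, best score first)."""
    def overlap_fraction(a, b):
        # Overlap of the two square source areas, as a fraction of the smaller area.
        half_a, half_b = a['size'] / 2, b['size'] / 2
        dx = min(a['col'] + half_a, b['col'] + half_b) - max(a['col'] - half_a, b['col'] - half_b)
        dy = min(a['row'] + half_a, b['row'] + half_b) - max(a['row'] - half_a, b['row'] - half_b)
        if dx <= 0 or dy <= 0:
            return 0.0
        return (dx * dy) / min(a['size'] ** 2, b['size'] ** 2)

    kept = []
    for patch in sorted(patches, key=lambda p: p['score'], reverse=True):
        if all(overlap_fraction(patch, other) < 0.5 for other in kept):
            kept.append(patch)
    return kept
```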
The selection of areas for generating image patches by the image processing module 24 is, in this embodiment, based upon the identification of pixels in an image frame determined to vary in colour by more than a threshold value from corresponding pixels in a previous frame. Although monitoring for changes in colour values enables some pixels to be identified as corresponding to part of an operator, it cannot be guaranteed that exactly the same points on an operator will necessarily give rise to the same detected changes in colour. In view of this, it is desirable that an image is classified in terms of certain features appearing in identified areas of an image.
If two pixels close to one another are identified as having varied in colour, the image patches generated for those pixels will correspond to areas which largely overlap one another. Further, it is likely that the generated image patches will be similar and hence matched with similar or the same feature image. Rather than utilising all of the image patches to characterise an image, in this embodiment only the image patch most closely matching a stored feature image, out of the image patches for substantially overlapping areas, is utilised to identify an area of an image as resembling a particular feature image. This filtering of image patches also ensures that the image patches used to update the stored feature images are relatively distinct from one another, as will be described later.
When the image patches have been filtered, the first self-organising map 26 then (S7-5) passes a feature vector to the second self-organising map 28 for the image frame currently being processed. As has previously been described with reference to Figure 4, this feature vector comprises a list of pairs of pixel numbers 60 and feature numbers 62, where each pixel number 60 identifies a pixel in the image frame and each feature number 62 identifies which of the feature images stored in the feature image store 36 most closely resembles an image patch centred on the identified pixel. The generated feature vector is therefore an initial classification of an image frame which identifies the locations of areas in the image frame which resemble stored feature images.
After a feature vector for an image frame has been passed from the first self-organising map 26 to the second self-organising map 28, the first self-organising map 26 then (S7-6) causes the update module 38 of the first self-organising map 26 to update the feature images in the feature image store 36, utilising the filtered image patches selected by the patch matching module 34.
Figure 8 is a schematic illustration of the 15 by 15 array of feature images 36-1-1 to 36-15-15 stored within the feature image store 36. As stated previously, in this embodiment each of the feature images 36-1-1 to 36-15-15 initially comprises a 31 by 31 pixel image where the red, green and blue values for each pixel are randomly assigned. Each time a set of image patches is processed by the first self-organising map 26, the filtered set of image patches determined for an image frame is utilised to amend these stored feature images 36-1-1 to 36-15-15. The result of this processing is that, after a number of image frames have been processed, the feature images become representative of typical image patches filtered by the patch matching module 34.
More specifically, using each of the filtered image patches in turn, each pixel in each of the filtered image patches is utilised to update the pixel values of corresponding pixels in each of the feature images in the feature image store 36. This updating is achieved for each pixel in a feature image by determining a weighted average of the stored pixel values for the feature image pixel and the pixel values for the corresponding pixel from the image patch.
Thus, each of the Red, Green and Blue values for a pixel in a feature image is updated utilising the following equation:

Pixelnew = r · Pixelpatch + (1 - r) · Pixelold

where Pixelnew is the new colour value for the feature image being updated; Pixelpatch is the corresponding colour value for the corresponding pixel in the image patch being used to update the feature images; and Pixelold is the colour value for the corresponding pixel in the feature image being updated, and r is a weighting factor
which varies depending upon the number of frames which have been processed by the first self-organising map 26, the location in the array of the feature image being updated, and the location in the array of the feature image which has been determined to most closely correspond to the image patch being used to update the feature images in the feature image store 36.
In this embodiment this weighting factor r is determined by the equation:

r = η e^(-D²/(2σ²))

with

η = η0 e^(-t/T) + η∞   and   σ = σ0 e^(-t/T) + σ∞

where t is the frame number of the frame being processed; η0, T, η∞, σ0 and σ∞ are constants which in this embodiment are set to the following values:

η0 = 0.1, T = 10,000, η∞ = 0.001, σ0 = √(15² + 15²)/2, σ∞ = 0.25; and

D² = (xf - xm)² + (yf - ym)²

where xf, yf are the co-ordinates within the array of feature images stored in the feature image store 36 of the feature image being updated and xm, ym are the co-ordinates within the array of the feature images stored in the feature image store 36 matched to the image patch being utilised to update the stored feature images.
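By way of illustration only, the feature-image update of step S7-6 might be sketched as follows, assuming the Gaussian neighbourhood form of the weighting factor reconstructed above; the array shapes, names and the use of numpy are assumptions rather than details from the description:

```python
# Minimal sketch of the weighted-average feature-image update (S7-6).
import numpy as np

ETA0, ETA_INF, SIGMA_INF, T = 0.1, 0.001, 0.25, 10_000.0
SIGMA0 = np.sqrt(15**2 + 15**2) / 2.0

def update_feature_images(features, patch, matched_xy, t):
    """features: (15, 15, 31, 31, 3) array of feature images.
    patch: (31, 31, 3) image patch; matched_xy: (xm, ym) of the best match;
    t: number of frames processed so far."""
    eta = ETA0 * np.exp(-t / T) + ETA_INF
    sigma = SIGMA0 * np.exp(-t / T) + SIGMA_INF
    xm, ym = matched_xy
    for xf in range(features.shape[0]):
        for yf in range(features.shape[1]):
            d2 = (xf - xm) ** 2 + (yf - ym) ** 2
            r = eta * np.exp(-d2 / (2.0 * sigma ** 2))
            # Weighted average of the stored feature image and the new patch.
            features[xf, yf] = r * patch + (1.0 - r) * features[xf, yf]
```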
The practical effect of updating the feature images 36-1-1 to 36-15-15 using the above equation is that the weighted averages are dependent upon both time (in terms of a number of image frames which have been processed by the first self-organising map 26) and a distance value D dependent upon the relative location within the array of feature images stored in the feature image store 36 of the feature image to which an image patch has been matched and the feature image being updated. As the number of frames processed by the first self-organising map 26 increases, the weighting factor r decreases so that the extent to which the feature images vary as a result of an update reduces. Similarly, as the distance D
between the location in the array identifying the feature image being updated and the feature image determined to most closely correspond to the image patch being used to update the feature images increases, the weighting factor decreases.

Thus, for example, when an image patch is determined to most closely correspond to the feature image 36-2-2 marked E in Figure 8, initially, due to the size of the weighting factors, the pixel values of all of the feature images, including those marked J, K, L, M, N, O, P, are updated using the image patch matched to image E 36-2-2. As the total number of frames processed increases, gradually the weighting factor used to update image P 36-15-15, which is remote from image E 36-2-2 in the array, is reduced such that the effect of the update for an image patch matched to image E 36-2-2 is negligible.

As the number of frames increases further, gradually this is also true for all of the feature images stored in the portions of the array remote from the feature image matched to the image patch being used to update the feature images. Thus, for example, in the case of an image patch determined to most closely correspond to the feature image E 36-2-2, only feature images A, B, C, D, E, F, G, H and I will be effectively updated by the update module 38.
The combined effect of processing a number of frames of 5 image patches is two-fold. Firstly, as the feature images are updated utilising weighted averages, the feature images are made to resemble the image patches most frequently output by the image processing module 20.
Secondly, the effect of the weighted averages is such as to make adjacent images similar. The feature image store 36, after processing a number of frames, for example a few hundred frames, therefore becomes a self-ordered selection of feature images where similar images are grouped together in adjacent areas in the array.
After all of the feature images in the feature image store 36 have been updated using one image patch, the update module 38 then proceeds to update the stored feature images using the next image patch until all of the feature images have been updated using all of the image patches which remain after the image patches have been filtered by the patch matching module 34.
As the feature images in the feature image store 36 come 25 to resemble an ordered representative sample of image patches from images obtained by the camera 8, the
identification of a particular image patch with a particular feature image therefore will indicate that a particular portion of the operator, identified by a stored feature image is located at a certain position in 5 the processed camera image. As the operator 9 then moves their head to look at different portions of the display 3, the features present in obtained images and positions of features will change. The feature vectors output by the first self organising map 26 track these changes and 10 therefore provide an initial classification of an image.
This initial classification by the first self-organising map 26 is then processed by the second self-organising map 28 to assign a single classification number to each image frame, where images of an operator 9 looking at the same part of the display 3 are consistently assigned the same classification number.
The processing of the second self-organising map 28 will now be described in detail.
Processing by Second Self-Organising Map

Figure 9 is a flow diagram of the processing operations performed by the second self-organising map 28.
Initially (S9-1) the second self-organising map 28 receives from the first self-organising map 26 a feature vector comprising a list of pixel numbers 60 each with an associated feature number 62 as is illustrated in Figure 5 4. As detailed above, these pairs of pixel numbers 60 and feature numbers 62 are an initial classification of an image that identify that image patches centered on the pixels identified by the pixel numbers 60 have been matched by the first self-organising map 26 to the 10 feature images identified by the associated feature numbers 62.
The table generation module 40 of the second self organising map 28 then (S9-2) proceeds to generate a 15 table representation of the data received as a feature vector from the first self-organising map 26.
Specifically, as illustrated by Figures 5A and 5B and as previously described, the table generation module 40 20 first generates a table, all of whose entries are set to zero where the number of rows in the table corresponds to the number of pixels in an image frame processed by the classification unit 20 and the number of columns corresponds to the number of feature images stored in the 25 feature image store 36.
Taking each of the pairs of pixel numbers 60 and feature numbers 62 of the feature vector in turn, the table generation module 40, then alters the entry in the row corresponding to the selected pixel number and the column 5 corresponding to the selected feature number 62 by setting the value of that entry to one. This operation is then repeated for the next pair of pixel numbers 60 until the entire feature vector has been processed.
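As an illustration, the table generation of step S9-2 might be sketched as follows; the 176 x 144 image size and the 15 x 15 feature array follow the embodiment, while the numpy representation and the function name are assumptions:

```python
# Sketch of S9-2: a zero-filled table with one row per image pixel and one
# column per stored feature image, with a one wherever the feature vector
# reports a match between a pixel and a feature image.
import numpy as np

def generate_table(feature_vector, num_pixels=176 * 144, num_features=15 * 15):
    """feature_vector: list of (pixel_number, feature_number) pairs."""
    table = np.zeros((num_pixels, num_features))
    for pixel_number, feature_number in feature_vector:
        table[pixel_number, feature_number] = 1.0
    return table
```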
10 When a table has been generated by the table generation module 40 the matching module 42 then calculates (S9-3) for each of the tables in the array of tables stored in the table store 44, a match score based on a comparison of the entries of the generated table with the 15 corresponding entries of the tables in the array of tables in the table store 44.
Specifically, in this embodiment, the match score for a match between the generated table and a stored table is calculated by:

match score = Σj Σk (g(j, k) - s(j, k))²

where the sums run over j = 1 to m and k = 1 to n; g(j, k) is the value of the entry in the jth row and kth column of the generated table; s(j, k) is the value of the entry in the jth row and kth column of the stored table for which a match score is being calculated; and m and n are equal to the number of pixels in the original image and the number of feature images in the feature image store respectively.
5 The calculation of match scores in the above manner will cause the table in the table store 44 which most closely resembles the generated table to be associated with the lowest match score. When the stored table in the table store 44 having the lowest match score has been 10 determined, the matching module 42 then (S9-4) outputs as a classification number for the frame of image data being processed a number identifying that table in the table store 44.
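A sketch of steps S9-3 and S9-4 under the sum-of-squared-differences match score given above (the numpy representation of the tables and the function name are assumptions):

```python
# Sketch of S9-3/S9-4: the stored table with the lowest match score gives the
# classification number for the frame being processed.
import numpy as np

def classify(generated_table, stored_tables):
    """stored_tables: list of arrays, each the same shape as generated_table."""
    scores = [np.sum((generated_table - stored) ** 2) for stored in stored_tables]
    return int(np.argmin(scores))  # index of the best-matching stored table
```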
15 Once the matching module 42 has output a classification number for a particular image frame, the matching module 42 then causes the update module 46 to update (S9-5) the array of tables stored in the table store 44.
This updating of tables in the table store 44 is performed in the same way as that in which the update module 38 of the first self-organising map 26 updates the feature images in the feature image store 36 with the image patches, as has previously been described.
That is to say, each of the entries in the tables in the array is updated to be a weighted average of the entry in the stored table and the corresponding entry in the table generated by the table generation module 40, where the weighted average is dependent upon the number of image frames which have been processed and the distance between the table in the array identified as the best match for the generated table and the stored table which is being updated.

Thus, in a similar way to that in which the feature images in the feature image store 36 are updated so that they become increasingly representative of the image patches received by the first self-organising map 26, updating the tables in the table store 44 in this manner causes the stored tables to become an ordered representative sample of tables having high and low values in areas where ones and zeros respectively are located in tables generated by the table generation module 40.
The combined processing of the image processing module 24 and the first and second self organising maps 26, 28 after an initial set up period is therefore such that each image frame in the stream of images received from 25 the camera 8 is identified with one of the 10 x 5 array of tables in the table store 44 and hence with one of 50
possible classifications where the classifications are dependent upon the location of features identified as appearing within parts of the images of the stream of video images received from the camera 8. These 5 classifications will therefore assign the same classification number to images of an operator where the operator looks at the same location.
Processing by Calibration Unit

The calibration unit 22 receives the classification number for each image in a stream of images processed by the classification unit 20. The calibration unit 22 then proceeds to convert each of these classification numbers into data identifying gaze co-ordinates, as will now be described in detail with reference to Figures 10A-D, 11, 12 and 13.
As explained previously, because consecutive frames are images obtained of an operator 9 at times separated by the frame period of the camera 8, it is reasonable to assume that the operator 9 can only move their gaze a certain distance in the time period between consecutive images (the distance being determined by the angle through which the operator is reasonably likely to turn their head during the period between frames). If consecutive images are classified under two different
classifications, this therefore provides an indication that the area of gaze represented by those two classifications must be adjacent to one another.
Conversely, if consecutive image frames are never 5 classified as corresponding to a particular pair of classification numbers, this indicates that the areas represented by those classifications are not adjacent to one another.
By monitoring classification numbers assigned to consecutive images, as will be described in detail later, a link table 49 identifying pairs of classifications corresponding to adjacent areas is generated by the calibration unit 22. This link table 49 is generated as a video stream of images is processed. The co-ordinate update unit 50 then utilises this data to update gaze co-ordinates associated with the classification numbers so that the relative positions of the areas identified by the classification numbers can be determined.
Information that certain classification numbers identify adjacent areas is sufficient to determine the relative positions identified by those classifications, as will now be explained by way of an initial overview with reference to a simple example illustrated by Figures 10A-D, illustrating how relative positions of points on a grid
can be determined using data identifying which points are adjacent to one another on that grid.
Figure 10A is an illustration of a grid of points 101-109 and Figure 10B is an illustration of the same points assigned to random positions.
The following table identifies for the points 101-109 of Figure 10A the adjacent points on the grid.
Point Adjacent Points
101 102, 105, 104
102 101, 103, 104, 105, 106
103 102, 105, 106
104 101, 102, 105, 108, 107
105 101, 102, 103, 104, 106, 107, 108, 109
106 102, 103, 105, 108, 109
107 104, 105, 108
108 104, 105, 106, 107, 109
109 105, 106, 108
As will be described in detail, the processing of the calibration unit 22 is such as to cause a link table 49 containing data identifying classification numbers corresponding to adjacent areas to be generated as a video stream is processed. The gaze co-ordinate update
unit 50 then utilises this data to update the gaze co-ordinates associated with the classifications, which in this embodiment are initially set to random values. This updating is such as to cause gaze co-ordinates for classifications identifying non-adjacent areas to be moved further apart and gaze co-ordinates identifying adjacent areas to be moved closer together.
In the same way, the information about adjacent points in the original grid of Figure 10A can be utilised to reconstruct that grid from the random co-ordinates assigned to points illustrated in Figure 10B.
Specifically, the co-ordinates of the points in Figure 10B can be updated so that where points are less than a set distance apart (in this example the set distance is equal to the distance between points 101 and 102 in the original grid) and they do not identify adjacent areas, they are moved away from one another; where two points identify adjacent areas and they are further than this distance apart, they are moved closer together.
Considering point 102, as shown in Figure 10B, initially point 102 is close to point 108 but remote from points 101, 103, 104, 105 and 106 which, as shown in the table above, were all adjacent points to point 102 in
the original grid of Figure 10A. By updating the co-ordinates of point 102 by moving it towards points 101, 103, 104, 105 and 106 and away from other nearby points, the co-ordinates of point 102 are updated to be in the position of point 102 shown in Figure 10C. In Figure 10C the arrow pointing towards point 102 illustrates the change in co-ordinates assigned to point 102 between Figures 10B and 10C. The other points and arrows in Figure 10C illustrate the change in co-ordinates assigned to the other points 101-109 by considering each of the points in turn and applying the same processing to those points so as to move the points towards the other points identified as adjacent points in the original grid of Figure 10A and away from other nearby points.
Figure 10D illustrates the co-ordinates assigned to points 101-109 after repeated processing of the co-ordinates in the manner described above, so as to move points identified by the table as adjacent closer together and to move points not identified by the table as adjacent further apart.
In Figure 10D the arrows illustrate the change of co-ordinates of the points the arrows identify relative to the positions shown in Figure 10C.
As can be seen in Figure 10D, after a number of iterations the processing eventually causes the positions of the points to be realigned in the arrangement of the grid of Figure 10A based solely on the identification of pairs of adjacent points. Thus, as will be described in detail, the combination of generating a record of pairs of classifications identifying adjacent gaze locations and adjusting gaze co-ordinates using the recorded information enables relative gaze locations for the classifications to be determined.
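By way of illustration only, the Figure 10A-10D example might be simulated with the following sketch; the adjacency table is the one given above, while the grid spacing, iteration count and all variable names are assumptions:

```python
# Toy illustration of reconstructing the grid of Figure 10A from adjacency
# information alone: pull adjacent points together, push close non-adjacent
# points apart, repeated over all pairs of points.
import random

ADJACENT = {101: {102, 104, 105}, 102: {101, 103, 104, 105, 106},
            103: {102, 105, 106}, 104: {101, 102, 105, 107, 108},
            105: {101, 102, 103, 104, 106, 107, 108, 109},
            106: {102, 103, 105, 108, 109}, 107: {104, 105, 108},
            108: {104, 105, 106, 107, 109}, 109: {105, 106, 108}}
GRID_SPACING = 1.0  # the "set distance" of the example

pos = {p: (random.random() * 2, random.random() * 2) for p in ADJACENT}

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

for _ in range(200):                      # repeated passes over all pairs
    for p in ADJACENT:
        for q in ADJACENT:
            if p == q:
                continue
            (x1, y1), (x2, y2) = pos[p], pos[q]
            d = dist(pos[p], pos[q])
            if q in ADJACENT[p] and d > GRID_SPACING:
                # adjacent points that are too far apart: move p towards q
                pos[p] = (x1 - 0.5 * (x1 - x2), y1 - 0.5 * (y1 - y2))
            elif q not in ADJACENT[p] and d < GRID_SPACING:
                # non-adjacent points that are too close: move p away from q
                pos[p] = (x1 + 0.5 * (x1 - x2), y1 + 0.5 * (y1 - y2))
```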
The processing of the calibration unit 22 will now be described in detail referring to Figure 11.
Initially (S11-1) the calibration unit 22 receives for a frame a classification number identifying the particular table in the table store 44 to which a generated table for that image frame has been matched.
20 The calibration unit 22 then (S11-2) proceeds to compare this classification number with the classification number received for the previous image frame to determine whether the classification number corresponds to the classification number for the previous frame.
If this is not the case the calibration unit 22 then (S11-3) determines whether the classification number for the current frame and the classification number for the previous frame have been recorded as consecutive 5 classifications in the link table 49. If this is not the case the calibration unit 22 then (S11-4) adds to the data stored in the link table 49 data identifying these two classifications.
10 Thus, for example, if the calibration unit 22 were to receive the following sequence of six classification numbers from the classification unit 20: 50, 2, 50, 1, 3, 50,
when the second image frame was identified as being classified as type 2, data identifying the combinations 50-2 and 2-50 would be stored in the link table 49. When the third frame was classified as type 50, because data for the pairs of classifications 50-2 and 2-50 will have already been stored in the link table 49, no further action is taken. When the fourth frame is identified as being classified as type 1, because the previous frame was classified as type 50, data identifying the pairs 1-50 and 50-1 are stored in the link table 49.
Thus after processing the sixth frame of the above sequence, data identifying the following pairs would be stored within the link table 49: 1-3, 3-1, 1-50, 2-50, 3-50, 50-1, 50-2, 50-3.
The link table 49 generated in this way thereby indicates which pairs of classifications have been found as classifications for consecutive image frames for images received from the camera 8 and hence correspond to 10 adjacent gaze locations.
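As an illustration, the link-table update of steps S11-2 to S11-4 might be sketched as follows; representing the link table 49 as a set of ordered pairs is an assumption:

```python
# Sketch: record each ordered pair of classification numbers occurring on
# consecutive, differently-classified frames.

def build_link_table(classification_stream):
    link_table = set()
    previous = None
    for current in classification_stream:
        if previous is not None and current != previous:
            link_table.add((previous, current))
            link_table.add((current, previous))
        previous = current
    return link_table

# The example sequence from the description:
print(sorted(build_link_table([50, 2, 50, 1, 3, 50])))
# -> [(1, 3), (1, 50), (2, 50), (3, 1), (3, 50), (50, 1), (50, 2), (50, 3)]
```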
By processing the received classification numbers in this way, the link table 49 is caused to contain a record identifying, for each classification number, which classification numbers identify adjacent areas. The calibration unit 22 therefore builds up a record similar to the table referred to above in relation to the simple example illustrated by Figures 10A-D.
20 After either the link table 49 has been updated (S11-4) or after it has been determined that either the classification number received for an image frame corresponds to the same classification number for the previous image frame (S11-2) or it has been determined 25 that data identifying the current frame classification and previous frame classification as being consecutive
classifications has already been stored in the link table (S11-3) the coordinate update unit 50 then utilises the data stored in the link table 49 to update (S11-5) data stored in the conversion table 48.
An exemplary data structure of data stored in the conversion table 48 in this embodiment of the present invention is illustrated in Figure 12. In this embodiment the conversion table 48 comprises 50 conversion records 70, being one conversion record 70 for each of the tables stored in the array of tables in the table store 44. The conversion records 70 each comprise: a classification number 72 corresponding to a classification number identifying a table in the table store 44, a click number 74 which is initially set to 0, and an X co-ordinate 76 and a Y co-ordinate 78 which together identify a gaze position on or in the vicinity of the screen of display device 3. In this embodiment, the X co-ordinate data 76 and Y co-ordinate data 78 are initially set to random values. The updating of data in the conversion table 48 by the co-ordinate update unit 50 will now be described in detail with reference to Figure 13.
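Before turning to Figure 13, one possible in-memory form of the conversion table 48 just described is sketched below; the use of a Python dataclass and all field names are assumptions rather than anything specified in the description:

```python
# Sketch of the conversion table 48 of Figure 12: fifty records, each holding
# a classification number, a click count and randomly initialised gaze
# co-ordinates.
import random
from dataclasses import dataclass

@dataclass
class ConversionRecord:          # one record 70 per stored table
    classification_number: int   # data 72
    click_number: int            # data 74, initially 0
    x: float                     # co-ordinate data 76, initially random
    y: float                     # co-ordinate data 78, initially random

conversion_table = [
    ConversionRecord(n, 0, random.random(), random.random())
    for n in range(1, 51)
]
```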
Referring to Figure 13, when the table 48 is to be updated, the coordinate update unit 50 first (S13-1) determines whether, in the period represented by the image frame being processed, the user clicked on the 5 mouse button of the mouse 7.
If this is the case, it may be assumed that the operator 9 was looking at the pointer 11 under the control of the mouse 7 at the time represented by the frame, and the mouse click co-ordinates can therefore be used to fix an absolute gaze location as will be described later.
If it is determined (S13-1) that no mouse click is associated with a particular frame, the co-ordinate update unit 50 then (S13-3) selects a first pair of classification numbers for processing, for example the classification numbers 1 and 2.
The co-ordinate update unit 50 then (S13-4) determines 20 whether the click number data 74 of the conversion record 70 identified by classification data 72 corresponding to the first number in the pair of numbers being processed is equal to 0. This will be the case, when no mouse clicks have been associated with a frame classified as 25 corresponding to that particular classification number 72. If the click number data 74 of the conversion record
70 is not equal to 0, the co-ordinate data 76, 78 for the conversion record 70 for that classification number 72 will have been fixed utilising mouse click co-ordinates as will be described later and is not updated utilising 5 data within the link table 49.
If the click number data 74 for the conversion record 70 is determined to be equal to 0, the co-ordinate update unit 50 then (S13-5) compares the pair of classification numbers being processed with the pairs of numbers identified by data in the link table 49.
If the pair of numbers being processed correspond to data in the link table 49, the co-ordinate update unit 50 then 15 (S13-6) compares the x co-ordinates and y co-ordinates 76, 78 of the conversion record 70 for the first number with the x co-ordinates and y co-ordinates 76, 78 of the conversion record 70 of the second number of the pair of numbers being processed.
If the distance between the points identified by the x co-ordinates and y co-ordinates 76, 78 of the conversion records 70 identify points further apart than a pre-set threshold, (in this embodiment set to correspond to the 25 distance of a tenth of the width of the screen of the display 3) the calibration unit 22 then (S13-7) proceeds
to update the x co-ordinate data 76 and y co-ordinate data 78 of the conversion record 70 associated with the first number of the pair of numbers being processed so as to move the point associated with the first number nearer to the point associated with the second number.
The updating of the x co-ordinate data 76 and y co-ordinate data 78 of the conversion record 70 associated with the first number of the pair of numbers being processed is achieved utilising the following equations:

xnew = x1 - ½ (x1 - x2)

ynew = y1 - ½ (y1 - y2)

where xnew is the new value for the x co-ordinate data 76 of the conversion record 70 identified by the first number of the pair of numbers being processed; ynew is the new value of the y co-ordinate data 78 of the conversion record 70 identified by the first number of the pair of numbers being processed; and x1, y1 and x2, y2 are the x and y co-ordinates 76, 78 of the conversion records 70 identified by the first and second numbers of the pair of numbers being processed respectively.
If the calibration unit 22 determines that the pair of numbers being processed do not correspond to data within the link table 49 (S13-5) the co-ordinate update unit 50 then (S13-8) determines whether the distance between the 5 points represented by the x co-ordinates and y co ordinates 76, 78 of the conversion records 70 identified by the pair of numbers 72 being processed are less than a second preset distance apart. In this embodiment this second threshold distance is also set to be equal to the 10 distance corresponding to a tenth of the width of the screen of the display 3.
If the distance between the points identified by the x co-ordinates 76 and y co-ordinates 78 of the two conversion records 70 is determined to be less than this preset threshold distance, the x co-ordinate data 76 and y co-ordinate data 78 of the conversion record 70 identified by the first classification number 72 of the pair of numbers being processed is then updated in order to move the point associated with the first number away from the point associated with the second number utilising the following equations:

xnew = x1 + ½ (x1 - x2)

ynew = y1 + ½ (y1 - y2)
where xnew is the new value for the x co-ordinate data 76 of the conversion record 70 identified by the first number of the pair of numbers being processed; ynew is the new value of the y co-ordinate data 78 of the conversion record 70 identified by the first number of the pair of numbers being processed; and x1, y1 and x2, y2 are the x co-ordinates and y co-ordinates 76, 78 of the conversion records 70 identified by the first and second numbers of the pair of numbers being processed respectively.

Either after the x co-ordinate data 76 and y co-ordinate data 78 for the conversion record 70 associated with the first number of the pair of numbers has been updated (S13-7, S13-9) or it has been determined that such an update is not necessary (S13-4, S13-6, S13-8), the calibration unit 22 then determines whether all possible pairs of classification numbers have been processed (S13-10). If this is not the case, the next pair of classification numbers is selected and processed in the same way as has been described above (S13-3 to S13-10).
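A minimal sketch of steps S13-3 to S13-10 is given below, assuming the ConversionRecord and link-table sketches shown earlier and an assumed screen width; the single threshold of one tenth of the screen width follows the description of both tests:

```python
# Sketch of the per-pair co-ordinate update (S13-3 to S13-10).
SCREEN_WIDTH = 1024.0          # assumed value
THRESHOLD = SCREEN_WIDTH / 10.0

def update_coordinates(conversion_table, link_table):
    for first in conversion_table:
        if first.click_number != 0:
            continue                      # fixed by mouse-click data; skip
        for second in conversion_table:
            if first is second:
                continue
            d = ((first.x - second.x) ** 2 + (first.y - second.y) ** 2) ** 0.5
            linked = (first.classification_number,
                      second.classification_number) in link_table
            if linked and d > THRESHOLD:
                # adjacent gaze areas recorded as too far apart: move closer
                first.x -= 0.5 * (first.x - second.x)
                first.y -= 0.5 * (first.y - second.y)
            elif not linked and d < THRESHOLD:
                # non-adjacent gaze areas recorded as too close: move apart
                first.x += 0.5 * (first.x - second.x)
                first.y += 0.5 * (first.y - second.y)
```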
Thus, in a similar way to that illustrated by the simple example of Figures 10A-10D, the co-ordinates associated with classification numbers identifying adjacent areas are made to move closer together and the co-ordinates
associated with classification numbers identifying non adjacent areas are moved further apart.
The effect of repeatedly updating the conversion records 70 in the manner described above is to utilise the data stored in the link table 49 to cause co-ordinate data 76, 78 for conversion records 70 for classification numbers 72 assigned to consecutive images to identify points close together, and to cause co-ordinate data 76, 78 for conversion records 70 for classification numbers not assigned to consecutive images to identify points further apart. As discussed above, this results in the relative positions identified by the co-ordinate data 76, 78 of the conversion records 70 identifying locations indicative of an operator's gaze, positioned correctly relative to each other (but possibly being a reflection, rotation or translation, or a combination of a reflection, rotation and translation, of the absolute gaze co-ordinates at this stage).
The determination of gaze locations in this way is generally satisfactory. However, the gaze locations generated are limited by the fact that the positions assigned to different classifications are correct only 25 relative to one another. That is to say the identified
positions do not identify absolute locations in space but locations relative to one another.
In order to overcome this limitation, in addition to determining relative locations of gaze from the classification numbers assigned to consecutive images, in this embodiment the calibration unit 22 is also arranged to utilise mouse click data to assign absolute co-ordinate data to conversion records 70. Once the absolute position of one conversion record 70 has been fixed, any relative gaze co-ordinates determined relative to that fixed point will be correct subject to being a reflection or rotation of the absolute position, or to being both reflected and rotated relative to the fixed point. When the absolute locations of three non-colinear gaze locations have been determined, relative gaze co-ordinates relative to these fixed positions will correspond to absolute gaze locations.
20 Referring to Figure 13, in this embodiment, whenever an image frame is determined to be associated with a mouse click (S13-1), the calibration unit 22 first (S13-11) proceeds to utilise the click co-ordinates to update the gaze co-ordinates 76, 78 for the conversion record 70 25 containing the classification number 72 corresponding to
the classification number assigned to the image frame being processed.
Specifically, the calibration unit 22 proceeds to update the conversion record 70 identified by the classification number 72 for the current frame by incrementing the click number 74 of that record 70 by 1 and then updating the X co-ordinate data 76 and Y co-ordinate data 78 of the record 70 utilising the following equations:

xnew = (1/click no.) xclick + (1 - 1/click no.) xold

and

ynew = (1/click no.) yclick + (1 - 1/click no.) yold

where xnew is the updated x co-ordinate data 76; xold is the previous value of the x co-ordinate data 76; xclick is the x co-ordinate for the pointer when the mouse button was clicked; ynew is the updated y co-ordinate data 78; yold is the previous value of the y co-ordinate data 78; yclick is the y co-ordinate for the pointer when the mouse button was clicked; and click no. is the click number data 74 for the conversion record 70 being updated.
Thus, if previously no mouse clicks have been associated with frames classified to a particular classification
number 72, then the x co-ordinate and y co-ordinate data 76, 78 for the conversion record 70 having that classification number 72 are updated so that the x co-ordinate data 76 and y co-ordinate data 78 correspond to the x and y co-ordinates respectively for the position identified by the pointer 11 when the mouse 7 was clicked. If more than one frame classified utilising the same classification number is associated with a mouse click, the x co-ordinate data 76 and y co-ordinate data 78 for the conversion record 70 identified by that classification number 72 are updated so that the x co-ordinate data and y co-ordinate data 76, 78 each correspond to an average value for the x co-ordinates and y co-ordinates of positions pointed to by the pointer 11 when the mouse 7 was clicked and obtained images were assigned to that classification.
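A sketch of the mouse-click update of step S13-11 is given below, assuming the ConversionRecord sketch above; with the equations reconstructed earlier the co-ordinates become a running average of the click positions observed for that classification:

```python
# Sketch of S13-11: fold a new mouse-click position into the record's gaze
# co-ordinates as a running average over all clicks seen for that
# classification.

def apply_mouse_click(record, x_click, y_click):
    record.click_number += 1
    w = 1.0 / record.click_number
    record.x = w * x_click + (1.0 - w) * record.x
    record.y = w * y_click + (1.0 - w) * record.y
```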
Once the co-ordinate data for the conversion record 70 including the classification number for the current frame has been updated (S13-11), the calibration unit 22 then proceeds to update co-ordinate data for the remaining records in the manner which has previously been described (S13-3 to S13-10).
25 Finally, returning to Figure 11, after the conversion table 48 has been updated (S11-5) the calibration unit 22
then outputs as gaze co-ordinates for a particular image frame the x coordinate data 76 and y co-ordinate data 78 of the conversion record 70 having the classification number 72 corresponding to the classification number for 5 the image frame being processed.
These gaze co-ordinates are output to the application program 23 which then utilises the co-ordinates to modify the functioning of the application program 23, for example by amending the animation of a virtual representation of the operator 9. As soon as a pair of gaze co-ordinates for an image frame have been passed to the application program 23, the calibration unit 22 then proceeds to process the next classification number output by the classification unit 20 for the next image frame.
Further Modifications and Embodiments

In the above described embodiment, an image of an operator 9 obtained by a camera 8 is described as being processed to associate pixels in the images with sets of three image patches. It will be appreciated that instead of generating image patches from three different sizes of image area, a lesser or greater number of image patches could be generated so that features present at a lesser or greater number of scales could be detected.
Further, in addition to or instead of processing images to obtain image patches which are substantially independent of lighting variations and movement towards and away from the camera 8, other forms of image processing could be undertaken to account for other distortions arising from the motion of an operator. Thus, for example, image patches could be processed to account for rotations or affine distortions of portions of an image.
10 Although in the above described embodiment, an image of an operator comprising an array of 176 x 144 pixels is described as being processed, it will be appreciated that certain portions of an image convey more information about the gaze of an operator 9 than others. Thus for 15 example, a higher resolution image of the portion of an operator 9 corresponding to the operator's eyes could be obtained and utilised to obtain an improved resolution of gaze tracking.
20 Although in the above described embodiment, the processing of images is described in relation to tracking an operator's gaze, it will be appreciated that the two stage process of removing lighting effects subject to a scaling factor and then comparing values in a manner 25 which is unaffected by that scaling factor could be utilised to identify colour features in any form of image
processing system where identification of colour features is desirable. Thus, extraction and matching of features as described could be utilised for, for example, pattern recognition.
In the above described embodiment, absolute gaze locations for some frames are determined utilising mouse click data identifying portions of a screen of a display 3 selected using the mouse 7 when the frames were obtained. It will be appreciated that alternative systems could be utilised to obtain absolute gaze locations for some image frames. Thus, for example, after a certain number of frames had been processed and the data stored by the self-organising maps 26, 28 had substantially become fixed, the operator 9 could be prompted by the screen display to look at specific portions on the screen. The classification numbers of image frames when the user was looking at those known portions on the screen could then be utilised to fix the absolute gaze locations identified by those classifications.
Alternatively instead of utilising mouse click data, the stream of classification numbers output by the classification unit 20 alone could be used to estimate 25 absolute gaze locations. In such an alternative embodiment, the relative gaze locations represented by
different classifications could be calculated in the same way as is described in detail in the above embodiment.
The calibration unit 22 could then also be arranged to store the number of times each type of classification was encountered. The frequencies with which different classifications were encountered could then be compared with stored data so that where, for example for a particular application, a user is known to look at the top left hand corner of the screen more often than the bottom right hand corner, the frequency with which particular gaze locations are identified by the calibration unit 22 could be checked and the relative gaze locations rotated or reflected as appropriate so that the frequency for different points on the screen corresponds to the expected frequency with which the positions are encountered.

Although in the above described embodiment the determination of relative gaze positions and the fixing of absolute gaze positions utilising mouse click data is achieved at the same time, it will be appreciated that the determination of relative gaze positions and absolute gaze positions could be achieved consecutively.
25 Thus for example, in one alternative embodiment, the calibration unit 22 could be arranged to obtain mouse
click data for three or more frames and delay determining relative gaze locations for classifications until the absolute locations for three or more classifications had been fixed using the mouse click data. After these absolute gaze locations had been fixed, relative gaze locations for the remaining classifications could then be determined.

Alternatively, relative gaze locations could first be determined and absolute gaze position data for a number of classifications could be obtained later, so that the transformation of the calculated relative gaze locations to absolute gaze locations could then be determined and then applied to the previously generated relative gaze locations.

Although in the above described embodiment an application program 23 for animating virtual representations of the operator has been described, it will be appreciated that the gaze tracking system described could be utilised to provide an input to any type of application program 23, so that the user's gaze could be taken as an input to the program and the processing of the program amended accordingly. Thus, for example, an application program 23 might utilise the
calculated position of the user's gaze to control the location of a cursor or for selecting icons.
Although the embodiments of the invention described with reference to the drawings comprise computer apparatus and processes performed in computer apparatus, the invention also extends to computer programs, particularly computer programs on or in a carrier, adapted for putting the invention into practice. The program may be in the form of source or object code or in any other form suitable for use in the implementation of the processes according to the invention. The carrier may be any entity or device capable of carrying the program.
15 For example, the carrier may comprise a storage medium, such as a ROM, for example a CD ROM or a semiconductor ROM, or a magnetic recording medium, for example a floppy disc or hard disk. Further, the carrier may be a transmissible carrier such as an electrical or optical 20 signal which may be conveyed via electrical or optical cable or by radio or other means.
When a program is embodied in a signal which may be conveyed directly by a cable or other device or means, 25 the carrier may be constituted by such cable or other device or means.
Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted for performing, or for use in the performance of, the relevant processes.

Claims (28)

1. Apparatus for comparing colour images independently of the illumination of a subject object in said images, 5 comprising: a receiver operable to receive image data defining images, said image data comprising colour data for a plurality of pixels representative of the colours of a subject object appearing in said images; 10 a processing unit operable to derive from image data received by said receiver, a colour reflectance image said colour reflectance image comprising colour data for said plurality of pixels representative of the contribution to the colour of a subject object in an 15 image not arising due to the colour of the illumination of the subject object in said image, said processing unit being operable to derive said colour reflectance image such that the ratios of colour data of pixels in said colour reflectance image for a said subject object are 20 independent of the colour of the illumination of said subject object in said image; and a comparator operable to compare generated colour reflectance images with stored image data to determine the correspondence between said stored data and said 25 generated images;
wherein said correspondence is determined on the basis of the comparison of the ratios of colour data for pixels in a said generated colour reflectance image and corresponding ratios of colour data for pixels of said 5 stored image data.
2. Apparatus in accordance with claim 1 wherein said processing unit is operable to derive a colour reflectance image by processing image data received by said receiver utilising the Retinex algorithm.
3. Apparatus in accordance with claim 1 or 2 further comprising a data store configured to store colour data defining images comprising pixels, wherein said 15 comparator is operable to compare data derived from a generated colour reflectance image with colour data stored in said data store, said comparator being operable to generate a match score indicative of correspondence between derived data and colour data stored in said data 20 store by calculating the sum of a dot product of colour data of corresponding pixels of said stored colour data and said derived data.
4. Apparatus in accordance with claim 3 wherein said 25 comparator is operable to generate said match score
utilising the sum of the products of the red, green and blue data for corresponding pixels for said derived and said stored colour data.
5. Apparatus in accordance with claims 3 or 4 wherein said data store is configured to store colour data defining a plurality of images, said comparator being operable to compare data derived from generated colour reflectance images with images defined by colour data stored in said data store to determine the correspondence between said images and said derived data.
6. Apparatus in accordance with any of claims 3 to 5 wherein said comparator means further comprises: 15 an image patch generation module operable to process portions of a colour reflectance image generated by said ( processing unit to generate image patches of a known size, said comparator being operable to compare pixel data of image patches generated by said patch generation 20 module with colour data stored in said data store to determine correspondence between said stored colour data and said image patches.
7. Apparatus in accordance with claim 6, further 25 comprising an update module operable to utilise image
patches generated by said image patch module to update stored colour data defining images stored in said data store. 5
8. Apparatus in accordance with claim 7 wherein said update module is operable to update colour data defining images utilising a weighted average of colour data for corresponding pixels in said image patches and said colour data defining images stored within said data 10 store, wherein said data store is operable to store colour data defining an array of images and said weighted average for updating colour data stored in said data store is determined upon the basis of the relative locations in said array of the image being updated and 15 the image to which an image patch has been determined to most closely correspond.
9. Apparatus in accordance with any preceding claim, further comprising a classification unit operable to 20 utilise the comparison by said comparator of data derived from colour reflectance images to stored data to determine a classification of image data received by said receiver.
10. Apparatus in accordance with claim 9 wherein said receiver is operable to receive image data defining a stream of video images of an operator and said classification unit is operable to classify said images 5 in said video stream such that images of an operator looking at the same location are associated with the same classification.
11. Apparatus in accordance with claim 10, further 10 comprising: an output unit operable to output in response to the classifications of images by said classification unit data identifying for each said image the location of the point at which an operator appearing in said image is 15 looking towards.
12. A method of comparing colour images independently of the illumination of a subject object in said images, comprising the steps of: 20 receiving image data defining images, said image data comprising colour data for a plurality of pixels representative of the colours of a subject object appearing in said images; processing a received image to derive from received 25 image data a colour reflectance image said colour
reflectance image comprising colour data for said plurality of pixels representative of the contribution to the colour of a subject object in an image not arising due to the colour of the illumination of the subject object in said image, said processing being such to derive said colour reflectance image such that the ratios of colour data of pixels in said colour reflectance image for a said subject object are independent of the colour of the illumination of said subject object in said image; and comparing generated colour reflectance images with stored image data to determine the correspondence between said stored data and said generated images; wherein said correspondence is determined on the basis of the comparison of the ratios of colour data for pixels in a said generated colour reflectance image and corresponding ratios of colour data for pixels of said stored image data.
13. A method in accordance with claim 12 wherein said processing step comprises deriving a colour reflectance image by processing received image data utilising the Retinex algorithm.
14. A method in accordance with claim 12 or 13 further comprising the steps of: storing colour data defining images comprising pixels, wherein said comparison step comprises comparing data derived from a generated colour reflectance image with said stored colour data and generating a match score indicative of correspondence between derived data and said stored colour data by calculating the sum of a dot product of colour data of corresponding pixels of said stored colour data and said derived data.
15. A method in accordance with claim 14 wherein said comparison step comprises generating said match score utilising the sum of the products of the red, green and 15 blue data for corresponding pixels for said derived and said stored colour data.
16. A method in accordance with claims 14 or 15 wherein said stored colour data defines a plurality of images, 20 said comparison step comprising comparing data derived from generated colour reflectance images with images defined by said stored colour data to determine the correspondence between said images and said derived data.
17. A method in accordance with any of claims 14 to 16 wherein said comparison step further comprises the steps of: processing portions of a generated colour 5 reflectance image to generate image patches of a known size, said comparison step comprising comparing pixel data of generated image patches with stored colour data to determine correspondence between said stored colour data and said image patches.
18. A method in accordance with claim 17, further comprising the steps of: utilising said generated image patches to update said stored colour data.
19. A method in accordance with claim 18 wherein said updating step comprises updating colour data defining images utilising a weighted average of colour data for corresponding pixels in said image patches and said 20 stored colour data defining images, wherein said stored colour data comprises colour data defining an array of images and said weighted average for updating stored colour data is determined upon the basis of the relative locations in said array of the stored image being updated
and the image to which an image patch has been determined to most closely correspond.
20. A method in accordance with any of claims 12 to 19 further comprising the step of: utilising the comparison of data derived from colour reflectance images to stored data to determine a classification of said stored image data.
21. A method in accordance with claim 20 wherein said image data defines a stream of video images of an operator and said classification of said images in said video stream is such that images of an operator looking at the same location are associated with the same classification.
22. A method in accordance with claim 21 further comprising the steps of: outputting in response to the classifications of 20 images, data identifying for each said image the location of the point at which an operator appearing in said image is looking towards.
23. A data carrier storing computer implementable 25 process steps for generating within a programmable
computer an apparatus in accordance with any of claims 1 to 11 or for causing a programmable computer to perform a method in accordance with any of claims 12 to 22.
24. A data carrier in accordance with claim 23 comprising a computer disc.
25. A data carrier in accordance with claim 23 comprising an electric signal transferred via a network.
26. A computer disc in accordance with claim 24 wherein said computer disc comprises an optical, magneto-optical or magnetic disc.
27. A method of comparing colour images independently of the illumination of a subject object in said images substantially as herein described with reference to the accompanying drawings.
28. Apparatus for comparing colour images independently of the illumination of a subject object in said images substantially as herein described with reference to the accompanying drawings.
GB0223761A 2002-10-09 2002-10-11 Gaze tracking system Expired - Fee Related GB2396003B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GBGB0223491.2A GB0223491D0 (en) 2002-10-09 2002-10-09 Gaze tracking system

Publications (3)

Publication Number Publication Date
GB0223761D0 GB0223761D0 (en) 2002-11-20
GB2396003A true GB2396003A (en) 2004-06-09
GB2396003B GB2396003B (en) 2005-12-07

Family

ID=9945612

Family Applications (2)

Application Number Title Priority Date Filing Date
GBGB0223491.2A Ceased GB0223491D0 (en) 2002-10-09 2002-10-09 Gaze tracking system
GB0223761A Expired - Fee Related GB2396003B (en) 2002-10-09 2002-10-11 Gaze tracking system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GBGB0223491.2A Ceased GB0223491D0 (en) 2002-10-09 2002-10-09 Gaze tracking system

Country Status (1)

Country Link
GB (2) GB0223491D0 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4384336A (en) * 1980-08-29 1983-05-17 Polaroid Corporation Method and apparatus for lightness imaging
US20010031073A1 (en) * 2000-03-31 2001-10-18 Johji Tajima Face recognition method, recording medium thereof and face recognition device
EP1255225A2 (en) * 2001-05-01 2002-11-06 Eastman Kodak Company Method for detecting eye and mouth positions in a digital image

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4384336A (en) * 1980-08-29 1983-05-17 Polaroid Corporation Method and apparatus for lightness imaging
US20010031073A1 (en) * 2000-03-31 2001-10-18 Johji Tajima Face recognition method, recording medium thereof and face recognition device
EP1255225A2 (en) * 2001-05-01 2002-11-06 Eastman Kodak Company Method for detecting eye and mouth positions in a digital image

Also Published As

Publication number Publication date
GB0223761D0 (en) 2002-11-20
GB2396003B (en) 2005-12-07
GB0223491D0 (en) 2002-11-13

Similar Documents

Publication Publication Date Title
GB2396001A (en) Gaze tracking system
US11000107B2 (en) Systems and methods for virtual facial makeup removal and simulation, fast facial detection and landmark tracking, reduction in input video lag and shaking, and method for recommending makeup
US11741639B2 (en) Locating and augmenting object features in images
US20200118279A1 (en) Locating and Augmenting Object Features in Images
RU2680765C1 (en) Automated determination and cutting of non-singular contour of a picture on an image
US9031317B2 (en) Method and apparatus for improved training of object detecting system
GB2601067A (en) Locating and augmenting object features in images
US9721387B2 (en) Systems and methods for implementing augmented reality
US20080123959A1 (en) Computer-implemented method for automated object recognition and classification in scenes using segment-based object extraction
US11580634B2 (en) System and method for automated surface assessment
CA2774974A1 (en) Real time hand tracking, pose classification, and interface control
Jun et al. Robust real-time face detection using face certainty map
JP2001043376A (en) Image extraction method and device and storage medium
CN109934129B (en) Face feature point positioning method, device, computer equipment and storage medium
US20230237777A1 (en) Information processing apparatus, learning apparatus, image recognition apparatus, information processing method, learning method, image recognition method, and non-transitory-computer-readable storage medium
JP2003108980A (en) Head area extraction device and real-time expression tracing device
JP2023508641A (en) Data augmentation-based matter analysis model learning device and method
CN114255493A (en) Image detection method, face detection device, face detection equipment and storage medium
GB2548088A (en) Augmenting object features in images
GB2396003A (en) System for comparing colour images
GB2396002A (en) Gaze tracking system
GB2550344A (en) Locating and augmenting object features in images
JP2003178311A (en) Real time facial expression tracking device
US11907841B1 (en) Machine learning based consumer product identification system and method therefor
US11967016B1 (en) Systems and methods for presenting and editing selections of three-dimensional image data

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20161011