AU2002300345B2 - Video Feature Tracking with Loss-of-track Detection - Google Patents


Info

Publication number
AU2002300345B2
Authority
AU
Australia
Prior art keywords
feature
data
vector
reference data
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2002300345A
Other versions
AU2002300345A1 (en)
Inventor
Julian Frank Andrew Magarey
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AUPR6762A external-priority patent/AUPR676201A0/en
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2002300345A priority Critical patent/AU2002300345B2/en
Publication of AU2002300345A1 publication Critical patent/AU2002300345A1/en
Application granted granted Critical
Publication of AU2002300345B2 publication Critical patent/AU2002300345B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Description

S&F Ref: 602075
AUSTRALIA
PATENTS ACT 1990 COMPLETE SPECIFICATION FOR A STANDARD PATENT
ORIGINAL
Name and Address of Applicant: Canon Kabushiki Kaisha, 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo 146, Japan
Actual Inventor(s): Julian Frank Andrew Magarey
Address for Service: Spruson & Ferguson, St Martins Tower, Level 31 Market Street, Sydney NSW 2000 (CCN 3710000177)
Invention Title: Video Feature Tracking with Loss-of-track Detection
ASSOCIATED PROVISIONAL APPLICATION DETAILS: [33] Country: AU; [31] Applic. No(s): PR6762; [32] Application Date: 01 Aug 2001
The following statement is a full description of this invention, including the best method of performing it known to me/us:

VIDEO FEATURE TRACKING WITH LOSS-OF-TRACK DETECTION

Technical Field of the Invention

The present invention relates generally to digital video analysis and, in particular, to the task of tracking a feature over a sequence of video frames.
Background Art

Feature tracking is an important task in the field of digital video analysis. Digital video consists of a sequence of two-dimensional arrays, known as frames, of sampled intensity values, known as picture elements or pixels. A feature may be defined as a pattern of pixels in such a frame. Given the location of a feature of interest in one frame, the aim of feature tracking is then to determine the location of that feature in other, usually subsequent, frames. That is, a trajectory for the selected feature must be found with respect to the coordinate system of the camera used to capture the sequence of frames.
The feature is typically selected through some intervention by a human user, usually by directing a pointing device at the feature displayed as part of an image on a screen. The feature may also be selected through an automatic detection process which, by using some predefined criteria, selects a feature that corresponds to such criteria.
If the selection is performed in real time, feature tracking may be used for controlling some other variable, such as the pointing direction of a sensor such as a camera, by feeding the results to a control system. In such applications, speed is of the utmost importance. Other applications use feature trajectories in post-processing tasks such as adding dynamic captions or other graphics to the video. Speed is less important in such applications.
There are two broad categories of feature tracking. A first approach, sometimes known as centroid tracking, requires the feature or object to be clearly distinguishable from the background in some sensing modality. An example of this first category is the tracking of movement of people across a fixed, known scene, in a surveillance application. In this case, a detection process may be employed independently in each frame to locate one or more objects. The task of tracking is to associate these locations into coherent trajectories for one or more of the detected objects as they interact with one another.
The second category may be referred to as motion-based or correlation tracking.
In this case there is no separate detection process, and the location of the feature in the current frame must be found by reference to its position in the previous frame. This is a more general category with wider application, since there are fewer restrictions on the nature of the scene. The present disclosure falls into this category.
A critical step in the second approach is motion estimation, in which a region is sought in the current frame that is most similar to the region surrounding the feature in the previous frame. There exist many approaches to motion estimation, including search-and-match, optical flow, and fast correlation among others, and all are potentially applicable to motion-based tracking. Because these methods have various limitations in terms of speed and reliability, many systems use some form of predictive tracking, whereby the trajectory over previous frames is extrapolated to predict the location of the feature in the current frame. If the trajectory is accurate, only a small correction to the predicted position need be found by the motion estimation process, potentially reducing computation and increasing reliability. The Kalman filter is an example of a predictive tracking strategy which is optimal under certain estimation error assumptions. An estimated motion vector is the "measurement" which enables correction of the current prediction. If the camera is moving between frames, and this motion may somehow be independently estimated, the camera motion may be compensated for in forming the prediction. This also helps to reduce the reliance on motion vector accuracy.
The main disadvantage of motion-based tracking in complex dynamic scenes with cluttered backgrounds arises from the lack of a separate detection stage. The feature may be occluded by another object, or suddenly change course, so that predictive motion estimation fails and tracking is lost. In these cases, tracking should be halted and the system notified of the "loss-of-track" (LOT) condition. However, the nature of motion estimation is such that a vector is always returned whether or not the feature is still actually visible near the predicted position. Hence, detecting the LOT condition requires some extra checking after the correction to the predicted position.
Most commonly, the region surrounding the current feature position is compared with stored reference data in some domain, and if that region is sufficiently different, an LOT condition is flagged. The reference data is initially derived from the region around the feature in the frame in which the feature was selected. Previous approaches have either kept the reference data fixed while tracking, or updated it continuously with the contents of the previous frame. Using a "goodness of fit" measure supplied by the motion estimation itself (for example, the height of a correlation peak) as the LOT criterion is equivalent to the second approach, that is, comparing the region surrounding the current feature position with the region surrounding the feature position in the previous frame.
However, both these approaches, which may be viewed as opposite extremes of adaptivity, have disadvantages. Keeping the reference data fixed means the associated feature tracking system is unable to adapt to gradual but superficial changes in the appearance of the feature as it, for example, rotates in depth or undergoes lighting changes. Consequently, a LOT condition will be flagged prematurely. On the other hand, continual updates of the reference data can make such a feature tracking system too robust, causing it to fail to detect an insidious but fundamental change in the feature surrounds. Such a situation often occurs when a feature is occluded by another object.
Summary of the Invention

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
According to a first aspect of the invention, there is provided a method of tracking a feature across a sequence of image frames, each said image frame comprising a two-dimensional array of pixel data, said method comprising the steps of:
(a) estimating a current feature position in a current frame from at least a previous feature position in a previous frame;
(b) extracting feature data from pixel data of said current frame that are substantially around said current feature position;
(c) comparing said feature data with reference data, wherein a difference between said feature data and said reference data that is larger than a first predetermined number indicates that track of said feature has been lost; and
(d) updating said reference data periodically with feature data of a plurality of frames.
According to a second aspect of the invention, there is provided an apparatus for tracking a feature across a sequence of image frames, each said image frame comprising a two-dimensional array of pixel data, said apparatus comprising:
means for estimating a current feature position in a current frame from at least a previous feature position in a previous frame;
means for extracting feature data from pixel data of said current frame that are substantially around said current feature position;
means for comparing said feature data with reference data, wherein a difference between said feature data and said reference data that is larger than a first predetermined number indicates that track of said feature has been lost; and
means for updating said reference data periodically with feature data of a plurality of frames.
Other aspects of the invention are also disclosed.
Brief Description of the Drawings

One or more embodiments of the present invention will now be described with reference to the drawings, in which:
Fig. 1 is a system block diagram of a feature tracking system;
Fig. 2A is a graphical illustration of a rounded previous feature position measurement estimate from a previous frame and the small window centred on that estimate;
Fig. 2B is a graphical illustration of a rounded predicted position measurement, a window centred on that predicted position measurement, and a motion vector as correction to obtain a current position measurement;
Fig. 3 is an illustration of the relationship between the position of a current frame and the frames from which reference data is generated;
Fig. 4 is a flow diagram of a method for extracting a feature vector from the window;
Figs. 5A and 5B show a flow diagram of a feature tracking method; and
Fig. 6 is a flow diagram of an alternative feature tracking method.
Detailed Description including Best Mode

Some portions of the description which follows are explicitly or implicitly presented in terms of algorithms and symbolic representations of operations on data within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that the above and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, and as apparent from the following, it will be appreciated that throughout the present specification, discussions utilizing terms such as "calculating", "determining", "replacing", "generating", "initializing", "outputting", or the like, refer to the action and processes of a computer system, or similar electronic device, that manipulates and transforms data represented as physical (electronic) quantities within the registers and memories of the computer system into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
Fig. 1 shows a schematic block diagram of a system upon which feature tracking can be practiced. The system 100 comprises a computer module 101, such as a conventional general-purpose computer module, input devices including a video camera 115, a keyboard 102 and pointing device 103, and output devices including a display device 114.
The computer module 101 typically includes at least one processor unit 105, a memory unit 106, input/output interfaces including a video interface 107 for the video display 114 and the video camera 115, and an I/O interface 113 for the keyboard 102 and the pointing device 103. A storage device 109 is provided and typically includes a hard disk drive and a floppy disk drive. A CD-ROM drive 112 is typically provided as a non-volatile source of data. The components 105 to 113 of the computer module 101 typically communicate via an interconnected bus 104 and in a manner which results in a conventional mode of operation of the computer module 101 known to those in the relevant art.
The feature tracking method 500, described with reference to Fig. 5, may be performed on "live" video data. Such "live" video data may be captured by the video camera 115, forming a sequence of two-dimensional frames Ak of sampled pixels. A frame Ak, captured at time k measured in frame intervals, consists of a rectangularly sampled grid of values, with each value representing the intensity of light falling onto a corresponding element of an image plane sensor of the video camera 115. The data for the frames Ak is stored on the storage device 109 or memory unit 106 as a two-dimensional array of size L columns by R rows. The location of a pixel x columns from the left border of the frame Ak and y rows down from the top border is denoted as (xk, yk).
The value may be a scalar value representing overall intensity, or a vector value representing the intensities of different colour components.
The video data, which may be derived directly from the video camera 115, or from playback of stored data, is displayed on the video display 114 under control of the processor 105. A user uses the pointing device 103 to point to a feature to be tracked that is displayed on the display 114, thereby establishing the location of the feature in an initial frame A1 as (x1, y1). Alternatively, selection of a feature to be tracked may be by an automatic detection process.
The description that follows assumes tracking is to take place forward in time from the initial frame A1 from which the feature to be tracked was selected. However, if a stored sequence of frames Ak is being analysed, the tracking may well be carried out backwards in time from the selection frame A1. The aim of feature tracking, given the sequence of frames A1, A2, ..., is to estimate the position coordinates (xk, yk) of the feature at each frame interval k for as long as the feature corresponding to the selected position (x1, y1), which typically forms part of a real-world object, remains visible to the camera 115. It is noted that the feature tracking method 500 set out below may be applied simultaneously to any number of selected features. However, for simplicity it shall be assumed that only a single feature is being tracked.
Kalman tracking formulation

The Kalman-based formulation of the tracking problem follows. Kalman filtering is a well-known method of estimating dynamic parameters of a linear system under conditions of imperfect observation. Such filtering is formulated to provide an optimum estimate of the state of the system given all the previous and current observations at any time, under certain assumptions about observation errors. Although those assumptions are not met in the case of general feature tracking, and in particular the assumption that the observation error is Gaussian in statistical form, a departure from those assumptions does not critically affect the performance of feature tracking using Kalman filtering.
A zero-order model state vector $\mathbf{x}_k$ (of length 2) for the system 100 at time k may be set to $\mathbf{x}_k = [x_k \; y_k]^T$. In the preferred implementation a first-order model is used for the state vector $\mathbf{x}_k$, having a length of 4, which explicitly includes the feature velocity as independent variables as follows:

$$\mathbf{x}_k = \begin{bmatrix} x_k & y_k & \dot{x}_k & \dot{y}_k \end{bmatrix}^T \qquad (1)$$
It is assumed that the state of the system 100, represented by the state vector xk, evolves linearly with time k (in frame intervals) as follows: ,k Bii, s, (2) where: D is the 4 by 4 system evolution matrix, given by [1 0 1 0" D (3) 0010 0 001 Sk is the "process noise", assumed to be drawn from a zero-mean Gaussian distribution with 4 by 4 covariance matrix Q: sk (4) Ik is the 2-vector of "inputs", i.e. the displacement at pixel (xk, yk) of frame Ak induced by camera operation (panning, tilting, zooming): ,k v k and B is the (4 by 2) input matrix: 1 0 0 1 B= 0 0 (6) 0 0 0 0 602075AU.doc 11 It is assumed that the camera-operation-induced displacement uik is provided by an external process. If it is not provided, the algorithm may still proceed with the displacement uk set to zero for all k, but greater strain will be placed on the motion estimator in this case if the camera is indeed being operated, particularly if the operation changes erratically from frame to frame.
The system state Xk may not be observed directly. Instead, the system state Xk can be estimated through a linear measurement step as follows: zk Hxk e(7) where: Zk is the 2-vector of position measurements obtained by motion estimation as described below; H is the (2 by 4) position measurement matrix H= (8) 0 1 0 0] and ek is the "measurement noise", assumed to be drawn from a zero-mean Gaussian distribution with (2 by 2) covariance matrix A: ek N(0, A) (9) The Kalman tracking algorithm is a recursive process carried out on each frame Ak from time interval k 2 onwards. The aim is to produce an estimate of the state vector from a previous estimate the current input uM(, and the current measurement Zk.
An additional output at each time interval k is an estimate Pk of the (4 by 4) covariance matrix of the state estimation error (Xk k 602075AU.doc -12- To initialise the Kalman tracking process, the initial state estimate and initial covariance matrix of the state estimation error PI must be specified. The former comes from the initial feature location: x1 y, 0 Because the first two components of the initial state estimate comprising the initial location (xl,yl), are known exactly, while the velocity estimates, which are set to zero, are independent but uncertain, the initial covariance matrix of the state estimation error P 1 is set to 0 0 0 0(11) 0 0 0 a where o 2 is the initial velocity variance.
The other quantities required upon initialisation are the process noise and measurement noise covariance matrices Q and A. Because the variables of the state vector Xk and measurements Zk are independent, the process noise and measurement noise covariance matrices Q and A may be assumed to be diagonal matrices. The values on the is diagonals of the process noise covariance matrice Q, the measurement noise covariance matrices A, and the initial covariance matrix of the state estimation error P 1 mainly affect the speed of initial "locking": the larger they are, the slower the response of the algorithm.
The initialisation is followed by the prediction of the state vector xc and the covariance matrix of the state estimation error Pk as follows: 602075AU.doc -13- Xk Bu k (12) P, DP_,D T +Q (13) where the superscript indicates these are prior estimates, made without reference to the current measurement Zk. The system evolution matrix D and the input matrix B are constants as set out in Equations and respectively.
Prediction is followed by a measurement update step wherein the predictions of the state vector -k and the covariance matrix of the state estimation error P are updated with the current measurement Zk as follows:
K
k PH (HP H 7 (14) 1o =k K H Hi(15) (14 KH)P~ (16) wherein Kk is known as the Kalman gain matrix (4 by and 14 is the (4 by 4) identity matrix. The position measurement matrix H is a constant as set out in Equation The method used to obtain the current feature position measurement Zk for use in Equation (15) is based on estimating the motion of a small window Wk-1 centred on the estimated feature position Zk-I in the previous frame Ak-1. The best estimate of the previous feature position measurement Zk-_ is given by ik-l Hk- (17) As the previous feature position measurement is real valued, but the position coordinate system used comprises a discrete grid of positions, the previous feature position measurement Zk-I must be rounded as: 602075AU.do 14- -k (18) where indicates a rounding operation. The feature 201, with its rounded position measurement estimate _k-i and the small window Wk-i are illustrated in Fig. 2A.
The window Wk-1 should be as small as possible, to exclude background clutter from the measurement zk, while still containing sufficient detail to enable a reasonable measurement estimate ik to be obtained. In the preferred implementation, the window Wk- is square, with power-of-two size w in rough proportion to the frame size, being R rows by L columns, as follows: w= 2 Dog(min(R,L))]-4 (19) rounded down to a minimum size of eight.
The current feature position measurement zk is that which satisfies the following: Ak- Ak One option for finding a feature position measurement zk that satisfies Equation (20) is to use a search-and-match technique. That is, the pixel values in the 1i previous feature window Wk-1 are compared with a number of candidate windows surrounding a predicted position measurement Zk in the current frame Ak using some similarity criterion. The candidate window with the best match is selected, and the centre of that selected candidate window will provide the current position measurement Zk.
However, such a search-and-match technique for finding a feature position measurement Zk is laborious and time-consuming. Moreover, the correction from the predicted position measurement ;k to the position measurement Zk is limited in precision by the number of search candidate windows compared.
602075AU.doc 15 In the preferred implementation, a motion vectorfk is estimated directly, with the motion vectorfk being a correction to the predicted position measurement k to obtain the actual feature position measurement Zk, using a fast algorithm such as phase correlation.
In principle the correction fk is of infinite precision, i.e. real-valued, though rounding errors and noise limit the precision in practice.
The estimated state vector k, as calculated in Equation is used to obtain the current prediction of the feature position ik k H k (21) Again the predicted position measurement 'k must be rounded to the nearest integer as follows: k (22) to be used as the centre of the window W k to be extracted from frame Ak. That is, Ak( k) (23) for integer components of position in the range relative to the window centre. Fig. 2B shows the frame Ak with the rounded predicted position measurement (k and the window Wk illustrated. The previous window Wk. from the previous frame Ak-1 is also illustrated.
Similarly, the pixel values of window Wk-1 are extracted from the previous frame Ak-1 centred on the rounded previous position i.e.
Wk- Ak,( (24) 602075AU.doc 16- Having extracted the pixel values of the windows Wk-1 and Wk from the previous and current frames Ak-l and Ak, the windows Wk-l and Wk are passed as input to a fast motion estimation algorithm. The result is a motion vector fk which approximately satisfies the following: Wk Wk-, Combining Equations and the current position measurement Zk may be obtained as zk zk- fk k k-i (26) The motion vector fk and the current position measurement Zk, which is an estimate of the position of the feature 202, are also illustrated in Fig. 2B: Conventional Kalman tracking proceeds by incrementing time interval k, obtaining a new frame Ak, and alternating the prediction and measurement update steps set out above in relation to Equations (12) to (16) and (26).
Checking for frame boundary Once the estimated state vector x k has been corrected using the position measurement Zk in Equation a current "filtered" feature position may be obtained as: x k ]=Hk (27) yk The current estimated feature position k ]T is then checked to see if it is "too close" to the boundary of the current frame Ak. In the preferred implementation, it is determined that the feature position [k I, ]T is too close to the boundary of the frame Ak when an edge of the window Wk around the feature position [kYk touches the 602075AU.doc 17boundary of the frame Ak. If the feature position is determined to be too close, tracking may still continue for a small number of frames. This allows for continued tracking of a feature that temporarily moves near the frame boundary without leaving the frame Ak altogether. If the feature position li k ]T does not move away from the frame boundary, tracking is terminated on the grounds that the feature has left the frame Ak.
Checking for loss-of-track It needs to be determined whether the feature position estimate [kkJk corresponds to the "true" position of the feature. This is done by comparing the pixel values within the w by w window Wk centred on the rounded feature position L[k ,Yk in frame Ak with a "reference" data set. This window Wk will be referred to hereafter as the feature window Wk. It is determined that a Loss of Track (LOT) condition exists when the difference between pixel values of the feature window Wk and the reference data set exceeds a predetermined threshold. The method of computing this difference is described below.
When a LOT condition is detected, feature tracking may continue for a limited number of frames Ak, allowing for track to be regained after a transient disturbance.
However, the measurement update step is bypassed during such a "probation" period in that the state vector ik is not updated with the current measurement Zk (Equation but is simply made equal to the predicted state vector xk: Xk =X~k (28) 602075AU.doc -18- If track condition is regained during the probation period, normal feature tracking is resumed. However, if the track is not regained during the limited number of frames Ak, feature tracking is halted.
The reference data is clearly crucial to the LOT detection step. Previous methods have used one of two approaches: Obtain the reference data from a plurality of feature windows Wk at the start of the feature tracking sequence, then keep the reference data fixed for all time intervals k; and Update the reference data every frame Ak, using the current feature window Wk.
However, both these approaches, which may be viewed as opposite extremes of adaptivity, have disadvantages as set out in the Background Art section.
The feature tracking system 100 finds an appropriate compromise, by updating the reference data periodically using statistics extracted from previous feature windows Wk. In the preferred implementation, the reference data is extracted from a set of N frames. This means that the contents of every feature window Wk are compared with reference data extracted from a previous set of N feature windows. Only feature windows Wk where a LOT condition is not detected are added to the set of N frames to be used for the reference data.
Fig. 3 illustrates the relationship between the position of a current frame Ak and the frames from which reference data is generated. Following the initial frame A 1 the feature windows Wk of the next N frames A, for which a LOT condition is not detected, with N=5 in the illustration, are added to the reference set of feature windows. When the set includes N feature windows, reference data is calculated from that set. This reference data is used for comparison with feature window Wk from frame Ak, until new reference 602075AU.doc -19data is calculated from the following set of N feature windows. For example, the feature window Wo from frame Alo is compared with the reference data from the set of 5 feature windows {W2, W3, W4, W 5
W
6 Because of a LOT condition in frames All and A 12 the next set of 5 feature windows is {W 7
W
8
W
9
W
10
W
1 3 from which the new reference data for use with, for example, frames A 1 4 and A 1 5 is calculated.
Feature extraction and comparison In the LOT detection step, a feature vector Vk is extracted from the window Wk, and compared with the reference data. Any combination of feature extraction and comparison criterion may be applied. The combination should have the following characteristics: Robustness to superficial changes in the feature window Wk. Examples of superficial changes are global intensity changes, small translations of the feature within the window, and additive noise; Sensitivity to fundamental changes in the feature window Wk,, such as the intrusion of an occluding object; Wide applicability, i.e. no need to "tune" the algorithm to a specific situation; and Computational efficiency.
The preferred method 400 for extracting the feature vector vk from the window Wk is illustrated in Fig. 4, wherein the two-dimensional Fast Fourier Transform (FFT) of the feature window Wk is used. Step 405 calculates the average luminance Ek of the feature window Wk.
602075AU.doc Step 410 sets a counter i to 1. In step 415 the i-th colour component Wki of feature window Wk is multiplied by a two-dimensional Hanning function centred on the centre of feature window Wk. Next, the FFT is applied to the modified feature window Wki in step 420.
The phase of each complex-valued Fourier coefficient is discarded in step 425, thereby retaining only the modulus. The Fourier coefficients in the second and third quadrant of the spatial frequency plane are also discarded in step 430. Furthermore, all Fourier coefficients with a spatial frequency component greater than 7T/2 in either direction (horizontal and vertical) are also discarded in step 435.
The remaining Fourier coefficients are normalised in step 440 by dividing each by the average luminance Ek calculated in step 405. An i-th component feature vector vki is obtained in step 445 by concatenating all the remaining, normalised Fourier coefficients into a column vector vki. Step 450 then determines whether the counter i is equal to K, which is the number of colour components used in the frames Ak. If the counter i is still smaller than the number of colour components, then step 452 increments the counter i, and steps 415 to 445 are repeated to obtain the next component feature vector vki.
After all the component feature vectors vki have been obtained, the method 400 continues to step 455 where all vectors Vki are concatenated into a single column vector vk to produce the feature vector vk. The feature vector Vk has length M given by: 2 M (29) 8 The method 400 also ends in step 455.
The choices implemented into method 400 were made to meet the criteria of sensitivity, robustness, and efficiency listed above. In particular: 602075AU.doc -21 The Fourier transform moduli are invariant under small spatial shifts in the window contents; The Hanning function emphasises the pixels near the centre of the window at the expense of those near the edge, to reduce the effect of background clutter; The pixel values of the window are real-valued, so the second and third quadrants of the spatial frequency plane contain redundant information and may be discarded for efficiency; The typical Fourier-domain content of real scenes is concentrated at low spatial frequencies, so the higher frequency components (those above half-band, or nr 2) may be discarded to reduce the effect of noise; and Normalisation by average luminance removes the effect of global luminance changes as a result of, for example, changes in lighting conditions.
In the preferred implementation the Mahalanobis distance is used as the comparison criterion. The Mahalanobis distance of the feature (column) vector vk from a reference distribution is given by: tk (vk- T where v and C are the mean feature vector and covariance matrix of the distribution of the reference set of N feature windows. These are computed at intervals shown in Fig. 3, using the following two equations. Compute the mean feature vector v as: 1u k (31) N k=1 and the covariance matrix C as: C ElM (vk -vXvk-Y (32) N-1 k=1 602075AU.doc -22where E is a small positive number. The first term is present to ensure the covariance matrix C is invertible in extreme cases, while not affecting the values of the covariance matrix C significantly in other cases. During the first N frames after the first (assuming no LOT condition occurs during the first N frames), the inverted covariance matrix C' is set to zero, so that the Mahalanobis distance tk evaluates to zero.
The Mahalanobis distance tk is a normalised Euclidean vector distance, where each component of the difference (vk v) is weighted inversely according to the variance of the associated component over the set of N feature windows. In the one-dimensional case, i.e. when the length M of the feature vector Vk is one, the Mahalanobis distance tk is simply the number of standard deviations a given number lies away from the mean. As such, it is statistically based and may be compared with a constant threshold over a wide variety of feature extraction techniques.
Complete feature tracking method Fig. 5 is a flowdiagram of a feature tracking method 500 for estimating the position coordinates (xk, Yk) of the feature at each frame interval k, given the sequence of frames Al, A 2 and the selected position (xl, yl), thereby forming a trajectory. The method 500 is implemented in the feature tracking system 100 (Fig. 1) as an application program which is resident on the hard disk drive 110 and read and controlled in its execution by the processor 105. Intermediate storage of the program and any frame data received from the video camera 115 may be accomplished using the memory 106, possibly in concert with the hard disk drive 110. In some instances, the application program may be supplied to the user encoded on a CD-ROM or floppy disk, or alternatively may be read by the user from a network via a modem device (not 602075AU.doc -23illustrated). Still further, the software can also be loaded into the system 100 from other computer readable medium. Computer readable medium is taken herein to include any transmission medium for communicating the computer program between a source and a designation.
The feature tracking method 500 starts in step 501, followed by step 503 where a selection is received by the processor 105 of the location (xl, yl) of the feature in an initial frame A 1 The system 100 is initialised in step 505 by setting the initial state estimate k, and initial covariance matrix of the state estimation error P 1 as set out in Equations and (11) respectively. The process noise and measurement noise covariance matrices Q io and A are also initialised. To ensure reasonably rapid locking, in the preferred implementation the velocity variance o2 is set to 1 and the values on the diagonal of the process noise and measurement noise covariance matrices Q and A are set to 0.1. The other values of matrices Q and A are set to 0.
A first position measurement il is also obtained as [xi yl] Finally, all entries Is of the mean feature vector 7 and inverse C'of the covariance matrix are set to 0.
Step 507 sets a number of variables used within the method 500. In particular: variable k is set to 1, where k is used as the frame time interval; variable lost is set to 0, where lost=-O indicates that the track has not been lost, whereas lost=l- indicates a LOT condition; variable bdry is set to 0, where bdry=l indicates that the feature is close to the frame boundary; and counters Si, S 2 and 53 are all set to 0.
Step 509 follows where the variable k is incremented. The data for the next frame Ak is retrieved by the processor 105 in step 511 from the storage device 109 or 602075AU.doc -24memory unit 106. The camera-operation-induced displacement Ilk is received by the processor 105 from an external system (not illustrated) in step 513.
The prediction step 515 follows where the processor 105 calculates the predicted state vector xk and the predicted covariance matrix of the state estimation error Pk using Equations (12) and (13) respectively.
The prediction step 515 is followed by the measurement update steps. However, it is first determined in step 517 whether a LOT condition exists by determining whether the variable lost is equal to 1. If a LOT condition does not exist, then a measurement step 519 follows where the motion vector fk and the current position measurement Zk are calculated using Equations (25) and (26) respectively.
Step 519 is followed by step 521 wherein the predictions of the state vector x^ and the covariance matrix of the state estimation error P are updated with the current measurement Zk by the processor 105 using Equations (15) and (16).
If step 517 determined that a LOT condition does exist, the measurement update step is bypassed in that the state vector xk is not updated with the current measurement Zk, but is simply made equal to the predicted state vector k using Equation The Kalman gain matrix Kk and the covariance matrix of the state estimation error Pk are also updated using Equations (14) and (16).
Following either of step 521 or 523, the checking for frame boundary steps follow. The processor 105 determines in step 525 whether a "close to the boundary" condition now exists by checking whether an edge of the window Wk around the feature position [k yk ]T touches the boundary of the frame Ak. If a "close to the boundary" condition does exist then, in step 527, the counter S 1 is incremented and the variable bdry 602075AU.doc is set to 1. Counter S 1 keeps track of how many successive feature windows Wk are too close to the boundary. Step 529 follows wherein the processor 105 determines whether the counter S, is higher than a predetermined number pl. This allows for continued tracking even when the feature is near the frame boundary for up to pi frames. In the preferred implementation, the value of the predetermined number p, is set to 5. If step 529 determines that the counter S1 is higher than pl, then tracking is terminated on the grounds that the feature has left the frame Ak. Accordingly method 500 ends in step 530 with a "Out of frame" message displayed on the display 114.
If step 525 determines that a "close to boundary" condition does not exist, then it is determined in step 526 whether the variable bdry is equal to 1. If the variable bdry is equal to 1, then the counter S, is set to 0 and the variable bdry is set to 0 in step 531.
From either of step 526, 529 or 531 the method 500 continues to step 533 where the processor calculates the feature vector Vk using method 400 illustrated in Fig. 4. Step 535 then uses the feature vector vk to calculate the Mahalanobis distance tk using Equation The processor 105 determines in step 537 whether a LOT condition exists by checking whether the Mahalanobis distance tk is higher than a predetermined threshold. If a LOT condition does exist, then the counter S2 is incremented, and the variable lost is set to 1 in step 539. Counter 52 keeps track of how many successive frames Ak had a LOT condition. Step 541 follows wherein the processor 105 determines whether the counter S 2 is higher than a predetermined number P2. This allows for continued tracking even when a LOT condition exists for a probation period of up to p2 frame intervals. In the preferred implementation, the predetermined numberp2 is set to 5. If step 541 determines that the counter S2 is higher than P2, then tracking is terminated on the grounds that the track has 602075AU.doc -26been lost. Accordingly method 500 ends in step 542 with a "Lost Track" message displayed on the display 114.
If step 537 determines that a LOT condition does not exist, then the counter S 2 and variable lost are (again) set to 0 in step 543. The processor 105 also includes the feature vector Vk to the reference data set in step 545 and increments counter S 3 on step 547. Counter S3 keeps track of the number of feature windows Wk whose feature vectors Vk have been added to the reference data set. Step 549 follows where it is determined whether S3 is equal to N, thereby checking whether the reference data set includes the required number N of feature vectors vk.
If the required number N of feature vectors Vk have been included in the reference data set, then the processor 105 calculates, in step 551, the mean feature vector iT and the covariance matrix C using Equations (31) and (32) respectively for use in subsequent calculations of the Mahalanobis distance tk (in step 535). The counter S 3 is also reset to 0 in step 553.
The method 500 continues from either of steps 541, 549 or 553 to step 560 where the feature position k ]T is appended to the trajectory of the feature. Finally, the method 500 continues to step 509 from where tracking is performed with the next frame Ak.
Fig. 6 is a flowdiagram of an alternative feature tracking method 600 for constructing a trajectory of the feature given the sequence of frames A2, and the selected position (xl, yl). The feature tracking method 600 starts in step 601, followed by step 605 where the system 100 is initialised. This includes receiving the location (xl, yl) of the feature in an initial frame A 1 setting the initial state estimate the initial covariance matrix of the state estimation error P 1 the process noise and measurement 602075AU.doc -27noise covariance matrices Q and A. A first position measurement i, is also obtained as [xi y Finally, all entries of the mean feature vector and inverse C'ofthe covariance matrix are set to 0.
Step 610 estimates the current feature position Zk as described in relation to steps 515 to 523 of method 500 (Fig. The method 600 continues to step 615 where the processor calculates the feature vector Vk using method 400 illustrated in Fig. 4. Step 620 then compares the feature vector Vk with reference data. The reference data is a statistical representation of feature vectors from a plurality of previous frames.
The processor 105 determines in step 625 whether a LOT condition exists. If a LOT condition does exist, then tracking is terminated in step 626 on the grounds that the track has been lost.
If step 625 determines that a LOT condition does not exist, then the feature position [kk -k ]T is appended to a trajectory of the feature. The reference data is also is updated in step 635 in a manner described in steps 545 to 551 in method 500 (Fig. Finally, the method 600 continues to step 640 from where tracking is performed with the next frame Ak.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiment(s) being illustrative and not restrictive.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including" and not "consisting only of'. Variations of the word comprising, such as "comprise" and "comprises" have corresponding meanings.
602075AU.doc

Claims (29)

1. A method of tracking a feature across a sequence of image frames, each said image frame comprising a two-dimensional array of pixel data, said method comprising the steps of:
(a) estimating a current feature position in a current frame from at least a previous feature position in a previous frame;
(b) extracting feature data from pixel data of said current frame that are substantially around said current feature position;
(c) comparing said feature data with reference data, wherein a difference between said feature data and said reference data that is larger than a first predetermined number indicates that track of said feature has been lost; and
(d) updating said reference data periodically with feature data of a plurality of frames.

2. A method according to claim 1 wherein said reference data is a statistical representation of feature data of said plurality of frames.
3. A method according to claim 1 or 2 wherein said feature data comprises a feature vector, said reference data comprises a reference vector, step (d) comprises calculating an average of feature vectors of said plurality of frames, and step (c) comprises calculating a normalised Euclidean vector distance between said feature vector and said reference vector.
4. A method according to claim 3 wherein a covariance of the distribution of said reference data is used to calculate said normalised Euclidean vector distance between said feature vector and said reference vector.
5. A method as claimed in claim 3 or 4 wherein step (b) comprises the sub-steps of:
(b1) applying a Fast Fourier Transform to each colour component;
(b2) concatenating magnitude coefficients from each Fast Fourier Transform to form component feature vectors; and
(b3) concatenating said component feature vectors to form said feature vector.
6. A method according to claim 5 comprising the further sub-step, performed after step (b1) and before step (b2), of:
discarding either of first or third quadrant coefficients of each Fast Fourier Transform, and either of second or fourth quadrant coefficients of each Fast Fourier Transform.
7. A method according to claim 5 or 6 comprising the further sub-step, performed after step (b1) and before step (b2), of:
discarding coefficients of each Fast Fourier Transform having either a horizontal or vertical spatial frequency component above π/2.
8. A method according to any one of claims 5 to 7 comprising the further sub-step, performed after step (b1) and before step (b2), of:
normalising coefficients of each Fast Fourier Transform with luminance data of said pixel data.
9. A method according to any one of claims 1 to 8 comprising the further initial step of:
multiplying each colour component of said pixel data by a window function.

10. A method according to claim 9 wherein said window function is a Hanning function.
11. A method according to any one of claims 1 to 10 wherein steps (a) to (c) are performed on a second predetermined number of subsequent frames after said track of said feature has been lost, and step (d) is resumed if said difference between said feature data and said reference data is smaller than said first predetermined number.
12. A method according to any one of claims 1 to 11, comprising the further step of concatenating said current feature positions to form a trajectory.
13. An apparatus for tracking a feature across a sequence of image frames, each said image frame comprising a two-dimensional array of pixel data, said apparatus comprising:
means for estimating a current feature position in a current frame from at least a previous feature position in a previous frame;
means for extracting feature data from pixel data of said current frame that are substantially around said current feature position;
means for comparing said feature data with reference data, wherein a difference between said feature data and said reference data that is larger than a first predetermined number indicates that track of said feature has been lost; and
means for updating said reference data periodically with feature data of a plurality of frames.
14. An apparatus according to claim 13 wherein said reference data is a statistical representation of feature data of said plurality of frames.

15. An apparatus according to claim 13 or 14 wherein said feature data comprises a feature vector, said reference data comprises a reference vector, said means for updating said reference data calculates an average of feature vectors of said plurality of frames, and said means for comparing calculates a normalised Euclidean vector distance between said feature vector and said reference vector.
16. An apparatus according to claim 15 wherein a covariance of the distribution of said reference data is used to calculate said normalised Euclidean vector distance between said feature vector and said reference vector.
17. An apparatus as claimed in claim 15 or 16 wherein said means for extracting feature data comprises:
means for applying a Fast Fourier Transform to each colour component;
means for concatenating magnitude coefficients from each Fast Fourier Transform to form component feature vectors; and
means for concatenating said component feature vectors to form said feature vector.
18. An apparatus according to any one of claims 13 to 17 wherein, after said track of said feature has been lost, said current feature position is estimated, said feature data extracted and compared with said reference data for a second predetermined number of subsequent frames, and said reference data is updated if said difference between said feature data and said reference data is smaller than said first predetermined number.
19. An apparatus according to any one of claims 13 to 18, further comprising means for concatenating said current feature positions to form a trajectory.
20. A program stored on a memory medium for tracking a feature across a sequence of image frames, each said image frame comprising a two-dimensional array of pixel data, said program comprising:
code for estimating a current feature position in a current frame from at least a previous feature position in a previous frame;
code for extracting feature data from pixel data of said current frame that are substantially around said current feature position;
code for comparing said feature data with reference data, wherein a difference between said feature data and said reference data that is larger than a first predetermined number indicates that track of said feature has been lost; and
code for updating said reference data periodically with feature data of a plurality of frames.
21. A program according to claim 20 wherein said reference data is a statistical representation of feature data of said plurality of frames.
22. A program according to claim 20 or 21 wherein said feature data comprises a feature vector, said reference data comprises a reference vector, said code for updating said reference data calculates an average of feature vectors of said plurality of frames, and said code for comparing calculates a normalised Euclidean vector distance between said feature vector and said reference vector.
23. A program according to claim 22 wherein a covariance of the distribution of said reference data is used to calculate said normalised Euclidean vector distance between said feature vector and said reference vector.
24. A program as claimed in claim 22 or 23 wherein said code for extracting feature data comprises:
code for applying a Fast Fourier Transform to each colour component;
code for concatenating magnitude coefficients from each Fast Fourier Transform to form component feature vectors; and
code for concatenating said component feature vectors to form said feature vector.

25. A program according to any one of claims 20 to 24 wherein, after said track of said feature has been lost, said current feature position is estimated, said feature data extracted and compared with said reference data for a second predetermined number of subsequent frames, and said reference data is updated if said difference between said feature data and said reference data is smaller than said first predetermined number.
26. A program according to any one of claims 20 to 25, further comprising code for concatenating said current feature positions to form a trajectory.
27. A method of tracking a feature across a sequence of image frames, said method being substantially as described with reference to Figs. 2 to 6.
28. An apparatus for tracking a feature across a sequence of image frames, said apparatus being substantially as described with reference to Figs. 1 to 6.
29. A program for tracking a feature across a sequence of image frames, said program being substantially as described with reference to Figs. 2 to 6.

DATED this 31st Day of July 2002
Canon Kabushiki Kaisha
Patent Attorneys for the Applicant
SPRUSON & FERGUSON
AU2002300345A 2001-08-01 2002-07-31 Video Feature Tracking with Loss-of-track Detection Ceased AU2002300345B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002300345A AU2002300345B2 (en) 2001-08-01 2002-07-31 Video Feature Tracking with Loss-of-track Detection

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
AUPR6762A AUPR676201A0 (en) 2001-08-01 2001-08-01 Video feature tracking with loss-of-track detection
AUPR6762 2001-08-01
AU2002300345A AU2002300345B2 (en) 2001-08-01 2002-07-31 Video Feature Tracking with Loss-of-track Detection

Publications (2)

Publication Number Publication Date
AU2002300345A1 AU2002300345A1 (en) 2003-06-12
AU2002300345B2 true AU2002300345B2 (en) 2004-05-13

Family

ID=39259651

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2002300345A Ceased AU2002300345B2 (en) 2001-08-01 2002-07-31 Video Feature Tracking with Loss-of-track Detection

Country Status (1)

Country Link
AU (1) AU2002300345B2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5867584A (en) * 1996-02-22 1999-02-02 Nec Corporation Video object tracking method for interactive multimedia applications
JPH11331829A (en) * 1998-05-19 1999-11-30 Nippon Telegr & Teleph Corp <Ntt> Characteristic point tracking method, its device, and recording medium recording characteristic point tracking program
EP1265195A2 (en) * 2001-06-04 2002-12-11 The University Of Washington Video object tracking by estimating and subtracting background

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5867584A (en) * 1996-02-22 1999-02-02 Nec Corporation Video object tracking method for interactive multimedia applications
JPH11331829A (en) * 1998-05-19 1999-11-30 Nippon Telegr & Teleph Corp <Ntt> Characteristic point tracking method, its device, and recording medium recording characteristic point tracking program
EP1265195A2 (en) * 2001-06-04 2002-12-11 The University Of Washington Video object tracking by estimating and subtracting background

Similar Documents

Publication Publication Date Title
US7177446B2 (en) Video feature tracking with loss-of-track detection
CN107563313B (en) Multi-target pedestrian detection and tracking method based on deep learning
JP4849464B2 (en) Computerized method of tracking objects in a frame sequence
US20230036905A1 (en) Target tracking method for panorama video,readable storage medium and computer equipment
JP4699564B2 (en) Visual background extractor
Patwardhan et al. Robust foreground detection in video using pixel layers
Nguyen et al. Occlusion robust adaptive template tracking
US8774464B2 (en) Real time hand tracking, pose classification, and interface control
EP1836683B1 (en) Method for tracking moving object in video acquired of scene with camera
US6226388B1 (en) Method and apparatus for object tracking for automatic controls in video devices
CN101141633A (en) Moving object detecting and tracing method in complex scene
JPH08202879A (en) Method for change of continuous video images belonging to sequence of mutually interrelated images as well as apparatus and method for replacement of expression of target discriminated by set of object points by matched expression of predetermined and stored pattern of same geometrical shape in continuous tv frames of same sequence
CN111322993A (en) Visual positioning method and device
JP2004532441A (en) System and method for extracting predetermined points of an object in front of a computer-controllable display captured by an imaging device
CN113379789B (en) Moving target tracking method in complex environment
AU2002300345B2 (en) Video Feature Tracking with Loss-of-track Detection
US7773771B2 (en) Video data tracker
KR101591380B1 (en) Conjugation Method of Feature-point for Performance Enhancement of Correlation Tracker and Image tracking system for implementing the same
KR101981493B1 (en) Target observing method
US11657608B1 (en) Method and system for video content analysis
Vlahović et al. Robust tracking of moving objects using thermal camera and speeded up robust features descriptor
Li et al. Improved CAMShift object tracking based on Epanechnikov Kernel density estimation and Kalman filter
KR100994366B1 (en) Method for tracking a movement of a moving target of image tracking apparatus
KR101981490B1 (en) Target observing method
Estalayo et al. Efficient image stabilization and automatic target detection in aerial FLIR sequences

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)
MK14 Patent ceased section 143(a) (annual fees not paid) or expired