EP1683108A2 - Object tracking within video images - Google Patents
Object tracking within video images
- Publication number
- EP1683108A2 (application number EP04798412A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- detected
- objects
- matched
- object model
- characteristic features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
Definitions
- This invention relates to a method and system for tracking objects detected within video images from frame to frame.
- the present invention addresses the above by the provision of an object tracking method and system for tracking objects in video frames which takes into account the scaling and variance of each matching feature. This provides for some latitude in the choice of matching feature, whilst ensuring that as many matching features as possible can be used to determine matches between objects, thus giving increased accuracy in the matching thus determined.
- the present invention provides a method for tracking objects in a sequence of video images, comprising the steps of: storing one or more object models relating to objects detected in previous video images of the sequence, the object models comprising values of characteristic features of the detected objects and variances of those values; receiving a further video image of the sequence to be processed; detecting one or more objects in the received video image; determining characteristic features of the detected objects; calculating a distance measure between each detected object and each object model on the basis of the respective characteristic features using a distance function which takes into account at least the variance of the characteristic features; matching the detected objects to the object models on the basis of the calculated distance measures; and updating the object models using the characteristic features of the respective detected objects matched thereto so as to provide a track of the objects.
- the distance measure is a scaled Euclidean distance. This provides the advantage that high-dimensional data can be processed by a computationally inexpensive process, suitable for real-time operation.
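- the distance function is of the form:
$$D(l,k) = \sqrt{\sum_{i=1}^{N} \frac{(x_{li} - y_{ki})^2}{\sigma_{li}^2}}$$
for object model $l$ and detected object $k$, where $x_{li}$ and $y_{ki}$ are the feature values of the object model and of the detected object respectively, the index $i$ runs through all the $N$ features of an object model, and $\sigma_{li}^2$ is the corresponding component of the variance of each feature.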
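As a minimal sketch of the scaled Euclidean distance described above, the following Python function divides each squared feature difference by that feature's variance before summing. The function name, argument names and sample values are illustrative assumptions, not taken from the patent.

```python
import math

def scaled_euclidean(template, blob, variances):
    """Distance between a template and a detected blob.

    Each squared feature difference is divided by the variance of that
    feature, so features with large scales do not dominate the result.
    """
    if not (len(template) == len(blob) == len(variances)):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 / var
                         for x, y, var in zip(template, blob, variances)))

# Two features on very different scales (made-up values for illustration).
d = scaled_euclidean([100.0, 0.5], [110.0, 0.7], [25.0, 0.01])
```

In this example both features contribute equally to the distance even though their raw differences (10 and 0.2) differ by a factor of 50.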
- the distance measure is the Mahalanobis distance, which takes into account not only the scaling and variance of a feature, but also the variation of other features based on the covariance matrix. Thus, if there are correlated features, their contribution is weighted appropriately.
- the accuracy of matching of object model to detected object can be increased.
- the variances of the characteristic feature values of that object are increased. This provides the advantage that it assists the tracker in recovering lost objects that may undergo sudden or unexpected movements.
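One way to picture the variance increase for a lost object is sketched below. The text does not say by how much the variances are increased, so the multiplicative `factor` is a purely hypothetical parameter chosen for illustration.

```python
def inflate_variances(variances, factor=1.5):
    """Return enlarged variances for an object whose track has been lost.

    Larger variances shrink the scaled distance to any candidate, which
    widens the set of detections that can fall under the match threshold.
    `factor` is a hypothetical value, not one given in the source.
    """
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1 to increase variances")
    return [v * factor for v in variances]
```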
- the updating step comprises updating the characteristic feature values with an average of each respective value found for the same object over a predetermined number of previous images. This provides for compensation in the case of prediction errors, by changing the prediction model to facilitate re-acquiring the object.
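The averaging over a predetermined number of previous images can be sketched with a fixed-length buffer. The class name and the `window` parameter are assumptions for illustration; the source does not state the window length.

```python
from collections import deque

class FeatureHistory:
    """Keeps the last `window` feature vectors seen for one object."""

    def __init__(self, window):
        # deque with maxlen silently discards the oldest entry when full
        self.values = deque(maxlen=window)

    def update(self, feature_vector):
        """Store a new observation and return the per-feature average."""
        self.values.append(list(feature_vector))
        n = len(self.values)
        return [sum(column) / n for column in zip(*self.values)]
```

Calling `update` once per frame for a given object yields the averaged feature values that would replace the model's current values.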
- the method preferably further comprises counting the number of consecutive video images for which each object is tracked, and outputting a tracking signal indicating that tracking has occurred if an object is tracked for a predetermined number of consecutive frames. This allows short momentary object movements to be discounted.
- if an object model is not matched to a detected object in the received image then preferably a count of the number of consecutive frames for which the object model is not matched is incremented, the method further comprising deleting the object model if the count exceeds a predetermined number.
- the present invention also provides a system for tracking objects in a sequence of video images, comprising:- storage means for storing one or more object models relating to objects detected in previous video images of the sequence, the object models comprising values of characteristic features of the detected objects and variances of those values; means for receiving a further video image of the sequence to be processed; and processing means arranged in use to:- detect one or more objects in the received video image; determine characteristic features of the detected objects; calculate a distance measure between each detected object and each object model on the basis of the respective characteristic features using a distance function which takes into account at least the variance of the characteristic features; match the detected objects to the object models on the basis of the calculated distance measures; and update the stored object models using the characteristic features of the respective detected objects matched thereto.
- the present invention also provides a computer program or suite of programs arranged such that when executed on a computer system the program or suite of programs causes the computer system to perform the method of the first aspect.
- a computer readable storage medium storing a computer program or suite of programs according to the third aspect.
- the computer readable storage medium may be any suitable data storage device or medium known in the art, such as, as a non-limiting example, any of a magnetic disk, DVD, solid state memory, optical disc, magneto-optical disc, or the like.
- Figure 1 is a system block diagram illustrating a computer system according to the present invention
- Figure 2 (a) and (b) are a flow diagram illustrating the operation of the tracking method and system of the embodiment of the invention
- Figure 3 is a drawing illustrating the concept of object templates being matched to detected object blobs used in the embodiment of the invention
- Figure 4 is a frame of a video sequence showing the tracking performed by the embodiment of the invention
- Figure 5 is a later frame of the video sequence including the frame of Figure 4, again illustrating the tracking of objects performed by the invention.
- Figure 1 illustrates an example system architecture which provides the embodiment of the invention. More particularly, as the present invention generally relates to an image processing technique for tracking objects within input images, the invention is primarily embodied as software to be run on a computer. Therefore, the system architecture of the present invention comprises a general purpose computer 16, as is well known in the art.
- the computer 16 is provided with a display 20 on which output images generated by the computer may be displayed to a user, and is further provided with various user input devices 18, such as keyboards, mice, or the like.
- the general purpose computer 16 is also provided with a data storage medium 22 such as a hard disk, memory, optical disk, or the like, upon which is stored programs, and data generated by the embodiment of the invention.
- An output interface 40 is further provided by the computer 16, from which tracking data relating to objects tracked within the images by the computer may be output to other devices which may make use of such data.
- On the data storage medium 22 are stored data 24 corresponding to stored object models (templates), data 28 corresponding to an input image, and data 30 corresponding to working data such as image data, results of calculations, and other data structures or variables or the like used as intermediate storage during the operation of the invention.
- the computer 16 is arranged to receive images from an image capture device 12, such as a camera or the like.
- the image capture device 12 may be connected directly to the computer 16, or alternatively may be logically connected to the computer 16 via a network 14 such as the internet.
- the image capture device 12 is arranged to provide sequential video images of a scene in which objects are to be detected and tracked, the video images being composed of picture elements (pixels) which take particular values so as to have particular luminance and chrominance characteristics.
- the colour model used for the pixels output from the image capture device 12 may be any known in the art e.g. RGB, YUV, etc.
- the general purpose computer 16 receives images from the image capture device 12 via the network, or directly, and runs the various programs stored on the data storage medium 22 under the general control of the control program 31 so as to process the received input image in order to track objects therein. A more detailed description of the operation of the embodiment will now be undertaken with respect to Figures 2 and 3.
- a new video image is received from the image capture device 12, forming part of a video sequence being received from the device.
- the first processing to be performed is that objects of interest (principally moving objects) need to be detected within the input image, a process generally known as "segmentation". Any segmentation procedure already known in the art may be used, such as those described by McKenna et al. in "Tracking Groups of People", Computer Vision and Image Understanding, 80, 42-56, 2000 or by Horprasert et al.
- an object detection technique as described in the present applicant's co-pending international patent application filed concurrently herewith and claiming priority from U.K. application 0326374.4 may also be used.
- object detection is performed by the object detection program 26 to link all the pixels presumably belonging to individual objects into respective blobs.
- the purpose of the following steps is then to temporally track respective blobs representing objects throughout their movements within the scene by comparing feature vectors for the detected objects with temporal templates (object models). The contents of the object templates are discussed below.
- Fitting of an ellipse to determine $r$ and $\theta$ may be performed as described in Fitzgibbon, A. W. and Fisher, R. B., "A buyer's guide to conic fitting", Proc. 5th British Machine Vision Conference, Birmingham, pp. 513-522 (1995).
- For explanation of methods of determining $c$ see Zhou Q. and Aggarwal, J. K., "Tracking and classifying moving objects from video", Proc. 2nd IEEE Intl. Workshop on Performance Evaluation of Tracking and Surveillance (PETS '2001), Kauai, Hawaii, U.S.A. (December 2001). Having calculated the detected objects' feature vectors, it is then possible to begin matching the detected objects to the tracked objects represented by the stored object templates.
- each object of interest that has been previously tracked within the scene represented by the input images is modelled by a temporal template of persistent characteristic features.
- At any time $t$, we have, for each tracked object $l$ centred at $p_l$, a template of features $M_l(t)$. These object models (or templates) are stored in the data storage medium 22 in the object models area 24.
- $M_l(t+1)$ has already been predicted and stored.
- the stored object models also include the mean $\bar{M}_l(t)$ and variance $V_l(t)$ vectors; these values are updated whenever a candidate blob $k$ in frame $t+1$ is found to match with the template. Therefore, at step 2.8 the matching distance calculation program 36 is launched, which commences a FOR processing loop which generates an ordered list of matching distances for every stored object template with respect to every detected object in the input image. More particularly, at the first iteration of step 2.8 the first stored object template is selected, and its feature vector retrieved. Then, at step 2.10 a second nested FOR processing loop is commenced, which acts to step through the feature vectors of every detected object, processing each set in accordance with step 2.12.
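The nested FOR loops of steps 2.8 to 2.12 can be sketched as follows. The dictionary layout, identifiers and function name are assumptions made for this illustration; only the loop structure follows the text.

```python
import math

def distance_lists(templates, blobs):
    """Compute one (unsorted) list of matching distances per template.

    templates: {template_id: (feature_values, variances)}
    blobs:     {blob_id: feature_values}
    """
    result = {}
    for t_id, (t_feat, t_var) in templates.items():   # outer loop (step 2.8)
        dists = []
        for b_id, b_feat in blobs.items():            # inner loop (step 2.10)
            # matching distance for this template/blob pair (step 2.12)
            d = math.sqrt(sum((x - y) ** 2 / v
                              for x, y, v in zip(t_feat, b_feat, t_var)))
            dists.append((d, b_id))
        result[t_id] = dists
    return result
```

Each list still needs thresholding and sorting before it can be used for matching.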
- a matching distance value is calculated between the present object template and the present detected object being processed, by comparing the respective matching features to determine a matching distance therebetween. Further details of the matching function applied at step 2.12 are given next. Obviously, some features are more persistent for an object while others may be more susceptible to noise. Also, different features normally assume values in different ranges with different variances. Euclidean distance does not account for these factors as it will allow dimensions with larger scales and variances to dominate the distance measure. One way to tackle this problem is to use the Mahalanobis distance metric, which takes into account not only the scaling and variance of a feature, but also the variation of other features based on the covariance matrix. Thus, if there are correlated features, their contribution is weighted appropriately.
- such a distance metric may be employed.
- the covariance matrix can become non-invertible.
- matrix inversion is a computationally expensive process, not suitable for real-time operation.
- a scaled Euclidean distance, shown in Eq. (2), between the template $M_l(t+1)$ and a candidate blob $k$ is adopted:
$$D(l,k) = \sqrt{\sum_{i=1}^{N} \frac{(x_{li} - y_{ki})^2}{\sigma_{li}^2}} \qquad (2)$$
- $x_{li}$ and $y_{ki}$ are the scalar elements of the template $M_l$ and feature vector $B_k$ respectively
- $\sigma_{li}^2$ is the corresponding component of the variance vector $V_l(t)$, and the index $i$ runs through all the features of the template.
- Equation (2) is the same result as given by the Mahalanobis distance in the case that there is no correlation between the features, whereupon the covariance matrix becomes a diagonal matrix.
- Equation (2) represents a simplification by assuming that the features are uncorrelated.
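The equivalence stated above can be checked numerically. The sketch below is restricted to two features so that the covariance matrix can be inverted by hand; the function names and test values are illustrative assumptions.

```python
import math

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance for two features with a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        # the failure case noted in the text: covariance not invertible
        raise ValueError("covariance matrix is not invertible")
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - y[0], x[1] - y[1])
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)

def scaled_euclidean(x, y, variances):
    """Scaled Euclidean distance: per-feature variances only."""
    return math.sqrt(sum((p - q) ** 2 / v for p, q, v in zip(x, y, variances)))

# With zero off-diagonal terms the two measures agree.
diag_cov = ((4.0, 0.0), (0.0, 9.0))
m = mahalanobis_2d((2.0, 3.0), (0.0, 0.0), diag_cov)
s = scaled_euclidean((2.0, 3.0), (0.0, 0.0), (4.0, 9.0))
```

The scaled form needs no matrix inversion, which is the computational saving the description refers to.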
- at step 2.14 an evaluation is performed to determine whether all of the detected objects have been matched against the present object template being processed, i.e. whether the inner FOR loop has finished. If not, then the next detected object is selected, and the inner FOR loop repeated. If so, then processing proceeds to step 2.16.
- at step 2.16 the present state of processing is that a list of matching distances matching every detected object against the stored object template currently being processed has been obtained, but this list is not ordered, and neither has it been checked to determine whether the distance measure values are reasonable.
- a threshold is applied to the distance values in the list, and those values which are greater than the threshold are pruned out of the list.
- a THR value of 10 proved to work in practice, but other values should also be effective.
- the resulting thresholded list is ordered by matching distance value, using a standard sort routine.
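The thresholding and ordering steps can be sketched in a few lines. The `THR` value of 10 is the one quoted in the text; the function name and the `(distance, blob_id)` tuple layout are assumptions for illustration.

```python
THR = 10.0  # threshold value quoted in the text

def prune_and_sort(dists, thr=THR):
    """Drop distances greater than the threshold, then sort ascending.

    dists: list of (distance, blob_id) tuples for one object template.
    """
    return sorted((d, b) for d, b in dists if d <= thr)
```

After this step the first entry of each list, if any, is the closest acceptable candidate for that template.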
- step 2.20 checks whether all of the stored object templates have been processed, i.e. whether the outer FOR loop has finished. If not, then the next object template is selected, and the outer and inner FOR loops repeated. If so, then processing proceeds to step 2.22. At this stage in the processing, we have, stored in the working data area 30, respective ordered lists of matching distances, one for each stored object model.
- Using these ordered lists it is then possible to match detected objects to the stored object models, and this is performed next. More particularly, at step 2.22 a second FOR processing loop is commenced, which again acts to perform processing steps on each stored object template in turn.
- at step 2.24 an evaluation is performed to determine whether the object model being processed has an available match. A match is made with the detected object which gave the lowest matching distance value in the present object model's ordered list. No match is available if, due to the thresholding step carried out previously, there are no matching distance values in the present object model's ordered list. If the evaluation of step 2.24 returns true, i.e. the present object $l$ is matched by a candidate blob $k$ in frame $t+1$, by way of the template prediction $M_l(t+1)$, variance vector $V_l(t)$ and $B_k(t+1)$, then processing proceeds to step 2.26 and the updates for the present object model $l$ are performed.
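The match evaluation of step 2.24 can be sketched as taking the head of each ordered list. The function and variable names are illustrative assumptions. This sketch does not resolve the case where two templates select the same blob, since the text here does not describe how such conflicts are handled.

```python
def match_templates(ordered_lists):
    """Pick the lowest-distance blob for each template, or None.

    ordered_lists: {template_id: [(distance, blob_id), ...]} where each
    list is already thresholded and sorted by ascending distance.
    """
    matches = {}
    for t_id, ordered in ordered_lists.items():
        # an empty list means no candidate survived the threshold
        matches[t_id] = ordered[0][1] if ordered else None
    return matches
```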
- $M_l(t+1)$, $B_k(t+1)$
- mean and variance $\bar{M}_l(t+1)$, $V_l(t+1)$
- the template for each object being tracked has a set of associated Kalman filters that predict the expected value for each feature (except for the dominant colour) in the next frame, respectively.
- the Kalman filters $KF_l(t)$ for the object model are also updated by feeding with the values of the matched detected object using the predictive filter program 38, and the predicted values $M_l$ for the features of the object model for use with the next input frame are determined and stored.
- a 'TK_counts' counter value representing the number of frames for which the object has been tracked is increased by 1
- an 'MS_counts' counter which may have been set if the track of the object had been temporarily lost in the preceding few frames is set to zero at step 2.32.
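To illustrate the predict-then-update cycle for one feature, here is a minimal scalar Kalman filter. The source does not give the state model or noise parameters, so the constant-value (random-walk) model and the `p0`, `q` and `r` defaults below are assumptions; a practical tracker might well use a constant-velocity model for position instead.

```python
class ScalarKalman:
    """Scalar Kalman filter with a constant-value (random-walk) model."""

    def __init__(self, x0, p0=1.0, q=0.01, r=1.0):
        self.x = x0   # state estimate (the feature value)
        self.p = p0   # variance of the estimate
        self.q = q    # process noise variance (assumed)
        self.r = r    # measurement noise variance (assumed)

    def predict(self):
        """Predict the feature value for the next frame."""
        self.p += self.q
        return self.x

    def update(self, z):
        """Correct the estimate with the matched blob's measured value z."""
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x
```

One such filter would be kept per feature of each template, with `predict` supplying the value compared against detections in the next frame.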
- step 2.56 (described later). If not all of the stored object templates have been processed, then the FOR loop of step 2.22 is recommenced with the next stored object template to be processed. Returning to step 2.24, consider now the case where the evaluation of whether there is an available match returns a negative. In this case, as explained above, due to the thresholding applied to the list of distance measures for an object template there are no matching distances within the list, i.e. no detected object matches to the object template within the threshold distance.
- processing proceeds first to the evaluation of step 2.36, wherein the TK_counts counter for the present object template is evaluated to determine whether it is less than a predetermined value MIN_SEEN, which may take a value of 20 or the like. If TK_counts is less than MIN_SEEN then processing proceeds to step 2.54, wherein the present object template is deleted from the object model store 24. Processing then proceeds to step 2.34, shown as a separate step on the diagram, but in reality identical to that described previously above. This use of the MIN_SEEN threshold value is to discount momentary object movements and artefact blobs which may be temporarily segmented but which do not in fact correspond to proper objects to be tracked.
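The counter handling around MIN_SEEN can be sketched as below. The `Track` class and function names are hypothetical; the counter names and the example MIN_SEEN value of 20 come from the text.

```python
from dataclasses import dataclass

MIN_SEEN = 20  # example value given in the text

@dataclass
class Track:
    tk_counts: int = 0  # frames for which the object has been tracked
    ms_counts: int = 0  # consecutive frames for which the match was missed

def on_match(track):
    """Counter updates when the template is matched in the current frame."""
    track.tk_counts += 1
    track.ms_counts = 0

def is_artefact(track, min_seen=MIN_SEEN):
    """True if an unmatched template has been seen too briefly to keep."""
    return track.tk_counts < min_seen
```

An unmatched template for which `is_artefact` returns true would be deleted from the object model store rather than tested for occlusion.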
- If the evaluation of step 2.36 indicates that the TK_counts counter exceeds the MIN_SEEN threshold then a test for occlusion is next performed, at step 2.38.
- no use is made of any special heuristics concerning the areas where objects enter/exit into/from the scene. Objects may just appear or disappear in the middle of the image, and, hence, positional rules are not necessary. To handle occlusions, therefore, the use of heuristics is essential. As a result within the embodiment every time an object has failed to find a match with a detected object a test on occlusion is carried out at step 2.38.
- if the present object's bounding box overlaps with some other object's bounding box, as determined by the evaluation at step 2.40, then both objects are marked as 'occluded' at step 2.42. Processing then proceeds to step 2.48, which will be described below.
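A bounding-box overlap test of the kind used for the occlusion check can be sketched as follows. The `(x_min, y_min, x_max, y_max)` box layout is an assumption, as is the choice that boxes which merely touch along an edge do not count as overlapping.

```python
def boxes_overlap(a, b):
    """Return True if two axis-aligned bounding boxes overlap.

    a, b: (x_min, y_min, x_max, y_max). Boxes that only share an edge
    or a corner are treated as not overlapping.
    """
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]
```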
- if the occlusion test of step 2.40 indicates that there are no overlapping other templates, i.e. the present object is not occluded, then the conclusion is drawn that the tracking of the object has been lost. Therefore, processing proceeds to step 2.48 where an MS_counts counter is incremented, to keep a count of the number of input frames for which the tracking of a particular object model has not been successful.
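The lost-track counter can be sketched as below. The description elsewhere refers to a MAX_LOST period after which an unmatched model is deleted, but no value is given, so the number used here is hypothetical.

```python
MAX_LOST = 5  # hypothetical value; the source names MAX_LOST without a number

def on_lost(ms_counts, max_lost=MAX_LOST):
    """Increment the missed-frame counter and report whether to delete.

    Returns (new_ms_counts, delete_flag); the flag is True once the object
    has gone unmatched for more than max_lost consecutive frames.
    """
    ms_counts += 1
    return ms_counts, ms_counts > max_lost
```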
- step 2.44 can also be reached from step 2.42, where the present object model is marked as being occluded. As an error in the matching can occur simply due to the prediction errors, at step 2.44 the prediction model is changed to facilitate the possible recovery of the lost tracking.
- the same update is performed.
- after step 2.44, processing proceeds to the evaluation of step 2.34, which has already been described.
- once step 2.34 indicates that every object template has been processed in accordance with the processing loop commenced at step 2.22, the present state of processing is that every stored object model will have been either matched with a detected object, marked as occluded, not matched but within the MAX_LOST period, or deleted from the object model store 24 (either by virtue of no match having been found within the MIN_SEEN period, or by virtue of the MAX_LOST period having been exceeded without the object having been re-acquired). However, there may still be detected objects in the image which have not been matched to a stored object model, usually because they are new objects which have just appeared within the image scene for the first time in the present frame (for example, a person walking into the image field of view from the side).
- once step 2.34 indicates that every object template has been processed in accordance with the processing loop commenced at step 2.22, processing proceeds to step 2.56, wherein a further FOR processing loop is commenced, this time processing the detected objects.
- step 2.58 is an evaluation which checks whether the present detected object being processed has been matched to an object model. If this is the case i.e. the present object has been matched, then there is no need to create a new object model for the detected object, and hence processing proceeds to step 2.62.
- Step 2.62 determines whether or not all the detected objects have been processed by the FOR loop commenced at step 2.56, and returns the processing to step 2.56 to process the next detected object if not, or ends the FOR loop if all the detected objects have been processed. If the present detected object has not been matched with a stored object model, however, then a new object model must be instantiated and stored at step 2.60, taking the detected object's feature values as its initial values, i.e. for the present detected object $k$ in frame $t+1$, a new object template $M_k(t+1)$ is created from $B_k(t+1)$.
- after step 2.60 the loop evaluation of step 2.62 is performed as previously described, and once all of the detected objects have been processed by the loop, processing can proceed to step 2.64.
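The loop over detected objects in steps 2.56 to 2.62 can be sketched as follows. The dictionary layout, the function name and the `initial_variance` default are assumptions; the text only states that a new template takes the detected object's feature values as its initial values.

```python
def spawn_new_templates(blobs, matched_blob_ids, initial_variance=1.0):
    """Create a new template for every detected blob that was not matched.

    blobs: {blob_id: feature_values}
    matched_blob_ids: set of blob ids already assigned to a template.
    initial_variance is a hypothetical starting value for each feature.
    """
    new_templates = {}
    for b_id, features in blobs.items():
        if b_id in matched_blob_ids:
            continue  # already tracked, no new model needed
        new_templates[b_id] = {
            "features": list(features),
            "variances": [initial_variance] * len(features),
            "tk_counts": 0,
            "ms_counts": 0,
        }
    return new_templates
```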
- object models are deleted if the tracking of the object to which they relate is lost (i.e. the object model is not matched) within the MIN_SEEN period.
- the output tracking information is used to manipulate the image to place a visible bounding box around each tracked object in the image, as shown in Figures 4 and 5.
- Figures 4 and 5 are two frames from a video sequence which are approximately 40 frames temporally separated (Figure 5 being the later frame).
- Figure 5 illustrates the ability of the present embodiment to handle occlusions, as the group of people tracked as object 956 are occluded by the van tracked as object 787, but each object has still been successfully tracked.
- the tracking information provided by the embodiment may be employed in further applications, such as object classification applications or the like.
- the tracking information may be output at the tracking output 40 of the computer 16 (see Figure 1) to other systems which may make use of it.
- the tracking information may be used as input to a device pointing system for controlling a device such as a camera or a weapon to ensure that the device remains pointed at a particular object in an image as the object moves.
- Other uses of the tracking information will be apparent to those skilled in the art.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0326375A GB0326375D0 (en) | 2003-11-12 | 2003-11-12 | Object tracking within video images |
PCT/GB2004/004687 WO2005048196A2 (en) | 2003-11-12 | 2004-11-08 | Object tracking within video images |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1683108A2 (en) | 2006-07-26 |
Family
ID=29726403
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP04798412A Withdrawn EP1683108A2 (en) | 2003-11-12 | 2004-11-08 | Object tracking within video images |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP1683108A2 (en) |
JP (1) | JP2007510994A (en) |
CN (1) | CN1875379A (en) |
CA (1) | CA2543978A1 (en) |
GB (1) | GB0326375D0 (en) |
WO (1) | WO2005048196A2 (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101212658B (en) * | 2007-12-21 | 2010-06-02 | 北京中星微电子有限公司 | Target tracking method and device |
CN101315700B (en) * | 2008-01-14 | 2010-11-24 | 深圳市蓝韵实业有限公司 | Fast automatic positioning method for multi-sequence image |
TWI585596B (en) * | 2009-10-01 | 2017-06-01 | Alibaba Group Holding Ltd | How to implement image search and website server |
JP5709367B2 (en) * | 2009-10-23 | 2015-04-30 | キヤノン株式会社 | Image processing apparatus and image processing method |
EP2345998B1 (en) * | 2009-12-01 | 2019-11-20 | Honda Research Institute Europe GmbH | Multi-object tracking with a knowledge-based, autonomous adaptation of the tracking modeling level |
JP5656567B2 (en) | 2010-11-05 | 2015-01-21 | キヤノン株式会社 | Video processing apparatus and method |
AU2011203219B2 (en) * | 2011-06-30 | 2013-08-29 | Canon Kabushiki Kaisha | Mode removal for improved multi-modal background subtraction |
KR101618814B1 (en) * | 2012-10-09 | 2016-05-09 | 에스케이텔레콤 주식회사 | Method and Apparatus for Monitoring Video for Estimating Gradient of Single Object |
KR101591380B1 (en) * | 2014-05-13 | 2016-02-03 | 국방과학연구소 | Conjugation Method of Feature-point for Performance Enhancement of Correlation Tracker and Image tracking system for implementing the same |
CN111402296B (en) * | 2020-03-12 | 2023-09-01 | 浙江大华技术股份有限公司 | Target tracking method and related device based on camera and radar |
- 2003-11-12 GB GB0326375A patent/GB0326375D0/en not_active Ceased
- 2004-11-08 CN CNA2004800327094A patent/CN1875379A/en active Pending
- 2004-11-08 WO PCT/GB2004/004687 patent/WO2005048196A2/en active Application Filing
- 2004-11-08 EP EP04798412A patent/EP1683108A2/en not_active Withdrawn
- 2004-11-08 JP JP2006538931A patent/JP2007510994A/en not_active Withdrawn
- 2004-11-08 CA CA002543978A patent/CA2543978A1/en not_active Abandoned
Non-Patent Citations (1)
Title |
---|
See references of WO2005048196A3 * |
Also Published As
Publication number | Publication date |
---|---|
WO2005048196A3 (en) | 2005-12-29 |
CA2543978A1 (en) | 2005-05-26 |
JP2007510994A (en) | 2007-04-26 |
CN1875379A (en) | 2006-12-06 |
GB0326375D0 (en) | 2003-12-17 |
WO2005048196A2 (en) | 2005-05-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20070092110A1 (en) | Object tracking within video images | |
EP1859411B1 (en) | Tracking objects in a video sequence | |
EP1844443B1 (en) | Classifying an object in a video frame | |
Nguyen et al. | Fast occluded object tracking by a robust appearance filter | |
Gabriel et al. | The state of the art in multiple object tracking under occlusion in video sequences | |
EP1859410B1 (en) | Method of tracking objects in a video sequence | |
Javed et al. | Tracking and object classification for automated surveillance | |
CN109035304B (en) | Target tracking method, medium, computing device and apparatus | |
Gutchess et al. | A background model initialization algorithm for video surveillance | |
EP1879149B1 (en) | method and apparatus for tracking a number of objects or object parts in image sequences | |
JP2009015827A (en) | Object tracking method, object tracking system and object tracking program | |
US8401243B2 (en) | Articulated object region detection apparatus and method of the same | |
US7940957B2 (en) | Object tracker for visually tracking object motion | |
CN112883819A (en) | Multi-target tracking method, device, system and computer readable storage medium | |
EP2521093A1 (en) | Moving object detection device and moving object detection method | |
CN106682619B (en) | Object tracking method and device | |
JP7272024B2 (en) | Object tracking device, monitoring system and object tracking method | |
KR20190023389A (en) | Multi-Class Multi-Object Tracking Method using Changing Point Detection | |
Pareek et al. | Re-projected SURF features based mean-shift algorithm for visual tracking | |
WO2005048196A2 (en) | Object tracking within video images | |
JP4086422B2 (en) | Subject recognition device | |
Dockstader et al. | Tracking multiple objects in the presence of articulated and occluded motion | |
US20080198237A1 (en) | System and method for adaptive pixel segmentation from image sequences | |
JP7316236B2 (en) | Skeletal tracking method, device and program | |
Yun et al. | Unsupervised moving object detection through background models for ptz camera |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20060508 |
AK | Designated contracting states | Kind code of ref document: A2. Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LU MC NL PL PT RO SE SI SK TR |
AX | Request for extension of the European patent | Extension state: AL HR LT LV MK YU |
17Q | First examination report despatched | Effective date: 20060801 |
DAX | Request for extension of the European patent (deleted) | |
RTI1 | Title (correction) | Free format text: OBJECT TRACKING IN VIDEO IMAGES |
GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
18D | Application deemed to be withdrawn | Effective date: 20100129 |