WO2010019024A2 - Method and system for tracking and tagging objects - Google Patents

Method and system for tracking and tagging objects

Info

Publication number
WO2010019024A2
WO2010019024A2
Authority
WO
WIPO (PCT)
Prior art keywords
current frame
motion block
cluster
motion
color information
Prior art date
Application number
PCT/MY2009/000116
Other languages
French (fr)
Other versions
WO2010019024A3 (en)
Inventor
Kim Meng Liang
Original Assignee
Mimos Berhad
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mimos Berhad
Publication of WO2010019024A2
Publication of WO2010019024A3


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

A method and an automated system for tracking and tagging objects, wherein each object is tracked and tagged as a motion block. The method (100) includes detecting a plan view and a lateral view of the motion blocks in a current frame (102) to identify occlusion of the motion blocks in the current frame (104), extracting color information from motion blocks in the current frame (108) to identify matching color information between motion blocks in the current frame and all motion blocks in previous frames (110) and assigning a tag to the motion blocks in the current frame (112). The automated system includes a first video camera to detect the plan view (200) of the motion blocks in the current frame and a second video camera to detect the lateral view (208) of the motion blocks in a current frame, a processor comprising means of identifying occlusion of the motion blocks in the current frame, means of extracting color information from the motion blocks in the current frame to identify matching color information between the motion blocks in the current frame and all motion blocks in previous frames and means of assigning a tag to the motion blocks in the current frame, and a data storage system.

Description

METHOD AND SYSTEM FOR TRACKING AND TAGGING OBJECTS
FIELD OF INVENTION
The present invention relates to a method and system for tracking and tagging objects.
BACKGROUND ART
Object tracking and tagging is an important part of video surveillance and video analysis systems. Object tracking and tagging is best described as tracking the motion of an object of interest by consistently assigning tags to the object throughout consecutive video frames of a scene.
Object tracking and tagging is highly complicated, especially when the object of interest has an irregular non-rigid shape, moves at unpredictable speeds and directions, and is located in a highly crowded area. Additionally, object tracking and tagging is made more complicated when it involves partial and full occlusion of object-to-object and object-to-surroundings. Several surrounding factors that pose a challenge to object tracking and tagging are changes in the weather and lighting conditions and changes in the appearance of the surrounding area. Limitations of the apparatus used in object tracking and tagging contribute a fair number of challenges, such as the introduction of noise from the recording media and the loss of crucial object recognition information due to image projection from 3-D to 2-D in digital image processing.
Due to these complications and challenges, several assumptions are made in prior object tracking and tagging techniques. Amongst others, these include the assumption that occlusion does not occur and that objects have predictable motion, wherein an abrupt change in the speed or direction of an object is not anticipated. However, these assumptions are not practical, as they do not reflect the actual situation in the scene.
Prior object tracking and tagging methods and systems are based on several techniques. In the object modeling and continuous recognition technique, objects have to be modeled prior to tracking and tagging the same. The prediction and searching technique applies a semi-automated method and apparatus but is inefficient for tracking and tagging objects that move abruptly and objects that move at high speeds. In the mapping of segmented regions technique, objects are mapped into segments and each segment is tracked and tagged. This approach requires high computational capacity and is inefficient for tracking and tagging every object in a highly crowded area, as there will be a large quantity of segments that require tracking and tagging.
There are several other semi-automated object tracking and tagging methods and systems, whereby a user is required to manually select the object that is required to be tracked and tagged. However, this technique requires human intervention, and more often than not, this results in errors during object tracking and tagging.
SUMMARY OF INVENTION
In one embodiment of the present invention is a method wherein each object is tracked and tagged as a motion block. The method includes detecting a plan view and a lateral view of the motion blocks in a current frame to identify occlusion of the motion blocks in the current frame, extracting color information from motion blocks in the current frame to identify matching color information between motion blocks in the current frame and all motion blocks in previous frames, and assigning a tag to the motion blocks in the current frame.
In another embodiment of the present invention is an automated system for tracking and tagging objects. The automated system includes a first video camera to detect the plan view of the motion blocks in the current frame and a second video camera to detect the lateral view of the motion blocks in a current frame; a processor comprising means of identifying occlusion of the motion blocks in the current frame, means of extracting color information from the motion blocks in the current frame to identify matching color information between the motion blocks in the current frame and all motion blocks in previous frames, and means of assigning a tag to the motion blocks in the current frame; and a data storage system.
The present invention consists of several novel features and a combination of parts hereinafter fully described and illustrated in the accompanying drawings, it being understood that various changes in the details may be made without departing from the scope of the invention or sacrificing any of the advantages of the present invention.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
For the purpose of facilitating an understanding of the present invention, preferred embodiments are illustrated in the accompanying drawings, from an inspection of which, when considered in connection with the following description, the invention, its construction and operation, and many of its advantages will be readily understood and appreciated.
FIG. 1 is a flowchart of the method of tracking and tagging objects.
FIG. 2A is a plan view of the motion blocks detected by the first video camera.
FIG. 2B is a lateral view of the motion blocks detected by the second video camera.
FIG. 3 is a flowchart of part-based detection to identify occlusion of the motion blocks.
FIG. 4 is a flowchart of extracting color information from the motion blocks.
FIG. 5 is a flowchart of computing the average comparison score of the motion blocks.
FIG. 6 is an illustrative view of computing the average comparison score of the motion blocks.
FIG. 7 is a flowchart of assigning tags to motion blocks.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention relates to a method and system of tracking and tagging objects. Hereinafter, this specification will describe the present invention according to the preferred embodiments of the present invention. However, it is to be understood that limiting the description to the preferred embodiments of the invention is merely to facilitate discussion of the present invention and it is envisioned that those skilled in the art may devise various modifications and equivalents without departing from the scope of the appended claims.
The method and system of tracking and tagging objects of the present invention provides a method and an automated system to track and tag objects with irregular non-rigid shapes that move at unpredictable speeds and directions and are located in a highly crowded area. Objects are consistently tracked and automatically tagged from frame to frame in an image sequence of a scene. Each object in a frame is assigned a tag, and this tag is retained with that particular object throughout the successive frames in the image sequence of the scene. The tagging information obtained throughout the successive frames in the image sequence of the scene is stored for further video analysis.
The present invention describes the method and automated system of tracking and tagging objects, whereby objects throughout the successive frames in the image sequence of the scene are consistently tracked as motion blocks and these motion blocks are automatically tagged based on color information. Reference is now being made to FIG. 1. FIG. 1 is a flowchart that illustrates the method of tracking and tagging objects of the present invention. More specifically, the method of tracking and tagging objects begins with tracking objects as motion blocks by detecting appropriate views of the motion blocks in a current frame (102) to identify the presence of occlusion of the detected motion blocks (104). If occlusion is present, the occluded part of the motion blocks is identified (114). The objects are then tagged by extracting color information from the motion blocks (108) to identify matching color information (110) between motion blocks in successive frames in the image sequence. Using this information, the motion blocks in the current frame are assigned with respective tags (112).
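To make the flow of FIG. 1 concrete, the following Python sketch strings the numbered steps together for a single frame. All helper names and data structures here are placeholders invented for illustration (the patent does not define an API); each step is expanded in the sketches accompanying the detailed description below.

```python
# Minimal per-frame driver mirroring the numbered steps of FIG. 1.
# Helper names (detect_motion_blocks, find_occluded_regions, ...) are
# hypothetical; they are sketched step by step further below.
def track_and_tag_frame(plan_frame, lateral_frame, backgrounds,
                        prior_blocks, threshold, next_tag):
    plan_blocks = detect_motion_blocks(plan_frame, backgrounds["plan"])        # (102)
    lateral_blocks = detect_motion_blocks(lateral_frame, backgrounds["lateral"])

    occluded = find_occluded_regions(plan_blocks, lateral_blocks)              # (104)
    for region in occluded:
        identify_occluded_parts(region)                                        # (114)

    tagged = []
    for block in lateral_blocks:
        clusters = extract_cluster_color_info(block)                           # (108)
        tag, next_tag = match_and_assign_tag(clusters, prior_blocks,           # (110, 112)
                                             threshold, next_tag)
        tagged.append((block, tag))
    return tagged, next_tag
```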
The automated system of tracking and tagging objects includes video cameras to detect the appropriate views of the motion blocks throughout the successive frames in the image sequence. This information is then fed into processors that are programmed to identify the presence of occlusion of the detected motion blocks. If occlusion is present, the occluded part of the motion blocks is identified. The objects are then tagged by the processor, whereby the processor is programmed to extract color information from the motion blocks to identify matching color information between motion blocks in successive frames in the image sequence. Using this information, the processor then tags the motion blocks in the current frame with respective tags. The automated system of tracking and tagging objects also includes a data storage system that stores the color information of the motion blocks in successive frames in the image sequence as well as the tagging information of the motion blocks in successive frames in the image sequence for further video analysis. The configuration and arrangement of the video cameras of the automated system is crucial to ensure the effectiveness and efficiency of consistently tracking and automatically tagging objects. The video cameras are located at a specific height to have maximum coverage of the scene where the objects required for tracking and tagging are present.
In one embodiment of the present invention, the video cameras may consist of multiple pairs of video cameras, and each pair includes a first video camera and a second video camera. The first video camera is preferably a low-resolution video camera and the second video camera is preferably a high-resolution video camera. The first video camera is located vertically above the second video camera at a predetermined distance. The predetermined distance is such that the first video camera has a top view or plan view coverage and the second video camera has a side view or lateral view coverage of the scene where the objects required for tracking and tagging are present.
In another embodiment of the present invention, the video cameras may consist of two sets of video cameras that include a first video camera and a plurality of second video cameras. The first video camera is preferably a low-resolution camera and the set of second video cameras are preferably high-resolution video cameras. The first video camera is located in the center of the scene and at a predetermined height such that it has a top view or plan view coverage of the scene where the objects required for tracking and tagging are present. The set of second video cameras are located along the circumference of the scene and at a predetermined distance from one another such that collectively, all the video cameras in the set of second video cameras have complete side view or lateral view coverage of the scene where the objects required for tracking and tagging are present.
Reference is now being made to FIGs. 2A and 2B. FIG. 2A illustrates the plan view of the motion blocks in the current frame detected by the first video camera. FIG. 2B illustrates the lateral view of the motion blocks in the current frame detected by the second video camera. The video cameras detect moving objects throughout the successive frames in the image sequence of the scene as motion blocks by examining the change of intensity in each pixel in the current frame as compared to the original intensity in the background frame. Pixels with a high change of intensity are grouped together to form a motion block. Each motion block represents an object that appears in the scene. Motion blocks that have a small area or a limited life span are removed, as these motion blocks may represent noise introduced by the video cameras as well as noise introduced by changes in the lighting condition of the scene. Motion blocks that have a substantially large area may represent occluded objects, whereby the parts of the occluded motion blocks are identified by the processor.
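As an illustration of this step, the sketch below detects motion blocks from a frame and its background using OpenCV and NumPy; the difference threshold and minimum area are assumed values, not figures taken from the patent.

```python
import cv2
import numpy as np

def detect_motion_blocks(frame, background, diff_thresh=30, min_area=400):
    """Group pixels whose intensity differs strongly from the background frame
    into motion blocks; small blocks are discarded as probable noise.
    diff_thresh and min_area are illustrative assumptions."""
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray_bg = cv2.cvtColor(background, cv2.COLOR_BGR2GRAY)

    # Change of intensity in each pixel relative to the background frame
    diff = cv2.absdiff(gray_frame, gray_bg)
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)

    # Group high-change pixels into connected components (one per motion block)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask)

    blocks = []
    for i in range(1, num):                      # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < min_area:                      # small area: likely camera/lighting noise
            continue
        blocks.append({"bbox": (int(x), int(y), int(w), int(h)),
                       "mask": labels == i})
    return blocks
```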
The first video camera detects the plan view (200) of the motion blocks in the current frame and the second video camera detects the lateral view (208) of the motion blocks in the current frame. The information pertaining to the plan view (200) and the lateral view (208) of the motion blocks from the first and second video camera is fed into the processor to identify the presence of occluded motion blocks in the current frame.
The processor is programmed to designate a number of regions in the plan view (200) that correspond to a number of regions in the lateral view (208) of the current frame.
With reference to FIGs. 2A and 2B, the region (202) in the plan view corresponds to the region (210) in the lateral view, the region (204) in the plan view corresponds to the region (212) in the lateral view and the region (206) in the plan view corresponds to the region (214) in the lateral view. The number of corresponding regions is based on the requirement of the automated system in terms of the level of accuracy required to identify the presence of occluded motion blocks in the current frame.
The presence of occluded motion blocks in the current frame is detected using the corresponding regions (202:210, 204:212, 206:214) in the plan view and the lateral view of the current frame. If a region in the plan view (200) contains more than one motion block and the corresponding region in the lateral view (208) contains a lesser number of motion blocks than that of the region in the plan view (200), then occlusion is present. If a region in the plan view (200) contains more than one motion block and the corresponding region in the lateral view (208) contains an equal number of motion blocks to that of the region in the plan view (200), then occlusion is not present.
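A minimal sketch of this counting rule follows; the region bookkeeping (dicts keyed by region id holding the motion blocks detected inside each region) is an assumed representation, not one prescribed by the patent.

```python
def find_occluded_regions(plan_regions, lateral_regions):
    """Flag lateral-view regions whose corresponding plan-view region shows
    more motion blocks than the lateral view does (occlusion present)."""
    occluded = {}
    for region_id, plan_blocks in plan_regions.items():
        lateral_blocks = lateral_regions.get(region_id, [])
        if len(plan_blocks) > 1 and len(lateral_blocks) < len(plan_blocks):
            occluded[region_id] = {"expected": len(plan_blocks),
                                   "observed": len(lateral_blocks)}
    return occluded

# e.g. the FIG. 2 regions 202/210: two blocks in the plan view but only one
# overlapping block in the lateral view, so the region is reported as occluded:
# find_occluded_regions({"r202_210": [b1, b2]}, {"r202_210": [b3]})
```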
For example, the region (202) in the plan view contains two motion blocks, whereas the corresponding region (210) in the lateral view contains only one overlapping motion block. Therefore, the motion block in the region (210) of the lateral view of the current frame is occluded with two motion blocks. On the other hand, the region (204) in the plan view contains two motion blocks, whereas the corresponding region (212) in the lateral view also contains two motion blocks. Therefore, the two motion blocks in the region (212) of the lateral view of the current frame are not occluded.
Reference is now being made to FIG. 3. Where occlusion is present, part-based object detection (300) is used to identify the part of the occluded motion block prior to extracting color information from the motion blocks in the current frame. The processor is programmed to use part-based object detection (300) to identify the part of the occluded motion block. FIG. 3 is a flowchart that illustrates the steps of part-based object detection (300). Part-based object detection (300) begins with edge detection (302), where an edge map is generated; curve detection (304), where the edge map is utilized to generate a curve map; part detection (306), where the parts of the occluded motion block are characterized; and finally, part grouping (308), where the characterized parts are grouped to form a complete occluded motion block.
In edge detection, the edges or prominent points of the part of the occluded motion block are detected using an edge detection technique. This technique generates an edge map of the part of the occluded motion block. These edges are linked to generate curves that pass through all edges or prominent points of the part of the occluded motion block in the edge map. These curves are then merged to form a curve map that represents the outline of the part of the occluded motion block. The curve map is utilized to characterize the part of the occluded motion block, where the curve map is compared against several part models.
Part models are predetermined models of various types of objects, and each part model contains several parts of an object represented as part curve maps. For example, a part model of a human contains the several parts of the human body, namely the head, upper body, lower body, hands and legs. Respective part curve maps represent each of these parts. The curve map that represents the outline of the part of the occluded motion block is compared against the part curve maps of several part models to identify the various parts that form the curve map. All the identified part curve maps are then topologically grouped to form the complete part of the occluded motion block. The capability of identifying the complete part of the occluded motion block depends on the number of part models made available in the automated system.
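One possible way to realise this edge/curve/part chain is sketched below with standard OpenCV primitives (Canny edges, contours as curves, and shape matching against stored part contours). The part models, the matching threshold, and the use of cv2.matchShapes are assumptions for illustration; the patent only requires that curve maps be compared against part curve maps.

```python
import cv2

def detect_parts(occluded_patch, part_models, match_thresh=0.3):
    """Sketch of part-based object detection (300):
    edge detection (302) -> curve detection (304) -> part detection (306)
    -> part grouping (308). `part_models` maps a part name (e.g. "head")
    to a reference contour; all parameters are illustrative."""
    gray = cv2.cvtColor(occluded_patch, cv2.COLOR_BGR2GRAY)

    # (302) edge map of the part of the occluded motion block
    edges = cv2.Canny(gray, 50, 150)

    # (304) link edges into curves that outline the parts
    curves, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

    # (306) compare each curve against the part curve maps of the part models
    identified = []
    for curve in curves:
        for name, model_curve in part_models.items():
            score = cv2.matchShapes(curve, model_curve,
                                    cv2.CONTOURS_MATCH_I1, 0.0)
            if score < match_thresh:             # lower score means a closer shape
                identified.append((name, curve))
                break

    # (308) the caller groups the identified parts into the occluded block
    return identified
```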
Upon identification of the complete part of the occluded motion block, the lateral view (208) of the motion blocks in the current frame is used to extract color information from the motion blocks in the current frame. Reference is now being made to FIG. 4. The processor is programmed to extract color information from the motion blocks in the current frame using cluster color extraction (400).
The color information is extracted based on luminance and chrominance measures. This enables extraction of color information from monochromatic and colored objects during the day as well as at night. The extracted color information is then used to identify matching color information between motion blocks in the current frame and all motion blocks in the previous frame in order to assign tags to the motion blocks in the current frame.
FIG. 4 is a flowchart that illustrates the steps of cluster color extraction (400). Each motion block in the current frame is segmented into areas of similar color, known as clusters (402). For each cluster of the motion block, color information is then derived (404). This color information is known as cluster color information. The cluster color information is computed using color quantization, and it consists of a fixed number of square bins in a 3-D color cube. The number of square bins is based on the requirement of the automated system in terms of the level of accuracy required to identify matching color information between the motion blocks in the current frame and all motion blocks in the previous frames, which includes all motion blocks in the previous frame and any motion blocks that had left the scene.
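The sketch below illustrates one way to perform cluster color extraction (400): k-means is used as a stand-in for the segmentation into color clusters, and each cluster's color information is a quantized histogram over a fixed grid of bins in a 3-D color cube. The cluster count, bin count, and choice of k-means are assumptions, not requirements of the patent.

```python
import numpy as np
from sklearn.cluster import KMeans   # stand-in for the cluster segmentation step

def extract_cluster_color_info(block_pixels, n_clusters=3, bins_per_axis=4):
    """block_pixels: (N, 3) array of the motion block's pixel colors (0-255).
    Returns one normalized quantized-color descriptor per cluster (402/404)."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(block_pixels)

    cluster_info = []
    for c in range(n_clusters):
        pixels = block_pixels[labels == c]
        # Color quantization: counts over bins_per_axis**3 bins of a 3-D color cube
        hist, _ = np.histogramdd(pixels, bins=(bins_per_axis,) * 3,
                                 range=[(0, 256)] * 3)
        hist = hist.flatten()
        hist /= hist.sum() + 1e-9     # normalize so blocks of any size compare fairly
        cluster_info.append(hist)
    return cluster_info
```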
Reference is now being made to FIG. 5. The processor is programmed to identify matching color information between motion blocks in the current frame and all motion blocks in the previous frames, which includes all motion blocks in the previous frame and any motion blocks that had left the scene, using weighted cluster-based matching (500).
FIG. 5 is a flowchart that illustrates the steps of weighted cluster-based matching (500) between two motion blocks. Weighted cluster-based matching (500) begins with comparing the cluster color information of a cluster of the motion block in the current frame with the cluster color information of clusters in all motion blocks in the previous frames (502), which includes all motion blocks in the previous frame and any motion blocks that had left the scene. This is repeated for every cluster of the motion block in the current frame. For each comparison made, the processor computes a respective comparison score (504). The comparison score for each of the clusters of the motion block in the current frame is stored in the data storage system. The processor then identifies the highest comparison score of each cluster in the current frame.
Prior to computing an average comparison score of the motion blocks in the current frame, the processor assigns a predetermined weight for each cluster of the motion block in the current frame (506). The predetermined weight is assigned based on the location of the cluster in the motion block. The predetermined weight assigned for each cluster of the motion block in the current frame is stored in the data storage system.
The processor then computes the average comparison score of the motion blocks in the current frame using the comparison score of the clusters of the motion block in the current frame and the predetermined weight assigned for the clusters of the motion block (508) stored in the data storage system.
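A minimal sketch of weighted cluster-based matching (500) between one motion block in the current frame and one candidate block from the previous frames is given below. Histogram intersection is assumed as the comparison score, and the predetermined weights are assumed to be supplied per cluster (for example, heavier for clusters near the centre of the block); neither choice is fixed by the patent.

```python
import numpy as np

def comparison_score(hist_a, hist_b):
    """Assumed similarity measure between two cluster color descriptors
    (histogram intersection: higher means more similar)."""
    return float(np.minimum(hist_a, hist_b).sum())

def average_comparison_score(current_clusters, candidate_clusters, weights):
    """Steps 502-508: compare every current cluster against every cluster of
    the candidate block, keep the highest score per current cluster, and
    combine the highest scores with the predetermined per-cluster weights."""
    best_scores = []
    for cur in current_clusters:
        scores = [comparison_score(cur, prev) for prev in candidate_clusters]  # (502, 504)
        best_scores.append(max(scores))
    return float(np.dot(weights, best_scores))                                 # (506, 508)
```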
Reference is now being made to FIG. 6. FIG. 6 illustrates the steps of computing the average comparison score of the motion blocks in the current frame by comparing the cluster color information of all clusters of the motion blocks in the current frame with the cluster color information of all clusters of the motion blocks in the previous frames, which includes all motion blocks in the previous frame and any motion blocks that had left the scene. The motion block in the current frame (600) is segmented into three clusters (602, 604, 606). The corresponding motion block in the previous frame (608) is also segmented into three clusters (610, 612, 614).
The cluster color information of the first cluster (602) of the motion block in the current frame (600) is compared with the cluster color information of all three clusters (610, 612, 614) of the motion block in the previous frame (608). The processor computes a comparison score for each of the three comparisons made. This is repeated for second cluster (604) and the third cluster (606) of the motion block in the current frame (600), wherein the cluster color information of the second cluster (604) and the third cluster (606) of the motion block in the current frame (600) are respectively compared with the cluster color information of all three clusters (610, 612, 614) of the motion block in the previous frame (608). The comparison scores for each of the three clusters (602, 604, 606) of the motion block in the current frame (600) are stored in the data storage system.
Based on the computed comparison scores, the processor then identifies the highest comparison score (A) for cluster (602) of the motion block in the current frame (600). This is repeated for the second cluster (604) and the third cluster (606) of the motion block in the current frame (600) respectively, wherein the processor identifies the highest comparison score (B) for cluster (604) and the highest comparison score (C) for cluster (606) of the motion block in the current frame (600).
The processor assigns a predetermined weight for each of the three clusters (602, 604, 606) of the motion block in the current frame (600) and the predetermined weight assigned is stored in the data storage system.
The processor then computes the average comparison score of the motion block in the current frame (600) using the highest comparison scores (A, B, C) of the clusters (602, 604, 606) of the motion block in the current frame (600) and the predetermined weight assigned for the clusters (602, 604, 606) of the motion block in the current frame (600).
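With invented numbers for the highest scores A, B, C and for the predetermined weights (none of these values appear in the patent), the computation of FIG. 6 reduces to a weighted average:

```python
# Illustrative values only
A, B, C = 0.9, 0.7, 0.8          # highest comparison scores for clusters 602, 604, 606
weights = [0.5, 0.3, 0.2]        # assumed predetermined weights by cluster location

average_score = weights[0] * A + weights[1] * B + weights[2] * C
# = 0.45 + 0.21 + 0.16 = 0.82, the value later tested against the tagging threshold
```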
Reference is now being made to FIG. 7. Once the average comparison score of the motion block in the current frame is computed, the processor then assigns a tag to the motion blocks in the current frame (700). FIG. 7 is a flowchart that illustrates the steps of tagging the motion blocks in the current frame.
The processor tags the motion blocks in the current frame with either a tag similar to that of the previous frames, including tags of motion blocks that had left the scene, or a new tag. The decision to retain a tag or assign a new tag is dependent on the average comparison score computed for the motion block in the current frame and the corresponding average comparison score computed for the motion block in the previous frames, which includes all motion blocks in the previous frame and any motion blocks that had left the scene.
If a motion block in the previous frames, which includes all motion blocks in the previous frame and any motion blocks that had left the scene, is tagged as N, and the motion block in the current frame has an average comparison score that is higher than a predetermined threshold of the motion block in the previous frames, then the motion block in the current frame will be assigned the same tag, N (704). On the other hand, if a motion block in the previous frames, which includes all motion blocks in the previous frame and any motion blocks that had left the scene, is tagged as N, and the motion block in the current frame has an average comparison score that is lower than the predetermined threshold of the motion block in the previous frames, then the motion block in the current frame will be assigned a new tag (706).
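The tagging decision of FIG. 7 then reduces to a threshold test; the sketch below assumes the caller supplies the tag N of the best-matching block from the previous frames, the predetermined threshold, and a counter for issuing new tags.

```python
def assign_tag(average_score, matched_tag, threshold, next_tag):
    """Retain the matched block's tag when the average comparison score
    exceeds the predetermined threshold (704); otherwise issue a new tag (706).
    The threshold value and the tag bookkeeping are assumptions."""
    if average_score > threshold:
        return matched_tag, next_tag             # same tag N as in the previous frames
    return next_tag, next_tag + 1                # treated as a new object: fresh tag
```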
The tracking and tagging method as described is repeated for all motion blocks throughout the successive frames in the image sequence of the scene. The tagging information of the motion blocks throughout the successive frames in the image sequence is stored in the data storage system for further video analysis.

Claims

1. A method (100) for tracking and tagging at least one object, wherein each object is tracked and tagged as a motion block, the method comprises:
detecting a plan view and a lateral view of at least one motion block in a current frame (102) to identify occlusion of the at least one motion block in the current frame (104);
extracting color information from the at least one motion block in the current frame (108) to identify matching color information between the at least one motion block in the current frame and all motion blocks in previous frames (110); and
assigning a tag to the at least one motion block in the current frame (112).
2. The method according to claim 1, wherein identifying occlusion of the at least one motion block in the current frame comprises:
designating a plurality of regions (202, 204, 206) in the plan view (200) to a plurality of corresponding regions (210, 212, 214) in the lateral view (208) of the current frame; and
detecting occlusion of the at least one motion block in the current frame using the designated plurality of regions, wherein occlusion of the at least one motion block in the current frame is present if a region in the plan view (200) contains more than one motion block and a corresponding region in the lateral view (208) contains a lesser number of motion blocks than that of the region in the plan view (200).
3. The method according to claim 2, wherein if occlusion of the at least one motion block in the current frame is present, part-based object detection (300) is used to identify at least one part of the at least one occluded motion block.
4. The method according to claim 3, wherein part-based object detection (300) comprises:
edge detection (302), wherein an edge map of the at least one occluded motion block is generated;
curve detection (304), wherein a curve map is generated from the edge map;
part detection (306), wherein the curve map is compared against at least one part model to identify the at least one part of the at least one occluded motion block; and
part grouping (308), wherein the at least one part is grouped to form the at least one occluded motion block.
5. The method according to claim 1, wherein extracting color information from the at least one motion block in the current frame (108) comprises:
segmenting each motion block of the at least one motion block in the current frame into at least one cluster (402); and deriving color information for each cluster of the at least one cluster of the at least one motion block in the current frame (404).
6. The method according to claim 5, wherein color information is derived using color quantization.
7. The method according to claims 1 and 5, wherein identifying matching color information between the at least one motion block in the current frame and all motion blocks in the previous frames (110) is performed using weighted cluster-based matching (500), which comprises:
comparing the color information of the at least one cluster in the current frame to the color information of each cluster of the at least one cluster in the previous frames (502);
computing and storing a comparison score for each cluster of the at least one cluster in the current frame (504);
assigning a predetermined weight for each cluster of the at least one cluster in the current frame (506); and
computing an average comparison score of the at least one motion block in the current frame using the comparison score and the predetermined weight assigned for each cluster of the at least one cluster of the at least one motion block in the current frame (508).
8. The method according to claims 1 and 7, wherein assigning the tag to the at least one motion block in the current frame (112) comprises:
assigning the tag to the at least one motion block in the current frame that is similar to that of the tag of the at least one motion block in the previous frames if the average comparison score of the at least one motion block in the current frame is higher than a predetermined threshold of the at least one motion block in the previous frames (704); and
assigning a new tag to the at least one motion block in the current frame if the average comparison score of the at least one motion block in the current frame is lower than the predetermined threshold of the at least one motion block in the previous frames (706).
9. An automated system for tracking and tagging at least one object, wherein each object is tracked and tagged as a motion block, the automated system comprising:
at least one video recording device to detect a plan view (200) and a lateral view (208) of at least one motion block in a current frame;
at least one processor, wherein the at least one processor comprises:
means of identifying occlusion of the at least one motion block in the current frame;
means of extracting color information from the at least one motion block in the current frame to identify matching color information between the at least one motion block in the current frame and all motion blocks in previous frames, and means of assigning a tag to the at least one motion block in the current frame; and
at least one data storage system.
10. The automated system according to claim 9, wherein the at least one video recording device comprises
a first video camera to detect the plan view (200) of the at least one motion block in the current frame; and
a second video camera to detect the lateral view (208) of the at least one motion block in the current frame.
11. The automated system according to claim 10, wherein the first video camera is located above the second video camera.
12. The automated system according to claim 9, wherein the means of identifying occlusion of the at least one motion block in the current frame comprises:
means of designating a plurality of regions in the plan view to a plurality of corresponding regions in the lateral view of the current frame; and
means of detecting occlusion of the at least one motion block in the current frame using the designated plurality of regions, wherein occlusion of the at least one motion block in the current frame is present if a region in the plan view contains more than one motion block and a corresponding region in the lateral view contains a lesser number of motion blocks than that of the region in the plan view.
13. The automated system according to claim 12, wherein the at least one processor comprises means of part-based object detection to identify at least one part of the at least one occluded motion block if occlusion of the at least one motion block in the current frame is present.
14. The automated system according to claim 13, wherein the means of part-based object detection comprises:
means of edge detection, wherein an edge map of the at least one occluded motion block is generated;
means of curve detection, wherein a curve map is generated from the edge map;
means of part detection, wherein the curve map is compared against at least one part model to identify the at least one part of the at least one occluded motion block; and
means of part grouping, wherein the at least one part is grouped to form the at least one occluded motion block.
15. The automated system according to claim 9, wherein the means of extracting color information from the at least one motion block in the current frame comprises: means of segmenting each motion block of the at least one motion block in the current frame into at least one cluster; and
means of deriving color information for each cluster of the at least one cluster of the at least one motion block in the current frame.
16. The automated system according to claim 15, wherein color information is derived using color quantization.
17. The automated system according to claims 9 and 15, wherein the means of identifying matching color information between the at least one motion block in the current frame and all motion blocks in the previous frames uses weighted cluster-based matching and comprises:
means of comparing the color information of the at least one cluster in the current frame to the color information of each cluster of the at least one cluster in the previous frames;
means of computing and storing a comparison score for each cluster of the at least one cluster in the current frame;
means of assigning a predetermined weight for each cluster of the at least one cluster in the current frame; and
means of computing an average comparison score of the at least one motion block in the current frame using the comparison score and the predetermined weight assigned for each cluster of the at least one cluster of the at least one motion block in the current frame.
18. The automated system according to claims 9 and 17, wherein the means of assigning the tag to the at least one motion block in the current frame comprises:
means of assigning the tag to the at least one motion block in the current frame that is similar to that of the tag of the at least one motion block in the previous frames if the average comparison score of the at least one motion block in the current frame is higher than a predetermined threshold of the at least one motion block in the previous frames; and
means of assigning a new tag to the at least one motion block in the current frame if the average comparison score of the at least one motion block in the current frame is lower than the predetermined threshold of the at least one motion block in the previous frames.
PCT/MY2009/000116 2008-08-13 2009-08-13 Method and system for tracking and tagging objects WO2010019024A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI20083070 MY152566A (en) 2008-08-13 2008-08-13 Method and system for tracking and tagging objects
MYPI20083070 2008-08-13

Publications (2)

Publication Number Publication Date
WO2010019024A2 (en) 2010-02-18
WO2010019024A3 WO2010019024A3 (en) 2010-06-03

Family

ID=41669510

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2009/000116 WO2010019024A2 (en) 2008-08-13 2009-08-13 Method and system for tracking and tagging objects

Country Status (2)

Country Link
MY (1) MY152566A (en)
WO (1) WO2010019024A2 (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6205231B1 (en) * 1995-05-10 2001-03-20 Identive Corporation Object identification in a moving video image
US20080130948A1 (en) * 2005-09-13 2008-06-05 Ibrahim Burak Ozer System and method for object tracking and activity analysis

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11039109B2 (en) 2011-08-05 2021-06-15 Fox Sports Productions, Llc System and method for adjusting an image for a vehicle mounted camera
US11490054B2 (en) 2011-08-05 2022-11-01 Fox Sports Productions, Llc System and method for adjusting an image for a vehicle mounted camera
CN102436301A (en) * 2011-08-20 2012-05-02 Tcl集团股份有限公司 Human-machine interaction method and system based on reference region and time domain information
CN102436301B (en) * 2011-08-20 2015-04-15 Tcl集团股份有限公司 Human-machine interaction method and system based on reference region and time domain information
WO2014036363A1 (en) * 2012-08-31 2014-03-06 Fox Sports Productions, Inc. Systems and methods for tracking and tagging objects within a broadcast
US9288545B2 (en) 2014-12-13 2016-03-15 Fox Sports Productions, Inc. Systems and methods for tracking and tagging objects within a broadcast
US11159854B2 (en) 2014-12-13 2021-10-26 Fox Sports Productions, Llc Systems and methods for tracking and tagging objects within a broadcast
US11758238B2 (en) 2014-12-13 2023-09-12 Fox Sports Productions, Llc Systems and methods for displaying wind characteristics and effects within a broadcast
US20210064882A1 (en) * 2019-08-27 2021-03-04 Lg Electronics Inc. Method for searching video and equipment with video search function
US11709890B2 (en) * 2019-08-27 2023-07-25 Lg Electronics Inc. Method for searching video and equipment with video search function

Also Published As

Publication number Publication date
WO2010019024A3 (en) 2010-06-03
MY152566A (en) 2014-10-31

Similar Documents

Publication Publication Date Title
CN106709436B (en) Track traffic panoramic monitoring-oriented cross-camera suspicious pedestrian target tracking system
US9846946B2 (en) Objection recognition in a 3D scene
JP4991923B2 (en) Image processing method and apparatus
US8447139B2 (en) Object recognition using Haar features and histograms of oriented gradients
Zeeshan Zia et al. Explicit occlusion modeling for 3d object class representations
CN109145708B (en) Pedestrian flow statistical method based on RGB and D information fusion
WO2010019024A2 (en) Method and system for tracking and tagging objects
JP2017531883A (en) Method and system for extracting main subject of image
US20100079453A1 (en) 3D Depth Generation by Vanishing Line Detection
US20130301911A1 (en) Apparatus and method for detecting body parts
Alvarez et al. Road geometry classification by adaptive shape models
Bansal et al. A real-time pedestrian detection system based on structure and appearance classification
CN106447701A (en) Methods and devices for image similarity determining, object detecting and object tracking
Zohourian et al. Superpixel-based Road Segmentation for Real-time Systems using CNN.
Wang et al. Template-based people detection using a single downward-viewing fisheye camera
Klein et al. Boosting scalable gradient features for adaptive real-time tracking
CN103150547A (en) Vehicle tracking method and device
Mao et al. Training a scene-specific pedestrian detector using tracklets
Chau et al. Object tracking in videos: Approaches and issues
Ling et al. Colour-based object tracking in surveillance application
Xu et al. A hybrid blob-and appearance-based framework for multi-object tracking through complex occlusions
Lee et al. independent object detection based on two-dimensional contours and three-dimensional sizes
JP6831396B2 (en) Video monitoring device
Choudri et al. Robust background model for pixel based people counting using a single uncalibrated camera
Mudjirahardjo et al. Head detection and tracking for an intelligent room

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09806886

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09806886

Country of ref document: EP

Kind code of ref document: A2