WO2000018128A1 - System and method for semantic video object segmentation

System and method for semantic video object segmentation

Info

Publication number
WO2000018128A1
Authority
WO
WIPO (PCT)
Prior art keywords
regions
video
segmenting
region
image data
Prior art date
Application number
PCT/US1999/022264
Other languages
English (en)
Inventor
Shih-Fu Chang
Di Zhong
Original Assignee
The Trustees Of Columbia University In The City Of New York
Priority date
Filing date
Publication date
Application filed by The Trustees Of Columbia University In The City Of New York filed Critical The Trustees Of Columbia University In The City Of New York
Priority to JP2000571664A (JP2002525988A)
Priority to AU62654/99A (AU6265499A)
Publication of WO2000018128A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 - Image coding
    • G06T 9/20 - Contour coding, e.g. using detection of edges
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 - Image coding
    • G06T 9/001 - Model-based coding, e.g. wire frame
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/20 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 - Motion estimation or motion compensation
    • H04N 19/537 - Motion estimation other than block-based

Definitions

  • the present invention relates generally to digital image processing and more particularly relates to systems and methods for performing object segmentation and image tracking in digital video image sequences.
  • Object segmentation and tracking is a fundamental step for many digital video applications, such as object-based coding used in MPEG-4; content-based video indexing, for example as described in S. F. Chang, et al., "VideoQ: An Automated Content-Based Video Search System Using Visual Cues", ACM 5th Multimedia Conference, Seattle, WA, Nov. 1997; video editing and authorization.
  • There have been several attempts in this field at decomposing images into regions with uniform features. See, for example, M.P. Dubuisson et al., "Contour Extraction of Moving Objects in Complex Outdoor Scenes," 14 Int'l J. of Computer Vision 83-105 (1995); Nikhil R. Pal et al., Pattern Analysis & Machine Intelligence (1991); and certain stochastic models, for example as described by Geman, "Stochastic relaxation, Gibbs distributions and the Bayesian Restoration of Images", PAMI-6 IEEE Trans. Pattern Analysis and Machine Intelligence 721-741 (1984). These various approaches extract homogeneous image regions based on visual features such as color, edge or their combinations.
  • Such image segmentation techniques are context independent and can be adapted to various applications.
  • In contrast, video object tracking techniques are usually directed to meaningful objects (e.g. walking people, moving cars) and are domain specific. As uniform motion often indicates a semantic object in many applications, motion is widely used as the main feature in object segmentation and tracking techniques.
  • Of particular interest are region-based tracking algorithms, which combine common image segmentation techniques with motion estimation methods. For example, techniques described in the Dubuisson et al. article, in Chuang Gu et al., "Semantic Video Object Segmentation and Tracking Using Mathematical Morphology and Perspective Motion Model," ICIP'97, and in Kui Zhang et al., "Motion Based Image Segmentation for Video Coding," Proc. Int'l Conf. on Image Processing, Washington (ICIP'95), pp. 476-479, exploit more region features than feature-point, segment or boundary based tracking, and are more robust in handling real-world problems such as occlusion.
  • the Dubuisson et al. article proposes an approach to combine motion segmentation using image subtraction with static color segmentation using the split-and-merge paradigm. Color segmentation regions are examined against motion masks to form the final object mask based on certain confidence measures.
  • the Zhang et al. article proposes an algorithm to match edge detection and line approximation results with motion segmentation, and to determine the final object boundary.
  • the Gu et al. article proposes a semantic object tracking system using mathematical morphology and perspective motion. That system uses a modified morphological watershed procedure to segment uncertain areas between the interior and exterior outlines. Flooding seeds are sampled on both interior and exterior outlines, and regions growing from interior seeds define the segmented object boundary. In the first frame, the interior outline is defined by users, and the exterior outline is obtained by dilation of the interior one. For subsequent frames, the interior and exterior outlines are created by erosion and dilation of the motion projected boundary from the previous frame.
  • Such region-based techniques may achieve satisfactory results for a specific class of video content, i.e., video which includes only rigid objects exhibiting simple motion.
  • However, these techniques all track a single contour or video object, ignoring possibly complex components and motions within the semantic object.
  • In practice, a semantic object usually contains several parts of differing motion, some of which are non-rigid and/or exhibit rapid motion changes.
  • Consequently, a single motion model is generally inadequate to track a semantic object.
  • The present invention provides an innovative method which combines low-level automatic region segmentation and tracking with an active method for defining and tracking video objects at a higher level.
  • a semantic object is modeled as a set of regions with corresponding spatial and visual features.
  • the present invention directly links the semantic object to its underlying regions and is able to handle many real-world situations, such as complex objects, fast and/or intermittent motion, complicated backgrounds, multiple moving objects and partial occlusion, which cannot be satisfactorily modeled using existing techniques.
  • the instant technique can be carried out in two stages: an initial object segmentation stage where user input at the starting frame is used to create a semantic object with underlying homogeneous regions; and an object tracking stage where homogeneous regions and the semantic object are tracked through successive frames.
  • the object model is constructed through a region segmentation process and a region aggregation process.
  • the segmentation process decomposes the frame image into a set of non-overlapping regions with homogeneous visual features including color, edge and motion.
  • the aggregation process groups these regions into foreground and background objects according to the initial object boundary given by users.
  • innovative methods are deployed to map regions to the future frame as seeds for an inter-frame region tracking and segmentation process.
  • An estimated object boundary is extracted from the tracked regions and then an aggregation process is used to classify regions into foreground objects or background based on their tracks as well as feature similarity.
  • a method for segmenting and actively tracking semantic video objects from a sequence of frames of digital video information includes the steps of: receiving an initial frame of video image data and at least one succeeding frame of video image data from said sequence of frames of digital video information; defining at least one object boundary in the initial frame of video; partitioning the initial frame of video image data into a plurality of regions based on substantially homogenous video features; and tracking the object boundary and plurality of regions in the at least one succeeding frame of video image data.
  • the defining step can include receiving an approximation of the object boundary from a user.
  • the partitioning step can include edge, color and a motion field as the applicable video features.
  • An aggregation step can be performed after the partitioning step in order to classify the regions as background and foreground regions.
  • the plurality of regions do not overlap.
  • the classification of regions can be determined by a percentage of the region which resides within the object boundary.
  • the classification of regions after a first frame can also be determined by the classification of the region in a previous frame.
  • the classification of regions can also be determined by features such as color, motion and edge.
  • Fig. 1 is a system diagram of an illustrative embodiment of the present invention;
  • Fig. 2 is a block diagram of an object segmentation system in accordance with a preferred embodiment of the present invention;
  • Fig. 3 is a flow diagram of an iterative region merging process useful in the object segmentation process of Fig. 2;
  • Fig. 4 is a block drawing of an automatic semantic object tracking process in accordance with a preferred embodiment of the present invention;
  • Fig. 5 is an illustrative diagram showing the inter-frame pixel labeling process useful in the process of Fig. 4;
  • Fig. 6 is an illustrative drawing showing the results of the application of the present invention to five different video sequences;
  • Figs. 7a and 7b are graphs plotting the average number of false pixels and missed pixels, respectively, against the number of user inputs for the video sequences shown in Fig. 6; and
  • Fig. 8 is a block diagram of a computer system suitable for practicing the present invention.
  • Figure 1 illustrates an embodiment of a system which combines automatic region segmentation methods with an active method for defining and tracking video objects at a higher level.
  • the system generally contains two stages: an initial object segmentation stage, where user input 100 at the starting frame 102 is used to create a semantic object with underlying homogeneous regions 104; and an object tracking stage 110, where homogeneous regions 104 and the object are tracked through the successive frames.
  • the semantic object is generated through a region segmentation process 106 and aggregation process.
  • a region segmentation method can be applied to an area inside a slightly expanded bounding box of the user-specified object which effectively fuses color and edge features in a region-growing process that produces homogeneous color regions with accurate boundaries.
  • In a region aggregation procedure 108, homogeneous regions are classified as either foreground or background to form the semantic object with an accurate boundary. Region aggregation 108 is based on the coverage of each region by the initial object boundary: regions that are covered by more than a certain percentage are grouped into the foreground object. The final contour of the semantic object is computed from foreground regions. Foreground regions belonging to the object and background regions are stored and can be tracked over time in the successive frames.
  • Tracking at both the region and object levels is the main task in the second stage of processing. Segmented regions from the previous frame are projected to the current frame using their individual affine motion models. An expanded bounding box including all projected foreground regions can then be computed. Then the area inside the bounding box can be split into homogeneous color and motion regions following a region tracking process. Unlike in other existing approaches, projected regions are not used directly as the new segmentation. Rather, the projected regions are used as seeds in another color based region growing process, which is similar to the fusing algorithm in the first stage, to track existing regions. Pixels that cannot be tracked from any old regions are labeled as new regions.
  • the resulting homogeneous regions are tagged as either foreground, which are tracked from a foreground region, or background, which are tracked from a background region, or new, which are not tracked.
  • the homogeneous regions 104 are then passed to a region aggregation process 108 and are classified as either belonging to the foreground object or the background.
  • the approximated object boundary is obtained from the projected foreground regions.
  • the region aggregation process 108 can be carried out iteratively.
  • the object contour can then be computed from foreground regions and all regions are advanced to the motion projection process 112 for a succeeding frame 114 of video image data.
  • Figure 2 is a system diagram further outlining a semantic object segmentation operation at an initial frame.
  • Semantic object segmentation preferably utilizes four processes which are discussed below, i.e., object definition and threshold specification, feature map generation, region segmentation, and object composition.
  • In the object definition 202 and threshold specification process, users initially define a semantic object by using a tracing interface, such as a digital pointer, light pen, touch screen, mouse or the like.
  • the input can be a polygon whose vertices and edges are roughly placed along the desired object boundary.
  • A snake algorithm 204, such as that described in the Kass et al. article, can then be applied to refine the rough user input toward the actual object boundary.
  • The snake algorithm can be based on minimizing a specific energy function associated with edge pixels.
  • Users can also choose to skip the snake module 204 if a relatively accurate outline of the object is provided. The final accurate object boundary will be obtained by the subsequent region segmentation and aggregation processes.
  • users can start the tracking process by specifying a set of thresholds. These thresholds include a color merging threshold, weights on the three color channels, a motion merging threshold and a tracking buffer size. These thresholds can be chosen based on the characteristic of a given video shot and experimental results. For example, for a video shot where foreground objects have similar luminance with background regions, users may put a lower weight on the luminance channel. Users can perform the tracking process over a small number of frames using predetermined default thresholds which are automatically generated by the system, and then adjust the thresholds based on the segmentation and tracking results. The user can stop the tracking process at any frame, modify the object boundary that is being tracked and then restart the tracking process from the modified frame.
  • A slightly extended (approximately 15 pixels) bounding box 206 surrounding the arbitrarily shaped object is computed.
  • The following segmentation procedures are performed inside this bounding box.
  • the second process involves the creation of three feature maps, i.e., an edge map 208, color map 210 and a motion field 212, within the bounding box from the original frame image.
  • the color map is the major feature map in the following segmentation module 214.
  • a process to generate the color map 210 can include the following steps.
  • the original image is converted into CIE L*u*v* color space. It is well known that perceptually non-uniform color spaces such as RGB are not well suited for color segmentation, as the distance measure in these spaces is not proportional to perceptual difference.
  • L*u*v* is a perceptually uniform color space which divides color into one luminance channel (L*) and two chrominance channels (u* and v*). This channel separation also allows different weights to be placed on luminance and chrominance distance measures. The color difference is thus given by the weighted Euclidean distance in the three dimensional space, shown in equation (1):
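The equation referenced above does not survive in this text. A plausible reconstruction of the weighted Euclidean color distance in L*u*v* space, consistent with the surrounding description (weights on the luminance and the two chrominance channels), is:

$$ d(c_1, c_2) = \sqrt{\, w_L\,(L_1^* - L_2^*)^2 + w_u\,(u_1^* - u_2^*)^2 + w_v\,(v_1^* - v_2^*)^2 \,} \qquad (1) $$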
  • the L*u*v* image is preferably simplified and smoothed to remove possible noise as well as minor details for the purpose of region merging. This can be accomplished by an adaptive quantization and a median filtering process. Quantization can be performed to produce homogeneous regions in which pixels having similar colors are mapped into a common bin. When quantizing an image into a very limited number of colors (e.g., 32 or 16 bins), common fixed-level quantization methods cannot always preserve color information under varying conditions. Thus, the use of a clustering based method, such as for example a K-means technique, to analyze the input image and determine an adaptive quantization scale in the L*u*v* space is preferred.
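As one illustration of the adaptive quantization step, the sketch below is an assumption-laden example rather than the patent's implementation: it uses scikit-learn's KMeans, a hypothetical bin count of 32, and optional channel weights mirroring equation (1). A median filter over the quantized image (e.g., scipy.ndimage.median_filter) would follow, as described above.

```python
import numpy as np
from sklearn.cluster import KMeans

def adaptive_quantize_luv(luv_image, n_bins=32, luma_weight=1.0, chroma_weight=1.0):
    """Cluster pixels of an L*u*v* image into n_bins colors (illustrative sketch).

    luv_image: float array of shape (H, W, 3) holding L*, u*, v* values.
    Returns (quantized_image, palette) where palette has shape (n_bins, 3).
    """
    h, w, _ = luv_image.shape
    pixels = luv_image.reshape(-1, 3).astype(np.float64)

    # Optionally weight luminance vs. chrominance before clustering,
    # in the spirit of the weighted distance of equation (1).
    weights = np.array([luma_weight, chroma_weight, chroma_weight])
    kmeans = KMeans(n_clusters=n_bins, n_init=10, random_state=0)
    labels = kmeans.fit_predict(pixels * weights)

    # Palette entries are the (unweighted) mean colors of each cluster.
    palette = np.array([pixels[labels == k].mean(axis=0) for k in range(n_bins)])
    quantized = palette[labels].reshape(h, w, 3)
    return quantized, palette
```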
  • the edge map 208 is a binary mask wherein edge pixels are distinguished from non-edge pixels, such as by setting the edge pixels to 1 and setting the non-edge-pixels to 0.
  • the edge map 208 can be generated by applying edge detection 214, such as the CANNY edge detection algorithm.
  • the CANNY edge detector performs 2-D Gaussian pre-smoothing on the image and then takes directional derivatives in the horizontal and vertical directions. These derivatives are used to calculate a gradient. Local gradient maxima can then be defined as candidate edge pixels.
  • This output is preferably run through a two-level thresholding synthesis process to produce the final edge map 208.
  • a simple algorithm can be used to select the two threshold levels in the synthesis process based on a histogram of the edge gradient.
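A hedged sketch of the edge-map step is shown below. It uses scikit-image's Canny detector, and the histogram-based choice of the two thresholds (here simple percentiles of a Sobel gradient magnitude) is an illustrative stand-in for the "simple algorithm" mentioned above, not the patent's exact rule.

```python
import numpy as np
from skimage import filters, feature

def make_edge_map(gray_image, sigma=1.0, low_pct=70, high_pct=90):
    """Binary edge mask via Canny with thresholds picked from the gradient histogram.

    gray_image: 2-D float array in [0, 1]. The percentile-based threshold
    selection is an assumption, not the patent's exact method.
    """
    # Gradient magnitude after Gaussian pre-smoothing (Sobel as a simple stand-in
    # for the directional-derivative step of the Canny detector).
    grad = filters.sobel(filters.gaussian(gray_image, sigma=sigma))

    # Pick the two synthesis thresholds from the gradient-magnitude histogram.
    low_thr = np.percentile(grad, low_pct)
    high_thr = np.percentile(grad, high_pct)

    # Canny with hysteresis between the two thresholds; edge pixels become 1,
    # non-edge pixels 0, matching the binary edge map described above.
    edges = feature.canny(gray_image, sigma=sigma,
                          low_threshold=low_thr, high_threshold=high_thr)
    return edges.astype(np.uint8)
```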
  • the motion field 212 can be generated by a hierarchical block matching algorithm, such as described in M. Bierling, "Displacement Estimation by Hierarchical Block Matching", 1001 SPIE Visual Communication & Image Processing (1988) which is hereby incorporated by reference.
  • This algorithm uses distinct sizes of measurement windows for block matching at different grid resolutions, and thus is able to generate a homogeneous motion field that is closer to the true motion than block based estimation algorithms.
  • a 3-level hierarchy, such as suggested in Bierling, is appropriate for use in the instant process.
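The following is a compact, illustrative sketch of hierarchical block matching in pure NumPy. The window sizes and search ranges per level are assumptions; Bierling's exact 3-level settings are not reproduced here.

```python
import numpy as np

def block_match(prev, curr, block, search, init_flow):
    """Exhaustive block matching that refines an initial per-block flow field (sketch)."""
    h, w = prev.shape
    flow = init_flow.copy()
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            ref = curr[by:by + block, bx:bx + block]
            dy0, dx0 = flow[by // block, bx // block].astype(int)
            best, best_d = (dy0, dx0), np.inf
            for dy in range(dy0 - search, dy0 + search + 1):
                for dx in range(dx0 - search, dx0 + search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        cand = prev[y:y + block, x:x + block]
                        d = np.abs(ref - cand).sum()   # sum of absolute differences
                        if d < best_d:
                            best, best_d = (dy, dx), d
            flow[by // block, bx // block] = best
    return flow

def hierarchical_block_matching(prev, curr, levels=((32, 8), (16, 4), (8, 2))):
    """Coarse-to-fine matching with distinct measurement window sizes per level (illustrative)."""
    h, w = prev.shape
    block0 = levels[0][0]
    flow = np.zeros((h // block0, w // block0, 2))
    for block, search in levels:
        # Resample the coarser flow onto the current level's block grid as a starting guess.
        grid = np.zeros((h // block, w // block, 2))
        scale_y = flow.shape[0] / max(grid.shape[0], 1)
        scale_x = flow.shape[1] / max(grid.shape[1], 1)
        for i in range(grid.shape[0]):
            for j in range(grid.shape[1]):
                grid[i, j] = flow[int(i * scale_y), int(j * scale_x)]
        flow = block_match(prev.astype(float), curr.astype(float), block, search, grid)
    return flow   # per-block motion vectors (dy, dx) at the finest level
```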
  • the region segmentation process is developed based on the three feature maps: color map 210, edge map 208 and motion field 212. Because the color map 210 and edge map 208 are complementary, i.e., the former captures low frequency information (means) while the latter captures high-frequency details (edges) of an image, fusion of these maps greatly improves segmentation results.
  • a color-based pixel labeling process is applied to the color map. Labeling is a process where one label is assigned to a group of neighboring pixels which exhibit substantially the same color.
  • Labeling is applied only to non-edge pixels, i.e., pixels that are set to 0 in the edge mask; edge pixels remain unlabeled.
  • This process generates an initial group of regions (i.e., pixels with the same label) as well as their connection graph. Two regions are linked as neighbors if pixels in one region have neighboring pixels (4-connection) in another region. As edge pixels are not labeled, two regions separated by edge pixels are not linked as neighboring regions.
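A minimal sketch of this pixel-labeling step: non-edge pixels with the same quantized color are grouped into 4-connected components, while edge pixels stay unlabeled (label 0). The data layout is an assumption; scipy.ndimage.label provides the connected-component pass.

```python
import numpy as np
from scipy import ndimage

def label_color_regions(quantized_bins, edge_map):
    """Assign a distinct label to each 4-connected run of same-color, non-edge pixels.

    quantized_bins: 2-D int array of color-bin indices (from adaptive quantization).
    edge_map:       2-D binary array, 1 for edge pixels (left unlabeled as 0).
    Returns a 2-D int label image; 0 marks edge (unlabeled) pixels.
    """
    labels = np.zeros(quantized_bins.shape, dtype=np.int32)
    structure = np.array([[0, 1, 0],
                          [1, 1, 1],
                          [0, 1, 0]])          # 4-connectivity
    next_label = 1
    for bin_id in np.unique(quantized_bins):
        # Pixels of this color that are not edge pixels.
        mask = (quantized_bins == bin_id) & (edge_map == 0)
        comp, n = ndimage.label(mask, structure=structure)
        labels[comp > 0] = comp[comp > 0] + (next_label - 1)
        next_label += n
    return labels
```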
  • the color merging process is an iterative spatial-constrained clustering process.
  • Color distances (Eq. 1) between two connected regions are computed in block 302. Two connected regions are merged if the color distance between them is 1) smaller than the given color threshold; and 2) a local minimum, i.e., smaller than all other color distances associated with the two regions.
  • the mean color of the new region can be computed (block 306) by taking a weighted average of the mean colors of the two old regions m1 and m2 according to the equation:
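The equation itself is missing from this text. Reconstructed from the description (a weighted average of the two mean colors, weighted by the region sizes n1 and n2), equation (2) plausibly reads:

$$ m_{\text{new}} = \frac{n_1\, m_1 + n_2\, m_2}{n_1 + n_2} \qquad (2) $$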
  • the region connections are also updated for all neighbors of the two old regions (block 308).
  • the new region can be assigned a label (block 310), such as the label of the larger one of the two merged regions.
  • the two old regions can then be dropped (block 312).
  • the color merging process can be iterated until color distances between every two connected regions are above the color threshold. As edge pixels are not labeled, two regions separated by edge pixels are not connected as neighbors. Thus, the growth of each region is naturally stopped at its edge pixels. After the color merging process is complete, the edge pixels can simply be assigned to their neighboring regions with the smallest color distances.
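The iterative, spatially constrained merging described above can be sketched as follows. This is illustrative code only: the region-adjacency bookkeeping and data layout are assumptions, and only the color criterion of equation (1) with equal channel weights is used.

```python
import numpy as np

def merge_regions(mean_color, size, neighbors, color_threshold):
    """Iteratively merge adjacent regions whose color distance is below the threshold
    and is a local minimum among the distances involving either region.

    mean_color: {label: np.array([L, u, v])}, size: {label: pixel count},
    neighbors:  {label: set(adjacent labels)} built from the label image.
    """
    def dist(a, b):
        return float(np.linalg.norm(mean_color[a] - mean_color[b]))

    merged = True
    while merged:
        merged = False
        for a in list(neighbors):
            for b in list(neighbors.get(a, ())):
                if a not in neighbors or b not in neighbors:
                    continue
                d = dist(a, b)
                if d >= color_threshold:
                    continue
                # Local-minimum test: no smaller distance involving a or b.
                others = [dist(a, c) for c in neighbors[a] if c != b]
                others += [dist(b, c) for c in neighbors[b] if c != a]
                if others and d > min(others):
                    continue
                # Merge: keep the larger region's label, update mean color per eq. (2).
                keep, drop = (a, b) if size[a] >= size[b] else (b, a)
                total = size[a] + size[b]
                mean_color[keep] = (size[a] * mean_color[a] + size[b] * mean_color[b]) / total
                size[keep] = total
                new_nbrs = (neighbors[a] | neighbors[b]) - {a, b}
                neighbors[keep] = new_nbrs
                for c in new_nbrs:
                    neighbors[c].discard(drop)
                    neighbors[c].add(keep)
                for table in (mean_color, size, neighbors):
                    table.pop(drop, None)
                merged = True
    return mean_color, size, neighbors
```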
  • a motion-based segmentation using the dense optical flow can be applied to the segmented color regions to check the uniformity of the motion distribution.
  • a similar iterative spatially constrained clustering process can also be used to group pixels inside a color region according to their motion vectors and the given motion threshold. Because an objective is to track the object, this process need only be applied to those regions that intersect with the initial object boundary.
  • the region aggregation module 220 receives homogeneous regions from the segmentation module 214 and the initial object boundary from the snake module 204 or directly from the user input.
  • Region aggregation at the starting frame is relatively simple compared with that for the subsequent frames, as all regions are newly generated (not tracked) and the initial outline is usually not far from the real object boundary.
  • a region is classified as foreground if more than a certain percentage (e.g., 90%) of the region is included by the initial object outline. On the other hand, if less than a certain percentage (e.g., 30%) of a region is covered by the initial object outline, the region is considered as background. Regions between the low and high thresholds are split into foreground and background regions according to the intersection with the initial object outline.
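A minimal sketch of this starting-frame aggregation rule is given below. The 90%/30% values are the exemplary thresholds quoted above; the helper names and data layout are illustrative assumptions.

```python
import numpy as np

def aggregate_initial_regions(labels, object_mask, high=0.9, low=0.3):
    """Classify each labeled region as foreground, background, or split by coverage.

    labels:      2-D int array of region labels (0 = unlabeled/edge pixels).
    object_mask: 2-D bool array, True inside the user-defined object outline.
    Returns a dict {label: 'foreground' | 'background' | 'split'}.
    """
    classes = {}
    for lab in np.unique(labels):
        if lab == 0:
            continue
        region = labels == lab
        coverage = (region & object_mask).sum() / region.sum()
        if coverage >= high:
            classes[lab] = 'foreground'        # mostly inside the outline
        elif coverage <= low:
            classes[lab] = 'background'        # mostly outside the outline
        else:
            # Intermediate coverage: in the full method the region is split along
            # the outline (inside part foreground, outside part background);
            # the splitting itself is omitted in this sketch.
            classes[lab] = 'split'
    return classes
```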
  • affine estimation 222 is performed.
  • Affine motion parameters of all regions, including both foreground and background, can be estimated by a multivariate linear regression process over the dense optical flow inside each region.
  • a two dimensional affine model with six parameters, as set forth in Equation 3, can be used as a good first-order approximation of a distant object undergoing three dimensional translation and linear deformation:
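Equation 3 does not survive in this text. Based on the surrounding description, the standard six-parameter affine model it refers to can plausibly be written as

$$ x' = a_1 + a_2\,x + a_3\,y, \qquad y' = a_4 + a_5\,x + a_6\,y \qquad (3) $$

where (x, y) is a pixel position in the previous frame and (x', y') its projected position; the exact parameter ordering in the patent is not known. The multivariate linear regression over the dense optical flow can then be sketched as an ordinary least-squares fit; the code below is illustrative only, with NumPy's lstsq standing in for the regression step.

```python
import numpy as np

def estimate_affine(xs, ys, dxs, dys):
    """Least-squares fit of a six-parameter affine motion model for one region.

    xs, ys   : pixel coordinates inside the region (1-D arrays).
    dxs, dys : dense optical-flow components at those pixels.
    Returns (a1, ..., a6) such that x' = a1 + a2*x + a3*y and y' = a4 + a5*x + a6*y,
    with (x', y') = (x + dx, y + dy). The parameter layout mirrors the
    reconstructed equation (3) above and is an assumption.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    A = np.column_stack([np.ones_like(xs), xs, ys])
    # Two independent regressions, one per output coordinate.
    ax, *_ = np.linalg.lstsq(A, xs + np.asarray(dxs, dtype=float), rcond=None)
    ay, *_ = np.linalg.lstsq(A, ys + np.asarray(dys, dtype=float), rcond=None)
    return (*ax, *ay)
```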
  • An inter-frame segmentation process can be employed to segment a frame into homogeneous regions. Unlike in the starting frame where all regions are tagged as new, these regions are classified in the inter-frame segmentation process as either foreground, background or new, according to their relationship with the existing foreground and background regions of the previous frame.
  • the estimated, or projected, object boundary can then be used to group the tagged regions into the object and background. For regions around the object boundary with the tag "new," their visual features are also examined to determine whether they belong to the object or not.
  • In a motion projection module 402, segmented regions from the previous frame, including both foreground and background regions, are projected onto the current frame (virtually) using their individual affine motion models. Projected regions keep their labels and original classifications. For video shots with a static or homogeneous background (i.e. only one moving object), users can choose not to project background regions to save time and processing overhead. An expanded bounding box of all projected foreground regions can be computed. Similarly, the following segmentation and aggregation processes are generally applied only to the area inside the bounding box.
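An illustrative sketch of projecting one region with its affine model: each pixel of the previous frame's region mask is mapped through the model and rasterized into the current frame. Bounds handling is simplified and hole filling is omitted; these are assumptions, not the patent's exact procedure.

```python
import numpy as np

def project_region(region_mask, affine, out_shape):
    """Project a binary region mask into the next frame using its affine model.

    region_mask: 2-D bool array for the region in the previous frame.
    affine:      (a1, a2, a3, a4, a5, a6) as in the reconstructed equation (3).
    out_shape:   (height, width) of the current frame.
    """
    a1, a2, a3, a4, a5, a6 = affine
    ys, xs = np.nonzero(region_mask)                  # pixel coordinates of the region
    xp = np.rint(a1 + a2 * xs + a3 * ys).astype(int)  # projected x coordinates
    yp = np.rint(a4 + a5 * xs + a6 * ys).astype(int)  # projected y coordinates

    projected = np.zeros(out_shape, dtype=bool)
    inside = (xp >= 0) & (xp < out_shape[1]) & (yp >= 0) & (yp < out_shape[0])
    projected[yp[inside], xp[inside]] = True          # rasterize, dropping out-of-frame pixels
    return projected
```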
  • The edge map 404, color map 406 and motion field 408 are generated using generally the same methods as described above.
  • the existing color palette computed at the starting frame is directly used to quantize the current frame.
  • In this way, the color consistency of segmented regions between successive frames is enhanced, which improves the performance of region based tracking.
  • Because object tracking is limited to single video shots in which there is no abrupt scene change, using one color palette is generally valid.
  • a new quantization palette can be generated automatically, such as when a large quantization error is encountered.
  • three feature maps are preferably fused to track existing regions and segment the current frame.
  • an inter-frame labeling process is performed where non-edge pixels are sequentially labeled, such as by labeling the pixels one by one from left to right and top to bottom.
  • If a pixel has the same color as that of its labeled neighboring pixels, it is assigned the label of its neighbors.
  • Otherwise, its color is compared with all projected regions that cover its coordinate in the current frame. If the pixel's color distance to the closest region is under the given color threshold, this pixel is "tracked," and assigned the label and classification (foreground or background) of the closest projected region. Otherwise, a new label is generated and assigned to the pixel. Regions identified by this new label are classified as "new" regions.
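A simplified sketch of this tracked-versus-new decision for a single unlabeled pixel follows. The data structures are assumptions, and the real process also merges with already-labeled neighbors as described above.

```python
import numpy as np

def assign_pixel(pixel_color, y, x, projected_regions, color_threshold, next_new_label):
    """Decide the label for a pixel that cannot take a neighbor's label.

    projected_regions: list of dicts, each with keys 'label', 'class'
                       ('foreground' or 'background'), 'mean_color' (np.array)
                       and 'mask' (2-D bool array of the projected region).
    Returns (label, region_class, next_new_label).
    """
    best_label, best_class, best_dist = None, None, np.inf
    for reg in projected_regions:
        if not reg['mask'][y, x]:
            continue                       # only regions covering this coordinate
        d = np.linalg.norm(pixel_color - reg['mean_color'])
        if d < best_dist:
            best_label, best_class, best_dist = reg['label'], reg['class'], d

    if best_dist < color_threshold:
        # Pixel is "tracked": inherit label and foreground/background class.
        return best_label, best_class, next_new_label
    # Otherwise start a new region with a fresh label.
    return next_new_label, 'new', next_new_label + 1
```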
  • interframe color merge, edge labeling, motion split and small region elimination processes which take place in the tracking module 410 are similar to those previously described, with some additional constraints. Foreground or background regions tracked from the previous frame are allowed to be merged only with regions of the same class or new regions. Merging of the foreground and background regions is forbidden. New regions can be merged with each other or merged with foreground/background regions. When a new region is merged with a tracked region, the merging result inherits the label and classification from the tracked region. In motion segmentation, split regions remain in their original classes. After the inter-frame tracking process 410, a list of regions temporarily tagged as either foreground, background, or new is obtained.
  • the temporarily tagged regions are then passed to an iterative region aggregation module 412.
  • the region aggregation module 412 receives two inputs: the homogeneous regions from tracking module 410 and the estimated object boundary from the motion projection module 402.
  • the object boundary can be estimated from projected foreground regions. Foreground regions from the previous frame are projected independently of one another and the combination of projected regions forms the mask of the estimated object.
  • the mask can be refined using a morphological closing operation (i.e. dilation followed by erosion) with a size of several pixels in order to close tiny holes and smooth boundaries. To tolerate motion estimation error, the result can be further dilated for the tracking buffer size, which is specified by users at the beginning of the tracking process.
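The mask refinement described above maps directly onto standard morphological operations. A hedged sketch using SciPy is shown below; the structuring-element size and the use of iterated dilation for the tracking buffer are assumptions.

```python
import numpy as np
from scipy import ndimage

def refine_object_mask(mask, closing_size=3, tracking_buffer=4):
    """Close tiny holes, smooth the boundary, then dilate by the tracking buffer.

    mask:            2-D bool array, union of the projected foreground regions.
    closing_size:    size (in pixels) of the square structuring element for closing.
    tracking_buffer: extra dilation, in pixels, to tolerate motion estimation error.
    """
    structure = np.ones((closing_size, closing_size), dtype=bool)
    closed = ndimage.binary_closing(mask, structure=structure)         # dilation followed by erosion
    buffered = ndimage.binary_dilation(closed, iterations=tracking_buffer)
    return buffered
```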
  • the region aggregation module 412 implements a region grouping and boundary alignment algorithm based on the estimated object boundary as well as the edge and motion features of the region.
  • An exemplary algorithm includes the following steps:
  • 1. For every segmented region, do step 2 to step 8.
  • 2. If the region is tagged as background, keep it as background. Go to step 8.
  • 3. Compute an intersection ratio of the region with the object mask. Go to step 5 if the region is tagged as new.
  • 4. If the region (foreground) is covered by the object mask by more than a given threshold, such as about 80%, the region belongs to the semantic object and is kept as foreground. Otherwise, the region is intersected with the object mask and split: a) split regions inside the object mask are still kept as foreground; b) split regions outside the object mask are tagged as new. Go to step 8.
  • 5. If the region (new) is covered by the object mask by less than a threshold, such as about 30%, keep it as new. Go to step 8.
  • 6. If the region is covered by the object mask by more than a threshold, such as about 80%, classify it as foreground. Go to step 8.
  • 7. Compute the numbers of edge pixels (using the edge map) between this region and the current background and foreground regions. Compute differences between the mean motion vector of this region and those of its neighboring regions, and find the closest motion neighbor. If the region is separated from background regions by more edge pixels than from foreground regions (or this region is not connected to any background regions) and its closest motion neighbor is a foreground region, intersect it with the object mask and split: a) split regions inside the object mask are classified as foreground; b) split regions outside the object mask are tagged as new. Otherwise, keep the region as new.
  • 8. Advance to the next region. Go to step 2.
  • In the above algorithm, the object mask refers to the estimated object projection. A relatively lower ratio (e.g., 80%) can be used to include a foreground or new region; this improves the handling of motion projection errors.
  • the above aggregation and boundary alignment process can be iterated multiple times. This is useful in correcting errors which can result from the estimation of rapid motion. At the end of the last iteration, all remaining new regions are classified into background regions.
  • affine models of all regions can be estimated by a linear regression process over the optical flow. As described before, these affine models are used to project regions into a future frame in the motion projection module 112.
  • FIG. 8 is a block diagram which illustrates the essential operational components of a system for practicing the image segmentation and tracking methods described herein.
  • the system can take the form of a personal computer with an appropriate high-speed processor and video handling peripherals, a suitable digital video editing workstation, and the like.
  • Such a system will generally include a processor 802 which is coupled to a video source 804.
  • the processor 802 generally includes a microprocessor, memory, control logic, and input/output circuits operatively interconnected to perform the described processing operations.
  • the video source 804 can take the form of magnetic data storage media, optical data storage media, streaming video content from a remote source and the like.
  • the input device is used to provide user input for, inter alia, establishing object boundaries and providing threshold assignments for object definition.
  • a display device 808 is also coupled to the processor 802 to provide a visual display of the digital video.
  • the present invention has shown very good segmentation and tracking results on general video sources.
  • As illustrated in Figures 6A-E, five different types of video sequences were used for subjective and objective evaluation of the present system.
  • the first sequence (akiyo), illustrated in Figure 6A, contains an anchor person with small motion and a strong color difference from the background.
  • the second sequence (foreman), illustrated in Figure 6B, contains an object with relatively large motion and very similar brightness to the background.
  • in the third sequence, shown in Figure 6C, a skater having many different fast-moving parts (i.e. body, arms and legs) is depicted.
  • the characteristic of the fourth sequence, shown in Figure 6D, is that the shape of the flying bird changes abruptly from frame to frame.
  • in the fifth sequence, shown in Figure 6E, a plane is flying away from the camera.
  • Figs. 6A-E show object tracking results of the five test sequences after 3 user inputs.
  • segmented objects were superimposed onto a gray background with random noise and played in real time for users to see whether there were observable errors.
  • to reduce boundary jitter (i.e. high frequency noise along the object boundary), a temporal median filtering process was applied to the binary object masks before they were put onto the background with random noise.
  • subsequent user inputs bring very small improvement because only a small number of frames need to be modified.
  • the bird sequence of Figure 6D contains a small and fast-moving object.
  • the abrupt shape change between successive frames causes part of the wings to be missed in some frames.
  • in some frames, the new background region is falsely classified as foreground.
  • the above two types of errors can be corrected by one user input at the frame where the error starts to happen. Restarting the tracking process after the correction will correct errors in the successive frames. As shown in Figures 7A and 7B, for the foreman and skater sequences, the numbers of missing and false pixels drop rapidly after the first 2 user inputs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Color Television Systems (AREA)

Abstract

The present invention relates to the segmentation and active tracking of semantic video objects from a sequence of frames of digital video information, comprising receiving an initial frame of video image data and at least one succeeding frame of video image data from a sequence of frames of digital video information. In the initial frame, at least one object boundary is defined by a user. The initial frame of video image data is then partitioned into a plurality of regions on the basis of selected, substantially homogeneous video features. Tracking is then performed on the object boundary and the plurality of regions, which are projected onto the at least one succeeding frame of video image data.
PCT/US1999/022264 1998-09-24 1999-09-24 System and method for semantic video object segmentation WO2000018128A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2000571664A JP2002525988A (ja) System and method for semantic video object segmentation
AU62654/99A AU6265499A (en) 1998-09-24 1999-09-24 System and method for semantic video object segmentation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10167598P 1998-09-24 1998-09-24
US60/101,675 1998-09-24

Publications (1)

Publication Number Publication Date
WO2000018128A1 true WO2000018128A1 (fr) 2000-03-30

Family

ID=22285841

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/022264 WO2000018128A1 (fr) System and method for semantic video object segmentation

Country Status (3)

Country Link
JP (1) JP2002525988A (fr)
AU (1) AU6265499A (fr)
WO (1) WO2000018128A1 (fr)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2426137A (en) * 2005-05-10 2006-11-15 Thomson Licensing Sa Tracking object contours in a sequence of images
US7428315B2 (en) 2001-12-03 2008-09-23 Microsoft Corporation Automatic detection and tracking of multiple individuals using multiple cues
US8027513B2 (en) 2007-03-23 2011-09-27 Technion Research And Development Foundation Ltd. Bitmap tracker for visual tracking under very general conditions
US9389767B2 (en) 2013-08-30 2016-07-12 Cyberlink Corp. Systems and methods for object tracking based on user refinement input
US9633446B2 (en) 2014-02-20 2017-04-25 Nokia Technologies Oy Method, apparatus and computer program product for segmentation of objects in media content
US9720937B2 (en) * 2008-12-22 2017-08-01 Koninklijke Philips N.V. Relevance feedback on a segment of a data object
US10121254B2 (en) 2013-08-29 2018-11-06 Disney Enterprises, Inc. Methods and systems of detecting object boundaries
US20190311480A1 (en) * 2018-04-10 2019-10-10 Facebook, Inc. Automated cinematic decisions based on descriptive models
CN110782469A (zh) * 2019-10-25 2020-02-11 北京达佳互联信息技术有限公司 Video frame image segmentation method and apparatus, electronic device, and storage medium
CN112800850A (zh) * 2020-12-31 2021-05-14 上海商汤智能科技有限公司 Video processing method and apparatus, electronic device, and storage medium
EP3979199A4 (fr) * 2019-05-30 2022-08-03 Panasonic Intellectual Property Management Co., Ltd. Image processing method, image processing apparatus, and program

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3967368B2 (ja) * 2005-06-09 2007-08-29 株式会社テクノビジュアル Moving image compression program and moving image decompression program
JP5697989B2 (ja) * 2007-12-26 2015-04-08 コーニンクレッカ フィリップス エヌ ヴェ Image processor for overlaying graphics objects

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0579319A2 (fr) * 1992-07-16 1994-01-19 Philips Electronics Uk Limited Tracking of a moving object
EP0587329A2 (fr) * 1992-09-05 1994-03-16 International Business Machines Corporation Image processing system
WO1998033323A1 (fr) * 1997-01-29 1998-07-30 Levent Onural Rule-based moving object segmentation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0579319A2 (fr) * 1992-07-16 1994-01-19 Philips Electronics Uk Limited Tracking of a moving object
EP0587329A2 (fr) * 1992-09-05 1994-03-16 International Business Machines Corporation Image processing system
WO1998033323A1 (fr) * 1997-01-29 1998-07-30 Levent Onural Rule-based moving object segmentation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GEIGER D ET AL: "DYNAMIC PROGRAMMING FOR DETECTING, TRACKING, AND MATCHING DEFORMABLE CONTOURS", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,US,IEEE INC. NEW YORK, vol. 17, no. 3, 1 March 1995 (1995-03-01), pages 294 - 302, XP000498121, ISSN: 0162-8828 *
GUNSEL B ET AL: "TEMPORAL VIDEO SEGMENTATION USING UNSUPERVISED CLUSTERING AND SEMANTIC OBJECT TRACKING", JOURNAL OF ELECTRONIC IMAGING,US,SPIE + IS&T, vol. 7, no. 3, 1 July 1998 (1998-07-01), pages 592 - 604, XP000771766, ISSN: 1017-9909 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7428315B2 (en) 2001-12-03 2008-09-23 Microsoft Corporation Automatic detection and tracking of multiple individuals using multiple cues
US7433495B2 (en) 2001-12-03 2008-10-07 Microsoft Corporation Automatic detection and tracking of multiple individuals using multiple cues
GB2426137A (en) * 2005-05-10 2006-11-15 Thomson Licensing Sa Tracking object contours in a sequence of images
US8027513B2 (en) 2007-03-23 2011-09-27 Technion Research And Development Foundation Ltd. Bitmap tracker for visual tracking under very general conditions
US9720937B2 (en) * 2008-12-22 2017-08-01 Koninklijke Philips N.V. Relevance feedback on a segment of a data object
US10121254B2 (en) 2013-08-29 2018-11-06 Disney Enterprises, Inc. Methods and systems of detecting object boundaries
US9389767B2 (en) 2013-08-30 2016-07-12 Cyberlink Corp. Systems and methods for object tracking based on user refinement input
US9633446B2 (en) 2014-02-20 2017-04-25 Nokia Technologies Oy Method, apparatus and computer program product for segmentation of objects in media content
US20190311480A1 (en) * 2018-04-10 2019-10-10 Facebook, Inc. Automated cinematic decisions based on descriptive models
WO2019199904A1 (fr) * 2018-04-10 2019-10-17 Facebook, Inc. Automated decisions based on descriptive models
US10511808B2 (en) 2018-04-10 2019-12-17 Facebook, Inc. Automated cinematic decisions based on descriptive models
US10523864B2 (en) 2018-04-10 2019-12-31 Facebook, Inc. Automated cinematic decisions based on descriptive models
US10659731B2 (en) 2018-04-10 2020-05-19 Facebook, Inc. Automated cinematic decisions based on descriptive models
CN112335256A (zh) * 2018-04-10 2021-02-05 脸谱公司 Automated decisions based on descriptive models
US10979669B2 (en) 2018-04-10 2021-04-13 Facebook, Inc. Automated cinematic decisions based on descriptive models
EP3979199A4 (fr) * 2019-05-30 2022-08-03 Panasonic Intellectual Property Management Co., Ltd. Image processing method, image processing apparatus, and program
CN110782469A (zh) * 2019-10-25 2020-02-11 北京达佳互联信息技术有限公司 Video frame image segmentation method and apparatus, electronic device, and storage medium
CN112800850A (zh) * 2020-12-31 2021-05-14 上海商汤智能科技有限公司 Video processing method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
JP2002525988A (ja) 2002-08-13
AU6265499A (en) 2000-04-10

Similar Documents

Publication Publication Date Title
Zhong et al. An integrated approach for content-based video object segmentation and retrieval
Price et al. Livecut: Learning-based interactive video segmentation by evaluation of multiple propagated cues
US7783118B2 (en) Method and apparatus for determining motion in images
Zhong et al. AMOS: an active system for MPEG-4 video object segmentation
Mezaris et al. Video object segmentation using Bayes-based temporal tracking and trajectory-based region merging
JP3178529B2 (ja) Object boundary detection apparatus and object boundary detection method
Fu et al. Tracking visible boundary of objects using occlusion adaptive motion snake
Erdem et al. Video object tracking with feedback of performance measures
WO2000018128A1 (fr) System and method for semantic video object segmentation
Toklu et al. Simultaneous alpha map generation and 2-D mesh tracking for multimedia applications
Lin et al. Temporally coherent 3D point cloud video segmentation in generic scenes
Reso et al. Occlusion-aware method for temporally consistent superpixels
Nagahashi et al. Image segmentation using iterated graph cuts based on multi-scale smoothing
Tripathi et al. Improving streaming video segmentation with early and mid-level visual processing
Casas et al. Mutual feedback scheme for face detection and tracking aimed at density estimation in demonstrations
Guo et al. Semantic video object segmentation for content-based multimedia applications
Galmar et al. Graph-based spatio-temporal region extraction
Patras et al. Semi-automatic object-based video segmentation with labeling of color segments
Talouki et al. An introduction to various algorithms for video completion and their features: a survey
Zeng et al. Semantic object segmentation by a spatio-temporal MRF model
Jabid et al. An edge-texture based moving object detection for video content based application
Dorea et al. A motion-based binary partition tree approach to video object segmentation
Babu et al. Kernel-based spatial-color modeling for fast moving object tracking
Dorea et al. Trajectory tree as an object-oriented hierarchical representation for video
Zeng et al. Unsupervised segmentation of moving object by region-based mrf model and occlusion detection

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
ENP Entry into the national phase

Ref country code: JP

Ref document number: 2000 571664

Kind code of ref document: A

Format of ref document f/p: F

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase