WO2009067170A1 - Estimating an object location in video - Google Patents

Estimating an object location in video

Info

Publication number
WO2009067170A1
Authority
WO
WIPO (PCT)
Prior art keywords
picture
location
particular picture
comparing
data
Prior art date
Application number
PCT/US2008/012797
Other languages
English (en)
Inventor
Yu Huang
Joan Llach
Original Assignee
Thomson Licensing
Priority date
Filing date
Publication date
Application filed by Thomson Licensing filed Critical Thomson Licensing
Publication of WO2009067170A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20076 Probabilistic image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30221 Sports video; Sports image
    • G06T2207/30224 Ball; Puck

Definitions

  • At least one disclosed implementation generally relates to video processing and, more particularly, to detecting and tracking an object in a series of video pictures.
  • In sports videos, such as soccer videos, the ball is one of the most important objects in that it represents a large amount of information about each instant of a game. In transmitting compressed soccer videos, it is desirable to encode disproportionately more data regarding the location of the ball than for the rest of the frame. Accurate detection and tracking of the ball in soccer videos is thus important for proper encoding of data prior to transmission. In addition, highlighting a region of a picture around the ball in a display of a soccer video may be desirable, so accurate detection and tracking of the ball is also desirable in connection with displays of soccer videos. Detection and tracking of the ball have been motivated by various applications, such as event detection, tactics analysis, automatic indexing/summarization, and object-based compression.
  • a model of an object is compared with data from multiple locations in a particular picture. Based on the comparing, indicators are determined of whether the object is at one of the multiple locations in the particular picture. Based on the determined indicators and on a location of the object in a previous picture, a location is estimated of the object in the particular picture.
  • implementations may be configured or embodied in various manners. For example, an implementation may be performed as a method, or embodied as an apparatus configured to perform a set of operations, or embodied as an apparatus storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.
  • FIG. 1 illustrates a block diagram of an implementation of a state estimator.
  • FIG. 2 is a block diagram of an implementation of an apparatus for providing an estimated state.
  • FIG. 3 is a block diagram of an implementation of an apparatus for encoding and transmitting or storing picture data.
  • FIG. 4 is a block diagram of an implementation of an apparatus for enhancing and displaying picture data.
  • FIG. 5 is a block diagram of an implementation of a device for receiving encoded picture data and displaying enhanced and decoded data.
  • FIG. 6 is a process flow diagram of an implementation of a method for estimating a location of an object in a picture.
  • FIG. 7 is a process flow diagram of another implementation of a method for estimating a location of an object in a picture.
  • FIG. 8 is a process flow diagram of an implementation using a strong detector, a tracker, and a weak detector in a method for estimating a location of an object in a picture.
  • FIG. 9 is a process flow diagram of an implementation of a method performed by a weak tracker as part of a process for estimating a location of an object in a picture.
  • FIG. 10 is a block diagram of an implementation of an apparatus for transmitting and storing picture data.
  • FIG. 11 is a block diagram of an implementation of an apparatus for receiving and displaying picture data.
  • FIG. 12 is a block diagram of an implementation of an apparatus for estimating a location of an object in a picture and preparing a picture for transmission or storage.
  • FIG. 13 is a block diagram of an implementation of an apparatus for receiving and displaying a picture.
  • FIG. 14 is a process flow diagram of an implementation of a method for estimating a location of an object in a picture and encoding the picture.
  • FIG. 15 is a process flow diagram of an implementation of a method for receiving an encoded picture and processing the picture for display.
  • the implementations (also referred to as embodiments) set out herein are intended as examples of particular features and aspects. Such implementations are not to be construed as limiting the scope of this application or the scope of contemplated implementations in any manner.
  • a detector, which may be implemented as a processor executing a series of instructions stored in memory, may be employed to process pixel values in a picture to locate an object.
  • the detector may process each pixel in a picture by, for example, comparing a neighborhood (of pixels) around the pixel to a model or template of the object.
  • the processing requirements associated with processing each pixel value in each picture in a video are very significant.
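The patent does not specify the comparison metric; as a rough illustration of why exhaustive per-pixel detection is costly, the following Python sketch scans every pixel neighborhood against a template using a sum-of-squared-differences score (the function name and the SSD metric are assumptions for illustration):

```python
import numpy as np

def detect_by_template(image: np.ndarray, template: np.ndarray):
    """Exhaustive detection: compare the neighborhood around every pixel
    with the object template and return the best-matching position.
    SSD is an illustrative metric; the source does not specify one."""
    th, tw = template.shape
    best_score, best_pos = np.inf, (0, 0)
    for y in range(image.shape[0] - th + 1):      # every candidate row
        for x in range(image.shape[1] - tw + 1):  # every candidate column
            ssd = np.sum((image[y:y + th, x:x + tw] - template) ** 2)
            if ssd < best_score:
                best_score, best_pos = ssd, (y, x)
    return best_pos, best_score
```

The double loop over all pixel positions, repeated for every frame, is the processing burden that motivates the tracker described next.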
  • a tracker that uses data from one or more pictures in a video sequence to estimate a location of an object in a subsequent picture may be employed.
  • a variety of trackers are known.
  • An exemplary type of tracker is a particle-based framework.
  • a Monte Carlo simulation is typically conducted over numerous particles.
  • the particles may represent, for example, different possible locations of an object in a frame.
  • a particular particle may be selected based on the likelihood determined in accordance with a Monte Carlo simulation.
  • a particle filter is an exemplary particle-based framework.
  • numerous particles are generated, representing possible states, which may correspond to possible locations of an object in an image.
  • a likelihood also referred to as a weight, is associated with each particle in the particle filter.
  • particles having a low likelihood or low weight are typically eliminated in one or more resampling steps.
  • a state representing an outcome of a particle filter may be a weighted average of particles, for example.
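As a minimal sketch of such a weighted-average state estimate (the particle positions and weights below are hypothetical values chosen for illustration):

```python
import numpy as np

def estimate_state(particles: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Point estimate of the state as the weighted average of particles."""
    w = weights / weights.sum()              # normalize weights to sum to 1
    return (particles * w[:, None]).sum(axis=0)

# Three hypothetical 2-D location particles with likelihood weights.
particles = np.array([[10.0, 5.0], [12.0, 6.0], [50.0, 40.0]])
weights = np.array([0.45, 0.45, 0.10])       # the low-weight outlier contributes little
print(estimate_state(particles, weights))    # -> [14.9, 8.95]
```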
  • a challenge in implementations of trackers is accuracy in estimating a location of an object in a picture based on data from one or more prior pictures in a sequence of pictures.
  • a challenge in implementing a particle-filter based framework is accuracy in assessing likelihood of particles prior to resampling.
  • a strong detector determines an object's location in a given frame.
  • the given frame may be, for example, a first frame in a sequence or a frame occurring after a tracking algorithm has lost the object during a sequence of frames.
  • the strong detector operates by comparing pixel values to the object template.
  • a tracker determines one or more estimates of the object's location in the next frame.
  • a weak detector determines a likelihood map for various pixel locations in this "next" frame.
  • the likelihood map provides an indication (such as a probability) of whether the object is at the various pixel locations.
  • the likelihood map may cover the whole frame or, for example, areas surrounding each estimate provided by the tracker.
  • the tracker uses the likelihood map to estimate a location for the object in this "next" frame.
  • a system 100 includes a state estimator 110 that may be implemented, for example, on a computer.
  • the state estimator 110 includes a particle algorithm module 120, a local-mode module 130, and a number adapter module 140.
  • the particle algorithm module 120 performs a particle-based algorithm, such as, for example, a particle filter (PF), for estimating states of a dynamic system.
  • the local-mode module 130 applies a local-mode seeking mechanism, such as, for example, by performing a mean-shift analysis on the particles of a PF.
  • the number adapter module 140 modifies the number of particles used in the particle-based algorithm, such as, for example, by applying a Kullback-Leibler distance (KLD) sampling process to the particles of a PF.
  • the particle filter can adaptively sample depending on the size of the state space where the particles are found. For example, if the particles are all found in a small part of the state space, a smaller number of particles may be sampled. If the state space is large, or the state uncertainty is high, a larger number of particles may be sampled.
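A minimal sketch of this adaptive behavior, using the KLD-sampling bound (Fox) for the particle count; the epsilon and z values are illustrative assumptions, and k, the number of histogram bins currently occupied by particles, serves as a proxy for the size of the occupied state space:

```python
import math

def kld_particle_count(k: int, epsilon: float = 0.05, z: float = 2.33) -> int:
    """KLD-sampling bound: number of particles needed so that, with
    confidence set by the normal quantile z, the KL divergence between
    the sampled and true distributions stays below epsilon."""
    if k < 2:
        return 1
    a = 2.0 / (9.0 * (k - 1))
    n = (k - 1) / (2.0 * epsilon) * (1.0 - a + math.sqrt(a) * z) ** 3
    return int(math.ceil(n))

print(kld_particle_count(k=3))    # particles concentrated -> few samples (~93)
print(kld_particle_count(k=100))  # large/uncertain state space -> many (~1346)
```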
  • the modules 120-140 may be, for example, implemented separately or integrated into a single algorithm.
  • the state estimator 110 accesses as input both an initial state 150 and a data input 160, and provides as output an estimated state 170.
  • the initial state 150 may be determined, for example, by an initial-state detector or by a manual process. More specific examples are provided by considering a system for which the state is the location of an object in an image in a sequence of digital images, such as a frame of a video. In such a system, the initial object location may be determined, for example, by an automated object detection process using edge detection and template comparison, or manually by a user viewing the video.
  • the data input 160 may be, for example, a sequence of video pictures.
  • the estimated state 170 may be, for example, an estimate of the position of a ball in a particular video picture.
  • FIG. 2 an exemplary apparatus 200 for implementing the state estimator 110 of FIG. 1 is shown.
  • the apparatus 200 includes a processing device 210 that receives initial state 150 and data input 160, and provides as output an estimated state 170.
  • the processing device 210 accesses a storage device 220, which may store data relating to a particular image in a sequence of digital images.
  • the estimated state 170 may be used for a variety of purposes.
  • a system 250 includes an encoder 260 coupled to a transmit/store device 270.
  • the encoder 260 and the transmit/store device 270 may be implemented, for example, on a computer or a communications encoder.
  • the encoder 260 accesses the estimated state 170 provided by the state estimator 110 of the system 100 in FIG. 1, and accesses the data input 160 used by the state estimator 110.
  • the encoder 260 encodes the data input 160 according to one or more of a variety of coding algorithms, and provides an encoded data output 280 to the transmit/store device 270.
  • the encoder 260 uses the estimated state 170 to differentially encode different portions of the data input 160. For example, if the state represents the position of an object in a video, such as the position of a ball in a soccer video, the encoder 260 may encode a portion of the video corresponding to the estimated position of the ball using a first coding algorithm, and may encode another portion of the video not corresponding to the estimated position of the ball using a second coding algorithm.
  • the first algorithm may, for example, provide more coding redundancy than the second coding algorithm, so that the estimated position of the ball (and hopefully the ball itself) will be expected to be reproduced with greater detail and resolution than other portions of the video.
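The patent does not fix a particular mechanism for this differential encoding; one common realization is a region-of-interest quantization map, sketched below with hypothetical macroblock dimensions and QP values (a lower QP spends more bits on a block):

```python
import numpy as np

def qp_map(h_mb: int, w_mb: int, ball_mb: tuple, base_qp: int = 32,
           roi_qp: int = 24, radius: int = 2) -> np.ndarray:
    """Per-macroblock quantization map: blocks near the estimated ball
    location get a lower QP (more bits, finer reconstruction)."""
    qp = np.full((h_mb, w_mb), base_qp, dtype=int)
    by, bx = ball_mb
    qp[max(0, by - radius):by + radius + 1,
       max(0, bx - radius):bx + radius + 1] = roi_qp
    return qp

# 45x80 macroblocks (720p at 16x16); ball estimated at macroblock (20, 40).
print(qp_map(45, 80, (20, 40))[18:23, 38:43])
```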
  • the transmit/store device 270 may include one or more of a storage device or a transmission device. Accordingly, the transmit/store device 270 accesses the encoded data 280 and either transmits the data 280 or stores the data 280.
  • a system 300 includes a processing device 310 coupled to a local storage device 315 and coupled to a display 320.
  • the processing device 310 accesses the estimated state 170 provided by the state estimator 110 of the system 100 in FIG. 1.
  • a system 500 includes a processing device 510 coupled to a source of encoded data 505, such as encoded digital video. Processing device 510 is further coupled to a display 530 and to a storage device 520. Storage device 520 may be local or remote.
  • the processing device 510 decodes received data and estimates a location of an object in a picture included in the decoded data. Processing device 510 provides decoded data 515 and enhanced data 525 to display 530. Enhanced data 525 may include, for example, data relating to an estimated location of an object in a picture.
  • the model of an object is compared with data from multiple locations in a particular picture 605.
  • the model of an object may be a template of an object, by way of example.
  • the object may be a ball, such as a soccer ball.
  • the particular picture may be a frame in a video, such as a sports video, and further such as a soccer video.
  • Indicators are determined of whether the object is at each of the particular locations 610.
  • the indicators may be probabilities of whether the object is at each of the particular locations.
  • a location of the object in the particular picture is estimated 615 based in part on the indicators.
  • the estimated location may be stored in memory, and may be provided to an encoder for use in encoding of the particular picture for transmission or storage.
  • the estimated location may also be used in a post-processor to a decoder to, for example, enhance the presentation of the object in a display.
  • a location of an object in a previous picture is determined 705.
  • the object location may be determined, by way of example, by successive comparison of regions around pixels in the picture with an object template.
  • a particle-filter based tracker is used to estimate 710 multiple possible locations of the object in a particular picture. Pixel data at each of the multiple possible locations is evaluated 715 to obtain a likelihood of the object being at each of the multiple possible locations.
  • An estimated location may be determined by resampling 720 particles in the particle filter based tracker, using the likelihood in the resampling.
  • the likelihood may be used, for example, by eliminating from resampling locations that exhibit a probability below a threshold.
  • the likelihood may be used in weighting the particles during resampling, or at the end of a resampling process.
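A minimal sketch of such likelihood-guided resampling (multinomial resampling and the threshold value are illustrative choices):

```python
import numpy as np

def resample(particles: np.ndarray, likelihoods: np.ndarray,
             threshold: float = 1e-3) -> np.ndarray:
    """Eliminate locations whose likelihood is below a threshold, then
    draw a new particle set in proportion to the surviving likelihoods."""
    keep = likelihoods >= threshold
    if not np.any(keep):                         # degenerate case: keep everything
        keep = np.ones_like(keep, dtype=bool)
    survivors, w = particles[keep], likelihoods[keep]
    w = w / w.sum()                              # normalized importance weights
    idx = np.random.choice(len(survivors), size=len(particles), p=w)
    return survivors[idx]
```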
  • the method of FIG. 8 may be an embodiment of the method of FIG. 7.
  • a strong detector 805 is employed to detect an object in a frame.
  • a strong detector may be implemented in a processing device executing instructions. If the strong detector 805 operates on the picture and fails 810 to find the object, the process flow returns to the strong detector 805 for the next picture.
  • the object may be occluded in the frame. Occlusion may result, for example, from a soccer ball being behind a player. If the strong detector 805 detects the object, then the process flow moves to use of a tracker 815 to locate the object in the next picture.
  • Tracker 815 may be a particle-based tracker, and in particular a particle-filter based tracker. Tracker 815 employs data including the location of the object in the prior picture to determine multiple estimated locations in the particular picture. Tracker 815 operates in parallel with a weak detector 820. Data including the location in the prior picture is provided to weak detector 820, although in another implementation such location data need not be provided and the weak detector can operate on an entire frame without such location data. In another implementation, the multiple estimated locations are provided to the weak detector 820. The weak detector may compare characteristics of pixels in the picture, at the multiple estimated locations or at additional locations, to one or more tests, such as comparisons to factors characteristic of an object, or factors characteristic of a background.
  • a pixel's green color value may be compared to a threshold to determine if the pixel is part of a playfield surface (such as grass). In one implementation, if the green color value is above a threshold, then we decide that the pixel is grass. If the green color value is not above the threshold, then we determine a likelihood that the pixel is green and a likelihood that the pixel is white (like the color of a ball), and these two likelihoods can then be used to make a more informed decision at the tracker, as described in more detail further below.
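A minimal sketch of this two-stage test; the threshold, the green-dominance margin, and the crude greenness/whiteness scores are illustrative assumptions beyond what the text above specifies:

```python
def classify_pixel(r: int, g: int, b: int,
                   grass_thresh: int = 150, margin: int = 40):
    """Hard decision first; if inconclusive, return soft likelihoods that
    the pixel is green (grass-like) or white (ball-like)."""
    if g > grass_thresh and g - max(r, b) > margin:
        return "grass", None, None               # hard decision: playfield
    greenness = max(0.0, (g - max(r, b)) / 255.0)
    whiteness = min(r, g, b) / 255.0             # white is bright in all channels
    return "undecided", greenness, whiteness

print(classify_pixel(40, 200, 50))    # ('grass', None, None)
print(classify_pixel(240, 245, 238))  # ('undecided', ~0.02, ~0.93)
```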
  • the weak detector may determine, for each pixel in the particular picture, a likelihood of the object being at that location.
  • the determined likelihoods for each pixel in the picture may be referred to as a likelihood map.
  • the weak detector 820 makes likelihood determinations and provides likelihood data to the tracker 815.
  • using the likelihood data, tracker 815 provides an estimated object location in the current frame. Additionally, in at least one implementation, tracker 815 also provides a likelihood (also referred to as a confidence score) for the estimated object location. Tracker 815 also makes a determination 825 as to whether the object has been lost.
  • the determination may include checking the confidence score of the estimated object location against a threshold. If the confidence score for this single frame is below the threshold, then the object is concluded to be lost, and the strong detector is employed to locate the object in the next frame. If the confidence score for this single frame is above the threshold, then the tracker is employed to locate the object in the next frame, using the estimated location of the object from the current frame.
  • the determination may include checking the number of consecutive frames in a video in which the confidence score is below a threshold. If this number of consecutive frames in which the confidence score has fallen below the threshold is above a threshold number of frames, then the object is concluded to be lost. If the object is concluded to be lost, then the strong detector is employed to locate the object in the next frame. Otherwise, the tracker and weak detector are employed to locate the object in the next frame. This implementation allows for occlusion to occur in one or more frames as long as the system recovers the object (that is, occlusion ends) before the threshold number of frames is reached.
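A minimal sketch of this control flow; the detector and tracker callables are assumed interfaces, and both threshold values are illustrative:

```python
def track_sequence(frames, strong_detector, tracker, weak_detector,
                   score_thresh: float = 0.3, max_low: int = 5):
    """FIG. 8-style loop: the strong detector (re)acquires the object;
    the tracker plus weak detector then follow it, falling back to the
    strong detector after max_low consecutive low-confidence frames."""
    location, low_run = None, 0
    for frame in frames:
        if location is None:                     # (re)acquisition mode
            location = strong_detector(frame)    # None if occluded / not found
            yield frame, location
            continue
        likelihood_map = weak_detector(frame, location)
        location, confidence = tracker(frame, location, likelihood_map)
        low_run = low_run + 1 if confidence < score_thresh else 0
        if low_run > max_low:                    # object concluded lost
            location, low_run = None, 0
        yield frame, location
```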
  • the determination of a likelihood may be developed for each pixel location to provide a likelihood map. In another implementation, the determination may be made only for those estimated multiple locations that correspond to particles generated by a tracker. On a pixel-by-pixel basis for the estimated multiple locations, a hard decision may be applied 905 using a playfield pixel classifier. The classifier may operate on a selected region at which the pixel corresponding to one of the multiple locations is centered.
  • a color histogram learning technique is employed to determine whether a given pixel is properly classified as a playfield pixel.
  • Color models may be learned for playfield pixels and nonplayfield pixels, using a training set of soccer videos.
  • the color model is an RGB (red, green, blue) color histogram with N bins per channel in the RGB color space.
  • the playfield and nonplayfield models are learned as follows.
  • the pixels in the training set videos are labeled as playfield pixels and nonplayfield pixels using a semi-supervised method (adapting a mixture of Gaussian models). Based on its RGB color vector, each labeled playfield pixel is placed into the appropriate rgb bin of the playfield histogram.
  • a similar process is carried out for the pixels labeled as nonplayfield.
  • the histogram counts are converted into discrete probability distributions: $P(rgb \mid \text{playfield}) = f[rgb] / T_f$ and $P(rgb \mid \text{nonplayfield}) = n[rgb] / T_n$, where $f[rgb]$ is the pixel count in bin $rgb$ of the playfield histogram, $n[rgb]$ is the corresponding count from the nonplayfield histogram, and $T_f$ and $T_n$ are the total pixel counts contained in the playfield and nonplayfield histograms, respectively.
  • a playfield pixel classifier is derived through the likelihood ratio approach: pixel $(x, y)$ is classified as playfield if $L(x, y) \geq \theta$, where the likelihood ratio for pixel $(x, y)$ is $L(x, y) = P(rgb(x, y) \mid \text{playfield}) \,/\, P(rgb(x, y) \mid \text{nonplayfield})$.
  • $\theta > 0$ is a threshold that can be adjusted to trade off between correct detections and false positives.
  • a false positive occurs when, for example, a non-playfield pixel is labeled as a playfield pixel.
  • a false negative occurs when, for example, a playfield pixel is labeled as a non-playfield pixel.
  • the number of bins per channel, N, and the detection threshold, $\theta$, can be chosen based on the receiver operating characteristic (ROC) curve, which plots the probability of correct detection against the probability of false detection as N is varied across a range of values (for example, 32 to 256 bins per channel).
  • the probability of correct detection is defined as the fraction of pixels for which the classification and the label match.
  • the detection threshold $\theta$ corresponding to the chosen operating point is then used for the pixel classification in the equation above.
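A minimal sketch of the learned-histogram classifier described above (the array shapes and the smoothing epsilon are implementation assumptions):

```python
import numpy as np

N = 32  # bins per RGB channel, chosen via the ROC analysis above

def bin_index(pixel) -> tuple:
    """Map an 8-bit (r, g, b) pixel to its histogram bin."""
    return tuple(int(c) * N // 256 for c in pixel)

def learn_distribution(labeled_pixels: np.ndarray) -> np.ndarray:
    """Accumulate labeled training pixels into an N^3 histogram and
    normalize it into the discrete distribution P(rgb | class)."""
    hist = np.zeros((N, N, N))
    for p in labeled_pixels:
        hist[bin_index(p)] += 1.0
    return hist / hist.sum()

def is_playfield(pixel, p_field, p_nonfield, theta: float = 1.0) -> bool:
    """Likelihood-ratio classifier: playfield iff L(x, y) >= theta."""
    idx = bin_index(pixel)
    return p_field[idx] >= theta * (p_nonfield[idx] + 1e-12)
```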
  • if the result of the classifier is that the pixel resembles the playfield, the weak detector returns a zero likelihood score 915 to the tracker. If the result of the classifier is that the resemblance to the playfield is below the threshold, then a playfield soft likelihood detection may be applied 920.
  • An object soft likelihood determination may be performed 925.
  • An object soft likelihood indicates a resemblance between the pixels at the multiple locations and characteristics of the object.
  • the likelihood score is an indicator that may be employed in deciding whether to resample the particle. By way of example, if a likelihood score returned by the weak detector is less than a threshold, the corresponding particle will not be resampled.
  • the likelihood score is an indicator that may be employed in the observation model of a particle filter based tracker. In the observation model, the likelihood of a given particle may be based on a product of three probability values. The intensity measurement $Z_t^{int}$, the motion measurement $Z_t^{mot}$, and the detector measurement $Z_t^{det}$ are assumed to be independent.
  • with $Z_t = (Z_t^{int}, Z_t^{mot}, Z_t^{det})$, the particle likelihood may be represented as $P(Z_t \mid X_t) = P(Z_t^{int} \mid X_t)\, P(Z_t^{mot} \mid X_t)\, P(Z_t^{det} \mid X_t)$, where $O_t = 0$ if the ball is occluded, and 1 otherwise.
  • the likelihood value provided by the weak detector may be employed to provide the probability for the detector measurement $P(Z_t^{det} \mid X_t)$.
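A minimal sketch of the factored weight computation; the three per-particle probability arrays are assumed to come from the intensity, motion, and weak-detector models, respectively:

```python
import numpy as np

def particle_weights(p_int: np.ndarray, p_mot: np.ndarray,
                     p_det: np.ndarray) -> np.ndarray:
    """P(Z_t | X_t) = P(Z_int | X_t) * P(Z_mot | X_t) * P(Z_det | X_t)
    for every particle, normalized into importance weights."""
    w = p_int * p_mot * p_det
    return w / w.sum()
```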
  • a likelihood value of the intensity measurement may be determined as follows.
  • candidate locations $X_t^j \in Neib$ are evaluated by comparing the image window around each candidate with the ball template over the ball window $W$, where $W$ is the ball window, $Neib$ is a small neighborhood around $X_t$, $T$ is the ball template, and $I(\cdot)$ is the image in the current frame.
  • $J + 1$ hypotheses can be defined over the $J$ candidates.
  • Hypothesis $H_0$ means that none of the candidates is associated with the true match.
  • the clutter is assumed to be uniformly distributed as $U(\cdot)$, and hence the true match-oriented measurement is Gaussian distributed as $N(\cdot)$.
  • the motion likelihood probability may be expressed as a factor based on the difference between the particle's speed (position change) and the average ball speed in the latest history.
  • the motion likelihood probability may be expressed as $d_{mot}^2 = (|\Delta x_t| - |\overline{\Delta x}|)^2 + (|\Delta y_t| - |\overline{\Delta y}|)^2$, where $(\Delta x_t, \Delta y_t)$ is the particle's speed with respect to $(x_{t-1}, y_{t-1})$ and $(\overline{\Delta x}, \overline{\Delta y})$ is the average ball speed in the latest history.
  • the average ball speed may be represented as a running average of the ball's frame-to-frame displacement over the latest history.
  • the motion likelihood is then calculated as a normalized function of this motion likelihood probability.
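A minimal sketch under the common assumption that the normalized function is Gaussian in the speed difference (the Gaussian form and the sigma value are illustrative assumptions):

```python
import math

def motion_likelihood(dx: float, dy: float, avg_dx: float, avg_dy: float,
                      sigma: float = 4.0) -> float:
    """Penalize the squared difference between the particle's speed
    (dx, dy) and the average ball speed (avg_dx, avg_dy)."""
    d2 = (abs(dx) - abs(avg_dx)) ** 2 + (abs(dy) - abs(avg_dy)) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))
```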
  • an object template may be updated after each frame, may be updated selectively, or the same template may be maintained. It may be desirable to update the object template because of changes in an attribute of the object. For example, in a soccer video, a soccer ball may become dirty during the game, and thus less white. In such an event, a template that represents the ball may be updated based on an attribute, where the attribute is the color of the object. In another implementation, an attribute may be the velocity of the object. Based on detected change in velocity, the template may be updated. As another example, the shape or size of a ball may change during a sequence.
  • Various schemes for selective updating of a template are known. For example, in the technique known as alpha blending, data from the current estimated location is used to modify the template.
  • the template may be updated by taking a weighted average of the color indicated in the original template and the current estimated location.
  • the attribute is replaced by the attribute in the current estimated location.
  • the attribute of object size is not updated when the template is updated. Updating may also be based on feedback from an estimated likelihood that the tracking is accurate. For example, if the likelihood falls, the template may be updated because a falling likelihood may indicate that the appearance of the object is changing.
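A minimal sketch of the alpha-blending update described above (the alpha value is an illustrative choice; the template and the current patch are same-shape image arrays):

```python
import numpy as np

def alpha_blend_template(template: np.ndarray, current_patch: np.ndarray,
                         alpha: float = 0.1) -> np.ndarray:
    """New template = weighted average of the existing template and the
    pixels at the current estimated location."""
    return (1.0 - alpha) * template + alpha * current_patch
```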
  • Apparatus 1000 receives an estimated state and image data at pre-processor 1005.
  • the steps of comparing, determining, and estimating described above are performed by pre-processor 1005 prior to passing the data to encoder 1010.
  • Encoder 1010 encodes the data in accordance with a suitable compression standard, such as MPEG-2, MPEG-4, or H.264/AVC.
  • encoder 1010 allocates bits disproportionately to the estimated location of the object as provided by pre-processor 1005. For example, encoder 1010 may give the estimated location of the object a disproportionately high allocation to enhance a viewer's ability to follow the object from frame to frame.
  • Encoder 1010 outputs encoded picture data to transmit/store 1015, which may include a data storage device, such as a digital video recorder (DVR) drive, a computer hard drive, or a portable device.
  • Transmit/store 1015 may also include a system for transmission of video, such as a cable head-end system.
  • transmit/store 1015 may include a transmitter adapted to transmit a program signal having a plurality of bitstreams representing encoded pictures. Typical transmitters perform functions such as, for example, one or more of providing error-correction coding, interleaving the data in the signal, randomizing the energy in the signal, and modulating the signal onto one or more carriers.
  • the transmitter may include, or interface with, an antenna (not shown).
  • Apparatus 1100 may be a video receiving system, which may be configured to receive signals over a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The signals may be received over the Internet or some other network.
  • Apparatus 1100 may be, for example, a cell phone, a computer, a set-top box, a television, or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage.
  • the video receiving system 1100 may provide its output to, for example, a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing or display device.
  • Apparatus 1100 receives encoded video data at decoder 1105.
  • Encoded video data may be encoded according to any suitable compression standard, such as MPEG-2, MPEG-4, or H.264/AVC.
  • Decoder 1105 decodes the encoded video data and outputs decoded video data to post-processor 1110. The steps of comparing, determining, and estimating described above are performed by post-processor 1110.
  • Post-processor 1110 may output video data with enhanced data, such as data identifying an estimated location of an object, for display of video data with enhanced data by display 1115.
  • Enhanced data may include data highlighting an estimated location of an object (and thus highlighting the object).
  • Highlighting of an estimated location of an object may include providing a pointer, providing differential color, differential brightness, sharpening of edges of the object, increasing the size of the object, or modifying one or more other attributes of the object at the estimated location, and other steps. Highlighting may also, or alternatively, include modifying one or more attributes of the area (background, for example) immediately surrounding the object. For example, if the object is a white ball against a green background (grass), then the ball may be made a brighter white, and the background may be made a darker green, so that the ball stands out (that is, so that the ball is highlighted or emphasized). Further, highlighting may include causing the object to change color, brightness, size, or other attributes as the object is displayed, for example, from frame to frame to produce a pulsing effect so that a viewer more readily notices the object.
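A minimal sketch of the brighten-ball/darken-surround example above; the circular masks and gain values are illustrative assumptions, and the frame is assumed to be an HxWx3 float RGB image in [0, 1]:

```python
import numpy as np

def highlight_ball(frame: np.ndarray, center: tuple, radius: int = 12) -> np.ndarray:
    """Brighten the object region and darken the area immediately
    surrounding it so the ball stands out."""
    out = frame.copy()
    h, w = frame.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    d2 = (yy - center[0]) ** 2 + (xx - center[1]) ** 2
    ball = d2 <= radius ** 2
    ring = (d2 > radius ** 2) & (d2 <= (2 * radius) ** 2)
    out[ball] = np.clip(out[ball] * 1.4, 0.0, 1.0)   # brighter ball
    out[ring] = out[ring] * 0.7                      # darker surround
    return out
```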
  • an encoder may perform the steps of comparing, determining, and estimating (or otherwise detect and track an object), and encode the location of the object.
  • a decoder or other receiver may receive the estimated location and perform a highlighting operation without needing to track the object (for example, without needing to perform the steps of comparing, determining, and estimating).
  • highlighting may also, or alternatively, be done at the encoder, and such highlighting may be encoded and transmitted.
  • Comparator 1205 compares a model of an object with picture data from multiple locations in a particular picture, and provides a comparison output.
  • the comparison output is received by indicator unit 1210 coupled to comparator 1205.
  • Indicator unit 1210 is configured to determine, based on the comparison output, indicators of whether the object is at one of the multiple locations in the particular picture.
  • Indicator unit 1210 outputs indicators to estimator 1215 coupled to indicator unit 1210.
  • Estimator 1215 is configured to estimate, based on the determined indicators and on a location of the object in a previous picture, a location of the object in the particular picture.
  • Estimator 1215 is coupled to encoder 1217, which encodes the data for transmission.
  • Encoder 1217 is configured such that encoding allocates a disproportionate amount of bits to the estimated location in the particular picture.
  • encoder 1217 may encode a portion of the picture corresponding to the estimated position using a first coding algorithm, and may encode another portion of the picture not corresponding to the estimated position using a second coding algorithm.
  • the first algorithm may, for example, provide more coding redundancy than the second coding algorithm, so that the estimated position of the object (and hopefully the object itself) will be expected to be reproduced with greater detail and resolution than other portions of the picture.
  • Encoder 1217 is coupled to modulator 1220, which modulates the estimated location data with picture data for transmission.
  • Apparatus 1300 includes receiver 1305 configured to receive encoded video data, which may include an encoded version of a particular picture.
  • Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers, de-randomizing the energy in the signal, de-interleaving the data in the signal, and error-correction decoding the signal.
  • Receiver 1305 may include, or interface with, an antenna (not shown).
  • Receiver 1305 is coupled to decoder 1315, which is configured to decode the encoded video data.
  • a separate demodulator may be provided coupled to receiver 1305 and configured to demodulate the encoded video data to produce demodulated encoded video data, which may include a demodulated encoded version of the particular picture.
  • Such a separate demodulator may be coupled, for example, between a receiving antenna and a receiver 1305 that is configured to process a demodulated signal by, for example, performing de-randomizing and de-interleaving.
  • Decoder 1315 is coupled to comparator 1320, which is configured to compare a model of an object, such as a template, with data from multiple locations in pictures in the decoded video data, and to provide a comparison output for one or more of the pictures.
  • Comparator 1320 is coupled to indicator unit 1325 which is configured to determine, based on the comparison output, indicators of whether the object is at one of the multiple locations in the decoded pictures.
  • Indicator unit 1325 is coupled to estimator 1330, which is configured to estimate, based on the determined indicators and on a location of the object in a previous decoded picture, a location of the object in the decoded particular picture.
  • Estimator 1330 is coupled to data output 1335, which is configured to provide data allowing a display of the decoded particular picture that highlights, also referred to as emphasizes, the object at the estimated location.
  • Data output 1335 is coupled to display unit 1340, which is configured to display the decoded particular picture with a highlight, also referred to as an emphasis, on the object at the estimated location.
  • the steps of comparing, determining, and estimating can also be performed by the combination of comparator 1320, indicator unit 1325, and estimator 1330, without the additional components of apparatus 1300.
  • the combination of a comparator (such as comparator 1205), an indicator unit (such as indicator unit 1210), an estimator (such as estimator 1215), and an encoder (such as encoder 1217) may be used in various applications such as, for example, encoding data for storage on a distributable medium.
  • video data may be encoded, with a disproportionate amount of bits allocated to a tracked object, and with or without highlighting of the tracked object.
  • the encoded video data may be stored to a digital versatile disc (DVD; also referred to as a digital video disc) for sale to the public or for personal use.
  • Such encoding of the video data may be performed, for example, using editing software, transcoding software, or authoring software.
  • a method 1400 is illustrated.
  • a model of an object is compared 1405 with data from multiple locations in a particular picture. Based on the comparing, indicators are determined 1410 of whether the object is at one of the multiple locations in the particular picture. Based on the determined indicators and on a location of the object in a previous picture, a location of the object in the particular picture is estimated 1415.
  • the particular picture is encoded 1420, wherein encoding allocates a disproportionate amount of bits to the estimated location in the particular picture.
  • the encoded particular picture is modulated 1425 for transmission.
  • a method 1500 is illustrated. In the method, encoded modulated video data, including an encoded version of a particular picture, is received 1505.
  • the received encoded version of the particular picture is decoded 1515 to produce a decoded particular picture.
  • the receiving step may include demodulating the received encoded version of the particular picture.
  • a model of an object is compared 1520 with data from multiple locations in the decoded particular picture. Based on the comparing, indicators are determined 1525 of whether the object is at one of the multiple locations in the decoded particular picture. Based on the determined indicators and on a location of the object in a previous decoded picture, a location of the object in the decoded particular picture is estimated 1530. Data is provided 1535 allowing a display of the decoded particular picture that highlights (also referred to as emphasizes) the object at the estimated location.
  • the decoded particular picture with a highlight (also referred to as an emphasis) on the object at the estimated location may be displayed.
  • terms such as “playfield”, “playing field”, and “soccer field” are intended to be synonymous in referring to the surface upon which the soccer match is played.
  • references to "players” are intended to include players and/or referees alike. Implementations are not limited to a soccer field but can be applied to other play scenes, such as, for example, basketball, baseball, football, tennis, hockey, and volleyball fields, rinks, and courts. Implementations may further be applied to applications outside of the field of sports video of identifying a location of an object against a relatively uniform background.
  • picture includes, without limitation, a frame in a digital video, a field in a digital video, or a limited portion of a frame such as for example a macroblock or a partition of a macroblock.
  • the implementations described herein may be implemented in, for example, a method or process, an apparatus, or a software program. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program).
  • An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
  • the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processing devices also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs”), and other devices that facilitate communication of information between end-users.
  • Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding and decoding, or equipment or applications associated with content production.
  • equipment include video coders, video decoders, video codecs, web servers, set-top boxes, laptops, personal computers, cell phones, PDAs, and other communication devices.
  • the equipment may be mobile and even installed in a mobile vehicle.
  • the methods may be implemented by instructions being performed by a processor, and such instructions may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette, a random access memory ("RAM"), or a read-only memory (“ROM").
  • the instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two.
  • a processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium having instructions for carrying out a process.
  • implementations may also produce a signal formatted to carry information that may be, for example, stored or transmitted.
  • the information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
  • a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
  • the formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
  • the information that the signal carries may be, for example, analog or digital information.
  • the signal may be transmitted over a variety of different wired or wireless links, as is known.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Various implementations for detecting and tracking an object are described. At least one implementation includes comparing a model of an object with data from multiple locations in a particular picture. The implementation also includes determining, based on the comparing, indicators of whether the object is at one of the multiple locations in the particular picture. The implementation further includes estimating, based on the determined indicators and on a location of the object in a previous picture, a location of the object in the particular picture.
PCT/US2008/012797 2007-11-16 2008-11-14 Estimating an object location in video WO2009067170A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US341507P 2007-11-16 2007-11-16
US61/003,415 2007-11-16

Publications (1)

Publication Number Publication Date
WO2009067170A1 (fr) 2009-05-28

Family

ID=40513944

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/012797 WO2009067170A1 (fr) 2007-11-16 2008-11-14 Estimating an object location in video

Country Status (1)

Country Link
WO (1) WO2009067170A1 (fr)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1536635A1 (fr) * 2003-11-28 2005-06-01 Casio Computer Co., Ltd. Appareil de contrôle d'affichage et programme
WO2008069995A2 (fr) * 2006-12-01 2008-06-12 Thomson Licensing Estimation d'une localisation d'un objet dans une image

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
ARNAUD E ET AL: "Conditional Filters for Image Sequence-Based Tracking-Application to Point Tracking", IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 14, no. 1, 1 January 2005 (2005-01-01), pages 63 - 79, XP011123603, ISSN: 1057-7149 *
BLAKE A ET AL: "Data Fusion for Visual Tracking With Particles", PROCEEDINGS OF THE IEEE, IEEE. NEW YORK, US, vol. 92, no. 3, 1 March 2004 (2004-03-01), pages 495 - 513, XP011108678, ISSN: 0018-9219 *
DAWEI LIANG ET AL: "A Scheme for Ball Detection and Tracking in Broadcast Soccer Video", ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2005, LECTURE NOTES IN COMPUTER SCIENCE, SPRINGER, BERLIN, DE, vol. 3767, 1 January 2005 (2005-01-01), pages 864 - 875, XP019024023, ISBN: 978-3-540-30027-4 *
J. VERMAAK, P. PÉREZ, M. GANGNET, A. BLAKE: "Toward improved observation models for visual tracking: Selective adaptation", PROC. EUR. CONF. COMPUTER VISION, vol. 2350/2002, 2002, pages 645 - 660, XP002523249 *
MACCORMICK J ET AL: "A probabilistic exclusion principle for tracking multiple objects", INTERNATIONAL JOURNAL OF COMPUTER VISION, KLUWER ACADEMIC PUBLISHERS, NORWELL, US, vol. 39, no. 1, 1 January 2000 (2000-01-01), pages 57 - 71, XP007902729, ISSN: 0920-5691 *
MAGGIO E ET AL: "Hybrid Particle Filter and Mean Shift tracker with adaptive transition model", 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PISCATAWAY, NJ, USA,IEEE, vol. 2, 18 March 2005 (2005-03-18), pages 221 - 224, XP010790616, ISBN: 978-0-7803-8874-1 *
MATTHEWS I ET AL: "THE TEMPLATE UPDATE PROBLEM", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE SERVICE CENTER, LOS ALAMITOS, CA, US, vol. 26, no. 6, 1 June 2004 (2004-06-01), pages 810 - 815, XP001211169, ISSN: 0162-8828 *
XIAO-FENG TONG ET AL: "An effective and fast soccer ball detection and tracking method", PATTERN RECOGNITION, 2004. ICPR 2004. PROCEEDINGS OF THE 17TH INTERNAT IONAL CONFERENCE ON CAMBRIDGE, UK AUG. 23-26, 2004, PISCATAWAY, NJ, USA,IEEE, vol. 4, 23 August 2004 (2004-08-23), pages 795 - 798, XP010724040, ISBN: 978-0-7695-2128-2 *
YU HUANG ET AL: "Players and Ball Detection in Soccer Videos Based on Color Segmentation and Shape Analysis", MULTIMEDIA CONTENT ANALYSIS AND MINING; [LECTURE NOTES IN COMPUTER SCIENCE;;LNCS], SPRINGER BERLIN HEIDELBERG, BERLIN, HEIDELBERG, vol. 4577, 30 June 2007 (2007-06-30), pages 416 - 425, XP019064709, ISBN: 978-3-540-73416-1 *
YU HUANG ET AL: "Variable Number of Informative Particles for Object Tracking", MULTIMEDIA AND EXPO, 2007 IEEE INTERNATIONAL CONFERENCE ON, IEEE, PI, 1 July 2007 (2007-07-01), pages 1926 - 1929, XP031124028, ISBN: 978-1-4244-1016-3 *
YU-WEN HUANG ET AL: "Simple and effective algorithm for automatic tracking of a single object using a pan-tilt-zoom camera", MULTIMEDIA AND EXPO, 2002. ICME '02. PROCEEDINGS. 2002 IEEE INTERNATIO NAL CONFERENCE ON LAUSANNE, SWITZERLAND 26-29 AUG. 2002, PISCATAWAY, NJ, USA,IEEE, US, vol. 1, 26 August 2002 (2002-08-26), pages 789 - 792, XP010604487, ISBN: 978-0-7803-7304-4 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9020259B2 (en) 2009-07-20 2015-04-28 Thomson Licensing Method for detecting and adapting video processing for far-view scenes in sports video
WO2011011059A1 (fr) * 2009-07-21 2011-01-27 Thomson Licensing Procédé fondé sur une trajectoire pour détecter et améliorer un objet mobile dans une séquence vidéo
US20150146007A1 (en) * 2013-11-26 2015-05-28 Honeywell International Inc. Maintenance assistant system
US9740935B2 (en) * 2013-11-26 2017-08-22 Honeywell International Inc. Maintenance assistant system

Similar Documents

Publication Publication Date Title
US10782688B2 (en) Method, control apparatus, and system for tracking and shooting target
US10372970B2 (en) Automatic scene calibration method for video analytics
D’Orazio et al. A review of vision-based systems for soccer video analysis
US10402987B2 (en) Methods and systems of determining object status for false positive removal in object tracking for video analytics
US10268895B2 (en) Methods and systems for appearance based false positive removal in video analytics
US8326042B2 (en) Video shot change detection based on color features, object features, and reliable motion information
JP5492087B2 (ja) Content-based image adjustment
KR101463085B1 (ko) Method and apparatus for detecting an object of interest in a soccer video by color segmentation and shape analysis
US20070294716A1 (en) Method, medium, and apparatus detecting real time event in sports video
EP2457214B1 (fr) Method for detecting and adapting video processing for far-view scenes in sports video
US20030081836A1 (en) Automatic object extraction
US20180144476A1 (en) Cascaded-time-scale background modeling
CA2720900A1 (fr) System and method for enhancing the visibility of an object in a digital image
WO2010083235A1 (fr) Image processing system and object tracking method
CN107358141B (zh) Data recognition method and device
US20190130586A1 (en) Robust sleeping object detection in video analytics
KR20080105387A (ko) Sports video summarization method and apparatus
US8311269B2 (en) Blocker image identification apparatus and method
WO2007045001A1 (fr) Preprocessing of video game sequences transmitted over mobile networks
WO2009067170A1 (fr) Estimating an object location in video
EP2277142A1 (fr) System and method for enhancing the visibility of an object in a digital image
CN114745550A (zh) Online class-inspection method and apparatus based on ROI coding
US20110026606A1 (en) System and method for enhancing the visibility of an object in a digital picture
Xu et al. A reliable and unobtrusive approach to display area detection for imperceptible display camera communication
CN107194954B (zh) Player tracking method and device for multi-view video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08851180

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08851180

Country of ref document: EP

Kind code of ref document: A1