WO2008040945A1 - A method of identifying a measure of feature saliency in a sequence of images - Google Patents

A method of identifying a measure of feature saliency in a sequence of images

Info

Publication number: WO2008040945A1
Application number: PCT/GB2007/003707
Authority: WIPO (PCT)
Prior art keywords: feature, motion, salient, predicted, identifying
Other languages: French (fr)
Inventors: Peter Cheung, Christos Bouganis, Yang Liu
Original Assignee: Imperial Innovations Limited
Application filed by Imperial Innovations Limited
Publication of WO2008040945A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]


Abstract

A method is provided of identifying feature saliency in a sequence of images. The method comprises extracting a corresponding feature from first and second images in the sequence, calculating an actual motion of the feature, calculating a predicted motion of the feature, and identifying the feature saliency as a function of the difference between the actual and predicted motion.

Description

A METHOD OF IDENTIFYING A MEASURE OF FEATURE SALIENCY IN A
SEQUENCE OF IMAGES
FIELD OF THE INVENTION
This invention relates to the field of video processing. Particularly, the present invention is related to a method and an apparatus for creating the temporal saliency map of a sequence of video frames.
BACKGROUND OF INVENTION
It is well known that the Human Visual System (HVS) has the ability to capture a vast amount of visual information but only a fraction of this reaches higher levels of processing in the brain. This is due to the human's remarkable ability to focus attention only on the parts of the image that are more "interesting" (or more salient) than the other parts, without requiring any attentional effort. These points (or features) of the image that attract human attention are called salient points. Initial research focused on spatial saliency, the detection of salient points in a still image. Recently, some work has used temporal (i.e. changes over time) as well as spatial information to derive the salient points, using so-called spatiotemporal saliency models. From such a map (known as a saliency map), regions of significance can be identified.
However, the existing work on temporal saliency detection is based on a direct extension of the spatial saliency detection model to the time domain or on the concept of motion contrast. Using the idea of motion contrast, temporal salient points are those points (or features) that exhibit motion that is different from the average motion in the image or from the motion in the point's neighbourhood. These existing methods for deriving the spatiotemporal saliency map are still founded mostly on spatial properties of features and do not fully exploit the available time domain information. Nor do they emulate sufficiently the way that the HVS handles the temporal information.
SUMMARY OF INVENTION
The invention is set out in the claims. The invention allows for determining the salient regions in a sequence of video frames based on both spatial and temporal information in the video sequence. Because of the provision of steps such as 1) the processing of spatial information of individual frames in order to extract a plurality of features; 2) a collection of processing modules that provide predictions on the motion of each of the said features in future frames; and 3) a set of combining modules that determine the degree of saliency of the said features based on the degree of unpredictability of the motions of the said features, and/or by selecting different types of prediction modules, the present invention can be applied to different application domains with different saliency characteristics. By changing the parameters in the prediction modules in a time-varying fashion, the present invention can be used to adapt to changing conditions.
The present invention provides a new way of exploiting both the spatial and temporal information of such feature points and presents a general framework for creating a temporal saliency map that is both flexible (i.e. coping with different applications) and adaptable (i.e. adapting to a time-varying environment).
Applications for the present invention include, but are not limited to, the field of communications, where the non-salient parts of the image can be compressed more heavily than the salient parts; the field of computer graphics, where high-fidelity selective rendering can be achieved based on the output of the saliency model; video quality estimation; and the field of intelligent surveillance systems, where only the vital regions are processed or recorded, thus saving both processing time and storage.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of the preferred embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.
Figure 1 is a diagram that shows an overview of the system. Figure 2 is a diagram showing the architecture of the system. Figure 3 is a diagram showing one embodiment of the invention.
Overview
Figure 1 depicts aspects of the present approach. The input to the system is the pixel data from a frame of the video sequence (101). Features such as corners, lines and defined shapes are extracted from the image frame in (102). The coordinates of the features are stored and are compared with the coordinates of the respective features from one or more previous frames to determine the actual motion of each feature in (103). The result is the calculated motion. A predictor is used to predict the motion of each feature in (104). This is carried out from one or more preceding frames in which the feature is identified and its future motion predicted using a model, for example, assuming a predetermined motion behaviour such as linear motion. The actual and the predicted motions are compared and their differences computed using a distance measure (105) to produce a numerical value that represents the degree of saliency for each feature (106). If the degree of saliency is larger than a predetermined or computed threshold, the feature is deemed salient (i.e. significant), otherwise it is deemed non-salient (i.e. not significant). In this way, the degree of saliency is a function of the difference between the predicted and the actual feature motion.
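To make the data flow of Figure 1 concrete, the Python sketch below (not part of the patent; the helper names extract_features, track_features, motion_distance and the predictor interface are hypothetical placeholders for blocks 102 to 105) shows how one per-frame saliency decision could be organised:

```python
import numpy as np

def frame_saliency(prev_frame, curr_frame, predictor, threshold,
                   extract_features, track_features, motion_distance):
    """Hypothetical per-frame pass corresponding to blocks 102-106 of Figure 1."""
    features = extract_features(prev_frame)                 # block 102: e.g. corner detection
    new_positions = track_features(prev_frame, curr_frame, features)  # block 103: tracking
    actual_motion = new_positions - features                 # the "calculated motion"
    predicted_motion = predictor.predict(features)           # block 104: e.g. linear-motion model
    # block 105: distance between actual and predicted motion, one value per feature
    saliency = np.array([motion_distance(a, p)
                         for a, p in zip(actual_motion, predicted_motion)])
    salient_mask = saliency > threshold                      # block 106: threshold decision
    return saliency, salient_mask
```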
In an initialization step (not shown), and every n frames, the system performs detection of "interesting features" in the current frame. The term "interesting features" can include low-level features, e.g. corners, or high-level features, e.g. faces or textures, and can be predetermined or user selected from an appropriate menu. This is performed using existing feature selection or object detection algorithms of the type identified above. Information regarding the coordinates of the detected features is stored in a "pool" of features, for example as a table stored in computer memory. The "pool" of features is updated every n frames in order for new features to be detected by the system. The parameter n can be set according to the scene's activity (the more rapidly the scene changes, the smaller the value of n, increasing the update frequency), user selected, or predetermined. The current system utilizes two realizations of predictors. The first one is realized through the "short-term" temporal saliency map module, which is responsible for the prediction of each feature's motion in the next frame. The second one is realized through the "long-term" temporal saliency map module, which is responsible for detecting any periodicity in the feature's motion.
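One possible, assumed reading of the "pool" of features described above is a simple in-memory table keyed by feature index and refreshed every n frames; the names FeatureRecord and FeaturePool below are illustrative only:

```python
from dataclasses import dataclass, field

@dataclass
class FeatureRecord:
    """One row of the feature 'pool': coordinates plus the saliency-related quantities."""
    position: tuple                                 # (x, y) in the current frame
    history: list = field(default_factory=list)     # last N positions (long-term module)
    dissimilarity: float = 0.0                      # short-term unpredictability measure
    periodicity: float = 0.0                        # long-term periodicity measure

class FeaturePool:
    """Table of detected features, refreshed every n frames."""
    def __init__(self, refresh_interval_n):
        self.n = refresh_interval_n                 # re-detect features every n frames
        self.records = {}                           # feature id -> FeatureRecord

    def needs_refresh(self, frame_index):
        return frame_index % self.n == 0
```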
In more detail, the "short-term" temporal saliency map module detects the features that exhibit discrepancies between their calculated motion and their predicted motion. At the core of the module is an array of predictors, one for each feature. The predictors are responsible for predicting the position of each feature in the next frame. The module stores in the "pool" or table of features information regarding each feature's dissimilarity measure between its calculated and predicted motion.
The "long-term" saliency map module detects features that undergo a periodic motion. The module stores in the "pool" or table of features information regarding the detected degree of periodicity in the features' motion.
Finally, the system uses the information stored in the "pool" or table to produce the temporal saliency map. The module combines the information regarding the predictability about the motion of a feature and the degree of periodicity in its motion that has been detected by the system in order to produce the temporal saliency map.
System realization
This section describes a realization of the present temporal attention model. In the following description some details are set forth, such as the type of motion to be predicted, the specific algorithms used for feature detection and tracking in consecutive frames etc., in order to provide a thorough understanding of how this invention works. However, it will be obvious to one skilled in the art that the present invention may be practiced without these specific details and/or using alternative implementational approaches that will be familiar to the skilled reader.
In one possible approach, it is assumed that features that undergo a linear motion are classified as non-salient, while features with more complicated motion types are classified as salient. The motivation behind this comes from the observation that the human brain can predict linear motion more easily than any other type of motion. However, alternative approaches and assumptions can of course be adopted.
The present realization focuses on the low-level features which are identified every n frames. Shi and Tomasi's algorithm [1] has been adopted for feature detection. This algorithm is chosen because a) it detects the corners in an image with robustness and b) the algorithm has been developed to maximize the quality of tracking, which is important for the subsequent steps of the proposed system.
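Shi and Tomasi's detector [1] is available, for example, as goodFeaturesToTrack in OpenCV; a minimal sketch of the every-n-frames detection step (the parameter values are illustrative assumptions, not taken from the patent) could be:

```python
import cv2
import numpy as np

def detect_features(gray_frame, max_corners=200, quality=0.01, min_distance=7):
    """Shi-Tomasi corner detection [1] on a single grayscale frame."""
    corners = cv2.goodFeaturesToTrack(gray_frame,
                                      maxCorners=max_corners,
                                      qualityLevel=quality,
                                      minDistance=min_distance)
    # goodFeaturesToTrack returns an (N, 1, 2) float32 array, or None if nothing is found
    return np.empty((0, 2), np.float32) if corners is None else corners.reshape(-1, 2)
```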
The method described herein can be implemented by any appropriate computer processor or hardware processor, the results stored in any appropriate memory and displayed in any appropriate manner. In one embodiment, for example, an apparatus for performing the approach is shown in Fig. 2 and includes a feature detection module 200, a feature tracker module 202, a memory module 204, a short term predictor module 206, a long term predictor module 208, a distance measure module to compute the degree of saliency 210 and an output module 212 which may be a display or interface for communication or download of the saliency or other data.
The steps performed can be understood with reference to Fig. 3. The calculation of the features' position in the new frame (k), 300 is performed (step 302) in the feature tracker module. The calculation is based on the pyramidal implementation of the iterative Lucas-Kanade optical flow algorithm [2]. According to this algorithm, the position of a feature in the new frame is calculated (step 304) in the lowest-resolution level within the pyramid and the result is propagated to the next finer level until the original resolution is reached. The output of the module is the position of each feature in the new frame. From this the calculated motion for this feature is computed simply and at step 306 stored in the "pool" or table in the memory module.
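The pyramidal Lucas-Kanade step [2] is likewise available in OpenCV as calcOpticalFlowPyrLK; the sketch below (window size and pyramid depth are illustrative assumptions) tracks the pooled features from the previous frame to the new frame k and returns the calculated motion:

```python
import cv2
import numpy as np

def track_features(prev_gray, curr_gray, prev_pts):
    """Pyramidal Lucas-Kanade tracking [2]: coarse-to-fine position estimate per feature."""
    prev = prev_pts.reshape(-1, 1, 2).astype(np.float32)
    curr, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev, None,
        winSize=(15, 15), maxLevel=3)               # 3 pyramid levels, coarse to fine
    curr = curr.reshape(-1, 2)
    ok = status.ravel() == 1                        # keep only successfully tracked features
    calculated_motion = curr - prev_pts.reshape(-1, 2)   # per-feature displacement
    return curr, calculated_motion, ok
```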
To implement feature tracking at step 310, the implementation of the system described here is based on the non-salient assumption of linearity; thus the Kalman filter [3] is used as the prediction instrument at step 314 to determine short-term temporal saliency in the short term prediction module, taking into account the feature coordinates in frame R (312) stored in the pool of features 306. The measurements for the Kalman filters are the coordinates of the features in the previous frame. The output of the filter is a prediction for the position of the features in the current frame (308a, 308b). The state vector is composed of four variables: the two components of a feature's position in the image, and the two components of its velocity. The state vector is denoted by

$$\mathbf{x} = \begin{bmatrix} x & y & x' & y' \end{bmatrix}^T \qquad (1)$$

where x and y are the coordinates of the feature, and x' and y' are the velocity components in the x and y directions respectively.
The transition matrix from one state to another is denoted by F. The values in the matrix guarantee that the model follows a linear motion with constant velocity. Finally, the measurement matrix is denoted by H. The entries of the matrix denote that the only measurements that are used by the system are the coordinates of the features. The values for F and H matrices are shown below:
$$F = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$
The covariance matrices for the model noise and the measurement noise are assumed to be diagonal. The measurements Zk denote the coordinates of a feature at time k. The Kalman filter has two stages: the prediction and the update stage. During the prediction stage, the coordinates in the next time step of each feature are predicted using the "project state" and "project covariance" equations, where the Kalman filter parameters are updated using the "Kalman gain", "update estimate" and "update covariance" equations. Table 1 summarizes the notation used in the Kalman filters, where Table 2 summarizes the aforementioned equations.
Table 1. Kalman filter's symbol definition
(The symbol table is an image in the source. The symbols used in Table 2 are: F, the state transition matrix; H, the measurement matrix; Q, the model noise covariance; R, the measurement noise covariance; P, the estimation error covariance; K, the Kalman gain; x̂, the state estimate, where a superscript minus denotes the predicted (prior) value; and Z_k, the measurement, i.e. the feature coordinates at time k.)
Table 2. Kalman filter's update and prediction equations
Kalman gain: $K_k = P_k^{-} H^T \left( H P_k^{-} H^T + R \right)^{-1}$
Update estimate: $\hat{x}_k = \hat{x}_k^{-} + K_k \left( Z_k - H \hat{x}_k^{-} \right)$
Update covariance: $P_k = \left( I - K_k H \right) P_k^{-}$
Project state: $\hat{x}_k^{-} = F \hat{x}_{k-1}$
Project covariance: $P_k^{-} = F P_{k-1} F^T + Q$
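A minimal NumPy sketch of the per-feature constant-velocity Kalman filter defined by the F and H matrices and the Table 2 equations is given below; the diagonal noise covariances Q and R use illustrative values, since the patent only assumes they are diagonal:

```python
import numpy as np

class ConstantVelocityKalman:
    """Per-feature Kalman filter [3] with state [x, y, x', y'] and a linear-motion model."""
    F = np.array([[1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)

    def __init__(self, x0, y0, q=1e-2, r=1e-1):
        self.x = np.array([x0, y0, 0.0, 0.0])   # state estimate
        self.P = np.eye(4)                       # estimation error covariance
        self.Q = q * np.eye(4)                   # model (process) noise, diagonal
        self.R = r * np.eye(2)                   # measurement noise, diagonal

    def predict(self):
        # Project state and project covariance (Table 2)
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x.copy(), self.P.copy()

    def update(self, z):
        # Kalman gain, update estimate, update covariance (Table 2)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x.copy(), self.P.copy()
```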
The system uses the prediction regarding the features' positions that is provided by the Kalman filter to calculate the predicted velocity of each feature. The predicted velocity is stored as the predicted motion of the feature in the memory module and is employed by the system for the saliency detection. The amount of information that we learn from the new measurement about the velocity of the feature is used as a measure of unpredictability. This is achieved at step 316 by measuring the distance between the probability distributions of the feature's predicted velocities before and after the measurement. To quantify the distance between the two distributions, the Bhattacharyya distance [4] is applied. Under the Kalman filter formulation, the variables in the state vector are assumed to follow a Normal probability distribution. The Bhattacharyya distance between two Gaussian distributions q and p can be expressed as follows:
$$D_B(p, q) = \sum_{i} \left[ \frac{1}{4} \ln\!\left( \frac{1}{4}\left( \frac{\sigma_{p,i}^2}{\sigma_{q,i}^2} + \frac{\sigma_{q,i}^2}{\sigma_{p,i}^2} + 2 \right) \right) + \frac{1}{4}\, \frac{(\mu_{p,i} - \mu_{q,i})^2}{\sigma_{p,i}^2 + \sigma_{q,i}^2} \right]$$
Here q and p denote the prior and posterior distributions for the velocity of the features. The variable $\sigma_{p,i}^2$ denotes the variance of the i-th component of the velocity in the p distribution, and $\mu_{p,i}$ denotes the corresponding mean. The mean is calculated using the Kalman filter's prediction, while the variance is extracted from the estimation error covariance matrix P.
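Under the diagonal-covariance assumption the distance reduces to a sum over the velocity components; a small helper (a sketch, with the prior q taken from the Kalman prediction and the posterior p from the updated estimate, as described above) might be:

```python
import numpy as np

def bhattacharyya_gaussian(mu_q, var_q, mu_p, var_p):
    """Bhattacharyya distance [4] between two Gaussians with diagonal covariance,
    summed over the velocity components (prior q vs. posterior p)."""
    mu_q, var_q = np.asarray(mu_q, float), np.asarray(var_q, float)
    mu_p, var_p = np.asarray(mu_p, float), np.asarray(var_p, float)
    term_var = 0.25 * np.log(0.25 * (var_p / var_q + var_q / var_p + 2.0))
    term_mean = 0.25 * (mu_p - mu_q) ** 2 / (var_p + var_q)
    return float(np.sum(term_var + term_mean))
```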
The short-term saliency system has the ability to cope with both fast and slow motions. This is achieved by updating the appropriate predictor in the array of predictors with the coordinates of the feature according to its average speed. Using this motion interpolation, slow and fast moving features are treated equally well by the system.
The long-term saliency map is based on the periodicity in the motion of a feature. The system maintains a history of N frames regarding the features' position (318). Their periodicity, if any, is detected in the long-term prediction module 320 through the use of the autocorrelation function R(t), where t denotes the lag.
$$R(t) = \frac{E\!\left[ (X_k - m)^T (X_{k+t} - m) \right]}{\sigma_x \sigma_y}$$
The vector X denotes the position of a feature in the image, and the vector m denotes its mean value. The variable σx denotes the standard deviation of the feature's position in the x direction, and the variable σy denotes the standard deviation in the y direction. The long-term periodicity value 322 is stored in the pool 306 in the memory module (R(t) is stored for different values of t).
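One way to realise the normalised autocorrelation R(t) over the stored N-frame position history is sketched below; the rule of reporting the largest R(t) over the candidate lags as the periodicity value is an assumption, since the patent only requires a degree-of-periodicity value per feature:

```python
import numpy as np

def periodicity_measure(history, max_lag):
    """Normalised autocorrelation R(t) of a feature's (x, y) trajectory over N frames.
    Returns the largest R(t) for 1 <= t <= max_lag as the periodicity value."""
    X = np.asarray(history, dtype=float)          # shape (N, 2): positions over N frames
    m = X.mean(axis=0)                            # mean position vector m
    sx, sy = X[:, 0].std(), X[:, 1].std()         # sigma_x, sigma_y
    if sx == 0 or sy == 0:
        return 0.0                                # stationary feature: no periodicity
    dev = X - m
    N = len(X)
    R = []
    for t in range(1, min(max_lag, N - 1) + 1):
        # E[(X_k - m)^T (X_{k+t} - m)] estimated over the available frame pairs
        num = np.mean(np.sum(dev[:N - t] * dev[t:], axis=1))
        R.append(num / (sx * sy))
    return float(max(R)) if R else 0.0
```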
In the last stage 324, the information regarding the features' position, the level of periodicity that they exhibit, and their dissimilarity measures between the prior and posterior distribution of the velocity are combined together in the saliency module 326. The features that have large values in their dissimilarity measure, implying that they do not undergo a linear motion, i.e. that exceed a saliency threshold, give rise to salient regions. The threshold is a means of removing features that have a low degree of saliency. It may be set to zero, in which case the saliency map will simply be a 2-dimensional array of numbers representing how significant the scene is at each location. However, if the map is used to make decisions, such as whether a location is worth further processing and attention, or to display the locations of salient features on a monitor, then a threshold is set appropriate to the application. Any such features that are detected to undergo a periodic motion are removed from the temporal saliency map. The resulting salient map 328 is smoothed by a 2D Gaussian filter which produces the final temporal saliency map comprising a 2D array of numbers, one for each pixel location at the appropriate resolution, to indicate the degree of saliency (i.e. significance). The size of the Gaussian filter and its extent depend on the image resolution.
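A sketch of this final combination stage follows; the periodicity threshold, saliency threshold and Gaussian width are illustrative assumptions, the patent only requiring that periodic features are removed, a threshold applied where appropriate, and the map smoothed by a 2D Gaussian filter:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_saliency_map(shape, positions, dissimilarity, periodicity,
                       saliency_threshold=0.0, periodicity_threshold=0.8, sigma=5.0):
    """Combine short-term dissimilarity and long-term periodicity into the final
    temporal saliency map: one value per pixel location."""
    saliency_map = np.zeros(shape, dtype=float)
    for (x, y), d, p in zip(positions, dissimilarity, periodicity):
        if p >= periodicity_threshold:      # periodic features are removed from the map
            continue
        if d <= saliency_threshold:         # below-threshold features are non-salient
            continue
        r, c = int(round(y)), int(round(x))
        if 0 <= r < shape[0] and 0 <= c < shape[1]:
            saliency_map[r, c] = max(saliency_map[r, c], d)
    return gaussian_filter(saliency_map, sigma=sigma)   # 2D Gaussian smoothing
```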
It is important to note that the present invention can easily be adapted to detect saliency in different applications. Some example variations to the above embodiment are:
1) Different feature extraction methods can be used. For example, one could focus on extracting oval-like objects to detect faces, or objects having rectangular or other geometric or definable shapes;
2) Different predictors can be used. For example, the above embodiment uses a Kalman filter as the predictor to predict linear motion. As a result, features exhibiting motion that is not linear are detected as salient. Replacing the predictor with one that is good at predicting sinusoidal motion will eliminate wave-like motion from the saliency map. Furthermore, a plurality of predictors can be operated in parallel to eliminate a plurality of motion behaviours.
3) Parameters within the predictor can be varied in time. In this way, the invention can be used to cope with time-varying factors due to, say, a changing environment.
4) The resolution at which respective stages of the algorithm are performed can be varied.
In the above description of the present invention numerous specific details are set forth, such as the specific feature extraction method, the Kalman filter predictor, the Bhattacharyya distance measure etc., in order to provide a thorough understanding of the present invention. However, it will be obvious to one skilled in the art that the present invention may be practiced without these specific details. In other instances well known methods, functions, components and procedures have not been described in detail so as not to unnecessarily obscure the present invention. Furthermore, the present invention can easily be modified in such a way that different processing steps can be applied to the video frames presented at different resolutions.
[1] J.Shi and C. Tomasi, "Good features to track", in IEEE Conference on Computer Vision and Pattern Recognition, 1994
[2] J. Bouguet, "Pyramidal implementation of the Lucas-Kanade feature tracker. Description of the algorithm", 2001 (http://sourceforge.net/projects/opencvlibrary)
[3] G. Welch and G. Bishop, "An Introduction to the Kalman Filter", Technical Report, TR 95-041, Department of Computer Science, University of North Carolina at Chapel Hill.
[4] A. Djouadi, O. Snorrason and F. Garber, "The quality of training-sample estimates of the Bhattacharyya coefficient", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, pp. 92-97, 1990

Claims

1. A method of identifying feature saliency in a sequence of images comprising extracting a corresponding feature from first and second images in the sequence, calculating an actual motion of the feature, calculating a predicted motion of the feature and identifying the feature saliency as a function of the difference between the actual and predicted motion.
2. A method as claimed in claim 1 in which the extracted feature comprises one of a geometric feature, a colour feature or a texture.
3. A method as claimed in claim 1 or claim 2 in which the feature to be extracted is identified based on a predetermined characteristic in at least one of an initialization step and/or at repeated intervals in an image sequence.
4. A method as claimed in any preceding claim in which the predicted motion is predicted using a prediction model based on motion of the feature in one or more preceding images.
5. A method as claimed in any preceding claim in which the predicted motion is predicted using a prediction model identifying periodic motion.
6. A method as claimed in claim 4 or 5 in which a feature exhibiting predicted or periodic motion is excluded from being a salient feature.
7. A method as claimed in any preceding claim in which an extracted feature is stored together with its calculated actual and predicted motion in a store.
8. A method as claimed in any preceding claim in which the predicted motion for a feature is predicted using a prediction model specific to that feature.
9. A method as claimed in any preceding claim in which the extracted feature is extracted using an extraction process specific to that feature.
10. A method as claimed in any preceding claim in which a feature is identified as salient if the difference between actual and predicted motion exceeds a saliency threshold.
11. A method as claimed in any preceding claim further comprising constructing a map of identified salient features.
12. A method of compressing data comprising identifying salient features according to the method of any of claims 1 to 11 and discarding data relating to non salient features.
13. A method as claimed in claim 12 in which all non-salient feature data is discarded.
14. A method as claimed in claim 13 in which a proportion of non-salient feature data is discarded.
15. A method of processing multiple sequential images for display comprising identifying a salient feature according to the method of any of claims 1 to 12 and rendering a non-salient feature at a lower resolution for display.
16. A method of monitoring a sequence of images comprising identifying salient feature according to the method of any of claims 1 to 12 and storing only data relating to the salient feature.
17. A computer programme comprising a set of instructions configured to carry out the method steps of any of claims 1 to 16. (This method may be practised as a computer program in software. However, due to the high data rate and large amount of computation, it may also be partly implemented using hardware. Therefore this may be too restrictive.)
18. A computer arranged to operate according to a set of instructions as claimed in claim 17.
19. A video surveillance apparatus including the computer as claimed in claim 18.
PCT/GB2007/003707 2006-10-06 2007-09-28 A method of identifying a measure of feature saliency in a sequence of images WO2008040945A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB0619817.0A GB0619817D0 (en) 2006-10-06 2006-10-06 A method of identifying a measure of feature saliency in a sequence of images
GB0619817.0 2006-10-06

Publications (1)

Publication Number Publication Date
WO2008040945A1 true WO2008040945A1 (en) 2008-04-10

Family

ID=37454140

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2007/003707 WO2008040945A1 (en) 2006-10-06 2007-09-28 A method of identifying a measure of feature saliency in a sequence of images

Country Status (2)

Country Link
GB (1) GB0619817D0 (en)
WO (1) WO2008040945A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101697593A (en) * 2009-09-08 2010-04-21 武汉大学 Time domain prediction-based saliency extraction method
US8098886B2 (en) 2001-03-08 2012-01-17 California Institute Of Technology Computation of intrinsic perceptual saliency in visual environments, and applications
CN103020985A (en) * 2012-11-12 2013-04-03 华中科技大学 Video image saliency detection method based on field quantity analysis
CN103077536A (en) * 2012-12-31 2013-05-01 华中科技大学 Space-time mutative scale moving target detection method
CN103077534A (en) * 2012-12-31 2013-05-01 南京华图信息技术有限公司 Space-time multi-scale moving target detection method
US8649606B2 (en) 2010-02-10 2014-02-11 California Institute Of Technology Methods and systems for generating saliency models through linear and/or nonlinear integration
CN104778721A (en) * 2015-05-08 2015-07-15 哈尔滨工业大学 Distance measuring method of significant target in binocular image
CN113591708A (en) * 2021-07-30 2021-11-02 金陵科技学院 Meteorological disaster monitoring method based on satellite-borne hyperspectral image

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1225769A2 (en) * 2001-01-17 2002-07-24 Tektronix, Inc. Spatial temporal visual attention model for a video frame sequence
US20040086046A1 (en) * 2002-11-01 2004-05-06 Yu-Fei Ma Systems and methods for generating a motion attention model
WO2004043054A2 (en) * 2002-11-06 2004-05-21 Agency For Science, Technology And Research A method for generating a quality oriented significance map for assessing the quality of an image or video
WO2006072637A1 (en) * 2005-01-10 2006-07-13 Thomson Licensing Device and method for creating a saliency map of an image

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1225769A2 (en) * 2001-01-17 2002-07-24 Tektronix, Inc. Spatial temporal visual attention model for a video frame sequence
US20040086046A1 (en) * 2002-11-01 2004-05-06 Yu-Fei Ma Systems and methods for generating a motion attention model
WO2004043054A2 (en) * 2002-11-06 2004-05-21 Agency For Science, Technology And Research A method for generating a quality oriented significance map for assessing the quality of an image or video
WO2006072637A1 (en) * 2005-01-10 2006-07-13 Thomson Licensing Device and method for creating a saliency map of an image

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHIA-CHIANG HO ET AL: "A user-attention based focus detection framework and its applications", INFORMATION, COMMUNICATIONS AND SIGNAL PROCESSING, 2003 AND FOURTH PACIFIC RIM CONFERENCE ON MULTIMEDIA. PROCEEDINGS OF THE 2003 JOINT CONFERENCE OF THE FOURTH INTERNATIONAL CONFERENCE ON SINGAPORE 15-18 DEC. 2003, PISCATAWAY, NJ, USA,IEEE, 15 December 2003 (2003-12-15), pages 1315 - 1319, XP010702992, ISBN: 0-7803-8185-8 *
WIXSON L ET AL: "Detecting salient motion by accumulating directionally-consistent flow", COMPUTER VISION, 1999. THE PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON KERKYRA, GREECE 20-27 SEPT. 1999, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, vol. 2, 20 September 1999 (1999-09-20), pages 797 - 804, XP010350486, ISBN: 0-7695-0164-8 *
YANG LIU ET AL: "A Spatiotemporal Saliency Framework", IMAGE PROCESSING, 2006 IEEE INTERNATIONAL CONFERENCE ON, IEEE, PI, October 2006 (2006-10-01), pages 437 - 440, XP031048667, ISBN: 1-4244-0480-0 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8098886B2 (en) 2001-03-08 2012-01-17 California Institute Of Technology Computation of intrinsic perceptual saliency in visual environments, and applications
CN101697593A (en) * 2009-09-08 2010-04-21 武汉大学 Time domain prediction-based saliency extraction method
US8649606B2 (en) 2010-02-10 2014-02-11 California Institute Of Technology Methods and systems for generating saliency models through linear and/or nonlinear integration
CN103020985A (en) * 2012-11-12 2013-04-03 华中科技大学 Video image saliency detection method based on field quantity analysis
CN103077536A (en) * 2012-12-31 2013-05-01 华中科技大学 Space-time mutative scale moving target detection method
CN103077534A (en) * 2012-12-31 2013-05-01 南京华图信息技术有限公司 Space-time multi-scale moving target detection method
CN103077534B (en) * 2012-12-31 2015-08-19 南京华图信息技术有限公司 Spatiotemporal object moving target detecting method
CN103077536B (en) * 2012-12-31 2016-01-13 华中科技大学 Space-time mutative scale moving target detecting method
CN104778721A (en) * 2015-05-08 2015-07-15 哈尔滨工业大学 Distance measuring method of significant target in binocular image
CN113591708A (en) * 2021-07-30 2021-11-02 金陵科技学院 Meteorological disaster monitoring method based on satellite-borne hyperspectral image
CN113591708B (en) * 2021-07-30 2023-06-23 金陵科技学院 Meteorological disaster monitoring method based on satellite-borne hyperspectral image

Also Published As

Publication number Publication date
GB0619817D0 (en) 2006-11-15

Similar Documents

Publication Publication Date Title
EP1975879B1 (en) Computer implemented method for tracking object in sequence of frames of video
WO2008040945A1 (en) A method of identifying a measure of feature saliency in a sequence of images
JP4208898B2 (en) Object tracking device and object tracking method
CN109272509B (en) Target detection method, device and equipment for continuous images and storage medium
KR101643672B1 (en) Optical flow tracking method and apparatus
CN107256225B (en) Method and device for generating heat map based on video analysis
EP2378485B1 (en) Moving object detection method and moving object detection apparatus
JP4699564B2 (en) Visual background extractor
US7822275B2 (en) Method for detecting water regions in video
JP2978406B2 (en) Apparatus and method for generating motion vector field by eliminating local anomalies
EP2352128B1 (en) Mobile body detection method and mobile body detection apparatus
CN108182695B (en) Target tracking model training method and device, electronic equipment and storage medium
CN106875426B (en) Visual tracking method and device based on related particle filtering
CN110827320B (en) Target tracking method and device based on time sequence prediction
Mahmoudi et al. Multi-gpu based event detection and localization using high definition videos
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN111914756A (en) Video data processing method and device
CN102314591B (en) Method and equipment for detecting static foreground object
CN107346547B (en) Monocular platform-based real-time foreground extraction method and device
KR101799143B1 (en) System and method for estimating target size
CN113542868A (en) Video key frame selection method and device, electronic equipment and storage medium
Okarma et al. A fast image analysis technique for the line tracking robots
Kim et al. Video object segmentation and its salient motion detection using adaptive background generation
KR20210132998A (en) Apparatus and method tracking object in image fames based on neural network
CN113807354A (en) Image semantic segmentation method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07823965

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07823965

Country of ref document: EP

Kind code of ref document: A1