WO2010070128A1 - Method for multi-resolution motion estimation - Google Patents


Info

Publication number
WO2010070128A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion
estimation
level
parameters
dominant
Application number
PCT/EP2009/067589
Other languages
French (fr)
Inventor
Fabrice Urban
Olivier Le Meur
Edouard Francois
Original Assignee
Thomson Licensing
Application filed by Thomson Licensing
Publication of WO2010070128A1

Classifications

    • H04N19/53 Multi-resolution motion estimation; Hierarchical motion estimation
    • G06T7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
    • G06T7/223 Analysis of motion using block-matching
    • H04N19/56 Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T2207/20021 Dividing image into blocks, subimages or windows

Definitions

  • the invention relates to a multi-resolution motion estimation method. It applies notably to the domains of video analysis, coding and transcoding.
  • a video sequence comprises by its nature a high statistical redundancy both in the temporal and spatial domains. This redundancy can be used on the one hand to compress said sequence and on the other in order to analyse and characterize its content in identifying, for example, the areas in motion of images of said sequence.
  • the motion estimation algorithms search for the block or the area in the reference images that best corresponds to a given block or area of the image being processed, said image being referred to as the current image in the remainder of the description.
  • a motion estimation vector is obtained, said vector corresponding to the displacement of the block or the area between two images.
  • BMA block matching type algorithms
  • the current image is divided into blocks of MxN pixels.
  • the BMA algorithm searches for, for a given block of the current image, a corresponding block in a reference image.
  • a measurement distance D is calculated between the current image block and each candidate.
  • An example of measurement distance D using a Lagrangian is described in the article by G. Sullivan and T. Wiegand entitled 'Rate-Distortion Optimization for Video Compression', IEEE Signal Processing Magazine, pp. 74-90, November 1998. Lagrangian optimization improves the homogeneity of the motion field obtained by BMA.
  • the simplest version of the BMA algorithm carries out a complete search in a given window with a width of p pixels, that is to say that each reference image block present inside said window is a candidate to be considered.
  • This technique requires a significant computing power.
  • faster algorithms have been proposed, such as the hierarchical HME (Hierarchical Motion Estimator) model or the improved HDS (Hierarchical Diamond Search) model.
  • the BMA type algorithms thus enable a motion field to be generated composed of motion vectors, a vector being associated with each of the blocks analysed.
  • the purpose of the DME (Dominant Motion Estimator) type algorithms is to estimate the motion of the background of the images of the video sequence. This motion is due, for example, to camera movements, zoom effects or a panning shot.
  • the algorithm takes as input motion vectors resulting from, for example, a BMA estimation, and then estimates the parameters of a motion model, for example a two-dimensional affine model.
  • the reliability of motion vectors estimated via a BMA type algorithm is usually poor.
  • these vectors do not necessarily correspond to a real motion.
  • incoherent results can thus be obtained.
  • the homogeneous areas corresponding to the dominant motion are thus not detected.
  • the overall motion estimation only uses a reduced number of correct motion vectors. As a consequence, the precision of results is not good.
  • One purpose of the invention is notably to overcome the aforementioned disadvantages.
  • the object of the invention is a motion estimation method for a video sequence in which the images are divided into blocks of pixels, the motion estimation being carried out by analysing N versions of the same image corresponding to different resolution levels, starting with the lowest resolution level and ending with the highest resolution level of the current image.
  • a motion field estimation is carried out for the different resolution levels and the dominant motion parameters are estimated for at least one low or medium resolution level, said parameters being used as predictions for the estimation of the motion field of a higher resolution level.
  • the dominant motion parameters estimated for a given level are stored in order to be used as predictions during the motion field estimation of the image or images following the current image, at the same resolution level.
  • the motion field vectors of a given resolution level can be used, for example, as predictions for the motion field estimation of a higher resolution level.
  • the dominant motion parameters estimated for a given resolution level are, for example, stored in order to initialise the dominant motion parameter estimation step for the image or images following the current image, at the same resolution level.
  • the dominant motion parameters follow a two-dimensional affine model.
  • for the low and medium resolution levels, a translation parameter is estimated, and for the highest resolution levels, 6 parameters of a two-dimensional affine model are determined.
  • SAD is the sum of absolute differences between the current block and the reference block
  • C is the motion vector coding cost, that is to say the distance measured between the motion vector and a cost indicator; λ is a real constant.
  • the cost indicator corresponds to the median of motion vectors of neighbouring blocks.
  • the cost indicator corresponds to a prediction corresponding to the dominant motion estimation parameters.
  • the choice between a cost indicator corresponding to the median of motion vectors of neighbouring blocks and a cost indicator corresponding to the dominant motion estimation parameters is made per block, for example according to the best motion vector prediction.
  • the algorithm carrying out the dominant motion estimation at a given resolution level is initialised by the dominant motion parameters estimated for the current image at a lower resolution level.
  • a confidence level of the motion estimation carried out on the current image is determined, for example, by calculating the proportion of vectors that follow the dominant motion at the highest resolution level.
  • figure 1 illustrates the principle of multi-resolution motion estimation
  • figure 2 provides an example diagram implementing the method according to the invention
  • figure 3 presents a way of carrying out the dominant motion estimation in the context of the invention.
  • Figure 1 illustrates the principle of multi-resolution motion estimation.
  • the BMA type algorithms as described previously involve a high computational complexity. To perform motion estimation on a video sequence, this type of algorithm should therefore be used judiciously.
  • motion prediction techniques take the content of video sequences into account.
  • the motion fields usually present spatial and temporal continuity properties.
  • a set of predictions is then available.
  • a prediction corresponds to a candidate vector representing the motion of a block between two images and having to be tested in order to verify that it indeed corresponds to the real motion of said block.
  • Each prediction is evaluated by calculating, for example, a measurement distance D.
  • this measurement distance could be the sum of absolute differences, designated by the abbreviation SAD (Sum of Absolute Differences).
  • SAD Sum of Absolute Differences
  • This SAD represents the distortion between the current block and the reference block.
  • the motion vector coding cost C can be taken into account by introducing a Lagrange coefficient λ in order to minimize the distortions introduced by the estimation.
  • the distance D can be described by the following expression: $D = \mathrm{SAD} + \lambda \cdot C$ (1)
  • a search for the best motion vector is then carried out in the neighbourhood of the best prediction using, for example, a local search pattern.
  • An example of an algorithm performing this type of search is described in the article by Alexis Michael Tourapis, "Enhanced Predictive Zonal Search for Single and Multiple Frame Motion Estimation", Proceedings of Visual Communications and Image Processing, pages 1069-1079, 2002. Numerous other BMA type algorithms exist; they differ in the way the set of predictions is determined for a block and in the local search pattern selected.
  • a way of enabling a reduction in the calculation complexity is to use a multi-resolution approach.
  • the HME (Hierarchical Motion Estimator) algorithm is an example.
  • a pyramid of images is derived from the current image. This pyramid is composed of several images derived from the current image, each of them representing a search level.
  • the level 0 corresponds to the current image at full resolution.
  • a low or medium resolution level is any level other than level 0, the latter being the highest resolution level of the pyramid of images.
  • the n+1 level corresponds to the image obtained by low-pass filtering and subsampling of the n level image.
  • the n+1 level image has therefore a lower resolution than the n level image.
  • a motion field is estimated on the highest level, that is to say on the lowest resolution image.
  • this motion field is then improved using the motion field vectors obtained at the higher level as predictions, and so on down the levels of the image pyramid until level 0 is reached.
  • the motion vectors of neighbouring blocks that have already been calculated are also used as predictions.
  • the estimation is then refined by searching for the best motion vector around the best prediction.
  • the example of figure 1 illustrates the principle of multi-resolution motion estimation.
  • Three levels are considered.
  • Level 0 corresponds to the image to be analysed and for which the resolution is not reduced.
  • Levels 1 and 2 correspond to the image to be analysed at reduced resolution, the resolution of level 2 being lower than that of level 1.
  • the estimation process starts at the highest level, that is to say at level 2 for the example of figure 1.
  • the image is analysed block by block.
  • a refinement can be carried out so as to find the candidate 101 that best corresponds to the real motion of the block analysed.
  • a prediction 102 for the block being analysed 106 at level 1 can be the result of the motion estimation carried out for the same block but at the higher level 101.
  • the refinement of the search then leads to a more refined estimation 103.
  • the same principle is then reproduced at the level 0, with one of the predictions 104 corresponding to the result of the estimation at the higher level and a refinement enabling the final result 105 to be obtained.
  • the selection of the best prediction and of the final vector resulting from the refinement mentioned previously is carried out, for example, by calculating and comparing the distance D for each candidate vector.
  • An image memory 200 contains the pyramid of multi-resolution images associated with the current image as well as the reference image or images to be used for the motion estimation.
  • the current image pyramid 201 as well as the reference image pyramid or pyramids 202 are used to carry out the different estimations described hereafter.
  • a multi-resolution approach at 5 levels, indexed 0 to 4 is used.
  • a BMA estimation of the motion field is carried out for the low resolution images starting with the level 4 203, to then process the level 3 204, the level 2 205, the level 1 206 and the level 0 208.
  • the motion field vectors resulting from the estimation at level 1 are used as input to estimate the dominant motion parameters 207.
  • the dominant motion estimation is first calculated for a low resolution motion field, namely at level 1.
  • the dominant motion estimation can be carried out, for example, according to a two-dimensional affine model. In this case, the estimation consists in estimating the dominant motion parameters $a_0, a_1, a_2$ and $b_0, b_1, b_2$ satisfying, for each image block analysed, the equations:
$$v_x = a_0 + a_1 X + a_2 Y, \qquad v_y = b_0 + b_1 X + b_2 Y$$
  • $v_x$ and $v_y$ are the coordinates of a vector V of the motion field, and X and Y are the coordinates locating the block being processed, for which a dominant motion estimation is carried out.
  • the dominant motion parameters are then used to add a new prediction during the motion field estimation for the next resolution level.
  • This prediction is evaluated in the same way as the other predictions available for each block, for example by calculating the measurement distance D of expression (1).
  • the reliability of the motion field estimation is thus improved.
  • the term C of the expression (1 ) represents the motion vector coding cost, that is to say the distance measured between the motion vector and a cost indicator.
  • the median of motion vectors of neighbouring blocks is usually selected as the cost indicator. Taking the coding cost into account yields a more homogeneous motion field.
  • the sky, for example, is usually a homogeneous area.
  • with a prior-art motion estimation algorithm, a null motion is generally associated with this area, even in the presence of camera movement.
  • with the dominant motion estimation, the camera movement is identified and the sky area is constrained to follow this dominant motion, which best corresponds to the real motion.
  • the motion field estimation followed by the calculation of dominant motion parameters at each level leads to a recursive approach with a reasonable calculation complexity.
  • the dominant motion parameters of level 1 are stored in the memory
  • the dominant motion parameters estimated 207 at level 1 are moreover used as prediction for the estimation 208 of the motion field of level 0.
  • the best set of dominant motion parameters for the entire image is selected 212 among the result of the dominant motion estimation carried out at the higher level 207, the stored dominant motion parameters 210, or no parameters.
  • the motion field of level 1 is also used as prediction for the motion field estimation of level 0.
  • An overall estimation 209 of motion parameters is also carried out following the motion field estimation of level 0. To do this, the inputs used are the vector field of level 0, the overall motion parameters estimated 215 from the motion field of level 1, and the overall motion parameters of level 0 estimated during the analysis of the preceding image and stored in the memory 214.
  • results 217 are available following the analysis of an image belonging to a video sequence. It may be decided to have as output the motion field CM resulting from the field estimation carried out on the high resolution image. Moreover, the confidence level TC, as well as the dominant motion parameters MD estimated at level 0 can be presented at output and used for later processing operations.
  • the confidence level TC can be defined, for example, as the proportion of vectors that follow the dominant motion at level 0.
  • Figure 3 presents a way of carrying out the dominant motion estimation in the context of the invention.
  • the dominant motion parameters are estimated using a recursive weighted least squares algorithm.
  • the dominant motion estimation model can be adapted according to the resolution level. Thus, for medium and low resolutions, only a translation can be estimated, while for the highest resolutions, a 6-parameter affine model such as the one previously described can be used.
  • the estimation algorithm computes the values of the dominant motion parameters using the weighted least squares algorithm.
  • Three types of initial parameters can be used to initialise the algorithm. These three types of initialisation are called temporal initialisation, hierarchical initialisation and simple initialisation.
  • the main input of the dominant motion estimation algorithm is a motion field CM.
  • the parameters used for the temporal initialisation are the dominant motion parameters 214, 216 calculated for an image previously processed and stored 210, 211 in the memory.
  • the parameters used for the hierarchical initialisation, noted as initialisation 2 are the dominant motion parameters calculated for the current image at a lower resolution level 215.
  • an initialisation is calculated from all the vectors of the vector field CM using a non-weighted ordinary least squares algorithm 302. If the temporal initialisation parameters are available, an evaluation 300 is made. It is then verified 303 that the result 307 is reliable, that is to say that its number of "inliers" 309 (vectors that follow the dominant motion) is not below a threshold value; if it is below, the result is not considered reliable. If the result is reliable, an iteration of the weighted least squares algorithm is calculated 311.
  • the hierarchical parameters, when available from a higher level, are used for the initialisation.
  • An evaluation 301 of these parameters is carried out. As described previously, the reliability of the result is verified 304, 308, 310. If the result is reliable, an iteration 311 of the weighted least squares algorithm is then calculated. If the result is not reliable 306, a step 302 applies a simple least squares algorithm to the motion field calculated for the current level. The least squares algorithm is then executed recursively 311 with the initialisation previously described.
  • a coherence indicator TC and the dominant motion parameters MD are presented as results. The coherence of the dominant motion parameters is ensured by the temporal initialisation.
  • the initialisation by the recursive approach makes it possible to handle cases where the motion is not temporally constant and to reduce the number of iterations without affecting the final result, which accelerates the processing.
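The Figure 3 flow (temporal, then hierarchical, then simple initialisation, each checked by an inlier count, followed by weighted least squares iterations) can be sketched as follows. This is one possible reading, not the patent's implementation: the Cauchy weighting function and its scale `c`, the inlier tolerance `tol`, the minimum inlier count `min_inliers`, the fixed iteration count and all function names are hypothetical, since the text only specifies a weighted least squares algorithm with an inlier-based reliability test. Only the 6-parameter affine model is shown; the translation-only variant used at low and medium resolutions would estimate a0 and b0 alone.

```python
import math


def solve3(m, v):
    """Solve the 3x3 system m p = v (Gaussian elimination, partial pivoting)."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    p = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        p[r] = (a[r][3] - sum(a[r][c] * p[c] for c in range(r + 1, 3))) / a[r][r]
    return p


def weighted_ls(pos, field, w):
    """Weighted least-squares fit of vx = a0+a1*x+a2*y and vy = b0+b1*x+b2*y."""
    m = [[0.0] * 3 for _ in range(3)]
    rx, ry = [0.0] * 3, [0.0] * 3
    for (x, y), (vx, vy), wi in zip(pos, field, w):
        phi = (1.0, x, y)
        for i in range(3):
            rx[i] += wi * phi[i] * vx
            ry[i] += wi * phi[i] * vy
            for j in range(3):
                m[i][j] += wi * phi[i] * phi[j]
    return tuple(solve3(m, rx)) + tuple(solve3(m, ry))


def residuals(params, pos, field):
    """Euclidean distance between each vector and the affine prediction."""
    a0, a1, a2, b0, b1, b2 = params
    return [math.hypot(vx - (a0 + a1 * x + a2 * y), vy - (b0 + b1 * x + b2 * y))
            for (x, y), (vx, vy) in zip(pos, field)]


def count_inliers(params, pos, field, tol):
    """Number of vectors that follow the model within `tol` pixels."""
    return sum(r <= tol for r in residuals(params, pos, field))


def dominant_motion(pos, field, temporal=None, hierarchical=None,
                    tol=1.0, min_inliers=4, c=1.0, iterations=10):
    """Return (affine params, inlier ratio) for the dominant motion."""
    params = None
    # Temporal, then hierarchical initialisation, each kept only if reliable.
    for init in (temporal, hierarchical):
        if init is not None and count_inliers(init, pos, field, tol) >= min_inliers:
            params = init
            break
    if params is None:
        # Simple initialisation: unweighted least squares on all vectors.
        params = weighted_ls(pos, field, [1.0] * len(field))
    for _ in range(iterations):
        # Cauchy weights: vectors far from the current model count less.
        w = [1.0 / (1.0 + (r / c) ** 2) for r in residuals(params, pos, field)]
        params = weighted_ls(pos, field, w)
    return params, count_inliers(params, pos, field, tol) / len(field)
```

On a 5 × 5 grid of block positions whose vectors follow an affine motion exactly, except one outlier, the reweighting drives the outlier's weight towards zero, so the recovered parameters end up very close to the true ones and the inlier ratio is 24/25.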

Abstract

The object of the invention is a motion estimation method for a video sequence in which the images are divided into blocks of pixels, the motion estimation being carried out by analysing N versions of the same image corresponding to different resolution levels, starting with the lowest resolution level and ending with the highest resolution level of the current image. A motion field estimation (203, 204, 205, 206, 208) is carried out for the different resolution levels and the dominant motion parameters are estimated (207) for at least one low or medium resolution level, these parameters being used as predictions for the estimation of the motion field of a higher resolution level.

Description

METHOD FOR MULTI-RESOLUTION MOTION ESTIMATION
The invention relates to a multi-resolution motion estimation method. It applies notably to the domains of video analysis, coding and transcoding.
A video sequence comprises by its nature a high statistical redundancy both in the temporal and spatial domains. This redundancy can be used on the one hand to compress said sequence and on the other in order to analyse and characterize its content in identifying, for example, the areas in motion of images of said sequence. Thus, the motion estimation algorithms search for the block or the area in the reference images that best corresponds to a given block or area of the image being processed, said image being referred to as the current image in the remainder of the description. A motion estimation vector is obtained, said vector corresponding to the displacement of the block or the area between two images.
Today numerous applications require algorithms capable of analysing, in real time, the physical motion within a video sequence. To do this, "block matching" type algorithms, usually designated by the abbreviation BMA, can be used. In this case, the current image is divided into blocks of MxN pixels. For a given block of the current image, the BMA algorithm then searches for a corresponding block in a reference image. To do this, a measurement distance D is calculated between the current image block and each candidate. An example of measurement distance D using a Lagrangian is described in the article by G. Sullivan and T. Wiegand entitled 'Rate-Distortion Optimization for Video Compression', IEEE Signal Processing Magazine, pp. 74-90, November 1998. Lagrangian optimization improves the homogeneity of the motion field obtained by BMA.
The simplest version of the BMA algorithm carries out an exhaustive search in a given window with a width of p pixels, that is to say that each reference image block located inside this window is a candidate to be considered. This technique requires significant computing power. Faster algorithms have therefore been proposed, such as the hierarchical HME (Hierarchical Motion Estimator) model or the improved HDS (Hierarchical Diamond Search) model. The BMA type algorithms thus generate a motion field composed of motion vectors, one vector being associated with each block analysed.
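The exhaustive search described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: images are plain lists of rows of luminance values, the window is assumed to cover displacements from -p to +p pixels in each direction (the text only gives a width of p pixels), candidates falling outside the reference image are skipped, and the helper names `sad` and `full_search` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the size x size block at (bx, by)
    in `cur` and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            total += abs(cur[y][x] - ref[y + dy][x + dx])
    return total


def full_search(cur, ref, bx, by, size, p):
    """Exhaustive block matching: test every displacement in [-p, p]^2 that
    keeps the reference block inside the image; return ((dx, dy), SAD)."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if not (0 <= bx + dx and bx + dx + size <= w and
                    0 <= by + dy and by + dy + size <= h):
                continue  # candidate block would fall outside the reference image
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

The (2p + 1)² SAD evaluations per block are what make this approach expensive, which is the motivation for the faster hierarchical schemes mentioned above.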
The purpose of the DME (Dominant Motion Estimator) type algorithms is to estimate the motion of the background of the images of the video sequence. This motion is due, for example, to camera movements, zoom effects or a panning shot. The algorithm takes as input motion vectors resulting from, for example, a BMA estimation, and then estimates the parameters of a motion model, for example a two-dimensional affine model.
For the homogeneous areas of an image, as well as for areas with unidirectional texture, the motion vectors estimated by a BMA type algorithm are usually unreliable: in these areas, the vectors do not necessarily correspond to a real motion. In an image segmentation application applied to the video sequence to be analysed, incoherent results can thus be obtained; in particular, the homogeneous areas that follow the dominant motion are not detected. Moreover, if these vectors are used by a DME type algorithm, the overall motion estimation relies on only a reduced number of correct motion vectors, and the precision of the results is consequently poor.
One purpose of the invention is notably to overcome the aforementioned disadvantages.
For this purpose, the object of the invention is a motion estimation method for a video sequence in which the images are divided into blocks of pixels, the motion estimation being carried out by analysing N versions of the same image corresponding to different resolution levels, starting with the lowest resolution level and ending with the highest resolution level of the current image. A motion field estimation is carried out for the different resolution levels and the dominant motion parameters are estimated for at least one low or medium resolution level, these parameters being used as predictions for the estimation of the motion field of a higher resolution level.
According to an aspect of the invention, the dominant motion parameters estimated for a given level are stored in order to be used as predictions during the motion field estimation of the image or images following the current image, at the same resolution level.
The motion field vectors of a given resolution level can be used, for example, as predictions for the motion field estimation of a higher resolution level.
The dominant motion parameters estimated for a given resolution level are, for example, stored in order to initialise the dominant motion parameter estimation step for the image or images following the current image, at the same resolution level. In one embodiment, the dominant motion parameters follow a two-dimensional affine model.
In another embodiment, for the estimation of the dominant motion parameters of the low and medium resolution levels, a translation parameter is estimated, and for the highest resolution levels, 6 parameters of a two-dimensional affine model are determined.
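The two model variants can be sketched as follows, assuming the affine form v_x = a_0 + a_1 X + a_2 Y, v_y = b_0 + b_1 X + b_2 Y with (X, Y) the block position (the coordinate origin and units are not specified here). The function names are hypothetical, including `to_finer_level`, which rescales parameters from one pyramid level to the next under the assumption that positions and vectors are both multiplied by the subsampling factor.

```python
def affine_vector(params, x, y):
    """Vector given by the 6-parameter affine model at block position (x, y):
    vx = a0 + a1*x + a2*y,  vy = b0 + b1*x + b2*y."""
    a0, a1, a2, b0, b1, b2 = params
    return (a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y)


def translation_params(tx, ty):
    """Translation-only model (low/medium resolutions) written as an affine
    parameter set whose linear terms are zero."""
    return (tx, 0.0, 0.0, ty, 0.0, 0.0)


def to_finer_level(params, factor=2):
    """Hypothetical helper: express parameters estimated at level n+1 in the
    coordinates of level n. Since x_n = factor * x_(n+1) and
    v_n = factor * v_(n+1), only the constant terms a0 and b0 change."""
    a0, a1, a2, b0, b1, b2 = params
    return (a0 * factor, a1, a2, b0 * factor, b1, b2)
```

Keeping the translation model as a degenerate affine parameter set lets the same evaluation code serve every resolution level.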
For a block of pixels of a given resolution level of the current image, the best prediction available for the estimation of the motion field vectors can be selected so as to minimize the measurement distance D, expressed by an equation of the type $D = \mathrm{SAD} + \lambda \cdot C$, in which:
SAD is the sum of absolute differences between the current block and the reference block,
C is the motion vector coding cost, that is to say the distance measured between the motion vector and a cost indicator, and λ is a real constant.
According to an aspect of the invention, the cost indicator corresponds to the median of motion vectors of neighbouring blocks.
According to another aspect of the invention, the cost indicator corresponds to a prediction corresponding to the dominant motion estimation parameters.
The choice between a cost indicator corresponding to the median of motion vectors of neighbouring blocks and a cost indicator corresponding to the dominant motion estimation parameters is made per block, for example according to the best motion vector prediction. In one implementation, the algorithm carrying out the dominant motion estimation at a given resolution level is initialised with the dominant motion parameters estimated for the current image at a lower resolution level.
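One way to read the per-block choice of cost indicator is sketched below: both candidate indicators are evaluated as predictions and the one whose vector matches the block better (lower SAD) is kept. This interpretation, the integer-pel rounding, the tie-break in favour of the dominant motion (which suits homogeneous areas such as sky, where SAD barely varies) and the function name are all assumptions, not details given in the text.

```python
def choose_cost_indicator(neighbours, dominant_params, block_pos, sad_of):
    """Hypothetical per-block choice between the two cost indicators.

    neighbours:      motion vectors of already processed neighbouring blocks
    dominant_params: (a0, a1, a2, b0, b1, b2) affine dominant motion
    block_pos:       (x, y) position of the block
    sad_of:          callable returning the block SAD for a candidate vector
    """
    xs = sorted(v[0] for v in neighbours)
    ys = sorted(v[1] for v in neighbours)
    median = (xs[len(xs) // 2], ys[len(ys) // 2])
    a0, a1, a2, b0, b1, b2 = dominant_params
    x, y = block_pos
    # Dominant-motion prediction, rounded to integer-pel precision (assumption).
    dominant = (round(a0 + a1 * x + a2 * y), round(b0 + b1 * x + b2 * y))
    # Strictly better SAD keeps the median; ties go to the dominant motion.
    return median if sad_of(median) < sad_of(dominant) else dominant
```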
A confidence level of the motion estimation carried out on the current image is determined, for example, by calculating the proportion of vectors that follow the dominant motion at the highest resolution level.
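Read as the proportion of vectors that follow the dominant motion, the confidence level can be computed as sketched below. The Euclidean distance test and the tolerance `tol` are hypothetical choices; the text does not state how a vector is judged to follow the dominant motion.

```python
import math


def confidence_level(field, positions, params, tol=1.0):
    """Proportion of motion vectors that follow the dominant motion, i.e. lie
    within `tol` pixels (Euclidean) of the vector predicted by the affine model."""
    if not field:
        return 0.0
    a0, a1, a2, b0, b1, b2 = params
    inliers = 0
    for (vx, vy), (x, y) in zip(field, positions):
        px, py = a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y
        if math.hypot(vx - px, vy - py) <= tol:
            inliers += 1
    return inliers / len(field)
```

For example, with a pure translation (2, 0) as dominant motion, the vectors (2, 0), (2, 0), (2, 1) and (-5, 3) give a confidence level of 0.75: the third vector lies exactly at the 1-pixel tolerance, the fourth is far off.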
Other characteristics and advantages of the invention will emerge with the help of the description that follows provided as a non-restrictive example, made with regard to the annexed drawings wherein:
figure 1 illustrates the principle of multi-resolution motion estimation; figure 2 provides an example diagram implementing the method according to the invention; figure 3 presents a way of carrying out the dominant motion estimation in the context of the invention.
Figure 1 illustrates the principle of multi-resolution motion estimation. The BMA type algorithms as described previously involve a high computational complexity. To perform motion estimation on a video sequence, this type of algorithm should therefore be used judiciously.
Motion prediction techniques take the content of video sequences into account. In fact, motion fields usually present spatial and temporal continuity properties. It is thus possible to predict the motion of a given block from the motion of its neighbouring blocks and of preceding images, which yields a set of predictions. Hereafter in the description, a prediction is a candidate vector representing the motion of a block between two images, which has to be tested in order to verify that it indeed corresponds to the real motion of the block. Each prediction is evaluated, for example, by calculating a measurement distance D. This measurement distance could be, for example, the sum of absolute differences, designated by the abbreviation SAD (Sum of Absolute Differences). The SAD represents the distortion between the current block and the reference block. The motion vector coding cost C can be taken into account by introducing a Lagrange coefficient in order to minimize the distortions introduced by the estimation.
The distance D can be described by the following expression:
$$D = \mathrm{SAD} + \lambda \cdot C \qquad (1)$$
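Expression (1) with a median cost indicator can be sketched as follows. The coding cost C is modelled here as the L1 distance between the candidate vector and the indicator, a stand-in for a real bit-cost table; that choice and the function names are assumptions for illustration.

```python
def median_predictor(neighbours):
    """Component-wise median of neighbouring motion vectors. With an odd
    number of neighbours (e.g. left, top, top-right) this is the usual median;
    with an even number it takes the upper of the two middle values."""
    xs = sorted(v[0] for v in neighbours)
    ys = sorted(v[1] for v in neighbours)
    mid = len(neighbours) // 2
    return (xs[mid], ys[mid])


def coding_cost(mv, indicator):
    """Hypothetical coding cost C: L1 distance between the candidate vector
    and the cost indicator."""
    return abs(mv[0] - indicator[0]) + abs(mv[1] - indicator[1])


def distance_d(sad_value, mv, indicator, lam):
    """Expression (1): D = SAD + lambda * C."""
    return sad_value + lam * coding_cost(mv, indicator)


def best_candidate(candidates, indicator, lam):
    """Return the (vector, SAD) pair with the smallest D."""
    return min(candidates, key=lambda c: distance_d(c[1], c[0], indicator, lam))
```

With neighbours (1, 0), (2, 1) and (1, 1), the indicator is (1, 1). A candidate (5, -2) with SAD 110 beats the candidate (1, 1) with SAD 120 when λ = 0, but loses when λ = 4, because its D becomes 110 + 4 × 7 = 138. This is how λ trades a small distortion gain for a smoother motion field.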
A search for the best motion vector is then carried out in the neighbourhood of the best prediction using, for example, a local search pattern. An example of an algorithm performing this type of search is described in the article by Alexis Michael Tourapis, "Enhanced Predictive Zonal Search for Single and Multiple Frame Motion Estimation", Proceedings of Visual Communications and Image Processing, pages 1069-1079, 2002. Numerous other BMA type algorithms exist; they differ in the way the set of predictions is determined for a block and in the local search pattern selected.
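The evaluate-then-refine procedure can be sketched as follows. The refinement here is a plain small-diamond descent (test the four neighbours of the current best vector and move while D decreases); it is one simple local search pattern, not the EPZS algorithm cited above. Integer-pel vectors, the L1 coding cost and the function names are assumptions made for illustration.

```python
def block_sad(cur, ref, bx, by, size, mv):
    """SAD of the size x size block at (bx, by) for displacement mv, or None
    if the displaced block leaves the reference image."""
    h, w = len(ref), len(ref[0])
    dx, dy = mv
    if not (0 <= bx + dx and bx + dx + size <= w and 0 <= by + dy and by + dy + size <= h):
        return None
    return sum(abs(cur[y][x] - ref[y + dy][x + dx])
               for y in range(by, by + size) for x in range(bx, bx + size))


def cost_d(cur, ref, bx, by, size, mv, indicator, lam):
    """D = SAD + lam * C, with C the L1 distance to the cost indicator."""
    s = block_sad(cur, ref, bx, by, size, mv)
    if s is None:
        return float("inf")
    return s + lam * (abs(mv[0] - indicator[0]) + abs(mv[1] - indicator[1]))


def predictive_search(cur, ref, bx, by, size, predictions, indicator, lam, max_steps=16):
    def d(mv):
        return cost_d(cur, ref, bx, by, size, mv, indicator, lam)

    # 1) Evaluate every prediction and keep the one with the smallest D.
    best = min(predictions, key=d)
    best_cost = d(best)
    # 2) Local refinement around the best prediction (small-diamond pattern).
    for _ in range(max_steps):
        center = best
        for ddx, ddy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cand = (center[0] + ddx, center[1] + ddy)
            c = d(cand)
            if c < best_cost:
                best, best_cost = cand, c
        if best == center:
            break  # no neighbour improves D: local minimum reached
    return best, best_cost
```

Only a handful of D evaluations are needed when a good prediction is available, compared with the (2p + 1)² evaluations of the exhaustive search; the price is that the descent can stop in a local minimum if all predictions are poor.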
One way of reducing the computational complexity is to use a multi-resolution approach, of which the HME (Hierarchical Motion Estimator) algorithm is an example. A pyramid of images is derived from the current image; each image of the pyramid represents a search level. Level 0 corresponds to the current image at full resolution. A low or medium resolution level is any level other than level 0, the latter being the highest resolution level of the pyramid of images.
The level n+1 image is obtained by low-pass filtering and subsampling the level n image. The level n+1 image therefore has a lower resolution than the level n image.
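A minimal sketch of the pyramid construction, assuming a 2x2 box average as the low-pass filter and subsampling by a factor of 2 in each direction; the text specifies neither the filter nor the subsampling factor, and the function name is hypothetical.

```python
import numpy as np


def build_pyramid(image: np.ndarray, levels: int) -> list:
    """Return [level 0, level 1, ...]; level n+1 is level n low-pass filtered
    with a 2x2 box average and subsampled by 2 in each direction."""
    pyramid = [image.astype(np.float64)]
    for _ in range(1, levels):
        prev = pyramid[-1]
        # Drop a trailing odd row/column so the 2x2 cells tile exactly.
        h, w = (prev.shape[0] // 2) * 2, (prev.shape[1] // 2) * 2
        p = prev[:h, :w]
        # Averaging each 2x2 cell is a crude low-pass filter followed by decimation.
        down = 0.25 * (p[0::2, 0::2] + p[1::2, 0::2] + p[0::2, 1::2] + p[1::2, 1::2])
        pyramid.append(down)
    return pyramid
```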
The motion field is first estimated at the highest level, that is, on the lowest resolution image. This motion field is then improved using the vectors obtained at the level above as predictions, and so on down the levels of the image pyramid until level 0 is reached. For a given block, the motion vectors of neighbouring blocks that have already been calculated are also used as predictions. The estimate is then refined by searching for the best motion vector around the best prediction.
The example of figure 1 illustrates the principle of multi-resolution motion estimation with three levels. Level 0 corresponds to the image to be analysed, whose resolution is not reduced. Levels 1 and 2 correspond to that image at reduced resolution, level 2 having a lower resolution than level 1. The estimation process starts at the highest level, that is, level 2 in the example of figure 1. The image is analysed block by block. For a given block 100, one or more predictions are available: several predictions can be formed for each block by taking into account, for example, the motion of neighbouring blocks or of preceding images, as well as the result of the motion estimation at the higher level. Each prediction can be refined so as to find the candidate 101 that best corresponds to the real motion of the analysed block.
At level 1, a prediction 102 for the block being analysed 106 can be the result 101 of the motion estimation carried out for the same block at the higher level. Refining the search then yields a more accurate estimate 103. The same principle is applied at level 0, where one of the predictions 104 is the result of the estimation at the higher level, and a refinement gives the final result 105. The best prediction, and the final vector resulting from the refinement, are selected, for example, by calculating and comparing the distance D for each candidate vector.
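Passing a level n+1 result down as a prediction at level n can be sketched as below, assuming that the same block size is used at every level (so one block at level n+1 covers a 2x2 group of blocks at level n) and that the subsampling factor is 2, so vector components are doubled. The estimator at level n would then evaluate this vector alongside the other predictions and refine the best one. The function name is hypothetical.

```python
import numpy as np


def propagate_to_finer_level(coarse_field: np.ndarray, fine_rows: int, fine_cols: int) -> np.ndarray:
    """coarse_field has shape (rows, cols, 2) and holds one (dx, dy) per block.
    Each finer-level block inherits the vector of its parent block, scaled by 2
    because pixel distances double from level n+1 to level n."""
    fine = np.zeros((fine_rows, fine_cols, 2), dtype=coarse_field.dtype)
    for r in range(fine_rows):
        for c in range(fine_cols):
            # Clamp so that an odd-sized finer level still finds a parent block.
            pr = min(r // 2, coarse_field.shape[0] - 1)
            pc = min(c // 2, coarse_field.shape[1] - 1)
            fine[r, c] = 2 * coarse_field[pr, pc]
    return fine
```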
The result of these per-level calculations is a motion field composed of a set of vectors, each vector of the field being associated with a block of the current image. Although the HME-type multi-resolution approach reduces complexity, the complexity remains significant. To further accelerate the calculations, the local search around a prediction can be improved by an algorithm referred to as HDS (Hierarchical Diamond Search). This algorithm performs multi-resolution motion estimation with a refinement step based on a recursive diamond search: the best prediction is refined by a local search using a small diamond- or square-shaped pattern of several positions.
Figure 2 provides an example of implementation of the method according to the invention. The images of the video sequence to be analysed are processed one after another. An image memory 200 contains the multi-resolution image pyramid associated with the current image, as well as the reference image or images used for the motion estimation. The current image pyramid 201 and the reference image pyramid or pyramids 202 are used to carry out the various estimations described hereafter. In this example, a 5-level multi-resolution approach, with levels indexed 0 to 4, is used. A BMA estimation of the motion field is carried out, starting with the low resolution image of level 4 (203), then processing level 3 (204), level 2 (205), level 1 (206) and level 0 (208).
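The recursive small-diamond refinement mentioned for HDS can be sketched as follows. This is a generic small-diamond search under assumed conventions (SAD only, no λC term, an arbitrary step limit), not the exact HDS procedure; names are hypothetical.

```python
import numpy as np

# Small diamond: the four immediate neighbours of the centre position.
SMALL_DIAMOND = ((0, -1), (-1, 0), (1, 0), (0, 1))


def diamond_refine(cur, ref, x, y, start, bsize=8, max_steps=32):
    """Refine `start` = (dx, dy) by repeatedly moving to the best diamond
    neighbour until the centre itself is the best (SAD criterion only)."""
    h, w = ref.shape
    block = cur[y:y + bsize, x:x + bsize].astype(np.int64)

    def sad(mv):
        rx, ry = x + mv[0], y + mv[1]
        if rx < 0 or ry < 0 or rx + bsize > w or ry + bsize > h:
            return float("inf")
        return float(np.abs(block - ref[ry:ry + bsize, rx:rx + bsize]).sum())

    best = tuple(start)
    best_cost = sad(best)
    for _ in range(max_steps):
        neighbours = [(best[0] + dx, best[1] + dy) for dx, dy in SMALL_DIAMOND]
        cand = min(neighbours, key=sad)
        cand_cost = sad(cand)
        if cand_cost >= best_cost:
            break  # no neighbour improves on the centre: converged
        best, best_cost = cand, cand_cost
    return best
```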
The motion field vectors resulting from the estimation at level 1 are used to estimate the dominant motion parameters (207). In other words, the dominant motion is first estimated from a low resolution motion field, namely that of level 1. The dominant motion estimation can be carried out, for example, according to a two-dimensional affine model. In this case, the estimation consists in finding the dominant motion parameters a0, a1, a2 and b0, b1, b2 that satisfy, for each image block analysed, the equation:
$$ v_x = a_0 + a_1 X + a_2 Y, \qquad v_y = b_0 + b_1 X + b_2 Y $$
in which vx and vy are the components of a vector V of the motion field, and X and Y are the coordinates locating the block being processed, for which the dominant motion estimation is carried out.
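Evaluating the affine model at every block gives one dominant-motion candidate vector per block. The sketch below assumes that X and Y are the pixel coordinates of the block centre; the text only says that they locate the block. Function names are hypothetical.

```python
import numpy as np


def affine_vector(params, X, Y):
    """Dominant-motion vector at (X, Y) for params = (a0, a1, a2, b0, b1, b2),
    using vx = a0 + a1*X + a2*Y and vy = b0 + b1*X + b2*Y."""
    a0, a1, a2, b0, b1, b2 = params
    return a0 + a1 * X + a2 * Y, b0 + b1 * X + b2 * Y


def dominant_predictions(params, rows, cols, bsize):
    """One candidate vector per block, evaluated at the (assumed) block centre."""
    field = np.zeros((rows, cols, 2))
    for r in range(rows):
        for c in range(cols):
            X, Y = c * bsize + bsize / 2, r * bsize + bsize / 2
            field[r, c] = affine_vector(params, X, Y)
    return field
```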
The dominant motion parameters are then used to add a new prediction when estimating the motion field of the next resolution level. This prediction is evaluated in the same way as the other predictions available for each block, for example by calculating the measurement distance D given by expression (1). The reliability of the motion field estimation is thus improved. The term C of expression (1) represents the motion vector coding cost, that is, the distance between the motion vector and a cost indicator. The median of the motion vectors of neighbouring blocks is usually selected as the cost indicator. Taking the coding cost into account yields a more homogeneous motion field.
In the context of the invention, two different cost indicators can be used, the indicator being selected according to the best motion vector prediction: either the prediction from the dominant motion estimation or the median described above.
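The per-block choice of cost indicator could look like the sketch below. It assumes a component-wise median over the neighbouring vectors, an L1 distance for C, and a boolean flag stating whether the best prediction came from the dominant motion; these are illustrative choices, not taken from the text, and the names are hypothetical.

```python
import numpy as np


def median_indicator(neighbour_vectors):
    """Component-wise median of the neighbouring blocks' vectors."""
    v = np.asarray(neighbour_vectors, dtype=float)
    return tuple(np.median(v, axis=0))


def coding_cost(mv, indicator):
    """C: assumed L1 distance between the vector and the cost indicator."""
    return abs(mv[0] - indicator[0]) + abs(mv[1] - indicator[1])


def select_cost_indicator(best_from_dominant: bool, neighbour_vectors, dominant_vector):
    """Use the dominant-motion vector as indicator when the best prediction
    came from the dominant motion; otherwise use the neighbour median."""
    if best_from_dominant:
        return tuple(dominant_vector)
    return median_indicator(neighbour_vectors)
```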
Areas that follow the dominant motion are then identified directly, even when they are homogeneous.
For example, the sky is usually a homogeneous area. With a prior art motion estimation algorithm, a zero motion is generally assigned to this area, even when the camera is moving. With dominant motion estimation, the camera movement is identified and the sky area is constrained to follow this dominant motion, which best corresponds to the real motion. Estimating the motion field and then calculating the dominant motion parameters at each level leads to a recursive approach of reasonable computational complexity.
The dominant motion parameters of level 1 are stored in the memory 211 to be used, when analysing the next image, as a prediction 214 for the estimation 206 of the level 1 motion field. Use of the dominant motion is rejected (213) for the entire image if the parameters are not reliable according to a reliability criterion evaluated with those parameters.
The dominant motion parameters estimated (207) at level 1 are also used as a prediction for the estimation (208) of the level 0 motion field. The best set of dominant motion parameters is selected (212) for the entire image from among: the result of the dominant motion estimation carried out at the higher level (207), the stored dominant motion parameters (210), or no parameters.
The level 1 motion field is also used as a prediction for estimating the level 0 motion field. An overall estimation 209 of motion parameters is also carried out after the level 0 motion field has been estimated. Its inputs are the level 0 vector field, the overall motion parameters estimated (215) from the level 1 motion field, and the overall level 0 motion parameters estimated during the analysis of the preceding image and stored in the memory 214.
Several results 217 are available after an image of a video sequence has been analysed. The output can be the motion field CM resulting from the estimation carried out on the full-resolution image. In addition, the confidence level TC and the dominant motion parameters MD estimated at level 0 can be output and used in later processing operations. The confidence level TC can be defined, for example, as the proportion of vectors that follow the dominant motion at level 0.
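Reading TC as the proportion of vectors that follow the dominant motion, a minimal sketch counts the blocks whose vector lies within a tolerance of the affine model evaluated at the block centre. The tolerance value, the use of block centres and the function name are assumptions.

```python
import numpy as np


def confidence_level(field, params, bsize, tol=1.0):
    """TC: fraction of block vectors within `tol` pixels (Euclidean) of the
    dominant-motion vector predicted at the block centre."""
    a0, a1, a2, b0, b1, b2 = params
    rows, cols, _ = field.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    X = xs * bsize + bsize / 2
    Y = ys * bsize + bsize / 2
    model = np.stack([a0 + a1 * X + a2 * Y, b0 + b1 * X + b2 * Y], axis=-1)
    err = np.linalg.norm(field - model, axis=-1)
    return float(np.mean(err <= tol))
```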
Figure 3 presents a way of carrying out the dominant motion estimation in the context of the invention. The dominant motion parameters are estimated using a recursive weighted least-squares algorithm. The dominant motion model can be adapted to the resolution level: for medium and low resolutions, only translation parameters may be estimated, while for the highest resolutions the 6-parameter affine model described above can be used.
The estimation algorithm determines the values of the dominant motion parameters using the weighted least-squares algorithm. Three types of initial parameters can be used to initialise it, called temporal initialisation, hierarchical initialisation and simple initialisation.
The main input of the dominant motion estimation algorithm is a motion field CM.
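The weighted least-squares estimation can be sketched as iteratively reweighted least squares over the vectors of CM. Tukey's biweight is used here as the weighting function, with c as its cut-off; both are assumptions, since the text does not name a weighting scheme. The `affine` flag mimics the translation-only model suggested for medium and low resolutions. Function names are hypothetical.

```python
import numpy as np


def fit_wls(X, Y, vx, vy, w, affine=True):
    """Weighted least-squares fit of vx = a0 + a1 X + a2 Y, vy = b0 + b1 X + b2 Y.
    With affine=False only the translation (a0, b0) is fitted."""
    cols = [np.ones_like(X), X, Y] if affine else [np.ones_like(X)]
    sw = np.sqrt(w)
    A = np.column_stack(cols) * sw[:, None]
    a = np.linalg.lstsq(A, vx * sw, rcond=None)[0]
    b = np.linalg.lstsq(A, vy * sw, rcond=None)[0]
    if not affine:  # pad the translation-only model to six parameters
        a, b = np.array([a[0], 0.0, 0.0]), np.array([b[0], 0.0, 0.0])
    return np.concatenate([a, b])


def dominant_motion_irls(X, Y, vx, vy, params, iters=10, c=2.0, affine=True):
    """Iteratively reweighted least squares started from `params`.
    Large residuals are down-weighted with Tukey's biweight (assumed choice)."""
    params = np.asarray(params, dtype=float)
    for _ in range(iters):
        a0, a1, a2, b0, b1, b2 = params
        r = np.hypot(vx - (a0 + a1 * X + a2 * Y), vy - (b0 + b1 * X + b2 * Y))
        w = np.where(r < c, (1.0 - (r / c) ** 2) ** 2, 0.0)
        if np.count_nonzero(w) < 3:
            break  # too few supporting vectors to refit the model
        params = fit_wls(X, Y, vx, vy, w, affine)
    return params
```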
The parameters used for the temporal initialisation, referred to as initialisation 1, are the dominant motion parameters 214, 216 calculated for a previously processed image and stored 210, 211 in the memory. The parameters used for the hierarchical initialisation, referred to as initialisation 2, are the dominant motion parameters calculated (215) for the current image at a lower resolution level.
If neither initialisation 1 nor initialisation 2 is reliable, an initialisation is calculated from all the vectors of the vector field CM using a non-weighted simple least-squares algorithm 302. If the temporal initialisation parameters are available, an evaluation 300 is made. It is then checked (303) whether the result 307 is reliable: if the number of "inliers" 309, that is, vectors that follow the dominant motion, is less than a threshold value, the result is not considered reliable. If the result is reliable, an iteration of the weighted least-squares algorithm is calculated 311.
If the temporal initialisation leads to an unreliable result 305, the hierarchical parameters, when they are available from a higher level, are used for the initialisation. An evaluation 301 of these parameters is carried out and, as described previously, the reliability of the result is verified 304, 308, 310. If the result is reliable, an iteration 311 of the weighted least-squares algorithm is then calculated. If it is not reliable 306, step 302, a simple least-squares fit on the motion field calculated for the current level, is used instead. The least-squares algorithm is then executed recursively 311 from the initialisation obtained as described above. A coherence indicator TC and the dominant motion parameters MD are output as results. Temporal initialisation ensures the coherence of the dominant motion parameters. Initialisation through the recursive approach handles cases where the motion is not temporally constant and reduces the number of iterations without affecting the final result. Processing is consequently accelerated.
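The initialisation cascade of figure 3 can be sketched as below. Reliability is tested by counting inliers against a threshold, as described above; the inlier tolerance, the threshold of 50 and the function names are hypothetical. The returned parameters would then seed the weighted least-squares iterations (311).

```python
import numpy as np


def affine_residuals(params, X, Y, vx, vy):
    """Distance between each vector and the affine model at its position."""
    a0, a1, a2, b0, b1, b2 = params
    return np.hypot(vx - (a0 + a1 * X + a2 * Y), vy - (b0 + b1 * X + b2 * Y))


def simple_ls(X, Y, vx, vy):
    """Non-weighted least squares on all vectors (the fallback of step 302)."""
    A = np.column_stack([np.ones_like(X), X, Y])
    a = np.linalg.lstsq(A, vx, rcond=None)[0]
    b = np.linalg.lstsq(A, vy, rcond=None)[0]
    return np.concatenate([a, b])


def choose_initialisation(X, Y, vx, vy, temporal=None, hierarchical=None,
                          tol=1.0, min_inliers=50):
    """Try temporal (1), then hierarchical (2) parameters; each is accepted only
    if at least `min_inliers` vectors lie within `tol` of its model."""
    for label, params in (("temporal", temporal), ("hierarchical", hierarchical)):
        if params is None:
            continue  # this initialisation is not available
        inliers = int(np.count_nonzero(affine_residuals(params, X, Y, vx, vy) <= tol))
        if inliers >= min_inliers:
            return label, np.asarray(params, dtype=float)
    return "simple", simple_ls(X, Y, vx, vy)
```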

Claims

1. Method for motion estimation of a video sequence whose images are divided into blocks of pixels, the motion estimation being carried out by analysis of N versions of a same image corresponding to different resolution levels of the image, said analysis starting at the lowest resolution level and ending at the highest resolution level of the current image, said method being characterized in that a motion field estimation (203, 204, 205, 206, 208) is carried out for the different resolution levels and in that dominant motion parameters defining the dominant motion within the image are estimated (207) at one or more low or medium resolution levels, said parameters being used as predictions for the motion field estimation of a higher resolution level.
2. Method according to claim 1, characterized in that the dominant motion parameters estimated for a given level are memorized (211) in order to be used as predictions (214) during the motion field estimation of the image or images corresponding to the current image for the same resolution level.
3. Method according to any one of the preceding claims, characterized in that the vectors of a motion field of a given resolution level are used as predictions for the estimation of the motion field of the higher resolution level.
4. Method according to any one of the preceding claims, characterized in that the dominant motion parameters estimated for a given resolution level are memorized (210, 211) in order to be used to initialise (214, 216) the step of estimation of the dominant motion parameters of the image or images corresponding to the current image for the same resolution level.
5. Method according to any one of the preceding claims, characterized in that the dominant motion parameters satisfy a two-dimensional affine model.
6. Method according to any one of the preceding claims, characterized in that, for a block of pixels of a given resolution level of the current image, the best prediction available for the estimation of vectors of the motion field can be selected such that the measurement distance D is minimized, said distance being expressed by an equation of the type
$$ D = \mathrm{SAD} + \lambda \, C $$
in which SAD is the sum of absolute differences between the current block and the reference block, C is the motion vector coding cost, that is to say the distance measured between the motion vector and a cost indicator, and λ is a real constant.
7. Method according to claim 6, characterized in that the cost indicator corresponds to the median of motion vectors of neighbouring blocks.
8. Method according to any one of claims 6 or 7, characterized in that the cost indicator corresponds to a prediction corresponding to the dominant motion estimation parameters.
9. Method according to claims 7 and 8, characterized in that the choice between a cost indicator corresponding to the median of motion vectors of neighbouring blocks and a cost indicator corresponding to dominant motion estimation parameters is selected per block according to the best motion vector prediction.
10. Method according to any one of the preceding claims, characterized in that the algorithm carrying out the dominant motion estimation at a given resolution level is initialised (215) by the dominant motion parameters estimated for the current image at a lower resolution level.
11. Method according to any one of the preceding claims, characterized in that a confidence level TC of the motion estimation carried out on the current image is determined by calculating the proportion of vectors corresponding to the dominant motion at the highest resolution level.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR0858832 2008-12-19
FR0858832A FR2940492A1 (en) 2008-12-19 2008-12-19 MULTI-RESOLUTION MOTION ESTIMATING METHOD

Publications (1)

Publication Number Publication Date
WO2010070128A1 true WO2010070128A1 (en) 2010-06-24

Family

ID=40897596

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2009/067589 WO2010070128A1 (en) 2008-12-19 2009-12-18 Method for multi-resolution motion estimation

Country Status (2)

Country Link
FR (1) FR2940492A1 (en)
WO (1) WO2010070128A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3090969B1 (en) * 2018-12-21 2022-06-03 Naval Group Device and method for estimating movement of an image sensor between two images, associated computer program

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040091047A1 (en) * 2002-11-11 2004-05-13 Sony Corporation Method and apparatus for nonlinear multiple motion model and moving boundary extraction
FR2872989A1 (en) * 2004-07-06 2006-01-13 Thomson Licensing Sa METHOD AND DEVICE FOR CHOOSING A MOTION VECTOR FOR ENCODING A BLOCK ASSEMBLY

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
A. M. Tourapis, "Enhanced Predictive Zonal Search for Single and Multiple Frame Motion Estimation", Proceedings of Visual Communications and Image Processing, 2002, pp. 1069-1079
G. Sullivan, T. Wiegand, "Rate-Distortion Optimization for Video Compression", IEEE Signal Processing Magazine, November 1998, pp. 74-90, XP011089821, DOI: 10.1109/79.733497
C.-T. Hsu et al., "Mosaics of video sequences with moving objects", Signal Processing: Image Communication, Elsevier Science Publishers, Amsterdam, NL, vol. 19, no. 1, 1 January 2004, pp. 81-98, XP004476840, ISSN 0923-5965 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9790791B2 (en) 2011-05-11 2017-10-17 Innovative Technological Systems S.R.L. External combustion engine
CN107492113A (en) * 2017-06-01 2017-12-19 南京行者易智能交通科技有限公司 A kind of moving object in video sequences position prediction model training method, position predicting method and trajectory predictions method
GB2575672A (en) * 2018-07-19 2020-01-22 Snell Advanced Media Ltd Motion estimation in video
GB2575672B (en) * 2018-07-19 2021-11-10 Grass Valley Ltd Motion estimation in video

Also Published As

Publication number Publication date
FR2940492A1 (en) 2010-06-25

Legal Events

Code 121: Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 09805708; country of ref document: EP; kind code: A1)
Code NENP: Non-entry into the national phase (ref country code: DE)
Code 122: Ep: PCT application non-entry in European phase (ref document number: 09805708; country of ref document: EP; kind code: A1)