Motion estimation
The present patent application relates in general to improved motion estimation in order to overcome the aperture problem.
With the advent of new technology in the field of video processing, motion-compensated video algorithms have become both affordable and necessary for high-quality video processing. To provide high-quality video processing, different motion compensation applications are used. Known applications include motion-compensated (MC) filtering for noise reduction, MC prediction for coding, MC de-interlacing for conversion from interlaced to progressive formats, and MC picture rate conversion. These applications rely on motion estimation (ME) algorithms, for which various methods are known.
One example of a block-based motion estimation algorithm used in video format conversion is the 3D recursive search (3D RS) block-matcher. Some motion estimation algorithms were based on the assumption that the luminance or chrominance value of a pixel may be approximated by a linear function of position. This assumption may only hold for small displacements. This limitation may, however, be resolved by pixel-based motion estimation methods (PEL-recursive methods). Motion estimation can also be implemented by block matching. In block-matching motion estimation algorithms, a displacement vector D is assigned to the center X of a block of pixels B(X) in a current field n by searching for a similar block within a search area SA(X), also centered at X, but in a temporally neighbouring field, for example n-1 or n+1. The center of the similar block may be shifted with respect to X by the displacement D(X, n). To find D(X, n), a set of candidate vectors C is evaluated. To evaluate the candidate vectors C of the set, an error measure ε(C, X, n), which quantifies block similarity, is calculated.
The set of candidate vectors C describing all possible displacements with respect to X within the search area SA(X) can be described as
$$CS^{\max} = \{\vec{C} \mid -N \le C_x \le +N,\ -M \le C_y \le +M\},$$
with N and M constants limiting the search area SA(X). The displacement vector D(X, n) resulting from the full-search block-matching process is the candidate vector C that yields the minimum value of at least one error function ε(C, X, n). This can be expressed as:
$$\vec{D}(\vec{X},n) = \arg\min_{\vec{C} \in CS^{\max}} \varepsilon(\vec{C},\vec{X},n)$$
Usually the vector D (X , n) with the smallest match error is assigned to all positions x in the block B(X). The error value for a given candidate vector C can be a function of the luminance values of the pixels in the current block and those of the shifted block from a previous field, summed over the block B(X). The error value can also be any other function of pixel values, and can be expressed as a sum of cost functions:
$$\varepsilon(\vec{C},\vec{X},n) = \sum_{\vec{x} \in B(\vec{X})} \mathrm{Cost}\big(F(\vec{x},n), F(\vec{x}-\vec{C},n-p)\big)$$
with the common choice p = 1 for non-interlaced signals and p = 2 for interlaced signals. A cost function can, for example, be the sum of absolute differences between two blocks of pixels.
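A minimal full-search block-matcher with the SAD cost can be sketched as follows. It assumes frames stored as lists of rows indexed `frame[y][x]`, a non-interlaced signal (p = 1), and candidates that are simply skipped when the displaced block would leave the previous frame; the names `sad` and `full_search` are illustrative, not taken from the source.

```python
import random


def sad(cur, prev, bx, by, size, c):
    """Sum over the block of |F(x, n) - F(x - C, n - 1)| (SAD cost)."""
    cx, cy = c
    return sum(
        abs(cur[y][x] - prev[y - cy][x - cx])
        for y in range(by, by + size)
        for x in range(bx, bx + size)
    )


def full_search(cur, prev, bx, by, size, n_max, m_max):
    """Test every displacement with |Cx| <= n_max and |Cy| <= m_max and
    return (best_vector, best_error); ties keep the first candidate found."""
    height, width = len(prev), len(prev[0])
    best, best_err = None, None
    for cy in range(-m_max, m_max + 1):
        for cx in range(-n_max, n_max + 1):
            # Skip candidates whose displaced block would leave the previous frame.
            if not (0 <= bx - cx and bx - cx + size <= width
                    and 0 <= by - cy and by - cy + size <= height):
                continue
            err = sad(cur, prev, bx, by, size, (cx, cy))
            if best_err is None or err < best_err:
                best, best_err = (cx, cy), err
    return best, best_err


# Usage: the current frame is the previous frame shifted by (+2, +1).
random.seed(0)
W, H = 16, 16
prev = [[random.randint(0, 255) for _ in range(W)] for _ in range(H)]
cur = [[prev[(y - 1) % H][(x - 2) % W] for x in range(W)] for y in range(H)]
print(full_search(cur, prev, bx=6, by=6, size=4, n_max=3, m_max=3))  # ((2, 1), 0)
```

The full search evaluates all (2N+1)(2M+1) candidates, which is what the faster search strategies discussed further on try to avoid.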
The error value of a given candidate vector can also be considered as a cost function or constraint function. Constraint functions that relate to the nature of motion within the image can, for example, be the intensity conservation constraint or the spatial coherence constraint. Constraints are best chosen such that each of them separately leads to orthogonal subspaces of solutions. From the image content, it may be possible to impose physical restrictions on the set of possible motion vectors, for example motion smoothness or the inertia of objects. Mathematically, such a restriction can take the form of an equation whose variables are the motion parameters/motion vectors (a constraint on the motion parameters/motion vectors), and it can be added to or combined with the cost function to be minimized.
Fig. 1 depicts a block-matching motion estimation algorithm as described above. Shown are two temporal instances n-1, n of an image sequence 2. Within image sequence 2, various blocks 4 at horizontal position X and vertical position Y are determined.
To determine the displacement D(X, n) of a block 4, various candidate vectors C 8 may be evaluated by applying the above-mentioned error measure ε(C, X, n). One possible error function is the summed absolute difference (SAD) criterion, which is
$$SAD(\vec{C},\vec{X},n) = \sum_{\vec{x} \in B(\vec{X})} \left|F(\vec{x},n) - F(\vec{x}-\vec{C},n-p)\right|$$
where F(x, n) is the luminance value of the pixels within block 4. The displacement vector D is assigned to the center X of a block 4 of pixel positions B(X) in the current image by searching for a similar block 10 within a search area SA(X) 6, also centered at X, but in a previous or following image, which may be a temporally previous or following image or field. A correlation measure between the two blocks 4, 10 is thus optimized to identify the displacement vector D.
Further error criteria, such as the mean square error and the normalized cross-correlation function, may be used. The latter is particularly suitable when the calculation is carried out in the Fourier domain. A further example of an error criterion is the number of significantly different pixels. All of the above criteria can serve as cost functions. Physical restrictions, such as boundary conditions, could provide new candidates for independent cost functions.
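The alternative criteria named above can be sketched on two equally sized blocks flattened to pixel lists. This is an illustration under assumptions: the threshold for a "significantly different" pixel is a free parameter the text does not specify, and because the normalized cross-correlation is a similarity, a matcher would maximize it (or minimize 1 - NCC).

```python
import math


def mse(a, b):
    """Mean square error between two equally sized pixel lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def ncc(a, b):
    """Normalized cross-correlation in [-1, 1]; a similarity, so a matcher
    would maximize it (or minimize 1 - ncc)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a)
                    * sum((q - mean_b) ** 2 for q in b))
    return num / den if den else 0.0


def significant_pixels(a, b, threshold):
    """Number of pixel pairs whose absolute difference exceeds threshold
    (threshold is an assumed tuning parameter)."""
    return sum(1 for p, q in zip(a, b) if abs(p - q) > threshold)


print(mse([1, 2, 3, 4], [1, 2, 3, 8]))                    # 4.0
print(ncc([1, 2, 3, 4], [2, 4, 6, 8]))                    # 1.0
print(significant_pixels([10, 20, 30], [11, 25, 30], 3))  # 1
```

Note that NCC is insensitive to a gain and offset between the blocks (the second block above is twice the first), whereas MSE and SAD are not.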
A block-matching method enables finding candidate vectors close to the true-motion vectors. To further improve consistency, a penalty system has been adopted which adds a penalty value to the error function; the penalty value can depend on the type of prediction, i.e. spatial prediction or temporal prediction.
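A sketch of such a penalty system, assuming each candidate carries its prediction type and its match error. The penalty values in `PENALTY` are hypothetical; the text only states that the penalty depends on the prediction type.

```python
# Hypothetical penalty values: the text only states that the penalty depends
# on the prediction type, not what the values are.
PENALTY = {"spatial": 0, "temporal": 8}


def penalized_choice(candidates):
    """candidates: iterable of (vector, prediction_type, match_error).
    Returns the vector with the smallest match_error + PENALTY[type]."""
    vector, _, _ = min(candidates, key=lambda c: c[2] + PENALTY[c[1]])
    return vector


cands = [((1, 0), "spatial", 20), ((2, 0), "temporal", 15)]
print(penalized_choice(cands))  # (1, 0): 20 + 0 < 15 + 8
```

Without the penalty, the temporal candidate (error 15) would win; the penalty biases the choice toward the spatially consistent vector unless the temporal one is clearly better.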
However, the error/cost functions described above do not fully determine the two-dimensional motion parameters. These cost functions may suffer from the so-called aperture problem. To overcome this problem, additional information about the image content may be necessary, e.g. physical constraints such as smooth motion or object inertia. For example, in a sequence with a single orientation, i.e. an edge, all candidate vectors having the same vector component perpendicular to the edge may yield the same cost. Therefore, the SAD does not fully determine the two-dimensional components of a motion, but only the component perpendicular to the edge. Motion estimation using SAD alone is degenerate: the motion is only determined up to a constant tangential to the single orientation in the sequence, e.g. an edge. This is also called the aperture problem of motion estimation. This problem has been addressed by simultaneously imposing additional error functions, such as boundary conditions, or constraints related to the nature of motion in the video scene. A necessary condition for removing the degeneracy in determining the motion vector is that each constraint leads to a reciprocally orthogonal solution subspace. The physical constraints imposed on the space of motion vector solutions (i.e. smooth motion, object inertia, boundary conditions, etc.) can divide the total space of solutions into subspaces. Each motion vector in a subspace obeys at least one constraint. When two physical constraints are independent, their corresponding subspaces contain reciprocally independent vectors; in other words, the corresponding subspaces are orthogonal. A set of candidate motion vectors CS^max with orthogonal subspaces can be used. In general, a set of m constraints with error functions $\varepsilon_m$ can lead to a solution for the estimated motion vector $\vec{C}_{\min}$ if all constraints are fulfilled. For this estimated motion vector the total cost function has its minimum. The total cost function can be calculated as
$$\varepsilon(\vec{C}_{\min},\vec{X},n) = \sum_{m} \lambda_m \varepsilon_m(\vec{C}_{\min},\vec{X},n)$$
with λ_m an arbitrary multiplication factor. However, a minimum of the total cost function does not necessarily mean that all individual constraints are fulfilled.
For example, one cost function $\varepsilon_k$ can be degenerate, with multiple minima and an absolute minimum at the candidate motion vector $\vec{C}^{k}_{\min}$. The total cost function can then have a local minimum at this minimum of the one constraint. If the minimum of this one constraint is much smaller than the local minima of the other constraints, the non-minimal contributions from the other cost functions can be compensated. The total error function then yields an erroneous minimum, which might not be a minimum of all the individual constraints. In such a case, the inequality
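$$\begin{aligned}
\varepsilon(\vec{C}_{\min},\vec{X},n) &= \sum_{l \neq k} \lambda_l \varepsilon_l(\vec{C}_{\min},\vec{X},n) + \lambda_k \varepsilon_k(\vec{C}_{\min},\vec{X},n) \\
&> \sum_{l \neq k} \lambda_l \varepsilon_l(\vec{C}^{k}_{\min},\vec{X},n) + \lambda_k \varepsilon_k(\vec{C}^{k}_{\min},\vec{X},n) \\
&= \varepsilon(\vec{C}^{k}_{\min},\vec{X},n)
\end{aligned}$$
holds, provided that $\Delta\varepsilon_k = \varepsilon_k(\vec{C}_{\min},\vec{X},n) - \varepsilon_k(\vec{C}^{k}_{\min},\vec{X},n)$ is large enough to satisfy the following inequality:
$$\sum_{l \neq k} \lambda_l \varepsilon_l(\vec{C}_{\min},\vec{X},n) + \lambda_k \Delta\varepsilon_k > \sum_{l \neq k} \lambda_l \varepsilon_l(\vec{C}^{k}_{\min},\vec{X},n)$$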
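A worked instance of this condition, with two constraints, $k = 1$ and $\lambda_1 = \lambda_2 = 1$. The error values are purely illustrative assumptions, not taken from the source, and the arguments $\vec{X}, n$ are omitted:

$$\varepsilon_1(\vec{C}_{\min}) = 4,\quad \varepsilon_1(\vec{C}^{1}_{\min}) = 0 \;\Rightarrow\; \Delta\varepsilon_1 = 4;\qquad \varepsilon_2(\vec{C}_{\min}) = 0,\quad \varepsilon_2(\vec{C}^{1}_{\min}) = 3$$
$$\lambda_2\varepsilon_2(\vec{C}_{\min}) + \lambda_1\Delta\varepsilon_1 = 0 + 4 = 4 \;>\; 3 = \lambda_2\varepsilon_2(\vec{C}^{1}_{\min})$$
$$\Rightarrow\quad \varepsilon(\vec{C}_{\min}) = 4 + 0 = 4 \;>\; 0 + 3 = 3 = \varepsilon(\vec{C}^{1}_{\min})$$

In this example the linear total therefore selects $\vec{C}^{1}_{\min}$, although $\varepsilon_2$ is not at its minimum there.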
Therefore, one object of the present application is to provide a solution that overcomes the degeneracy of cost functions. Another object of the invention is to provide motion estimation that overcomes the aperture problem. A further object of the invention is to provide motion estimation that results in improved estimated motion vectors.
To overcome one or more of these problems, the application provides, according to one aspect, a method for determining estimated motion vectors within image signals, comprising creating at least two candidate motion vectors for at least one pixel within an image of the signal, calculating at least two error criteria for each of said candidate motion vectors, and choosing the candidate motion vector that minimizes a non-linear function of the error criteria as the estimated motion vector for the at least one pixel. Signals according to embodiments can be any image sequence, for example a video sequence. Images within the signals can be composed of pixels. Pixels can be image elements describing the luminance and chrominance of a particular part of the image. A plurality of adjacent pixels within the image can be understood as a pixel block.
Elements within the image can be subject to motion over several frames. Motion of the elements can be described by motion vectors. Motion vectors can describe the direction and speed of movement of particular pixels or blocks of pixels.
Motion estimation can be understood as calculating a probability of motion. Motion vectors which are most likely to describe the actual motion within the image can be calculated using motion estimation. With these motion vectors, it can be possible to predict images of following frames. The estimated motion vectors can also be used for de-interlacing interlaced images.
Candidate motion vectors can be a set of possible vectors describing possible motion of pixels or blocks of pixels. The set of candidate motion vectors can be used to determine the one estimated motion vector that best suits the actual motion within the image. For example, high-quality video format conversion algorithms, such as de-interlacing and temporal up-conversion, as well as computer vision applications and video compression, may require motion estimation. The aperture problem in motion estimation arises from the absence of additional knowledge about the nature of the motion in the scene. For signals containing a single orientation, i.e. an edge, the two-dimensional motion components are not determined, or are only determined up to a constant in the direction tangential to the edge.
This uncertainty may lead to multiple minima in a cost function and to degenerate motion estimation. The degeneracy may lead to erroneous motion vectors, which may in turn cause artefacts in the video format conversion. According to one embodiment, this problem is solved by imposing multiple error functions with different constraints related to the nature of motion. Such constraints can be intensity conservation or spatial coherence. The constraints are preferably chosen such that each separately leads to orthogonal solutions.
Applying a non-linear combination of multiple error functions that minimizes the most expensive constraint over a set of candidate motion vectors may lead to optimized motion estimation. The most expensive cost function is the one taking the maximum value. By minimizing the maximum of all the cost functions, the most expensive cost function is minimized.
According to embodiments, the at least two candidate motion vectors describe possible displacements of a pixel within a search area. Such displacements can be in the x- and y-directions. The vectors can describe the direction of motion by their x- and y-components. The speed of motion can be described by the magnitude (absolute value) of the vectors.
According to embodiments, the at least two candidate motion vectors are created using spatial and/or temporal prediction. For example, in scanned images providing scanned image lines, causality prohibits the use of spatial prediction from blocks of the image that have not yet been transmitted. Temporal prediction can be used instead.
The error criteria can be at least one of a summed absolute difference criterion, a mean square error criterion, a normalized cross-correlation criterion, or a number-of-significant-pixels criterion. These error criteria can be understood as constraints. The non-linear function can be the maximum of the error criteria:
$$\varepsilon(\vec{C},\vec{X},n) = \max_{m} \{\varepsilon_m(\vec{C},\vec{X},n)\}$$
with $\varepsilon_m(\vec{C},\vec{X},n)$ the m-th error criterion (m ≥ 1) of candidate vector C at position X.
In this way, the most expensive constraint is minimized over the set of candidate motion vectors. The non-linear function can also be the median of the error criteria. Further, at least one of the error criteria can be calculated from absolute differences of interpolated pixels. At least one of the error criteria can be calculated from absolute differences between an interpolated pixel and an intra-field interpolated pixel. At least one of the error criteria can also be calculated from an absolute difference between a pixel from a current frame or field and a motion-compensated interpolated pixel from a previous or following de-interlaced frame or field.
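A minimal sketch of choosing the candidate that minimizes a non-linear function (maximum or median) of several error criteria, assuming the criteria have already been computed per candidate. The name `choose_vector` and the error values are illustrative; passing `sum` instead gives the linear combination for comparison.

```python
import statistics


def choose_vector(errors, combine=max):
    """errors maps each candidate vector to its criteria [eps_1, ..., eps_m].
    Returns the candidate that minimizes combine(criteria); ties keep the
    first candidate in insertion order."""
    return min(errors, key=lambda c: combine(errors[c]))


errors = {(0, 0): [0, 7], (1, 0): [4, 4], (2, 0): [8, 1]}
print(choose_vector(errors, sum))  # (0, 0): linear total 7 is smallest
print(choose_vector(errors, max))  # (1, 0): worst criterion 4 is smallest

three = {(0, 0): [0, 0, 9], (1, 0): [3, 3, 3]}
print(choose_vector(three, statistics.median))  # (0, 0): median 0 < 3
print(choose_vector(three, max))                # (1, 0): max 3 < 9
```

The two combinations behave differently: the median ignores a single outlying criterion, whereas the maximum rejects any candidate that badly violates even one constraint.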
To improve motion estimation for interlaced signals, pixel values can be calculated from an interlaced signal using a generalized sampling theorem. Another aspect of the invention is a computer program for determining estimated motion vectors within image signals, the program comprising instructions operable to cause a processor to create at least two candidate motion vectors for at least one pixel within an image of the signal, calculate at least two error criteria for each of said candidate motion vectors, and choose the candidate motion vector that minimizes a non-linear function of the error criteria as the estimated motion vector for the at least one pixel.
A further aspect is a computer program product for determining estimated motion vectors within image signals with a program tangibly stored thereon with instructions operable to cause a processor to create at least two candidate motion vectors for at least one pixel within an image of the signal, calculate for each of said candidate motion vectors at least two error criteria, and choose the candidate motion vector that minimizes a non-linear function of the error criteria as the estimated motion vector for the at least one pixel.
These and other aspects of the invention will become apparent from and be elucidated with reference to the following embodiments.
The drawings show:
Fig. 1 an illustration of block matching;
Figs. 2a-b an illustration of a candidate set of vectors of a recursive search block-matcher;
Fig. 3 an illustration of a block matching on a sequence with a single orientation;
Fig. 4 an illustration of orthogonal solution subspaces that can lead to unique solution;
Fig. 5 an illustration of multiple criteria cost functions.
The block-matcher depicted in Fig. 1 has been described above. A block 4 in the current image n and a test block 10 within the search area 6 in the previous image n-1 are connected by candidate vector C 8. A correlation measure, the match error between the two blocks 4, 10, may be optimized to identify the best candidate vector C 8. To this end, different test blocks 10 using different candidate vectors C 8 may be tested, and the match error may be minimized to find the best-matching candidate vector. Searching for the minimum of a match criterion in a block-matcher is a two-dimensional optimization problem for which many solutions are available. Possible implementations include a three-step block-matcher, a 2D logarithmic or cross search method, or one-at-a-time-search block matching. Different block-matching strategies are disclosed in G. de Haan, "Progress in Motion Estimation for Consumer Video Format Conversion", IEEE Transactions on Consumer Electronics, vol. 46, no. 3, August 2000, pp. 449-459.
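As one example of the faster strategies listed above, a classic three-step search can be sketched as below. It assumes the match error is available as a function `err(vector)` (for instance the SAD of a block); the smooth `toy_error` surface is a hypothetical stand-in used only to show convergence.

```python
def three_step_search(err, start=(0, 0), step=4):
    """Classic three-step search: test the 3x3 pattern at distance `step`
    around the current best vector, move to the best one, halve the step.
    With step=4 it covers displacements up to +/-7. It can get stuck in a
    local minimum, unlike a full search."""
    best = start
    best_err = err(best)
    while step >= 1:
        cx, cy = best  # centre of this step's pattern
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                c = (cx + dx, cy + dy)
                e = err(c)
                if e < best_err:
                    best, best_err = c, e
        step //= 2
    return best


# Usage with a hypothetical, smooth match-error surface whose minimum is at (5, -3).
def toy_error(c):
    return (c[0] - 5) ** 2 + (c[1] + 3) ** 2


print(three_step_search(toy_error))  # (5, -3)
```

Here at most 1 + 3 x 9 = 28 error evaluations are made, compared with 15 x 15 = 225 for a full search over the same +/-7 range.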
One possible implementation of an optimization strategy is a 3D recursive search block-matcher (3D RS). 3D RS exploits the fact that, for objects larger than blocks, the best candidate vector is likely to occur in the spatial neighbourhood of a pixel or block. As depicted in Fig. 2a, assuming a scanning direction from left to right and from top to bottom, causality prohibits the use of spatial prediction vectors Ds to the right of and below the current block Dc 4a. Instead, temporal prediction vectors Dt 4c need to be used. In relation to a current block Dc 4a within a search area 2, spatial prediction vectors Ds 4b and temporal prediction vectors Dt 4c are available. As only blocks that have already been scanned may be used for spatial prediction of the current block Dc 4a, spatial prediction is only possible with the blocks Ds 4b. Temporal prediction is possible with the blocks Dt 4c, since information about the blocks Dt 4c may be available from a previous temporal instance of search area 2. Fig. 2b shows the use of two spatial prediction vectors Ds 4b and one temporal prediction vector Dt 4c to predict a current block 4a.
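The causality rule can be sketched as follows: for a raster scan (left to right, top to bottom), a neighbouring block's vector is taken from the current vector field if that block was already scanned, and otherwise from the previous vector field. Vector fields are assumed to be dicts keyed by block position, missing entries default to the zero vector, and the offsets in the usage (two spatial, one temporal, loosely after Fig. 2b) are a hypothetical choice.

```python
def already_scanned(bx, by, cur_bx, cur_by):
    """Raster scan (left-to-right, top-to-bottom): True if block (bx, by)
    was processed before the current block (cur_bx, cur_by)."""
    return by < cur_by or (by == cur_by and bx < cur_bx)


def predictor_candidates(cur_field, prev_field, cur_bx, cur_by, offsets):
    """cur_field / prev_field: dict (bx, by) -> vector. Each relative offset
    yields a spatial prediction if that block is already scanned, otherwise
    a temporal prediction from the previous vector field. Unknown blocks
    give the zero vector."""
    cands = []
    for ox, oy in offsets:
        bx, by = cur_bx + ox, cur_by + oy
        if already_scanned(bx, by, cur_bx, cur_by):
            cands.append(cur_field.get((bx, by), (0, 0)))
        else:
            cands.append(prev_field.get((bx, by), (0, 0)))
    return cands


cur_field = {(0, 0): (1, 0), (1, 0): (2, 0), (2, 0): (3, 0), (0, 1): (4, 0)}
prev_field = {(2, 2): (9, 9)}
# Hypothetical offsets: top-left and top-right (spatial), bottom-right (temporal).
print(predictor_candidates(cur_field, prev_field, 1, 1, [(-1, -1), (1, -1), (1, 1)]))
# [(1, 0), (3, 0), (9, 9)]
```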
It has been found that evaluating all possible vectors within the search range is unnecessary. It may already be sufficient to evaluate vectors taken from spatially neighbouring blocks, such as:
where CS^max is defined as the set of candidate vectors C describing all possible displacements (integer, or non-integer on the pixel grid) with respect to X within the search area SA(X) in the previous image as
$$CS^{\max} = \{\vec{C} \mid -N \le C_x \le +N,\ -M \le C_y \le +M\},$$
where N and M are constants limiting SA(X). To reduce calculation overhead, it may be sufficient to evaluate only vectors C taken from the spatially neighbouring blocks CS. X and Y may define the block width and height, respectively. Causality and the need for pipelining in the implementation mean that not all neighbouring blocks are available, and at initialization all vectors may be zero.
To account for the availability of the vectors, those vectors that have not yet been calculated in the current image may be taken from the corresponding location in the previous vector field. Fig. 2a illustrates the relative position of the current block Dc 4a and the block from which the result vectors are taken as candidate vectors Ds 4b, Dt 4c. In case the blocks are scanned from top left to bottom right, the candidate set may be defined as
$$CS(\vec{X},n) = \{\ \cdots\ \},\qquad k = -1, 0, 1;\quad i = -1, 0, 1;\quad j = 0, 1$$
This candidate set CS implicitly assumes spatial and/or temporal consistency. The problem of zero vectors at initialization may be addressed by adding an update vector. One possible implementation that omits some spatio-temporal predictions from the candidate set is depicted in Fig. 2b, where the candidate set CS(X, n) may be defined by
where the update vectors U1(X, n) and U2(X, n) may be alternately available and taken from a limited, fixed integer or non-integer update set, such as
$$US_1(\vec{X},n) = \{\vec{0},\ \ldots,\ 2\vec{y}_u,\ -2\vec{y}_u,\ 3\vec{x}_u,\ -3\vec{x}_u,\ \ldots\}$$
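A simplified sketch of how update vectors extend the predictor candidates. It assumes two spatial predictors and a list of temporal predictors; U1 and U2 are picked cyclically from a hypothetical small update set `UPDATES`, and, unlike the alternating scheme in the text, this sketch offers both updates for every block. Duplicate vectors are dropped.

```python
def add(v, u):
    """Component-wise sum of two 2D vectors."""
    return (v[0] + u[0], v[1] + u[1])


def candidate_set(spatial, temporal, block_index, update_set):
    """Hypothetical 3D-RS-style candidate set: two spatial predictors, each
    also offered with an update vector, plus the temporal predictors.
    U1 and U2 are picked cyclically from `update_set`, so successive blocks
    try different updates. Duplicates are dropped, order is preserved."""
    u1 = update_set[block_index % len(update_set)]
    u2 = update_set[(block_index + 1) % len(update_set)]
    raw = [spatial[0], spatial[1], add(spatial[0], u1), add(spatial[1], u2), *temporal]
    unique = []
    for c in raw:
        if c not in unique:
            unique.append(c)
    return unique


UPDATES = [(0, 1), (1, 0), (-1, 0)]  # hypothetical small update set
# At initialization all predictors are zero; only the updates add new vectors.
print(candidate_set([(0, 0), (0, 0)], [(0, 0)], 1, UPDATES))  # [(0, 0), (1, 0), (-1, 0)]
```

This shows why the update vector resolves the zero-vector initialization: without it, every candidate would stay (0, 0) and the recursion could never leave it.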
A model capable of describing more complex object motion than pure translation, for instance rotation or scaling, may segment the image into individual objects and estimate a set of motion parameters for each of these objects. As the number of blocks usually exceeds the number of objects by more than an order of magnitude, the number of motion parameters that need to be calculated per image is reduced. However, the calculation complexity increases.
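A sketch of how one parameter set per object yields a vector for every block of that object. The 4-parameter model (translation plus horizontal and vertical scaling about the image origin) is a hypothetical choice for illustration; it does not cover rotation, and the text does not specify a model.

```python
def block_vector(params, x, y):
    """Hypothetical 4-parameter model: translation (tx, ty) plus horizontal
    and vertical scaling (sx, sy) about the image origin."""
    tx, ty, sx, sy = params
    return (tx + sx * x, ty + sy * y)


def vector_field(objects, blocks):
    """objects: label -> params; blocks: block id -> (label, x, y), where
    (x, y) is the block centre. Returns block id -> motion vector."""
    return {b: block_vector(objects[label], x, y)
            for b, (label, x, y) in blocks.items()}


objects = {"background": (1, 0, 0, 0), "zooming": (0, 0, 0.5, 0.5)}
blocks = {(0, 0): ("background", 4, 4), (1, 0): ("zooming", 12, 4)}
print(vector_field(objects, blocks))  # {(0, 0): (1, 0), (1, 0): (6.0, 2.0)}
```

With, say, two objects and four parameters each, only 8 values are estimated per image, while a block-based field needs two vector components for every block.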
According to embodiments, a pixel block object may be determined, which may be referred to as a group of pixels. A motion parameter, for example a motion vector, may be determined for each group of pixels. Candidate vectors may be tested by calculating the summed absolute difference SAD between the luminance values of the group of pixels in the current image and the corresponding motion-compensated luminance values in a second, temporally neighbouring image. Two temporal instances may be used to estimate the motion parameter sets of a local group of pixels. Determining the estimated motion vector from the candidate motion vectors using the summed absolute difference criterion, or any other single criterion as described above, does not fully determine the two-dimensional motion parameters. As illustrated schematically in Fig. 3, video content can comprise a single edge 12. The motion in the image can be the motion of the single edge 12. Motion estimation for a single block 4 can be done using candidate motion vectors 8a-8c. The test blocks 10a-10c for the candidate motion vectors 8a-8c result in the same value of the cost function. As a result, a single cost function does not fully determine the two-dimensional components of the motion, but only the component perpendicular to the edge. A single cost function is degenerate, determining the motion only up to a constant in the direction tangential to the edge. This problem may also be called the aperture problem in motion estimation.
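The situation of Fig. 3 can be reproduced numerically with a synthetic frame containing a single vertical edge that moves one pixel to the right. The frame size, block position and candidate list are illustrative assumptions; the point is that all candidates with the correct horizontal component give the same SAD, whatever their vertical component.

```python
def edge_frame(width, height, edge_x):
    """Frame with a single vertical edge: 0 left of edge_x, 255 from edge_x on."""
    return [[255 if x >= edge_x else 0 for x in range(width)] for _ in range(height)]


def sad(cur, prev, bx, by, size, c):
    """Sum over the block of |F(x, n) - F(x - C, n - 1)|."""
    cx, cy = c
    return sum(abs(cur[y][x] - prev[y - cy][x - cx])
               for y in range(by, by + size) for x in range(bx, bx + size))


prev = edge_frame(16, 16, edge_x=8)
cur = edge_frame(16, 16, edge_x=9)  # the edge moved one pixel to the right

for c in [(1, -1), (1, 0), (1, 1), (0, 0)]:
    print(c, sad(cur, prev, 6, 6, 4, c))
# (1, -1) 0, (1, 0) 0, (1, 1) 0, (0, 0) 1020
```

The SAD pins down the component perpendicular to the edge (x = 1) but leaves the tangential component free, which is exactly the degeneracy described above.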
To overcome this problem, different cost functions can be solved simultaneously. Such cost functions can be boundary conditions or constraints related to the nature of motion in video scenes. One important condition for removing the degeneracy in determining the motion vector is that each cost function leads to a reciprocally orthogonal solution subspace.
Fig. 4 represents such orthogonal subspaces of the candidate motion vector set CS 18, minimizing two different cost functions ε1 and ε2. Cost function ε1 can be characterized by candidate vectors 18a-18e. Cost function ε2 can be represented by candidate vectors 19a-19d. Each subspace is characterized by the fact that all its elements have equal and well-defined motion components in the direction perpendicular to the edge. Imposing both constraints simultaneously may lead to a unique solution if the solution subspaces of these two constraints are orthogonal to each other. A unique solution may be given by candidate vectors 18c, 19c.
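The effect of orthogonal solution subspaces can be illustrated with two hypothetical degenerate constraints on a small candidate grid: one only fixes the horizontal component, the other only the vertical one. Each alone has a whole line of minimizers; their intersection is a single vector.

```python
def minimizers(cost, candidates):
    """All candidates at which `cost` reaches its minimum (its solution subspace)."""
    best = min(cost(c) for c in candidates)
    return {c for c in candidates if cost(c) == best}


# Hypothetical degenerate constraints on a 7x7 candidate grid:
def eps1(c):
    return abs(c[0] - 2)  # fixes only the horizontal component


def eps2(c):
    return abs(c[1] - 1)  # fixes only the vertical component


candidates = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
s1 = minimizers(eps1, candidates)
s2 = minimizers(eps2, candidates)
print(len(s1), len(s2))  # 7 7: each constraint alone is degenerate
print(s1 & s2)           # {(2, 1)}: imposing both gives a unique vector
```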
In general, a set of m constraints can be imposed simultaneously on the set of vector candidates. These constraints can be:
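$$\varepsilon_1(\vec{C},\vec{X},n) = \sum_{\vec{x} \in B(\vec{X})} \mathrm{Cost}_1\big(F(\vec{x},n), F(\vec{x}-\vec{C},n-p)\big)$$
$$\vdots$$
$$\varepsilon_m(\vec{C},\vec{X},n) = \sum_{\vec{x} \in B(\vec{X})} \mathrm{Cost}_m\big(F(\vec{x},n), F(\vec{x}-\vec{C},n-p)\big)$$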
If all the constraints are fulfilled, i.e. have absolute minima at some common value $\vec{C}_{\min}$, then the total cost function
$$\varepsilon(\vec{C}_{\min},\vec{X},n) = \sum_{m} \lambda_m \varepsilon_m(\vec{C}_{\min},\vec{X},n)$$
has a minimum, where λm > 0 . However, if the total cost function is minimal, the individual constraints are not necessarily fulfilled. This may lead to artefacts in motion estimation.
To overcome this problem, the application makes the overall cost function more robust. This may be done by a non-linear combination of the cost functions, such as
$$\varepsilon(\vec{C},\vec{X},n) = \max_{m} \{\varepsilon_m(\vec{C},\vec{X},n)\}$$
This non-linear combination may be less susceptible to errors. Indeed, the value $\vec{C}^{k}_{\min}$ can become the minimum of the total cost function only if the remaining constraints are also close to a local or absolute minimum there. The following inequality holds for the above non-linear combination
Fig. 5 illustrates the effect of the individual cost functions ε1, ε2 on the total cost when it is calculated in different ways. Slopes 20a, 20b show the individual cost functions. Slope 20a has two local minima and one absolute minimum. Cost function 20b has one absolute minimum.
With a linear combination of the cost functions 20, the total cost function 22, (ε1 + ε2)/2, has its minimum at the absolute minimum of one of the cost functions. The illustrated cost functions represent the variation of two arbitrary constraints ε1 and ε2 over a candidate vector set. The linear combination falls into one of the minima of constraint ε2. The non-linear combination of the cost functions, MAX(ε1, ε2), is illustrated as slope 24. The minimum of this total cost function is forced to take a value close to the common local minimum of both individual constraints. This example shows that the minimum of the non-linear total cost function lies closer to the minima of each of the constraints than that of the linear combination.
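The qualitative behaviour of Fig. 5 can be reproduced with two sampled cost curves over nine candidate vectors. The values are illustrative assumptions, not read off the figure: `EPS1` has a deep minimum at index 2 and a shallower one at index 6, while `EPS2` has a single minimum at index 6.

```python
# Illustrative values only (not read off Fig. 5), one entry per candidate vector.
EPS1 = [12, 9, 0, 9, 12, 10, 5, 10, 12]  # deep minimum at 2, shallower one at 6
EPS2 = [12, 10, 8, 7, 6, 5, 4, 6, 8]     # single minimum at 6


def argmin(values):
    """Index of the smallest value (first one on ties)."""
    return min(range(len(values)), key=values.__getitem__)


linear = [(a + b) / 2 for a, b in zip(EPS1, EPS2)]
nonlinear = [max(a, b) for a, b in zip(EPS1, EPS2)]
print(argmin(linear))     # 2: EPS1's deep minimum, where EPS2 is still 8
print(argmin(nonlinear))  # 6: where both constraints are low
```

The averaged cost (4.0 at index 2 versus 4.5 at index 6) is pulled into the deep minimum of one constraint, whereas the maximum (8 at index 2 versus 5 at index 6) favours the candidate that keeps both constraints small.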
Applying the non-linear combination of constraints to interlaced video material raises the problem that, in interlaced material, pixels are not always available, depending on the interlacing phase. If a pixel is not available, the missing pixel at position x can, for example, be assigned the luminance value calculated at that position by a de-interlacing algorithm. This pixel value can be reconstructed using a generalized sampling theorem (GST) interpolation filter, which may use samples either from fields n and n-1 or from fields n and n+1. With $F_{GST}(\vec{x},\vec{C},n)$ the GST output using fields n and n ± 1, a squared absolute difference error function can be defined as
This first constraint alone is not sufficient to build a robust total cost function. To avoid erroneous motion vectors occurring at displacements of an even number of pixels between two successive fields, a second constraint can be imposed. This second constraint can make use of the already de-interlaced previous frame n-1, which allows a motion-compensated bilinear interpolation to be performed to estimate existing pixel values in the current field. The output of the bilinear interpolator is given by $F^{n}_{n+1}(\vec{x},n)$, which allows a second error function to be built as
$$\varepsilon_2(\vec{C},\vec{X},n) = \ldots$$
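The motion-compensated bilinear interpolation used for the second constraint can be sketched as follows. This assumes, for illustration only, that the second error function is a summed absolute difference over the lines that exist in the current field (rows with `y % 2 == parity`); the names `bilinear` and `eps2`, the parity convention and the border clamping are hypothetical choices, not the source's definitions.

```python
import math


def bilinear(frame, x, y):
    """Bilinear interpolation of `frame` (list of rows) at non-integer (x, y);
    neighbour indices are clamped to the frame border."""
    h, w = len(frame), len(frame[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(xx, yy):
        return frame[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]

    top = (1 - fx) * px(x0, y0) + fx * px(x0 + 1, y0)
    bottom = (1 - fx) * px(x0, y0 + 1) + fx * px(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


def eps2(cur, prev_frame, bx, by, size, c, parity):
    """Hypothetical second error function: summed absolute difference between
    the existing pixels of the current field (rows with y % 2 == parity) and
    the motion-compensated bilinear interpolation of the de-interlaced
    previous frame at x - C."""
    cx, cy = c
    total = 0.0
    for y in range(by, by + size):
        if y % 2 != parity:
            continue  # this line is missing in the current field
        for x in range(bx, bx + size):
            total += abs(cur[y][x] - bilinear(prev_frame, x - cx, y - cy))
    return total


# Usage: a ramp image that moved by half a pixel to the right.
prev_frame = [[10 * x + y for x in range(16)] for y in range(16)]
cur = [[10 * (x - 0.5) + y for x in range(16)] for y in range(16)]
print(eps2(cur, prev_frame, 2, 2, 4, (0.5, 0), parity=0))  # 0.0
print(eps2(cur, prev_frame, 2, 2, 4, (0, 0), parity=0))    # 40.0
```

Because the previous frame is already de-interlaced, the interpolation can be evaluated at any sub-pixel position, so this criterion is not blind to displacements of an even number of pixels.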
This second constraint alone again does not yield a robust total cost function. A linear combination of these error functions merely lets the motion estimate alternate between the solutions of the individual criteria. However, applying the non-linear combination of the error functions, as provided by the current application, results in more robust motion estimation.
The application provides a robust solution for motion estimation that can be applied to different video format conversion algorithms. Motion estimation by means of a non-linear combination of different motion estimation criteria leading to orthogonal solutions is proposed.