EP1790169A1

EP1790169A1 - Method for estimating motion using deformable meshes

Info

Publication number: EP1790169A1
Application number: EP05805589A
Authority: EP
Inventors: Nathalie Cammas; Stéphane PATEUX; Nathalie Laurent-Chatenet
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2004-09-15
Filing date: 2005-09-06
Publication date: 2007-05-30
Also published as: CN101036390A; CN101036390B; WO2006030103A1; JP4870081B2; JP2008514073A; WO2006030103A8; US8761250B2; US20070291845A1

Abstract

The invention concerns a method which consists in analyzing a field of motion of images, estimated by using a first mesh, to detect a faulty area in the first mesh, and in locating a rupture line in said area; then generating a second mesh including a faultless part consisting of meshes of the first mesh outside the faulty area and two sub-meshes which overlap in a region including the rupture line. Each of the two sub-meshes includes respective meshes delimited by nodes including nodes shared with the faultless part, located at the boundary of the faulty area, and additional nodes not belonging to the faultless part, the rupture line being located between the respective nodes of the two sub-meshes shared with the faultless part. Said second mesh is used to finally estimate the field of motion in the group of images concerned.

Description

METHOD FOR ESTIMATING MOTION IN SEQUENCES OF MOVING IMAGES USING DEFORMABLE MESHES, AND VIDEO ENCODER AND DECODER IMPLEMENTING THE METHOD

The present invention relates to digital processing of moving images, and more particularly to motion estimation techniques between successive images of a sequence.

Most video coding schemes (notably MPEG-1, MPEG-2, MPEG-4 and ITU-T H.26x) represent motion by translations applied to a block partition of the images. This motion model causes many problems. It is largely responsible for the blocking effect often visible on decoding with current video coding schemes, and it is poorly suited to certain types of motion (zoom, rotation, etc.).

Other modes of representation of the movement have been proposed in order to overcome these defects. These include active meshes.

In this representation mode, the motion is represented by a set of values defined at the nodes of a mesh positioned on an image. An interpolation technique is used to deduce a motion vector at any point of the image from the values stored at the nodes of this mesh. Typically, this can be a Lagrange-type interpolation, that is to say that the motion vector assigned to a point of the image is an affine function of the vectors calculated for the neighboring nodes.

It is thus possible to replace the motion compensation mode of an MPEG or other type of video encoder with a mesh-based motion compensation mode. Meshes can also be used to decorrelate the motion and texture information of a video sequence in order to build an analysis-synthesis coding scheme.

These active meshes offer both richer motion models and the possibility of gaining coding efficiency through more efficient coding of motion information, especially when hierarchical meshes are used (see for example WO 00/14969).

Deformable meshes define a continuous representation of a motion field, while the actual motion of a video sequence is usually discontinuous. Thus, when different planes and objects overlap in a scene, occlusion and uncovering areas appear, generating lines of discontinuity.

Modeling such artifacts with a global mesh, as opposed to meshes segmented according to the video objects making up the scene, is a difficulty that cannot be solved without modifying the representation model. The challenge is to eliminate this visual degradation and this limitation in terms of analysis by determining the discontinuity zones.

Classically, this type of disturbance of the real field of movement results in reversals of meshes in its mesh representation.

A post-processing technique can be implemented to solve this problem. One such technique proceeds by a posteriori correction: it applies the motion vectors as computed, detects those that are at fault, and then corrects their values. Another proceeds iteratively, adding a portion of the expected displacement to the nodes at each iteration so that no reversal occurs, and continuing the iterations until the process converges.

The post-processing techniques act once the motion estimation has been performed. As a result, the result is suboptimal because the motion vectors are corrected independently of their contribution to the minimization of the prediction error.

An improvement consists in optimizing the motion field while taking non-reversal constraints into account during the optimization process. To do this, the motion estimation is adapted by adding to the quadratic prediction error an augmented Lagrangian that corrects the deformation of the meshes as their area approaches zero. This last technique makes it possible to determine the optimal solution, but only provided that this solution represents a continuous field. However, the nature of a video sequence is most often discontinuous.

Another technique, introduced in WO 01/43446, consists in identifying the discontinuity zones in order to restore them, by monitoring the appearance or disappearance of objects. A first motion estimation is made between two successive instants t₁ and t₂ without preventing mesh reversals. By identifying the reversals at the end of this first calculation using geometric criteria, the discontinuity zones are detected. The process then consists in performing a new motion estimation between t₁ and t₂, excluding from the optimization criterion the contributions of the faulty zones, that is, those containing at least one reversal, in order to minimize the prediction error between the two images considered. This reoptimization makes it possible to determine the optimal motion vectors for the continuous zone (admitting a bijection between t₁ and t₂) and thus to avoid the disturbance, caused by the discontinuity zones, of the motion vector values obtained in the previous optimization. The faulty areas are subject to a frequency or spatial approximation in image compression, and they are excluded from the video object tracking optimization method.

The various known techniques strive to make continuous a field of discontinuous movement, by imposing a movement calculated from the continuous areas in the discontinuous areas. This results in a false movement and a poor temporal prediction of the texture in the discontinuous zones, and therefore an additional cost of coding.

The technique that excludes discontinuous areas does not impose a motion in these areas and codes them differently. However, when there are many discontinuous areas, there are as many zones to be coded differently, which implies an overhead for encoding the headers of these streams. In addition, in the context of scalable coding, this technique is relatively expensive.

An object of the invention is to estimate the motion of a video sequence using a 2D mesh and to represent this motion in a discontinuous way in order to better represent the real motion field.

The invention thus proposes a motion estimation method in a sequence of animated digital images, comprising the following steps:

generating a first mesh, comprising meshes delimited by nodes, to be applied to a reference image of the sequence;

estimating a first displacement field in a group of images including the reference image, by assigning to each point of an image a displacement value calculated from values assigned to the nodes delimiting a mesh of the first mesh to which that point belongs;

detecting at least one discontinuity zone in the first mesh by analyzing the first displacement field, each discontinuity zone including at least one mesh fulfilling a mesh deformation criterion in the group of images;

in each detected discontinuity zone, determining at least one break line appearing in the group of images;

generating a second mesh to be applied to the reference image, comprising a regular part composed of meshes of the first mesh that do not belong to any discontinuity zone and, for at least one detected discontinuity zone, at least two sub-meshes overlapping in a region including the break line determined in said discontinuity zone, each of the two sub-meshes including respective meshes delimited by nodes including nodes shared with the regular part, located at the edge of the discontinuity zone, and additional nodes not belonging to the regular part, the break line being located between the respective nodes of the two sub-meshes shared with the regular part; and

estimating a second displacement field in the group of images, assigning to each point in a detected discontinuity zone a displacement value calculated from values assigned to nodes delimiting a selected mesh of the second mesh to which said point belongs, the selected mesh depending on the position of said point with respect to the break line determined in said discontinuity zone.

The method performs a global optimization to determine the motion. No a priori constraints are imposed on the criteria to be optimized, and the discontinuity zones frequently present in moving images are not excluded from the calculation. The motion estimation performed can therefore be optimal, even in the discontinuity zones, provided that the break lines are reliably identified. The estimated motion can then be used by a video coder. In this context, it allows a good prediction of the images of the sequence even in the discontinuity zones of the mesh and reduces the coding cost of the video sequence. Parameters representing the estimated motion are then transmitted to a decoder, or stored in memory for later decoding.

The motion estimation method is compatible with the use of hierarchical meshes, with which the displacement field estimates are made from the coarsest hierarchical level (1) to the finest hierarchical level (nivFin) of the meshes. The discontinuity zone is preferably detected as a connected set of meshes of the finest hierarchical level fulfilling the mesh deformation criterion. It is then defined at the higher hierarchical levels as being composed of at least one mesh including at least one mesh of the finest hierarchical level fulfilling the mesh deformation criterion.

Advantageously, the two sub-meshes of the second mesh are generated starting at level nivFin, and the meshes of the higher levels are then generated by progressively moving up the hierarchy. Moving up from a hierarchical level n to the immediately higher hierarchical level n-1 includes the following steps for each of the sub-meshes and for 1 < n ≤ nivFin:

/a/ integrating each mesh of said sub-mesh previously defined at level n into a new mesh of said sub-mesh generated at level n-1;

/b/ setting n' = n;

/c/ if said new mesh of level n'-1 cannot be completed with meshes of said sub-mesh already generated at level n', generating at level n' at least one new mesh of said sub-mesh to complete said new mesh of level n'-1; and

/d/ if n' < nivFin, increasing n' by one unit and repeating from step /c/.

In a preferred embodiment of the method, respective depth values are assigned to the nodes of the regular part and to the additional nodes of each sub-mesh of the second mesh. The value assigned to the additional nodes of a sub-mesh generated for a detected discontinuity zone is a function of the position of said sub-mesh with respect to the breaking line determined in said zone. The step of estimating the second displacement field comprises for each point of an image belonging to a mesh of the regular part of the second mesh and to at least one mesh of a sub-mesh, the calculation for each mesh including this point of a weighted sum of the depth values respectively assigned to the nodes delimiting said mesh, and the selection, for the assignment of a displacement value to said point, of the mesh for which the calculated weighted sum is maximum.

The use of these depth values makes it possible to account for the existence of several coexisting planes in the group of images. When there are more than two planes, the relative depth values will be communicated to the decoder for the motion synthesis.
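The depth-based selection described above can be illustrated with a minimal Python sketch. It assumes triangular meshes and uses barycentric weights for the weighted sum; the function names `barycentric_weights` and `select_mesh` and the layout of the `candidates` list are hypothetical and are not taken from the patent.

```python
def barycentric_weights(p, pi, pj, pk):
    """Barycentric coordinates of point p in the triangle (pi, pj, pk)."""
    x, y = p
    (xi, yi), (xj, yj), (xk, yk) = pi, pj, pk
    # Vector product associated with the triangle (twice its signed area).
    prod = xj * yk - xk * yj + xk * yi - xi * yk + xi * yj - xj * yi
    wi = (xj * yk - xk * yj + (yj - yk) * x - (xj - xk) * y) / prod
    wj = (xk * yi - xi * yk + (yk - yi) * x - (xk - xi) * y) / prod
    wk = (xi * yj - xj * yi + (yi - yj) * x - (xi - xj) * y) / prod
    return wi, wj, wk


def select_mesh(p, candidates):
    """Return the index of the candidate mesh with the largest weighted depth.

    candidates: list of (nodes, depths) pairs (hypothetical layout), where
    nodes holds the three (x, y) vertices of a triangle containing p and
    depths holds the depth values assigned to those vertices.
    """
    best_index, best_depth = None, None
    for index, (nodes, depths) in enumerate(candidates):
        weights = barycentric_weights(p, *nodes)
        depth = sum(w * z for w, z in zip(weights, depths))
        if best_depth is None or depth > best_depth:
            best_index, best_depth = index, depth
    return best_index


triangle = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
regular_part = (triangle, (0.0, 0.0, 0.0))   # background plane
sub_mesh = (triangle, (0.0, 1.0, 1.0))       # additional nodes lie in front
chosen = select_mesh((1.0, 1.0), [regular_part, sub_mesh])
```

In this toy case both candidates cover the same triangle, so only the depth values decide which mesh supplies the displacement of the point.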

Other aspects of the invention relate to a motion estimation device in an animated digital image sequence, comprising means adapted to the implementation of a method as defined above, as well as to a computer program to be installed in a moving image processing apparatus, including instructions for implementing the steps of a motion estimation method as defined above during execution of the program by a computing unit of said apparatus.

The invention also proposes a video encoder, comprising motion estimation means in a sequence of animated digital images and means for constructing an output stream including motion parameters produced by the motion estimation means, in which the motion estimation means are arranged to operate according to a method as defined above.

Still another aspect of the invention relates to a signal representative of a sequence of animated digital images, comprising a representation of motion parameters obtained by executing a method as defined above, as well as to a recording medium on which such a signal is recorded. The motion parameters include, for a group of images including a reference image:

First motion parameters indicating, in a first mesh to be applied to the reference image, meshes of which at least one discontinuity zone is composed in the group of images;

Second motion parameters for positioning at least one break line in each discontinuity zone; and

Third motion parameters describing displacement values assigned to the nodes of a second mesh to be applied to the reference image, the second mesh comprising a regular part composed of meshes of the first mesh that do not belong to any discontinuity zone and, for at least one discontinuity zone, at least two sub-meshes overlapping in a region including the break line positioned in said discontinuity zone, each of the two sub-meshes comprising respective meshes delimited by nodes including nodes shared with the regular part, located at the edge of the discontinuity zone, and additional nodes not belonging to the regular part, the break line being located between the respective nodes of the two sub-meshes shared with the regular part.

The motion parameters represented in the signal can be supplemented by parameters indicating depth values respectively assigned to the nodes of the regular part and to the additional nodes of each sub-mesh of the second mesh.

The invention also extends to the motion decoding side, carried out in a video decoder or other moving image processing apparatus.

The invention thus proposes a motion decoding method in a sequence of animated digital images, using image meshes comprising meshes delimited by nodes. This method comprises the following steps:

receiving an input stream including motion parameters as defined above;

generating the second mesh based on the first and second motion parameters; and

generating a displacement field in the group of images, assigning to each node of the second mesh displacement values obtained from the third motion parameters and assigning to each point situated in a detected discontinuity zone a displacement value calculated from the values assigned to the nodes delimiting a selected mesh of the second mesh to which said point belongs, the selected mesh depending on the position of said point relative to the breaking line determined in said discontinuity zone.

Other aspects of the invention relate to a motion decoding device in a sequence of animated digital images, comprising means adapted to the implementation of a motion decoding method as defined above, as well as a computer program to be installed in a moving image processing apparatus, comprising instructions for implementing the steps of a motion decoding method as defined above during execution of the program by a computing unit of said apparatus.

The invention also proposes a video decoder, comprising motion decoding means and synthesis means for constructing a sequence of animated digital images by taking into account a displacement field generated by the motion decoding means, which are arranged to operate according to a motion decoding method as defined above.

Other features and advantages of the present invention will become apparent in the following description of nonlimiting exemplary embodiments, with reference to the appended drawings, in which:

FIG. 1 is a diagram illustrating the hierarchical mesh of an image;

FIG. 2 is a diagram illustrating the phenomenon of cell reversal;

FIG. 3 is a flowchart of a motion estimation method according to the invention;

FIGS. 4 to 7 are diagrams illustrating a remeshing process used in one embodiment of the invention;

FIG. 8 is a diagram illustrating the definition of a discontinuity zone in higher levels of a hierarchical mesh once it has been determined at the finest level;

FIGS. 9a-d, 10a-d, 11a-d and 12a-c are diagrams illustrating the generation of the mesh in higher levels of a hierarchical mesh in one embodiment of the invention; and

FIGS. 13 and 14 are simplified block diagrams of a video encoder and a video decoder according to the invention.

We consider a sequence of digital images $I(x, y, t)$, where $x, y$ denote the coordinates of the pixels in the image field and $t$ the discretized time, assumed here to increase by 1 with each new image of the sequence.

The values $I(x, y, t)$ associated with the pixels are typically luminance values.

Motion estimation consists in generating, for each point with coordinates $(x, y)$ in the image $I(x, y, t)$, a displacement vector $D(x, y, t) = (d_x, d_y)$ making it possible to construct, from the image $I(x, y, t-1)$, a displaced image $I'(x, y, t) = I(x - d_x, y - d_y, t-1)$ which is a good approximation of $I(x, y, t)$.

The calculation is performed on an estimation support $\Omega$. It consists in determining the displacement field $D(x, y, t)$ which minimizes a functional $\Phi(t)$ of the form:

$$\Phi(t) = \sum_{(x, y) \in \Omega} \rho\big(I(x - d_x,\, y - d_y,\, t-1),\; I(x, y, t)\big) \qquad (1)$$

where $\rho(A, B)$ is a metric whose most common form is $\rho(A, B) = (A - B)^2$.
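As an illustration of functional (1) with the quadratic metric, the following Python sketch sums the squared differences between the motion-compensated previous image and the current image over the support. It is a simplified sketch: displacements are assumed to be integers (no sub-pixel interpolation), the support is assumed to stay inside the image after compensation, and the names `prediction_error`, `prev`, `cur`, `disp` and `support` are hypothetical.

```python
def prediction_error(prev, cur, disp, support):
    """Quadratic prediction error over the estimation support.

    prev, cur: images at times t-1 and t, as 2D lists indexed [y][x]
    disp: function (x, y) -> (dx, dy); integer displacements assumed here
    support: iterable of (x, y) points forming the estimation support
    """
    total = 0
    for x, y in support:
        dx, dy = disp(x, y)
        predicted = prev[y - dy][x - dx]      # displaced image value
        total += (predicted - cur[y][x]) ** 2
    return total


# One-row example: the current image is the previous one shifted right by 1.
prev = [[10, 20, 30, 40]]
cur = [[0, 10, 20, 30]]
support = [(1, 0), (2, 0), (3, 0)]

error_true_motion = prediction_error(prev, cur, lambda x, y: (1, 0), support)
error_no_motion = prediction_error(prev, cur, lambda x, y: (0, 0), support)
```

The displacement field that matches the true shift gives a zero error, whereas a null field leaves a residual, which is what the minimization exploits.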

Using a mesh on the images reduces the number of unknowns. Only the displacement vectors $D(x_i[t], y_i[t], t)$ of the points located at the nodes $i$ of the mesh are sought. Outside these nodes, the displacement field $D(x, y, t)$ is interpolated, for example according to an affine method:

$$D(x, y, t) = \sum_i w_i(x, y, t)\, D(x_i[t], y_i[t], t) \qquad (2)$$

where the weights $w_i(x, y, t)$ represent coordinates of the point $(x, y)$ expressed with respect to the position of the nodes $i$ in the image at time $t$.

A convenient mesh is the triangular mesh, in which each point $(x, y)$ is considered to belong to a triangle whose vertices are nodes $i, j, k$ of the mesh with respective coordinates $(x_i[t], y_i[t])$, $(x_j[t], y_j[t])$ and $(x_k[t], y_k[t])$ at time $t$. The interpolation weights associated with the point $(x, y)$ at time $t$ are its barycentric coordinates in the triangle, given by:

$$w_l(x, y, t) = 0 \quad \text{if } l \neq i, j, k \qquad (3)$$

$$w_i(x, y, t) = \frac{x_j[t]\,y_k[t] - x_k[t]\,y_j[t] + (y_j[t] - y_k[t])\,x - (x_j[t] - x_k[t])\,y}{\pi_{i,j,k}[t]} \qquad (4)$$

$$w_j(x, y, t) = \frac{x_k[t]\,y_i[t] - x_i[t]\,y_k[t] + (y_k[t] - y_i[t])\,x - (x_k[t] - x_i[t])\,y}{\pi_{i,j,k}[t]} \qquad (5)$$

$$w_k(x, y, t) = \frac{x_i[t]\,y_j[t] - x_j[t]\,y_i[t] + (y_i[t] - y_j[t])\,x - (x_i[t] - x_j[t])\,y}{\pi_{i,j,k}[t]} \qquad (6)$$

where $\pi_{i,j,k}[t] = x_j[t]\,y_k[t] - x_k[t]\,y_j[t] + x_k[t]\,y_i[t] - x_i[t]\,y_k[t] + x_i[t]\,y_j[t] - x_j[t]\,y_i[t]$ is a vector product associated with the triangle at time $t$.
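The barycentric weights and the affine interpolation (2) can be checked numerically with a short Python sketch. The function names are hypothetical, and the three weights are written as the cyclic permutations of one another that barycentric coordinates require (they sum to 1).

```python
def vector_product(pi, pj, pk):
    """Vector product associated with triangle (i, j, k): twice its signed area."""
    (xi, yi), (xj, yj), (xk, yk) = pi, pj, pk
    return xj * yk - xk * yj + xk * yi - xi * yk + xi * yj - xj * yi


def barycentric_weights(p, pi, pj, pk):
    """Weights (w_i, w_j, w_k) of point p in triangle (i, j, k)."""
    x, y = p
    (xi, yi), (xj, yj), (xk, yk) = pi, pj, pk
    prod = vector_product(pi, pj, pk)
    wi = (xj * yk - xk * yj + (yj - yk) * x - (xj - xk) * y) / prod
    wj = (xk * yi - xi * yk + (yk - yi) * x - (xk - xi) * y) / prod
    wk = (xi * yj - xj * yi + (yi - yj) * x - (xi - xj) * y) / prod
    return wi, wj, wk


def interpolate_displacement(p, nodes, node_displacements):
    """Affine interpolation of the displacement at p from the three nodes."""
    weights = barycentric_weights(p, *nodes)
    dx = sum(w * d[0] for w, d in zip(weights, node_displacements))
    dy = sum(w * d[1] for w, d in zip(weights, node_displacements))
    return dx, dy


nodes = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
weights = barycentric_weights((1.0, 1.0), *nodes)
displacement = interpolate_displacement(
    (1.0, 1.0), nodes, [(0.0, 0.0), (4.0, 0.0), (0.0, 8.0)]
)
```

At a vertex the corresponding weight equals 1 and the other two vanish, so the interpolated field reproduces the node displacements exactly.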

The calculation is conducted on a group of consecutive images of the sequence, typically of the order of a dozen images. The mesh is defined on the first image of the group ($t = 0$), usually as a network of equilateral triangles. The displacement vectors $D(x_i[1], y_i[1], 1)$ are estimated by minimizing the functional $\Phi(1)$, for example by applying a gradient descent method of the Gauss-Seidel type or the like. The positions of the nodes $i$ of the mesh at time 1 are deduced by the formula $(x_i[1], y_i[1]) = (x_i[0], y_i[0]) + D(x_i[1], y_i[1], 1)$. This process is repeated until the last image of the group ($t = 2, 3, 4$, etc.): estimation of the displacement vectors $D(x_i[t], y_i[t], t)$ by minimization of $\Phi(t)$, then calculation of the positions of the nodes of the mesh at time $t$: $(x_i[t], y_i[t]) = (x_i[t-1], y_i[t-1]) + D(x_i[t], y_i[t], t)$.
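The frame-by-frame tracking of the node positions can be sketched as follows in Python. The minimization itself is not reproduced: `estimate_displacements` is a hypothetical callback standing in for it, and `track_nodes` is an illustrative name.

```python
def track_nodes(initial_positions, estimate_displacements, num_images):
    """Track mesh nodes over a group of images.

    initial_positions: dict node -> (x, y) at t = 0
    estimate_displacements: hypothetical callback (positions, t) -> dict
        node -> (dx, dy), standing in for the minimization of the functional
    num_images: number of images in the group (t = 0 .. num_images - 1)
    Returns a list of position dicts, one per instant t.
    """
    positions = [dict(initial_positions)]
    for t in range(1, num_images):
        previous = positions[-1]
        disp = estimate_displacements(previous, t)
        # Position at t = position at t-1 + displacement estimated at t.
        positions.append({
            node: (x + disp[node][0], y + disp[node][1])
            for node, (x, y) in previous.items()
        })
    return positions


# Toy estimator: every node moves one pixel to the right at each instant.
def constant_shift(positions, t):
    return {node: (1.0, 0.0) for node in positions}


trajectory = track_nodes({0: (0.0, 0.0), 1: (4.0, 0.0)}, constant_shift, 3)
```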

The estimation of the movement is advantageously carried out using a hierarchical mesh, which, in a manner known per se, ensures a better convergence of the system. A certain fineness of mesh is necessary to represent faithfully the movement within the image. But in case of strong movement, the previous minimization technique may not converge if it is applied directly to a fine mesh. In addition, the use of very fine meshes can cause instability of the system due to too many parameters.

Figure 1 shows an example of hierarchical mesh. The hierarchical representation consists of several levels of representation. The lowest level 30 (level 0 in the figure) has a coarse field, with only three nodes to define the mesh. Going towards the finer levels 32, 33, 35, the field becomes denser and the number of nodes of the mesh grows. The quality of movement varies with the levels, the low level 30 representing the dominant movement of the scene, and the fine levels refining the dominant movement to represent the local movements. The number of levels of the hierarchical mesh is an adjustable parameter of the estimation phase, it can vary according to the sequence to be estimated.

In the hierarchical mesh motion estimation technique, several levels of hierarchical mesh are generated on the images. The motion is first estimated on the coarsest level 30, and the next level is then processed by starting the gradient descent from node displacement values deduced from those estimated at the previous level: the nodes common to the two levels receive initial displacement vectors equal to those which have just been estimated, and the nodes added at the finer level receive initial displacement vectors calculated by spatial interpolation. At the end of the iterations, it is the displacement vectors estimated at the finest level that are quantized to be transmitted to the decoder.
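The initialization of a finer level from a coarser one can be sketched in Python. The sketch assumes, as a hypothetical refinement rule not stated in the text, that each node added at the finer level sits at the midpoint of an edge of the coarser level; the affine interpolation at a midpoint is then the mean of the two endpoint vectors.

```python
def init_finer_level(coarse_displacements, new_nodes):
    """Initial displacement vectors for the next finer level.

    coarse_displacements: dict node -> (dx, dy) estimated at the coarser level
    new_nodes: dict new_node -> (a, b), the two coarse nodes whose edge
        midpoint carries the new node (assumed midpoint subdivision)
    """
    # Nodes common to both levels keep the vectors just estimated.
    fine = dict(coarse_displacements)
    # Added nodes are initialized by spatial interpolation.
    for node, (a, b) in new_nodes.items():
        ax, ay = coarse_displacements[a]
        bx, by = coarse_displacements[b]
        fine[node] = ((ax + bx) / 2.0, (ay + by) / 2.0)
    return fine


coarse = {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (0.0, 4.0)}
added = {3: (0, 1), 4: (1, 2), 5: (0, 2)}
fine = init_finer_level(coarse, added)
```

The gradient descent at the finer level would then start from `fine` rather than from zero vectors.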

The hierarchical mesh motion estimation technique is combinable with a multi-resolution estimation technique, in which we work on a pyramid of filtered and decimated images constructed from the original images. Motion estimation in a hierarchical mesh level is then performed based on sampled images at a suitable resolution level.

A general problem of mesh-based motion estimation techniques is that of mesh reversals. This problem is illustrated in FIG. 2, which shows the mesh of an image at two successive instants with, on the left part of the figure, an example of displacement vectors estimated between these two instants at the nodes i, j, k forming the vertices of an elementary triangle of the mesh. The reversal of this triangle results from the fact that the node k crosses the line passing through the nodes i and j. In general, the reversal of a triangle i, j, k corresponds to a change of sign of the vector product $\pi_{i,j,k}[t]$. Such artifacts greatly disrupt motion estimation. They are usually due to relative movements of objects in different planes of the filmed scene. The illustration in FIG. 2 is very simplified because only one triangle turns over (passing through a triangle of zero area). In practice, the overlaps most often occur on discontinuity zones having a certain extent in the image.

With a hierarchical mesh, the mesh reversals naturally have a greater probability of occurring in the fine levels than in the coarse levels.

To deal with the problem of mesh reversals, the invention tracks the discontinuity zones and the break lines that they contain. A remeshing of the image is done in the discontinuity zones, using multiple sub-meshes anchored to the initial mesh on both sides of the break lines. The multiple sub-meshes generated in a discontinuity zone extend beyond the break line, so that they overlap each other. They may even extend outside the discontinuity zone. To estimate the displacement of a point of the image located in the discontinuity zone using an interpolation formula such as (2), reference is made to the nodes of one of the sub-meshes, selected depending on the position of the point with respect to the break line or lines. Thus, the sub-meshes make it possible to account for different planes present in the sequence of images, their use depending on the objects which appear or disappear in the scene.
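One simple way to decide on which side of an oriented break line a point lies is sketched below in Python. It is an assumption for illustration, not the method of the patent: the break line is taken to be a polyline, the side is judged against the nearest segment, and "left" is meant in the mathematical sense with the y axis pointing up (the labels swap for image coordinates with y pointing down). The names `side_of_break_line` and `sub_meshes` are hypothetical.

```python
def side_of_break_line(p, line):
    """Return 'left' or 'right' for point p relative to an oriented polyline.

    line: list of at least two (x, y) vertices, in orientation order.
    Points lying exactly on the line are reported as 'right'.
    """
    best = None  # (squared distance to segment, cross product)
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        vx, vy = bx - ax, by - ay          # segment direction
        wx, wy = p[0] - ax, p[1] - ay      # from segment start to p
        length2 = vx * vx + vy * vy
        if length2 == 0:
            s = 0.0
        else:
            s = max(0.0, min(1.0, (wx * vx + wy * vy) / length2))
        cx, cy = ax + s * vx, ay + s * vy  # closest point on the segment
        dist2 = (p[0] - cx) ** 2 + (p[1] - cy) ** 2
        cross = vx * wy - vy * wx          # > 0 when p is to the left
        if best is None or dist2 < best[0]:
            best = (dist2, cross)
    return "left" if best[1] > 0 else "right"


break_line = [(0.0, 0.0), (0.0, 10.0)]
sub_meshes = {"left": "left sub-mesh", "right": "right sub-mesh"}
chosen = sub_meshes[side_of_break_line((-1.0, 5.0), break_line)]
```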

The invention makes it possible to manage the zones of motion discontinuity without treating them as faulty or rejecting them at coding time. When an overlap or an uncovering is detected, the principle is to cut the mesh locally where the discontinuity is created, and to transform the mesh into a so-called "non-manifold" mesh. A non-manifold mesh is a mesh whose edges can be shared by more than two meshes. It makes it possible to estimate the motion in the video sequence and to model a discontinuous motion field. One advantage is that the discontinuity zones can thus be taken into account at coding time in the same way as the continuous zones.

Figure 3 shows a flowchart of a motion estimation method according to the invention.

The first step 9 consists in defining the initial mesh on an image of a video sequence to be encoded. Then in step 10, we make a first estimation of the motion field in a group of T consecutive images. This estimation is performed conventionally using a preferably hierarchical mesh, for example according to the method explained above. During this calculation, some triangular meshes can turn over or deform too strongly.

Then, the method comprises a step 11 of detecting the discontinuity zones in the initial mesh.

The discontinuity zones each consist of a connected set of degenerate meshes defined at the finest hierarchical level.

They include at least the triangles that turn over during motion estimation 10. These triangles are easily detectable from the vector products $\pi_{i,j,k}[t]$ that were calculated in step 10 (for the interpolation of displacements in the functional to be minimized) relative to the different triangles of the mesh at the finest hierarchical level and at the successive instants $t = 0, 1, 2, \ldots, T-1, T$. The triangles can initially be oriented so that the vector products $\pi_{i,j,k}[t]$ are all positive. A mesh reversal then shows up as a negative vector product. The detection can be generalized to include in a discontinuity zone a triangular mesh i, j, k whose area (equal to half of the absolute value of the vector product $\pi_{i,j,k}[t]$) becomes close to zero, that is to say less than a predefined threshold, for at least one instant t. The detection of degenerate triangles, to be included in a discontinuity zone, may more generally comprise a study of the deformation of the triangles between the image at time 0 and the image at time T. If the deformation of a mesh exceeds a certain threshold, this mesh is considered degenerate.
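The reversal and near-zero-area tests can be sketched in Python as follows. The data layout (a list of node-index triples and one position dictionary per instant) and the function names are hypothetical; the threshold value is left to the caller.

```python
def vector_product(pi, pj, pk):
    """Vector product associated with triangle (i, j, k): twice its signed area."""
    (xi, yi), (xj, yj), (xk, yk) = pi, pj, pk
    return xj * yk - xk * yj + xk * yi - xi * yk + xi * yj - xj * yi


def degenerate_triangles(triangles, positions, area_threshold):
    """Indices of triangles that reverse or nearly collapse at some instant.

    triangles: list of (i, j, k) node triples, oriented so that the vector
        product is positive at t = 0
    positions: list over t of dicts node -> (x, y)
    area_threshold: area below which a triangle is considered degenerate
    """
    flagged = set()
    for index, (i, j, k) in enumerate(triangles):
        for pos in positions:
            prod = vector_product(pos[i], pos[j], pos[k])
            # Negative product: reversal. Small |product| / 2: tiny area.
            if prod < 0 or abs(prod) / 2.0 < area_threshold:
                flagged.add(index)
                break
    return flagged


triangles = [(0, 1, 2), (1, 3, 2)]
at_t0 = {0: (0.0, 0.0), 1: (4.0, 0.0), 2: (0.0, 4.0), 3: (4.0, 4.0)}
at_t1 = {0: (0.0, 0.0), 1: (4.0, 0.0), 2: (0.0, 4.0), 3: (1.0, 1.0)}
flagged = degenerate_triangles(triangles, [at_t0, at_t1], area_threshold=0.5)
```

In this example node 3 crosses the edge joining nodes 1 and 2 between the two instants, so the second triangle is flagged while the first is not.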

A connected set of degenerate meshes forms a discontinuity zone. This is the zone where a discontinuity appeared in the motion. It is defined at the finest hierarchical level, and the triangular meshes that compose it (or the nodes that border it) are part of the parameters that will be transmitted to the decoder. The contour of the discontinuity zone can also be represented by splines.
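Grouping degenerate triangles into zones amounts to a connected-component search, sketched below in Python. The sketch assumes that two triangles are connected when they share an edge, which the text does not specify; the function name and data layout are hypothetical.

```python
def discontinuity_zones(triangles, degenerate):
    """Group degenerate triangles into connected zones.

    triangles: list of (i, j, k) node triples
    degenerate: set of indices into triangles
    Returns a list of sets of triangle indices, one set per zone.
    Connectivity is assumed to mean "sharing an edge".
    """
    # Map each edge to the degenerate triangles that use it.
    edge_owners = {}
    for index in degenerate:
        i, j, k = triangles[index]
        for edge in ((i, j), (j, k), (k, i)):
            edge_owners.setdefault(frozenset(edge), []).append(index)

    adjacency = {index: set() for index in degenerate}
    for owners in edge_owners.values():
        for a in owners:
            for b in owners:
                if a != b:
                    adjacency[a].add(b)

    # Depth-first traversal to collect the connected components.
    zones, seen = [], set()
    for start in sorted(degenerate):
        if start in seen:
            continue
        stack, zone = [start], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            zone.add(current)
            stack.extend(adjacency[current] - seen)
        zones.append(zone)
    return zones


triangles = [(0, 1, 2), (1, 3, 2), (5, 6, 7)]
zones = discontinuity_zones(triangles, {0, 1, 2})
```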

If no discontinuity zone is detected in step 11 (test 12), the motion estimation method ends in step 20 where the motion parameters are output which will be quantized to be transmitted to the video decoder. In this case, these parameters are those obtained in step 10, to which is added an indicator signaling that no discontinuity zone has been detected (continuous movement).

If one or more discontinuity areas are detected in the group of images, a break line determination is first made in each detected discontinuity area (step 13).

A break line is positioned on the edge of an object that caused a discontinuity in the zone. The following considers the case of a single break line in a discontinuity zone. It will be observed that the process can be generalized to several break lines within the same zone.

The contour of the object is oriented to define an inner region (foreground region) and an outer region (background region). Several methods, known in themselves, are applicable to find this contour in step 13. If segmentation masks of the images of the sequence are already available, the contour is extracted from these masks. However, for most sequences, segmentation masks are not available.

In this case, the image can be pre-segmented by a "mean shift" technique such as that described by Dorin Comaniciu and Peter Meer in "Mean Shift: A Robust Approach Toward Feature Space Analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, May 2002, pp. 603-619. Then, a succession of morphological dilations and erosions makes it possible to eliminate the small segmented regions. The contour of the object is finally extracted from the segmented images.

The salient point detection technique can also be applied in step 13. The salient points are essentially positioned on the contours of the objects. Since the list of salient points does not define a complete contour, it is necessary to add a contour refinement step based on a chaining of these points. The salient points of an image I correspond to the pixels of I belonging to high frequency regions. To detect them, wavelet theory can be used. The wavelet transform is a multi-resolution representation of the image that allows it to be expressed at different resolutions 1/2, 1/4, etc. Thus, at each resolution level $2^j$ ($j \leq -1$), the wavelet transform represents the image Im of size $n \times m = 2^k \times 2^l$ ($k, l \in \mathbb{Z}$) as a group of images of size $2^{k+j} \times 2^{l+j}$, namely: a coarse image $A_{2^j}\mathrm{Im}$; an image $D^1_{2^j}\mathrm{Im}$ of details representing the vertical high frequencies, that is, the horizontal contours; an image $D^2_{2^j}\mathrm{Im}$ of details representing the horizontal high frequencies, that is, the vertical contours; and an image $D^3_{2^j}\mathrm{Im}$ of details representing the diagonal high frequencies, that is, the corners. Each of the three detail images is obtained from $A_{2^{j+1}}\mathrm{Im}$ by a filtering followed by a subsampling by a factor of two in each direction (with $A_{2^0}\mathrm{Im} = \mathrm{Im}$). In order to detect the salient points of an image, a wavelet base and a minimal resolution level $2^r$ ($r \leq -1$) are first chosen. Once the wavelet transformation is done, each of the three detail images $D^1_{2^r}\mathrm{Im}$, $D^2_{2^r}\mathrm{Im}$, $D^3_{2^r}\mathrm{Im}$ is scanned in order to build a tree of wavelet coefficients. This tree is based on the so-called "Zerotree" approach, known in the field of image coding. It makes it possible to set up a saliency map of size $2^{k+r} \times 2^{l+r}$ reflecting the importance of each wavelet coefficient at resolution $2^r$. Thus, a coefficient having a high saliency corresponds to a region of Im having high frequencies. Indeed, a wavelet coefficient of high modulus at resolution $2^r$ corresponds to a contour of the image $A_{2^{r+1}}\mathrm{Im}$ in a particular direction (horizontal, vertical or oblique). The Zerotree approach indicates that each of the wavelet coefficients at resolution $2^r$ corresponds to a spatial area of size $2^{-r} \times 2^{-r}$ in the image Im. From the constructed saliency map, it is thus possible to choose, among the $2^{-r} \times 2^{-r}$ pixels of Im, the pixel most representative of this area.
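To make the decomposition into one coarse image and three detail images concrete, the following Python sketch performs a single level of an averaging Haar transform. The choice of the Haar base, the normalization by 4, and the use of the sum of absolute detail coefficients as a saliency value are all assumptions made for illustration; the Zerotree accumulation across levels is not reproduced.

```python
def haar_level(image):
    """One level of an averaging Haar decomposition (assumed wavelet base).

    image: 2D list with even dimensions, indexed [row][column].
    Returns (A, D1, D2, D3), each half the size of the input in each direction.
    """
    rows, cols = len(image) // 2, len(image[0]) // 2
    A = [[0.0] * cols for _ in range(rows)]
    D1 = [[0.0] * cols for _ in range(rows)]
    D2 = [[0.0] * cols for _ in range(rows)]
    D3 = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            a, b = image[2 * r][2 * c], image[2 * r][2 * c + 1]
            e, d = image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1]
            A[r][c] = (a + b + e + d) / 4.0    # coarse image
            D1[r][c] = (a + b - e - d) / 4.0   # vertical high freq. (horizontal contours)
            D2[r][c] = (a - b + e - d) / 4.0   # horizontal high freq. (vertical contours)
            D3[r][c] = (a - b - e + d) / 4.0   # diagonal high frequencies
    return A, D1, D2, D3


def saliency_map(D1, D2, D3):
    """Simplified single-level saliency: sum of absolute detail coefficients."""
    return [
        [abs(D1[r][c]) + abs(D2[r][c]) + abs(D3[r][c]) for c in range(len(D1[0]))]
        for r in range(len(D1))
    ]


# 4x4 image with a horizontal contour between the first and second rows.
image = [
    [0, 0, 0, 0],
    [8, 8, 8, 8],
    [8, 8, 8, 8],
    [8, 8, 8, 8],
]
A, D1, D2, D3 = haar_level(image)
saliency = saliency_map(D1, D2, D3)
```

The horizontal contour shows up only in the first detail image, and only in the coefficients whose 2 x 2 support straddles it.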

Once these salient points have been determined in the discontinuity zone, they are connected together to provide a break line. For this, a known technique of point chaining, interpolation or polynomial approximation (Newton, splines, Chebyshev, least squares, etc.) can be used.
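As an illustration of the chaining option only, the sketch below orders a set of salient points into a polyline by greedy nearest-neighbour chaining. The function name and the choice of the starting point (the lexicographically smallest point) are assumptions made for the example; the text does not prescribe a particular chaining technique.

```python
import math


def chain_points(points):
    """Order 2-D points into a polyline by greedy nearest-neighbour chaining."""
    remaining = list(points)
    # Assumed convention: start from the lexicographically smallest point.
    current = min(remaining)
    remaining.remove(current)
    line = [current]
    while remaining:
        # Append the closest point not yet chained.
        current = min(remaining, key=lambda p: math.dist(p, current))
        remaining.remove(current)
        line.append(current)
    return line
```

The order of the resulting list also gives the break line an orientation, which the later steps rely on to distinguish its left and right sides.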

It should be noted that in step 13, the break lines are determined in each of the images of the group of images. The positions of these lines will be part of the movement parameters delivered in step 13 to be communicated to the decoder.

In step 14 of the method, the discontinuity areas that were detected in step 11 are subject to non-manifold remeshing. This remeshing is done first at the finest hierarchical level.

Figure 4 shows an example of a discontinuity zone Z, here composed of eight adjacent meshes of the initial triangular mesh. This mesh consists of equilateral triangles when it is defined on the first image of the group. Figure 4 also shows an oriented break line L, which was determined in zone Z in step 13.

The new mesh adopted in step 14 includes a regular part composed of the triangles of the initial mesh that do not belong to any discontinuity zone. In each discontinuity zone Z containing a break line L, two sub-meshes are generated, attached to the regular part along the edges of the discontinuity zone Z. Each of these two sub-meshes is assigned to one side of the break line L, and it includes the nodes of the initial mesh located on that side along the edges of the discontinuity zone Z.

The triangles drawn in broken lines in FIGS. 5 and 6 thus represent, respectively, two sub-meshes that can be generated in the discontinuity zone Z of FIG. 4. In this example, the nodes of the initial mesh denoted a, b, c, d, e, f in FIG. 4 belong to the "left" sub-mesh (i.e., attached to the initial mesh on the left side of the break line L, the left and right sides being defined relative to the orientation determined for the break line L) shown in FIG. 5, and the nodes of the initial mesh denoted a, f, g, h, i, j in FIG. 4 belong to the "right" sub-mesh shown in FIG. 6.

Some of the nodes of the initial mesh that border the zone of discontinuity are common to the two sub-meshes (here the nodes a and f).

In the example of FIG. 5, the left sub-mesh has eight new nodes a'-h' and sixteen new triangles (a,a',h'), (a,b,a'), (b,b',a'), (b,c,b'), (c,d,b'), (d,c',b'), (d,e,c'), (e,d',c'), (e,f,d'), (f,e',d'), (d',e',f'), (c',d',f'), (c',f',g'), (b',c',g'), (a',b',g') and (a',g',h').

In the example of Figure 6, the right sub-mesh has eight new nodes a"-h" and sixteen new triangles (a,h",a"), (j,a,a"), (i,j,a"), (i,a",b"), (i,b",c"), (h,i,c"), (h,c",d"), (g,h,d"), (f,g,d"), (f,d",e"), (c",e",d"), (c",f",e"), (b",f",c"), (b",g",f"), (a",g",b") and (a",h",g").

The additional nodes generated in the new sub-meshes may have positions in the first image that coincide with those of nodes of the initial mesh. They are shown offset in Figures 5 and 6 to make the drawing easier to read.

The nodes of the edges on the boundary of the discontinuity zone Z crossed by the break line L are border nodes, which can move only with the initial mesh. These border nodes can be of three types:

- left border nodes, serving as a basis for the left sub-mesh only; these are the nodes b, c, d and e in Figures 4-6;

- right border nodes, serving as a basis for the right sub-mesh only; these are the nodes g, h, i and j in Figures 4-6; and

- shared border nodes, serving as a basis for both sub-meshes; these are the nodes a and f in Figures 4-6.

When the break line L passes through a triangle having at least one border node as a vertex, this node is identified as a left or right border node depending on its position relative to the oriented line. For a triangle in which the break line L ends, for example, the nodes on the edge crossed by the line L can be identified as left and right border nodes and the third node as a shared border node (as in Figures 4-6). Another possibility is to extend the break line by extrapolation until it meets an edge of the triangle, and to identify the nodes on that edge as shared border nodes and the third node as a left or right border node according to its position relative to the oriented line.

In order to take into account any uncovered areas that may appear in the video sequence, the new meshes extend beyond the discontinuity zone Z, as shown in FIGS. 5 and 6. Meshes of the regular part and meshes of the sub-meshes then overlap.

To avoid conflicts during the reconstruction of an image, we use a "z-order" method at the nodes, inspired by the "z-buffer" whose use is well known in the field of the synthesis of three-dimensional images.

The adaptation to the non-manifold meshes used here is done by assigning to the new nodes of each sub-mesh a depth value z, positive or negative, associated with that sub-mesh.

The nodes of the initial mesh that are preserved receive the depth value z = 0. A value z > 0 generally corresponds to an object in the foreground, and a value z < 0 to an object in the background. The sign of z is given by the orientation of the break line L. The object in the foreground, whose contour corresponds to the break line, is positioned relative to the orientation of the line L (for example to the right of the line when moving in the direction of its orientation). Thus, in the case of FIGS. 4-7, the hatched portion in FIG. 7 belongs to the object whose contour is the break line L.

These z values at the nodes make it possible to calculate a value of z at each point of a mesh by an interpolation technique (affine, for example). When reconstructing a point that can be reconstructed by several meshes, a value z is calculated at this point for each of these meshes, and these values are compared so as to retain the mesh giving the largest value z. This favors objects in the foreground over the background.
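The selection rule can be sketched as follows, assuming triangular meshes and an affine interpolation of z through barycentric coordinates. The function names, the tuple layout used to describe a triangle and the tolerance are assumptions made for this example.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p in triangle (a, b, c)."""
    (x, y), (xa, ya), (xb, yb), (xc, yc) = p, a, b, c
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    wa = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    wb = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    return wa, wb, 1.0 - wa - wb


def select_mesh(p, triangles, eps=1e-9):
    """Among the triangles containing p, return the name of the one with the
    largest interpolated depth z, or None if no triangle contains p.

    `triangles` is a list of (name, (a, b, c), (za, zb, zc)) tuples.
    """
    best = None
    for name, vertices, z_nodes in triangles:
        weights = barycentric(p, *vertices)
        if min(weights) < -eps:
            continue  # p lies outside this triangle
        # Affine interpolation of the nodal depth values.
        z = sum(w * zn for w, zn in zip(weights, z_nodes))
        if best is None or z > best[0]:
            best = (z, name)
    return None if best is None else best[1]
```

For a point covered both by a triangle of the regular part (z = 0 at all nodes) and by a sub-mesh triangle whose additional nodes have z > 0, the sub-mesh triangle is retained, which matches the foreground-first behaviour described above.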

When several break lines appear in a discontinuity zone, there are more than two planes in the corresponding portion of the image sequence. The detection of the break lines makes it possible to position the different planes, which are assigned differentiated z values. The preceding method then makes it possible to select the relevant mesh for the reconstruction of each point of the image. The z values at the nodes are set so as to best reconstruct the image for which the meshes were introduced. This can be done using an Iterated Conditional Modes (ICM) algorithm that seeks to minimize the mean squared error between the initial image and the reconstructed image. When there are several break lines in a discontinuity zone, the z values determined for the corresponding sub-meshes are part of the motion parameters to be transmitted to the decoder.

Once the remeshing has been carried out at the finest level, the discontinuity represented by a break line L is propagated up to the higher levels until it disappears at a certain level. As long as the discontinuity exists at a mesh level, the discontinuity zone defined at that level is remeshed taking into account the remeshing of the lower level, so as to preserve the hierarchy of the mesh.

The propagation of the discontinuity up the hierarchy comprises two steps: determining the discontinuity zone for each level, and determining the constraints imposed on the nodes at the edge of the zone. Let nivFin be the finest level of the mesh, at which the remeshing was initially performed. If, for a level n less than or equal to nivFin, the discontinuity zone has been detected, the discontinuity zone at level n-1 is defined by the set of parent meshes of the meshes of the discontinuity zone of level n, as shown in Figure 8.
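The first of these two steps can be sketched as below, assuming that the hierarchy is available as one dictionary per level mapping each mesh to its parent mesh. The names `zones_per_level` and `parents` are hypothetical, and the sketch propagates the zone all the way to level 0 without modelling the level at which the break line disappears.

```python
def zones_per_level(finest_zone, parents, niv_fin):
    """Propagate a discontinuity zone from the finest level to coarser ones.

    `parents[n]` maps the id of a level-n mesh to the id of its parent mesh
    at level n-1 (assumed representation of the hierarchy).
    """
    zones = {niv_fin: set(finest_zone)}
    for n in range(niv_fin, 0, -1):
        # The zone at level n-1 is the set of parents of the level-n zone.
        zones[n - 1] = {parents[n][mesh] for mesh in zones[n]}
    return zones
```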

The constraint of the border nodes shared by the two sub-meshes is raised in the hierarchy, for example according to the following algorithm. For a node m constituting a shared border node at level n,

- if m has a parent node p at level n-1, p becomes a shared border node at level n-1;

- otherwise, let A be the edge through which the break line L enters the triangle of level n-1, and let q be the node of that triangle opposite this edge. The break line L is artificially extended to q, and this node q becomes a shared border node at level n-1.

The creation of a new mesh for a hierarchical level n-1, coarser than a level n that has already been remeshed, can fall into one of three cases:

1 / Figures 9a-d: the break line L passes entirely through the mesh ABC belonging to the discontinuity zone at level n-1.

The border nodes for the right side being C and B, a new node A' is created at level n-1. The new mesh A'BC of level n-1 takes as child meshes the meshes A'E'D', E'CF, D'FB and E'D'F of level n, where F is the midpoint of the edge BC, and D' and E' are the nodes that were created by remeshing the right side at level n. The mesh A'E'D' belongs to level n; even if it was not generated during the remeshing at level n, it must be generated for the higher level n-1. Similarly, for the remeshing of the left side, the border node is A, the nodes B', C' and F' having been created by the remeshing of level n-1. The new mesh AB'C' of level n-1 has as child meshes AED, EC'F', EDF' and DF'B' at level n, where D and E are respectively the midpoints of the edges AB and AC.

2 / Figures 10a-d: the break line L crosses the mesh ABC belonging to the discontinuity zone at level n-1, ending at node C.

This case is similar to case 1 /, except that node C becomes a shared border node. For the remeshing of the right side at level n-1, C and B (for example) are border nodes, and node A' is created. The new mesh A'BC of level n-1 has as child meshes the meshes A'E'D', E'CF, D'FB and E'D'F of level n, including the mesh A'E'D' added during remeshing at level n. For the remeshing of the left side, C and A are border nodes, and node B' is created. The new mesh AB'C of level n-1 has as child meshes AED, ECF', EF'D and DF'B' at level n.

3 / Figures 11a-d: the break line L does not completely cross the mesh ABC of level n-1.

At level n, it is artificially extended either to the node C opposite the edge EF through which it enters the mesh, which brings us back to the case of Figures 10a-d, or to an edge (EF in Figure 11a) opposite the incoming edge. E and F are then shared border nodes at level n. When the next higher hierarchical level of the mesh (n-1) is considered, the break line L is extended towards a node or an edge (a case similar to the one just described for the finer level n). In FIG. 11d, the contour has been extended to the node C. For the remeshing of the right side, C and B are border nodes, and the node A' is created. The mesh A'BC of level n-1 has as child meshes A'ED', ED'F, EFC and D'FB at level n. The mesh A'ED' belongs to level n; even if it was not generated during the remeshing at level n, it must be generated for the higher level n-1. For the remeshing of the left side, the border nodes are C and A, and the node B' is created. The mesh ACB' of level n-1 has as child meshes AED, ECF, EDF and DFB' at level n. Note that in this case, the mesh ECF of level n is shared by the meshes A'BC and ACB'.

When the break line is entirely contained in a mesh, it disappears at the higher level. The new nodes introduced at the current level for the creation of the new mesh are defined by their barycentric coordinates in the parent mesh of the higher level. These nodes thus have a global movement influenced by the nodes of the mesh of the higher level.
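A minimal sketch of this dependency, assuming 2-D node positions and a node stored as a triple of barycentric weights relative to its parent triangle (the function name is hypothetical):

```python
def node_from_parent(bary, parent_vertices):
    """Position of a node given its barycentric coordinates in a parent triangle."""
    wa, wb, wc = bary
    (xa, ya), (xb, yb), (xc, yc) = parent_vertices
    return (wa * xa + wb * xb + wc * xc, wa * ya + wb * yb + wc * yc)
```

For example, a node with weights (0.5, 0.5, 0) sits at the midpoint of the first edge of its parent mesh and is carried along when either end of that edge moves.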

Figures 12a-c show the case of the disappearance of the break line. At level n, the break line was extended to nodes B and C, which became shared border nodes. The remeshing introduced the nodes E' and D' for the right side, and the node F' for the left side. At level n-1, the contour is entirely contained in the mesh ABC. For the right side, the node A' is introduced to form the mesh A'BC. For the left side, no node is introduced, the remeshing producing the initial mesh ABC. The mesh A'BC is forced to move with the initial mesh ABC, so that the points A and A' at level n-1 are virtually the same. At level n, A' exists and is defined by its barycentric coordinates in ABC of level n-1.

The remeshing process is compatible with a geometric multigrid approach, sometimes used for motion estimation in order to obtain a weighting between the nodes of successive hierarchical levels that takes into account the deformation of the lower mesh. In this case, the weighting of the nodes can be done as follows: (i) if the fine node is a direct child of a coarse node, the weighting is 1; (ii) if the fine node is derived from several coarse nodes, the weighting is the average of the barycentric weights of the fine node relative to the coarse nodes.

Once the new mesh has been completed at all hierarchical levels, the motion is re-estimated on the group of images in step 15. This re-estimation can be performed as in step 10, for example using the formulas (1) to (6) above, with a precaution for the pixels likely to be reconstructed by several triangles of the new mesh. This ambiguity exists because some triangular meshes of the new mesh overlap.

To resolve it in the discontinuity zone, a visibility mask is defined at each time t. In the example illustrated above, this mask corresponds to the hatched portion in FIG. 7. It consists of the points which, at time t, are inside the discontinuity zone (that is to say, do not belong to any mesh of the initial mesh retained in the new mesh) and are, for example, to the right of the oriented break line L determined for this time t. The points inside the discontinuity zone are likely to be reconstructed either by a triangle of the right sub-mesh or by a triangle of the left sub-mesh. The triangle i, j, k retained for the application of formulas (3)-(5) at such a point is that of the right sub-mesh if the point belongs to the mask, and that of the left sub-mesh otherwise.
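One possible way to test membership in such a mask is sketched below, assuming that the break line is given as an oriented polyline and that the side of a point is taken from the sign of a cross product with the nearest segment. The function names are hypothetical, the axes are assumed to follow the usual mathematical convention (y pointing up; with image coordinates where y points down, the sign flips), and the nearest-segment rule is only an approximation near the vertices of the polyline.

```python
import math


def _closest_on_segment(p, a, b):
    """Closest point to p on the segment [a, b]."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    t = 0.0 if length2 == 0 else ((px - ax) * dx + (py - ay) * dy) / length2
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def is_right_of(p, polyline):
    """True if p lies to the right of the oriented polyline (y axis up)."""
    best = None
    for a, b in zip(polyline, polyline[1:]):
        d = math.dist(p, _closest_on_segment(p, a, b))
        if best is None or d < best[0]:
            best = (d, a, b)
    _, (ax, ay), (bx, by) = best
    cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
    return cross < 0


def submesh_for(p, polyline):
    """Sub-mesh used to reconstruct a point located inside the discontinuity zone."""
    return "right" if is_right_of(p, polyline) else "left"
```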

The ambiguity also exists for certain points outside the discontinuity zone, because the sub-meshes extend beyond it. To resolve it, the z values are used as indicated above to decide whether the sub-mesh that extends outside the discontinuity zone is in the foreground or in the background. Thus, for a point outside the discontinuity zone and belonging to a triangle of a sub-mesh, the value z is calculated relative to each triangle of the mesh including this point, and the triangle giving the greatest value z is selected for the application of formulas (3)-(5).

To improve the convergence of the minimization algorithm in step 15, the gradient descent can be started by initializing the displacement vectors at the retained nodes of the initial mesh to the values that were obtained during the first estimation in step 10.

During the re-estimation of the motion in step 15, it is possible that, at an instant t, one of the nodes added in the remeshing step 14 does not reconstruct any point of the image. In this case, the minimization of the functional (1) does not provide a displacement vector for such a node. The displacement vector is then regenerated by interpolation of those obtained for neighboring nodes of the same sub-mesh.
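A simple form of this regeneration is sketched below, assuming that the interpolation is a plain average over the neighbors that do have a vector. The text does not specify the interpolation scheme, and the names `fill_missing`, `displacements` and `neighbours` are hypothetical.

```python
def fill_missing(displacements, neighbours):
    """Fill missing displacement vectors by averaging those of neighboring nodes.

    `displacements` maps a node id to (dx, dy), or to None when the
    minimization gave no vector. `neighbours` maps a node id to the ids of
    its neighbors in the same sub-mesh.
    """
    filled = dict(displacements)
    for node, vector in displacements.items():
        if vector is not None:
            continue
        known = [displacements[n] for n in neighbours.get(node, ())
                 if displacements.get(n) is not None]
        if known:  # leave the node untouched if no neighbor has a vector
            filled[node] = (sum(v[0] for v in known) / len(known),
                            sum(v[1] for v in known) / len(known))
    return filled
```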

Finally, the motion parameters delivered in step 20 when the image group includes at least one discontinuity zone include:

(a) the indication of the meshes of the initial mesh which belong to a discontinuity zone;

(b) for each discontinuity zone, the location of at least one break line in each image of the group;

(c) if a discontinuity zone contains more than one break line, an indication of the z values associated with the different sub-meshes generated in the zone, to designate the relative depth of the objects;

(d) the displacement vectors at the nodes of the mesh, computed in step 15.
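Purely as an illustration, the parameters (a) to (d) could be grouped in a container such as the one below. The class name, field names and layouts are assumptions made for this sketch and do not describe the coded stream syntax.

```python
from dataclasses import dataclass, field


@dataclass
class MotionParameters:
    # (a) zone id -> ids of the initial-mesh triangles forming the zone
    zone_meshes: dict
    # (b) zone id -> one entry per image, each a list of break-line polylines
    break_lines: dict
    # (d) node id -> list of (dx, dy) displacement vectors, one per image
    displacements: dict
    # (c) zone id -> {sub-mesh id: z}; only needed with several break lines
    z_values: dict = field(default_factory=dict)

    def zones_missing_z(self):
        """Zones that contain more than one break line but carry no z values."""
        missing = []
        for zone, per_image in self.break_lines.items():
            several = any(len(lines) > 1 for lines in per_image)
            if several and zone not in self.z_values:
                missing.append(zone)
        return missing
```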

The motion estimation described above is particularly useful in video coding applications. A simplified block diagram of an encoder embodying the invention is presented in FIG. 13. Such an encoder carries out, on the one hand, a motion estimation on a sequence of digital images of a video stream (module 36) and, on the other hand, a texture coding (module 37), which can be implemented according to various techniques known in the field of video coding. In an encoder according to the invention, the module 36 operates according to the method described with reference to FIG. 3. The motion parameters (a)-(d) that it delivers are coded by the module 38 before being inserted into the digital output stream of the encoder by the module 39, together with the texture coding information.

A signal carrying this output stream can be transmitted or broadcast over a communication channel. It can also be recorded on a recording medium such as an optical disc, a magnetic tape, etc.

With reference to FIG. 14, a video decoder compatible with such an encoder receives an input stream similar to the output stream of the encoder, and separates in this stream the motion parameters and the texture information (module 40). Modules 41 and 42 respectively process this information to decode the motion and the texture in the successive groups of images of the coded video sequence. The decoded motion and texture are processed by a synthesis module 43 to reconstruct the video images.

The operation of the motion decoding module 41 is as follows. Groups of images are first identified in the sequence, as in a conventional decoder. From the initial mesh fixed by convention, the module 41 locates the discontinuity zones according to the information (a) above. It then places the break lines in these discontinuity zones according to their location in the first image of the group (b). The module 41 then regenerates the non-manifold mesh by remeshing the discontinuity zones in accordance with step 14 previously described with reference to FIGS. 3 to 12. The quantized displacement vectors assigned to the nodes of the non-manifold mesh are indicated in the coded stream. To determine the displacement field of the image at each time t, the module 41 identifies the triangular mesh used to synthesize the displacement vector of each point according to the same process as that used by the encoder in step 15 described above, from the position of the point with respect to the break line (b) (if this point is in a discontinuity zone) and from the z values indicative of depth (c).

The encoder according to FIG. 13, or the decoder according to FIG. 14, can be made in the form of a specific electronic circuit. However, it will often be done as software. The steps of the methods described above are then controlled by instructions of a program executed by a processor of the video encoding or decoding apparatus. For coding, this apparatus may for example be a computer, a video camera, a workstation of a television relay, a recording apparatus, etc. For decoding, it may for example be a computer, a recording medium reader, a television signal receiver, an image display, etc.

Claims

1. A motion estimation method in a sequence of moving digital images, comprising the steps of:

detecting at least one discontinuity zone in the first mesh by analyzing the first displacement field, each discontinuity zone including at least one mesh fulfilling a mesh deformation criterion in the group of images;

in each detected discontinuity zone, determining at least one break line appearing in the group of images;

generating a second mesh to be applied to the reference image, comprising a regular part composed of meshes of the first mesh that do not belong to any discontinuity zone and, for at least one detected discontinuity zone, at least two sub-meshes overlapping in a region including the break line determined in said discontinuity zone, each of the two sub-meshes including respective meshes delimited by nodes including nodes shared with the regular part, located at the edge of the discontinuity zone, and additional nodes not belonging to the regular part, the break line being located between the respective nodes of the two sub-meshes shared with the regular part; and

estimating a second displacement field in the group of images, by assigning to each point situated in a detected discontinuity zone a displacement value calculated from values assigned to nodes delimiting a selected mesh of the second mesh to which said point belongs, the selected mesh depending on the position of said point with respect to the break line determined in said discontinuity zone.

2. The method of claim 1, wherein the discontinuity zone is separated by the break line into two parts respectively associated with the two sub-meshes, and wherein, for a point located in said discontinuity zone and belonging to several meshes, a mesh of the sub-mesh associated with the part of the discontinuity zone in which said point is located is selected.

3. The method of claim 1 or 2, wherein the first and second meshes are hierarchical meshes, the displacement field estimates being carried out by going from a coarser hierarchical level to a finer hierarchical level of the meshes, wherein the discontinuity zone is detected as a connected set of meshes of the finest hierarchical level fulfilling the mesh deformation criterion, and wherein the discontinuity zone is defined at the higher hierarchical levels as being composed of at least one mesh including at least one respective mesh of the finest hierarchical level fulfilling the mesh deformation criterion.

4. The method as claimed in claim 1, in which the two sub-meshes of the second mesh are generated starting at the finest hierarchical level nivFin, the meshes of the higher levels then being generated during a progressive ascent in the hierarchy, the ascent from a hierarchical level n to an immediately higher hierarchical level n-1 comprising the following steps for each of the sub-meshes and for 1 < n ≤ nivFin:

/a/ integrating each mesh of said sub-mesh previously defined at level n into a new mesh of said sub-mesh generated at level n-1;

/b/ setting n' = n;

/c/ if said new mesh of level n'-1 cannot be completed with meshes of said sub-mesh already generated at level n', generating at level n' at least one new mesh of said sub-mesh to complete said new mesh of level n'-1; and

/d/ if n' < nivFin, increasing n' by one unit and repeating from step /c/.

5. The method according to any one of claims 1 to 4, in which respective depth values are assigned to the nodes of the regular part and to the additional nodes of each sub-mesh of the second mesh, the value assigned to the additional nodes of a sub-mesh generated for a detected discontinuity zone being a function of the position of said sub-mesh relative to the break line determined in said zone, and wherein the step of estimating the second displacement field comprises, for each point of an image belonging to a mesh of the regular part of the second mesh and to at least one mesh of a sub-mesh, the calculation, for each mesh including this point, of a weighted sum of the depth values respectively assigned to the nodes delimiting said mesh, and the selection, for the assignment of a displacement value to said point, of the mesh for which the calculated weighted sum is maximum.

6. A motion estimation device in an animated digital image sequence, comprising means (36) suitable for carrying out a method according to any one of claims 1 to 5.

7. A computer program to be installed in a moving image processing apparatus, comprising instructions for carrying out the steps of a motion estimation method according to any one of claims 1 to 5 upon execution of the program by a computing unit of said apparatus.

8. A video encoder comprising motion estimation means (36) in an animated digital image sequence and means (38-39) for constructing an output stream including motion parameters produced by the motion estimation means, wherein the motion estimation means is arranged to operate according to a method according to any one of claims 1 to 5.

9. The video encoder of claim 8, wherein the motion parameters included in the output stream comprise:

parameters indicating the meshes of the first mesh of which each zone of detected discontinuity is composed;

positioning parameters of a break line determined in each detected discontinuity zone; and

parameters describing displacement values assigned to the nodes of the second mesh, obtained in the estimation of the second displacement field.

10. The video encoder according to claim 9, wherein the motion parameters included in the output stream further comprise parameters indicating depth values respectively assigned to the nodes of the regular part and to the additional nodes of each sub-mesh of the second mesh generated by the motion estimation means.

11. A signal representative of a sequence of animated digital images, comprising a representation of motion parameters comprising, for a group of images including a reference image:

second movement parameters for positioning at least one break line in each discontinuity zone; and

third movement parameters describing displacement values assigned to the nodes of a second mesh to be applied to a reference image, the second mesh comprising a regular part composed of meshes of the first mesh that do not belong to any discontinuity zone and, for at least one discontinuity zone, at least two sub-meshes that overlap in a region including the break line positioned in said discontinuity zone, each of the two sub-meshes comprising respective meshes delimited by nodes including nodes shared with the regular part, situated at the edge of the discontinuity zone, and additional nodes not belonging to the regular part, the break line being located between the respective nodes of the two sub-meshes shared with the regular part.

12. The signal according to claim 11, wherein the motion parameters further comprise parameters indicating depth values respectively assigned to the nodes of the regular part and to the additional nodes of each sub-mesh of the second mesh.

13. A recording medium, on which is recorded a signal according to claim 11 or 12.

14. A method of motion decoding in a sequence of animated digital images, using image meshes comprising meshes delimited by nodes, the method comprising the following steps:

receiving an input stream including motion parameters comprising, for a group of images including a reference image:

third movement parameters describing displacement values assigned to the nodes of a second mesh to be applied to the reference image, the second mesh comprising a regular part composed of meshes of the first mesh which do not belong to any discontinuity zone and, for at least one discontinuity zone, at least two sub-meshes which overlap in a region including the break line positioned in said discontinuity zone, each of the two sub-meshes comprising respective meshes delimited by nodes including nodes shared with the regular part, located at the edge of the discontinuity zone, and additional nodes not belonging to the regular part, the break line being situated between the respective nodes of the two sub-meshes shared with the regular part;

generating the second mesh based on the first and second motion parameters; and

generating a displacement field in the group of images, by assigning to each node of the second mesh displacement values obtained from the third motion parameters and by assigning to each point located in a detected discontinuity zone a displacement value calculated from the values assigned to the nodes delimiting a selected mesh of the second mesh to which said point belongs, the selected mesh depending on the position of said point with respect to the break line determined in said discontinuity zone.

15. The method of claim 14, wherein the discontinuity zone is separated by the break line into two parts respectively associated with the two sub-meshes, and wherein, for a point located in said discontinuity zone and belonging to several meshes, a mesh of the sub-mesh associated with the part of the discontinuity zone in which said point is located is selected.

16. The method of claim 14 or 15, wherein the motion parameters of the input stream further comprise depth values respectively assigned to the nodes of the regular part and to the additional nodes of each sub-mesh of the second mesh, the value assigned to the additional nodes of a sub-mesh corresponding to a discontinuity zone being a function of the position of said sub-mesh relative to the break line positioned in said zone, and wherein the step of generating the displacement field comprises, for each point of an image belonging to a mesh of the regular part of the second mesh and to at least one mesh of a sub-mesh, the calculation, for each mesh including this point, of a weighted sum of the depth values respectively assigned to the nodes delimiting said mesh, and the selection, for the assignment of a displacement value to said point, of the mesh for which the calculated weighted sum is maximum.

17. A motion decoding device in a sequence of animated digital images, comprising means (41) adapted to the implementation of a method according to any one of claims 14 to 16.

18. A computer program to be installed in a moving image processing apparatus, comprising instructions for implementing the steps of a motion decoding method according to any one of claims 14 to 16 during execution of the program by a computing unit of said apparatus.

19. A video decoder, comprising means (41) for motion decoding and synthesis means (43) for constructing a sequence of animated digital images by taking into account a displacement field generated by the motion synthesis means, wherein the motion decoding means is arranged to operate according to a method according to any one of claims 14 to 16.