EP1354482A1

EP1354482A1 - Image coding and decoding method, corresponding devices and applications

Info

Publication number: EP1354482A1
Application number: EP02701323A
Authority: EP
Inventors: Henri Sanson; Nathalie Laurent-Chatenet; Alexandre Buisson
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2001-01-26
Filing date: 2002-01-25
Publication date: 2003-10-22
Also published as: KR20030081403A; US20040151247A1; FR2820255A1; US7512179B2; CN1288914C; BR0206798A; CN1531824A; JP2004520744A; KR100922510B1; WO2002060184A1

Abstract

The invention concerns a method for coding images using selectively at least two image coding modes, each optimising compression of at least a video sequence image on the basis of different optimisation criteria.

Description

METHOD FOR CODING AND DECODING IMAGES, CORRESPONDING DEVICES AND APPLICATIONS

The technical field of the invention is that of bit rate reduction coding of moving image sequences, particularly video. Video coding applications are particularly numerous.

Non-exhaustive examples include:

- Transmission of digital TV;

- Real-time video transmission over different types of networks: IP, mobile ("IP streaming");

- Computer storage of videos.

The invention is particularly applicable to systems implementing MPEG-type coding. MPEG-type coding is understood to mean coding based on temporal prediction and a discrete cosine transform applied to a rigid block structure, often of fixed size but possibly of variable size. The two representative standards for this coding family are MPEG-4, versions 1 to 4, and ITU-T H.263, up to version 2. The invention can also be applied in the context of the CCITT H.26L recommendation (see, for example, the corresponding document VCEG-N83d1).

The video coding and decoding schemes offered to date fall into 2 categories:

- Standardized codings, defined either by ISO MPEG or by ITU-T, all based on the same type of techniques (temporal prediction and a discrete cosine transform on a block structure);

- Codings under development, proposed by research laboratories, which use a wide range of techniques: coding by wavelets, by regions, by fractals, by meshes, etc.

Currently, MPEG-4 coding is considered to be the state of the art, not only among standardized codings but also among all published codings. MPEG-4 and ITU-T H.263++ codings are nevertheless considered to have reached their limits, in particular because of the rigid, fixed-size block structure used to support all the calculations and coding operations. In particular, the temporal prediction of images within a sequence is insufficiently exploited.

Furthermore, the published alternative codings have not yet reached a sufficient degree of optimization.

Thus, in order to obtain coded video sequences at low bit rates, coders generally reduce the size of the images and temporally subsample the original video sequence. However, temporal subsampling has the drawback of producing jerky motion, which is more or less annoying for the user depending on the subsampling factor.

To avoid such jerkiness, the missing (uncoded) images must be reconstructed at the decoder by temporal interpolation.

However, current techniques for temporal interpolation of images do not give satisfactory results, especially when they are implemented at the decoder alone. Indeed, these techniques produce visual artifacts linked to block-based motion compensation, which defines only one motion vector for all the pixels in a block.

The objective of the invention is precisely to overcome the limitations of the prior techniques. More specifically, an objective of the invention is to provide a technique for coding and decoding image data that makes it possible to obtain a reduced bit rate and/or a better quality of reconstructed image, compared with known techniques.

This objective is achieved, according to the invention, by an image coding method that selectively implements at least two image coding modes, each optimizing the compression of at least one image of a video sequence according to different optimization criteria.

According to several advantageous embodiments, information on the choice of one of said coding modes can be made known to a decoder according to at least one of the techniques belonging to the group comprising:

- a predefined choice, known at coding and at decoding;

- information representative of the choice included in a data stream comprising at least some of the coded image data;

- information representative of the choice included in a data stream independent of the coded image data;

- intrinsic determination of the choice by the decoder.

Advantageously, the method comprises a step of selecting a coding mode to be applied to said image, among at least:

- a first coding optimizing a photometric representation of an image;

- a second coding optimizing a representation of the motion between at least two images.

Thus, the present invention relates to a new video coding method that hybridizes a coding, in particular of the MPEG type, with a coding by temporal interpolation based on a mesh representation, as well as to the decoding method and the structure of the associated binary representation.

Preferably, said second coding takes account of at least one previous image and / or at least one following image coded using said first coding.

Advantageously, said second coding takes account of a motion vector field calculated from the immediately preceding image coded using said first coding and / or a motion vector field calculated from the image immediately following coded using said first coding. Advantageously, these vector fields are applied to a mesh.

In this case, said motion vector fields can be used to determine a deduced motion vector field, associated with an image coded using said second coding.

According to a preferred embodiment of the invention, said selection step is based on temporal subsampling by a fixed factor N, one image in N being coded using said first coding.

Advantageously, the value of N can instead vary as a function of at least one predetermined criterion.

According to a particular embodiment, said first coding implements a transformation on blocks of images and a temporal prediction by blocks. This transformation is for example of the DCT type, Hadamard transformation, wavelet transformation, ...

It should be noted that the blocks of images are not necessarily square, but can take any form suited to the needs and the means available. This first coding can in particular be an MPEG-4 or H.26L coding.

In the latter case, type I (intra) and / or type P (predictive) images are preferably used (and not, preferably, type B images).

According to another particular aspect of the invention, said second coding advantageously relies on a hierarchical mesh of M levels, M being greater than or equal to 1, for example a triangular mesh.

In this case, the method preferably comprises a step for managing the occlusion zones.

The data produced can be gathered into a single stream. Advantageously, at least two data streams can be provided, which can be transmitted over independent transmission channels.

Said data streams advantageously belong to the group comprising:

- a global header;

- image data coded according to said first coding;

- image data coded according to said second coding.

The streams can therefore be transmitted independently. This allows in particular progressive and/or partial decoding of the images, according to the means and the needs.

Thus, according to a particular embodiment, the invention combines the following aspects:

- advanced optimization of the constituent modules of standard codings of the MPEG or ITU-T H.263 type;

- temporal prediction and associated error coding for the mesh-based techniques.

Indeed, the mesh-based approach avoids the usual block effects thanks to the use of continuous motion fields. In addition, the mesh technique makes it possible to detect occlusions of "objects" and to apply an error coding well suited to these areas. By further combining an MPEG-type error coding around these zones, it is possible to significantly improve the efficiency of the interpolation at a cost much lower than that of the bidirectional images (type B images) proposed by MPEG.

Thus, it is possible to effectively code the basic information at a low temporal resolution thanks to the coding of the MPEG type, with good quality, then to restore the full fluidity of the sequence thanks to the coding in interpolated mode by mesh.

The invention also relates, of course, to:

- the methods of decoding an image signal coded using the coding method described above;

- the devices for coding an image signal using the coding method described above;

- the devices for decoding an image signal coded using the coding method described above (advantageously comprising means for determining at least part of a vector field and/or at least part of the occlusion zones, similar to those used during coding);

- devices for storing at least one image signal coded using the coding method described above;

- systems for coding, transmitting and/or decoding an image signal coded using the coding method described above (the choice of one of said coding modes can advantageously be made known to a decoder according to at least one of the techniques belonging to the group comprising: a predefined choice, known at coding and at decoding; information representative of the choice included in a data stream comprising at least some of the coded image data; information representative of the choice included in a data stream independent of the coded image data; intrinsic determination of the choice by the decoder);

- computer program products for encoding and/or decoding an image signal encoded using the encoding method;

- the data carriers of such a program.

The invention also relates to the image data signals comprising data coded according to the method described above.

Advantageously, this signal comprises at least one indicator indicating whether or not the process is activated.

Preferably, the signal comprises data specifying the structure of the frames, at the start of the video sequence and / or in each signal frame.

Advantageously, a sequence coded using said second coding begins with a header specifying the number of frames coded according to this second coding.

According to a particular embodiment, the signal comprises at least two data streams, which can be transmitted over independent transmission channels.

In this case, said data streams advantageously belong to the group comprising:

- a global header;

- image data coded according to said first coding;

- image data coded according to said second coding.

The invention finds applications in numerous fields, and in particular in the fields belonging to the group comprising:

- digital television;

- real-time video over IP networks;

- real-time network video to mobiles;

- storage of image data.

Other characteristics and advantages of the invention will appear more clearly on reading the description of a preferred embodiment of the invention, given by way of simple illustrative and nonlimiting example, and of the appended drawings, among which:

- Figure 1 is a block diagram of the coding of the invention;

- Figure 2 presents an example of a hierarchical mesh structure for the motion;

- Figure 3 illustrates the principle of affine interpolation on a triangular mesh;

- Figure 4 is an example of occlusion detected by overlapping of triangles;

- Figure 5 illustrates the process of transforming any triangle of the image into a symmetrical square matrix;

- Figure 6 illustrates the transformation of any triangle into a right isosceles triangle;

- Figure 7 illustrates a hierarchical mesh and its representation by the associated quaternary tree;

- Figure 8 is an example of a coding decision for the hierarchical mesh;

- Figure 9 shows an overall structure of a bit stream according to the invention;

- Figure 10 presents a block diagram of a decoder according to the invention.

The embodiment of the invention described below essentially consists in the hybridization of an MPEG-type coding, for example MPEG-4, with a mesh coding operating in interpolated mode, also called B mode or B images in the MPEG standards. Note that the MPEG-4 coding mentioned here can be replaced by any coder based on equivalent techniques, i.e. using temporal prediction and a discrete cosine transform on a block structure, together with quantization and entropy coding of the information generated. In particular, ITU-T H.263++ coding can be substituted for MPEG-4 coding.

For each image of the sequence entering the coder, the coder decides, according to a given decision process (for example, temporal subsampling by a fixed factor), whether to code it with the MPEG-4 encoding module or with the mesh-based encoding module.

Images coded in mesh mode use, as references for their temporal prediction, the images coded in MPEG-4 mode located immediately before or immediately after the group of mesh-coded images to which they belong.
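The routing of frames and the choice of reference images can be sketched as follows. This is only an illustration under stated assumptions: a fixed subsampling factor `n`, frame 0 coded in MPEG mode, and hypothetical names (`route_frames`, `keys`, `groups`) that do not come from the patent.

```python
def route_frames(num_frames, n):
    """Split frame indices into MPEG-coded key frames and mesh-coded groups.

    Assumptions (not from the patent): frame 0 is a key frame and the
    subsampling factor n is fixed. Each group is returned as
    (previous_key, next_key, [mesh-coded frames in between]).
    Frames after the last key frame have no following reference and are
    simply left out of this sketch.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    keys = list(range(0, num_frames, n))
    groups = []
    for prev_key, next_key in zip(keys, keys[1:]):
        mesh_frames = list(range(prev_key + 1, next_key))
        groups.append((prev_key, next_key, mesh_frames))
    return keys, groups


keys, groups = route_frames(num_frames=9, n=4)
print(keys)    # [0, 4, 8]
print(groups)  # [(0, 4, [1, 2, 3]), (4, 8, [5, 6, 7])]
```

Each mesh-coded frame in a group is bracketed by the two MPEG-coded frames that serve as its references.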

The key point of the compression efficiency of the invention is that the mesh-based motion compensation leads to a very powerful time prediction, for a very low associated coding cost.

Indeed, this technique:

- takes into account different types of motion in the images;

- cleanly handles the covering and uncovering of zones due to the motion of objects.

Figure 1 gives a general view of the principle of the coder.

First of all, the incoming images are routed either to the MPEG encoding module or to the mesh-based encoding module, according to a given decision mode, for example a predefined rhythm: one image in N is coded in MPEG mode, the others in interpolated mesh mode. We denote by $N_k$ the numbers of the images coded in MPEG mode. All the other images $I_l$, $N_k < l < N_{k+1}$, are encoded by a mesh-based encoder, for example using a triangular mesh, operating in interpolated mode, called B mode. The general principle of this encoder is as follows:

1. Calculation of the forward and backward motion fields between the images $N_k$ and $N_{k+1}$. These fields are modeled in the form of triangular meshes.

2a. Estimation of predictable areas: during the interpolation, the motion between $I_t$ and $I_{t'}$ is estimated. If applying the motion vectors does not lead to any reversal (which means that the area is predictable), motion compensation is performed by weighting the vectors with a scalar $k$ ($0 < k < 1$) so as to interpolate $I_{t+k}$, with $t+k$ belonging to $]t, t'[$. Three approaches are possible for motion compensation; they are described below.

2b. Estimation of non-predictable areas: detection of the non-predictable occlusion zones in the images $I_l$ to be coded, based on knowledge of these motion fields.

3. Specific coding of these occlusion zones according to one of the following three possible modes:

- Prediction from one of the reference images ($N_k$, $N_{k+1}$, or these images motion-compensated with their motion fields) without motion compensation, then coding of the prediction error with a triangular-mesh-based technique;

- Prediction from one of the reference images ($N_k$, $N_{k+1}$, or these images motion-compensated with their motion fields) with intra-image motion compensation, then coding of the prediction error with a triangular-mesh-based technique;

- Intra-image coding with a triangular-mesh-based technique.

4. Optionally, MPEG type P coding of the residual prediction or coding error, limited to an area around the occlusion zone.

As mentioned above, motion compensation can be performed according to three approaches: with a forward estimation, with a backward estimation, or with both a forward and a backward estimation.

1) With a forward estimation:

During the interpolation, the motion between $I_{t_1}$ and $I_{t_2}$ is estimated. If applying the motion vectors does not lead to any reversal (which corresponds to the predictable areas), motion compensation is performed by weighting the vectors with a scalar $k = m / (t_1 + t_2)$ ($0 < k < 1$) so as to interpolate $I_m$, with $m$ belonging to $[t_1, t_2]$. The estimated image $E_1$ is obtained.

2) With a backward estimation:

During the interpolation, the motion between $I_{t_2}$ and $I_{t_1}$ is estimated. If applying the motion vectors does not lead to any reversal (which corresponds to the predictable areas), motion compensation is performed by weighting the vectors with a scalar $k' = 1 - m / (t_1 + t_2)$ ($0 \le k' \le 1$) so as to interpolate $I_m$, with $m$ belonging to $[t_1, t_2]$. The estimated image $E_2$ is obtained.

3) With both together:

During the interpolation, the motion is estimated between $I_{t_1}$ and $I_{t_2}$ and between $I_{t_2}$ and $I_{t_1}$. If applying the motion vectors does not lead to any reversal (which corresponds to the predictable areas), motion compensation is performed by weighting the "forward" vectors with a scalar $k$ ($0 \le k \le 1$), which gives $E_1$. The same is done with the "backward" vectors and a scalar $k'$ ($0 \le k' \le 1$), which gives $E_2$. The estimated image is then $E = a E_1 + (1 - a) E_2$, with $0 \le a \le 1$.

In each case, the best solution, determined by computing the PSNR between the estimated solution and the associated source image, is retained and signaled with 2 bits.
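The selection between the three estimates can be illustrated with a short sketch. It assumes images flattened to lists of pixel values, a blending weight `a` chosen by the caller, and a hypothetical mode numbering (0 forward, 1 backward, 2 combined); none of these details are specified in the text.

```python
import math


def psnr(estimate, source, peak=255.0):
    """PSNR in dB between two equally sized lists of pixel values."""
    mse = sum((e - s) ** 2 for e, s in zip(estimate, source)) / len(source)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def blend(e1, e2, a):
    """Combined estimate E = a*E1 + (1 - a)*E2, pixel by pixel."""
    return [a * p1 + (1.0 - a) * p2 for p1, p2 in zip(e1, e2)]


def best_estimate(e1, e2, source, a=0.5):
    """Return (mode, image) with the highest PSNR against the source.

    Modes 0, 1 and 2 stand for forward, backward and combined estimation.
    This numbering is an assumption; 2 bits are enough to signal it.
    """
    candidates = [e1, e2, blend(e1, e2, a)]
    scores = [psnr(c, source) for c in candidates]
    mode = scores.index(max(scores))
    return mode, candidates[mode]


source = [100, 110, 120, 130]
e1 = [98, 108, 118, 128]    # forward estimate, 2 too low everywhere
e2 = [102, 112, 122, 132]   # backward estimate, 2 too high everywhere
mode, image = best_estimate(e1, e2, source)
print(mode, image)
```

In this toy case the two errors cancel in the blend, so the combined mode is retained.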

1. Calculation of the forward and backward motion fields between the images $N_k$ and $N_{k+1}$

Forward and backward motion fields between the images $N_k$ and $N_{k+1}$ are calculated in the form of hierarchical meshes, for example triangular, $T^b_k$ and $T^f_{k+1}$, as indicated in Figure 2.

Such meshes are obtained by dividing certain mesh cells (for example, triangular cells are divided into 4 sub-triangles) according to a given criterion during the motion estimation process. At each level of the hierarchy, the decision whether or not to divide is taken for each cell. Once these divisions have been decided, the cells adjacent to the divided cells are then divided so as to maintain a conforming mesh structure. The initial mesh, before division (top of the hierarchy), can be arbitrary.

In the example in Figure 2, the motion estimator decides to divide triangles 3 and 8. This results in the division of triangles 2, 4, 7 and 9. The process is iterated up to a predefined level of the hierarchy.
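The 4-way division of a triangle can be sketched as below. The sketch assumes midpoint insertion on each arc and a caller-supplied split criterion (`should_split`, a hypothetical name). It deliberately omits the extra division of adjacent triangles needed to keep the mesh conforming.

```python
def midpoint(p, q):
    """Middle of the arc joining nodes p and q."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def subdivide(tri):
    """Split triangle (a, b, c) into 4 sub-triangles via arc midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def refine(tri, should_split, max_level, level=0):
    """Recursively subdivide while should_split(tri, level) holds.

    Returns the leaf triangles. Conformity with neighboring triangles
    is not handled in this sketch.
    """
    if level >= max_level or not should_split(tri, level):
        return [tri]
    leaves = []
    for child in subdivide(tri):
        leaves.extend(refine(child, should_split, max_level, level + 1))
    return leaves


root = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
leaves = refine(root, lambda tri, level: True, max_level=2)
print(len(leaves))  # 16
```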

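In the case of triangular meshes, the expression of the motion field defined by a triangular mesh $T$ is given on each triangle $e$ by:

$$\vec{d}(x, y) = \sum_{l \in \mathrm{ver}(e)} \Psi_l^e(x, y)\, \vec{d}_l$$

where:

• $e$ denotes the triangular element of $T$ containing the current point $p$ with coordinates $x$ and $y$,

• $\{\mathrm{ver}(e)\}$ denotes the set of its three nodes or vertices, numbered $i$, $j$, $k$, with positions $p_i$, $p_j$ and $p_k$, and $\vec{d}_l$ denotes the motion vector of node $l$,

• $\Psi_l^e$ ($l = i, j, k$) represents the barycentric coordinates of the point $p(x, y)$ in the triangular element $e_{i,j,k}$, that is, the affine function of $x$ and $y$ equal to 1 at node $l$ and to 0 at the other two nodes.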

Such a model defines an everywhere continuous field. In addition, it allows fine control of the representation accuracy, an essential characteristic for compression.
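As an illustration of this affine model, the following sketch computes the barycentric coordinates of a point and uses them to interpolate the nodal motion vectors. The function names are hypothetical, and the closed-form barycentric expressions are the standard ones rather than a transcription of the patent's formula.

```python
def barycentric(p, pi, pj, pk):
    """Barycentric coordinates of point p in triangle (pi, pj, pk)."""
    (x, y), (xi, yi), (xj, yj), (xk, yk) = p, pi, pj, pk
    det = (xj - xi) * (yk - yi) - (xk - xi) * (yj - yi)
    if det == 0:
        raise ValueError("degenerate triangle")
    wj = ((x - xi) * (yk - yi) - (xk - xi) * (y - yi)) / det
    wk = ((xj - xi) * (y - yi) - (x - xi) * (yj - yi)) / det
    return 1.0 - wj - wk, wj, wk


def motion_at(p, nodes, vectors):
    """Motion vector at p, interpolated from the three nodal vectors."""
    weights = barycentric(p, *nodes)
    dx = sum(w * v[0] for w, v in zip(weights, vectors))
    dy = sum(w * v[1] for w, v in zip(weights, vectors))
    return dx, dy


nodes = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
vectors = ((0.0, 0.0), (4.0, 0.0), (0.0, 8.0))
print(motion_at((1.0, 1.0), nodes, vectors))  # (1.0, 2.0)
```

Because the weights of two triangles sharing an arc agree along that arc, the interpolated field is continuous across the mesh.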

At each level of the mesh hierarchy, the nodal motion vectors are calculated so as to minimize a prediction error. Different mesh-based motion estimators can be used, for example the one described in patent FR No. 98 11227, or FR No. 99 15568.

The important point is that the final mesh results from a hierarchical process starting from an initial mesh by divisions. This hierarchical character is in fact used for the differential coding of the nodal motion vectors between a node and its parent nodes (the ends of the arc on which it has been inserted). The structure of the mesh is recalculated at the decoder on the basis of the knowledge of the initial mesh, and of the mesh division indicators.

Thus, at the end of the process, two motion meshes are obtained for each group of images lying between the images $N_k$ and $N_{k+1}$; they are used to reconstruct all the images of the group.

2. Detection of occlusion zones

From these two meshes, the occlusion zones, that is to say the zones of the image $N_k$ that cannot be predicted from the image $N_{k+1}$ or vice versa, owing to the covering or uncovering of objects, are detected.

These zones are simply defined by the overlapping triangles, once displaced by their nodal vectors.

Figure 4 illustrates occlusion detection based on the overlapping of triangles after displacement. The coder can continue the motion estimation by deactivating the triangles of the occlusion zones, so as to obtain less biased displacement vectors.
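A minimal sketch of one ingredient of this detection is given below, assuming that a triangle whose orientation flips (or whose area collapses) once displaced by its nodal vectors is marked as belonging to an occlusion zone. The function names are hypothetical, and a complete detector would also have to test displaced triangles against each other for overlap.

```python
def signed_area(a, b, c):
    """Signed area of triangle (a, b, c); the sign encodes orientation."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def displaced(tri, vectors):
    """Triangle obtained by moving each node by its motion vector."""
    return tuple((p[0] + v[0], p[1] + v[1]) for p, v in zip(tri, vectors))


def is_reversed(tri, vectors):
    """True if the triangle flips or collapses once displaced."""
    before = signed_area(*tri)
    after = signed_area(*displaced(tri, vectors))
    return before * after <= 0


tri = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
translation = ((1.0, 1.0), (1.0, 1.0), (1.0, 1.0))
fold = ((0.0, 0.0), (0.0, 0.0), (6.0, -6.0))   # third node crosses the base
print(is_reversed(tri, translation))  # False
print(is_reversed(tri, fold))         # True
```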

This, however, is strictly internal to the strategy of the coder; in the end, it is the two complete motion meshes $T^b_k$ and $T^f_{k+1}$ that are coded and inserted in the bit stream. The decoder is then able to find the occlusion zones from these two meshes.

These occlusion zones are defined on the images $N_k$ and $N_{k+1}$; once they are detected, the triangles belonging to them are labeled accordingly, at both the coder and the decoder. However, the coder needs to know these zones on the images $N_k + 1$ to $N_{k+1} - 1$. They are simply obtained by projecting the meshes $T^b_k$ and $T^f_{k+1}$ onto the image to be coded, by applying the nodal motion vectors, renormalized to take account of the temporal distance between the current image and the reference image $N_k$ or $N_{k+1}$.

3. Coding of the occlusion zones

For each occlusion zone, the reference image for a possible prediction is selected from $I_{N_k}$, $I_{N_{k+1}}$, but also $I^c_l$, which is the image obtained at time $l$ by motion compensation with the mesh $T^b_k$ or $T^f_{k+1}$ at a level where there is not yet any overlap of mesh cells. More precisely, the choice between $I_{N_k}$ and $I_{N_{k+1}}$ simply depends on which mesh, $T^b_k$ or $T^f_{k+1}$, generated the current occlusion zone. This mesh is then used to predict the image and give $I^c_l$. The choice between $I^c_l$ and $I_{N_k}$ or $I_{N_{k+1}}$ is based on a prediction error criterion: the image giving the lowest error is retained. Thus, it suffices to insert 1 bit in the bit stream, per zone, to code the choice of the prediction retained.

Let $I_r$ denote the selected reference image.

The rest of the coding of these areas consists of 2 steps:

- A prediction step

- A step of coding the prediction error or the original texture in case of bad prediction

3.1 Residual prediction of the texture of the occlusion zones

Three modes can be used, exclusively. The decision is made based on the criterion of least error.

Mode 1: The Y, U and V values of a pixel in the zone are simply those of the co-located pixel of the reference image $I_r$. Let $\hat{I}_l$ be the resulting image. The prediction error between $\hat{I}_l$ and $I_l$ is then coded.

Mode 2: A motion estimation is made between $I_l$ (the image to be coded) and $\hat{I}_l$ (the result of the mode 1 prediction) on the occlusion zone. The resulting mesh, derived from the last level of the mesh $T_r$, $r = k$ or $k+1$, before the mesh cells overlap, is then coded together with its nodal motion vectors. Finally, the residual prediction error is coded according to a procedure defined below.

Mode 3: No prediction is made, and the original values of the pixels of the zone are coded.

4. Coding of the texture or the prediction error on the occultation zones.

The original texture and the prediction error undergo the same coding, the principle of which is as follows:

It is assumed that an initial triangular mesh has been defined from the motion mesh $T_r$ used for the prediction of the zone to be coded. The way to derive this initial mesh is described later.

The texture is then approximated on each mesh cell according to a choice:

- cells rich in high frequencies are coded using a discrete cosine transform (DCT);

- smoother cells are coded using an affine finite element model.

Here again, a hierarchical approach is used to reduce the cost of coding the mesh representation.

The approach adopted makes it possible to conserve the low coding cost associated with a regular hierarchy of meshes while allowing local adaptation to the content of the images made possible by the irregular decomposition of meshes.

From the initial coarse mesh of the zone, the meshes are subdivided into 4 triangular sub-meshes up to a given level. On the last level, an optional permutation of the diagonals of the quadrilaterals generated by 2 adjacent triangles can be implemented, if this induces a reduction in the approximation error.

4.1 Initialization of the texture mesh on the occlusion zones

This mesh is simply given by the last level of $T_r$ (the mesh resulting from the displacement of $T_k$ or $T_{k+1}$, according to the chosen direction) before reversals appear in the zone considered. The texture mesh thus fits naturally into the motion mesh, since it is extracted from it.

4.2 Representations used for the texture on the triangles

Two representations are combined: affine interpolation and the triangular DCT.

Affine interpolation

The nodes of the triangular mesh carry the photometric information (color, error), and the interpolation for the points inside the triangle is carried out by a Lagrange finite element, also called affine interpolation. The value $v(p)$ of the point $p(x, y)$ inside the triangle $e_{i,j,k}$ defined by the three nodes $p_l$, $l = i, j, k$, is given by the following equation:

$$v(p) = \sum_{l = i, j, k} \Psi_l^e(x, y)\, v(p_l)$$

where $\Psi_l^e$ ($l = i, j, k$) represents the barycentric coordinates of the point. $v(p)$ can be one of the photometric components Y, U or V of the point, or the prediction error for these components.

Several methods can be used for the calculation of the nodal values, in particular the method of least squares.

Discrete cosine transform (DCT) on triangles

The principle of the method consists in transforming any triangle into a right isosceles reference triangle. The content of this triangle is then symmetrized with respect to the hypotenuse to give a symmetrical square matrix (Figure 5).

A classic (square) DCT is then applied to this matrix. It can be shown that the transformed matrix is also symmetrical. Only the coefficients of its lower triangle are then quantized and statistically coded (entropy coding).
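The symmetry property can be checked numerically with a small sketch. It assumes that the resampled triangle is stored as the lower triangle of the matrix, with the hypotenuse laid along the main diagonal, and it uses a naive orthonormal DCT-II written for clarity rather than speed. The function names are hypothetical.

```python
import math


def symmetrize(lower):
    """Build a symmetric N x N matrix from its lower triangle.

    `lower` is a list of rows of lengths 1..N (assumed layout).
    """
    n = len(lower)
    m = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            m[i][j] = m[j][i] = float(value)
    return m


def dct2(m):
    """Naive orthonormal 2D DCT-II of a square matrix (O(N^4))."""
    n = len(m)

    def c(u):
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for i in range(n):
                for j in range(n):
                    s += (m[i][j]
                          * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out


m = symmetrize([[10], [20, 30], [40, 50, 60]])
coeffs = dct2(m)
# Only the lower triangle needs to be quantized and entropy coded.
kept = [coeffs[u][: u + 1] for u in range(len(coeffs))]
print(sum(len(row) for row in kept))  # 6 coefficients instead of 9
```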

Figure 5 describes the different stages of the process: selection of the triangle $T$, then affine transformation of $T$ into a right isosceles triangle $T'$. Because of the affine transformation, the pixels of the triangle are no longer located on a regular orthogonal grid, and the photometric values inside the reference triangle must be resampled. For this, a process analogous to motion compensation in the image is used (the motion here being the affine transformation), with an interpolator, for example bilinear.

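The affine transformation $F$ and its inverse $F^{-1}$ are defined by the following equations:

$$F: \begin{cases} X = \dfrac{(x_2 - x_1)(y_1 - y) + (y_1 - y_2)(x_1 - x)}{(x_3 - x_1)(y_2 - y_1) + (y_1 - y_3)(x_2 - x_1)}\, N \\[2ex] Y = \dfrac{(x_1 - x_3)(y_1 - y) + (y_3 - y_1)(x_1 - x)}{(x_3 - x_1)(y_2 - y_1) + (y_1 - y_3)(x_2 - x_1)}\, N \end{cases}$$

$$F^{-1}: \begin{cases} x = x_1 + (x_3 - x_1)\dfrac{X}{N} + (x_2 - x_1)\dfrac{Y}{N} \\[2ex] y = y_1 + (y_3 - y_1)\dfrac{X}{N} + (y_2 - y_1)\dfrac{Y}{N} \end{cases}$$

Here $(x_l, y_l)$, $l = 1, 2, 3$, are the vertices of $T$, $(x, y)$ is a point of $T$, $(X, Y)$ is its image in $T'$, and $N$ is the size of the block $M$.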
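The pair of transformations can be sketched as follows, under the assumption that the triangle has vertices $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ and is mapped onto a right isosceles triangle with legs of length `n`. The names `make_transform`, `forward` and `inverse` are hypothetical; the expressions are written so that the two functions are exact inverses of each other.

```python
def make_transform(p1, p2, p3, n):
    """Return (forward, inverse) affine maps between triangle T and T'.

    forward maps p1 -> (0, 0), p3 -> (n, 0) and p2 -> (0, n).
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    det = (x3 - x1) * (y2 - y1) + (y1 - y3) * (x2 - x1)
    if det == 0:
        raise ValueError("zero-area triangle")

    def forward(x, y):
        big_x = n * ((x2 - x1) * (y1 - y) + (y1 - y2) * (x1 - x)) / det
        big_y = n * ((x1 - x3) * (y1 - y) + (y3 - y1) * (x1 - x)) / det
        return big_x, big_y

    def inverse(big_x, big_y):
        x = x1 + (x3 - x1) * big_x / n + (x2 - x1) * big_y / n
        y = y1 + (y3 - y1) * big_x / n + (y2 - y1) * big_y / n
        return x, y

    return forward, inverse


forward, inverse = make_transform((2.0, 1.0), (6.0, 3.0), (3.0, 7.0), n=8)
print(forward(3.0, 7.0))            # third vertex lands at (8, 0)
print(inverse(*forward(4.0, 3.0)))  # round trip back to about (4, 3)
```

In a coder, `inverse` would be evaluated at each integer position of the reference triangle and the source image sampled there with an interpolator, for example bilinear.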

The photometric values $M(i, j)$ of the triangle $T'$ (and therefore of the block $M$, the symmetrized version of $T'$) are obtained by the inverse transformation $F^{-1}$, then interpolation of the texture $\Gamma$ of the image to be coded:

$$M(i_Y, j_X) = I_\Gamma(P(x, y)) = I_\Gamma(F^{-1}(Q(X, Y)))$$

where the coefficient $M(i_Y, j_X)$ is the value at the point $Q(X, Y)$ whose transform $P(x, y)$ is $F^{-1}(Q)$.

$I_\Gamma$ denotes the interpolator used to calculate the value of the image at a point with potentially non-integer coordinates.

The reconstruction $\hat{\Gamma}$ of the texture $\Gamma$ is given by:

$$\hat{\Gamma}(P(x, y)) = I_{\hat{\Gamma}}(Q(X, Y)) = I_{\hat{\Gamma}}(F(P(x, y)))$$

where $I_{\hat{\Gamma}}$ denotes the texture interpolated from the values of the block $M'$, the quantized version of $M$.

This technique can only be applied to triangles with non-zero area. However, triangles of zero area do not, by definition, require texture coding. Unlike the SA-DCT (shape-adaptive DCT), this transformation does not guarantee perfect reconstruction after the inverse transformation, even in the absence of quantization.

In order to reduce the reconstruction error, a scale factor $\alpha$ is introduced for the calculation of the block $M_i$ (of size $N_i \times N_i$) for each triangle $i$:

$$N_i = E\left(\alpha \sqrt{2 A_i}\right)$$

where:

• $E$ is the integer part by excess (ceiling),

• $A_i$ is the area of triangle $i$.

In fact, $\alpha = 1$ achieves an interesting compromise, which is more effective for triangles close to isosceles. The case $\alpha < 1$ is used in conjunction with the quantization step to compress the volume of information.
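A short sketch of the block size computation follows. It assumes the relation $N_i = \lceil \alpha \sqrt{2 A_i} \rceil$, which for $\alpha = 1$ gives a right isosceles triangle with legs $N_i$ roughly the same area as triangle $i$; the function names are hypothetical.

```python
import math


def triangle_area(p1, p2, p3):
    """Unsigned area of the triangle (p1, p2, p3)."""
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p3[0] - p1[0]) * (p2[1] - p1[1])
    return abs(cross) / 2.0


def block_size(p1, p2, p3, alpha=1.0):
    """Side N_i of the square block, assuming N_i = ceil(alpha * sqrt(2 * A_i))."""
    return math.ceil(alpha * math.sqrt(2.0 * triangle_area(p1, p2, p3)))


print(block_size((0, 0), (8, 0), (0, 8)))             # 8
print(block_size((0, 0), (8, 0), (0, 8), alpha=0.5))  # 4
print(block_size((0, 0), (5, 0), (0, 3)))             # 4
```

With `alpha` below 1 the block holds fewer samples than the triangle has pixels, which trades reconstruction accuracy for a smaller volume of coefficients.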

Once the block $M_i$ has been defined for each triangle, a conventional DCT is applied to it, and the transformed coefficients are quantized according to one of several possible methods, for example uniform scalar quantization, or quantization with a step that increases with the frequency of the transformed coefficient. The well-known MPEG or JPEG quantization matrices can also be used.

For the transformed coefficients, we have the relation $F(u, v) = F(v, u)$ because $M(i, j) = M(j, i)$ ($\forall i, j = 0, \dots, N-1$) by definition. Consequently, it is sufficient to calculate only the coefficients of the lower part of the transformed matrix.

4.3 Global texture coding

As indicated above, a uniform hierarchical mesh is used, obtained by dividing each triangle of a given level of the hierarchy into 4 sub-triangles by inserting nodes in the middle of the arcs. The process is repeated iteratively up to a maximum level. This hierarchy of triangles is also represented and managed by the coder in the form of a quaternary tree (Figure 7). Note that only the triangles included in the zone to be coded are taken into account. The construction process of the basic initial mesh guarantees that any triangle in the mesh hierarchy belongs to the zone to be coded.

The process of coding an occlusion zone by meshing can be summarized as follows:

1. A nested hierarchical mesh is defined on the zone to be coded, by creating a regular initial mesh and then iteratively subdividing the triangles into 4 sub-triangles by inserting new nodes in the middle of the arcs. The values at the nodes are calculated so as to minimize the approximation error of the zone by the mesh.

2. The pixel values are approximated by affine interpolation on the triangle containing them, from its values at the nodes.

For each triangle of the hierarchy, the approximation error $E$ is then evaluated, and the mode of representation and coding is decided according to two thresholds, $\sigma_1$ and $\sigma_2$:

1. if $E < \sigma_1$, affine interpolation is sufficient on the triangle;

2. if $\sigma_1 < E < \sigma_2$, a finer decomposition of the triangle must be used to obtain a good approximation, still by affine interpolation;

3. if $E > \sigma_2$, the triangle is textured and the affine interpolation error is coded using the DCT.

Finally, on the finest mesh, the error reduction provided by permuting the diagonals of the quadrilaterals formed by 2 adjacent triangles is tested. If the result is positive, the permutation is validated.
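The two-threshold decision can be sketched as a small function. The mode labels and the handling of errors exactly equal to a threshold are assumptions made for the sake of a complete example, since the text does not specify them.

```python
def choose_mode(error, sigma1, sigma2):
    """Pick the representation of a triangle from its approximation error.

    Returns "affine", "subdivide" or "dct" (hypothetical labels).
    Errors equal to a threshold fall into the middle class here,
    which is an arbitrary choice.
    """
    if sigma1 > sigma2:
        raise ValueError("sigma1 must not exceed sigma2")
    if error < sigma1:
        return "affine"      # affine interpolation is sufficient
    if error <= sigma2:
        return "subdivide"   # refine the triangle, still affine
    return "dct"             # code the interpolation error with the DCT


for e in (2.0, 10.0, 50.0):
    print(e, choose_mode(e, sigma1=5.0, sigma2=20.0))
```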

According to the coding modes chosen for the different triangles, the different information is coded as follows. The YUV nodal values are first predicted from the values of the parent nodes (the ends of the arc on which the current node is inserted). The difference between the value of the node and its predicted value is then quantized.
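The differential coding of a nodal value can be illustrated as follows. The predictor used here, the mean of the two parent nodes, and the uniform quantization step are assumptions; the text only states that the value is predicted from the parents and that the difference is quantized.

```python
def predict_from_parents(parent_a, parent_b):
    """Assumed predictor: component-wise mean of the two parent nodes."""
    return tuple((a + b) / 2.0 for a, b in zip(parent_a, parent_b))


def encode_node(value, parent_a, parent_b, step):
    """Quantized difference between a YUV nodal value and its prediction."""
    pred = predict_from_parents(parent_a, parent_b)
    return tuple(round((v - p) / step) for v, p in zip(value, pred))


def decode_node(indices, parent_a, parent_b, step):
    """Reconstruct the nodal value from the quantization indices."""
    pred = predict_from_parents(parent_a, parent_b)
    return tuple(p + q * step for p, q in zip(pred, indices))


parent_a, parent_b = (100, 128, 128), (120, 128, 130)
indices = encode_node((117, 127, 130), parent_a, parent_b, step=4)
print(indices)                                            # (2, 0, 0)
print(decode_node(indices, parent_a, parent_b, step=4))   # (118.0, 128.0, 129.0)
```

The decoder can form the same prediction because it rebuilds the mesh hierarchy, and therefore the parent nodes, before decoding the current node.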

Finally, the structure of the quaternary tree (including the indicators of whether or not the triangles are divided), the diagonal permutation indicators, the differential YUV nodal values and the quantized DCT coefficients are coded by an arithmetic coder and inserted in the bit stream.

5. Summary of the information coded in the bit stream for the mesh-coded frames

Each group of frames encoded in mesh mode between $N_k + 1$ and $N_{k+1} - 1$ (where $N_k$ and $N_{k+1}$ are respectively the previous frame and the next frame encoded in MPEG mode) is represented as a whole in the bit stream.

The information conveyed includes, in coded form:

- a header for the entire group of frames, including among other things the actual number of encoded frames;

- the motion meshes (structure and displacement vectors of the nodes) $T^b_k$ and $T^f_{k+1}$;

- the prediction error or original texture, for each image in the group.

6. Global structure of the bit stream

The global bit stream consists of a succession of frames encoded in MPEG mode, and groups of frames encoded in mesh mode, as shown in Figure 8.

The global header of the bit stream representative of the coded sequence contains inter alia the indication of hybrid coding. The part of the bit stream corresponding to a group of frames coded in mesh mode begins with a header indicating inter alia the number of frames actually coded, possibly zero.

The different data streams (bit streams) corresponding respectively to the global header of the coded sequence, to the MPEG encoded images and to the groups of images encoded in interpolated mesh mode can be sent on different independent channels if necessary. In particular, the coding method allows hierarchical (or scalable) decoding of the sequence, that is to say, decoding that uses only part of the total bit rate.

7. Decoding process

Figure 9 gives a general view of the principle of decoding.

First of all, the decoding of the header makes it possible to activate the hybrid decoding.

Then, the decoder recognizes for each part of the bit stream corresponding to an autonomous entity whether it is an MPEG-4 encoded frame or a group of frames encoded by mesh. The MPEG-4 frames are supplied to the MPEG-4 decoding module, and the groups of coded frames in mesh mode are supplied to the mesh decoding module.
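The routing step described above can be sketched as follows. This is only an illustration: the unit types `MpegFrameUnit` and `MeshGroupUnit`, the function `decode_hybrid_stream` and the two decoder callbacks are hypothetical names, and the parsing of the actual bit stream syntax is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Union


@dataclass
class MpegFrameUnit:
    payload: bytes  # one MPEG-4 encoded frame


@dataclass
class MeshGroupUnit:
    frame_count: int  # number of frames actually coded, possibly zero
    payload: bytes    # motion meshes and textures of the group


Unit = Union[MpegFrameUnit, MeshGroupUnit]


def decode_hybrid_stream(
    units: Iterable[Unit],
    decode_mpeg: Callable[[MpegFrameUnit], object],
    decode_mesh_group: Callable[[MeshGroupUnit], List[object]],
) -> List[object]:
    """Route each autonomous entity of the stream to the matching decoder."""
    frames: List[object] = []
    for unit in units:
        if isinstance(unit, MpegFrameUnit):
            frames.append(decode_mpeg(unit))
        elif isinstance(unit, MeshGroupUnit):
            # A group header may announce zero coded frames.
            if unit.frame_count > 0:
                frames.extend(decode_mesh_group(unit))
        else:
            raise TypeError(f"unknown stream unit: {unit!r}")
    return frames
```

In this sketch the mesh decoder returns the whole group at once, which matches the description of a group being represented as a whole in the bit stream.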

7.1 Decoding based on mesh

First of all, the motion meshes T^b_k and T^f_{k+1} for the group of images I_l, N_k < l < N_{k+1}, are decoded.

Then, the occlusion zones for these images are found according to the same process as in the coder.

The pixels outside the occlusion zones are simply interpolated from the images I_{N_k} and I_{N_{k+1}} and the motion fields T^b_k and T^f_{k+1}. The coarsest texture mesh (top of the hierarchy) is found for each occlusion zone according to a process identical to that of the coder.

The information associated with the corresponding hierarchical mesh (triangle division indicators, affine interpolation or DCT coding decisions, differential nodal YUV values and quantized DCT coefficients) is then decoded, and the YUV values of the pixels of these areas are reconstructed.

Claims

1. Image coding method, characterized in that it selectively implements at least two image coding modes, each optimizing the compression of at least one image of a video sequence according to different optimization criteria.

2. Coding method according to claim 1, characterized in that information on the choice of one of said coding modes is known to a decoder according to at least one of the techniques belonging to the group comprising:

- predefined choice, known at coding and at decoding;

- information representative of the choice included in a data stream comprising at least some of the coded image data;

- information representative of the choice included in a data stream independent of the coded image data;

- determination of the choice intrinsically, by the decoder.

3. Coding method according to any one of claims 1 and 2, characterized in that it comprises a step of selecting a coding mode to be applied to said image, among at least:

- a first coding substantially optimizing a photometric representation of an image;

- a second coding substantially optimizing a representation of the movement between at least two images.

4. Coding method according to claim 3, characterized in that said second coding takes account of at least one previous image and/or at least one following image coded using said first coding.

5. Coding method according to claim 4, characterized in that said second coding takes into account a field of motion vectors calculated from the immediately preceding image coded using said first coding and/or a field of motion vectors calculated from the immediately following image coded using said first coding.

6. Coding method according to claim 5, characterized in that said motion vector field is applied to a mesh.

7. Coding method according to any one of claims 5 and 6, characterized in that said motion vector fields are used to determine a deduced motion vector field, associated with an image coded using said second coding.

8. Coding method according to any one of claims 5 to 7, characterized in that said second coding implements a forward motion estimation, between an image I_{t1} and a following image I_{t2}, and a motion compensation step in which the motion vectors obtained during said motion estimation and leading to no reversal are weighted by a scalar k = m/(t1 + t2), 0 < k < 1, so as to interpolate at least one image Im1, m belonging to [t1, t2].

9. Coding method according to any one of claims 5 to 7, characterized in that said second coding implements a backward motion estimation, between an image I_{t2} and a previous image I_{t1}, and a motion compensation step in which the motion vectors obtained during said motion estimation and leading to no reversal are weighted by a scalar k' = 1 - m/(t1 + t2), 0 ≤ k' ≤ 1, so as to interpolate at least one image Im2, m belonging to [t1, t2].

10. Coding method according to claims 8 and 9, characterized in that it implements a backward estimation and a forward estimation, so as to obtain an estimated image Im equal to Im = a·E1 + (1 - a)·E2, with 0 ≤ a ≤ 1.

11. Coding method according to any one of claims 3 to 10, characterized in that said selection step is based on the implementation of a sub-sampling by a fixed factor N, one image in N being coded using said first coding.

12. Coding method according to claim 11, characterized in that N is greater than 2.

13. Coding method according to any one of claims 11 and 12, characterized in that N is variable.

14. Coding method according to any one of claims 3 to 11, characterized in that said first coding implements a transformation on blocks of images and a temporal prediction by blocks.

15. Coding method according to claim 14, characterized in that said first coding is an MPEG-4 or H26L coding.

16. Coding method according to claim 15, characterized in that the images delivered by said MPEG-4 or H26L coding comprise type I (intra) and/or type P (predictive) images.

17. Coding method according to any one of claims 3 to 16, characterized in that said second coding is based on the implementation of a hierarchical mesh of M levels, M being greater than or equal to 1.

18. Coding method according to claim 17, characterized in that said mesh is triangular.

19. Coding method according to any one of claims 17 and 18, characterized in that it comprises a step for managing the occlusion zones.

20. Coding method according to any one of claims 1 to 19, characterized in that it produces at least two data streams, which can be transmitted over independent transmission channels.

21. Coding method according to claim 20, characterized in that said data streams belong to the group comprising:

- a global header;

- image data coded according to said first coding;

- image data coded according to said second coding.

22. Method for decoding an image signal coded using the coding method of any one of claims 1 to 21.

23. Device for coding an image signal coded using the coding method of any one of claims 1 to 21.

24. Device for decoding an image signal coded using the coding method of any one of claims 1 to 21.

25. A decoding device according to claim 24, characterized in that it comprises means for determining at least part of a vector field and/or at least part of the occlusion zones, similar to those implemented during coding.

26. Device for storing at least one image signal coded using the coding method of any one of claims 1 to 21.

27. System for coding, transmitting and/or decoding an image signal coded using the coding method of any one of claims 1 to 17.

28. System according to claim 27, characterized in that information on the choice of one of said coding modes is known to a decoder according to at least one of the techniques belonging to the group comprising:

- predefined choice, known at coding and at decoding;

- information representative of the choice included in a data stream comprising at least some of the coded image data;

- information representative of the choice included in a data stream independent of the coded image data;

- determination of the choice intrinsically, by the decoder.

29. A computer program product for coding and/or decoding an image signal coded using the coding method of any one of claims 1 to 21.

30. Data carrier carrying a computer program for coding and/or decoding an image signal coded using the coding method of any one of claims 1 to 21.

31. Image data signal, characterized in that it comprises data coded according to the method of any one of claims 1 to 21.

32. Signal according to claim 31, characterized in that it comprises at least one flag indicating whether or not the method according to any one of claims 1 to 21 is activated.

33. Signal according to any one of claims 31 and 32, characterized in that it comprises data specifying the structure of the frames, at the start of the video sequence and/or in each signal frame.

34. Signal according to any one of claims 31 to 33, characterized in that a sequence coded using said second coding begins with a header specifying the number of frames coded according to this second coding.

35. Signal according to any one of claims 31 to 34, characterized in that it comprises at least two data streams, which can be transmitted on independent transmission channels.

36. Signal according to claim 35, characterized in that said data streams belong to the group comprising:

- a global header;

- image data coded according to said first coding;

- image data coded according to said second coding.

37. Application of the coding method according to any one of claims 1 to 21 to at least one of the fields belonging to the group comprising:

- digital television;

- real-time video over IP network;

- real-time video over networks to mobiles;

- storage of image data.