WO2015055902A1

WO2015055902A1 - Method of encoding and decoding a sequence of video images with predetermined modes of filling for the pixels of occluded parts in the images

Info

Publication number: WO2015055902A1
Application number: PCT/FR2014/000224
Authority: WO
Inventors: Gang Xiao
Original assignee: Université de Nice Sophia Antipolis; Centre National De La Recherche Scientifique
Priority date: 2013-10-18
Filing date: 2014-10-16
Publication date: 2015-04-23
Also published as: FR3012280A1; FR3012280B1

Abstract

The invention relates to a method of video encoding and decoding in which a predicted image (P2, P3) and a residual image (R2, R3) are calculated from a reference image (F1, F2) and a field of motion vectors (M1, M2). During encoding, the method generates a list of occlusion order indicators, which is transmitted to the decoding with the motion vector fields (M1, M2). The predicted image (P2, P3) contains occlusion bands formed by pixels whose vector field does not match a valid position in the reference image (F1, F2). A list of fill mode indicators for the pixels of the occlusion bands of the predicted image (P2, P3) is also transmitted from the encoding to the decoding. During decoding, this list determines which pre-established fill mode is used to fill each pixel in said occlusion bands of the predicted image (P2, P3).

Description

A method of encoding and decoding a sequence of video images with predetermined fill modes for pixels of occluded portions in the images

The present invention relates to a method for encoding and decoding video images with predetermined fill modes for the pixels of the occluded portions in the images.

In a known method of video encoding and decoding of a sequence of images, the encoder computes a predicted image and a residual image from a reference image and from motion vector fields. The reference image is either the preceding or the following image, and the predicted image is then, respectively, the next or the previous one.

The residual images and the motion vector fields are transmitted to the decoding in order to obtain display images. For each display image, the decoding computes a predicted image from the associated motion vector field. It then corrects the predicted image with the associated residual image to obtain a reference image, and this reference image is used to produce the display image.

Conventionally, the pixels of the predicted image are grouped into blocks, and the motion vector field is approximated by one vector per block. These vectors are then compressed before being transmitted to the decoding.

Alternatively, the image can be divided into segments of coherent motion. Each segment corresponds to an object or part of the scene that moves independently of the other segments. Within such a segment, the motion vector field can be described by a simple mathematical model, which greatly reduces the bandwidth needed to transmit the vector field compared with block-by-block transmission.

Frequently, most of the bandwidth of a sequence of video images is consumed by occlusion zones. In these zones, part of an object in the predicted image is hidden in the reference image, either by another moving object or because it lies outside the frame of the reference image. Known algorithms handle this case by estimating the hidden part from a region of the reference image that closely resembles it. Several disadvantages limit the performance of these algorithms.
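The sketch below makes the bandwidth argument concrete. It assumes an affine model with six parameters per segment; this is only one possible "simple mathematical model", and the text does not name one. The sketch compares that parameter count with the number of vector components needed when one vector is sent per 8 × 8 block. The function names are hypothetical.

```python
def affine_vector(params, x, y):
    """Motion vector (dx, dy) at pixel (x, y) under an assumed affine segment model."""
    a0, a1, a2, a3, a4, a5 = params
    return (a0 + a1 * x + a2 * y, a3 + a4 * x + a5 * y)


def block_vector_count(width, height, block=8):
    """Number of vectors needed with one vector per block (partial blocks count)."""
    blocks_x = -(-width // block)   # ceiling division
    blocks_y = -(-height // block)
    return blocks_x * blocks_y


# A pure translation, 5 pixels left and 1 pixel down (y grows downwards)
translation = (-5.0, 0.0, 0.0, 1.0, 0.0, 0.0)
print(affine_vector(translation, 10, 20))                       # (-5.0, 1.0)
print(2 * block_vector_count(32, 32), "vs", len(translation))   # 32 vs 6
```

For a 32 × 32 image, a single affine segment needs 6 numbers instead of 32 block-vector components. Real savings depend on how many segments the image contains and on entropy coding, which this sketch ignores.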

First, the hidden part does not exist in the reference image, so any estimate of the occluded part of the predicted image is only an approximation, often quite far from the true content of the image. The residual image must then correct the difference between the prediction and the real content. It has to supply the real content and also cancel the predicted part, since the prediction is very likely to differ greatly from the real content.

Tests show that, in most cases, this double correction requires more bandwidth for the residual image than simply supplying the occluded part of the predicted image.

When the bandwidth is limited, the residual image must be compressed at a higher ratio, which inevitably lowers image quality. Aggressive high-ratio compression introduces artifacts in both the real part and the predicted part of the image. The artifacts of the predicted part are much more visible, because they are noise with no corresponding signal. This noise often takes the form of ringing in occluded areas.

Second, the search for predicted parts of the occluded area is essentially random, so the resulting motion vectors are also random. Such vectors are difficult to compress, which significantly increases the bandwidth needed to transmit the motion vector field. An analysis of real images shows that these random vectors often take up most of the bandwidth of the compressed motion vector field.

Finally, in the occluded parts, random motion vectors prevent efficient temporal interpolation of the intermediate images between the reference image and the predicted image. Each intermediate image then requires its own residual correction, even though the information for these images is very often already contained in the reference image and the predicted image. A more accurate vector field could therefore significantly increase the compression ratio for these intermediate images.

Several occlusion treatment methods have recently been published. These methods focus on detecting and interpreting occlusions when an image is temporally interpolated between two known reference images.

When compressing and transmitting a sequence of video images, the most common step is to predict an image from a previous reference image. This situation is very different from temporal interpolation. Improving compression efficiency in this case matters a great deal, because it accounts for a large part of the bandwidth of a compressed video sequence.

Document US-A1-2010/283892, which represents the closest prior art, describes a method of video encoding and decoding of a sequence of images. During encoding and decoding, a predicted image and a residual image are calculated from a reference image and a motion vector field. The residual images and the motion vector fields are transmitted to the decoding to obtain the display images. During encoding, the method generates a list of occlusion order indicators, which is transmitted to the decoding with the motion vector fields. The predicted image contains occlusion bands formed by pixels whose vector field does not match a valid position in the reference image. This document states that the pixels of the occlusion bands are filled according to a list of occlusion order indicators. However, it does not describe a fill mode for each pixel in the occlusion bands that can be selected according to the parameters of those bands.

The same is true for documents US-A1-2011/129015, US-A1-2011/211111 and EP-A1-2 602 997.

Document US-A-2003/039307 describes a system and method for video encoding and decoding that transmits an occlusion order indicator. This indicator can express a relative order between two segments, such as the position of one segment relative to another. It can also express an absolute order that gives each segment a place in the overall ordering. The order indicator can be derived from the occlusion differences of the segments between the reference image and the predicted image.

Although this document describes an occlusion order indicator, applying such a method has been found to produce artifacts that impair image quality and adversely affect compression. This document is discussed in more detail below as an example of prior-art treatment of occlusion zones.

The problem underlying the present invention is therefore to find an encoding and decoding method that reduces the transmission bandwidth required for occluded positions, or for positions outside the pixel frame of the reference image. The method does so by better predicting the luminance or color values of the pixels at these positions.

To this end, the invention relates to a method of video encoding and decoding of an image sequence. During encoding and decoding, a predicted image and a residual image are calculated from a reference image and a motion vector field. The residual images and the motion vector fields are transmitted to the decoding to obtain display images after decoding. During encoding, the method generates a list of occlusion order indicators, which is transmitted to the decoding with the motion vector fields. The predicted image contains occlusion bands formed by pixels whose vector field does not match a valid position in the reference image. The method is characterized in that a list of fill mode indicators for the pixels of the occlusion bands of the predicted image is transmitted from encoding to decoding. During decoding, this list determines the preset fill mode used to fill each pixel in said occlusion bands of the predicted image.

Advantageously, the list of fill mode indicators is either included in the occlusion order list, or independent of it and transmitted from encoding to decoding separately.

Advantageously, two continuity zones of the motion vector field are defined in the reference image: an occluded zone and an occluding zone, separated by a discontinuity curve. The occluding zone at least partially covers the occluded zone. In the predicted image, the occlusion band created by the discontinuity between the two zones belongs to the occluded zone. The preset fill mode for a given pixel of the occlusion band is then the average value of a selection of pixels belonging to the occluded zone of the reference image. This selection consists of the pixels of the occluded zone of the reference image that are matched with the pixels of the predicted image closest to the given pixel.

Advantageously, said average is weighted by a weighting function dependent on the positioning of each pixel in said selection of pixels of the reference image.

Advantageously, said weighting function depends on the gradients of the luminance values of the pixels in the selection of pixels of the reference image. When these gradients have a dominant direction, the function gives a higher weight to the pixels whose position lies close to the direction perpendicular to that dominant gradient direction.
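One way to realize this gradient-dependent weighting, sketched here under our own assumptions, uses the classical structure tensor:

1. Sum the products of the luminance gradients over the selection.
2. Test whether one orientation dominates. The coherence threshold of 0.8 is an arbitrary choice.
3. Weight each pixel offset by the squared cosine of its angle to the direction perpendicular to the gradients.

The patent does not specify this formula, and the function names are hypothetical.

```python
import math


def dominant_gradient_angle(gradients, coherence_threshold=0.8):
    """Dominant gradient angle in radians, or None if no direction dominates.

    gradients: list of (gx, gy) luminance gradients over the pixel selection.
    The structure tensor and the threshold are assumptions of this sketch.
    """
    sxx = sum(gx * gx for gx, _ in gradients)
    syy = sum(gy * gy for _, gy in gradients)
    sxy = sum(gx * gy for gx, gy in gradients)
    total = sxx + syy
    if total == 0:
        return None
    coherence = math.hypot(sxx - syy, 2 * sxy) / total   # 1 = single orientation
    if coherence < coherence_threshold:
        return None
    return 0.5 * math.atan2(2 * sxy, sxx - syy)


def directional_factor(offset, grad_angle):
    """Factor in [0, 1] favoring offsets perpendicular to the dominant gradient."""
    vx, vy = offset
    norm = math.hypot(vx, vy)
    if grad_angle is None or norm == 0:
        return 1.0   # no dominant direction: isotropic weighting
    ex, ey = -math.sin(grad_angle), math.cos(grad_angle)   # unit vector perpendicular to gradient
    return (abs(vx * ex + vy * ey) / norm) ** 2
```

For a vertical structure such as the pole 2 of Figs. 1 and 2, the gradients are mostly horizontal. The perpendicular direction is then vertical, so pixels above or below the occluded pixel receive the largest factor. This factor would multiply a distance-based weight.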
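Below is a minimal sketch of group-by-group decoding. It assumes a plain unweighted average of the already-known pixels within a small radius, whereas the patent uses its own averaging modes. The radius and the function name are hypothetical. What the sketch shows is the ordering: the residual corrections of an earlier group are applied before that group serves as a reference for the next one.

```python
def fill_groups(known, groups, residual, radius=1.5):
    """Fill occlusion-band pixels group by group, in order of precedence.

    known:    {(x, y): value} for pixels already decoded (outside the band).
    groups:   list of lists of (x, y) band positions, earliest group first.
    residual: {(x, y): correction} decoded from the residual image.
    radius:   assumed neighborhood radius for the local average.
    """
    if not known:
        raise ValueError("at least one decoded reference pixel is needed")
    values = dict(known)
    for group in groups:
        corrected = {}
        for (x, y) in group:
            r = radius
            while True:
                near = [v for (qx, qy), v in values.items()
                        if (qx - x) ** 2 + (qy - y) ** 2 <= r * r]
                if near:
                    break
                r *= 2   # widen the search until a reference value is found
            prediction = sum(near) / len(near)
            corrected[(x, y)] = prediction + residual.get((x, y), 0)
        # A group's corrected values become references only once the group is done
        values.update(corrected)
    return values


known = {(0, 0): 10, (1, 0): 12}
filled = fill_groups(known, groups=[[(2, 0)], [(3, 0)]], residual={(2, 0): 1})
print(filled[(2, 0)], filled[(3, 0)])   # 13.0 13.0
```

Without the residual correction of the first group, pixel (3, 0) would have been predicted as 12 rather than 13.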

Advantageously, during decoding, for a discontinuity curve separating an occluding zone and an occluded zone, the pixels belonging to the discontinuity curve are assigned to the occluding zone.

The invention also relates to a video encoding and decoding system for implementing such a method, which comprises:

means for encoding video images, comprising means for detecting a discontinuity between two continuity zones of a field of motion vectors on the reference image and means for generating a list of occlusion order indicators and a list of fill mode indicators,

decoding means for obtaining each display image, comprising means for processing the list of occlusion order indicators and the list of fill mode indicators, as well as means for selectively applying these fill modes to fill the pixels of the occlusion bands of the predicted images,

the decoding means comprising means for storing beforehand a plurality of predetermined fill modes for the pixels of the occlusion bands.

Other features, objects and advantages of the present invention will appear on reading the detailed description which follows and with reference to the appended drawings given by way of non-limiting examples and in which:

FIGS. 1 and 2 show respectively first and second successive images of a video sequence,

FIG. 3 shows an occlusion band between two objects moving in an image,

FIGS. 4 and 5 are schematic representations of the known steps of a method for, respectively, encoding and decoding video images, a method in which the method according to the invention can be used,

FIGS. 6 and 7, 8 and 9, 10 and 11, 12 and 13 each show a predicted image and a residual image obtained with, respectively, first, second, third and fourth prior-art methods of treating the occluded zones in an image,

FIGS. 14 and 15 respectively show a predicted image and a residual image obtained with a method of treating the occluded zones in an image according to the present invention, for a first example of filling the occluded pixels,

FIGS. 16 and 17 respectively show a predicted image and a residual image obtained with a method of treating the occluded zones in an image according to the present invention, for a second example of filling the occluded pixels,

FIGS. 18 and 19 respectively show a predicted image and a residual image obtained with a method of treating the occluded zones in an image according to the present invention, for a third example of filling the occluded pixels.

Fig. 1 shows a first image of a sequence of video images. Fig. 2 shows the second image of the sequence, after at least one object of the first image has moved. The original images are in color and measure 32x32 pixels. They show a moving object 1, here the rear of a motor vehicle, advancing relative to fixed objects, which in Figures 1 and 2 are a pole 2 and vegetation 3.

In this example, between the two images shown in Figures 1 and 2, the motion vector of the moving object 1 is five pixels to the left and one pixel down, relative to the fixed background 2, 3.

FIG. 3 shows which zone each pixel of the second image of FIG. 2 belongs to. There are three zones: a zone 1a for the moving object, forming the foreground; a zone 2a for the fixed object, forming the background; and an occlusion zone 4a, which here forms a band. In Figure 3, this occlusion band contains 193 of the 1024 pixels of the image.

To illustrate the steps of the encoding and decoding methods, reference will be made to FIGS. 4 and 5, respectively.

Figure 4 shows the encoding of a sequence of video images. In this figure, references S1, S2, S3 denote the source images of the video sequence, which enter the encoder through its input Ee. S1 is the first source image of the sequence and can give an image of type I, referenced I in this figure. A motion vector field M1 is applied to this image I to produce the predicted image P2, which serves as the prediction for the second image. The image I and the motion vector field M1 are transmitted from the encoder to the decoder.

The predicted image P2 is compared with the second source image S2, and the difference between P2 and S2 gives the residual image R2. The predicted image P2 is then combined with the residual image R2 to give a reference image F2. Applying the motion vector field M2 to F2 produces the next predicted image P3. The motion vector field M3 and the following fields are used in the same way, although Figure 4 does not show them.

The same process is applied to the predicted image P3 to obtain a residual image R3 and then a reference image F3, to which the motion vector field M3 is applied. The process continues for the n reference images.

The residual images R2, R3 and the motion vector fields M1, M2, M3 obtained during encoding are transmitted to the decoding. They leave the encoder through its output Se and enter the decoder through its input De, shown in Figure 5.

In general terms, the encoding therefore works as follows. A predicted image P2, P3 and a residual image R2, R3 are calculated from a reference image I, F2, F3 and from the motion vector fields M1, M2, M3. The reference image is the preceding or the following image, and the predicted image is then, respectively, the following or the preceding one. The motion vector fields M1, M2, M3 and the residual images R2, R3 are transmitted from the encoding to the decoding.
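The encoding loop of Figure 4 can be sketched under simplifying assumptions of our own:

- images are small lists of integers;
- each motion vector field is reduced to one global integer translation;
- out-of-frame positions are clamped to the border;
- residual compression is replaced by uniform quantization.

The encoder rebuilds each reference image exactly as the decoder will (P plus the quantized R), so both sides stay in step. All names are hypothetical.

```python
def warp(image, dx, dy):
    """Backward warp: P[y][x] = F[y - dy][x - dx], clamped at the borders.

    One global integer translation (dx, dy) stands in for a motion vector field.
    """
    h, w = len(image), len(image[0])
    return [[image[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)]
             for x in range(w)] for y in range(h)]


def quantize(residual, step):
    """Uniform quantization, a stand-in for compressing the residual image."""
    return [[step * round(v / step) for v in row] for row in residual]


def encode(sources, motions, step=4):
    """Return the image I, the motion fields and the quantized residuals R2, R3, ..."""
    reference = sources[0]                      # image I, sent as is
    residuals = []
    for source, (dx, dy) in zip(sources[1:], motions):
        predicted = warp(reference, dx, dy)     # P = motion field applied to F
        r = [[s - p for s, p in zip(srow, prow)] for srow, prow in zip(source, predicted)]
        rq = quantize(r, step)
        residuals.append(rq)
        # Next reference F = P + R, rebuilt exactly as the decoder will see it
        reference = [[p + q for p, q in zip(prow, qrow)] for prow, qrow in zip(predicted, rq)]
    return sources[0], motions, residuals


s1 = [[0, 10, 20, 30]]
s2 = [[13, 20, 30, 30]]              # content shifted one pixel left, slightly altered
_, _, res = encode([s1, s2], [(-1, 0)])
print(res)                           # [[[4, 0, 0, 0]]]
```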

In the specific case of a first image and a second image, a predicted image is calculated by applying the motion vector field to the first image. The corresponding residual image is the difference between the second image and the predicted image.

Figure 5 shows the decoding of a sequence of video images. In this figure, references A1, A2, A3 denote the display images of the video sequence, which leave the decoder through its output Sd. To obtain the display images A1, A2, A3, the decoder calculates a predicted image P2, P3 from a reference image F1, F2, F3 according to the associated motion vector field M1, M2. The predicted image P2, P3, corrected by the residual image R2, R3, gives the next reference image F2, F3. The reference images F2 and F3 may contain occluded zones, and the pixels of these zones must be filled to obtain the corresponding display images A1, A2, A3.

The associated motion vector field M2, M3 is applied to the reference image F2, F3 to obtain a new predicted image P3. This image is processed in the same way, with correction by the associated residual image R3, to obtain the next reference image F3. The process continues in this way for the n images of the decoded video sequence.
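The decoding loop of Figure 5 mirrors the encoding. This sketch uses simplifying assumptions of our own: one global integer translation per motion field, border clamping, and no filling of occluded zones. Each reference image is the predicted image plus the decoded residual, and it is output as the next display image. Names are hypothetical.

```python
def warp(image, dx, dy):
    """Backward warp: P[y][x] = F[y - dy][x - dx], clamped at the borders (assumed)."""
    h, w = len(image), len(image[0])
    return [[image[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)]
             for x in range(w)] for y in range(h)]


def decode(first, motions, residuals):
    """Rebuild the display images A1, A2, ... from I, the motion fields and residuals."""
    reference = first
    displays = [first]
    for (dx, dy), residual in zip(motions, residuals):
        predicted = warp(reference, dx, dy)              # predicted image P
        reference = [[p + r for p, r in zip(prow, rrow)]  # F = P + R
                     for prow, rrow in zip(predicted, residual)]
        displays.append(reference)
    return displays


print(decode([[0, 10, 20, 30]], [(-1, 0)], [[[4, 0, 0, 0]]]))
# [[[0, 10, 20, 30]], [[14, 20, 30, 30]]]
```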

Four prior-art methods of treating occlusion zones are described below. They serve as a comparison with three embodiments of encoding and decoding that use predetermined fill modes for the occlusion zones. These embodiments, described later, can be used in the method according to the present invention in combination with a fill order list according to the present invention.

The description refers to the first and second images, but it applies to any reference image and to the preceding or following source image, depending on the encoding direction. For simplicity, encoding and decoding are described on following images, the second image following the first, which is not limiting. This corresponds to forward compression of the images, but backward compression is also possible.

All the compared methods of treating occlusion zones first calculate predicted images during encoding, from the preceding or following reference image and the motion vector fields. They then calculate the corresponding residual images, which correct each predicted image to give the next image in the encoding direction. Each method is described with the first image as the reference image, to which the motion vector fields are applied to calculate a predicted image. The corresponding residual image is the difference between the predicted image and the second image.

The first prior-art method is described with reference to FIGS. 6 and 7. FIG. 6 shows the predicted image and FIG. 7 the residual image. This first method, called "block vector", uses a motion search algorithm to determine an "optimal compensation vector" for each 8 × 8 pixel block. The block in the predicted image is the content of the first image displaced by the compensation vector.
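The "block vector" method relies on a motion search that the text leaves unspecified. A common choice, assumed here, is an exhaustive search that minimizes the sum of absolute differences (SAD) within a small window. The block size n and the search range are parameters; the patent's blocks are 8 × 8. Function names are hypothetical.

```python
def sad(ref, cur, bx, by, dx, dy, n):
    """SAD between the n x n block of cur at (bx, by) and the block of ref
    displaced by (dx, dy); None if the displaced block leaves the frame."""
    h, w = len(ref), len(ref[0])
    if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
        return None
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def block_vectors(ref, cur, n=8, search=4):
    """Exhaustive search of the best compensation vector for each n x n block."""
    vectors = {}
    for by in range(0, len(cur) - n + 1, n):
        for bx in range(0, len(cur[0]) - n + 1, n):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    cost = sad(ref, cur, bx, by, dx, dy, n)
                    if cost is not None and (best is None or cost < best[0]):
                        best = (cost, (dx, dy))
            vectors[(bx, by)] = best[1]   # (0, 0) is always valid, so best is set
    return vectors


ref = [[y * 100 + x for x in range(12)] for y in range(12)]
cur = [[ref[min(y + 1, 11)][min(x + 2, 11)] for x in range(12)] for y in range(12)]
print(block_vectors(ref, cur, n=4, search=2)[(0, 0)])   # (2, 1)
```

For occluded blocks, the search still returns a minimum-SAD vector. That vector is essentially arbitrary, which is the source of the hard-to-compress random vectors discussed earlier.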

The predicted image and the residual image are obtained as described above. Where the predicted image and the second image differ strongly, the residual can overflow the limits of the pixel value range, which sometimes creates strong local variations in the residual image.

With this method, several blocks are at least partly covered by an occlusion band. Such a band is formed by pixels of the predicted image whose vector field does not correspond to a valid position in the reference image.

For the present invention, pairs of continuity zones of the motion vector field are therefore defined in the reference image. Each pair comprises an occluded zone and an occluding zone with a discontinuity between them, the occluding zone at least partially covering the occluded zone. In the predicted image, this discontinuity creates the occlusion band, which is what gives rise to the invalid positions in the reference image. This holds for all the examples given in this application. In the specific case of FIGS. 6 and 7, the motion vectors of these blocks have no significant effect on the result, because there is no sufficiently similar counterpart in the first image anyway.

For the methods and implementations described in the rest of this application, the motion vectors are transmitted per motion zone, i.e. for the foreground or the background, the occlusion band belonging to the background. The motion vectors include a list of indicators that specifies, for each pixel of the predicted image, the zone it belongs to (foreground, background or occlusion).

The second prior-art method is described with reference to FIGS. 8 and 9. It uses the "direct vector" fill mode. This mode predicts the luminance or chromatic values of a pixel in the occlusion band by extending the background motion vectors continuously into the band, which matches the pixel to a position in the reference image, here the first image. If this extension does reflect the true motion of the occluded pixel, that position in the reference image is occupied by the foreground object. The luminance or chromatic value obtained is therefore a poor prediction.
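A one-row toy example shows why the "direct vector" mode fails. The values and names are our own choice. A foreground object (value 200) occupies columns 6 to 9 of the reference row and moves 3 pixels left, uncovering columns 7 to 9 of the predicted row, whose true content is background (value 50). Extending the static background vector into the band fetches the object itself.

```python
def direct_vector_fill(reference, band, bg_shift=0):
    """'Direct vector' mode: predict band pixel x from reference[x - bg_shift].

    The background vector is simply extended into the band; positions outside
    the row are clamped to the border (an assumption of this sketch).
    """
    w = len(reference)
    return {x: reference[min(max(x - bg_shift, 0), w - 1)] for x in band}


reference = [50] * 6 + [200] * 4 + [50] * 2   # object at columns 6-9
print(direct_vector_fill(reference, band=[7, 8, 9]))   # {7: 200, 8: 200, 9: 200}
```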

As FIGS. 8 and 9 show, the prediction of the occlusion band obtained in this way is wrong. It requires a large correction in the residual image and consequently a large bandwidth.

The third prior-art method is described with reference to FIGS. 10 and 11. It uses the "inverted vector" fill mode and improves on the previous method by replacing the motion vectors in the occlusion band with those of the occluding zone. The occluding zone is the foreground zone that covers the other zone; the covered zone is called the occluded zone, as mentioned above.

These motion vectors map an occluded pixel of the second image to a background position in the first image. The content at that position is closer to the occluded pixel than the content at a foreground position.

Figures 10 and 11 show that the improvement over the second method is significant, apart from the difficulties created by the pole 2 visible in Figures 1 and 2.

The fourth prior-art method is described with reference to FIGS. 12 and 13. It uses the fill mode called "without prediction": no prediction is performed in the occlusion band, and the content of the second image in this band is supplied entirely by the residual image. This method is similar to the one described in US-A-2003/039307, mentioned in the introduction of this patent application.

A major drawback of this fill mode concerns the edge of the occlusion band, here its right contour as shown in FIG. 3. This edge forms a line of strong gradients in the residual image, although no such line exists in the second image. Compressing the residual image creates artifacts around this line, and no gradients in the composite image mask them. Because the line is regular, the human eye notices these artifacts easily, and they can be unpleasant. This phenomenon limits the compression ratio of the residual image and so reduces the efficiency of the method.

The closest prior art, illustrated by the fourth method, therefore includes a step of generating a list of occlusion order indicators during encoding. Each indicator relates to a discontinuity, detected on the reference image, between two continuity zones of a motion vector field; this discontinuity forms a discontinuity curve. The list of occlusion order indicators is transmitted to the decoding with the motion vector fields.
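The gradient problem can be reproduced on a single row of made-up values. The background is smooth across the right contour of the band. With no prediction inside the band, however, the residual drops from the full pixel value to zero at that contour. Function names are hypothetical.

```python
def residual_row(source, predicted):
    """Residual = source - prediction, pixel by pixel."""
    return [s - p for s, p in zip(source, predicted)]


def max_jump(row):
    """Largest absolute difference between neighboring pixels."""
    return max(abs(b - a) for a, b in zip(row, row[1:]))


source    = [52, 51, 50, 50, 50, 49]   # smooth background across the band edge
predicted = [0, 0, 0, 50, 50, 49]      # 'without prediction' inside the band (x < 3)
res = residual_row(source, predicted)  # [52, 51, 50, 0, 0, 0]
print(max_jump(source), max_jump(res)) # 1 50
```

A jump of 1 in the image becomes a jump of 50 in the residual. That jump is the line of strong gradients whose compression artifacts are left unmasked.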

Three embodiments of a method according to the present invention are now described. They are not limiting, and each corresponds to an example of filling occluded or out-of-frame pixels. The names given to the three embodiments are purely illustrative, and the embodiments should not be restricted to the strict sense of those names.

In general, these three embodiments, and other possible ways of filling occluded parts or occlusion bands, are used in a method according to the present invention for the video encoding and decoding of an image sequence. This method processes the occlusion bands in the reference images defined above and transmits a list of fill mode indicators for the pixels of the occlusion bands of the predicted image from encoding to decoding. During decoding, this list determines the preset fill mode used to fill each pixel of the occlusion bands of the predicted image.

Occlusion bands are understood to mean any part hidden by a moving object and any part lying outside the frame of the reference image.

As mentioned above, the occlusion bands are formed by pixels whose vector field does not match a valid position in the reference image F1, F2. The term "valid position" is used because the vector field can be fractional, so the corresponding position may not have integer coordinates.

Several fill modes may be stored in the decoder before decoding. During encoding, a list of fill mode indicators is issued, each indicator applying to one pixel of an occlusion band. The pixel or group of pixels of the occlusion band is then filled with the pre-established fill mode, already stored in the decoder, that the indicator selects from among the other fill modes.

This can be done as follows. During encoding, the encoder compares the efficiency of each fill mode available to it for a pixel or a group of pixels in an occlusion band; these modes have been stored beforehand. The list of fill mode indicators then tells the decoding which fill mode is most efficient for that pixel or group of pixels.
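The definition of an occlusion band can be illustrated on one image row with integer motion. Fractional vectors, which would require checking every interpolation neighbor, are left out. In this sketch, whose names are hypothetical, a foreground span moves by fg_shift and the background by bg_shift. A pixel covered by the moved foreground belongs to the occluding zone. A background pixel is in the occlusion band when its background vector points into the foreground of the reference row or outside the frame.

```python
def classify_row(width, fg_span, fg_shift, bg_shift=0):
    """Label each pixel of a predicted row: 'F' foreground, 'B' background with a
    valid match in the reference row, 'O' occlusion band (no valid match)."""
    start, end = fg_span                     # foreground occupies [start, end) in the reference
    labels = []
    for x in range(width):
        if start + fg_shift <= x < end + fg_shift:
            labels.append("F")               # the occluding zone takes precedence
            continue
        src = x - bg_shift                   # background vector mapped back to the reference
        if (0 <= src < width) and not (start <= src < end):
            labels.append("B")
        else:
            labels.append("O")               # hidden by the foreground or out of frame
    return "".join(labels)


print(classify_row(12, (6, 10), fg_shift=-3))           # BBBFFFFOOOBB
print(classify_row(6, (0, 0), fg_shift=0, bg_shift=2))  # OOBBBB
```

In the first case, the band is as wide as the relative displacement of the two zones (3 pixels). In the second, a background shift with no foreground produces an out-of-frame band.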
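The encoder-side comparison can be sketched as follows. The efficiency measure is assumed to be the sum of absolute residuals; a real encoder might use the actual bit cost or a rate-distortion criterion. Each candidate mode is a function that predicts the band pixels of one group, and the output is the list of fill mode indicators. Names and indicator values are hypothetical.

```python
def select_fill_modes(groups, modes):
    """Pick, for each group of band pixels, the indicator of the cheapest mode.

    groups: list of (positions, true_values) for groups of occlusion-band pixels.
    modes:  {indicator: function(positions) -> list of predicted values}.
    """
    indicators = []
    for positions, true_values in groups:
        costs = {}
        for indicator, predict in modes.items():
            predicted = predict(positions)
            costs[indicator] = sum(abs(t - p) for t, p in zip(true_values, predicted))
        indicators.append(min(costs, key=costs.get))   # first mode wins a tie
    return indicators


modes = {0: lambda pos: [0] * len(pos),      # e.g. "without prediction"
         1: lambda pos: [50] * len(pos)}     # e.g. a flat background estimate
groups = [([(1, 0), (2, 0)], [50, 52]), ([(3, 0)], [3])]
print(select_fill_modes(groups, modes))      # [1, 0]
```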

Advantageously, the list of fill mode indicators transmitted with the motion vectors can either be independent of the list of occlusion order indicators or be integrated into it. Each occlusion order indicator relates to a discontinuity between two continuity zones of a motion vector field, detected on the reference image during encoding.

These filling variants can of course be combined with one another for different occluded pixels or different groups of out-of-frame pixels of the same image. A default fill mode can also be defined, either for one image or for all the images of a video sequence. When no fill mode indicator is provided for a pixel of an occlusion band, the default fill mode is applied automatically.
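On the decoder side, the stored fill modes can be kept in a table keyed by indicator. A default mode is used whenever no indicator is transmitted for a pixel. In this sketch, the mode names and the placeholder fill functions are hypothetical stand-ins for real modes such as the omnidirectional and directional averages.

```python
def fill_band(band, indicators, fill_modes, default):
    """Fill every occlusion-band pixel with the mode named by its indicator,
    falling back to the default mode when no indicator was transmitted.

    band:       iterable of (x, y) pixel positions in the occlusion band.
    indicators: {(x, y): mode name} decoded from the list of fill mode indicators.
    fill_modes: {mode name: function((x, y)) -> value}, stored in the decoder beforehand.
    """
    return {p: fill_modes[indicators.get(p, default)](p) for p in band}


fill_modes = {"omnidirectional": lambda p: 1, "directional": lambda p: 2}
out = fill_band([(0, 0), (1, 0)], {(1, 0): "directional"}, fill_modes, "omnidirectional")
print(out)   # {(0, 0): 1, (1, 0): 2}
```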

In what follows, everything stated for an occluded pixel also applies to an out-of-frame pixel, since both belong to an occlusion band.

Figures 14 and 15 relate to a first example of filling according to the invention, called "single filling". For each pixel p in the occlusion band of the predicted image, this example first searches for the minimum distance between the occluded pixel p and a non-occluded pixel of the background zone.

The variable d1 therefore designates the minimum distance between the occluded pixel p and a non-occluded pixel of the background area.

This example uses a single fill mode, called "omnidirectional average". It applies when the motion vector field matches a given pixel of the predicted image to no valid position in the reference image, i.e. when the pixel belongs to an occlusion band. The preset "omnidirectional average" fill mode then gives the occluded pixel the average value of a selection of pixels of the reference image belonging to the occluded zone. The selection consists of the pixels of the reference image matched with the pixels of the predicted image closest to the given pixel.

In general, this average may be weighted by a weighting function that depends on the position of each pixel in the selection of pixels of the reference image belonging to the occluded zone.

For example, a weighted average can be calculated over the contents of the pixels q of the reference image whose corresponding pixel in the predicted image lies at a distance d2 from the occluded pixel p that does not exceed a certain multiple m of d1. The weight r is a function of the ratio between d1 and d2. This average is assigned to the occluded pixel p in the predicted image.

In this example, m = 1.7 and r = (d1/d2)² were used, which is not limiting.
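The first embodiment's computation can be sketched with the parameters given above (m = 1.7, r = (d1/d2)²). As an assumption of this sketch, the candidates q are supplied directly as non-occluded background pixels of the predicted image, together with the values of their matched reference pixels. The function name and data layout are ours. Setting power to 1 or 0 gives the alternative weightings r = d1/d2 and a constant weight.

```python
import math


def omnidirectional_average(p, background, m=1.7, power=2):
    """'Omnidirectional average' fill value for the occluded pixel p.

    background: {(x, y): value} for the non-occluded background pixels q of the
        predicted image, each value taken from its matched pixel in the reference.
    Only pixels with d2 <= m * d1 are used, weighted by r = (d1 / d2) ** power,
    where d1 is the distance from p to the nearest such pixel.
    """
    px, py = p
    dist = {q: math.hypot(q[0] - px, q[1] - py) for q in background}
    d1 = min(dist.values())
    if d1 == 0:
        raise ValueError("p must be an occluded pixel, not a background pixel")
    num = den = 0.0
    for q, d2 in dist.items():
        if d2 <= m * d1:
            r = (d1 / d2) ** power
            num += r * background[q]
            den += r
    return num / den


background = {(1, 0): 10, (0, 1): 20, (1, 1): 40, (3, 0): 100}
print(round(omnidirectional_average((0, 0), background), 6))   # 20.0
```

Here d1 = 1. The pixel at distance 3 is excluded because it lies beyond 1.7, and the diagonal pixel at distance √2 gets weight 0.5, giving (10 + 20 + 0.5·40) / 2.5 = 20.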

Still in the case of a reference image equivalent to the first image shown in Figure 1, the result is shown in Figure 14 which can be seen a significant improvement over the methods of the state of the art. The biggest defect, however, remains in the center of the image, where the omnidirectional mean is not very effective in the highly directional characteristic of the column, which has been referenced 2 in FIGS. 1 and 2 and forming an element of the background, this being specific to the images processed and not characteristic of this first example of filling implementation.

Moreover, unlike the first three prior-art methods, the luminance or chromatic values of the pixels in the residual image no longer overflow.

There are alternatives to the omnidirectional average. Whereas in the preceding example the weighting is inversely proportional to the square of the distance d2 between the pixels p and q, it is alternatively possible to take r = d1/d2, i.e. a weighting inversely proportional to the distance d2 between the pixels p and q.

Another alternative is a constant weighting, i.e. one that does not depend on the position of the pixel q.

It is also possible to use weightings that favor one direction over the others. For example, with v = pq the vector of coordinates (x, y), a weighting r = y·(d1/d2)² can be defined.
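The weighting alternatives above differ only in the weight function, so they can be sketched as interchangeable functions plugged into one averaging loop. This is an illustrative sketch with hypothetical names; (dx, dy) is the vector pq in image coordinates. For the vertical weighting, the sketch assumes |y| is used so that the weight stays non-negative for pixels above and below p, which the text does not state.

```python
import math


def weight_inverse_square(d1, d2, dx, dy):
    """Default weighting r = (d1 / d2)^2."""
    return (d1 / d2) ** 2


def weight_inverse(d1, d2, dx, dy):
    """Alternative r = d1 / d2: inversely proportional to d2."""
    return d1 / d2


def weight_constant(d1, d2, dx, dy):
    """Alternative: constant weighting, independent of the position of q."""
    return 1.0


def weight_vertical(d1, d2, dx, dy):
    """Alternative r = y * (d1 / d2)^2 favouring the vertical direction.

    Assumption: |y| is used so the weight is never negative.
    """
    return abs(dy) * (d1 / d2) ** 2


def weighted_average(d1, samples, weight):
    """samples: iterable of (dx, dy, value), where (dx, dy) is the vector pq.

    Returns None when every weight is zero (no pixel favoured by `weight`).
    """
    num = den = 0.0
    for dx, dy, v in samples:
        d2 = math.hypot(dx, dy)
        w = weight(d1, d2, dx, dy)
        num += w * v
        den += w
    return num / den if den else None
```

Passing a different weight function changes only how neighbours are weighted, which keeps the selection rule (d2 ≤ m·d1) separate from the weighting choice.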

This weighting favors the pixels q located in the vertical direction relative to the occluded pixel p. For the example images, this formula performs better than the omnidirectional average for the four central blocks, but less well than the directional average defined below.

FIGS. 16 and 17 show a second example of a filling mode according to the invention, referred to as "adaptive filling". This second example improves on the first by applying a different filling mode to the four blocks in the center of the image. In this so-called "directional average" filling mode, only the pixels in the direction of the column, referenced 2 in FIGS. 1 and 2 as an example of a stationary background object, are taken into account when calculating the average.

According to a preferred example, the list of fill mode indicators, whether or not it is contained in the list of occlusion indicators transmitted by the encoder to the decoder with the motion vector fields, specifies a directional average mode for filling the pixels of the occlusion band for some blocks (in the example of Figures 16 and 17, the four central blocks) and an omnidirectional average mode for the other blocks of the predicted image.

In the "directional average" filling mode, the selection of pixels of the reference image used to calculate the average predicting an occluded pixel p of the occlusion band of the predicted image is refined: only those pixels of the reference image are kept that are mapped to the pixels of the predicted image closest to the pixel p and that lie in a direction within an angular interval defined by the fill mode indicator.

In this filling mode, and in the specific case of the images shown, the pixels q taken into account in calculating the average for the pixel p are preferably those for which the direction from the occluded pixel p lies in the angular interval from 68° to 90° relative to the abscissa.
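The directional average can be sketched by adding an angular test to the omnidirectional selection. This is a hypothetical illustration: it treats the direction of pq as unoriented (the angle is folded into 0° to 90°), keeps the d2 ≤ m·d1 rule with d1 measured to the nearest known pixel in any direction, and uses the (d1/d2)² weight; none of these combinations is spelled out in the text.

```python
import math


def in_angular_interval(dx, dy, lo_deg=68.0, hi_deg=90.0):
    """True if the line through p and q makes an angle in [lo_deg, hi_deg]
    with the abscissa. The direction is unoriented, so the angle is folded
    into [0, 90] degrees; a tiny tolerance absorbs floating-point rounding."""
    angle = math.degrees(math.atan2(abs(dy), abs(dx)))
    return lo_deg - 1e-9 <= angle <= hi_deg + 1e-9


def directional_fill(values, known, p, m=1.7, lo_deg=68.0, hi_deg=90.0):
    """Directional average for the occluded pixel p = (row, col).

    values/known: 2-D lists as for the omnidirectional average.
    Returns None if no known pixel lies in the angular interval.
    """
    pr, pc = p
    cands = [(c - pc, r - pr, values[r][c])
             for r, row in enumerate(known)
             for c, is_known in enumerate(row) if is_known]
    if not cands:
        return None
    d1 = min(math.hypot(dx, dy) for dx, dy, _ in cands)  # nearest known pixel
    num = den = 0.0
    for dx, dy, v in cands:
        d2 = math.hypot(dx, dy)
        if d2 <= m * d1 and in_angular_interval(dx, dy, lo_deg, hi_deg):
            w = (d1 / d2) ** 2
            num += w * v
            den += w
    return num / den if den else None
```

In a 3x3 neighbourhood with known pixels above, below, left and right of p, only the vertical pair (angle 90°) contributes, so horizontal background detail does not blur a vertical structure such as the column.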

In an alternative implementation, the fill mode indicator tells the decoder to determine the angular interval itself, by a statistical analysis of the gradients.

During decoding, the fill direction for a central block is then detected by analyzing the gradients of the pixels belonging to the occluded (background) zone in the neighboring blocks, and taking as the fill direction the direction perpendicular to the dominant direction of these gradients.
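The text only speaks of a "statistical analysis of the gradients". One common way to obtain a dominant gradient orientation is the structure tensor (averaging gradients in a doubled-angle representation so that opposite gradients reinforce rather than cancel); the sketch below assumes that technique, central-difference gradients, and a single-channel luminance image, and returns the perpendicular as the fill direction. The function name is hypothetical.

```python
import math


def fill_direction(img, known):
    """Estimate the fill direction (degrees from the abscissa, in [0, 180))
    from the luminance gradients of known background pixels.

    Gradients are central differences, computed only where all four direct
    neighbours are known. Their dominant orientation comes from the structure
    tensor; the fill direction is perpendicular to it.
    Returns None when there is no usable gradient.
    """
    jxx = jyy = jxy = 0.0
    h, w = len(img), len(img[0])
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            nbrs = (known[r][c - 1], known[r][c + 1],
                    known[r - 1][c], known[r + 1][c])
            if not all(nbrs):
                continue
            gx = (img[r][c + 1] - img[r][c - 1]) / 2.0
            gy = (img[r + 1][c] - img[r - 1][c]) / 2.0
            jxx += gx * gx
            jyy += gy * gy
            jxy += gx * gy
    if jxx + jyy == 0:
        return None
    # Dominant gradient orientation (doubled-angle average), in degrees.
    theta = 0.5 * math.degrees(math.atan2(2 * jxy, jxx - jyy))
    return (theta + 90.0) % 180.0
```

For a background whose luminance varies only horizontally (a vertical column edge), the gradients are horizontal and the estimated fill direction is 90°, i.e. vertical.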

Thus, a weighting function may also depend on a statistical function of the gradients of the luminance values over a selection of pixels of the reference image belonging to the occluded zone, when these gradients as a whole have a dominant direction. The statistical function then advantageously gives a stronger weight to pixels positioned close to the direction perpendicular to the dominant direction of the gradients.
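A gradient-dependent weighting of this kind can be illustrated by multiplying the distance term by an angular term that peaks along the fill direction. The cos² factor below is a hypothetical choice made for this sketch, not a formula from the text; fill_deg is the direction perpendicular to the dominant gradient direction.

```python
import math


def gradient_weight(d1, dx, dy, fill_deg):
    """Weight of reference pixel q, with (dx, dy) the vector pq.

    Combines the distance term (d1 / d2)^2 with a hypothetical angular term
    cos^2(phi - fill_deg): 1 along the fill direction, 0 across it.
    cos^2 is unchanged by a 180-degree flip, so q on either side of p
    along the fill direction gets the same angular factor.
    """
    d2 = math.hypot(dx, dy)
    phi = math.degrees(math.atan2(dy, dx))
    angular = math.cos(math.radians(phi - fill_deg)) ** 2
    return (d1 / d2) ** 2 * angular
```

Unlike a hard angular interval, this weighting never discards a pixel outright; it only down-weights pixels as their direction moves away from the fill direction.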

In this example, and in the specific case of the images shown, only the gradients whose perpendicular line meets the central block are taken into account when detecting the dominant direction of the gradients.

In another implementation example, the list of fill mode indicators, whether or not integrated into the list of occlusion indicators transmitted by the encoder, specifies an automatic mode for filling the pixels of the occlusion band for all blocks of the image. In this mode, the decoder analyzes the gradients of the pixels belonging to the background or occluded zone in the neighboring blocks and switches to the directional fill mode if a dominant gradient direction is detected.
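The decision made in the automatic mode can be sketched with a coherence measure computed from the same structure-tensor sums: coherence is 1 when all gradients are parallel and 0 when they are isotropic. Both the coherence measure and the 0.8 threshold are assumptions of this sketch; the text only says the decoder switches when a dominant direction "is detected".

```python
import math


def choose_fill_mode(gradients, coherence_threshold=0.8):
    """Automatic mode: use directional filling only when the gradients of
    the neighbouring background pixels share a dominant direction.

    gradients: iterable of (gx, gy) luminance gradients.
    coherence_threshold: hypothetical tuning parameter.
    Returns ("directional", fill_deg) or ("omnidirectional", None).
    """
    jxx = jyy = jxy = 0.0
    for gx, gy in gradients:
        jxx += gx * gx
        jyy += gy * gy
        jxy += gx * gy
    trace = jxx + jyy
    if trace == 0:
        return ("omnidirectional", None)
    # Coherence in [0, 1]: 1 for parallel gradients, 0 for isotropic ones.
    coherence = math.hypot(jxx - jyy, 2 * jxy) / trace
    if coherence < coherence_threshold:
        return ("omnidirectional", None)
    theta = 0.5 * math.degrees(math.atan2(2 * jxy, jxx - jyy))
    return ("directional", (theta + 90.0) % 180.0)
```

Because the decision is taken from already-decoded pixels, the encoder only has to signal "automatic" once for the whole image rather than a mode per block.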

The selection of pixels of the reference image can be refined by keeping only the pixels of the reference image mapped to the pixels of the predicted image closest to the pixel p and lying in a direction within an angular interval defined by the fill mode indicator.

The embodiments described above have the advantage of optimizing the filling while requiring only minimal additional bandwidth for the list of fill mode indicators. The result of this filling example is illustrated in Figures 16 and 17; the improvement is substantial. This example has the further advantage of being less sensitive than the previous one to the accuracy with which the discontinuity contour is positioned.

The drawback of the two filling embodiments described above is that their complexity grows with the square of the width of the occlusion band. In addition, their effectiveness decreases as this width increases, because the averages are taken over increasingly wide areas.

Referring to Figures 18 and 19, a third filling example according to the present invention, called "successive filling", is shown. This third example, which can be combined with the two previous ones, differs from the second example in that the pixels of the occlusion band are divided into several groups and an order of precedence is established among the groups.

The band is filled group by group, starting with the group of highest precedence and continuing in decreasing order of precedence. Each time a group is filled, the correction given by the residual image is applied, and the result is used as reference values to calculate the predicted values of the pixels in the lower-precedence groups. Precedence is given to the groups whose pixels are closest to the known pixels of the continuity zone occluded by the other continuity zone, the latter being the occluding zone.
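The decoder-side loop of successive filling can be sketched as follows. To keep the sketch short, the prediction uses a simplified stand-in (the plain mean of the nearest known pixels) instead of the weighted averages described above; both function names are hypothetical. The point illustrated is the order of operations: predict a whole group, correct it with the residual, then use the corrected values as references for later groups.

```python
import math


def predict_from_nearest(values, p):
    """Simplified stand-in predictor: mean of the known pixels nearest to p."""
    dists = {q: math.dist(p, q) for q in values}
    d1 = min(dists.values())
    nearest = [values[q] for q, d in dists.items() if d == d1]
    return sum(nearest) / len(nearest)


def successive_fill(values, groups, residual):
    """values:   dict {(row, col): value} of known pixels (updated in place).
    groups:   lists of occluded pixels, in decreasing order of precedence.
    residual: dict {(row, col): decoded residual value}."""
    for group in groups:
        # Predict the whole group from the pixels known before this group...
        predicted = {p: predict_from_nearest(values, p) for p in group}
        # ...then correct it with the residual image; the corrected values
        # become reference values for the lower-precedence groups.
        for p, v in predicted.items():
            values[p] = v + residual.get(p, 0.0)
    return values
```

With a known pixel of value 10 and residuals 2 and 1 on the next two pixels, the first group decodes to 12 and the second is predicted from that corrected 12, giving 13.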

This example, however, carries a risk of error accumulation. Since the luminance or chromatic values of the residual image at certain pixels are used to deduce the luminance or chromatic values at other pixels, the noise introduced by compressing the residual image may, in some circumstances, propagate, accumulate and grow.

One way to avoid this risk is to form the pixel groups so that they respect the divisions of the residual image used by the compression process. The residual image is frequently divided into blocks during encoding. It is therefore advantageous to divide into groups the pixels of the predicted image that lie in an occluded or invalid zone of the reference image consistently with the block division used when compressing the residual image.

Thus, the noise introduced by compressing the residual image can be taken into account when calculating the averages for the pixels in the subsequent groups, which prevents the noise from propagating.

This is done in the present example by grouping the pixels of the occlusion band according to the 8x8 pixel square block of the image to which they belong. The blocks containing pixels of the occlusion band are ranked, in decreasing order of precedence, as follows:

(24,16), (8,0), (24,24), (16,8), (0,0), (16,16), (8,8), (16,24), (8,16).

In addition, because the reference pixels are now closer, only blocks (16,8) and (16,16) are set to directional mode.
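Grouping the occluded pixels by 8x8 block and ranking the groups can be sketched as below. Assumptions: pixel and block coordinates are written (x, y) with block origins at multiples of 8 (the text lists pairs such as (24,16) without stating the axis order), and precedence is computed as the minimum Euclidean distance from a block's occluded pixels to the known pixels of the occluded zone, which is one reading of the precedence rule stated above. An encoder could equally transmit a fixed order like the one listed.

```python
import math
from collections import defaultdict

BLOCK = 8  # block size used by the residual-image compression (8x8 here)


def group_by_block(occluded, known, block=BLOCK):
    """Group occluded pixels (x, y) by the block containing them, aligned
    with the residual-image blocks, and order the groups by precedence:
    blocks whose pixels are closest to the known pixels come first.

    Returns a list of (block_origin, pixels) with block_origin = (x0, y0).
    """
    groups = defaultdict(list)
    for x, y in occluded:
        groups[(x // block * block, y // block * block)].append((x, y))

    def distance_to_known(pixels):
        return min(math.dist(p, k) for p in pixels for k in known)

    return sorted(groups.items(), key=lambda item: distance_to_known(item[1]))
```

Aligning groups with the compression blocks means each group's residual correction is complete, including its compression noise, before any later group uses it as a reference.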

Figures 18 and 19 show the result of this filling example; the improvement over the implementation example of Figures 14 and 15 is noticeable.

Another solution is to apply the fill mode recursively, in several passes, during encoding. At each pass, the luminance or color values of the pixels, including the compression noise of the residual image from the previous pass, are used as the reference for calculating the averages.

In summary, among the examples shown in FIGS. 14 to 19, in the so-called single-fill example all the occluded pixels are filled with a single fill mode: in the weighted average, the weighting depends only on the distance d2 between the pixels p and q, since by definition d1 depends only on the occluded pixel p and not on the pixel q.

In the so-called adaptive-fill example, the occluded pixels in the four central blocks of the image are filled with a directional fill mode, in which the average is taken only over the reference pixels lying in a given direction. Adaptive filling is therefore an example of combining two filling modes within the same occlusion band.

In this example, the filling mode can change from one block to another according to a list of indicators transmitted by the encoder to the decoder: an indicator specifies the omnidirectional mode for each 8x8 block not located at the center of the image, and an indicator specifies the directional mode, with or without an explicit direction, for each of the four central 8x8 blocks.

The filling modes from which one is selected during encoding may also include known modes, such as filling by direct extension of the motion vector field, by inverse extension, or without prediction. There can thus be at least five predetermined filling modes, the choice between them at decoding being made according to the list of fill mode indicators produced by the encoder.
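A per-block list of fill mode indicators can be sketched as below. The five mode names follow the text, but their numeric codes, the raster ordering of blocks, the (x0, y0) block-origin convention and the function names are hypothetical choices for this illustration. The example corresponds to a 32x32 image with 8x8 blocks in which the four central blocks are filled directionally.

```python
from enum import IntEnum


class FillMode(IntEnum):
    """Candidate fill modes named in the text; numeric codes are hypothetical."""
    OMNIDIRECTIONAL = 0
    DIRECTIONAL = 1
    DIRECT_EXTENSION = 2   # direct extension of the motion vector field
    INVERSE_EXTENSION = 3  # inverse extension
    NO_PREDICTION = 4


def build_indicator_list(width, height, block, directional_blocks):
    """Encoder side: one indicator per block, in raster order.
    directional_blocks: set of block origins (x0, y0) to fill directionally."""
    return [
        FillMode.DIRECTIONAL if (x0, y0) in directional_blocks
        else FillMode.OMNIDIRECTIONAL
        for y0 in range(0, height, block)
        for x0 in range(0, width, block)
    ]


def mode_for_block(indicators, width, block, origin):
    """Decoder side: read back the indicator of the block at origin (x0, y0)."""
    x0, y0 = origin
    return indicators[(y0 // block) * (width // block) + x0 // block]
```

With sixteen blocks and a handful of modes, the list costs only a few bits per block, which is consistent with the minimal bandwidth overhead claimed for the indicator list.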

Several indicators can be used to compare quantitatively the efficiency of the methods and of the pixel filling modes described above. The first indicator, d, is the root mean square difference, pixel by pixel and color by color, between the predicted image and the actual image.

Next comes the indicator D, the root mean square, pixel by pixel and color by color, of the difference between each pixel and its direct horizontal and vertical neighbors in the residual image. This indicator measures the amount of information contained in an image better than the previous one, so it correlates more closely with the size of the compressed residual image.
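Both indicators can be sketched for a single channel as follows (for color images, apply them per channel and pool the squared values). Images are assumed to be 2-D lists of numbers; the function names are hypothetical, and each adjacent pair is counted once.

```python
import math


def rms_difference(a, b):
    """Indicator d: RMS of the pixel-by-pixel difference between two images."""
    sq = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return math.sqrt(sum(sq) / len(sq))


def neighbour_rms(residual):
    """Indicator D: RMS of the differences between each pixel and its direct
    right and lower neighbours in the residual image (each pair counted once)."""
    h, w = len(residual), len(residual[0])
    sq = []
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                sq.append((residual[r][c + 1] - residual[r][c]) ** 2)
            if r + 1 < h:
                sq.append((residual[r + 1][c] - residual[r][c]) ** 2)
    return math.sqrt(sum(sq) / len(sq))
```

Applied to a residual image and its JPEG-decoded version, rms_difference also gives the square root of the MSE used for the "noise" figure.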

In addition, an indicator a of artifact creation can be used. It measures the excess of local variations in the residual image compared with the actual image, and thus the tendency of residual-image compression to create visible artifacts in the composite image.

More concretely, for a pair of pixels p1 and p2 that are directly adjacent, vertically or horizontally, let v1 be the difference between the contents of p1 and p2 in the actual image, and v2 the difference between the contents of p1 and p2 in the residual image.

It is defined that:

a1 = (v2 - v1 + 2) / (v1 + 2) if v2 > v1, and

a1 = 0 if v2 < v1,

and the indicator a is the root mean square of the a1 values over all pairs of directly adjacent pixels.
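The artifact indicator can be sketched as follows for a single channel. Two points are assumptions of this sketch: v1 and v2 are taken as absolute differences, and the case v2 = v1, which the definition does not cover, is given a1 = 0. The function name is hypothetical.

```python
import math


def artifact_indicator(real, residual):
    """Indicator a: RMS of a1 over all horizontally/vertically adjacent pairs.

    v1, v2: absolute differences of the pair in the real and residual images
    (assumption). a1 = (v2 - v1 + 2) / (v1 + 2) if v2 > v1, else 0
    (the v2 == v1 case is an assumption).
    """
    h, w = len(real), len(real[0])
    sq = []
    for r in range(h):
        for c in range(w):
            for rr, cc in ((r, c + 1), (r + 1, c)):  # right and lower neighbour
                if rr < h and cc < w:
                    v1 = abs(real[r][c] - real[rr][cc])
                    v2 = abs(residual[r][c] - residual[rr][cc])
                    a1 = (v2 - v1 + 2) / (v1 + 2) if v2 > v1 else 0.0
                    sq.append(a1 * a1)
    return math.sqrt(sum(sq) / len(sq))
```

A flat real image paired with a residual jump of 4 gives a1 = (4 - 0 + 2) / (0 + 2) = 3, whereas a residual that is smoother than the real image contributes 0.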

The following table summarizes the values of each indicator for the methods and fill-mode variants described, adding as a further indicator the size of the residual image after JPEG compression at a quality setting of 80%.

The images are too small for this standard to achieve meaningful compression. The residual image is therefore repeated 32 times in each direction to create an image of 1024x1024 pixels. This larger image is then compressed to JPEG, and its size is divided by 1024 to obtain the displayed value. The JPEG compression tool used (ImageMagick® version 6.6.9) does not exploit the redundancy of the repetition with a 32-pixel period, so these values reflect the general situation.
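The measurement arithmetic can be sketched as follows: a 32x32 residual image tiled 32 times in each direction gives 1024x1024 pixels, i.e. 32 × 32 = 1024 copies, so the compressed size is divided by 1024. The JPEG compression itself (ImageMagick 6.6.9 in the text) is an external step and is not reproduced; the function names are hypothetical.

```python
def tile(image, reps=32):
    """Repeat a 2-D image `reps` times in each direction (32x32 -> 1024x1024).
    Rows are shared references, which is fine for read-only use."""
    return [row * reps for row in image] * reps


def size_per_copy(compressed_bytes, reps=32):
    """Normalise the compressed size of the tiled image to a single copy."""
    return compressed_bytes / (reps * reps)
```

A general-purpose compressor such as zlib would detect the 32-pixel repetition and distort this normalisation, which is why the choice of a JPEG encoder that ignores it matters.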

The "noise" column gives the difference introduced by JPEG compression as a root mean square, pixel by pixel and color by color, i.e. the square root of the mean squared error (MSE), which is one of the most common measures of the difference between two images.

For prior-art methods 1 to 3, the values in parentheses correspond to the case where overflows of pixel values in the residual image are suppressed by clipping the differences to the limits of the luminance or chromatic values of the pixels. The true efficiency of these methods depends on how overflows are handled and lies between the value before the parentheses and the value in parentheses. The advantage of the method according to the invention over the prior-art methods, with any of the three filling examples (numbered 5 to 7), is even greater than the difference shown by the JPEG file sizes.

The methods according to the invention, with their filling-mode examples, give a much smaller artifact-creation value, which allows the residual image to be compressed at a higher rate and thus the file size to be reduced further. In addition, compared with the first prior-art method, numbered 1, they offer the advantage of temporal interpolation.

Indeed, to raise the noise level from 4.4, obtained with the third filling example (successive filling, numbered 7 in the table), to the 5.3 of prior-art method 4, the JPEG quality must be reduced to 50% with the same compression software, which reduces the file size to 55.

While the effect of such a reduction in compression quality on the visual quality of the image may be debatable, JPEG compression at 75% for the third filling example (numbered 7), which produces a file size of 91 at a noise level of 4.8, gives visual quality clearly superior to that of methods 1 to 4 kept at 80%.

Claims


1. A method for encoding and decoding video in a sequence of images, in which, from a reference image (F1, F2) and a motion vector field (M1, M2), a predicted image (P2, P3) and a residual image (R2, R3) are calculated during encoding and decoding, the residual images (R2, R3) and the motion vector fields (M1, M2) being transmitted to the decoding to obtain display images (A2, A3) after decoding, the method comprising, during encoding, a step of generating a list of occlusion order indicators, the list of occlusion order indicators being transmitted to the decoding with the motion vector fields (M1, M2), the predicted image (P2, P3) containing occlusion bands formed by pixels whose motion vector field (M1, M2) does not match a valid position in the reference image (F1, F2), characterized in that a list of indicators of the filling mode of the pixels of the occlusion bands of the predicted image (P2, P3) is transmitted from the encoding to the decoding, according to which list, during decoding, a preset filling mode is followed to fill each pixel in said occlusion bands of the predicted image (P2, P3).

2. The method of claim 1, wherein the list of fill mode indicators is included in the list of occlusion order indicators, or the list of fill mode indicators is independent of the occlusion order list and is transmitted from encoding to decoding separately from it.

3. The method according to any one of the preceding claims, in which two continuity zones of the motion vector field are defined in the reference image (F1, F2), comprising an occluded zone and an occluding zone separated by a discontinuity curve, the occluding zone at least partially covering the occluded zone, and in which, in the predicted image (P2, P3), the occlusion band created by the discontinuity between the two zones is integrated into the occluded zone, wherein the preset filling mode for a given pixel of the occlusion band assigns it the average of a selection of pixels belonging to the occluded zone of the reference image (F1, F2), this selection consisting of pixels of the occluded zone of the reference image (F1, F2) matched with the pixels of the predicted image (P2, P3) closest to the given pixel.

4. The method of claim 3, wherein said average is weighted by a weighting function dependent on the positioning of each pixel in said selection of pixels of the reference image (F1, F2).

5. The method according to claim 4, wherein said weighting function is a function of the gradients of the luminance values of the pixels over the selection of pixels of the reference image (F1, F2) and, when said gradients as a whole have a dominant direction, said function gives a stronger weight to the pixels positioned close to the direction perpendicular to the dominant direction of the gradients.

6. Method according to any one of claims 4 to 5, in which the selection of pixels of the reference image (F1, F2) is refined by taking into account only the pixels of the reference image (F1, F2) matched to the pixels of the predicted image (P2, P3) closest to the pixel and lying in a direction within an angular interval defined by the fill mode indicator.

7. The method according to any one of claims 3 to 6, wherein the pixels of an occlusion band of the predicted image (P2, P3) are divided into a plurality of groups and an order of precedence is assigned to these groups, such that, during decoding, the pixel values of the predicted image (P2, P3) are filled group by group according to their order of precedence, and the values of the pixels in an earlier group, corrected by the content of the residual image (R2, R3), are used as reference values to calculate the averages from which the predicted value of a pixel in a later group is derived.

8. The method of claim 7, wherein, when the residual image (R2, R3) is divided into blocks during its compression, the division into groups of the pixels of an occlusion band of the predicted image (P2, P3) is consistent with the block division used during compression of the residual image (R2, R3).

9. The method according to claim 3, wherein, during decoding, for a discontinuity curve separating an occluding zone and an occluded zone, the pixels of the discontinuity curve are assigned to the occluding zone.

10. A video encoding and decoding system for implementing a method according to any one of the preceding claims, comprising:

means for encoding video images comprising means for detecting a discontinuity between two continuity zones of a motion vector field (M1, M2) on the reference image (F1, F2) and means for generating a list of occlusion order indicators and a list of fill mode indicators,

decoding means for obtaining each display image (A1 to A3), the decoding means comprising means for processing according to the list of occlusion order indicators and the list of fill mode indicators, as well as means for selectively implementing these filling modes, which fill the pixels of the occlusion bands of the predicted images (P2, P3),