GB2497812A - Motion estimation with motion vector predictor list - Google Patents

Motion estimation with motion vector predictor list

Info

Publication number
GB2497812A
Authority
GB
United Kingdom
Prior art keywords
motion
list
block
current
motion vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1122194.2A
Other versions
GB201122194D0 (en)
GB2497812B (en)
Inventor
Fabrice Le Leannec
Xavier Henocq
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Priority to GB201122194A
Publication of GB201122194D0
Publication of GB2497812A
Application granted
Publication of GB2497812B
Expired - Fee Related


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/533 Motion estimation using multistep search, e.g. 2D-log search or one-at-a-time search [OTS]
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H04N19/56 Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/523 Motion estimation or motion compensation with sub-pixel accuracy
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/63 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A means of estimating motion in successive pictures of a video stream, as part of a compression process such as scalable video coding (SVC) or the Joint Scalable Video Model (JSVM), is described. A motion vector representing motion of a current block of pixels is determined and used to update a list of motion search starting blocks of a reference picture, the list being used in determining the motion vector for the next block to be coded. The aim of the invention is to handle motion rupture caused by a discontinuity in the motion field, due for example to sudden changes in motion at the boundaries of foreground regions; at such times the median-value motion vector from previously coded neighbouring blocks is not a good predictor of the starting position for the motion vector search. Moreover, when the motion estimation later re-enters such a region from the motion point of view, the motion estimation process would otherwise have forgotten any information previously gained about that region. In such a case, the median motion vector for the current macroblock is added to the list of motion search starting points used for motion estimation of subsequent macroblocks. The list of such points may be indexed and ordered, and a pruning mechanism may be applied to prevent the list becoming too long.

Description

MOTION ESTIMATION WITH MOTION VECTOR PREDICTOR LIST
The present invention relates to video coding and decoding. In particular, the present invention relates to motion estimation and video compression based on motion estimation.
The SVC ("Scalable Video Coding") video compression standard extends the H.264/AVC video compression standard.
When coding long groups of pictures (referred to as "GOPs"), typically containing 8 or 16 pictures, SVC encoders with fast mode decision and fast motion estimation have low rate-distortion performance compared to the SVC reference software (referred to as "JSVM"). In particular, they perform poorly at low bitrates.
SVC encoders may employ a fast motion estimation process during the motion-compensated temporal prediction performed when encoding the P and B pictures of the GOPs.
This fast motion estimation provides a good coding speed. However, it has low compression efficiency.
Document US20040247029 discloses such a fast motion estimation algorithm.
Document US20080172384 discloses a fast motion estimation algorithm dedicated to Multi-View video coding. According to the algorithm, a point in the block of pixels to predict is determined and then an "epipolar" line based on this point is determined. A starting point for the motion search is then determined with this epipolar line and a determination of the best starting point is computed for each macroblock.
Other techniques consist in searching, for each block of pixels to encode, for a reference block in a reference picture that best matches the block to predict. A restricted search is performed around two candidate starting points, namely the co-located block of the current block to predict (with a zero motion vector associated) and the block pointed to by the motion vector predictor of the current block to predict. In the H.264/AVC video compression standard, the motion vector predictor is computed as the median value (respectively in x (abscissa) and y (ordinate) coordinates) of the motion vector components of three neighboring, already encoded, blocks of the current picture to code. The restricted search around these two starting points consists in three successive steps: a restricted integer-pel motion search around the block considered, then half- and quarter-pel motion refinements around the best integer position found.
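For illustration, a minimal sketch of this median predictor computation (the per-component median of three neighbouring motion vectors; the function name and (x, y) tuple representation are assumptions of this sketch, not taken from the standard text):

    def median_mv_predictor(mv_a, mv_b, mv_c):
        """H.264/AVC-style median motion vector predictor.

        mv_a, mv_b and mv_c are the (x, y) motion vectors of three
        already-encoded neighbouring blocks (typically left, top and
        top-right); the median is taken independently per component.
        """
        def median3(a, b, c):
            return sorted((a, b, c))[1]

        return (median3(mv_a[0], mv_b[0], mv_c[0]),
                median3(mv_a[1], mv_b[1], mv_c[1]))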
In order to improve the rate distortion performances of this motion estimation process in the case of hierarchical B pictures, an adaptive motion search area around the zero motion vector can be used for a pre-defined subset of macroblocks.
The size of the motion search area extension may be determined as a function of the temporal distance between the predicted picture and the reference picture. However, the motion search can only be extended for a limited set of macroblocks for complexity reasons. The motion search may thus fail to find a good motion vector for macroblocks where a discontinuity of the motion field appears and for which the median motion vector is not a good motion vector predictor.
This may happen if a motion field discontinuity appears in some macroblocks where the motion search is not extended.
Thus, there is a need for improving motion estimation methods, in particular for pictures containing discontinuous motion fields, without excessively increasing complexity.
According to a first aspect of the invention there is provided a method of estimating motion in successive pictures of a video stream comprising the following steps performed for a current block of pixels of a current picture of the video stream: -determining a current motion vector representing motion of the current block of pixels, and -updating, based on the current motion vector determined, a list of motion search starting blocks of a reference picture used for determining a next motion vector representing motion of a next block.
Embodiments of the invention provide a good trade-off for the motion estimation by a video encoder (for example an H.264/SVC encoder), between motion estimation speed and compression efficiency. They provide a good trade-off, in particular, in the case of the hierarchical B pictures temporal coding structure.
Embodiments of the invention are compatible with the motion vector competition (MVComp) scheme.
Embodiments of the invention make it possible to: -maintain the speed of the restricted three-step motion search, -manage discontinuities in the motion field when several foreground and background objects appear in the video scene, and/or -increase the coding efficiency of motion information with the adaptation of the MVComp motion vector coding allowed by the present fast motion estimation.
A starting block is fully specified by its spatial position relative to the spatial position of the current block being predicted in the current picture.
According to embodiments, the current motion vector is determined based on at least one search starting block of the list.
Hence, it is made possible to perform a restricted fast motion search around more than two starting points. Also, these points may be different from the so-called "co-located" starting point and the so-called "median" starting point.
The starting points considered in motion estimation may be dynamically updated as successive blocks are encoded.
For example, the determination of the current motion vector comprises the following steps: -for each starting block in the list, performing a restricted motion search around the starting block and determining a possible motion vector based on a result of the restricted motion search, and -selecting one of the said possible motion vectors determined for each starting block in the list based on a selection criterion, the possible motion vector selected being the determined current motion vector.
The method may further comprise associating each possible motion vector respectively determined for each search starting block in the list with a cost value on which the selection criterion is based.
Hence, the selection of the motion vector is fast and accurate.
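A minimal sketch of this candidate loop, assuming a restricted_search callable that runs the restricted search around one starting block and returns a (motion vector, cost) pair, e.g. with an SAD cost (all names here are illustrative, not from the patent):

    def estimate_motion(current_block, starting_blocks, restricted_search):
        """Try every starting block in the list and keep the cheapest result.

        restricted_search(current_block, start) is assumed to perform the
        restricted (e.g. three-step) search around one starting block and
        to return a (motion_vector, cost) pair.
        """
        best_mv, best_start, best_cost = None, None, float("inf")
        for start in starting_blocks:
            mv, cost = restricted_search(current_block, start)
            if cost < best_cost:
                best_mv, best_start, best_cost = mv, start, cost
        return best_mv, best_start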
Updating the list may comprise inserting a block pointed to in the reference picture by the current motion vector in replacement of a current search starting block in the list used for determining the current motion vector.
For example, the block pointed to in the reference picture by the current motion vector is inserted in replacement of a current search starting block when the motion vector selected is determined based on a restricted motion search around a search starting block which is not: -a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block, or -a block of pixels in the reference picture having same coordinates in the reference picture as the current block in the current picture (i.e. collocated block).
Hence, the length of the list may be controlled so that it does not require too much memory to store it. Also, keeping a reasonable length for the list enables keeping the processing speed at a satisfactory level.
Updating the list may comprise inserting a block pointed to in the reference picture by the current motion vector.
Updating the list may comprise inserting a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
For example, the block pointed to in the reference picture is inserted in the list when the current motion vector is determined based on a block of pixels in the reference picture having same coordinates in the reference picture as the current block in the current picture.
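Representing each starting block by its displacement relative to the current block (as permitted by the earlier remark that a starting block is fully specified by its relative spatial position), the update rules just described can be sketched as follows; the exact bookkeeping is an assumption of this sketch:

    def update_starting_list(slist, best_start_mv, best_mv, median_mv):
        """Sketch of the list update policy described above.

        Starting blocks are stored as displacements relative to the
        current block, so the co-located block is (0, 0) and the block
        pointed to by the median predictor is median_mv.
        """
        if best_start_mv == (0, 0):
            # The co-located start won: the median predictor failed here,
            # so remember it for later macroblocks in a similar region.
            if median_mv not in slist:
                slist.append(median_mv)
        elif best_start_mv != median_mv:
            # An extra list entry won: replace it by the block pointed to
            # by the motion vector actually found.
            slist[slist.index(best_start_mv)] = best_mv
        return slist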
According to embodiments, the method further comprises a step of ordering the list of motion search starting blocks.
Hence, the selection of the search starting block may be facilitated since the relevant search starting block is at the top of the list.
For example, the motion search starting blocks are ordered according to respective frequencies of use for determining motion vectors.
Hence, the most used search starting blocks are easily available since they are more likely to be used.
The method may further comprise updating the frequencies of use of the motion search starting blocks.
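A possible implementation of this frequency-based ordering, with a plain counter standing in for whatever bookkeeping the encoder actually uses (an assumption of this sketch):

    from collections import Counter

    def reorder_by_frequency(slist, use_counts: Counter):
        """Place the most frequently used starting points first.

        use_counts[s] counts how often a search around starting point s
        produced the selected motion vector.
        """
        return sorted(slist, key=lambda s: use_counts[s], reverse=True)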
According to embodiments, the motion search starting blocks are ordered according to a spatial location of the current block for which they have respectively been inserted in the list.
Hence, when a specific region is to be considered again, the relevant search starting blocks for this region are easily available without performing the selection for this region again.
According to embodiments, the method may further comprise, for a current block of pixels of a next picture of the video stream, determining a current motion vector representing motion of the current block of pixels in the pictures of the video stream based on at least one search starting block of the updated list. Hence, the list may be reused from picture to picture, thereby optimizing processing resources.
The current and the next pictures may be comprised in a same temporal level.
The current and next pictures may be comprised in a same scalability layer.
Updating the list may comprise removing at least one search starting block.
At least one search starting block may be removed in order to keep a length of the list under a length limit.
For example, at least one search starting block having a lowest frequency of use for determining motion vectors is removed.
According to embodiments, the at least one search starting block removed is different from at least one of: -a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block, and -a block of pixels in the reference picture having same coordinates in the reference picture as the current block in the current picture.
Thus, these blocks may be kept in the list in any case, even in case they have the lowest frequency of use.
For example, in case at least one of these blocks has the lowest frequency of use, the block in the list with the lowest frequency and different from these two blocks is removed.
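A sketch of such a pruning step, protecting the two mandatory starting points (again using the relative-displacement representation; names are hypothetical):

    def prune_list(slist, use_counts, median_mv, max_len):
        """Keep the list under max_len entries, evicting the least-used
        starting points but never the co-located (0, 0) or median ones."""
        protected = {(0, 0), median_mv}
        while len(slist) > max_len:
            removable = [s for s in slist if s not in protected]
            if not removable:
                break
            slist.remove(min(removable, key=lambda s: use_counts[s]))
        return slist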
The list may be associated with the reference picture.
For example, the list is associated with a reference picture index representing the reference picture.
Hence, the list may be available for each picture using the reference picture.
Motion may be estimated using a plurality of reference pictures and a list of search starting blocks may be associated with each reference picture.
According to a second aspect of the invention there is provided a method of coding a video stream comprising the following steps: -estimating motion in successive pictures of the video stream according to the first aspect, -coding a first syntax element indicating a search starting block of the list used for determining the current motion vector, and -coding the current motion vector as a difference between said current motion vector and the motion vector pointing to the search starting block of the list used for determining the current motion vector.
The syntax element may enable a decoder to rebuild the same list of search starting points as the coder for decoding the video stream.
For example, the first syntax element comprises a position in the list.
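A sketch of this second-aspect signalling; put_ue and put_se stand in for unsigned and signed entropy-coding primitives and are assumptions of this sketch, not actual H.264/SVC API calls:

    def encode_motion_info(bitstream, slist, best_start_mv, best_mv):
        """Code the list index (first syntax element) and the motion
        vector as a difference from the chosen starting point."""
        bitstream.put_ue(slist.index(best_start_mv))  # position in the list
        bitstream.put_se(best_mv[0] - best_start_mv[0])
        bitstream.put_se(best_mv[1] - best_start_mv[1])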
The method may further comprise coding a second syntax element containing an indication to a decoder receiving the coded video stream to insert a search starting block in a list of motion search starting blocks of the decoder.
The second syntax element may contain an indication to the decoder to insert in the list of motion search starting blocks of the decoder a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
The method may further comprise coding a third syntax element that contains an indication to a decoder receiving the coded video stream to remove a search starting block from the list of motion search blocks of the decoder.
The method may further comprise coding a fourth syntax element that contains an indication to a decoder receiving the coded video stream to reorder the list of motion search blocks of the decoder.
The fourth syntax element may contain an indication to the decoder to reorder the list of motion search blocks of the decoder in the same order as the list of motion search blocks updated during the updating step of the motion estimation according to the first aspect.
According to a third aspect of the invention there is provided a method of decoding a video stream comprising the following steps performed for a current block of pixels of a current picture of the video stream: -decoding a current motion vector representing motion of the current block of pixels coded by a coder as a difference between said current motion vector and a current motion vector predictor, by adding to said difference a motion vector pointing to a motion starting block from a list of motion search starting blocks of a reference picture, -determining a search starting block used by said coder for determining the current motion vector, and -updating, based on said search starting block determined, the list of motion search starting blocks used for decoding a next motion vector representing motion of a next block.
According to embodiments, for example when the search starting blocks are represented by relative positions with respect to the current block, the determination step is part of the decoding step.
The method may further comprise decoding a first syntax element indicating said search starting block determined.
The first syntax element may comprise a position in the list.
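The decoder-side counterpart of the earlier encoding sketch, mirroring it with hypothetical get_ue/get_se primitives:

    def decode_motion_info(bitstream, slist):
        """Read the list index, rebuild the predictor from the decoder's
        own list, and add the decoded difference to it."""
        start_mv = slist[bitstream.get_ue()]   # first syntax element
        dx = bitstream.get_se()
        dy = bitstream.get_se()
        return (start_mv[0] + dx, start_mv[1] + dy), start_mv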
The method may further comprise inserting a search starting block in the list of motion search starting blocks.
The method may further comprise decoding a second syntax element containing an indication to insert said search starting block in the list of motion search starting blocks.
The method may further comprise inserting in the list of motion search starting blocks a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
The second syntax element may contain an indication to insert in the list of motion search starting blocks the block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
The method may further comprise removing a search starting block from the list of motion search blocks.
The method may further comprise decoding a third syntax element that contains an indication to remove the search starting block from the list of motion search blocks.
The method may further comprise reordering the list of motion search blocks.
The method may further comprise decoding a fourth syntax element that contains an indication to reorder the list of motion search blocks.
The method may further comprise reordering the list of motion search blocks in the same order as a list of motion search blocks of the coder.
The fourth syntax element may contain an indication to reorder the list of motion search blocks in the same order as a list of motion search blocks of the coder.
According to a fourth aspect of the invention there is provided a motion estimation device for estimating motion in successive pictures of a video stream comprising: -a control unit configured to determine a current motion vector representing motion of a current block of pixels of a current picture of the video stream, and to update, based on the current motion vector determined, a list of motion search starting blocks of a reference picture used for determining a next motion vector representing motion of a next block.
The motion estimation device may further comprise a memory unit for storing said list.
The current motion vector may be determined based on at least one search starting block of the list.
For determining the current motion vector, the control unit may be further configured to: -perform, for each starting block in the list, a restricted motion search around the starting block and determine a possible motion vector based on a result of the restricted motion search, and -select one of the said possible motion vectors determined for each starting block in the list based on a selection criterion, the possible motion vector selected being the determined current motion vector.
The control unit may be further configured to associate each possible motion vector respectively determined for each search starting block in the list with a cost value on which the selection criterion is based.
For updating the list, the control unit may be further configured to insert a block pointed to in the reference picture by the current motion vector in replacement of a current search starting block in the list used for determining the current motion vector.
The block pointed to in the reference picture by the current motion vector may be inserted in replacement of the current search starting block when the motion vector selected is determined based on a restricted motion search around a search starting block which is not: -a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block, or -a block of pixels in the reference picture having same coordinates in the reference picture as the current block in the current picture.
For updating the list, the control unit may be further configured to insert a block pointed to in the reference picture by the current motion vector.
For updating the list, the control unit may be further configured to insert a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
The control unit may be further configured to insert the block pointed to in the reference picture in the list when the current motion vector is determined based on a block of pixels in the reference picture having same coordinates in the reference picture as the current block in the current picture.
The control unit may be further configured to order the list of motion search starting blocks.
The control unit may be further configured to order the motion search starting blocks according to respective frequencies of use for determining motion vectors.
The control unit may be further configured to update the frequencies of use of the motion search starting blocks.
The control unit may be further configured to order the motion search starting blocks according to a spatial location of the current block for which they have respectively been inserted in the list.
For a current block of pixels of a next picture of the video stream, the control unit may be further configured to determine a current motion vector representing motion of the current block of pixels in the pictures of the video stream based on at least one search starting block of the updated list.
The current and the next pictures may be comprised in a same temporal level.
The current and next pictures may be comprised in a same scalability layer.
Updating the list may comprise removing at least one search starting block.
At least one search starting block may be removed in order to keep a length of the list under a length limit.
At least one search starting block having a lowest frequency of use for determining motion vectors may be removed.
For example, the at least one search starting block removed is different from at least one of: -a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block, and -a block of pixels in the reference picture having same coordinates in the reference picture as the current block in the current picture.
The list may be associated with the reference picture.
Motion may be estimated using a plurality of reference pictures and a list of search starting blocks may be associated with each reference picture.
According to a fifth aspect of the invention there is provided a coder for coding a video stream comprising: -a motion estimation device according to the fourth aspect for estimating motion in successive pictures of the video stream, and -a control unit configured to code a first syntax element indicating a search starting block of the list used for determining the current motion vector, and to code the current motion vector as a difference between said current motion vector and the motion vector pointing to the search starting block of the list used for determining the current motion vector.
The first syntax element may comprise a position in the list.
The control unit may be further configured to code a second syntax element that contains an indication to a decoder receiving the coded video stream to insert a search starting block in a list of motion search starting blocks of the decoder.
The second syntax element may contain an indication to the decoder to insert in the list of motion search starting blocks of the decoder a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
The control unit may be further configured to code a third syntax element that contains an indication to a decoder receiving the coded video stream to remove a search starting block from the list of motion search blocks of the decoder. The control unit may be further configured to code a fourth syntax element that contains an indication to a decoder receiving the coded video stream to reorder the list of motion search blocks of the decoder.
The fourth syntax element may contain an indication to the decoder to reorder the list of motion search blocks of the decoder in the same order as the list of motion search blocks updated by the control unit of the motion estimation device.
According to a sixth aspect of the invention there is provided a decoder for decoding a video stream comprising a control unit configured, for a current block of pixels of a current picture of the video stream, to: -decode a current motion vector representing motion of the current block of pixels coded by a coder as a difference between said current motion vector and a current motion vector predictor, by adding to said difference a motion vector pointing to a motion starting block from a list of motion search starting blocks of a reference picture, -determine a search starting block used by said coder for determining the current motion vector, and -update, based on said search starting block determined, the list of motion search starting blocks used for decoding a next motion vector representing motion of a next block.
The control unit may be further configured to decode a first syntax element indicating said search starting block determined.
The first syntax element may comprise a position in the list.
The control unit may be further configured to insert a search starting block in the list of motion search starting blocks.
The control unit may be further configured to decode a second syntax element containing an indication to insert said search starting block in the list of motion search starting blocks.
The control unit may be further configured to insert in the list of motion search starting blocks a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
The second syntax element may contain an indication to insert in the list of motion search starting blocks the block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
The control unit may be further configured to remove a search starting block from the list of motion search blocks.
The control unit may be further configured to decode a third syntax element that contains an indication to remove the search starting block from the list of motion search blocks.
The control unit may be further configured to reorder the list of motion search blocks.
The control unit may be further configured to decode a fourth syntax element that contains an indication to reorder the list of motion search blocks.
The control unit may be further configured to reorder the list of motion search blocks in the same order as a list of motion search blocks of the coder.
The fourth syntax element may contain an indication to reorder the list of motion search blocks in the same order as a list of motion search blocks of the coder.
According to a seventh aspect of the invention there is provided a system comprising a coder according to the fifth aspect and a decoder according to the sixth aspect.
According to an eighth and a ninth aspect of the invention, there are provided computer programs and computer program products comprising instructions for implementing methods according to the first, second and/or third aspect(s) of the invention, when loaded and executed on computer means of a programmable apparatus such as a motion estimation device, a coder and/or a decoder.
According to an embodiment, information storage means readable by a computer or a microprocessor store instructions of a computer program that make it possible to implement a method according to the first, second and/or third aspect(s) of the invention.
The objects according to the second, third, fourth, fifth, sixth, seventh, eighth and ninth aspects of the invention provide at least the same advantages as those provided by the method according to the first aspect of the invention.
Other features and advantages of the invention will become apparent from the following description of non-limiting exemplary embodiments, with reference to the appended drawings, in which: -Figure 1 is a schematic illustration of a device according to embodiments; -Figure 2 is a block diagram schematically representing a non-scalable H.264/AVC encoder; -Figure 3 illustrates a block diagram of an SVC encoder; -Figure 4 illustrates an SVC temporal coding structure; -Figure 5 is a flowchart of an initial coding mode selection algorithm to select macroblock coding modes; -Figures 6a, 6b and 6c illustrate a motion estimation process used in a fast H.264/SVC encoder; -Figure 7 shows coding modes obtained with initial fast motion estimation; -Figure 8 shows motion vectors obtained with exhaustive and fast motion estimation processes; -Figure 9 illustrates a motion vector prediction list management; -Figure 10 is a flowchart illustrating a global algorithm for encoding a video sequence; -Figure 11 is a flowchart illustrating an algorithm for searching the best INTER coding mode for a current macroblock to encode; -Figure 12 is a flowchart illustrating an algorithm used to perform motion estimation for a particular block to temporally predict based on a motion vector prediction list management; and -Figure 13 is a flowchart illustrating an algorithm for updating the list of motion search starting points.
The inventors have observed that when several objects in a same scene each have their own specific motion, fast motion estimation according to the prior art may fail to find the motion vectors (referred to as MVs) associated with each object, especially near the boundaries of the objects.
They have also observed that when the coding process considers (or "enters") a new region in the pictures, it does not use (or "forgets") information concerning motion of the objects in the previously considered region. Therefore, if the "forgotten" region appears again in the picture (e.g. in subsequent lines of macroblocks), the coding process cannot use information concerning motion that was already processed and associated with that region.
Each time the best MV found results from a search around the "median starting point", it can be interpreted as meaning that motion remains substantially constant from macroblock to macroblock. Prior art motion estimation (referred to as ME) strategies may be used in such case.
However, if the best MV found results from another starting point, this can be interpreted as a "rupture" in motion in the scene. In this case, for subsequent macroblocks to encode, the median starting point for the motion search will have a different value from the one before the motion rupture. Prior art motion estimation strategies thus make the SVC encoders "forget" the median predictor used before the motion rupture.
To overcome the above drawbacks and make the fast motion estimation find better motion vectors for images showing motion ruptures, a list of motion search starting points is generated and updated as a function of the motion vectors successively found during the processing of a current and preceding (if any) pictures.
Figure 1 is a schematic illustration of a device 100 according to embodiments. The device may be a motion estimation device, a coder and/or a decoder. The device comprises a RAM (Random Access Memory) unit 102 for storing processing data used for computations for implementing a method according to embodiments. The device may also comprise a ROM (Read Only Memory) unit 103 for storing a computer program according to an embodiment.
The device further comprises a control unit 701. The control unit may comprise a processor configured for implementing a method according to an embodiment, for example by executing instructions of a computer program according to embodiments. The computer program may be loaded from the ROM unit 103 or a hard-disc 106. The device further comprises a communication interface 104 for receiving an input video stream to process (motion estimation (ME), coding, and/or decoding). For example, the communication interface allows the connection of the device to a network for receiving the input video stream. Alternatively, the communication interface allows a direct connection to a device providing the input video stream. The connections to the network or the device may be wireless or cable connections.
The device may further comprise a user interface 105 for displaying information to a user and/or receiving inputs from the user.
Figure 2 is a block diagram schematically representing a non-scalable H.264/AVC encoder 200.
An original sequence 201 to compress is input to the encoder. The encoder successively performs the following steps to encode an H.264/AVC compliant bitstream.
A current picture to compress is divided into 16x16 pixel macroblocks by a splitting module 202.
Each macroblock first undergoes a motion estimation operation by a motion estimation module 203, which searches, among the reference pictures stored in a dedicated memory buffer 204, for reference blocks that would provide a good prediction of the current macroblock. This motion estimation step provides one or two reference picture indexes that contain the found reference blocks, as well as the corresponding motion vectors.
A motion compensation module 205 then applies the estimated motion vectors on the found reference blocks and copies the so-obtained blocks into a temporal prediction picture.
An Intra prediction module 206 determines the spatial prediction mode that provides the best performance to predict the current macroblock and encode it in INTRA mode.
Next, a coding mode selection module 207 selects the coding mode, among the spatial and temporal predictions, that provides the best rate distortion trade-off in the coding of the current macroblock. The difference between the current macroblock (in its original version) and the so-selected prediction macroblock is calculated, which provides the (temporal or spatial) residual to compress.
The residual macroblock then undergoes a transform (DCT) and a quantization in a transform and quantize module 208.
Entropy coding of the so quantized coefficients is performed by an entropy coding module 209 which outputs the compressed texture data associated with the coded current macroblock.
The entropy coding module is given the coding mode and, in case of an inter macroblock, the motion data, as well as the quantized DCT coefficients previously calculated. This entropy coder encodes each of these data into their binary form and encapsulates the so-encoded macroblock into a container called a NAL unit (Network Abstraction Layer, not represented). A NAL unit contains all encoded macroblocks from a given slice. An encoded H.264/AVC bitstream consists in a series of NAL units.
Finally, the current macroblock is reconstructed through an inverse quantization and inverse transform (module 210) and a sum between the inverse transformed residual and the prediction macroblock of the current macroblock.
Once the current picture is reconstructed, it undergoes a deblocking filtering process (module 211), which aims at reducing the blocking artifacts inherent to any block-based video coding system. Then it is stored in the memory buffer (the DPB, Decoded Picture Buffer) so that it can be used as a reference picture to predict next pictures to encode.
Figure 3 illustrates a block diagram of an SVC encoder 300. It is made of two or more stages 301 and 302.
The first stage 301 aims at encoding the H.264/AVC compliant base layer of the output SVC stream. Stage 301 is identical to the H.264/AVC encoder of Figure 2. It receives the original sequence to compress downsampled by a downsampling unit 305.
The second stage 302 codes an SVC enhancement layer on top of the base layer. This enhancement layer refines the spatial resolution of the base layer. As shown in Figure 3, the coding scheme of this enhancement layer is similar to that of the base layer, except that for each macroblock of a current picture being compressed, an additional prediction mode can be selected by the coding mode selection module 303 of stage 302.
The additional coding mode corresponds to the inter-layer prediction of SVC (SVC ILP). It is output by an upsampling unit 304 which upsamples output signals from the motion estimation module 203, the coding mode selection module 207 and the memory buffer 204 of the first stage 301. The output of the upsampling module is also fed to the motion estimation module 306 and the Intra prediction module 307 of the second stage 302.
Inter-layer prediction consists in re-using the data coded in a layer lower than the current refinement layer as prediction data of the current macroblock. The lower layer used is called the reference layer for the inter-layer prediction of the current enhancement layer. In case the reference layer contains a picture that temporally coincides with the current picture, then it is called the base picture of the current picture. The co-located macroblock (which has a same spatial position) of the current macroblock that has been coded in the reference layer can be used as a reference to predict the current macroblock. More precisely, the prediction data that can be used in the co-located macroblock corresponds to the coding mode, the macroblock partition, the motion data (if present) and the texture data (temporal residual or reconstructed INTRA macroblock). In case of a spatial enhancement layer, some up-sampling operations of the texture and motion prediction data are performed.
With reference to Figure 4, the SVC temporal coding structure is discussed.
In the SVC scalable video compression standard, the temporal scalability feature is provided by the temporal coding structure of the hierarchical B pictures. Figure 4 depicts the SVC GOP coding structure employed to code SVC scalability layers. An SVC GOP corresponds to a picture interval in the sequence, delimited by two anchor (key) pictures with type I (INTRA) or P (forward predicted). These pictures are assigned a temporal level equal to 0. In other words, the temporal level 0 represents the lowest temporal layer of the illustrated video stream. A GOP contains so-called hierarchical B pictures. Hierarchical B pictures provide the temporal scalability feature in SVC.
They are noted Bi (i ≥ 1), where i is an integer representing the temporal level of picture Bi. A picture with type B can be temporally predicted from its surrounding I or P pictures and also from the pictures Bj (j < i) located in the same GOP. In particular, B1 pictures can only be predicted with reference to their surrounding anchor pictures with type I or P. As a matter of fact, pictures from the highest temporal layer could be discarded from the stream, which would lead to a sub-stream which would be decodable, with a frame rate divided by two compared to the initial bitstream.
Figure 5 is a flowchart of an initial coding mode selection algorithm to select macroblock coding modes in the considered SVC encoder implementation.
The inputs for the algorithm comprise the original macroblock to encode, the reconstructed neighboring INTRA macroblocks, the neighboring INTER macroblocks information (motion data, residual reconstructed texture) and the reference picture(s) used to temporally predict the current picture being encoded. The goal of the algorithm is to choose a coding mode for the current macroblock to encode. It may be called by both the H.264/AVC encoder of Figure 2 and the SVC encoder of Figure 3.
The algorithm is initialized during a first step 500. Next, during step 501, it is tested whether the current macroblock is contained in an INTRA slice.
If so (yes), the best INTRA coding is searched for current macroblock during step 502. Then INTRA coding mode is selected during step 508.
Otherwise (no), in case of a P or B slice, a fast SKIP coding mode decision is involved during step 503.
This step comprises deriving the reference macroblock of current macroblock according to the SKIP mode. This derivation method uses the direct mode prediction process, as specified in the H.264/AVC standard. Then, the residual texture associated to the direct mode is calculated by subtracting the found reference macroblock from the original macroblock. This residual is transformed and quantized.
If the quantization leads to all zero coefficients (yes), then the SKIP mode is adopted for current macroblock during step 504. The algorithm then goes to the encoding step 510.
If the SKIP mode condition is not fulfilled (no), then the encoder performs an INTRA coding mode selection process.
The best INTRA coding mode is searched during step 505 for the current macroblock. In particular, this includes the determination of the best spatial prediction and the best partition of current macroblock in INTRA mode.
The result of this step comprises the BestIntraMode variable, which identifies the best found INTRA coding mode, and the associated cost BestIntraCost, which takes the form of an SAD (Sum of Absolute Differences) or an SATD (Sum of Absolute Transform Differences).
The cost function may combine the coding distortion D and the coding rate R according to the following formula: J = D + λR, wherein λ is a so-called Lagrange parameter.
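By way of a small illustrative sketch of these two cost components (the function names are ours, not the encoder's):

    def sad(block_a, block_b):
        """Sum of Absolute Differences between two equally sized blocks,
        given as 2-D lists of pixel values."""
        return sum(abs(a - b)
                   for row_a, row_b in zip(block_a, block_b)
                   for a, b in zip(row_a, row_b))

    def rd_cost(distortion, rate, lam):
        """Lagrangian rate-distortion cost J = D + lambda * R."""
        return distortion + lam * rate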
Next, the best INTER coding mode is searched for during step 506 for the current macroblock. This includes a forward motion estimation process in case of a P slice. In case of a B slice, the forward motion estimation process is followed by backward and then bi-directional motion estimation steps (not represented). For each temporal direction, the macroblock partition that leads to the best temporal predictor is also determined. The temporal prediction mode that gives the minimum SAD or SATD is selected as the best INTER coding mode. The result of this temporal prediction process is stored in the form of the best inter coding mode (variable BestInterMode) and the cost (SAD or SATD) associated to it (variable BestInterCost).
The next step 507 comprises selecting the best coding mode between the best found INTRA coding mode and the best found INTER coding mode. To do so, the coding mode that gives the lowest cost is selected.
If the INTRA coding mode has a cost lower than the cost of the INTER coding mode (yes), the algorithm goes to step 508 during which the INTRA coding mode is selected.
Otherwise (no), the algorithm goes to step 509 during which the INTER coding mode is selected.
Next, the current macroblock is encoded with the coding mode selected during step 510.
If it is tested during step 511 that the current macroblock is the last in the current slice (yes), the algorithm is terminated during step 512. Otherwise (no), the next macroblock is processed (step 513) and the algorithm goes back to step 501.
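Condensing the decision flow of Figure 5 into a sketch (the four callables are hypothetical stand-ins for the corresponding encoder stages, not the patent's actual interfaces):

    def select_coding_mode(mb, in_intra_slice, skip_all_zero,
                           best_intra, best_inter):
        """Fast mode decision: INTRA slices take INTRA, the SKIP test
        short-circuits everything else, and otherwise the lowest-cost
        mode among INTRA and INTER wins. best_intra/best_inter return
        (mode, cost) pairs."""
        if in_intra_slice(mb):
            return best_intra(mb)[0]
        if skip_all_zero(mb):  # direct-mode residual quantizes to all zeros
            return "SKIP"
        intra_mode, intra_cost = best_intra(mb)
        inter_mode, inter_cost = best_inter(mb)
        return intra_mode if intra_cost < inter_cost else inter_mode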
One advantage of the fast coding mode selection of Figure 5 is its speed. Moreover, the motion estimation stage that is invoked during the best INTER coding mode search in this algorithm is also performed in a fast way, through the use of a restricted motion search area, as described in what follows.
Figures 6a and 6b illustrate the motion estimation process used in the fast H.264/SVC encoder described hereabove.
This motion estimation employs the same strategy as that of the underlying H.264 encoder. As shown in Figure 6a, it consists in using two starting points for the motion search. The first starting point 600 corresponds to the co-located macroblock of the current macroblock 601 to predict in the reference picture 602. The co-located macroblock has the same position in the reference picture as the current macroblock in the current picture 607. The second starting point 603 corresponds to the reference macroblock that is pointed to by the motion vector predictor of the current macroblock. Indeed, in case the current macroblock has available neighboring macroblocks 604, 605, 606, then its motion vector, if Inter coding mode is selected, is coded using predictive coding.
The motion vector predictive coding consists in coding a motion refinement of the so-called motion vector prediction. The motion vector prediction of a macroblock is derived by calculating the median values of the neighboring blocks' motion vector components.
A restricted four-step motion search is then performed around these two starting points, as illustrated in Figure 6b. Letters 'A' to 'I' represent integer-pixel (integer-pel) positions, numbers '1' to '8' represent half-pixel (half-pel) positions and letters 'a' to 'h' correspond to quarter-pixel (quarter-pel) positions. The restricted motion search around each starting point consists in first evaluating the 'A' to 'I' integer-pixel positions as candidate integer-pixel motion vectors. Then the best motion vector issued from these 9 evaluations, i.e. the one which provides the lowest SAD (Sum of Absolute Differences between original and predicted macroblocks), undergoes a further half-pixel motion refinement step.
This comprises determining the best motion vector among the found integer position and the '1' to '8' half-pel positions around it. A last quarter-pel motion refinement is applied around the best half-pel position found. It selects, among the found half-pel position and the quarter-pel positions around it (labeled 'a' to 'h' in Figure 6b), the motion vector leading to the minimum SAD. Finally, the motion search that leads to the best motion vector among the two initial starting points is selected to temporally predict the current original macroblock. The motion search in the fast encoder may be restricted in order to ensure a good encoding speed.
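A sketch of this coarse-to-fine refinement; cost_at(mv) is assumed to return the SAD of the reference block displaced by mv (sub-pel positions implying interpolated reference samples), and the displacement representation is ours:

    def restricted_search(cost_at, start_mv):
        """Restricted search of Figures 6a/6b: 3x3 integer-pel
        neighbourhood of the starting point, then half-pel and
        quarter-pel refinement around the running best position."""
        def best_around(center, step):
            candidates = [(center[0] + i * step, center[1] + j * step)
                          for i in (-1, 0, 1) for j in (-1, 0, 1)]
            return min(candidates, key=cost_at)

        best = best_around(start_mv, 1)   # integer-pel positions 'A'-'I'
        best = best_around(best, 0.5)     # half-pel positions '1'-'8'
        best = best_around(best, 0.25)    # quarter-pel positions 'a'-'h'
        return best, cost_at(best)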
Generally, the motion estimation and coding mode selection processes are those that take the longest time during the overall coding. In order to improve the motion search strategies described with reference to Figures 6a and 6b, the motion search may be extended to pictures that are "far" from their reference picture(s) in terms of temporal distance. Such an approach is illustrated in Figure 6c, which shows a modified version of the initial four-step motion search presented with reference to Figure 6b.
The goal of the motion estimation illustrated in Figure 6c is to be able to find high-amplitude motion vectors when relevant, while keeping the complexity of the motion estimation process reasonable. It comprises selecting a motion search area as a function of the temporal level of the picture to code. This may take the form of an increase of the motion search area for some macroblocks of low temporal level pictures. This motion search extension is determined as a function of the total GOP size and the temporal level of the current picture to encode. Hence, it increases according to the temporal distance between the picture to predict and its reference picture(s). Moreover, in pictures where the motion search area is extended, as illustrated in Figure 6c, the extension is only applied to a subset of the macroblocks (in bold) in the picture for complexity reasons. For the other macroblocks (squares not in bold in Figure 6c), the initial motion estimation process described with reference to Figure 6b is employed. This combined method, where the motion search is extended for a subset of macroblocks, allows finding a reasonable trade-off between motion estimation accuracy and limited complexity increase.
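One plausible rule for such an extension, sketched under the assumption that the temporal distance in the hierarchical-B structure of Figure 4 is gop_size / 2^temporal_level; the exact scaling is not given in the text and is an assumption here:

    def extended_search_range(base_range, gop_size, temporal_level):
        """Grow the motion search range with the temporal distance
        between the predicted picture and its reference picture
        (hypothetical formula, for illustration only)."""
        temporal_distance = max(1, gop_size >> temporal_level)
        return base_range * temporal_distance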
In the top-left macroblocks of the picture represented in Figure 6c, the proposed extended motion search is systematically employed, since the motion vectors of these macroblocks are used afterwards to derive the motion vector predictors for subsequent macroblocks in the picture.
Due to complexity issues, the motion search cannot systematically be extended for all macroblocks. Therefore, it may be difficult to find a good motion vector for macroblocks where a discontinuity of the motion field appears, and for which the median motion vector is not a good motion vector predictor.
This typically happens when motion field discontinuities appear in some macroblocks where the motion search is not extended or is not sufficiently extended.
In what follows, there is described a fast motion estimation process that enables fast motion estimation in pictures containing discontinuities in their motion field with a limited complexity increase.
Before that, limitations of existing motion estimation methods are presented.
Figure 7 shows coding modes obtained with initial fast motion estimation. The coding modes chosen by the SVC reference software (referred to as "JSVM") are represented on the left and those selected by the fast coding process described hereabove are represented on the right side.
At the bottom of each picture, a time axis with coded pictures is illustrated. The hierarchical B pictures are illustrated, and the relative size of each compressed picture is represented by a rectangle above each picture symbol (grey rectangle for the first picture, which is an IDR (Instantaneous Decoder Refresh) picture; white rectangle for the last picture, which is a P picture at the end of the GOP; black rectangles for the remaining pictures, which are B pictures). Moreover, the two pictures also illustrate the macroblock coding modes that have been employed in the P picture at the end of the represented GOP, both for the JSVM and fast encoders. The grey macroblocks represent INTRA macroblocks, the black macroblocks are Inter-coded macroblocks, and the white ones correspond to skipped macroblocks. It can be noticed that a higher number of macroblocks is encoded in INTRA mode in the case of the fast coding algorithm. In particular, the latter coding process tends to select more INTRA macroblocks in areas where the motion field significantly changes from macroblock to macroblock. The illustrated sequence precisely contains foreground and background areas, each with its own overall motion field. The uncovered area that appears between these two regions is naturally INTRA coded by both coding processes. However, it clearly appears that the fast motion estimation process has more difficulty finding the right motion vectors for macroblocks located around this uncovered area, which leads to more INTRA coded macroblocks than in the JSVM case.
Figure 8 shows the motion vectors obtained with exhaustive and fast motion estimation processes. Figure 8 emphasizes the observation made with reference to Figure 7. It illustrates the motion vectors determined by the JSVM (left side) and fast coder (right side). First, the two main foreground and background regions clearly appear on the pictures, each with its own motion.
Moreover, it appears that near the transitions between these two main regions, the fast coder fails to find good motion vectors for some macroblocks whereas the exhaustive motion search succeeds. This is explained in particular by the fact that an extended motion search is employed by the fast coder only for a few macroblocks in the picture (see Figure 6c). As a consequence, it seems necessary to find a motion estimation strategy that better handles drastic changes in the motion contained in the scene, while keeping the motion search algorithm as fast as possible.
Figure 9 illustrates the motion vector prediction list management.
For each macroblock to process, the initial motion search (described with reference to Figure 6a) performs a restricted motion search around two starting positions.
One of the two starting positions is the reference block pointed to by the motion vector predictor available for the current macroblock. This predictor is computed as the median value of the motion vectors of three neighboring blocks already coded. It is referred to as the median motion vector and is noted median in what follows.
The other starting position is the co-located macroblock of the current macroblock in the considered reference picture. This motion vector has zero coordinates and is noted (0,0) in what follows.
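By way of illustration, the component-wise median underlying the first of these starting positions may be sketched as follows (a minimal Python sketch; the function name and the use of the left, top and top-right neighbours as inputs are illustrative assumptions, and the codec's exact neighbour availability rules are not reproduced here):

```python
def median_predictor(mv_a, mv_b, mv_c):
    """Component-wise median of three neighbouring motion vectors
    (assumed here to come from the left, top and top-right blocks)."""
    med = lambda x, y, z: sorted((x, y, z))[1]
    return (med(mv_a[0], mv_b[0], mv_c[0]),
            med(mv_a[1], mv_b[1], mv_c[1]))

# Example: a single outlying neighbour does not drag the predictor away.
assert median_predictor((2, 0), (3, 1), (40, 0)) == (3, 0)
```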
The inventors have observed that when the motion vector selected by the motion search results from a search around the (0,0) starting point, the median motion vector has failed to predict a good motion vector for the current macroblock, which may mean that a discontinuity exists in the scene content at the location of the current macroblock. Therefore, the current macroblock may belong to a new picture region from the motion field view point.
Moreover, when, later during the picture coding process, the motion estimation "re-enters" an old region from the motion field view point, the motion estimation process has "forgotten" everything about the motion contained in that old region.
As a consequence, in such case, the list of motion search starting points that will be considered by the motion search for subsequent macroblocks in the picture may advantageously be enriched.
Practically, when the (0,0) starting point provides the best motion vectors, the median motion vector for the current macroblock is added to the list of motion search starting points that will be used by the motion estimation process for subsequent macroblocks.
Therefore, a list of motion search starting points is created and is updated from macroblock to macroblock during the picture coding process.
As a result, the list of starting points used for the restricted motion search of Figure 6a is longer than in the prior art approach, according to which only two starting points are considered.
One advantage of the present approach is that even if a median motion vector no longer serves as a good starting point for the motion search, it remains stored in memory in the proposed list of motion search starting points.
Therefore, if the motion search enters a region in which motion is quite similar to another region formerly processed in the current picture, then a motion vector typically representative of the motion contained in that region is available in the list and is known by the motion search process. As a result, this known motion field can be easily exploited as a relevant starting point by the fast, restricted, motion search of Figure 6a.
In addition to the starting point list progressive generation process, means for managing this dynamic list of motion search starting points are provided.
A list indexing and ordering may be performed. For example, the list is continuously ordered as a function of the frequency at which motion search starting points are used. The starting points that are used the most often are placed at the beginning of the list. Thus, a list re-ordering process is periodically invoked so as to keep the most relevant starting points at the top of the list.
A list pruning mechanism may be implemented. For example, a starting point deletion process is employed so as to keep the length of the list reasonable (for example in view of the memory and processing means available). In practice, when the insertion of a new starting point makes the list become too long, the element at the bottom of the list, which is the least often used, is discarded from the list.
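By way of illustration, such a dynamic list with frequency-based ordering and pruning might be sketched as follows (Python; the class name, data layout and length limit are illustrative assumptions, not mandated by the present description):

```python
class StartingPointList:
    """Dynamic list of motion search starting points, kept ordered by
    frequency of use and pruned to a maximum length."""

    def __init__(self, max_len=8):
        self.max_len = max_len
        # (vector, use_count) pairs; a median placeholder and the zero
        # vector are always present, mirroring the two initial entries.
        self.entries = [("median", 0), ((0, 0), 0)]

    def record_use(self, vector):
        """Increment the use count of a starting point, then keep the
        most frequently used starting points at the top of the list."""
        for i, (v, count) in enumerate(self.entries):
            if v == vector:
                self.entries[i] = (v, count + 1)
                break
        self.entries.sort(key=lambda e: e[1], reverse=True)

    def insert(self, vector):
        """Add a new starting point; if the list becomes too long,
        discard the least often used removable entry, never the
        median placeholder or the zero vector."""
        if any(v == vector for v, _ in self.entries):
            return
        self.entries.append((vector, 0))
        if len(self.entries) > self.max_len:
            removable = [e for e in self.entries
                         if e[0] not in ("median", (0, 0))]
            self.entries.remove(min(removable, key=lambda e: e[1]))
```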
In what follows, exemplary embodiments are presented.
Figure 10 is a flowchart illustrating a global algorithm for encoding a video sequence. During a first step 1000, the original video sequence to encode is received along with the coding parameters (e.g. the number of B pictures in each GOP as illustrated in Figure 4). The algorithm described with reference to Figure 10 corresponds to the encoding of an SVC bitstream that provides temporal scalability (hierarchical B pictures), and contains one scalability layer.
Next, a loop comprising steps 1000, 1001 and 1002 is performed in order to successively load original pictures and store them into a dedicated buffer (array m_gopframe in Figure 10). This loading and storing process is repeated until a number of pictures equal to the target GOP size is contained in the buffer m_gopframe[].
The next part of the algorithm consists in a loop performed on pictures of the current GOP to encode which are contained in the buffer m_gopframe[].
A picture index "currPic" is initialized during a step 1003. The index is to take values from 0 to (GOPSize -1), where GOPSize represents the size of the current GOP (number of pictures).
Next, pictures are successively processed in coding order. Therefore, the Picture Order Count (POC) value, called currPic, associated to the next picture to encode is obtained during step 1004. This POC value is determined as a function of the temporal coding structure employed to encode the GOP, i.e. the hierarchical B pictures coding structure illustrated in Figure 4. The determination of the POC value is not detailed here since the skilled person is familiar with such determination.
The picture m_gopframe[currPic], i.e. the picture located at position currPic in the original pictures buffer, is encoded. To do so, a loop on the slices to code in this picture is performed.
During a step 1005 a slice index "currSlice" is initialized. Next, for each slice in current picture, the slice header is encoded during a step 1006 and then the slice data, i.e. the macroblocks contained in the slice, is encoded during a step 1007 by invoking an algorithm as described with reference to Figure 5.
The loop on slices is over when all slices in current picture have been encoded. A step 1008 is performed for testing the current value of index currSlice.
If the current slice is not the last one in the picture (no), the index currSlice is incremented during step 1009 and the algorithm goes back to step 1006.
If the current slice is the last one in the picture (yes), it is tested during step 1010 whether the last picture to encode in the current GOP has been processed.
If this is not the case (no), the current picture index currPic is incremented during step 1011 and the algorithm returns to the POC value obtaining step 1004.
Alternatively (yes), if all pictures in current GOP have been processed, it is tested during step 1012 whether the end of the video sequence has been reached.
If this is not the case (no), the next GOP is processed during step 1013 and the algorithm goes back to the original picture loading step 1000.
If the end of the video sequence has been reached (yes), the algorithm is terminated during step 1014.
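For illustration, the control flow of Figure 10 may be sketched as follows (Python; coding_order and encode_slice are hypothetical helpers standing in for steps 1004 and 1006-1007, and a picture is modelled simply as a list of slices):

```python
def coding_order(gop_len):
    # Placeholder for step 1004: the real coding order follows the
    # hierarchical B structure of Figure 4; display order is used
    # here only to keep the sketch self-contained.
    return range(gop_len)

def encode_sequence(pictures, gop_size, encode_slice):
    """Skeleton of the Figure 10 control flow: fill the m_gopframe
    buffer, then encode each picture of the GOP in coding order,
    slice by slice."""
    src = iter(pictures)
    while True:
        # Steps 1000-1002: load up to gop_size pictures into the buffer.
        m_gopframe = [pic for _, pic in zip(range(gop_size), src)]
        if not m_gopframe:
            break                                      # step 1014
        for curr_pic in coding_order(len(m_gopframe)):  # steps 1003-1004
            for curr_slice in m_gopframe[curr_pic]:     # steps 1005-1011
                encode_slice(curr_slice)                # steps 1006-1007

# Example usage with two GOPs of at most two pictures each:
encode_sequence([["sliceA"], ["sliceB1", "sliceB2"], ["sliceC"]],
                gop_size=2, encode_slice=print)
```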
Figure 11 is a flowchart illustrating an algorithm for searching the best INTER coding mode for a current macroblock to encode. This algorithm is called by the algorithm of Figure 5 during the macroblock coding mode selection process. The inputs comprise the original macroblock being encoded, and the reference pictures useful to temporally predict it.
The algorithm of Figure 11 successively considers all the possible macroblock and sub-macroblock partitions in current macroblock, and performs a motion estimation process for each block contained in these macroblock and sub-macroblock partitions.
To do so, it first successively considers the macroblock partition types 16 x 16, 16 x 8 and 8 x 16. For each of these macroblock partition types, represented by the mbPartType variable, it successively processes the macroblock partitions in the current macroblock whose size corresponds to the current macroblock partition type. These partitions are represented by the variable mbPartIdx in Figure 11. Each macroblock partition of the current type undergoes an integer-pel motion estimation process which is described in what follows with reference to Figure 12. The result of this motion estimation consists in the motion vector associated to the current macroblock partition, the reference picture index, and the cost (e.g. the SAD, Sum of Absolute Differences) associated to this motion vector.
During a first step 1100, the loop on macroblock partition types is initialized. The variable "mbPartType" identifying the macroblock partition type receives the value "16x16".
Next, for the macroblock partitions having the current type, a loop is initialized during step 1101 and the index mbPartIdx receives the value "0".
An integer-pel motion estimation is then performed for the macroblock partition having the current type and the current index during step 1102. The algorithm described with reference to Figure 12 may be used.
Next, it is tested whether the current index corresponds to the last macroblock partition in the current macroblock during step 1103.
If this is not the case (no), the index mbPartIdx is incremented during step 1104 and the algorithm goes back to step 1102.
Alternatively (yes), once all macroblock partitions for a given macroblock partition type have been processed, it is tested, during step 1105, whether all macroblock partition types have been treated.
If this is not the case (no), another macroblock partition type is considered and the variable mbPartType takes the next value (16x8 or 8x16) during step 1106. The algorithm then goes back to step 1101.
Alternatively (yes), when the loop on macroblock partition types is over (i.e. all the macroblock partition types have been considered), a loop on the 8 x 8 macroblock partitions contained in the current macroblock is performed.
For each 8 x 8 macroblock partition, all possible sub-macroblock partition types (8 x 8, 8 x 4, 4 x 8 and 4 x 4) are successively considered. For each sub-macroblock partition type subMbPartType, a loop on the sub-macroblock partitions whose size corresponds to subMbPartType is performed. For each sub-macroblock partition with index subMbPartIdx, an integer-pel motion estimation is performed. To do so, the algorithm of Figure 12 is invoked. This integer-pel motion estimation operation results in a motion vector for the current sub-macroblock partition subMbPartIdx, as well as a reference picture index and an associated cost (in the form of a SAD).
The loop on 8x8 macroblock partitions is initialized during step 1107 by setting index mbPartIdx to zero.
Next, during step 1108, the loop on sub-macroblock partition types is initialized by setting a variable "subMbPartType", representing the current type of the sub-macroblock partition, to "8x8".
After that, during step 1109, the loop on sub-macroblock partitions with the current type is initialized by setting the index value subMbPartIdx to zero.
The integer-pel motion estimation on the current sub-macroblock partition corresponding to the current subMbPartIdx value is performed during step 1110.
It is then tested during step 1111 whether all sub-macroblock partitions in the current macroblock partition have been processed.
If this is not the case (no), the index subMbPartIdx is incremented during step 1112 and the algorithm goes back to step 1110.
If all sub-macroblock partitions have been considered (yes), it is then tested during step 1113 whether all sub-macroblock partition types have been processed.
If this is not the case (no), the variable subMbPartType is assigned the next sub-macroblock partition type in the set {8 x 8; 8 x 4; 4 x 8; 4 x 4} during step 1114 and the algorithm goes back to step 1108 and repeats the loop on sub-macroblock partitions.
Once motion information and associated costs have been determined for each macroblock partition and sub-macroblock partition in the current macroblock (yes), the coding mode that leads to the lowest accumulated cost for the whole macroblock is determined. To do so, the sum of the previously calculated costs of the partitions corresponding to each possible coding mode is calculated during step 1115. The coding mode that provides the lowest sum of partition costs is selected as the best integer-pel coding mode.
At last, the selected integer-pel temporal prediction is refined at the sub-pixel level during step 1116. This comprises evaluating the half-pel positions (positions 1 to 8 in Figure 6b) and quarter-pel positions (positions a' to h' in Figure 6b) around the found integer-pel position, and selecting the position that gives the minimal accumulated cost. Once this is done, the algorithm is terminated during step 1117.
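By way of illustration, the cost accumulation and mode selection of step 1115 may be sketched as follows (Python; the function name and the SAD values in the example are illustrative):

```python
def best_inter_mode(costs_by_mode):
    """Step 1115 in sketch form: accumulate the partition costs of
    each candidate coding mode and keep the mode with the lowest
    total accumulated cost."""
    totals = {mode: sum(costs) for mode, costs in costs_by_mode.items()}
    return min(totals, key=totals.get)

# Example with hypothetical SAD values: two 16x8 partitions together
# beat a single 16x16 partition and two 8x16 partitions.
print(best_inter_mode({"16x16": [900],
                       "16x8": [400, 420],
                       "8x16": [500, 480]}))   # -> "16x8"
```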
Figure 12 is a flowchart illustrating an algorithm used to perform motion estimation for a block to temporally predict, based on a motion vector prediction list management. The inputs comprise: -The original block (or partition) to predict. This corresponds to the macroblock or sub-macroblock partition currently being considered by the algorithm described with reference to Figure 12. Hence the size of this block varies from 4 x 4 to 16 x 16.
-The reference pictures useful to temporally predict the picture currently being encoded.
During a first step 1200 of the algorithm, the current list of motion search starting points is obtained. At the beginning of the video sequence coding process, this list comprises two elements, which are the median motion vector for current block and the zero motion vector (starting point corresponding to the co-located reference block).
Next, the list can be maintained during the coding of a picture.
Alternatively, the list is maintained and continuously updated on the sequence level rather than on the picture level. The list state is kept in memory from a picture to the next one.
Therefore, in the general case, at the beginning of the algorithm described with reference to Figure 12, the list of starting points contains the median motion vector, the zero motion vector and starting points that have been stored in the list during preceding motion estimation operations.
Next, a loop is performed on the elements contained in the current list.
The loop is initialized during step 1201, wherein the currently considered motion search starting point SP takes the first element in the current list of starting points. A restricted motion search as described with reference to Figure 6b is then performed during step 1202 around the starting point pointed to by the vector SP. The best motion vector found by this restricted search is noted BestMV(SP), and has an associated cost noted BestCost(SP). During the next step 1203, it is tested whether the last element in the list has been reached.
If this is not the case (no), the vector SP is set as the next element in the list during step 1204, and the algorithm goes back to the restricted motion search step 1202.
Alternatively (yes), if all elements in the list have been considered, the best motion vector found among the results of all the restricted motion searches performed during the loop is selected during step 1205. The starting point that led to the best motion vector is determined using the following equation: SP_selected = argmin_SP { BestCost(SP) }. According to this equation, the starting point selected is the one giving the minimum cost value. Next, during step 1206, the best motion vector is deduced from the stored best motion vectors obtained during each restricted motion search: BestMV(SP_selected). This represents the best motion vector found for the current block.
The current list of motion search starting points is then updated during step 1207.
Once the list is updated, the algorithm is terminated during step 1208.
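By way of illustration, the loop of steps 1200 to 1206 may be sketched as follows (Python; restricted_search is a hypothetical stand-in for the restricted search of Figure 6b, assumed to return a (motion vector, cost) pair, and starting points are assumed to be hashable coordinate tuples):

```python
def estimate_motion(block, reference, starting_points, restricted_search):
    """Steps 1200-1206 in sketch form: run the restricted search
    around every starting point in the list and keep the overall
    best result."""
    best = {}
    for sp in starting_points:                  # loop of steps 1201-1204
        best[sp] = restricted_search(block, reference, sp)
    # Step 1205: SP_selected = argmin_SP { BestCost(SP) }
    sp_selected = min(best, key=lambda sp: best[sp][1])
    # Step 1206: the block's motion vector is BestMV(SP_selected).
    return best[sp_selected][0], sp_selected
```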
The updating step is further described in what follows with reference to Figure 13 which is a flowchart illustrating an update algorithm for the list of motion search starting points.
The inputs comprise: -the current list of motion search starting points {median, (0,0), SP_1, SP_2, ..., SP_k}; -the motion search starting point that has provided the best motion vector during the last motion estimation process: SP_selected; and -the motion vector chosen by the last motion estimation process: MV.
During a first step 1300, it is tested whether the selected motion search starting point corresponds to the zero motion vector.
If this is the case (yes), it may be interpreted that none of the previously stored elements in the list has been considered as relevant for predicting the motion of the current block. Therefore, this may indicate that the coding process has entered a new picture area in terms of motion. Therefore, the motion vector that was typically representative of the motion contained in the previous picture area is stored during step 1301. For example, the median motion vector of the current block is inserted into the list of motion search starting points.
Alternatively (no), if the selected motion search starting point does not correspond to the zero motion vector, it may be interpreted that either the median motion vector of the current block or one of the previously stored starting points led to the best found motion vector.
It is then tested, during step 1302, whether the selected motion starting point is different from the median motion vector.
If this is the case (yes), i.e. one of the stored starting points provided the best motion vector, its value is updated and set equal to the motion vector actually found for the current block during step 1303. This is done because the current block's motion vector is likely to better predict the motion vectors of subsequent blocks in the current picture than old motion search starting points previously stored in the list.
Once the set of starting points contained in the list has been updated, the frequencies respectively associated with each element contained in the list are updated during step 1304. This frequency value serves as the criterion used to sort the list of motion search starting points. It may be defined as the number of blocks for which the considered starting point led to the best prediction of a block's motion vector, divided by the total number of blocks processed so far in the picture/sequence.
If the selected motion starting point corresponds to the median motion vector (no), the algorithm goes directly to step 1304.
Next, during step 1305, the list of motion search starting points is re-sorted according to the frequency value associated with each element in the list.
The algorithm then tests whether the list has reached a length higher than the maximum authorized length during step 1306.
If the list has not reached such length (no), the algorithm terminates during step 1307.
If the list has reached such length (yes), the element which has the lowest frequency of use (found using the previous sorting step) is discarded from the list during step 1308. The algorithm then terminates during step 1307.
According to embodiments, if the element which has the lowest frequency is the median or the (0,0) vector, then in order to keep these elements in the list in any case, the element with the lowest frequency different from these two elements is discarded.
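A minimal sketch of the update logic of steps 1300 to 1303 is given below in Python; the function and argument names are illustrative, and the median entry is assumed to be passed explicitly as the median predictor computed for the current block:

```python
def update_starting_points(splist, sp_selected, median_mv, found_mv):
    """Sketch of steps 1300-1303. splist is a plain list of vectors
    whose first two entries are assumed to be the median predictor
    of the current block and the zero vector (0, 0)."""
    if sp_selected == (0, 0):
        # Step 1301: probable motion discontinuity; remember the median
        # that was representative of the previous picture region.
        if median_mv not in splist:
            splist.append(median_mv)
    elif sp_selected != median_mv:
        # Step 1303: refresh the stored starting point with the vector
        # actually found, a better predictor for subsequent blocks.
        splist[splist.index(sp_selected)] = found_mv
    # Steps 1304-1308 (frequency update, re-sorting and pruning) would
    # follow, as in the list management sketch given earlier.
    return splist
```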
In the case wherein the list is maintained from picture to picture, the list may be re-ordered before starting the encoding of a new picture. This re-ordering process may comprise sorting the motion search starting points as a function of the spatial location of the block at which each motion search starting point was added to the list.
The algorithms described with reference to Figures 12 and 13 were described with reference to the case wherein only one list of motion search starting points is managed. This may be adapted to cases wherein only one reference picture list and only one reference picture are used. Such cases also correspond to the so-called IPPP temporal coding structure. Indeed, in such a case, the motion vectors are likely to be correlated from one picture to the following one.
However, multiple reference pictures may be used. The motion vectors that link the current picture with at least two reference pictures may be quite different for the two reference pictures. Consequently, it may be of interest to consider one list of motion search starting points for each reference picture.
In the above description, the case has been considered wherein only one reference picture list is employed. This case typically corresponds to the use of forward motion-compensated temporal prediction only.
However, the algorithms described hereabove may be extended to the case wherein B pictures are also encoded.
In the case of non-hierarchical B pictures, one list of motion search starting points may be created and maintained for each list of reference pictures.
In order to address the coding of hierarchical B pictures (previously introduced with reference to Figure 4), one list of motion search starting points may be created and maintained from picture to picture inside a given temporal level, and for each reference picture list. Indeed, since all pictures in a given temporal level have an equal temporal distance from their reference picture(s), motion vectors are likely to be highly correlated from one picture to the next inside a given temporal level.
According to embodiments, scalable video streams made of several scalability layers may be encoded. In such case, one list of motion search starting points may be created inside each scalability layer. Moreover, the motion search starting points list associated with an enhancement layer (as opposed to the base layer) may contain two types of motion search starting points: -the starting points corresponding to previously computed median motion vectors, similarly to the non-scalable case, and -some starting points corresponding to derived motion vectors issued from lower layer motion vectors, optionally up-sampled in the case of spatial scalability.
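By way of illustration, the keying of several such lists might be organized as follows (Python; the triple used as a key is an illustrative assumption combining the variants discussed above, and "median" stands for the per-block median entry):

```python
from collections import defaultdict

# One starting point list per (scalability layer, temporal level,
# reference picture list index) triple, covering the multi-reference,
# hierarchical B and multi-layer variants discussed above.
starting_point_lists = defaultdict(lambda: ["median", (0, 0)])

def list_for(layer, temporal_level, ref_list_idx):
    """Return (creating on first use) the list for a given context."""
    return starting_point_lists[(layer, temporal_level, ref_list_idx)]
```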
The so-called MV-Comp (Motion Vector Competition) motion vector coding process, used in particular in evolutions of the H.264/AVC coding standard, basically consists in encoding a syntax element that indicates the motion vector predictor that is used in the predictive coding of the motion vector of a given picture block. A typical practical implementation of MV-Comp consists in encoding a flag that indicates whether the median motion vector predictor or the motion vector of the current block's co-located block in the reference picture is used to predict the current block's motion vector. Moreover, if both predictor candidates happen to be identical, then that flag need not be encoded, and is omitted. As a consequence, the list of motion search starting points as described above can be re-used in the encoding of motion vectors, in a MV-Comp like approach.
According to embodiments compatible with MV-Comp, a syntax element that indicates to the decoder which element in the starting point list serves as the predictor in the coding of a block's motion vector may be encoded. For example, an index indicating the position of the relevant starting point is transmitted to the decoder. Such an approach corresponds to the application of the list of motion search starting points described above to a predictive motion vector coding process similar to MV-Comp.
In addition, some additional syntax elements can be inserted into the video coded bitstream so as to make the decoder manage/update the list of motion search starting points in a way that is identical to the list updating performed on the encoder side. Such syntax elements may comprise: -addition of a new starting point: such a syntax element indicates to the decoder that the median motion vector predictor of the previous block has to be added to the list, -deletion of an element: this syntax element indicates that the last element (which is also the one having the lowest frequency of use) in the list has to be discarded from the list, -re-ordering: a dedicated syntax element may be used so as to indicate to the decoder how to re-order its list of motion vector predictors in the same way as the list of motion search starting points is re-ordered on the encoder side.
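For illustration, the decoder-side handling of these three list management commands might be sketched as follows (Python; the command names and payload formats are illustrative, not actual bitstream syntax):

```python
def apply_list_command(splist, command, payload=None):
    """Apply one decoded list management command to the decoder's
    list of motion search starting points."""
    if command == "add":
        # Add the previous block's median motion vector predictor.
        splist.append(payload)
    elif command == "delete":
        # Discard the last element, i.e. the least frequently used one.
        splist.pop()
    elif command == "reorder":
        # payload: a permutation of list indices, mirroring the
        # encoder-side re-ordering.
        splist[:] = [splist[i] for i in payload]
    return splist
```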
A decoding process associated with the above MV-Comp compatible encoding is described hereafter.
The decoder applies the list management commands listed above, in order to have exactly the same list of motion vector predictors as the corresponding video encoder. Therefore, the video decoder has to be able to manage the list of motion vector search starting points in a way that is synchronized with the encoder.
Then, given this synchronous list of motion vector predictors, the decoder is able to decode and reconstruct motion vectors as follows.
The motion data that are encoded together with the coded block data according to the present invention comprise the following elements, among others: -the index of the motion vector predictor in the list of motion vector predictors previously introduced, and -the motion vector difference (along the abscissa and the ordinate) between the current block's motion vector and the motion vector predictor indicated by the motion vector predictor index.
If a given block is divided into several sub-blocks, also called partitions, then one motion vector predictor index and one motion vector difference are encoded for each sub-block. Also, in the case of bi-directional predicted pictures, called B pictures, two lists of motion vector predictors may be maintained according to the present invention: one associated with the past reference picture and one associated with the reference picture located in the future on the time axis.
Once the decoder has parsed and decoded temporal prediction dedicated syntax elements from the video bit-stream, it is able to reconstruct the motion vectors associated to one block and its sub-blocks. To do so, it sums the value of the motion vector predictor and the motion vector difference of the considered block, which provides the reconstructed motion vector value. This sum is repeated for each sub-block of a considered block. Once this is done, the fully reconstructed motion vectors of a given block are available, and subsequent video decoding operation can be performed.
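By way of illustration, this per-block reconstruction may be sketched as follows (a minimal Python sketch with illustrative names):

```python
def reconstruct_motion_vector(predictors, pred_idx, mvd):
    """Decoder-side reconstruction: the transmitted motion vector
    difference is added to the predictor selected by the transmitted
    index, yielding the reconstructed motion vector."""
    px, py = predictors[pred_idx]
    dx, dy = mvd
    return (px + dx, py + dy)

# Example: predictor list [(3, 0), (0, 0)], index 0, difference (1, -2).
assert reconstruct_motion_vector([(3, 0), (0, 0)], 0, (1, -2)) == (4, -2)
```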
According to an embodiment, the evolution of the list of motion vector predictors is fully inferred by the decoder, in a synchronous way with the encoder. In such case, no syntax element is encoded in the bit-stream to manage the list, which may help optimizing the bitrate.
A computer program according to embodiments may be designed based on the flowcharts of Figures 5, 10, 11, 12 and 13 and the present description.
Such computer program may be stored in a ROM memory of a device as described with reference to Figure 1. It may then be loaded into and executed by a processor of such device for implementing steps of a method according to the invention.
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive, the invention being not restricted to the disclosed embodiment. Other variations to the disclosed embodiment can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure and the appended claims.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used. Any reference signs in the claims should not be construed as limiting the scope of the invention.

Claims (1)

CLAIMS

1. A method of estimating motion in successive pictures of a video stream comprising the following steps performed for a current block of pixels of a current picture of the video stream: -determining a current motion vector representing motion of the current block of pixels, and -updating, based on the current motion vector determined, a list of motion search starting blocks of a reference picture used for determining a next motion vector representing motion of a next block.
2. A method according to claim 1, wherein the current motion vector is determined based on at least one search starting block of the list.
3. A method according to claim 2, wherein the determination of the current motion vector comprises the following steps: -for each starting block in the list, performing a restricted motion search around the starting block and determining a possible motion vector based on a result of the restricted motion search, and -selecting one of the said possible motion vectors determined for each starting block in the list based on a selection criteria, the possible motion vector selected being the determined current motion vector.
4. A method according to claim 3, further comprising associating each possible motion vector respectively determined for each search starting block in the list with a cost value on which is based the selection criteria.
5. A method according to any one of claims 3 and 4, wherein updating the list comprises inserting a block pointed to in the reference picture by the current motion vector in replacement of a current search starting block in the list used for determining the current motion vector.
6. A method according to claim 5, wherein the block pointed to in the reference picture by the current motion vector is inserted in replacement of the current search starting block when the possible motion vector selected is determined based on a restricted motion search around a search starting block which is not: -a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block, or -a block of pixels in the reference picture having same coordinates in the reference picture as the current block in the current picture.
7. A method according to any one of claims 1 to 4, wherein updating the list comprises inserting a block pointed to in the reference picture by the current motion vector.
8. A method according to any one of claims 1 to 4, wherein updating the list comprises inserting a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
9. A method according to any one of claims 7 and 8, wherein the block pointed to in the reference picture is inserted in the list when the current motion vector is determined based on a block of pixels in the reference picture having same coordinates in the reference picture as the current block in the current picture.
10. A method according to any one of the preceding claims, further comprising a step of ordering the list of motion search starting blocks.
11. A method according to claim 10, wherein the motion search starting blocks are ordered according to respective frequencies of use for determining motion vectors.
12. A method according to claim 11, further comprising updating the frequencies of use of the motion search starting blocks.
13. A method according to claim 10, wherein the motion search starting blocks are ordered according to a spatial location of the current block for which they have respectively been inserted in the list.
14. A method according to any one of the preceding claims, further comprising, for a current block of pixels of a next picture of the video stream, determining a current motion vector representing motion of the current block of pixels in the pictures of the video stream based on at least one search starting block of the updated list.
15. A method according to claim 14, wherein the current and the next pictures are comprised in a same temporal level.
16. A method according to claim 14, wherein the current and next pictures are comprised in a same scalability layer.
17. A method according to any one of the preceding claims, wherein updating the list comprises removing at least one search starting block.
18. A method according to claim 17, wherein at least one search starting block is removed in order to keep a length of the list under a length limit.
19. A method according to any one of claims 17 and 18, wherein at least one search starting block having a lowest frequency of use for determining motion vectors is removed.
20. A method according to any one of claims 17 to 19, wherein the at least one search starting block removed is different from at least one of: -a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block, and -a block of pixels in the reference picture having same coordinates in the reference picture as the current block in the current picture.
21. A method according to any one of the preceding claims, wherein the list is associated with the reference picture.
22. A method according to claim 21, wherein motion is estimated using a plurality of reference pictures and wherein a list of search starting blocks is associated with each reference picture.
23. A method of coding a video stream comprising the following steps: -estimating motion in successive pictures of the video stream according to any one of claims 1 to 22, -coding a first syntax element indicating a search starting block of the list used for determining the current motion vector, and -coding the current motion vector as a difference between said current motion vector and the motion vector pointing to the search starting block of the list used for determining the current motion vector.
24. A method according to claim 23, wherein the first syntax element comprises a position in the list.
25. A method according to any one of claims 23 and 24, further comprising coding a second syntax element containing an indication to a decoder receiving the coded video stream to insert a search starting block in a list of motion search starting blocks of the decoder.
26. A method according to claim 25, wherein the second syntax element contains an indication to the decoder to insert in the list of motion search starting blocks of the decoder a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
27. A method according to any one of claims 23 to 26, further comprising coding a third syntax element containing an indication to a decoder receiving the coded video stream to remove a search starting block from the list of motion search blocks of the decoder.
28. A method according to any one of claims 23 to 27, further comprising coding a fourth syntax element containing an indication to a decoder receiving the coded video stream to reorder the list of motion search blocks of the decoder.
29. A method according to claim 28, wherein the fourth syntax element contains an indication to the decoder to reorder the list of motion search blocks of the decoder in the same order as the list of motion search blocks updated during the updating step of the motion estimation according to any of claims 1 to 22.
30. A method of decoding a video stream comprising the following steps performed for a current block of pixels of a current picture of the video stream: -decoding a current motion vector representing motion of the current block of pixels coded by a coder as a difference between said current motion vector and a current motion vector predictor, by adding to said difference a motion vector pointing to a motion starting block from a list of motion search starting blocks of a reference picture, -determining a search starting block used by said coder for determining the current motion vector, and -updating the list of motion search starting blocks with said search starting block determined for decoding a next motion vector representing motion of a next block.
31. A method according to claim 30, further comprising decoding a first syntax element indicating said search starting block determined.
32. A method according to claim 31, wherein the first syntax element comprises a position in the list.
33. A method according to any one of claims 30 to 32, further comprising inserting a search starting block in the list of motion search starting blocks.
34. A method according to claim 33, further comprising decoding a second syntax element containing an indication to insert said search starting block in the list of motion search starting blocks.
35. A method according to any one of claims 33 and 34, comprising inserting in the list of motion search starting blocks a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
36. A method according to claim 35, wherein the second syntax element contains an indication to insert in the list of motion search starting blocks the block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
37. A method according to any one of claims 30 to 36, further comprising removing a search starting block from the list of motion search blocks.
38. A method according to claim 37, further comprising decoding a third syntax element that contains an indication to remove the search starting block from the list of motion search blocks.
39. A method according to any one of claims 30 to 38, further comprising reordering the list of motion search blocks.
40. A method according to claim 39, further comprising decoding a fourth syntax element that contains an indication to reorder the list of motion search blocks.
41. A method according to any one of claims 39 and 40, comprising reordering the list of motion search blocks in the same order as a list of motion search blocks of the coder.
42. A method according to claim 41, wherein the fourth syntax element contains an indication to reorder the list of motion search blocks in the same order as a list of motion search blocks of the coder.
43. A motion estimation device for estimating motion in successive pictures of a video stream comprising: -a control unit configured to determine a current motion vector representing motion of a current block of pixels of a current picture of the video stream, and to update, based on the current motion vector determined, a list of motion search starting blocks of a reference picture used for determining a next motion vector representing motion of a next block.
44. A motion estimation device according to claim 43, further comprising a memory unit for storing said list.
45. A motion estimation device according to any one of claims 43 and 44, wherein the current motion vector is determined based on at least one search starting block of the list.
46. A motion estimation device according to claim 45, wherein for determining the current motion vector, the control unit is further configured to: -perform, for each starting block in the list, a restricted motion search around the starting block and to determine a possible motion vector based on a result of the restricted motion search, and -select one of the said possible motion vectors determined for each starting block in the list based on a selection criteria, the possible motion vector selected being the determined current motion vector.
47. A motion estimation device according to claim 46, wherein the control unit is further configured to associate each motion vector respectively determined for each search starting block in the list with a cost value on which is based the selection criteria.
48. A motion estimation device according to any one of claims 46 and 47, wherein, for updating the list, the control unit is further configured to insert a block pointed to in the reference picture by the current motion vector in replacement of a current search starting block in the list used for determining the current motion vector.
49. A motion estimation device according to claim 48, wherein the block pointed to in the reference picture by the current motion vector is inserted in replacement of the current search starting block when the motion vector selected is determined based on a restricted motion search around a search starting block which is not: -a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block, or -a block of pixels in the reference picture having same coordinates in the reference picture as the current block in the current picture.
50. A motion estimation device according to any one of claims 43 to 49, wherein, for updating the list, the control unit is further configured to insert a block pointed to in the reference picture by the current motion vector.
51. A motion estimation device according to any one of claims 43 to 49, wherein, for updating the list, the control unit is further configured to insert a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
52. A motion estimation device according to any one of claims 50 and 51, wherein the control unit is further configured to insert the block pointed to in the reference picture in the list when the current motion vector is determined based on a block of pixels in the reference picture having same coordinates in the reference picture as the current block in the current picture.
53. A motion estimation device according to any one of claims 43 to 49, wherein the control unit is further configured to order the list of motion search starting blocks.
54. A motion estimation device according to claim 53, wherein the control unit is further configured to order the motion search starting blocks according to respective frequencies of use for determining motion vectors.
55. A motion estimation device according to claim 54, wherein the control unit is further configured to update the frequencies of use of the motion search starting blocks.
56. A motion estimation device according to claim 53, wherein the control unit is further configured to order the motion search starting blocks according to a spatial location of the current block for which they have respectively been inserted in the list.
57. A motion estimation device according to any one of claims 43 to 56, wherein, for a current block of pixels of a next picture of the video stream, the control unit is further configured to determine a current motion vector representing motion of the current block of pixels in the pictures of the video stream based on at least one search starting block of the updated list.
58. A motion estimation device according to claim 57, wherein the current and the next pictures are comprised in a same temporal level.
59. A motion estimation device according to claim 57, wherein the current and next pictures are comprised in a same scalability layer.
60. A motion estimation device according to any one of claims 43 to 59, wherein updating the list comprises removing at least one search starting block.
61. A motion estimation device according to claim 60, wherein at least one search starting block is removed in order to keep a length of the list under a length limit.
62. A motion estimation device according to any one of claims 60 and 61, wherein at least one search starting block having a lowest frequency of use for determining motion vectors is removed.
63. A motion estimation device according to any one of claims 60 to 62, wherein the at least one search starting block removed is different from at least one of: -a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block, and -a block of pixels in the reference picture having same coordinates in the reference picture as the current block in the current picture.
64. A motion estimation device according to any one of claims 43 to 63, wherein the list is associated with the reference picture.
65. A motion estimation device according to claim 64, wherein motion is estimated using a plurality of reference pictures and wherein a list of search starting blocks is associated with each reference picture.
66. A coder for coding a video stream comprising: -a motion estimation device according to any one of claims 43 to 65 for estimating motion in successive pictures of the video stream, and -a control unit configured to code a first syntax element indicating a search starting block of the list used for determining the current motion vector, and to code the current motion vector as a difference between said current motion vector and the motion vector pointing to the search starting block of the list used for determining the current motion vector.
67. A coder according to claim 66, wherein the first syntax element comprises a position in the list.
68. A coder according to any one of claims 66 and 67, wherein the control unit is further configured to code a second syntax element containing an indication to a decoder receiving the coded video stream to insert a search starting block in a list of motion search starting blocks of the decoder.
69. A coder according to claim 68, wherein the second syntax element contains an indication to the decoder to insert in the list of motion search starting blocks of the decoder a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
70. A coder according to any one of claims 66 to 69, wherein the control unit is further configured to code a third syntax element containing an indication to a decoder receiving the coded video stream to remove a search starting block from the list of motion search blocks of the decoder.
71. A coder according to any one of claims 66 to 70, wherein the control unit is further configured to code a fourth syntax element containing an indication to a decoder receiving the coded video stream to reorder the list of motion search blocks of the decoder.
72. A coder according to claim 71, wherein the fourth syntax element contains an indication to the decoder to reorder the list of motion search blocks of the decoder in the same order as the list of motion search blocks updated by the control unit of the motion estimation device.
73. A decoder for decoding a video stream comprising a control unit configured, for a current block of pixels of a current picture of the video stream, to: -decode a current motion vector representing motion of the current block of pixels coded by a coder as a difference between said current motion vector and a current motion vector predictor, by adding to said difference a motion vector pointing to a motion starting block from a list of motion search starting blocks of a reference picture, -determine a search starting block used by said coder for determining the current motion vector, and -update the list of motion search starting blocks with said search starting block determined for decoding a next motion vector representing motion of a next block.
74. A decoder according to claim 73, wherein the control unit is further configured to decode a first syntax element indicating said search starting block determined.
75. A decoder according to claim 74, wherein the first syntax element comprises a position in the list.
76. A decoder according to any one of claims 73 to 75, wherein the control unit is further configured to insert a search starting block in the list of motion search starting blocks.
77. A decoder according to claim 76, wherein the control unit is further configured to decode a second syntax element containing an indication to insert said search starting block in the list of motion search starting blocks.
78. A decoder according to any one of claims 76 and 77, wherein the control unit is further configured to insert in the list of motion search starting blocks a block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
79. A decoder according to claim 78, wherein the second syntax element contains an indication to insert in the list of motion search starting blocks the block pointed to in the reference picture by a median motion vector, said median motion vector being a median vector of motion vectors already determined for blocks in a vicinity of the current block.
80. A decoder according to any one of claims 73 to 79, wherein the control unit is further configured to remove a search starting block from the list of motion search blocks.
81. A decoder according to claim 80, wherein the control unit is further configured to decode a third syntax element that contains an indication to remove the search starting block from the list of motion search blocks.
82. A decoder according to any one of claims 73 to 81, wherein the control unit is further configured to reorder the list of motion search blocks.
83. A decoder according to claim 82, wherein the control unit is further configured to decode a fourth syntax element that contains an indication to reorder the list of motion search blocks.
84. A decoder according to any one of claims 82 and 83, wherein the control unit is further configured to reorder the list of motion search blocks in the same order as a list of motion search blocks of the coder.
85. A decoder according to claim 84, wherein the fourth syntax element contains an indication to reorder the list of motion search blocks in the same order as a list of motion search blocks of the coder.
86. A system comprising a coder according to any one of claims 66 to 72 for coding a video stream and a decoder according to any one of claims 73 to 85 for decoding a video stream.
87. A computer program product comprising instructions for implementing a method according to any one of claims 1 to 65 when the program is loaded and executed by a programmable apparatus.
88. A non-transitory information storage means readable by a computer or a microprocessor storing instructions of a computer program, characterized in that it makes it possible to implement a method according to any one of claims 1 to 65.
89. A device substantially as hereinbefore described with reference to, and as shown in, Figure 1 of the accompanying drawings.
90. A method of estimating motion in successive pictures of a video stream substantially as hereinbefore described with reference to, and as shown in, Figures 12 and 13 of the accompanying drawings.
GB201122194A 2011-12-22 2011-12-22 Motion estimation with motion vector predictor list Expired - Fee Related GB2497812B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB201122194A GB2497812B (en) 2011-12-22 2011-12-22 Motion estimation with motion vector predictor list

Publications (3)

Publication Number Publication Date
GB201122194D0 GB201122194D0 (en) 2012-02-01
GB2497812A true GB2497812A (en) 2013-06-26
GB2497812B GB2497812B (en) 2015-03-04

Family

ID=45572940

Family Applications (1)

Application Number Title Priority Date Filing Date
GB201122194A Expired - Fee Related GB2497812B (en) 2011-12-22 2011-12-22 Motion estimation with motion vector predictor list

Country Status (1)

Country Link
GB (1) GB2497812B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9479778B2 (en) 2012-08-13 2016-10-25 Qualcomm Incorporated Device and method for coding video information using base layer motion vector candidate

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0652678A2 (en) * 1993-11-04 1995-05-10 AT&T Corp. Method and apparatus for improving motion compensation in digital video coding
JPH09182078A (en) * 1995-12-25 1997-07-11 Casio Comput Co Ltd Motion vector detector and method therefor
EP0831642A2 (en) * 1996-09-20 1998-03-25 Nec Corporation Apparatus and method for motion vector estimation with high speed
EP1608179A1 (en) * 2004-06-16 2005-12-21 Samsung Electronics Co., Ltd. Motion vector determination using plural algorithms
JP2007174202A (en) * 2005-12-21 2007-07-05 Canon Inc Motion vector detecting device and method therefor
KR20080022843A (en) * 2006-09-08 2008-03-12 엘지전자 주식회사 Method for adjusting search area using motion vector

Also Published As

Publication number Publication date
GB201122194D0 (en) 2012-02-01
GB2497812B (en) 2015-03-04

Similar Documents

Publication Publication Date Title
AU2019356483B2 (en) Improvements on history-based motion vector predictor
US10560715B2 (en) Method, device, and computer program for optimizing transmission of motion vector related information when transmitting a video stream from an encoder to a decoder
US11140408B2 (en) Affine motion prediction
CN110809887B (en) Method and apparatus for motion vector modification for multi-reference prediction
RU2684753C1 (en) Device and method of coding image, device and method for decoding image and data media
KR101722737B1 (en) Method for predicting motion vectors in a video codec that allows multiple referencing, motion vector encoding/decoding apparatus using the same
US9584823B2 (en) Determining motion vectors for motion vector prediction based on motion vector type in video coding
JP5021739B2 (en) Signal processing method and apparatus
CN104704835B (en) The apparatus and method of movable information management in Video coding
US20120057631A1 (en) Method and device for motion estimation of video data coded according to a scalable coding structure
US11582475B2 (en) History-based motion vector prediction
KR20110040893A (en) Image encoding device, image decoding device, image encoding method, and image decoding method
CN101389025A (en) Motion refinement engine for use in video encoding in accordance with a plurality of sub-pixel resolutions and methods for use therewith
KR20130076879A (en) Dynamic image encoding device, dynamic image decoding device, dynamic image encoding method, and dynamic image decoding method
TW202147852A (en) Decoding device and decoding method
KR20130135925A (en) Image encoding apparatus, image decoding apparatus, image encoding method and image decoding method
US20190028732A1 (en) Moving image encoding device, moving image encoding method, and recording medium for recording moving image encoding program
JP2009089332A (en) Motion prediction method and motion predictor
GB2506590A (en) Adaptive coding mode selection for multi-layer video coding
US12063349B2 (en) Method and device for encoding or decoding image by means of block map
GB2497812A (en) Motion estimation with motion vector predictor list
US20130170565A1 (en) Motion Estimation Complexity Reduction
KR20130055317A (en) Apparatus and method for encoding/decoding of video using block merging selecting candidate block in consecutive order
JP2012186762A (en) Video encoding device, video decoding device, video encoding method, and video decoding method
JP2013009164A (en) Moving image coding device, moving image decoding device, moving image coding method, and moving image decoding method

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20231222