GB2519514A - Method and apparatus for displacement vector component prediction in video coding and decoding - Google Patents


Publication number
GB2519514A
GB2519514A (application GB1318079.9A / GB201318079A)
Authority
GB
United Kingdom
Prior art keywords
block
dimensional vector
prediction information
determining
encoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1318079.9A
Other versions
GB201318079D0 (en)
Inventor
Guillaume Laroche
Christophe Gisquet
Patrice Onno
Current Assignee
Canon Inc
Original Assignee
Canon Inc
Priority date
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to GB1318079.9A priority Critical patent/GB2519514A/en
Publication of GB201318079D0 publication Critical patent/GB201318079D0/en
Priority to GB1404588.4A priority patent/GB2519616A/en
Priority to PCT/EP2014/071621 priority patent/WO2015052273A1/en
Publication of GB2519514A publication Critical patent/GB2519514A/en
Legal status: Withdrawn

Classifications

    • H04N19/105 — Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/147 — Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/176 — Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/52 — Processing of motion vectors by predictive encoding
    • H04N19/593 — Predictive coding involving spatial prediction techniques

Abstract

A two-dimensional vector representing a location in a current image comprising blocks of pixels is encoded. At least one encoding mode used for encoding the blocks is an Intra Block Copy prediction mode, in which a block of pixels is predictively encoded based on a predictor block corresponding to a reconstructed actual block of the current image. The encoding of a displacement vector is improved by introducing a step of prediction of this vector into the INTRA Block Copy encoding mode. Determining prediction information for the vector may comprise using a vector associated with another (e.g. last, left, above) block of the image encoded in a mode using at least a 2D vector for identifying a predictor block, e.g. Intra Block Copy mode. The determined prediction information may comprise a plurality of vectors, the vector minimising a rate distortion criterion being selected among the plurality. The location of the predictor block may be constrained to lie within a window, the prediction information of a vector being determined to be invalid if it points outside the window.

Description

METHOD AND APPARATUS FOR DISPLACEMENT VECTOR COMPONENT
PREDICTION IN VIDEO CODING AND DECODING
The present invention concerns a method and a device for predicting a displacement vector component in the process of encoding or decoding a video.
It applies more particularly to a mode of coding where a block of pixels is predictively encoded based on a predictor block pertaining to the same image.
This mode of encoding a block of pixels is generally referred to as INTRA Block Copy mode. It is contemplated to adopt this mode, for example, in the lossless configuration of the Range Extension of the High Efficiency Video Coding (HEVC: ISO/IEC 23008-2 MPEG-H Part 2 / ITU-T H.265) international standard.
When encoding an image in a video sequence, the image is first divided into coding entities of pixels of equal size referred to as Coding Tree Blocks (CTB). The size of a Coding Tree Block is typically 64 by 64 pixels. Each Coding Tree Block may then be decomposed into a hierarchical tree of smaller blocks, whose size may vary and which are the actual blocks to encode. These smaller blocks to encode are referred to as Coding Units (CU).
The encoding of a particular Coding Unit is typically predictive. This means that a predictor block is first determined. Next, the difference between the predictor block and the Coding Unit is calculated. This difference is called the residue. Next, this residue is compressed. The actual encoded information of the Coding Unit is made of some information indicating the way of determining the predictor block, and of the compressed residue. The best predictor blocks are those as similar as possible to the Coding Unit, in order to obtain a small residue that can be efficiently compressed.
Encoding may be lossy, meaning that information is lost in the encoding process. The decoded block of pixels is then not exactly the same as the original Coding Unit. Typically, the loss of information comes from a quantization applied to the residue before entropy coding. This quantization allows a higher compression rate at the price of a loss of accuracy. Typically, high frequencies, namely the fine details, are removed from the block.
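The effect of quantizing the residue can be sketched as follows. This is a simplified, hypothetical model (a single uniform quantization step applied to a toy list of transform coefficients), not the actual HEVC quantizer:

```python
def quantize(coeffs, qstep):
    """Uniform scalar quantization of residual transform coefficients.

    A larger qstep gives coarser levels, hence a higher compression rate
    but more loss: small (high-frequency) coefficients collapse to zero.
    """
    return [round(c / qstep) for c in coeffs]

def dequantize(levels, qstep):
    """Inverse quantization: reconstruct approximate coefficients."""
    return [level * qstep for level in levels]

residue = [100, -7, 3, -2, 1, 0, 0, 1]   # toy residual coefficients
levels = quantize(residue, qstep=8)       # most small terms become zero
recon = dequantize(levels, qstep=8)       # close to, but not equal to, residue
```

Lossless coding corresponds to skipping this step entirely, so that the decoded samples match the original ones exactly.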
Encoding may be lossless, meaning that the residue is not quantized.
This kind of encoding allows retrieving an exact copy of the original samples of the Coding Unit. The lossless encoding is obtained at the expense of a compression rate which is much smaller than that of lossy compression.
The coding mode is defined by the method used to determine the predictor block for the predictive encoding of a Coding Unit.
A first coding mode is referred to as INTRA mode. According to INTRA mode, the predictor block is built based on the value of pixels immediately surrounding the Coding Unit within the current image. It is worth noting that the predictor block is not a block of the current image but a construction. A direction is used to determine which pixels of the border are actually used to build the predictor block and how they are used. The idea behind INTRA mode is that, due to the general coherence of natural images, the pixels immediately surrounding the Coding Unit are likely to be similar to pixels of the current Coding Unit. Therefore, it is possible to get a good prediction of the value of pixels of the Coding Unit using a predictor block based on these surrounding pixels.
A second coding mode is referred to as INTER mode. According to INTER mode, the predictor block is a block of another image. The idea behind the INTER mode is that successive images in a sequence are generally very similar. The main difference comes typically from a motion between these images due to the scrolling of the camera or due to moving objects in the scene.
The predictor block is determined by a vector giving its location in a reference image relative to the location of the Coding Unit within the current image. This vector is referred to as a motion vector. According to this mode, the encoding of such a Coding Unit comprises motion information, including the motion vector, and the compressed residue.
We focus in this document on a third coding mode called INTRA Block Copy mode. According to the INTRA Block Copy mode, the block predictor is an actual block of the current image. A displacement vector is used to locate the predictor block. This displacement vector gives the location in the current image of the predictor block relative to the location of the Coding Unit in the same current image. It follows that this displacement vector shares some similarities with the motion vector of the INTER mode. It is sometimes called a motion vector by analogy. As, strictly speaking, there cannot be motion within a single image, and for the sake of clarity, in this document the term motion vector always refers to the INTER mode, while displacement vector is used for the INTRA Block Copy mode.
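The role of the displacement vector can be sketched as follows: a hypothetical, simplified model in which the frame is a 2D array of reconstructed pixels and the vector simply offsets the Coding Unit position within that same frame (names and the frame representation are illustrative):

```python
def ibc_predictor_block(frame, cu_x, cu_y, dvx, dvy, size):
    """Fetch the INTRA Block Copy predictor: a size x size block of the
    *current* frame located at (cu_x + dvx, cu_y + dvy), i.e. displaced
    from the Coding Unit position by the displacement vector (dvx, dvy).

    `frame` is a 2D list of already-reconstructed pixel values; in
    practice the vector points up/left into the reconstructed area.
    """
    px, py = cu_x + dvx, cu_y + dvy
    return [row[px:px + size] for row in frame[py:py + size]]
```

The residue is then the difference between the Coding Unit and this block, exactly as in the INTER case, except that no other image is involved.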
The causal principle states that all information needed to decode a particular Coding Unit must be based on already reconstructed Coding Units. At encoding, the whole information may be considered as available. Namely, to encode a given Coding Unit it would be possible to use any information from the entire current image or from all decoded and available other images in the sequence. At decoding, things are different. The decoding of the current image is typically done by decoding all Coding Units sequentially. The order of decoding typically follows a raster scan order, namely beginning in the upper left of the image, progressing from left to right and from top to bottom. It follows that when decoding a given Coding Unit, only the part of the current image located above or to the left of the current Coding Unit has already been decoded. This is the only available information for the decoding of the current Coding Unit. This has to be taken into account at encoding. For example, a predictor block in INTRA Block Copy mode should pertain to the part of the image that will be available at decoding.
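Under a simplified raster-scan model (block-aligned positions, one block row at a time; an illustrative sketch, not the exact HEVC availability rules), the causal constraint on a predictor block can be expressed as:

```python
def in_causal_area(pred_x, pred_y, cu_x, cu_y, size):
    """Simplified causal check for a size-aligned grid scanned in raster
    order: the predictor block is available only if it lies in a block
    row strictly above the current Coding Unit, or in the same block row
    strictly to its left.
    """
    if pred_y + size <= cu_y:                 # entirely above the CU row
        return True
    return pred_y == cu_y and pred_x + size <= cu_x   # same row, to the left
```

An encoder using INTRA Block Copy must only select displacement vectors whose target block satisfies such a check, since the pointed-to pixels must already be reconstructed at the decoder.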
In INTRA Block Copy mode, the displacement vector is encoded using the regular motion vector difference (MVD) coding. This encoding consists, for a component of the vector, in coding whether this component is zero and, if not, in encoding its sign and its magnitude. Both components, corresponding to the vertical and horizontal directions, are encoded that way.
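At the symbol level, this per-component scheme can be sketched as follows. This is a schematic model of the zero-flag / sign / magnitude decomposition only; the actual HEVC MVD coding binarizes these symbols and codes them with CABAC:

```python
def encode_component(value):
    """Encode one displacement-vector component as a list of symbols:
    a zero flag, then (if non-zero) a sign bit and a magnitude."""
    if value == 0:
        return [("is_zero", 1)]
    return [("is_zero", 0),
            ("sign", 0 if value > 0 else 1),
            ("magnitude", abs(value))]

def decode_component(symbols):
    """Inverse of encode_component: rebuild the signed component."""
    sym = dict(symbols)
    if sym["is_zero"]:
        return 0
    return sym["magnitude"] * (1 if sym["sign"] == 0 else -1)
```

Both the horizontal and vertical components of the vector (or, with prediction, of the vector residual) go through this same process.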
The present invention has been devised to improve the encoding of the displacement vector in INTRA Block Copy mode.
It proposes to improve the encoding of a displacement vector by introducing a step of prediction of this vector into an INTRA Block Copy encoding mode.
According to a first aspect of the invention there is provided a method for encoding a bi-dimensional vector representing a location in a current image, comprising a plurality of blocks of pixels, at least one encoding mode used for encoding the blocks of pixels being an INTRA Block Copy prediction mode consisting in predicting a current block in the current image using a predictor block corresponding to a reconstructed actual block of the current image, the method comprising: determining prediction information for this bi-dimensional vector; encoding the bi-dimensional vector with respect to the determined prediction information.
Accordingly, the coding of the displacement vector is improved.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: using a bi-dimensional vector associated with another block of the current image encoded in a mode using at least a bi-dimensional vector for identifying a predictor block.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: using as prediction information a bi-dimensional vector associated with another block of the current image encoded according to the INTRA Block Copy prediction mode.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: using as prediction information a bi-dimensional vector associated with the last block of the current image encoded according to the INTRA Block Copy prediction mode.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: using a bi-dimensional vector associated with the block located to the left of the current block.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: using a bi-dimensional vector associated with the block located above the current block.
In an embodiment, the determined prediction information comprises a plurality of bi-dimensional vectors, and the bi-dimensional vector minimizing a rate distortion criterion is selected among the plurality of bi-dimensional vectors.
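A selection of this kind can be sketched as follows. Since the predictor choice does not change the reconstructed block, only the rate of the vector residual varies; the toy bit-count below is an illustrative assumption, not the HEVC entropy coder:

```python
def rate(component):
    """Toy bit-count for one residual component: one bit for the zero
    flag, else flag + sign + magnitude bits."""
    return 1 if component == 0 else 2 + abs(component).bit_length()

def select_predictor(candidates, dv):
    """Among candidate prediction vectors, choose the one minimizing the
    estimated rate of the vector residual (dv - candidate).  Distortion
    is identical for all candidates, so rate alone decides here."""
    return min(candidates,
               key=lambda p: rate(dv[0] - p[0]) + rate(dv[1] - p[1]))
```

A candidate equal (or close) to the actual displacement vector yields a near-zero residual and hence the cheapest coding.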
In an embodiment, the current image being divided into coding entities of pixels of equal size, each block of pixels belonging to one coding entity, determining prediction information for this bi-dimensional vector comprises: using a bi-dimensional vector associated with blocks located within the same coding entity.
In an embodiment, the location of the predictor block for the current block being restricted to a given window, determining prediction information for this bi-dimensional vector comprises: determining this prediction information as invalid if it points outside said window.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: determining this prediction information as invalid if it points to a block overlapping with the current block.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: determining this prediction information as invalid if its value is nil.
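The three invalidity conditions of the preceding embodiments (nil vector, overlap with the current block, outside the allowed window) can be combined into one check. The rectangular window representation `(x0, y0, x1, y1)` is an assumption made for illustration:

```python
def predictor_is_valid(pred, cu_x, cu_y, size, window):
    """Return False if the candidate prediction vector `pred` is nil,
    if the size x size block it points to overlaps the current block at
    (cu_x, cu_y), or if that block does not fit inside `window`, given
    as a rectangle (x0, y0, x1, y1)."""
    dvx, dvy = pred
    if dvx == 0 and dvy == 0:                          # nil vector
        return False
    px, py = cu_x + dvx, cu_y + dvy
    if abs(px - cu_x) < size and abs(py - cu_y) < size:  # overlaps current block
        return False
    x0, y0, x1, y1 = window
    return x0 <= px and y0 <= py and px + size <= x1 and py + size <= y1
```

Invalid candidates would simply be discarded from the set of predictors before selection.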
According to another aspect of the invention there is provided a method for decoding a bi-dimensional vector representing a location in a current image, comprising a plurality of blocks of pixels, at least one encoding mode used for encoding the blocks of pixels being an INTRA Block Copy prediction mode consisting in predicting a current block in the current image using a predictor block corresponding to a reconstructed actual block of the current image, the method comprising: determining prediction information for this bi-dimensional vector; decoding the bi-dimensional vector with respect to the determined prediction information.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: using a bi-dimensional vector associated with another block of the current image encoded in a mode using at least a bi-dimensional vector for identifying a predictor block.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: using as prediction information a bi-dimensional vector associated with another block of the current image encoded according to the INTRA Block Copy prediction mode.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: using as prediction information a bi-dimensional vector associated with the last block of the current image encoded according to the INTRA Block Copy prediction mode.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: using a bi-dimensional vector associated with the block located to the left of the current block.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: using a bi-dimensional vector associated with the block located above the current block.
In an embodiment, determined prediction information comprises a plurality of bi-dimensional vectors, and one bi-dimensional vector is selected among the plurality of bi-dimensional vectors.
In an embodiment, the current image being divided into coding entities of pixels of equal size, each block of pixels belonging to one coding entity, determining prediction information for this bi-dimensional vector comprises: using a bi-dimensional vector associated with blocks located within the same coding entity.
In an embodiment, the location of the predictor block for the current block being restricted to a given window, determining prediction information for this bi-dimensional vector comprises: determining this prediction information as invalid if it points outside said window.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: determining this prediction information as invalid if it points to a block overlapping with the current block.
In an embodiment, determining prediction information for this bi-dimensional vector comprises: determining this prediction information as invalid if its value is nil.
According to another aspect of the invention there is provided a method for encoding video data comprising a method for encoding a bi-dimensional vector according to the invention.
According to another aspect of the invention there is provided a method for decoding video data comprising a method for decoding a bi-dimensional vector according to the invention.
According to another aspect of the invention there is provided a device for encoding a bi-dimensional vector representing a location in a current image, comprising a plurality of blocks of pixels, at least one encoding mode used for encoding the blocks of pixels being an INTRA Block Copy prediction mode consisting in predicting a current block in the current image using a predictor block corresponding to a reconstructed actual block of the current image, the device for encoding comprising: a prediction module for determining prediction information for this bi-dimensional vector; an encoding module for encoding the bi-dimensional vector with respect to the determined prediction information.
According to another aspect of the invention there is provided a device for decoding a bi-dimensional vector representing a location in a current image, comprising a plurality of blocks of pixels, at least one encoding mode used for encoding the blocks of pixels being an INTRA Block Copy prediction mode consisting in predicting a current block in the current image using a predictor block corresponding to a reconstructed actual block of the current image, the device for decoding comprising: a prediction module for determining prediction information for this bi-dimensional vector; a decoding module for decoding the bi-dimensional vector with respect to the determined prediction information.
According to another aspect of the invention there is provided a computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to the invention, when loaded into and executed by the programmable apparatus.
According to another aspect of the invention there is provided a non-transitory computer-readable storage medium storing instructions of a computer program for implementing a method according to the invention.
According to another aspect of the invention there is provided an information storage means, readable by a computer or a microprocessor, storing instructions of a computer program which make it possible to implement the method according to the invention.
At least parts of the methods according to the invention may be computer implemented. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the present invention can be implemented in software, the present invention can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which:
Figure 1 illustrates the HEVC encoder architecture;
Figure 2 illustrates the HEVC decoder architecture;
Figure 3 illustrates the neighboring blocks used to generate motion vector predictors in the AMVP and Merge modes of HEVC;
Figure 4 illustrates the derivation process of motion vector predictors in AMVP;
Figure 5 illustrates the derivation process of motion candidates in Merge;
Figure 6 illustrates the splitting of a Coding Tree Block into Coding Units and the scan-order decoding of these Coding Units;
Figure 7 illustrates the concept of the causal area;
Figure 8 illustrates the coding of a motion vector difference in HEVC;
Figure 9 illustrates the concept of the causal area and its restrictions in the case of INTRA Block Copy;
Figure 10 illustrates the concept of the last decoded vector in an embodiment of the present invention;
Figure 11 illustrates one vector prediction scheme for INTRA Block Copy in an embodiment of the present invention;
Figure 12 illustrates a second vector prediction scheme for INTRA Block Copy in an embodiment of the present invention;
Figure 13 is a schematic block diagram of a computing device for implementation of one or more embodiments of the invention.
Figure 1 illustrates the HEVC encoder architecture. In the video encoder, an original sequence 101 is divided into blocks of pixels 102. A coding mode is then assigned to each block. There are two families of coding modes typically used in HEVC: the modes based on spatial prediction, or INTRA modes 103, and the modes based on temporal prediction, or INTER modes, based on motion estimation 104 and motion compensation 105. An INTRA Coding Unit is generally predicted from the encoded pixels at its causal boundary by a process called INTRA prediction.
Temporal prediction first consists in finding, in a previous or future frame called the reference frame 116, the reference area closest to the Coding Unit, in a motion estimation step 104. This reference area constitutes the predictor block. Next, this Coding Unit is predicted using the predictor block to compute the residue, in a motion compensation step 105.
In both cases, spatial and temporal prediction, a residual is computed by subtracting the predictor block from the original Coding Unit.
In the INTRA prediction, a prediction direction is encoded. In the temporal prediction, at least one motion vector is encoded. However, in order to further reduce the bitrate cost related to motion vector encoding, a motion vector is not directly encoded. Indeed, assuming that motion is homogeneous, it is particularly advantageous to encode a motion vector as a difference between this motion vector and a motion vector in its surroundings. In the H.264/AVC coding standard, for instance, motion vectors are encoded with respect to a median vector computed from 3 blocks located above and to the left of the current block. Only the difference, also called the residual motion vector, computed between the median vector and the current block motion vector is encoded in the bitstream. This is processed in the "Mv prediction and coding" module 117. The value of each encoded vector is stored in the motion vector field 118. The neighboring motion vectors, used for the prediction, are extracted from the motion vector field 118.
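The H.264/AVC-style median prediction described above can be sketched as follows (a simplified model that ignores the availability rules for the neighbouring blocks):

```python
def median_predictor(mv_left, mv_above, mv_above_right):
    """Each component of the predictor is the median of the corresponding
    components of three neighbouring motion vectors; only the residual
    (mv - predictor) is then coded in the bitstream."""
    def median3(a, b, c):
        return sorted((a, b, c))[1]
    return (median3(mv_left[0], mv_above[0], mv_above_right[0]),
            median3(mv_left[1], mv_above[1], mv_above_right[1]))
```

When neighbouring motion is homogeneous, the predictor is close to the actual vector and the residual is near zero, which is cheap to code.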
Then, the mode optimizing the rate distortion performance is selected in module 106. In order to further reduce the redundancies, a transform, typically a DCT, is applied to the residual block in module 107, and a quantization is applied to the coefficients in module 108. The quantized block of coefficients is then entropy coded in module 109 and the result is inserted in the bitstream 110.
The encoder then performs a decoding of the encoded frame for the future motion estimation in modules 111 to 116. These steps allow the encoder and the decoder to have the same reference frames. To reconstruct the coded frame, the residual is inverse quantized in module 111 and inverse transformed in module 112 in order to provide the "reconstructed" residual in the pixel domain. According to the encoding mode (INTER or INTRA), this residual is added to the INTER predictor 114 or to the INTRA predictor 113.
Then, this first reconstruction is filtered in module 115 by one or several kinds of post filtering. These post filters are integrated in the encoding and decoding loop. This means that they need to be applied to the reconstructed frame at both the encoder and decoder side so that the same reference frame is used on both sides. The aim of this post filtering is to remove compression artifacts.
Figure 2 represents the principle of a decoder. The video stream 201 is first entropy decoded in a module 202. The residual data are then inverse quantized in a module 203 and inverse transformed in a module 204 to obtain pixel values. The mode data are also entropy decoded and, depending on the mode, an INTRA type decoding or an INTER type decoding is performed. In the case of INTRA mode, an INTRA predictor is determined as a function of the INTRA prediction mode specified in the bitstream 205. If the mode is INTER, the motion information is extracted from the bitstream 202. This is composed of the reference frame index and the motion vector residual. The motion vector predictor is added to the motion vector residual to obtain the motion vector 210.
The motion vector is then used to locate the reference area in the reference frame 206. Note that the motion vector field data 211 is updated with the decoded motion vector in order to be used for the prediction of the next decoded motion vectors. This first reconstruction of the decoded frame is then post filtered 207 with exactly the same post filter as used at the encoder side. The output of the decoder is the decompressed video 209.
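The decoder-side vector reconstruction and field update can be sketched as follows (an illustrative model; the dictionary-based motion vector field is an assumption, not the codec's actual data structure):

```python
def reconstruct_mv(mv_predictor, mv_residual, mv_field, block_pos):
    """Decoder-side motion vector reconstruction: add the residual parsed
    from the bitstream to the predictor, then store the result in the
    motion vector field so it can predict subsequent blocks."""
    mv = (mv_predictor[0] + mv_residual[0],
          mv_predictor[1] + mv_residual[1])
    mv_field[block_pos] = mv
    return mv
```

The same operation applies to an INTRA Block Copy displacement vector once its prediction is introduced: the decoder adds the decoded residual to the determined prediction information.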
The INTRA Block Copy coding mode is particularly well suited to extremely repetitive patterns. In particular, it is known to help code graphical elements such as glyphs (the graphical representation of a character) or traditional GUI elements, which are very difficult to code using traditional INTRA prediction methods.
It is worth noting that prediction is based on coherence between neighbor Coding Units. This coherence may be geographic when considered within the current frame, or temporal when considered across successive frames. This kind of coherence occurs in natural images. As the INTRA Block Copy encoding mode is seen as a mode dedicated to text or symbolic images, prediction is thought to be useless for this kind of image. For instance, there is no reason for two successive Coding Units in an image representing text to have good predictors close to each other. The first Coding Unit may be part of a letter "A", so a good predictor block would come from another "A" in the text, while the next Coding Unit may be a letter "F" having a predictor block from another "F" in the text. There is no reason, a priori, for the two predictor blocks to be in the same neighborhood. This is why the prior art does not contemplate introducing prediction in the INTRA Block Copy encoding mode.
Figure 3 illustrates spatial and temporal blocks that can be used to generate motion vector predictors in Advanced Motion Vector Prediction (AMVP) and Merge modes of HEVC coding and decoding systems and Figure 4 shows simplified steps of the process of the AMVP predictor set derivation.
Two predictors, i.e. the two spatial motion vectors of the AMVP mode, are chosen among the top blocks and the left blocks, including the top corner blocks and left corner block, and one predictor is chosen among the bottom right block and center block of the collocated block, as represented in Figure 3.
Turning to Figure 4, a first step aims at selecting a first spatial predictor (Cand 1, 406) among the bottom-left blocks A0 and A1, whose spatial positions are illustrated in Figure 3. To that end, these blocks are selected (400, 402) one after another, in the given order, and, for each selected block, the following conditions are evaluated (404) in the given order, the first block for which a condition is fulfilled being set as the predictor:
-the motion vector from the same reference list and the same reference image;
-the motion vector from the other reference list and the same reference image;
-the scaled motion vector from the same reference list and a different reference image; or
-the scaled motion vector from the other reference list and a different reference image.
If no value is found, the left predictor is considered as being unavailable. In this case, it indicates that the related blocks were INTRA coded or that those blocks do not exist.
A following step aims at selecting a second spatial predictor (Cand 2, 416) among the above right block B0, the above block B1, and the above left block B2, whose spatial positions are illustrated in Figure 3. To that end, these blocks are selected (408, 410, 412) one after another, in the given order, and, for each selected block, the above-mentioned conditions are evaluated (414) in the given order, the first block for which one of the above-mentioned conditions is fulfilled being set as the predictor.
Again, if no value is found, the top predictor is considered as being unavailable. In this case, it indicates that the related blocks were INTRA coded or that those blocks do not exist.
In a next step (418), the two predictors, if both are available, are compared to each other in order to remove one of them if they are equal (i.e. same motion vector values, same reference list, same reference index and same direction type). If only one spatial predictor is available, the algorithm looks for a temporal predictor in a following step.
The temporal motion predictor (Cand 3, 426) is derived as follows: the bottom right (H, 420) position of the collocated block in a previous frame is first considered in the availability check module 422. If it does not exist or if the motion vector predictor is not available, the center of the collocated block (Center, 424) is selected to be checked. These temporal positions (Center and H) are depicted in Figure 3.
The motion predictor value is then added to the set of predictors.
Next, the number of predictors (Nb_Cand) is compared (428) to the maximum number of predictors (Max_Cand). As mentioned above, the maximum number of motion vector predictors (Max_Cand) that the derivation process of AMVP needs to generate is two in the current version of the HEVC standard.
If this maximum number is reached, the final list or set of AMVP predictors (432) is built. Otherwise, a zero predictor is added (430) to the list.
The zero predictor is a motion vector equal to (0,0).
As illustrated in Figure 4, the final list or set of AMVP predictors (432) is built from a subset of spatial motion predictors (400 to 412) and from a subset of temporal motion predictors (420, 424).
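The AMVP derivation described above can be sketched in code. This is an illustrative simplification, not the normative HEVC process: block availability and the four matching conditions evaluated at steps 404 and 414 are folded into a caller-supplied `get_mv` callback (a convention of ours) that returns a motion vector tuple, or None when no vector is available at a position.

```python
MAX_CAND = 2  # AMVP generates at most two predictors in HEVC

def first_available(blocks, get_mv):
    # Scan candidate positions in the given order and return the first
    # available motion vector, or None if none is available.
    for block in blocks:
        mv = get_mv(block)
        if mv is not None:
            return mv
    return None

def derive_amvp(get_mv):
    candidates = []
    # Cand 1 (406): bottom-left blocks A0 then A1 (steps 400, 402, 404)
    cand1 = first_available(("A0", "A1"), get_mv)
    if cand1 is not None:
        candidates.append(cand1)
    # Cand 2 (416): above blocks B0, B1, B2 (steps 408 to 414)
    cand2 = first_available(("B0", "B1", "B2"), get_mv)
    # Step 418: drop the second spatial predictor if it equals the first
    if cand2 is not None and cand2 != cand1:
        candidates.append(cand2)
    # Cand 3 (426): temporal predictor, H position (420) then centre (424)
    if len(candidates) < MAX_CAND:
        cand3 = first_available(("H", "Center"), get_mv)
        if cand3 is not None:
            candidates.append(cand3)
    # Step 430: pad with the zero predictor until Max_Cand is reached
    while len(candidates) < MAX_CAND:
        candidates.append((0, 0))
    return candidates[:MAX_CAND]
```

For example, if A1 and B1 carry the same vector and the H position is available, the set becomes the A1 vector followed by the temporal vector, mirroring the pruning and temporal steps of Figure 4.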
As mentioned above, a motion predictor candidate of Merge mode or of Merge Skip mode represents all the required motion information: direction, list, reference frame index, and motion vectors. An indexed list of several candidates is generated by a merge derivation process. In the current HEVC design the maximum number of candidates for both Merge modes is equal to five (4 spatial candidates and 1 temporal candidate).
Figure 5 is a schematic of the motion vector derivation process of the Merge modes. In a first step of the derivation process, five block positions are considered (500 to 508). These positions are the spatial positions depicted in Figure 3 with references A1, B1, B0, A0, and B2. In a following step, the availability of the spatial motion vectors is checked and at most five motion vectors are selected (510). A predictor is considered as available if it exists and if the block is not INTRA coded. The motion vectors corresponding to the five blocks are therefore selected as candidates according to the following conditions: -if the "left" A1 motion vector (500) is available (510), i.e. if it exists and if this block is not INTRA coded, the motion vector of the "left" block is selected and used as the first candidate in the list of candidates (514); -if the "above" B1 motion vector (502) is available (510), the candidate "above" block motion vector is compared to the A1 motion vector (512), if it exists. If the B1 motion vector is equal to the A1 motion vector, B1 is not added to the list of spatial candidates (514). Otherwise, B1 is added to the list of spatial candidates (514); -if the "above right" B0 motion vector (504) is available (510), the motion vector of the "above right" block is compared to the B1 motion vector (512). If the B0 motion vector is equal to the B1 motion vector, the B0 motion vector is not added to the list of spatial candidates (514).
Otherwise, the B0 motion vector is added to the list of spatial candidates (514); -if the "below left" A0 motion vector (506) is available (510), the motion vector of the "below left" block is compared to the A1 motion vector (512). If the A0 motion vector is equal to the A1 motion vector, the A0 motion vector is not added to the list of spatial candidates (514). Otherwise, the A0 motion vector is added to the list of spatial candidates (514); and -if the list of spatial candidates does not contain four candidates, the availability of the "above left" B2 motion vector (508) is checked (510). If it is available, it is compared to the A1 motion vector and to the B1 motion vector. If the B2 motion vector is equal to the A1 motion vector or to the B1 motion vector, the B2 motion vector is not added to the list of spatial candidates (514). Otherwise, the B2 motion vector is added to the list of spatial candidates (514).
At the end of this stage, the list of spatial candidates comprises up to four candidates.
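The spatial candidate selection and partial pruning described above can be sketched as follows. As before, this is an illustrative simplification with a `get_mv` callback of our own convention; note that only the pairwise comparisons listed above are performed (B1 vs A1, B0 vs B1, A0 vs A1, and B2 vs A1 and B1), not a full deduplication of the list.

```python
def merge_spatial_candidates(get_mv):
    # Fetch the vectors of positions A1, B1, B0, A0, B2 (Figure 3);
    # None means the position is unavailable or INTRA coded.
    a1, b1, b0, a0, b2 = (get_mv(b) for b in ("A1", "B1", "B0", "A0", "B2"))
    candidates = []
    if a1 is not None:
        candidates.append(a1)                 # first candidate (514)
    if b1 is not None and b1 != a1:
        candidates.append(b1)                 # B1 pruned against A1
    if b0 is not None and b0 != b1:
        candidates.append(b0)                 # B0 pruned against B1
    if a0 is not None and a0 != a1:
        candidates.append(a0)                 # A0 pruned against A1
    # B2 is considered only if fewer than four candidates were found
    if len(candidates) < 4 and b2 is not None and b2 != a1 and b2 != b1:
        candidates.append(b2)
    return candidates                         # up to four spatial candidates
```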
For the temporal candidate, two positions can be used: the bottom right position of the collocated block (516, denoted H in Figure 3) and the center of the collocated block (518).
As for the AMVP motion vector derivation process, a first step aims at checking (520) the availability of the block at the H position. Next, if it is not available, the availability of the block at the center position is checked (520). If at least one motion vector of these positions is available, the temporal motion vector can be scaled (522), if needed, to the reference frame having index 0, for both lists L0 and L1, in order to create a temporal candidate (524) which is added to the list of Merge motion vector predictor candidates. It is positioned after the spatial candidates in the list.
If the number (Nb_Cand) of candidates is strictly less (526) than the maximum number of candidates (Max_Cand, whose value is signaled in the bitstream slice header and is equal to five in the current HEVC design) and if the current frame is of the B type, combined candidates are generated (528).
Combined candidates are generated based on the available candidates of the list of Merge motion vector predictor candidates. This mainly consists in combining the motion vector of one candidate of list L0 with the motion vector of one candidate of list L1.
If the number (Nb_Cand) of candidates remains strictly less (530) than the maximum number of candidates (Max_Cand), zero motion candidates are generated (532) until the number of candidates of the list of Merge motion vector predictor candidates reaches the maximum number of candidates.
At the end of this process, the list or set of Merge motion vector predictor candidates is built (534).
As illustrated in Figure 5, the list or set of Merge motion vector predictor candidates is built (534) from a subset of spatial candidates (500 to 508) and from a subset of temporal candidates (516, 518).
Figure 6 illustrates the splitting of a Coding Tree Block into Coding Units and the scan order for decoding these Coding Units. In the HEVC standard, the block structure is organized in Coding Tree Blocks (CTB). A frame contains several non-overlapping square Coding Tree Blocks. The size of a Coding Tree Block can range from 16x16 to 64x64. This size is determined at sequence level. The most efficient size, in terms of coding efficiency, is the largest one: 64x64. Please note that all Coding Tree Blocks have the same size except at the image border, where the size is adapted according to the number of pixels.
Each Coding Tree Block contains one or more square Coding Units (CU).
The Coding Tree Block is split based on a quad-tree structure into several Coding Units. The coding or decoding order of each Coding Unit in the Coding Tree Block follows the quad-tree structure based on a raster scan order. Figure 6 shows an example of the decoding order of Coding Units. In this figure, the number in each Coding Unit gives the decoding order of each Coding Unit of this Coding Tree Block.
Figure 7 illustrates how this prediction method works.
At a high level, an image is divided into Coding Units that are encoded in raster scan order. Thus, when coding block 7.1, all the blocks of area 7.3 have already been encoded and can be considered available to the encoder. Area 7.3 is called the causal area of the Coding Unit 7.1. Once Coding Unit 7.1 is encoded, it will belong to the causal area for the next Coding Unit. This next Coding Unit, as well as all following ones, belongs to area 7.4, illustrated as a dotted area, and cannot be used for coding the current Coding Unit 7.1. It is worth noting that the causal area consists of reconstructed blocks. The information used to encode a given Coding Unit is not the original blocks of the image, because this information is not available at decoding. The only information available at decoding is the reconstructed version of the blocks of pixels in the causal area, namely the decoded version of these blocks. For this reason, at encoding, previously encoded blocks of the causal area are decoded to provide this reconstructed version of these blocks.
INTRA Block Copy works by signaling a block 7.2 in the causal area which should be used to produce a prediction of block 7.1. In the HEVC Range Extension draft specifications (at the time of writing, Draft 4 from document JCTVC-N1005-v3), this block is indicated by a displacement vector 7.5, and is transmitted in the bitstream.
This displacement vector is the difference in coordinates between a particular point of the Coding Unit 7.1 and the equivalent point in the predictor block 7.2. Although it would be possible to use subpixel accuracy as for INTER blocks, this displacement is typically in integer units of pixels, so as not to require costly subpixel interpolation. This vector is currently coded in the simplest way: it is not predicted, and its coordinates are coded using the regular HEVC motion vector difference (MVD) coding.
Put in a simple way, the motion vector difference coding consists, for a value d, in coding whether d is zero, and if not, its sign and its magnitude minus 1. In HEVC, motion vector difference coding interleaves the x and y components of the vector.
The motion vector difference coding is illustrated in Figure 8. Please note that each part of the encoding (or decoding) of a component is interleaved with the corresponding part for the other component, and therefore no shortcut can be taken when one component is null. Let the abscissa of the vector to encode be noted MVx, and its ordinate MVy. The Boolean value indicating whether MVx is null is coded during step 8.0, followed by the Boolean value indicating whether MVy is null in step 8.1. Then, step 8.2 checks whether MVx is non-null (which is equivalent to testing whether its magnitude is at least 1, as the Boolean indicating whether it is null has already been encoded), in which case the sign of MVx is coded in step 8.3. In all cases, step 8.4 follows: it checks whether MVy is non-null. If such is the case, step 8.5 codes the sign of MVy.
Then, in all cases, step 8.6 occurs: it checks whether MVx is non-null, in which case the magnitude of MVx minus 1 is coded during step 8.7. In all cases, step 8.8 follows, which checks whether MVy is non-null. If such is the case, step 8.9 occurs and the magnitude of MVy minus 1 is coded. In all cases, the MVD coding then ends at step 8.10.
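The interleaved ordering of steps 8.0 to 8.10 can be sketched as follows. The `emit` callback stands in for the entropy coding of one syntax element; recording labelled values instead of actual bins is our simplification of the real coding.

```python
def code_mvd(mvx, mvy, emit):
    # The x and y parts are interleaved: both null flags first, then
    # both signs, then both magnitudes-minus-one (steps 8.0 to 8.10).
    emit(("x_is_zero", mvx == 0))              # step 8.0
    emit(("y_is_zero", mvy == 0))              # step 8.1
    if mvx != 0:                               # step 8.2
        emit(("x_sign", mvx < 0))              # step 8.3
    if mvy != 0:                               # step 8.4
        emit(("y_sign", mvy < 0))              # step 8.5
    if mvx != 0:                               # step 8.6
        emit(("x_mag_minus_1", abs(mvx) - 1))  # step 8.7
    if mvy != 0:                               # step 8.8
        emit(("y_mag_minus_1", abs(mvy) - 1))  # step 8.9
    # step 8.10: end of MVD coding
```

The decoder follows the same control flow, reading each element instead of writing it.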
It is not necessary to describe the corresponding decoder in detail: one just needs to replace any occurrence of "encode" with "decode" in the two previous paragraphs.
Figure 9 illustrates the definition of the causal area and some of the restrictions that can apply in HEVC.
In order to encode the Coding Unit 9.0, the encoder must evaluate different possible predictor blocks in the causal area. Because this is a significantly complex task, as is the case for motion estimation for INTER blocks, a search window 9.1 must be defined. This window defines the actual set of positions that can be tested. The causal area is the part of the image that has already been reconstructed when contemplating the reconstruction of the current Coding Unit. Pixels belonging to the causal area are available for the reconstruction of the current Coding Unit. By definition, it does not comprise the current Coding Unit.
Here, we have illustrated the case chosen by a specific implementation of an HEVC encoder: window 9.1 is made of the current Coding Tree Block 9.10 and its two neighboring Coding Tree Blocks 9.11 and 9.12 on the left.
When considering a potential position and the corresponding prediction block 9.2 and its associated vector 9.3, one can distinguish the cases where only causal data is accessed. Block 9.2 does not overlap block 9.0, but block 9.4 does. Therefore, part of block 9.4 does not use causal pixels, which is not possible for INTRA Block Copy. Block 9.4 cannot be used as a proper predictor block for the prediction of the Coding Unit 9.0.
Inventors have found that, despite the a-priori lack of prediction that could be expected for symbolic images, adding a step of prediction for the displacement vector actually leads to an improvement of the encoding of this displacement vector. Moreover, the proposed embodiments have a small impact on memory consumption which is advantageous to provide simple and affordable decoders.
When looking for predictors for the displacement vector, the information that may be made available is the determined displacement vectors for previously decoded Coding Units. But, to be useful as predictor for the current displacement vector, those previously determined vectors are subject to several restrictions.
In a first embodiment, the predictor is selected only among vectors associated with Coding Units contained in the current Coding Tree Block. This avoids storing the vectors from other Coding Tree Blocks. For example, if it is needed to access a vector of the above Coding Tree Block, the vectors of one whole line of blocks need to be stored. These blocks correspond to all the blocks at the bottom of each Coding Tree Block, which need to be stored before being used for the prediction of the next Coding Tree Block line. For hardware, memory needs to be designed for the worst case. This means that a vector needs to be stored for each 8x8 block (the smallest size for the INTRA Block Copy mode) at the bottom of the Coding Tree Blocks. So for an HD sequence (1920x1280), 1920/8=240 vectors need to be stored to access the above vector outside the Coding Tree Block. Please note that access to the left Coding Tree Block (with a size of 64x64) requires storing only 64/8=8 vectors in the worst case. So, in another embodiment, the predictor can be selected in the left Coding Tree Block but advantageously not in the above Coding Tree Block.
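The worst-case storage figures above follow from a simple division, sketched here (the function name and parameters are ours, for illustration):

```python
def worst_case_vector_storage(frame_width, ctb_size, min_ibc_block=8):
    # One vector per smallest (8x8) INTRA Block Copy block must be kept:
    # along the bottom row of the CTB line above (whole frame width),
    # versus along the right column of the left CTB (one CTB height).
    above = frame_width // min_ibc_block
    left = ctb_size // min_ibc_block
    return above, left
```

For a 1920-pixel-wide sequence and 64x64 Coding Tree Blocks this gives 240 vectors for the above CTB line but only 8 for the left CTB, which is why the left neighbor is far cheaper to allow.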
Moreover, this first embodiment (vector predictor only from the current Coding Tree Block) offers, at the encoder side, a better parallelization of the INTRA Block Copy estimation (the estimation can be parallelized over all Coding Tree Blocks). In the following, we consider that the process receives a value for the vector (x, y) to be predicted.
In an embodiment, a predictor value for the current displacement vector is considered as invalid if it leads, for the current Coding Unit, to a displacement outside the window considered as the causal area. This may occur because the value of the displacement vector is relative to the coordinates of the associated Coding Unit. It may happen that a displacement vector points to a block inside the window for its associated Coding Unit but points to a block outside the window for the current Coding Unit.
In another embodiment, a predictor value is considered invalid if the block it points to overlaps the current Coding Unit or a non-decoded area.
In yet another embodiment, a predictor value of (0, 0) is considered as invalid. This nil value may have been assigned as a default value when no predictor block could be found. This may be the case for the first Coding Unit in the frame, for instance.
These three embodiments may be combined in order to determine the validity of a given candidate to be a predictor for the displacement vector.
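The three validity tests combined might look as follows. This is a sketch under simplifying assumptions of ours: the pointed block is taken to be the same size as the (square) current Coding Unit, coordinates and window bounds follow our own convention, and the "non-decoded area" check of the second embodiment is reduced to overlap with the current Coding Unit itself.

```python
def is_valid_predictor(vec, cu_pos, cu_size, window):
    # vec: candidate predictor (vx, vy); cu_pos: (x, y) top-left of the
    # current CU; cu_size: side of the square CU; window: (x0, y0, x1, y1)
    # inclusive bounds of the causal search window.
    vx, vy = vec
    if (vx, vy) == (0, 0):
        return False                      # third embodiment: nil value
    x, y = cu_pos
    px, py = x + vx, y + vy               # top-left of the pointed block
    x0, y0, x1, y1 = window
    if px < x0 or py < y0 or px + cu_size - 1 > x1 or py + cu_size - 1 > y1:
        return False                      # first embodiment: outside window
    if (px + cu_size > x and px < x + cu_size
            and py + cu_size > y and py < y + cu_size):
        return False                      # second embodiment: overlaps the CU
    return True
```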
In one embodiment, the neighboring vectors can be considered for the prediction of the current INTRA Block Copy block. As depicted in Figure 10, the left block number 15 or the above block number 12 can provide the predictor of the vector of the current block. In one embodiment, only the left block is considered because it gives the most interesting gain. Indeed, according to the restriction of the search window, the vector pointing to the selected block is likely to have a larger horizontal component than vertical component. Moreover, as described in the next section, the vertical displacement is more limited by the search window than the horizontal displacement.
In another embodiment, the vector predictor is equal to (-2N, 0) where N is the block size. For INTER motion prediction, the default motion vector predictor is (0, 0). For the INTRA Block Copy method, the default vector should be (-aN, 0) with a>1, where N is the size of the block. Indeed, with this condition, it is certain that the block predictor does not overlap a non-encoded area (with (0, 0), the predictor overlaps the current block). In a preferred embodiment, a is set equal to 2. Moreover, the y component is set to 0, because this avoids pointing outside the search window for the top blocks of the Coding Tree Block. Without this restriction of the search window, other values can be considered, such as (-aN, -aN) or (0, -aN). In one embodiment, the predictor is always (-2N, 0). This embodiment corresponds to the choice of a default predictor without any other consideration. In another embodiment, the value N is set to an average block size; for example, it is fixed to 16.
In another embodiment, the selected predictor is the last decoded vector from an INTRA Block Copy Coding Unit. For example, in Figure 10, in this embodiment the predictor of the current Coding Unit is the vector of block 16 and not the vector of the neighboring blocks 12, 13, 15 or 17, because these blocks are coded in a classical INTRA mode and do not contain any vector. So with this solution a predictor is always obtained even if the neighboring blocks are not INTRA Block Copy coded. The main advantage of using this predictor compared to the use of the left or above vector is the memory reduction.
Indeed, with this solution only one vector needs to be stored instead of 63 in the worst case for the above and left predictors. When the restriction on the selection of the predictor inside the Coding Tree Block is used, no vector is available before a first INTRA Block Copy block has been decoded. In one embodiment, a default predictor is then made available and set to (-2N, 0) (exceptions can be considered for the first column of Coding Tree Blocks, because the predictor (-2N, 0) would point outside the image).
In one embodiment, when the INTRA Block Copy mode is used in an INTER frame, the motion vector of a neighboring INTER mode (Merge or Inter) is used as the predictor for the current INTRA Block Copy block. If several INTER vectors are available, the vector with the nearest reference frame is selected. But if an INTRA Block Copy vector is available, it is selected.
In another embodiment, the temporal block, meaning the collocated block in the reference image, is considered to define a vector predictor, especially if it is an INTRA Block Copy.
In one embodiment, the vector predictor can depend on the INTRA direction of the neighboring blocks. (The direction of an INTRA mode is the direction used to propagate or interpolate the neighboring pixels in order to build the block prediction. In HEVC, 33 directional modes are defined in addition to the DC and planar modes.) In this embodiment, an input vector is considered.
The aim is to change this vector or to add a vector to a list of predictors. The INTRA direction of one or more neighboring blocks is determined. If the direction, or respectively an average (or median) direction, is nearer to the horizontal direction than to the vertical direction, then the y component of the predictor vector (x, y) is set equal to 0. Conversely, if the direction is nearer to the vertical direction, the x component is set to 0. The advantage of this embodiment is that, if a horizontal edge is detected on the neighboring left block, it is interesting in terms of coding efficiency to avoid vertical displacement by setting y to 0 for the INTRA Block Copy vector, giving (x, 0). The same remark can be made for a vertical edge. In an additional embodiment, only the neighboring left blocks are considered to determine a horizontal INTRA direction and only the above blocks are considered to determine a vertical INTRA direction. In an additional embodiment, only the horizontal and vertical directions are considered useful to determine the INTRA direction. Please note that the direction can be computed based on the gradient instead of the INTRA direction.
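The component-zeroing rule above can be sketched as follows. The angle convention (0 degrees for horizontal, 90 for vertical) and the 45-degree split are our illustrative choices; an actual implementation would work from the HEVC intra mode indices rather than angles.

```python
def constrain_predictor(pred, intra_angle_deg):
    # pred: input predictor vector (x, y); intra_angle_deg: dominant
    # INTRA direction of the neighboring block(s), 0 = horizontal,
    # 90 = vertical.
    x, y = pred
    angle = intra_angle_deg % 180
    if angle < 45 or angle > 135:
        return (x, 0)   # near-horizontal edge: suppress vertical displacement
    return (0, y)       # near-vertical edge: suppress horizontal displacement
```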
In the first embodiment, only one predictor is considered. Yet several vectors can be considered, in turn, as the predictor if the first one is not available.
Figure 11 illustrates this process. In this figure, the Input vector (1109) will be predicted.
For example, the predictor Pred 1 (1101) is the above vector, the predictor Pred 2 (1104) is the left vector and the predictor Pred 3 (1107) is the default predictor (-2N, 0).
In the preferred embodiment, the restrictions are the ones described above, namely the first one is to verify that the predictor does not point outside the window and the second one that the pointed block does not overlap with the current Coding Unit or non-decoded pixels. Other restrictions may be contemplated. In the same way, only one restriction can be used.
First, the restriction number 1 on the predictor 1 (1101) is tested (1102).
If this predictor respects the condition, the restriction number 2 (1103) is tested.
If at least one of these restrictions (1102) (1103) was false, the same restrictions 1 (1105) and 2 (1106) are tested on the second predictor (1104). If at least one of these restrictions (1105) (1106) was false, the third predictor (1107) is considered. This third predictor is a default predictor which is set to a value that always respects restrictions 1 and 2. When a predictor is selected, it is added (1110) to the decoded residual (1109) in order to obtain the vector (1111) needed to find the prediction of the INTRA Block Copy mode (1112).
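The fallback chain of Figure 11 might be sketched as follows, with the two restrictions folded into a single `is_valid` callback (our abstraction) and unavailable predictors represented by None:

```python
def select_predictor(pred1, pred2, default, is_valid):
    # Try Pred 1 (1101) then Pred 2 (1104); the default Pred 3 (1107),
    # e.g. (-2N, 0), always satisfies restrictions 1 and 2 by construction
    # and therefore closes the chain.
    for pred in (pred1, pred2):
        if pred is not None and is_valid(pred):
            return pred
    return default

def reconstruct_vector(residual, predictor):
    # Step 1110: the selected predictor is added to the decoded residual
    # (1109) to recover the INTRA Block Copy vector (1111).
    return (residual[0] + predictor[0], residual[1] + predictor[1])
```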
The second embodiment is a competition-based scheme of predictors for the INTRA Block Copy mode. Figure 12 shows the decoding process of this embodiment. The modules 1201 to 1212 have the same functioning as the modules 1101 to 1112, respectively. The difference is that the three predictors are added to a list (1214). Moreover, the decoded predictor index i is extracted from the bitstream (1213). According to this number i, the ith predictor is extracted (1215) from the list of predictors (1214). This predictor is used to reconstruct the vector of the current INTRA Block Copy block from its residual. Please note that the modules (1202, 1203, 1205, 1206) are slightly different from the modules (1102, 1103, 1105, 1106), respectively. Indeed, they do not imply an action when the predictor is not valid according to restrictions 1 and 2.
Please note that the list of predictors is never empty because the default predictor Pred 3 is always added to the list. Moreover, it is possible to avoid coding the predictor index in some cases in order to save bitrate. For example, if all predictors are equal, the index does not need to be inserted into the bitstream, nor, of course, extracted from it. In the same way, if some of them are equal, the number of bits needed to code the index can be limited.
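The list construction of this competition-based embodiment might be sketched as follows, again with unavailable or invalid predictors represented by None (our convention). Deduplication shows how the index signalling can be skipped when a single distinct predictor remains:

```python
def build_predictor_list(pred1, pred2, default):
    # Invalid predictors are simply skipped rather than triggering a
    # fallback; the always-valid default closes the list, so the list
    # is never empty.
    candidates = [p for p in (pred1, pred2) if p is not None]
    candidates.append(default)
    # Order-preserving deduplication: if only one distinct predictor
    # remains, no index needs to be transmitted at all.
    unique = list(dict.fromkeys(candidates))
    index_needed = len(unique) > 1
    return unique, index_needed
```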
In a preferred embodiment the maximum number of predictors is 2 even if the list can contain 3 or more predictors.
Figure 13 is a schematic block diagram of a computing device 1300 for the implementation of one or more embodiments of the invention. The computing device 1300 may be a device such as a micro-computer, a workstation or a light portable device. The computing device 1300 comprises a communication bus connected to: -a central processing unit 1301, such as a microprocessor, denoted CPU; -a random access memory 1302, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method for encoding or decoding at least part of an image according to embodiments of the invention, the memory capacity thereof being expandable by an optional RAM connected to an expansion port for example; -a read only memory 1303, denoted ROM, for storing computer programs for implementing embodiments of the invention; -a network interface 1304, typically connected to a communication network over which digital data to be processed are transmitted or received.
The network interface 1304 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data packets are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 1301; -a user interface 1305 may be used for receiving inputs from a user or to display information to a user; -a hard disk 1306 denoted HD may be provided as a mass storage device; -an I/O module 1307 may be used for receiving/sending data from/to external devices such as a video source or display.
The executable code may be stored either in read only memory 1303, on the hard disk 1306 or on a removable digital medium such as for example a disk. According to a variant, the executable code of the programs can be received by means of a communication network, via the network interface 1304, in order to be stored in one of the storage means of the communication device 1300, such as the hard disk 1306, before being executed.
The central processing unit 1301 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 1301 is capable of executing instructions from main RAM memory 1302 relating to a software application after those instructions have been loaded from the program ROM 1303 or the hard-disc (HD) 1306 for example. Such a software application, when executed by the CPU 1301, causes the steps of the flowcharts shown in Figures 11 or 12 to be performed.
Any step of the algorithm shown in Figure 11 or 12 may be implemented in software by execution of a set of instructions or program by a programmable computing machine, such as a PC ("Personal Computer"), a DSP ("Digital Signal Processor") or a microcontroller; or else implemented in hardware by a machine or a dedicated component, such as an FPGA ("Field-Programmable Gate Array") or an ASIC ("Application-Specific Integrated Circuit").
Although the present invention has been described hereinabove with reference to specific embodiments, the present invention is not limited to these specific embodiments, and modifications which lie within the scope of the present invention will be apparent to a person skilled in the art.
Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.

Claims (31)

  1. CLAIMS1. A method for encoding a bi-dimensional vector representing a location in a current image, comprising a plurality of blocks of pixels, at least one encoding mode used for encoding the blocks of pixels being an INTRA Block Copy prediction mode consisting in predicting a current block in the current image using a predictor block corresponding to a reconstructed actual block of the current image, the method comprising: -determining prediction information for this bi-dimensional vector; -encoding the bi-dimensional vector with respect to the determined prediction information.
  2. 2. The method according to claim 1, wherein determining prediction information of this bi-dimensional vector comprises: -using a bi-dimensional vector associated to another block of the current image encoded in a mode using at least a bi-dimensional vector for identifying a predictor block.
  3. 3. The method according to claim 2, wherein determining prediction information of this bi-dimensional vector comprises: -using a bi-dimensional vector associated to another block encoded according to the INTRA Block Copy prediction mode of the current image as prediction information.
  4. 4. The method according to claim 3, wherein determining prediction information of this bi-dimensional vector comprises: -using a bi-dimensional vector associated to the last block of the current image encoded according to the INTRA Block Copy prediction mode as prediction information.
  5. 5. The method according to claim 1, wherein determining prediction information of this bi-dimensional vector comprises: -using a bi-dimensional vector associated with the block located at the left of the current block.
  6. 6. The method according to claim 1, wherein determining prediction information of this bi-dimensional vector comprises: -using a bi-dimensional vector associated with the block located at the above of the current block.
  7. 7. The method according to claim 1 wherein determined prediction information comprises a plurality of bi-dimensional vectors, and one bi-dimensional vector minimising a rate distortion criterion is selected among the plurality of bi-dimensional vectors.
  8. 8. The method according to claims 1 or 2, wherein the current image being divided in coding entities of pixels of equal size, each block of pixels belonging to one coding entity, determining prediction information of this bi-dimensional vector comprises: -using a bi-dimensional vector associated with blocks located within the same coding entity.
  9. 9. The method according to any one claim 1 to 8, wherein the location of the predictor block for the current block being restrained in a given window, determining prediction information of this bi-dimensional vector comprises: -determining this prediction information as invalid if it points outside said window.
  10. 10. The method according to any one claim 1 to 8, wherein determining a prediction information of this bi-dimensional vector comprises: -determining this prediction information as invalid if it points to a block overlapping with the current block.
  11. 11. The method according to any one claim 1 to 8, wherein determining a prediction information of this bi-dimensional vector comprises: -determining this prediction information as invalid if it's value is nil.
12. A method for decoding a bi-dimensional vector representing a location in a current image, comprising a plurality of blocks of pixels, at least one encoding mode used for encoding the blocks of pixels being an INTRA Block Copy prediction mode consisting in predicting a current block in the current image using a predictor block corresponding to a reconstructed actual block of the current image, the method comprising: -determining prediction information for this bi-dimensional vector; -decoding the bi-dimensional vector with respect to the determined prediction information.
13. The method according to claim 12, wherein determining prediction information of this bi-dimensional vector comprises: -using a bi-dimensional vector associated with another block of the current image encoded in a mode using at least a bi-dimensional vector for identifying a predictor block.
14. The method according to claim 13, wherein determining prediction information of this bi-dimensional vector comprises: -using a bi-dimensional vector associated with another block of the current image encoded according to the INTRA Block Copy prediction mode as prediction information.
15. The method according to claim 14, wherein determining prediction information of this bi-dimensional vector comprises: -using a bi-dimensional vector associated with the last block of the current image encoded according to the INTRA Block Copy prediction mode as prediction information.
16. The method according to claim 12, wherein determining prediction information of this bi-dimensional vector comprises: -using a bi-dimensional vector associated with the block located to the left of the current block.
17. The method according to claim 12, wherein determining prediction information of this bi-dimensional vector comprises: -using a bi-dimensional vector associated with the block located above the current block.
18. The method according to claim 12, wherein the determined prediction information comprises a plurality of bi-dimensional vectors, and one bi-dimensional vector is selected from among the plurality of bi-dimensional vectors.
19. The method according to claim 12, wherein, the current image being divided into coding entities of pixels of equal size, each block of pixels belonging to one coding entity, determining prediction information of this bi-dimensional vector comprises: -using a bi-dimensional vector associated with blocks located within the same coding entity.
20. The method according to any one of claims 12 to 19, wherein, the location of the predictor block for the current block being restrained within a given window, determining prediction information of this bi-dimensional vector comprises: -determining this prediction information as invalid if it points outside said window.
21. The method according to any one of claims 12 to 19, wherein determining prediction information of this bi-dimensional vector comprises: -determining this prediction information as invalid if it points to a block overlapping with the current block.
22. The method according to any one of claims 12 to 19, wherein determining prediction information of this bi-dimensional vector comprises: -determining this prediction information as invalid if its value is nil.
23. A method for encoding video data comprising a method for encoding a bi-dimensional vector according to any one of claims 1 to 11.
24. A method for decoding video data comprising a method for decoding a bi-dimensional vector according to any one of claims 12 to 22.
25. A device for encoding a bi-dimensional vector representing a location in a current image, comprising a plurality of blocks of pixels, at least one encoding mode used for encoding the blocks of pixels being an INTRA Block Copy prediction mode consisting in predicting a current block in the current image using a predictor block corresponding to a reconstructed actual block of the current image, the device for encoding comprising: -a prediction module for determining prediction information for this bi-dimensional vector; -an encoding module for encoding the bi-dimensional vector with respect to the determined prediction information.
26. A device for decoding a bi-dimensional vector representing a location in a current image, comprising a plurality of blocks of pixels, at least one encoding mode used for encoding the blocks of pixels being an INTRA Block Copy prediction mode consisting in predicting a current block in the current image using a predictor block corresponding to a reconstructed actual block of the current image, the device for decoding comprising: -a prediction module for determining prediction information for this bi-dimensional vector; -a decoding module for decoding the bi-dimensional vector with respect to the determined prediction information.
27. A computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to any one of claims 1 to 22, when loaded into and executed by the programmable apparatus.
28. A non-transitory computer-readable storage medium storing instructions of a computer program for implementing a method according to any one of claims 1 to 22.
29. An information storage means readable by a computer or a microprocessor, storing instructions of a computer program, making it possible to implement the method according to any one of claims 1 to 22.
30. A method for encoding or decoding a bi-dimensional vector substantially as hereinbefore described with reference to, and as shown in, Figures 1, 2, 11 and 12.
31. A device for encoding or decoding as hereinbefore described with reference to, and shown in, Figures 1, 2, 11 and 12 of the accompanying drawings.
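Read together, claims 9 to 11 (mirrored by claims 20 to 22) amount to three validity tests on a candidate prediction vector: it must not be nil, the predictor block it designates must lie inside the allowed window, and that block must not overlap the current block. A minimal sketch in Python, assuming axis-aligned (x, y, width, height) rectangles; all function and parameter names are hypothetical illustrations, not taken from the specification:

```python
# Hypothetical sketch of the predictor-validity tests recited in
# claims 9-11 (and 20-22). Rectangles are (x, y, w, h) tuples;
# this convention is an assumption made for illustration only.

def overlaps(a, b):
    """True if axis-aligned rectangles a and b (x, y, w, h) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def predictor_is_valid(vector, current_block, window):
    """vector: (dx, dy) candidate; current_block, window: (x, y, w, h)."""
    dx, dy = vector
    # Claim 11 / 22: a nil vector is invalid as prediction information.
    if dx == 0 and dy == 0:
        return False
    x, y, w, h = current_block
    predictor = (x + dx, y + dy, w, h)
    # Claim 9 / 20: the predictor block must lie entirely inside the window.
    wx, wy, ww, wh = window
    px, py, pw, ph = predictor
    if px < wx or py < wy or px + pw > wx + ww or py + ph > wy + wh:
        return False
    # Claim 10 / 21: the predictor block must not overlap the current block.
    if overlaps(predictor, current_block):
        return False
    return True
```

In an encoder, candidates failing any of these tests would simply be dropped from the predictor list before a rate-distortion selection of the kind recited in claim 7.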
GB1318079.9A 2013-10-11 2013-10-11 Method and apparatus for displacement vector component prediction in video coding and decoding Withdrawn GB2519514A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB1318079.9A GB2519514A (en) 2013-10-11 2013-10-11 Method and apparatus for displacement vector component prediction in video coding and decoding
GB1404588.4A GB2519616A (en) 2013-10-11 2014-03-14 Method and apparatus for displacement vector component prediction in video coding and decoding
PCT/EP2014/071621 WO2015052273A1 (en) 2013-10-11 2014-10-09 Method and apparatus for displacement vector component prediction in video coding and decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1318079.9A GB2519514A (en) 2013-10-11 2013-10-11 Method and apparatus for displacement vector component prediction in video coding and decoding

Publications (2)

Publication Number Publication Date
GB201318079D0 GB201318079D0 (en) 2013-11-27
GB2519514A true GB2519514A (en) 2015-04-29

Family

ID=49679950

Family Applications (2)

Application Number Title Priority Date Filing Date
GB1318079.9A Withdrawn GB2519514A (en) 2013-10-11 2013-10-11 Method and apparatus for displacement vector component prediction in video coding and decoding
GB1404588.4A Withdrawn GB2519616A (en) 2013-10-11 2014-03-14 Method and apparatus for displacement vector component prediction in video coding and decoding

Family Applications After (1)

Application Number Title Priority Date Filing Date
GB1404588.4A Withdrawn GB2519616A (en) 2013-10-11 2014-03-14 Method and apparatus for displacement vector component prediction in video coding and decoding

Country Status (2)

Country Link
GB (2) GB2519514A (en)
WO (1) WO2015052273A1 (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10412387B2 (en) 2014-08-22 2019-09-10 Qualcomm Incorporated Unified intra-block copy and inter-prediction
CN107005708A (en) * 2014-09-26 2017-08-01 Vid拓展公司 Intra-block copy decoding using temporal block vector prediction
US9918105B2 (en) 2014-10-07 2018-03-13 Qualcomm Incorporated Intra BC and inter unification
JP6527949B2 (en) 2015-01-29 2019-06-12 ヴィド スケール インコーポレイテッド Intra block copy search
KR20180010260A (en) 2015-06-03 2018-01-30 미디어텍 인크. Method for palette coding of image and video data
KR102264767B1 (en) * 2015-07-27 2021-06-14 미디어텍 인크. Method of system for video coding using intra block copy mode
CN110662043B (en) 2018-06-29 2021-12-21 北京字节跳动网络技术有限公司 Method, apparatus and computer readable medium for processing video data
CN114845108A (en) 2018-06-29 2022-08-02 抖音视界(北京)有限公司 Updating the lookup table: FIFO, constrained FIFO
JP7295230B2 (en) 2018-06-29 2023-06-20 北京字節跳動網絡技術有限公司 Reset lookup table per slice/tile/LCU row
WO2020003270A1 (en) 2018-06-29 2020-01-02 Beijing Bytedance Network Technology Co., Ltd. Number of motion candidates in a look up table to be checked according to mode
CN115988203A (en) 2018-06-29 2023-04-18 北京字节跳动网络技术有限公司 Method, apparatus, computer-readable storage medium for video processing
TWI723444B (en) * 2018-06-29 2021-04-01 大陸商北京字節跳動網絡技術有限公司 Concept of using one or multiple look up tables to store motion information of previously coded in order and use them to code following blocks
CN110662064B (en) 2018-06-29 2022-06-14 北京字节跳动网络技术有限公司 Checking order of motion candidates in LUT
CN110662070B (en) * 2018-06-29 2022-03-18 北京字节跳动网络技术有限公司 Selection of coded motion information for look-up table update
JP7460617B2 (en) 2018-06-29 2024-04-02 北京字節跳動網絡技術有限公司 LUT update conditions
GB2589241B (en) 2018-07-02 2023-06-07 Beijing Bytedance Network Tech Co Ltd Update of look-up tables
GB2590310B (en) 2018-09-12 2023-03-22 Beijing Bytedance Network Tech Co Ltd Conditions for starting checking HMVP candidates depend on total number minus K
WO2020058896A1 (en) 2018-09-19 2020-03-26 Beijing Bytedance Network Technology Co., Ltd. Intra mode coding based on history information
CN113261293B (en) * 2018-11-13 2023-11-03 北京字节跳动网络技术有限公司 History-based motion candidate list construction for intra block copy
KR20240010576A (en) 2019-01-10 2024-01-23 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 Invoke of lut updating
CN113383554B (en) 2019-01-13 2022-12-16 北京字节跳动网络技术有限公司 Interaction between LUTs and shared Merge lists
CN113302937A (en) 2019-01-16 2021-08-24 北京字节跳动网络技术有限公司 Motion candidate derivation
CN113424526A (en) * 2019-02-17 2021-09-21 北京字节跳动网络技术有限公司 Limitation of applicability of intra block copy mode
WO2020192611A1 (en) 2019-03-22 2020-10-01 Beijing Bytedance Network Technology Co., Ltd. Interaction between merge list construction and other tools
CN112437304B (en) * 2019-08-26 2022-06-03 腾讯科技(深圳)有限公司 Video decoding method, encoding method, device, equipment and readable storage medium
US11595678B2 (en) * 2020-06-11 2023-02-28 Tencent America LLC Spatial displacement vector prediction for intra picture block and string copying

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013002108A1 (en) * 2011-06-29 2013-01-03 ソニー株式会社 Image processing device and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10015515B2 (en) * 2013-06-21 2018-07-03 Qualcomm Incorporated Intra prediction from a predictive block


Also Published As

Publication number Publication date
GB201404588D0 (en) 2014-04-30
WO2015052273A1 (en) 2015-04-16
GB201318079D0 (en) 2013-11-27
GB2519616A (en) 2015-04-29

Similar Documents

Publication Publication Date Title
GB2519514A (en) Method and apparatus for displacement vector component prediction in video coding and decoding
US10009615B2 (en) Method and apparatus for vector encoding in video coding and decoding
US11363292B2 (en) Memory access window and padding for motion vector refinement and motion compensation
WO2017148345A1 (en) Method and apparatus of video coding with affine motion compensation
US11805270B2 (en) Limited memory access window for motion vector refinement
US20160073107A1 (en) Method and apparatus for video encoding/decoding using intra prediction
JP6945654B2 (en) Methods and Devices for Encoding or Decoding Video Data in FRUC Mode with Reduced Memory Access
US20150350674A1 (en) Method and apparatus for block encoding in video coding and decoding
US11153595B2 (en) Memory access window and padding for motion vector refinement
CN110832862B (en) Error tolerant and parallel processing of motion vector derivation at decoding end
US9420303B2 (en) Method and apparatus for displacement vector component transformation in video coding and decoding
KR102582887B1 (en) Video encoding device, video decoding device, video encoding method, and video decoding method
GB2512828A (en) Method and apparatus for encoding or decoding an image with inter layer motion information prediction

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)