WO2019072369A1

WO2019072369A1 - Motion vector list pruning

Info

Publication number: WO2019072369A1
Application number: PCT/EP2017/075711
Authority: WO
Inventors: Semih Esenlik; Zhijie Zhao; Anand Meher KOTRA; Han GAO
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2017-10-09
Filing date: 2017-10-09
Publication date: 2019-04-18

Abstract

The present disclosure relates to motion vector determination and in particular to construction of a list of candidate motion vectors for the purpose of motion vector prediction. In particular, if a motion vector predictor candidate is obtained using computationally intensive method than other motion vector predictor candidates, the motion vector candidate is to be inserted in a pre-defined fixed position in the MV predictor list. The presence of the other motion vector predictor candidates in the motion vector list does not depend on the value or the availability of the motion vector obtained by the computationally intensive approach.

Description

Motion vector list pruning

The present invention relates to the field of computer vision, in particular to the topic normally referred to as picture and video processing and coding, and in particular to motion vector determination.

BACKGROUND

Current hybrid video codecs employ predictive coding. A picture of a video sequence is subdivided into blocks of pixels and these blocks are then coded. Instead of coding a block pixel by pixel, the entire block is predicted using already encoded pixels in the spatial or temporal proximity of the block. The encoder further processes only the differences between the block and its prediction. The further processing typically includes a transformation of the block pixels into coefficients in a transformation domain. The coefficients may then be further compressed by means of quantization and further compacted by entropy coding to form a bitstream. The bitstream further includes any signaling information which enables the decoder to decode the encoded video. For instance, the signaling may include settings concerning the encoder settings such as size of the input picture, frame rate, quantization step indication, prediction applied to the blocks of the pictures, or the like.

Temporal prediction exploits temporal correlation between pictures, also referred to as frames, of a video. The temporal prediction is also called inter-prediction, as it is a prediction using the dependencies between (inter) different video frames. Accordingly, a block being encoded, also referred to as a current block, is predicted from one or more previously encoded picture(s) referred to as a reference picture(s). A reference picture is not necessarily a picture preceding the current picture in which the current block is located in the displaying order of the video sequence. The encoder may encode the pictures in a coding order different from the displaying order. As a prediction of the current block, a co-located block in a reference picture may be determined. The co-located block is a block which is located in the reference picture on the same position as is the current block in the current picture. Such prediction is accurate for motionless picture regions, i.e. picture regions without movement from one picture to another.

In order to obtain a predictor which takes into account the movement, i.e. a motion compensated predictor, motion estimation is typically employed when determining the prediction of the current block. Accordingly, the current block is predicted by a block in the reference picture, which is located in a distance given by a motion vector from the position of the co-located block. In order to enable a decoder to determine the same prediction of the current block, the motion vector may be signaled in the bitstream. In order to further reduce the signaling overhead caused by signaling the motion vectorfor each of the blocks, the motion vector itself may be estimated. The motion vector estimation may be performed based on the motion vectors of the neighboring blocks in spatial and/or temporal domain. The prediction of the current block may be computed using one reference picture or by weighting predictions obtained from two or more reference pictures. The reference picture may be an adjacent picture, i.e. a picture immediately preceding and/or the picture immediately following the current picture in the display order since adjacent pictures are most likely to be similar to the current picture. However, in general, the reference picture may be also any other picture preceding or following the current picture in the displaying order and preceding the current picture in the bitstream (decoding order). This may provide advantages for instance in case of occlusions and/or non-linear movement in the video content. The reference picture identification may thus be also signaled in the bitstream.

A special mode of the inter-prediction is a so-called bi-prediction in which two reference pictures are used in generating the prediction of the current block. In particular, two predictions determined in the respective two reference pictures are combined into a prediction signal of the current block. The bi-prediction may result in a more accurate prediction of the current block than the uni-prediction, i.e. prediction only using a single reference picture. The more accurate prediction leads to smaller differences between the pixels of the current block and the prediction (referred to also as "residuals"), which may be encoded more efficiently, i.e. compressed to a shorter bitstream. In general, more than two reference pictures may be used to find respective more than two reference blocks to predict the current block, i.e. a multi- reference inter prediction can be applied. The term multi-reference prediction thus includes bi- prediction as well as predictions using more than two reference pictures. In order to provide more accurate motion estimation, the resolution of the reference picture may be enhanced by interpolating samples between pixels. Fractional pixel interpolation can be performed by weighted averaging of the closest pixels. In case of half-pixel resolution, for instance a bilinear interpolation is typically used. Other fractional pixels are calculated as an average of the closest pixels weighted by the inverse of the distance between the respective closest pixels to the pixel being predicted.

The motion vector estimation is a computationally complex task in which a similarity is calculated between the current block and the corresponding prediction blocks pointed to by candidate motion vectors in the reference picture. Typically, the search region includes M x M samples of the image and each of the sample position of the M x M candidate positions is tested. The test includes calculation of a similarity measure between the N x N reference block C and a block R, located at the tested candidate position of the search region. For its simplicity, the sum of absolute differences (SAD) is a measure frequently used for this purpose and given by:

In the above formula, x and y define the candidate position within the search region, while indices i and j denote samples within the reference block C and candidate block R. The candidate position is often referred to as block displacement or offset, which reflects the representation of the block matching as shifting of the reference block within the search region and calculating a similarity between the reference block C and the overlapped portion of the search region. In order to reduce the complexity, the number of candidate motion vectors is usually reduced by limiting the candidate motion vectors to a certain search space. The search space may be, for instance, defined by a number and/or positions of pixels surrounding the position in the reference picture corresponding to the position of the current block in the current image. After calculating SAD for all M x M candidate positions x and y, the best matching block R is the block on the position resulting in the lowest SAD, corresponding to the largest similarity with reference block C. On the other hand, the candidate motion vectors may be defined by a list of candidate motion vectors formed by motion vectors of neighboring blocks.

Motion vectors are usually at least partially determined at the encoder side and signaled to the decoder within the coded bitstream. However, the motion vectors may also be derived at the decoder. In such case, the current block is not available at the decoder and cannot be used for calculating the similarity to the blocks to which the candidate motion vectors point in the reference picture. Therefore, instead of the current block, a template is used which is constructed out of pixels of already decoded blocks. For instance, already decoded pixels adjacent to the current block may be used. Such motion estimation provides an advantage of reducing the signaling: the motion vector is derived in the same way at both the encoder and the decoder and thus, no signaling is needed. On the other hand, the accuracy of such motion estimation may be lower.

In order to provide a tradeoff between the accuracy and signaling overhead, the motion vector estimation may be divided into two steps: motion vector derivation and motion vector refinement. For instance, a motion vector derivation may include selection of a motion vector from the list of candidates. Such a selected motion vector may be further refined for instance by a search within a search space. The search in the search space is based on calculating cost function for each candidate motion vector, i.e. for each candidate position of block to which the candidate motion vector points.

Document JVET-D0029: Decoder-Side Motion Vector Refinement Based on Bilateral Template Matching, X. Chen, J. An, J. Zheng (the document can be found at: http://phenix.it- sudparis.eu/jvet/ site) shows motion vector refinement in which a first motion vector in integer pixel resolution is found and further refined by a search with a half-pixel resolution in a search space around the first motion vector.

Motion vector prediction which is performed to code motion vectors efficiently requires construction of a list of candidates. The construction has to be performed at the encoder as well as at the decoder. If the list is to include candidates obtained by template matching, then the complexity and delay may grow.

SUMMARY

The present disclosure is based on observation that the availability and redundancy checks performed before entering a motion vector into the list may cause delays and complexity especially in cases in which the remaining list construction depends on the value of the motion vector obtained by a complex procedure. In such cases, the provision of such motion vector obtained by a complex procedure is necessary only for the purpose of the checks before entering the remaining motion vectors into the list even if the motion vector obtained by a complex procedure is not further used and needed.

In order to avoid such increase in complexity, the inclusion of motion vectors into the list is performed independently of the value of the motion vector obtained by a complex procedure. This may be achieved by a rule specifying at which position the motion vector obtained by a complex procedure is to be entered and/or by skipping for other motion vectors checks whether they are identical or similar to such motion vector before entering them into the list.

According to an aspect of the invention, an apparatus is provided for decoding a video image including decoding a current prediction block from a bitstream. The apparatus includes a processing circuitry which is configured to: generate a set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector based on a predefined rule, according to which indexes assigned to candidate motion vectors not obtained by template matching are independent of any candidate motion vector obtained by template matching; parsing from the bitstream an index for the current prediction block; determine a motion vector predictor for the prediction block as the candidate motion vector associated with the parsed index; and decode the prediction block based on the determined motion vector predictor.

One of the advantages of the index assignment independent from the motion vector value / availability of the motion vector obtained by template matching at the encoder and/or decoder is that at the decoder, the motion vector obtained with the template matching does not have to be determined if it is not selected and included in the bitstream. This, on the other hand, leads to a more efficient implementation.

For example, at least one predefined index within the set may be reserved for a candidate motion vector obtained by template matching. This is one way in which the independency can be ensured: since the motion vector(s) obtained by template matching are assigned a predefined index (such as fixed index), the remaining assignment of indexes to the candidates not obtained by the template matching can be be performed independently.

In particular, if the candidate motion vector obtained by template matching is not available, motion vector assigned the predefined index may further be set to a predefined value. In other words, the unavailable motion vector to be obtained by template matching is padded, in order to assign a value to the predefined reserved index.

For example, the predefined value is a zero motion vector.

According to an embodiment, if the candidate motion vector obtained by template matching is not available or has a value already included in the set or has a value similar to a value already included in the set, the predefined index is assigned a motion vector of a previously decoded prediction block or a motion vector obtained as a function of one or more motion vectors of respective predefined previously decoded prediction blocks, the similarity being measured by thresholding difference between the value obtained by template matching and the values already included in the set. In this embodiment, instead of padding, the index which is to be assigned a motion vector obtained by template matching, if such motion vector is unavailable or redundant, is further assigned another motion vector based on the surrounding blocks already processed (encoded at the encoder side, decoded at the decoder side). In this embodiment, a motion vector is redundant if the same or similar value is already included in the list. Similarity is evaluated, for instance, by calculating a difference between the motion vector (obtained by the template matching) and any (in worst case each) vector of the list. If any of the corresponding differences is smaller than a threshold then the motion vector is considered as redundant and not included into the list. The replacement motion vector may be a motion vector of a predefined block or a motion vector of a block obtained according to a predetermined rule, or any motion vector determined in a same way at the encoder and the decoder.

If there are more than one motion vectors to be determined by template matching (for instance using different reference pictures and/or different templates), in one example, the processing circuitry is further configured to generate the set of candidate motion vectors so that indices assigned to a plurality of respective candidates obtained by template matching are also independent of each other's motion vector value and/or availability.

In one embodiment, the processing circuitry is further configured to, during the generation of the set; check whether currently inserted motion vector is already included in the set; and include the currently inserted motion vector into the set only if a similar motion vector is not already included in the set minus any candidate motion vector obtained by template matching.

In an embodiment (combinable with any of the above examples and embodiments), the processing circuitry is configured to: generate the set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector except for any of j candidate motion vectors obtained by template matching, the index having respective values from the first index up to the last but j indexes, j being an integer larger than zero; and assign the last j indexes to the respective j candidate motion vectors obtained by template matching.

According to another embodiment, the processing circuitry is configured to:generate the set of candidate motion vectors for the current prediction block by: assigning an index to candidate motion vectors except for any of j candidate motion vectors obtained by template matching, the index having respective values from the j-th plus one index up to the last index, j being an integer larger than zero; after the assigning of index to the candidate motion vectors not obtained by template matching, assigning to the candidate motion vectors obtained by template matching respective indexes from the first index to the j-th index; wherein the index assigning at least for the candidate motion vectors not obtained by template matching includes: checking whether currently assigned motion vector is already included in the set and assigning the index to the currently assigned motion vector only if a similar motion vector is not already included in the set.

For example, in the above embodiments and examples, the predefined index reserved for a candidate motion vector obtained by template matching for a previously decoded prediction block may be the first or the last index within the set. In one implementation, the processing circuitry is configured to parse from the bitstream a flag indicating for an image data unit whether or not candidate motion vectors obtained by template matching are allowed to be inserted into the set. In particular, the image data unit is slice or coding tree block, or the flag is signaled in a sequence parameter set. According to an aspect of the invention, an apparatus is provided for encoding a video image including encoding a current prediction block into a bitstream, the apparatus including a processing circuitry configured to: generate a set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector based on a predefined rule, according to which indexes assigned to candidate motion vectors not obtained by template matching are independent of any candidate motion vector obtained by template matching; determine a motion vector predictor for a motion vector of the current prediction block as one of the candidate motion vectors associated with an index; and including into the bitstream the index of the determined motion vector predictor for the current prediction block; encode the prediction block based on the motion vector for which the motion vector predictor is determined.

At the encoder, as also in the above decoder, for example, at least one predefined index within the set is reserved for a candidate motion vector obtained by template matching. Moreover, if the candidate motion vector obtained by template matching is not available, motion vector assigned the predefined index is set to a predefined value. The predefined value may be a zero motion vector.

In one embodiment, if the candidate motion vector obtained by template matching is not available or has a value already included in the set or has a value similar to a value already included in the set, the predefined index is assigned a motion vector of a previously decoded prediction block or a motion vector obtained as a function of one or more motion vectors of respective predefined previously decoded prediction blocks, the similarity being measured by thresholding difference between the value obtained by template matching and the values already included in the set.

The generating the set of candidate motion vectors may include assigning indices to a plurality of respective candidates obtained by template matching also independently of each other's motion vector value and/or availability.

In an embodiment, the generating may further include checking whether currently inserted motion vector is already included in the set; and including the currently inserted motion vector into the set only if a similar motion vector is not already included in the set minus any candidate motion vector obtained by template matching.

Alternatively or in addition, the processing circuitry in the generation may further generate the set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector except for any of j candidate motion vectors obtained by template matching, the index having respective values from the first index up to the last but j indexes, j being an integer larger than zero; and assign the last j indexes to the respective j candidate motion vectors obtained by template matching.

According to an embodiment, the processing circuitry may be configured for generating the set of candidate motion vectors for the current prediction block by: assigning an index to candidate motion vectors except for any of j candidate motion vectors obtained by template matching, the index having respective values from the j-th plus one index up to the last index, j being an integer larger than zero, and after the assigning of index to the candidate motion vectors not obtained by template matching, assigning to the candidate motion vectors obtained by template matching respective indexes from the first index to the j-th index; wherein the index assigning at least for the candidate motion vectors not obtained by template matching includes: checking whether currently assigned motion vector is already included in the set and assigning the index to the currently assigned motion vector only if a similar motion vector is not already included in the set. For example, the predefined index reserved for a candidate motion vector obtained by template matching for a previously decoded prediction block is the first or the last index within the set.

According to an embodiment, the processing circuitry is further configured for including into the bitstream a flag indicating for an image data unit whether or not candidate motion vectors obtained by template matching are allowed to be inserted into the set. For example, the image data unit is slice or coding tree block, or the flag is signaled in a sequence parameter set.

According to an aspect of the invention, a method is provided for decoding a video image including decoding a current prediction block from a bitstream, the method including the steps of: generating a set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector based on a predefined rule, according to which indexes assigned to candidate motion vectors not obtained by template matching are independent of any candidate motion vector obtained by template matching; parsing from the bitstream an index for the current prediction block; determining a motion vector predictor for the prediction block as the candidate motion vector associated with the parsed index; and decoding the prediction block based on the determined motion vector predictor. According to an aspect of the invention, a method for encoding a video image including encoding a current prediction block into a bitstream, the method including the steps of: generating a set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector based on a predefined rule, according to which indexes assigned to candidate motion vectors not obtained by template matching are independent of any candidate motion vector obtained by template matching; determining a motion vector predictor for a motion vector of the current prediction block as one of the candidate motion vectors associated with an index; including into the bitstream the index of the determined motion vector predictor for the current prediction block; and encoding the prediction block based on the motion vector for which the motion vector predictor is determined.

According to an aspect of the invention a non-transitory computer-readable storage medium is provided storing instructions which when executed by a processor / processing circuitry perform the steps according to any of the above aspects or embodiments or their combinations.

In the following exemplary embodiments are described in more detail with reference to the attached figures and drawings, in which:

Figure 1 is a block diagram showing an exemplary structure of an encoder in which the motion vector derivation and refinement may be employed;

Figure 2 is a block diagram showing an exemplary structure of a decoder in which the motion vector derivation and refinement may be employed;

Figure 3 is a schematic drawing illustrating an exemplary template matching suitable for bi-prediction;

Figure 4 is a schematic drawing illustrating an exemplary template matching suitable for uni- and bi-prediction;

Figure 5 is a block diagram illustrating stages of motion vector derivation operating

without providing initial motion vectors to be refined in the bitstream;

Figure 6 is a flow diagram showing the steps performed for motion vector prediction at the encoder and the decoder;

Figure 7 is a flow diagram showing the steps performed for motion vector candidate list construction; Figure 8 is a flow diagram showing the steps performed for motion vector candidate list construction according to an embodiment;

Figure 9 is a flow diagram showing the steps performed for motion vector candidate list construction according to another embodiment;

Figure 10 is a flow diagram showing the steps performed for motion vector candidate list construction according to another embodiment; and

Figure 1 1 is a block diagram illustrating an exemplary hardware to implement an

embodiment of the invention.

DETAILED DESCRIPTION

Construction of a motion vector candidate list typically requires obtaining two kinds of candidates:

(1 ) One or more motion vector candidates which are obtained based on motion vector derivation/refinement process using template matching, i.e. based on motion vectors which are obtained by applying template matching. Since the template matching is computationally rather demanding, such candidates need to be calculated resulting in higher delay and computational effort.

(2) Other candidate motion vectors which do not require template matching. These are not computationally as demanding. For instance, such candidates may be obtained by motion vector prediction from the motion vectors of the neighboring blocks and/or as predefined motion vectors (such as (0, 0)).

The candidate(s) of type (1 ) above are required to be computed in the decoder (also in the encoder) even if finally type (2) candidate is selected as the motion vector predictor for a current block, due to the motion vector list pruning process, and in particular the redundancy checking operation before inclusion of the motion vector into the list as is described below in connection with the encoder and decoder similar to H.265/HEVC standard.

To obtain a candidate of type (1 ), for instance, the current coding block constructs an L-shaped (or another type) template, uses this template to obtain a motion vector by finding a patch of samples that resemble the template the most, and uses the obtained motion vector as motion vector predictor candidate. Here it is assumed that the MVs of the neighboring blocks are already available due to coding order (can be obtained with or without template matching).

Fig. 1 shows an encoder 100 which comprises an input for receiving input image samples of frames or pictures of a video stream and an output for generating an encoded video bitstream. The term "frame" in this disclosure is used as a synonym for picture. However, it is noted that the present disclosure is also applicable to fields in case interlacing is applied. In general, a picture includes m times n pixels. This corresponds to image samples and may comprise one or more color components. For the sake of simplicity, the following description refers to pixels meaning samples of luminance. However, it is noted that the motion vector search of the invention can be applied to any color component including chrominance or components of a search space such as RGB or the like. On the other hand, it may be beneficial to only perform motion vector estimation for one component and to apply the determined motion vector to more (or all) components.

The input blocks to be coded do not necessarily have the same size. One picture may include blocks of different sizes and the block raster of different pictures may also differ.

In an explicative realization, the encoder 100 is configured to apply prediction, transformation, quantization, and entropy coding to the video stream. The transformation, quantization, and entropy coding are carried out respectively by a transform unit 106, a quantization unit 108 and an entropy encoding unit 170 so as to generate as an output the encoded video bitstream. The video stream may include a plurality of frames, wherein each frame is divided into blocks of a certain size that are either intra or inter coded. The blocks of for example the first frame of the video stream are intra coded by means of an intra prediction unit 154. An intra frame is coded using only the information within the same frame, so that it can be independently decoded and it can provide an entry point in the bitstream for random access. Blocks of other frames of the video stream may be inter coded by means of an inter prediction unit 144: information from previously coded frames (reference frames) is used to reduce the temporal redundancy, so that each block of an inter-coded frame is predicted from a block in a reference frame. A mode selection unit 160 is configured to select whether a block of a frame is to be processed by the intra prediction unit 154 or the inter prediction unit 144. This mode selection unit 160 also controls the parameters of intra or inter prediction. In order to enable refreshing of the image information, intra-coded blocks may be provided within inter-coded frames. Moreover, intra-frames which contain only intra-coded blocks may be regularly inserted into the video sequence in order to provide entry points for decoding, i.e. points where the decoder can start decoding without having information from the previously coded frames. The intra estimation unit 152 and the intra prediction unit 154 are units which perform the intra prediction. In particular, the intra estimation unit 152 may derive the prediction mode based also on the knowledge of the original image while intra prediction unit 154 provides the corresponding predictor, i.e. samples predicted using the selected prediction mode, for the difference coding. For performing spatial or temporal prediction, the coded blocks may be further processed by an inverse quantization unit 1 10, and an inverse transform unit 1 12. After reconstruction of the block a loop filtering unit 120 is applied to further improve the quality of the decoded image. The filtered blocks then form the reference frames that are then stored in a decoded picture buffer 130. Such decoding loop (decoder) at the encoder side provides the advantage of producing reference frames which are the same as the reference pictures reconstructed at the decoder side. Accordingly, the encoder and decoder side operate in a corresponding manner. The term "reconstruction" here refers to obtaining the reconstructed block by adding to the decoded residual block the prediction block.

The inter estimation unit 142 receives as an input a block of a current frame or picture to be inter coded and one or several reference frames from the decoded picture buffer 130. Motion estimation is performed by the inter estimation unit 142 whereas motion compensation is applied by the inter prediction unit 144. The motion estimation is used to obtain a motion vector and a reference frame based on certain cost function, for instance using also the original image to be coded. For example, the motion estimation unit 142 may provide initial motion vector estimation. The initial motion vector may then be signaled within the bitstream in form of the vector directly or as an index referring to a motion vector candidate within a list of candidates constructed based on a predetermined rule in the same way at the encoder and the decoder. The motion compensation then derives a predictor of the current block as a translation of a block co-located with the current block in the reference frame to the reference block in the reference frame, i.e. by a motion vector. The inter prediction unit 144 outputs the prediction block for the current block, wherein said prediction block minimizes the cost function. For instance, the cost function may be a difference between the current block to be coded and its prediction block, i.e. the cost function minimizes the residual block. The minimization of the residual block is based e.g. on calculating a sum of absolute differences (SAD) between all pixels (samples) of the current block and the candidate block in the candidate reference picture. However, in general, any other similarity metric may be employed, such as mean square error (MSE) or structural similarity metric (SSIM).

However, cost-function may also be the number of bits necessary to code such inter-block and/or distortion resulting from such coding. Thus, the rate-distortion optimization procedure may be used to decide on the motion vector selection and/or in general on the encoding parameters such as whether to use inter or intra prediction for a block and with which settings.

The intra estimation unit 152 and intra prediction unit 154 receive as an input a block of a current frame or picture to be intra coded and one or several reference samples from an already reconstructed area of the current frame. The intra prediction then describes pixels of a current block of the current frame in terms of a function of reference samples of the current frame. The intra prediction unit 154 outputs a prediction block for the current block, wherein said prediction block advantageously minimizes the difference between the current block to be coded and its prediction block, i.e., it minimizes the residual block. The minimization of the residual block can be based e.g. on a rate-distortion optimization procedure. In particular, the prediction block is obtained as a directional interpolation of the reference samples. The direction may be determined by the rate-distortion optimization and/or by calculating a similarity measure as mentioned above in connection with inter-prediction.

The inter estimation unit 142 receives as an input a block or a more universal-formed image sample of a current frame or picture to be inter coded and two or more already decoded pictures 231 . The inter prediction then describes a current image sample of the current frame in terms of motion vectors to reference image samples of the reference pictures. The inter prediction unit 142 outputs one or more motion vectors for the current image sample, wherein said reference image samples pointed to by the motion vectors advantageously minimize the difference between the current image sample to be coded and its reference image samples, i.e., it minimizes the residual image sample. The predictorfor the current block is then provided by the inter prediction unit 144 for the difference coding.

The difference between the current block and its prediction, i.e. the residual block 105, is then transformed by the transform unit 106. The transform coefficients 107 are quantized by the quantization unit 108 and entropy coded by the entropy encoding unit 170. The thus generated encoded picture data 171 , i.e. encoded video bitstream, comprises intra coded blocks and inter coded blocks and the corresponding signaling (such as the mode indication, indication of the motion vector, and/or intra-prediction direction). The transform unit 106 may apply a linear transformation such as a Fourier or Discrete Cosine Transformation (DFT/FFT or DCT). Such transformation into the spatial frequency domain provides the advantage that the resulting coefficients 107 have typically higher values in the lower frequencies. Thus, after an effective coefficient scanning (such as zig-zag), and quantization, the resulting sequence of values has typically some larger values at the beginning and ends with a run of zeros. This enables further efficient coding. Quantization unit 108 performs the actual lossy compression by reducing the resolution of the coefficient values. The entropy coding unit 170 then assigns to coefficient values binary codewords to produce a bitstream. The entropy coding unit 170 also codes the signaling information (not shown in Fig. 1 ).

Fig. 2 shows a video decoder 200. The video decoder 200 comprises particularly a decoded picture buffer 230, an inter prediction unit 244 and an intra prediction unit 254, which is a block prediction unit. The decoded picture buffer 230 is configured to store at least one (for uni- prediction) or at least two (for bi-prediction) reference frames reconstructed from the encoded video bitstream, said reference frames being different from a current frame (currently decoded frame) of the encoded video bitstream. The intra prediction unit 254 is configured to generate a prediction block, which is an estimate of the block to be decoded. The intra prediction unit 254 is configured to generate this prediction based on reference samples that are obtained from the reconstructed block 215 or buffer 216.

The decoder 200 is configured to decode the encoded video bitstream generated by the video encoder 100, and preferably both the decoder 200 and the encoder 100 generate identical predictions for the respective block to be encoded / decoded. The features of the decoded picture buffer 230, reconstructed block 215, buffer 216 and the intra prediction unit 254 are similar to the features of the decoded picture buffer 130, reconstructed block 1 15, buffer 1 16 and the intra prediction unit 154 of Fig. 1.

The video decoder 200 comprises further units that are also present in the video encoder 100 like e.g. an inverse quantization unit 210, an inverse transform unit 212, and a loop filtering unit 220, which respectively correspond to the inverse quantization unit 1 10, the inverse transform unit 1 12, and the loop filtering unit 120 of the video coder 100.

An entropy decoding unit 204 is configured to decode the received encoded video bitstream and to correspondingly obtain quantized residual transform coefficients 209 and signaling information. The quantized residual transform coefficients 209 are fed to the inverse quantization unit 210 and an inverse transform unit 212 to generate a residual block. The residual block is added to a prediction block 265 and the addition is fed to the loop filtering unit 220 to obtain the decoded video. Frames of the decoded video can be stored in the decoded picture buffer 230 and serve as a decoded picture 231 for inter prediction.

Generally, the intra prediction units 154 and 254 of Figs. 1 and 2 can use reference samples from an already encoded area to generate prediction signals for blocks that need to be encoded or need to be decoded.

The entropy decoding unit 204 receives as its input the encoded bitstream 171. In general, the bitstream is at first parsed, i.e. the signaling parameters and the residuals are extracted from the bitstream. Typically, the syntax and semantic of the bitstream is defined by a standard so that the encoders and decoders may work in an interoperable manner. As described in the above Background section, the encoded bitstream does not only include the prediction residuals. In case of motion compensated prediction, a motion vector indication is also coded in the bitstream and parsed therefrom at the decoder. The motion vector indication may be given by means of a reference picture in which the motion vector is provided and by means of the motion vector coordinates. So far, coding the complete motion vectors was considered. However, also only the difference between the current motion vector and the previous motion vector in the bitstream may be encoded. This approach allows exploiting the redundancy between motion vectors of neighboring blocks.

In order to efficiently code the reference picture, H.265 codec (ITU-T, H.265, Series H: Audiovisual and multimedia systems: High Efficient Video Coding) provides a list of reference pictures assigning to list indices respective reference frames. The reference frame is then signaled in the bitstream by including therein the corresponding assigned list index. Such list may be defined in the standard or signaled at the beginning of the video or a set of a number of frames. It is noted that in H.265 there are two lists of reference pictures defined, called L0 and L1 . The reference picture is then signaled in the bitstream by indicating the list (L0 or L1 ) and indicating an index in that list associated with the desired reference picture. Providing two or more lists may have advantages for better compression. For instance, L0 may be used for both uni-directionally inter-predicted slices and bi-directionally inter-predicted slices while L1 may only be used for bi-directionally inter-predicted slices. However, in general the present disclosure is not limited to any content of the L0 and L1 lists.

The lists L0 and L1 may be defined in the standard and fixed. However, more flexibility in coding/decoding may be achieved by signaling them at the beginning of the video sequence. Accordingly, the encoder may configure the lists L0 and L1 with particular reference pictures ordered according to the index. The L0 and L1 lists may have the same fixed size. There may be more than two lists in general. The motion vector may be signaled directly by the coordinates in the reference picture. Alternatively, as also specified in H.265, a list of candidate motion vectors may be constructed and an index associated in the list with the particular motion vector can be transmitted.

Motion vectors of the current block are usually correlated with the motion vectors of neighboring blocks in the current picture or in the earlier coded pictures. This is because neighboring blocks are likely to correspond to the same moving object with similar motion and the motion of the object is not likely to change abruptly over time. Consequently, using the motion vectors in neighboring blocks as predictors reduces the size of the signaled motion vector difference. The Motion Vector Predictors (MVPs) are usually derived from already encoded/decoded motion vectors from spatial neighboring blocks or from temporally neighboring or co-located blocks in the reference picture. In H.264/AVC, this is done by doing a component wise median of three spatially neighboring motion vectors. Using this approach, no signaling of the predictor is required. Temporal MVPs from a co-located block in the reference picture are only considered in the so called temporal direct mode of H.264/AVC. The H.264/AVC direct modes are also used to derive other motion data than the motion vectors. Hence, they relate more to the block merging concept in HEVC. In HEVC, the approach of implicitly deriving the MVP was replaced by a technique known as motion vector competition, which explicitly signals which MVP from a list of MVPs, is used for motion vector derivation. The variable coding quad-tree block structure in HEVC can result in one block having several neighboring blocks with motion vectors as potential MVP candidates. Taking the left neighbor as an example, in the worst case a 64x64 luma prediction block could have 16 4x4 luma prediction blocks to the left when a 64x64 luma coding tree block is not further split and the left one is split to the maximum depth.

Advanced Motion Vector Prediction (AMVP) was introduced to modify motion vector competition to account for such a flexible block structure. During the development of HEVC, the initial AMVP design was significantly simplified to provide a good trade-off between coding efficiency and an implementation friendly design. The initial design of AMVP included five MVPs from three different classes of predictors: three motion vectors from spatial neighbors, the median of the three spatial predictors and a scaled motion vector from a co-located, temporally neighboring block. Furthermore, the list of predictors was modified by reordering to place the most probable motion predictor in the first position and by removing redundant candidates to assure minimal signaling overhead. The final design of the AMVP candidate list construction includes the following two MVP candidates: a) up to two spatial candidate MVPs that are derived from five spatial neighboring blocks; b) one temporal candidate MVPs derived from two temporal, co-located blocks when both spatial candidate MVPs are not available or they are identical; and c) zero motion vectors when the spatial, the temporal or both candidates are not available. Details on motion vector determination can be found in the book by V. Sze et al (Ed.), High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014, in particular in Chapter 5, incorporated herein by reference.

In order to further improve motion vector estimation without further increase in signaling overhead, it may be beneficial to further refine the motion vectors derived at the encoder side and provided in the bitstream. The motion vector refinement may be performed at the decoder without assistance from the encoder. The encoder in its decoder loop may employ the same refinement to obtain corresponding motion vectors. Motion vector refinement is performed in a search space which includes integer pixel positions and fractional pixel positions of a reference picture. For example, the fractional pixel positions may be half-pixel positions or quarter-pixel or further fractional positions. The fractional pixel positions may be obtained from the integer (full-pixel) positions by interpolation such as bi-linear interpolation.

In a bi-prediction of current block, two prediction blocks obtained using the respective first motion vector of list L0 and the second motion vector of list L1 , are combined to a single prediction signal, which can provide a better adaptation to the original signal than uni- prediction, resulting in less residual information and possibly a more efficient compression. Since at the decoder, the current block is not available since it is being decoded, for the purpose of motion vector refinement, a template is used, which is an estimate of the current block and which is constructed based on the already processed (i.e. coded at the encoder side and decoded at the decoder side) image portions.

First, an estimate of the first motion vector MVO and an estimate of the second motion vector MV1 are received as input at the decoder 200. At the encoder side 100, the motion vector estimates MVO and MV1 may be obtained by block matching and/or by search in a list of candidates (such as merge list) formed by motion vectors of the blocks neighboring to the current block (in the same picture or in adjacent pictures). MVO and MV1 are then advantageously signaled to the decoder side within the bitstream. However, it is noted that in general, also the first determination stage at the encoder could be performed by template matching which would provide the advantage of reducing signaling overhead.

At the decoder side 200, the motion vectors MVO and MV1 are advantageously obtained based on information in the bitstream. The MVO and MV1 are either directly signaled, or differentially signaled, and/or an index in the list of motion vector (merge list) is signaled. However, the present disclosure is not limited to signaling motion vectors in the bitstream. Rather, the motion vector may be determined by template matching already in the first stage, correspondingly to the operation of the encoder. The template matching of the first stage (motion vector derivation) may be performed based on a search space different from the search space of the second, refinement stage. In particular, the refinement may be performed on a search space with higher resolution (i.e. shorter distance between the search positions).

An indication of the two reference pictures RefPicO and RefPid , to which respective MVO and MV1 point, are provided to the decoder as well. The reference pictures are stored in the decoded picture buffer at the encoder and decoder side as a result of previous processing, i.e. respective encoding and decoding. One of these reference pictures is selected for motion vector refinement by search. A reference picture selection unit of the apparatus for the determination of motion vectors is configured to select the first reference picture to which MVO points and the second reference picture to which MV1 points. Following the selection, the reference picture selection unit determines whether the first reference picture or the second reference picture is used for performing of motion vector refinement. For performing motion vector refinement, the search region in the first reference picture is defined around the candidate position to which motion vector MVO points. The candidate search space positions within the search region are analyzed to find a block most similar to a template block by performing template matching within the search space and determining a similarity metric such as the sum of absolute differences (SAD). The positions of the search space denote the positions on which the top left corner of the template is matched. As already mentioned above, the top left corner is a mere convention and any point of the search space such as the central point can in general be used to denote the matching position.

According to the above mentioned document JVET-D0029, the decoder-side motion vector refinement (DMVR) has as an input the initial motion vectors MVO and MV1 which point into two respective reference pictures RefPictO and RefPictl . These initial motion vectors are used for determining the respective search spaces in the RefPictO and RefPictl . Moreover, using the motion vectors MVO and MV1 , a template is constructed based on the respective blocks (of samples) A and B pointed to by MVO and MV1 as follows: Template = function ((Block A, Block B)).

The function may be sample clipping operation in combination with sample-wise weighted summation. The template is then used to perform template matching in the search spaces determined based on MVO and MV1 in the respective reference pictures 0 and 1. The cost function for determining the best template match in the respective search spaces is SAD(Template, Block candA), where block candA is the candidate coding block which is pointed by the candidate MV in the search space spanned on a position given by the MVO. Figure 3 illustrates the determination of the best matching block A and the resulting refined motion vector MVO'. Correspondingly, the same template is used to find best matching block B' and the corresponding motion vector MVV which points to block B' as shown in Figure 3. In other words, after the template is constructed based on the block A and B pointed to by the initial motion vectors MVO and MV1 , the refined motion vectors MVO' and MVV are found via search on RefPicO and RefPid with the template.

Motion vector derivation techniques are sometimes also referred to as frame rate up- conversion (FRUC). The initial motion vectors MVO and MV1 may generally be indicated in the bitstream to ensure that encoder and decoder may use the same initial point for motion vector refinement. Alternatively, the initial motion vectors may be obtained by providing a list of initial candidates including one or more initial candidates. For each of them a refined motion vector is determined and at the end, the refined motion vector minimizing the cost function is selected. It is further noted that the present invention is not limited to the template matching as described above with reference to Figure 3. Figure 4 illustrates an alternative template matching which is also applicable for uni-prediction. Details can be found in document JVET-A1001 , in particular in Section "2.4.6. Pattern matched motion vector derivation" of document JVET- A1001 which is titled "Algorithm Description of Joint Exploration Test Model 1 ", by Jianle Chen et. al. and which is accessible at: http://phenix.it-sudparis.eu/jvet/. The template in this template matching approach is determined as samples adjacent to the current bock in the current frame. As shown in Figure 1 , the already reconstructed samples adjacent to the top and left boundary of the current block may be taken, referred to as "L-shaped template".

Figure 5 illustrates another type of motion vector derivation which may also be used. The input to the motion vector derivation process is a flag that indicates whether or not the motion vector derivation is applied. Implicitly, another input to the derivation process is the motion vector of a neighboring (temporally or spatially) previously coded/reconstructed block. The motion vectors of a plurality of neighboring blocks are used as candidates for the initial search step of motion vector derivation. The output of the process is MVO' (possibly also MV1 ', if bi-prediction is used) and the corresponding reference picture indices refPictO and possibly refPictl respectively. The motion vector refinement stage then includes the template matching as described above. After finding the refined one (uni-prediction) or more (bi-prediction / multi- frame prediction) motion vectors, the predictor of the current block is constructed (for bi/multi- prediction by weighted sample prediction, otherwise by referring to the samples pointed to by MV refined).

The present invention is not limited to the 2 template matching methods described above. As an example a third template matching method which is called bilateral matching (also described in the document JVET-A1001 ), can also be used for motion vector refinement and the invention applies similarly. According to bilateral matching, best match between two blocks along the motion trajectory of the current block in two different reference pictures is searched. Under the assumption of continuous motion trajectory, the motion vectors MVO and MV1 pointing to the two reference blocks shall be proportional to the temporal distances, i.e., TD0 and TD1 , between the current picture and the two reference pictures. In bilateral matching a cost function such as SAD(Block candO', Block candV) might be used where Block candO' is pointed by MVO and Block candl ' is pointed by MV1. Figure 6 illustrates Advanced Motion Vector Prediction (AMVP) which is also described in more detail in Section 5.2.1 titled "Advanced Motion Vector Prediction" of the book High Efficiency Video Coding (HEVC) by Vivienne Sze et.al, Springer 2014. In particular, on the bottom of Figure 6, shows operation at the encoder. The bitstream is added encoded 610 motion vector residuals. The motion vector residuals are obtained by subtracting 620 from the current motion vector the predictor for the motion vector, i.e. motion vector obtained from the motion vector prediction list based on the motion vector 645. Moreover, in order to enable reconstruction of the current prediction block at the decoder, the encoder also includes into the bitstream encoded 615 indication (index) which identifies within motion vector prediction list the selected motion vector predictor. The selection is performed in step 625 out of the motion vector prediction list constructed in step 630. The selection may be performed by selecting the candidate which is most similar to the motion vector 645 determined for the current prediction block.

The decoder portion is shown in the top part of Figure 6. In particular, the decoder portion shows parsing 660 of the encoded motion vector residual from the bitstream and parsing 650 of the motion vector prediction index from the bitstream. The parsed index is used to select 670 the motion vector prediction out of the list of motion vector predictions constructed 630 in the same way at the decoder as it was done at the encoder. In particular, the list is constructed and the parsed index indicates the motion vector prediction from the list which is to be selected and applied to obtain the motion vector predictor. Then the motion vector prediction is added 665 to the parsed motion vector residual and the motion vector to be applied to the coding block (current prediction block) is obtained 680 and applied to obtain the block prediction.

In other words, the AMVP process applies the following equation to construct the MV to be applied to a coding block: MV = MVpredictor + MVresidual, i.e. the motion vector MV of the current prediction block is obtained as a sum of the prediction MVpredictor and residuals MVresidual for this block.

The motion vector predictor list construction 630 is performed identically in encoder and decoder. This is illustrated in Figure 7 in a simplified manner, based on "JVET-F1001 Algorithm description of Joint Exploration Test Model 6 (JEM6)" document of Joint Video Exploration Team (JVET). AMVP first constructs a list of motion vector predictors, from which one of the candidates is selected as the predictor of the MV of the coding block. In Figure 7 a flowchart for motion vector predictor list construction is shown. During the list construction, additional checks are performed in order to ensure that the list size is fixed (e.g. two entries long) and that motion vector candidates are available. In Figure 7, these additional conditions are not included for the sake of simplicity. The list pruning process that is depicted in Figure 7 compares motion vectors with the list entries and discards one of them if they are identical. Alternatively, other types of redundancy checks might be employed, such as checking whether two motion vectors are similar (according to a distance metric and a corresponding threshold). In particular, in Figure 7, in step 710, a first motion vector candidate MVfruc is obtained using motion vector derivation/refinement with template matching. Then, in step 720, its availability is tested. The MVfruc may be unavailable, for instance, if the iterative motion vector derivation/refinement process does not converge to a single motion vector candidate or if it is not possible to construct a template for template matching especially at the frame boundaries. When MVfruc is available, in step 730, the MVfruc is inserted into the list - it is assumed in this example that MVfruc is available. In step 740, if size (length, i.e. number of entries) of the motion vector candidate list achieves a predefined maximum MAX_size, then the list construction is terminated. In this example it is assumed that MAX_size is 2. Thus, in the first step, the MVfruc is inserted into the list. If in step 720, the MVfruc is not available, then the list remains empty in this stage and another motion vector candidate is tested. If the list did not achieve the predefined maximum size, then another MV is tested.

In step 750, motion vector candidate MV1 is obtained from the left or bottom-left spatial block neighbor (left spatial neighbor having priority). If the motion vector of left spatial neighbor is available, it is used. If left spatial neighbor is not available, the motion vector of bottom-left spatial neighbor is used for motion vector prediction. If the motion vectors of both left and bottom-left spatial block neighbors are unavailable, MV1 is assumed to be unavailable. In the example the motion vector candidate MV1 is assumed available. The motion vectors of spatially or temporally neighboring blocks might not be available if for instance these blocks are not coded with inter prediction, or if the motion vector of the neighboring block does not point to the same reference picture as the current coding block. In step 760, it is judged whether or not MV1 is identical to any of MVs already included in the candidate list. In this example, the MV1 is not identical to the MVfruc already in the list and thus, MV1 is inserted, in step 735, into the list which now comprises MVfruc and MV1. In step 745, it is tested whether or not the maximum list length is reached and, if affirmative, the motion vector list construction is terminated. In this example the maximum list length has been reached, and the list remains with MVfruc and MV1 .

In case in which MV1 is the same as MVfruc (i.e. the list only includes MVfruc so far), in step 770, motion vector candidate MV2 is obtained from the top or top-right spatial neighboring block (top spatial neighbor having priority). In the example, MV2 is assumed to be available. In the following step 765, it is judged whether MV2 is identical to any of MVs (MVfruc) in the candidate list. Here it is assumed that MV2 is not identical to MVfruc and thus, MV2 is inserted to the list in step 737. The list has thus achieved the maximum length and the list construction is thus terminated in step 780. The list has entries (MVfruc, MV2) in this case. If the maximum length of the list is not yet reached, then further MVs are testes in step 790, similarly to the previously described stages.

As can be seen from this example, the resulting list depends on the value of MVfruc. If MV1 is not similar to MVfruc, then the list will be (MVfruc, MV1 ). If MV1 is similar to MVfruc, then the list will be (MVfruc, MV2).

If candidate 1 (entry 1 which is MVfruc) from the list is selected by the current block as predictor, template matching (here FRUC) process needs to be performed as MVfruc is necessary for reconstructing the motion vector and ultimately for sample reconstruction.

However, if candidate 2 (entry 2 which is MV1 or MV2) is selected by the current block as predictor, FRUC process still needs to be performed as MVfruc is necessary for MV list pruning process (although MVfruc is not used for the equation MV = MVpredictor + MVresidual). According to Figure 7, in order to identify if the second candidate in the list is MV1 or MV2, FRUC process must be carried out. It is not possible to know the second candidate (e.g. MV1 or MV2) without obtaining MVFruc first.

In particular, at the decoder, if candidate 1 is selected by an index which is coded in the bitstream, then FRUC operation needs to be performed. If candidate 2 is selected by an index which is coded in the bitstream, then there is a problem since FRUC still needs to be performed, as MVfruc is necessary to identify if candidate 2 is MV1 , MV2, MV3... etc. Therefore: if candidate 2 is selected, FRUC operation is performed even if MVfruc is not used for actual motion vector reconstruction (MV = MVpredictor + MVresidual). Due to this problem, the encoding and decoding times are increased since obtaining MVfruc is disproportionately more computationally demanding than obtaining MV1 , MV2, MV3...

In order to reduce the above problems, the present disclosure provides an apparatus for decoding a video image including decoding a current prediction block from a bitstream, the apparatus including a processing circuitry. The processing circuitry, in operation:

- generates a set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector based on a predefined rule, according to which indexes assigned to candidate motion vectors not obtained by template matching are independent of any candidate motion vector obtained by template matching; - parses from the bitstream an index for the current prediction block;

- determines a motion vector predictor for the prediction block as the candidate motion vector associated with the parsed index; and

- decodes the prediction block based on the determined motion vector predictor. In particular, the set of candidate motion vectors may be implemented as a list or a table of candidates in a storage included also in the apparatus or external to but accessible by the apparatus described above. The order of the above steps may be changed. For instance, the steps of generation and parsing may be executed in any order or parallel.

The parsing may include extracting from the bitstream and possibly also entropy decoding. The determination of the motion vector predictor may be performed on the basis of the candidate motion vector associated with the parsed index. Thus, the motion vector predictor associated with the parsed index may be further processed before its use as a predictor (e.g. it can be scaled, clipped, refined based on a rule, etc.). The decoding of the prediction block may be performed by reconstructing it based on the motion vector calculated based on a sum of the motion vector predictor and motion vector residuals which may be signaled, for instance in the bitstream. However, the present disclosure is not limited thereto and the motion vector may be determined by motion vector refinement rather than from the signaled motion vector residuals.

Figure 8 shows a procedure according to an embodiment of the invention. Figure 8 differs from Figure 7 in particular by the steps 810, 830, and 840. The remaining steps are similar and denoted with the same number as in Figure 7. In a first stage, if MVfruc is unavailable in step 720, a replacement motion vector is padded in the motion vector candidate list in step 810. The padded motion vector might be a pre-defined motion vector, such as zero motion vector (i.e. motion vector with coordinates (0,0)), or the padded motion vector could be MVN which is available to construct the motion vector candidate list. This guarantees that the position of the FRUC candidate and the second entry of the list are fixed. Identifying whether or not MVFruc is available is also computationally demanding. Therefore, adding a padding motion vector guarantees that the other candidates in the list are not shifted up to fill the empty position from unavailable MVFruc. MV1 or any of the following motion vectors is inserted to the candidate list without checking if it is identical to MVfruc or not. As a result, the second candidate in the motion vector list is always MV1 if it is available for insertion into candidate list and it can be obtained without obtaining the MVfruc. On the other hand, the identity/similarity check may be performed for the motion vectors which are not obtained by the template matching. This is shown in the second stage of the procedure in Figure 8. In step 830, it is judged whether or not MV1 is identical to any of motion vectors (except for the first candidate in the list which is MVFruc, i.e. a motion vector obtained by template matching) in candidate list. Similarly, in step 840 it is judged whether or not MV2 is identical to any motion vectors (except for first candidate in list-MVfruc) in candidate list.

The goal of the first step (stage 1 ) is to dedicate the first place in the motion vector candidate list to MVfruc (or padding MVfruc if actual MVfruc is not available). Therefore, the other candidates are not shifted in the list depending on the condition if MVfruc is available or not. It is noted that the MVfruc does not need to be on the first position in the list. Any position may be reserved for the MVfruc, as long as this position is known. With the padding approach it can be ensured that the remaining candidates stay independent of the value of the MVfruc and in particular, on its availability. The term "padding" here refers to a replacement with a default or predetermined value.

In stage 1 it is assumed that obtaining the information whether MVfruc is available or not is also computationally demanding, which is usually the case. MVfruc is marked as unavailable if for instance the template matching process cannot obtain a patch of samples in the reference picture that is similar to the template according to a similarity metric. In other words availability of MVfruc can be determined after the application of MV derivation/refinement with template matching operation. However if the determination of the availability is not computationally demanding in certain implementations of MVfruc derivation, step (stage) 1 can be omitted (no padding MV is inserted into the list if MVfruc is not available). In the second step (stage) the redundancy check between MVfruc and other motion vector candidates are skipped. Therefore inclusion of MV1 or MV2 or any other motion vector into the list does not depend on the value of MVfruc. The redundancy check identifies whether two motion vectors other than MVfruc are identical or not. However there might be other implementations (e.g. a vector might be tagged as redundant if it is very close in value to a second vector according to a distance metric).

Alternatively, MV1 , MV2, MVN can be obtained and inserted into the MV list first, then MVfruc is inserted into the first position of the motion vector list. Since according to Step 1 , the position of MVfruc is now fixed. This enables to construct the list while still the result of MVfruc determination are awaited (parallel processing).

Figure 9 shows such example. Figure 9 differs from Figure 8 by the sequence of performing the steps. In particular, steps 710, 810 and 730 are performed after terminating the motion vector construction for the motion vector candidates which are not obtained by template matching, i.e. in the Example of Figure 9 in steps 740 and 745 in the case of "Yes" when the list is short of only 1 candidate, where the last empty slot in the list is reserved for the MVfruc candidate.

The maximal number of MV candidates MAX_size can be signaled within the bitstream or defined in a standard. In particular, the signaling may be performed on sequence level or picture level parameter set within the bitstream. For instance in H.265/HEVC, the picture parameter set and the sequence parameter set may accommodate the MAX_size parameter as one of the parameters signaled for a plurality of pictures in a video sequence or for the entire video sequence. However, the present invention is not limited thereby. MAX_size may be signaled anywhere in the bitstream and on any level or granularity. In summary, at the decoder side, if a motion vector predictor candidate MVfruc is obtained using computationally intensive method than other motion vector predictor candidates, the motion vector candidate MVfruc is to be inserted in a pre-defined fixed position in the MV predictor list. The presence of the other motion vector predictor candidates in the motion vector list shall not depend on the value of MVfruc. At the encoder side, if a motion vector predictor candidate MVfruc is obtained using computationally intensive method than other motion vector predictor candidates, the motion vector candidate MVfruc is to be inserted in a pre-defined fixed position in the motion vector predictor list. The presence of the other motion vector predictor candidates in the motion vector list shall not depend on the value of MVfruc.

For example, at least one predefined index within the set of candidate motion vectors is reserved for a candidate motion vector obtained by template matching. The motion vector obtained by the template matching may for instance be a motion vector for which the motion vector refinement was applied as described with reference to Figure 5. Since the template matching employed in motion vector refinement is complex, by reserving a particular index for such motion vector, the decoder may insert it into the motion vector list at any time. Moreover, it does not have to be inserted into list at all without influencing the further list construction in some cases.

If the candidate motion vector obtained by template matching is not available, motion vector assigned to the predefined index may be set to a predefined value. For example, the predefined index may be the first index in the list. However, any index may be predefined. The predefining may be included in the standard so that encoder and decoder operate in the same manner as soon as they operate according to the standard. Alternatively, it may be signaled in the bitstream. For instance, the predefined value is a zero motion vector.

However, this is only an example and, in general, any value which is obtained in a predefined manner such as taking a motion vector of a particular temporally or spatially adjacent block may be applied to obtain such predetermined value.

The invention is not limited to a single entry in the list for a motion vector obtained by template matching. In general one or more such candidates may be provided in respective one or more of the list entries. This may be relevant, for instance, when the different candidates are obtained with template matching in different reference pictures. In such cases, it may be advantageous that the processing circuitry is further configured to generate the set of candidate motion vectors so that indices assigned to a plurality of respective candidates obtained by template matching are also independent of each other. In this way, there is no unnecessary dependence between the candidates obtained by template matching which is beneficial since even if one of them is selected, the remaining candidates do not have to be calculated at the decoder but may be padded.

According to an exemplary implementation, the processing circuitry is further configured to, during the generation of the set, check whether currently inserted motion vector is already included in the set; and include the currently inserted motion vector into the set only if a similar motion vector is not already included in the set minus any candidate motion vector obtained by template matching. In other words, the redundancy check (also called pruning of the set/list) is performed for the motion vectors to be included into the list, but does not compare them with the candidates in the list obtained by template matching. As already discussed above, this approach avoids any dependencies of the list construction from the motion vector candidates obtained by the template matching. Thus, at the decoder, if the motion vector obtained by template matching is selected by the parsed index, it is also obtained. If it is not selected, then it does not have to be obtained since it is not necessary for determining the other entries of the list that need application of the template matching. In one embodiment, for instance as exemplified in Figure 9, the processing circuitry is configured to generate the set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector except for any of j candidate motion vectors obtained by template matching, the index having respective values from the first index up to the last but j indexes, j being an integer larger than zero; and to assign the last j indexes to the respective j candidate motion vectors obtained by template matching.

Alternatively, the processing circuitry may be configured to: generate the set of candidate motion vectors for the current prediction block by: - assigning an index to candidate motion vectors except for any of j candidate motion vectors obtained by template matching, the index having respective values from the j- th plus one index up to the last index, j being an integer larger than zero,

- after the assigning of index to the candidate motion vectors not obtained by template matching, assigning to the candidate motion vectors obtained by template matching respective indexes from the first index to the j-th index.

The index assigning at least for the candidate motion vectors not obtained by template matching here includes:

- checking whether currently assigned motion vector is already included in the set and - assigning the index to the currently assigned motion vector only if a similar motion vector is not already included in the set.

This approach enables an efficient implementation of the above embodiment. In one implementation, the value of the first index might start from 0. However, the present disclosure is not limited by any particular starting value of the index, which is a mere implementation issue.

In the above examples, the complexity could be reduced by avoiding dependency of the motion vector candidate values on the motion vector derived using template matching. However as a side effect, the coding gain might be reduced since redundant motion vector candidates (or dummy/padding MVs) are added to the motion vector predictor candidate list. In order to avoid such situation, (Figure 10) MV2 is added to the motion vector list, if MVfruc is not available or if MVfruc is identical to any of the candidates in the list that do not apply template matching. In other words, according to a second embodiment, list construction process is continued even after the MV predictor list is full.

This enables the following operation of the decoder: As also described above for the first embodiment, if the candidate obtained without template matching is selected by an index which is coded in the bitstream, no FRUC operation needs to be performed. If the candidate obtained by template matching is selected, FRUC operation needs to be performed to obtain MVfruc. According to the second embodiment, MVfruc is thus replaced with a better predictor if it is not available and/or redundant. This is illustrated in Figure 10. In particular, in step 1010 if MVfruc is not available (and possibly padding motion vector has been inserted) or another motion vector in the list that does not apply template matching is identical to MVfructhen a motion vector candidate that is not obtained by template matching is inserted in the list to replace MVfruc (provided that the candidate is not identical to candidates already in the list which are not obtained by template matching). As a specific example, firstly the operations in figure 8 are applied in the following order 710, 720, 730, 740, 750, 830, 735 and 745, after which the Max_size of 2 is reached and the operation is terminated. According to the example, since the said specific path of operations are applied, it is known that the MV predictor list includes MVfruc and MV1 candidates. According to figure 10, further steps are applied to modify the already completed predictor list (the size limit of the list was reached, which was assumed to be 2 in the example), namely the step 1010 checks whether MVfruc is available (since step 810 is not applied, MVfruc is available in this example) and if it is identical to any of the candidates in the list that do not apply template matching (MV1 in this case). Then MV2 is obtained 1020 from top and top-right spatial block neighbor. MV2 is assumed to be available here. If MV2 is identical (similar) to MVfruc in step 1030, then more candidates are checked by proceeding to step(s) 1040. If MV2 is not identical (similar) to MVfruc in step 1030, MV2 is inserted into the candidate list in step 1050, for instance in the position of the MVfruc or the other motion vector identical to MVfruc. Once the MVfruc (or padding MV) is replaced, the operation terminates in step 1060.

The second embodiment maintains the complexity reduction benefits of the first embodiment. In figure 8 the first candidate is MVfruc (or a padding MV if MVfruc not available. If candidate 2 is selected - maybe 50% of the time - the FRUC operation does not need to be performed. The complexity of obtaining candidate 1 in the list (motion vector index used primarily for motion vector obtained by template matching in figure 8) is increased only slightly in order to improve the coding gain. The same can be said for figure 9, where only the position of the MVfruc is different (last position instead of first position). According to an implementation of the second embodiment, if the candidate motion vector obtained by template matching is not available or has a value already included in the set, the predefined index is assigned a motion vector of a previously decoded prediction block or a motion vector obtained as a function of one or more motion vectors of respective previously decoded prediction blocks. The function, for example, can be a motion vector averaging of or motion vectors.

It is noted that the predefined index is primarily still reserved for the motion vector obtained by template matching. However, if this motion vector is unavailable, or same as another already included motion vector, then the index is assigned to another motion vector not obtained by the template matching but rather obtained in a predefined manner such as inheriting from an adjacent block (temporal or spatial neighboring block) or as a function of motion vectors inherited from the adjacent blocks. The function may be, for instance, an average or a weighted average or the like. This may be understood as a kind of adaptive padding performed according to a rule which defines the order of the motion vector candidates to be tested for insertion into the list. The rule may be the same as the rule used to construct the list in general as described above with reference to Figures 8 and 9.

The above two embodiments target the decoding operation and they allow reducing especially the decoding time when motion vector not obtained by template matching is selected by the index parsed from the bitstream. It is noted that this approach still requires application at the encoder, too and is thus also relevant for the encoder. The encoder implementation in this manner contributes to the complexity reduction at the decoder.

However, in the encoder itself the encoding time reduction is not achieved since the encoder needs to check both alternatives in the motion vector list in order to signal the best option to the decoder, i.e. it needs to check and thus also provide the motion vector obtained by template matching and decide whether or not to select it. In order to further enable encoder complexity reduction, according to a third embodiment, a flag is signaled in the bitstream to indicate that the MVfruc candidate is not inserted into the motion vector list. The flag may be signaled on a slice, CTU, frame or sequence level.

This flag enables the encoder, for instance for certain video portions (slices orframes or groups of frames etc), to control the encoding time and complexity by using the flag. For instance, the decoder may switch on or off the presence of the motion vector candidate obtained for the current block by the template matching in the motion vector prediction candidate list. Thus, the encoder may control its complexity. This control may be performed based on the rate- distortion-complexity optimization. However, this is only an example. The control may be performed also according to different parameters, for instance based on the motion character and speed of the coded content, or the like. An encoder, as an example, can decide not to insert the MVfruc candidate if hectic motion is detected based on the distribution of the motion vectors, since template matching is known to perform not good in the cases of complex motion. As another example, the MVfruc candidate can be disabled, if the reference frame is far away from the current frame in terms of time difference or a frame number (Picture order count) difference.

In accordance with this third embodiment, the processing circuitry of the decoder is configured to parse from the bitstream a flag indicating for an image data unit whether or not candidate motion vectors obtained by template matching are allowed to be inserted into the set. The image data unit here is for instance: - one or a plurality of frames, e.g. the signalling of the flag may be included in a picture parameter set or a sequence parameter set;

- slice or tile, which are independently decodable image (frame portions). Independently here refers to the entropy coding and/or spatial prediction and the independence means independence on other slice / tiles. In particular, the signaling may be done via slice header or tile signaling information;

- CTB (coding tree block, also called coding tree unit, CTU), which is the largest coding unit which may be further hierarchically partitioned into coding units that are finally predicted by inter or intra prediction as described above with reference to Figures 1 and 2.

It is noted that in the above embodiments, the predefined index reserved for a candidate motion vector obtained by template matching for a previously decoded prediction block may be in particular the first or the last index within the set. However, the invention is equally applicable for any other indexes, too. In the above embodiments, the size of motion vector list is usually shown as 2, which is for the sake of easy illustration. However, in practice, the motion vector candidate list length can be different. The redundancy checking operation (identifying whether two motion vectors are identical, similar or unnecessary) can be implemented in different ways. For instance by comparing the identity of the two motion vectors or by comparing their difference to a certain threshold, by using different metrics for similarity, or the like. The positions of MVFruc and other motion vectors in the list as shown can be changed in the motion vector list. However it should be clear to the decoder which positions in the list are reserved for them (without performing FRUC operation) so that the corresponding operation of the encoder and the decoder is ensured. In general, the above methods and apparatuses may be applicable for construction of a motion vector candidate list with any two types of motion vector candidates of which one is obtained with substantially higher complexity than the other one. In the above examples, the high- complexity approach was template matching as compared to calculating a function of motion vectors inherited from the neighboring blocks or merely taking such motion vectors as respective candidates. However, the above disclosure may be directly applied to any other types of motion vector and provides the advantage of reducing delay and/or complexity of the list construction. As an example, it is possible that one of the motion vector candidates might require averaging of multiple different motion vectors. In one implementation of the decoder, the processing circuitry evaluates an index extracted from the bitstream. If it is an index which is not reserved to be assigned to the motion vector obtained by template matching, then the processing circuitry does not obtain a motion vector with template matching but rather constructs the list of motion vector candidates according to the above mentioned predefined rule, independently of the such motion vector, and takes the motion vector pointed to by the index in the constructed list as a predictor for the current motion vector. On the other hand, if the extracted index is the index associated with the motion vector determined by template matching, the decoder determined the motion vector by template matching and uses it as the predictor. If - in case the extracted index is reserved to a motion vector to be determined by template matching - the motion vector is unavailable, then the processing circuitry may handle according to a predefined rule and replace the unavailable motion vector with a predefined value or a value of motion vectors determined based on the previously processed one or more adjacent blocks.

The above portion of the description focuses on the decoder side. However, the present disclosure also provides the corresponding encoder.

The encoder serves for encoding a current prediction block into a bitstream and includes a processing circuitry configured to: generate a set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector based on a predefined rule, according to which indexes assigned to candidate motion vectors not obtained by template matching are independent of a value / availability ofany candidate motion vector obtained by template matching; determine a motion vector predictor for a motion vector of the current prediction block as one of the candidate motion vectors associated with an index; include into the bitstream the index of the determined motion vector predictor for the current prediction block; and encode the prediction block based on the motion vector for which the motion vector predictor is determined.

The decoder and the encoder may include the processing circuitry 1 100 as illustrated in Figure 1 1 . The processing circuitry may include any hardware and the configuration may be implemented by any kind of programming or hardware design of a combination of both. For instance, the processing circuitry may be formed by a single processor such as general purpose processor with the corresponding software implementing the above steps. On the other hand, the processing circuitry may be implemented by a specialized hardware such as an ASIC (Application-Specific Integrated Circuit) or FPGA (Field-Programmable Gate Array) of a DSP (Digital Signal Processor) or the like. The processing circuitry may include one or more of the above mentioned hardware components interconnected for performing the above motion vector candidate list construction and pruning. The processing circuitry 1100 includes computation logic which implements construction of the motion vector predictor candidate list 1110 and motion vector prediction 1120. These two functionalities may be implemented on the same piece of hardware or may be performed by separate units of hardware such as list construction unit 1110 and motion vector prediction unit 1120. The processing circuitry 1100 may be communicatively connected to an external memory 1150. Moreover, the processing circuitry 1100 may further include an internal memory 1140. The processing circuitry may be embodied on a single chip as an integrated circuit. The internal memory 1 140 may serve for storing the list of motion vectors whereas the external memory may store additional parameters, reference pictures for performing template matching, or the like. It is further noted that the index signaled in the bitstream to identify a candidate within the list may also be provided by the motion vector prediction circuitry to (encoder) an entropy coding circuitry or obtained therefrom (decoder). The entropy coding circuitry may be implemented as a part of the processing circuitry or separately.

It is noted that the processing circuitry may implement further functions of the encoder and/or decoder described with reference to Figures 1 and 2. The internal memory may be an on-chip memory such as a cache. Chip memory is advantageously implemented on the encoder/decoder chip to speed up computations. Since the size of the chip is limited, the on- chip memory is usually small. On the other hand, the external memory can be very large in size, however the access to external memory consumes more energy and the access is typically slower. Usually the all necessary information is retrieved from the external memory to on-chip memory before the computations are performed. The term "prediction block" employed above refers to the current block which is to be predicted. It is a block within the image which may be obtained by subdividing the image into equally sized or differently sized (for instance by hierarchical partitioning of a coding tree unit, CTU into the smaller units) blocks. The block may be square or more generally rectangular as these are the typical shapes also employed in current encoders / decoders. However, the present disclosure is not limited by any size / shape of the block.

The apparatus including the processing circuit may be the encoder or decoder or even an apparatus including such encoder or decoder, for instance a recording device and/or a playback device. The present disclosure further provides the corresponding methods which perform steps as already described above with reference to the operations implemented by the processing circuitry. For example, the present disclosure further provides a method for decoding a video image including decoding a current prediction block from a bitstream. The method includes generating a set of candidate motion vectors for the current prediction block by assigning to each index a candidate motion vector based on a predefined rule, according to which indexes assigned to candidate motion vectors which are not obtained by template matching are independent of a value and/or availability of any candidate motion vector obtained by template matching. Moreover, the method further includes parsing from the bitstream an index for the current prediction block; determining a motion vector predictor for the prediction block as the candidate motion vector associated with the parsed index; and decoding the prediction block based on the determined motion vector predictor.

Moreover, the present disclosure also provides a method for encoding a video image corresponding to the above summarized method for decoding and also including the generating a set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector based on a predefined rule, according to which indexes assigned to candidate motion vectors not obtained by template matching are independent of a value and/or availability of any candidate motion vector obtained by template matching. Moreover, the method further comprises determining a motion vector predictor for a motion vector of the current prediction block as one of the candidate motion vectors associated with an index; including into the bitstream the index of the determined motion vector predictor for the current prediction block; and encoding the prediction block based on the motion vector for which the motion vector predictor is determined. The result of the encoding is a bitstream.

The motion vector determination with sample padding as described above can be implemented as a part of encoding and/or decoding of a video signal (motion picture). However, the motion vector determination may also be used for other purposes in image processing such as movement detection, movement analysis, or the like without limitation to be employed for encoding / decoding.

The motion vector determination may be implemented as an apparatus. Such apparatus may be a combination of a software and hardware. For example, the motion vector determination may be performed by a chip such as a general purpose processor, or a digital signal processor (DSP), or a field programmable gate array (FPGA), or the like. However, the present invention is not limited to implementation on a programmable hardware. It may be implemented on an application-specific integrated circuit (ASIC) or by a combination of the above mentioned hardware components. The motion vector determination may also be implemented by program instructions stored on a computer readable medium. The program, when executed, causes the computer to perform the steps of the above described methods. The computer readable medium can be any medium on which the program is stored such as a DVD, CD, USB (flash) drive, hard disc, server storage available via a network, etc.

The encoder and/or decoder may be implemented in various devices including a TV set, set top box, PC, tablet, smartphone, or the like, i.e. any recording, coding, transcoding, decoding or playback device. It may be a software or an app implementing the method steps and stored / run on a processor included in an electronic device as those mentioned above. Summarizing, the present disclosure relates to motion vector determination and in particular to construction of a list of candidate motion vectors for the purpose of motion vector prediction. In particular, if a motion vector predictor candidate is obtained using computationally intensive method than other motion vector predictor candidates, the motion vector candidate is to be inserted in a pre-defined fixed position in the MV predictor list. The presence of the other motion vector predictor candidates in the motion vector list does not depend on the value or the availability of the motion vector obtained by the computationally intensive approach.

Claims

An apparatus for decoding a video image including decoding a current prediction block from a bitstream, the apparatus including a processing circuitry configured to: generate a set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector based on a predefined rule, according to which indexes assigned to candidate motion vectors not obtained by template matching are independent of a value and/or availability of any candidate motion vector obtained by template matching; parsing from the bitstream an index for the current prediction block; determine a motion vector predictor for the prediction block as the candidate motion vector associated with the parsed index; and decode the prediction block based on the determined motion vector predictor.

The apparatus according to claim 1 , wherein at least one predefined index within the set is reserved for a candidate motion vector obtained by template matching.

The apparatus according to claim 2, wherein if the candidate motion vector obtained by template matching is not available, motion vector assigned the predefined index is set to a predefined value.

The apparatus according to claim 3, wherein the predefined value is a zero motion vector.

The apparatus according to claim 2 wherein if the candidate motion vector obtained by template matching is not available or has a value already included in the set or has a value similar to a value already included in the set, the predefined index is assigned a motion vector of a previously decoded prediction block or a motion vector obtained as a function of one or more motion vectors of respective predefined previously decoded prediction blocks, the similarity being measured by thresholding difference between the value obtained by template matching and the values already included in the set.

The apparatus according to any of claims 1 to 5, wherein the processing circuitry is further configured to generate the set of candidate motion vectors so that indices assigned to a plurality of respective candidates obtained by template matching are also independent of each other's motion vector value and/or availability.

The apparatus according to any of claims 1 to 6, wherein the processing circuitry is further configured to, during the generation of the set, check whether currently inserted motion vector is already included in the set; and include the currently inserted motion vector into the set only if a similar motion vector is not already included in the set minus any candidate motion vector obtained by template matching.

The apparatus according to any of claims 1 to 7, wherein the processing circuitry is configured to: generate the set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector except for any of j candidate motion vectors obtained by template matching, the index having respective values from the first index up to the last but j indexes, j being an integer larger than zero; and assign the last j indexes to the respective j candidate motion vectors obtained by template matching. The apparatus according to claim 1 or 2, wherein the processing circuitry is configured to: generate the set of candidate motion vectors for the current prediction block by:

- assigning an index to candidate motion vectors except for any of j candidate motion vectors obtained by template matching, the index having respective values from the j-th plus one index up to the last index, j being an integer larger than zero,

- after the assigning of index to the candidate motion vectors not obtained by template matching, assigning to the candidate motion vectors obtained by template matching respective indexes from the first index to the j-th index; wherein the index assigning at least for the candidate motion vectors not obtained by template matching includes:

- checking whether currently assigned motion vector is already included in the set and

- assigning the index to the currently assigned motion vector only if a similar motion vector is not already included in the set.

The apparatus according to any of claims 2 to 9, wherein the predefined index reserved for a candidate motion vector obtained by template matching for a previously decoded prediction block is the first or the last index within the set.

The apparatus according to any of claims 1 to 10, wherein the processing circuitry is configured to: parse from the bitstream a flag indicating for an image data unit whether or not candidate motion vectors obtained by template matching are allowed to be inserted into the set.

The apparatus according to claim 1 1 , wherein the image data unit is slice or coding tree block, the flag is signaled in a sequence parameter set.

3. An apparatus for encoding a video image including encoding a current prediction block into a bitstream, the apparatus including a processing circuitry configured to: generate a set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector based on a predefined rule, according to which indexes assigned to candidate motion vectors not obtained by template matching are independent of a value and/or availability of any candidate motion vector obtained by template matching; determine a motion vector predictor for a motion vector of the current prediction block as one of the candidate motion vectors associated with an index; and including into the bitstream the index of the determined motion vector predictor for the current prediction block; encode the prediction block based on the motion vector for which the motion vector predictor is determined.

4. A method for decoding a video image including decoding a current prediction block from a bitstream, the method including the steps of: generating a set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector based on a predefined rule, according to which indexes assigned to candidate motion vectors not obtained by template matching are independent of a value and/or availability of any candidate motion vector obtained by template matching; parsing from the bitstream an index for the current prediction block; determining a motion vector predictor for the prediction block as the candidate motion vector associated with the parsed index; and decoding the prediction block based on the determined motion vector predictor. A method for encoding a video image including encoding a current prediction block into a bitstream, the method including the steps of: generating a set of candidate motion vectors for the current prediction block by assigning an index to each candidate motion vector based on a predefined rule, according to which indexes assigned to candidate motion vectors not obtained by template matching are independent of a value and/or availability of any candidate motion vector obtained by template matching; determining a motion vector predictor for a motion vector of the current prediction block as one of the candidate motion vectors associated with an index; including into the bitstream the index of the determined motion vector predictor for the current prediction block; and encoding the prediction block based on the motion vector for which the motion vector predictor is determined.

Computer readable medium storing instructions which when executed on a processor cause the processor to perform the method according to any of claims 14 or 15.