EP3685583A1 - Template update for motion vector refinement - Google Patents
Info
- Publication number
- EP3685583A1 (Application EP17781115.5A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- motion vector
- template
- image
- updated
- prediction block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/533—Motion estimation using multistep search, e.g. 2D-log search or one-at-a-time search [OTS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/56—Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/563—Motion estimation with padding, i.e. with filling of non-object values in an arbitrarily shaped picture block or region for estimation purposes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/57—Motion estimation characterised by a search window with variable size or shape
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/573—Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
Definitions
- the present invention relates to the field of video coding and in particular to motion vector refinement applicable in an inter-prediction.
- a picture of a video sequence is subdivided into blocks of pixels and these blocks are then coded. Instead of coding a block pixel by pixel, the entire block is predicted using already encoded pixels in the spatial or temporal proximity of the block.
- the encoder further processes only the differences between the block and its prediction.
- the further processing typically includes a transformation of the block pixels into coefficients in a transformation domain.
- the coefficients may then be further compressed by means of quantization and further compacted by entropy coding to form a bitstream.
- the bitstream further includes any signaling information which enables the decoder to decode the encoded video.
- the signaling may include settings concerning the encoder settings such as size of the input picture, frame rate, quantization step indication, prediction applied to the blocks of the pictures, or the like.
- Temporal prediction exploits temporal correlation between pictures, also referred to as frames, of a video.
- the temporal prediction is also called inter-prediction, as it is a prediction using the dependencies between (inter) different video frames.
- a block being encoded is also referred to as a current block.
- a reference picture is not necessarily a picture that precedes, in the displaying order of the video sequence, the current picture in which the current block is located.
- the encoder may encode the pictures in a coding order different from the displaying order.
- a co-located block in a reference picture may be determined.
- the co-located block is a block which is located in the reference picture on the same position as the current block in the current picture.
- Such prediction is accurate for motionless picture regions, i.e. picture regions without movement from one picture to another.
- motion estimation is typically employed when determining the prediction of the current block.
- the current block is predicted by a block in the reference picture, which is located in a distance given by a motion vector from the position of the co-located block.
- the motion vector may be signaled in the bitstream.
- the motion vector itself may be estimated at the encoder and decoder. The motion vector estimation may be performed based on the motion vectors of the neighboring blocks in spatial and/or temporal domain.
- the prediction of the current block may be computed using one reference picture or by weighting predictions obtained from two or more reference pictures.
- the reference picture may be an adjacent picture, i.e. the picture immediately preceding and/or immediately following the current picture in the display order, since adjacent pictures are most likely to be similar to the current picture.
- the reference picture may be also any other picture preceding or following the current picture in the displaying order and preceding the current picture in the bitstream (decoding order). This may provide advantages for instance in case of occlusions and/or non-linear movement in the video content.
- the reference picture identification may thus be also signaled in the bitstream.
- a special mode of the inter-prediction is a so-called bi-prediction in which two reference pictures are used in generating the prediction of the current block.
- two predictions determined in the respective two reference pictures are combined into a prediction signal of the current block.
- the bi-prediction may result in a more accurate prediction of the current block than the uni-prediction, i.e. prediction only using a single reference picture.
- the more accurate prediction leads to smaller differences between the pixels of the current block and the prediction (referred to also as "residuals"), which may be encoded more efficiently, i.e. compressed to a shorter bitstream.
- more than two reference pictures may be used to find respective more than two reference blocks to predict the current block, i.e. a multi-reference inter prediction can be applied.
- the term multi-reference prediction thus includes bi-prediction as well as predictions using more than two reference pictures.
- the resolution of the reference picture may be enhanced by interpolating samples between pixels. Fractional pixel interpolation can be performed by weighted averaging of the closest pixels. In the case of half-pixel resolution, for instance, a bilinear interpolation is typically used. Other fractional pixels are calculated as an average of the closest pixels, weighted by the inverse of the distance between each of those closest pixels and the pixel being predicted.
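The half-pel averaging described above can be sketched as follows. This is an illustrative example only: the function name and the simple two-tap averaging are assumptions for the sketch, whereas practical codecs use longer interpolation filters.

```python
import numpy as np

def interpolate_half_pel(ref):
    """Upsample a reference block to half-pel resolution by bilinear
    averaging: each new sample is the mean of its closest
    integer-position neighbours."""
    h, w = ref.shape
    up = np.zeros((2 * h - 1, 2 * w - 1), dtype=np.float64)
    up[::2, ::2] = ref                                    # integer positions
    up[::2, 1::2] = (ref[:, :-1] + ref[:, 1:]) / 2.0      # horizontal half-pels
    up[1::2, ::2] = (ref[:-1, :] + ref[1:, :]) / 2.0      # vertical half-pels
    up[1::2, 1::2] = (ref[:-1, :-1] + ref[:-1, 1:] +
                      ref[1:, :-1] + ref[1:, 1:]) / 4.0   # diagonal half-pels
    return up

block = np.array([[0.0, 4.0], [8.0, 12.0]])
print(interpolate_half_pel(block))
```

For the 2 x 2 input above, the result is a 3 x 3 grid in which every interleaved sample is the average of its integer-position neighbours.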
- the terms "pixel" and "sample" are employed interchangeably. From an image sensor pixel, a single value may be read out, namely brightness (luminance). However, in color imaging, a plurality of values per pixel pertaining to different color components may be read out or provided later by interpolation.
- the motion vector estimation is a computationally complex task in which a similarity is calculated between the current block and the corresponding prediction blocks pointed to by candidate motion vectors in the reference picture.
- the search region includes M x M samples of the image and each of the M x M candidate sample positions is tested.
- the test includes calculation of a similarity measure between the N x N reference block C and a block R, located at the tested candidate position of the search region.
- a frequently used similarity measure is the SAD (sum of absolute differences): SAD(x, y) = Σ_{i=0..N-1} Σ_{j=0..N-1} |R_{x+i, y+j} - C_{i, j}|
- x and y define the candidate position within the search region, while indices i and j denote samples within the reference block C and candidate block R.
- the candidate position is often referred to as block displacement or offset, which reflects the representation of the block matching as shifting of the reference block within the search region and calculating a similarity between the reference block C and the overlapped portion of the search region.
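The block matching described above, i.e. shifting the reference block within the search region and computing SAD at every offset, can be sketched in a few lines. Function names and the exhaustive full search are illustrative assumptions for the example.

```python
import numpy as np

def sad(c, r):
    """Sum of absolute differences between reference block C and
    candidate block R (the similarity measure in the text)."""
    return np.abs(c.astype(np.int64) - r.astype(np.int64)).sum()

def block_matching(search_region, ref_block):
    """Test every candidate offset (x, y) of the N x N reference block
    inside the M x M search region and return the offset with the
    lowest SAD together with that cost."""
    m = search_region.shape[0]
    n = ref_block.shape[0]
    best = None
    for y in range(m - n + 1):
        for x in range(m - n + 1):
            cost = sad(ref_block, search_region[y:y + n, x:x + n])
            if best is None or cost < best[0]:
                best = (cost, (x, y))
    return best[1], best[0]

region = np.arange(36).reshape(6, 6)
target = region[2:4, 3:5]              # block actually located at offset (3, 2)
offset, cost = block_matching(region, target)
print(offset, cost)
```

Since the target block was cut out of the search region itself, the search recovers its true offset with a cost of zero.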
- the number of candidate motion vectors is usually reduced by limiting the candidate motion vectors to a certain search space.
- the search space may be, for instance, defined by a number and/or positions of pixels surrounding the position in the reference picture corresponding to the position of the current block in the current image.
- the best matching block R is the block on the position resulting in the lowest SAD, corresponding to the largest similarity with reference block C.
- the candidate motion vectors may be defined by a list of candidate motion vectors formed by motion vectors of neighboring blocks.
- Motion vectors are usually at least partially determined at the encoder side and signaled to the decoder within the coded bitstream.
- the motion vectors may also be derived at the decoder.
- the current block is not available at the decoder and cannot be used for calculating the similarity to the blocks to which the candidate motion vectors point in the reference picture. Therefore, instead of the current block, a template is used which is constructed out of pixels of already decoded blocks. For instance, already decoded pixels adjacent to the current block may be used.
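Constructing a template from already-decoded pixels adjacent to the current block can be sketched as follows. The strip width `t` and the L-shape (rows above plus columns to the left) are illustrative design choices, not prescribed by the text.

```python
import numpy as np

def build_template(decoded, y, x, n, t=2):
    """Build a template for the n x n current block at (y, x) from
    already-decoded neighbouring pixels: a strip of t rows above the
    block and t columns to its left (an L-shape)."""
    top = decoded[y - t:y, x:x + n]      # rows directly above the block
    left = decoded[y:y + n, x - t:x]     # columns directly left of the block
    return top, left

decoded = np.arange(100).reshape(10, 10)
top, left = build_template(decoded, 4, 4, 3)
print(top.shape, left.shape)
```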
- Such motion estimation provides an advantage of reducing the signaling: the motion vector is derived in the same way at both the encoder and the decoder and thus, no signaling is needed. On the other hand, the accuracy of such motion estimation may be lower.
- a motion vector derivation may include selection of a motion vector from the list of candidates.
- Such a selected motion vector may be further refined for instance by a search within a search space.
- the search in the search space is based on calculating a cost function for each candidate motion vector, i.e. for each candidate position of the block to which the candidate motion vector points.
- the present disclosure provides motion vector prediction based on updating the template used for inter prediction, in particular for bi-prediction or, in general, multi-frame prediction.
- the present disclosure provides a technique in which a motion vector for a prediction block is determined. Based on a provided initial motion vector and a provided template, the motion vector is refined by template matching. The refined motion vector points to image samples, which are used to update the template. Using the refined motion vector and the updated template, the motion vector is further refined by another iteration of template matching.
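The refine-then-update loop described above can be sketched as follows. This is an illustrative single-reference sketch: the function names, the small square search space, and the SAD cost are assumptions for the example, not taken from the claims. With a single reference the updated template simply becomes the matched block; in multi-reference prediction the update combines several pointed-to blocks.

```python
import numpy as np

def refine(template, picture, pos, radius=2):
    """One template-matching pass: return the position within a small
    square search space around `pos` that minimises SAD to `template`."""
    n = template.shape[0]
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = pos[0] + dy, pos[1] + dx
            cand = picture[y:y + n, x:x + n]
            if cand.shape != template.shape:   # skip positions off the picture
                continue
            cost = np.abs(cand - template).sum()
            if best is None or cost < best[0]:
                best = (cost, (y, x))
    return best[1]

def refine_with_template_update(picture, init_pos, template, iterations=2):
    """Sketch of the disclosed loop: refine the motion vector by template
    matching, rebuild the template from the samples the refined vector
    points to, then match again with the updated template."""
    n = template.shape[0]
    pos = init_pos
    for _ in range(iterations):
        pos = refine(template, picture, pos)                  # refine the MV
        template = picture[pos[0]:pos[0] + n,                 # update template
                           pos[1]:pos[1] + n].astype(float)
    return pos
```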
- an apparatus for determination of a motion vector for an image prediction block including a processing circuitry configured to: determine a refinement of an initial motion vector for the image prediction block by template matching with an initial template to generate a refined motion vector; generate an updated template based on the image samples pointed to by the refined motion vector; determine an updated motion vector for the image prediction block by template matching with the updated template in a search space including a plurality of candidate motion vector positions.
- Such apparatus may provide an advantage of more precise motion vector refinement since the template is adapted to the picture portions pointed to by the refined motion vectors.
- the processing circuitry may be any combination of hardware and/or software including one or more hardware pieces.
- the processing circuit may be configured to iteratively, over i being an integer larger than 1, repeat the following steps of the i-th iteration: generating an i-th update of the template based on image samples pointed to by a refined motion vector obtained in the (i-1)-th iteration; and determining an i-th update of the refined motion vector for the image prediction block by template matching with the i-th update of the template.
- Performing more iterations may further improve the accuracy of the motion vector refinement and thus improve the prediction and thus also encoding performance.
- the processing circuitry is configured to determine, for the image prediction block, a refinement of a first initial motion vector pointing to a first picture and a refinement of a second initial motion vector pointing to a second picture by template matching with an initial template to generate a respective first refined motion vector and second refined motion vector; generate an updated template as a function of image samples pointed to by the first refined motion vector and second refined motion vector; and determine, for the image prediction block, a first updated motion vector and a second updated motion vector by template matching with the updated template in the respective first picture and second picture.
- the processing circuit may be configured to iteratively, over i being an integer larger than 1, repeat the following steps of the i-th iteration: generating an i-th update of the template based on image samples pointed to by the first refined motion vector and the second refined motion vector obtained in the (i-1)-th iteration; and determining an i-th update of the first refined motion vector and the second refined motion vector for the image prediction block by template matching with the i-th update of the template.
- the processing circuitry is configured to: obtain a first initial motion vector and a second initial motion vector; determine, for the image prediction block, a refinement of the first initial motion vector pointing to a first picture by template matching with an initial template in the first picture to generate a respective first refined motion vector; generate an updated template as a function of image samples pointed to by the first refined motion vector and the second initial motion vector; determine, for the image prediction block, a refinement of the second initial motion vector pointing to a second picture by template matching with the updated template in the second picture to generate a second refined motion vector; and generate the updated template as a function of image samples pointed to by the first refined motion vector and the second refined motion vector.
- the processing circuit can be configured to iteratively, over i being an integer larger than 1, repeat the following steps of the i-th iteration: determine, for the image prediction block, an i-th refinement of the first motion vector pointing to the first picture by template matching with the updated template in the first picture; generate an i-th first-direction update of the template as a function of image samples pointed to by the i-th refinement of the first motion vector and the (i-1)-th refinement of the second motion vector; determine, for the image prediction block, an i-th refinement of the second motion vector pointing to the second picture by template matching with the i-th first-direction update of the template in the second picture; and generate an i-th second-direction update of the template as a function of image samples pointed to by the i-th refinement of the first motion vector and the i-th refinement of the second motion vector.
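The alternating first-direction/second-direction update for bi-prediction can be sketched as follows. The helper `best_pos`, the plain averaging of the two pointed-to blocks, and all sizes are illustrative assumptions; the claims only require the updated template to be *a function* of the pointed-to samples.

```python
import numpy as np

def best_pos(template, picture, pos, radius=2):
    """SAD-minimising position in a small search space around `pos`."""
    n = template.shape[0]
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = pos[0] + dy, pos[1] + dx
            cand = picture[y:y + n, x:x + n]
            if cand.shape != template.shape:
                continue
            cost = np.abs(cand - template).sum()
            if best is None or cost < best[0]:
                best = (cost, (y, x))
    return best[1]

def bi_refine(pic0, pic1, pos0, pos1, template, iterations=2):
    """Alternating refinement sketch: refine in the first picture,
    rebuild the template as the average of the two pointed-to blocks
    (the first-direction update), refine in the second picture,
    rebuild again (the second-direction update), and iterate."""
    n = template.shape[0]
    block = lambda pic, p: pic[p[0]:p[0] + n, p[1]:p[1] + n].astype(float)
    for _ in range(iterations):
        pos0 = best_pos(template, pic0, pos0)
        template = (block(pic0, pos0) + block(pic1, pos1)) / 2.0
        pos1 = best_pos(template, pic1, pos1)
        template = (block(pic0, pos0) + block(pic1, pos1)) / 2.0
    return pos0, pos1, template
```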
- the updated template may be generated as a function of image samples in a block pointed to by the refined motion vector and/or updated motion vector and the function includes a weighted average of the image samples.
- the template may have the shape and size of the image prediction block, the image prediction block being a rectangle of a preconfigured size.
- the number of iterations may be a predefined number and the processing circuitry is, according to an exemplary implementation, further configured to stop the iterative refinement of the motion vectors and block templates before the predefined number is reached if a predefined condition is met, the predefined condition being one or a combination of the following:
- there is no change between the updated motion vector after iteration i and the updated motion vector after iteration i+1, where i is 0 (corresponding to the initial motion vector) or a non-zero number.
- the difference between the matching cost after iteration i+1 and after iteration i is below a certain threshold.
- a result of adding the length of the motion vector after iteration i along the x axis to the top-left coordinate of a prediction unit and to a block width is below a fifth threshold.
- a result of adding the length of the motion vector after iteration i along the x axis to the top-left coordinate of a prediction unit exceeds a sixth threshold.
- a result of adding the length of the motion vector after iteration i along the y axis to the top-left coordinate of a prediction unit and to a block height is below a seventh threshold.
- a result of adding the length of the motion vector after iteration i along the y axis to the top-left coordinate of a prediction unit exceeds an eighth threshold.
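The termination conditions listed above can be collected into a single check. This sketch interprets the four positional thresholds as bounds of an admissible region; the parameter names and default values are illustrative assumptions, not taken from the claims.

```python
def stop_refinement(mv_prev, mv_cur, cost_prev, cost_cur,
                    top_left, block_w, block_h, cost_threshold=1.0,
                    x_low=0, x_high=1920, y_low=0, y_high=1080):
    """Early-termination check combining the listed conditions.
    mv_* are (dx, dy) offsets; top_left is the (x, y) coordinate of
    the prediction unit."""
    if mv_cur == mv_prev:                            # no change between iterations
        return True
    if abs(cost_cur - cost_prev) < cost_threshold:   # matching cost converged
        return True
    x0, y0 = top_left
    dx, dy = mv_cur
    # positional thresholds: the displaced block leaves the admissible
    # region horizontally or vertically
    if x0 + dx + block_w < x_low or x0 + dx > x_high:
        return True
    if y0 + dy + block_h < y_low or y0 + dy > y_high:
        return True
    return False
```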
- an apparatus for encoding a video image comprising: the apparatus according to any of the above embodiments and examples for determination of a motion vector for an image prediction block and an image coding circuitry configured to perform video image coding of the image prediction block based on predictive coding using the determined motion vector and generating a bitstream including the coded image prediction block.
- the predictive coding may include using the motion vector to generate a predictor for the currently coded prediction block.
- the predictor is determined as a picture portion pointed to by the determined motion vector corresponding in size and form to the prediction block. Then a difference is formed between the current prediction block and the predictor. The difference is further coded.
- the further coding may include linear transformation, quantization and entropy coding to generate the bitstream.
- an apparatus for decoding a video image from a bitstream comprising: a bitstream parser for extracting from the bitstream portions corresponding to a compressed video image including a compressed image prediction block to be decoded; the apparatus according to any of the above embodiments and examples for determination of a motion vector for the image prediction block; and an image reconstruction circuitry configured to perform image reconstruction of the image prediction block based on the motion vector.
- the image reconstruction may include adding the differences to the predictor.
- the predictor may be obtained based on the refined motion vector.
- a method for determination of a motion vector for an image prediction block including: determining a refinement of an initial motion vector for the image prediction block by template matching with an initial template to generate a refined motion vector; generating an updated template based on the image samples pointed to by the refined motion vector; and determining an updated motion vector for the image prediction block by template matching with the updated template in a search space including a plurality of candidate motion vector positions.
- the method above may operate iteratively, over i being an integer larger than 1, repeating the following steps of the i-th iteration: generating an i-th update of the template based on image samples pointed to by a refined motion vector obtained in the (i-1)-th iteration; and determining an i-th update of the refined motion vector for the image prediction block by template matching with the i-th update of the template.
- the method according to an embodiment may include the steps of: determining, for the image prediction block, a refinement of a first initial motion vector pointing to a first picture and a refinement of a second initial motion vector pointing to a second picture by template matching with an initial template to generate a respective first refined motion vector and second refined motion vector; generating an updated template as a function of image samples pointed to by the first refined motion vector and second refined motion vector; and determining, for the image prediction block, a first updated motion vector and a second updated motion vector by template matching with the updated template in the respective first picture and second picture.
- the method can also iteratively, over i being an integer larger than 1, repeat the following steps of the i-th iteration: generating an i-th update of the template based on image samples pointed to by the first refined motion vector and the second refined motion vector obtained in the (i-1)-th iteration; and determining an i-th update of the first refined motion vector and the second refined motion vector for the image prediction block by template matching with the i-th update of the template.
- the method may further include the following steps: obtaining a first initial motion vector and a second initial motion vector; determining, for the image prediction block, a refinement of the first initial motion vector pointing to a first picture by template matching with an initial template in the first picture to generate a respective first refined motion vector; generating an updated template as a function of image samples pointed to by the first refined motion vector and the second initial motion vector; determining, for the image prediction block, a refinement of the second initial motion vector pointing to a second picture by template matching with the updated template in the second picture to generate a second refined motion vector; and generating the updated template as a function of image samples pointed to by the first refined motion vector and the second refined motion vector.
- the method of an embodiment iteratively, over i being an integer larger than 1, repeats the following steps of the i-th iteration: determining, for the image prediction block, an i-th refinement of the first motion vector pointing to the first picture by template matching with the updated template in the first picture; generating an i-th first-direction update of the template as a function of image samples pointed to by the i-th refinement of the first motion vector and the (i-1)-th refinement of the second motion vector; determining, for the image prediction block, an i-th refinement of the second motion vector pointing to the second picture by template matching with the i-th first-direction update of the template in the second picture; and generating an i-th second-direction update of the template as a function of image samples pointed to by the i-th refinement of the first motion vector and the i-th refinement of the second motion vector.
- the updated template is generated as a function of image samples in a block pointed to by the refined motion vector and/or updated motion vector and the function includes a weighted average of the image samples.
- the template has the shape and size of the image prediction block, the image prediction block being a rectangle of a preconfigured size.
- the number of iterations is advantageously a predefined number and the processing circuitry is further configured to stop the iterative refinement of the motion vectors and block templates before the predefined number is reached if a predefined condition is met, the predefined condition being one or a combination of the following:
- a result of adding the length of the motion vector after iteration i along the x axis to the top-left coordinate of a prediction unit and to a block width is below a fifth threshold.
- a result of adding the length of the motion vector after iteration i along the y axis to the top-left coordinate of a prediction unit and to a block height is below a seventh threshold.
- a method for encoding a video image comprising: determining of a motion vector for an image prediction block according to any of the above methods; and performing video image coding of the image prediction block based on predictive coding using the determined motion vector and generating a bitstream including the coded image prediction block.
- a method for decoding a video image from a bitstream comprising: extracting from the bitstream portions corresponding to a compressed video image including a compressed image prediction block to be decoded; determining of a motion vector for the image prediction block according to any of the above methods; and performing image reconstruction of the image prediction block based on the motion vector.
- a non-transitory computer-readable storage medium storing instructions which, when executed by a processor or processing circuitry, perform the steps according to any of the above aspects or embodiments or their combinations.
- Figure 1 is a block diagram showing an exemplary structure of an encoder in which the motion vector derivation and refinement may be employed;
- Figure 2 is a block diagram showing an exemplary structure of a decoder in which the motion vector derivation and refinement may be employed;
- Figure 3a is a schematic drawing illustrating a method for determining motion vectors according to the prior art;
- Figure 3b is a schematic drawing illustrating a method for determining motion vectors;
- Figure 4a is a schematic drawing illustrating a method for determining image samples and motion vectors according to embodiment 1;
- Figure 4b is a schematic drawing illustrating a method for determining image samples and motion vectors according to embodiment 2;
- Figures 5a to 5c are schematic drawings illustrating the three method steps for
- Figures 6a and 6b are schematic drawings illustrating the two method steps for
- Figures 7a and 7b are schematic drawings illustrating bi-prediction with motion vector refinement;
- Figure 8 is a block diagram illustrating a processing circuitry for performing the motion vector refinement and template update.
- the present disclosure relates to iteratively refined determination of template and motion vectors for an inter prediction. It may provide an improved inter prediction and may be advantageously employed in motion estimation performed during encoding and decoding of video.
- exemplary encoder and decoder which may implement the motion estimation with the iterative refinement of the template matching are described.
- the present disclosure enables template matching that may lead to higher similarity of the prediction block to the current block of original samples compared with the prior art, because the iterative refinement is less susceptible to local minima. It rather tends toward the global minimum of the cost function, i.e. toward higher similarity.
- Fig. 1 shows an encoder 100 which comprises an input for receiving input image samples of frames or pictures of a video stream and an output for generating an encoded video bitstream.
- the term "frame" in this disclosure is used as a synonym for picture. However, it is noted that the present disclosure is also applicable to fields in case interlacing is applied.
- a picture includes m x n pixels, corresponding to image samples, and may comprise one or more color components. For the sake of simplicity, the following description refers to pixels meaning samples of luminance.
- the motion vector search of the invention can be applied to any color component, including chrominance, or to components of a color space such as RGB or the like.
- the encoder 100 is configured to apply prediction, transformation, quantization, and entropy coding to the video stream.
- the transformation, quantization, and entropy coding are carried out respectively by a transform unit 106, a quantization unit 108 and an entropy encoding unit 170 so as to generate as an output the encoded video bitstream.
- the video stream may include a plurality of frames, wherein each frame is divided into blocks of a certain size that are either intra or inter coded.
- the blocks of for example the first frame of the video stream are intra coded by means of an intra prediction unit 154.
- An intra frame is coded using only the information within the same frame, so that it can be independently decoded and it can provide an entry point in the bitstream for random access.
- Blocks of other frames of the video stream may be inter coded by means of an inter prediction unit 144: information from previously coded frames (reference frames) is used to reduce the temporal redundancy, so that each block of an inter-coded frame is predicted from a block in a reference frame.
- a mode selection unit 160 is configured to select whether a block of a frame is to be processed by the intra prediction unit 154 or the inter prediction unit 144. This mode selection unit 160 also controls the parameters of intra or inter prediction. In order to enable refreshing of the image information, intra-coded blocks may be provided within inter-coded frames. Moreover, intra-frames which contain only intra-coded blocks may be regularly inserted into the video sequence in order to provide entry points for decoding, i.e. points where the decoder can start decoding without having information from the previously coded frames.
- the intra estimation unit 152 and the intra prediction unit 154 are units which perform the intra prediction.
- the intra estimation unit 152 may derive the prediction mode based also on the knowledge of the original image while intra prediction unit 154 provides the corresponding predictor, i.e. samples predicted using the selected prediction mode, for the difference coding.
- the coded blocks may be further processed by an inverse quantization unit 110 and an inverse transform unit 112.
- a loop filtering unit 120 is applied to further improve the quality of the decoded image.
- the filtered blocks then form the reference frames that are then stored in a decoded picture buffer 130.
- Such decoding loop (decoder) at the encoder side provides the advantage of producing reference frames which are the same as the reference pictures reconstructed at the decoder side. Accordingly, the encoder and decoder side operate in a corresponding manner.
- the term "reconstruction" here refers to obtaining the reconstructed block by adding to the decoded residual block the prediction block.
- the inter estimation unit 142 receives as an input a block of a current frame or picture to be inter coded and one or several reference frames from the decoded picture buffer 130. Motion estimation is performed by the inter estimation unit 142 whereas motion compensation is applied by the inter prediction unit 144. The motion estimation is used to obtain a motion vector and a reference frame based on a certain cost function, for instance using also the original image to be coded. For example, the inter estimation unit 142 may provide an initial motion vector estimation. The initial motion vector may then be signaled within the bitstream in the form of the vector directly or as an index referring to a motion vector candidate within a list of candidates constructed based on a predetermined rule in the same way at the encoder and the decoder.
- the motion compensation then derives a predictor of the current block as a translation of a block co-located with the current block in the reference frame to the reference block in the reference frame, i.e. by a motion vector.
- the inter prediction unit 144 outputs the prediction block for the current block, wherein said prediction block minimizes the cost function.
- the cost function may be a difference between the current block to be coded and its prediction block, i.e. the cost function minimizes the residual block.
- the minimization of the residual block is based e.g. on calculating a sum of absolute differences (SAD) between all pixels (samples) of the current block and the candidate block in the candidate reference picture.
- SAD sum of absolute differences
- any other similarity metric may be employed, such as mean square error (MSE) or structural similarity metric (SSIM).
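The cost functions mentioned above can be sketched as follows. This is an illustrative example, not taken from the patent; the block contents and sizes are hypothetical, and real codecs compute these metrics over much larger blocks with optimized integer arithmetic.

```python
# Sketch of two block-similarity metrics used as motion-estimation cost
# functions: SAD (sum of absolute differences) and MSE (mean square error).
# Blocks are lists of rows of luma sample values (hypothetical data).

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized sample blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def mse(block_a, block_b):
    """Mean square error between two equally sized sample blocks."""
    n = sum(len(row) for row in block_a)
    return sum((a - b) ** 2 for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b)) / n

current = [[10, 12], [14, 16]]
candidate = [[11, 12], [13, 18]]
print(sad(current, candidate))  # 1 + 0 + 1 + 2 = 4
print(mse(current, candidate))  # (1 + 0 + 1 + 4) / 4 = 1.5
```

The candidate block minimizing the chosen metric over the search space yields the selected motion vector.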
- rate-distortion optimization procedure may be used to decide on the motion vector selection and/or in general on the encoding parameters such as whether to use inter or intra prediction for a block and with which settings.
- the intra estimation unit 152 and the intra prediction unit 154 receive as an input a block of a current frame or picture to be intra coded and one or several reference samples from an already reconstructed area of the current frame.
- the intra prediction then describes pixels of a current block of the current frame in terms of a function of reference samples of the current frame.
- the intra prediction unit 154 outputs a prediction block for the current block, wherein said prediction block advantageously minimizes the difference between the current block to be coded and its prediction block, i.e., it minimizes the residual block.
- the minimization of the residual block can be based e.g. on a rate-distortion optimization procedure.
- the prediction block is obtained as a directional interpolation of the reference samples. The direction may be determined by the rate-distortion optimization and/or by calculating a similarity measure as mentioned above in connection with inter-prediction.
- the inter estimation unit 142 receives as an input a block or a more universal-formed image sample of a current frame or picture to be inter coded and two or more already decoded pictures 231.
- the inter prediction then describes a current image sample of the current frame in terms of motion vectors to reference image samples of the reference pictures.
- the inter estimation unit 142 outputs one or more motion vectors for the current image sample, wherein said reference image samples pointed to by the motion vectors advantageously minimize the difference between the current image sample to be coded and its reference image samples, i.e., they minimize the residual image sample.
- the predictor for the current block is then provided by the inter prediction unit 144 for the difference coding.
- the difference between the current block and its prediction, i.e. the residual block 105, is then transformed by the transform unit 106.
- the transform coefficients 107 are quantized by the quantization unit 108 and entropy coded by the entropy encoding unit 170.
- the thus generated encoded picture data 171, i.e. the encoded video bitstream, comprises intra coded blocks and inter coded blocks and the corresponding signaling (such as the mode indication, indication of the motion vector, and/or intra-prediction direction).
- the transform unit 106 may apply a linear transformation such as a discrete Fourier transform or a discrete cosine transform (DFT/FFT or DCT). Such transformation into the spatial frequency domain provides the advantage that the resulting coefficients 107 typically have higher values in the lower frequencies.
- DFT/FFT discrete Fourier transform / fast Fourier transform; DCT discrete cosine transform
- Quantization unit 108 performs the actual lossy compression by reducing the resolution of the coefficient values.
- the entropy coding unit 170 then assigns to coefficient values binary codewords to produce a bitstream.
- the entropy coding unit 170 also codes the signaling information (not shown in Fig. 1).
- Fig. 2 shows a video decoder 200.
- the video decoder 200 comprises particularly a decoded picture buffer 230, an inter prediction unit 244 and an intra prediction unit 254, which is a block prediction unit.
- the decoded picture buffer 230 is configured to store at least one (for uni-prediction) or at least two (for bi-prediction) reference frames reconstructed from the encoded video bitstream, said reference frames being different from a current frame (currently decoded frame) of the encoded video bitstream.
- the intra prediction unit 254 is configured to generate a prediction block, which is an estimate of the block to be decoded.
- the intra prediction unit 254 is configured to generate this prediction based on reference samples that are obtained from the decoded picture buffer 230.
- the decoder 200 is configured to decode the encoded video bitstream generated by the video encoder 100, and preferably both the decoder 200 and the encoder 100 generate identical predictions for the respective block to be encoded / decoded.
- the features of the decoded picture buffer 230 and the intra prediction unit 254 are similar to the features of the decoded picture buffer 130 and the intra prediction unit 154 of Fig. 1.
- the video decoder 200 comprises further units that are also present in the video encoder 100 like e.g. an inverse quantization unit 210, an inverse transform unit 212, and a loop filtering unit 220, which respectively correspond to the inverse quantization unit 110, the inverse transform unit 112, and the loop filtering unit 120 of the video encoder 100.
- An entropy decoding unit 204 is configured to decode the received encoded video bitstream and to correspondingly obtain quantized residual transform coefficients 209 and signaling information.
- the quantized residual transform coefficients 209 are fed to the inverse quantization unit 210 and an inverse transform unit 212 to generate a residual block.
- the residual block is added to a prediction block 265 and the addition is fed to the loop filtering unit 220 to obtain the decoded video.
- Frames of the decoded video can be stored in the decoded picture buffer 230 and serve as a decoded picture 231 for inter prediction.
- the intra prediction units 154 and 254 of Figs. 1 and 2 can use reference samples from an already encoded area to generate prediction signals for blocks that need to be encoded or need to be decoded.
- the entropy decoding unit 204 receives as its input the encoded bitstream 171.
- the bitstream is at first parsed, i.e. the signaling parameters and the residuals are extracted from the bitstream.
- the syntax and semantic of the bitstream is defined by a standard so that the encoders and decoders may work in an interoperable manner.
- the encoded bitstream does not only include the prediction residuals.
- a motion vector indication is also coded in the bitstream and parsed therefrom at the decoder.
- the motion vector indication may be given by means of a reference picture in which the motion vector is provided and by means of the motion vector coordinates. So far, coding the complete motion vectors was considered. However, it is also possible to encode only the difference between the current motion vector and a previously coded motion vector in the bitstream. This approach allows exploiting the redundancy between motion vectors of neighboring blocks.
- In order to efficiently code the reference picture, the H.265 codec (ITU-T H.265, Series H: Audiovisual and multimedia systems: High Efficiency Video Coding) provides a list of reference pictures assigning to list indices respective reference frames. The reference frame is then signaled in the bitstream by including therein the corresponding assigned list index. Such a list may be defined in the standard or signaled at the beginning of the video or a set of a number of frames. It is noted that in H.265 there are two lists of reference pictures defined, called L0 and L1. The reference picture is then signaled in the bitstream by indicating the list (L0 or L1) and indicating an index in that list associated with the desired reference picture. Providing two or more lists may have advantages for better compression.
- L0 may be used for both uni-directionally inter-predicted slices and bi-directionally inter-predicted slices while L1 may only be used for bi-directionally inter-predicted slices.
- the lists L0 and L1 may be defined in the standard and fixed. However, more flexibility in coding/decoding may be achieved by signaling them at the beginning of the video sequence. Accordingly, the encoder may configure the lists L0 and L1 with particular reference pictures ordered according to the index.
- the L0 and L1 lists may have the same fixed size. There may be more than two lists in general.
- the motion vector may be signaled directly by the coordinates in the reference picture. Alternatively, as also specified in H.265, a list of candidate motion vectors may be constructed and an index associated in the list with the particular motion vector can be transmitted.
- Motion vectors of the current block are usually correlated with the motion vectors of neighboring blocks in the current picture or in the earlier coded pictures. This is because neighboring blocks are likely to correspond to the same moving object with similar motion and the motion of the object is not likely to change abruptly over time. Consequently, using the motion vectors in neighboring blocks as predictors reduces the size of the signaled motion vector difference.
- the motion vector predictors (MVPs) are usually derived from already encoded/decoded motion vectors of spatially neighboring blocks or of temporally neighboring blocks in the co-located picture. In H.264/AVC, this is done by taking a component-wise median of three spatially neighboring motion vectors. Using this approach, no signaling of the predictor is required.
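The component-wise median predictor described above can be sketched as follows. This is a hedged illustration of the H.264/AVC-style derivation; the neighbor positions (left, top, top-right) and values are hypothetical examples, and the actual standard includes additional availability rules not shown here.

```python
# Sketch of a component-wise median motion vector predictor from three
# spatially neighboring motion vectors, each given as an (x, y) pair.

def median3(a, b, c):
    """Median of three scalar values."""
    return sorted((a, b, c))[1]

def mv_predictor(mv_left, mv_top, mv_topright):
    """Component-wise median of three neighboring motion vectors."""
    return (median3(mv_left[0], mv_top[0], mv_topright[0]),
            median3(mv_left[1], mv_top[1], mv_topright[1]))

# Hypothetical neighbor motion vectors:
pred = mv_predictor((2, -1), (4, 0), (3, 5))
print(pred)  # (3, 0)
```

Only the difference between the actual motion vector and this predictor then needs to be transmitted.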
- Temporal MVPs from a co-located picture are only considered in the so called temporal direct mode of H.264/AVC.
- the H.264/AVC direct modes are also used to derive other motion data than the motion vectors. Hence, they relate more to the block merging concept in HEVC.
- motion vector competition, in which it is explicitly signaled which MVP from a list of MVPs is used, is employed for motion vector derivation.
- the variable coding quad-tree block structure in HEVC can result in one block having several neighboring blocks with motion vectors as potential MVP candidates.
- a 64x64 luma prediction block could have 16 4x4 luma prediction blocks to the left when a 64x64 luma coding tree block is not further split and the left one is split to the maximum depth.
- AMVP Advanced Motion Vector Prediction
- the final design of the AMVP candidate list construction includes the following MVP candidates: a) up to two spatial candidate MVPs derived from five spatially neighboring blocks; b) one temporal candidate MVP derived from two temporal, co-located blocks when both spatial candidate MVPs are not available or when they are identical; and c) zero motion vectors when the spatial, the temporal or both candidates are not available. Details on motion vector determination can be found in the book by V. Sze et al. (Ed.), High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014, in particular in Chapter 5, incorporated herein by reference.
- HEVC High Efficiency Video Coding
- the motion vector refinement may be performed at the decoder without assistance from the encoder.
- the encoder in its decoder loop may employ the same refinement to obtain corresponding motion vectors.
- Motion vector refinement is performed in a search space which includes integer pixel positions and fractional pixel positions of a reference picture.
- the fractional pixel positions may be half-pixel positions or quarter-pixel or further fractional positions.
- the fractional pixel positions may be obtained from the integer (full-pixel) positions by interpolation such as bi-linear interpolation.
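The bi-linear interpolation mentioned above can be sketched as follows. This is an illustrative example with hypothetical sample values; practical codecs typically use longer interpolation filters for half-pixel positions, with bi-linear interpolation being one simple option.

```python
# Sketch of bi-linear interpolation: a fractional-position sample is derived
# from the four surrounding integer (full-pixel) samples.

def bilinear(p00, p01, p10, p11, fx, fy):
    """Interpolate at fractional offsets (fx, fy) in [0, 1], where p00/p01
    are the top-left/top-right and p10/p11 the bottom-left/bottom-right
    integer-position samples."""
    top = p00 * (1 - fx) + p01 * fx
    bottom = p10 * (1 - fx) + p11 * fx
    return top * (1 - fy) + bottom * fy

# Half-pixel position centered between four integer samples:
print(bilinear(10, 20, 30, 40, 0.5, 0.5))  # 25.0
# Zero fractional offset reproduces the integer-position sample:
print(bilinear(10, 20, 30, 40, 0.0, 0.0))  # 10.0
```

Quarter-pixel and finer positions follow the same scheme with smaller fractional offsets.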
- two prediction blocks obtained using the respective first motion vector of list L0 and the second motion vector of list L1 are combined to a single prediction signal, which can provide a better adaptation to the original signal than uni-prediction, resulting in less residual information and possibly a more efficient compression.
- a template is used, which is an estimate of the current block and which is constructed based on the already processed (i.e. coded at the encoder side and decoded at the decoder side) image portions.
- an estimate of the first motion vector MV0 and an estimate of the second motion vector MV1 are received as input at the decoder 200.
- the motion vector estimates MV0 and MV1 may be obtained by block matching and/or by search in a list of candidates (such as merge list) formed by motion vectors of the blocks neighboring to the current block (in the same picture or in adjacent pictures).
- MV0 and MV1 are then advantageously signaled to the decoder side within the bitstream.
- the first determination stage at the encoder could be performed by template matching which would provide the advantage of reducing signaling overhead.
- the motion vectors MV0 and MV1 are advantageously obtained based on information in the bitstream.
- MV0 and MV1 are either directly signaled, or differentially signaled, and/or an index in the list of motion vectors (merge list) is signaled.
- the present disclosure is not limited to signaling motion vectors in the bitstream.
- the motion vector may be determined by template matching already in the first stage, correspondingly to the operation of the encoder.
- the template matching of the first stage may be performed based on a search space different from the search space of the second, refinement stage. In particular, the refinement may be performed on a search space with higher resolution (i.e. shorter distance between the search positions).
- An indication of the two reference pictures RefPic0 and RefPic1, to which MV0 and MV1 respectively point, is provided to the decoder as well.
- the reference pictures are stored in the decoded picture buffer at the encoder and decoder side as a result of previous processing, i.e. respective encoding and decoding.
- One of these reference pictures is selected for motion vector refinement by search.
- a reference picture selection unit of the apparatus for the determination of motion vectors is configured to select the first reference picture to which MV0 points and the second reference picture to which MV1 points. Following the selection, the reference picture selection unit determines whether the first reference picture or the second reference picture is used for performing of motion vector refinement.
- the search region in the first reference picture is defined around the candidate position to which motion vector MV0 points.
- the candidate search space positions within the search region are analyzed to find a block most similar to a template block by performing template matching within the search space and determining a similarity metric such as the sum of absolute differences (SAD).
- the positions of the search space denote the positions on which the top left corner of the template is matched.
- the top left corner is a mere convention and any point of the search space such as the central point can in general be used to denote the matching position.
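The template matching described above can be sketched as follows. This is a hedged illustration with a hypothetical reference picture and search positions; it uses the top-left-corner matching convention mentioned in the text and SAD as the similarity metric.

```python
# Sketch of template matching over a search space: each candidate position
# in the reference picture is tested and the one minimizing the SAD to the
# template is kept.

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for rx, ry in zip(a, b) for x, y in zip(rx, ry))

def extract(picture, x, y, w, h):
    """Block whose top-left corner is at (x, y) - the matching convention
    used here; any other fixed point of the block would work equally well."""
    return [row[x:x + w] for row in picture[y:y + h]]

def best_match(picture, template, positions):
    """Return the search-space position best matching the template."""
    h, w = len(template), len(template[0])
    return min(positions,
               key=lambda p: sad(extract(picture, p[0], p[1], w, h), template))

ref = [[0, 0, 0, 0],
       [0, 5, 6, 0],
       [0, 7, 8, 0],
       [0, 0, 0, 0]]
template = [[5, 6], [7, 8]]
print(best_match(ref, template, [(0, 0), (1, 1), (2, 2)]))  # (1, 1)
```

The offset of the best-matching position from the initial position gives the refined motion vector.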
- DMVR Decoder-Side Motion Vector Refinement
- DMVR has as an input the initial motion vectors MV0 and MV1 which point into two respective reference pictures RefPict0 and RefPict1. These initial motion vectors are used for determining the respective search spaces in RefPict0 and RefPict1. Moreover, using the motion vectors MV0 and MV1, a template is constructed based on the respective blocks (of samples) A and B pointed to by MV0 and MV1 as follows:
- the function may be sample clipping operation in combination with weighted summation.
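A minimal sketch of this template construction follows, assuming equal weights and an 8-bit clipping range; these choices are hypothetical examples, since the text leaves the weights and range open (they may be signalled, predefined or derived).

```python
# Sketch of template construction for DMVR: sample-wise weighted summation
# of blocks A and B (pointed to by MV0 and MV1), followed by sample
# clipping to a valid intensity range.

def make_template(block_a, block_b, wa=0.5, wb=0.5, lo=0, hi=255):
    """Sample-wise weighted sum of A and B, clipped to [lo, hi]."""
    return [[min(hi, max(lo, round(wa * a + wb * b)))
             for a, b in zip(ra, rb)]
            for ra, rb in zip(block_a, block_b)]

# Hypothetical blocks; the second sample exceeds the 8-bit range and the
# fourth falls below it, so both are clipped.
A = [[100, 300], [50, 10]]
B = [[110, 280], [40, -30]]
print(make_template(A, B))  # [[105, 255], [45, 0]]
```

The resulting template serves as the estimate of the current block for the subsequent matching in both search spaces.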
- the cost function for determining the best template match in the respective search spaces is SAD(Template, Block A'), where block A' is the block pointed to by a candidate MV in the search space spanned around the position given by MV0.
- Figure 7a illustrates the determination of the best matching block A' and the resulting refined motion vector MV0'.
- the same template is used to find the best matching block B' and the corresponding motion vector MV1' which points to block B' as shown in Figure 7b.
- the refined motion vectors MV0' and MV1' are found via search on RefPic0 and RefPic1 with the template.
- the template is updated at least once, as will be described below.
- a block template is calculated by adding together the blocks that are referred to by MV0 and MV1.
- the block template is used to find a refined MV0' and/or MV1'.
- the refinement process is divided into steps. After each step, the block template is constructed / updated based on the refined motion vectors that were obtained in the previous step.
- the step may include refinement of both MV0 and MV1.
- the template update may be performed after each refinement of the respective MV0 and MV1.
- the number of template updates and the cost function for template matching could be pre-defined or signaled in the bitstream.
- the following steps are performed: - Determining of a refinement of an initial motion vector for the image prediction block by template matching with an initial template to generate a refined motion vector;
- the initial motion vector was signaled in the bitstream. Accordingly, it may be determined in the encoder based on the original image and obtained at the decoder based on the signaled quantity.
- the signaled quantity is an indication of the motion vector. This may be the motion vector itself defined by the coordinates (offset from the co-located block to the initial predictor block). However, it may be more efficient to construct a list of candidate motion vectors based on the motion vectors of the neighboring blocks (temporally and/or spatially) and/or some predefined values and to signal only an index to a candidate within such list. Nevertheless, the present disclosure is not limited to any particular determination of the initial motion vector.
- the determination of the initial motion vector may also be performed by template matching.
- refinement refers to an operation in which the initial motion vector defines a search space in which template matching is used to test candidate positions in the surroundings of the initial motion vector to find a possibly better match.
- the result may be the same, initial motion vector or a motion vector on one of the search space candidate positions.
- the step of generating the updated template based on the image samples pointed to by the refined motion vector may also be performed in various different ways. It is noted that the term "updated" does not necessarily mean that the updated template is determined as a function of the initial template. In one option, this may be the case. In another option, the updated template is newly constructed based on the samples pointed to by the refined motion vector. In other words, the updated template may be generated as a function of the block pointed to by the refined motion vector. In addition, the updated template may be generated as a function of the previous template and/or initial template and/or the block pointed to by the initial motion vector as will be described in detail in some selected examples below. Once the updated template is determined, a further refinement of the motion vector may be obtained by applying the updated template for the template matching.
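The refine-then-update control flow described above can be sketched with a minimal 1-D toy example. Everything here is hypothetical (a one-dimensional "picture", a radius-1 search, rebuilding the template directly from the refined position); it only illustrates the sequencing of refinement and template update, not any particular embodiment.

```python
# Control-flow sketch: refine a motion vector by template matching, rebuild
# the template from the samples the refined vector points to, then refine
# again with the updated template.

def sad(a, b):
    """Sum of absolute differences between two equal-length sample rows."""
    return sum(abs(x - y) for x, y in zip(a, b))

def refine(ref, template, center, radius=1):
    """Test positions around `center` and return the best-matching one."""
    n = len(template)
    candidates = [center + d for d in range(-radius, radius + 1)
                  if 0 <= center + d <= len(ref) - n]
    return min(candidates, key=lambda p: sad(ref[p:p + n], template))

ref = [9, 9, 5, 6, 7, 9, 9]        # reference "picture" (1-D row of samples)
template = [5, 6, 7]               # initial template
mv = refine(ref, template, 1)      # first refinement step
template = ref[mv:mv + 3]          # template updated from the refined position
mv = refine(ref, template, mv)     # further refinement with updated template
print(mv)  # 2
```

In the embodiments below, the update instead combines blocks from two reference pictures, but the alternation of matching and template (re)construction is the same.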
- processing circuitry 800 is illustrated in Figure 8.
- the processing circuitry may include any hardware and the configuration may be implemented by any kind of programming or hardware design of a combination of both.
- the processing circuitry may be formed by a single processor such as general purpose processor with the corresponding software implementing the above steps.
- the processing circuitry may be implemented by specialized hardware such as an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), a DSP (Digital Signal Processor) or the like.
- ASIC Application-Specific Integrated Circuit
- FPGA Field-Programmable Gate Array
- DSP Digital Signal Processor
- the processing circuitry may include one or more of the above mentioned hardware components interconnected for performing the above motion vector refinement including template update.
- the processing circuitry 800 implements two functionalities: performance of the template construction 810 and motion vector refinement 820 performed with the updated template. These two functionalities may be implemented on the same piece of hardware or may be performed by separate units of hardware such as a template determination unit 810 and motion vector refinement unit 820.
- the present disclosure provides at least one update of the template and refinement of the motion vector by applying template matching with the updated template.
- further refinement may be achieved by iteratively updating the template and iteratively performing the refinements with the so updated respective templates.
- the refinement may be performed iteratively, over i being an integer larger than 1 , by repeating the following steps of i-th iteration:
- the motion vector refinement is started with obtaining a first initial motion vector MV0 (0) and a second initial motion vector MV1 (0) pointing to a first initial image sample block A (0) and a second initial image sample block B (0) , respectively.
- the first and the second initial image sample blocks may be in two different reference (already reconstructed) pictures RefPic0 and RefPic1, both being different from the current picture.
- the first and the second initial image sample blocks may be in the same decoded picture, which is different from the current picture. Based on the two image sample blocks, an initial template C (0) is generated.
- the motion vector points to a particular sample.
- the wording employed in this disclosure concerning pointing to a block assumes that the motion vector points to a predetermined sample which identifies a location of a block. Such sample may be for instance a top left corner or a center of the block.
- a block is in general any group of samples with a predefined size and shape. A typical example is a rectangle or a square of samples.
- the size may correspond to the coding tree unit or coding unit. For instance, blocks of 2 x 2 pixels or 4 x 4 pixels or 8 x 8 pixels or any size such as 128 x 128 pixels may be provided. Rectangular sizes of 4 x 8 or 8 x 16 or any other may also be defined.
- the template follows the size and shape of the prediction blocks. However, this is not necessarily the case.
- the template may be smaller than the current / prediction block and may be formed by a subset of the sample positions of the current / prediction block. In the following, two exemplary embodiments will be described.
- Embodiment 1 starts from two initial motion vectors and performs the updating of the template following the refinement of either of the two initial motion vectors.
- the processing circuitry of the present disclosure in Embodiment 1 is configured to:
- the template update is performed after each motion vector refinement, i.e. after refining any of the two initial motion vectors. It is noted that most current applications employ bi-prediction including two reference pictures. However, the present disclosure is also applicable to approaches which use prediction referring to more than two reference pictures, i.e. starting from more than two initial motion vectors.
- the template updating / motion refinement may be performed iteratively.
- the processing circuitry may be configured to iteratively, over i being an integer larger than 1 , repeat the following steps of i-th iteration:
- the first-direction update is an update of the template based on the refined motion vector in one of the two reference pictures, i.e. motion vector pointing in one, first direction.
- the second-direction update is an update based on the refined motion vector in the other one of the two reference pictures, i.e. refinement of the motion vector pointing in a different, second direction.
- the template matching includes a search for a best match in a search space which may be constructed around the position given by the refined / updated motion vector to be updated in the current iteration.
- the search space includes at least two candidate positions to be tested by the template matching to find the best matching position to become pointed to by the updated motion vector.
- Fig. 4a illustrates the processing steps of Embodiment 1 in a schematic drawing.
- the function determining the template may be a combination such as a weighted average of the first and the second initial image samples, though the invention is not limited to this function. Additional steps, including clipping, filtering and shifting, may follow the weighted averaging to determine the template.
- a search space is defined in the first decoded picture RefPic0 and in the search space a reference picture portion best matching the template C (0) is determined. The best matching portion defines the first updated image sample block A (1) and its position determines the first refined motion vector MV0 (1) as shown in Fig. 5a.
- the first updated image sample block A (1) and the second initial image sample block B (0) are used to determine a first-direction update of the template generating template C (1) .
- the function is a sample-wise combination of sample blocks A (1) pointed to by the refined motion vector and B (0) pointed to by the initial motion vector.
- the function may be an average or a weighted average.
- the weights a and b may be determined to be inversely proportional to the distances between the current frame including the current block and the frames containing the respective blocks A and B.
- a simple average may be employed in general, for instance in cases where RefPict0 and RefPict1 have the same temporal distance from the current frame. However, for simplicity reasons, the average may also be used for other cases.
- a weighted average may be used with weights determined based on the distance of the respective RefPict0 and RefPict1 from the current picture. In particular, the block (A or B) closer to the current block has a higher weight than the block (B or A) farther from the current block.
- the weighting factors can be derived by other means, e.g. they might be signalled in the bitstream.
- the function might also include sample clipping, for example, where the intensity of each sample of the weighted average of A and B is restricted to an intensity range defined by [minimum intensity, maximum intensity], where the values of minimum intensity and maximum intensity can be signalled, predefined or derived.
- Another example function is the rounding operation that would be implemented after the weighted averaging of blocks A and B.
- averaging operation can be implemented as (A+B+1)>>1, where ">>" corresponds to shifting of bits to the right and discarding the least significant bit, for finite sample precision computations.
- the operations on blocks A and B are performed sample-wise, meaning that they are performed for each of the elements of A and the corresponding respective element of B.
- refinement of the second initial motion vector MV1 (0) is performed.
- a search space is defined in the second decoded picture RefPic1 and in this search space a reference picture portion best matching the template C (1) is determined.
- the best matching portion defines the second updated image sample block B (1) and its position determines the second refined motion vector MV1 (1) as shown in Fig. 5b.
- the first updated image sample block A (1) and the second updated image sample block B (1) are used to further update the template generating template C (2) .
- the updated template may be determined as follows:
- templates C (1) and C (2) are updated by constructing them based on the most up-to-date blocks in the respective two reference pictures.
- templates T (1) and T (2) instead of templates C (1) and C (2) may be determined as a function of the previous template and the most recently refined motion vector (samples pointed to by such vector):
- T (1) = function(C (0) , A (1) ) and T (2) = function(T (1) , B (1) ).
- the template may be updated in any way using the samples pointed to by the most recently refined motion vector.
- the functions for updating the templates C (1) and C (2) may be but are not necessarily the same.
- this processing includes the following steps:
- the first updated image sample A (i+1) and the first refined motion vector MV0 (i+1) are determined.
- the second updated image sample B (i+1) and the second refined motion vector MV1 (i+1) are determined.
- the template update may thus be expressed as:
- the template update may also be performed in a different manner.
- the present disclosure is not limited to a plurality of iterations.
- already a single update may provide for improved motion estimation.
- already the update of the refined motion vector based on the updated first-direction template C (i+1),1 may provide the advantage without further template updates.
- the resolution of the search space may change to a finer resolution with an increasing number of repetitions.
- the search space spanned by the initial motion vector may include candidate positions in an integer-sample distance from the initial motion vector while the search space spanned after the refinement for the matching with the updated template may include positions in half-sample distances from the refined / updated motion vector.
- search spaces including quarter-sample positions may be tested and so on.
- the present disclosure is not limited to this approach and the search spaces may maintain the same resolution or be defined with positions of mixed integer, half, quarter or other sample resolution.
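The progressive-resolution idea above can be sketched as follows. This is a hedged illustration: the cost function is a toy stand-in for template-matching SAD, and the step sequence (integer, then half, then quarter sample) is one possible configuration.

```python
# Sketch of progressively finer motion vector refinement: each repetition
# searches a smaller step size around the previous best position.

def refine_progressive(cost, start, steps=(1.0, 0.5, 0.25)):
    """At each resolution, test offsets {-step, 0, +step} in x and y and
    move to the cheapest position before proceeding to the finer step."""
    best = start
    for step in steps:
        candidates = [(best[0] + dx, best[1] + dy)
                      for dx in (-step, 0, step)
                      for dy in (-step, 0, step)]
        best = min(candidates, key=cost)
    return best

# Toy cost: distance to a hypothetical "true" motion vector (1.5, -0.25).
cost = lambda p: abs(p[0] - 1.5) + abs(p[1] + 0.25)
print(refine_progressive(cost, (0.0, 0.0)))  # (1.5, -0.25)
```

Each coarse step narrows the region in which the finer fractional-sample positions are then tested.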
- Embodiment 2 performs the updating of the template following the refinement of both motion vectors, i.e. motion vectors pointing in the two respective reference pictures.
- processing circuitry is configured to:
- the approach may be iteratively repeated based on the updated template, in which case the processing circuitry is configured to iteratively, over i being an integer larger than 1, repeat the following steps of the i-th iteration:
- Fig. 4b illustrates the processing steps of embodiment 2 in a schematic drawing.
- the motion vector refinement of the present embodiment contains the following processing steps.
- Refinement of the first initial motion vector MV0 (0) is performed by determining a template C (0) based on the first and the second initial image sample blocks A (0) and B (0) .
- the function determining the template may be a weighted average of the image samples pointed to by the first and the second initial motion vectors, though the invention is not limited to this function. Additional steps, including clipping, filtering and shifting, may follow the weighted averaging to determine the template.
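- as a minimal sketch of such a function, assuming equal weights with rounding and clipping (one possible instance of the weighted average described above, not the only one):

```python
import numpy as np

def make_template(block_a, block_b, bit_depth=8):
    # Weighted average (here: equal weights) of the two prediction
    # blocks, with rounding, followed by clipping to the valid
    # sample value range for the given bit depth.
    t = (block_a.astype(np.int32) + block_b.astype(np.int32) + 1) >> 1
    return np.clip(t, 0, (1 << bit_depth) - 1)
```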
- a search space is defined in the first decoded picture RefPic0 on a position given by the sample position pointed to by the corresponding initial motion vector. In the search space, a reference picture portion best matching the template C (0) is determined.
- the best matching portion defines the first updated image sample block A (1) and its position determines the first refined motion vector MV0 (1) .
- another search space is defined in the second decoded picture RefPic1 based on the position of the corresponding initial motion vector, and in this search space a reference picture portion best matching the same template C (0) is determined as well.
- the best matching portion defines the second updated image sample block B (1) and its position determines the second refined motion vector MV1 (1) . This processing step is shown in Fig. 6a.
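- a minimal integer-sample version of this matching step can be sketched as follows, with SAD as the cost function. The function name, the window radius and the representation of a motion vector as the top-left position of the pointed-to block are illustrative assumptions:

```python
import numpy as np

def refine_mv(ref_pic, template, mv, radius=1):
    # Search a (2*radius+1)^2 window of integer-sample positions around
    # the block pointed to by `mv` and return the position of the
    # candidate block that best matches the template under SAD.
    h, w = template.shape
    tmpl = template.astype(np.int64)
    best_cost, best_mv = None, mv
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = mv[0] + dx, mv[1] + dy
            cand = ref_pic[y:y + h, x:x + w].astype(np.int64)
            cost = int(np.abs(cand - tmpl).sum())
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (x, y)
    return best_mv
```

- the same routine would be called once per reference picture with the same template C (0), which is what makes the two refinements independent of each other.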
- the first updated image sample A (1) and the second updated image sample B (1) are used to further update the template generating template C (1) .
- the updated template is obtained as follows:
- C (1) = function (A (1), B (1)).
- processing steps of motion vector refinement and template update discussed above are repeated a number of times, until a maximum number of times i_max is reached.
- the maximum number of times i_max is pre-defined or signaled in the bitstream.
- the processing step following the second repetition is shown in Fig. 6b.
- this processing includes the following steps:
- the first updated image sample A (i+1) and the first refined motion vector MV0 (i+1) are determined.
- the second updated image sample B (i+1) and the second refined motion vector MV1 (i+1) are determined.
- the update of the template C (i+1) is generated based on the image samples A (i+1) and B (i+1).
- the resolution of the search space may change to a finer resolution with an increasing number of repetitions as already noted above for Embodiment 1.
- One of the advantages of Embodiment 2 over Embodiment 1 is that the refinement on reference pictures 0 and 1 can be performed in parallel. However, the coding gain may be reduced in comparison to Embodiment 1, in which a more accurate result may be achieved.
- the number of template updates can be predefined or signaled in the bitstream.
- the number of iterations can be signaled at any level from a sequence-level parameter down to a CU-level parameter.
- the number of iterations may be signaled in a set of parameters applicable for one or more video frames such as picture parameter set or sequence parameter set known from H.264/H.265. This kind of signaling does not require much overhead.
- a finer adaption may be achieved if the number of iterations is signaled on a picture portion basis.
- Such a picture portion may be a slice or any other portion such as a tile. Signaling on a CTU/CU basis may be too fine-grained but is applicable with the present disclosure.
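- one way such hierarchical signalling is commonly organised can be sketched as a simple override cascade. This is a hypothetical sketch: the dictionary-based parameter sets and the key name `num_iterations` are illustrative and not taken from the disclosure:

```python
def effective_num_iterations(sps, pps, slice_header, default=1):
    # The finest level at which the value is present wins: the slice
    # header overrides the picture parameter set (PPS), which overrides
    # the sequence parameter set (SPS); otherwise a predefined default
    # applies.
    for level in (slice_header, pps, sps):
        if level.get("num_iterations") is not None:
            return level["num_iterations"]
    return default
```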
- the maximum number of template updates can be predefined, i.e. fixed to a certain number such as 1, 2, or 3. This may be defined by the standard and may differ for different image resolutions or based on other coding settings.
- the number of iterations is derived at the encoder and decoder in the same way based on the encoding settings, i.e. based on coding parameters signaled in the bitstream (block size, partitioning information, search space configuration, resolution, etc.).
- the iterations can be terminated before the maximum number is reached according to one of the following conditions:
- a) Refinement does not progress. This condition is fulfilled if there is no change in the refined motion vectors after an iteration.
- the iterations may be terminated if the motion vector in iteration i is the same as the initial motion vector.
- the iterations may be terminated if the updated motion vector is the same as the motion vector from the previous iteration, i.e. motion vector in iteration i is equal to motion vector in iteration i+1.
- the equality may be defined after a clipping or rounding of the motion vector.
- this condition is fulfilled if the difference between the refined and non-refined MV does not exceed a certain threshold. Accordingly, if the improvement is rather small, the iterations are terminated.
- the motion vector of the current iteration i may be compared either with the initial motion vector or with the motion vector of the immediately preceding iteration i-1.
- the iterations may be terminated if the difference between the motion vector in iteration i and the motion vector in iteration i+1 exceeds a threshold.
- This threshold is different and larger than the threshold from condition A mentioned above. This condition should reduce the risk of divergence of the refinement iterations.
- N may be an integer or a fractional number larger than zero.
- the length of a motion vector refers to its absolute value, irrespective of the direction.
- the condition may be fulfilled if the length of the motion vector along the x or y axis plus the block width/height exceeds a certain threshold (at frame, slice, tile or other segmentation boundaries).
- This condition is satisfied if the difference between the cost function values calculated for the motion vectors in two consecutive iterations is too small. In other words, if the refinement only results in a negligible improvement, the iterations are stopped. For example, if the SAD for the block pointed to by the motion vector from iteration i is the same as or only slightly lower than the SAD for the block pointed to by the motion vector from iteration i-1, the iterations are stopped. The condition may also compare the improvement in a certain iteration i with the SAD corresponding to the initial motion vector. As also mentioned above, SAD is only an example and, in general, a different cost function may be used.
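- the no-progress condition and the cost-based criterion above can be combined into a single stopping test, sketched below. The function name and the threshold value `min_gain` are placeholders, not values from the disclosure:

```python
def should_terminate(mv_prev, mv_cur, cost_prev, cost_cur, min_gain=1):
    # Stop if the refinement does not progress (the motion vector is
    # unchanged between iterations), or if the cost improvement between
    # the two iterations is negligible (smaller than `min_gain`).
    if mv_cur == mv_prev:
        return True
    return (cost_prev - cost_cur) < min_gain
```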
- the cost function used for template matching could be also signaled in the bitstream.
- An index may be used to indicate the selected template matching function according to a pre-defined cost function table.
- the index could be signaled as a sequence or picture level parameter.
- the number of iterations is a predefined number and the processing circuitry is further configured to stop the iterative refinement of the motion vectors and block templates before the predefined number is reached if a predefined condition is met, the predefined condition being one or a combination of the following:
- the difference between the updated motion vector after iteration K and the updated motion vector after iteration L does not exceed a predetermined first threshold, Thr1 , wherein L is an integer larger than zero and K is an integer larger than or equal to zero and smaller than L.
- the length of motion vector after iteration j along the x axis exceeds a predetermined third threshold, Thr3, j being an integer 1 or larger.
- a result of adding the length of the motion vector after iteration j along the x axis to the top-left coordinate of a prediction unit exceeds a sixth threshold, Thr6.
- a result of adding the length of the motion vector after iteration j along the y axis to the top-left coordinate of a prediction unit and to a block height (i.e. size in y direction of the prediction unit) is below a seventh threshold, Thr7.
- the third to eighth thresholds may be selected so that the predictor resulting from the motion vector after the j-th iteration does not cross a frame, slice or tile boundary, i.e. a boundary of a picture portion which is to be decodable without spatial dependency on other parts of the same picture.
- the term coordinate here refers to the x or y coordinate of a sample within the picture, such as sample pointed to by the motion vector or a sample of the search space.
- the thresholds may be defined in standard, determined based on coding parameters signaled in the bitstream or signaled separately in the bitstream.
- the third and fourth thresholds may have the same value. However, assuming that the movement may be larger in x axis, it may also make sense to set them differently.
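- the intent of thresholds Thr3 to Thr7 can be read as a containment test of the predictor against a picture-portion boundary; the following sketch makes this explicit, with the coordinate convention and function naming being assumptions of this example:

```python
def predictor_inside(top_left_x, top_left_y, mv_x, mv_y,
                     block_w, block_h, region_w, region_h):
    # True if the block pointed to by the motion vector lies entirely
    # within the frame/slice/tile region [0, region_w) x [0, region_h).
    # The per-axis thresholds described above effectively bound the
    # four comparisons performed here.
    x0, y0 = top_left_x + mv_x, top_left_y + mv_y
    return (x0 >= 0 and y0 >= 0
            and x0 + block_w <= region_w
            and y0 + block_h <= region_h)
```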
- the conditions may be combined. For instance, the difference between the motion vectors in iterations i and i+1 may be conditioned on a lower (first) and higher (second) threshold. The iterations are terminated if the difference is below the first or above the second threshold.
- combined conditions on motion vector size / difference and on cost function value / difference may be provided.
- iterations can be terminated if the difference between the updated motion vector after iteration i and the initial motion vector along the x-axis is greater than a certain threshold, where the threshold can be signalled, predefined or derived depending on quantization parameter, frame width, etc.
- the above described motion vector refinement applying template update may be used in an encoding apparatus and a decoding apparatus which may be parts of the respective video recording device and video playback device.
- an encoding apparatus for encoding a video image, the apparatus comprising a motion vector determining device with the processing circuitry as described above for determination of a motion vector for an image prediction block.
- the encoding apparatus may further include an image coding circuitry configured to perform video image coding of the image prediction block based on predictive coding using the determined motion vector and generating a bitstream including the coded image prediction block.
- the encoding apparatus may be further configured to encode into the bitstream the initial motion vector determined as described above with reference to Fig. 1.
- the encoding apparatus may further be configured to encode into the bitstream further configuration parameters such as an enabling flag for enabling the use of the iterative template update as described above, number of iterations, some of the thresholds mentioned above or the like.
- a decoding apparatus for decoding a video image from a bitstream, the apparatus comprising a bitstream parser for extracting from the bitstream portions corresponding to a compressed video image including compressed image prediction block to be decoded.
- the decoding apparatus may further comprise the motion vector determining apparatus as described above for determination of a motion vector for the image prediction block including the refinement and template update.
- the decoder may further include an image reconstruction circuitry configured to perform image reconstruction of the image prediction block based on the motion vector and other parts as described with reference to Figure 2 above.
- a method for determination of a motion vector for an image prediction block.
- the method includes a step of determining a refinement of an initial motion vector for the image prediction block by template matching with an initial template to generate a refined motion vector and then generating an updated template based on the image samples pointed to by the refined motion vector.
- An updated motion vector is then determined for the image prediction block by template matching with the updated template in a search space including a plurality of candidate motion vector positions.
- a method for encoding a video image comprising determining of the motion vector for an image prediction block according to the above mentioned method as well as then performing video image coding of the image prediction block based on predictive coding using the determined motion vector and generating a bitstream including the coded image prediction block.
- a further method for decoding a video image from a bitstream comprising the steps of parsing from the bitstream portions corresponding to a compressed video image including compressed image prediction block to be decoded; determining of a motion vector for the image prediction block according to the method described above, and performing image reconstruction of the image prediction block based on the motion vector.
- the initial motion vectors MV0 (0) and MV1 (0) are provided.
- a template is determined based on them, namely as a function of the block samples of the respective blocks A and B pointed to by the two initial motion vectors.
- the motion refinement is performed with the template to obtain refined motion vectors MV0 (1) and MV1 (1) .
- the refined motion vectors may be used to define a further search space such as a fractional search space to determine the final motion vector refinement MV0 (2) and MV1 (2).
- the initial motion vectors MV0 (0) and MV1 (0) are provided.
- a template is determined in step 310 based on them, namely as a function of the block samples of the respective blocks A(0) and B(0) pointed to by the two initial motion vectors. Then the motion refinement is performed in step 320 with the template to obtain refined motion vectors MV0 (1) and MV1 (1).
- the template is updated as a function of the determined updated motion vectors MV0 (1) and MV1 (1), namely as a function of the block samples of the blocks pointed to by the respective updated motion vectors.
- the refined motion vectors may be used to define a further search space such as a fractional search space to determine the final motion vector refinement MV0 (2) and MV1 (2) .
- the updated motion vectors MV0 (1) and MV1 (1) may be further updated by template matching with the template updated in step 330 in one or more iterations.
- the search space refinement 340 may be performed after the iterations are performed.
- the iterations may also include the additional refinement of the search space so that the next search based on the template updated in step 330 may be performed based on the refined motion vectors MV0 (2) and MV1 (2) rather than MV0 (1) and MV1 (1) .
- the steps may be repeated as follows 310 -> 320 -> 330 -> 340 -> 310 -> 320 -> 330... etc.
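- the loop 310 -> 320 -> 330 described above can be sketched as a simple driver, where `refine` stands in for the template-matching step 320 (optionally including the search-space refinement 340) and `update_template` for step 330. The callback-based structure is an assumption of this sketch:

```python
def iterate_refinement(initial_template, refine, update_template,
                       num_iterations):
    # Repeats: refine both motion vectors with the current template
    # (step 320), then rebuild the template from the refined
    # predictions (step 330), for num_iterations rounds.
    template = initial_template
    mv_pair = None
    for _ in range(num_iterations):
        mv_pair, blocks = refine(template)       # step 320 (and 340)
        template = update_template(*blocks)      # step 330
    return mv_pair, template
```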
- the motion vector determination with template adaption as described above can be implemented as a part of encoding and/or decoding of a video signal (motion picture).
- the motion vector determination may also be used for other purposes in image processing such as movement detection, movement analysis, or the like without limitation to be employed for encoding / decoding.
- the motion vector determination may be implemented as an apparatus. Such apparatus may be a combination of a software and hardware.
- the motion vector determination may be performed by a chip such as a general purpose processor, or a digital signal processor (DSP), or a field programmable gate array (FPGA), or the like.
- DSP digital signal processor
- FPGA field programmable gate array
- the present invention is not limited to implementation on programmable hardware. It may be implemented on an application-specific integrated circuit (ASIC) or by a combination of the above mentioned hardware components.
- the motion vector determination may also be implemented by program instructions stored on a computer readable medium. The program, when executed, causes the computer to perform the steps of the above described methods.
- the computer readable medium can be any medium on which the program is stored such as a DVD, CD, USB (flash) drive, hard disc, server storage available via a network, etc.
- the encoder and/or decoder may be implemented in various devices including a TV set, set top box, PC, tablet, smartphone, or the like, i.e. any recording, coding, transcoding, decoding or playback device. It may be a software or an app implementing the method steps and stored / run on a processor included in an electronic device as those mentioned above.
- the present disclosure relates to motion vector determination and refinement.
- a refined motion vector is determined based on an initial motion vector using template matching in a certain search space around the initial motion vector.
- the template is determined based on the samples pointed to by the initial motion vectors.
- the template is then updated based on the refined motion vector determined and used to further refine the motion vectors. This may be performed iteratively.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2017/075715 WO2019072373A1 (en) | 2017-10-09 | 2017-10-09 | Template update for motion vector refinement |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3685583A1 true EP3685583A1 (en) | 2020-07-29 |
Family
ID=60043213
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP17781115.5A Pending EP3685583A1 (en) | 2017-10-09 | 2017-10-09 | Template update for motion vector refinement |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP3685583A1 (en) |
WO (2) | WO2019072373A1 (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019191717A1 (en) | 2018-03-30 | 2019-10-03 | Hulu, LLC | Template refined bi-prediction for video coding |
US11146810B2 (en) | 2018-11-27 | 2021-10-12 | Qualcomm Incorporated | Decoder-side motion vector refinement |
JP7257524B2 (en) * | 2019-01-02 | 2023-04-13 | テレフオンアクチーボラゲット エルエム エリクソン(パブル) | Side motion refinement in video encoding/decoding systems |
EP3970376A4 (en) | 2019-06-17 | 2022-11-09 | Beijing Dajia Internet Information Technology Co., Ltd. | Methods and apparatuses for decoder-side motion vector refinement in video coding |
WO2020257787A1 (en) * | 2019-06-21 | 2020-12-24 | Beijing Dajia Internet Information Technology Co., Ltd. | Methods and devices for prediction dependent residual scaling for video coding |
CN114051732A (en) * | 2019-07-27 | 2022-02-15 | 北京达佳互联信息技术有限公司 | Method and apparatus for decoder-side motion vector refinement in video coding |
US11736720B2 (en) * | 2019-09-03 | 2023-08-22 | Tencent America LLC | Motion vector refinement methods for video encoding |
US11936877B2 (en) * | 2021-04-12 | 2024-03-19 | Qualcomm Incorporated | Template matching based affine prediction for video coding |
CN116612157A (en) * | 2023-07-21 | 2023-08-18 | 云南大学 | Video single-target tracking method and device and electronic equipment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011071514A2 (en) * | 2009-12-08 | 2011-06-16 | Thomson Licensing | Methods and apparatus for adaptive residual updating of template matching prediction for video encoding and decoding |
US10200711B2 (en) * | 2015-03-27 | 2019-02-05 | Qualcomm Incorporated | Motion vector derivation in video coding |
EP3876541A1 (en) * | 2015-09-02 | 2021-09-08 | Mediatek Inc. | Fast sum of absolute differences calculation for motion vector derivation in video coding |
2017
- 2017-10-09 EP EP17781115.5A patent/EP3685583A1/en active Pending
- 2017-10-09 WO PCT/EP2017/075715 patent/WO2019072373A1/en unknown
2018
- 2018-03-28 WO PCT/EP2018/057892 patent/WO2019072422A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2019072373A1 (en) | 2019-04-18 |
WO2019072422A1 (en) | 2019-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10856006B2 (en) | Method and system using overlapped search space for bi-predictive motion vector refinement | |
US12069291B2 (en) | Limited memory access window for motion vector refinement | |
US11363292B2 (en) | Memory access window and padding for motion vector refinement and motion compensation | |
US11153595B2 (en) | Memory access window and padding for motion vector refinement | |
US20200236388A1 (en) | Memory access window for sub prediction block motion vector derivation | |
EP3685583A1 (en) | Template update for motion vector refinement | |
US11159820B2 (en) | Motion vector refinement of a motion vector pointing to a fractional sample position | |
WO2019072369A1 (en) | Motion vector list pruning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
20200424 | 17P | Request for examination filed | |
| AK | Designated contracting states | Kind code of ref document: A1. Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| AX | Request for extension of the european patent | Extension state: BA ME |
| DAV | Request for validation of the european patent (deleted) | |
| DAX | Request for extension of the european patent (deleted) | |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
20230307 | 17Q | First examination report despatched | |