WO2019117659A1

WO2019117659A1 - Image coding method based on deriving motion vector, and device therefor

Info

Publication number: WO2019117659A1
Application number: PCT/KR2018/015894
Authority: WO
Inventors: 이재호; 임재현
Original assignee: LG Electronics Inc. (엘지전자 주식회사)
Priority date: 2017-12-14
Filing date: 2018-12-14
Publication date: 2019-06-20

Abstract

An image decoding method performed by a decoding device according to the present invention comprises: a step of deriving a motion vector predictor (MVP) for a current block; a step of acquiring, from a bitstream, a motion vector difference (MVD) indicating the difference between a motion vector of the current block and the MVP for the current block; a step of deriving control point motion vector predictors (CPMVP) respectively corresponding to control points (CP) for the current block; a step of deriving control point motion vectors (CPMV) respectively corresponding to the CPs for the current block, on the basis of the MVP for the current block, the MVD and the CPMVPs; a step of deriving a motion vector field (MVF) for the current block on the basis of the CPMVs; a step of generating prediction samples for the current block on the basis of the MVF; and a step of generating reconstruction samples for the current block on the basis of the prediction samples for the current block.

Description

Image coding method based on motion vector derivation and apparatus therefor

Field of the Invention

The present invention relates to an image coding technique, and more particularly, to an image coding method and apparatus based on motion vector derivation.

Description of the Related Art

Demand for high-resolution, high-quality images such as high definition (HD) images and ultra high definition (UHD) images has recently increased in various fields. As image data becomes higher in resolution and quality, the amount of information or bits to be transmitted increases relative to existing image data. Therefore, when image data is transmitted over a medium such as an existing wired/wireless broadband line, or stored on an existing storage medium, the transmission cost and the storage cost increase.

Accordingly, there is a need for a highly efficient image compression technique for efficiently transmitting, storing, and reproducing information of high resolution and high quality images.

Summary of the Invention

An object of the present invention is to provide a method and apparatus for enhancing image coding efficiency.

It is another object of the present invention to provide a method and apparatus for performing image coding based on motion vector derivation.

It is another object of the present invention to provide a method and apparatus for deriving a motion vector based on a bilinear interpolation motion model.

It is another object of the present invention to provide a method and apparatus for deriving a control point motion vector (CPMV) of a control point (CP) based on a control point motion vector predictor (CPMVP) of the CP.

Another object of the present invention is to provide a method and apparatus for deriving a motion vector of a current block based on CPMVs of CPs for a current block.

According to an embodiment of the present invention, an image decoding method performed by a decoding apparatus is provided. The method includes: deriving a motion vector predictor (MVP) for a current block; obtaining, from a bitstream, a motion vector difference (MVD) representing a difference between a motion vector of the current block and the MVP for the current block; deriving control point motion vector predictors (CPMVPs) corresponding to each of the control points (CPs) for the current block; deriving control point motion vectors (CPMVs) corresponding to each of the CPs for the current block based on the MVP for the current block, the MVD, and the CPMVPs; deriving a motion vector field (MVF) for the current block based on the CPMVs; generating prediction samples for the current block based on the MVF; and generating reconstructed samples for the current block based on the prediction samples for the current block.

According to another embodiment of the present invention, an image encoding method performed by an encoding apparatus is provided. The method includes: deriving a motion vector predictor (MVP) for a current block; deriving a motion vector difference (MVD) representing a difference between a motion vector of the current block and the MVP for the current block; deriving control point motion vector predictors (CPMVPs) corresponding to each of the control points (CPs) for the current block; deriving control point motion vectors (CPMVs) corresponding to each of the CPs for the current block based on the MVP for the current block, the MVD, and the CPMVPs; deriving a motion vector field (MVF) for the current block based on the CPMVs; generating prediction samples for the current block based on the MVF; deriving residual samples for the current block based on the generated prediction samples; and encoding prediction information including the derived MVD and residual information on the residual samples.

According to another embodiment of the present invention, a decoding apparatus for performing image decoding is provided. The decoding apparatus includes: an entropy decoding unit that obtains, from a bitstream, a motion vector difference (MVD) representing a difference between a motion vector of a current block and a motion vector predictor (MVP) for the current block; a prediction unit that derives the MVP for the current block, derives control point motion vector predictors (CPMVPs) corresponding to each of the control points (CPs) for the current block, derives control point motion vectors (CPMVs) corresponding to each of the CPs based on the MVP for the current block, the MVD, and the CPMVPs, derives a motion vector field (MVF) for the current block based on the CPMVs, and generates prediction samples for the current block based on the MVF; and an adder that generates reconstructed samples for the current block based on the prediction samples for the current block.

According to another embodiment of the present invention, an encoding apparatus for performing image encoding is provided. The encoding apparatus includes: a prediction unit that derives a motion vector predictor (MVP) for a current block, derives a motion vector difference (MVD) representing a difference between a motion vector of the current block and the MVP for the current block, derives control point motion vector predictors (CPMVPs) corresponding to each of the control points (CPs) for the current block, derives control point motion vectors (CPMVs) corresponding to each of the CPs based on the MVP for the current block, the MVD, and the CPMVPs, derives a motion vector field (MVF) for the current block based on the CPMVs, and generates prediction samples for the current block based on the MVF; a residual processing unit that derives residual samples for the current block based on the generated prediction samples; and an entropy encoding unit that encodes prediction information including the derived MVD and residual information on the residual samples.

According to the present invention, the overall image/video compression efficiency can be increased.

According to the present invention, the motion vector can be derived based on the bilinear interpolation motion model, thereby enhancing the image coding efficiency.

According to the present invention, the CPMVs of the CPs for the current block can be applied to the bilinear interpolation motion model to derive the motion vector of the current block, thereby enhancing the image coding efficiency.

Brief Description of the Drawings

FIG. 1 is a diagram schematically illustrating a configuration of an encoding apparatus to which the present invention can be applied.

FIG. 2 is a diagram schematically illustrating a configuration of a decoding apparatus to which the present invention can be applied.

FIG. 3 is a diagram illustrating an example of motion represented through an affine motion model according to an embodiment.

FIG. 4 is a diagram showing an example of an affine motion model using CPMVs of three CPs for the current block.

FIG. 5 is a diagram showing an example of an affine motion model using CPMVs of two CPs for the current block.

FIG. 6 is a diagram illustrating an example of deriving a motion vector on a subblock basis based on an affine motion model.

FIG. 7 is a flowchart illustrating a process of performing image coding based on a motion vector according to an embodiment.

FIG. 8 is a diagram for explaining a process of deriving a motion vector of a current block based on four CPs for the current block according to an embodiment.

FIG. 9 is a flowchart illustrating a process of deriving a motion vector of a current block based on four CPs for the current block according to an embodiment.

FIG. 10 is a diagram for explaining a process of deriving a motion vector of a sample included in a current picture based on bi-directional reference pictures according to an embodiment.

FIG. 11 is a diagram for explaining a process of deriving an x-axis gradient of a top-left CP for a current block according to an embodiment.

FIG. 12 is a diagram for explaining a process of deriving a motion vector field (MVF) for a current block based on CPMVs of four CPs for the current block according to an embodiment.

FIG. 13 is a diagram illustrating an example in which an MVF for a current block is derived according to an embodiment.

FIGS. 14 and 15 illustrate an encoding method performed by an encoding apparatus, and the encoding apparatus, according to an embodiment.

FIGS. 16 and 17 illustrate a decoding method performed by a decoding apparatus, and the decoding apparatus, according to an embodiment.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will be described in detail herein. However, this is not intended to limit the invention to the specific embodiments. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. Terms such as "comprises" and "having" in this specification are intended to specify the presence of stated features, numbers, steps, operations, elements, parts, or combinations thereof, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

In the meantime, the configurations of the drawings described in the present invention are shown independently for convenience of description of different characteristic functions, and do not mean that the configurations are implemented as separate hardware or separate software. For example, two or more of the configurations may combine to form one configuration, or one configuration may be divided into a plurality of configurations. Embodiments in which each configuration is integrated and / or separated are also included in the scope of the present invention unless they depart from the essence of the present invention.

The following description can be applied in technical fields dealing with video, images, or pictures. For example, the methods or embodiments disclosed in the following description may relate to the Versatile Video Coding (VVC) standard (ITU-T Rec. H.266), a next-generation video/image coding standard after VVC, or standards related to VVC (for example, the High Efficiency Video Coding (HEVC) standard (ITU-T Rec. H.265)).

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Hereinafter, the same reference numerals will be used for the same constituent elements in the drawings, and redundant explanations for the same constituent elements will be omitted.

In this specification, video may mean a sequence of images over time. A picture generally refers to a unit that represents one image at a specific time, and a slice is a unit that constitutes a part of a picture in coding. One picture may be composed of a plurality of slices, and the terms picture and slice may be used interchangeably as needed.

A pixel or a pel may mean a minimum unit of a picture (or image). Also, a 'sample' may be used as a term corresponding to a pixel. A sample may generally represent a pixel or pixel value and may only represent a pixel / pixel value of a luma component or only a pixel / pixel value of a chroma component.

A unit represents a basic unit of image processing. A unit may include at least one of a specific area of a picture and information related to the area. The term unit may be used interchangeably with terms such as block or area in some cases. In general, an MxN block may represent a set of samples or transform coefficients consisting of M columns and N rows.

FIG. 1 is a diagram schematically illustrating a configuration of an encoding apparatus to which the present invention can be applied. Hereinafter, the encoding apparatus may include a video encoding apparatus and/or an image encoding apparatus, and the video encoding apparatus may be used as a concept that includes the image encoding apparatus.

Referring to FIG. 1, the image encoding apparatus 100 may include a picture partitioning unit 105, a prediction unit 110, a residual processing unit 120, an entropy encoding unit 130, an adder 140, a filter unit 150, and a memory 160. The residual processing unit 120 may include a subtraction unit 121, a transform unit 122, a quantization unit 123, a reordering unit 124, an inverse quantization unit 125, and an inverse transform unit 126.

The picture partitioning unit 105 may divide an input picture into at least one processing unit.

In one example, the processing unit may be referred to as a coding unit (CU). In this case, the coding unit may be recursively partitioned according to a quad-tree binary-tree (QTBT) structure from the largest coding unit (LCU). For example, one coding unit may be divided into a plurality of coding units of deeper depth based on a quadtree structure, a binary tree structure, and/or a ternary tree structure. In this case, for example, the quadtree structure may be applied first, and the binary tree structure and the ternary tree structure may be applied later. Alternatively, the binary tree structure or the ternary tree structure may be applied first. The coding procedure according to the present invention can be performed based on the final coding unit, which is not divided further. In this case, the largest coding unit may be used directly as the final coding unit based on coding efficiency according to image characteristics, or, if necessary, the coding unit may be recursively divided into coding units of deeper depth so that a coding unit of optimal size is used as the final coding unit. Here, the coding procedure may include procedures such as prediction, transform, and reconstruction, which will be described later.
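The recursive partitioning described above can be sketched as follows. This is a minimal illustration of the quadtree part only, assuming a hypothetical `should_split` decision callback and a hypothetical `min_size` limit; the binary and ternary splits and any actual encoder decision logic are omitted.

```python
def quadtree_partition(x, y, size, min_size, should_split):
    """Recursively split a square block; return leaf blocks as (x, y, size)."""
    # A block becomes a final coding unit when it may not or should not be split.
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):          # top row of sub-blocks first, then bottom row
        for dx in (0, half):      # left sub-block first, then right sub-block
            leaves.extend(
                quadtree_partition(x + dx, y + dy, half, min_size, should_split)
            )
    return leaves


def example_rule(x, y, size):
    """Hypothetical split rule: split the 64x64 root and its top-left 32x32 child."""
    return size > 32 or (x == 0 and y == 0 and size == 32)


leaves = quadtree_partition(0, 0, 64, 8, example_rule)
for leaf in leaves:
    print(leaf)
```

In a real encoder the split decision would typically come from a cost comparison or from parsed split flags rather than from a fixed rule such as `example_rule`.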

As another example, the processing unit may include a coding unit (CU), a prediction unit (PU), or a transform unit (TU). The coding unit may be split from the largest coding unit (LCU) into coding units of deeper depth along the quadtree structure. In this case, the largest coding unit may be used directly as the final coding unit based on coding efficiency according to image characteristics, or, if necessary, the coding unit may be recursively divided into coding units of deeper depth so that a coding unit of optimal size is used as the final coding unit. When a smallest coding unit (SCU) is set, the coding unit cannot be divided into coding units smaller than the smallest coding unit. Here, the final coding unit means the coding unit from which the prediction unit or the transform unit is partitioned or divided. A prediction unit is a unit partitioned from a coding unit and may be a unit of sample prediction. The prediction unit may be divided into sub-blocks. The transform unit may be divided from the coding unit along the quadtree structure, and may be a unit for deriving a transform coefficient and/or a unit for deriving a residual signal from the transform coefficient. Hereinafter, the coding unit may be referred to as a coding block (CB), the prediction unit as a prediction block (PB), and the transform unit as a transform block (TB). A prediction block or prediction unit may refer to a specific area in the form of a block in a picture and may include an array of prediction samples. Likewise, a transform block or transform unit may refer to a specific area in the form of a block within a picture and may include an array of transform coefficients or residual samples.

The prediction unit 110 may perform prediction on a block to be processed (hereinafter, the current block) and generate a predicted block including prediction samples for the current block. The unit of prediction performed in the prediction unit 110 may be a coding block, a transform block, or a prediction block.

The prediction unit 110 may determine whether intra prediction or inter prediction is applied to the current block. For example, the prediction unit 110 may determine whether intra prediction or inter prediction is applied in units of CU.

In the case of intra prediction, the prediction unit 110 may derive a prediction sample for a current block based on a reference sample outside the current block in the picture to which the current block belongs (hereinafter referred to as the current picture). At this time, the prediction unit 110 may (i) derive the prediction sample based on an average or interpolation of neighboring reference samples of the current block, or (ii) derive the prediction sample based on a reference sample located in a specific (prediction) direction with respect to the prediction sample among the neighboring reference samples of the current block. Case (i) may be referred to as a non-directional mode or a non-angular mode, and case (ii) may be referred to as a directional mode or an angular mode. In intra prediction, the prediction modes may include, for example, 33 directional prediction modes and at least two non-directional modes. The non-directional modes may include a DC prediction mode and a planar mode. The prediction unit 110 may determine the prediction mode applied to the current block using the prediction mode applied to a neighboring block.
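Case (i) above, prediction from an average of neighboring reference samples, can be illustrated with a small sketch. It assumes that the reference samples are the reconstructed row above and the column to the left of the current block and that the mean is rounded to the nearest integer; the function name `dc_predict` and these choices are assumptions for illustration, not details given in this description.

```python
def dc_predict(top, left, width, height):
    """Fill a width x height block with the rounded mean of the reference samples."""
    refs = list(top) + list(left)
    # Integer mean with rounding to nearest (assumed rounding rule).
    dc = (sum(refs) + len(refs) // 2) // len(refs)
    return [[dc] * width for _ in range(height)]


top_refs = [100, 102, 104, 106]   # hypothetical samples above the block
left_refs = [98, 100, 102, 104]   # hypothetical samples left of the block
pred = dc_predict(top_refs, left_refs, 4, 4)
print(pred[0])
```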

In the case of inter prediction, the prediction unit 110 may derive a prediction sample for a current block based on a sample specified by a motion vector on a reference picture. The prediction unit 110 may derive the prediction sample for the current block by applying one of a skip mode, a merge mode, and a motion vector prediction (MVP) mode. In the skip mode and the merge mode, the prediction unit 110 can use motion information of a neighboring block as motion information of the current block. In the skip mode, unlike the merge mode, the difference (residual) between the predicted sample and the original sample is not transmitted. In the MVP mode, the motion vector of the current block can be derived by using the motion vector of a neighboring block as a motion vector predictor of the current block.

In the case of inter prediction, a neighboring block may include a spatial neighboring block existing in the current picture and a temporal neighboring block existing in a reference picture. The reference picture including the temporal neighboring block may be referred to as a collocated picture (colPic). The motion information may include a motion vector and a reference picture index. Information such as prediction mode information and motion information may be (entropy) encoded and output in the form of a bitstream.

When the motion information of a temporal neighboring block is used in the skip mode and the merge mode, the topmost picture on the reference picture list may be used as the reference picture. The reference pictures included in the reference picture list may be ordered based on the picture order count (POC) difference between the current picture and each reference picture. The POC corresponds to the display order of the pictures and can be distinguished from the coding order.
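As a rough illustration of ordering reference pictures by POC difference, the sketch below sorts a list of POC values by their absolute distance from the current picture. Sorting by ascending absolute difference, and the function name `order_by_poc_distance`, are assumptions; the description only states that the ordering is based on the POC difference.

```python
def order_by_poc_distance(current_poc, reference_pocs):
    """Order reference picture POCs by ascending absolute POC difference."""
    # sorted() is stable, so equally distant pictures keep their input order.
    return sorted(reference_pocs, key=lambda poc: abs(current_poc - poc))


ordered = order_by_poc_distance(8, [0, 4, 16, 7, 12])
print(ordered)
```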

The subtraction unit 121 generates residual samples that are the difference between the original sample and the predicted sample. When the skip mode is applied, a residual sample may not be generated as described above.

The transform unit 122 transforms the residual samples on a transform block basis to generate transform coefficients. The transform unit 122 can perform the transform according to the size of the transform block and the prediction mode applied to the coding block or the prediction block that spatially overlaps the transform block. For example, if intra prediction is applied to the coding block or the prediction block that overlaps the transform block and the transform block is a 4x4 residual array, the residual samples may be transformed using a discrete sine transform (DST) kernel. In other cases, the residual samples may be transformed using a discrete cosine transform (DCT) kernel.

The quantization unit 123 may quantize the transform coefficients to generate quantized transform coefficients.

The reordering unit 124 rearranges the quantized transform coefficients. The reordering unit 124 may rearrange the block-shaped quantized transform coefficients into a one-dimensional vector form through a coefficient scanning method. Although the reordering unit 124 is described as a separate component, it may be a part of the quantization unit 123.
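The rearrangement of a coefficient block into a one-dimensional vector can be sketched with a simple diagonal scan. The specific scan order used here (anti-diagonals starting from the top-left coefficient) and the function names are assumptions chosen for illustration; the description does not fix a particular scan. The inverse function is included only to show that the decoder side can restore the two-dimensional block from the same order.

```python
def scan_order(n_rows, n_cols):
    """Positions ordered by anti-diagonal, top-to-bottom within each diagonal."""
    positions = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return sorted(positions, key=lambda rc: (rc[0] + rc[1], rc[0]))


def diagonal_scan(block):
    """Flatten a 2D coefficient block into a 1D list following scan_order."""
    order = scan_order(len(block), len(block[0]))
    return [block[r][c] for r, c in order]


def inverse_diagonal_scan(coeffs, n_rows, n_cols):
    """Rebuild the 2D block from a 1D list produced by diagonal_scan."""
    block = [[0] * n_cols for _ in range(n_rows)]
    for value, (r, c) in zip(coeffs, scan_order(n_rows, n_cols)):
        block[r][c] = value
    return block


quantized = [
    [9, 3, 0, 0],
    [2, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
vector = diagonal_scan(quantized)
print(vector)
```

Scans of this kind tend to place the low-frequency, typically non-zero coefficients at the front of the vector, which is convenient for the entropy coding stage.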

The entropy encoding unit 130 may perform entropy encoding on the quantized transform coefficients. Entropy encoding may include, for example, encoding methods such as exponential Golomb, context-adaptive variable length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC). The entropy encoding unit 130 may encode information necessary for video reconstruction other than the quantized transform coefficients (e.g., values of syntax elements), together with or separately from the quantized transform coefficients, according to a predetermined method. The encoded information may be transmitted or stored in units of network abstraction layer (NAL) units in the form of a bitstream. The bitstream may be transmitted over a network or stored in a digital storage medium. The network may include a broadcasting network and/or a communication network, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, and SSD.

The inverse quantization unit 125 inversely quantizes the values quantized by the quantization unit 123 (the quantized transform coefficients), and the inverse transform unit 126 inversely transforms the values dequantized by the inverse quantization unit 125 to generate residual samples.
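The quantization and inverse quantization steps can be pictured with a uniform scalar quantizer. This is only a sketch under strong assumptions: a single hypothetical step size `step` stands in for the quantization-parameter-dependent scaling, and rounding half away from zero is an arbitrary choice; it does not reproduce the quantization rule of any particular standard.

```python
import math


def quantize(coeffs, step):
    """Map transform coefficients to integer levels with a uniform step size."""
    levels = []
    for c in coeffs:
        magnitude = int(math.floor(abs(c) / step + 0.5))  # round half away from zero
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels


def dequantize(levels, step):
    """Scale integer levels back to approximate transform coefficients."""
    return [level * step for level in levels]


levels = quantize([100, -37, 4, 0], 10)
restored = dequantize(levels, 10)
print(levels, restored)
```

The difference between the input coefficients and `restored` is the quantization distortion, which is why the reconstructed picture generally differs from the original.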

The adder 140 combines the residual samples and the prediction samples to reconstruct the picture. The residual samples and the prediction samples are added in units of blocks so that a reconstructed block can be generated. Although the adder 140 is described here as a separate component, it may be a part of the prediction unit 110. The adder 140 may also be referred to as a reconstruction unit or a reconstructed block generation unit.
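The roles of the subtraction unit and the adder can be summarized in a few lines. The sketch below assumes 8-bit samples and clips the reconstructed values to the range 0 to 255; the clipping range and the function names are assumptions for illustration.

```python
def compute_residual(original, predicted):
    """Residual block: original samples minus prediction samples."""
    return [[o - p for o, p in zip(o_row, p_row)]
            for o_row, p_row in zip(original, predicted)]


def reconstruct(predicted, residual, max_value=255):
    """Reconstructed block: prediction plus residual, clipped to the sample range."""
    return [[min(max(p + r, 0), max_value) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted, residual)]


original = [[120, 130], [140, 150]]
predicted = [[118, 133], [140, 149]]
residual = compute_residual(original, predicted)
recon = reconstruct(predicted, residual)
print(residual, recon)
```

Here the residual is passed through unchanged, so the reconstruction equals the original block; with transform and quantization in between, the residual would only be approximated.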

For the reconstructed picture, the filter unit 150 may apply a deblocking filter and/or a sample adaptive offset. Through deblocking filtering and/or the sample adaptive offset, artifacts at block boundaries in the reconstructed picture or distortion from the quantization process can be corrected. The sample adaptive offset can be applied on a sample-by-sample basis and can be applied after the deblocking filtering process is complete. The filter unit 150 may also apply an adaptive loop filter (ALF) to the reconstructed picture. The ALF may be applied to the reconstructed picture after the deblocking filter and/or the sample adaptive offset has been applied.

The memory 160 may store reconstructed pictures (decoded pictures) or information necessary for encoding/decoding. Here, the reconstructed picture may be a reconstructed picture for which the filtering procedure has been completed by the filter unit 150. The stored reconstructed picture may be used as a reference picture for (inter) prediction of another picture. For example, the memory 160 may store (reference) pictures used for inter prediction. At this time, the pictures used for inter prediction can be designated by a reference picture set or a reference picture list.

FIG. 2 is a diagram schematically illustrating a configuration of a video decoding apparatus to which the present invention can be applied. Hereinafter, the video decoding apparatus may include an image decoding apparatus.

Referring to FIG. 2, the video decoding apparatus 200 may include an entropy decoding unit 210, a residual processing unit 220, a prediction unit 230, an adder 240, a filter unit 250, and a memory 260. Here, the residual processing unit 220 may include a reordering unit 221, an inverse quantization unit 222, and an inverse transform unit 223. In addition, although not shown, the video decoding apparatus 200 may include a receiving unit for receiving a bitstream including video information. The receiving unit may be a separate module or may be included in the entropy decoding unit 210.

When a bitstream including video/image information is input, the video decoding apparatus 200 can reconstruct the video/image/picture in correspondence with the process by which the video/image information was processed in the encoding apparatus.

For example, the video decoding apparatus 200 can perform video decoding using the processing unit applied in the video encoding apparatus. Thus, the processing unit block of video decoding may be, for example, a coding unit and, in another example, a coding unit, a prediction unit, or a transform unit. The coding unit may be divided from the largest coding unit along a quadtree structure, a binary tree structure, and/or a ternary tree structure.

A prediction unit and a transform unit may be further used in some cases, in which case the prediction block is a block derived or partitioned from the coding unit and may be a unit of sample prediction. The prediction unit may be divided into sub-blocks. The transform unit may be divided from the coding unit along the quadtree structure and may be a unit for deriving a transform coefficient or a unit for deriving a residual signal from the transform coefficient.

The entropy decoding unit 210 may parse the bitstream and output information necessary for video reconstruction or picture reconstruction. For example, the entropy decoding unit 210 may decode information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and output values of syntax elements necessary for video reconstruction and quantized values of transform coefficients for the residual.

More specifically, the CABAC entropy decoding method may receive a bin corresponding to each syntax element in the bitstream, determine a context model using information on the syntax element to be decoded, decoding information of neighboring blocks and the block to be decoded, or information on previously decoded symbols/bins, predict the occurrence probability of the bin according to the determined context model, perform arithmetic decoding of the bin, and generate a symbol corresponding to the value of each syntax element. After determining the context model, the CABAC entropy decoding method may update the context model using the information of the decoded symbol/bin for the context model of the next symbol/bin.

Among the information decoded by the entropy decoding unit 210, information on prediction is provided to the prediction unit 230, and the residual values on which entropy decoding has been performed by the entropy decoding unit 210, i.e., the quantized transform coefficients, may be input to the reordering unit 221.

The reordering unit 221 may rearrange the quantized transform coefficients into a two-dimensional block form. The reordering unit 221 may perform the reordering in correspondence with the coefficient scanning performed in the encoding apparatus. Although the reordering unit 221 is described as a separate component, it may be a part of the inverse quantization unit 222.

The inverse quantization unit 222 may dequantize the quantized transform coefficients based on the (inverse) quantization parameters, and output the transform coefficients. At this time, the information for deriving the quantization parameter may be signaled from the encoding device.

The inverse transform unit 223 may inversely transform the transform coefficients to derive the residual samples.

The prediction unit 230 may predict a current block and may generate a predicted block including prediction samples of the current block. The unit of prediction performed in the prediction unit 230 may be a coding block, a transform block, or a prediction block.

The prediction unit 230 may determine whether intra prediction or inter prediction is to be applied based on the prediction information. In this case, a unit for determining whether to apply intra prediction or inter prediction may differ from a unit for generating a prediction sample. In addition, units for generating prediction samples in inter prediction and intra prediction may also be different. For example, whether inter prediction or intra prediction is to be applied can be determined in units of CU. Also, for example, in the inter prediction, the prediction mode may be determined in units of PU to generate prediction samples. In intra prediction, a prediction mode may be determined in units of PU, and prediction samples may be generated in units of TU.

In the case of intra prediction, the prediction unit 230 may derive a prediction sample for the current block based on the surrounding reference samples in the current picture. The prediction unit 230 may apply a directional mode or a non-directional mode based on the neighbor reference samples of the current block to derive a prediction sample for the current block. In this case, a prediction mode to be applied to the current block may be determined using the intra prediction mode of the neighboring block.

In the case of inter prediction, the prediction unit 230 may derive a prediction sample for the current block based on a sample specified on a reference picture by a motion vector. The prediction unit 230 may derive the prediction sample for the current block by applying a skip mode, a merge mode, or an MVP mode. At this time, motion information required for inter prediction of the current block provided by the video encoding apparatus, for example, information on a motion vector and a reference picture index, may be acquired or derived based on the prediction information.

In the skip mode and the merge mode, motion information of a neighboring block can be used as motion information of the current block. In this case, the neighboring block may include a spatial neighboring block and a temporal neighboring block.

The prediction unit 230 may construct a merge candidate list using the motion information of the available neighboring blocks and use the information indicated by the merge index on the merge candidate list as the motion vector of the current block. The merge index may be signaled from the encoding apparatus. The motion information may include a motion vector and a reference picture. When the motion information of a temporal neighboring block is used in the skip mode and the merge mode, the topmost picture on the reference picture list may be used as the reference picture.

In the skip mode, unlike the merge mode, the difference between the predicted sample and the original sample (residual) is not transmitted.

In the MVP mode, a motion vector of a current block can be derived using a motion vector of a neighboring block as a motion vector predictor. In this case, the neighboring block may include a spatial neighboring block and a temporal neighboring block.

For example, when the merge mode is applied, a merge candidate list may be generated using a motion vector of a reconstructed spatial neighboring block and/or a motion vector corresponding to a Col block, which is a temporal neighboring block. In the merge mode, the motion vector of the candidate block selected from the merge candidate list is used as the motion vector of the current block. The prediction information may include a merge index indicating the candidate block having the optimal motion vector, selected from the candidate blocks included in the merge candidate list. The prediction unit 230 can then derive the motion vector of the current block using the merge index.

As another example, when the motion vector prediction (MVP) mode is applied, a motion vector predictor candidate list may be generated using a motion vector of a reconstructed spatial neighboring block and/or a motion vector corresponding to a Col block, which is a temporal neighboring block. That is, the motion vector of the reconstructed spatial neighboring block and/or the motion vector corresponding to the Col block may be used as motion vector candidates. The prediction information may include a motion vector predictor index indicating the optimal motion vector selected from the motion vector candidates included in the list. The prediction unit 230 can use this index to select the motion vector predictor of the current block from the candidates in the motion vector candidate list. The prediction unit of the encoding apparatus can obtain the motion vector difference (MVD) between the motion vector of the current block and the motion vector predictor, and can output it in the bitstream. That is, the MVD is obtained by subtracting the motion vector predictor from the motion vector of the current block. The prediction unit 230 may then obtain the motion vector difference included in the prediction information and derive the motion vector of the current block by adding the motion vector difference to the motion vector predictor. The prediction unit may also acquire or derive a reference picture index or the like, indicating the reference picture, from the prediction information.
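The MVP-mode relationship described above (MVD = MV - MVP on the encoder side, MV = MVP + MVD on the decoder side) can be illustrated with a minimal sketch. The function names, the tuple representation of motion vectors, and the sample candidate values are illustrative assumptions and do not come from the specification.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (x component, y component); representation is an assumption

def encode_mvd(mv: MV, mvp: MV) -> MV:
    """Encoder side: MVD is the motion vector minus the motion vector predictor."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])

def decode_mv(candidates: List[MV], mvp_idx: int, mvd: MV) -> MV:
    """Decoder side: select the predictor by index, then add the signaled MVD."""
    mvp = candidates[mvp_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

# Hypothetical candidate list built from spatial/temporal neighbors.
candidates = [(4, -2), (6, 0)]
mv = (5, -3)                          # motion vector chosen by the encoder
mvd = encode_mvd(mv, candidates[0])   # signaled together with index 0
decoded = decode_mv(candidates, 0, mvd)
```

Only the index and the MVD need to be signaled in this sketch; the decoder rebuilds the same candidate list and recovers the motion vector by addition.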

The adder 240 may add a residual sample and a prediction sample to reconstruct the current block or the current picture. The adder 240 may add the residual samples and the prediction samples on a block-by-block basis to reconstruct the current picture. When the skip mode is applied, the residual is not transmitted, so the prediction sample can be used as the reconstructed sample. Here, the adder 240 has been described as a separate component, but the adder 240 may be a part of the prediction unit 230. Meanwhile, the adder 240 may also be referred to as a reconstruction unit or a reconstructed block generation unit.

The filter unit 250 may apply deblocking filtering, sample adaptive offset, and/or ALF to the reconstructed picture. In this case, the sample adaptive offset may be applied on a sample-by-sample basis and may be applied after deblocking filtering. The ALF may be applied after deblocking filtering and/or sample adaptive offset.

The memory 260 may store reconstructed pictures (decoded pictures) or information necessary for decoding. Here, a reconstructed picture may be a reconstructed picture for which the filtering procedure has been completed by the filter unit 250. For example, the memory 260 may store pictures used for inter prediction. In this case, the pictures used for inter prediction may be designated by a reference picture set or a reference picture list. A reconstructed picture can be used as a reference picture for another picture. In addition, the memory 260 may output the reconstructed pictures according to the output order.

Meanwhile, as described above, prediction is performed in video coding in order to enhance compression efficiency. Through prediction, a predicted block including prediction samples for the current block, which is the block to be coded, can be generated. Here, the predicted block includes prediction samples in the spatial domain (or pixel domain). The predicted block is derived in the same manner in the encoding apparatus and the decoding apparatus, and the encoding apparatus can improve image coding efficiency by signaling to the decoding apparatus information about the residual between the original block and the predicted block (residual information), rather than the original sample values of the original block themselves. The decoding apparatus may derive a residual block including residual samples based on the residual information, generate a reconstructed block including reconstructed samples by adding the residual block to the predicted block, and generate a reconstructed picture including the reconstructed blocks.

The residual information may be generated through transform and quantization procedures. For example, the encoding apparatus derives a residual block between the original block and the predicted block, performs a transform procedure on the residual samples (residual sample array) included in the residual block to derive transform coefficients, performs a quantization procedure on the transform coefficients to derive quantized transform coefficients, and signals the related residual information to the decoding apparatus (through a bitstream). Here, the residual information may include information such as the value information of the quantized transform coefficients, position information, the transform technique, the transform kernel, and the quantization parameter. The decoding apparatus performs a dequantization/inverse transform procedure based on the residual information and can derive residual samples (or a residual block). The decoding apparatus can generate a reconstructed picture based on the predicted block and the residual block. The encoding apparatus may also derive a residual block by dequantizing/inverse transforming the quantized transform coefficients, for use as a reference for inter prediction of a subsequent picture, and generate a reconstructed picture based on the derived residual block.
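The residual path described above can be sketched in a simplified form. The sketch below deliberately omits the transform step and applies a uniform scalar quantizer directly to the residual samples; the step size, the rounding rule, and all function names are assumptions chosen for illustration and are not taken from the specification.

```python
from typing import List

def quantize(residual: List[int], step: int) -> List[int]:
    """Uniform scalar quantization, rounding to nearest (ties away from zero)."""
    levels = []
    for r in residual:
        sign = -1 if r < 0 else 1
        levels.append(sign * ((abs(r) + step // 2) // step))
    return levels

def dequantize(levels: List[int], step: int) -> List[int]:
    """Inverse of quantize up to the quantization error."""
    return [level * step for level in levels]

def reconstruct(predicted: List[int], levels: List[int], step: int) -> List[int]:
    """Decoder side: reconstructed sample = prediction sample + residual sample."""
    return [p + r for p, r in zip(predicted, dequantize(levels, step))]

original = [52, 55, 61, 66]
predicted = [50, 54, 64, 60]
step = 4  # hypothetical quantization step

residual = [o - p for o, p in zip(original, predicted)]
levels = quantize(residual, step)            # what would be signaled
reconstructed = reconstruct(predicted, levels, step)
```

Because the encoder can run the same `reconstruct` step on its own quantized levels, both sides obtain identical reconstructed samples to use as a reference for later prediction.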

FIG. 3 is a diagram illustrating an example of motion represented through an affine motion model according to one embodiment.

In the present specification, "CP" is an abbreviation of control point, and may refer to a reference sample or a reference point used in the process of applying an affine motion model to the current block. The motion vector of a CP may be referred to as a "control point motion vector (CPMV)", and the CPMV may be derived based on a "control point motion vector predictor (CPMVP)", which is a predictor of the CPMV.

Referring to FIG. 3, motion that can be represented through the affine motion model according to one embodiment may include translational motion, scale motion, rotational motion, and shear motion. In other words, the affine motion model can represent translational motion, in which the image (or a part of it) moves in a plane over time, as well as scale motion, in which a part of the image is scaled over time, rotational motion, in which a part of the image rotates over time, and shear motion, in which a part of the image is deformed into a parallelogram over time.

Affine inter prediction may be performed using the affine motion model according to an embodiment. Through affine inter prediction, the encoding apparatus/decoding apparatus can predict the distortion of the image based on the motion vectors at the CPs of the current block, thereby improving compression performance by increasing prediction accuracy. In addition, since a motion vector for at least one CP of the current block can be derived using a motion vector of a neighboring block of the current block, the burden of the amount of data for the added side information can be reduced.

In one example, affine inter prediction may be performed based on three CPs for the current block, i.e., motion information at three reference points. The motion information at the three CPs for the current block may include the CPMV of each CP.

FIG. 4 exemplarily shows a motion model in which the motion vectors for the three CPs are used.

When the position of the top-left sample in the current block 400 is (0, 0), the width of the current block 400 is w, and the height is h, the samples located at (0, 0), (w, 0), and (0, h) may be defined as the CPs for the current block 400, as shown in FIG. 4. The CP at the (0, 0) sample position can be denoted CP0, the CP at the (w, 0) sample position CP1, and the CP at the (0, h) sample position CP2.

An affine motion model according to an embodiment can be applied using the CPs and the motion vectors for the CPs. The affine motion model can be expressed as Equation 1 below.

[Equation 1]

Here, w represents the width of the current block 400, h represents the height of the current block 400, v_0x and v_0y represent the x and y components of the motion vector of CP0, v_1x and v_1y represent the x and y components of the motion vector of CP1, and v_2x and v_2y represent the x and y components of the motion vector of CP2, respectively. In addition, x represents the x component of the position of the target sample in the current block 400, y represents the y component of the position of the target sample in the current block 400, v_x represents the x component of the motion vector of the target sample in the current block 400, and v_y represents the y component of the motion vector of the target sample in the current block 400.
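A form of Equation (1) that is consistent with the definitions above can be written as follows. This is a reconstruction that assumes the commonly used six-parameter affine form with CPs at (0, 0), (w, 0), and (0, h); the typesetting in the original filing may differ.

```latex
\begin{aligned}
v_x &= \frac{v_{1x}-v_{0x}}{w}\,x + \frac{v_{2x}-v_{0x}}{h}\,y + v_{0x},\\
v_y &= \frac{v_{1y}-v_{0y}}{w}\,x + \frac{v_{2y}-v_{0y}}{h}\,y + v_{0y}.
\end{aligned}
```

As a consistency check, substituting (x, y) = (0, 0), (w, 0), and (0, h) returns the motion vectors of CP0, CP1, and CP2, respectively.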

Since the motion vector of CP0, the motion vector of CP1, and the motion vector of CP2 are known, the motion vector at any sample position in the current block can be derived based on Equation (1). That is, according to the affine motion model, the motion vectors at the CPs, v0 (v_0x, v_0y), v1 (v_1x, v_1y), and v2 (v_2x, v_2y), are scaled based on the distance ratios between the coordinates (x, y) of the target sample and the three CPs, so that the motion vector of the target sample at that sample position can be derived. In other words, a motion vector of each sample in the current block can be derived based on the motion vectors of the CPs. The set of motion vectors of the samples in the current block derived according to the affine motion model may be referred to as an affine motion vector field.
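The per-sample derivation can be sketched as follows, assuming the commonly used six-parameter form v_x = (v_1x - v_0x)/w * x + (v_2x - v_0x)/h * y + v_0x and v_y = (v_1y - v_0y)/w * x + (v_2y - v_0y)/h * y + v_0y. The function name and the use of exact fractions are illustrative choices, not part of the specification; a real codec would use fixed-point arithmetic.

```python
from fractions import Fraction
from typing import Tuple

def affine_mv_6param(x: int, y: int, w: int, h: int,
                     v0: Tuple[int, int],
                     v1: Tuple[int, int],
                     v2: Tuple[int, int]) -> Tuple[Fraction, Fraction]:
    """Motion vector at sample (x, y) from CPMVs at (0, 0), (w, 0) and (0, h).

    Assumes the standard six-parameter affine form; exact fractions are used
    so that the control-point check below holds without rounding error.
    """
    vx = Fraction(v1[0] - v0[0], w) * x + Fraction(v2[0] - v0[0], h) * y + v0[0]
    vy = Fraction(v1[1] - v0[1], w) * x + Fraction(v2[1] - v0[1], h) * y + v0[1]
    return (vx, vy)

# Hypothetical CPMVs for a 16x16 block.
v0, v1, v2 = (2, 1), (6, 1), (2, 5)
mv_center = affine_mv_6param(8, 8, 16, 16, v0, v1, v2)
```

Evaluating the function at the three CP positions returns the CPMVs themselves, which is a quick way to validate any implementation of the model.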

The six parameters for Equation (1) can be denoted a, b, c, d, e, and f as shown in the following equation, and the affine motion model expressed with the six parameters is as follows.

[Equation 2]

The affine motion model using the six parameters or the affine inter prediction may be referred to as a six-parameter affine motion model or AF6.

In one example, affine inter prediction may be performed based on three CPs for the current block 400, i.e., motion information at three reference points. The motion information at the three CPs for the current block 400 may include the CPMV of each CP.

In one example, affine inter prediction may be performed based on two CPs for the current block 400, i.e., motion information at two reference points. The motion information at the two CPs for the current block 400 may include the CPMV of each CP.

FIG. 5 exemplarily illustrates a motion model in which the motion vectors for two CPs are used.

An affine motion model using two CPs can represent three movements including translational motion, scale motion, and rotational motion. The affine motion model representing the three movements may be referred to as a similarity affine motion model or a simplified affine motion model.

When the position of the top-left sample in the current block 500 is (0, 0), the width of the current block 500 is w, and the height is h, the samples located at (0, 0) and (w, 0) may be defined as the CPs for the current block 500, as shown in FIG. 5. The CP at the (0, 0) sample position can be denoted CP0, and the CP at the (w, 0) sample position can be denoted CP1.

An affine motion model according to an embodiment can be applied using the CPs and the motion vectors for the CPs. The affine motion model can be expressed as Equation 3 below.

[Equation 3]

Here, w represents the width of the current block 500, v_0x and v_0y represent the x and y components of the motion vector of CP0, and v_1x and v_1y represent the x and y components of the motion vector of CP1, respectively. In addition, x represents the x component of the position of the target sample in the current block 500, y represents the y component of the position of the target sample in the current block 500, v_x represents the x component of the motion vector of the target sample in the current block 500, and v_y represents the y component of the motion vector of the target sample in the current block 500.
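A form of Equation (3) that is consistent with the definitions above can be written as follows. This is a reconstruction that assumes the commonly used four-parameter (similarity) affine form with CPs at (0, 0) and (w, 0); the typesetting in the original filing may differ.

```latex
\begin{aligned}
v_x &= \frac{v_{1x}-v_{0x}}{w}\,x - \frac{v_{1y}-v_{0y}}{w}\,y + v_{0x},\\
v_y &= \frac{v_{1y}-v_{0y}}{w}\,x + \frac{v_{1x}-v_{0x}}{w}\,y + v_{0y}.
\end{aligned}
```

Substituting (x, y) = (0, 0) and (w, 0) returns the motion vectors of CP0 and CP1, respectively, and the shared coefficients in the two rows are what restrict the model to translation, scaling, and rotation.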

The four parameters for Equation (3) can be denoted a, b, c, and d as shown in the following equation, and the affine motion model expressed with the four parameters is as follows.

[Equation 4]

Here, w represents the width of the current block 500, v_0x and v_0y represent the x and y components of the motion vector of CP0, and v_1x and v_1y represent the x and y components of the motion vector of CP1, respectively. In addition, x represents the x component of the position of the target sample in the current block 500, y represents the y component of the position of the target sample in the current block 500, v_x represents the x component of the motion vector of the target sample in the current block 500, and v_y represents the y component of the motion vector of the target sample in the current block 500. The affine motion model using the two CPs can be represented by the four parameters a, b, c, and d as in Equation (4), and the affine motion model or affine inter prediction using the four parameters may be referred to as a four-parameter affine motion model or AF4. That is, according to the affine motion model, a motion vector of each sample in the current block can be derived based on the motion vectors of the control points. The set of motion vectors of the samples in the current block derived according to the affine motion model may be referred to as an affine motion vector field.
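The two-CP case can be sketched in the same way, assuming the commonly used four-parameter form v_x = a * x - b * y + v_0x and v_y = b * x + a * y + v_0y with a = (v_1x - v_0x)/w and b = (v_1y - v_0y)/w. The function name and the local names `a` and `b` are illustrative and are not meant to match the parameter assignment of Equation (4).

```python
from fractions import Fraction
from typing import Tuple

def affine_mv_4param(x: int, y: int, w: int,
                     v0: Tuple[int, int],
                     v1: Tuple[int, int]) -> Tuple[Fraction, Fraction]:
    """Motion vector at sample (x, y) from CPMVs at (0, 0) and (w, 0).

    Assumes the standard four-parameter (similarity) affine form, which can
    represent translation, scaling and rotation but not shear.
    """
    a = Fraction(v1[0] - v0[0], w)  # combined scale/rotation term (illustrative name)
    b = Fraction(v1[1] - v0[1], w)  # combined rotation/scale term (illustrative name)
    vx = a * x - b * y + v0[0]
    vy = b * x + a * y + v0[1]
    return (vx, vy)

# Hypothetical CPMVs for a block of width 16.
v0, v1 = (1, 2), (5, 4)
mv = affine_mv_4param(4, 8, 16, v0, v1)
```

Only two CPMVs are needed because the same pair (a, b) appears in both components, which is what ties the horizontal and vertical behavior together.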

On the other hand, as described above, the motion vector of a sample unit can be derived through the affine motion model, and the accuracy of the inter prediction can be improved significantly. However, in this case, the complexity in the motion compensation process may be greatly increased.

In another example, instead of deriving a motion vector for each sample, derivation may be limited to a motion vector for each sub-block in the current block.

FIG. 6 exemplarily illustrates a case where the size of the current block is 16x16 and motion vectors are derived in units of 4x4 sub-blocks. The sub-block may be set to various sizes. For example, when the sub-block is set to a size of nxn (n is a positive integer, e.g., n is 4), a motion vector may be derived for each nxn sub-block in the current block, and various methods for deriving a motion vector representative of each sub-block may be applied.

For example, referring to FIG. 6, a motion vector of each sub-block may be derived using the center or center lower-right sample position of each sub-block as representative coordinates. Here, the center lower-right position indicates the sample position located at the lower right among the four samples located at the center of the sub-block. For example, if n is an odd number, one sample is located at the center of the sub-block, and in this case the center sample position may be used to derive the motion vector of the sub-block. However, if n is an even number, four samples are adjacent to the center of the sub-block, and in this case the lower-right sample position may be used to derive the motion vector. For example, referring to FIG. 6, the representative coordinates of the sub-blocks may be derived as (2, 2), (6, 2), (10, 2), ..., (14, 14), and the encoding apparatus/decoding apparatus may substitute each of the representative coordinates of the sub-blocks into Equation (1) or (3) to derive a motion vector of each sub-block. The motion vectors of the sub-blocks in the current block derived through the affine motion model may be referred to as an affine MVF.
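The choice of representative coordinates can be sketched as follows. Adding n // 2 to the top-left corner of each sub-block selects the single center sample when n is odd and the lower-right of the four center samples when n is even. The function names are illustrative, and `mv_at` is a hypothetical stand-in for whichever motion model (for example Equation (1) or (3)) is evaluated at each coordinate.

```python
from fractions import Fraction
from typing import Callable, Dict, List, Tuple

Coord = Tuple[int, int]

def representative_coords(width: int, height: int, n: int) -> List[Coord]:
    """Representative sample position of every n x n sub-block, in raster order."""
    return [(x0 + n // 2, y0 + n // 2)
            for y0 in range(0, height, n)
            for x0 in range(0, width, n)]

def subblock_mvf(width: int, height: int, n: int,
                 mv_at: Callable[[int, int], tuple]) -> Dict[Coord, tuple]:
    """Evaluate a motion model once per sub-block instead of once per sample."""
    return {pos: mv_at(*pos) for pos in representative_coords(width, height, n)}

# 16x16 block with 4x4 sub-blocks, as in the FIG. 6 example.
coords = representative_coords(16, 16, 4)

# Hypothetical motion model used only to make the sketch runnable.
mvf = subblock_mvf(16, 16, 4, lambda x, y: (Fraction(x, 4), Fraction(y, 4)))
```

For the 16x16 block this yields 16 motion vectors rather than 256, which is the complexity reduction that motivates sub-block derivation.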

Meanwhile, inter prediction using the above affine motion model, that is, affine motion prediction, may have an affine merge mode (AF_MERGE) and an affine inter mode (AF_INTER).

The affine merge mode according to an embodiment may derive the CPMVs for two or three CPs from neighboring blocks of the current block, without coding an MVD (motion vector difference), similarly to the existing skip/merge mode.

FIGS. 3 to 6 are intended to aid understanding of the principles of the encoding/decoding method according to an embodiment of the present invention, which will be described later with reference to FIGS. 7 to 17. It will be readily appreciated by those of ordinary skill in the art that the scope of the invention is not limited by the foregoing description of FIGS. 3 to 6.

Hereinafter, an encoding/decoding method according to some embodiments of the present invention will be described with reference to FIGS. 7 to 17. More specifically, FIGS. 7 to 17 illustrate motion estimation and motion compensation, which may be performed based on Motion Vector Derivation (MVD) or Decoder Side Motion Vector Derivation (DMVD). As an example of motion vector derivation, BIMVD (Bilinear Interpolation Motion Model based Motion Vector Derivation) may be applied, and a bilinear interpolation motion model may be used in BIMVD. In the following description, the detailed operations according to some embodiments are described with reference to the decoding apparatus 200 for convenience, but the detailed operations may be applied to the encoding apparatus 100 as well.

FIG. 7 is a flowchart illustrating a process of performing image coding based on a motion vector according to an embodiment. In one example, FIG. 7 may be a flow chart that schematically illustrates a method of operating inter prediction according to one embodiment.

The decoding apparatus 200 according to an exemplary embodiment may estimate motion information of the current block (S700). More specifically, the decoding apparatus 200 according to the embodiment can derive the MVP for the current block (S702), decode (or obtain) the MVD for the current block from the bitstream (S704), and add the MVP and the MVD for the current block to obtain the motion vector of the current block (S706).

The decoding apparatus 200 may generate prediction samples for the current block based on the motion information estimated in S700 (S710). When the prediction samples for the current block are generated, the decoding apparatus 200 can reconstruct the current block based on the prediction samples and the residual samples for the current block.

The decoding apparatus 200 according to an exemplary embodiment may perform motion vector derivation based on a bilinear interpolation motion model (BIMVD). To perform BIMVD on the current block 800, information about the four CPs (CP0, CP1, CP2, and CP3) for the current block 800 may be needed, for example as shown in FIG. 8. The information about a CP may include the CPMVP, the CPMV, the CP position, the x-axis gradient of the CP, the y-axis gradient of the CP, and the time-axis gradient of the CP. CPMVi and CPMVPi denote the CPMV and the CPMVP of CPi (i = 0, 1, 2, 3), respectively.

In FIG. 8, it is assumed that information on four CPs is required to perform BIMVD on the current block 800, but the embodiment is not limited thereto. For example, the decoding device 200 may perform BIMVD on the current block 800 based on two CPs or three CPs for the current block 800.

In FIG. 8, the four CPs used in the process of performing BIMVD on the current block 800 are assumed to be the four samples at the corners of the current block 800, but the embodiment is not limited thereto. For example, a sample at the center of the current block 800, a sample at a predetermined distance from each edge of the current block 800, and the like may be used as a CP in the process of performing BIMVD on the current block 800.

The decoding apparatus 200 according to one embodiment may derive MVP for the current block 800 and decode (or obtain) the MVD for the current block 800 (from the bitstream).

In addition, the decoding apparatus 200 according to an exemplary embodiment may derive the CPMVPs of the CPs for the current block 800, respectively. For example, the decoding apparatus 200 may derive the CPMVPs CPMVP0, CPMVP1, CPMVP2 and CPMVP3 of the four CPs CP0, CP1, CP2 and CP3 for the current block 800, respectively. A more specific method by which the decoding apparatus 200 derives the CPMVPs of the CPs for the current block 800 will be described later with reference to FIG.

In addition, the decoding apparatus 200 according to an exemplary embodiment may derive the CPMVs of the CPs for the current block 800 based on the CPMVPs of the CPs for the current block 800, respectively. For example, the decoding apparatus 200 may derive the CPMVs (CPMV0, CPMV1, CPMV2, and CPMV3) of the four CPs for the current block 800 based on the CPMVPs (CPMVP0, CPMVP1, CPMVP2, and CPMVP3) of the four CPs for the current block 800. The CPMVs of each of the CPs for the current block 800 may be derived, for example, based on Equation (5) below.

[Equation 5]

In Equation (5), MVP represents the MVP for the current block 800, and MVD may represent the difference between the motion vector of the current block 800 and the MVP for the current block 800.

In addition, the decoding apparatus 200 may derive a motion vector field (MVF) for the current block 800 based on the CPMVs of the CPs for the current block 800. Since the embodiment according to FIG. 8 is based on the BIMVD described above, the MVF for the current block 800 may be referred to as a BIMVD MVF. The MVF for the current block 800 may be derived in units of sub-blocks or samples in the current block 800, and the sub-block unit may be, for example, a 4x4 block unit or an NxN block unit. The decoding apparatus 200 can derive the MVF for the current block 800 based on the determined unit.

In this case, the CPMVs can be applied to the bilinear interpolation motion model in deriving the MVF for the current block 800 based on the CPMVs of the CPs for the current block 800. A more detailed description of the bilinear interpolation motion model will be given later with reference to FIG.

The decoding apparatus 200 according to the embodiment can perform prediction on a sub-block unit or a sample basis on the basis of the derived MVF.

Each step disclosed in Fig. 9 can be performed by the encoding apparatus 100 disclosed in Fig. 1 or the decoding apparatus 200 disclosed in Fig. In addition, some of the explanations relating to each step disclosed in Fig. 9 have been described above in Fig. More specifically, step S910 corresponds to step S710 of FIG. 7, step S920 is performed by the transforming unit 122 and the quantizing unit 123 of FIG. 1, and the inverse quantizing unit 222 and the inverse transforming unit 223 of FIG. And S930 may be performed by the entropy encoding unit 130 of FIG. 1 and the entropy decoding unit 210 of FIG. Therefore, the detailed description overlapping with the above-described contents in FIG. 1, FIG. 2 or FIG. 8 will be omitted or simplified.

The decoding apparatus 200 according to an exemplary embodiment may estimate motion information of the current block (S900). More specifically, the decoding apparatus 200 according to the embodiment can derive the MVP for the current block (S901), decode the MVD for the current block (S902), derive four CPMVPs corresponding respectively to the four CPs of the current block (S903), derive four CPMVs corresponding respectively to the four CPs of the current block (S904), and derive the MVF for the current block (S905).

In the course of deriving the CPMVPs according to S903, the decoding apparatus 200 according to an embodiment may perform time-axis bidirectional prediction (bi-prediction) using a first reference picture and a second reference picture of the current picture including the current block. The time-axis bidirectional prediction used in the process of deriving the CPMVPs will be described in more detail with reference to FIG.

The decoding apparatus 200 according to the embodiment may use Equation (5), described above with reference to FIG. 8, in the process of deriving the CPMVs according to S904. However, the embodiment is not limited thereto, and the decoding apparatus 200 may derive the CPMVs based on a formula that a person skilled in the art can readily derive from Equation (5).

The decoding apparatus 200 according to the embodiment can apply the CPMVs to the bilinear interpolation motion model in the process of deriving the MVF for the current block according to S905. The bilinear interpolation motion model can be determined based on the CPMVs, the width of the current block, and the height of the current block. The MVF for the current block derived according to S905 may be derived in units of sub-blocks or samples, and the sub-block unit may be, for example, 4x4 blocks or NxN blocks. A more detailed description of the unit in which the MVF is determined in the current block will be given later with reference to FIG.
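How four CPMVs, the block width, and the block height can determine an MVF may be sketched with textbook bilinear interpolation. This is an illustration under stated assumptions rather than the model of the specification: the CPs are assumed to sit at (0, 0), (w, 0), (0, h), and (w, h), the weights are the standard bilinear weights, and the function and parameter names (`top_left`, `top_right`, and so on) are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Tuple

MV = Tuple[int, int]

def bilinear_mv(x: int, y: int, w: int, h: int,
                top_left: MV, top_right: MV,
                bottom_left: MV, bottom_right: MV) -> Tuple[Fraction, Fraction]:
    """Interpolate a motion vector at (x, y) from four corner motion vectors."""
    fx, fy = Fraction(x, w), Fraction(y, h)

    def component(k: int) -> Fraction:
        # Interpolate along x on the top and bottom edges, then along y.
        top = (1 - fx) * top_left[k] + fx * top_right[k]
        bottom = (1 - fx) * bottom_left[k] + fx * bottom_right[k]
        return (1 - fy) * top + fy * bottom

    return (component(0), component(1))

def bilinear_mvf(w: int, h: int, n: int,
                 top_left: MV, top_right: MV,
                 bottom_left: MV, bottom_right: MV) -> Dict[Tuple[int, int], tuple]:
    """One motion vector per n x n sub-block, evaluated at offset n // 2."""
    return {(x, y): bilinear_mv(x, y, w, h, top_left, top_right,
                                bottom_left, bottom_right)
            for y in range(n // 2, h, n)
            for x in range(n // 2, w, n)}

# Hypothetical CPMVs for an 8x8 block with 4x4 sub-blocks.
tl, tr, bl, br = (0, 0), (8, 0), (0, 8), (8, 8)
mvf = bilinear_mvf(8, 8, 4, tl, tr, bl, br)
```

Passing n = 1 evaluates the model at every sample position, which corresponds to sample-unit derivation, while larger n trades accuracy for fewer motion vectors.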

The decoding apparatus 200 according to the embodiment can generate prediction samples for the current block based on the motion information estimated in S900 (S901 to S905) (S910), perform quantization and transform on the samples of the current block (S920), and perform entropy coding (S930).

In one embodiment, the CPMVPs of each of the four CPs for the current block may be derived in accordance with the principles described in FIG. More specifically, the CPMVPs of each of the four CPs for the current block can be derived based on the temporal bidirectional prediction described in FIG.

Referring to FIG. 10, the L0 picture indicates a picture a predetermined time before the current picture, and the L1 picture indicates a picture a predetermined time after the current picture. The motion vector corresponding to the A position and the motion vector corresponding to the B position may have the same magnitude but opposite directions. The difference between the sample value at the A position and the sample value at the B position can be derived, for example, as shown in Equation (6) below.

[Equation 6]

In Equation (6), the two sample-value terms represent the sample value at the A position of the L0 picture and the sample value at the B position of the L1 picture, respectively. The sample value at the A position of the L0 picture and the sample value at the B position of the L1 picture can each be obtained using the first-order Taylor series expansion, as shown in Equations (7) and (8) below.

[Equation 7]

[Equation 8]

Substituting the sample value at the A position of the L0 picture according to Equation (7) and the sample value at the B position of the L1 picture according to Equation (8) into Equation (6), the following Equation (9) can be derived.

[Equation 9]

In Equation (9), the partial-derivative terms for L0 represent the x-axis and y-axis partial derivative values at the (i, j) position of the L0 picture, and the partial-derivative terms for L1 represent the x-axis and y-axis partial derivative values at the (i, j) position of the L1 picture, respectively. The x-axis and y-axis partial derivative values indicate the amount of change of the sample values in the x-axis and y-axis directions and can be derived through various mathematical methods.

The sample values at the (i, j) position in Equations (7) and (8) and the terms of Equation (9) are as shown in FIG. In this case, (i, j) may be one of the positions of the four CPs for the current block. If locally steady motion is assumed within a window of size (2M + 1) x (2M + 1) centered on (i, j), the motion vector at position (i, j) is the same as the motion vector at every position (i', j') in the window (where i - M ≤ i' ≤ i + M and j - M ≤ j' ≤ j + M). Accordingly, based on Equations (6) to (9), the following Equation (10) is obtained.

[Equation 10]

Equation (10) can be expressed as Equation (11) below.

[Equation 11]

In Equation (11), the gradient terms may represent the x-axis gradient, the y-axis gradient, and the time-axis gradient at the (i, j) position, respectively. From Equation (10), the following Equations (12) and (13) are obtained.

[Equation 12]

[Equation 13]

Based on Equations (12) and (13), the x and y components of the motion vector at the (i, j) position can be derived as shown in Equation (14) below.

[Equation 14]

Therefore, according to an embodiment, when the position of each of the four CPs for the current block is substituted for (i, j) in Equations (6) to (14), the CPMVP at each CP for the current block can be derived. However, depending on the embodiment, when the positions of the four CPs for the current block are substituted for (i, j) in Equations (6) to (14), the CPMV at each CP may be derived directly.
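The idea of deriving a motion vector at a CP from the x-axis, y-axis, and time-axis gradients collected over a window can be sketched with a generic gradient-based least-squares estimate. This is not a reproduction of Equations (10) to (14): it assumes the common optical-flow-style cost, the sum over the window of (Gt + vx * Gx + vy * Gy)^2, and solves its 2x2 normal equations. The names `gx`, `gy`, `gt`, and `solve_window_mv` are hypothetical.

```python
from fractions import Fraction
from typing import Optional, Sequence, Tuple

def solve_window_mv(gx: Sequence[int], gy: Sequence[int],
                    gt: Sequence[int]) -> Optional[Tuple[Fraction, Fraction]]:
    """Least-squares (vx, vy) minimizing sum((gt + vx*gx + vy*gy)**2) over a window.

    gx, gy, gt hold the x-axis, y-axis and time-axis gradients at each window
    position (25 values for a 5x5 window). Returns None when the normal
    equations are singular, e.g. in a flat region with no texture.
    """
    sxx = sum(a * a for a in gx)
    syy = sum(b * b for b in gy)
    sxy = sum(a * b for a, b in zip(gx, gy))
    sxt = sum(a * t for a, t in zip(gx, gt))
    syt = sum(b * t for b, t in zip(gy, gt))

    det = sxx * syy - sxy * sxy
    if det == 0:
        return None
    # Cramer's rule on: sxx*vx + sxy*vy = -sxt,  sxy*vx + syy*vy = -syt
    vx = Fraction(-sxt * syy + syt * sxy, det)
    vy = Fraction(-syt * sxx + sxt * sxy, det)
    return (vx, vy)

# Synthetic gradients generated from a known motion (2, -1): gt = -(2*gx - gy).
gx = [1, 0, 2, 1]
gy = [0, 1, 1, 3]
gt = [-(2 * a - b) for a, b in zip(gx, gy)]
estimate = solve_window_mv(gx, gy, gt)
```

Under this assumed formulation, running the estimate once per CP position with that CP's window gradients would give one motion vector per CP, and the None return marks windows where no reliable estimate exists.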

In one embodiment, the contents described in Fig. 10 can always be applied when the decoding apparatus 200 performs image decoding.

Alternatively, in another embodiment, the operations described in FIG. 10 may be performed based on a determination as to whether to perform time-axis bidirectional prediction in the course of deriving the CPMVP of each CP. Time-axis bidirectional prediction refers to prediction that uses a first reference picture and a second reference picture of the current picture including the current block to derive the CPMVPs. In time-axis bidirectional prediction, the first reference picture and the second reference picture may be located in mutually opposite directions on the time axis with respect to the current picture.

The result of the decision as to whether to perform time-axis bidirectional prediction may be indicated as TRUE bi-prediction. TRUE bi-prediction indicates that bidirectional prediction can be performed, that a first reference picture L0 and a second reference picture L1 exist for the current picture, and that the first reference picture L0 and the second reference picture L1 are located in opposite directions on the time axis with respect to the current picture. In the case of TRUE bi-prediction, the decoding apparatus 200 performs time-axis bidirectional prediction and performs BIMVD based on it. If it is not TRUE bi-prediction, BIMVD is not performed, and the decoding apparatus 200 can perform (general) bidirectional prediction instead of time-axis bidirectional prediction. (General) bidirectional prediction may mean, for example, prediction based on a first reference picture L0 and a second reference picture L1 located in the same direction on the time axis with respect to the current picture.
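The TRUE bi-prediction condition can be sketched as a simple check. The sketch assumes that the temporal position of each picture is available as a picture order count (POC) value; the specification does not name POC here, so that choice, like the function name, is an assumption.

```python
from typing import Optional

def is_true_bi_prediction(poc_cur: int,
                          poc_l0: Optional[int],
                          poc_l1: Optional[int]) -> bool:
    """True when both reference pictures exist and lie on opposite sides of
    the current picture on the time axis (POC used as the time-axis position)."""
    if poc_l0 is None or poc_l1 is None:
        return False
    # Opposite temporal directions give distances of opposite sign.
    return (poc_l0 - poc_cur) * (poc_l1 - poc_cur) < 0

def use_bimvd(poc_cur: int, poc_l0: Optional[int], poc_l1: Optional[int]) -> bool:
    """BIMVD is performed only in the TRUE bi-prediction case."""
    return is_true_bi_prediction(poc_cur, poc_l0, poc_l1)
```

When the check fails, for example because both reference pictures precede the current picture, the decoder in this sketch would fall back to general bidirectional prediction without BIMVD.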

Alternatively, in still another embodiment, the decoding apparatus 200 can determine whether to perform time-axis bidirectional prediction based on flag information indicating whether time-axis bidirectional prediction is to be performed. When the flag information indicates that time-axis bidirectional prediction is to be performed, the decoding apparatus 200 performs time-axis bidirectional prediction and performs BIMVD based on it. When the flag information indicates that time-axis bidirectional prediction is not to be performed, the decoding apparatus 200 may perform (general) bidirectional prediction instead of time-axis bidirectional prediction.

In one embodiment, when the size of the window Ω 1120 to which the locally steady motion described above is applied is 5x5 ((2M + 1) x (2M + 1), M = 2), the process of obtaining the x-axis gradient Gx(i, j) of the upper left CP 1130 of the current block 1100 may be as shown in FIG. 11. The window Ω 1120 can be applied around the coordinates (i, j) of the upper left CP 1130, covering the positions (i', j') where i - M ≤ i' ≤ i + M, j - M ≤ j' ≤ j + M, and M = 2, and the x-axis gradient of the upper left CP 1130 can be derived based on the 5x5 window Ω 1120.

The x-axis gradients of the lower left CP 1140, the lower right CP 1150, and the upper right CP 1160 of the current block 1100 can be derived in the same manner as the x-axis gradient of the upper left CP 1130 of the current block 1100.

Referring to FIG. 11, a block of size (8 + 2M) x (8 + 2M) (where M = 2) surrounding the 8x8 current block 1100 corresponds to the x-axis gradient map 1110 of the current block 1100. FIG. 11 shows a case where the width and height of the current block are 8 and M is 2. However, the present embodiment is not limited to this, and the size of the current block and the value of M may vary.
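The relation between the window size and the gradient map size can be checked with a short sketch. It assumes, purely for illustration, that the four CPs sit at the corner samples of the block and that the gradient map covers coordinates from -M to W - 1 + M horizontally and -M to H - 1 + M vertically; the helper names are hypothetical.

```python
def window_positions(i, j, m=2):
    """List the (i', j') positions of the (2M+1) x (2M+1) window centred on (i, j)."""
    return [(ii, jj)
            for jj in range(j - m, j + m + 1)
            for ii in range(i - m, i + m + 1)]


def gradient_map_size(width, height, m=2):
    """Size of the padded region needed so that every window around the block fits."""
    return (width + 2 * m, height + 2 * m)


# Example: 8x8 block with M = 2; corner sample positions are an assumption.
W, H, M = 8, 8, 2
corners = {"upper_left": (0, 0), "upper_right": (W - 1, 0),
           "lower_left": (0, H - 1), "lower_right": (W - 1, H - 1)}
map_w, map_h = gradient_map_size(W, H, M)
for name, (ci, cj) in corners.items():
    win = window_positions(ci, cj, M)
    # Every window position must fall inside the padded gradient map.
    assert all(-M <= ii <= W - 1 + M and -M <= jj <= H - 1 + M for ii, jj in win)
```

With these values each window holds 25 positions and the padded map is 12x12, which matches the (8 + 2M) x (8 + 2M) size stated for M = 2.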

FIG. 12 is a diagram for explaining a process of deriving an MVF for a current block based on CPMVs of four CPs for a current block according to an embodiment.

As described above with reference to FIG. 8, the CPMVs for each of the four CPs of the current block 1200 may be derived based on Equation (5). Once the CPMVs are derived, the decoding apparatus 200 may derive the MVF for the current block 1200 based on the CPMVs. The MVF for the current block 1200 may be derived by applying the CPMVs to a bilinear interpolation motion model. The bilinear interpolation motion model may be based, for example, on the following Equation (15).

[Equation (15)]

In Equation (15), x represents the x-axis coordinate of a sample in the current block 1200, y represents the y-axis coordinate of the sample in the current block 1200, W represents the width of the current block 1200, and H represents the height of the current block 1200. The remaining terms represent the MVF for the current block 1200, the CPMVs, and a weighting factor.
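As an illustration of applying four CPMVs to a bilinear interpolation motion model, the sketch below uses the textbook bilinear interpolation of four corner vectors. It is an assumption and is not necessarily identical to Equation (15), which may apply an additional weighting factor. The CP placement at (0, 0), (W, 0), (0, H), and (W, H), the dictionary keys, and the function name are all hypothetical.

```python
def bilinear_mv(x, y, width, height, cpmvs):
    """Interpolate a motion vector at sample (x, y) from four corner CPMVs.

    cpmvs: dict with keys 'ul', 'ur', 'll', 'lr', each an (mvx, mvy) tuple.
    Assumption: the CPs sit at (0, 0), (W, 0), (0, H) and (W, H).
    """
    wx = x / width    # horizontal weight toward the right-hand CPs
    wy = y / height   # vertical weight toward the lower CPs
    weights = {
        "ul": (1 - wx) * (1 - wy),
        "ur": wx * (1 - wy),
        "ll": (1 - wx) * wy,
        "lr": wx * wy,
    }
    mvx = sum(weights[k] * cpmvs[k][0] for k in weights)
    mvy = sum(weights[k] * cpmvs[k][1] for k in weights)
    return (mvx, mvy)
```

In this form the four weights always sum to one, so a block whose four CPMVs are identical yields that same vector at every sample, and the interpolated vector at each corner equals the CPMV of that corner.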

The decoding apparatus 200 according to an exemplary embodiment may perform prediction on the current block 1200 using the MVF for the current block 1200 derived based on Equation (15). The MVF for the current block 1200 may be derived in units of subblocks or samples in the current block 1200, and the subblock unit may be, for example, a 4x4 block unit or an NxN block unit. The MVF unit for the current block 1200 will be described in more detail with reference to FIG. 13 below.

In one embodiment, the MVF for the current block 1200 may be derived for each sample unit in the current block 1200.

In another embodiment, the MVF for the current block 1200 may be derived on a 4x4 sub-block basis in the current block 1200. In one example, a motion vector derived on a 4x4 sub-block basis may be for a sample at the center position of each 4x4 sub-block or a sample closest to the center position. However, the present invention is not limited to this example. For example, a motion vector derived in units of 4x4 sub-blocks may be for a sample at an arbitrary position, such as the upper left, lower left, upper right, or lower right of each 4x4 sub-block.

In yet another embodiment, the MVF for the current block 1200 may be derived in units of NxN subblocks in the current block 1200. Here, N may be less than the width and height of the current block 1200 and may be determined based on the size of the current block 1200 or the vector magnitude of the CPMVs of the CPs for the current block 1200. In one example, a motion vector derived on an NxN subblock basis may be for a sample at the center position of each NxN subblock or a sample closest to the center position. However, the present invention is not limited to this example. For example, a motion vector derived in units of NxN subblocks may be for a sample at an arbitrary position, such as the upper left, lower left, upper right, or lower right of each NxN subblock.
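The choice of one representative sample per subblock can be sketched as follows. For even N the geometric center of an NxN subblock falls between samples, so this sketch picks the sample at offset (N // 2, N // 2) as one of the samples closest to the center; that tie-break and the function name are assumptions made for illustration.

```python
def subblock_center_samples(width, height, n=4):
    """Return one representative sample position per NxN subblock.

    Assumption: the sample at offset (N // 2, N // 2) inside each subblock
    is used as the sample closest to the centre position.
    """
    positions = []
    for y0 in range(0, height, n):       # top edge of each subblock row
        for x0 in range(0, width, n):    # left edge of each subblock
            positions.append((x0 + n // 2, y0 + n // 2))
    return positions
```

For an 8x8 block with 4x4 subblocks this yields four positions, one per subblock; a motion vector evaluated at each position would then serve as the motion vector of the whole subblock.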

As shown in FIG. 15, the encoding apparatus according to an embodiment may include a prediction unit 110, a residual processing unit 120, and an entropy encoding unit 130. The encoding apparatus shown in FIG. 15 can perform the operations according to FIG. 14. More specifically, S1400 to S1450 in FIG. 14 can be performed by the prediction unit 110 of the encoding apparatus, S1460 in FIG. 14 can be performed by the residual processing unit 120 of the encoding apparatus, and S1470 may be performed by the entropy encoding unit 130 of the encoding apparatus. In addition, the operations according to S1400 to S1470 are based on some of the contents described above. Therefore, the detailed description overlapping with the above-described contents in FIG. 1 and FIG. 7 to FIG. 13 will be omitted or simplified.

One of ordinary skill in the art will recognize that the block diagram of the encoding apparatus shown in FIG. 15 illustrates only part of the configuration of the encoding apparatus 100 shown in FIG. 1 in order to describe the operation of the encoding apparatus according to one embodiment more briefly. It will therefore be readily understood that the contents described above in FIG. 1 can be applied similarly in FIGS. 14 and 15 without reducing the scope of rights.

The encoding apparatus according to an exemplary embodiment may derive a motion vector of a current block by performing BIMVD on the current block. The process of performing BIMVD on the current block is described more specifically as S1400 to S1470 shown in FIG. 14.

The encoding apparatus according to an exemplary embodiment may derive an MVP for a current block (S1400). More specifically, the prediction unit 110 according to an embodiment can derive the MVP for the current block.

An encoding apparatus according to an embodiment may derive an MVD indicating a difference between a motion vector of a current block and MVP for a current block (S1410). More specifically, the predicting unit 110 according to an embodiment can derive an MVD indicating a difference between a motion vector of a current block and MVP for a current block.

The encoding apparatus according to an embodiment may derive CPMVPs corresponding to each of the CPs for the current block (S1420). More specifically, the prediction unit 110 according to an embodiment may derive CPMVPs corresponding to each of the CPs for the current block. For example, the prediction unit 110 may derive four CPMVPs corresponding to each of the four CPs for the current block based on temporal bidirectional prediction.

The encoding apparatus may derive CPMVs corresponding to each of the CPs for the current block, based on the MVP, the MVD, and the CPMVPs for the current block (S1430). More specifically, the predicting unit 110 according to an embodiment can derive CPMVs corresponding to CPs for the current block, based on MVP, MVD, and CPMVPs for the current block. For example, the predicting unit 110 may derive four CPMVs corresponding to each of the four CPs for the current block, based on the sum of MVP, MVD, and CPMVP for the current block.
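The MVD and CPMV derivations can be sketched as follows, reading "the sum of MVP, MVD, and CPMVP" literally. The exact form of Equation (5) is not shown in this passage, so the component-wise sum below is an assumption, and the function names and the tuple representation of vectors are hypothetical.

```python
def derive_mvd(mv, mvp):
    """MVD is the difference between the block's motion vector and its MVP."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def derive_cpmvs(mvp, mvd, cpmvps):
    """Derive one CPMV per CP as MVP + MVD + CPMVP_i (assumed component-wise sum)."""
    return [(mvp[0] + mvd[0] + p[0], mvp[1] + mvd[1] + p[1]) for p in cpmvps]
```

Under this reading, MVP + MVD restores the motion vector of the current block, and each CPMVP acts as a per-CP offset from that vector. A CP whose CPMVP is (0, 0) therefore receives the block motion vector itself.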

The encoding apparatus according to an exemplary embodiment may derive an MVF for a current block based on the CPMVs (S1440). More specifically, the prediction unit 110 according to an embodiment can derive the MVF for the current block based on the CPMVs. For example, the prediction unit 110 may derive the MVF for the current block by applying the CPMVs to the bilinear interpolation motion model.

The encoding apparatus according to an exemplary embodiment may generate prediction samples for a current block based on the MVF for the current block (S1450). More specifically, the prediction unit 110 according to an embodiment may generate prediction samples for a current block based on MVF for the current block.

The encoding apparatus according to an embodiment may derive residual samples for a current block based on the generated prediction samples (S1460). More specifically, the residual processing unit 120 according to an embodiment can derive residual samples for the current block based on the generated prediction samples.
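Residual derivation can be sketched as a sample-wise subtraction of the prediction from the original block, which is the usual meaning of a residual in block-based video coding. The list-of-rows representation and the function name are assumptions for illustration.

```python
def derive_residual(original, prediction):
    """Return residual samples as original minus prediction, sample by sample."""
    return [[o - p for o, p in zip(orig_row, pred_row)]
            for orig_row, pred_row in zip(original, prediction)]
```

A more accurate MVF yields prediction samples closer to the original, which in turn makes the residual values smaller and cheaper to encode.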

The encoding apparatus according to an exemplary embodiment may encode prediction information including the derived MVD and residual information about the residual samples (S1470). More specifically, the entropy encoding unit 130 according to an exemplary embodiment may encode prediction information including the derived MVD and residual information about the residual samples. The entropy encoding unit 130 may output the encoded information in the form of a bitstream. The output bitstream may be transmitted to the decoding apparatus 200 disclosed in FIG. 2.

According to the encoding apparatus and the image encoding method of the encoding apparatus disclosed in FIGS. 14 and 15, the encoding efficiency of the image information can be increased by deriving the MVF for the current block based on BIMVD.

As shown in FIG. 17, the decoding apparatus according to an embodiment may include an entropy decoding unit 210, a prediction unit 230, and an adder 240. The decoding apparatus shown in FIG. 17 can perform the operations according to FIG. 16. More specifically, S1610 in FIG. 16 may be performed by the entropy decoding unit 210 of the decoding apparatus, S1600 and S1620 to S1650 may be performed by the prediction unit 230 of the decoding apparatus, and S1660 may be performed by the adder 240 of the decoding apparatus. In addition, the operations according to S1600 to S1660 are based on some of the contents described above. Therefore, the detailed description overlapping with the above-described contents in FIG. 2 and FIGS. 7 to 17 will be omitted or simplified.

A person skilled in the art will appreciate that the block diagram of the decoding apparatus shown in FIG. 17 illustrates only part of the configuration of the decoding apparatus 200 shown in FIG. 2 in order to describe the operation of the decoding apparatus according to an embodiment more briefly. It will therefore be readily understood that the contents described above in FIG. 2 can be applied similarly in FIGS. 16 and 17 without reducing the scope of rights.

The decoding apparatus according to an exemplary embodiment may derive a motion vector of a current block by performing BIMVD on the current block. The process of performing BIMVD on the current block is described more specifically as S1600 to S1660 shown in FIG. 16.

The decoding apparatus according to one embodiment may derive the MVP for the current block (S1600). More specifically, the prediction unit 230 according to an embodiment can derive the MVP for the current block.

The decoding apparatus according to an exemplary embodiment may obtain MVD indicating a difference between a motion vector of a current block and MVP for a current block from a bitstream (S1610). More specifically, the entropy decoding unit 210 according to an exemplary embodiment may obtain MVD representing the difference between the motion vector of the current block and the MVP for the current block from the bitstream.

The decoding apparatus according to an embodiment may derive the CPMVPs corresponding to each of the CPs for the current block (S1620). More specifically, the prediction unit 230 according to an embodiment can derive CPMVPs corresponding to each of the CPs for the current block. For example, the prediction unit 230 may derive four CPMVPs corresponding to each of the four CPs for the current block based on temporal bidirectional prediction.

The decoding apparatus may derive CPMVs corresponding to CPs for the current block, based on MVP, MVD, and CPMVPs for the current block (S1630). More specifically, the predicting unit 230 according to an embodiment can derive CPMVs corresponding to CPs for the current block, based on MVP, MVD, and CPMVPs for the current block. For example, the predicting unit 230 may derive four CPMVs corresponding to each of the four CPs for the current block, based on the sum of MVP, MVD, and CPMVP for the current block.

The decoding apparatus according to an exemplary embodiment may derive an MVF for a current block based on the CPMVs (S1640). More specifically, the prediction unit 230 according to an embodiment can derive the MVF for the current block based on the CPMVs. For example, the prediction unit 230 may derive the MVF for the current block by applying the CPMVs to the bilinear interpolation motion model.

The decoding apparatus according to an exemplary embodiment may generate prediction samples for a current block based on MVF (S1650). More specifically, the prediction unit 230 according to an exemplary embodiment may generate prediction samples for a current block based on MVF.

The decoding apparatus according to an exemplary embodiment may generate reconstruction samples for a current block based on the prediction samples for the current block (S1660). More specifically, the adder 240 according to an embodiment may generate reconstruction samples for the current block based on the prediction samples for the current block.
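The adder step can be sketched as below. The passage only states that reconstruction samples are generated from the prediction samples; adding decoded residual samples and clipping to the valid sample range are assumptions based on general block-based video coding practice, and the function name and default bit depth are hypothetical.

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Add residual to prediction and clip each sample to [0, 2^bit_depth - 1].

    Assumption: residual samples are available and clipping is applied.
    """
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]
```

When no residual is present for a block, passing an all-zero residual leaves the prediction samples unchanged as the reconstruction.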

According to the decoding apparatus and the image decoding method of the decoding apparatus disclosed in FIGS. 16 and 17, the decoding efficiency of the image information can be increased by deriving the MVF for the current block based on BIMVD.

The internal components of the above-described devices may be processors that execute sequential processes stored in memory, or hardware components implemented with other hardware. These components may be located inside or outside the devices.

The above-described modules may be omitted according to the embodiment, or may be replaced by other modules performing similar / same operations.

The above-described method according to the present invention can be implemented in software, and the encoding apparatus and/or decoding apparatus according to the present invention can be included in a device that performs image processing, such as a TV, a computer, a smartphone, a set-top box, or a display device.

In the above-described embodiments, the methods are described on the basis of a flowchart as a series of steps or blocks, but the present invention is not limited to the order of the steps, and some steps may occur in a different order from, or concurrently with, other steps described above. It will also be appreciated by those skilled in the art that the steps depicted in the flowchart are not exclusive, and that other steps may be included or one or more steps in the flowchart may be deleted without affecting the scope of the invention.

When the embodiments of the present invention are implemented in software, the above-described method may be implemented by a module (a procedure, a function, and the like) that performs the above-described functions. The module is stored in memory and can be executed by the processor. The memory may be internal or external to the processor and may be coupled to the processor by any of a variety of well-known means. The processor may include an application-specific integrated circuit (ASIC), other chipsets, logic circuitry, and/or a data processing device. The memory may include read-only memory (ROM), random access memory (RAM), flash memory, memory cards, storage media, and/or other storage devices.

Claims

1. An image decoding method performed by a decoding apparatus, the method comprising:

deriving a motion vector predictor (MVP) for a current block;

obtaining, from a bitstream, a motion vector difference (MVD) representing a difference between a motion vector of the current block and the MVP for the current block;

deriving control point motion vector predictors (CPMVPs) corresponding to each of the control points (CPs) for the current block;

deriving control point motion vectors (CPMVs) corresponding to each of the CPs for the current block, based on the MVP for the current block, the MVD, and the CPMVPs;

deriving a motion vector field (MVF) for the current block based on the CPMVs;

generating prediction samples for the current block based on the MVF; and

generating reconstruction samples for the current block based on the prediction samples for the current block.
2. The method of claim 1,

wherein the number of CPs for the current block is four, and the CPMVs corresponding to each of the four CPs for the current block are derived based on the following equation,

where CPMVi represents each of the four CPMVs, MVP represents the MVP for the current block, and CPMVPi represents the CPMVP corresponding to each of the four CPs for the current block.
3. The method of claim 2,

wherein deriving the CPMVPs comprises:

determining whether to perform temporal bidirectional prediction using a first reference picture and a second reference picture of a current picture including the current block to derive the CPMVPs;

deriving an x-axis gradient, a y-axis gradient, and a time-axis gradient of each of the four CPs, if it is determined to perform the temporal bidirectional prediction; and

deriving the CPMVPs corresponding to each of the four CPs based on the derived x-axis gradient, y-axis gradient, and time-axis gradient of each of the four CPs,

wherein the first reference picture and the second reference picture are located in mutually opposite directions on a time axis with respect to the current picture.
4. The method of claim 3,

wherein whether to perform the temporal bidirectional prediction is determined based on flag information indicating whether to perform the temporal bidirectional prediction.
5. The method of claim 3,

wherein the CPMVP corresponding to one CP among the CPs is derived based on the following equation,

where vx denotes an x-axis component of the CPMVP, vy denotes a y-axis component of the CPMVP, and s1 to s6 are derived based on the following equation,

where Gx denotes the x-axis gradient, Gy denotes the y-axis gradient, and P denotes the time-axis gradient.
6. The method of claim 2,

wherein deriving the MVF for the current block based on the CPMVs comprises:

applying the CPMVs to a bilinear interpolation motion model to derive the MVF for the current block.
7. The method of claim 6,

wherein the bilinear interpolation motion model is based on the following equation,

where x represents the x-axis coordinate of a sample in the current block, y represents the y-axis coordinate of the sample in the current block, W represents the width of the current block, H represents the height of the current block, and the remaining terms represent the MVF for the current block, the CPMVs, and a weighting factor.
8. The method of claim 1,

wherein the MVF for the current block includes a motion vector for each subblock unit in the current block, and

wherein the subblock unit in the current block is a 4x4 block unit or an NxN block unit.
9. The method of claim 8,

wherein deriving the MVF for the current block includes deriving a motion vector for each 4x4 block unit in the current block, and

wherein the derived motion vector of the 4x4 block unit is for a sample at a center position of the 4x4 block unit or a sample closest to the center position.
10. The method of claim 8,

wherein deriving the MVF for the current block includes deriving a motion vector for each NxN block unit in the current block,

wherein N is smaller than the width and height of the current block, and N is determined based on a size of the current block or a vector magnitude of the CPMVs, and

wherein the derived motion vector of the NxN block unit is for a sample at a center position of the NxN block unit or a sample closest to the center position.
11. An image encoding method performed by an encoding apparatus, the method comprising:

deriving a motion vector predictor (MVP) for a current block;

deriving a motion vector difference (MVD) representing a difference between a motion vector of the current block and the MVP for the current block;

deriving control point motion vector predictors (CPMVPs) corresponding to each of the control points (CPs) for the current block;

deriving control point motion vectors (CPMVs) corresponding to each of the CPs for the current block, based on the MVP for the current block, the MVD, and the CPMVPs;

deriving a motion vector field (MVF) for the current block based on the CPMVs;

generating prediction samples for the current block based on the MVF for the current block;

deriving residual samples for the current block based on the generated prediction samples; and

encoding prediction information including the derived MVD and residual information about the residual samples.
12. The method of claim 11,

wherein the number of CPs for the current block is four, and the CPMVs corresponding to each of the four CPs for the current block are derived based on the following equation,

where CPMVi represents each of the four CPMVs, MVP represents the MVP for the current block, and CPMVPi represents the CPMVP corresponding to each of the four CPs for the current block.
13. The method of claim 12,

wherein deriving the CPMVPs comprises:

determining whether to perform temporal bidirectional prediction using a first reference picture and a second reference picture of a current picture including the current block to derive the CPMVPs;

deriving an x-axis gradient, a y-axis gradient, and a time-axis gradient of each of the four CPs, if it is determined to perform the temporal bidirectional prediction; and

deriving the CPMVPs corresponding to each of the four CPs based on the derived x-axis gradient, y-axis gradient, and time-axis gradient of each of the four CPs,

wherein the first reference picture and the second reference picture are located in mutually opposite directions on a time axis with respect to the current picture,

wherein the CPMVP corresponding to one CP among the CPs is derived based on the following equation,

where vx denotes an x-axis component of the CPMVP, vy denotes a y-axis component of the CPMVP, and s1 to s6 are derived based on the following equation,

where Gx denotes the x-axis gradient, Gy denotes the y-axis gradient, and P denotes the time-axis gradient.
14. The method of claim 12,

wherein deriving the motion vector of the subblock unit or the sample unit in the current block based on the CPMVs comprises:

applying the CPMVs to a bilinear interpolation motion model to derive a motion vector of the subblock unit or the sample unit in the current block,

wherein the bilinear interpolation motion model is based on the following equation,

where x represents the x-axis coordinate of a sample in the current block, y represents the y-axis coordinate of the sample in the current block, W represents the width of the current block, H represents the height of the current block, and the remaining terms represent the motion vector of the subblock unit or the sample unit in the current block, the CPMVs, and a weighting factor.
15. A decoding apparatus for performing image decoding, the apparatus comprising:

an entropy decoding unit for obtaining, from a bitstream, a motion vector difference (MVD) representing a difference between a motion vector of a current block and a motion vector predictor (MVP) for the current block;

a prediction unit for deriving the MVP, deriving control point motion vector predictors (CPMVPs) corresponding to each of the control points (CPs) for the current block, deriving control point motion vectors (CPMVs) corresponding to each of the CPs for the current block based on the MVP for the current block, the MVD, and the CPMVPs, deriving a motion vector field (MVF) for the current block based on the CPMVs, and generating prediction samples for the current block based on the MVF for the current block; and

an adder for generating reconstruction samples for the current block based on the prediction samples for the current block.