WO2019194497A1

WO2019194497A1 - Inter-prediction mode-based image processing method and apparatus therefor

Info

Publication number: WO2019194497A1
Application number: PCT/KR2019/003803
Authority: WO
Inventors: 장형문; 남정학; 박내리; 이재호
Original assignee: 엘지전자 주식회사
Priority date: 2018-04-01
Filing date: 2019-04-01
Publication date: 2019-10-10

Abstract

Disclosed are a method for decoding a video signal and an apparatus therefor. Specifically, a method for decoding an image on the basis of an inter-prediction mode may comprise the steps of: deriving an initial motion vector of a current block on the basis of motion information of a spatial neighboring block or a temporal neighboring block of the current block; deriving a motion vector differential value indicating a differential value between an initial position specified by the initial motion vector and a refined position within a preset search range; deriving a refined motion vector of the current block by adding the motion vector differential value to the initial motion vector; and generating a prediction block of the current block by using the refined motion vector.

Description

【Specification】

[Name of invention]

Inter prediction mode based image processing method and apparatus therefor

Technical Field

The present invention relates to a still image or moving image processing method, and more particularly, to a method for encoding / decoding a still image or moving image based on an inter prediction mode, and an apparatus supporting the same.

Background Art

Compression coding refers to a series of signal processing techniques for transmitting digitized information through a communication line or for storing it in a form suitable for a storage medium. Media such as video, images, and audio may be targets of compression coding. In particular, a technique of performing compression coding on video is called video compression. Next-generation video content will be characterized by high spatial resolution, high frame rate, and high dimensionality of scene representation. Processing such content will require a tremendous increase in memory storage, memory access rate, and processing power.

Accordingly, there is a need to design coding tools for more efficiently processing next generation video content.

[Detailed Description of the Invention]

[Technical problem]

An object of the present invention is to propose a method for deriving an optimal motion vector at various motion vector precisions when applying decoder-side motion vector refinement (DMVR).

It is also an object of the present invention to propose a method of deriving a refined motion vector based on pre-fetched integer pixels, without using an interpolation filter, when applying a DMVR.

The technical problems to be achieved in the present invention are not limited to the technical problems mentioned above, and other technical problems not mentioned above will be clearly understood by those skilled in the art from the following description.

Technical Solution

An aspect of the present invention provides a method of decoding an image based on an inter prediction mode, the method comprising: deriving an initial motion vector of a current block based on motion information of a spatial neighboring block or a temporal neighboring block of the current block; deriving a motion vector difference value representing a difference between an initial position specified by the initial motion vector and a refined position within a preset search range; deriving a refined motion vector of the current block by adding the motion vector difference value to the initial motion vector; and generating a prediction block of the current block by using the refined motion vector.

Preferably, the refined position may be determined as the position that minimizes the cost value of the block having the refined position as its upper-left pixel position.

Preferably, the deriving of the motion vector difference value may further comprise: rounding the initial motion vector to integer pixel precision when the initial motion vector has fractional pixel precision; searching, with integer pixel precision, for the integer pixel position that minimizes the cost value within the search range; and deriving the refined position by searching, with fractional pixel precision and based on the found integer pixel position, for the position that minimizes the cost value within the search range.

Preferably, the deriving of the motion vector difference value may comprise: rounding the initial motion vector to integer pixel precision when the initial motion vector has fractional pixel precision; and searching, with integer pixel precision, for the integer pixel position that minimizes the cost value within the search range, wherein the refined position may be determined as the fractional pixel position corresponding to the initial position, based on the found integer pixel position.

Preferably, the deriving of the motion vector difference value may comprise: rounding the initial motion vector to integer pixel precision when the initial motion vector has fractional pixel precision; and deriving the refined position by searching, with integer pixel precision, for the integer pixel position that minimizes the cost value within the search range, wherein the motion vector difference value may be derived as the difference between the position specified by the rounded initial motion vector and the refined position.
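The integer-precision variant described above can be illustrated with a minimal sketch. It is not the claimed implementation: the function names, the quarter-pel motion vector storage (`frac_bits=2`), the round-half-up rule, and the use of a sum of absolute differences (SAD) against a hypothetical `template` block as the cost value are all assumptions made for illustration.

```python
def round_mv_to_integer(mv, frac_bits=2):
    """Round an MV stored in 1/(2**frac_bits)-pel units to integer-pel units.

    Assumption: round half up; the source does not fix a rounding rule.
    """
    half = 1 << (frac_bits - 1)
    return tuple((c + half) >> frac_bits for c in mv)


def sad(template, ref, x, y):
    """Cost of the reference block whose upper-left pixel is (x, y)."""
    return sum(
        abs(template[j][i] - ref[y + j][x + i])
        for j in range(len(template))
        for i in range(len(template[0]))
    )


def refine_integer(template, ref, block_pos, init_mv, search_range=1, frac_bits=2):
    """Return (refined_mv, mvd), both in fractional-pel units.

    The MVD is measured from the ROUNDED initial MV, as in the variant above.
    """
    bx, by = block_pos
    ix, iy = round_mv_to_integer(init_mv, frac_bits)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            cost = sad(template, ref, bx + ix + dx, by + iy + dy)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best
    mvd = (dx << frac_bits, dy << frac_bits)
    rounded = (ix << frac_bits, iy << frac_bits)
    refined = (rounded[0] + mvd[0], rounded[1] + mvd[1])
    return refined, mvd


# Toy reference picture in which every 2x2 block has a distinct SAD signature.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
template = [row[4:6] for row in ref[3:5]]  # the block at (4, 3) is the true match
refined, mvd = refine_integer(template, ref, block_pos=(2, 2), init_mv=(5, 3))
```

Because only integer pixel positions are visited, no interpolation filter is needed during the search, which is the motivation stated for this variant.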

Another aspect of the present invention provides an apparatus for decoding an image based on an inter prediction mode, the apparatus comprising: an initial motion vector derivation unit for deriving an initial motion vector of a current block based on motion information of a spatial neighboring block or a temporal neighboring block of the current block; a motion vector difference value derivation unit for deriving a motion vector difference value representing a difference between an initial position specified by the initial motion vector and a refined position within a preset search range; a refined motion vector derivation unit for deriving a refined motion vector of the current block by adding the motion vector difference value to the initial motion vector; and a prediction block generator for generating a prediction block of the current block by using the refined motion vector.

Preferably, the refined position may be determined as the position that minimizes the cost value of the block having the refined position as its upper-left pixel position.

Preferably, when the initial motion vector has fractional pixel precision, the motion vector difference value derivation unit may round the initial motion vector to integer pixel precision, search with integer pixel precision for the integer pixel position that minimizes the cost value within the search range, and derive the refined position by searching, with fractional pixel precision and based on the found integer pixel position, for the position that minimizes the cost value within the search range.

Preferably, when the initial motion vector has fractional pixel precision, the motion vector difference value derivation unit may round the initial motion vector to integer pixel precision and search with integer pixel precision for the integer pixel position that minimizes the cost value within the search range, and the refined position may be determined as the fractional pixel position corresponding to the initial position, based on the found integer pixel position.

Preferably, when the initial motion vector has fractional pixel precision, the motion vector difference value derivation unit may round the initial motion vector to integer pixel precision and derive the refined position by searching with integer pixel precision for the integer pixel position that minimizes the cost value within the search range, and the motion vector difference value may be derived as the difference between the position specified by the rounded initial motion vector and the refined position.

Advantageous Effects

According to the embodiment of the present invention, it is possible to increase the precision of motion prediction and improve the compression performance without additional signaling information.

In addition, according to an embodiment of the present invention, when the interpolation filter is not applied in the DMVR process, the complexity of the decoder, in particular the memory burden, can be significantly reduced.

The effects obtainable in the present invention are not limited to the above-mentioned effects, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

[Brief Description of Drawings]

The accompanying drawings, which are included as part of the detailed description to aid understanding of the present invention, provide embodiments of the present invention and, together with the description, explain its technical features.

FIG. 1 is a schematic block diagram of an encoding apparatus in which encoding of a video / image signal is performed, according to an embodiment to which the present invention is applied.

FIG. 2 is a schematic block diagram of a decoding apparatus in which decoding of a video / image signal is performed, according to an embodiment to which the present invention is applied.

FIG. 3 is a diagram illustrating an example of a multi-type tree structure as an embodiment to which the present invention can be applied.

FIG. 4 is a diagram illustrating a signaling mechanism of partitioning information of a quadtree with nested multi-type tree structure according to an embodiment to which the present invention may be applied.

FIG. 5 is a diagram illustrating a method of dividing a CTU into multiple CUs based on a quadtree and an accompanying multi-type tree structure as an embodiment to which the present invention may be applied.

FIG. 6 is a diagram illustrating a method of limiting ternary-tree partitioning as an embodiment to which the present invention may be applied.

FIG. 7 is a diagram illustrating redundant division patterns that may occur in binary tree division and ternary tree division as an embodiment to which the present invention may be applied.

FIGS. 8 and 9 illustrate an inter prediction based video / image encoding method and an inter prediction unit in an encoding apparatus according to an embodiment of the present invention.

FIGS. 10 and 11 illustrate an inter prediction based video / image decoding method and an inter prediction unit in a decoding apparatus according to an embodiment of the present invention.

FIG. 12 is a diagram for describing neighboring blocks used in a merge mode or a skip mode as an embodiment to which the present invention is applied.

FIG. 13 is a flowchart illustrating a merge candidate list construction method according to an embodiment to which the present invention is applied.

FIG. 14 is a flowchart illustrating a merge candidate list construction method according to an embodiment to which the present invention is applied.

FIG. 15 is a flowchart illustrating a method of generating an inter prediction block by applying a DMVR as an embodiment to which the present invention may be applied.

FIG. 16 is a diagram for describing a neighboring block used in a merge mode or a skip mode as an embodiment to which the present invention is applied.

FIG. 17A illustrates an embodiment to which the present invention is applied and illustrates a method of improving a motion vector obtained from neighboring blocks based on template matching.

FIG. 1 o is a diagram for describing a method of improving a motion vector based on similarity between prediction blocks according to an embodiment to which the present invention is applied.

FIG. 18 is a diagram for describing a method of improving a motion vector obtained from a neighboring block, according to an embodiment to which the present invention is applied.

FIG. 19 is a diagram for describing a method of improving a motion vector obtained from a neighboring block according to an embodiment to which the present invention is applied.

FIG. 20 is a diagram for describing a method of deriving an improved motion vector in units of integer pixels according to an embodiment to which the present invention is applied.

FIG. 21 is a diagram for describing a method of deriving an improved motion vector in units of integer pixels according to an embodiment to which the present invention is applied. FIG. 22 is a diagram for describing a method of deriving an improved motion vector in units of integer pixels according to an embodiment to which the present invention is applied.

FIG. 23 shows an example of a decoder side motion vector refinement (DMVR) process as an embodiment to which the present invention may be applied.

FIG. 24 shows an example of a decoder side motion vector refinement (DMVR) process as an embodiment to which the present invention may be applied.

FIGS. 25 and 26 are diagrams for describing a search range and an area required for interpolation according to an embodiment to which the present invention is applied.

FIG. 27 is a diagram illustrating a search region of integer pixel positions for motion vector refinement according to an embodiment to which the present invention is applied.

FIG. 28 is a diagram illustrating a search region for motion vector refinement and the corresponding fetch region according to an embodiment to which the present invention is applied.

FIG. 29 is a diagram illustrating a search region for improving a motion vector according to an embodiment of the present invention.

FIG. 30 is a diagram illustrating a search region for motion vector improvement according to an embodiment of the present invention.

FIG. 31 is a flowchart illustrating a method of generating an inter prediction block according to an embodiment to which the present invention is applied.

FIG. 32 is a diagram illustrating an inter prediction apparatus according to an embodiment to which the present invention is applied.

FIG. 33 shows a video coding system to which the present invention is applied.

FIG. 34 is a diagram illustrating the structure of a content streaming system according to an embodiment to which the present invention is applied.

[Form for implementation of invention]

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. The detailed description, which will be given below with reference to the accompanying drawings, is intended to explain exemplary embodiments of the present invention and is not intended to represent the only embodiments in which the present invention may be practiced. The following detailed description includes specific details in order to provide a thorough understanding of the present invention. However, one of ordinary skill in the art appreciates that the present invention may be practiced without these specific details.

In some instances, well-known structures and devices may be omitted or shown in block diagram form centering on the core functions of the structures and devices in order to avoid obscuring the concepts of the present invention.

In addition, the terms used in the present invention have been selected, as far as possible, from general terms in wide current use; in specific cases, however, terms arbitrarily selected by the applicant are used. In such cases, the meaning is clearly described in the detailed description of the relevant part, so the term should not be interpreted simply by its name but should be understood and interpreted according to the meaning it carries.

Specific terms used in the following description are provided to help the understanding of the present invention, and the use of such specific terms may be changed to other forms without departing from the technical spirit of the present invention. For example, signals, data, samples, pictures, frames, blocks, and the like may be appropriately substituted and interpreted in each coding process.

Hereinafter, in the present specification, a 'processing unit' refers to a unit in which a process of encoding / decoding such as transform and / or quantization is performed. Hereinafter, for convenience of description, the processing unit may be referred to as a 'processing block' or 'block'.

The processing unit may be interpreted to include a unit for the luma component and a unit for the chroma component. For example, the processing unit may correspond to a Coding Tree Unit (CTU), a Coding Unit (CU), a Prediction Unit (PU), or a Transform Unit (TU).

In addition, the processing unit may be interpreted as a unit for a luma component or a unit for a chroma component. For example, the processing unit may correspond to a coding tree block (CTB), a coding block (CB), a prediction block (PB), or a transform block (TB) for the luma component. Alternatively, it may correspond to a coding tree block (CTB), a coding block (CB), a prediction block (PB), or a transform block (TB) for the chroma component. In addition, the present invention is not limited thereto, and the processing unit may be interpreted to include a unit for a luma component and a unit for a chroma component.

In addition, the processing unit is not necessarily limited to square blocks, but may also be configured in a polygonal form having three or more vertices.

In the following specification, a pixel, a pel, and the like are referred to collectively as a sample. In addition, using a sample may mean using a pixel value.

FIG. 1 is a schematic block diagram of an encoding apparatus in which encoding of a video / image signal is performed, according to an embodiment to which the present invention is applied.

Referring to FIG. 1, the encoding apparatus 100 may include an image divider 110, a subtractor 115, a transform unit 120, a quantization unit 130, an inverse quantization unit 140, an inverse transform unit 150, an adder 155, a filtering unit 160, a memory 170, an inter predictor 180, an intra predictor 185, and an entropy encoding unit 190. The inter predictor 180 and the intra predictor 185 may be collectively referred to as a predictor. In other words, the predictor may include the inter predictor 180 and the intra predictor 185. The transform unit 120, the quantization unit 130, the inverse quantization unit 140, and the inverse transform unit 150 may be included in a residual processing unit, and the residual processing unit may further include the subtractor 115. As an embodiment, the above-described image divider 110, subtractor 115, transform unit 120, quantization unit 130, inverse quantization unit 140, inverse transform unit 150, adder 155, filtering unit 160, inter predictor 180, intra predictor 185, and entropy encoding unit 190 may be configured by one hardware component (for example, an encoder or a processor). In addition, the memory 170 may include a decoded picture buffer (DPB) and may be configured by a digital storage medium.

The image divider 110 may divide an input image (or a picture or a frame) input to the encoding apparatus 100 into one or more processing units. For example, the processing unit may be called a coding unit (CU). In this case, the coding unit may be recursively split from a coding tree unit (CTU) or a largest coding unit (LCU) according to a quad-tree binary-tree (QTBT) structure. For example, one coding unit may be divided into a plurality of coding units of deeper depth based on a quad tree structure and / or a binary tree structure. In this case, for example, the quad tree structure may be applied first and the binary tree structure may be applied later. Alternatively, the binary tree structure may be applied first. The coding procedure according to the present invention may be performed based on the final coding unit that is no longer split. In this case, the largest coding unit may be used directly as the final coding unit based on coding efficiency according to the image characteristics, or, if necessary, the coding unit may be recursively divided into coding units of deeper depths so that a coding unit of optimal size is used as the final coding unit. Here, the coding procedure may include procedures such as prediction, transform, and reconstruction, which will be described later. As another example, the processing unit may further include a Prediction Unit (PU) or a Transform Unit (TU). In this case, the prediction unit and the transform unit may each be split or partitioned from the aforementioned final coding unit. The prediction unit may be a unit of sample prediction, and the transform unit may be a unit for deriving a transform coefficient and / or a unit for deriving a residual signal from the transform coefficient.
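The recursive splitting described above can be sketched as follows. This is only an illustration under stated assumptions: `split_cu`, the mode names, and the `decide` callback (standing in for the encoder's rate-distortion decision or the decoder's parsed split flags) are hypothetical, and the sketch does not enforce ordering constraints between quad-tree and binary-tree splits.

```python
def split_cu(x, y, w, h, decide):
    """Recursively split a block and return the final (leaf) coding units.

    decide(x, y, w, h) is a hypothetical callback returning one of
    "quad", "bin_ver", "bin_hor", or None (no further split).
    """
    mode = decide(x, y, w, h)
    if mode == "quad":
        hw, hh = w // 2, h // 2
        subs = [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    elif mode == "bin_ver":  # vertical boundary: left and right halves
        subs = [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    elif mode == "bin_hor":  # horizontal boundary: top and bottom halves
        subs = [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    else:
        return [(x, y, w, h)]  # final coding unit, no longer split
    leaves = []
    for sub in subs:
        leaves.extend(split_cu(*sub, decide))
    return leaves


def example_decide(x, y, w, h):
    """Toy policy: quad-split the 128x128 CTU, then binary-split its first quadrant."""
    if w == 128 and h == 128:
        return "quad"
    if (x, y, w, h) == (0, 0, 64, 64):
        return "bin_ver"
    return None


leaves = split_cu(0, 0, 128, 128, example_decide)
```

Each tuple in `leaves` is `(x, y, width, height)` of a final coding unit; together they tile the CTU without overlap.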

The term unit may be used interchangeably with terms such as block or area, as the case may be. In a general case, an M × N block may represent a set of samples or transform coefficients consisting of M columns and N rows. A sample may generally represent a pixel or a pixel value, and may represent only a pixel / pixel value of the luma component or only a pixel / pixel value of the chroma component. Sample may be used as a term corresponding to a pixel or a pel of one picture (or image).

The encoding apparatus 100 may generate a residual signal (residual block, residual sample array) by subtracting the prediction signal (predicted block, prediction sample array) output from the inter predictor 180 or the intra predictor 185 from the input image signal (original block, original sample array), and the generated residual signal is transmitted to the transform unit 120. In this case, as shown, the unit that subtracts the prediction signal (prediction block, prediction sample array) from the input image signal (original block, original sample array) in the encoding apparatus 100 may be referred to as the subtractor 115. The prediction unit may perform prediction on a block to be processed (hereinafter referred to as a current block) and generate a predicted block including prediction samples for the current block. The prediction unit may determine whether intra prediction or inter prediction is applied on a current block or CU basis. As described later in the description of each prediction mode, the prediction unit may generate various information related to prediction, such as prediction mode information, and transmit the generated information to the entropy encoding unit 190. The information about the prediction may be encoded in the entropy encoding unit 190 and output in the form of a bitstream.

The intra predictor 185 may predict the current block by referring to samples in the current picture. The referenced samples may be located in the neighborhood of the current block or may be located apart from it, depending on the prediction mode. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The non-directional modes may include, for example, a DC mode and a planar mode. The directional modes may include, for example, 33 directional prediction modes or 65 directional prediction modes, depending on how finely the prediction direction is divided. However, this is an example, and more or fewer directional prediction modes may be used depending on the setting. The intra predictor 185 may determine the prediction mode applied to the current block by using the prediction mode applied to a neighboring block.

The inter predictor 180 may derive the predicted block for the current block based on the reference block (reference sample array) specified by the motion vector on the reference picture. In this case, to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, sub-blocks, or samples based on the correlation of the motion information between the neighboring block and the current block. The motion information may include a motion vector and a reference picture index. The motion information may further include inter prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter prediction, the neighboring block may include a spatial neighboring block existing in the current picture and a temporal neighboring block present in the reference picture. The reference picture including the reference block and the reference picture including the temporal neighboring block may be the same or different. The temporal neighboring block may be called a co-located reference block, a co-located CU (colCU), etc., and the reference picture including the temporal neighboring block may be called a collocated picture (colPic). For example, the inter prediction unit 180 may construct a motion information candidate list based on neighboring blocks and generate information indicating which candidate is used to derive the motion vector and / or the reference picture index of the current block. Inter prediction may be performed based on various prediction modes. For example, in the case of a skip mode and a merge mode, the inter prediction unit 180 may use the motion information of a neighboring block as the motion information of the current block. In the skip mode, unlike the merge mode, the residual signal may not be transmitted. In the case of the motion vector prediction (MVP) mode, the motion vector of the current block may be indicated by using the motion vector of a neighboring block as a motion vector predictor and signaling a motion vector difference.
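The difference between the merge / skip modes and the MVP mode described above can be sketched as follows. The function names, the candidate list contents, and the index arguments are hypothetical; the sketch only shows that the merge mode reuses a candidate's motion vector as-is, while the MVP mode adds a signaled motion vector difference to the selected predictor.

```python
def derive_mv_merge_mode(candidates, merge_idx):
    """Merge / skip mode: the selected neighbor's motion vector is reused directly."""
    return candidates[merge_idx]


def derive_mv_mvp_mode(candidates, mvp_idx, mvd):
    """MVP mode: motion vector = selected predictor + signaled difference."""
    mvp = candidates[mvp_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


# Hypothetical candidate list built from spatial / temporal neighboring blocks.
candidates = [(4, -8), (12, 0)]
mv_merge = derive_mv_merge_mode(candidates, merge_idx=1)
mv_mvp = derive_mv_mvp_mode(candidates, mvp_idx=0, mvd=(-1, 3))
```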

The prediction signal generated by the inter predictor 180 or the intra predictor 185 may be used to generate a reconstruction signal or to generate a residual signal.
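The two uses of the prediction signal can be illustrated with a small sample-wise sketch. The function names are hypothetical, blocks are plain nested lists, and the clipping of reconstructed samples to the valid range for an assumed 8-bit depth is an added assumption that the text above does not state. Quantization loss is ignored, so the reconstruction here is exact.

```python
def residual_block(orig, pred):
    """Residual = original samples minus prediction samples (subtractor)."""
    return [[o - p for o, p in zip(row_o, row_p)]
            for row_o, row_p in zip(orig, pred)]


def reconstruct_block(pred, resid, bit_depth=8):
    """Reconstruction = prediction plus residual (adder), clipped to the sample range.

    Assumption: clipping to [0, 2**bit_depth - 1].
    """
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(pred, resid)]


orig = [[100, 102], [98, 255]]
pred = [[101, 100], [98, 250]]
resid = residual_block(orig, pred)
recon = reconstruct_block(pred, resid)
```

In a real codec the residual passes through transform and quantization before being added back, so the reconstruction generally differs slightly from the original.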

The transform unit 120 may apply a transform technique to the residual signal to generate transform coefficients. For example, the transform technique may include at least one of a Discrete Cosine Transform (DCT), a Discrete Sine Transform (DST), a Karhunen-Loeve Transform (KLT), a Graph-Based Transform (GBT), or a Conditionally Non-linear Transform (CNT). Here, GBT means a transform obtained from a graph when the relationship information between pixels is represented by that graph. CNT means a transform obtained based on a prediction signal generated using all previously reconstructed pixels. In addition, the transform process may be applied to square pixel blocks of the same size, or to non-square blocks of variable size.

The quantization unit 130 quantizes the transform coefficients and transmits them to the entropy encoding unit 190, and the entropy encoding unit 190 may encode the quantized signal (information about the quantized transform coefficients) and output it as a bitstream. The information about the quantized transform coefficients may be referred to as residual information. The quantization unit 130 may rearrange the block-form quantized transform coefficients into a one-dimensional vector form based on a coefficient scan order, and may generate the information about the quantized transform coefficients based on the quantized transform coefficients in the one-dimensional vector form. The entropy encoding unit 190 may perform various encoding methods such as exponential Golomb, context-adaptive variable length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC). The entropy encoding unit 190 may encode, together or separately, information necessary for video / image reconstruction other than the quantized transform coefficients (for example, values of syntax elements). The encoded information (for example, encoded video / image information) may be transmitted or stored in units of network abstraction layer (NAL) units in the form of a bitstream. The bitstream may be transmitted through a network or stored in a digital storage medium. Here, the network may include a broadcasting network and / or a communication network, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, and SSD. A transmitting unit (not shown) that transmits the signal output from the entropy encoding unit 190 and / or a storing unit (not shown) that stores it may be configured as internal / external elements of the encoding apparatus 100, or the transmitting unit may be a component of the entropy encoding unit 190.

The quantized transform coefficients output from the quantization unit 130 may be used to generate a prediction signal. For example, the residual signal may be reconstructed by applying inverse quantization and inverse transform to the quantized transform coefficients through the inverse quantization unit 140 and the inverse transform unit 150 in the loop. The adder 155 may generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the reconstructed residual signal to the prediction signal output from the inter predictor 180 or the intra predictor 185. If there is no residual for the block to be processed, such as when the skip mode is applied, the predicted block may be used as the reconstructed block. The adder 155 may be called a reconstructor or a reconstructed block generator. The generated reconstructed signal may be used for intra prediction of the next block to be processed in the current picture and, after filtering as described below, for inter prediction of the next picture.
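Of the entropy coding methods named for the entropy encoding unit 190, exponential Golomb coding is simple enough to show in a few lines. The sketch below implements the common zeroth-order unsigned code, in which a value v is written as the binary form of v + 1 preceded by one fewer zeros than that form has bits. The function names are hypothetical, bits are represented as a character string for readability, and the source does not specify which exponential Golomb variant is used.

```python
def exp_golomb_encode(value):
    """Zeroth-order unsigned exponential Golomb code of a non-negative integer."""
    if value < 0:
        raise ValueError("unsigned exp-Golomb requires value >= 0")
    bits = bin(value + 1)[2:]               # binary form of value + 1
    return "0" * (len(bits) - 1) + bits     # zero prefix, then the binary form


def exp_golomb_decode(bitstring):
    """Decode one codeword from the start of bitstring; return (value, bits_used)."""
    zeros = len(bitstring) - len(bitstring.lstrip("0"))
    length = 2 * zeros + 1
    return int(bitstring[zeros:length], 2) - 1, length


codewords = [exp_golomb_encode(v) for v in range(4)]
decoded = [exp_golomb_decode(c)[0] for c in codewords]
```

Small values get short codewords, which suits syntax elements whose small values are the most frequent.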

The filtering unit 160 may improve subjective / objective image quality by applying filtering to the reconstructed signal. For example, the filtering unit 160 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture, and the modified reconstructed picture may be stored in the memory 170, specifically in the DPB of the memory 170. The various filtering methods may include, for example, deblocking filtering, a sample adaptive offset, an adaptive loop filter, a bilateral filter, and the like. As described later in the description of each filtering method, the filtering unit 160 may generate various information about the filtering and transmit the generated information to the entropy encoding unit 190. The information about the filtering may be encoded in the entropy encoding unit 190 and output in the form of a bitstream.

The modified reconstructed picture transmitted to the memory 170 may be used as a reference picture in the inter predictor 180. In this way, when inter prediction is applied, the encoding apparatus may avoid prediction mismatch between the encoding apparatus 100 and the decoding apparatus, and may also improve encoding efficiency.

The memory 170 may store the modified reconstructed picture for use as a reference picture in the inter prediction unit 180. The memory 170 may store the motion information of the block from which the motion information in the current picture is derived (or encoded) and / or the motion information of the blocks in pictures that have already been reconstructed. The stored motion information may be transmitted to the inter predictor 180 to be used as the motion information of a spatial neighboring block or the motion information of a temporal neighboring block. The memory 170 may store reconstructed samples of reconstructed blocks in the current picture, and transfer the reconstructed samples to the intra predictor 185.

FIG. 2 is a schematic block diagram of a decoding apparatus in which decoding of a video / image signal is performed, according to an embodiment to which the present invention is applied.

Referring to FIG. 2, the decoding apparatus 200 may include an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 230, an adder 235, a filtering unit 240, a memory 250, an inter prediction unit 260, and an intra prediction unit 265. The inter prediction unit 260 and the intra prediction unit 265 may be collectively called a prediction unit. In other words, the prediction unit may include the inter prediction unit 260 and the intra prediction unit 265. The inverse quantization unit 220 and the inverse transform unit 230 may be collectively referred to as a residual processing unit. That is, the residual processing unit may include the inverse quantization unit 220 and the inverse transform unit 230. The entropy decoding unit 210, the inverse quantization unit 220, the inverse transform unit 230, the adder 235, the filtering unit 240, the inter prediction unit 260, and the intra prediction unit 265 described above may be configured by one hardware component (for example, a decoder or a processor) according to an embodiment. In addition, the memory 250 may include a decoded picture buffer (DPB) and may be configured by a digital storage medium.

When a bitstream including video / image information is input, the decoding apparatus 200 may reconstruct an image in correspondence with the process by which the video / image information was processed in the encoding apparatus of FIG. 1. For example, the decoding apparatus 200 may perform decoding using the processing unit applied in the encoding apparatus. Thus, the processing unit of decoding may be, for example, a coding unit, and the coding unit may be split from a coding tree unit or a largest coding unit. The reconstructed video signal decoded and output by the decoding apparatus 200 may be reproduced through a reproducing apparatus.

The decoding apparatus 200 may receive a signal output from the encoding apparatus of FIG. 1 in the form of a bitstream, and the received signal may be decoded through the entropy decoding unit 210. For example, the entropy decoding unit 210 may parse the bitstream to derive information (e.g., video/image information) necessary for image reconstruction (or picture reconstruction). For example, the entropy decoding unit 210 may decode the information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and may output values of syntax elements required for image reconstruction and quantized values of transform coefficients for the residual. In more detail, the CABAC entropy decoding method may receive a bin corresponding to each syntax element in the bitstream, determine a context model using the information of the syntax element to be decoded, the decoding information of neighboring and already decoded blocks, or the information of symbols/bins decoded in a previous step, predict the probability of occurrence of the bin according to the determined context model, and perform arithmetic decoding of the bin to generate a symbol corresponding to the value of each syntax element. In this case, after determining the context model, the CABAC entropy decoding method may update the context model by using the information of the decoded symbol/bin for the context model of the next symbol/bin. Among the information decoded by the entropy decoding unit 210, information related to prediction is provided to the prediction unit (the inter prediction unit 260 and the intra prediction unit 265), and the residual values on which entropy decoding has been performed by the entropy decoding unit 210, that is, the quantized transform coefficients and related parameter information, may be input to the inverse quantization unit 220. Also, information about filtering among the information decoded by the entropy decoding unit 210 may be provided to the filtering unit 240.
Meanwhile, a receiver (not shown) for receiving the signal output from the encoding apparatus may be further configured as an internal/external element of the decoding apparatus 200, or the receiver may be a component of the entropy decoding unit 210.
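
Of the coding methods named in the entropy decoding description, exponential Golomb coding is the simplest to illustrate. The following is a minimal sketch of decoding one unsigned order-0 Exp-Golomb codeword, assuming the bitstream is given as a string of '0'/'1' characters; the function name `decode_ue` and this input representation are illustrative assumptions and are not taken from the specification.

```python
def decode_ue(bits, pos=0):
    """Decode one unsigned order-0 Exp-Golomb codeword.

    `bits` is a string of '0'/'1' characters (an assumption made for
    readability); returns (value, position of the next unread bit).
    """
    # Count the leading zero bits before the terminating '1'.
    leading_zeros = 0
    while bits[pos + leading_zeros] == "0":
        leading_zeros += 1
    # Skip the zeros and the '1', then read `leading_zeros` info bits.
    start = pos + leading_zeros + 1
    suffix = bits[start:start + leading_zeros]
    info = int(suffix, 2) if suffix else 0
    value = (1 << leading_zeros) - 1 + info
    return value, start + leading_zeros


# Decode three consecutive codewords from one bit string.
stream = "1010011"
pos = 0
values = []
while pos < len(stream):
    v, pos = decode_ue(stream, pos)
    values.append(v)
```

Shorter codewords map to smaller values, which is why this code suits syntax elements whose small values are the most frequent.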

The inverse quantization unit 220 may dequantize the quantized transform coefficients and output the transform coefficients. The inverse quantization unit 220 may rearrange the quantized transform coefficients into the form of a two-dimensional block. In this case, the rearrangement may be performed based on the coefficient scan order used by the encoding apparatus. The inverse quantization unit 220 may perform inverse quantization on the quantized transform coefficients using a quantization parameter (for example, quantization step size information) and obtain transform coefficients.

The inverse transform unit 230 inversely transforms the transform coefficients to obtain a residual signal (residual block, residual sample array).
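
The rearrangement and scaling performed by the inverse quantization unit can be sketched as follows. This is a simplified illustration under stated assumptions: the scan order is passed in as an explicit list of (x, y) positions, and dequantization is modeled as multiplication by a single step size, whereas a real codec derives integer scaling factors from the quantization parameter. The name `dequantize_block` is hypothetical.

```python
def dequantize_block(levels, scan_order, width, height, step):
    """Place 1-D quantized levels into a 2-D block and scale them.

    levels     : quantized coefficients in scan order
    scan_order : list of (x, y) positions, same order as `levels`
    step       : uniform quantization step size (simplifying assumption)
    """
    block = [[0] * width for _ in range(height)]
    for level, (x, y) in zip(levels, scan_order):
        block[y][x] = level * step
    return block


# 2x2 example with a hypothetical scan order.
scan = [(0, 0), (0, 1), (1, 0), (1, 1)]
coeffs = dequantize_block([4, -2, 1, 0], scan, 2, 2, step=3)
```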

The prediction unit may perform prediction on the current block and generate a predicted block including prediction samples for the current block. The prediction unit may determine whether intra prediction or inter prediction is applied to the current block based on the information about the prediction output from the entropy decoding unit 210, and may determine a specific intra / inter prediction mode. The intra predictor 265 may predict the current block by referring to the samples in the current picture. The referenced samples may be located in the neighborhood of the current block or may be located apart according to the prediction mode. In intra prediction, prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The intra predictor 265 may determine the prediction mode applied to the current block by using the prediction mode applied to the neighboring block.

The inter predictor 260 may derive the predicted block for the current block based on the reference block (reference sample array) specified by the motion vector on the reference picture. In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, subblocks, or samples based on the correlation of the motion information between the neighboring block and the current block. The motion information may include a motion vector and a reference picture index. The motion information may further include inter prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter prediction, the neighboring blocks may include a spatial neighboring block existing in the current picture and a temporal neighboring block existing in the reference picture. For example, the inter predictor 260 may construct a motion information candidate list based on neighboring blocks and derive a motion vector and/or a reference picture index of the current block based on the received candidate selection information. Inter prediction may be performed based on various prediction modes, and the information about the prediction may include information indicating the inter prediction mode for the current block.

The adder 235 may generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the obtained residual signal to the prediction signal (predicted block, prediction sample array) output from the inter predictor 260 or the intra predictor 265. If there is no residual for the block to be processed, such as when the skip mode is applied, the predicted block may be used as the reconstructed block.
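
The operation of the adder can be sketched as a sample-wise sum. The clipping to the valid sample range and the `bit_depth` parameter are assumptions added for illustration and are not stated in the text; `reconstruct_block` is a hypothetical name.

```python
def reconstruct_block(pred, resid=None, bit_depth=8):
    """Add residual samples to prediction samples.

    If `resid` is None (e.g. skip mode, no residual), the predicted
    block is used directly as the reconstructed block. Clipping to
    [0, 2^bit_depth - 1] is an assumption of this sketch.
    """
    if resid is None:
        return [row[:] for row in pred]
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
        for prow, rrow in zip(pred, resid)
    ]


recon = reconstruct_block([[250, 10], [100, 50]], [[10, -20], [5, -5]])
skipped = reconstruct_block([[7, 8]])
```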

The adder 235 may be called a restoration unit or a restoration block generation unit. The generated reconstruction signal may be used for intra prediction of a next processing target block in a current picture, and may be used for inter prediction of a next picture through filtering as described below.

The filtering unit 240 may improve subjective/objective picture quality by applying filtering to the reconstructed signal. For example, the filtering unit 240 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture, and may transmit the modified reconstructed picture to the memory 250, specifically to the DPB of the memory 250. The various filtering methods may include, for example, deblocking filtering, a sample adaptive offset, an adaptive loop filter, a bilateral filter, and the like. The (modified) reconstructed picture stored in the DPB of the memory 250 may be used as a reference picture in the inter predictor 260. The memory 250 may store the motion information of the block from which the motion information in the current picture is derived (or decoded) and/or the motion information of the blocks in pictures that have already been reconstructed. The stored motion information may be transmitted to the inter predictor 260 for use as the motion information of a spatial neighboring block or the motion information of a temporal neighboring block. The memory 250 may store reconstructed samples of reconstructed blocks in the current picture, and may deliver the reconstructed samples to the intra predictor 265.

In the present specification, the embodiments described for the filtering unit 160, the inter prediction unit 180, and the intra prediction unit 185 of the encoding apparatus 100 may be applied identically or correspondingly to the filtering unit 240, the inter prediction unit 260, and the intra prediction unit 265 of the decoding apparatus 200, respectively.

Block Partitioning

The video/image coding method according to this document may be performed based on various detailed techniques, each of which is outlined below. It will be apparent to those skilled in the art that the techniques described below may be involved in related procedures such as prediction, residual processing ((inverse) transform, (inverse) quantization, etc.), syntax element coding, filtering, and partitioning in the video/image encoding/decoding procedures described above and/or below. The block partitioning procedure according to this document may be performed by the image splitter 110 of the encoding apparatus described above, and the partitioning related information may be processed (encoded) by the entropy encoding unit 190 and transmitted to the decoding apparatus in the form of a bitstream. The entropy decoding unit 210 of the decoding apparatus may derive the block partitioning structure of the current picture based on the partitioning related information obtained from the bitstream, and based on this, a series of procedures for image decoding (e.g., prediction, residual processing, block reconstruction, in-loop filtering, etc.) may be performed.

Partitioning of picture into CTUs

Pictures can be divided into a sequence of coding tree units (CTUs). The CTU may correspond to a coding tree block (CTB). Alternatively, the CTU may include a coding tree block of luma samples and two coding tree blocks of corresponding chroma samples. In other words, for a picture that includes three sample arrays, the CTU may include an N × N block of luma samples and two corresponding blocks of chroma samples.
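
As a small worked example of dividing a picture into CTUs, the sketch below computes how many CTU columns and rows cover a picture. The function name `ctu_grid` and the 1920x1080 example are illustrative assumptions; the default CTU size of 128 follows the 128x128 luma size mentioned in this document.

```python
def ctu_grid(pic_width, pic_height, ctu_size=128):
    """Return (columns, rows) of CTUs needed to cover the picture.

    Ceiling division is used, so CTUs in the last column or row may
    extend past the right or bottom picture boundary.
    """
    cols = -(-pic_width // ctu_size)
    rows = -(-pic_height // ctu_size)
    return cols, rows


cols, rows = ctu_grid(1920, 1080)
# The last CTU row only partly overlaps the picture when the height is
# not a multiple of the CTU size.
last_row_inside = 1080 - (rows - 1) * 128
```

When the picture dimensions are not multiples of the CTU size, the boundary CTUs are only partly inside the picture, which is the situation handled by the boundary partitioning rules of this document.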

The maximum allowable size of the CTU for coding and prediction may be different from the maximum allowable size of the CTU for transform. For example, the maximum allowable size of the luma block in the CTU may be 128x128.

Partitioning of the CTUs using a tree structure

The CTU may be divided into CUs based on a quad-tree (QT) structure. The quadtree structure may be referred to as a quaternary tree structure. This is to reflect various local characteristics. Meanwhile, in the present document, the CTU may be divided based on a multi-type tree structure partition including a binary tree (BT) and a ternary tree (TT) as well as the quadtree. Hereinafter, the QTBT structure may include a quadtree and binary tree based partitioning structure, and the QTBTTT structure may include a quadtree, binary tree, and ternary tree based partitioning structure. Alternatively, the QTBT structure may include a quadtree, binary tree, and ternary tree based partitioning structure. In a coding tree structure, a CU may have a square or rectangular shape. The CTU may first be divided into a quadtree structure. Subsequently, the leaf nodes of the quadtree structure may be further divided by the multi-type tree structure.

In one embodiment of the present invention, the multi-type tree structure may include four split types as shown in the drawing. The four split types may include vertical binary splitting (SPLIT_BT_VER), horizontal binary splitting (SPLIT_BT_HOR), vertical ternary splitting (SPLIT_TT_VER), and horizontal ternary splitting (SPLIT_TT_HOR). Leaf nodes of the multi-type tree structure may be called CUs. These CUs may be used for the prediction and transform procedures. In general, in the present document, the CU, PU, and TU may have the same block size. However, when the maximum supported transform length is smaller than the width or height of a color component of the CU, the CU and the TU may have different block sizes.
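
The four split types can be illustrated by the child block sizes they produce. The sketch below assumes that a binary split halves the block and that a ternary split uses a 1:2:1 ratio; the ratio is the convention commonly used for ternary trees and is an assumption here, since the text does not state it. The function name `mtt_children` is hypothetical.

```python
def mtt_children(width, height, mode):
    """Return the (width, height) of each child block for a split mode.

    Assumes binary splits halve the block and ternary splits use a
    1:2:1 ratio (an assumption of this sketch).
    """
    if mode == "SPLIT_BT_VER":
        return [(width // 2, height)] * 2
    if mode == "SPLIT_BT_HOR":
        return [(width, height // 2)] * 2
    if mode == "SPLIT_TT_VER":
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    if mode == "SPLIT_TT_HOR":
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    raise ValueError(f"unknown split mode: {mode}")


children = mtt_children(32, 16, "SPLIT_TT_VER")
```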

FIG. 4 is a diagram illustrating a signaling mechanism of partitioning information of a quadtree with nested multi-type tree structure, according to an embodiment to which the present invention may be applied.

Here, the CTU is treated as the root of the quadtree and is first partitioned into a quadtree structure. Each quadtree leaf node may then be further partitioned into a multi-type tree structure. In the multi-type tree structure, a first flag (e.g., mtt_split_cu_flag) is signaled to indicate whether the node is further partitioned. If the node is further partitioned, a second flag (e.g., mtt_split_cu_vertical_flag) may be signaled to indicate the splitting direction. Then, a third flag (e.g., mtt_split_cu_binary_flag) may be signaled to indicate whether the split type is binary or ternary. For example, based on the mtt_split_cu_vertical_flag and the mtt_split_cu_binary_flag, the multi-type tree splitting mode (MttSplitMode) of a CU may be derived as shown in Table 1 below.

Table 1
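
The body of Table 1 is not reproduced in this text. As a hedged illustration, the lookup below follows the mapping commonly used with these two flags, in which the vertical flag selects the direction and the binary flag selects binary versus ternary splitting. The concrete assignment is an assumption and should be checked against the original table; `mtt_split_mode` is a hypothetical helper name.

```python
# Assumed mapping: (mtt_split_cu_vertical_flag, mtt_split_cu_binary_flag)
# -> MttSplitMode. Not taken from the text, whose table body is missing.
MTT_SPLIT_MODE = {
    (0, 0): "SPLIT_TT_HOR",
    (0, 1): "SPLIT_BT_HOR",
    (1, 0): "SPLIT_TT_VER",
    (1, 1): "SPLIT_BT_VER",
}


def mtt_split_mode(vertical_flag, binary_flag):
    """Derive MttSplitMode from the two signaled flags."""
    return MTT_SPLIT_MODE[(vertical_flag, binary_flag)]


mode = mtt_split_mode(1, 0)
```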

FIG. 5 is a diagram illustrating a method of dividing a CTU into multiple CUs based on a quadtree and an accompanying nested multi-type tree structure, as an embodiment to which the present invention may be applied.

Here, bold block edges represent quadtree partitioning and the remaining edges represent multi-type tree partitioning. Quadtree partitioning with a nested multi-type tree can provide a content-adapted coding tree structure. The CU may correspond to a coding block (CB). Alternatively, the CU may include a coding block of luma samples and two coding blocks of corresponding chroma samples. The size of a CU may be as large as the CTU, or as small as 4x4 in luma sample units. For example, in the 4:2:0 color format (or chroma format), the maximum chroma CB size may be 64x64 and the minimum chroma CB size may be 2x2.

For example, in this document, the maximum allowable luma TB size may be 64x64 and the maximum allowable chroma TB size may be 32x32. If the width or height of the CB divided according to the tree structure is larger than the maximum transform width or height, the CB may be automatically (or implicitly) split until the TB size limit in the horizontal and vertical directions is satisfied.
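
The implicit split of a CB that exceeds the maximum TB size can be sketched as a tiling. This is a minimal illustration assuming block dimensions that are powers of two; the function name `implicit_tb_tiles` and the returned (x, y, width, height) tuple format are assumptions made for the example.

```python
def implicit_tb_tiles(cb_width, cb_height, max_tb_size=64):
    """Tile a coding block into transform blocks no larger than max_tb_size.

    Returns a list of (x, y, width, height) tuples relative to the CB
    origin. Splitting happens independently in each direction, and only
    where the CB exceeds the limit.
    """
    tb_w = min(cb_width, max_tb_size)
    tb_h = min(cb_height, max_tb_size)
    return [
        (x, y, tb_w, tb_h)
        for y in range(0, cb_height, tb_h)
        for x in range(0, cb_width, tb_w)
    ]


tiles = implicit_tb_tiles(128, 64)
```

For a 128x64 luma CB and a 64x64 limit, only the horizontal direction is split, giving two 64x64 transform blocks.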

Meanwhile, for a quadtree coding tree scheme involving a multitype tree, the following parameters may be defined and identified as an SPS syntax element.

CTU size: the root node size of a quaternary tree

MinQTSize: the minimum allowed quaternary tree leaf node size

MaxBtSize: the maximum allowed binary tree root node size

MaxTtSize: the maximum allowed ternary tree root node size

MaxMttDepth: the maximum allowed hierarchy depth of multi-type tree splitting from a quadtree leaf

MinBtSize: the minimum allowed binary tree leaf node size

MinTtSize: the minimum allowed ternary tree leaf node size

As an example of a quadtree coding tree structure with a nested multi-type tree, the CTU size may be set to 128x128 luma samples and two corresponding 64x64 blocks of chroma samples (in the 4:2:0 chroma format). In this case, MinQTSize may be set to 16x16, MaxBtSize to 128x128, MaxTtSize to 64x64, MinBtSize and MinTtSize (for both width and height) to 4x4, and MaxMttDepth to 4. Quadtree partitioning may be applied to the CTU to generate quadtree leaf nodes. A quadtree leaf node may be called a leaf QT node. Quadtree leaf nodes may range in size from 16x16 (i.e., the MinQTSize) to 128x128 (i.e., the CTU size). If the leaf QT node is 128x128, it may not be further divided by the binary tree/ternary tree. This is because, in this case, even if split, it exceeds MaxBtSize and MaxTtSize (i.e., 64x64). In other cases, leaf QT nodes may be further partitioned by the multi-type tree. Therefore, the leaf QT node is the root node for the multi-type tree, and the leaf QT node may have a multi-type tree depth (mttDepth) value of 0. If the multi-type tree depth reaches MaxMttDepth (e.g., 4), further splitting may no longer be considered. If the width of the multi-type tree node is equal to MinBtSize and less than or equal to 2xMinTtSize, further horizontal splitting may no longer be considered. If the height of the multi-type tree node is equal to MinBtSize and less than or equal to 2xMinTtSize, further vertical splitting may no longer be considered.
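
The termination conditions for multi-type tree splitting can be sketched as follows. Only the depth limit and the minimum-size rules are modeled, and the direction names follow the wording of the text literally (a width condition disables "horizontal" splitting, a height condition disables "vertical" splitting). The class and function names are hypothetical, and the defaults mirror the example values above.

```python
from dataclasses import dataclass


@dataclass
class MttLimits:
    min_bt_size: int = 4
    min_tt_size: int = 4
    max_mtt_depth: int = 4


def mtt_split_directions(width, height, mtt_depth, lim):
    """Return the split directions still considered for an MTT node.

    Sketch of the rules in the text: no splitting once MaxMttDepth is
    reached; a node whose width (height) equals MinBtSize and is at
    most 2 * MinTtSize is no longer split horizontally (vertically).
    """
    if mtt_depth >= lim.max_mtt_depth:
        return set()
    allowed = {"HOR", "VER"}
    if width == lim.min_bt_size and width <= 2 * lim.min_tt_size:
        allowed.discard("HOR")
    if height == lim.min_bt_size and height <= 2 * lim.min_tt_size:
        allowed.discard("VER")
    return allowed


limits = MttLimits()
narrow = mtt_split_directions(4, 16, 1, limits)
```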

Referring to FIG. 6, TT partitioning may be restricted in certain cases in order to allow a pipeline design based on 64x64 luma blocks and 32x32 chroma blocks in a hardware decoder. For example, when the width or height of the luma coding block is larger than a predetermined specific value (e.g., 32 or 64), TT partitioning may be restricted as shown in FIG. 6.

In this document, the coding tree scheme may support luma and chroma blocks having separate block tree structures. For P and B slices, the luma and chroma CTBs in one CTU may be restricted to have the same coding tree structure. However, for I slices, luma and chroma blocks may have block tree structures separate from each other. If the separate block tree mode is applied, the luma CTB may be split into CUs based on a particular coding tree structure, and the chroma CTB may be split into chroma CUs based on another coding tree structure. This means that a CU in an I slice may consist of a coding block of the luma component or coding blocks of two chroma components, and a CU of a P or B slice may consist of blocks of three color components.

Although a quadtree coding tree structure with a nested multi-type tree has been described in "Partitioning of the CTUs using a tree structure" above, the structure in which a CU is divided is not limited thereto. For example, the BT structure and the TT structure may be interpreted as concepts included in a multiple partitioning tree (MPT) structure, and the CU may be interpreted as being divided through the QT structure and the MPT structure. In one example where a CU is split through the QT structure and the MPT structure, the split structure may be determined by signaling a syntax element (e.g., MPT_split_type) containing information about how many blocks the leaf node of the QT structure is divided into, and a syntax element (e.g., MPT_split_mode) containing information about whether the leaf node of the QT structure is divided in the vertical or the horizontal direction.

In another example, the CU may be partitioned in a different way from the QT structure, the BT structure, or the TT structure. That is, unlike the case where the CU of a lower depth is divided into 1/4 the size of the CU of the upper depth according to the QT structure, into 1/2 the size of the CU of the upper depth according to the BT structure, or into 1/4 or 1/2 the size of the CU of the upper depth according to the TT structure, the CU of a lower depth may in some cases be divided into 1/5, 1/3, 3/8, 3/5, 2/3, or 5/8 the size of the CU of the upper depth, and the way in which the CU is divided is not limited thereto.

If a portion of a tree node block exceeds the bottom or right picture boundary, the tree node block may be restricted so that all samples of all coded CUs are located within the picture boundaries. In this case, for example, the following division rules may be applied.

- If a portion of a tree node block exceeds both the bottom and the right picture boundaries,

  - If the block is a QT node and the size of the block is larger than the minimum QT size, the block is forced to be split with QT split mode.

  - Otherwise, the block is forced to be split with SPLIT_BT_HOR mode.

- Otherwise, if a portion of a tree node block exceeds the bottom picture boundary,

  - If the block is a QT node, and the size of the block is larger than the minimum QT size, and the size of the block is larger than the maximum BT size, the block is forced to be split with QT split mode.

  - Otherwise, if the block is a QT node, and the size of the block is larger than the minimum QT size and the size of the block is smaller than or equal to the maximum BT size, the block is forced to be split with QT split mode or SPLIT_BT_HOR mode.

  - Otherwise (the block is a BTT node or the size of the block is smaller than or equal to the minimum QT size), the block is forced to be split with SPLIT_BT_HOR mode.

- Otherwise, if a portion of a tree node block exceeds the right picture boundary,

  - If the block is a QT node, and the size of the block is larger than the minimum QT size, and the size of the block is larger than the maximum BT size, the block is forced to be split with QT split mode.

  - Otherwise, if the block is a QT node, and the size of the block is larger than the minimum QT size and the size of the block is smaller than or equal to the maximum BT size, the block is forced to be split with QT split mode or SPLIT_BT_VER mode.

  - Otherwise (the block is a BTT node or the size of the block is smaller than or equal to the minimum QT size), the block is forced to be split with SPLIT_BT_VER mode.
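
The boundary rules listed above can be sketched as a single decision function. The function name `forced_boundary_split`, its parameters, and the use of a list of permitted modes as the return value are illustrative assumptions; the branching mirrors the rules as written.

```python
def forced_boundary_split(exceeds_bottom, exceeds_right, is_qt_node,
                          size, min_qt_size, max_bt_size):
    """Return the split modes a boundary block is forced to choose from.

    An empty list means the block lies inside the picture and none of
    the forced-split rules applies.
    """
    qt_possible = is_qt_node and size > min_qt_size
    if exceeds_bottom and exceeds_right:
        return ["QT"] if qt_possible else ["SPLIT_BT_HOR"]
    if exceeds_bottom or exceeds_right:
        # Bottom boundary forces horizontal BT; right boundary forces vertical BT.
        bt_mode = "SPLIT_BT_HOR" if exceeds_bottom else "SPLIT_BT_VER"
        if qt_possible and size > max_bt_size:
            return ["QT"]
        if qt_possible:
            return ["QT", bt_mode]
        return [bt_mode]
    return []


corner = forced_boundary_split(True, True, True, 64, 16, 32)
bottom_small = forced_boundary_split(True, False, True, 32, 16, 64)
```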

On the other hand, the quadtree coding block structure with the nested multi-type tree described above can provide a very flexible block partitioning structure. Because of the split types supported in the multi-type tree, different split patterns can sometimes lead to the same coding block structure. By limiting the occurrence of such redundant split patterns, the amount of partitioning information data can be reduced. This is described with reference to the following drawing.

As shown in FIG. 7, two levels of consecutive binary splits in one direction have the same coding block structure as a ternary split followed by a binary split of the center partition. In this case, the binary tree split (in the given direction) of the center partition of the ternary tree split may be restricted. This restriction may be applied to the CUs of all pictures. If this particular split is restricted, the signaling of the corresponding syntax elements can be modified to reflect the restriction, thereby reducing the number of bits signaled for partitioning. For example, as shown in FIG. 7, when the binary tree split of the center partition of the CU is restricted, the mtt_split_cu_binary_flag syntax element, which indicates whether the split is a binary split or a ternary split, is not signaled, and its value may be inferred by the decoder to be 0.
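
The redundancy described here can be checked with a one-dimensional sketch, in which a block is represented by an interval along the split direction. The 1:2:1 ternary ratio is an assumption of this illustration, and all function names are hypothetical.

```python
def bt(interval):
    """Binary split of (start, end) into two equal halves."""
    a, b = interval
    mid = (a + b) // 2
    return [(a, mid), (mid, b)]


def tt(interval):
    """Ternary split of (start, end) with an assumed 1:2:1 ratio."""
    a, b = interval
    q = (b - a) // 4
    return [(a, a + q), (a + q, b - q), (b - q, b)]


def two_level_bt(interval):
    """Binary split, then binary split of both halves, same direction."""
    left, right = bt(interval)
    return bt(left) + bt(right)


def tt_then_center_bt(interval):
    """Ternary split, then binary split of the center partition only."""
    first, center, last = tt(interval)
    return [first] + bt(center) + [last]


same_structure = two_level_bt((0, 32)) == tt_then_center_bt((0, 32))
```

Both paths yield four equal partitions, so restricting one of them removes a redundant way of signaling the same structure.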

Prediction

The decoded portion of the current picture or other pictures in which the current processing unit is included may be used to reconstruct the current processing unit in which decoding is performed.

A picture (slice) that uses only the current picture for reconstruction, that is, performs only intra prediction, may be referred to as an intra picture or I picture (slice). A picture (slice) that uses at most one motion vector and one reference index to predict each unit may be referred to as a predictive picture or P picture (slice). A picture (slice) that uses at most two motion vectors and two reference indices may be referred to as a bi-predictive picture or B picture (slice).

Intra prediction means a prediction method of deriving the current processing block from data elements (e.g., sample values) of the same decoded picture (or slice). That is, it means a method of predicting the pixel values of the current processing block by referring to reconstructed regions in the current picture.

Hereinafter, inter prediction will be described in more detail.

Inter prediction (or inter-picture prediction)

Inter prediction means a prediction method of deriving the current processing block based on data elements (e.g., sample values or motion vectors) of pictures other than the current picture. That is, it means a method of predicting the pixel values of the current processing block by referring to reconstructed regions in reconstructed pictures other than the current picture.

Inter prediction (or inter-picture prediction) is a technique for removing the redundancy that exists between pictures, and is mostly achieved through motion estimation and motion compensation.

The present invention describes the details of the inter prediction method described above with reference to FIGS. 1 and 2. For the decoder, it may be represented by the inter prediction-based video/image decoding method of FIG. 10 and the inter prediction unit in the decoding apparatus of FIG. 11, both described later. For the encoder, it may be represented by the inter prediction-based video/image encoding method of FIG. 8 and the inter prediction unit in the encoding apparatus of FIG. 9, both described later. In addition, the data encoded according to FIGS. 8 and 9 may be stored in the form of a bitstream.

The prediction unit of the encoding apparatus/decoding apparatus may derive prediction samples by performing inter prediction on a block basis. Inter prediction may represent prediction derived in a manner dependent on data elements (e.g., sample values or motion information) of picture(s) other than the current picture. When inter prediction is applied to the current block, a predicted block (prediction sample array) for the current block may be derived based on a reference block (reference sample array) specified by a motion vector on the reference picture indicated by the reference picture index.

In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information of the current block may be predicted in units of blocks, subblocks, or samples based on the correlation of the motion information between the neighboring block and the current block. The motion information may include a motion vector and a reference picture index. The motion information may further include inter prediction type (L0 prediction, L1 prediction, Bi prediction, etc.) information.

When inter prediction is applied, the neighboring blocks may include a spatial neighboring block existing in the current picture and a temporal neighboring block existing in the reference picture. The reference picture including the reference block and the reference picture including the temporal neighboring block may be the same or different. The temporal neighboring block may be called a collocated reference block, a collocated CU (colCU), or the like, and the reference picture including the temporal neighboring block may be called a collocated picture (colPic). For example, a motion information candidate list may be constructed based on the neighboring blocks of the current block, and a flag or index information indicating which candidate is selected (used) may be signaled in order to derive the motion vector and/or the reference picture index of the current block.

Inter prediction may be performed based on various prediction modes. For example, in the case of the skip mode and the merge mode, the motion information of the current block may be the same as the motion information of the selected neighboring block. In the skip mode, unlike the merge mode, the residual signal may not be transmitted. In the case of the motion vector prediction (MVP) mode, the motion vector of the selected neighboring block is used as a motion vector predictor, and a motion vector difference may be signaled. In this case, the motion vector of the current block may be derived as the sum of the motion vector predictor and the motion vector difference.
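
The mode-dependent derivation of the motion vector at the decoder can be sketched as follows. Motion vectors are modeled as (x, y) integer tuples and the candidate list is given directly; the function name `derive_motion_vector` and the mode strings are assumptions made for illustration.

```python
def derive_motion_vector(mode, candidates, cand_idx, mvd=(0, 0)):
    """Derive the current block's motion vector from a candidate list.

    skip/merge: the selected candidate's motion vector is used as is.
    mvp       : the selected candidate is a predictor, and the signaled
                motion vector difference (mvd) is added to it.
    """
    cx, cy = candidates[cand_idx]
    if mode in ("skip", "merge"):
        return (cx, cy)
    if mode == "mvp":
        return (cx + mvd[0], cy + mvd[1])
    raise ValueError(f"unknown inter prediction mode: {mode}")


cands = [(4, -2), (0, 8)]
mv_merge = derive_motion_vector("merge", cands, 1)
mv_mvp = derive_motion_vector("mvp", cands, 0, mvd=(1, 3))
```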

Referring to FIGS. 8 and 9, S801 may be performed by the inter prediction unit 180 of the encoding apparatus, and S802 may be performed by the residual processing unit of the encoding apparatus. In detail, S802 may be performed by the subtracting unit 115 of the encoding apparatus. In S803, the prediction information may be derived by the inter prediction unit 180 and encoded by the entropy encoding unit 190. In S803, the residual information may be derived by the residual processing unit and encoded by the entropy encoding unit 190. The residual information is information about the residual samples, and may include information about quantized transform coefficients for the residual samples.

As described above, the residual samples may be derived as transform coefficients through the transform unit 120 of the encoding apparatus, and the transform coefficients may be derived as quantized transform coefficients through the quantization unit 130. Information about the quantized transform coefficients may be encoded by the entropy encoding unit 190 through a residual coding procedure.

The encoding apparatus performs inter prediction on the current block (S801). The encoding apparatus may derive the inter prediction mode and motion information of the current block and generate prediction samples of the current block. In this case, the inter prediction mode determination, motion information derivation, and prediction sample generation procedures may be performed simultaneously, or one procedure may be performed before another. For example, the inter prediction unit 180 of the encoding apparatus may include a prediction mode determination unit 181, a motion information derivation unit 182, and a prediction sample derivation unit 183. The prediction mode determination unit 181 may determine the prediction mode for the current block, the motion information derivation unit 182 may derive the motion information of the current block, and the prediction sample derivation unit 183 may derive the prediction samples of the current block.

For example, the inter prediction unit 180 of the encoding apparatus may search for a block similar to the current block within a predetermined area (search area) of reference pictures through motion estimation, and may derive a reference block whose difference from the current block is minimal or below a certain criterion. Based on this, a reference picture index indicating the reference picture in which the reference block is located may be derived, and a motion vector may be derived based on the position difference between the reference block and the current block. The encoding apparatus may determine the mode applied to the current block among various prediction modes. The encoding apparatus may compare RD costs for the various prediction modes and determine the optimal prediction mode for the current block. For example, when the skip mode or the merge mode is applied to the current block, the encoding apparatus may construct a merge candidate list, which will be described later, and may derive, among the reference blocks indicated by the merge candidates included in the merge candidate list, a reference block whose difference from the current block is minimal or below a predetermined criterion. In this case, the merge candidate associated with the derived reference block is selected, and merge index information indicating the selected merge candidate may be generated and signaled to the decoding apparatus. The motion information of the current block may be derived using the motion information of the selected merge candidate.
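
The motion estimation step can be illustrated with an exhaustive integer-sample search. This is a minimal sketch assuming the sum of absolute differences (SAD) as the difference measure and a square search window; the text does not fix either choice, and the names `sad` and `full_search` are hypothetical.

```python
def sad(cur, ref, rx, ry):
    """Sum of absolute differences between `cur` and the block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[y][x] - ref[ry + y][rx + x])
        for y in range(len(cur))
        for x in range(len(cur[0]))
    )


def full_search(cur, ref, bx, by, search_range):
    """Find the displacement (dx, dy) minimizing SAD within the search area.

    (bx, by) is the current block's position; candidates that fall
    outside the reference picture are skipped.
    """
    h, w = len(cur), len(cur[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + w > len(ref[0]) or ry + h > len(ref):
                continue
            cost = sad(cur, ref, rx, ry)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best[1], best[0]


# 6x6 reference picture with distinct sample values.
ref_pic = [[10 * y + x for x in range(6)] for y in range(6)]
# Current 2x2 block at (2, 2) whose content equals the reference at (3, 2).
cur_blk = [row[3:5] for row in ref_pic[2:4]]
mv, cost = full_search(cur_blk, ref_pic, bx=2, by=2, search_range=2)
```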

As another example, when the (A)MVP mode is applied to the current block, the encoding apparatus may construct an (A)MVP candidate list, which will be described later, and may use the motion vector of an mvp candidate selected from among the mvp (motion vector predictor) candidates included in the (A)MVP candidate list as the mvp of the current block. In this case, for example, the motion vector indicating the reference block derived by the above-described motion estimation may be used as the motion vector of the current block, and the mvp candidate whose motion vector has the smallest difference from the motion vector of the current block among the mvp candidates may be the selected mvp candidate. A motion vector difference (MVD), which is the difference obtained by subtracting the mvp from the motion vector of the current block, may be derived. In this case, the information about the MVD may be signaled to the decoding apparatus. In addition, when the (A)MVP mode is applied, the value of the reference picture index may be configured as reference picture index information and separately signaled to the decoding apparatus.
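
The encoder-side selection of the mvp candidate and the derivation of the MVD can be sketched as follows. The sum of absolute component differences is assumed as the measure of "smallest difference", which the text leaves open; `select_mvp_and_mvd` is a hypothetical name.

```python
def select_mvp_and_mvd(mv, mvp_candidates):
    """Pick the mvp candidate closest to `mv` and compute the MVD.

    Returns (candidate index, mvd), where mvd = mv - mvp so that the
    decoder can recover mv as mvp + mvd.
    """
    def distance(cand):
        return abs(mv[0] - cand[0]) + abs(mv[1] - cand[1])

    idx = min(range(len(mvp_candidates)),
              key=lambda i: distance(mvp_candidates[i]))
    mvp = mvp_candidates[idx]
    mvd = (mv[0] - mvp[0], mv[1] - mvp[1])
    return idx, mvd


idx, mvd = select_mvp_and_mvd((5, 1), [(4, -2), (6, 1)])
```

Choosing the nearest predictor keeps the MVD small, which in turn reduces the information that has to be signaled.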

The encoding apparatus may derive residual samples based on the prediction samples (S802). The encoding apparatus may derive the residual samples by comparing the original samples of the current block with the prediction samples.

The encoding apparatus encodes image information including prediction information and residual information (S803). The encoding apparatus may output the encoded image information in the form of a bitstream. The prediction information is information related to the prediction procedure and may include prediction mode information (e.g., skip flag, merge flag, or mode index) and information about motion information. The information about the motion information may include candidate selection information (e.g., merge index, mvp flag, or mvp index), which is information for deriving a motion vector. The information about the motion information may also include the information about the MVD and/or the reference picture index information described above.

In addition, the information about the motion information may include information indicating whether L0 prediction, L1 prediction, or bi-prediction is applied. The residual information is information about the residual samples and may include information about quantized transform coefficients for the residual samples. The output bitstream may be stored in a (digital) storage medium and delivered to the decoding apparatus, or may be delivered to the decoding apparatus via a network.

Meanwhile, as described above, the encoding apparatus may generate a reconstructed picture (including the reconstructed samples and the reconstructed block) based on the reference samples and the residual samples. This allows the encoding apparatus to derive the same prediction result as that derived in the decoding apparatus, and coding efficiency can thereby be increased. Therefore, the encoding apparatus may store the reconstructed picture (or reconstructed samples, reconstructed block) in a memory and use it as a reference picture for inter prediction. As described above, an in-loop filtering procedure or the like may be further applied to the reconstructed picture.

Referring to FIGS. 10 and 11, the decoding apparatus may perform an operation corresponding to the operation performed by the encoding apparatus. The decoding apparatus may perform prediction on the current block and derive prediction samples based on the received prediction information.

S1001 to S1003 may be performed by the inter prediction unit 260 of the decoding apparatus, and the residual information of S1004 may be obtained from the bitstream by the entropy decoding unit 210 of the decoding apparatus. The residual processing unit of the decoding apparatus may derive residual samples for the current block based on the residual information. In detail, the inverse quantization unit 220 of the residual processing unit may derive transform coefficients by performing dequantization on the quantized transform coefficients derived from the residual information, and the inverse transform unit 230 of the residual processing unit may derive residual samples for the current block by performing an inverse transform on the transform coefficients. S1005 may be performed by the adder 235 or the reconstruction unit of the decoding apparatus.

In detail, the decoding apparatus may determine a prediction mode for the current block based on the received prediction information (S1001). The decoding apparatus may determine which inter prediction mode is applied to the current block based on the prediction mode information in the prediction information.

For example, whether the merge mode is applied to the current block or the (A) MVP mode is determined may be decided based on the merge flag. Alternatively, one of various inter prediction mode candidates may be selected based on the mode index. The inter prediction mode candidates may include a skip mode, a merge mode, and / or an (A) MVP mode, or may include various inter prediction modes described below.

The decoding apparatus derives the motion information of the current block based on the determined inter prediction mode (S1002). For example, when a skip mode or a merge mode is applied to the current block, the decoding apparatus may construct a merge candidate list to be described later and select one merge candidate from among the merge candidates included in the merge candidate list. The selection may be performed based on the above-described selection information (merge index). The motion information of the selected merge candidate may be used as the motion information of the current block.

As another example, when the (A) MVP mode is applied to the current block, the decoding apparatus constructs an (A) MVP candidate list to be described later, and the motion vector of an mvp (motion vector predictor) candidate selected from among the mvp candidates included in the (A) MVP candidate list may be used as the mvp of the current block. The selection may be performed based on the above-described selection information (mvp flag or mvp index). In this case, the MVD of the current block may be derived based on the information about the MVD, and the motion vector of the current block may be derived based on the mvp and the MVD of the current block. In addition, the reference picture index of the current block may be derived based on the reference picture index information. The picture indicated by the reference picture index in the reference picture list for the current block may be derived as the reference picture referenced for inter prediction of the current block.

Meanwhile, as described below, the motion information of the current block may be derived without constructing a candidate list. In this case, the motion information of the current block may be derived according to the procedure defined for the corresponding prediction mode to be described later, and the candidate list construction described above may be omitted.

The decoding apparatus may generate prediction samples for the current block based on the motion information of the current block (S1003). In this case, the reference picture may be derived based on the reference picture index of the current block, and the prediction samples of the current block may be derived using the samples of the reference block indicated by the motion vector of the current block on the reference picture. In this case, as described below, a prediction sample filtering procedure for all or some of the prediction samples of the current block may be further performed.

For example, the inter prediction unit 260 of the decoding apparatus may include a prediction mode determination unit 261, a motion information derivation unit 262, and a prediction sample derivation unit 263. The prediction mode determination unit 261 may determine the prediction mode for the current block based on the received prediction mode information, the motion information derivation unit 262 may derive the motion information (motion vector and / or reference picture index, etc.) of the current block based on the received information about the motion information, and the prediction sample derivation unit 263 may derive the prediction samples of the current block.

The decoding apparatus generates residual samples for the current block based on the received residual information (S1004). The decoding apparatus may generate reconstructed samples for the current block based on the prediction samples and the residual samples, and may generate a reconstructed picture based on them (S1005). As described above, an in-loop filtering procedure may be further applied to the reconstructed picture.
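
The reconstruction step (S1005) amounts to adding each residual sample to the corresponding prediction sample. The following Python fragment is only an illustrative sketch: the function name `reconstruct`, the 8-bit default, and the clipping to the valid sample range are assumptions that are not stated in this description.

```python
def reconstruct(pred, resid, bit_depth=8):
    """Add residual samples to prediction samples, sample by sample.

    Clipping to [0, 2^bit_depth - 1] is an assumption made for this sketch.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


# Hypothetical 2x2 block of prediction and residual samples.
pred = [[100, 120], [250, 3]]
resid = [[5, -20], [10, -7]]
recon = reconstruct(pred, resid)
```

In skip mode the residual is omitted, which in this sketch corresponds to passing an all-zero `resid`, so the prediction samples are returned unchanged.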

As described above, the inter prediction procedure may include an inter prediction mode determination step, a motion information derivation step according to the determined prediction mode, and a prediction execution (prediction sample generation) step based on the derived motion information.

Determination of inter prediction mode

Various inter prediction modes may be used for prediction of a current block in a picture. For example, various modes, such as merge mode, skip mode, MVP mode, and affine mode, may be used. Decoder side motion vector refinement (DMVR) mode, adaptive motion vector resolution (AMVR) mode, and the like may further be used as secondary modes. The affine mode may be called an affine motion prediction mode. MVP mode may be referred to as advanced motion vector prediction (AMVP) mode. Prediction mode information indicating the inter prediction mode of the current block may be signaled from the encoding apparatus to the decoding apparatus. The prediction mode information may be included in the bitstream and received by the decoding apparatus. The prediction mode information may include index information indicating one of a plurality of candidate modes. Alternatively, the inter prediction mode may be indicated through hierarchical signaling of flag information. In this case, the prediction mode information may include one or more flags.

For example, a skip flag may be signaled to indicate whether the skip mode is applied; if the skip mode is not applied, a merge flag may be signaled to indicate whether the merge mode is applied; and if the merge mode is not applied, it may be indicated that the MVP mode is applied, or a flag for additional classification may be further signaled. The affine mode may be signaled as an independent mode, or may be signaled as a mode dependent on the merge mode or the MVP mode. For example, the affine mode may be configured as one candidate of the merge candidate list or the MVP candidate list, as described below.
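
The hierarchical flag signaling described above can be pictured as a short decision chain. The sketch below is a hedged illustration only: `parse_inter_mode`, `flag_reader`, and the idea of reading flags from a plain list are hypothetical stand-ins for real bitstream parsing, and the optional flags for additional classification are not modeled.

```python
def parse_inter_mode(read_flag):
    """Return the inter prediction mode implied by hierarchically signaled flags.

    read_flag is a hypothetical callable that returns the next flag (0 or 1).
    """
    if read_flag():      # skip flag
        return "skip"
    if read_flag():      # merge flag, read only when skip is not applied
        return "merge"
    return "mvp"         # neither flag set: (A)MVP mode is applied


def flag_reader(bits):
    """Wrap a list of flag values in a callable that yields them in order."""
    it = iter(bits)
    return lambda: next(it)


modes = [parse_inter_mode(flag_reader(bits)) for bits in ([1], [0, 1], [0, 0])]
```

Here the three flag sequences map to skip, merge, and MVP mode respectively; the merge flag is never read when the skip flag is set.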

Derivation of motion information according to inter prediction mode

Inter prediction may be performed using motion information of the current block. The encoding apparatus may derive optimal motion information for the current block through a motion estimation procedure. For example, the encoding apparatus may use the original block in the original picture for the current block to search, in fractional pixel units within a predetermined search range in the reference picture, for a similar reference block having a high correlation, and may thereby derive the motion information. The similarity of blocks may be derived based on the difference between phase-based sample values, for example the sum of absolute differences (SAD). In this case, the motion information may be derived based on the reference block having the smallest SAD in the search area, and the derived motion information may be signaled to the decoding apparatus according to various methods based on the inter prediction mode.

Merge mode and skip mode

FIG. 12 is a diagram for describing a neighboring block used in a merge mode or a skip mode as an embodiment to which the present invention is applied.

When the merge mode is applied, the motion information of the current prediction block is not directly transmitted, but the motion information of the current prediction block is derived using the motion information of the neighboring prediction block. Accordingly, the motion information of the current prediction block can be indicated by transmitting flag information indicating that the merge mode is used and a merge index indicating which neighboring prediction blocks are used.

The encoder can search the merge candidate block used to derive motion information of the current prediction block to perform the merge mode. For example, up to five merge candidate blocks may be used, but the present invention is not limited thereto. The maximum number of merge candidate blocks may be transmitted in a slice header (or tile group header), but the present invention is not limited thereto. After finding the merge candidate blocks, the encoder may generate a merge candidate list, and select the merge candidate block having the smallest cost among them as the final merge candidate block.

The present invention provides various embodiments of a merge candidate block forming the merge candidate list.

The merge candidate list may use, for example, five merge candidate blocks. For example, four spatial merge candidates and one temporal merge candidate may be used. As a specific example, the blocks illustrated in FIG. 12 may be used as the spatial merge candidates.

Referring to FIG. 13, the coding apparatus (encoder / decoder) inserts spatial merge candidates, derived by searching the spatial neighboring blocks of the current block, into the merge candidate list (S1301). For example, the spatial neighboring blocks may include a lower left corner neighboring block, a left neighboring block, an upper right corner neighboring block, an upper neighboring block, and an upper left corner neighboring block of the current block. However, this is only an example, and in addition to the above-described spatial neighboring blocks, additional neighboring blocks such as a right neighboring block, a lower neighboring block, and a lower right neighboring block may be further used as the spatial neighboring blocks. The coding apparatus may detect available blocks by searching the spatial neighboring blocks based on priority, and may derive the motion information of the detected blocks as the spatial merge candidates. For example, the encoder and the decoder may search the five blocks shown in FIG. 12 in the order of A1, B1, B0, A0, and B2 and sequentially index the available candidates to form the merge candidate list.

The coding apparatus inserts the temporal merge candidate derived by searching the temporal neighboring block of the current block into the merge candidate list (S1302). The temporal neighboring block may be located on a reference picture that is a picture different from the current picture in which the current block is located. The reference picture in which the temporal neighboring block is located may be called a collocated picture or a col picture. The temporal neighboring block may be searched in the order of the lower right corner peripheral block and the lower right center block of the co-located block with respect to the current block on the col picture.

On the other hand, when motion data compression is applied, specific motion information may be stored as the representative motion information for each predetermined storage unit in the col picture. In this case, it is not necessary to store the motion information of all the blocks in the predetermined storage unit, and a motion data compression effect is thereby obtained. The predetermined storage unit may be fixed in advance as, for example, a 16x16 sample unit or an 8x8 sample unit, or size information about the predetermined storage unit may be signaled from the encoder to the decoder. When motion data compression is applied, the motion information of the temporal neighboring block may be replaced with the representative motion information of the predetermined storage unit in which the temporal neighboring block is located.

That is, in this case, in terms of implementation, the temporal merge candidate may be derived based on the motion information of the prediction block covering the position obtained by arithmetically right-shifting the coordinates (upper left sample position) of the temporal neighboring block by a predetermined value and then arithmetically left-shifting them by the same value, rather than the prediction block located at the coordinates of the temporal neighboring block. For example, when the predetermined storage unit is a 2^n x 2^n sample unit and the coordinates of the temporal neighboring block are (xTnb, yTnb), the motion information of the prediction block located at the modified position ((xTnb >> n) << n, (yTnb >> n) << n) may be used for the temporal merge candidate.

Specifically, for example, when the predetermined storage unit is a 16x16 sample unit and the coordinates of the temporal neighboring block are (xTnb, yTnb), the motion information of the prediction block located at the modified position ((xTnb >> 4) << 4, (yTnb >> 4) << 4) may be used for the temporal merge candidate. Alternatively, when the predetermined storage unit is an 8x8 sample unit and the coordinates of the temporal neighboring block are (xTnb, yTnb), the motion information of the prediction block located at the modified position ((xTnb >> 3) << 3, (yTnb >> 3) << 3) may be used for the temporal merge candidate.
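
The shift operations above simply snap the coordinates of the temporal neighboring block to the upper left corner of the storage unit that contains them. A minimal Python sketch follows; the function name `representative_position` and the sample coordinates are chosen for illustration only.

```python
def representative_position(x_tnb, y_tnb, n):
    """Snap (xTnb, yTnb) to the top-left corner of its 2^n x 2^n storage unit."""
    return ((x_tnb >> n) << n, (y_tnb >> n) << n)


# Hypothetical coordinates of a temporal neighboring block.
pos_16x16 = representative_position(45, 27, 4)  # 16x16 storage unit (n = 4)
pos_8x8 = representative_position(45, 27, 3)    # 8x8 storage unit (n = 3)
```

With these inputs the 16x16 case yields (32, 16) and the 8x8 case yields (40, 24), so every block inside one storage unit maps to the same representative motion information.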

The coding apparatus may check whether the number of current merge candidates is smaller than the maximum number of merge candidates (S1303). The maximum number of merge candidates may be predefined or signaled from the encoder to the decoder. For example, the encoder may generate information about the maximum number of merge candidates, encode the information, and transmit the encoded information to the decoder in the form of a bitstream. When the maximum number of merge candidates has been reached, the subsequent candidate addition process may not proceed.

As a result of the checking, when the number of current merge candidates is smaller than the maximum number of merge candidates, the coding apparatus inserts an additional merge candidate into the merge candidate list (S1304). The additional merge candidate may include, for example, ATMVP, a combined bi-predictive merge candidate (when the slice type of the current slice is B type), and / or a zero vector merge candidate.

As a result of the checking, when the number of current merge candidates is not smaller than the maximum number of merge candidates, the coding apparatus may terminate the construction of the merge candidate list. In this case, the encoder may select an optimal merge candidate from among the merge candidates constituting the merge candidate list based on a rate-distortion (RD) cost, and may signal selection information (e.g., merge index) indicating the selected merge candidate to the decoder. The decoder may select the optimal merge candidate based on the merge candidate list and the selection information.
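
The list construction in S1301 to S1304 can be sketched as follows. This is an illustrative simplification, not the method itself: the names `build_merge_list` and `max_spatial`, the representation of motion information as `(mv_x, mv_y, ref_idx)` tuples, and the use of zero vector candidates as the only additional candidates (ATMVP and combined bi-predictive candidates are not modeled) are all assumptions.

```python
SPATIAL_ORDER = ("A1", "B1", "B0", "A0", "B2")  # search order given for FIG. 12


def build_merge_list(spatial, temporal, max_cands=5, max_spatial=4):
    """Build a merge candidate list from neighboring motion information.

    spatial maps a position name to (mv_x, mv_y, ref_idx), or to None when
    the block is unavailable; temporal is a tuple of the same form or None.
    """
    cands = []
    # S1301: insert available spatial candidates in priority order.
    for pos in SPATIAL_ORDER:
        info = spatial.get(pos)
        if info is not None and len(cands) < min(max_spatial, max_cands):
            cands.append(info)
    # S1302: insert the temporal candidate.
    if temporal is not None and len(cands) < max_cands:
        cands.append(temporal)
    # S1303 / S1304: fill remaining entries (zero vectors only, by assumption).
    while len(cands) < max_cands:
        cands.append((0, 0, 0))
    return cands


spatial = {"A1": (4, -2, 0), "B1": None, "B0": (3, 0, 1), "A0": None, "B2": (1, 1, 0)}
merge_list = build_merge_list(spatial, temporal=(0, 8, 0))
selected = merge_list[1]  # the decoder picks the entry named by the merge index
```

Because the encoder and the decoder build the list by the same rule, signaling only the merge index is enough for the decoder to recover the selected motion information.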

As described above, the motion information of the selected merge candidate may be used as the motion information of the current block, and the prediction samples of the current block may be derived based on the motion information of the current block. The encoder may derive residual samples of the current block based on the prediction samples, and may signal residual information about the residual samples to the decoder. As described above, the decoder may generate reconstructed samples based on the prediction samples and the residual samples derived from the residual information, and may generate a reconstructed picture based on the reconstructed samples.

When the skip mode is applied, the motion information of the current block may be derived in the same manner as when the merge mode is applied. However, when the skip mode is applied, the residual signal for the corresponding block is omitted, and thus prediction samples may be used as reconstructed samples.

MVP mode

When the motion vector prediction (MVP) mode is applied, a motion vector predictor (mvp) candidate list may be generated using the motion vector of a reconstructed spatial neighboring block (for example, the neighboring block described above with reference to FIG. 12) and / or the motion vector corresponding to a temporal neighboring block (or Col block). That is, the motion vector of the reconstructed spatial neighboring block and / or the motion vector corresponding to the temporal neighboring block may be used as a motion vector predictor candidate.

The information about the prediction may include selection information (e.g., an MVP flag or an MVP index) indicating the optimal motion vector predictor candidate selected from among the motion vector predictor candidates included in the list. In this case, the prediction unit may select the motion vector predictor of the current block from among the motion vector predictor candidates included in the motion vector predictor candidate list using the selection information. The prediction unit of the encoding apparatus may obtain a motion vector difference (MVD) between the motion vector of the current block and the motion vector predictor, encode it, and output it in the form of a bitstream. That is, the MVD may be obtained by subtracting the motion vector predictor from the motion vector of the current block. The prediction unit of the decoding apparatus may obtain the motion vector difference included in the information about the prediction, and derive the motion vector of the current block by adding the motion vector difference and the motion vector predictor. The prediction unit of the decoding apparatus may obtain or derive a reference picture index or the like indicating the reference picture from the information about the prediction. For example, the motion vector predictor candidate list may be configured as shown in FIG. 14.
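
The relationship between the motion vector, the motion vector predictor, and the MVD can be illustrated with a small round trip. The sketch below is hedged: `encode_mvd` and `decode_mv` are hypothetical names, and measuring the "smallest difference" as a sum of absolute component differences is an assumption, since the description does not fix the metric.

```python
def encode_mvd(mv, mvp_candidates):
    """Encoder side: pick the closest mvp candidate and form MVD = mv - mvp."""
    def distance(i):
        cand = mvp_candidates[i]
        return abs(mv[0] - cand[0]) + abs(mv[1] - cand[1])  # assumed metric

    idx = min(range(len(mvp_candidates)), key=distance)
    mvp = mvp_candidates[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(idx, mvd, mvp_candidates):
    """Decoder side: mv = mvp + MVD, with mvp chosen by the signaled index."""
    mvp = mvp_candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


candidates = [(8, -4), (13, 2)]  # hypothetical mvp candidate list
mv = (12, 3)                     # motion vector found by motion estimation
mvp_idx, mvd = encode_mvd(mv, candidates)
decoded = decode_mv(mvp_idx, mvd, candidates)
```

In this example the second candidate is selected and the MVD is (-1, 1), so only a small difference has to be signaled together with the candidate index.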

DMVR (Decoder side Motion Vector Refinement)

DMVR is a method of performing motion prediction by refining, at the decoder side, motion information obtained from neighboring blocks. In an embodiment, when DMVR is applied, the decoder may derive refined motion information through cost comparison based on a prediction block (or prediction sample array, or prediction template) generated using the motion information of neighboring blocks in merge / skip mode.

When the merge / skip mode, which uses the motion vector of a neighboring block as it is, is applied, an error may occur because the motion vector of the current block is expressed using the neighboring motion vector for each direction. The encoder / decoder may apply a DMVR process that improves the motion vector in order to correct this error. In other words, the DMVR process can be invoked to improve the accuracy of the initial motion compensation prediction (i.e., motion compensation prediction through merge / skip mode).

FIG. 15 is a flowchart illustrating a method of generating an inter prediction block by applying DMVR as an embodiment to which the present invention may be applied.

Referring to FIG. 15, a decoder is mainly described for convenience of description, but the present invention is not limited thereto. The method of generating an inter prediction block according to an embodiment of the present invention may be performed in the same manner in the encoder and the decoder.

When bidirectional prediction is applied to the current block, the decoder derives a first initial motion vector and a second initial motion vector of the current block (S1501). In one embodiment, the decoder may apply the DMVR process according to the embodiment of the present invention when bidirectional prediction is applied to the current block and the current picture is located between the two reference pictures in terms of picture order count (POC), which indicates the output order of pictures. In addition, in one embodiment, the decoder may apply the DMVR process according to the embodiment of the present invention when bidirectional prediction is applied to the current block, the current picture is located between the two reference pictures in terms of POC, and the distances between the current picture and the two reference pictures are the same.

The decoder derives a first final motion vector (or improved motion vector) and a second final motion vector (or improved motion vector) by improving the first initial motion vector and the second initial motion vector (S1502). As an embodiment, the decoder may improve the first initial motion vector and the second initial motion vector by using a motion vector offset value that is symmetric with respect to the reference list direction. Here, the motion vector offset value represents a value added to (or subtracted from) the initial motion vector, and may be referred to as a motion vector difference value.
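
One way to read the symmetric offset is that the same offset is added to the initial motion vector of one reference list and subtracted from that of the other. The sketch below illustrates this reading only; the name `apply_symmetric_offset` and the choice of adding for L0 and subtracting for L1 are assumptions.

```python
def apply_symmetric_offset(mv_l0, mv_l1, offset):
    """Refine both initial motion vectors with one mirrored offset (assumed form)."""
    refined_l0 = (mv_l0[0] + offset[0], mv_l0[1] + offset[1])
    refined_l1 = (mv_l1[0] - offset[0], mv_l1[1] - offset[1])
    return refined_l0, refined_l1


# Hypothetical initial motion vectors for the two reference lists.
refined_l0, refined_l1 = apply_symmetric_offset((5, -3), (-5, 3), (1, 2))
```

Under this assumption a single offset describes the refinement in both directions, which fits the case where the two reference pictures are at equal distance on either side of the current picture.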

The decoder generates a first prediction block using the first final motion vector, and generates a second prediction block using the second final motion vector (S1503). The decoder generates a third prediction block representing the prediction block of the current block by using the first prediction block and the second prediction block (S1504).

FIG. 16 is a diagram for explaining neighboring blocks used in a merge mode or a skip mode as an embodiment to which the present invention is applied.

Referring to FIG. 16, a decoder is mainly described for convenience of description, but the present invention is not limited thereto, and the method of generating an inter prediction block according to an embodiment of the present invention may be performed in the same manner in the encoder and the decoder.

As in the example of FIG. 15 described above, an initial motion vector may be derived before the DMVR process is applied. For example, when the merge mode is applied, the motion information of the current prediction block is not transmitted directly from the encoder, and the decoder may derive the initial motion information of the current prediction block using the motion information of a neighboring prediction block. The initial motion information may include an initial motion vector, a reference picture list, and a reference picture index. As an embodiment, the encoder may indicate the initial motion information of the current prediction block by transmitting flag information indicating that the merge mode is used and a merge index indicating which block's motion information is used.

The decoder may use the candidate blocks shown in FIG. 16 in deriving the initial motion vector for refinement. As an example, the decoder may construct a merge candidate list using the candidate blocks illustrated in FIG. 16. In this case, the merge candidate list may include spatial candidates and a temporal candidate at the positions illustrated in FIG. 16. However, FIG. 16 is only an example, and the decoder may also predict the initial motion vector by referring to candidates at various other positions.

FIG. 17A is a diagram illustrating, as an embodiment to which the present invention is applied, a method of improving a motion vector obtained from a neighboring block based on template matching.

Referring to FIG. 17A, the encoder / decoder may obtain an initial motion vector (mv_x, mv_y) from neighboring blocks. FIG. 17A (a) assumes a case in which the neighboring block adjacent to the upper right end of the current block is selected from among several candidates.

The encoder / decoder may improve the initial motion vector by searching for a motion vector having an optimal cost based on template matching. Referring to FIG. 17A (b), the template region may be set to specific reconstructed regions to the left of and above the current block. The encoder / decoder may improve the initial motion vector based on the cost (or difference) between the template region of the current block and a particular template region within the search range. In the present invention, the search region denotes a specific region within the reference picture that is determined based on the initial motion vector, and may also be referred to as a search range, a limited region, a limited range, or the like.
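
The template matching search can be sketched as below. This is a minimal illustration under several assumptions that the description does not fix: the template is one row above and one column to the left of the block, the cost is the sum of absolute differences, the search is an exhaustive integer-pixel scan, and the names `template` and `template_refine` as well as the synthetic pictures are hypothetical.

```python
def template(pic, x, y, w, h, t=1):
    """Collect the t rows above and t columns left of the w x h block at (x, y)."""
    top = [pic[y - j][x + i] for j in range(1, t + 1) for i in range(w)]
    left = [pic[y + j][x - i] for i in range(1, t + 1) for j in range(h)]
    return top + left


def template_refine(cur, ref, x, y, w, h, mv, search=1):
    """Scan offsets around mv and keep the one with the lowest template cost."""
    cur_t = template(cur, x, y, w, h)
    best_cost, best_mv = None, mv
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = (mv[0] + dx, mv[1] + dy)
            ref_t = template(ref, x + cand[0], y + cand[1], w, h)
            cost = sum(abs(a - b) for a, b in zip(cur_t, ref_t))  # assumed SAD
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, cand
    return best_mv, best_cost


# Synthetic 8x8 pictures: ref is cur displaced by (+2, +1) samples.
cur = [[10 * y + x for x in range(8)] for y in range(8)]
ref = [[10 * (y - 1) + (x - 2) for x in range(8)] for y in range(8)]
refined_mv, refined_cost = template_refine(cur, ref, 3, 3, 2, 2, mv=(1, 1))
```

Starting from the initial vector (1, 1), the search in this toy setup settles on (2, 1) with zero cost, which is the displacement built into the synthetic reference picture.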

FIG. 17B is a diagram for describing a method of improving a motion vector based on similarity between prediction blocks in an embodiment to which the present invention is applied.

Referring to FIG. 17B, an encoder / decoder may first obtain an initial motion vector from a spatial candidate, a temporal candidate, and / or a history based prediction candidate.

The encoder / decoder may improve the initial motion vector by searching for a motion vector having an optimal cost based on the similarity between the prediction blocks generated through bidirectional prediction. In this case, Equation 1 below may be used.

[Equation 1] pos

Referring to Equation 1, P0 and P1 represent the initial prediction blocks for the respective reference directions. The encoder / decoder may improve the initial motion vector by searching for a motion vector having an optimal cost based on the difference between pixels at corresponding positions in the two prediction blocks.
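
A search based on the similarity of the two prediction blocks can be sketched as follows. The sketch is not a reconstruction of Equation 1: the cost is assumed to be the sum of absolute differences between P0 and P1, the offset is assumed to be applied with opposite signs in the two reference directions, motion vectors are in integer pixels, and all function names and the synthetic reference pictures are hypothetical.

```python
def get_block(pic, x, y, w, h):
    """Extract the w x h block whose top-left sample is at (x, y)."""
    return [row[x:x + w] for row in pic[y:y + h]]


def sad(block0, block1):
    """Sum of absolute differences between co-located samples."""
    return sum(abs(a - b) for r0, r1 in zip(block0, block1) for a, b in zip(r0, r1))


def bilateral_refine(ref0, ref1, pos, size, mv0, mv1, search=1):
    """Return (cost, offset) minimizing the difference between P0 and P1."""
    (x, y), (w, h) = pos, size
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            p0 = get_block(ref0, x + mv0[0] + dx, y + mv0[1] + dy, w, h)
            p1 = get_block(ref1, x + mv1[0] - dx, y + mv1[1] - dy, w, h)
            cost = sad(p0, p1)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Synthetic 8x8 reference pictures displaced by one sample in opposite directions.
ref0 = [[10 * y + (x - 1) for x in range(8)] for y in range(8)]
ref1 = [[10 * y + (x + 1) for x in range(8)] for y in range(8)]
best_cost, best_offset = bilateral_refine(ref0, ref1, (3, 3), (2, 2), (0, 0), (0, 0))
```

In this toy case the two prediction blocks coincide only for the offset (1, 0), so the search returns that offset with zero cost.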

Referring to FIG. 18 (a), as in the example of FIG. 12 or FIG. 16, it is assumed that the neighboring block adjacent to the upper right end of the current block is selected from among the neighboring candidate blocks. The encoder / decoder may use the motion vector of the selected neighboring block as the initial motion vector.

Referring to FIG. 18 (b), the encoder / decoder may predict (or derive, or obtain) an initial motion vector from a candidate list, and then search the area around the pixel specified by the predicted motion vector.

Specifically, in an embodiment of the present invention, the encoder / decoder may predict the initial motion vector from the candidate list and then round the initial motion vector to integer pixel precision. The encoder / decoder may search for the integer pixel having the minimum cost value in the search area based on the initial motion vector rounded to an integer pixel. In the same way, the encoder / decoder may then search for the pixel having the minimum cost value with half pixel precision. In addition, the encoder / decoder may search for the pixel having the minimum cost value with quarter pixel (1/4 pixel) precision.
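
The integer, half, and quarter pixel stages can be pictured as one search repeated with a shrinking step. The sketch below is illustrative only: motion vectors are assumed to be stored in quarter pixel units, the rounding rule, the 3x3 neighborhood per stage, and the names `refine_step` and `hierarchical_refine` are assumptions, and `toy_cost` stands in for a real block matching cost.

```python
def refine_step(center, step, cost):
    """Evaluate the center and its 8 neighbors at the given step; keep the best."""
    cands = [(center[0] + dx * step, center[1] + dy * step)
             for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return min(cands, key=cost)


def hierarchical_refine(mv_init, cost):
    """Refine a quarter-pel motion vector at integer, half, then quarter precision."""
    # Round each component to the nearest integer pixel (a multiple of 4).
    mv = tuple(((c + 2) >> 2) << 2 for c in mv_init)
    for step in (4, 2, 1):  # integer, half, and quarter pixel in quarter-pel units
        mv = refine_step(mv, step, cost)
    return mv


def toy_cost(mv):
    """Hypothetical cost with its minimum at (11, -11) in quarter-pel units."""
    return abs(mv[0] - 11) + abs(mv[1] + 11)


refined = hierarchical_refine((5, -3), toy_cost)
```

With this toy cost the vector (5, -3) is rounded to (4, -4) and then moves to (8, -8), (10, -10), and finally (11, -11) as the step shrinks.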

In one embodiment, the encoder / decoder may use bilinear interpolation filtering to search for motion vectors with minimal cost values.

In an embodiment of the present invention, the above-described search region may be a region predefined in the encoder and the decoder, or the search region information may be signaled from the encoder to the decoder based on a higher level syntax.

In one embodiment, the method proposed in the present invention may be used not only for a method that transmits a motion vector difference, but also for a method that performs motion prediction / compensation using the derived motion vector (that is, the initial motion vector) without transmitting a motion vector difference.

Referring to FIG. 19A, it is assumed that a neighboring block adjacent to the upper right end of the current block is selected from among candidate blocks around the current block as in the example of FIG. 12 or 16. The encoder / decoder may use the motion vector of the selected neighboring block as the initial motion vector.

Referring to FIG. 19 (b), the encoder / decoder may predict (or derive, or obtain) an initial motion vector from a candidate list, and then search the area around the pixel specified by the predicted motion vector.

In one embodiment of the present invention, in the process of transmitting the motion vector difference in integer pixel units according to adaptive motion vector resolution (AMVR) technology, the encoder / decoder may improve the motion vector with integer pixel precision relative to the current pixel, regardless of the precision of the initial motion vector.

That is, the encoder / decoder may search for a motion vector having the minimum cost value in the search region in units of integer pixels, starting from the initial motion vector, regardless of the precision of the initial motion vector (mv_x, mv_y). As in the example of FIG. 19 (b), when the initial motion vector has 1/4 pixel precision, the encoder / decoder may derive the improved motion vector by comparing the costs only at the 1/4 pixel positions of the same phase, that is, positions offset from the initial position by integer pixel steps.

FIG. 20 is a diagram for describing a method of deriving an improved motion vector in units of integer pixels according to an embodiment to which the present invention is applied.

Referring to FIG. 20 (a), as described above with reference to FIG. 12 or FIG. 16, the encoder / decoder may derive a motion vector of a specific block from among candidate blocks around the current block as an initial motion vector.

Referring to FIG. 20 (b), after predicting (or deriving, or obtaining) the initial motion vector, the encoder / decoder may search the area around the pixel specified by the initial motion vector. In this case, when AMVR is applied to the current block, the encoder / decoder may round the initial motion vector to integer pixel precision and search for the integer pixel having the minimum cost value in the search region based on the rounded initial motion vector. The encoder may then transmit motion vector differences (mvd_x, mvd_y) with integer pixel precision to the decoder.
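
The integer-only refinement with an integer-precision motion vector difference can be sketched as a round trip between the two sides. Several details below are assumptions rather than statements of this description: motion vectors are stored in quarter pixel units, the rounding rule is round-to-nearest, the motion vector difference is expressed in integer pixels relative to the rounded initial vector, and `integer_refine`, `apply_integer_mvd`, and `toy_cost` are hypothetical names.

```python
QPEL = 4  # quarter-pel units per integer pixel (assumed storage precision)


def round_to_integer_pel(mv):
    """Round each quarter-pel component to the nearest integer pixel."""
    return tuple(((c + QPEL // 2) // QPEL) * QPEL for c in mv)


def integer_refine(mv_init, cost, search_range=1):
    """Encoder side: search integer positions only; return refined mv and MVD."""
    start = round_to_integer_pel(mv_init)
    cands = [(start[0] + dx * QPEL, start[1] + dy * QPEL)
             for dy in range(-search_range, search_range + 1)
             for dx in range(-search_range, search_range + 1)]
    best = min(cands, key=cost)
    mvd = ((best[0] - start[0]) // QPEL, (best[1] - start[1]) // QPEL)
    return best, mvd


def apply_integer_mvd(mv_init, mvd):
    """Decoder side: rebuild the refined mv from the initial mv and the MVD."""
    start = round_to_integer_pel(mv_init)
    return (start[0] + mvd[0] * QPEL, start[1] + mvd[1] * QPEL)


def toy_cost(mv):
    """Hypothetical cost with its minimum near (9, -1) in quarter-pel units."""
    return abs(mv[0] - 9) + abs(mv[1] + 1)


best_mv, mvd_int = integer_refine((5, -3), toy_cost)
decoded_mv = apply_integer_mvd((5, -3), mvd_int)
```

Every candidate in this search lies on an integer sample position, so no interpolated samples are needed to evaluate the cost.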

According to an embodiment of the present invention, the cost at fractional pixel (fractional pel) positions need not be calculated in order to refine the motion vector, so no interpolation filter is required. This is advantageous in terms of complexity.

In the above-described embodiments, a method of improving a motion vector with integer pixel precision is described under the assumption that AMVR is applied to the current block in generating the inter prediction block of the current block. On the other hand, in another embodiment, the encoder / decoder may predict the initial motion vector from the candidate list, round the initial motion vector to integer pixel precision regardless of whether AMVR is on or off, and then perform the motion vector improvement process based on the location (i.e., pixel) specified by the rounded motion vector.

Calculating the fractional pixel cost in the motion vector improvement process requires an interpolation filter, which has the drawback of increasing complexity from a hardware point of view. According to the embodiment of the present invention, this problem is solved, and the encoder / decoder may improve the motion vector by applying the method described with reference to FIG. 20 regardless of whether AMVR is applied.

FIG. 21 is a diagram for describing a method of deriving an improved motion vector in units of integer pixels according to an embodiment to which the present invention is applied.

Referring to FIG. 21 (a), as in the example of FIG. 12 or FIG. 16, the encoder / decoder may derive a motion vector of a specific block from among candidate blocks around the current block as an initial motion vector.

Referring to FIG. 21 (b), the encoder / decoder may search the area around the pixel specified by the initial motion vector after predicting (or deriving or obtaining) the initial motion vector. In this case, the encoder / decoder may round the initial motion vector to integer pixel precision and search for an integer pixel having a minimum cost value in the search area based on the rounded initial motion vector. In addition, the encoder may transmit motion vector differences (mvd_x, mvd_y) to the decoder.

In particular, in an embodiment of the present invention, when AMVR is not applied to the current block, the encoder / decoder may perform the motion vector refinement process based on the nearest integer pixel position, without rounding the initial motion vector to integer pixel precision. The encoder may transmit motion vector differences (mvd_x, mvd_y) based on the initial motion vector to the decoder.

FIG. 22 is a diagram for describing a method of deriving an improved motion vector in units of integer pixels according to an embodiment to which the present invention is applied.

Referring to FIG. 22A, as in the example of FIG. 12 or FIG. 16, the encoder / decoder may derive a motion vector of a specific block from among candidate blocks neighboring the current block as an initial motion vector.

Referring to FIG. 22(b), in an embodiment of the present invention, after the encoder / decoder derives the initial motion vector, the improvement process for the initial motion vector may be applied based on integer pixel positions regardless of whether AMVR is applied. In this embodiment, rounding may not be performed on the initial motion vector. In a mode in which the motion vector difference is not transmitted, the proposed method may be applied independently of whether AMVR is applied.

In this embodiment, the encoder / decoder may search for integer pixel positions in the search region to derive the improved motion vector. According to an embodiment of the present invention, since the interpolation filter is not applied, the complexity on the decoder side may be reduced.

Hereinafter, a specific example of a DMVR process to which the methods proposed herein are applied will be described.

FIG. 23 illustrates a decoder-side motion vector refinement (DMVR) process as an embodiment to which the present invention may be applied.

Referring to FIG. 23, for convenience of description, the decoder will be mainly described. The present invention is not limited thereto, and the motion vector derivation method according to the embodiment of the present invention may be performed in the same manner in the encoder and the decoder.

The decoder checks whether the inter mode application condition is satisfied in the current block (S2301). For example, when the condition according to Equation 2 below is satisfied, the inter mode may be applied to the current block.

[Equation 2]

!merge_triangle_flag && !affine_flag && !merge_subblock_flag

Referring to Equation 2, if the triangle merge mode is not applied to the current block, the affine mode is not applied, and the sub block merge mode is not applied, the inter mode may be applied to the current block.

The decoder checks whether the DMVR application condition is satisfied for the current block (S2302). For example, if the condition according to Equation 3 below is satisfied, DMVR may be applied to the current block.

[Equation 3]

SPS_dmvr && merge_flag && predFlagL0 && predFlagL1 && TrueBi && H >= 8 && W * H >= 64 && !mmvd_flag

Referring to Equation 3, the decoder may apply DMVR to the current block if DMVR is enabled in the sequence parameter set (SPS), the merge mode is applied to the current block, bidirectional prediction is applied to the current block, the current picture is located between the two reference pictures based on the picture order count (POC) indicating the output order of pictures, the height of the current block is greater than or equal to 8, the number of pixels in the current block is greater than or equal to 64, and MMVD is not applied to the current block. When the DMVR application condition is satisfied, the decoder may parse dmvrFlag, which indicates whether DMVR is applied to the current block.
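As a minimal sketch, the condition of Equation 3 can be written as a single predicate. The `BlockInfo` container, its field names, and the helper `is_true_bi` are hypothetical names introduced for illustration only; the interpretation of TrueBi as "the current picture lies between the two reference pictures in POC order" follows the description above.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    # Hypothetical container mirroring the terms of Equation 3.
    sps_dmvr: bool
    merge_flag: bool
    pred_flag_l0: bool
    pred_flag_l1: bool
    true_bi: bool
    width: int
    height: int
    mmvd_flag: bool


def is_true_bi(poc_cur, poc_ref0, poc_ref1):
    """True when the current picture lies between the two reference
    pictures in output (POC) order."""
    return (poc_ref0 - poc_cur) * (poc_ref1 - poc_cur) < 0


def dmvr_condition(b):
    """Evaluate the DMVR application condition term by term."""
    return (b.sps_dmvr and b.merge_flag
            and b.pred_flag_l0 and b.pred_flag_l1
            and b.true_bi
            and b.height >= 8
            and b.width * b.height >= 64
            and not b.mmvd_flag)


block = BlockInfo(True, True, True, True, is_true_bi(8, 4, 12), 16, 8, False)
print(dmvr_condition(block))
```

Keeping each term as a separate field makes it straightforward to see which sub-condition disables DMVR for a given block.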

If the dmvrFlag value is 1, the decoder divides the current block into sub-blocks according to the size of the current block (S2303) and may apply the DMVR process in units of 16x16 sub-blocks (S2304 and S2305). That is, the decoder may divide the current block into sub-blocks of size 16x16 when the current block is larger than 16x16, and may otherwise derive motion information based on the current block size. If the dmvrFlag value is 0, the decoder derives the motion vector of the current block without applying the DMVR (S2306).
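The sub-block division of step S2303 can be sketched as follows. The function name and the (x, y, w, h) tuple layout are assumptions made for illustration; only the 16x16 processing unit is taken from the description.

```python
def split_into_subblocks(width, height, max_size=16):
    """Return (x, y, w, h) tuples covering a width x height block in
    raster order, with each sub-block at most max_size x max_size.

    A block no larger than max_size in both dimensions is returned as
    a single unit, matching the case where no division takes place.
    """
    subblocks = []
    for y in range(0, height, max_size):
        for x in range(0, width, max_size):
            w = min(max_size, width - x)
            h = min(max_size, height - y)
            subblocks.append((x, y, w, h))
    return subblocks


# A 32x16 block is processed as two 16x16 units.
print(split_into_subblocks(32, 16))
```

Each returned unit would then go through the refinement of steps S2304 and S2305 independently.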

The above-described sub-block division and motion information derivation process (i.e., steps S2303, S2304, and S2305) will be described in detail with reference to the drawings below.

FIG. 24 shows an example of a decoder side motion vector refinement (DMVR) process as an embodiment to which the present invention may be applied.

Referring to FIG. 24, a decoder is mainly described for convenience of description, but the present invention is not limited thereto, and the motion vector derivation method according to the embodiment of the present invention may be performed in the same manner in the encoder and the decoder.

1. When the decoder fetches a reference region or a reference picture buffer, it fetches, in consideration of memory bandwidth, an area of (W + 4) * (H + 4) when the size of the current block is W * H. The area of (W + 4) * (H + 4) represents the search range, and the reference region may be specified by the initial motion vector of the current block.

2. The motion information of the merge candidate may have 1/16 precision. In this case, the decoder may perform interpolation on sub-pixels using a bilinear interpolation filter to reduce complexity. In addition, in one embodiment, the decoder may perform the interpolation operation only within the region that has been previously fetched.

3. After interpolation, the decoder calculates the sum of absolute differences (SAD) at each point (or pixel) and compares the SAD values to derive (or select) the best position, that is, the position with the minimum SAD value.

4. The decoder may derive a motion vector difference value (dMV) representing the difference between the initial position and the position of the point with the minimum SAD value.

As described above, DMVR is applied to improve the accuracy of motion prediction in the decoder. When DMVR is applied, an interpolation process may be required if the target pixel is not an integer pixel, and applying the interpolation process may increase the hardware gate count, area, and power consumption.
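Steps 1, 3, and 4 of the process above can be sketched for the integer-position case as follows. This is a simplified illustration, not the procedure of FIG. 24 itself: it skips the interpolation of step 2, and it assumes that the SAD is measured against a given `target` block, which is a hypothetical stand-in because the description does not state what each candidate is compared with. All function names are likewise assumptions.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def crop(region, x, y, w, h):
    """Extract the w x h block whose upper-left pixel is (x, y)."""
    return [row[x:x + w] for row in region[y:y + h]]


def search_min_sad(region, target, w, h, search_range=2):
    """Search a fetched (w + 2*search_range) x (h + 2*search_range)
    region, centred on the initial position, for the offset with the
    minimum SAD. Returns ((dmv_x, dmv_y), best_sad).

    Ties keep the earlier candidate, and the zero offset is evaluated
    first, so the initial position wins when costs are equal.
    """
    centre = search_range
    best_dmv = (0, 0)
    best_cost = sad(crop(region, centre, centre, w, h), target)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            cost = sad(crop(region, centre + dx, centre + dy, w, h), target)
            if cost < best_cost:
                best_dmv, best_cost = (dx, dy), cost
    return best_dmv, best_cost


# 4x4 block, (W + 4) x (H + 4) = 8x8 fetched region with a simple gradient.
region = [[x + 10 * y for x in range(8)] for y in range(8)]
target = crop(region, 3, 1, 4, 4)  # true match is one pixel right, one up
print(search_min_sad(region, target, 4, 4))
```

With a search range of 2, the fetched region has exactly the (W + 4) * (H + 4) size mentioned in step 1, and the returned offset plays the role of dMV in step 4.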

Therefore, in order to solve this problem, the present invention proposes a DMVR process that does not use an interpolation filter.

In addition, an embodiment of the present invention proposes a method of performing a DMVR using a pre-fetched integer pixel region.

In addition, an embodiment of the present invention proposes a method for improving a motion vector based on the precision of the motion vector.

In general, in improving motion vectors, interpolation filters are applied to sub-pixels (i.e., fractional pixels). To apply an interpolation filter, the decoder must pre-fetch integer pixels before interpolation, regardless of whether they are used in the actual interpolation process. Accordingly, an embodiment of the present invention proposes a method of evaluating the pre-fetched pixels as additional candidate positions for motion vector improvement.

In the conventional refinement process, the motion estimation / compensation process is applied for the current pixel position. In other words, if the current pixel is a fractional pixel (or fractional-pel), the motion estimation / compensation process is applied to the fractional pixel. Therefore, there is a problem in that integer pixels must be pre-fetched from a memory (for example, a double data rate (DDR) memory) for interpolation filter application even though they are not themselves evaluated as candidates for improvement. Therefore, the embodiment of the present invention first proposes an improvement process that does not use an interpolation filter.

Referring to FIG. 25, a current prediction block predicted based on a motion vector of a fractional pixel is illustrated. In FIG. 25, it is assumed that a current block is a 4x4 block. In this case, the encoder / decoder may use integer pixels as shown in FIG. 25 for motion estimation / compensation.

A search area for motion estimation / compensation for the example of FIG. 25 is shown in FIG. 26A. In FIG. 26A, it is assumed that the search region for improving a motion vector is (W + 1) x (H + 1) based on the current block. However, the present invention is not limited thereto, and the search area to which the embodiment of the present invention is applied may be (W + N) x (H + N), where N is an integer value that may be set to, for example, 2, 3, or 4.

Referring to FIG. 26A, the search area may be composed of fractional pixels of the same precision, using the location of the currently predicted block. As an example, fractional pixels in the search region may be generated from integer pixels using interpolation filters. The integer pixels required for the improvement process in this case are shown in FIG. 26(b). Here, it is assumed that an 8-tap interpolation filter is applied, but the present invention is not limited thereto. For example, the encoder / decoder may perform interpolation using a bilinear interpolation filter.

That is, even when the search area is set to an area in which one pixel line is added to the current block, more integer pixels are required for interpolation when the search area is composed of fractional pixels, which may cause a memory burden on the decoder side.
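The memory burden mentioned above can be made concrete with a small calculation. The sketch below assumes a separable interpolation filter with `taps` taps, which needs taps - 1 additional integer rows and columns around the area being interpolated; the function names are hypothetical and the exact regions drawn in FIGS. 25 and 26 may differ.

```python
def fetch_footprint(w, h, taps):
    """Integer-pixel area (width, height) needed to interpolate a
    w x h area at a fractional position with a separable taps-tap
    filter. taps=1 corresponds to no interpolation."""
    return (w + taps - 1, h + taps - 1)


def pixel_count(size):
    return size[0] * size[1]


for name, taps in (("8-tap", 8), ("bilinear", 2), ("integer only", 1)):
    size = fetch_footprint(4, 4, taps)
    print(name, size, pixel_count(size))
```

For a 4x4 block this gives 121 integer pixels with an 8-tap filter, 25 with a bilinear filter, and 16 when only integer positions are evaluated.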

FIG. 27 is a diagram illustrating a search region of integer pixel positions for motion vector refinement according to an embodiment to which the present invention is applied. Referring to FIG. 27, it is assumed that the size of the current block is 4x4 and the search area is set to (W + 1) x (H + 1).

In an embodiment of the present invention, the encoder / decoder may perform motion vector improvement without interpolation as shown in FIG. 27 to reduce overhead (or memory burden) due to integer pixel interpolation. That is, the encoder / decoder may set an integer pixel position block of (W + 1) x (H + 1) as a search region and perform motion vector improvement.

For example, in the refinement process, the encoder / decoder may use a maximum of 9 integer pixel positions for motion estimation / compensation. The best location may be chosen from among these locations.

Thus, in the embodiment of the present invention, the surplus integer pixels shown in FIG. 26(b) are not required. Instead, integer pixels as shown in FIG. 27 may be fetched for motion vector improvement.

FIG. 28 is a diagram illustrating a search region for motion vector refinement and the corresponding fetch region according to an embodiment to which the present invention is applied.

Referring to FIG. 28, in an embodiment of the present invention, the pre-fetched integer pixels can be fully used for motion vector improvement. On the hardware side, calculating the SAD using data already fetched from DDR does not significantly increase complexity. As shown in FIGS. 25 and 26, when the motion vector points to a fractional pixel position, integer pixels must be fetched for interpolation.

That is, in an embodiment of the present invention, when the motion vector has fractional pixel precision, the encoder / decoder may predict all fractional pixel positions using the pre-fetched integer pixels, without additional memory fetches. In this case, the integer pixel range may be set in the same manner as in FIG. 26.

In one embodiment, the encoder / decoder may set the current fractional position, eight fractional pixel positions, and 64 integer pixel positions shown in FIGS. 28A and 28B as search regions. The best location may then be determined within the set search area.
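One reading that is consistent with the count of 64 integer pixel positions is sketched below. It assumes a 4x4 block whose pre-fetched region is 11x11 (4 + 8 - 1 in each direction for an 8-tap filter) and counts every upper-left position at which a 4x4 block fits entirely inside that region. This interpretation and the function name are assumptions; the figures themselves are not reproduced in this text.

```python
def integer_candidates(fetch_w, fetch_h, w, h):
    """Number of upper-left integer positions at which a w x h block
    lies entirely inside a fetch_w x fetch_h pre-fetched region."""
    return (fetch_w - w + 1) * (fetch_h - h + 1)


integer_positions = integer_candidates(11, 11, 4, 4)
# Current fractional position + eight fractional neighbours + integer positions.
total_candidates = 1 + 8 + integer_positions
print(integer_positions, total_candidates)
```

Under these assumptions, the search region would contain 64 integer candidates and 73 candidates in total, none of which requires an additional memory fetch.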

In FIG. 28, it is assumed that an 8-tap interpolation filter is applied, but the present invention is not limited thereto, and interpolation filters with various numbers of taps may be applied. For example, the encoder / decoder may perform interpolation using a bilinear interpolation filter.

In one embodiment of the invention, the encoder / decoder may apply a motion vector improvement process based on the current motion vector precision. In this case, the encoder / decoder may perform an interpolation process by applying independent conditions with respect to the horizontal direction and the vertical direction.

Specifically, the encoder / decoder may apply the proposed interpolation process using pre-fetched integer pixels in the horizontal direction when the condition of Equation 4 below is satisfied (i.e., xFrac is set to true), and may apply the proposed process in the vertical direction when the condition of Equation 5 is satisfied (i.e., yFrac is set to true).

[Equation 4]

For example, if the motion vector for the vertical direction is not in fractional pixel units, the interpolation filter is not required in the vertical direction, as shown in FIG. 29A. Similarly, if the motion vector for the horizontal direction is not in fractional pixel units, the interpolation filter may not be used for the horizontal direction, as shown in FIG. 29(b). The integer pixel block for interpolation in this case will be described with reference to the drawings below.

FIG. 30 is a diagram illustrating a search region for improving a motion vector according to an embodiment of the present invention.

FIG. 30 shows the integer pixel block that must be fetched when an interpolation process is applied only in the vertical or horizontal direction, as in the example of FIG. 29. In other words, the optimized integer pixel block according to the method proposed in FIG. 29 may be fetched for interpolation as shown in FIG. 30.

Referring to FIG. 30 (a), the encoder / decoder may apply an interpolation process only in the vertical direction according to Equations 4 and 5, in which case additional integer pixels are required only in the vertical direction. In this case, it is assumed that an 8-tap interpolation filter is applied, but the present invention is not limited thereto, and interpolation filters with various numbers of taps may be applied. For example, the encoder / decoder may perform interpolation using a bilinear interpolation filter.
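The direction-dependent behaviour described with reference to FIGS. 29 and 30 can be sketched as follows. Equations 4 and 5 are not reproduced in this text, so the sketch assumes that xFrac and yFrac are true when the corresponding motion vector component has a non-zero fractional part, with the motion vector stored in 1/16-pixel units. The function names and these definitions are assumptions, not the equations themselves.

```python
def frac_flags(mv, frac_bits=4):
    """Return (x_frac, y_frac): whether each component of mv has a
    non-zero fractional part. mv is in units of 1 / 2**frac_bits pixel."""
    mask = (1 << frac_bits) - 1
    # Python's & on negative ints follows two's complement, so the low
    # bits are the fractional part for negative components as well.
    return (mv[0] & mask) != 0, (mv[1] & mask) != 0


def directional_footprint(w, h, mv, taps=8, frac_bits=4):
    """Integer-pixel area to fetch when interpolation is applied only
    in the directions whose motion vector component is fractional."""
    x_frac, y_frac = frac_flags(mv, frac_bits)
    extra = taps - 1
    return (w + (extra if x_frac else 0), h + (extra if y_frac else 0))


# Horizontal component is an integer (16/16), vertical is fractional (5/16):
# only the vertical direction needs extra rows.
print(directional_footprint(4, 4, (16, 5)))
```

Checking the two directions independently means that a motion vector with only one fractional component needs a smaller fetched block than one that is fractional in both directions.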

According to an embodiment of the present invention, the proposed improvement process can be applied using only pre-fetched integer pixels, improving coding performance without additional fetching from DDR.

Embodiments of the present invention described above may be implemented independently, or one or more embodiments may be implemented in combination.

Referring to FIG. 31, for convenience of description, the decoder will be mainly described. The present invention is not limited thereto, and the inter prediction block generation method according to the embodiment of the present invention may be performed in the same manner in the encoder and the decoder.

The decoder derives an initial motion vector of the current block based on the motion information of the spatial neighboring block or the temporal neighboring block of the current block (S3101).

According to an embodiment, when bidirectional prediction is applied to the current block and the current picture is located between two reference pictures based on a picture order count (POC) indicating the output order of pictures, the DMVR process according to the embodiment of the present invention may be applied. In addition, in one embodiment, when bidirectional prediction is applied to the current block, the current picture is located between two reference pictures based on the POC, and the distances between the current picture and each of the two reference pictures are the same, the DMVR process according to the embodiment of the present invention may be applied.

The decoder derives a motion vector difference value representing a difference value between an initial position specified by the initial motion vector and an improved position within a preset search range (S3102).

In an embodiment, the improved position may be determined as a position that minimizes the cost value of the block that includes the improved position as the upper left pixel position.

As described above with reference to FIG. 18, if the initial motion vector has fractional pixel precision, the decoder may round the initial motion vector to integer pixel precision, search for an integer pixel position that minimizes the cost value within the search range with integer pixel precision, and then derive the improved position by searching, based on the found integer pixel position, for a fractional pixel position that minimizes the cost value within the search range with fractional pixel precision.

In addition, as described above with reference to FIG. 19, when the initial motion vector has fractional pixel precision, the decoder may round the initial motion vector to integer pixel precision and search for an integer pixel position that minimizes the cost value within the search range with integer pixel precision. In this case, the improved position may be determined as the fractional pixel position corresponding to the initial position based on the searched integer pixel position.

In addition, as described above with reference to FIGS. 20 to 22, when the initial motion vector has fractional pixel precision, the decoder may round the initial motion vector to integer pixel precision and derive the improved position by searching for an integer pixel position that minimizes the cost value within the search range with integer pixel precision. In one embodiment, the motion vector difference value may be derived as the difference between the position specified by the rounded initial motion vector and the improved position.

The decoder derives a refined motion vector of the current block by adding the motion vector difference value to the initial motion vector (S3103).

The decoder generates a predictive block of the current block by using the improved motion vector (S3104).

FIG. 32 is a diagram illustrating an inter prediction apparatus according to an embodiment to which the present invention is applied. In FIG. 32, the inter prediction unit is illustrated as one block for convenience of description, but the inter prediction unit may be implemented in a configuration included in the encoder and / or the decoder.

Referring to FIG. 32, the inter prediction unit implements the functions, processes, and / or methods proposed in FIGS. 8 to 31. In detail, the inter prediction unit may include an initial motion vector derivation unit 3201, a motion vector difference value derivation unit 3202, an improved motion vector derivation unit 3203, and a prediction block generator 3204.

The initial motion vector derivation unit 3201 derives an initial motion vector of the current block based on the motion information of the spatial neighboring block or the temporal neighboring block of the current block.

According to an embodiment, when bidirectional prediction is applied to the current block and the current picture is located between two reference pictures based on a picture order count (POC) indicating the output order of pictures, the DMVR process according to the embodiment of the present invention may be applied. In addition, in one embodiment, when bidirectional prediction is applied to the current block, the current picture is located between two reference pictures based on the POC, and the distances between the current picture and each of the two reference pictures are the same, the DMVR process according to the embodiment of the present invention may be applied.

The motion vector difference value derivation unit 3202 derives a motion vector difference value indicating a difference between an initial position specified by the initial motion vector and an improved position within a preset search range.

In an embodiment, the improved position may be determined as a position that minimizes the cost value of the block that includes the improved position as the upper left pixel position.

As described above with reference to FIG. 18, when the initial motion vector has fractional pixel precision, the motion vector difference value derivation unit 3202 may round the initial motion vector to integer pixel precision, search for an integer pixel position that minimizes the cost value within the search range with integer pixel precision, and then derive the improved position by searching, based on the found integer pixel position, for a fractional pixel position that minimizes the cost value within the search range with fractional pixel precision.

In addition, as described above with reference to FIG. 19, when the initial motion vector has fractional pixel precision, the motion vector difference value derivation unit 3202 may round the initial motion vector to integer pixel precision and search for an integer pixel position that minimizes the cost value within the search range with integer pixel precision. In this case, the improved position may be determined as the fractional pixel position corresponding to the initial position based on the searched integer pixel position.

In addition, as described above with reference to FIGS. 20 to 22, when the initial motion vector has fractional pixel precision, the motion vector difference value derivation unit 3202 may round the initial motion vector to integer pixel precision and derive the improved position by searching for an integer pixel position that minimizes the cost value within the search range with integer pixel precision. In one embodiment, the motion vector difference value may be derived as the difference between the position specified by the rounded initial motion vector and the improved position.

The improved motion vector derivation unit 3203 derives a refined motion vector of the current block by adding the motion vector difference value to the initial motion vector.

The predictive block generator 3204 generates a predictive block of the current block by using the improved motion vector.

FIG. 33 shows a video coding system to which the present invention is applied.

The video coding system can include a source device and a receiving device. The source device may transmit the encoded video / image information or data to a receiving device through a digital storage medium or a network in the form of a file or streaming.

The source device may include a video source, an encoding apparatus, and a transmitter. The receiving device may include a receiver, a decoding apparatus, and a renderer. The encoding apparatus may be called a video / image encoding apparatus, and the decoding apparatus may be called a video / image decoding apparatus. The transmitter may be included in the encoding apparatus, and the receiver may be included in the decoding apparatus. The renderer may include a display unit, and the display unit may be configured as a separate device or an external component.

The video source may acquire the video / image through a process of capturing, synthesizing, or generating the video / image. The video source may comprise a video / image capture device and / or a video / image generation device. The video / image capture device may include, for example, one or more cameras, a video / image archive including previously captured video / images, and the like. The video / image generation device may include, for example, a computer, a tablet, a smartphone, and the like, and may (electronically) generate video / images. For example, a virtual video / image may be generated through a computer, and in this case, the video / image capturing process may be replaced by a process of generating related data.

The encoding device may encode the input video / picture. The encoding apparatus may perform a series of procedures such as prediction, transform, and quantization for compression and coding efficiency. The encoded data (encoded video / image information) may be output in the form of a bitstream.

The transmitter may transmit the encoded video / image information or data output in the form of a bitstream to the receiver of the receiving device through a digital storage medium or a network in the form of a file or streaming. The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, and the like. The transmitter may include an element for generating a media file through a predetermined file format, and may include an element for transmission through a broadcast / communication network. The receiver may extract the bitstream and transmit the extracted bitstream to the decoding apparatus. The decoding apparatus may decode the video / image by performing a series of procedures such as inverse quantization, inverse transformation, and prediction corresponding to the operation of the encoding apparatus.

The renderer may render the decoded video / image. The rendered video / image may be displayed through the display unit.

FIG. 34 is a diagram illustrating the structure of a content streaming system according to an embodiment to which the present invention is applied.

Referring to FIG. 34, a content streaming system to which the present invention is applied may largely include an encoding server, a streaming server, a web server, a media storage, a user device, and a multimedia input device.

The encoding server compresses content input from multimedia input devices such as a smart phone, a camera, a camcorder, etc. into digital data to generate a bitstream and transmit the bitstream to the streaming server. As another example, when multimedia input devices such as smart phones, cameras, camcorders, etc. directly generate a bitstream, the encoding server may be omitted.

The bitstream may be generated by an encoding method or a bitstream generation method to which the present invention is applied, and the streaming server may temporarily store the bitstream in the process of transmitting or receiving the bitstream.

The streaming server transmits multimedia data to the user device based on the user's request made through the web server, and the web server serves as a medium for informing the user of what services are available. When a user requests a desired service from the web server, the web server delivers the request to the streaming server, and the streaming server transmits multimedia data to the user. In this case, the content streaming system may include a separate control server, in which case the control server serves to control commands / responses between the devices in the content streaming system.

The streaming server may receive content from a media storage and / or an encoding server. For example, when the content is received from the encoding server, the content may be received in real time. In this case, in order to provide a smooth streaming service, the streaming server may store the bitstream for a predetermined time.

Examples of the user device include a cell phone, a smart phone, a laptop computer, a terminal for digital broadcasting, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, a slate PC, a tablet PC, an ultrabook, a wearable device (e.g., a smartwatch, smart glasses, or a head mounted display), a digital TV, a desktop computer, digital signage, and the like.

Each server in the content streaming system may be operated as a distributed server, in which case data received from each server may be distributed.

As described above, the embodiments described herein may be implemented and performed on a processor, microprocessor, controller, or chip. For example, the functional units shown in each drawing may be implemented and performed on a computer, processor, microprocessor, controller, or chip.

In addition, the decoder and encoder to which the present invention is applied may be included in a multimedia broadcasting transmitting and receiving device, a mobile communication terminal, a home cinema video device, a digital cinema video device, a surveillance camera, a video chat device, a real time communication device such as video communication, a mobile streaming device, a storage medium, a camcorder, a video on demand (VoD) service providing device, an OTT (over the top) video device, an Internet streaming service providing device, a three-dimensional (3D) video device, a video telephony device, a medical video device, and the like, and may be used to process video signals or data signals. For example, the OTT video device may include a game console, a Blu-ray player, an Internet access TV, a home theater system, a smartphone, a tablet PC, a digital video recorder (DVR), and the like.

In addition, the processing method to which the present invention is applied can be produced in the form of a program executed by a computer, and can be stored in a computer-readable recording medium. Multimedia data having a data structure according to the present invention can also be stored in a computer-readable recording medium. The computer-readable recording medium includes all kinds of storage devices and distributed storage devices in which computer-readable data is stored. The computer-readable recording medium may include, for example, a Blu-ray Disc (BD), a Universal Serial Bus (USB), a ROM, a PROM, an EPROM, an EEPROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage device. In addition, the computer-readable recording medium includes media implemented in the form of a carrier wave (e.g., transmission over the Internet). In addition, the bitstream generated by the encoding method may be stored in a computer-readable recording medium or transmitted through a wired or wireless communication network.

In addition, an embodiment of the present invention may be implemented as a computer program product by program code, which may be performed on a computer by an embodiment of the present invention. The program code may be stored on a carrier readable by a computer.

The embodiments described above are combinations of the components and features of the present invention in a predetermined form. Each component or feature is to be considered optional unless stated otherwise. Each component or feature may be implemented in a form not combined with other components or features. It is also possible to combine some of the components and / or features to form an embodiment of the invention. The order of the operations described in the embodiments of the present invention may be changed. Some components or features of one embodiment may be included in another embodiment or may be replaced with corresponding components or features of another embodiment. It is obvious that claims that do not have an explicit citation relationship in the claims may be combined to form an embodiment, or may be included as new claims by amendment after filing.

Embodiments according to the present invention may be implemented by various means, for example, hardware, firmware, software, or a combination thereof. For implementation in hardware, an embodiment of the present invention may be implemented by one or more ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), processors, controllers, microcontrollers, microprocessors, and the like.

In the case of implementation by firmware or software, an embodiment of the present invention may be implemented in the form of a module, procedure, function, etc. that performs the functions or operations described above. The software code may be stored in memory and driven by the processor. The memory may be located inside or outside the processor, and may exchange data with the processor by various known means.

It will be apparent to those skilled in the art that the present invention may be embodied in other specific forms without departing from the essential features of the present invention. Therefore, the above detailed description should not be construed as limiting in all respects, but should be considered as illustrative. The scope of the invention should be determined by reasonable interpretation of the appended claims, and all changes within the equivalent scope of the invention are included in the scope of the invention.

Industrial Applicability

As mentioned above, preferred embodiments of the present invention are disclosed for purposes of illustration, and those skilled in the art will be able to improve, change, replace, or add various other embodiments within the spirit and technical scope of the present invention disclosed in the appended claims.

Claims

[Scope of Claims]

[Claim 1]

A method of decoding an image based on an inter prediction mode, the method comprising:

deriving an initial motion vector of a current block based on motion information of a spatial neighboring block or a temporal neighboring block of the current block;

deriving a motion vector difference value representing a difference between an initial position specified by the initial motion vector and a refined position within a preset search range;

deriving a refined motion vector of the current block by adding the motion vector difference value to the initial motion vector; and

generating a prediction block of the current block by using the refined motion vector.
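As a non-limiting illustration, the refinement steps of claim 1 may be sketched as follows. The sketch assumes motion vectors expressed in integer-pixel units, a square search window, and a caller-supplied cost function; the names `refine_motion_vector`, `cost`, and `search_range` are hypothetical and do not appear in the specification.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # (x, y) displacement; integer-pixel units are an assumption


def refine_motion_vector(
    initial_mv: MV,
    cost: Callable[[MV], float],  # hypothetical cost of predicting with a candidate MV
    search_range: int = 2,        # hypothetical preset search range, in pixels
) -> Tuple[MV, MV]:
    """Return (motion vector difference, refined motion vector)."""
    best_mvd = (0, 0)
    best_cost = cost(initial_mv)
    # Visit every position inside the square window around the initial position.
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            candidate = (initial_mv[0] + dx, initial_mv[1] + dy)
            c = cost(candidate)
            if c < best_cost:
                best_cost, best_mvd = c, (dx, dy)
    # Refined MV = initial MV + motion vector difference value.
    refined_mv = (initial_mv[0] + best_mvd[0], initial_mv[1] + best_mvd[1])
    return best_mvd, refined_mv


def toy_cost(mv: MV) -> float:
    """Toy cost: squared distance to a made-up best match at (5, -3)."""
    return (mv[0] - 5) ** 2 + (mv[1] + 3) ** 2


mvd, refined = refine_motion_vector((4, -2), toy_cost)
print(mvd, refined)  # (1, -1) (5, -3)
```

The prediction block would then be generated from the reference picture at the displacement given by `refined`; that step depends on the interpolation filter and is omitted here.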

[Claim 2]

The method of claim 1,

wherein the refined position is determined as a position that minimizes a cost value of a block including the refined position as a top-left pixel position.
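As a non-limiting illustration of claim 2, the cost of a candidate position may be evaluated over the block whose top-left sample lies at that position. The claim does not fix the cost measure; the sketch below assumes a sum of absolute differences (SAD) against a hypothetical target block, and all function names are illustrative only.

```python
from typing import List, Optional, Tuple

Picture = List[List[int]]  # rows of sample values


def sad_at(ref: Picture, target: Picture, top_left: Tuple[int, int]) -> int:
    """SAD between `target` and the block of `ref` whose top-left sample is `top_left`."""
    x0, y0 = top_left
    return sum(
        abs(ref[y0 + j][x0 + i] - target[j][i])
        for j in range(len(target))
        for i in range(len(target[0]))
    )


def min_cost_position(
    ref: Picture, target: Picture, initial: Tuple[int, int], search_range: int
) -> Optional[Tuple[int, int]]:
    """Return the top-left position within the search range that minimizes the SAD."""
    h, w = len(target), len(target[0])
    best, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = initial[0] + dx, initial[1] + dy
            # Skip candidates whose block would fall outside the picture.
            if x < 0 or y < 0 or x + w > len(ref[0]) or y + h > len(ref):
                continue
            c = sad_at(ref, target, (x, y))
            if best_cost is None or c < best_cost:
                best, best_cost = (x, y), c
    return best


# Toy 6x6 reference picture; the target equals the 2x2 block at x=3, y=2.
ref = [[10 * y + x for x in range(6)] for y in range(6)]
target = [[23, 24], [33, 34]]
position = min_cost_position(ref, target, initial=(2, 2), search_range=1)
print(position)  # (3, 2)
```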

[Claim 3]

The method of claim 1,

wherein deriving the motion vector difference value comprises:

if the initial motion vector has fractional pixel precision, rounding the initial motion vector to integer pixel precision;

searching for an integer pixel position that minimizes a cost value within the search range with integer pixel precision; and

deriving the refined position, based on the searched integer pixel position, by searching for an integer pixel position having a minimum cost value within the search range with fractional pixel precision.
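As a non-limiting illustration, the two-stage search of claim 3 may be sketched as follows. The sketch assumes quarter-pixel motion vector units, reads the final step as a fractional-precision search within less than one pixel of the integer result, and takes the cost function from the caller; these choices and all names are assumptions, not values taken from the specification.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]   # quarter-pel units (assumption): 4 units = 1 integer pixel
UNITS_PER_PEL = 4


def round_to_integer_pel(mv: MV) -> MV:
    """Round each component to the nearest multiple of UNITS_PER_PEL (halves round up)."""
    half = UNITS_PER_PEL // 2
    return (
        ((mv[0] + half) // UNITS_PER_PEL) * UNITS_PER_PEL,
        ((mv[1] + half) // UNITS_PER_PEL) * UNITS_PER_PEL,
    )


def grid_search(center: MV, cost: Callable[[MV], float], step: int, radius: int) -> MV:
    """Exhaustive search on a grid of spacing `step` within `radius` units of `center`."""
    best, best_cost = center, cost(center)
    for dy in range(-radius, radius + 1, step):
        for dx in range(-radius, radius + 1, step):
            candidate = (center[0] + dx, center[1] + dy)
            c = cost(candidate)
            if c < best_cost:
                best, best_cost = candidate, c
    return best


def refine_two_stage(initial_mv: MV, cost: Callable[[MV], float], range_pel: int = 2):
    """Return (motion vector difference, refined MV), both in quarter-pel units."""
    start = round_to_integer_pel(initial_mv)
    # Stage 1: integer-pel positions only.
    int_best = grid_search(start, cost, UNITS_PER_PEL, range_pel * UNITS_PER_PEL)
    # Stage 2: fractional positions around the integer result (radius is an assumption).
    refined = grid_search(int_best, cost, 1, UNITS_PER_PEL - 1)
    mvd = (refined[0] - initial_mv[0], refined[1] - initial_mv[1])
    return mvd, refined


def toy_cost(mv: MV) -> float:
    """Toy cost: squared distance to a made-up best match at (9, -7) quarter-pel."""
    return (mv[0] - 9) ** 2 + (mv[1] + 7) ** 2


mvd, refined = refine_two_stage((5, -3), toy_cost)
print(mvd, refined)  # (4, -4) (9, -7)
```

In a real decoder the fractional stage would require interpolated reference samples; the toy cost stands in for that evaluation.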

[Claim 4]

The method of claim 1,

wherein deriving the motion vector difference value comprises:

if the initial motion vector has fractional pixel precision, rounding the initial motion vector to integer pixel precision; and

searching for an integer pixel position that minimizes a cost value within the search range with integer pixel precision,

wherein the refined position is determined, based on the searched integer pixel position, as the fractional pixel position corresponding to the initial position.
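As a non-limiting illustration of claim 4, one possible reading is that the fractional offset removed by rounding is restored after the integer-precision search. The sketch below follows that reading; the quarter-pixel units, the search window, and all names are assumptions made for illustration.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]   # quarter-pel units (assumption): 4 units = 1 integer pixel
UNITS_PER_PEL = 4


def round_to_integer_pel(mv: MV) -> MV:
    """Round each component to the nearest multiple of UNITS_PER_PEL (halves round up)."""
    half = UNITS_PER_PEL // 2
    return (
        ((mv[0] + half) // UNITS_PER_PEL) * UNITS_PER_PEL,
        ((mv[1] + half) // UNITS_PER_PEL) * UNITS_PER_PEL,
    )


def refine_keep_fraction(initial_mv: MV, cost: Callable[[MV], float], range_pel: int = 2):
    """Integer-pel search, then restore the fractional offset of the initial MV."""
    rounded = round_to_integer_pel(initial_mv)
    # Fractional offset that rounding removed from the initial position.
    frac = (initial_mv[0] - rounded[0], initial_mv[1] - rounded[1])
    best, best_cost = rounded, cost(rounded)
    radius = range_pel * UNITS_PER_PEL
    # The cost is evaluated at integer-pel positions only.
    for dy in range(-radius, radius + 1, UNITS_PER_PEL):
        for dx in range(-radius, radius + 1, UNITS_PER_PEL):
            candidate = (rounded[0] + dx, rounded[1] + dy)
            c = cost(candidate)
            if c < best_cost:
                best, best_cost = candidate, c
    # Refined position: searched integer position shifted by the original fraction.
    refined = (best[0] + frac[0], best[1] + frac[1])
    mvd = (refined[0] - initial_mv[0], refined[1] - initial_mv[1])
    return mvd, refined


def toy_cost(mv: MV) -> float:
    """Toy cost: squared distance to a made-up best match at (13, -1) quarter-pel."""
    return (mv[0] - 13) ** 2 + (mv[1] + 1) ** 2


mvd, refined = refine_keep_fraction((5, -3), toy_cost)
print(mvd, refined)  # (8, 4) (13, 1)
```

Under this reading the motion vector difference is always a whole number of pixels, so the refined motion vector keeps the fractional phase of the initial motion vector.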

[Claim 5]

The method of claim 1,

wherein deriving the motion vector difference value comprises:

if the initial motion vector has fractional pixel precision, rounding the initial motion vector to integer pixel precision; and

deriving the refined position by searching for an integer pixel position that minimizes a cost value within the search range with integer pixel precision,

wherein the motion vector difference value is derived from the difference between a position specified by the rounded initial motion vector and the refined position.

[Claim 6]

An apparatus for decoding an image based on an inter prediction mode, the apparatus comprising:

an initial motion vector derivation unit for deriving an initial motion vector of a current block based on motion information of a spatial neighboring block or a temporal neighboring block of the current block;

a motion vector difference value derivation unit for deriving a motion vector difference value representing a difference between an initial position specified by the initial motion vector and a refined position within a preset search range;

a refined motion vector derivation unit for deriving a refined motion vector of the current block by adding the motion vector difference value to the initial motion vector; and

a prediction block generator for generating a prediction block of the current block by using the refined motion vector.

[Claim 7]

The apparatus of claim 6,

wherein the refined position is determined as a position that minimizes a cost value of a block including the refined position as a top-left pixel position.

[Claim 8]

The apparatus of claim 6,

wherein the motion vector difference value derivation unit:

if the initial motion vector has fractional pixel precision, rounds the initial motion vector to integer pixel precision,

searches for an integer pixel position having a minimum cost value within the search range with integer pixel precision, and

derives the refined position, based on the searched integer pixel position, by searching for an integer pixel position having a minimum cost value within the search range with fractional pixel precision.

[Claim 9]

The apparatus of claim 6,

wherein the motion vector difference value derivation unit determines the refined position, based on the searched integer pixel position, as the fractional pixel position corresponding to the initial position.

[Claim 10]

The apparatus of claim 6,

wherein the motion vector difference value derivation unit:

if the initial motion vector has fractional pixel precision, rounds the initial motion vector to integer pixel precision, and

derives the refined position by searching for an integer pixel position that minimizes a cost value within the search range with integer pixel precision,

and wherein the motion vector difference value is derived from the difference between a position specified by the rounded initial motion vector and the refined position.