CN116866594A - Method and apparatus for encoding/decoding image and recording medium storing bit stream - Google Patents

Method and apparatus for encoding/decoding image and recording medium storing bit stream

Info

Publication number
CN116866594A
Authority
CN
China
Prior art keywords
block
prediction
motion information
current
current block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311025877.1A
Other languages
Chinese (zh)
Inventor
赵承眩
林成昶
姜晶媛
高玄硕
李镇浩
李河贤
全东山
金晖容
崔振秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electronics and Telecommunications Research Institute ETRI
Original Assignee
Electronics and Telecommunications Research Institute ETRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electronics and Telecommunications Research Institute ETRI
Publication of CN116866594A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/109 Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/124 Quantisation
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation, involving filtering within a prediction loop
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • H04N19/96 Tree coding, e.g. quad-tree coding

Abstract

The present application relates to a method and apparatus for encoding/decoding an image and a recording medium storing a bitstream. The method for decoding an image may comprise the steps of: generating a first prediction block of the current block using motion information of the current block; determining, among the motion information of neighboring sub-blocks, at least one piece of motion information that can be used to generate a second prediction block of the current sub-block; generating at least one second prediction block of the current sub-block using the determined motion information; and generating a final prediction block based on a weighted sum of the first prediction block of the current block and the at least one second prediction block of the current sub-block.

Description

Method and apparatus for encoding/decoding image and recording medium storing bit stream
The present application is a divisional application of patent application No. 201780073517.5, entitled "Method and apparatus for encoding/decoding an image and recording medium storing a bitstream", filed on November 28, 2017.
Technical Field
The present application relates to a method and apparatus for encoding/decoding an image and a recording medium storing a bitstream. More particularly, the present application relates to a method and apparatus for encoding/decoding an image using overlapped block motion compensation.
Background
Recently, demand for high-resolution, high-quality images, such as high-definition (HD) images and ultra-high-definition (UHD) images, has been growing in various application fields. However, higher-resolution, higher-quality image data involves a larger amount of data than conventional image data. Accordingly, when image data is transmitted over a medium such as a conventional wired or wireless broadband network, or stored on a conventional storage medium, transmission and storage costs increase. Efficient image encoding/decoding techniques are therefore required to cope with the increasing resolution and quality of image data.
Image compression technology includes various techniques, such as: an inter prediction technique that predicts pixel values included in a current picture from a previous or subsequent picture of the current picture; an intra prediction technique that predicts pixel values included in a current picture by using pixel information within the current picture; an entropy coding technique that assigns short codes to frequently occurring values and long codes to rarely occurring values; and so on. By using such image compression techniques, image data can be effectively compressed, and the compressed image data can then be transmitted or stored.
The conventional image encoding/decoding method and apparatus have a disadvantage in that computational complexity increases when calculating the weighted sum for overlapped block motion compensation and when deriving the motion information of neighboring blocks.
Disclosure of Invention
Technical problem
Accordingly, the present invention has been made keeping in mind the above problems occurring in the prior art, and an object of the present invention is to provide a method and apparatus for performing overlapped block motion compensation with reduced computational complexity in calculating the weighted sum for overlapped block motion compensation and in deriving the motion information of neighboring blocks.
Solution scheme
To achieve the above object, the present invention provides a method for decoding an image, the method comprising: generating a first prediction block of the current block using motion information of the current block; determining motion information that can be used to generate a second prediction block among motion information of at least one neighboring sub-block of the current sub-block; generating at least one second prediction block of the current sub-block using the determined motion information; and generating a final prediction block based on a weighted sum of the first prediction block of the current block and the at least one second prediction block of the current sub-block.
In the image decoding method, in the determining of the motion information that can be used to generate the second prediction block, the motion information that can be used to generate the second prediction block may be determined based on at least one of a size and a direction of a motion vector of a neighboring sub-block of the current sub-block.
In the image decoding method, in the step of determining the motion information that can be used to generate the second prediction block, the motion information that can be used to generate the second prediction block may be determined based on a picture order count (POC) of a reference picture of the neighboring sub-block and a POC of a reference picture of the current block.
In the image decoding method, in the step of determining the motion information that can be used to generate the second prediction block, the motion information of the neighboring sub-block may be determined as motion information that can be used to generate the second prediction block only when the POC of the reference picture of the neighboring sub-block is equal to the POC of the reference picture of the current block.
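For illustration only, the condition above can be expressed as a small check; the function name and data layout below are assumptions made for this sketch, not details taken from the disclosure.

```python
def usable_for_second_prediction(neighbor_ref_poc: int, current_ref_poc: int) -> bool:
    """Motion information of a neighboring sub-block is treated as usable for
    generating the second prediction block only when the picture order count (POC)
    of its reference picture equals the POC of the current block's reference picture."""
    return neighbor_ref_poc == current_ref_poc

print(usable_for_second_prediction(8, 8))   # True: the motion information may be used
print(usable_for_second_prediction(8, 16))  # False: the motion information is skipped
```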
In the image decoding method, the current sub-block may have a square shape or a rectangular shape.
In the image decoding method, in the generating of the at least one second prediction block, the at least one second prediction block may be generated using motion information of at least one neighboring sub-block of the current sub-block only when the current block has neither the motion vector derivation mode nor the affine motion compensation mode.
In the image decoding method, in the step of generating the final prediction block, when the current sub-block is included in the boundary region of the current block, the final prediction block is generated by obtaining a weighted sum of each sample point in a partial row or partial column of the first prediction block adjacent to the boundary and each sample point in a partial row or partial column of the second prediction block adjacent to the boundary.
In the image decoding method, the sample points in the partial row or the partial column of the first prediction block adjacent to the boundary and the sample points in the partial row or the partial column of the second prediction block adjacent to the boundary may be determined based on at least one of a block size of the current sub-block, a size and a direction of a motion vector of the current sub-block, an inter prediction indicator of the current block, and a POC of a reference picture of the current block.
In the image decoding method, in the step of generating the final prediction block, a weighted sum of the first prediction block and the second prediction block may be obtained by applying different weight factors to samples in the first prediction block and the second prediction block according to at least one of a size and a direction of a motion vector of the current sub-block.
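To make the weighted-sum step above concrete, a minimal Python sketch of overlapped block motion compensation for one boundary sub-block follows. The integer-pel motion compensation helper, the 4x4 sub-block size, and the per-row/column weight factors (3/4, 7/8, 15/16, 31/32) are illustrative assumptions and are not prescribed by this disclosure.

```python
import numpy as np

def motion_compensate(ref, y, x, mv, h, w):
    """Illustrative integer-pel motion compensation: copy an h x w patch from the
    reference picture displaced by mv = (dy, dx). No boundary handling is done."""
    dy, dx = mv
    return ref[y + dy:y + dy + h, x + dx:x + dx + w].astype(np.float64)

def obmc_sub_block(ref, y, x, size, cur_mv, neighbor_mvs,
                   weights=(0.75, 0.875, 0.9375, 0.96875)):
    """Sketch of overlapped block motion compensation for one boundary sub-block.

    cur_mv       : motion vector of the current block (gives the first prediction block)
    neighbor_mvs : maps a direction ('above' or 'left') to the motion vector of a
                   neighboring sub-block whose motion information was deemed usable
    weights      : weight factors applied to the first prediction block in the rows
                   or columns adjacent to the boundary (illustrative values)
    """
    first = motion_compensate(ref, y, x, cur_mv, size, size)   # first prediction block
    final = first.copy()
    for direction, mv in neighbor_mvs.items():
        second = motion_compensate(ref, y, x, mv, size, size)  # second prediction block
        for i, w in enumerate(weights):                        # partial rows/columns only
            if direction == 'above':
                final[i, :] = w * final[i, :] + (1.0 - w) * second[i, :]
            else:  # 'left'
                final[:, i] = w * final[:, i] + (1.0 - w) * second[:, i]
    return final

# Usage: a synthetic reference picture and two usable neighboring motion vectors.
ref = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
out = obmc_sub_block(ref, y=16, x=16, size=4, cur_mv=(0, 0),
                     neighbor_mvs={'above': (0, 1), 'left': (1, 0)})
print(out.shape)  # (4, 4)
```

Weight factors closer to 1 keep the final prediction close to the first prediction block, and rows or columns farther from the boundary are blended less strongly, which matches the intent of applying the weighted sum only to the partial rows or columns adjacent to the boundary.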
The present invention provides a method for encoding an image, the method comprising: generating a first prediction block of the current block using motion information of the current block; determining motion information that can be used to generate a second prediction block among motion information of at least one neighboring sub-block of the current sub-block; generating at least one second prediction block of the current sub-block using the determined motion information; a final prediction block is generated based on a weighted sum of the first prediction block of the current block and the at least one second prediction block of the current sub-block.
In the image encoding method, in the determining of the motion information that can be used to generate the second prediction block, the motion information that can be used to generate the second prediction block may be determined based on at least one of a size and a direction of the motion vector of the neighboring sub-block.
In the image encoding method, in the step of determining the motion information that can be used to generate the second prediction block, the motion information that can be used to generate the second prediction block may be determined based on the POC of the reference picture of the neighboring sub-block and the POC of the reference picture of the current block.
In the image encoding method, in the determining of the motion information that can be used to generate the second prediction block, the motion information of the neighboring sub-block may be determined as the motion information that can be used to generate the second prediction block only when the POC of the reference picture of the neighboring sub-block is equal to the POC of the reference picture of the current block.
In the image encoding method, the current sub-block may have a square shape or a rectangular shape.
In the image encoding method, in the generating of the at least one second prediction block, the at least one second prediction block may be generated using motion information of the at least one neighboring sub-block only when the current block has neither the motion vector derivation mode nor the affine motion compensation mode.
In the image encoding method, in the generating of the final prediction block, when the current sub-block is included in the boundary region of the current block, the final prediction block may be generated based on a weighted sum of samples in a partial row or a partial column of the first prediction block adjacent to the boundary and samples in a partial row or a partial column of the second prediction block adjacent to the boundary.
In the image encoding method, the sample points in the partial row or the partial column of the first prediction block adjacent to the boundary and the sample points in the partial row or the partial column of the second prediction block adjacent to the boundary may be determined based on at least one of a block size of the current sub-block, a size and a direction of a motion vector of the current sub-block, an inter prediction indicator of the current block, and a POC of a reference picture of the current block.
In the image encoding method, in the step of generating the final prediction block, the weighted sum may be obtained by applying different weight values to samples in the first prediction block and the second prediction block according to at least one of the size and direction of the motion vector of the current sub-block.
The present invention provides a recording medium storing a bitstream generated by an image encoding method, the image encoding method comprising: generating a first prediction block of the current block using motion information of the current block; determining motion information that can be used to generate a second prediction block among motion information of at least one neighboring sub-block of the current sub-block; generating at least one second prediction block of the current sub-block using the determined motion information; a final prediction block is generated based on a weighted sum of the first prediction block of the current block and the at least one second prediction block of the current sub-block.
Advantageous effects
According to the present invention, it is possible to provide a method and apparatus for encoding/decoding an image with improved compression efficiency.
According to the present invention, image encoding/decoding efficiency can be improved.
According to the present invention, the computational complexity of the image encoder and the image decoder can be reduced.
Drawings
Fig. 1 is a block diagram showing the construction of an encoding apparatus according to an embodiment to which the present invention is applied;
fig. 2 is a block diagram showing the construction of a decoding apparatus according to an embodiment to which the present invention is applied;
fig. 3 is a diagram schematically showing a partition structure of an image used when encoding or decoding the image;
fig. 4 is a diagram illustrating an embodiment of an inter prediction process;
fig. 5 is a flowchart illustrating an image encoding method according to an embodiment of the present invention;
fig. 6 is a flowchart illustrating an image decoding method according to an embodiment of the present invention;
fig. 7 is a flowchart illustrating an image encoding method according to another embodiment of the present invention;
fig. 8 is a flowchart illustrating an image decoding method according to another embodiment of the present invention;
fig. 9 is a diagram showing an example of deriving spatial motion vector candidates of a current block;
fig. 10 is a diagram showing an example of deriving temporal motion vector candidates of a current block;
fig. 11 is a diagram showing an example of adding spatial merge candidates to a merge candidate list;
fig. 12 is a diagram showing an example of adding a temporal merge candidate to a merge candidate list;
fig. 13 is a diagram showing an example of performing overlapped block motion compensation on a sub-block-by-sub-block basis;
fig. 14 is a diagram illustrating an example of performing overlapped block motion compensation using motion information of sub-blocks of a co-located block;
fig. 15 is a diagram illustrating an example of performing overlapped block motion compensation using motion information of blocks adjacent to a boundary of a reference block;
fig. 16 is a diagram showing an example of performing overlapped block motion compensation on a sub-block group-by-sub-block group basis;
fig. 17 is a diagram showing an example of the number of pieces of motion information for overlapped block motion compensation;
fig. 18 and 19 are diagrams showing the order in which motion information for generating the second prediction block is derived;
fig. 20 is a diagram illustrating an example of determining whether motion information of a neighboring sub-block is information usable to generate a second prediction block by comparing a POC of a reference picture of a current sub-block with a POC of a reference picture of a neighboring sub-block of the current sub-block;
fig. 21 is a diagram illustrating an embodiment in which a weight factor is applied when calculating a weighted sum of a first prediction block and a second prediction block;
fig. 22 is a diagram showing an embodiment in which different weight factors are applied to samples in a block according to the positions of the samples when calculating a weighted sum of a first prediction block and a second prediction block;
fig. 23 is a diagram illustrating an embodiment of sequentially cumulatively calculating a weighted sum of a first prediction block and a second prediction block in a predetermined order during overlapped block motion compensation;
fig. 24 is a diagram illustrating an embodiment of calculating a weighted sum of a first prediction block and a second prediction block during overlapped block motion compensation;
fig. 25 is a flowchart illustrating an image decoding method according to another embodiment of the present invention.
Detailed Description
Many modifications may be made to the present invention and there are various embodiments of the present invention, examples of which will now be provided with reference to the accompanying drawings and described in detail. However, the present invention is not limited thereto, and the exemplary embodiments should be construed to include all modifications, equivalents, or alternatives falling within the technical spirit and scope of the present invention. Like reference numerals refer to the same or similar functions throughout the drawings. In the drawings, the shapes and sizes of elements may be exaggerated for clarity. In the following detailed description of the invention, reference is made to the accompanying drawings that show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure. It is to be understood that the various embodiments of the disclosure, although different, are not necessarily mutually exclusive. For example, the particular features, structures, and characteristics described herein in connection with one embodiment may be implemented in other embodiments without departing from the spirit and scope of the disclosure. Further, it is to be understood that the location or arrangement of individual elements within each disclosed embodiment may be modified without departing from the spirit and scope of the disclosure. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present disclosure is defined only by the appended claims, appropriately interpreted, along with the full range of equivalents to which the claims are entitled.
The terms "first," "second," and the like, as used in the specification, may be used to describe various components, but these components are not to be construed as limiting the terms. The term is used merely to distinguish one component from another component. For example, a "first" component may be termed a "second" component, and a "second" component may be similarly termed a "first" component, without departing from the scope of the present invention. The term "and/or" includes a combination of items or any of a plurality of items.
It will be understood that in the present specification, when an element is referred to simply as being "connected" or "coupled" to another element, it can be "directly connected" or "directly coupled" to the other element or be connected or coupled to the other element with other elements interposed therebetween. In contrast, it will be understood that when an element is referred to as being "directly coupled" or "directly connected" to another element, there are no intervening elements present.
Further, the constituent elements shown in the embodiments of the present invention are shown independently so as to exhibit characteristic functions different from each other. Therefore, this does not mean that each constituent element is composed of a separate unit of hardware or software. In other words, each constituent element is listed as a separate element only for convenience of description. Thus, at least two of the constituent elements may be combined to form one element, or one element may be divided into a plurality of elements, each performing part of the function. Embodiments in which the constituent elements are combined and embodiments in which an element is divided are also included in the scope of the present invention, provided they do not depart from the essence of the present invention.
The terminology used in the specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. An expression in the singular includes the plural unless the context clearly indicates otherwise. In this specification, it will be understood that terms such as "comprises," "comprising," "having," and "includes" are intended to specify the presence of the stated features, integers, steps, actions, elements, components, or combinations thereof disclosed in the specification, and are not intended to exclude the possibility that one or more other features, integers, steps, actions, elements, components, or combinations thereof may be present or may be added. In other words, when a specific element is referred to as being "included", elements other than the corresponding element are not excluded, and additional elements may be included in the embodiments of the present invention or within the scope of the present invention.
Furthermore, some constituent elements may not be indispensable constituent elements performing the necessary functions of the present invention, but may be optional constituent elements merely improving the performance thereof. The present invention can be implemented by including only essential constituent elements for implementing the essence of the present invention, excluding constituent elements used in enhancing performance. Structures that include only the indispensable components and exclude optional components that are used in merely enhancing performance are also included in the scope of the present invention.
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In describing exemplary embodiments of the present invention, well-known functions or constructions will not be described in detail since they would unnecessarily obscure the present invention. The same constituent elements in the drawings are denoted by the same reference numerals, and repetitive description of the same elements will be omitted.
Further, hereinafter, an image may mean a picture constituting a video, or may mean the video itself. For example, "encoding or decoding an image, or both" may mean "encoding or decoding a video, or both", and may mean "encoding or decoding one image among the images of a video, or both". Here, "picture" and "image" may have the same meaning.
Description of the terms
An encoder: means a device that performs encoding.
A decoder: meaning a device that performs decoding.
Block: an M×N array of samples. Here, M and N mean positive integers, and a block may mean a sample array in two-dimensional form. A block may refer to a unit. The current block may mean an encoding target block that becomes a target when encoding, or a decoding target block that becomes a target when decoding. In addition, the current block may be at least one of a coding block, a prediction block, a residual block, and a transform block.
Sample: the basic unit constituting a block. A sample can be represented as a value from 0 to 2^Bd - 1 according to the bit depth (Bd); for example, with a bit depth of 8, a sample value lies in the range 0 to 255. In the present invention, a sample may be used with the same meaning as a pixel.
Unit: refers to a unit of encoding and decoding. When encoding and decoding an image, a unit may be a region generated by partitioning a single image. Further, a unit may mean a sub-divided unit when a single image is partitioned into sub-divided units during encoding or decoding. When encoding and decoding an image, predetermined processing may be performed for each unit. A single unit may be partitioned into sub-units that are smaller in size than the unit. Depending on its function, a unit may mean a block, a macroblock, a coding tree unit, a coding tree block, a coding unit, a coding block, a prediction unit, a prediction block, a residual unit, a residual block, a transform unit, a transform block, etc. Further, to distinguish a unit from a block, the unit may include a luma component block, a chroma component block associated with the luma component block, and a syntax element for each color component block. Units may have various sizes and shapes, and in particular, the shape of a unit may be a two-dimensional geometric figure, such as a rectangular shape, a square shape, a trapezoidal shape, a triangular shape, a pentagonal shape, and the like. Further, the unit information may include at least one of a unit type (indicating a coding unit, a prediction unit, a transform unit, etc.), a unit size, a unit depth, an order in which the units are encoded and decoded, and the like.
Coding tree unit: configured with a single coding tree block of the luma component Y and two coding tree blocks associated with the chroma components Cb and Cr. In addition, the coding tree unit may mean the blocks and a syntax element for each block. Lower-layer units, such as a coding unit, a prediction unit, a transform unit, and the like, may be constructed by partitioning each coding tree unit using at least one of a quadtree partitioning method and a binary tree partitioning method. The coding tree unit may be used as a term for specifying a block of pixels that becomes a processing unit when encoding/decoding an image that is an input image.
Coding tree blocks: may be used as a term for specifying any one of a Y-coding tree block, a Cb-coding tree block, and a Cr-coding tree block.
Neighboring block: means a block adjacent to the current block. The block adjacent to the current block may mean a block in contact with a boundary of the current block, or a block located within a predetermined distance from the current block. A neighboring block may also mean a block adjacent to a vertex of the current block. Here, a block adjacent to a vertex of the current block may mean a block vertically adjacent to a neighboring block that is horizontally adjacent to the current block, or a block horizontally adjacent to a neighboring block that is vertically adjacent to the current block.
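As a small illustration of how such neighboring blocks are typically located, the sketch below returns sample positions adjacent to a current block; the particular candidate set (left, above, above-left, above-right, below-left) is an assumption for illustration rather than a definition taken from this disclosure.

```python
def neighboring_block_positions(x, y, w, h):
    """Sample positions used to locate blocks adjacent to a current block whose
    top-left sample is (x, y) and whose size is w x h. 'above_left', 'above_right'
    and 'below_left' touch the vertices of the current block."""
    return {
        'left':        (x - 1, y + h - 1),
        'above':       (x + w - 1, y - 1),
        'above_left':  (x - 1, y - 1),
        'above_right': (x + w, y - 1),
        'below_left':  (x - 1, y + h),
    }

print(neighboring_block_positions(16, 16, 8, 8))
```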
Reconstructed neighboring block: means a neighboring block that is adjacent to the current block and has already been encoded or decoded spatially/temporally. Here, a reconstructed neighboring block may mean a reconstructed neighboring unit. A reconstructed spatial neighboring block may be a block within the current picture that has already been reconstructed through encoding or decoding or both. A reconstructed temporal neighboring block is a block within the reference picture that is co-located with the current block of the current picture, or a neighboring block of that block.
Unit depth: means the degree to which a unit is partitioned. In a tree structure, the root node may be the highest node and a leaf node may be the lowest node. In addition, when a unit is represented as a tree structure, the level at which the unit exists may mean the unit depth.
Bit stream: meaning a bitstream comprising encoded image information.
Parameter set: corresponding to header information in the structure within the bitstream. At least one of a video parameter set, a sequence parameter set, a picture parameter set, and an adaptive parameter set may be included in the parameter set. Further, the parameter set may include a stripe header and parallel block (tile) header information.
Parsing: may mean determining the value of a syntax element by performing entropy decoding, or may mean the entropy decoding itself.
Symbol: may mean at least one of a syntax element, a coding parameter, and a transform coefficient value of the encoding/decoding target unit. In addition, a symbol may mean an entropy encoding target or an entropy decoding result.
Prediction unit: meaning a basic unit when prediction such as inter prediction, intra prediction, inter compensation, intra compensation, and motion compensation is performed. A single prediction unit may be partitioned into a plurality of partitions having a small size, or may be partitioned into lower prediction units.
Prediction unit partitioning: meaning the shape obtained by partitioning the prediction unit.
Reference picture list: meaning a list comprising one or more reference pictures for inter-picture prediction or motion compensation. LC (List Combined), L0 (List 0), L1 (List 1), L2 (List 2), L3 (List 3), and the like are types of reference picture lists. One or more reference picture lists may be used for inter-picture prediction.
Inter-picture prediction indicator: may mean the inter-picture prediction direction (unidirectional prediction, bidirectional prediction, etc.) of the current block. Alternatively, the inter-picture prediction indicator may mean the number of reference pictures used to generate the prediction block of the current block. Further alternatively, the inter-picture prediction indicator may mean the number of prediction blocks used to perform inter prediction or motion compensation for the current block.
Reference picture index: meaning an index indicating a specific reference picture in the reference picture list.
Reference picture: it may mean a picture referenced for inter-picture prediction or motion compensation of a particular block.
Motion vector: is a two-dimensional vector for inter-picture prediction or motion compensation, and may mean an offset between a reference picture and an encoding/decoding target picture. For example, (mvX, mvY) may represent a motion vector, mvX may represent a horizontal component, and mvY may represent a vertical component.
Motion vector candidates: it may mean a block that becomes a prediction candidate when a motion vector is predicted, or it may mean a motion vector of the block. The motion vector candidates may be listed in a motion vector candidate list.
Motion vector candidate list: may mean a list of motion vector candidates.
Motion vector candidate index: may mean an indicator indicating a motion vector candidate in the motion vector candidate list. The motion vector candidate index is also referred to as an index of a motion vector predictor.
Motion information: may mean information including at least one of a motion vector, a reference picture index, an inter-picture prediction indicator, reference picture list information, a reference picture, a motion vector candidate, a motion vector candidate index, a merge candidate, and a merge index.
Merging candidate list: meaning a list of merge candidates.
Merge candidate: means a spatial merge candidate, a temporal merge candidate, a combined bi-predictive merge candidate, a zero merge candidate, or the like. A merge candidate may have an inter-picture prediction indicator, a reference picture index for each list, and motion information such as a motion vector.
Merging index: meaning information indicating a merge candidate within the merge candidate list. The merge index may indicate a block for deriving a merge candidate among reconstructed blocks spatially and/or temporally adjacent to the current block. The merge index may indicate at least one item of motion information owned by the merge candidate.
Transform unit: means a basic unit used when encoding/decoding a residual signal, such as when performing transform, inverse transform, quantization, inverse quantization, and transform coefficient encoding/decoding. A single transform unit may be partitioned into a plurality of transform units having smaller sizes.
Scaling: meaning the process of multiplying the transform coefficient level by a factor. Transform coefficients may be generated by scaling the transform coefficient levels. Scaling may also be referred to as dequantization.
Quantization parameters: it may mean a value used when generating a transform coefficient level of a transform coefficient during quantization. Quantization parameters may also mean values used when generating transform coefficients by scaling the transform coefficient levels during dequantization. The quantization parameter may be a value mapped on the quantization step size.
Delta (Delta) quantization parameter: meaning the difference between the quantization parameter of the encoding/decoding target unit and the predicted quantization parameter.
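A brief numeric sketch may help relate the quantization parameter, the quantization step size, and the delta quantization parameter. The mapping used here (the step size doubling every 6 QP values, as in HEVC-style codecs) is an assumption for illustration; this disclosure does not fix a particular mapping.

```python
def quantization_step(qp):
    """Map a quantization parameter to a quantization step size (HEVC-style
    assumption: the step size doubles every 6 QP values, with QP 4 giving step 1)."""
    return 2.0 ** ((qp - 4) / 6.0)

def delta_qp(current_qp, predicted_qp):
    """Delta quantization parameter: difference between the quantization parameter
    of the encoding/decoding target unit and the predicted quantization parameter."""
    return current_qp - predicted_qp

print(quantization_step(22))   # 8.0
print(delta_qp(27, 26))        # 1
```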
Scanning: meaning a method of ordering coefficients within a block or matrix. For example, the operation of changing the two-dimensional matrix of coefficients to the one-dimensional matrix may be referred to as scanning, and the operation of changing the one-dimensional matrix of coefficients to the two-dimensional matrix may be referred to as scanning or inverse scanning.
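The sketch below illustrates scanning and inverse scanning with one possible order (an up-right diagonal scan for a 4x4 block); the specific order is an assumption for illustration, since the definition above covers any ordering method.

```python
import numpy as np

def diagonal_scan_order(n):
    """One possible scan order for an n x n block: anti-diagonals starting from the
    top-left corner, each traversed from bottom-left to top-right."""
    order = []
    for s in range(2 * n - 1):
        diag = [(y, x) for y in range(n) for x in range(n) if x + y == s]
        order.extend(sorted(diag, key=lambda p: -p[0]))
    return order

def scan(block):
    """Change a two-dimensional matrix of coefficients into a one-dimensional array."""
    return np.array([block[y, x] for (y, x) in diagonal_scan_order(block.shape[0])])

def inverse_scan(coeffs, n):
    """Change a one-dimensional array of coefficients back into an n x n matrix."""
    block = np.zeros((n, n), dtype=coeffs.dtype)
    for value, (y, x) in zip(coeffs, diagonal_scan_order(n)):
        block[y, x] = value
    return block

b = np.arange(16).reshape(4, 4)
assert np.array_equal(inverse_scan(scan(b), 4), b)  # scan followed by inverse scan is lossless
```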
Transform coefficients: it may mean coefficient values generated after performing a transform in an encoder. The transform coefficient may mean a coefficient value generated after at least one of entropy decoding and dequantization is performed in the decoder. The quantized level obtained by quantizing the transform coefficient or the residual signal or the quantized transform coefficient level may also fall within the meaning of the transform coefficient.
Quantized level: means a value generated by quantizing a transform coefficient or a residual signal in an encoder. Alternatively, the quantized level may mean a value that is an inverse quantization target subjected to inverse quantization in a decoder. Similarly, quantized transform coefficient levels, which are the result of transform and quantization, may also fall within the meaning of the quantized level.
Non-zero transform coefficients: a transform coefficient having a value other than 0 is meant, or a transform coefficient level having a value other than 0 is meant.
Quantization matrix: meaning a matrix used in a quantization process or an inverse quantization process performed in order to improve subjective image quality or objective image quality. The quantization matrix may also be referred to as a scaling list.
Quantization matrix coefficients: meaning that each element within the matrix is quantized. The quantized matrix coefficients may also be referred to as matrix coefficients.
Default matrix: meaning a predetermined quantization matrix that is primarily defined in the encoder or decoder.
Non-default matrix: meaning a quantization matrix that is not initially defined in the encoder or decoder but signaled by the user.
Fig. 1 is a block diagram showing the construction of an encoding apparatus according to an embodiment to which the present invention is applied.
The encoding apparatus 100 may be an encoder, a video encoding apparatus, or an image encoding apparatus. The video may include at least one image. The encoding apparatus 100 may sequentially encode the at least one image.
Referring to fig. 1, the encoding apparatus 100 may include a motion prediction unit 111, a motion compensation unit 112, an intra prediction unit 120, a switcher 115, a subtractor 125, a transform unit 130, a quantization unit 140, an entropy encoding unit 150, an inverse quantization unit 160, an inverse transform unit 170, an adder 175, a filter unit 180, and a reference picture buffer 190.
The encoding apparatus 100 may perform encoding on an input picture by using an intra mode or an inter mode or both the intra mode and the inter mode. Further, the encoding apparatus 100 may generate a bitstream by encoding an input picture, and may output the generated bitstream. The generated bit stream may be stored in a computer readable recording medium or may be streamed over a wired/wireless transmission medium. When the intra mode is used as the prediction mode, the switcher 115 can switch to intra. Alternatively, when the inter mode is used as the prediction mode, the switcher 115 may switch to the inter mode. Here, the intra mode may mean an intra prediction mode, and the inter mode may mean an inter prediction mode. The encoding apparatus 100 may generate a prediction block of an input image. Further, after generating the prediction block, the encoding apparatus 100 may encode a residual between the input block and the prediction block. The input image may be referred to as a current image as a current encoding target. The input block may be referred to as a current block as a current encoding target or may be referred to as an encoding target block.
When the prediction mode is an intra mode, the intra prediction unit 120 may use pixel values of an encoded/decoded block adjacent to the current block as reference pixels. The intra prediction unit 120 may perform spatial prediction by using the reference pixels, or may generate prediction samples of the input block by performing spatial prediction. Here, intra prediction may mean prediction within a frame.
When the prediction mode is an inter mode, the motion prediction unit 111 may search for a region that best matches the input block from the reference image when performing motion prediction, and may derive a motion vector by using the searched region. The reference picture may be stored in a reference picture buffer 190.
The motion compensation unit 112 may generate a prediction block by performing motion compensation using the motion vector. Here, inter prediction may mean prediction between frames or motion compensation.
When the value of the motion vector is not an integer, the motion prediction unit 111 and the motion compensation unit 112 may generate a prediction block by applying an interpolation filter to a partial region of the reference picture. In order to perform inter-picture prediction or motion compensation on a coding unit, it may be determined which mode among a skip mode, a merge mode, an Advanced Motion Vector Prediction (AMVP) mode, and a current picture reference mode is used for motion prediction and motion compensation of a prediction unit included in the corresponding coding unit. Inter-picture prediction or motion compensation may then be performed differently according to the determined mode.
The subtractor 125 may generate a residual block by using a residual between the input block and the prediction block. The residual block may be referred to as a residual signal. The residual signal may mean the difference between the original signal and the predicted signal. In addition, the residual signal may be a signal generated by transforming or quantizing a difference between the original signal and the predicted signal, or transforming and quantizing. The residual block may be a residual signal of a block unit.
The transform unit 130 may generate transform coefficients by performing a transform on the residual block, and may output the generated transform coefficients. Here, the transform coefficient may be a coefficient value generated by performing a transform on the residual block. When the transform skip mode is applied, the transform unit 130 may skip the transform of the residual block.
A quantized level may be generated by applying quantization to the transform coefficients or to the residual signal. Hereinafter, in embodiments, the quantized level may also be referred to as a transform coefficient.
The quantization unit 140 may generate a quantized level by quantizing the transform coefficient or the residual signal according to the parameter, and may output the generated quantized level. Here, the quantization unit 140 may quantize the transform coefficient by using a quantization matrix.
The entropy encoding unit 150 may generate a bitstream by performing entropy encoding on the values calculated by the quantization unit 140 or on the encoding parameter values calculated when encoding is performed according to the probability distribution, and may output the generated bitstream. The entropy encoding unit 150 may perform entropy encoding on pixel information of an image and information for decoding the image. For example, the information for decoding the image may include a syntax element.
When entropy encoding is applied, symbols are represented such that a smaller number of bits is allocated to symbols having a high probability of occurrence and a larger number of bits is allocated to symbols having a low probability of occurrence, so that the size of the bitstream for the symbols to be encoded may be reduced. The entropy encoding unit 150 may use an encoding method for entropy encoding such as exponential Golomb, context-adaptive variable length coding (CAVLC), or context-adaptive binary arithmetic coding (CABAC). For example, the entropy encoding unit 150 may perform entropy encoding by using a variable length coding (VLC) table. Further, the entropy encoding unit 150 may derive a binarization method of a target symbol and a probability model of a target symbol/bin, and may perform arithmetic encoding by using the derived binarization method and context model.
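As a concrete, minimal example of one of the variable length coding methods named above, the following sketch produces order-0 exponential-Golomb codewords for non-negative integers; it is shown only to illustrate how shorter codes go to smaller (typically more frequent) values, and is not a description of the entropy encoding unit itself.

```python
def exp_golomb_encode(value):
    """Order-0 exponential-Golomb codeword for a non-negative integer:
    write value + 1 in binary, preceded by as many zeros as there are
    bits after the leading one."""
    code = bin(value + 1)[2:]            # binary representation of value + 1
    return '0' * (len(code) - 1) + code  # zero prefix, then the binary code

for v in range(5):
    print(v, exp_golomb_encode(v))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
```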
In order to encode the transform coefficient levels, the entropy encoding unit 150 may change coefficients in the form of a two-dimensional block into a one-dimensional vector form by using a transform coefficient scanning method.
The encoding parameters may include information (flags, indexes, etc.) such as syntax elements encoded in the encoder and signaled to the decoder, and information derived when encoding or decoding is performed. The encoding parameter may mean information required when encoding or decoding an image. For example, the encoding parameters may include at least one value or combination of the following: unit/block size, unit/block depth, unit/block partition information, unit/block partition structure, whether to perform a quadtree-form partition, whether to perform a binary tree-form partition, binary tree-form partition direction (horizontal direction or vertical direction), binary tree-form partition form (symmetric partition or asymmetric partition), intra prediction mode/direction, reference sample filtering method, prediction block filter tap, prediction block filter coefficient, inter prediction mode, motion information, motion vector, reference picture index, inter prediction angle, inter prediction indicator, reference picture list, reference picture, motion vector predictor candidate, motion vector candidate list, whether to use merge mode, merge candidate list, whether to use skip mode, motion vector interpolation filter type, interpolation filter tap, interpolation filter coefficient, motion vector size, accuracy of representation of motion vector, transform type, transform size, information whether primary (first) transform is used, information whether secondary transform is used, primary transform index, secondary transform index, information whether residual signal is present, coding block pattern, coding Block Flag (CBF), quantization parameter, quantization matrix, whether in-loop filter is applied, in-loop filter coefficient, in-loop filter tap, in-loop filter shape/form, whether deblocking filter is applied, deblocking filter coefficient, deblocking filter tap, deblocking filter strength, deblocking filter shape/form, whether adaptive sample offset is applied, adaptive sample offset value, quantization matrix, and method for processing a residual signal, an adaptive sample offset class, an adaptive sample offset type, whether an adaptive in-loop filter is applied, an adaptive in-loop filter coefficient, an adaptive in-loop filter tap, an adaptive in-loop filter shape/form, a binarization/anti-binarization method, a context model determination method, a context model update method, whether a normal mode is performed, whether a bypass mode is performed, a context binary bit, a bypass binary bit, a transform coefficient level scan method, an image display/output order, stripe identification information, stripe type, stripe partition information, parallel block identification information, parallel block type, parallel block partition information, picture type, bit depth, and information of a luminance signal or a chrominance signal.
Here, signaling a flag or index may mean that the corresponding flag or index is entropy encoded by an encoder and included in a bitstream, and may mean that the corresponding flag or index is entropy decoded from the bitstream by a decoder.
When the encoding apparatus 100 performs encoding through inter prediction, the encoded current image may be used as a reference image for another image to be subsequently processed. Accordingly, the encoding apparatus 100 may reconstruct or decode the encoded current image, or may store the reconstructed or decoded image as a reference image.
The quantized level may be inverse quantized in the inverse quantization unit 160 or may be inverse transformed in the inverse transformation unit 170. The inverse quantized or inverse transformed coefficients, or both, may be added to the prediction block by adder 175. The reconstructed block may be generated by adding the inverse quantized or inverse transformed coefficients or the inverse quantized and inverse transformed coefficients to the prediction block. Here, the coefficient subjected to inverse quantization or inverse transformation or the coefficient subjected to both inverse quantization and inverse transformation may mean a coefficient subjected to at least one of inverse quantization and inverse transformation, and may mean a reconstructed residual block.
The reconstructed block may pass through a filter unit 180. The filter unit 180 may apply at least one of a deblocking filter, a Sample Adaptive Offset (SAO), and an Adaptive Loop Filter (ALF) to the reconstructed block or the reconstructed image. The filter unit 180 may be referred to as an in-loop filter.
The deblocking filter may remove block distortion generated at boundaries between blocks. In order to determine whether to apply the deblocking filter, it may be determined, based on the pixels included in several rows or columns of the block, whether to apply the deblocking filter to the current block. When a deblocking filter is applied to a block, another filter may be applied according to the required deblocking filter strength.
To compensate for coding errors, an appropriate offset value may be added to the pixel values by using the sample adaptive offset. The sample adaptive offset may correct, in units of pixels, an offset of the deblocked image from the original image. A method of applying an offset in consideration of edge information about each pixel may be used, or the following method may be used: the pixels of the image are partitioned into a predetermined number of regions, the regions to which an offset is to be applied are determined, and the offset is applied to the determined regions.
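The region-based offset idea described above may be illustrated by the following Python sketch, in which samples of the deblocked image are classified into intensity bands and a per-band offset is added. The number of bands, the classification rule, and the offsets are illustrative assumptions, not the normative sample adaptive offset process.

```python
import numpy as np

def apply_band_offset(samples, band_offsets, num_bands=32, bit_depth=8):
    """Classify each sample of the deblocked image into an intensity band and add
    the offset assigned to that band (band count and offsets are illustrative)."""
    max_val = (1 << bit_depth) - 1
    band_width = (max_val + 1) // num_bands
    out = samples.astype(np.int32)
    bands = out // band_width
    for band, offset in band_offsets.items():
        out[bands == band] += offset
    return np.clip(out, 0, max_val)

deblocked = np.array([[10, 70, 130, 200]])
print(apply_band_offset(deblocked, {1: 2, 16: -1}))  # [[ 12  70 129 200]]
```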
The adaptive loop filter may perform filtering based on a comparison between the filtered reconstructed image and the original image. Pixels included in the image may be partitioned into predetermined groups, filters to be applied to each group may be determined, and different filtering may be performed for each group. Information on whether to apply the ALF may be signaled in terms of a Coding Unit (CU), and the shape and coefficient of the ALF to be applied to each block may vary.
The reconstructed block or reconstructed image that passes through the filter unit 180 may be stored in the reference picture buffer 190.

Fig. 2 is a block diagram showing the construction of a decoding apparatus to which the present invention is applied according to an embodiment.
The decoding apparatus 200 may be a decoder, a video decoding apparatus, or an image decoding apparatus.
Referring to fig. 2, the decoding apparatus 200 may include an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 230, an intra prediction unit 240, a motion compensation unit 250, an adder 255, a filter unit 260, and a reference picture buffer 270.
The decoding apparatus 200 may receive the bit stream output from the encoding apparatus 100. The decoding apparatus 200 may receive a bit stream stored in a computer readable recording medium, or may receive a bit stream streamed over a wired/wireless transmission medium. The decoding apparatus 200 may decode the bitstream by using an intra mode or an inter mode. Further, the decoding apparatus 200 may generate a reconstructed image or a decoded image generated by performing decoding, and may output the reconstructed image or the decoded image.
The switcher may be switched to the intra mode when the prediction mode used for decoding is the intra mode. Alternatively, the switcher may be switched to the inter mode when the prediction mode used for decoding is the inter mode.
The decoding apparatus 200 may obtain a reconstructed residual block by decoding an input bitstream, and may generate a prediction block. When the reconstructed residual block and the prediction block are obtained, the decoding apparatus 200 may generate a reconstructed block that becomes a decoding target by adding the reconstructed residual block and the prediction block. The decoding target block may be referred to as a current block.
The entropy decoding unit 210 may generate symbols by entropy decoding the bitstream according to the probability distribution. The generated symbols may include symbols in the form of quantized levels. Here, the entropy decoding method may be an inverse process of the above-described entropy encoding method.
In order to decode the transform coefficient level, the entropy decoding unit 210 may change coefficients in the form of a one-dimensional vector into a two-dimensional block form by using a transform coefficient scanning method.
The quantized level may be inverse quantized in the inverse quantization unit 220 and/or inverse transformed in the inverse transform unit 230. As a result of performing inverse quantization and/or inverse transformation on the quantized level, a reconstructed residual block may be generated. Here, the inverse quantization unit 220 may apply a quantization matrix to the quantized level.
When the intra mode is used, the intra prediction unit 240 may generate a prediction block by performing spatial prediction using pixel values of blocks that have been decoded adjacent to the decoding target block.
When the inter mode is used, the motion compensation unit 250 may generate a prediction block by performing motion compensation using the reference image stored in the reference picture buffer 270 and the motion vector.
The adder 255 may generate a reconstructed block by adding the reconstructed residual block and the prediction block. The filter unit 260 may apply at least one of a deblocking filter, a sample adaptive offset, and an adaptive loop filter to the reconstructed block or the reconstructed image. The filter unit 260 may output the reconstructed image. The reconstructed block or the reconstructed image may be stored in a reference picture buffer 270 and may be used when performing inter prediction.
Fig. 3 is a diagram schematically showing a partition structure of an image when the image is encoded and decoded. Fig. 3 schematically shows an example of partitioning a single unit into a plurality of subordinate units.
For efficient partitioning of images, a coding unit (CU) may be used when encoding and decoding. The coding unit may be used as a basic unit when encoding/decoding an image. In addition, the coding unit may be used as a unit for distinguishing between the intra mode and the inter mode when encoding/decoding an image. The coding unit may be a basic unit for performing prediction, transformation, quantization, inverse transformation, inverse quantization, or encoding/decoding processes on the transform coefficients.
Referring to fig. 3, an image 300 is sequentially partitioned in units of a maximum coding unit (LCU), and a partition structure is determined for each LCU. Here, LCU may be used in the same sense as a Coding Tree Unit (CTU). Unit partitioning may mean partitioning a block associated with a unit. The block partition information may include information of a unit depth. The depth information may represent the number of times the unit is partitioned, the degree to which the unit is partitioned, or both. Individual units may be partitioned in layers associated with depth information based on a tree structure. Each partitioned subordinate unit may have depth information. The depth information may be information representing the size of the CU, and may be stored in each CU.
The partition structure may mean a distribution of Coding Units (CUs) in LCU 310. Such a distribution may be determined according to whether a single CU is partitioned into multiple (positive integers equal to or greater than 2, including 2, 4, 8, 16, etc.) CUs. The horizontal and vertical sizes of the CUs generated by performing the partitioning may be half the horizontal and vertical sizes of the CUs before performing the partitioning, respectively, or may have sizes smaller than the horizontal and vertical sizes before performing the partitioning, respectively, according to the number of times the partitioning is performed. A CU may be recursively partitioned into multiple CUs. Partitioning may be performed recursively on the CUs until a predefined depth or a predefined size. For example, the depth of the LCU may be 0 and the depth of the Smallest Coding Unit (SCU) may be a predefined maximum depth. Here, the LCU may be a coding unit having a maximum coding unit size, and the SCU may be a coding unit having a minimum coding unit size as described above. Partitioning starts from LCU 310, and when the horizontal or vertical size or both the horizontal and vertical sizes of a CU are reduced by partitioning, the CU depth increases by 1.
In addition, information on whether a CU is partitioned may be represented by using partition information of the CU. The partition information may be 1-bit information. All CUs except SCU may include partition information. For example, a CU may not be partitioned when the value of partition information is a first value, and a CU may be partitioned when the value of partition information is a second value.
Referring to fig. 3, an LCU having a depth of 0 may be a 64×64 block. 0 may be the minimum depth. The SCU with depth 3 may be an 8 x 8 block. 3 may be the maximum depth. A CU of a 32×32 block and a 16×16 block may be represented as depth 1 and depth 2, respectively.
For example, when a single coding unit is partitioned into four coding units, the horizontal and vertical sizes of the partitioned four coding units may be half the sizes of the horizontal and vertical sizes of the CU before being partitioned. In one embodiment, when a coding unit having a size of 32×32 is partitioned into four coding units, each of the partitioned four coding units may have a size of 16×16. When a single coding unit is partitioned into four coding units, it may be referred to that the coding units may be partitioned into quadtree forms.
For example, when a single coding unit is partitioned into two coding units, the horizontal or vertical size of the two coding units may be half of the horizontal or vertical size of the coding unit before being partitioned. For example, when the coding units having a size of 32×32 are partitioned in the vertical direction, each of the partitioned two coding units may have a size of 16×32. When a single coding unit is partitioned into two coding units, the coding units may be referred to as being partitioned in a binary tree form. LCU 320 of fig. 3 is an example of an LCU to which both quad-tree form partitions and binary tree form partitions are applied.
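The size relationships described above for quad-tree and binary-tree partitioning can be summarized by the following Python sketch; the function names and the example sizes are illustrative only.

```python
def quadtree_split(width, height):
    """Quad-tree split: four CUs, each with half the width and half the height of the parent."""
    return [(width // 2, height // 2)] * 4

def binary_split(width, height, direction):
    """Binary-tree split: two CUs, halving the width for a vertical split or the
    height for a horizontal split."""
    if direction == "vertical":
        return [(width // 2, height)] * 2
    return [(width, height // 2)] * 2

# A 64x64 LCU has depth 0; each quad-tree split increases the depth of the resulting CUs by 1.
print(quadtree_split(64, 64))            # four 32x32 CUs at depth 1
print(binary_split(32, 32, "vertical"))  # two 16x32 CUs
```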
Fig. 4 is a diagram illustrating an embodiment of inter prediction processing.
In fig. 4, a rectangle may represent a picture. In fig. 4, an arrow indicates a prediction direction. Pictures can be classified into intra pictures (I pictures), predicted pictures (P pictures), and bi-predicted pictures (B pictures) according to the coding type of the pictures.
I pictures can be encoded by intra prediction without inter prediction. The P picture may be encoded via inter prediction by using a reference picture existing in one direction (i.e., forward or backward) with respect to the current block. The B picture may be encoded via inter prediction by using reference pictures present in two directions (i.e., forward and backward) with respect to the current block. When inter-picture prediction is used, the encoder may perform inter-picture prediction or motion compensation, and the decoder may perform corresponding motion compensation.
Hereinafter, an embodiment of inter-picture prediction will be described in detail.
Inter-picture prediction or motion compensation may be performed using reference pictures and motion information.
Motion information of the current block may be derived by each of the encoding apparatus 100 and the decoding apparatus 200 during inter prediction. The motion information of the current block may be derived by using motion information of reconstructed neighboring blocks, motion information of a co-located block (also referred to as a col block), and/or motion information of blocks adjacent to the co-located block. A co-located block may mean a block spatially co-located with the current block within a previously reconstructed co-located picture (also referred to as a col picture). The co-located picture may be one picture among one or more reference pictures included in the reference picture list.
The method of deriving the motion information of the current block may vary according to the prediction mode of the current block. For example, as a prediction mode for inter-picture prediction, there may be an AMVP mode, a merge mode, a skip mode, a current picture reference mode, and the like. The merge mode may be referred to as a motion merge mode.
For example, when AMVP is used as a prediction mode, at least one of a motion vector of a reconstructed neighboring block, a motion vector of a co-located block, a motion vector of a block adjacent to the co-located block, and a motion vector (0, 0) may be determined as a motion vector candidate for the current block, and a motion vector candidate list may be generated by using the motion vector candidate. The motion vector candidates of the current block may be derived by using the generated motion vector candidate list. Motion information of the current block may be determined based on the derived motion vector candidates. The motion vector of the co-located block or the motion vector of a block adjacent to the co-located block may be referred to as a temporal motion vector candidate, and the motion vector of the reconstructed adjacent block may be referred to as a spatial motion vector candidate.
The encoding apparatus 100 may calculate a Motion Vector Difference (MVD) between a motion vector of the current block and a motion vector candidate, and may perform entropy encoding on the Motion Vector Difference (MVD). In addition, the encoding apparatus 100 may perform entropy encoding on the motion vector candidate index and generate a bitstream. The motion vector candidate index may indicate the best motion vector candidate among the motion vector candidates included in the motion vector candidate list. The decoding apparatus may perform entropy decoding on motion vector candidate indexes included in the bitstream, and may select a motion vector candidate of the decoding target block among motion vector candidates included in the motion vector candidate list by using the entropy-decoded motion vector candidate indexes. In addition, the decoding apparatus 200 may add the entropy-decoded MVD to the motion vector candidates extracted by the entropy decoding, thereby deriving a motion vector of the decoding target block.
The bitstream may include a reference picture index indicating a reference picture. The reference picture index may be entropy encoded by the encoding device 100 and then signaled as a bitstream to the decoding device 200. The decoding apparatus 200 may generate a prediction block of the decoding target block based on the derived motion vector and the reference picture index information.
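The AMVP signaling described above may be illustrated by the following Python sketch: the encoder selects a motion vector candidate, signals its index together with the motion vector difference (MVD), and the decoder reconstructs the motion vector by adding the decoded MVD to the indicated candidate. Choosing the candidate with the smallest MVD cost is a simplifying assumption; an actual encoder may use a rate-distortion criterion.

```python
from dataclasses import dataclass

@dataclass
class MV:
    x: int
    y: int

def encode_amvp(mv, candidates):
    """Encoder side: choose the candidate closest to the actual motion vector and
    signal its index together with the motion vector difference (MVD)."""
    best_idx = min(range(len(candidates)),
                   key=lambda i: abs(mv.x - candidates[i].x) + abs(mv.y - candidates[i].y))
    mvp = candidates[best_idx]
    return best_idx, MV(mv.x - mvp.x, mv.y - mvp.y)

def decode_amvp(idx, mvd, candidates):
    """Decoder side: reconstruct the motion vector by adding the decoded MVD to the
    candidate selected by the entropy-decoded candidate index."""
    mvp = candidates[idx]
    return MV(mvp.x + mvd.x, mvp.y + mvd.y)

candidates = [MV(4, -2), MV(0, 0)]
idx, mvd = encode_amvp(MV(5, -1), candidates)
print(decode_amvp(idx, mvd, candidates))  # MV(x=5, y=-1)
```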
Another example of a method of deriving motion information of a current block may be a merge mode. The merge mode may mean a method of merging motions of a plurality of blocks. The merge mode may mean a mode of deriving motion information of a current block from motion information of neighboring blocks. When the merge mode is applied, the merge candidate list may be generated using motion information of reconstructed neighboring blocks and/or motion information of co-located blocks. The motion information may include at least one of a motion vector, a reference picture index, and an inter-picture prediction indicator. The prediction indicator may indicate unidirectional prediction (L0 prediction or L1 prediction) or bidirectional prediction (L0 prediction and L1 prediction).
The merge candidate list may be a list of stored motion information. The motion information included in the merge candidate list may be at least one of zero merge candidate and new motion information, wherein the new motion information is a combination of motion information of one neighboring block adjacent to the current block (spatial merge candidate), motion information of a co-located block of the current block included in the reference picture (temporal merge candidate), and motion information existing in the merge candidate list.
The encoding apparatus 100 may generate a bitstream by performing entropy encoding on at least one of the merge flag and the merge index, and may signal the bitstream to the decoding apparatus 200. The merge flag may be information indicating whether to perform a merge mode for each block, and the merge index may be information indicating which neighboring block among neighboring blocks of the current block is a merge target block. For example, neighboring blocks of the current block may include a left neighboring block to the left of the current block, an upper neighboring block located above the current block, and a temporal neighboring block temporally adjacent to the current block.
The skip mode may be a mode in which motion information of a neighboring block is applied to the current block as it is. When the skip mode is applied, the encoding apparatus 100 may perform entropy encoding on information indicating which block's motion information is to be used as the motion information of the current block to generate a bitstream, and may signal the bitstream to the decoding apparatus 200. The encoding apparatus 100 may not signal syntax elements regarding at least any one of the motion vector difference information, the coded block flag, and the transform coefficient level to the decoding apparatus 200.
The current picture reference mode may mean a prediction mode in which a previously reconstructed region within the current picture to which the current block belongs is used for prediction. Here, a vector may be used to designate the previously reconstructed region. Information indicating whether the current block is to be encoded in the current picture reference mode may be encoded by using a reference picture index of the current block. A flag or index indicating whether the current block is a block encoded in the current picture reference mode may be signaled, or may be derived based on the reference picture index of the current block. In the case where the current block is encoded in the current picture reference mode, the current picture may be added to the reference picture list for the current block so as to be located at a fixed position or a random position in the reference picture list. The fixed position may be, for example, the position indicated by reference picture index 0 or the last position in the list. When the current picture is added to the reference picture list so as to be located at a random position, a reference picture index indicating the random position may be signaled.
Based on the above description, an image encoding method and an image decoding method according to an embodiment of the present invention will be described in detail below.
Fig. 5 is a flowchart illustrating an image encoding method according to an embodiment of the present invention, and fig. 6 is a flowchart illustrating an image decoding method according to an embodiment of the present invention.
Referring to fig. 5, the encoding apparatus may derive a motion vector candidate (step S501), and may generate a motion vector candidate list based on the derived motion vector candidate (step S502). After the motion vector candidate list is generated, a motion vector may be determined based on the generated motion vector candidate list (step S503), and motion compensation may be performed based on the determined motion vector (step S504). Thereafter, the encoding apparatus may encode information associated with the motion compensation (step S505).
Referring to fig. 6, the decoding apparatus may perform entropy decoding on information associated with motion compensation received from the encoding apparatus (step S601), and may derive motion vector candidates (step S602). The decoding apparatus may generate a motion vector candidate list based on the derived motion vector candidates (step S603), and determine a motion vector using the generated motion vector candidate list (step S604). Thereafter, the decoding apparatus may perform motion compensation by using the determined motion vector (step S605).
Fig. 7 is a flowchart illustrating an image encoding method according to another embodiment of the present invention, and fig. 8 is a flowchart illustrating an image decoding method according to another embodiment of the present invention.
Referring to fig. 7, the encoding apparatus may derive a merge candidate (step S701) and generate a merge candidate list based on the derived merge candidate. After the merge candidate list is generated, the encoding device may determine motion information using the generated merge candidate list (step S702), and may perform motion compensation on the current block using the determined motion information (step S703). Thereafter, the encoding apparatus may perform entropy encoding on information associated with motion compensation (step S704).
Referring to fig. 8, the decoding apparatus may perform entropy decoding on information associated with motion compensation received from the encoding apparatus (S801), derive a merge candidate (S802), and generate a merge candidate list based on the derived merge candidate. After the merge candidate list is generated, the decoding apparatus may determine motion information of the current block by using the generated merge candidate list (S803). Thereafter, the decoding apparatus may perform motion compensation using the motion information (S804).
Fig. 5 and 6 illustrate examples in which the AMVP illustrated in fig. 4 is applied, and fig. 7 and 8 illustrate examples in which the merge mode illustrated in fig. 4 is applied.
Hereinafter, each step in fig. 5 and 6 will be described, and then each step in fig. 7 and 8 will be described. However, the motion compensation steps corresponding to S504, S605, S703, and S804 and the entropy encoding/decoding steps corresponding to S505, S601, S704, and S801 will be described collectively.
Hereinafter, each step in fig. 5 and 6 will be described in detail below.
First, the step of deriving motion vector candidates (S501, S602) will be described in detail.
The motion vector candidates of the current block may include one of spatial motion vector candidates and temporal motion vector candidates, or both spatial motion vector candidates and temporal motion vector candidates.
The spatial motion vector of the current block may be derived from reconstructed blocks adjacent to the current block. For example, a motion vector of a reconstructed block adjacent to the current block may be determined as a spatial motion vector candidate for the current block.
Fig. 9 is a diagram illustrating an example of deriving spatial motion vector candidates of a current block.
Referring to fig. 9, a spatial motion vector candidate of a current block may be derived from neighboring blocks adjacent to the current block X. The neighboring blocks adjacent to the current block X include at least one of a block B1 adjacent to an upper end of the current block, a block A1 adjacent to a left end of the current block, a block B0 adjacent to an upper right corner of the current block, a block B2 adjacent to an upper left corner of the current block, and a block A0 adjacent to a lower left corner of the current block. Neighboring blocks adjacent to the current block may have a square shape or a non-square shape. When one neighboring block of a plurality of neighboring blocks neighboring the current block has a motion vector, the motion vector of the neighboring block may be determined as a spatial motion vector candidate of the current block. Whether the neighboring block has a motion vector or whether the motion vector of the neighboring block can be used as a spatial motion vector candidate of the current block may be determined based on a determination of whether the neighboring block exists or whether the neighboring block has been encoded through the inter prediction process. The determination of whether a particular neighboring block has a motion vector or whether a motion vector of a neighboring block can be used as a spatial motion vector candidate of the current block may be performed in a predetermined order. For example, as shown in fig. 9, the availability determination of the motion vector may be performed in the order of blocks A0, A1, B0, B1, and B2.
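The availability check described above may be illustrated by the following Python sketch, which scans the neighboring blocks in the order A0, A1, B0, B1, and B2 and collects the motion vectors of available, inter-coded neighbors. The data layout and the maximum number of spatial candidates (two) are illustrative assumptions.

```python
def derive_spatial_mv_candidates(neighbors, max_candidates=2):
    """Scan the neighbouring blocks in the order A0, A1, B0, B1, B2 and take the
    motion vector of each available, inter-coded neighbour as a spatial candidate."""
    candidates = []
    for name in ("A0", "A1", "B0", "B1", "B2"):
        block = neighbors.get(name)
        if block is None or not block.get("inter_coded", False):
            continue  # neighbour missing or intra-coded: no usable motion vector
        candidates.append(block["mv"])
        if len(candidates) == max_candidates:
            break
    return candidates

neighbors = {"A0": {"inter_coded": False},
             "A1": {"inter_coded": True, "mv": (3, -1)},
             "B1": {"inter_coded": True, "mv": (0, 2)}}
print(derive_spatial_mv_candidates(neighbors))  # [(3, -1), (0, 2)]
```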
When a reference picture of a current block and a reference picture of a neighboring block having a motion vector are different from each other, the motion vector of the neighboring block is scaled, and then the scaled motion vector may be used as a spatial motion vector candidate of the current block. The motion vector scaling may be performed based on at least any one of a distance between a current picture and a reference picture of a current block and a distance between the current picture and a reference picture of a neighboring block. Here, the spatial motion vector candidates of the current block may be derived by scaling the motion vectors of the neighboring blocks according to a ratio of a distance between the current picture and a reference picture of the current block to a distance between the current picture and a reference picture of the neighboring block.
However, when the reference picture index of the current block and the reference picture index of the neighboring block having the motion vector are different, the scaled motion vector of the neighboring block may be determined as a spatial motion vector candidate of the current block. Even in this case, scaling may be performed based on at least one of a distance between the current picture and a reference picture of the current block and a distance between the current picture and a reference picture of a neighboring block.
Regarding scaling, motion vectors of neighboring blocks may be scaled based on a reference picture indicated by a reference picture index having a predefined value, and the scaled motion vector may be determined as a spatial motion vector candidate for the current block. The predefined value may be zero or a positive integer. For example, the spatial motion vector candidate of the current block may be derived by scaling the motion vector of the neighboring block based on a ratio of a distance between the current picture and a reference picture of the current block indicated by a reference picture index having a predefined value to a distance between the current picture and a reference picture of the neighboring block having a predefined value.
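The distance-based scaling described above may be illustrated by the following Python sketch, where tb is the POC distance between the current picture and the reference picture of the current block and td is the POC distance between the current picture and the reference picture of the neighboring block. The rounding is simplified compared with the fixed-point arithmetic a codec would use; the same idea applies to the temporal candidates discussed below.

```python
def scale_motion_vector(mv, poc_cur, poc_ref_cur, poc_ref_neighbor):
    """Scale a neighbouring block's motion vector by the ratio of the current block's
    reference-picture distance (tb) to the neighbour's reference-picture distance (td)."""
    tb = poc_cur - poc_ref_cur       # distance: current picture -> reference of current block
    td = poc_cur - poc_ref_neighbor  # distance: current picture -> reference of neighbouring block
    if td == 0:
        return mv
    return (round(mv[0] * tb / td), round(mv[1] * tb / td))

# Neighbour references a picture 2 POCs away; the current block references a picture 1 POC away.
print(scale_motion_vector((8, -4), poc_cur=10, poc_ref_cur=9, poc_ref_neighbor=8))  # (4, -2)
```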
Alternatively, the spatial motion vector candidates of the current block may be derived based on at least one of the coding parameters of the current block.
Temporal motion vector candidates for the current block may be derived from reconstructed blocks included in co-located pictures of the current picture. A co-located picture is a picture encoded/decoded before a current picture and may be different from the current picture in temporal order.
Fig. 10 is a diagram showing an example of deriving temporal motion vector candidates of a current block.
Referring to fig. 10, a temporal motion vector candidate of the current block may be derived from a block including a position outside a block spatially co-located with the current block X within a co-located picture (also referred to as a col picture) of the current picture, or from a block including a position inside the block spatially co-located with the current block X. Here, the temporal motion vector candidate may mean a motion vector of a co-located block of the current block. For example, the temporal motion vector candidate of the current block X may be derived from a block H spatially adjacent to the lower right corner of a block C located at the same position as the current block X, or from a block C3 including the middle position of the block C. The block H, the block C3, etc. used to derive the temporal motion vector candidate of the current block are referred to as co-located blocks.
Alternatively, at least one of the temporal motion vector candidates, the co-located picture, the co-located block, the prediction list utilization flag, and the reference picture index may be derived based on at least one of the encoding parameters.
When the distance between the current picture including the current block and the reference picture of the current block is different from the distance between the co-located picture including the co-located block and the reference picture of the co-located block, the temporal motion vector candidate of the current block may be obtained by scaling the motion vector of the co-located block. Here, the scaling may be performed based on at least one of a distance between the current picture and a reference picture of the current block and a distance between the co-located picture and a reference picture of the co-located block. For example, the temporal motion vector candidate of the current block may be derived by scaling the motion vector of the co-located block according to a ratio of a distance between the current picture and a reference picture of the current block to a distance between the co-located picture and a reference picture of the co-located block.
Next, the step of generating a motion vector candidate list based on the derived motion vector candidates (S502, S603) will be described.
The step of generating the motion vector candidate list may include a process of adding or removing motion vector candidates to or from the motion vector candidate list, and a process of adding combined motion vector candidates to the motion vector candidate list.
First, a process of adding or removing the derived motion vector candidate to or from the motion vector candidate list will be described. The encoding apparatus and the decoding apparatus may add the derived motion vector candidates to the motion vector candidate list in the order in which the motion vector candidates were derived.
It is assumed that the motion vector candidate list mvpListLX means a motion vector candidate list corresponding to reference picture list LX, where LX may be any one of the reference picture lists L0, L1, L2, and L3. That is, the motion vector candidate list corresponding to the reference picture list L0 may be represented by mvpListL0.
In addition to the spatial motion vector candidates and the temporal motion vector candidates, a motion vector having a predetermined value may be added to the motion vector candidate list. For example, when the number of motion vector candidates in the motion vector candidate list is smaller than the maximum number of motion vector candidates that can be included in the motion vector candidate list, a motion vector having a value of 0 may be added to the motion vector candidate list.
Next, a process of adding the combined motion vector candidate to the motion vector candidate list will be described.
When the number of motion vector candidates in the motion vector candidate list is less than the maximum number of motion vector candidates that can be included in the motion vector candidate list, one or more motion vector candidates in the motion vector candidate list are combined to generate one or more combined motion vector candidates, and the generated combined motion vector candidates may be added to the motion vector candidate list. For example, at least one or more of a spatial motion vector candidate, a temporal motion vector candidate, and a zero motion vector candidate included in the motion vector candidate list are used to generate a combined motion vector candidate, and the generated combined motion vector candidate may be added to the motion vector candidate list.
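The list construction described above may be illustrated by the following Python sketch, which adds spatial and temporal candidates in derivation order and pads the list with zero motion vectors up to the maximum list size. The duplicate check and the maximum size of two are illustrative assumptions, and combined candidates are omitted for brevity.

```python
def build_mvp_list(spatial_candidates, temporal_candidates, max_num=2):
    """Assemble mvpListLX: add spatial and then temporal candidates in derivation
    order, skip duplicates, and pad with the zero motion vector up to the maximum
    list size (combined candidates are omitted in this sketch)."""
    mvp_list = []
    for cand in spatial_candidates + temporal_candidates:
        if cand not in mvp_list and len(mvp_list) < max_num:
            mvp_list.append(cand)
    while len(mvp_list) < max_num:
        mvp_list.append((0, 0))  # zero motion vector padding
    return mvp_list

print(build_mvp_list(spatial_candidates=[(3, -1)], temporal_candidates=[]))  # [(3, -1), (0, 0)]
```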
Alternatively, a combined motion vector candidate may be generated based on at least one of the encoding parameters, and the combined motion vector candidate generated based on at least one of the encoding parameters may be added to the motion vector candidate list.
Next, the step of selecting the predicted motion vector of the current block from the motion vector candidate list (S503, S604) will be described below.
Among the motion vector candidates included in the motion vector candidate list, the motion vector candidate indicated by the motion vector candidate index may be determined as a predicted motion vector of the current block.
The encoding apparatus may calculate a difference between a predicted motion vector of the current block and the motion vector, thereby generating a motion vector difference. The decoding apparatus may generate a motion vector of the current block by adding the predicted motion vector to the motion vector difference.
The steps of performing motion compensation (S504, S605) and entropy encoding/decoding the information associated with motion compensation (S505, S601) shown in fig. 5 and 6, and the steps of performing motion compensation (S703, S804) and entropy encoding/decoding (S704, S801) shown in fig. 7 and 8 will be described collectively later.
Next, each step shown in fig. 7 and 8 will be described in detail.
First, the step of deriving the merge candidates (S701, S802) will be described.
The merging candidates of the current block may include at least one of a spatial merging candidate, a temporal merging candidate, and an additional merging candidate. Here, the expression "deriving a spatial merge candidate" means a process of deriving a spatial merge candidate and adding the derived merge candidate to a merge candidate list.
Referring to fig. 9, a spatial merging candidate of a current block may be derived from neighboring blocks adjacent to the current block X. The neighboring blocks adjacent to the current block X may include at least one of a block B1 adjacent to an upper end of the current block, a block A1 adjacent to a left end of the current block, a block B0 adjacent to an upper right corner of the current block, a block B2 adjacent to an upper left corner of the current block, and a block A0 adjacent to a lower left corner of the current block.
In order to derive a spatial merging candidate of the current block, it is determined whether each neighboring block adjacent to the current block is available for derivation of the spatial merging candidate of the current block. Such determination may be made for the neighboring blocks in a predetermined priority order. For example, in fig. 9, the availability of the spatial merging candidates may be determined in the order of blocks A1, B0, A0, and B2. Spatial merge candidates determined to be available may be sequentially added to the merge candidate list of the current block in that order.
Fig. 11 is a diagram showing an example of a process of adding a spatial merge candidate to a merge candidate list.
Referring to fig. 11, four spatial merging candidates are derived from four neighboring blocks A1, B0, A0, and B2, and the derived spatial merging candidates may be sequentially added to the merging candidate list.
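The addition of spatial merge candidates described above may be illustrated by the following Python sketch, which visits the neighboring blocks in the priority order mentioned above and appends the motion information of each available neighbor to the merge candidate list. The data layout is an illustrative assumption.

```python
def add_spatial_merge_candidates(neighbors, merge_list, order=("A1", "B0", "A0", "B2")):
    """Visit the neighbouring blocks in priority order and append the motion
    information of each available, inter-coded neighbour to the merge candidate list."""
    for name in order:
        block = neighbors.get(name)
        if block is not None and block.get("inter_coded", False):
            merge_list.append({"mv": block["mv"], "ref_idx": block["ref_idx"]})
    return merge_list

neighbors = {"A1": {"inter_coded": True, "mv": (1, 0), "ref_idx": 0},
             "B0": {"inter_coded": True, "mv": (2, 2), "ref_idx": 1},
             "B2": {"inter_coded": False}}
print(add_spatial_merge_candidates(neighbors, []))
# [{'mv': (1, 0), 'ref_idx': 0}, {'mv': (2, 2), 'ref_idx': 1}]
```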
Alternatively, the spatial merging candidates may be derived based on at least one of the coding parameters.
Here, the motion information of the spatial merging candidate may include three or more pieces of motion information including L2 motion information and L3 motion information in addition to L0 motion information and L1 motion information. Here, there may be at least one reference picture list including, for example, L0, L1, L2, and L3.
Next, a method of deriving the temporal merging candidates of the current block will be described.
The temporal merging candidates of the current block may be derived from reconstructed blocks included in co-located pictures of the current picture. The co-located picture may be a picture encoded/decoded before the current picture and may be different from the current picture in temporal order.
The expression "deriving temporal merging candidates" means a process of deriving temporal merging candidates and adding the derived temporal merging candidates to a merging candidate list.
Referring to fig. 10, the temporal merging candidate of the current block may be derived from a block including a position spatially outside a block spatially co-located with the current block X in a co-located picture (also referred to as a col picture) of the current picture, or may be derived from a block including a position spatially inside the block spatially co-located with the current block X in the co-located picture of the current picture. The term "temporal merging candidate" may mean motion information of the co-located block. For example, the temporal merging candidate of the current block X may be derived from a block H spatially adjacent to the lower right corner of a block C located at the same position as the current block X, or from a block C3 including the middle position of the block C. The block H, the block C3, and the like used to derive the temporal merging candidates of the current block are referred to as co-located blocks (also referred to as col blocks).
When the temporal merging candidate of the current block can be derived from the block H including the position located outside the block C, the block H is set as the co-located block of the current block. In this case, the temporal merging candidate of the current block may be derived based on the motion information of the block H. In contrast, when the temporal merging candidate of the current block cannot be derived from the block H, the block C3 including the position located inside the block C may be set as the co-located block of the current block. In this case, the temporal merging candidate of the current block may be derived based on the motion information of the block C3. When no temporal merging candidate of the current block can be derived from either block H or block C3 (e.g., both block H and block C3 are intra-coded blocks), the temporal merging candidate of the current block may not be derived at all, or may be derived from blocks other than blocks H and C3.
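The fallback from block H to block C3 described above may be illustrated by the following Python sketch; the dictionary layout of the co-located picture data is an illustrative assumption.

```python
def derive_temporal_merge_candidate(col_picture_blocks):
    """Try block H (outside the lower-right corner of the co-located block C) first;
    if no motion information is available there, fall back to block C3 at the centre
    of block C. Returns None when neither block provides motion information."""
    for name in ("H", "C3"):
        block = col_picture_blocks.get(name)
        if block is not None and block.get("inter_coded", False):
            return {"mv": block["mv"], "source": name}
    return None

col_blocks = {"H": {"inter_coded": False},  # e.g. intra-coded, so unusable
              "C3": {"inter_coded": True, "mv": (-2, 1)}}
print(derive_temporal_merge_candidate(col_blocks))  # {'mv': (-2, 1), 'source': 'C3'}
```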
Alternatively, for example, a plurality of temporal merging candidates of the current block may be derived from a plurality of blocks included in the co-located picture. That is, a plurality of temporal candidates for the current block may be derived from block H, C3 and the like.
Fig. 12 is a diagram showing an example of a process of adding a temporal merge candidate to a merge candidate list.
Referring to fig. 12, when one temporal merging candidate is derived from a co-located block located at position H1, the derived temporal merging candidate may be added to the merging candidate list.
When the distance between the current picture including the current block and the reference picture of the current block is different from the distance between the co-located picture including the co-located block and the reference picture of the co-located block, the motion vector of the temporal merging candidate of the current block may be obtained by scaling the motion vector of the co-located block. Here, the scaling of the motion vector may be performed based on at least one of a distance between the current picture and a reference picture of the current block and a distance between the co-located picture and a reference picture of the co-located block. For example, the motion vector of the temporal merging candidate of the current block may be derived by scaling the motion vector of the co-located block according to a ratio of a distance between the current picture and a reference picture of the current block to a distance between the co-located picture and a reference picture of the co-located block.
In addition, at least one of a temporal merging candidate, a co-located picture, a co-located block, a prediction list utilization flag, and a reference picture index may be derived based on at least one of coding parameters of a current block, a neighboring block, or a co-located block.
The merge candidate list may be generated by generating at least one of a spatial merge candidate and a temporal merge candidate and sequentially adding the derived merge candidates to the merge candidate list in the derived order.
Next, a method of deriving additional merge candidates for the current block will be described.
The term "additional merge candidate" may mean at least one of a modified spatial merge candidate, a modified temporal merge candidate, a combined merge candidate, and a predetermined merge candidate having a predetermined motion information value. Here, the expression "deriving additional merge candidates" may mean a process of deriving additional merge candidates and adding the derived additional merge candidates to the merge candidate list.
The modified spatial merge candidate may mean a merge candidate obtained by modifying at least one of the motion information of the derived spatial merge candidate.
The modified temporal merging candidate may mean a modified merging candidate obtained by modifying at least one of the motion information of the derived temporal merging candidate.
The combined merge candidate may mean a merge candidate obtained by combining motion information of at least one of a spatial merge candidate, a temporal merge candidate, a modified spatial merge candidate, a modified temporal merge candidate, a combined merge candidate, and a predetermined merge candidate having a predetermined motion information value, wherein the spatial merge candidate, the temporal merge candidate, the modified spatial merge candidate, the modified temporal merge candidate, the combined merge candidate, and the predetermined merge candidate having the predetermined motion information value are all included in the merge candidate list.
Alternatively, the combined merge candidate may mean a merge candidate derived by combining motion information of at least one of the following merge candidates: spatial and temporal merging candidates not included in the merging candidate list but derived from a block from which at least one of the spatial and temporal merging candidates can be derived; modified spatial merging candidates and modified temporal merging candidates derived based on the spatial merging candidates and temporal merging candidates derived from the block; combining the merge candidates; and a predetermined merge candidate having a predetermined motion information value.
Alternatively, the combination merge candidate may be derived using motion information obtained by performing entropy decoding on the bit stream in the decoder. In this case, the motion information used to derive the combined merge candidates may be entropy encoded into a bitstream in the encoder.
The combined merge candidate may mean a combined bi-directional merge candidate. The combined bi-directional merge candidate is a merge candidate using bi-prediction, and the combined bi-directional merge candidate may be a merge candidate having L0 motion information and L1 motion information.
The merge candidate having the predetermined motion information value may be a zero merge candidate having a motion vector (0, 0). The merge candidate having the predetermined motion information value may be set such that the merge candidate has the same value in the encoding device and the decoding device.
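The additional merge candidates described above may be illustrated by the following Python sketch, which fills the merge candidate list with combined bi-directional candidates formed from existing L0 and L1 motion information and then with zero merge candidates. The pairing order and the maximum list size of five are illustrative assumptions.

```python
def pad_merge_list(merge_list, max_num=5):
    """Fill the merge candidate list up to its maximum size: first append combined
    bi-directional candidates formed by pairing the L0 motion of one existing entry
    with the L1 motion of another, then append zero merge candidates."""
    l0_entries = [c for c in merge_list if "mv_l0" in c]
    l1_entries = [c for c in merge_list if "mv_l1" in c]
    for c0 in l0_entries:
        for c1 in l1_entries:
            if len(merge_list) >= max_num:
                break
            merge_list.append({"mv_l0": c0["mv_l0"], "mv_l1": c1["mv_l1"]})
    while len(merge_list) < max_num:
        merge_list.append({"mv_l0": (0, 0), "mv_l1": (0, 0)})  # zero merge candidate
    return merge_list

print(pad_merge_list([{"mv_l0": (1, 2)}, {"mv_l1": (-3, 0)}]))
```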
At least one of a modified spatial merge candidate, a modified temporal merge candidate, a combined merge candidate, and a merge candidate having a predetermined motion information value may be derived or generated based on at least one of the encoding parameters of the current block, the neighboring block, or the co-located block. In addition, at least one of a modified spatial merge candidate, a modified temporal merge candidate, a combined merge candidate, and a merge candidate having a predetermined motion information value may be added to the merge candidate list based on at least one of coding parameters of the current block, the neighboring block, or the co-located block.
The size of the merge candidate list may be determined based on coding parameters of the current block, neighboring blocks, or co-located blocks, and may vary according to the coding parameters.
Next, a step of determining motion information of the current block using the generated merge candidate list (S702, S803) will be described.
The encoder may select a merge candidate to be used for motion compensation of the current block from the merge candidate list through motion estimation, and may encode a merge candidate index merge_idx indicating the selected merge candidate into the bitstream.
In order to generate a predicted block of a current block, an encoder may select a merge candidate from a merge candidate list by using a merge candidate index and determine motion information of the current block. The encoder may then perform motion compensation based on the determined motion information, thereby generating a prediction block of the current block.
The decoder may decode the merge candidate index in the received bitstream and determine a merge candidate indicated by the merge candidate index included in the merge candidate list. The determined merge candidate may be determined as motion information of the current block. The determined motion information is used for motion compensation of the current block. Here, the term "motion compensation" may have the same meaning as inter prediction.
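The decoder-side selection described above may be illustrated by the following minimal Python sketch, in which the entropy-decoded merge candidate index simply selects one entry of the merge candidate list as the motion information of the current block; the data layout is illustrative.

```python
def decode_merge_motion_info(merge_idx, merge_candidate_list):
    """The entropy-decoded merge index selects one candidate; its motion information
    becomes the motion information of the current block."""
    return merge_candidate_list[merge_idx]

merge_list = [{"mv": (1, 0), "ref_idx": 0}, {"mv": (0, 0), "ref_idx": 0}]
print(decode_merge_motion_info(0, merge_list))  # {'mv': (1, 0), 'ref_idx': 0}
```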
Next, steps of performing motion compensation using motion vectors or motion information (S504, S605, S703, S804) will be described.
The encoding apparatus and the decoding apparatus may calculate a motion vector of the current block by using the predicted motion vector and the motion vector difference. After calculating the motion vector, the encoding apparatus and the decoding apparatus may perform inter prediction or motion compensation using the calculated motion vector (S504, S605).
The encoding apparatus and the decoding apparatus may perform inter prediction or motion compensation using the determined motion information (S703, S804). Here, the current block may have motion information of the determined merge candidate.
The current block may have one (minimum) to N (maximum) motion vectors according to the prediction direction of the current block. The one to N motion vectors may be used to generate one (minimum) to N (maximum) prediction blocks, and a final prediction block may be selected among the generated prediction blocks.
For example, when the current block has one motion vector, a prediction block generated using the motion vector (or motion information) is determined as a final prediction block of the current block.
Further, when the current block has a plurality of motion vectors (or pieces of motion information), a plurality of prediction blocks are generated using the plurality of motion vectors (or pieces of motion information), and a final prediction block of the current block is determined based on a weighted sum of the plurality of prediction blocks. A plurality of reference pictures respectively including a plurality of prediction blocks respectively indicated by a plurality of motion vectors (or a plurality of pieces of motion information) may be listed in different reference picture lists or in one reference picture list.
For example, a plurality of prediction blocks of the current block may be generated based on at least one of a spatial motion vector candidate, a temporal motion vector candidate, a motion vector having a predetermined value, and a combined motion vector candidate, and then a final prediction block of the current block may be determined based on a weighted sum of the plurality of prediction blocks.
Alternatively, for example, a plurality of prediction blocks of the current block may be generated based on the motion vector candidates indicated by the preset motion vector candidate index, and then a final prediction block of the current block may be determined based on a weighted sum of the plurality of prediction blocks. In addition, a plurality of prediction blocks may be generated based on motion vector candidates indicated by indexes within a predetermined motion vector candidate index range, and then a final prediction block of the current block may be determined based on a weighted sum of the plurality of prediction blocks.
The weight factor for each prediction block may be equal to 1/N (where N is the number of prediction blocks generated). For example, when two prediction blocks are generated, the weight factor for each prediction block is 1/2. Similarly, when three prediction blocks are generated, the weight factor for each prediction block is 1/3. When four prediction blocks are generated, the weight factor for each prediction block may be 1/4. Alternatively, the final prediction block of the current block may be determined in such a manner that different weight factors are applied to the respective prediction blocks.
The weight factors for the prediction blocks may not be fixed, but variable. The weight factors for the prediction blocks may not be equal, but different from each other. For example, when two prediction blocks are generated, the weight factors for the two prediction blocks may be equal, such as (1/2, 1/2), or may be unequal, such as (1/3, 2/3), (1/4, 3/4), (2/5, 3/5), or (3/8, 5/8). The weight factor may be a positive real value or a negative real value. That is, the value of the weight factor may include a negative real value, such as (-1/2, 3/2), (-1/3, 4/3), or (-1/4, 5/4).
To apply the variable weight factor, one or more pieces of weight factor information for the current block may be signaled through the bitstream. The weight factor information may be signaled on a prediction block-by-prediction block basis, or may be signaled on a reference picture-by-reference picture basis. Alternatively, multiple prediction blocks may share one weight factor.
The encoding apparatus and the decoding apparatus may determine whether to use the predicted motion vector (or the predicted motion information) by using the prediction block list utilization flag. For example, when the prediction block list utilization flag has a first value of one (1) for each reference picture list, the encoding apparatus and the decoding apparatus may perform inter prediction or motion compensation on the current block using the predicted motion vector of the current block. However, when the prediction block list utilization flag has a second value of zero (0), the encoding apparatus and the decoding apparatus may perform inter prediction or motion compensation on the current block without using the predicted motion vector of the current block. The first and second values of the prediction block list utilization flag may be inversely set to 0 and 1, respectively. Expression 1 to Expression 3 are examples of a method of generating a final prediction block of the current block when the inter prediction indicator of the current block is PRED_BI, PRED_TRI, or PRED_QUAD and when the prediction direction of each reference picture list is unidirectional.

[Expression 1]

P_BI=(WF_L0*P_L0+OFFSET_L0+WF_L1*P_L1+OFFSET_L1+RF)>>1

[Expression 2]

P_TRI=(WF_L0*P_L0+OFFSET_L0+WF_L1*P_L1+OFFSET_L1+WF_L2*P_L2+OFFSET_L2+RF)/3

[Expression 3]

P_QUAD=(WF_L0*P_L0+OFFSET_L0+WF_L1*P_L1+OFFSET_L1+WF_L2*P_L2+OFFSET_L2+WF_L3*P_L3+OFFSET_L3+RF)>>2

In Expression 1 to Expression 3, each of P_BI, P_TRI, and P_QUAD represents the final prediction block of the current block, and LX (X=0, 1, 2, 3) represents a reference picture list. WF_LX represents the weight factor of the prediction block generated using the LX reference picture list. OFFSET_LX represents an offset value for the prediction block generated using the LX reference picture list. P_LX denotes a prediction block of the current block generated using a motion vector (or motion information) of the LX reference picture list. RF means a rounding factor and may be set to 0, a positive integer, or a negative integer. The LX reference picture list may include at least one of the following reference pictures: long-term reference pictures, reference pictures that have not undergone a deblocking filter, reference pictures that have not undergone a sample adaptive offset, reference pictures that have not undergone an adaptive loop filter, reference pictures that have undergone only a deblocking filter and a sample adaptive offset, reference pictures that have undergone only a deblocking filter and an adaptive loop filter, reference pictures that have undergone a sample adaptive offset and an adaptive loop filter, and reference pictures that have undergone all of a deblocking filter, a sample adaptive offset, and an adaptive loop filter. In this case, the LX reference picture list may be at least any one of an L2 reference picture list and an L3 reference picture list.
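Expression 1 may be illustrated by the following Python sketch of a bi-directional weighted sum. Floating-point weights are used here for readability, whereas a codec implementation would use fixed-point weights; the array contents are illustrative.

```python
import numpy as np

def weighted_bi_prediction(p_l0, p_l1, wf_l0=1.0, wf_l1=1.0,
                           offset_l0=0, offset_l1=0, rf=0):
    """Expression 1: P_BI = (WF_L0*P_L0 + OFFSET_L0 + WF_L1*P_L1 + OFFSET_L1 + RF) >> 1."""
    acc = wf_l0 * p_l0 + offset_l0 + wf_l1 * p_l1 + offset_l1 + rf
    return acc.astype(np.int64) >> 1

p0 = np.array([[100, 120]], dtype=np.int64)  # prediction block from the L0 reference picture
p1 = np.array([[110, 118]], dtype=np.int64)  # prediction block from the L1 reference picture
print(weighted_bi_prediction(p0, p1))        # [[105 119]]
```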
Even if there are multiple prediction directions for the predetermined reference picture list, a final prediction block of the current block may be obtained based on a weighted sum of the prediction blocks. In this case, the weight factors of the plurality of prediction blocks derived using one reference picture list may be equal or may be different from each other.
At least one of the weight factor WF_LX and the offset OFFSET_LX of the plurality of prediction blocks may be an encoding parameter to be entropy encoded/decoded. Alternatively, the weight factors and offsets may be derived from previously encoded/decoded neighboring blocks adjacent to the current block, for example. Here, the neighboring blocks adjacent to the current block may include at least one of a block used for deriving a spatial motion vector candidate of the current block and a block used for deriving a temporal motion vector candidate of the current block.
Further alternatively, the weight factor and offset may be determined based on, for example, the display order (picture order count (POC)) of the current picture and the POC of each reference picture. In this case, as the distance between the current picture and the reference picture increases, the value of the weight factor or offset may decrease. That is, when the current picture and the reference picture are close to each other, a larger value may be set as the weight factor or offset. For example, when the distance between the POC of the current picture and the POC of the L0 reference picture is 2, the value of the weight factor applied to the prediction block generated using the L0 reference picture may be set to 1/3. Meanwhile, when the difference between the POC of the current picture and the POC of the L0 reference picture is 1, the value of the weight factor applied to the prediction block generated using the L0 reference picture may be set to 2/3. As described above, the weight factor or offset may be inversely proportional to a difference between a display order (POC) of the current picture and a display order (POC) of the reference picture. Alternatively, the weight factor or offset may be proportional to a difference between a display order (POC) of the current picture and a display order (POC) of the reference picture.
Optionally, at least one of the weight factors and the offsets may be entropy encoded/decoded based on at least one encoding parameter, for example. In addition, a weighted sum of the prediction blocks may be calculated based on the at least one coding parameter.
The weighted sum of the plurality of prediction blocks may be applied to only a partial region of the prediction block. The partial region may be a boundary region adjacent to a boundary of each prediction block. In order to apply the weighted sum to only the partial region as described above, the weighted sum may be calculated sub-block by sub-block in each prediction block.
In a block having a block size indicated by the region information, inter prediction or motion compensation may be performed for a sub-block smaller than the block by using the same prediction block or the same final prediction block.
In a block having a block depth indicated by the region information, inter prediction or motion compensation may be performed for a sub-block having a block depth deeper than that of the block by using the same prediction block or the same final prediction block.
In addition, when a weighted sum of the prediction blocks is calculated by motion vector prediction, the weighted sum may be calculated using at least one of the motion vector candidates included in the motion vector candidate list, and the calculation result may be used as a final prediction block of the current block.
For example, a prediction block may be generated using only spatial motion vector candidates, a weighted sum of the prediction blocks may be calculated, and the calculated weighted sum may be used as a final prediction block of the current block.
For example, a prediction block may be generated using a spatial motion vector candidate and a temporal motion vector candidate, a weighted sum of the prediction blocks may be calculated, and the calculated weighted sum may be used as a final prediction block of the current block.
For example, a prediction block may be generated using only combined motion vector candidates, a weighted sum of the prediction blocks may be calculated, and the calculated weighted sum may be used as a final prediction block of the current block.
For example, a prediction block may be generated using only motion vector candidates indicated by a specific index, a weighted sum of the prediction blocks may be calculated, and the calculated weighted sum may be used as a final prediction block of the current block.
For example, a prediction block may be generated using only motion vector candidates indicated by indexes within a predetermined index range, a weighted sum of the prediction blocks may be calculated, and the calculated weighted sum may be used as a final prediction block of the current block.
When the weighted sum of the prediction blocks is calculated using the merge mode, the weighted sum may be calculated using at least one merge candidate among the merge candidates included in the merge candidate list, and the calculation result may be used as the final prediction block of the current block.
For example, a prediction block may be generated using only spatial merging candidates, a weighted sum of the prediction blocks may be calculated, and the calculated weighted sum may be used as a final prediction block of the current block.
For example, a prediction block may be generated using the spatial merge candidate and the temporal merge candidate, a weighted sum of the prediction blocks may be calculated, and the calculated weighted sum may be used as a final prediction block of the current block.
For example, a prediction block may be generated using only the combined merge candidates, a weighted sum of the prediction blocks may be calculated, and the calculated weighted sum may be used as a final prediction block of the current block.
For example, a prediction block may be generated using only the merging candidates indicated by a specific index, a weighted sum of the prediction blocks may be calculated, and the calculated weighted sum may be used as a final prediction block of the current block.
For example, a prediction block may be generated using only merging candidates indicated by indexes within a predetermined index range, a weighted sum of the prediction blocks may be calculated, and the calculated weighted sum may be used as a final prediction block of the current block.
In the encoder and decoder, motion compensation may be performed using motion vectors or motion information of the current block. At this time, at least one prediction block may be used to determine a final prediction block as a result of motion compensation. Here, the current block may mean at least one of a current coded block and a current predicted block.
The final prediction block of the current block may be generated by performing overlapped block motion compensation on a boundary region of the current block.
The boundary region of the current block may be a region located within the current block and adjacent to a boundary between the current block and a neighboring block of the current block. The boundary region of the current block may include at least one of an upper boundary region, a left boundary region, a lower boundary region, a right boundary region, an upper right corner region, a lower right corner region, an upper left corner region, and a lower left corner region. The boundary region of the current block may be a region corresponding to a portion of the prediction block of the current block.
Overlapped block motion compensation may mean a process of performing motion compensation by calculating a weighted sum of a prediction block corresponding to a boundary region of a current block and a prediction block generated using motion information of an encoded/decoded block adjacent to the current block.
The calculation of the weighted sum may be performed sub-block by sub-block by dividing the current block into a plurality of sub-blocks. That is, motion compensation of the current block may be performed on a sub-block-by-sub-block basis using motion information of an encoded/decoded sub-block adjacent to the current block. A sub-block may mean a lower block of the current block.
In addition, in calculating the weighted sum, a first prediction block generated for each sub-block of the current block using motion information of the current block and a second prediction block generated using motion information of a neighboring sub-block spatially adjacent to the current block may be used. In this case, the expression "using motion information" means "deriving motion information". The first prediction block may mean a prediction block generated by using motion information of an encoding/decoding target sub-block within the current block. The second prediction block may be a prediction block generated by using motion information of a neighboring sub-block spatially adjacent to the encoding/decoding target sub-block within the current block.
A weighted sum of the first prediction block and the second prediction block may be used to generate a final prediction block for the current block. That is, overlapped block motion compensation generates the final prediction block of the current block by using both the motion information of the current block and the motion information of another block.
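A minimal Python sketch of the blending step described above follows; it is illustrative only, the weight values are example numbers, and both prediction blocks are assumed to cover the same sub-block.

    # Illustrative sketch of overlapped block motion compensation for one sub-block:
    # the final samples are a weighted sum of the first prediction block (generated
    # with the current block's motion information) and a second prediction block
    # (generated with a neighboring block's motion information).
    def obmc_blend(first_pred, second_pred, w_first=0.75, w_second=0.25):
        h, w = len(first_pred), len(first_pred[0])
        return [[w_first * first_pred[y][x] + w_second * second_pred[y][x]
                 for x in range(w)] for y in range(h)]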
In addition, when at least one of an Advanced Motion Vector Prediction (AMVP) mode, a merge mode, an affine motion compensation mode, a decoder-side motion vector derivation mode, an adaptive motion vector resolution mode, a local illumination compensation mode, and a bidirectional optical flow mode is used, a current block may be divided into a plurality of sub-blocks, and overlapped block motion compensation may be performed on a sub-block-by-sub-block basis.
When the merge mode is used for motion compensation, overlapped block motion compensation may be performed on at least one of an improved temporal motion vector predictor (ATMVP) candidate and a spatial-temporal motion vector predictor (STMVP) candidate.
Details of the overlapped block motion compensation will be described later with reference to fig. 13 to 24.
Next, a process of performing entropy encoding/decoding on information associated with motion compensation (S505, S601, S704, S801) will be described.
The encoding apparatus may entropy encode information associated with the motion compensation into a bitstream, and the decoder may decode information associated with the motion compensation included in the bitstream. The information associated with motion compensation, which is the target of entropy encoding or entropy decoding, may include at least one of: inter prediction indicators inter_pred_idc, reference picture indexes ref_idx_l0, ref_idx_l1, ref_idx_l2, and ref_idx_l3, motion vector candidate indexes mvp_l0_idx, mvp_l1_idx, mvp_l2_idx, and mvp_l3_idx, motion vector differences, skip mode used/unused information cu_skip_flag, merge mode used/unused information merge_flag, merge index information merge_index, weight factors wf_l0, wf_l1, wf_l2, and wf_l3, and offset values offset_l0, offset_l1, offset_l2, and offset_l3.
The inter prediction indicator may mean a prediction direction of inter prediction when the current block is encoded/decoded through inter prediction, the number of prediction directions, or both the prediction direction and the number of prediction directions of inter prediction. For example, the inter prediction indicator may indicate unidirectional prediction or multi-directional prediction (such as bi-directional prediction, tri-directional prediction, and tetra-directional prediction). The inter prediction indicator may indicate the number of reference pictures used to generate a prediction block of the current block. Alternatively, one reference picture may be used for prediction in multiple directions. In this case, M reference pictures are used to perform prediction in N directions (where N > M). The inter prediction indicator may also mean the number of prediction blocks used for inter prediction or motion compensation of the current block.
The reference picture indicator may indicate one direction pred_lx, two directions pred_bi, three directions pred_tri, four directions pred_quad, or more directions according to the number of prediction directions of the current block.
The prediction list utilization flag of a specific reference picture list indicates whether to generate a prediction block using the reference picture list.
For example, when the prediction list utilization flag of a particular reference picture list has a first value of one (1), this means that the reference picture list is used to generate a prediction block. When the prediction list utilization flag has a second value of zero (0), this means that the reference picture list is not used to generate the prediction block. Here, the first value and the second value of the prediction list utilization flag may be inversely set to 0 and 1, respectively.
That is, when the prediction list utilization flag of a specific reference picture list has a first value, a prediction block of a current block may be generated using motion information corresponding to the reference picture list.
The reference picture index may indicate a particular reference picture that exists in the reference picture list and is referenced by the current block. For each reference picture list, one or more reference picture indices may be entropy encoded/decoded. The current block may be motion compensated using one or more reference picture indices.
The motion vector candidate index indicates a motion vector candidate of the current block among motion vector candidates included in a motion vector candidate list prepared for each reference picture list or each reference picture index. At least one or more motion vector candidate indexes may be entropy encoded/entropy decoded for each motion vector candidate list. The current block may be motion compensated using at least one or more motion vector candidate indexes.
The motion vector difference represents the difference between the current motion vector and the predicted motion vector. For each motion vector candidate list generated for each reference picture list or each reference picture index for the current block, one or more motion vector differences may be entropy encoded/entropy decoded. The current block may be motion compensated using one or more motion vector differences.
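The relationship between the motion vector, the predicted motion vector, and the motion vector difference can be illustrated by the following sketch (hypothetical names, component-wise arithmetic only):

    # Illustrative sketch: mvd = mv - mvp at the encoder, mv = mvp + mvd at the decoder.
    def motion_vector_difference(mv, mvp):
        return (mv[0] - mvp[0], mv[1] - mvp[1])

    def reconstruct_motion_vector(mvp, mvd):
        return (mvp[0] + mvd[0], mvp[1] + mvd[1])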
Regarding the skip mode used/unused information cu_skip_flag, when the skip mode used/unused information cu_skip_flag has a first value of one (1), the skip mode may be used. In contrast, when the skip mode used/unused information cu_skip_flag has a second value of zero (0), the skip mode may not be used. Motion compensation may be performed on the current block using the skip mode according to the skip mode used/unused information.
Regarding the merge mode use/non-use information merge_flag, when the merge mode use/non-use information merge_flag has a first value of one (1), the merge mode may be used. In contrast, when the merge mode use/non-use information merge_flag has a second value of zero (0), the merge mode may not be used. Motion compensation may be performed on the current block using the merge mode according to the merge mode use/unused information.
The merge index information merge_index may mean information indicating a merge candidate within the merge candidate list.
Alternatively, the merge index information may mean information about the merge index.
In addition, the merge index information may indicate a reconstructed block for deriving a merge candidate among reconstructed blocks spatially/temporally adjacent to the current block.
The merge index information may indicate one or more pieces of motion information that the merge candidate has. For example, when the merge index information has a first value of zero (0), the merge index information may indicate a first merge candidate listed as a first entry in the merge candidate list; when the merge index information has a second value of one (1), the merge index information may indicate a second merge candidate listed as a second entry in the merge candidate list; when the merge index information has a third value of two (2), the merge index information indicates a third merge candidate listed as a third entry in the merge candidate list. Similarly, when the merge index information has values from the fourth value to the nth value, the merge index information may indicate merge candidates listed in the merge candidate list at positions according to the order of the values. Here, N may be 0 or a positive integer.
Motion compensation may be performed on the current block using a merge mode based on the merge index information.
When two or more prediction blocks are generated during motion compensation of the current block, a final prediction block of the current block may be determined based on a weighted sum of the prediction blocks. When calculating the weighted sum, a weight factor, an offset, or both a weight factor and an offset may be applied to each prediction block. The weighted sum factors (e.g., weight factors and offsets) used to calculate the weighted sum may be entropy encoded/entropy decoded in a number corresponding to at least one of the following: reference picture list, reference picture, motion vector candidate index, motion vector difference, motion vector, skip mode used/unused information, and merge index information. In addition, the weighted sum factor for each prediction block may be entropy encoded/entropy decoded based on the inter prediction indicator. The weighted sum factor may include at least one of a weight factor and an offset.
The information associated with motion compensation may be entropy-encoded/entropy-decoded on a block-by-block basis, or may be entropy-encoded/entropy-decoded at a higher level. For example, information associated with motion compensation may be entropy encoded/entropy decoded on a block-by-block (e.g., CTU-by-CTU, CU-by-CU, or PU-by-PU) basis. Optionally, the information associated with motion compensation may be entropy encoded/entropy decoded at a higher level, such as in a video parameter set, a sequence parameter set, a picture parameter set, an adaptive parameter set, or a slice header.
The information associated with motion compensation may be entropy encoded/entropy decoded based on a motion compensation information difference, wherein the motion compensation information difference indicates a difference between the information associated with motion compensation and a predicted value of the information associated with motion compensation.
Information associated with motion compensation of an encoded/decoded block adjacent to the current block may be used as information associated with motion compensation of the current block instead of entropy encoding/decoding the information associated with motion compensation of the current block.
At least one piece of information associated with motion compensation may be derived based on at least one of the encoding parameters.
The bitstream may be decoded based on at least one of the encoding parameters to generate at least one piece of information associated with motion compensation. Instead, at least one piece of information associated with motion compensation may be entropy encoded to the bitstream based on at least one of the encoding parameters.
The at least one piece of information associated with motion compensation may include at least one of a motion vector, a motion vector candidate index, a motion vector difference, a motion vector predictor, skip mode used/unused information skip_flag, merge mode used/unused information merge_flag, merge index information merge_index, motion vector resolution information, overlapped block motion compensation information, local illumination compensation information, affine motion compensation information, decoder side motion vector derivation information, and bidirectional optical flow information. Here, the decoder-side motion vector derivation may mean pattern-matched motion vector derivation.
The motion vector resolution information may be information indicating which specific resolution is used for at least one of the motion vector and the motion vector difference. Here, resolution may mean accuracy. The specific resolution may be set to at least any one of a 16-pixel (16-pel) unit, an 8-pixel (8-pel) unit, a 4-pixel (4-pel) unit, an integer-pixel (integer-pel) unit, a 1/2-pixel (1/2-pel) unit, a 1/4-pixel (1/4-pel) unit, a 1/8-pixel (1/8-pel) unit, a 1/16-pixel (1/16-pel) unit, a 1/32-pixel (1/32-pel) unit, and a 1/64-pixel (1/64-pel) unit.
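As an illustration only, the following sketch rounds a motion vector to a signaled resolution; the assumption that motion vectors are stored internally in 1/16-pel units is not taken from the text, and the names are hypothetical.

    # Illustrative sketch: round a motion vector to the signaled resolution.
    # resolution_in_pel may be, e.g., 4, 1, 1/4, or 1/16 (pel units).
    def round_to_resolution(mv, resolution_in_pel, storage_pel=1.0 / 16):
        step = resolution_in_pel / storage_pel  # resolution expressed in storage units
        return tuple(round(c / step) * step for c in mv)

    # Example: (13, -7) in 1/16-pel units rounded to 1/4-pel resolution -> (12.0, -8.0)
    print(round_to_resolution((13, -7), 1 / 4))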
The overlapped block motion compensation information may be information indicating whether, during motion compensation of the current block, a motion vector of a neighboring block spatially adjacent to the current block is to be additionally used to calculate a weighted sum of prediction blocks of the current block.
The local illumination compensation information may be information indicating whether any one of a weight factor and an offset is applied when generating a prediction block of the current block. Here, at least one of the weight factor and the offset may be a value calculated based on the reference block.
The affine motion compensation information may be information indicating whether the affine motion model is to be used for motion compensation of the current block. Here, the affine motion model may be a model in which one block is divided into a plurality of sub-blocks using a plurality of parameters and motion vectors of the sub-blocks are calculated from representative motion vectors.
The decoder-side motion vector derivation information may be information indicating whether a motion vector required for motion compensation is derived by the decoder and then used in the decoder. Depending on the decoder-side motion vector derivation information, information associated with the motion vector may not be entropy encoded/decoded. When the decoder-side motion vector derivation information indicates that the motion vector is derived by the decoder and then used in the decoder, information associated with the merge mode may be entropy encoded/entropy decoded. That is, the decoder-side motion vector derivation information may indicate whether a merge mode is used in the decoder.
The bidirectional optical flow information may be information indicating whether a motion vector is modified pixel by pixel or sub-block by sub-block and whether the modified motion vector is then used for motion compensation. According to the bi-directional optical flow information, the motion vectors may not be entropy encoded/entropy decoded on a pixel-by-pixel or sub-block-by-sub-block basis. The modification of the motion vector means converting the value of the block-based motion vector into the value of the pixel-based motion vector or the value of the sub-block-based motion vector.
The current block may be motion compensated based on at least one piece of information associated with motion compensation, and the at least one piece of information associated with motion compensation may be entropy encoded/entropy decoded.
When information associated with motion compensation is entropy encoded/entropy decoded, binarization methods such as a truncated Rice binarization method, a K-th order exponential Golomb binarization method, a finite K-th order exponential Golomb binarization method, a fixed length binarization method, a unary binarization method, and a truncated unary binarization method may be used.
When information associated with motion compensation is entropy encoded/entropy decoded, a context model may be determined based on at least one of the following information: information associated with motion information of a neighboring block adjacent to the current block or region information of the neighboring block; previously encoded/decoded information associated with motion compensation or previously encoded/decoded region information; information on the depth of the current block; information about the size of the current block.
Alternatively, when the information associated with the motion compensation is entropy-encoded/entropy-decoded, the entropy-encoding/entropy-decoding may be performed by using at least one of the following information as a predicted value of the information associated with the motion compensation of the current block: information associated with motion compensation of neighboring blocks, previously encoded/decoded information associated with motion compensation, information about the depth of the current block, and information about the size of the current block.
Details of the overlapped block motion compensation will be described below with reference to fig. 13 to 24.
Fig. 13 is a diagram illustrating an example of performing overlapped block motion compensation on a sub-block-by-sub-block basis.
Referring to fig. 13, the shaded blocks are regions on which overlapped block motion compensation is to be performed. A shaded block may be a sub-block of the current block located at a boundary or a sub-block inside the current block. The area marked by the thick solid line may be the current block.
The arrow indicates that the motion information of the neighboring sub-block is used for motion compensation of the current sub-block. Here, the region in which the arrow tail is located may mean (1) a neighboring sub-block adjacent to the current block or (2) a neighboring sub-block adjacent to the current sub-block within the current block. The region in which the arrow head is located may mean a current sub-block within the current block.
For a shaded block, a weighted sum of the first prediction block and the second prediction block may be calculated. The motion information of the current sub-block within the current block may be used as motion information for generating the first prediction block. At least one of motion information of a neighboring sub-block adjacent to the current block and motion information of a neighboring sub-block that is adjacent to the current sub-block and included in the current block may be used as motion information for generating the second prediction block.
In addition, in order to improve encoding efficiency, the motion information for generating the second prediction block may include motion information of at least one of an upper block, a left block, a lower block, a right block, an upper right block, a lower right block, an upper left block, and a lower left block of the current sub-block within the current block. The neighboring sub-blocks that may be used to generate the second prediction block may be determined according to the location of the current sub-block. For example, when the current sub-block is located at the upper boundary of the current block, at least one neighboring sub-block among neighboring sub-blocks located at the upper side, upper right side, and upper left side of the current sub-block may be used. When the current sub-block is located at the left boundary of the current block, at least one neighboring sub-block among neighboring sub-blocks located at the left side, the upper left side, and the lower left side of the current sub-block may be used.
Here, blocks located at the upper side, left side, lower side, right side, upper right side, lower right side, upper left side, and lower left side of the current sub-block may be referred to as an upper neighboring sub-block, a left neighboring sub-block, a lower neighboring sub-block, a right neighboring sub-block, an upper right neighboring sub-block, a lower right neighboring sub-block, an upper left neighboring sub-block, and a lower left neighboring sub-block, respectively.
Meanwhile, in order to reduce computational complexity, motion information for generating the second prediction block may vary according to the size of a neighboring sub-block adjacent to the current block or a motion vector of a neighboring sub-block adjacent to the current sub-block within the current block.
For example, when the neighboring sub-block is a bi-predicted sub-block, the magnitudes of the L0-direction motion vector and the L1-direction motion vector are compared, and only the motion information of the direction having the larger magnitude may be used to generate the second prediction block.
Alternatively, for example, the sum of the absolute values of the x-component and the y-component of the L0-direction motion vector and the sum of the absolute values of the x-component and the y-component of the L1-direction motion vector are calculated. Then, only a motion vector for which the calculated sum is equal to or greater than a predetermined value may be used to generate the second prediction block. Here, the predetermined value may be 0 or a positive integer. The predetermined value may be a value determined based on information signaled from the encoder to the decoder. Alternatively, the predetermined value may not be signaled, but may be a value that is set identically in the encoder and decoder.
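The two complexity-reduction rules above can be sketched as follows; the threshold value and all names are hypothetical, and a real encoder/decoder may combine the rules differently.

    # Illustrative sketch: for a bi-predicted neighboring sub-block, keep only the
    # prediction direction with the larger motion vector magnitude, and only if
    # that magnitude reaches a predetermined value.
    def select_neighbor_motion(mv_l0, mv_l1, threshold=0):
        mag_l0 = abs(mv_l0[0]) + abs(mv_l0[1])
        mag_l1 = abs(mv_l1[0]) + abs(mv_l1[1])
        direction, mv, mag = ("L0", mv_l0, mag_l0) if mag_l0 >= mag_l1 else ("L1", mv_l1, mag_l1)
        return (direction, mv) if mag >= threshold else None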
In addition, in order to reduce computational complexity, the motion information used to generate the second prediction block may vary according to the size and direction of the motion vector of the current sub-block.
For example, the absolute values of the x-component and y-component of the motion vector of the current sub-block may be compared. When the absolute value of the x-component is larger, motion information of at least one of the left and right sub-blocks of the current sub-block may be used to generate the second prediction block.
Alternatively, for example, the absolute values of the x-component and y-component of the motion vector of the current sub-block may be compared. When the absolute value of the y-component is larger, motion information of at least one of an upper sub-block and a lower sub-block of the current sub-block may be used to generate the second prediction block.
Alternatively, for example, when an absolute value of an x-component of a motion vector of the current sub-block is equal to or greater than a predetermined value, motion information of at least one of a left sub-block and a right sub-block of the current sub-block may be used to generate the second prediction block. Here, the predetermined value may be zero (0) or a positive integer. The predetermined value may be determined based on information signaled from the encoder to the decoder, or may be set identically in the encoder and decoder.
Further alternatively, for example, when an absolute value of a y component of a motion vector of the current sub-block is equal to or greater than a predetermined value, motion information of at least one of an upper sub-block and a lower sub-block of the current sub-block may be used to generate the second prediction block. Here, the predetermined value may be zero (0) or a positive integer. The predetermined value may be determined based on information signaled from the encoder to the decoder, or may be set identically in the encoder and decoder.
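The four examples above can be summarized by the following sketch, which picks the neighboring sub-blocks contributing to the second prediction block from the dominant motion vector component; the threshold and the string labels are hypothetical.

    # Illustrative sketch: choose contributing neighbors from the dominant
    # component of the current sub-block's motion vector.
    def neighbors_for_second_prediction(mv, threshold=0):
        abs_x, abs_y = abs(mv[0]), abs(mv[1])
        if abs_x >= abs_y and abs_x >= threshold:
            return ["left", "right"]
        if abs_y > abs_x and abs_y >= threshold:
            return ["upper", "lower"]
        return []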
Here, the sub-block may have a size of N×M, where N and M are positive integers. N and M may be equal or may be unequal. For example, the size of the sub-block may be 4×4 or 8×8. The information of the size of the sub-block may be entropy encoded/entropy decoded at the sequence unit level.
The size of the sub-block may be determined based on the size of the current block. For example, when the size of the current block is K samples or less, the size of the sub-block may be 4×4. Meanwhile, when the size of the current block is greater than K samples, the size of the sub-block may be 8×8. Here, K is a positive integer, for example 256.
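A one-line sketch of this size rule, with K = 256 samples as in the example above (illustrative only):

    # Illustrative sketch: sub-block size chosen from the current block size.
    def sub_block_size(block_width, block_height, k=256):
        return (4, 4) if block_width * block_height <= k else (8, 8)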
Here, the information of the size of the sub-block may be entropy encoded/entropy decoded in units of at least any one of a sequence, a picture, a slice, a parallel block, a CTU, a CU, and a PU. In addition, the size of the sub-block may be a predetermined value preset in the encoder and the decoder.
The sub-blocks may have a square shape or a rectangular shape. For example, when the current block has a square shape or a rectangular shape, the sub-block may have a square shape.
For example, when the current block has a rectangular shape, the sub-block may have a rectangular shape.
Here, the information of the shape of the sub-block may be entropy encoded/entropy decoded in units of at least one of a sequence, a picture, a slice, a parallel block, a CTU, a CU, and a PU. In addition, the shape of the sub-block may be a predetermined shape preset in the encoder and the decoder.
Fig. 14 is a diagram illustrating an example of performing overlapped block motion compensation using motion information of sub-blocks of a co-located block. In order to improve coding efficiency, motion information of a co-located sub-block spatially co-located with the current block in a co-located picture or a reference picture may be used to generate the second prediction block.
Referring to fig. 14, motion information of a sub-block temporally adjacent to a current block within a co-located block may be used for overlapped block motion compensation of the current sub-block. The area where the tail of the arrow is located may be a sub-block within the co-located block. The region in which the arrow head is located may be a current sub-block within the current block.
In addition, motion information of at least one of a co-located sub-block in the co-located picture, a neighboring sub-block spatially adjacent to the current block, and a neighboring sub-block spatially adjacent to the current sub-block within the current block may be used to generate the second prediction block.
Fig. 15 is a diagram illustrating an example of performing overlapped block motion compensation using motion information of blocks adjacent to a boundary of a reference block. In order to improve coding efficiency, a reference block in a reference picture may be identified by using at least one of a motion vector of a current block and a reference picture index, and motion information of neighboring blocks adjacent to the identified boundary of the reference block may be used to generate a second prediction block. Here, the neighboring block may include an encoded/decoded block adjacent to a sub-block of the reference block located at the lower boundary or the right boundary.
Referring to fig. 15, motion information of an encoded/decoded block adjacent to a lower boundary or a right boundary of a reference block may be used for overlapped block motion compensation of a current sub-block.
In addition, at least one of motion information of an encoded/decoded block adjacent to a lower boundary or a right boundary of the reference block, motion information of a neighboring sub-block spatially adjacent to the current block, and motion information of a neighboring sub-block spatially adjacent to the current sub-block within the current block may be used to generate the second prediction block.
In order to improve coding efficiency, motion information of at least one of a plurality of merge candidates included in a merge candidate list may be used to generate a second prediction block. Here, the merge candidate list may be a list used in a merge mode among a plurality of inter prediction modes.
For example, spatial merge candidates in the merge candidate list may be used as motion information for generating the second prediction block.
Alternatively, for example, temporal merging candidates in the merging candidate list may be used as motion information for generating the second prediction block.
Further alternatively, for example, the combined merge candidate in the merge candidate list may be used as motion information for generating the second prediction block.
Alternatively, in order to improve coding efficiency, at least one motion vector among a plurality of motion vector candidates included in the motion vector candidate list may be used as a motion vector for generating the second prediction block. Here, the motion vector candidate list may be a list used in an AMVP mode among a plurality of inter prediction modes.
For example, spatial motion vector candidates in the motion vector candidate list may be used as motion information for generating the second prediction block.
Alternatively, for example, temporal motion vector candidates in the motion vector candidate list may be used as motion information for generating the second prediction block.
When at least one of the merge candidate and the motion vector candidate is used as motion information required to generate the second prediction block, an area to which the overlapped block motion compensation is applied may be differently set. The region to which the overlapped block motion compensation is applied may be a region of the block adjacent to the boundary (i.e., a sub-block of the block located at the boundary) or a region of the block not adjacent to the boundary (i.e., a sub-block of the block not located at the boundary).
When the overlapped block motion compensation is applied to a region of the block not adjacent to the boundary, at least one of the merge candidate and the motion vector candidate may be used as motion information required to generate the second prediction block.
For example, overlapped block motion compensation may be performed for a region of a block that is not adjacent to a boundary by using a spatial merging candidate or a spatial motion vector candidate as motion information.
Alternatively, for example, overlapped block motion compensation may be performed for a region of a block that is not adjacent to a boundary by using a temporal merging candidate or a temporal motion vector candidate as motion information.
Further alternatively, for example, overlapped block motion compensation may be performed for a region of a block adjacent to a lower boundary or a right boundary by using a spatial merging candidate or a spatial motion vector candidate as motion information.
Further alternatively, for example, overlapped block motion compensation may be performed for a region of a block adjacent to a lower boundary or a right boundary by using a temporal merging candidate or a temporal motion vector candidate as motion information.
In addition, in order to improve coding efficiency, motion information derived from a specific block within a merge candidate list or a motion vector candidate list may be used for overlapped block motion compensation for a specific region.
For example, when motion information of an upper right neighboring block of the current block is included in the merge candidate list or the motion vector candidate list, the motion information may be used for overlapped block motion compensation of a right boundary region of the current block.
Alternatively, for example, when motion information of a lower left neighboring block of the current block is included in the merge candidate list or the motion vector candidate list, the motion information may be used for overlapped block motion compensation of a lower boundary region of the current block.
Fig. 16 is a diagram showing an example of performing overlapped block motion compensation on a sub-block group-by-sub-block group basis. To reduce computational complexity, sub-block based overlapped block motion compensation may be performed in units of sub-block sets including one or more sub-blocks. The unit of the sub-block set may mean a unit of a sub-block group.
Referring to fig. 16, the hatched area marked with lines may be referred to as a sub-block group. The arrow means that motion information of neighboring sub-blocks can be used for motion compensation of the current sub-block group. The region in which the arrow tail is located may be (1) a neighboring sub-block neighboring the current block, (2) a neighboring sub-block group neighboring the current block, or (3) a neighboring sub-block group neighboring the current sub-block within the current block. In addition, the region in which the arrow head is located may mean a current sub-block group within the current block.
For each sub-block group, a weighted sum of the first prediction block and the second prediction block may be calculated. The motion information of the current sub-block group within the current block may be used as motion information for generating the first prediction block. Here, the motion information of the current sub-block group within the current block may be any one of an average value, a median value, a minimum value, a maximum value, and a weighted sum of the motion information of each sub-block within the current sub-block group. At least one of motion information of a neighboring sub-block neighboring the current block, motion information of a neighboring sub-block group neighboring the current block, and motion information of a neighboring sub-block neighboring the current sub-block within the current block may be used as motion information for generating the second prediction block. Here, the motion information of the neighboring sub-block group neighboring the current block may be any one of an average value, a median value, a minimum value, a maximum value, and a weighted sum of the motion information of each sub-block included in the neighboring sub-block group.
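As an illustration of deriving a representative motion vector for a sub-block group, the following sketch computes the average or median of the sub-blocks' motion vectors; the minimum, maximum, and weighted-sum variants mentioned above are analogous. Names are hypothetical.

    # Illustrative sketch: representative motion vector of a sub-block group.
    from statistics import median

    def group_motion_vector(sub_block_mvs, mode="average"):
        xs = [mv[0] for mv in sub_block_mvs]
        ys = [mv[1] for mv in sub_block_mvs]
        if mode == "average":
            return (sum(xs) / len(xs), sum(ys) / len(ys))
        if mode == "median":
            return (median(xs), median(ys))
        raise ValueError("unsupported mode")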
Here, the current block may include one or more sub-block groups. The horizontal size of one sub-block group may be equal to or smaller than the horizontal size of one current sub-block. In addition, the vertical size of one sub-block group may be equal to or smaller than the vertical size of one current sub-block. In addition, overlapped block motion compensation may be performed for at least one sub-block of a plurality of sub-blocks of the current block located at the upper and left boundaries.
Since blocks adjacent to the lower and right boundaries within the current block have not been encoded/decoded, overlapped block motion compensation may not be performed for at least one of a plurality of sub-blocks located at the lower and right boundaries within the current block. Alternatively, since blocks adjacent to the lower and right boundaries within the current block have not been encoded/decoded, overlapped block motion compensation may be performed for at least any one of a plurality of sub-blocks located at the lower and right boundaries within the current block by using at least one of motion information of an upper block, motion information of a left block, motion information of an upper left block, motion information of a lower left block, and motion information of an upper right block of the current sub-block.
In addition, when the current block is to be predicted in the merge mode and has at least one of an improved temporal motion vector prediction candidate and a spatio-temporal motion vector prediction candidate, for at least one of a plurality of sub-blocks located at a lower boundary and a right boundary within the current block, overlapped block motion compensation may not be performed.
In addition, when the current block is to be predicted in the decoder-side motion vector derivation mode or the affine motion compensation mode, for at least one of a plurality of sub-blocks located at the lower boundary and the right boundary within the current block, overlapped block motion compensation may not be performed.
In addition, overlapped block motion compensation may be performed for at least one of the color components of the current block. The color component may include at least one of a luminance component and a chrominance component.
Alternatively, overlapped block motion compensation may be performed according to an inter prediction indicator of the current block. That is, overlapped block motion compensation may be performed when the current block is to be uni-directionally predicted, bi-directionally predicted, tri-directionally predicted, and/or tetra-directionally predicted. Alternatively, the overlapped block motion compensation may be performed only when the current block is unidirectionally predicted. Further alternatively, the overlapped block motion compensation may be performed only when the current block is bi-directionally predicted.
Fig. 17 is a diagram showing an example of pieces of motion information for overlapped block motion compensation.
The maximum number of pieces of motion information for generating the second prediction block may be K. That is, up to K second prediction blocks may be generated and used for overlapped block motion compensation. Here, K may be zero (0) or a positive integer, for example, 1, 2, 3, or 4.
For example, when the second prediction block is generated using motion information of a neighboring sub-block adjacent to the current block, at most two pieces of motion information may be derived from at least one of the upper block and the left block. When the second prediction block is generated based on motion information of neighboring sub-blocks adjacent to the current sub-block within the current block, a maximum of four pieces of motion information may be derived from at least one of an upper block, a left block, a right block, an upper left block, an upper right block, a lower left block, and a lower right block of the current sub-block. Here, the expression "deriving motion information" may mean a process of generating a second prediction block using the derived motion information and then performing overlapped block motion compensation using the generated second prediction block.
Referring to fig. 17, in order to improve coding efficiency, when motion compensation is performed for at least one of a plurality of sub-blocks located at an upper boundary and a left boundary within a current block, at most three pieces of motion information may be derived for generating a second prediction block. That is, motion information for generating the second prediction block may be derived based on the 3-connection.
For example, when motion compensation is performed for a sub-block located at an upper boundary within a current block, motion information may be derived from at least one of an upper neighboring block, an upper left neighboring block, and an upper right neighboring block among neighboring sub-blocks neighboring the current block.
For example, when motion compensation is performed for a sub-block located at a left boundary within a current block, motion information may be derived from at least one of a left neighboring block, an upper left neighboring block, and a lower left neighboring block among neighboring sub-blocks neighboring the current block.
In addition, when motion compensation is performed for a sub-block located at an upper left boundary within the current block, motion information may be derived from at least one of an upper neighboring block, a left neighboring block, and an upper left neighboring block among neighboring sub-blocks neighboring the current block.
In addition, when motion compensation is performed for a sub-block located at an upper right boundary within the current block, motion information may be derived from at least one of an upper neighboring block, an upper left neighboring block, and an upper right neighboring block among neighboring sub-blocks neighboring the current block.
Meanwhile, when motion compensation is performed for a sub-block located at a lower left boundary within a current block, motion information may be derived from at least one of a left neighboring block, an upper left neighboring block, and a lower left neighboring block among neighboring sub-blocks adjacent to the current block.
Alternatively, in order to improve coding efficiency, when motion compensation is performed for a sub-block, among the plurality of sub-blocks within the current block, that is not located at the upper boundary or the left boundary, a maximum of 8 pieces of motion information may be derived for generating the second prediction block. That is, motion information for generating the second prediction block may be derived based on the 8-connection.
For example, for a sub-block within the current block, motion information may be derived from at least one of an upper neighboring block, a left neighboring block, a lower neighboring block, a right neighboring block, an upper left neighboring block, a lower left neighboring block, a lower right neighboring block, and an upper right neighboring block included in the current block as neighboring sub-blocks adjacent to the current sub-block.
In addition, motion information for generating the second prediction block may be derived from the co-located sub-blocks within the co-located picture. In addition, motion information for generating the second prediction block may be derived from encoded/decoded blocks within the reference picture that are adjacent to the lower and right boundaries of the reference block.
In addition, in order to improve coding efficiency, the number of pieces of motion information used to generate the second prediction block may be determined according to the size and direction of the motion vector.
For example, when the sum of absolute values of x and y components of a motion vector is equal to or greater than J, a maximum of L pieces of motion information may be used. In contrast, when the sum of absolute values of x and y components of the motion vector is smaller than J, a maximum of P pieces of motion information may be used. In this case, J, L and P are zero or positive integers. L and P are preferably different values. However, L and P may be equal to each other.
In addition, when the current block is to be predicted in the merge mode and when at least one of the improved temporal motion vector prediction candidate and the spatio-temporal motion vector prediction candidate is used, at most K pieces of motion information may be used to generate the second prediction block. Here, K may be zero or a positive integer, for example 4.
In addition, when the current block is to be predicted in the decoder-side motion vector derivation mode, at most K pieces of motion information may be used to generate the second prediction block. Here, K may be zero or a positive integer, for example 4.
In addition, when the current block is to be predicted in the affine motion compensation mode, at most K pieces of motion information may be used to generate the second prediction block. Here, K may be zero or a positive integer, for example 4.
Fig. 18 and 19 are diagrams showing the order in which motion information for generating the second prediction block is derived. The motion information for generating the second prediction block may be derived in a predetermined order preset in the encoder and the decoder.
Referring to fig. 18, motion information may be derived from neighboring blocks of the current block in the order of an upper block, a left block, a lower block, and a right block.
Referring to fig. 19, in order to improve coding efficiency, an order in which motion information for generating the second prediction block is derived may be determined based on the position of the current sub-block.
For example, when motion information is derived for a current sub-block located at an upper boundary within a current block, the motion information may be derived from neighboring sub-blocks in the order of (1) an upper neighboring block, (2) an upper left neighboring block, and (3) an upper right neighboring block, which are neighboring sub-blocks adjacent to the current block.
In addition, when motion information is derived for a current sub-block located at a left boundary within a current block, the motion information may be derived from neighboring sub-blocks in the order of (1) a left neighboring block, (2) an upper left neighboring block, and (3) a lower left neighboring block, which are neighboring sub-blocks adjacent to the current block.
In addition, when motion information is derived for a current sub-block located at an upper left boundary within a current block, the motion information may be derived from neighboring sub-blocks in the order of (1) an upper neighboring block, (2) a left neighboring block, and (3) an upper left neighboring block, which are neighboring sub-blocks adjacent to the current block.
In addition, when motion information is derived for a current sub-block located at an upper right boundary within a current block, the motion information may be derived from neighboring sub-blocks in the order of (1) an upper neighboring block, (2) an upper left neighboring block, and (3) an upper right neighboring block, which are neighboring sub-blocks adjacent to the current block.
In addition, when motion information is derived for a current sub-block of the current block located at a lower left boundary, the motion information may be derived from neighboring sub-blocks in the order of (1) a left neighboring block, (2) an upper left neighboring block, and (3) a lower left neighboring block, which are neighboring sub-blocks neighboring the current block.
As in the example of fig. 19, motion information for a current sub-block within the current block may be derived from the neighboring sub-blocks adjacent to the current sub-block in the order of (1) the upper neighboring block, (2) the left neighboring block, (3) the lower neighboring block, (4) the right neighboring block, (5) the upper left neighboring block, (6) the lower left neighboring block, (7) the lower right neighboring block, and (8) the upper right neighboring block. Alternatively, the motion information may be derived in a different order than that shown in fig. 19.
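The position-dependent derivation orders described above (fig. 18 and fig. 19) can be tabulated as in the following sketch; the position labels are hypothetical strings used only for illustration.

    # Illustrative sketch: order in which neighboring sub-blocks are visited when
    # deriving motion information for the second prediction block.
    DERIVATION_ORDER = {
        "upper_boundary": ["upper", "upper_left", "upper_right"],
        "left_boundary": ["left", "upper_left", "lower_left"],
        "upper_left_boundary": ["upper", "left", "upper_left"],
        "upper_right_boundary": ["upper", "upper_left", "upper_right"],
        "lower_left_boundary": ["left", "upper_left", "lower_left"],
        "inner": ["upper", "left", "lower", "right",
                  "upper_left", "lower_left", "lower_right", "upper_right"],
    }

    def derivation_order(position):
        return DERIVATION_ORDER.get(position, DERIVATION_ORDER["inner"])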
On the other hand, the motion information of the co-located sub-block within the co-located picture may be derived after the motion information of the neighboring sub-block spatially adjacent to the current sub-block is derived. Alternatively, the motion information of the co-located sub-block within the co-located picture may be derived before the motion information of the neighboring sub-block spatially adjacent to the current sub-block is derived.
In addition, motion information of the encoded/decoded blocks located at the lower and right boundaries of the reference block within the reference picture may be derived after motion information of neighboring sub-blocks spatially adjacent to the current sub-block is derived. Alternatively, the motion information of the encoded/decoded blocks located at the lower and right boundaries of the reference block within the reference picture may be derived before the motion information of the neighboring sub-block spatially neighboring the current sub-block is derived.
Only when the predetermined condition is satisfied, motion information of a neighboring sub-block adjacent to the current block or motion information of a neighboring sub-block adjacent to the current sub-block may be derived as motion information for generating the second prediction block.
For example, when at least one of a neighboring sub-block adjacent to the current block or a neighboring sub-block adjacent to the current sub-block within the current block exists, motion information of the existing neighboring sub-block may be derived as motion information for generating the second prediction block.
Further alternatively, for example, when at least one of a neighboring sub-block adjacent to the current block or a neighboring sub-block adjacent to the current sub-block within the current block is predicted in the inter prediction mode, motion information of the neighboring sub-block predicted in the inter prediction mode may be derived as motion information for generating the second prediction block. Meanwhile, when at least one of a neighboring sub-block adjacent to the current block or a neighboring sub-block adjacent to the current sub-block within the current block is predicted in the intra prediction mode, motion information of the neighboring sub-block predicted in the intra prediction mode may not be derived as motion information for generating the second prediction block because the neighboring sub-block does not have motion information.
In addition, when the inter prediction indicator of at least one of the neighboring sub-block adjacent to the current block or the neighboring sub-block adjacent to the current sub-block within the current block does not indicate at least one of L0 prediction, L1 prediction, L2 prediction, L3 prediction, unidirectional prediction, bidirectional prediction, three-way prediction, and four-way prediction, motion information for generating the second prediction block may not be derived.
In addition, when the inter prediction indicator used to generate the second prediction block is different from the inter prediction indicator used to generate the first prediction block, motion information used to generate the second prediction block may be derived.
In addition, when a motion vector for generating the second prediction block is different from a motion vector for generating the first prediction block, motion information required for generating the second prediction block may be derived.
In addition, when the reference picture index used to generate the second prediction block is different from the reference picture index used to generate the first prediction block, motion information required to generate the second prediction block may be derived.
In addition, when at least one of the motion vector and the reference picture index for generating the second prediction block is different from at least one of the motion vector and the reference picture index for generating the first prediction block, motion information required for generating the second prediction block may be derived.
In addition, in order to reduce computational complexity, in the case where the inter prediction indicator used to generate the first prediction block indicates unidirectional prediction, when at least one of the reference picture index and the motion vector used to generate the L0 prediction direction and the L1 prediction direction of the second prediction block is different from at least one of the reference picture index and the motion vector used to generate the first prediction block, motion information required to generate the second prediction block may be derived.
In addition, in order to reduce computational complexity, based on an inter prediction indicator used to generate the first prediction block, in the case where the inter prediction indicator indicates bi-prediction, when at least one set of a motion vector and a reference picture index used to generate an L0 prediction direction and an L1 prediction direction of the second prediction block is different from at least one set of a motion vector and a reference picture index used to generate an L0 prediction direction and an L1 prediction direction of the first prediction block, motion information required to generate the second prediction block may be derived.
In addition, in order to reduce computational complexity, when at least one piece of motion information used to generate the second prediction block is different from at least one piece of motion information used to generate the first prediction block, motion information required to generate the second prediction block may be derived.
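The availability conditions above reduce to a simple comparison, sketched below with hypothetical dictionary fields; an actual implementation may compare per prediction direction (L0/L1) as described in the preceding paragraphs.

    # Illustrative sketch: derive motion information for the second prediction
    # block only when it differs from that of the first prediction block.
    def should_derive_second(first_info, second_info):
        # Each info is a hypothetical dict such as {"mv": (x, y), "ref_idx": i}.
        return (second_info["mv"] != first_info["mv"]
                or second_info["ref_idx"] != first_info["ref_idx"])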
Fig. 20 is a diagram illustrating an example of determining whether motion information of a specific neighboring sub-block can be used as motion information for generating a second prediction block by comparing POC of a reference picture of a current sub-block with POC of a reference picture of the specific neighboring sub-block.
Referring to fig. 20, in order to reduce computational complexity, when the POC of the reference picture of the current sub-block is equal to that of the reference picture of the neighboring sub-block, motion information of the neighboring sub-block may be used to generate the second prediction block of the current sub-block.
In addition, in order to reduce computational complexity, as in the example of fig. 20, when the POC of the reference picture used to generate the second prediction block is different from the POC of the reference picture used to generate the first prediction block, motion information required to generate the second prediction block may be derived.
In particular, when the POC of the reference picture used to generate the second prediction block is different from the POC of the reference picture used to generate the first prediction block, the motion vector used to generate the second prediction block may be derived by scaling the motion vector used to generate the first prediction block based on the reference picture or the POC of the reference picture.
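A generic POC-distance scaling of a motion vector, in the spirit of the paragraph above, is sketched below; the rounding and clipping used by a real codec are omitted and the names are hypothetical.

    # Illustrative sketch: scale a motion vector from one reference picture
    # distance to another based on POC differences.
    def scale_motion_vector(mv, poc_current, poc_ref_source, poc_ref_target):
        td = poc_current - poc_ref_source  # distance of the source reference picture
        tb = poc_current - poc_ref_target  # distance of the target reference picture
        if td == 0:
            return mv
        return (mv[0] * tb / td, mv[1] * tb / td)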
Fig. 21 is a diagram illustrating an example of using a weight factor when calculating a weighted sum of a first prediction block and a second prediction block.
When calculating the weighted sum of the first prediction block and the second prediction block, different weight factors may be applied to the samples according to their positions within the block. In addition, a weighted sum of samples located at the same position in the first prediction block and the second prediction block may be calculated. In this case, when the weighted sum is calculated to produce the final prediction block, at least one of the weight factor and the offset may be used for the calculation.
Here, the weight factor may be a negative value smaller than zero or a positive value larger than zero. The offset may be zero, a negative value less than zero, or a positive value greater than zero.
When the weighted sum of the first prediction block and the second prediction block is calculated, the same weight factor may be applied to all samples in each prediction block.
Referring to fig. 21, for example, the weight factors {3/4, 7/8, 15/16, 31/32} may be applied to the respective rows or columns of the first prediction block, and the weight factors {1/4, 1/8, 1/16, 1/32} may be applied to the respective rows or columns of the second prediction block. In this case, the same weight factor may be applied to all samples in the same row or column.
The value of the weight factor increases as the distance from the boundary of the current sub-block decreases. In addition, a weight factor may be applied to all samples within a sub-block.
In fig. 21, (a), (b), (c) and (d) show cases where the second prediction block is generated by using the motion information of the upper neighboring block, the motion information of the lower neighboring block, the motion information of the left neighboring block and the motion information of the right neighboring block, respectively. Here, the upper second prediction block, the lower second prediction block, the left second prediction block, and the right second prediction block may mean second prediction blocks generated based on motion information of an upper neighboring block, motion information of a lower neighboring block, motion information of a left neighboring block, and motion information of a right neighboring block, respectively.
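As a minimal sketch of the row-wise weighting described for fig. 21(a), the snippet below blends an upper second prediction block into a 4×4 first prediction block using the example weight factors {3/4, 7/8, 15/16, 31/32} and their complements. The 4×4 block size, the use of floating-point arithmetic, and the assignment of the smallest first-block weight to the row nearest the upper boundary are assumptions made for illustration.

import numpy as np

def blend_upper_neighbor(first_pred, second_pred):
    """Weighted sum of a first prediction block and an upper second prediction
    block with one weight factor per row (a sketch of fig. 21(a))."""
    w_first = [3/4, 7/8, 15/16, 31/32]    # rows listed from the upper boundary inward (assumption)
    w_second = [1/4, 1/8, 1/16, 1/32]
    out = first_pred.astype(float).copy()
    for r, (wf, ws) in enumerate(zip(w_first, w_second)):
        out[r, :] = wf * first_pred[r, :] + ws * second_pred[r, :]
    return out

first = np.full((4, 4), 100.0)    # first prediction block of the current sub-block
second = np.full((4, 4), 60.0)    # upper second prediction block
print(blend_upper_neighbor(first, second))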
Fig. 22 is a diagram showing an embodiment in which different weight factors are applied to samples according to their positions in a block when the weighted sum of the first prediction block and the second prediction block is calculated. In order to improve coding efficiency, the weight factor may vary according to the position of the sample in the block when the weighted sum of the first prediction block and the second prediction block is calculated. That is, the weighted sum may be calculated using weight factors that differ according to the position of the sample relative to the blocks spatially adjacent to the current sub-block. In addition, the weighted sum may be calculated for samples located at the same position in the first prediction block and the second prediction block.
Referring to fig. 22, in the first prediction block, the weight factors {1/2, 3/4, 7/8, 15/16, 31/32, 63/64, 127/128, 255/256, 511/512, 1023/1024} may be applied to the respective samples according to their positions, and in the second prediction block, the complementary weight factors {1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/256, 1/512, 1/1024} may be applied to the respective samples according to their positions. Here, the weight factor used in at least one of the upper second prediction block, the left second prediction block, the lower second prediction block, and the right second prediction block may be greater than the weight factor used in at least one of the upper-left second prediction block, the lower-right second prediction block, and the upper-right second prediction block.
In addition, the weight factor used in at least one of the upper second prediction block, the left second prediction block, the lower second prediction block, and the right second prediction block may be equal to the weight factor used in at least one of the upper left second prediction block, the lower right second prediction block, and the upper right second prediction block.
In addition, the weight factors of all samples in the second prediction block generated using the motion information of the co-located sub-block in the co-located picture may be equal.
In addition, the weight factor of the samples in the second prediction block generated using the motion information of the co-located sub-block in the co-located picture may be equal to the weight factor of the samples in the first prediction block.
In addition, weight factors of all samples within the second prediction block generated using motion information of the encoded/decoded block within the reference picture adjacent to the lower boundary and the right boundary of the reference block may be equal.
In addition, a weight factor of a sample in the second prediction block generated using motion information of an encoded/decoded block within the reference picture adjacent to a lower boundary and a right boundary of the reference block may be equal to a weight factor of a sample in the first prediction block.
In order to reduce computational complexity, the weight factor may vary according to the size of a motion vector of a neighboring sub-block adjacent to the current block or a neighboring sub-block adjacent to the current sub-block within the current block.
For example, when the sum of the absolute values of the x component and the y component of the motion vector of a neighboring sub-block is equal to or greater than a predetermined value, {1/2, 3/4, 7/8, 15/16} may be used as the weight factors of the current sub-block. Conversely, when the sum of the absolute values of the x component and the y component of the motion vector of the neighboring sub-block is less than the predetermined value, {7/8, 15/16, 31/32, 63/64} may be used as the weight factors of the current sub-block. In this case, the predetermined value may be zero or a positive integer.
In addition, in order to reduce computational complexity, the weight factor may be changed according to the size or direction of the motion vector of the current sub-block.
For example, when the absolute value of the x component of the motion vector of the current sub-block is equal to or greater than a predetermined value, {1/2,3/4,7/8,15/16} may be used as the weight factor of the left-side neighboring sub-block and the right-side neighboring sub-block. In contrast, when the absolute value of the x component of the motion vector of the current sub-block is less than the predetermined value, {7/8,15/16,31/32,63/64} may be used as the weight factor of the left-side neighboring sub-block and the right-side neighboring sub-block. In this case, the predetermined value may be zero or a positive integer.
For example, when the absolute value of the y component of the motion vector of the current sub-block is equal to or greater than a predetermined value, {1/2,3/4,7/8,15/16} may be used as the weight factor of the upper neighboring sub-block and the lower neighboring sub-block. In contrast, when the absolute value of the y component of the motion vector of the current sub-block is less than the predetermined value, {7/8,15/16,31/32,63/64} may be used as the weight factor of the upper neighboring sub-block and the lower neighboring sub-block. In this case, the predetermined value may be zero or a positive integer.
For example, when the sum of the absolute values of the x component and the y component of the motion vector of the current sub-block is equal to or greater than a predetermined value, {1/2, 3/4, 7/8, 15/16} may be used as the weight factors of the current sub-block. Conversely, when the sum of the absolute values of the x component and the y component of the motion vector of the current sub-block is less than the predetermined value, {7/8, 15/16, 31/32, 63/64} may be used as the weight factors of the current sub-block. In this case, the predetermined value may be zero or a positive integer.
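The selection between the two example weight-factor sets can be written as a simple comparison, as in the sketch below; the threshold value and the comments on blending strength are assumptions made for illustration, not values stated in the embodiments.

def select_weight_factors(mv, threshold=16):
    """Choose a weight-factor set for the current sub-block from the size of its
    motion vector; the threshold of 16 is an assumed predetermined value."""
    if abs(mv[0]) + abs(mv[1]) >= threshold:
        return [1/2, 3/4, 7/8, 15/16]        # larger motion: stronger blending with neighbors
    return [7/8, 15/16, 31/32, 63/64]        # smaller motion: weaker blending with neighbors

print(select_weight_factors((10, 9)))   # sum of absolute components is 19 -> first set
print(select_weight_factors((3, 2)))    # sum of absolute components is 5  -> second set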
The weighted sum may be calculated not for all samples within a sub-block but only for samples in K rows/columns adjacent to each block boundary. In this case, K may be zero or a positive integer, for example 1 or 2.
In addition, when the size of the current block is smaller than N×M, a weighted sum may be calculated for samples in K rows/columns adjacent to each block boundary. In addition, when the current block is divided into sub-blocks and motion compensation is performed on a sub-block basis, a weighted sum may be calculated for samples in K rows/columns adjacent to each block boundary. Here, K may be zero or a positive integer, for example 1 or 2. In addition, N and M may be positive integers. For example, N and M may be 4 or greater, or 8 or greater. N and M may or may not be equal.
Alternatively, a weighted sum may be calculated for samples in K rows/columns adjacent to each block boundary according to the type of color component of the current block. In this case, K may be zero or a positive integer, for example 1 or 2. When the current block is a block of luminance components, a weighted sum may be calculated for samples in two rows/columns adjacent to each block boundary. On the other hand, when the current block is a chrominance component block, a weighted sum may be calculated for samples in one row/column adjacent to each block boundary.
In addition, when the current block is to be predicted in the merge mode and has at least one of the improved temporal motion vector prediction candidate and the spatio-temporal motion vector prediction candidate, the weighted sum may be calculated only for the samples in K rows/columns adjacent to each block boundary.
In addition, when the current block is to be predicted in the decoder-side motion vector derivation mode, a weighted sum may be calculated for samples in K rows/columns adjacent to each block boundary. In addition, when the current block is to be predicted in affine motion compensation mode, a weighted sum may be calculated for samples in K rows/columns adjacent to each block boundary. In these cases, K may be zero or a positive integer, such as 1 or 2.
Meanwhile, in order to reduce computational complexity, a weighted sum may be calculated for samples in K rows/columns adjacent to each block boundary according to the size of a sub-block of the current block.
For example, when the sub-block of the current block has a size of 4×4, a weighted sum may be calculated for samples in one, two, three, or four rows/columns adjacent to each block boundary. Alternatively, when the sub-block of the current block has a size of 8×8, a weighted sum may be calculated for samples in one, two, three, four, five, six, seven or eight rows/columns adjacent to each block boundary. In this case, K may be zero or a positive integer. The maximum value of K may correspond to the number of rows or columns included in the sub-block.
In addition, to reduce computational complexity, a weighted sum may be calculated for samples in one or two rows/columns within a sub-block that are adjacent to each block boundary.
In addition, in order to reduce computational complexity, a weighted sum may be calculated for samples in K rows/columns adjacent to each block boundary according to the number of pieces of motion information used to generate the second prediction block. Here, K may be zero or a positive integer.
For example, when the number of pieces of motion information is smaller than a predetermined value, a weighted sum may be calculated for samples in two rows/columns adjacent to each block boundary.
In addition, when the number of pieces of motion information is equal to or greater than the predetermined value, a weighted sum may be calculated for samples in one row/column adjacent to each block boundary.
In addition, in order to reduce computational complexity, a weighted sum may be calculated for samples in K rows/columns adjacent to each block boundary according to the inter prediction indicator of the current block. K may be zero or a positive integer.
For example, when the inter prediction indicator indicates unidirectional prediction, a weighted sum may be calculated for samples in two rows/columns adjacent to each block boundary. Meanwhile, when the inter prediction indicator indicates bi-directional prediction, a weighted sum may be calculated for samples in one row/column adjacent to each block boundary.
In addition, in order to reduce computational complexity, a weighted sum may be calculated for samples in K rows/columns adjacent to each block boundary according to POC of a reference picture of the current block. Here, K may be zero or a positive integer.
For example, when the difference between the POC of the current picture and the POC of the reference picture is less than a predetermined value, a weighted sum may be calculated for samples in two rows/columns adjacent to each block boundary. In contrast, when the difference between the POC of the current picture and the POC of the reference picture is equal to or greater than the predetermined value, a weighted sum may be calculated for samples in one row/column adjacent to each block boundary.
In addition, in order to reduce the computational complexity, a weighted sum may be calculated for samples in K rows/columns adjacent to each block boundary according to the motion vector of a neighboring sub-block adjacent to the current block or the size of a motion vector of a neighboring sub-block adjacent to the current sub-block within the current block. Here, K may be zero or a positive integer.
For example, when the sum of absolute values of x-component and y-component of motion vectors of neighboring sub-blocks is equal to or greater than a predetermined value, a weighted sum may be calculated for samples in two rows/columns adjacent to each block boundary. Conversely, when the sum of the absolute values of the x-component and y-component of the motion vector of the neighboring sub-block is less than the predetermined value, a weighted sum may be calculated for samples in one row/column adjacent to each block boundary. In this case, the predetermined value may be zero or a positive integer.
In addition, in order to reduce the computational complexity, a weighted sum may be calculated for samples in K rows/columns adjacent to each block boundary according to the size or direction of the motion vector of the current sub-block. Here, K may be zero or a positive integer.
For example, when the absolute value of the x-component of the motion vector of the current sub-block is equal to or greater than a predetermined value, a weighted sum may be calculated for samples in two rows/columns adjacent to each of the left and right boundaries. In contrast, when the absolute value of the x-component of the motion vector of the current sub-block is smaller than the predetermined value, a weighted sum may be calculated for samples in one row/column adjacent to each of the left and right boundaries. In this case, the predetermined value may be zero or a positive integer.
For example, when the absolute value of the y component of the motion vector of the current sub-block is equal to or greater than a predetermined value, a weighted sum may be calculated for samples in two rows/columns adjacent to each of the upper and lower boundaries. In contrast, when the absolute value of the y component of the motion vector of the current sub-block is smaller than the predetermined value, a weighted sum may be calculated for samples in one row/column adjacent to each of the upper and lower boundaries. In this case, the predetermined value may be zero or a positive integer.
For example, when the sum of absolute values of x and y components of the motion vector is equal to or greater than a predetermined value, a weighted sum may be calculated for samples in two rows/columns adjacent to each block boundary. Conversely, when the sum of the absolute values of the x-component and the y-component of the motion vector is less than the predetermined value, a weighted sum may be calculated for samples in one row/column adjacent to each block boundary. In this case, the predetermined value may be zero or a positive integer.
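The conditions above can all be viewed as ways of selecting K, the number of boundary rows/columns to which the weighted sum is applied. The sketch below combines only a few of them (color component, sub-block size, and motion-vector size); the precedence among the criteria and the threshold value are assumptions made for this example rather than a rule stated in the embodiments.

def select_boundary_lines(is_chroma, sub_block_rows, mv, mv_threshold=16):
    """Return K, the number of rows/columns adjacent to each block boundary for
    which the weighted sum is calculated (illustrative sketch only)."""
    if is_chroma:
        k = 1                                      # chroma component: one row/column
    elif abs(mv[0]) + abs(mv[1]) >= mv_threshold:
        k = 2                                      # large motion: two rows/columns
    else:
        k = 1                                      # small motion: one row/column
    return min(k, sub_block_rows)                  # K cannot exceed the sub-block size

print(select_boundary_lines(False, 4, (12, 8)))   # luma, large motion -> 2
print(select_boundary_lines(True, 4, (12, 8)))    # chroma -> 1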
Fig. 23 is a diagram illustrating an embodiment of sequentially cumulatively calculating a weighted sum of a first prediction block and a second prediction block in a predetermined order during overlapped block motion compensation. The weighted sum of the first prediction block and the second prediction block may be added in a predetermined order preset in the encoder and the decoder.
Referring to fig. 23, motion information may be derived from neighboring sub-blocks in the order of the upper block, the left block, the lower block, and the right block neighboring the current sub-block; the derived motion information may be used in this order to generate the second prediction blocks; and a weighted sum of the first prediction block and each second prediction block may be calculated. When the weighted sum is calculated in the predetermined order, the weighted sums may be accumulated in that order, and thus the final prediction block of the current block may be derived.
As in the example of fig. 23, a weighted sum of the first prediction block and the second prediction block generated using the motion information of the upper block is calculated so that a first weighted sum result block can be generated. Then, a weighted sum of the first weighted sum result block and a second prediction block generated using motion information of the left block may be calculated such that the second weighted sum result block may be generated. Then, a weighted sum of the second weighted sum result block and a second prediction block generated using motion information of the lower block may be calculated so that a third weighted sum result block may be generated. Finally, a weighted sum of the third weighted sum result block and the second prediction block generated using the motion information of the right block may be calculated so that a final prediction block may be generated.
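The cumulative procedure of fig. 23 can be sketched as repeated blending in the preset order; the single scalar weight of 1/4 per second prediction block in the snippet below is an assumption chosen only to keep the example short, and per-row/column weight factors could be substituted.

import numpy as np

def obmc_cumulative(first_pred, second_preds, w_second=1/4):
    """Accumulate the weighted sum in the preset order (e.g. upper, left, lower,
    right), as in fig. 23."""
    result = first_pred.astype(float).copy()
    for second in second_preds:                     # order preset in encoder and decoder
        result = (1 - w_second) * result + w_second * second
    return result

first = np.full((4, 4), 100.0)
# second prediction blocks from the upper, left, lower and right neighbors
seconds = [np.full((4, 4), v) for v in (80.0, 90.0, 110.0, 120.0)]
print(obmc_cumulative(first, seconds))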
On the other hand, the order in which the motion information for generating the second prediction block is derived, and the order in which the second prediction block is used to calculate the weighted sum of the first prediction block and the second prediction block may be different.
Fig. 24 is a diagram illustrating an embodiment of calculating a weighted sum of a first prediction block and a second prediction block during overlapped block motion compensation. In order to improve coding efficiency, instead of accumulating the weighted sum in a fixed order, the weighted sum of the first prediction block and the second prediction blocks generated using motion information of at least one of the upper block, the left block, the lower block, and the right block may be calculated regardless of the order in which the second prediction blocks were generated.
In this case, the weight factors for the second prediction block generated using the motion information of at least one of the upper block, the left block, the lower block, and the right block may be equal to each other. Alternatively, the weight factor for the second prediction block and the weight factor for the first prediction block may be equal.
Referring to fig. 24, a plurality of storage spaces corresponding to the total number of the first and second prediction blocks may be prepared, and when the final prediction block is generated, the weighted sum of the first prediction block and each second prediction block may be calculated while equal weight factors are used for all the second prediction blocks.
In addition, even for the second prediction block generated using the motion information of the co-located sub-block within the co-located picture, a weighted sum of the first prediction block and the second prediction block can be calculated.
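A corresponding sketch of the order-independent variant of fig. 24 is given below; treating the blend as a plain average over the prepared buffers is an assumed normalization used only to keep the example compact.

import numpy as np

def obmc_equal_weight(first_pred, second_preds):
    """Blend the first prediction block with every second prediction block at
    once, using an equal weight factor for each block (sketch of fig. 24)."""
    buffers = [first_pred.astype(float)] + [p.astype(float) for p in second_preds]
    return sum(buffers) / len(buffers)              # equal weight for every block

first = np.full((4, 4), 100.0)
seconds = [np.full((4, 4), v) for v in (80.0, 120.0)]
print(obmc_equal_weight(first, seconds))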
When the size of the current block is K samples or less, information determining whether to perform overlapped block motion compensation on the current block may be entropy encoded/entropy decoded. Here, K may be a positive integer, for example 256.
When the size of the current block is greater than K samples, or when the current block is to be predicted in a specific inter prediction mode (e.g., a merge mode or an advanced motion vector prediction mode), the information determining whether to perform overlapped block motion compensation on the current block may not be entropy encoded/entropy decoded, and overlapped block motion compensation may instead be performed by default.
When performing motion prediction, the encoder may perform the prediction after subtracting the second prediction block from the original signal of the boundary region of the current block. In this case, when the second prediction block is subtracted from the original signal, a weighted sum of the second prediction block and the original signal may be calculated.
An enhanced multiple transform (EMT), in which a discrete cosine transform (DCT) and a discrete sine transform (DST) are applied as the vertical/horizontal transforms, may not be applied to a current block on which overlapped block motion compensation is not performed. That is, the enhanced multiple transform may be applied only to a current block on which overlapped block motion compensation is performed.
Fig. 25 is a flowchart illustrating an image decoding method according to an embodiment of the present invention.
Referring to fig. 25, a first prediction block of the current block may be generated using motion information of the current block (step S2510).
Next, motion information that can be used to generate the second prediction block may be determined among the motion information of at least one neighboring sub-block of the current sub-block (step S2520).
In this case, motion information that can be used to generate the second prediction block may be determined based on at least one of the size and direction of the motion vector of the neighboring sub-block.
In step S2520 of determining the motion information that can be used to generate the second prediction block, this motion information may be determined based on the picture order count (POC) of the reference picture of a neighboring sub-block and the POC of the reference picture of the current block. In particular, the motion information of the neighboring sub-block may be determined as motion information that can be used to generate the second prediction block only when the POC of the reference picture of the neighboring sub-block is equal to the POC of the reference picture of the current block.
The current sub-block may have a square shape or a non-square shape.
At least one second prediction block may be generated using the motion information determined at step S2520 (step S2530).
The motion information of at least one neighboring sub-block may be used to generate at least one second prediction block only when the current block has neither the motion vector derivation mode nor the affine motion compensation mode.
Next, a final prediction block may be generated based on a weighted sum of the first prediction block of the current block and the at least one second prediction block of the current sub-block (step S2540).
When the current sub-block is included in the boundary region of the current block, the final prediction block may be generated by obtaining a weighted sum of samples in several rows or columns of the first prediction block adjacent to the boundary and samples in several rows or columns of the second prediction block adjacent to the boundary.
Here, the samples in the rows or columns of the first prediction block adjacent to the boundary and the samples in the rows or columns of the second prediction block adjacent to the boundary may be determined based on at least one of a block size of the current sub-block, a size and direction of a motion vector of the current sub-block, an inter prediction indicator of the current block, and a POC of a reference picture of the current block.
In the final prediction block generation step S2540, the weighted sum may be calculated while different weight factors are applied to samples in the first and second prediction blocks according to at least one of the size and direction of the motion vector of the current sub-block.
Each step of the image decoding method of fig. 25 can be similarly applied to the corresponding step of the image encoding method of the present invention.
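Purely as an illustrative outline of the flow of fig. 25, the sketch below strings steps S2510 through S2540 together; the stand-in motion compensation, the single shared reference picture, and the scalar blending weight are assumptions made so the example stays self-contained, and none of the helper names are taken from the embodiments.

import numpy as np
from dataclasses import dataclass

@dataclass
class MotionInfo:
    mv: tuple        # (mv_x, mv_y)
    ref_poc: int     # POC of the reference picture

def motion_compensate(ref_pic, motion_info, size=4):
    """Stand-in motion compensation: crop a size x size block at the motion
    vector offset (purely illustrative)."""
    x, y = motion_info.mv
    return ref_pic[y:y + size, x:x + size].astype(float)

def decode_sub_block(ref_pic, cur_info, neighbor_infos, w=1/4):
    # S2510: first prediction block from the current block's motion information
    pred = motion_compensate(ref_pic, cur_info)
    for n in neighbor_infos:
        # S2520: motion information is usable only if the reference picture POCs match
        if n.ref_poc != cur_info.ref_poc:
            continue
        # S2530: second prediction block from the usable motion information
        second = motion_compensate(ref_pic, n)
        # S2540: weighted sum to form the final prediction block
        pred = (1 - w) * pred + w * second
    return pred

ref = np.arange(64, dtype=float).reshape(8, 8)
cur = MotionInfo(mv=(1, 1), ref_poc=3)
out = decode_sub_block(ref, cur, [MotionInfo((2, 1), 3), MotionInfo((0, 0), 5)])
print(out)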
The bit stream generated by performing the image encoding method according to the present invention may be recorded in a recording medium.
The above embodiments may be performed in the same way in the encoder and decoder.
The order applied to the above embodiments may be different between the encoder and the decoder, or the order applied to the above embodiments may be the same in the encoder and the decoder.
The above embodiments may be performed for each of the luminance signal and the chrominance signal, or may be performed identically for the luminance signal and the chrominance signal.
The block shape to which the above embodiments of the present invention are applied may have a square shape or a non-square shape.
The above embodiments of the present invention may be applied according to the size of at least one of a coding block, a prediction block, a transform block, a current block, a coding unit, a prediction unit, a transform unit, a unit, and a current unit. Here, the size may be defined as a minimum size and/or a maximum size for the above embodiments to be applied, or may be defined as a fixed size to which the above embodiments are applied. In addition, a first embodiment may be applied at a first size and a second embodiment may be applied at a second size. In other words, the above embodiments may be applied in combination according to the size. In addition, the above embodiments may be applied when the size is equal to or greater than the minimum size and equal to or smaller than the maximum size. In other words, the above embodiments may be applied when the block size falls within a specific range.
For example, when the size of the current block is 8×8 or more, the above embodiments may be applied. For example, when the size of the current block is 4×4 or more, the above embodiments may be applied. For example, when the size of the current block is 16×16 or more, the above embodiments may be applied. For example, when the size of the current block is equal to or greater than 16×16 and equal to or less than 64×64, the above embodiments may be applied.
The above embodiments of the present invention may be applied according to the temporal layer. An identifier may be signaled to identify the temporal layer to which the above embodiments are applicable, and the above embodiments may be applied to the specific temporal layer identified by the identifier. Here, the identifier may be defined as indicating the lowest layer and/or the highest layer to which the above embodiments are applicable, or may be defined as indicating the specific layer to which the embodiments are applied. In addition, a fixed temporal layer to which the embodiments are applied may be defined.
For example, when the temporal layer of the current image is the lowest layer, the above embodiments may be applied. For example, when the temporal layer identifier of the current image is 1, the above embodiment may be applied. For example, when the temporal layer of the current image is the highest layer, the above embodiments may be applied.
A slice type to which the above embodiments of the present invention are applied may be defined, and the above embodiments may be applied according to the slice type.
In the above-described embodiment, the method is described based on a flowchart having a series of steps or units, but the present invention is not limited to the order of the steps, and some steps may be performed simultaneously with other steps or may be performed in a different order from other steps. Furthermore, it will be understood by those of ordinary skill in the art that steps in the flowcharts are not mutually exclusive, and other steps may be added to the flowcharts, or some steps may be deleted from the flowcharts without affecting the scope of the present invention.
The embodiments include various aspects of examples. All possible combinations of the various aspects may not be described, but those skilled in the art will be able to recognize different combinations. Accordingly, the present invention is intended to embrace all such alternatives, modifications, and variations that fall within the scope of the appended claims.
The embodiments of the present invention can be implemented in the form of program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded in the computer-readable recording medium may be specially designed and constructed for the present invention, or may be known to those skilled in the art of computer software. Examples of the computer-readable recording medium include magnetic recording media (such as hard disks, floppy disks, and magnetic tape), optical data storage media (such as CD-ROMs and DVD-ROMs), magneto-optical media (such as floptical disks), and hardware devices specially constructed to store and execute program instructions (such as read-only memory (ROM), random access memory (RAM), and flash memory). Examples of program instructions include not only machine language code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter. The hardware devices may be configured to operate as one or more software modules to perform the processes according to the present invention, and vice versa.
Although the present invention has been described in terms of specific terms (such as detailed elements) and limited embodiments and figures, they are merely provided to facilitate a more general understanding of the present invention, and the present invention is not limited to the above-described embodiments. Those skilled in the art to which the invention pertains will appreciate that various modifications and changes can be made from the above description.
Therefore, the spirit of the present invention should not be limited to the above-described embodiments, and the entire scope of the appended claims and their equivalents falls within the scope of the present invention.
Industrial applicability
The present invention can be applied to an apparatus for encoding/decoding an image.

Claims (33)

1. A decoding method, comprising:
generating a first prediction block and a second prediction block of the current block; and
a final prediction block of the current block is generated using the first prediction block and the second prediction block,
wherein a first prediction block is generated by performing a first prediction with respect to a current block,
generating a second prediction block by performing a second prediction with respect to the current block, and
the first prediction is inter prediction.
2. The decoding method of claim 1, wherein the first prediction block and the second prediction block are generated using one or more merge candidates in one merge candidate list for the current block.
3. The decoding method of claim 1, wherein the final prediction block is generated using a weighted sum of samples of a first prediction block and samples of a second prediction block, and
the weighted sum is applied only to a partial region of the current block.
4. The decoding method of claim 1, wherein the current block includes a first region and a second region specified by triangular partitioning of the current block,
the prediction value generated by the first prediction for the first prediction block includes a prediction value for the first region,
the prediction value generated by the second prediction for the second prediction block includes a prediction value for the second region.
5. The decoding method of claim 1, wherein the final prediction block is generated using a weighted sum of samples of a first prediction block and samples of a second prediction block.
6. The decoding method of claim 5, wherein a weight pair is selected from a plurality of weight pairs for the weighted sum,
a first weight of the selected pair of weights is applied to each sample of the first prediction block,
the second weight of the selected pair of weights is applied to each sample of the second prediction block.
7. A method of encoding, comprising:
generating a first prediction block and a second prediction block of the current block; and
a final prediction block of the current block is generated using the first prediction block and the second prediction block,
wherein a first prediction block is generated by performing a first prediction with respect to a current block,
generating a second prediction block by performing a second prediction with respect to the current block, and
the first prediction is inter prediction.
8. The encoding method of claim 7, wherein the first prediction block and the second prediction block are generated using one or more merge candidates in one merge candidate list for the current block.
9. The encoding method of claim 7, wherein the final prediction block is generated using a weighted sum of samples of a first prediction block and samples of a second prediction block, and
the weighted sum is applied only to a partial region of the current block.
10. The encoding method of claim 7, wherein the current block includes a first region and a second region specified by triangular partitioning of the current block,
the prediction value generated by the first prediction for the first prediction block includes a prediction value for the first region,
the prediction value generated by the second prediction for the second prediction block includes a prediction value for the second region.
11. The encoding method of claim 7, wherein the final prediction block is generated using a weighted sum of samples of a first prediction block and samples of a second prediction block.
12. The encoding method of claim 7, wherein a weight pair is selected from a plurality of weight pairs for the weighted sum,
a first weight of the selected pair of weights is applied to each sample of the first prediction block,
the second weight of the selected pair of weights is applied to each sample of the second prediction block.
13. A computer-readable recording medium storing a bit stream generated by the encoding method of claim 7.
14. A computer-readable recording medium storing a bitstream generated by a video encoding method, wherein the video encoding method comprises:
generating a first prediction block and a second prediction block of the current block; and
a final prediction block of the current block is generated using the first prediction block and the second prediction block,
wherein a first prediction block is generated by performing a first prediction with respect to a current block,
generating a second prediction block by performing a second prediction with respect to the current block, and
the first prediction is inter prediction.
15. A computer readable recording medium storing a bitstream comprising computer executable code, wherein the computer executable code, when executed by a processor of a video decoding device, causes the processor to perform the steps of:
decoding prediction mode information indicating a prediction mode for the current block;
generating a first prediction block and a second prediction block of the current block based on the prediction mode; and
a final prediction block of the current block is generated using the first prediction block and the second prediction block,
wherein a first prediction block is generated by performing a first prediction with respect to a current block,
generating a second prediction block by performing a second prediction with respect to the current block, and
the first prediction is inter prediction.
16. A computer readable recording medium storing a bitstream comprising computer executable code, wherein the computer executable code, when executed by a processor of a video decoding device, causes the processor to perform the steps of:
generating a first prediction block and a second prediction block of a current block based on prediction mode information in the computer executable code; and
generating a final prediction block of the current block using the first prediction block and the second prediction block,
wherein a first prediction block is generated by performing a first prediction with respect to a current block,
generating a second prediction block by performing a second prediction with respect to the current block, and
the first prediction is inter prediction.
17. The computer-readable recording medium of claim 16, wherein the first prediction block and the second prediction block are generated using one or more merge candidates in a merge candidate list for the current block.
18. The computer-readable recording medium of claim 16, wherein the final prediction block is generated using a weighted sum of samples of a first prediction block and samples of a second prediction block, and
the weighted sum is applied only to a partial region of the current block.
19. The computer-readable recording medium of claim 16, wherein the current block includes a first area and a second area specified by triangular partitioning of the current block,
the prediction value generated by the first prediction for the first prediction block includes a prediction value for the first region,
the prediction value generated by the second prediction for the second prediction block includes a prediction value for the second region.
20. The computer-readable recording medium of claim 16, wherein the final prediction block is generated using a weighted sum of samples of a first prediction block and samples of a second prediction block.
21. The computer-readable recording medium of claim 20, wherein a weight pair is selected from a plurality of weight pairs for the weighted sum,
a first weight of the selected pair of weights is applied to each sample of the first prediction block,
the second weight of the selected pair of weights is applied to each sample of the second prediction block.
22. A decoding method, comprising:
determining first motion information and second motion information; and
third motion information of the current block is generated using the first motion information and the second motion information,
wherein the first motion information is information for inter prediction.
23. The decoding method of claim 22, wherein the first motion information and the second motion information are determined based on a merge candidate list for the current block.
24. The decoding method of claim 22, wherein the current block is partitioned into a plurality of sub-blocks, and
a plurality of fourth motion information of the plurality of sub-blocks is determined in sub-block units based on first motion information of a first merge candidate from a merge candidate list and second motion information of a second merge candidate from the merge candidate list.
25. A method of encoding, comprising:
determining first motion information and second motion information; and
third motion information of the current block is generated using the first motion information and the second motion information,
wherein the first motion information is information for inter prediction.
26. The encoding method of claim 25, wherein the first motion information and the second motion information are determined based on a merge candidate list for the current block.
27. The encoding method of claim 25, wherein the current block is partitioned into a plurality of sub-blocks, and
a plurality of fourth motion information of the plurality of sub-blocks is determined in sub-block units based on first motion information of a first merge candidate from a merge candidate list and second motion information of a second merge candidate from the merge candidate list.
28. A computer-readable recording medium storing a bit stream generated by the encoding method of claim 25.
29. A computer-readable recording medium storing a bitstream generated by a video encoding method, wherein the video encoding method comprises:
determining first motion information and second motion information; and
third motion information of the current block is generated using the first motion information and the second motion information,
wherein the first motion information is information for inter prediction.
30. A computer readable recording medium storing a bitstream comprising computer executable code, wherein the computer executable code, when executed by a processor of a video decoding device, causes the processor to perform the steps of:
decoding prediction mode information indicating a prediction mode for the current block;
determining first motion information and second motion information based on the prediction mode; and
third motion information of the current block is generated using the first motion information and the second motion information,
wherein the first motion information is information for inter prediction.
31. A computer readable recording medium storing a bitstream comprising computer executable code, wherein the computer executable code, when executed by a processor of a video decoding device, causes the processor to perform the steps of:
determining first motion information and second motion information based on prediction mode information in the computer executable code; and
third motion information of the current block is generated using the first motion information and the second motion information,
wherein the first motion information is information for inter prediction.
32. The computer-readable recording medium of claim 31, wherein the first motion information and the second motion information are determined based on a merge candidate list for the current block.
33. The computer readable recording medium of claim 31, wherein the current block is partitioned into a plurality of sub-blocks, and
a plurality of fourth motion information of the plurality of sub-blocks is determined in sub-block units based on first motion information of a first merge candidate from a merge candidate list and second motion information of a second merge candidate from the merge candidate list.
CN202311025877.1A 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream Pending CN116866594A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2016-0159507 2016-11-28
KR20160159507 2016-11-28
CN201780073517.5A CN110024394B (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream
PCT/KR2017/013672 WO2018097692A2 (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image, and recording medium in which bit stream is stored

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201780073517.5A Division CN110024394B (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream

Publications (1)

Publication Number Publication Date
CN116866594A true CN116866594A (en) 2023-10-10

Family

ID=62195247

Family Applications (6)

Application Number Title Priority Date Filing Date
CN201780073517.5A Active CN110024394B (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream
CN202311020975.6A Pending CN116886928A (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream
CN202311024704.8A Pending CN116866593A (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream
CN202311021525.9A Pending CN116886929A (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream
CN202311025877.1A Pending CN116866594A (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream
CN202311023493.6A Pending CN116886930A (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream

Family Applications Before (4)

Application Number Title Priority Date Filing Date
CN201780073517.5A Active CN110024394B (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream
CN202311020975.6A Pending CN116886928A (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream
CN202311024704.8A Pending CN116866593A (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream
CN202311021525.9A Pending CN116886929A (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202311023493.6A Pending CN116886930A (en) 2016-11-28 2017-11-28 Method and apparatus for encoding/decoding image and recording medium storing bit stream

Country Status (3)

Country Link
KR (3) KR102328179B1 (en)
CN (6) CN110024394B (en)
WO (1) WO2018097692A2 (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020065520A2 (en) 2018-09-24 2020-04-02 Beijing Bytedance Network Technology Co., Ltd. Extended merge prediction
EP3788787A1 (en) 2018-06-05 2021-03-10 Beijing Bytedance Network Technology Co. Ltd. Interaction between ibc and atmvp
WO2019244118A1 (en) 2018-06-21 2019-12-26 Beijing Bytedance Network Technology Co., Ltd. Component-dependent sub-block dividing
TWI739120B (en) 2018-06-21 2021-09-11 大陸商北京字節跳動網絡技術有限公司 Unified constrains for the merge affine mode and the non-merge affine mode
JP7141463B2 (en) * 2018-06-27 2022-09-22 エルジー エレクトロニクス インコーポレイティド Video processing method based on inter-prediction mode and apparatus therefor
CN116708815A (en) * 2018-08-09 2023-09-05 Lg电子株式会社 Encoding device, decoding device, and data transmitting device
CN110876057B (en) * 2018-08-29 2023-04-18 华为技术有限公司 Inter-frame prediction method and device
US10834417B2 (en) * 2018-09-21 2020-11-10 Tencent America LLC Method and apparatus for video coding
CN111131822B (en) 2018-10-31 2023-08-01 北京字节跳动网络技术有限公司 Overlapped block motion compensation with motion information derived from a neighborhood
WO2020094151A1 (en) 2018-11-10 2020-05-14 Beijing Bytedance Network Technology Co., Ltd. Rounding in pairwise average candidate calculations
CN113228648A (en) 2018-12-21 2021-08-06 三星电子株式会社 Image encoding device and image decoding device using triangle prediction mode, and image encoding method and image decoding method performed by the same
WO2020132272A1 (en) * 2018-12-21 2020-06-25 Vid Scale, Inc. Symmetric motion vector difference coding
KR20210129721A (en) * 2019-03-11 2021-10-28 알리바바 그룹 홀딩 리미티드 Method, device, and system for determining prediction weights for merge mode
US11394993B2 (en) * 2019-03-13 2022-07-19 Tencent America LLC Method and apparatus for affine inter prediction with small subblocks
CN112468817B (en) * 2019-09-06 2022-07-29 杭州海康威视数字技术股份有限公司 Encoding and decoding method, device and equipment
CN113709501B (en) * 2019-12-23 2022-12-23 杭州海康威视数字技术股份有限公司 Encoding and decoding method, device and equipment
CN113242427B (en) * 2021-04-14 2024-03-12 中南大学 Rapid method and device based on adaptive motion vector precision in VVC
CN113542768B (en) * 2021-05-18 2022-08-09 浙江大华技术股份有限公司 Motion search method, motion search device and computer-readable storage medium

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101553850B1 (en) * 2008-10-21 2015-09-17 에스케이 텔레콤주식회사 / Video encoding/decoding apparatus and method and apparatus of adaptive overlapped block motion compensation using adaptive weights
US8837592B2 (en) * 2010-04-14 2014-09-16 Mediatek Inc. Method for performing local motion vector derivation during video coding of a coding unit, and associated apparatus
CN106101723B (en) * 2011-01-07 2019-06-14 Lg电子株式会社 Code and decode the method for image information and the device using this method
PL2698999T3 (en) * 2011-04-12 2017-10-31 Sun Patent Trust Motion-video encoding method, motion-video encoding apparatus, motion-video decoding method, motion-video decoding apparatus, and motion-video encoding/decoding apparatus
KR20130002243A (en) * 2011-06-28 2013-01-07 주식회사 케이티 Methods of inter prediction using overlapped block and appratuses using the same
WO2013051899A2 (en) * 2011-10-05 2013-04-11 한국전자통신연구원 Scalable video encoding and decoding method and apparatus using same
CN108040259B (en) * 2011-10-05 2022-02-01 太阳专利托管公司 Image encoding method and image encoding device
US9883203B2 (en) * 2011-11-18 2018-01-30 Qualcomm Incorporated Adaptive overlapped block motion compensation
US9807412B2 (en) * 2012-01-18 2017-10-31 Electronics And Telecommunications Research Institute Method and device for encoding and decoding image
CN104604232A (en) * 2012-04-30 2015-05-06 数码士控股有限公司 Method and apparatus for encoding multi-view images, and method and apparatus for decoding multi-view images
US20150358644A1 (en) * 2012-12-28 2015-12-10 Nippon Telegraph And Telephone Corporation Video encoding apparatus and method, video decoding apparatus and method, and programs therefor
WO2014129873A1 (en) * 2013-02-25 2014-08-28 엘지전자 주식회사 Method for encoding video of multi-layer structure supporting scalability and method for decoding same and apparatus therefor
US9426465B2 (en) * 2013-08-20 2016-08-23 Qualcomm Incorporated Sub-PU level advanced residual prediction
US9667996B2 (en) * 2013-09-26 2017-05-30 Qualcomm Incorporated Sub-prediction unit (PU) based temporal motion vector prediction in HEVC and sub-PU design in 3D-HEVC
US10178384B2 (en) * 2013-12-19 2019-01-08 Sharp Kabushiki Kaisha Image decoding device, image coding device, and residual prediction device
WO2015133866A1 (en) * 2014-03-06 2015-09-11 삼성전자 주식회사 Inter-layer video decoding method and apparatus therefor performing sub-block-based prediction, and inter-layer video encoding method and apparatus therefor performing sub-block-based prediction
EP4033766A1 (en) * 2014-06-19 2022-07-27 VID SCALE, Inc. Methods and systems for intra block copy coding with block vector derivation

Also Published As

Publication number Publication date
CN110024394B (en) 2023-09-01
CN116886929A (en) 2023-10-13
KR102328179B1 (en) 2021-11-18
CN110024394A (en) 2019-07-16
CN116866593A (en) 2023-10-10
CN116886928A (en) 2023-10-13
WO2018097692A2 (en) 2018-05-31
KR20210137982A (en) 2021-11-18
KR20230042673A (en) 2023-03-29
WO2018097692A3 (en) 2018-07-26
KR20180061041A (en) 2018-06-07
CN116886930A (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN110024394B (en) Method and apparatus for encoding/decoding image and recording medium storing bit stream
CN109997363B (en) Image encoding/decoding method and apparatus, and recording medium storing bit stream
CN109804626B (en) Method and apparatus for encoding and decoding image and recording medium for storing bit stream
CN117156155A (en) Image encoding/decoding method, storage medium, and transmission method
CN110024402B (en) Image encoding/decoding method and apparatus, and recording medium storing bit stream
CN110771169A (en) Video encoding/decoding method and apparatus and recording medium storing bitstream
CN116567226A (en) Image encoding/decoding apparatus and image data transmitting apparatus
CN112740697A (en) Image encoding/decoding method and apparatus, and recording medium storing bit stream
US11695926B2 (en) Method and apparatus for encoding/decoding image, and recording medium for storing bitstream
CN111164978B (en) Method and apparatus for encoding/decoding image and recording medium for storing bit stream

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination