WO2024010362A1

WO2024010362A1 - Image encoding/decoding method and recording medium storing bitstream

Info

Publication number: WO2024010362A1
Application number: PCT/KR2023/009505
Authority: WO
Inventors: 임성원
Original assignee: 주식회사 케이티
Priority date: 2022-07-06
Filing date: 2023-07-05
Publication date: 2024-01-11

Abstract

An image decoding method, according to the present disclosure, comprises the steps of: deriving a first prediction block for a current block; deriving a reference block for the current block; and updating the first prediction block on the basis of a second prediction block derived from the reference block. In the current block, a third prediction sample at a current prediction location is derived on the basis of a weighted sum calculation based on a first prediction sample in the first prediction block and a second prediction sample in the second prediction block, and a weight for the weighted sum calculation may be determined on the basis of a classification result at the current prediction location.

Description

Video encoding/decoding method and recording medium for storing bitstream

This disclosure relates to a video signal processing method and apparatus.

Recently, demand for high-resolution, high-quality images such as HD (High Definition) images and UHD (Ultra High Definition) images is increasing in various application fields. As video data becomes higher in resolution and quality, the amount of data increases relative to existing video data. Therefore, when video data is transmitted using media such as existing wired or wireless broadband lines or stored using existing storage media, transmission costs and storage costs increase. High-efficiency video compression technologies can be used to solve these problems that arise as video data becomes higher in resolution and quality.

Video compression technologies include inter-screen prediction, which predicts pixel values in the current picture from pictures before or after the current picture; intra-screen prediction, which predicts pixel values in the current picture using pixel information within the current picture; and entropy coding, which assigns short codes to values with a high frequency of occurrence and long codes to values with a low frequency of occurrence. Using these video compression technologies, video data can be effectively compressed and transmitted or stored.

Meanwhile, as the demand for high-resolution video increases, the demand for three-dimensional video content as a new video service is also increasing. Discussions are underway regarding video compression technology to effectively provide high-resolution and ultra-high-resolution stereoscopic video content.

The purpose of the present disclosure is to provide a method for improving the accuracy of a prediction signal and an apparatus for performing the same.

The purpose of the present disclosure is to provide a method for adaptively determining weights on a sample or sub-block basis to improve prediction signals, and an apparatus for performing the same.

The purpose of the present disclosure is to provide a method for adaptively determining weights applied to an L0 prediction block and an L1 prediction block based on the prediction accuracy of each of the L0 prediction block and the L1 prediction block, and an apparatus for performing the same.

The technical problems to be solved by the present disclosure are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the description below.

An image decoding method according to the present disclosure includes deriving a first prediction block for a current block; deriving a reference block for the current block; and updating the first prediction block based on a second prediction block derived from the reference block. Here, a third prediction sample at a current prediction position in the current block is derived by a weighted sum operation on a first prediction sample in the first prediction block and a second prediction sample in the second prediction block, and the weights for the weighted sum operation may be determined based on a classification result at the current prediction position.

An image encoding method according to the present disclosure includes deriving a first prediction block for a current block; deriving a reference block for the current block; and updating the first prediction block based on a second prediction block derived from the reference block. Here, a third prediction sample at a current prediction position in the current block is derived by a weighted sum operation on a first prediction sample in the first prediction block and a second prediction sample in the second prediction block, and the weights for the weighted sum operation may be determined based on a classification result at the current prediction position.

In the image encoding/decoding method according to the present disclosure, classification of the current prediction position may be performed using a residual sample of a position corresponding to the current prediction position in the reference block.

In the image encoding/decoding method according to the present disclosure, when the absolute value of the residual sample is less than a threshold, the current prediction position may be classified into a first group, and when the absolute value of the residual sample is greater than or equal to the threshold, the current prediction position may be classified into a second group.

In the video encoding/decoding method according to the present disclosure, the weight assigned to the second prediction sample may be set to a larger value when the current prediction position is classified into the first group than when it is classified into the second group.

In the image encoding/decoding method according to the present disclosure, the threshold value may be derived based on at least one of the minimum value, maximum value, or average value of residual samples in the reference block.
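
The sample-wise classification and weighted sum summarized above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `update_prediction`, the weight values 0.5 and 0.25, and the choice of the mean absolute residual as the threshold are all assumptions made for the example.

```python
import numpy as np

def update_prediction(pred1, pred2, ref_residual, w_first=0.5, w_second=0.25):
    """Blend two prediction blocks using per-sample weights.

    pred1, pred2, ref_residual: 2-D arrays of the same shape.
    w_first / w_second: hypothetical weights given to the second prediction
    sample for positions in the first / second group.
    """
    abs_res = np.abs(ref_residual.astype(np.float64))
    # Assumption: the threshold is the average absolute residual in the reference block.
    threshold = abs_res.mean()
    # First group: small residual, so the reference block is treated as reliable here.
    first_group = abs_res < threshold
    w2 = np.where(first_group, w_first, w_second)
    # Third prediction sample = weighted sum of the first and second prediction samples.
    pred3 = (1.0 - w2) * pred1 + w2 * pred2
    return np.rint(pred3).astype(np.int64), first_group
```

Positions in the first group receive the larger weight for the second prediction sample, which matches the ordering of weights described above; the actual weight values would be chosen by the codec design.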

In the image encoding/decoding method according to the present disclosure, the reference block may be determined based on a reference template in a reference picture, the reference template may be the region in the reference picture with the lowest cost relative to a current template, and the current template may be composed of previously restored samples surrounding the current block.
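
A minimal sketch of such a template search is given below. It assumes a template made of one row above and one column to the left of the block, a sum of absolute differences (SAD) cost, and a small square search window; the names `template_of` and `find_reference_block` are hypothetical and not taken from the disclosure.

```python
import numpy as np

def template_of(picture, x, y, w, h, t=1):
    """Top and left neighbouring samples (thickness t) of the w x h block at (x, y)."""
    top = picture[y - t:y, x:x + w]
    left = picture[y:y + h, x - t:x]
    return np.concatenate([top.ravel(), left.ravel()]).astype(np.int64)

def find_reference_block(cur_pic, ref_pic, x, y, w, h, search=2, t=1):
    """Return the position in ref_pic whose template has the lowest SAD cost."""
    cur_tpl = template_of(cur_pic, x, y, w, h, t)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates whose block or template falls outside the picture.
            if rx - t < 0 or ry - t < 0:
                continue
            if rx + w > ref_pic.shape[1] or ry + h > ref_pic.shape[0]:
                continue
            cost = int(np.abs(cur_tpl - template_of(ref_pic, rx, ry, w, h, t)).sum())
            if best is None or cost < best[0]:
                best = (cost, rx, ry)
    cost, rx, ry = best
    return (rx, ry), cost
```

The block adjacent to the best-matching reference template would then serve as the reference block. Because only previously restored samples are used, an encoder and a decoder could run the same search without signalling the result.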

In the video encoding/decoding method according to the present disclosure, the reference picture may have a predefined index in a reference picture list.

In the video encoding/decoding method according to the present disclosure, the reference picture may have the closest distance to the current picture among reference pictures.

In the video encoding/decoding method according to the present disclosure, whether to update the first prediction block may be determined based on at least one of the size of the current block, the prediction mode of the current block, the inter prediction mode of the current block, or the prediction direction of the current block.

In the image encoding/decoding method according to the present disclosure, the classification may be performed on a sub-block basis.

In the image encoding/decoding method according to the present disclosure, the classification of a current sub-block within the current block may be performed using the minimum, maximum, or average value of the residual samples included in the reference sub-block that corresponds to the current sub-block within the reference block.

In the video encoding/decoding method according to the present disclosure, the classification of a current sub-block within the current block may be performed using a residual sample at a predefined position within the reference sub-block that corresponds to the current sub-block within the reference block.

In the video encoding/decoding method according to the present disclosure, the predefined position may be the upper left position or the center position.
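
The sub-block variants above can be sketched in one routine. The sub-block size, the threshold value, the use of absolute residual values, and the name `classify_subblocks` are assumptions for illustration only.

```python
import numpy as np

def classify_subblocks(ref_residual, sub=4, threshold=4, mode="mean"):
    """Assign each sub x sub sub-block to group 1 or group 2.

    mode selects the representative value taken from the reference sub-block:
    "min", "max", "mean", "top_left" or "center".
    """
    abs_res = np.abs(np.asarray(ref_residual))
    rows, cols = abs_res.shape[0] // sub, abs_res.shape[1] // sub
    groups = np.zeros((rows, cols), dtype=np.int64)
    for by in range(rows):
        for bx in range(cols):
            blk = abs_res[by * sub:(by + 1) * sub, bx * sub:(bx + 1) * sub]
            if mode == "mean":
                rep = blk.mean()
            elif mode == "max":
                rep = blk.max()
            elif mode == "min":
                rep = blk.min()
            elif mode == "top_left":
                rep = blk[0, 0]
            elif mode == "center":
                rep = blk[sub // 2, sub // 2]
            else:
                raise ValueError("unknown mode: " + mode)
            # Group 1: representative residual below the threshold.
            groups[by, bx] = 1 if rep < threshold else 2
    return groups
```

Using a single predefined position such as the upper left or center sample avoids scanning the whole sub-block, at the price of a less representative value.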

The features briefly summarized above with respect to the present disclosure are merely exemplary aspects of the detailed description of the present disclosure described below, and do not limit the scope of the present disclosure.

According to the present disclosure, prediction accuracy can be improved using a reference block.

According to the present disclosure, prediction accuracy can be improved by adaptively determining weights on a sample or sub-block basis.

According to the present disclosure, encoding/decoding efficiency can be improved by determining the prediction accuracy of each L0 prediction block and L1 prediction block in the decoder using the same method as the encoder.

The effects that can be obtained from the present disclosure are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

Figure 1 is a block diagram showing a video encoding device according to an embodiment of the present disclosure.

Figure 2 is a block diagram showing a video decoding device according to an embodiment of the present disclosure.

Figures 3 and 4 are flowcharts of the inter prediction method.

Figure 5 shows an example in which motion estimation is performed.

Figures 6 and 7 show an example in which a prediction block of the current block is generated based on motion information generated through motion estimation.

Figure 8 shows positions referenced to derive motion vector prediction values.

Figure 9 is a diagram for explaining a template-based motion estimation method.

Figure 10 shows examples of template configurations.

Figure 11 is a diagram for explaining a motion estimation method based on a bilateral matching method.

Figure 12 shows a flowchart of a prediction signal improvement method according to an embodiment of the present disclosure.

Figure 13 is a diagram for explaining the process of searching for a reference template similar to the current template in a reference picture.

Figure 14 shows an example in which each of the target positions is classified into one of two groups based on residual samples in the reference block.

Figure 15 shows an example of setting a threshold based on a normal distribution.

Figure 16 shows an example in which prediction accuracy is determined on a sub-block basis.

Figure 17 is a diagram for explaining an example in which weights for weighted sum calculation are determined based on the current template.

Since the present disclosure can be variously modified and can have various embodiments, specific embodiments will be illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present disclosure to specific embodiments, and should be understood to include all changes, equivalents, and substitutes included in the spirit and technical scope of the present disclosure. While describing each drawing, similar reference numerals are used for similar components.

Terms such as first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The above terms are used only for the purpose of distinguishing one component from another. For example, a first component may be referred to as a second component, and similarly, the second component may be referred to as a first component without departing from the scope of the present disclosure. The term and/or includes any of a plurality of related stated items or a combination of a plurality of related stated items.

When a component is said to be "connected" or "coupled" to another component, it should be understood that it may be directly connected or coupled to the other component, but that other components may exist in between. On the other hand, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that there are no other components in between.

The terms used in this application are only used to describe specific embodiments and are not intended to limit the disclosure. Singular expressions include plural expressions unless the context clearly dictates otherwise. In this application, terms such as "comprise" or "have" are intended to designate the presence of features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the possibility of the existence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, preferred embodiments of the present disclosure will be described in more detail with reference to the attached drawings. Hereinafter, the same reference numerals will be used for the same components in the drawings, and duplicate descriptions of the same components will be omitted.

Referring to FIG. 1, the image encoding device 100 includes a picture division unit 110, prediction units 120 and 125, a transform unit 130, a quantization unit 135, a rearrangement unit 160, an entropy encoding unit 165, an inverse quantization unit 140, an inverse transform unit 145, a filter unit 150, and a memory 155.

Each component shown in FIG. 1 is shown independently to represent different characteristic functions in the video encoding device, and this does not mean that each component is comprised of separate hardware or a single software component. That is, each component is listed as a separate component for convenience of explanation, and at least two of the components may be combined to form one component, or one component may be divided into a plurality of components that each perform a function. Integrated embodiments and separate embodiments of the components are also included in the scope of the present disclosure as long as they do not deviate from the essence of the present disclosure.

Additionally, some components may not be essential components that perform essential functions in the present disclosure, but may simply be optional components to improve performance. The present disclosure can be implemented by including only the components essential for implementing the essence of the present disclosure, excluding components used only to improve performance, and a structure that includes only the essential components, excluding the optional components used only to improve performance, is also included in the scope of the present disclosure.

The picture division unit 110 may divide the input picture into at least one processing unit. The processing unit may be a prediction unit (PU), a transformation unit (TU), or a coding unit (CU). The picture division unit 110 may divide one picture into combinations of a plurality of coding units, prediction units, and transformation units, and may encode the picture by selecting one combination of coding unit, prediction unit, and transformation unit based on a predetermined criterion (for example, a cost function).

For example, one picture may be divided into a plurality of coding units. To partition a picture into coding units, a recursive tree structure such as a quad tree, ternary tree, or binary tree may be used. With one picture or the largest coding unit as the root, a coding unit that is split into other coding units has as many child nodes as the number of split coding units. A coding unit that is no longer split according to certain restrictions becomes a leaf node. For example, when it is assumed that quad tree partitioning is applied to one coding unit, one coding unit may be split into up to four different coding units.
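
The recursive structure can be illustrated with a quad-tree-only sketch. The split decision is passed in as a callback because the text leaves the criterion open; the function name `quad_tree_partition` and the minimum size of 8 are assumptions, and ternary or binary splits are not modelled.

```python
def quad_tree_partition(x, y, size, should_split, min_size=8):
    """Recursively split a square coding unit; return leaf units as (x, y, size)."""
    # A unit that reaches the minimum size, or that the criterion keeps whole, is a leaf.
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    # A quad-tree split always produces four child nodes of half the width and height.
    for oy in (0, half):
        for ox in (0, half):
            leaves += quad_tree_partition(x + ox, y + oy, half, should_split, min_size)
    return leaves
```

In an encoder, `should_split` would typically compare the cost of coding the unit whole against the cost of coding its four children.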

Hereinafter, in the embodiments of the present disclosure, the coding unit may be used to mean a unit that performs encoding or may be used to mean a unit that performs decoding.

A prediction unit may be obtained by dividing one coding unit into at least one square or rectangular shape of the same size, or one coding unit may be divided so that any one of the prediction units within it has a shape and/or size different from that of another prediction unit.

During intra-screen prediction, the transformation unit and the prediction unit may be set to be the same. In this case, after dividing the coding unit into a plurality of transformation units, intra-screen prediction may be performed for each transformation unit. A coding unit may be divided in the horizontal or vertical direction. The number of transformation units generated by dividing the coding unit may be 2 or 4, depending on the size of the coding unit.

The prediction units 120 and 125 may include an inter-screen prediction unit 120 that performs inter-screen prediction and an intra-screen prediction unit 125 that performs intra-screen prediction. It is possible to determine whether to use inter-screen prediction or intra-screen prediction for a coding unit, and to determine specific information (e.g., intra-screen prediction mode, motion vector, reference picture, etc.) according to each prediction method. At this time, the processing unit in which the prediction is performed and the processing unit in which the prediction method and specific contents are determined may be different. For example, the prediction method and prediction mode may be determined in coding units, and prediction may be performed in prediction units or transformation units. The residual value (residual block) between the generated prediction block and the original block may be input to the transform unit 130. Additionally, prediction mode information, motion vector information, etc. used for prediction may be encoded in the entropy encoding unit 165 together with the residual value and transmitted to the decoding device. When a specific encoding mode is used, it is possible to encode the original block as is and transmit it to the decoder without generating a prediction block through the prediction units 120 and 125.

The inter-screen prediction unit 120 may predict a prediction unit based on information on at least one picture among the pictures before or after the current picture, and in some cases may predict a prediction unit based on information on a partially encoded region within the current picture. The inter-screen prediction unit 120 may include a reference picture interpolation unit, a motion prediction unit, and a motion compensation unit.

The reference picture interpolation unit may receive reference picture information from the memory 155 and generate pixel information at sub-integer (fractional) pixel positions from the reference picture. In the case of luminance pixels, a DCT-based 8-tap interpolation filter with different filter coefficients for each fractional position can be used to generate sub-integer pixel information in 1/4 pixel units. In the case of color difference signals, a DCT-based 4-tap interpolation filter with different filter coefficients for each fractional position can be used to generate sub-integer pixel information in 1/8 pixel units.
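
As an illustration of 8-tap interpolation, the sketch below computes half-sample luminance values along one row. The text does not list filter coefficients; the taps used here are the half-sample luma coefficients known from HEVC and are an assumption, as are the edge padding and the function name `half_pel_row`.

```python
import numpy as np

# Assumed taps (HEVC half-sample luma filter); they sum to 64.
HALF_PEL_TAPS = np.array([-1, 4, -11, 40, 40, -11, 4, -1], dtype=np.int64)

def half_pel_row(row, bit_depth=8):
    """Half-sample values between row[i] and row[i + 1] for a 1-D row of samples."""
    row = np.asarray(row, dtype=np.int64)
    # Replicate edge samples so every position has 3 left and 4 right neighbours.
    padded = np.pad(row, (3, 4), mode="edge")
    out = np.empty(len(row), dtype=np.int64)
    for i in range(len(row)):
        acc = int(np.dot(HALF_PEL_TAPS, padded[i:i + 8]))
        out[i] = (acc + 32) >> 6  # divide by 64 with rounding
    return np.clip(out, 0, (1 << bit_depth) - 1)
```

Quarter-sample positions would use a different, asymmetric set of taps, which is what "different filter coefficients" refers to.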

The motion prediction unit may perform motion prediction based on the reference picture interpolated by the reference picture interpolation unit. Various methods, such as FBMA (Full search-based Block Matching Algorithm), TSS (Three Step Search), and NTS (New Three-Step Search Algorithm), can be used to calculate the motion vector. The motion vector may have a value in 1/2 or 1/4 pixel units based on the interpolated pixels. The motion prediction unit can predict the current prediction unit while varying the motion prediction method. Various methods can be used as motion prediction methods, such as the skip method, merge method, AMVP (Advanced Motion Vector Prediction) method, and intra block copy method.
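
A full-search block matching step can be sketched as below. The sketch works at integer-pixel precision with a SAD cost and a square search window; these choices and the name `full_search` are assumptions, and fractional-pixel refinement is omitted.

```python
import numpy as np

def full_search(cur_block, ref_pic, x, y, search=4):
    """Exhaustively test every integer displacement in the window; return (mv, cost)."""
    h, w = cur_block.shape
    cur = cur_block.astype(np.int64)
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + w > ref_pic.shape[1] or ry + h > ref_pic.shape[0]:
                continue
            cand = ref_pic[ry:ry + h, rx:rx + w].astype(np.int64)
            cost = int(np.abs(cur - cand).sum())  # sum of absolute differences
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

Methods such as TSS reduce the number of candidates tested by searching on a coarse grid first and refining around the best point.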

The intra-screen prediction unit 125 may generate a prediction block based on reference pixel information, which is pixel information in the current picture. Reference pixel information may be derived from one selected among a plurality of reference pixel lines. The N-th reference pixel line among the plurality of reference pixel lines may include left pixels whose x-axis difference with the top-left pixel in the current block is N and top pixels whose y-axis difference with the top-left pixel is N. The number of reference pixel lines that the current block can select may be 1, 2, 3, or 4.

If a block neighboring the current prediction unit is a block on which inter-screen prediction was performed, so that a reference pixel is a pixel obtained by inter-screen prediction, the reference pixel included in the inter-predicted block may be replaced with reference pixel information of a neighboring block on which intra-screen prediction was performed. That is, when a reference pixel is not available, the unavailable reference pixel information can be replaced with information on at least one of the available reference pixels.

In intra-screen prediction, the prediction mode can include a directional prediction mode that uses reference pixel information according to the prediction direction and a non-directional mode that does not use directional information when performing prediction. The mode for predicting luminance information and the mode for predicting chrominance information may be different, and the intra-screen prediction mode information used to predict luminance information or the predicted luminance signal information may be used to predict chrominance information.

When performing intra-screen prediction, if the size of the prediction unit and the size of the transformation unit are the same, intra-screen prediction for the prediction unit may be performed based on the pixel on the left, the pixel on the upper left, and the pixel on the top of the prediction unit.

The intra-screen prediction method can generate a prediction block after applying a smoothing filter to the reference pixel according to the prediction mode. Depending on the selected reference pixel line, whether to apply a smoothing filter may be determined.

To perform intra-screen prediction, the intra-screen prediction mode of the current prediction unit may be predicted from the intra-screen prediction modes of prediction units neighboring the current prediction unit. When the prediction mode of the current prediction unit is predicted using mode information predicted from a neighboring prediction unit, if the intra-screen prediction modes of the current prediction unit and the neighboring prediction unit are the same, information indicating that the two prediction modes are the same may be transmitted using predetermined flag information; if the prediction modes of the current prediction unit and the neighboring prediction unit are different, entropy encoding may be performed to encode the prediction mode information of the current block.

Additionally, based on the prediction unit generated by the prediction units 120 and 125, a residual block may be generated that includes residual information, which is the difference between the prediction unit on which prediction was performed and the original block of the prediction unit. The generated residual block may be input to the transform unit 130.

The transform unit 130 may transform the residual block, which contains the residual value information between the original block and the prediction unit generated through the prediction units 120 and 125, using a transform method such as DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), or KLT. Whether to apply DCT, DST, or KLT to transform the residual block may be decided based on at least one of the size of the transformation unit, the shape of the transformation unit, the prediction mode of the prediction unit, or the intra-screen prediction mode information of the prediction unit.
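
To make the transform step concrete, the following sketch applies a floating-point orthonormal 2-D DCT-II to a square residual block. Practical codecs use integer approximations of such transforms; the function names and the square-block restriction are assumptions for the example.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal n x n DCT-II basis matrix (rows are basis functions)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)          # scale the DC row so the matrix is orthonormal
    return c

def forward_dct2(block):
    """Separable 2-D transform: columns first, then rows."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def inverse_dct2(coeffs):
    """Inverse transform; the transpose inverts an orthonormal matrix."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c
```

A flat residual block concentrates all of its energy in the single DC coefficient, which is why the transform helps subsequent quantization and entropy coding.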

The quantization unit 135 may quantize the values converted to the frequency domain by the transform unit 130. The quantization coefficient may change depending on the block or the importance of the image. The value calculated by the quantization unit 135 may be provided to the inverse quantization unit 140 and the rearrangement unit 160.

The rearrangement unit 160 may rearrange coefficient values for the quantized residual values.

The rearrangement unit 160 can change the coefficients in a two-dimensional block form into a one-dimensional vector form through a coefficient scanning method. For example, the rearrangement unit 160 can scan from the DC coefficient to coefficients in the high frequency region using a zig-zag scan method and change them into a one-dimensional vector form. Depending on the size of the transformation unit and the intra-screen prediction mode, instead of the zig-zag scan, a vertical scan that scans the two-dimensional block-shaped coefficients in the column direction, a horizontal scan that scans the two-dimensional block-shaped coefficients in the row direction, or a diagonal scan that scans the two-dimensional block-shaped coefficients diagonally may also be used. That is, depending on the size of the transformation unit and the intra-screen prediction mode, it can be determined which scan method among zig-zag scan, vertical scan, horizontal scan, or diagonal scan will be used.
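
The zig-zag scan, together with the inverse scan a decoder would apply, can be sketched as follows for square blocks. The function names are hypothetical; the traversal is the conventional zig-zag pattern that starts at the DC coefficient.

```python
import numpy as np

def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag order, starting at DC."""
    order = []
    for s in range(2 * n - 1):  # s indexes the anti-diagonals
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals run from bottom-left to top-right, odd ones the other way.
        order.extend(diag[::-1] if s % 2 == 0 else diag)
    return order

def zigzag_scan(block):
    """Flatten a square 2-D coefficient block into a 1-D vector."""
    block = np.asarray(block)
    return np.array([block[r, c] for r, c in zigzag_order(block.shape[0])])

def inverse_zigzag_scan(vector, n):
    """Rebuild the n x n block from a zig-zag ordered vector."""
    block = np.zeros((n, n), dtype=np.asarray(vector).dtype)
    for value, (r, c) in zip(vector, zigzag_order(n)):
        block[r, c] = value
    return block
```

Because quantized high-frequency coefficients are often zero, this ordering tends to group the zeros at the end of the vector.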

The entropy encoding unit 165 may perform entropy encoding based on the values calculated by the reordering unit 160. Entropy coding can use various coding methods, such as Exponential Golomb, Context-Adaptive Variable Length Coding (CAVLC), and Context-Adaptive Binary Arithmetic Coding (CABAC).
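
As a small example of one of these methods, the sketch below implements unsigned order-0 Exponential Golomb coding on bit strings. Representing bits as Python strings and the function names are choices made for readability, not part of the disclosure.

```python
def exp_golomb_encode(value):
    """Order-0 unsigned Exp-Golomb code of a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    code = bin(value + 1)[2:]                 # binary form of value + 1
    return "0" * (len(code) - 1) + code       # prefix of zeros announces the length

def exp_golomb_decode(bits):
    """Decode one codeword from the front of bits; return (value, remaining bits)."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    return int(bits[zeros:end], 2) - 1, bits[end:]
```

Small values receive short codewords (0 is coded as a single bit), so the code is efficient when small values occur most often.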

The entropy encoding unit 165 may encode various information received from the reordering unit 160 and the prediction units 120 and 125, such as residual value coefficient information and block type information of the coding unit, prediction mode information, division unit information, prediction unit information, transmission unit information, motion vector information, reference frame information, block interpolation information, and filtering information.

The entropy encoding unit 165 may entropy encode the coefficient value of the coding unit input from the reordering unit 160.

The inverse quantization unit 140 and the inverse transform unit 145 inversely quantize the values quantized in the quantization unit 135 and inversely transform the values transformed in the transform unit 130. The residual value generated in the inverse quantization unit 140 and the inverse transform unit 145 may be combined with the prediction unit predicted through the motion estimation unit, motion compensation unit, and intra-screen prediction unit included in the prediction units 120 and 125 to generate a reconstructed block.

The filter unit 150 may include at least one of a deblocking filter, an offset correction unit, and an adaptive loop filter (ALF).

The deblocking filter can remove block distortion caused by boundaries between blocks in the restored picture. Whether to apply a deblocking filter to the current block can be determined based on the pixels included in several columns or rows of the block. When applying a deblocking filter to a block, a strong filter or a weak filter can be applied depending on the required deblocking filtering strength. Additionally, when applying a deblocking filter, horizontal filtering and vertical filtering can be processed in parallel.

The offset correction unit may correct, in pixel units, the offset of the deblocked image from the original image. To perform offset correction for a specific picture, a method may be used in which the pixels included in the image are divided into a certain number of areas, the area to which the offset is to be applied is determined, and the offset is applied to that area, or a method in which the offset is applied in consideration of the edge information of each pixel.

Adaptive Loop Filtering (ALF) can be performed based on a comparison between the filtered restored image and the original image. After dividing the pixels included in the image into predetermined groups, filtering can be performed differentially for each group by determining one filter to be applied to that group. Information related to whether to apply ALF may be transmitted for each coding unit (CU), and the shape and filter coefficients of the ALF filter to be applied may vary for each block. Additionally, an ALF filter of the same type (fixed type) may be applied regardless of the characteristics of the block to which it is applied.

The memory 155 may store a reconstructed block or picture calculated through the filter unit 150, and the stored reconstructed block or picture may be provided to the prediction units 120 and 125 when inter-screen prediction is performed.

Referring to FIG. 2, the image decoding device 200 may include an entropy decoding unit 210, a reordering unit 215, an inverse quantization unit 220, an inverse transform unit 225, prediction units 230 and 235, a filter unit 240, and a memory 245.

When a video bitstream is input from a video encoding device, the input bitstream can be decoded in a procedure opposite to that of the video encoding device.

The entropy decoding unit 210 may perform entropy decoding in a procedure opposite to the procedure in which entropy encoding is performed in the entropy encoding unit of the video encoding device. For example, various methods such as Exponential Golomb, CAVLC (Context-Adaptive Variable Length Coding), and CABAC (Context-Adaptive Binary Arithmetic Coding) may be applied in response to the method performed in the image encoding device.

The entropy decoder 210 can decode information related to intra-screen prediction and inter-screen prediction performed by the encoding device.

The reordering unit 215 may rearrange the bitstream entropy-decoded by the entropy decoding unit 210 based on the method in which the encoder rearranges the bitstream. Coefficients expressed in the form of a one-dimensional vector can be restored and rearranged as coefficients in the form of a two-dimensional block. The reordering unit 215 may receive information related to coefficient scanning performed by the encoder and perform reordering by reverse scanning based on the scanning order performed by the encoder.

The inverse quantization unit 220 may perform inverse quantization based on the quantization parameters provided by the encoding device and the coefficient values of the rearranged blocks.

The inverse transform unit 225 may perform an inverse transform, that is, inverse DCT, inverse DST, or inverse KLT, corresponding to the transform (DCT, DST, or KLT) that the transform unit of the image encoding device performed before quantization. The inverse transform may be performed based on the transmission unit determined by the video encoding device. The inverse transform unit 225 of the video decoding device may selectively apply a transform technique (e.g., DCT, DST, KLT) according to a plurality of pieces of information such as the prediction method, the size and shape of the current block, the prediction mode, and the intra-screen prediction direction.

The prediction units 230 and 235 may generate a prediction block based on prediction block generation-related information provided by the entropy decoding unit 210 and previously decoded block or picture information provided by the memory 245.

As described above, when intra-screen prediction is performed in the same manner as in the video encoding device, if the size of the prediction unit and the size of the transform unit are the same, intra-screen prediction for the prediction unit is performed based on the pixels to the left, the upper left, and the top of the prediction unit. However, if the size of the prediction unit and the size of the transform unit are different, intra-screen prediction can be performed using reference pixels based on the transform unit. Additionally, intra-screen prediction using NxN partitioning may be used only for the minimum coding unit.

The prediction units 230 and 235 may include a prediction unit determination unit, an inter-screen prediction unit, and an intra-screen prediction unit. The prediction unit determination unit receives various information input from the entropy decoder 210, such as prediction unit information, prediction mode information of the intra-screen prediction method, and motion prediction-related information of the inter-screen prediction method, identifies the prediction unit within the current coding unit, and determines whether the prediction unit performs inter-screen prediction or intra-screen prediction. The inter-screen prediction unit 230 may perform inter-screen prediction for the current prediction unit, using the information required for inter-screen prediction of the current prediction unit provided by the video encoding device, based on information included in at least one of the pictures before or after the current picture containing the current prediction unit. Alternatively, inter-screen prediction may be performed based on information on a pre-restored partial region within the current picture including the current prediction unit.

To perform inter-screen prediction, it can be determined, on a coding-unit basis, whether the motion prediction method of the prediction unit included in the coding unit is Skip Mode, Merge Mode, AMVP Mode, or In-Screen Block Copy Mode.

The intra-screen prediction unit 235 may generate a prediction block based on pixel information in the current picture. If the prediction unit is a prediction unit that has performed intra-prediction, intra-prediction can be performed based on the intra-prediction mode information of the prediction unit provided by the video encoding device. The intra-screen prediction unit 235 may include an Adaptive Intra Smoothing (AIS) filter, a reference pixel interpolation unit, and a DC filter. The AIS filter is a part that performs filtering on the reference pixels of the current block, and can be applied by determining whether or not to apply the filter according to the prediction mode of the current prediction unit. AIS filtering can be performed on the reference pixel of the current block using the prediction mode and AIS filter information of the prediction unit provided by the video encoding device. If the prediction mode of the current block is a mode that does not perform AIS filtering, the AIS filter may not be applied.

If the prediction mode of the prediction unit is a mode that performs intra-screen prediction based on pixel values obtained by interpolating the reference pixels, the reference pixel interpolation unit may interpolate the reference pixels to generate reference pixels at sub-integer pixel positions. If the prediction mode of the current prediction unit is a prediction mode that generates a prediction block without interpolating the reference pixels, the reference pixels may not be interpolated. The DC filter can generate a prediction block through filtering when the prediction mode of the current block is DC mode.

The restored block or picture may be provided to the filter unit 240. The filter unit 240 may include a deblocking filter, an offset correction unit, and an ALF.

Information on whether a deblocking filter has been applied to the corresponding block or picture and, if a deblocking filter has been applied, information on whether a strong filter or a weak filter has been applied can be provided from the video encoding device. The deblocking filter of the video decoding device receives the deblocking filter-related information provided by the video encoding device, and the video decoding device can perform deblocking filtering on the corresponding block.

The offset correction unit may perform offset correction on the reconstructed image based on the type of offset correction applied to the image during encoding and offset value information.

ALF can be applied to the coding unit based on ALF application availability information, ALF coefficient information, etc. provided from the coding device. This ALF information may be included and provided in a specific parameter set.

The memory 245 can store the restored picture or block so that it can be used as a reference picture or reference block, and can also provide the restored picture to an output unit.

As described above, in the embodiments of the present disclosure below, the term coding unit is used to denote a unit of encoding for convenience of explanation, but it may also be a unit in which decoding, and not only encoding, is performed.

In addition, the current block represents an encoding/decoding target block, and depending on the encoding/decoding stage, it may represent a coding tree block (or coding tree unit), a coding block (or coding unit), a transform block (or transform unit), a prediction block (or prediction unit), or a block to which an in-loop filter is applied. In this specification, 'unit' may represent a basic unit for performing a specific encoding/decoding process, and 'block' may represent a pixel array of a predetermined size. Unless otherwise specified, 'block' and 'unit' can be used with the same meaning. For example, in embodiments described later, a coding block and a coding unit may be understood to have equivalent meanings.

Furthermore, the picture including the current block will be called the current picture.

When encoding the current picture, overlapping data between pictures can be removed through inter prediction. Inter prediction can be performed on a block basis. Specifically, a prediction block of the current block can be generated from a reference picture using motion information of the current block. Here, the motion information may include at least one of a motion vector, a reference picture index, and a prediction direction.

Figures 3 and 4 are flowcharts of the inter prediction method.

Figure 3 shows the operation of the encoder, and Figure 4 shows the operation of the decoder.

The encoder performs motion estimation (S310) and obtains a prediction block based on motion information derived as a motion estimation result (S320). Here, the motion information may include at least one of a motion vector, reference picture index, motion vector precision, bidirectional weight, and whether L0 prediction is performed or L1 prediction is performed.

Furthermore, the encoder may determine an inter prediction mode for performing inter prediction and encode information for deriving motion information according to the determined inter prediction mode (S330).

In the decoder, an inter prediction mode is determined based on information signaled from the bitstream (S410), and motion information is acquired according to the determined inter prediction mode (S420). When motion information is acquired, a prediction block of the current block can be obtained based on the obtained motion information (S430).

Hereinafter, each step will be described in detail.

Motion information of the current block can be generated through motion estimation.

Figure 5 shows an example in which motion estimation is performed.

In Figure 5, it is assumed that the POC (Picture Order Count) of the current picture is T, and the POC of the reference picture is (T-1).

The search range for motion estimation can be set from the same position as the reference point of the current block in the reference picture. Here, the reference point may be the location of the upper left sample of the current block.

As an example, Figure 5 illustrates a rectangle of size (w0+w1) by (h0+h1), centered on the reference point, set as the search range. In the above example, w0, w1, h0, and h1 may have the same value. Alternatively, at least one of w0, w1, h0, and h1 may be set to have a different value from the others. Alternatively, the sizes of w0, w1, h0, and h1 may be determined so as not to exceed a Coding Tree Unit (CTU) boundary, slice boundary, tile boundary, or picture boundary.

After setting reference blocks with the same size as the current block within the search range, the cost of each reference block compared to the current block can be measured. The cost can be calculated using the similarity between two blocks.

As an example, the cost may be calculated based on the Sum of Absolute Difference (SAD) of the difference values between the original samples in the current block and the original samples (or restored samples) in the reference block. The smaller the absolute value sum, the lower the cost can be.

Afterwards, after comparing the costs of each reference block, the reference block with the optimal cost can be set as the prediction block of the current block.

Additionally, the distance between the current block and the reference block can be set as a motion vector. Specifically, the x-coordinate difference and y-coordinate difference between the current block and the reference block may be set as a motion vector.
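The motion estimation steps described above (set a search range, measure the SAD cost of each candidate reference block, keep the lowest-cost block, and take the coordinate difference as the motion vector) can be sketched as follows. This is a minimal illustration rather than the encoder's actual search: it assumes an exhaustive integer-sample search with a symmetric range (w0 = w1 = h0 = h1 = `search`), pictures given as lists of rows, and hypothetical function names.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def crop(picture, x, y, width, height):
    """Extract the width x height block whose upper-left sample is (x, y)."""
    return [row[x:x + width] for row in picture[y:y + height]]


def full_search(current_block, reference_picture, x0, y0, search):
    """Return (cost, (dx, dy)) of the lowest-SAD reference block.

    (x0, y0) is the reference point, i.e. the upper-left sample position of
    the current block. Candidates crossing the picture boundary are skipped.
    """
    height, width = len(current_block), len(current_block[0])
    pic_h, pic_w = len(reference_picture), len(reference_picture[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or x + width > pic_w or y + height > pic_h:
                continue
            cost = sad(current_block, crop(reference_picture, x, y, width, height))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

The returned (dx, dy) pair corresponds to the x-coordinate and y-coordinate differences that are set as the motion vector.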

Furthermore, the index of the picture containing the reference block specified through motion estimation is set as the reference picture index.

Additionally, the prediction direction can be set based on whether the reference picture belongs to the L0 reference picture list or the L1 reference picture list.

Additionally, motion estimation may be performed for each of the L0 direction and L1 direction. When prediction is performed in both the L0 direction and the L1 direction, motion information in the L0 direction and motion information in the L1 direction can be generated respectively.

Figure 6 shows an example of generating a prediction block through unidirectional (i.e., L0 direction) prediction, and Figure 7 shows an example of generating a prediction block through bidirectional (i.e., L0 and L1 directions) prediction.

In the case of unidirectional prediction, a prediction block of the current block is generated using one piece of motion information. As an example, the motion information may include an L0 motion vector, an L0 reference picture index, and prediction direction information indicating the L0 direction.

In the case of bidirectional prediction, a prediction block is created using two pieces of motion information. As an example, a reference block in the L0 direction specified based on motion information in the L0 direction (L0 motion information) is set as an L0 prediction block, and a reference block in the L1 direction specified based on motion information in the L1 direction (L1 motion information) is set as an L1 prediction block. Afterwards, a weighted sum of the L0 prediction block and the L1 prediction block can be computed to generate the prediction block of the current block.
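As a rough illustration of combining the L0 and L1 prediction blocks, the sketch below computes a sample-wise weighted sum. The integer weights, the rounding offset, and the function name `weighted_bi_prediction` are assumptions made for this example; the text above does not specify them.

```python
def weighted_bi_prediction(pred_l0, pred_l1, w0=1, w1=1):
    """Sample-wise weighted sum of two prediction blocks with rounding.

    w0 and w1 are assumed non-negative integer weights; equal weights give a
    plain average.
    """
    total = w0 + w1
    return [[(w0 * a + w1 * b + total // 2) // total
             for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred_l0, pred_l1)]
```

With equal weights this reduces to a rounded average of the two blocks; unequal weights favor one direction.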

In the examples shown in FIGS. 5 to 7, the L0 reference picture is illustrated as existing in the direction before the current picture (i.e., its POC value is smaller than that of the current picture), and the L1 reference picture is illustrated as existing in the direction after the current picture (i.e., its POC value is larger than that of the current picture).

However, unlike the example shown, the L0 reference picture may exist in the direction after the current picture, or the L1 reference picture may exist in the direction before the current picture. For example, both the L0 reference picture and the L1 reference picture may exist in the previous direction of the current picture, or both may exist in the subsequent direction of the current picture. Alternatively, bidirectional prediction may be performed using an L0 reference picture that exists in the direction after the current picture and an L1 reference picture that exists in the direction before the current picture.

Motion information of the block on which inter prediction was performed may be stored in memory. At this time, motion information may be stored in sample units. Specifically, motion information of the block to which a specific sample belongs may be stored as motion information of the specific sample. The stored motion information can be used to derive motion information of a neighboring block to be encoded/decoded later.

The encoder may signal to the decoder information encoding the residual samples, which correspond to the difference between the samples of the current block (i.e., the original samples) and the prediction samples, together with the motion information necessary to generate the prediction block. The decoder may decode the signaled information to derive the residual samples, and add the prediction samples within the prediction block generated using the motion information to the residual samples to generate restored samples.
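The decoder-side reconstruction just described amounts to adding the prediction to the decoded difference, sample by sample. The sketch below illustrates this; clipping to an 8-bit sample range is an assumption added for the example, and `reconstruct_block` is a hypothetical name.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add residual samples to prediction samples and clip to the valid range.

    The clipping step and the default bit depth are assumptions for this
    illustration.
    """
    max_value = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]
```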

At this time, in order to effectively compress the motion information signaled to the decoder, one of a plurality of inter prediction modes may be selected. Here, the plurality of inter prediction modes may include a motion information merge mode and a motion vector prediction mode.

The motion vector prediction mode is a mode in which the difference value between a motion vector and a motion vector predicted value is encoded and signaled. Here, the motion vector prediction value may be derived based on motion information of neighboring blocks or neighboring samples adjacent to the current block.

Figure 8 shows positions referenced to derive motion vector prediction values.

For convenience of explanation, it is assumed that the current block has a size of 4x4.

In the illustrated example, 'LB' represents the sample included in the leftmost column and bottommost row in the current block. 'RT' represents the sample included in the rightmost column and topmost row in the current block. A0 to A4 represent samples neighboring to the left of the current block, and B0 to B5 represent samples neighboring to the top of the current block. As an example, A1 represents a sample neighboring to the left of LB, and B1 represents a sample neighboring to the top of RT. A neighboring block containing a sample (that is, one of A0 to A4 or one of B0 to B5) that is spatially adjacent to the current block may be referred to as a spatial neighboring block.

Col indicates the position of a sample neighboring the bottom right of the current block in the co-located picture. The collocated picture is a different picture from the current picture, and information for specifying the collocated picture can be explicitly encoded and signaled in the bitstream. Alternatively, a reference picture with a predefined reference picture index may be set as a collocated picture. A neighboring block containing a sample (i.e., Col) temporally adjacent to the current block may be referred to as a temporal neighboring block.

The motion vector prediction value of the current block may be derived from at least one motion vector prediction candidate included in the motion vector prediction list.

The number of motion vector prediction candidates that can be inserted into the motion vector prediction list (i.e., the size of the list) may be predefined in the encoder and decoder. As an example, the maximum number of motion vector prediction candidates may be two.

A motion vector stored at the position of a neighboring sample adjacent to the current block or a scaled motion vector derived by scaling the motion vector may be inserted into the motion vector prediction list as a motion vector prediction candidate. At this time, a motion vector prediction candidate can be derived by scanning neighboring samples adjacent to the current block in a predefined order.

As an example, it can be checked whether a motion vector is stored at each location in the order from A0 to A4. And, according to the above scan order, the earliest discovered available motion vector can be inserted into the motion vector prediction list as a motion vector prediction candidate.

As another example, whether a motion vector is stored at each location may be checked in the order from A0 to A4, and the motion vector at the first position found that has the same reference picture as the current block may be inserted into the motion vector prediction list as a motion vector prediction candidate. If there is no neighboring sample having the same reference picture as the current block, a motion vector prediction candidate can be derived based on the first available motion vector found. Specifically, after scaling the first available motion vector found, the scaled motion vector can be inserted into the motion vector prediction list as a motion vector prediction candidate. At this time, scaling may be performed based on the output order difference between the current picture and the reference picture (i.e., POC difference) and the output order difference between the current picture and the reference picture of the neighboring sample (i.e., POC difference).

Furthermore, it is possible to check whether a motion vector is stored at each location in the order from B0 to B5. And, according to the above scan order, the earliest discovered available motion vector can be inserted into the motion vector prediction list as a motion vector prediction candidate.

As another example, whether a motion vector is stored at each location may be checked in the order from B0 to B5, and the motion vector at the first position found that has the same reference picture as the current block may be inserted into the motion vector prediction list as a motion vector prediction candidate. If there is no neighboring sample having the same reference picture as the current block, a motion vector prediction candidate can be derived based on the first available motion vector found. Specifically, after scaling the first available motion vector found, the scaled motion vector can be inserted into the motion vector prediction list as a motion vector prediction candidate. At this time, scaling may be performed based on the output order difference between the current picture and the reference picture (i.e., POC difference) and the output order difference between the current picture and the reference picture of the neighboring sample (i.e., POC difference).
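The second derivation rule above (prefer a neighbor with the same reference picture, otherwise scale the first available motion vector) can be sketched for one scan group, such as A0 to A4 or B0 to B5. The linear scaling by the ratio of POC differences is an assumption: the text only states that scaling is based on the two POC differences. The data layout and function names are likewise hypothetical.

```python
def scale_mv(mv, cur_poc, cur_ref_poc, nb_ref_poc):
    """Scale a neighbor's motion vector by the ratio of POC differences.

    Linear scaling is an assumed model for this illustration.
    """
    tb = cur_poc - cur_ref_poc  # POC difference to the current block's reference picture
    td = cur_poc - nb_ref_poc   # POC difference to the neighbor's reference picture
    if td == 0:
        return mv
    return (round(mv[0] * tb / td), round(mv[1] * tb / td))


def derive_candidate(neighbors, cur_poc, cur_ref_poc):
    """Derive one motion vector prediction candidate from a scan group.

    neighbors lists the positions in scan order; each entry is None (no
    motion vector stored) or a pair (mv, ref_poc).
    """
    first_available = None
    for entry in neighbors:
        if entry is None:
            continue
        mv, ref_poc = entry
        if ref_poc == cur_ref_poc:
            return mv  # same reference picture: use the vector as-is
        if first_available is None:
            first_available = entry
    if first_available is None:
        return None  # no motion vector stored at any position
    mv, ref_poc = first_available
    return scale_mv(mv, cur_poc, cur_ref_poc, ref_poc)
```

Running the same function once over the left group and once over the top group yields the two candidates mentioned in the text.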

When motion vectors are stored in block units (eg, 4x4), a motion vector prediction candidate may be derived based on the motion vector of a block including a sample at a predetermined position.

As in the above example, a motion vector prediction candidate can be derived from a sample adjacent to the left of the current block, and a motion vector prediction candidate can be derived from a sample adjacent to the top of the current block.

At this time, the motion vector prediction candidate derived from the left sample may be inserted into the motion vector prediction list before the motion vector prediction candidate derived from the top sample. In this case, the index assigned to the motion vector prediction candidate derived from the left sample may have a smaller value than the motion vector prediction candidate derived from the top sample.

Contrary to the above, the motion vector prediction candidate derived from the top sample may be inserted into the motion vector prediction list before the motion vector prediction candidate derived from the left sample.

Among the motion vector prediction candidates included in the motion vector prediction list, the motion vector prediction candidate with the highest coding efficiency may be set as the motion vector predictor (MVP) of the current block. Additionally, index information indicating a motion vector prediction candidate that is set as the motion vector prediction value of the current block among a plurality of motion vector prediction candidates may be encoded and signaled to the decoder. When the number of motion vector prediction candidates is two, the index information may be a 1-bit flag (eg, MVP flag). Additionally, a motion vector difference (MVD), which is the difference between the motion vector of the current block and the motion vector predicted value, can be encoded and signaled to the decoder.

The decoder can construct a motion vector prediction list in the same way as the encoder. Additionally, index information may be decoded from the bitstream, and one of a plurality of motion vector prediction candidates may be selected based on the decoded index information. The selected motion vector prediction candidate can be set as the motion vector prediction value of the current block.

Additionally, motion vector difference values can be decoded from the bitstream. Afterwards, the motion vector of the current block can be derived by combining the motion vector prediction value and the motion vector difference value.
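Since the motion vector difference is defined as the difference between the motion vector and the motion vector prediction value, the decoder-side derivation is an addition of the two. A minimal sketch, with hypothetical names and motion vectors represented as (x, y) tuples:

```python
def reconstruct_mv(mvp_list, mvp_index, mvd):
    """Select the predictor indicated by the decoded index and add the decoded difference."""
    mvp = mvp_list[mvp_index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

With a two-entry list, `mvp_index` corresponds to the 1-bit flag described above.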

When bidirectional prediction is applied to the current block, a motion vector prediction list can be generated for each of the L0 direction and L1 direction. That is, the motion vector prediction list may be composed of motion vectors in the same direction. Accordingly, the motion vector of the current block and the motion vector prediction candidates included in the motion vector prediction list have the same direction.

When the motion vector prediction mode is selected, the reference picture index and prediction direction information may be explicitly encoded and signaled to the decoder. As an example, when a plurality of reference pictures exist in the reference picture list and motion estimation is performed for each of the plurality of reference pictures, a reference picture index specifying the reference picture from which the motion information of the current block is derived among the plurality of reference pictures can be explicitly encoded and signaled to the decoder.

At this time, if the reference picture list includes only one reference picture, encoding/decoding of the reference picture index may be omitted.

Prediction direction information may be an index indicating one of L0 unidirectional prediction, L1 unidirectional prediction, or bidirectional prediction. Alternatively, the L0 flag indicating whether prediction in the L0 direction is performed and the L1 flag indicating whether prediction in the L1 direction is performed may be encoded and signaled, respectively.

The motion information merge mode is a mode that sets the motion information of the current block to be the same as the motion information of the neighboring block. In the motion information merge mode, motion information can be encoded/decoded using a motion information merge list.

A motion information merge candidate may be derived based on motion information of a neighboring block or neighboring sample adjacent to the current block. For example, after pre-defining a reference position around the current block, it is possible to check whether motion information exists at the pre-defined reference position. If motion information exists at a predefined reference location, motion information at that location can be inserted into the motion information merge list as a motion information merge candidate.

In the example of FIG. 8, the predefined reference position may include at least one of A0, A1, B0, B1, B5, and Col. Furthermore, motion information merging candidates can be derived in the following order: A1, B1, B0, A0, B5, and Col.

When motion information is stored in block units (eg, 4x4), a motion information merging candidate may be derived based on motion information of a block including a sample of a predefined reference position.

Among the motion information merge candidates included in the motion information merge list, the motion information of the motion information merge candidate with the optimal cost can be set as the motion information of the current block. Furthermore, index information (eg, merge index) indicating a motion information merge candidate selected from among a plurality of motion information merge candidates may be encoded and transmitted to the decoder.

In the decoder, a motion information merge list can be constructed in the same way as in the encoder. Then, a motion information merge candidate can be selected based on the merge index decoded from the bitstream. The motion information of the selected motion information merge candidate may be set as the motion information of the current block.

Unlike the motion vector prediction list, the motion information merge list consists of a single list regardless of the prediction direction. That is, the motion information merge candidate included in the motion information merge list may have only L0 motion information or L1 motion information, or may have bidirectional motion information (i.e., L0 motion information and L1 motion information).
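The merge list construction and decoder-side selection described above can be sketched as follows. The scan order is the one given for FIG. 8; the dictionary layout for motion information, the `max_candidates` limit, and the function names are assumptions made for this illustration.

```python
MERGE_SCAN_ORDER = ["A1", "B1", "B0", "A0", "B5", "Col"]


def build_merge_list(motion_at, max_candidates=6):
    """Collect merge candidates from the predefined reference positions.

    motion_at maps a position name to its stored motion information, or omits
    the position when no motion information exists there. A candidate may
    carry L0 motion information, L1 motion information, or both.
    """
    merge_list = []
    for position in MERGE_SCAN_ORDER:
        info = motion_at.get(position)
        if info is None:
            continue
        merge_list.append(info)
        if len(merge_list) == max_candidates:
            break
    return merge_list


def select_merge_candidate(merge_list, merge_index):
    """Decoder side: the motion information of the indexed candidate becomes
    the motion information of the current block."""
    return merge_list[merge_index]
```

Because the encoder and decoder build the list identically, only the merge index needs to be signaled.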

Motion information of the current block can also be derived using the restored sample area around the current block. Here, the restored sample area used to derive motion information of the current block may be called a template.

Figure 9 is a diagram for explaining a template-based motion estimation method.

In Figure 5, the prediction block of the current block is determined based on the cost between the current block and a reference block within the search range. According to this embodiment, unlike FIG. 5, motion estimation for the current block can be performed based on the cost between a template neighboring the current block (hereinafter referred to as the current template) and a reference template having the same size and shape as the current template.

As an example, the cost may be calculated based on the sum of absolute differences between the restored samples in the current template and the restored samples in the reference template. The smaller the sum of absolute differences, the lower the cost.

Once the reference template with the optimal cost relative to the current template is determined within the search range, the reference block neighboring that reference template can be set as the prediction block of the current block.

Additionally, motion information of the current block can be set based on the distance between the current block and the reference block, the index of the picture to which the reference block belongs, and whether the reference picture is included in the L0 or L1 reference picture list.

Since the template is defined as the previously restored area around the current block, the decoder itself can perform motion estimation in the same manner as the encoder. Accordingly, when motion information is derived using a template, there is no need to encode and signal motion information other than information indicating whether the template is used.
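Template-based motion estimation can be sketched in the same style as a block-based search, except that the cost is measured on the restored samples around the block rather than on the block itself. The sketch below assumes a template made of `thickness` rows above and `thickness` columns to the left of the block, an exhaustive integer-sample search, and hypothetical function names.

```python
def template_samples(picture, x, y, width, height, thickness=1):
    """Collect restored samples above and to the left of the block at (x, y)."""
    top = [picture[y - t][x + i]
           for t in range(1, thickness + 1) for i in range(width)]
    left = [picture[y + j][x - t]
            for t in range(1, thickness + 1) for j in range(height)]
    return top + left


def template_search(cur_pic, ref_pic, x0, y0, width, height, search, thickness=1):
    """Return (cost, (dx, dy)) of the reference template closest to the current template."""
    cur_tpl = template_samples(cur_pic, x0, y0, width, height, thickness)
    pic_h, pic_w = len(ref_pic), len(ref_pic[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = x0 + dx, y0 + dy
            # Skip candidates whose block or template leaves the picture.
            if (x - thickness < 0 or y - thickness < 0
                    or x + width > pic_w or y + height > pic_h):
                continue
            ref_tpl = template_samples(ref_pic, x, y, width, height, thickness)
            cost = sum(abs(a - b) for a, b in zip(cur_tpl, ref_tpl))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

Because only restored samples are read, the same function can run in the decoder without any signaled motion vector.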

The current template may include at least one of an area adjacent to the top of the current block or an area adjacent to the left. At this time, the area adjacent to the top may include at least one row, and the area adjacent to the left may include at least one column.

Figure 10 shows examples of template configurations.

A current template may be constructed following one of the examples shown in Figure 10.

Alternatively, unlike the example shown in FIG. 10, the template may be configured only from the area adjacent to the left side of the current block, or may be configured only from the area adjacent to the top of the current block.

The size and/or shape of the current template may be predefined in the encoder and decoder.

Alternatively, after pre-defining a plurality of template candidates with different sizes and/or shapes, index information specifying one of the plurality of template candidates can be encoded and signaled to the decoder.

Alternatively, one of a plurality of template candidates may be adaptively selected based on at least one of the size, shape, or location of the current block. For example, if the current block touches the upper border of the CTU, the current template can be constructed only from the area adjacent to the left side of the current block.

Template-based motion estimation can be performed for each reference picture stored in the reference picture list. Alternatively, motion estimation may be performed on only some of the reference pictures. As an example, motion estimation may be performed only on the reference picture with a reference picture index of 0, only on reference pictures whose reference picture index is smaller than a threshold, or only on reference pictures whose POC difference from the current picture is smaller than a threshold.

Alternatively, the reference picture index can be explicitly encoded and signaled, and then motion estimation can be performed only on the reference picture indicated by the reference picture index.

Alternatively, motion estimation can be performed targeting the reference picture of a neighboring block corresponding to the current template. For example, if the template consists of a left neighboring area and a top neighboring area, at least one reference picture can be selected using at least one of the reference picture index of the left neighboring block or the reference picture index of the top neighboring block. Afterwards, motion estimation can be performed on at least one selected reference picture.

Information indicating whether template-based motion estimation has been applied may be encoded and signaled to the decoder. The information may be a 1-bit flag. For example, if the flag is true (1), it indicates that template-based motion estimation is applied to the L0 direction and L1 direction of the current block. On the other hand, if the flag is false (0), it indicates that template-based motion estimation is not applied. In this case, motion information of the current block may be derived based on the motion information merging mode or motion vector prediction mode.

Contrary to the above, template-based motion estimation can be applied only when it is determined that the motion information merge mode and motion vector prediction mode are not applied to the current block. For example, when the first flag indicating whether the motion information merge mode is applied and the second flag indicating whether the motion vector prediction mode is applied are both 0, motion estimation based on the template may be performed.

For each of the L0 direction and the L1 direction, information indicating whether template-based motion estimation has been applied may be signaled. That is, whether template-based motion estimation is applied to the L0 direction and whether template-based motion estimation is applied to the L1 direction can be determined independently of each other. Accordingly, template-based motion estimation may be applied to one of the L0 direction and the L1 direction, while another mode (eg, motion information merge mode or motion vector prediction mode) may be applied to the other direction.

When template-based motion estimation is applied to both the L0 direction and the L1 direction, a prediction block of the current block may be generated based on a weighted sum operation of the L0 prediction block and the L1 prediction block. Alternatively, even when template-based motion estimation is applied to one of the L0 direction and the L1 direction while another mode is applied to the other, the prediction block of the current block can be generated based on a weighted sum operation of the L0 prediction block and the L1 prediction block.

Alternatively, a motion estimation method based on a template may be inserted as a motion information merging candidate in a motion information merging mode or a motion vector prediction candidate in a motion vector prediction mode. In this case, whether to apply the template-based motion estimation method may be determined based on whether the selected motion information merge candidate or the selected motion vector prediction candidate indicates the template-based motion estimation method.

Motion information of the current block can also be generated based on a two-way matching method (also referred to as bilateral matching).

The two-way matching method can be performed only when the temporal order of the current picture (i.e., POC) exists between the temporal order of the L0 reference picture and the temporal order of the L1 reference picture.

When the two-way matching method is applied, the search range can be set for each of the L0 reference picture and L1 reference picture. At this time, the L0 reference picture index for identifying the L0 reference picture and the L1 reference picture index for identifying the L1 reference picture may be encoded and signaled, respectively.

As another example, only the L0 reference picture index can be encoded and signaled, and the L1 reference picture can be selected based on the distance between the current picture and the L0 reference picture (hereinafter referred to as L0 POC difference). As an example, among the L1 reference pictures included in the L1 reference picture list, an L1 reference picture can be selected whose absolute value of the distance to the current picture (hereinafter referred to as L1 POC difference) is the same as the absolute value of the distance between the current picture and the L0 reference picture. If there is no L1 reference picture with the same L1 POC difference as the L0 POC difference, the L1 reference picture whose L1 POC difference is most similar to the L0 POC difference among the L1 reference pictures can be selected.

At this time, among the L1 reference pictures, only the L1 reference picture that has a different temporal direction from the L0 reference picture can be used for bilateral matching. For example, if the POC of the L0 reference picture is smaller than that of the current picture, one of the L1 reference pictures whose POC is larger than the current picture can be selected.

Contrary to the above, only the L1 reference picture index may be encoded and signaled, and the L0 reference picture may be selected based on the distance between the current picture and the L1 reference picture.
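As a non-limiting illustration, the selection of an L1 reference picture from a signaled L0 reference picture may be sketched as follows. The function name `select_l1_reference`, the representation of the L1 reference picture list as a list of POC values, and the tie-breaking rule (lower index first) are assumptions made for this sketch and are not defined in the present disclosure.

```python
def select_l1_reference(cur_poc, l0_poc, l1_pocs):
    """Return the index in l1_pocs of the L1 reference picture to use, or None.

    l1_pocs is a hypothetical stand-in for the L1 reference picture list,
    holding only the POC of each picture.
    """
    l0_diff = abs(cur_poc - l0_poc)
    l0_is_past = l0_poc < cur_poc

    # Keep only L1 pictures lying on the opposite temporal side of the
    # current picture compared with the L0 reference picture.
    candidates = [
        (idx, poc)
        for idx, poc in enumerate(l1_pocs)
        if poc != cur_poc and (poc > cur_poc) == l0_is_past
    ]
    if not candidates:
        return None

    # An exact match of the POC differences has mismatch 0; otherwise the
    # most similar POC difference wins. Ties go to the lower index (assumption).
    best_idx, _ = min(
        candidates,
        key=lambda c: (abs(abs(c[1] - cur_poc) - l0_diff), c[0]),
    )
    return best_idx
```

The same routine may be used for the opposite case, in which the L1 reference picture index is signaled, by swapping the roles of the two lists.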

Alternatively, a two-way matching method may be performed using an L0 reference picture among L0 reference pictures that is closest in distance to the current picture, and an L1 reference picture among L1 reference pictures that is closest in distance to the current picture.

Alternatively, a two-way matching method can also be performed using an L0 reference picture assigned a predefined index in the L0 reference picture list (e.g., index 0) and an L1 reference picture assigned a predefined index in the L1 reference picture list (e.g., index 0).

Alternatively, the LX (X is 0 or 1) reference picture may be selected based on an explicitly signaled reference picture index, and the L|X-1| reference picture may be selected as the reference picture closest in distance to the current picture, or as a reference picture with a predefined index in the L|X-1| reference picture list.

As another example, the L0 and/or L1 reference picture may be selected based on the motion information of the neighboring block of the current block. As an example, the L0 and/or L1 reference picture to be used for two-way matching can be selected using the reference picture index of the left or top neighboring block of the current block.

The search range can be set to within a predetermined range from the collocated block in the reference picture.

As another example, the search range can be set based on initial motion information. Initial motion information may be derived from a neighboring block of the current block. For example, motion information of the left neighboring block or the top neighboring block of the current block may be set as the initial motion information of the current block.

When the two-way matching method is applied, the L0 motion vector and the L1 motion vector are set in opposite directions. That is, the L0 motion vector and the L1 motion vector have opposite signs. In addition, the magnitude of the LX motion vector may be proportional to the distance (i.e., POC difference) between the current picture and the LX reference picture.

Afterwards, motion estimation can be performed using the cost between the reference block within the search range of the L0 reference picture (hereinafter referred to as L0 reference block) and the reference block within the search range of the L1 reference picture (hereinafter referred to as L1 reference block).

When an L0 reference block at a displacement of (x, y) from the current block is selected, an L1 reference block located at a displacement of (-Dx, -Dy) from the current block can be selected. Here, D can be determined by the ratio of the distance between the L1 reference picture and the current picture to the distance between the current picture and the L0 reference picture.

For example, in the example shown in Figure 11, the absolute value of the distance between the current picture (T) and the L0 reference picture (T-1) and the absolute value of the distance between the current picture (T) and the L1 reference picture (T+1) are identical. Accordingly, in the illustrated example, the L0 motion vector (x0, y0) and the L1 motion vector (x1, y1) have the same magnitude but opposite directions. If an L1 reference picture with a POC of (T+2) were used, the L1 motion vector (x1, y1) would be set to (-2*x0, -2*y0).
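As a non-limiting illustration, the relationship between the L0 motion vector and the L1 motion vector may be sketched as follows. The function name `derive_l1_mv` is hypothetical, and exact rational arithmetic is used only for clarity; an actual codec would typically use fixed-point scaling with a defined rounding rule.

```python
from fractions import Fraction


def derive_l1_mv(mv0, cur_poc, l0_poc, l1_poc):
    """Scale the L0 motion vector by the ratio of signed POC differences.

    When the two reference pictures lie on opposite sides of the current
    picture, the ratio is negative, so the L1 vector points the other way.
    """
    d0 = cur_poc - l0_poc  # signed distance to the L0 reference picture
    d1 = cur_poc - l1_poc  # signed distance to the L1 reference picture
    if d0 == 0:
        raise ValueError("L0 reference picture must differ from the current picture")
    scale = Fraction(d1, d0)
    return (mv0[0] * scale, mv0[1] * scale)
```

With the current picture at POC T = 10, an L0 reference picture at T-1 and an L1 reference picture at T+1 give a scale of -1, while an L1 reference picture at T+2 gives a scale of -2, matching the example of Figure 11.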

Once the L0 reference block and L1 reference block with optimal cost are selected, the L0 reference block and L1 reference block can be set as the L0 prediction block and L1 prediction block of the current block, respectively. Afterwards, the final prediction block of the current block can be generated through a weighted sum operation of the L0 reference block and the L1 reference block.
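As a non-limiting illustration, the search for the pair of reference blocks with the lowest cost may be sketched as follows. This sketch assumes that the two reference pictures are at equal distance from the current picture (so that D is 1), that pictures are given as two-dimensional lists of samples, that SAD is used as the cost, and that an exhaustive integer-sample search is used. The helper names are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(picture, x, y, w, h):
    """Extract the w x h block whose top-left sample is at (x, y)."""
    return [row[x:x + w] for row in picture[y:y + h]]


def inside(picture, x, y, w, h):
    """True if the w x h block at (x, y) lies fully inside the picture."""
    return 0 <= x and x + w <= len(picture[0]) and 0 <= y and y + h <= len(picture)


def bilateral_search(ref0, ref1, x, y, w, h, search_range):
    """Return (best_cost, (dx, dy)) for the L0 motion vector, or None.

    For every candidate L0 vector (dx, dy), the L1 block is taken at the
    mirrored displacement (-dx, -dy) and the two blocks are compared.
    """
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (inside(ref0, x + dx, y + dy, w, h)
                    and inside(ref1, x - dx, y - dy, w, h)):
                continue
            cost = sad(get_block(ref0, x + dx, y + dy, w, h),
                       get_block(ref1, x - dx, y - dy, w, h))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

Because the cost depends only on already reconstructed reference pictures, the same search can be repeated at the decoder without the motion vector being signaled.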

When the bilateral matching method is applied, the decoder can perform motion estimation in the same way as the encoder. Accordingly, information indicating whether the two-way motion matching method is applied is explicitly encoded/decoded, while encoding/decoding of motion information such as motion vectors can be omitted. As described above, at least one of the L0 reference picture index or the L1 reference picture index may be explicitly encoded/decoded.

As another example, information indicating whether the two-way matching method has been applied may be explicitly encoded/decoded, but if the two-way matching method has been applied, the L0 motion vector or the L1 motion vector may be explicitly encoded and signaled. If the L0 motion vector is signaled, the L1 motion vector can be derived based on the POC difference between the current picture and the L0 reference picture and the POC difference between the current picture and the L1 reference picture. If the L1 motion vector is signaled, the L0 motion vector can be derived based on the POC difference between the current picture and the L0 reference picture and the POC difference between the current picture and the L1 reference picture. At this time, the encoder can explicitly encode the smaller one of the L0 motion vector and the L1 motion vector.

Information indicating whether the two-way matching method has been applied may be a 1-bit flag. As an example, if the flag is true (e.g., 1), it may indicate that the two-way matching method is applied to the current block. If the flag is false (e.g., 0), it may indicate that the two-way matching method is not applied to the current block. In this case, motion information merge mode or motion vector prediction mode may be applied to the current block.

Contrary to the above, the two-way matching method can be applied only when it is determined that the motion information merge mode and motion vector prediction mode are not applied to the current block. For example, when the first flag indicating whether the motion information merge mode is applied and the second flag indicating whether the motion vector prediction mode is applied are both 0, the two-way matching method may be applied.

Alternatively, the two-way matching method may be inserted as a motion information merge candidate in the motion information merge mode or a motion vector prediction candidate in the motion vector prediction mode. In this case, whether to apply the two-way matching method may be determined based on whether the selected motion information merge candidate or the selected motion vector prediction candidate indicates the two-way matching method.

For the two-way matching method, it was described that the temporal order of the current picture must lie between the temporal order of the L0 reference picture and the temporal order of the L1 reference picture. It is also possible to generate a prediction block of the current block by applying a one-way matching method to which the above constraint of the two-way matching method does not apply. Specifically, in the one-way matching method, two reference pictures whose temporal order (i.e., POC) is smaller than that of the current picture, or two reference pictures whose temporal order is larger than that of the current picture, can be used. At this time, both reference pictures may be derived from the L0 reference picture list or the L1 reference picture list. Alternatively, one of the two reference pictures may be derived from the L0 reference picture list, and the other may be derived from the L1 reference picture list.

When a prediction block for the current block is obtained, a residual block can be generated by subtracting the prediction block from the original block. At this time, prediction accuracy can be evaluated according to the magnitude of the residual signal.

For example, if the absolute value of the residual signal in a specific area of the current block is large, it means that the prediction accuracy in that area is low. On the other hand, if the absolute value of the residual signal in a specific area of the current block is small, it means that the prediction accuracy in that area is high.

In this disclosure, we propose a method to improve the accuracy of the prediction signal by considering the correlation between the size of the residual signal and the prediction accuracy. Specifically, the present disclosure proposes a method of improving the accuracy of a prediction signal based on the prediction accuracy after obtaining the prediction block.

In the present disclosure, the residual signal may represent at least one residual sample or residual block, and the prediction signal may represent at least one prediction sample or prediction block.

Information indicating whether the prediction signal improvement method according to the present disclosure is applied may be encoded and signaled. The information may be a 1-bit flag.

Alternatively, whether to apply the prediction signal improvement method may be determined based on at least one of the size, shape, prediction mode, inter prediction mode, or prediction direction of the current block. Here, the size of the current block may represent one of the width, the height, or a value derived based on the product of the width and height of the current block. Additionally, the prediction mode may represent intra prediction or inter prediction, and the inter prediction mode may represent AMVP mode or motion information merge mode. The prediction direction may represent unidirectional prediction (e.g., L0-directional prediction or L1-directional prediction) or bidirectional prediction.

For example, the prediction signal improvement method can be applied only when the size of the current block is larger than the threshold. Alternatively, the prediction signal improvement method may be applied only when inter prediction is applied to the current block. Alternatively, the prediction signal improvement method may be applied only when the motion vector merge mode is applied to the current block.

Hereinafter, the prediction signal improvement method according to the present disclosure will be described in detail.

Referring to FIG. 12, a reference block within a reference picture can be specified (S1210). As an example, to specify a reference block, a reference template within a reference picture can be specified based on the template of the current block (hereinafter referred to as the current template).

Here, the reference template can be searched in a reference picture at a predefined position in the L0 reference picture list or the L1 reference picture list. Here, the reference picture at the predefined position may be a reference picture with an index of 0.

Alternatively, when inter prediction is applied to the current block, a reference template can be searched from the reference picture indicated by the motion information of the current block.

Alternatively, information identifying a reference picture for searching a reference template may be explicitly encoded and signaled.

Alternatively, the reference template can be searched from the reference picture that has the smallest distance (i.e., POC difference) from the current picture among the reference pictures.

First, the previously restored area adjacent to the current block can be set as the current template. For example, as in the example shown in FIG. 13, the current template may be configured to include a restoration area adjacent to the top of the current block and a restoration area adjacent to the left side of the current block.

Once the current template is set, the reference template most similar to the current template within the reference picture can be searched. Specifically, the area in the reference picture with the lowest cost compared to the current template can be set as the reference template. Here, the cost may be SAD (Sum of Absolute Differences). Once a reference template is set, the reference block surrounded by the reference template can be selected.

The reference template may be an area of the same size/shape as the current template. A reference block surrounded by a reference template can also have the same shape/size as the current block. For example, if the size of the current block is 4x4, the size of the reference block may also be 4x4.
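As a non-limiting illustration, the search for a reference template and the selection of the reference block it surrounds may be sketched as follows. The sketch assumes pictures given as two-dimensional lists of reconstructed samples, a template of thickness `t` samples above and to the left of the block, SAD as the cost, and an exhaustive integer-sample search around the collocated position. The function names are hypothetical.

```python
def template_samples(picture, x, y, w, h, t=1):
    """Samples of the L-shaped template: t rows above and t columns left of the block."""
    top = [picture[yy][xx] for yy in range(y - t, y) for xx in range(x, x + w)]
    left = [picture[yy][xx] for yy in range(y, y + h) for xx in range(x - t, x)]
    return top + left


def find_reference_block(cur_pic, ref_pic, x, y, w, h, search_range, t=1):
    """Return ((rx, ry), reference_block) for the lowest-cost reference template."""
    cur_tpl = template_samples(cur_pic, x, y, w, h, t)
    height, width = len(ref_pic), len(ref_pic[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates whose template or block leaves the picture.
            if rx - t < 0 or ry - t < 0 or rx + w > width or ry + h > height:
                continue
            ref_tpl = template_samples(ref_pic, rx, ry, w, h, t)
            cost = sum(abs(a - b) for a, b in zip(cur_tpl, ref_tpl))
            if best is None or cost < best[0]:
                best = (cost, rx, ry)
    if best is None:
        return None
    _, rx, ry = best
    # The reference block has the same size and shape as the current block.
    return (rx, ry), [row[rx:rx + w] for row in ref_pic[ry:ry + h]]
```

Only the template samples enter the cost, so the samples of the current block itself, which are not yet reconstructed at the decoder, are never needed.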

Once the reference block is determined, a threshold value can be derived based on the residual signal within the reference block. The threshold may be derived based on the absolute value of each residual sample in the reference block. Specifically, the threshold value may be derived based on the average, median, minimum, or maximum value of the absolute values of the residual samples.

For example, the average of the absolute values of the residual samples can be set as the threshold, or a value derived by adding or subtracting an offset to the average can be set as the threshold.

Here, the offset may be predefined in the encoder and decoder.

Alternatively, the offset can be adaptively determined according to the size or shape of the current block.

Alternatively, information representing the offset or threshold may be explicitly signaled through the bitstream.
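As a non-limiting illustration, the derivation of the threshold from the residual samples of the reference block may be sketched as follows. The function name `derive_threshold` and the `mode` strings are hypothetical; the offset is assumed to have been obtained by one of the methods described above.

```python
from statistics import mean, median


def derive_threshold(residual_block, mode="mean", offset=0):
    """Derive a threshold from the absolute values of the residual samples.

    mode selects the statistic ("mean", "median", "min" or "max"), and
    offset is added to it (a negative offset subtracts).
    """
    abs_values = [abs(r) for row in residual_block for r in row]
    statistic = {"mean": mean, "median": median, "min": min, "max": max}[mode]
    return statistic(abs_values) + offset
```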

Once the threshold is determined, the prediction accuracy at each location within the current block can be determined based on the threshold (S1220). Here, determining the prediction accuracy may mean classifying the classification target position within the current block into one of a plurality of groups.

Here, by comparing the value of the residual sample in the reference block with the threshold, the classification target position in the current block can be classified into the first group or the second group. Here, the residual sample in the reference block refers to the value used to restore the reference sample, and may be located at the same position as the classification target position. For example, if the upper left coordinates of each of the current block and the reference block are (0, 0), classification of the (x, y) position in the current block can be performed by comparing the residual sample at the (x, y) position in the reference block with the threshold.

When the absolute value of the residual sample is less than the threshold, the position corresponding to the residual sample can be classified into the first group, and when the absolute value of the residual sample is greater than or equal to the threshold, the position corresponding to the residual sample can be classified into the second group. That is, depending on whether the absolute value of the residual sample is greater than or equal to the threshold, the position to be classified may be classified into one of two groups.

If the absolute value of the residual sample is smaller than the threshold, that is, the residual sample belongs to the first group, it means that the prediction accuracy at that location is relatively high. On the other hand, if the absolute value of the residual sample is greater than the threshold, that is, if the residual sample belongs to the second group, it means that the prediction accuracy at that location is relatively low.
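As a non-limiting illustration, the classification of each position into the first group or the second group may be sketched as follows. The group labels 1 and 2 and the function name are chosen for this sketch only. A residual sample whose absolute value equals the threshold is placed in the second group here.

```python
FIRST_GROUP = 1   # relatively high prediction accuracy
SECOND_GROUP = 2  # relatively low prediction accuracy


def classify_positions(residual_block, threshold):
    """Classify every (x, y) position using the co-located residual sample."""
    return [
        [FIRST_GROUP if abs(r) < threshold else SECOND_GROUP for r in row]
        for row in residual_block
    ]
```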

As another example, the threshold may be set using the normal distribution of residual samples.

In Figure 15, m and σ represent the average value and standard deviation of the residual samples, respectively. The threshold may be set as m±kσ. Here, the value of k can be an integer greater than or equal to 1.

For example, when a residual sample exists between m-σ and m+σ, the position corresponding to the residual sample may be classified into the first group. Otherwise, the position corresponding to the corresponding residual sample may be classified into the second group.

At this time, the fact that the residual sample falls within the standard deviation range, that is, the residual sample belongs to the first group means that the prediction accuracy at that location is relatively high. On the other hand, if the residual sample does not fall within the standard deviation range, that is, if the residual sample belongs to the second group, it means that the prediction accuracy at that location is relatively low.
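As a non-limiting illustration, the classification based on the range from m-kσ to m+kσ may be sketched as follows. The sketch assumes that the signed residual samples of the reference block are used and that σ is the population standard deviation; neither choice is fixed by the present disclosure, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def classify_by_sigma(residual_block, k=1):
    """Group 1 if the residual lies within [m - k*sigma, m + k*sigma], else group 2."""
    samples = [r for row in residual_block for r in row]
    m = mean(samples)
    sigma = pstdev(samples)  # population standard deviation (assumption)
    low, high = m - k * sigma, m + k * sigma
    return [[1 if low <= r <= high else 2 for r in row] for row in residual_block]
```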

That is, based on the threshold, the classification target location can be classified into a first group with high prediction accuracy or a second group with low prediction accuracy.

When classifying an arbitrary position in the current block into one of a plurality of groups, the prediction mode at the corresponding position in the reference block may be further considered. Specifically, the classification target position corresponding to the residual sample may be classified into one of a plurality of groups based on whether the residual sample represents the difference between a prediction sample obtained by intra prediction and the original block, or the difference between a prediction sample obtained by inter prediction and the original block.

As an example, when intra prediction is applied to the first position in the reference block, the first position can be classified into the group with low prediction accuracy (i.e., the second group), regardless of the result of comparing the absolute value of the residual sample at the first position with the threshold value.

On the other hand, when inter prediction is applied to the second position in the reference block, the second position can be classified into the first group or the second group based on the result of comparing the absolute value of the residual sample at the second position with the threshold value.

That is, in positions coded/decoded by inter prediction, classification is performed based on the comparison result between the residual sample and the threshold, while positions coded/decoded by intra prediction may be forced to be classified into the second group.

Alternatively, contrary to the above example, at positions coded/decoded by intra prediction, classification may be performed based on the result of comparing the residual sample with the threshold value, while positions coded/decoded by inter prediction may be forced to be classified into the second group.
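As a non-limiting illustration, classification that additionally considers the prediction mode of the reference block may be sketched as follows. The per-position boolean mask `intra_mask` and the `forced_mode` parameter are hypothetical constructs of this sketch; `forced_mode` selects which prediction mode is forced into the second group.

```python
def classify_with_prediction_mode(residual_block, intra_mask, threshold,
                                  forced_mode="intra"):
    """Classify positions, forcing one prediction mode into the second group.

    intra_mask[y][x] is True where the reference block position was coded
    by intra prediction and False where it was coded by inter prediction.
    """
    groups = []
    for residual_row, mask_row in zip(residual_block, intra_mask):
        row = []
        for r, is_intra in zip(residual_row, mask_row):
            mode = "intra" if is_intra else "inter"
            if mode == forced_mode or abs(r) >= threshold:
                row.append(2)  # low prediction accuracy
            else:
                row.append(1)  # high prediction accuracy
        groups.append(row)
    return groups
```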

Based on the classification result of each position in the current block, the prediction block of the current block can be updated (S1230).

At this time, the prediction block of the current block may be derived based on the motion information of the current block. As an example, a prediction block (hereinafter referred to as a first prediction block) of the current block may be derived based on a reference block specified based on the motion vector of the current block in the reference picture indicated by the reference picture index of the current block. At this time, the motion information of the current block may be derived based on a Motion Vector Prediction (MVP) list or a motion information merge list.

Thereafter, the first prediction block may be updated based on the reference block specified through the reference template (that is, the reference block specified through steps S1210 to S1220). Specifically, the first prediction block can be updated through a weighted sum operation of the first prediction block and the reference block. The updated first prediction block can be set as the final prediction block of the current block.

Meanwhile, the reference block specified through the reference template may be set as the second prediction block of the current block. That is, the restored samples in the reference block may be set as second prediction samples of the current block.

In this case, the final prediction block of the current block may be defined as being derived by a weighted sum operation of the first prediction block and the second prediction block. Hereinafter, it is assumed that the reference block specified based on the reference template is the second prediction block of the current block.

The weighted sum operation at a predetermined position within the current block may be performed based on a first prediction sample corresponding to the predetermined position within the first prediction block and a second prediction sample corresponding to the predetermined position within the second prediction block. As an example, the weighted sum operation at the (x, y) position within the current block can be performed using the first prediction sample at the (x, y) position within the first prediction block and the second prediction sample at the (x, y) position within the second prediction block.

At this time, the weights assigned to the first prediction sample and the second prediction sample may be determined based on the classification result at the location where the weighted sum operation within the current block is performed.

For example, when the first location is classified as a first group with relatively high prediction accuracy, a relatively high weight may be assigned to the second prediction block at the first location. On the other hand, when the second location is classified into the second group with relatively low prediction accuracy, a relatively low weight may be assigned to the second prediction block at the second location.

Specifically, the final prediction sample of the current block can be obtained using Equation 1 below.

[Equation 1]

$$P(x, y) = w_0 \cdot P_1(x, y) + w_1 \cdot P_2(x, y)$$

In Equation 1, P(x, y) represents the final prediction sample at the (x, y) position in the current block, P1(x, y) represents the prediction sample (i.e., the first prediction sample) at the (x, y) position in the first prediction block, and P2(x, y) represents the prediction sample (i.e., the second prediction sample) at the (x, y) position in the second prediction block.

w0 and w1 represent a first weight applied to the first prediction sample and a second weight applied to the second prediction sample, respectively. Each of w0 and w1 may be a real number greater than or equal to 0 and less than or equal to 1. The sum of w0 and w1 may be 1.

To avoid real numbers, it is also possible to set w0 and w1 to integers and then shift the weighted sum result to the right to derive the final prediction sample.
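As a non-limiting illustration, the integer form of the weighted sum may be sketched as follows. The weight precision (`shift` of 3, so that the weights sum to 8) and the rounding offset added before the shift are assumptions of this sketch; the present disclosure only states that integer weights and a right shift may be used.

```python
def blend_fixed_point(p1, p2, w1, shift=3):
    """Weighted sum of two samples with integer weights summing to 1 << shift.

    w1 is the integer weight of the second prediction sample; the weight of
    the first prediction sample is (1 << shift) - w1. shift must be >= 1.
    """
    total = 1 << shift
    w0 = total - w1
    rounding = 1 << (shift - 1)  # rounding offset (assumption)
    return (w0 * p1 + w1 * p2 + rounding) >> shift
```

For example, with `shift` equal to 3, an integer weight `w1` of 2 corresponds to real-valued weights of 0.75 and 0.25.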

The weight w1 assigned to the second prediction block may be set to a larger value when the (x, y) position is classified into the first group than when the (x, y) position is classified into the second group.

Alternatively, the prediction sample may be updated based on Equation 1 only at positions belonging to a group with high prediction accuracy (i.e., the first group). That is, if the (x, y) position belongs to the first group, w0 and w1 may be set to values other than 0.

On the other hand, if the (x, y) position belongs to the second group, the first prediction sample can be set as the final prediction sample. That is, if the (x, y) position belongs to the second group, the second weight w1 assigned to the second prediction sample can be set to 0.

Weights may be predefined for each group. As an example, a lookup table defining weights assigned to each group may be predefined in the encoder and decoder.
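As a non-limiting illustration, the update of the first prediction block by a per-position weighted sum with group-dependent weights may be sketched as follows. The weight values in `GROUP_WEIGHTS` are hypothetical and are chosen only so that the second group keeps the first prediction sample unchanged (w1 equal to 0).

```python
# Hypothetical lookup table: group -> (w0, w1), with w0 + w1 == 1.
GROUP_WEIGHTS = {
    1: (0.5, 0.5),  # first group: the second prediction sample contributes
    2: (1.0, 0.0),  # second group: the first prediction sample is kept as is
}


def update_prediction(first_pred, second_pred, groups, weights=GROUP_WEIGHTS):
    """Apply P = w0 * P1 + w1 * P2 at every position, with weights chosen per group."""
    final = []
    for row1, row2, row_g in zip(first_pred, second_pred, groups):
        final.append([
            weights[g][0] * p1 + weights[g][1] * p2
            for p1, p2, g in zip(row1, row2, row_g)
        ])
    return final
```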

Alternatively, the weight for the location where the weighted sum operation is performed may be adaptively determined based on the difference between the threshold value and the residual sample in the reference block. As an example, the second weight w1 for the position where the weighted sum operation is performed may be inversely proportional to the difference between the threshold value and the residual sample in the reference block.

In the above example, it was explained that each of the prediction positions within the current block is classified into one of two groups. In addition to the example described, each of the prediction positions within the current block may be classified into one of a larger number of groups. For example, 3, 4, or more groups may be defined. To this end, a plurality of threshold values may be determined, and boundaries between groups may be defined by the threshold values. For example, when the absolute value of the residual signal is less than or equal to the first threshold, the position to be classified may be classified into the first group. When the absolute value of the residual signal is greater than the first threshold and less than or equal to the second threshold, the position to be classified may be classified into the second group. If the absolute value of the residual signal is greater than the second threshold, the position to be classified may be classified into the third group.

At this time, the weight matching each of the plurality of groups may be different.
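As a non-limiting illustration, classification into more than two groups using a plurality of threshold values may be sketched as follows. The function name is hypothetical, and the thresholds are assumed to be given in ascending order.

```python
from bisect import bisect_left


def classify_multi(abs_residual, thresholds):
    """Return the 1-based group index for an absolute residual value.

    A value less than or equal to thresholds[0] falls in group 1, a value
    greater than thresholds[i - 1] and less than or equal to thresholds[i]
    falls in group i + 1, and a value above the last threshold falls in
    the last group.
    """
    return bisect_left(thresholds, abs_residual) + 1
```

With N thresholds, N + 1 groups are obtained, and a separate weight may then be looked up for each group.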

When bidirectional prediction is applied to the current block, the above-described prediction block update method can be applied to each of the L0 direction and the L1 direction. As an example, a final prediction block for the L0 direction can be obtained based on the first prediction block and the second prediction block for the L0 direction, and a final prediction block for the L1 direction can be obtained based on the first prediction block and the second prediction block for the L1 direction. Then, the final prediction block of the current block can be obtained by an average or weighted sum operation between the final prediction block for the L0 direction and the final prediction block for the L1 direction.

Alternatively, when bidirectional prediction is applied to the current block, the above-described prediction block update method can be applied to the prediction block derived by calculating the average or weighted sum of the L0 prediction block and the L1 prediction block. As an example, the prediction block derived by the average or weighted sum operation of the prediction block in the L0 direction and the prediction block in the L1 direction can be set as the first prediction block, and the reference block derived by template matching can be set as the second prediction block. Afterwards, the final prediction block of the current block can be obtained through a weighted sum operation of the first prediction block and the second prediction block.

In Figure 12, it is explained that a reference block is derived through template matching based on the current template.

Unlike the described example, a plurality of reference blocks may be selected based on a two-way matching method.

Specifically, when applying two-way matching based on one motion vector, a reference block in the L0 direction (hereinafter referred to as an L0 reference block) can be determined based on the motion vector, and a reference block in the L1 direction (hereinafter referred to as an L1 reference block) can be determined based on a vector having the opposite direction to the motion vector.

At this time, based on the difference between the L0 reference block and the L1 reference block, the position to be classified within the current block may be classified into one of a plurality of groups. As an example, the difference between the sample at the first position in the L0 reference block and the sample at the first position in the L1 reference block can be compared with a threshold value to classify the classification target position in the current block into the first group or the second group. For example, when the difference is less than or equal to the threshold, the classification target position within the current block may be classified into a group with relatively high prediction accuracy (i.e., the first group). On the other hand, if the difference is greater than the threshold, the classification target position within the current block may be classified into a group with relatively low prediction accuracy (i.e., the second group).

After setting the average or weighted sum of the L0 reference block and the L1 reference block as the second prediction block, the first prediction block can be updated through a weighted sum operation of the first prediction block and the second prediction block. At this time, the weight for calculating the weighted sum of the first prediction block and the second prediction block may be determined based on the classification result at each position in the current block.

Alternatively, when bidirectional prediction is applied to the current block, the L0 reference block can be used to update the L0 prediction block, and the L1 reference block can be used to update the L1 prediction block. That is, based on the L0 reference block, classification can be performed for each sample position in the current block, and then the L0 prediction block of the current block can be updated based on the classification result. Additionally, based on the L1 reference block, classification can be performed for each sample position in the current block, and then the L1 prediction block of the current block can be updated based on the classification result.

When deriving a reference block, information indicating whether template matching or bilateral matching is applied may be encoded and signaled.

Alternatively, the method of deriving the reference block may be adaptively determined according to the prediction direction of the current block. For example, when unidirectional prediction is applied to the current block, a reference block can be derived based on template matching. On the other hand, when bidirectional prediction is applied to the current block, two reference blocks can be derived based on bidirectional matching.

In the above example, it is exemplified that a previously derived prediction block is modified (or updated) based on a reference block. Instead of updating the previously derived prediction block, it is also possible to derive the prediction block of the current block using a plurality of reference blocks.

For example, when bidirectional prediction is applied to the current block, a reference block for the L0 direction and a reference block for the L1 direction can be determined. The L0 reference block and the L1 reference block can be derived based on the motion information of the current block. Alternatively, the L0 reference block and the L1 reference block may be derived based on template matching or based on bilateral matching. The L0 reference block and the L1 reference block may also be referred to as an L0 prediction block and an L1 prediction block, respectively.

Afterwards, based on the L0 prediction block, the prediction accuracy in the L0 direction can be determined for each classification target position in the current block. Specifically, based on the residual signal of the L0 prediction block, each of the classification target positions in the current block may be classified into a first group or a second group.

Likewise, based on the L1 prediction block, the prediction accuracy in the L1 direction can be determined for each classification target position in the current block. Specifically, based on the residual signal of the L1 prediction block, each of the classification target positions within the current block may be classified into a first group or a second group.

Afterwards, the prediction block of the current block can be derived by performing a weighted sum of the L0 prediction block and the L1 prediction block. At this time, weights assigned to the L0 prediction block and the L1 prediction block may be determined based on the prediction accuracy in the L0 direction and the prediction accuracy in the L1 direction at the prediction target location in the current block.

Specifically, when the absolute value of the L0 residual sample and the absolute value of the L1 residual sample corresponding to the first position in the current block are different, or when the difference between the absolute value of the L0 residual sample and the absolute value of the L1 residual sample is greater than or equal to a threshold, the weight assigned to the L0 prediction sample and the weight assigned to the L1 prediction sample may be different.

For example, when the absolute value of the L0 residual sample is smaller than the absolute value of the L1 residual sample, the weight assigned to the L0 prediction sample may have a larger value than the weight assigned to the L1 prediction sample.

On the other hand, if the absolute value of the L0 residual sample is greater than the absolute value of the L1 residual sample, the weight assigned to the L0 prediction sample may have a smaller value than the weight assigned to the L1 prediction sample.

If the absolute value of the L0 residual sample and the absolute value of the L1 residual sample are the same, or the difference between the absolute value of the L0 residual sample and the absolute value of the L1 residual sample is less than the threshold, the weight assigned to the L0 prediction sample and the weight assigned to the L1 prediction sample can be set to the same value.
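The weight selection rules above can be illustrated with the following sketch. The function name and the concrete weight value `strong=0.75` are hypothetical; the disclosure only specifies which of the two weights is larger, not its magnitude.

```python
def bi_pred_weights(res_l0, res_l1, threshold=0, strong=0.75):
    """Return (w0, w1) for the L0 and L1 prediction samples at one position.

    Equal weights are used when the absolute residuals are the same or differ
    by less than the threshold. Otherwise the direction with the smaller
    absolute residual receives the larger (hypothetical) weight `strong`.
    """
    a0, a1 = abs(res_l0), abs(res_l1)
    if a0 == a1 or abs(a0 - a1) < threshold:
        return 0.5, 0.5
    if a0 < a1:
        return strong, 1 - strong
    return 1 - strong, strong


# Hypothetical prediction and residual samples at one position.
p0, p1 = 100, 108
w0, w1 = bi_pred_weights(res_l0=2, res_l1=-10, threshold=4)
blended = w0 * p0 + w1 * p1
print(w0, w1, blended)
```

Here the L0 residual is smaller in magnitude, so the L0 prediction sample dominates the weighted sum.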

In the above example, it was explained that the prediction accuracy is determined on a sample basis within the current block. Unlike the described example, prediction accuracy may be determined in units of multiple samples or subblocks within the current block.

In the example shown in FIG. 16, prediction accuracy is determined in units of subblocks of 2x2 size. Sub-blocks may be set to a size or shape different from that shown.

At this time, the prediction accuracy for the sub-block may be determined based on the result of comparing the average, minimum, maximum, or median value of the residual signals included in the sub-block with the threshold value.

Alternatively, the prediction accuracy for the sub-block can be determined by comparing the threshold value with a residual sample at a specific position within the sub-block. Here, the specific location may be an upper left location, an upper right location, a lower left location, a lower right location, or a central location.

When prediction accuracy is determined on a sub-block basis, the weight for the weighted sum calculation may be determined on a sub-block basis. As a result, the weight applied to each prediction sample included in the sub-block may have the same value.
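A sub-block based classification along these lines might look like the sketch below. The function name, the use of absolute residual values, and the "less than or equal to" comparison are assumptions for the example; the statistic (average, minimum, maximum, or median) is selectable as described above.

```python
from statistics import mean, median


def classify_subblocks(residuals, threshold, size=2, stat="mean"):
    """Assign one group label per sub-block and copy it to every sample in it.

    Assumption: the statistic is computed over absolute residual values and
    group 1 (higher accuracy) is chosen when it is <= threshold.
    """
    reducers = {"mean": mean, "min": min, "max": max, "median": median}
    reduce_fn = reducers[stat]
    height, width = len(residuals), len(residuals[0])
    groups = [[0] * width for _ in range(height)]
    for y0 in range(0, height, size):
        for x0 in range(0, width, size):
            ys = range(y0, min(y0 + size, height))
            xs = range(x0, min(x0 + size, width))
            values = [abs(residuals[y][x]) for y in ys for x in xs]
            group = 1 if reduce_fn(values) <= threshold else 2
            for y in ys:
                for x in xs:
                    groups[y][x] = group
    return groups


# Hypothetical 2x4 residual block split into two 2x2 sub-blocks.
residuals = [[1, -2, 9, 8],
             [0, 3, -7, 10]]
sub_groups = classify_subblocks(residuals, threshold=4)
print(sub_groups)
```

Because the label is shared by all samples of a sub-block, any weight looked up from it is identical for every prediction sample in that sub-block.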

A subblock may have a smaller size than the current block or reference block. At least one of the number of sub-blocks, the size or shape of the sub-block may be predefined in the encoder and decoder. Alternatively, depending on the size or shape of the current block, at least one of the number of sub-blocks and the size or shape of the sub-block may be adaptively determined.

In the above example, it was explained that the weight for the weighted sum calculation is determined based on the prediction accuracy at each position in the current block. Unlike the example described, the weight for the weighted sum operation may be determined by comparing the current template and the reference template. Specifically, the weight for the current block can be determined by comparing the cost between the current template and the L0 reference template and the cost between the current template and the L1 reference template.

Based on the L0 motion information of the current block, the L0 reference block can be determined from the L0 reference picture. Likewise, an L1 reference block can be determined from an L1 reference picture based on the L1 motion information of the current block.

Alternatively, the L0 reference block and the L1 reference block may be determined based on template matching or bilateral matching.

Meanwhile, the L0 reference block and the L1 reference block may be set as the L0 prediction block and the L1 prediction block of the current block, respectively.

The prediction block of the current block can be obtained by a weighted sum operation of the L0 prediction block and the L1 prediction block. For example, the prediction block of the current block can be derived using Equation 1, and in this case, P1 in Equation 1 may represent an L0 prediction sample, and P2 may represent an L1 prediction sample.

Meanwhile, the weight applied to the L0 prediction block and the weight applied to the L1 prediction block may be determined by comparing the cost between the current template and the L0 reference template and the cost between the current template and the L1 reference template. Here, the cost may represent SAD (Sum Of Absolute Difference).

As an example, a cost SAD0 between the current template and the L0 reference template, and a cost SAD1 between the current template and the L1 reference template can be calculated. Thereafter, by comparing SAD0 and SAD1, the weight assigned to the L0 prediction sample and the weight assigned to the L1 prediction sample can be determined.

For example, when SAD0 is smaller than SAD1, the weight w0 assigned to the L0 prediction sample may have a value greater than the weight w1 assigned to the L1 prediction sample. Conversely, when SAD1 is smaller than SAD0, the weight w1 assigned to the L1 prediction sample may have a value greater than the weight w0 assigned to the L0 prediction sample.

As an example, the weight applied to the prediction block derived from the reference block adjacent to the reference template with the lower cost among the L0 reference template and the L1 reference template may be derived as (SAD_max)/(SAD0+SAD1). Here, SAD_max represents the maximum value among SAD0 and SAD1. On the other hand, the weight applied to the prediction block derived from the reference block adjacent to the reference template with the higher cost among the L0 reference template and the L1 reference template may be derived as (SAD_min)/(SAD0+SAD1). Here, SAD_min represents the minimum value among SAD0 and SAD1.
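The template-cost based weight derivation can be sketched as follows. The templates are represented as flat lists of samples and the zero-cost fallback to equal weights is an assumption added to keep the example well defined; neither is specified by the disclosure.

```python
def sad(template_a, template_b):
    """Sum of absolute differences between two equally sized templates."""
    return sum(abs(a - b) for a, b in zip(template_a, template_b))


def template_weights(cur_template, l0_template, l1_template):
    """Return (w0, w1) for the L0 and L1 prediction blocks.

    The direction whose reference template has the lower cost receives
    SAD_max / (SAD0 + SAD1); the other receives SAD_min / (SAD0 + SAD1).
    """
    sad0 = sad(cur_template, l0_template)
    sad1 = sad(cur_template, l1_template)
    total = sad0 + sad1
    if total == 0:
        # Assumption: both templates match perfectly, so use equal weights.
        return 0.5, 0.5
    # If SAD0 is the lower cost, SAD1 is SAD_max, so w0 = SAD_max / total.
    return sad1 / total, sad0 / total


# Hypothetical template samples.
cur = [10, 20, 30, 40]
l0_t = [11, 19, 30, 41]   # SAD0 = 3
l1_t = [14, 20, 35, 40]   # SAD1 = 9
tw0, tw1 = template_weights(cur, l0_t, l1_t)
print(tw0, tw1)
```

In this example the L0 reference template matches the current template more closely, so the L0 prediction block receives the weight 9/12 and the L1 prediction block receives 3/12.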

Alternatively, the weight may be determined at the block level. As an example, for the current block, the prediction accuracy for the L0 prediction block and the prediction accuracy for the L1 prediction block may be determined, and based on this, the weight applied to each of the L0 prediction block and the L1 prediction block may be determined.

The determination unit of prediction accuracy may be predefined in the encoder and decoder.

Alternatively, information representing the determination unit of prediction accuracy may be encoded and signaled. As an example, the information may be an index indicating one of a sample unit, a sub-block unit, or a block unit.

Alternatively, the determination unit of prediction accuracy may be adaptively determined based on at least one of the size, shape, inter prediction mode, or prediction direction of the current block.

Applying an embodiment described with a focus on the decoding process to the encoding process, or an embodiment described with a focus on the encoding process to the decoding process, is included in the scope of the present disclosure. Modifying embodiments described in a given order so that they are performed in a different order is also included within the scope of the present disclosure.

Although the above-described disclosure is explained based on a series of steps or a flowchart, this does not limit the chronological order of the invention, and the steps may be performed simultaneously or in a different order as needed. In addition, each of the components (e.g., units, modules, etc.) constituting the block diagram in the above-described disclosure may be implemented as a hardware device or software, and a plurality of components may be combined and implemented as a single hardware device or software. The above-described disclosure may be implemented in the form of program instructions that can be executed through various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, etc., singly or in combination. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. The hardware devices may be configured to operate as one or more software modules to perform processing according to the present disclosure, and vice versa.

Embodiments through this disclosure can be applied to electronic devices that encode or decode images.

Claims

Deriving a first prediction block for the current block;

Deriving a reference block for the current block; and

Comprising updating the first prediction block based on a second prediction block derived from the reference block,

The third prediction sample at the current prediction position in the current block is derived based on a weighted sum operation based on the first prediction sample in the first prediction block and the second prediction sample in the second prediction block,

An image decoding method, characterized in that the weights for the weighted sum calculation are determined based on the classification result at the current prediction position.
According to claim 1,

An image decoding method, wherein the classification of the current prediction position is performed using a residual sample of a position corresponding to the current prediction position in the reference block.
According to claim 2,

An image decoding method, wherein when the absolute value of the residual sample is less than a threshold, the current prediction position is classified into a first group, and when the absolute value of the residual sample is greater than the threshold, the current prediction position is classified into a second group.
According to claim 3,

An image decoding method, wherein the weight assigned to the second prediction sample is set to a larger value when the current prediction position is classified into the first group than when the current prediction position is classified into the second group.
According to claim 3,

An image decoding method, wherein the threshold is derived based on at least one of the minimum value, maximum value, or average value of residual samples in the reference block.
According to claim 1,

The reference block is determined based on a reference template in the reference picture,

The reference template is an area with the lowest cost compared to the current template in the reference picture,

An image decoding method, wherein the current template is composed of previously reconstructed samples surrounding the current block.
According to claim 6,

An image decoding method, wherein the reference picture has a predefined index in a reference picture list.
According to claim 6,

An image decoding method, wherein the reference picture has the closest distance to the current picture among reference pictures.
According to claim 1,

An image decoding method, wherein whether to update the first prediction block is determined based on at least one of the size of the current block, the prediction mode of the current block, the inter prediction mode of the current block, or the prediction direction of the current block.
According to claim 1,

An image decoding method, characterized in that the classification is performed on a sub-block basis.
According to claim 10,

An image decoding method, wherein classification of a current sub-block within the current block is performed using the minimum value, maximum value, or average value of residual samples included in a reference sub-block corresponding to the current sub-block within the reference block.
According to claim 10,

An image decoding method, wherein classification of a current sub-block within the current block is performed using a residual sample at a predefined position within a reference sub-block corresponding to the current sub-block within the reference block.
According to claim 12,

A video decoding method, characterized in that the predefined position is the upper left position or the center position.
Deriving a first prediction block for the current block;

Deriving a reference block for the current block; and

Comprising updating the first prediction block based on a second prediction block derived from the reference block,

The third prediction sample at the current prediction position in the current block is derived based on a weighted sum operation based on the first prediction sample in the first prediction block and the second prediction sample in the second prediction block,

An image encoding method, characterized in that the weight for the weighted sum calculation is determined based on the classification result at the current prediction position.
Deriving a first prediction block for the current block;

Deriving a reference block for the current block; and

Comprising updating the first prediction block based on a second prediction block derived from the reference block,

The third prediction sample at the current prediction position in the current block is derived based on a weighted sum operation based on the first prediction sample in the first prediction block and the second prediction sample in the second prediction block,

A computer-readable recording medium recording a bitstream generated by an image encoding method, wherein the weight for the weighted sum calculation is determined based on a classification result at the current prediction position.