WO2019066202A1

WO2019066202A1 - Image processing method and apparatus therefor

Info

Publication number: WO2019066202A1
Application number: PCT/KR2018/007094
Authority: WO
Inventors: 박내리; 남정학; 서정동; 이재호
Original assignee: 엘지전자(주)
Priority date: 2017-09-26
Filing date: 2018-06-22
Publication date: 2019-04-04

Abstract

Disclosed is an inter-prediction-based image processing method. Specifically, the method may comprise the steps of: determining whether bi-prediction-based filtering is applied to a first prediction block and a second prediction block of a current block; when the bi-prediction-based filtering is determined to be applied, applying the bi-prediction-based filtering to the first prediction block and the second prediction block; and generating a final prediction block for the current block using the filtered first prediction block and the filtered second prediction block.

Description

Image processing method and apparatus therefor

The present invention relates to a still image or moving picture processing method, and more particularly, to a method of encoding/decoding a moving picture based on an inter prediction mode and a device supporting the same.

Compression coding refers to a series of signal processing techniques for transmitting digitized information over a communication line or storing it in a form suitable for a storage medium. Media such as video, images, and audio can be subject to compression coding. In particular, a technique for performing compression coding on images is referred to as video image compression.

Next-generation video content will feature high spatial resolution, high frame rate, and high dimensionality of scene representation. Processing such content will result in a tremendous increase in terms of memory storage, memory access rate, and processing power.

Therefore, there is a need to design a coding tool for processing next generation video contents more efficiently.

In inter prediction of a video codec, prediction block filtering (PBF) derives Wiener filter coefficients between the original block and the prediction block and applies them to the prediction block, which increases the accuracy of the prediction block and reduces the residual signal. However, because the filter coefficients must be transmitted on a block-by-block basis, this method increases the amount of side information and is therefore not suitable for recent video codecs. Methods of deriving the filter coefficients so as to reduce the amount of side information have therefore been proposed.

In the adaptive prediction block filtering (APBF) scheme, Wiener filter coefficients are derived between the reconstructed block of a neighboring block and the prediction block of that neighboring block, in place of the original block and the prediction block of the current block, and are applied in the encoding/decoding process. However, the APBF scheme is limited in that it uses filter coefficients derived from a neighboring block rather than from the current block.

Therefore, the present specification proposes a method of deriving filter coefficients using information of the current block and of determining, on a block or sample basis, whether to apply the filter.

The technical objects to be achieved by the present invention are not limited to those mentioned above, and other technical objects not mentioned will be clearly understood from the following description by those skilled in the art to which the present invention belongs.

According to an aspect of the present invention, there is provided an inter-prediction-based image processing method comprising: determining whether to apply bi-prediction-based filtering to a first prediction block and a second prediction block of a current block; applying the bi-prediction-based filtering to the first prediction block and the second prediction block if it is determined to apply the bi-prediction-based filtering; and generating a final prediction block of the current block using the filtered first prediction block and the filtered second prediction block, wherein the first prediction block is generated by performing inter prediction based on a list 0 reference picture, and the second prediction block is generated by performing inter prediction based on a list 1 reference picture.

The applying of the bi-prediction-based filtering to the first and second prediction blocks may include: generating an average block using the first and second prediction blocks; deriving first Wiener filter coefficients that minimize a difference between the first prediction block and the average block; deriving second Wiener filter coefficients that minimize a difference between the second prediction block and the average block; filtering the first prediction block using the derived first Wiener filter coefficients; and filtering the second prediction block using the derived second Wiener filter coefficients.

The generating of the average block may include: generating a first interpolation block based on the size of the first prediction block and the number of taps of the Wiener filter; generating a second interpolation block based on the size of the second prediction block and the number of taps of the Wiener filter; and generating an average value of the first interpolation block and the second interpolation block as the average block.
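The filtering steps above can be illustrated with a minimal sketch. It is not the applicant's implementation: the 3x3 filter support, the cropping of the average block to the block size, the least-squares solution of the normal equations, and the final averaging of the two filtered blocks are all assumptions made for illustration, as are every function and variable name. Here `ext0` and `ext1` stand for the interpolation blocks, each extended by `taps - 1` samples so that every sample of the block has a full filter window.

```python
import random

def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting (a must be non-singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def windows(ext, h, w, taps):
    """Yield (y, x, window): the flattened taps x taps support of each block sample."""
    for y in range(h):
        for x in range(w):
            yield y, x, [ext[y + dy][x + dx] for dy in range(taps) for dx in range(taps)]

def wiener_coeffs(ext, target, taps):
    """Least-squares coefficients minimizing the squared difference to target."""
    h, w, n = len(target), len(target[0]), taps * taps
    r = [[0.0] * n for _ in range(n)]   # autocorrelation of the filter input
    p = [0.0] * n                       # cross-correlation with the target
    for y, x, win in windows(ext, h, w, taps):
        for i in range(n):
            p[i] += win[i] * target[y][x]
            for j in range(n):
                r[i][j] += win[i] * win[j]
    return solve_linear(r, p)

def apply_filter(ext, coeffs, h, w, taps):
    out = [[0.0] * w for _ in range(h)]
    for y, x, win in windows(ext, h, w, taps):
        out[y][x] = sum(c * v for c, v in zip(coeffs, win))
    return out

def crop(ext, h, w, taps):
    """Return the central h x w block of an extended block."""
    o = taps // 2
    return [row[o:o + w] for row in ext[o:o + h]]

def bi_prediction_filter(ext0, ext1, h, w, taps=3):
    avg_ext = [[(a + b) / 2 for a, b in zip(r0, r1)] for r0, r1 in zip(ext0, ext1)]
    avg = crop(avg_ext, h, w, taps)               # assumed: target is cropped to block size
    f0 = apply_filter(ext0, wiener_coeffs(ext0, avg, taps), h, w, taps)
    f1 = apply_filter(ext1, wiener_coeffs(ext1, avg, taps), h, w, taps)
    final = [[(a + b) / 2 for a, b in zip(r0, r1)] for r0, r1 in zip(f0, f1)]  # assumed combination
    return f0, f1, avg, final

def sse(a, b):
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Synthetic demo data: two noisy versions of the same 8x8 content.
H, W, TAPS = 8, 8, 3
rng = random.Random(0)
ext0 = [[rng.randint(0, 255) for _ in range(W + TAPS - 1)] for _ in range(H + TAPS - 1)]
ext1 = [[v + rng.randint(-8, 8) for v in row] for row in ext0]
f0, f1, avg, final = bi_prediction_filter(ext0, ext1, H, W, TAPS)
```

Because the identity filter is one of the candidate coefficient sets, the least-squares solution cannot move a prediction block further from the average block than it was before filtering. Since both prediction blocks are available at the decoder, the same coefficients could in principle be derived there without transmitting them.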

Preferably, the step of determining whether to apply the bi-prediction-based filtering may include: obtaining a bi-prediction-based filtering flag when the AMVP mode is applied to the current block, wherein the AMVP mode is a mode for deriving a motion vector prediction value of the current block from a neighboring block of the current block; and determining to apply the bi-prediction-based filtering to the first and second prediction blocks when the bi-prediction-based filtering flag indicates that the bi-prediction-based filtering is applied to the current block.

Preferably, the step of determining whether to apply the bi-prediction-based filtering comprises: constructing a merge candidate list based on motion information of neighboring blocks of the current block when a merge mode is applied to the current block, wherein the merge mode is a mode for deriving motion information of the current block using blocks spatially or temporally neighboring the current block; obtaining a merge index indicating a merge candidate selected from the merge candidate list; and determining whether to apply the bi-prediction-based filtering to the first and second prediction blocks based on the selected merge candidate indicated by the merge index.

Preferably, when the selected merge candidate is a merge candidate generated by combining other merge candidates, a zero motion vector, a candidate derived in units of subblocks, or a temporal merge candidate, the bi-prediction-based filtering is not applied to the first prediction block and the second prediction block.

Preferably, if the bi-prediction-based filtering is applied to the selected merge candidate, the bi-prediction-based filtering is applied to the first and second prediction blocks.

Preferably, if the current block is predicted in a bi-prediction mode, the bi-prediction-based filtering is applied to the first and second prediction blocks.

Preferably, when the list 0 reference picture and the list 1 reference picture correspond, in temporal order, to a reference picture output before the current picture and a reference picture output after the current picture, the bi-prediction-based filtering is applied to the first prediction block and the second prediction block.

Preferably, if the size of the current block is greater than a predetermined threshold value, the bi-prediction-based filtering is applied to the first prediction block and the second prediction block.
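The block-level conditions above are stated as separate preferred embodiments. Purely as an illustration, the sketch below combines three of them (bi-prediction, reference pictures on either side of the current picture, and a size threshold) into one decision function. The combination itself, the `BlockInfo` type, its field names, the use of the sample count as the block size, and the threshold value of 64 samples are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass

@dataclass
class BlockInfo:
    """Hypothetical per-block state; field names are illustrative only."""
    bi_predicted: bool  # block uses both a list 0 and a list 1 reference picture
    cur_order: int      # output order of the current picture
    ref0_order: int     # output order of the list 0 reference picture
    ref1_order: int     # output order of the list 1 reference picture
    width: int
    height: int

# Assumed value; the text only says the threshold is "predetermined".
SIZE_THRESHOLD = 64

def apply_bi_prediction_filtering(b: BlockInfo, threshold: int = SIZE_THRESHOLD) -> bool:
    """Return True when every condition modelled here is satisfied."""
    if not b.bi_predicted:
        return False
    # List 0 reference is output before, list 1 reference after, the current picture.
    refs_straddle = b.ref0_order < b.cur_order < b.ref1_order
    # Block size measured in samples and compared strictly, as in "greater than".
    large_enough = b.width * b.height > threshold
    return refs_straddle and large_enough
```

A real codec might apply only one of these conditions, or combine them differently; the function merely shows how such a decision can be made without any signalled flag.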

According to another aspect of the present invention, there is provided an inter-prediction-based image processing apparatus including: a filtering determination unit that determines whether to apply bi-prediction-based filtering to a first prediction block and a second prediction block of a current block; a filtering unit that applies the bi-prediction-based filtering to each of the first and second prediction blocks if it is determined to apply the bi-prediction-based filtering; and a prediction block generation unit that generates a final prediction block of the current block using the filtered first prediction block and the filtered second prediction block, wherein the first prediction block is generated by performing inter prediction based on a list 0 reference picture, and the second prediction block is generated by performing inter prediction based on a list 1 reference picture.

According to an embodiment of the present invention, filtering the two prediction blocks of bi-prediction so that each becomes closer to the average block of the two prediction blocks improves prediction accuracy and, by reducing the amount of information in the residual signal, improves coding efficiency.

Further, according to the embodiment of the present invention, prediction performance and compression efficiency can be further improved by determining whether to apply the filtering in units of blocks or samples.

The effects obtained in the present invention are not limited to the effects mentioned above, and other effects not mentioned can be clearly understood by those skilled in the art from the following description.

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the technical features of the invention.

FIG. 1 is a schematic block diagram of an encoder in which still image or moving picture signal encoding is performed according to an embodiment of the present invention.

FIG. 2 is a schematic block diagram of a decoder in which decoding of a still image or moving picture signal is performed according to an embodiment of the present invention.

FIG. 3 is a diagram for explaining a division structure of a coding unit applicable to the present invention.

FIG. 4 is a diagram for explaining a prediction unit that can be applied to the present invention.

FIG. 5 is a diagram illustrating the direction of inter prediction, as an embodiment to which the present invention can be applied.

FIG. 6 illustrates integer and fractional sample locations for 1/4 sample interpolation, as an embodiment to which the present invention can be applied.

FIG. 7 illustrates the location of spatial candidates, as an embodiment to which the present invention can be applied.

FIG. 8 is a diagram illustrating an inter prediction method according to an embodiment to which the present invention is applied.

FIG. 9 is a diagram illustrating a motion compensation process according to an embodiment to which the present invention can be applied.

FIG. 10 schematically illustrates a method of applying adaptive loop filtering according to an embodiment of the present invention.

FIG. 11 schematically shows a method of applying prediction block filtering and adaptive prediction block filtering according to an embodiment of the present invention.

FIG. 12 is a flowchart illustrating a motion compensation process in an inter prediction mode to which bi-prediction block filtering is applied according to an embodiment of the present invention.

FIG. 13 shows a flowchart of an inter-prediction-based image processing method according to an embodiment of the present invention.

FIG. 14 shows a block diagram of an inter prediction unit according to an embodiment of the present invention.

FIG. 15 shows a structure of a content streaming system according to an embodiment of the present invention.

Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings. The following detailed description, together with the accompanying drawings, is intended to illustrate exemplary embodiments of the invention and is not intended to represent the only embodiments in which the invention may be practiced. The following detailed description includes specific details in order to provide a thorough understanding of the present invention. However, those skilled in the art will appreciate that the present invention may be practiced without these specific details.

In some instances, well-known structures and devices may be omitted or may be shown in block diagram form, centering on the core functionality of each structure and device, to avoid obscuring the concepts of the present invention.

In addition, although the terms used in the present invention are selected, as far as possible, from general terms in wide use, in specific cases terms arbitrarily selected by the applicant are used. In such cases, the meaning is clearly stated in the detailed description of the relevant part, so these terms should not be interpreted simply by their names but understood according to their stated meaning.

The specific terminology used in the following description is provided to aid understanding of the present invention, and the use of such specific terminology may be changed into other forms without departing from the technical idea of the present invention. For example, signals, data, samples, pictures, frames, blocks, etc. may be appropriately replaced in each coding process.

In the present specification, a 'block' or 'unit' means a unit in which encoding/decoding processes such as prediction, transform, and/or quantization are performed, and may be composed of a multi-dimensional array of samples (or pixels).

A 'block' or 'unit' may refer to a multidimensional array of samples for a luma component, or a multidimensional array of samples for a chroma component. It may also be collectively referred to as a multidimensional array of samples for a luma component and a multidimensional array of samples for a chroma component.

For example, a 'block' or 'unit' may include a coding tree block (CTB) composed of a plurality of coding blocks, a coding block (CB) representing an array of samples to be encoded/decoded, a prediction block (PB) (or prediction unit (PU)) representing an array of samples to which the same prediction is applied, and a transform block (TB) (or transform unit (TU)) representing an array of samples to which the same transform is applied.

Unless otherwise stated herein, a 'block' or 'unit' may be interpreted as including a syntax structure used in encoding/decoding an array of samples for a luma component and/or a chroma component. Here, a syntax structure means zero or more syntax elements present in the bitstream in a specific order, and a syntax element means an element of data represented in the bitstream.

For example, a 'block' or 'unit' may include a coding unit (CU) including a coding block (CB) and a syntax structure used for encoding the coding block (CB), a coding tree unit (CTU) composed of a plurality of coding units, a prediction unit (PU) including a prediction block (PB) and a syntax structure used for predicting the prediction block (PB), and a transform unit (TU) including a transform block (TB) and a syntax structure used for transforming the transform block (TB).

The term 'block' or 'unit' is not necessarily limited to an array of samples (or pixels) in the form of a square or a rectangle; it may also refer to a polygonal array of samples (or pixels) having three or more vertices. In this case, it may be referred to as a polygon block or a polygon unit.

Referring to FIG. 1, the encoder 100 includes an image divider 110, a subtractor 115, a transform unit 120, a quantization unit 130, an inverse quantization unit 140, an inverse transform unit 150, a filtering unit 160, a decoded picture buffer (DPB) 170, a prediction unit 180, and an entropy encoding unit 190. The prediction unit 180 may include an inter prediction unit 181 and an intra prediction unit 182.

The image divider 110 divides an input video signal (or a picture or a frame) input to the encoder 100 into one or more blocks.

The subtractor 115 subtracts the prediction signal (or prediction block) output from the prediction unit 180 (i.e., the inter prediction unit 181 or the intra prediction unit 182) from the input video signal to generate a residual signal (or difference block). The generated difference signal (or difference block) is transmitted to the transform unit 120.

The transform unit 120 applies a transform technique (for example, DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), GBT (Graph-Based Transform), KLT (Karhunen-Loève Transform), etc.) to the difference signal (or difference block) to generate transform coefficients. At this time, the transform unit 120 may generate the transform coefficients by performing the transform using a transform technique determined according to the prediction mode applied to the difference block and the size of the difference block.

The quantization unit 130 quantizes the transform coefficients and transmits the quantized transform coefficients to the entropy encoding unit 190. The entropy encoding unit 190 entropy-codes the quantized signals and outputs them as a bitstream.

Meanwhile, the quantized signal output from the quantization unit 130 may be used to generate a prediction signal. For example, the difference signal can be reconstructed by applying inverse quantization and inverse transform to the quantized signal through the inverse quantization unit 140 and the inverse transform unit 150 in the loop. A reconstructed signal (or reconstructed block) can be generated by adding the reconstructed difference signal to the prediction signal output from the inter prediction unit 181 or the intra prediction unit 182.
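The in-loop reconstruction path described above can be sketched as follows. This is a simplified illustration only: the transform and inverse transform are omitted, quantization is modelled as uniform scalar quantization with a hypothetical step size, and all function names are assumptions rather than parts of any codec.

```python
def quantize(residual, step):
    """Map each residual sample to an integer level (uniform scalar quantizer)."""
    return [round(r / step) for r in residual]

def dequantize(levels, step):
    """Inverse quantization: scale the levels back to the residual domain."""
    return [level * step for level in levels]

def encode_and_reconstruct(original, prediction, step):
    """Return (levels, reconstruction) for one block given as a flat list of samples."""
    residual = [o - p for o, p in zip(original, prediction)]          # subtractor
    levels = quantize(residual, step)                                 # sent to entropy coding
    recon_residual = dequantize(levels, step)                         # in-loop inverse path
    reconstruction = [p + r for p, r in zip(prediction, recon_residual)]
    return levels, reconstruction

original = [100, 104, 97, 90]
prediction = [98, 98, 98, 98]
levels, reconstruction = encode_and_reconstruct(original, prediction, step=4)
```

The encoder keeps the reconstruction, not the original, as reference data, so that its predictions match those of a decoder that only ever sees the quantized levels.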

Meanwhile, in the compression process described above, adjacent blocks are quantized with different quantization parameters, so degradation may appear at block boundaries. This phenomenon is called blocking artifacts and is one of the important factors in evaluating image quality. A filtering process can be performed to reduce such degradation. Through the filtering process, blocking artifacts are removed and the error of the current picture is reduced, thereby improving image quality.

The filtering unit 160 applies filtering to the reconstructed signal and outputs it to a playback apparatus or transmits it to the decoded picture buffer 170. The filtered signal transmitted to the decoded picture buffer 170 may be used as a reference picture in the inter prediction unit 181. By using the filtered picture as a reference picture in the inter prediction mode in this way, not only picture quality but also coding efficiency can be improved.

The decoded picture buffer 170 may store the filtered picture for use as a reference picture in the inter prediction unit 181.

The inter prediction unit 181 performs temporal prediction and/or spatial prediction with reference to a reconstructed picture in order to remove temporal redundancy and/or spatial redundancy. Here, since the reference picture used for prediction is a signal that was transformed, quantized, and inverse quantized in units of blocks when it was previously encoded/decoded, blocking artifacts or ringing artifacts may exist.

Accordingly, to mitigate the performance degradation caused by such signal discontinuity or quantization, the inter prediction unit 181 can interpolate the signal between pixels in units of sub-pixels by applying a low-pass filter. Here, a sub-pixel means a virtual pixel generated by applying an interpolation filter, and an integer pixel means an actual pixel existing in the reconstructed picture. As the interpolation method, linear interpolation, bi-linear interpolation, a Wiener filter, and the like can be applied.

The interpolation filter may be applied to a reconstructed picture to improve the accuracy of the prediction. For example, the inter prediction unit 181 may apply an interpolation filter to an integer pixel to generate an interpolation pixel, and may perform prediction using an interpolated block composed of interpolated pixels.
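As a minimal illustration of sub-pixel generation, the sketch below inserts half-sample positions into one row of integer pixels by linear interpolation, the simplest of the methods listed above. Practical codecs generally use longer filters; the function name and the rounding offset are assumptions made for this example.

```python
def interpolate_half_samples(row):
    """Return the row with a linearly interpolated half-sample between each pixel pair."""
    out = []
    for i, value in enumerate(row):
        out.append(value)                       # integer pixel is kept unchanged
        if i + 1 < len(row):
            # Half-sample: rounded average of the two neighbouring integer pixels.
            out.append((value + row[i + 1] + 1) // 2)
    return out

half_pel_row = interpolate_half_samples([10, 20, 40])  # -> [10, 15, 20, 30, 40]
```

Applying the same operation along columns, and repeating it, would yield finer fractional positions such as quarter samples.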

The intra prediction unit 182 predicts the current block by referring to samples in the vicinity of the block to be currently encoded. The intra prediction unit 182 may perform the following procedure to perform intra prediction. First, the reference samples necessary for generating a prediction signal are prepared. Then, the prediction signal (prediction block) is generated using the prepared reference samples. Thereafter, the prediction mode is encoded. The reference samples can be prepared through reference sample padding and/or reference sample filtering. Since the reference samples have undergone prediction and reconstruction, quantization errors may exist. Therefore, a reference sample filtering process can be performed for each prediction mode used for intra prediction to reduce such errors.

The prediction signal (or prediction block) generated by the inter prediction unit 181 or the intra prediction unit 182 may be used to generate a reconstructed signal (or reconstructed block) or to generate a residual signal (or difference block).

Referring to FIG. 2, the decoder 200 includes an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 230, an adder 235, a filtering unit 240, a decoded picture buffer (DPB) unit 250, and a prediction unit 260. The prediction unit 260 may include an inter prediction unit 261 and an intra prediction unit 262.

The reconstructed video signal output through the decoder 200 may be reproduced through a reproducing apparatus.

The decoder 200 receives a signal (i.e., a bit stream) output from the encoder 100 of FIG. 1, and the received signal is entropy-decoded through the entropy decoding unit 210.

The inverse quantization unit 220 obtains a transform coefficient from the entropy-decoded signal using the quantization step size information.

The inverse transform unit 230 obtains a residual signal (or a difference block) by inverse transforming the transform coefficient by applying an inverse transform technique.

The adder 235 adds the obtained difference signal (or difference block) to the prediction signal (or prediction block) output from the prediction unit 260 (i.e., the inter prediction unit 261 or the intra prediction unit 262) to generate a reconstructed signal (or reconstructed block).

The filtering unit 240 applies filtering to the reconstructed signal (or reconstructed block) and outputs it to a playback apparatus or transmits it to the decoded picture buffer unit 250. The filtered signal transmitted to the decoded picture buffer unit 250 may be used as a reference picture in the inter prediction unit 261.

The embodiments described for the filtering unit 160, the inter prediction unit 181, and the intra prediction unit 182 of the encoder 100 can be applied in the same way to the filtering unit 240, the inter prediction unit 261, and the intra prediction unit 262 of the decoder, respectively.

Block division structure

Generally, a block-based image compression method is used in still image or moving picture compression techniques (for example, HEVC). A block-based image compression method processes an image by dividing it into specific block units, and can reduce memory usage and the amount of computation.

The encoder divides one image (or picture) into rectangular coding tree units (CTUs). The CTUs are then encoded sequentially in raster scan order.

In HEVC, the size of a CTU can be set to one of 64×64, 32×32, and 16×16. The encoder can select the size of the CTU according to the resolution or other characteristics of the input image. The CTU includes a coding tree block (CTB) for the luma component and CTBs for the two corresponding chroma components.

One CTU can be divided in a quad-tree structure. That is, one CTU can be divided into four square units, each having half the horizontal size and half the vertical size, to generate coding units (CUs). This quad-tree division can be performed recursively. That is, CUs are hierarchically divided from one CTU in a quad-tree structure.

A CU means a basic unit of coding in which processing of an input image, for example, intra/inter prediction, is performed. The CU includes a coding block (CB) for the luma component and CBs for the two corresponding chroma components. In HEVC, the size of a CU can be set to one of 64×64, 32×32, 16×16, and 8×8.

Referring to FIG. 3, the root node of the quad-tree is associated with the CTU. The quad-tree is divided until it reaches the leaf node, and the leaf node corresponds to the CU.

More specifically, the CTU corresponds to a root node and has the smallest depth (i.e., depth = 0). Depending on the characteristics of the input image, the CTU may not be divided. In this case, the CTU corresponds to the CU.

The CTU can be partitioned into a quad tree form, resulting in subnodes with depth 1 (depth = 1). A node that is not further divided in the lower node having a depth of 1 (i.e., leaf node) corresponds to a CU. For example, CU (a), CU (b), and CU (j) corresponding to nodes a, b, and j in FIG. 3B are divided once in the CTU and have a depth of one.

At least one of the nodes having a depth of 1 can be further divided in quad-tree form, and as a result, lower nodes having a depth of 2 (i.e., depth = 2) are generated. A node that is not further divided among the lower nodes having a depth of 2 (i.e., a leaf node) corresponds to a CU. For example, CU (c), CU (h) and CU (i) corresponding to nodes c, h and i in FIG. 3B are divided twice in the CTU and have a depth of 2.

Also, at least one of the nodes having a depth of 2 can be further divided in quad-tree form, so that lower nodes having a depth of 3 (i.e., depth = 3) are generated. A node that is not further divided among the lower nodes having a depth of 3 (i.e., a leaf node) corresponds to a CU. For example, CU (d), CU (e), CU (f) and CU (g) corresponding to nodes d, e, f and g in FIG. 3B are divided three times in the CTU and have a depth of 3.
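The recursive division described above can be sketched as a short function that walks a quad-tree and returns its leaf nodes, each of which corresponds to a CU. The function names, the `should_split` callback, and the visiting order of the four sub-blocks (top-left, top-right, bottom-left, bottom-right) are assumptions for illustration; the example split pattern is arbitrary and does not reproduce FIG. 3.

```python
def quadtree_leaves(x, y, size, depth, should_split):
    """Return the leaf nodes as (x, y, size, depth) tuples."""
    if not should_split(x, y, size, depth):
        return [(x, y, size, depth)]            # leaf node: this block is a CU
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_leaves(x + dx, y + dy, half, depth + 1, should_split)
    return leaves

def example_split(x, y, size, depth):
    """Split the 64x64 root, then only its top-right 32x32 child."""
    return depth == 0 or (depth == 1 and (x, y) == (32, 0))

leaves = quadtree_leaves(0, 0, 64, 0, example_split)
```

In this example the root yields three CUs of depth 1 and four CUs of depth 2, and together the leaves cover the whole 64×64 area exactly once.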

In the encoder, the maximum size or the minimum size of the CU can be determined according to the characteristics of the video image (for example, resolution) or considering the efficiency of encoding. Information on this or information capable of deriving the information may be included in the bitstream. A CU having a maximum size is called a Largest Coding Unit (LCU), and a CU having a minimum size can be referred to as a Smallest Coding Unit (SCU).

Also, a CU having a tree structure can be hierarchically divided with a predetermined maximum depth information (or maximum level information). Each divided CU can have depth information. The depth information indicates the number and / or degree of division of the CU, and therefore may include information on the size of the CU.

Since the LCU is divided into quad tree form, the size of the SCU can be obtained by using the LCU size and the maximum depth information. Conversely, by using the size of the SCU and the maximum depth information of the tree, the size of the LCU can be obtained.
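Because each quad-tree division halves the width and height, the relationship between the LCU size, the SCU size, and the maximum depth reduces to a power-of-two shift. The two helper functions below are hypothetical names used only to show this arithmetic.

```python
def derive_scu_size(lcu_size: int, max_depth: int) -> int:
    """Each level of depth halves the side length of the CU."""
    return lcu_size >> max_depth

def derive_lcu_size(scu_size: int, max_depth: int) -> int:
    """Inverse relation: each level of depth doubles the side length."""
    return scu_size << max_depth

scu = derive_scu_size(64, 3)   # a 64x64 LCU with maximum depth 3 gives an 8x8 SCU
lcu = derive_lcu_size(8, 3)    # and an 8x8 SCU with maximum depth 3 gives a 64x64 LCU
```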

For one CU, information indicating whether the corresponding CU is divided (for example, a split CU flag (split_cu_flag)) may be transmitted to the decoder. This split information is included in every CU except the SCU. For example, if the value of the flag indicating division is '1', the corresponding CU is again divided into four CUs. If the value of the flag is '0', the corresponding CU is not further divided, and the processing for that CU can be performed.
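A decoder-side reading of this signalling can be sketched as follows. The flags are modelled as a plain sequence of 0/1 values consumed in depth-first order, which is an assumption of this example, as are the function name and the sub-block visiting order. Consistent with the description, no flag is read for a CU that already has the minimum size.

```python
def parse_cu_tree(flags, x, y, size, min_size):
    """Consume split flags from the iterator and return the CUs as (x, y, size)."""
    # An SCU cannot be divided further, so no flag is present for it.
    split = next(flags) if size > min_size else 0
    if not split:
        return [(x, y, size)]
    half = size // 2
    cus = []
    for dy in (0, half):
        for dx in (0, half):
            cus += parse_cu_tree(flags, x + dx, y + dy, half, min_size)
    return cus

# Root 32x32 is split; its first 16x16 child is split into four 8x8 SCUs;
# the remaining three 16x16 children are not split.
flag_iter = iter([1, 1, 0, 0, 0])
cus = parse_cu_tree(flag_iter, 0, 0, 32, 8)
```

Five flags are enough to describe seven CUs here, because the four 8×8 blocks carry no flag of their own.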

As described above, the CU is a basic unit of coding in which intra prediction or inter prediction is performed. The HEVC divides the CU into units of Prediction Unit (PU) in order to more effectively code the input image.

A PU is a basic unit for generating a prediction block, and prediction blocks can be generated differently for each PU within a single CU. However, intra prediction and inter prediction are not mixed among the PUs belonging to one CU; the PUs belonging to one CU are coded by the same prediction method (i.e., intra prediction or inter prediction).

The PU is not divided into a quad-tree structure, and is divided into a predetermined form in one CU. This will be described with reference to the following drawings.

The PU is divided according to whether the intra prediction mode is used or the inter prediction mode is used in the coding mode of the CU to which the PU belongs.

FIG. 4A illustrates a PU when an intra prediction mode is used, and FIG. 4B illustrates a PU when an inter prediction mode is used.

Referring to FIG. 4A, assuming that the size of one CU is 2N×2N (N = 4, 8, 16, 32), one CU can be divided into two PU types (i.e., 2N×2N or N×N).

Here, division into a 2N×2N PU means that only one PU exists in one CU.

On the other hand, when a CU is divided into N×N PUs, one CU is divided into four PUs, and a different prediction block is generated for each PU. However, this PU division can be performed only when the size of the CB for the luma component of the CU is the minimum size (i.e., when the CU is the SCU).

Referring to FIG. 4B, assuming that the size of one CU is 2N×2N (N = 4, 8, 16, 32), one CU can be divided into eight PU types (i.e., 2N×2N, N×N, 2N×N, N×2N, nL×2N, nR×2N, 2N×nU, 2N×nD).

As in intra prediction, N×N PU division can be performed only when the size of the CB for the luma component of the CU is the minimum size (i.e., when the CU is the SCU).

In inter prediction, the 2N×N type divided in the horizontal direction and the N×2N type divided in the vertical direction are supported.

In addition, PU division in the form of asymmetric motion partitions (AMP), namely nL×2N, nR×2N, 2N×nU, and 2N×nD, is supported. Here, 'n' means a 1/4 value of 2N. However, AMP cannot be used when the CU to which the PU belongs is the minimum-size CU.
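The eight inter PU types can be summarized as the widths and heights of the resulting PUs. The sketch below is illustrative: the function name and the ASCII mode strings (with "x" in place of "×") are assumptions, and the restrictions on N×N and AMP for minimum-size CUs are not enforced.

```python
def pu_partitions(mode: str, size: int):
    """Return the (width, height) of each PU for a CU of side length size (= 2N)."""
    n = size // 2      # N
    q = size // 4      # 'n' in the AMP names: one quarter of 2N
    table = {
        "2Nx2N": [(size, size)],
        "NxN":   [(n, n)] * 4,
        "2NxN":  [(size, n)] * 2,                  # divided in the horizontal direction
        "Nx2N":  [(n, size)] * 2,                  # divided in the vertical direction
        "nLx2N": [(q, size), (size - q, size)],    # narrow PU on the left
        "nRx2N": [(size - q, size), (q, size)],    # narrow PU on the right
        "2NxnU": [(size, q), (size, size - q)],    # short PU on top
        "2NxnD": [(size, size - q), (size, q)],    # short PU at the bottom
    }
    return table[mode]

amp_left = pu_partitions("nLx2N", 32)   # -> [(8, 32), (24, 32)]
```

For every mode the PU areas add up to the area of the CU, which is a quick sanity check on the table.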

The optimal division structure of the coding unit (CU), the prediction unit (PU), and the conversion unit (TU) for efficiently encoding an input image in one CTU is a rate-distortion- Value. &Lt; / RTI > For example, if we look at the optimal CU segmentation process in a 64Х64 CTU, the rate-distortion cost can be calculated by dividing the 64X64 size CU to the 8X8 size CU. The concrete procedure is as follows.

1) Determine the optimal PU and TU partition structure that yields the minimum rate-distortion value for the 64×64 CU through inter/intra prediction, transform/quantization, dequantization/inverse transform, and entropy encoding.

2) Divide the 64×64 CU into four 32×32 CUs and determine the optimal PU and TU partition structure that yields the minimum rate-distortion value for each 32×32 CU.

3) Subdivide each 32×32 CU into four 16×16 CUs and determine the optimal PU and TU partition structure that yields the minimum rate-distortion value for each 16×16 CU.

4) Subdivide each 16×16 CU into four 8×8 CUs and determine the optimal PU and TU partition structure that yields the minimum rate-distortion value for each 8×8 CU.

5) Compare the rate-distortion value of the 16×16 CU calculated in step 3) with the sum of the rate-distortion values of the four 8×8 CUs calculated in step 4) to determine the optimal CU partition structure within the 16×16 block. This process is also performed for the remaining three 16×16 CUs.

6) Compare the rate-distortion value of the 32×32 CU calculated in step 2) with the sum of the rate-distortion values of the four 16×16 CUs obtained in step 5) to determine the optimal CU partition structure within the 32×32 block. This process is also performed for the remaining three 32×32 CUs.

7) Finally, compare the rate-distortion value of the 64×64 CU calculated in step 1) with the sum of the rate-distortion values of the four 32×32 CUs obtained in step 6) to determine the optimal CU partition structure within the 64×64 block.
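The bottom-up comparison in steps 1) to 7) can be sketched as a recursion. This is only an illustrative sketch: `best_partition`, `rd_cost`, and `toy_cost` are hypothetical names, and `rd_cost` stands in for the full PU/TU search and encoding pass that produces the minimum rate-distortion value of an unsplit CU.

```python
def best_partition(x, y, size, rd_cost, min_size=8):
    """Return (cost, tree) for the cheapest CU partition of the block at (x, y).

    rd_cost(x, y, size) is a hypothetical callback that returns the minimum
    rate-distortion value of coding this CU without splitting it.
    """
    unsplit = rd_cost(x, y, size)
    if size == min_size:
        return unsplit, (x, y, size)

    half = size // 2
    children = [
        best_partition(x + dx, y + dy, half, rd_cost, min_size)
        for dy in (0, half)
        for dx in (0, half)
    ]
    split = sum(cost for cost, _ in children)
    # Keep the split only if the four sub-CUs together are strictly cheaper.
    if split < unsplit:
        return split, [tree for _, tree in children]
    return unsplit, (x, y, size)


def toy_cost(x, y, size):
    """Toy cost: large CUs covering the picture origin are twice as expensive."""
    base = size * size
    return 2 * base if (x == 0 and y == 0 and size > 8) else base


cost, tree = best_partition(0, 0, 64, toy_cost)
```

With this toy cost, only the chain of CUs containing the origin is split down to 8×8, while the other 32×32 and 16×16 CUs stay unsplit.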

In the intra prediction mode, a prediction mode is selected in units of PU, and prediction and reconstruction are performed in units of actual TUs for the selected prediction mode.

TU means the basic unit on which the actual prediction and reconstruction are performed. The TU includes a transform block (TB) for the luma component and a TB for the two chroma components corresponding thereto.

In the example of FIG. 3, a TU is hierarchically divided in a quad-tree structure from one CU to be coded, just as one CTU is divided in a quad-tree structure to generate CUs.

Since the TU is divided in a quad-tree structure, a TU split from a CU can be further divided into smaller lower-level TUs. In HEVC, the size of the TU can be set to any one of 32×32, 16×16, 8×8, and 4×4.

Referring again to FIG. 3, it is assumed that the root node of the quadtree is associated with a CU. The quad-tree is divided until it reaches a leaf node, and the leaf node corresponds to TU.

More specifically, the CU corresponds to a root node and has the smallest depth (i.e., depth = 0). Depending on the characteristics of the input image, the CU may not be divided. In this case, the CU corresponds to the TU.

The CU can be partitioned in quad-tree form, resulting in lower nodes with a depth of 1 (depth = 1). A node with a depth of 1 that is not further divided (i.e., a leaf node) corresponds to a TU. For example, TU(a), TU(b), and TU(j), corresponding to nodes a, b, and j in FIG. 3B, are divided once from the CU and have a depth of 1.

At least one of the nodes having a depth of 1 can be further divided in quad-tree form, and as a result, lower nodes having a depth of 2 (i.e., depth = 2) are generated. A node with a depth of 2 that is not further divided (i.e., a leaf node) corresponds to a TU. For example, TU(c), TU(h), and TU(i), corresponding to nodes c, h, and i in FIG. 3B, are divided twice from the CU and have a depth of 2.

Also, at least one of the nodes having a depth of 2 can be further divided in quad-tree form, so that lower nodes having a depth of 3 (i.e., depth = 3) are generated. A node with a depth of 3 that is not further divided (i.e., a leaf node) corresponds to a TU. For example, TU(d), TU(e), TU(f), and TU(g), corresponding to nodes d, e, f, and g in FIG. 3B, are divided three times from the CU and have a depth of 3.

A TU having a tree structure can be hierarchically divided with predetermined maximum depth information (or maximum level information). Then, each divided TU can have depth information. The depth information indicates the number and / or degree of division of the TU, and therefore may include information on the size of the TU.

For one TU, information indicating whether the corresponding TU is divided (e.g., a split TU flag (split_transform_flag)) may be communicated to the decoder. This partitioning information is included in all TUs except the minimum size TU. For example, if the value of the flag indicating whether or not to divide is '1', the corresponding TU is again divided into four TUs, and if the flag indicating the division is '0', the corresponding TU is no longer divided.
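How a decoder could turn a sequence of split flags into leaf TUs can be sketched as follows. This is a simplified illustration, not the HEVC syntax: the function name `parse_tu_tree`, the depth-first flag order, and the z-order traversal of the four children are assumptions.

```python
def parse_tu_tree(flags, x, y, size, depth=0, min_size=4):
    """Return the leaf TUs as (x, y, size, depth) tuples.

    flags is an iterator that yields split flag values (1 = split, 0 = leaf)
    in depth-first order. No flag is read for a minimum-size TU, since it
    cannot be divided any further.
    """
    split = next(flags) if size > min_size else 0
    if not split:
        return [(x, y, size, depth)]

    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += parse_tu_tree(flags, x + dx, y + dy, half, depth + 1, min_size)
    return leaves


# A 32x32 CU: split once, then split only the second 16x16 node again.
leaves = parse_tu_tree(iter([1, 0, 1, 0, 0, 0, 0, 0, 0]), 0, 0, 32)
```

Each leaf carries its depth, which together with the CU size also determines the TU size, as described above.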

Prediction

To reconstruct the current processing unit on which decoding is performed, the decoded portion of the current picture containing the current processing unit, or of other pictures, may be used.

A picture (slice) that uses only the current picture for reconstruction, that is, one on which only intra prediction is performed, is referred to as an intra picture or I picture (slice). A picture (slice) that uses at most one motion vector and one reference index to predict each unit is referred to as a predictive picture or P picture (slice), and a picture (slice) that uses at most two motion vectors and two reference indices is referred to as a bi-predictive picture or B picture (slice).

Intra prediction refers to a prediction method that derives the current processing block from a data element (e.g., a sample value, etc.) of the same decoded picture (or slice). That is, it means a method of predicting the pixel value of the current processing block by referring to the reconstructed areas in the current picture.

Inter prediction refers to a prediction method of deriving a current processing block based on a data element (e.g., a sample value or a motion vector) of a picture other than the current picture. That is, this means a method of predicting pixel values of a current processing block by referring to reconstructed areas in other reconstructed pictures other than the current picture.

Hereinafter, inter prediction will be described in more detail.

Inter prediction (or inter-picture prediction)

Inter prediction (or inter picture prediction) is a technique for eliminating the redundancy existing between pictures, and is mostly performed through motion estimation and motion compensation.

Referring to FIG. 5, inter prediction can be divided into uni-directional prediction, which uses only one past picture or one future picture on the time axis as a reference picture for one block, and bi-directional prediction, which refers to past and future pictures at the same time.

In addition, uni-directional prediction can be divided into forward direction prediction, which uses one reference picture that is displayed (or output) temporally before the current picture, and backward direction prediction, which uses one reference picture that is displayed (or output) temporally after the current picture.

The motion parameters (or information) used to specify which reference region (or reference block) is used to predict the current block in the inter prediction process (i.e., uni-directional or bi-directional prediction) include an inter prediction mode, a reference index (or reference picture index or reference list index), and motion vector information. Here, the inter prediction mode may indicate a reference direction (i.e., uni-directional or bi-directional) and a reference list (i.e., L0, L1, or bi-directional). The motion vector information may include a motion vector, a motion vector predictor (MVP), or a motion vector difference (MVD). The motion vector difference is the difference between the motion vector and the motion vector predictor.

For unidirectional prediction, a motion parameter for one direction is used. That is, one motion parameter may be needed to specify the reference region (or reference block).

In bidirectional prediction, motion parameters for both directions are used. In the bi-directional prediction method, a maximum of two reference areas can be used. These two reference areas may exist in the same reference picture or in different pictures. That is, in the bi-directional prediction method, a maximum of two motion parameters can be used, and two motion vectors may have the same reference picture index or different reference picture indexes. At this time, the reference pictures may be all displayed (or output) temporally before the current picture, or all displayed (or output) thereafter.

In the inter prediction process, the encoder performs motion estimation (Motion Estimation) for finding a reference region most similar to the current block from the reference pictures. The encoder may then provide motion parameters for the reference region to the decoder.

The encoder / decoder can obtain the reference area of the current block using motion parameters. The reference area exists in the reference picture having the reference index. In addition, a pixel value or an interpolated value of a reference region specified by the motion vector may be used as a predictor of the current processing block. That is, motion compensation for predicting an image of a current processing block from a previously decoded picture is performed using motion information.

To reduce the amount of data transmitted for motion vector information, a method may be used in which a motion vector predictor (mvp) is obtained from the motion information of previously coded blocks and only the difference value (mvd) is transmitted. That is, the decoder obtains the motion vector predictor of the current block using the motion information of other already-decoded blocks, and obtains the motion vector of the current processing block using the difference value transmitted from the encoder. In obtaining the motion vector predictor, the decoder may derive several motion vector candidates from the motion information of other already-decoded blocks and select one of them as the motion vector predictor.
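The relationship between the motion vector, its predictor, and the transmitted difference can be illustrated with a short sketch. The function names `encode_mvd` and `decode_mv` and the representation of vectors as (horizontal, vertical) integer tuples are assumptions made for this example.

```python
def encode_mvd(mv, mvp):
    """Encoder side: only the difference from the predictor is transmitted."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(mvp, mvd):
    """Decoder side: rebuild the motion vector from the predictor and the difference."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


mvp = (12, -3)                 # predictor derived from already-coded neighbors
mv = (14, -4)                  # motion vector found by motion estimation
mvd = encode_mvd(mv, mvp)      # small values, cheap to signal
restored = decode_mv(mvp, mvd)
```

Because both sides derive the same predictor from already-decoded blocks, the decoder recovers exactly the vector chosen by the encoder.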

Reference picture set and reference picture list

To manage multiple reference pictures, a set of previously decoded pictures is stored in the decoded picture buffer (DPB) for decoding of the remaining pictures.

The reconstructed picture used for inter prediction among reconstructed pictures stored in the DPB is referred to as a reference picture. In other words, a reference picture refers to a picture including samples that can be used for inter prediction in the decoding process of the next picture in the decoding order.

A reference picture set (RPS) refers to a set of reference pictures associated with a picture, and consists of all previously associated pictures in decoding order. The reference picture set may be used for inter prediction of the associated picture or of pictures following the associated picture in decoding order. That is, the reference pictures held in the decoded picture buffer (DPB) may be referred to as a reference picture set. The encoder can provide the decoder with reference picture set information in a sequence parameter set (SPS) (i.e., a syntax structure composed of syntax elements) or in each slice header.

A reference picture list refers to a list of reference pictures used for inter prediction of a P picture (or a slice) or a B picture (or a slice). Here, the reference picture list can be divided into two reference picture lists and can be referred to as a reference picture list 0 (or L0) and a reference picture list 1 (or L1), respectively. Further, the reference picture belonging to the reference picture list 0 can be referred to as a reference picture 0 (or L0 reference picture), and the reference picture belonging to the reference picture list 1 can be referred to as a reference picture 1 (or L1 reference picture).

In the decoding process of a P picture (or slice), one reference picture list (i.e., reference picture list 0) is used, and in the decoding process of a B picture (or slice), two reference picture lists (i.e., reference picture list 0 and reference picture list 1) can be used. Information for identifying the reference picture list for each reference picture may be provided to the decoder through the reference picture set information. The decoder adds each reference picture to reference picture list 0 or reference picture list 1 based on the reference picture set information.

A reference picture index (or a reference index) is used to identify any one specific reference picture in the reference picture list.

- fractional sample interpolation

A sample of a prediction block for an inter-predicted current block is obtained from the sample values of the corresponding reference region in the reference picture identified by the reference picture index. Here, the corresponding reference region in the reference picture is the region at the position indicated by the horizontal and vertical components of the motion vector. Fractional sample interpolation is used to generate prediction samples at non-integer sample coordinates, except when the motion vector has an integer value. For example, motion vectors with a precision of one quarter of the distance between samples may be supported.

For HEVC, fractional sample interpolation of the luminance component applies the 8-tap filter in the horizontal and vertical directions, respectively. The fractional sample interpolation of the chrominance components applies the 4-tap filter in the horizontal direction and the vertical direction, respectively.

Referring to FIG. 6, a shaded block in which an upper-case letter (A_i,j) is written represents an integer sample position, and an unshaded block in which a lower-case letter (x_i,j) is written represents a fractional sample position.

A fractional sample is generated with interpolation filters applied to integer sample values in the horizontal and vertical directions, respectively. For example, in the horizontal direction, an 8-tap filter may be applied to the left four integer sample values and the right four integer sample values based on the fraction sample to be generated.
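The horizontal case can be sketched for a half-sample position. The tap values below are the HEVC luma half-sample filter; the function name `half_sample`, the edge handling by sample repetition, the single-stage rounding, and the clipping to 8 bits are simplifying assumptions of this sketch rather than the normative process.

```python
# HEVC luma half-sample interpolation taps (they sum to 64).
HALF_SAMPLE_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def half_sample(row, x):
    """Interpolate the sample halfway between row[x] and row[x + 1].

    Uses the four integer samples to the left (row[x-3..x]) and the four to
    the right (row[x+1..x+4]); positions outside the row repeat the edge sample.
    """
    acc = 0
    for k, tap in enumerate(HALF_SAMPLE_TAPS):
        pos = min(max(x - 3 + k, 0), len(row) - 1)
        acc += tap * row[pos]
    value = (acc + 32) >> 6          # divide by 64 with rounding
    return min(max(value, 0), 255)   # clip to the 8-bit sample range


flat = [100] * 16
ramp = [10 * i for i in range(16)]
flat_half = half_sample(flat, 7)     # a flat signal is preserved
ramp_half = half_sample(ramp, 7)     # midpoint between 70 and 80
```

The vertical direction would apply the same taps along a column instead of a row.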

- Inter prediction mode

In HEVC, a merge mode and an AMVP (Advanced Motion Vector Prediction) mode can be used to reduce the amount of motion information.

1) Merge mode

The merge mode refers to a method of deriving a motion parameter (or information) from a neighboring block spatially or temporally.

The set of candidates available in the merge mode consists of spatial neighbor candidates, temporal candidates, and generated candidates.

Referring to FIG. 7A, whether each spatial candidate block is available is determined in the order {A1, B1, B0, A0, B2}. Here, if a candidate block is encoded in intra prediction mode and therefore has no motion information, or if the candidate block is located outside the current picture (or slice), the candidate block cannot be used.

After the validity of the spatial candidates has been determined, the spatial merge candidates can be constructed by excluding unnecessary candidate blocks from the candidate blocks of the current block. For example, if a candidate block of the current prediction block is the first prediction block within the same coding block, that candidate block may be excluded, and candidate blocks having identical motion information may also be excluded.

When the spatial merge candidate configuration is completed, the temporal merge candidate configuration process proceeds according to the order of {T0, T1}.

In the temporal candidate configuration, if the bottom-right block (T0) of the collocated block in the reference picture is available, that block is configured as the temporal merge candidate. Here, a collocated block refers to the block located at the position corresponding to the current block in the selected reference picture. Otherwise, the block (T1) located at the center of the collocated block is configured as the temporal merge candidate.

The maximum number of merge candidates can be specified in the slice header. If the number of merge candidates is greater than the maximum number, a number of spatial and temporal candidates smaller than the maximum number is retained. Otherwise, additional merge candidates (i.e., combined bi-predictive merging candidates) are generated by combining the candidates added so far, until the number of merge candidates reaches the maximum number.
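The candidate list construction described above can be sketched as follows. This is a simplified illustration under several assumptions: `build_merge_list` is a hypothetical name, motion information is modelled as a pair (L0 motion, L1 motion) in which each entry is ((mv_x, mv_y), ref_idx) or None, the list is simply truncated to the maximum size, and combined candidates pair the L0 motion of one candidate with the L1 motion of another.

```python
SPATIAL_ORDER = ("A1", "B1", "B0", "A0", "B2")
TEMPORAL_ORDER = ("T0", "T1")


def build_merge_list(neighbors, max_cands):
    """neighbors maps a position name to motion info, or None if unusable."""
    cands = []
    # Spatial candidates, skipping unavailable blocks and identical motion.
    for name in SPATIAL_ORDER:
        info = neighbors.get(name)
        if info is not None and info not in cands:
            cands.append(info)
    # Temporal candidate: T0 if available, otherwise T1.
    for name in TEMPORAL_ORDER:
        info = neighbors.get(name)
        if info is not None:
            cands.append(info)
            break
    cands = cands[:max_cands]

    # Fill up with combined bi-predictive candidates.
    base = list(cands)
    for i, a in enumerate(base):
        for j, b in enumerate(base):
            if len(cands) >= max_cands:
                return cands
            if i != j and a[0] is not None and b[1] is not None:
                combo = (a[0], b[1])
                if combo not in cands:
                    cands.append(combo)
    return cands


neighbors = {
    "A1": (((1, 0), 0), None),
    "B1": (None, ((0, 2), 0)),
    "B0": (((1, 0), 0), None),   # same motion as A1, pruned
    "T1": (((3, 3), 0), None),   # T0 unavailable, so T1 is used
}
merge_list = build_merge_list(neighbors, max_cands=5)
```

The position of a candidate in the returned list is what a merge index would refer to.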

The encoder constructs a merge candidate list by the method described above and, by performing motion estimation, signals to the decoder the candidate block information selected from the merge candidate list as a merge index (e.g., merge_idx[x0][y0]). FIG. 7B illustrates a case where the B1 block is selected from the merge candidate list. In this case, "Index 1" can be signaled to the decoder as the merge index.

The decoder constructs a merge candidate list in the same way as the encoder and derives the motion information for the current block from the motion information of the candidate block corresponding to the merge index received from the encoder in the merge candidate list. Then, the decoder generates a prediction block for the current block based on the derived motion information (i.e., motion compensation).

2) Advanced Motion Vector Prediction (AMVP) mode

The AMVP mode refers to a method of deriving motion vector prediction values from neighboring blocks. Thus, the horizontal and vertical motion vector difference (MVD), reference index, and inter prediction mode are signaled to the decoder. The horizontal and vertical motion vector values are calculated using the derived motion vector prediction value and the motion vector difference (MVD) provided from the encoder.

That is, the encoder constructs a motion vector predictor candidate list and, by performing motion estimation, signals to the decoder a motion vector predictor flag (i.e., candidate block information; e.g., mvp_lX_flag[x0][y0]) selected from the motion vector predictor candidate list. The decoder constructs a motion vector predictor candidate list in the same way as the encoder, and derives the motion vector predictor of the current processing block using the motion information of the candidate block that the motion vector predictor flag received from the encoder indicates in the candidate list. Then, the decoder obtains the motion vector of the current processing block using the derived motion vector predictor and the motion vector difference transmitted from the encoder. Then, the decoder generates a predicted block (i.e., an array of predicted samples) for the current block based on the derived motion information (i.e., motion compensation).

In the AMVP mode, two spatial motion candidates are selected from among the five available spatial candidates described above. The first spatial motion candidate is selected from the set {A0, A1} located on the left, and the second spatial motion candidate is selected from the set {B0, B1, B2} located above. Here, if the reference index of a neighboring candidate block differs from that of the current prediction block, the motion vector is scaled.

If the number of selected candidates is two, the candidate composition is terminated. If the number of selected candidates is less than two, temporal motion candidates are added.
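A minimal sketch of this candidate selection follows. The name `build_amvp_list` and the dictionary of neighbor motion vectors are hypothetical, motion vector scaling is omitted, and discarding a second candidate that equals the first is an assumption not stated in the text.

```python
LEFT_GROUP = ("A0", "A1")
ABOVE_GROUP = ("B0", "B1", "B2")


def build_amvp_list(neighbors, temporal_mv=None):
    """Return up to two motion vector predictor candidates.

    neighbors maps a position name to a motion vector (x, y), or None when
    the block is unavailable. Scaling for differing reference indices is
    omitted in this sketch.
    """
    cands = []
    for group in (LEFT_GROUP, ABOVE_GROUP):
        for name in group:
            mv = neighbors.get(name)
            if mv is not None:
                if mv not in cands:      # assumption: drop an identical candidate
                    cands.append(mv)
                break                    # first available block of the group
    # Add the temporal candidate only when fewer than two were found.
    if len(cands) < 2 and temporal_mv is not None:
        cands.append(temporal_mv)
    return cands[:2]


two_spatial = build_amvp_list({"A0": (4, -2), "B1": (6, 0)}, temporal_mv=(0, 1))
with_temporal = build_amvp_list({"A1": (4, -2), "B0": (4, -2)}, temporal_mv=(0, 1))
```

The motion vector predictor flag then selects one entry of the returned list.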

Referring to FIG. 8, a decoder (specifically, the inter-prediction unit 261 of the decoder in FIG. 2) decodes a motion parameter for a processing block (for example, a prediction block) (S801).

For example, if a merge mode is applied to the current block, the decoder can decode the signaled merge index from the encoder. Then, the decoder can derive the motion parameter of the current block from the motion parameter of the candidate block indicated in the merge index.

Further, when the AMVP mode is applied to the current block, the decoder can decode the horizontal and vertical motion vector difference (MVD) signaled from the encoder, the reference index and the inter prediction mode. The motion vector predictor is derived from the motion parameter of the candidate block indicated by the motion vector predictor flag, and the motion vector value of the current block can be derived using the motion vector predictor and the received motion vector difference value.

The decoder performs motion compensation on the current block using the decoded motion parameter (or information) (S802).

That is, the encoder / decoder performs motion compensation for predicting an image of a current block from a previously decoded picture (i.e., generating a prediction block for a current unit) using the decoded motion parameters. In other words, the encoder / decoder can derive a predicted block (i.e., an array of predicted samples) of the current block from a sample of the area corresponding to the current block in the previously decoded reference picture.

FIG. 9 illustrates a case where the motion parameters for the current block to be coded in the current picture are uni-directional prediction, LIST0, the second picture in LIST0, and the motion vector (-a, b).

In this case, as shown in FIG. 9, the current block is predicted using the values at the position displaced by (-a, b) from the current block in the second picture of LIST0 (i.e., the sample values of the reference block).

In the case of bi-directional prediction, another reference list (e.g., LIST1), reference index, and motion vector difference are also transmitted, and the decoder derives two reference blocks and predicts the current block based on them (i.e., generates the predicted samples of the current block).
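The uni-directional and bi-directional cases can be sketched together. This is an illustration only: `fetch_block` and `predict` are hypothetical names, motion vectors are restricted to integer values (no interpolation), picture edges are handled by sample repetition, and the two reference blocks are combined by a rounded average.

```python
def fetch_block(ref, x, y, w, h, mv):
    """Copy the w x h block displaced by the integer vector mv from (x, y)."""
    rows, cols = len(ref), len(ref[0])
    block = []
    for j in range(h):
        yy = min(max(y + mv[1] + j, 0), rows - 1)
        block.append([ref[yy][min(max(x + mv[0] + i, 0), cols - 1)] for i in range(w)])
    return block


def predict(ref_lists, x, y, w, h, params):
    """params holds one (list_idx, ref_idx, mv) entry for uni-directional
    prediction or two entries for bi-directional prediction."""
    blocks = [fetch_block(ref_lists[lst][idx], x, y, w, h, mv) for lst, idx, mv in params]
    if len(blocks) == 1:
        return blocks[0]
    p0, p1 = blocks
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(p0, p1)]


pic_a = [[col + 10 * row for col in range(8)] for row in range(8)]
pic_b = [[100] * 8 for _ in range(8)]
ref_lists = [[pic_b, pic_a], [pic_b]]     # LIST0 has two pictures, LIST1 has one

# Uni-directional: second picture of LIST0, motion vector (-1, 2).
uni = predict(ref_lists, 2, 2, 2, 2, [(0, 1, (-1, 2))])
# Bi-directional: the same block plus a block from LIST1.
bi = predict(ref_lists, 2, 2, 2, 2, [(0, 1, (-1, 2)), (1, 0, (0, 0))])
```

In the bi-directional case, the two fetched blocks correspond to the two reference blocks that the decoder derives before predicting the current block.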

Adaptive loop filtering (ALF) is a technique for acquiring an image similar to an original image by applying a filter to a reconstructed picture to compensate for errors due to prediction and quantization.

When ALF is applied, the encoder derives the coefficients of the Wiener filter using the original block and the reconstructed block, and applies the derived Wiener filter coefficients to the reconstructed block to obtain a filtered reconstructed block.

Referring to FIG. 10, a restoration block is obtained by adding a prediction block and a residual block. A circular symbol + (10010) represents addition. Then, the coefficients of the Wiener filter between the restoration block and the original block are acquired (computed). The circular symbol M (10020) indicates that the coefficients of the Wiener filter are calculated (or the Wiener filter is applied). The filter coefficient of FIG. 10 represents the Wiener filter coefficient. In other words, the ALF method uses the restored block and the original block as inputs to calculate the Wiener filter coefficients. The coefficients of the obtained Wiener filter are applied to the reconstruction block, whereby the filtered reconstruction block is obtained. The coefficients of the obtained Wiener filter are transmitted to the decoder.

The ALF technique can improve the peak signal-to-noise ratio (PSNR) by applying a filter to the reconstructed block (picture). In ALF, the filter coefficients are calculated on a picture-by-picture basis, and the encoder transmits the calculated picture-level filter coefficients to the decoder. Because ALF filters the reconstructed block of the current block, it cannot improve coding efficiency by reducing the amount of residual data of the current picture. Instead, ALF can improve coding efficiency by making the filtered reconstructed picture available as an enhanced reference picture for pictures decoded after the current picture (future pictures).

In FIG. 11, (a) is a schematic diagram of a prediction block filtering (PBF) technique and (b) is a schematic diagram of an adaptive prediction block filtering (APBF) technique.

Prediction block filtering (PBF) improves prediction accuracy and coding efficiency by applying a filter to the prediction block to compensate for errors due to prediction and quantization. When PBF is applied, the encoder calculates the Wiener filter coefficients between the original block and the prediction block, and applies the calculated Wiener filter coefficients to the prediction block to improve the accuracy of the prediction block and the coding efficiency.

Referring to FIG. 11(a), the coefficients of the Wiener filter between the prediction block and the original block are obtained (computed). The circular symbol M (11010) indicates that the coefficients of the Wiener filter are calculated (or that the Wiener filter is applied). In other words, the PBF method uses the prediction block and the original block as inputs for calculating the Wiener filter coefficients. In FIG. 11(a), the filter coefficient denotes the Wiener filter coefficient. The obtained Wiener filter coefficients are applied to the prediction block, whereby the filtered prediction block is obtained. Thereafter, a modified reconstructed block is obtained by adding the modified residual block to the filtered prediction block. The term 'modified residual block' reflects the fact that the residual block also changes because the prediction block is changed by filtering (application of the Wiener filter); the residual block is obtained by subtracting the prediction block from the original block. The circular symbol + (11020) represents addition.

However, since the PBF scheme calculates the filter coefficients in units of blocks, it has the disadvantage that block-level filter coefficients must be transmitted to the decoder. Therefore, using the PBF scheme increases the amount of additional information needed to transmit the filter coefficients. In FIG. 11(a), the "X" marked on the filter coefficient below the modified reconstructed block means that the PBF scheme is not suitable for improving coding efficiency, because of the increased amount of additional information to be transmitted.

In order to reduce the amount of additional information, the following adaptive predictive block filtering (APBF) scheme has been proposed as a method of deriving a filter coefficient by a decoder.

Adaptive prediction block filtering (APBF) derives the filter coefficients using information from the neighboring blocks of the current block rather than from the current block itself, and applies the derived filter coefficients to the prediction block to improve prediction accuracy and coding efficiency.

The decoder does not have information on the original block, which is the target block for deriving the filter coefficients. Therefore, in order to replace the original block, the APBF method derives a filter coefficient that can improve the accuracy of a prediction block of a neighboring block by using a reconstruction block of a neighboring block. The filter coefficients derived in this way are used for the prediction block of the current block.

Specifically, when APBF is applied, the decoder derives the Wiener filter coefficients between the reconstructed block of a neighboring block and the prediction block of that neighboring block, instead of between the original block and the prediction block of the current block, and applies the derived coefficients to the prediction block of the current block.

Referring to FIG. 11(b), the Wiener filter coefficients are obtained (calculated) between the prediction block of a neighboring block and the reconstructed block of the neighboring block. The circular symbol M (11030) indicates that the coefficients of the Wiener filter are calculated (or that the Wiener filter is applied). That is, the APBF scheme uses the prediction block of the neighboring block and the reconstructed block of the neighboring block as inputs for calculating the Wiener filter coefficients. In FIG. 11(b), the filter coefficient denotes the Wiener filter coefficient. The obtained Wiener filter coefficients are applied to the prediction block of the current block, whereby the filtered prediction block is obtained.

A Wiener filter is a filter that transforms its input so that the output is as close as possible to a desired output. Here, 'as close as possible' means that the sum of squared differences between the filter output and the desired output is minimized. That is, the Wiener filter minimizes the mean square error between the filtered input and the desired output.

Equation (1) below is an example of an equation for calculating the coefficients of the Wiener filter.
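The equation itself is not reproduced in this text. Based only on the symbol definitions given in the following paragraph (coefficients c_(i, j), filter size N, reconstruction block R, prediction block P, sample coordinates x and y), it can plausibly be written in the following form. The summation over all samples of the block and the symmetric index range from -N to N are assumptions.

```latex
C = \underset{C}{\arg\min} \sum_{x,\,y} \left( R(x, y) - \sum_{i=-N}^{N} \sum_{j=-N}^{N} c_{i,j} \, P(x + i,\, y + j) \right)^{2}
\tag{1}
```

In this form, the inner double sum is the filtered prediction sample at (x, y), and the outer sum is the squared error that the coefficients are chosen to minimize.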

In Equation (1), C represents the Wiener filter coefficients, and x and y represent the coordinates of a sample in the block. i and j denote the coordinates within the Wiener filter, and c_(i, j) represents the coefficient at coordinate (i, j) among the Wiener filter coefficients. N represents the filter size, where the number of filter taps is 2N + 1. R denotes a reconstruction block, and P denotes a prediction block. The reconstruction block R and the prediction block P are the inputs of Equation (1). That is, Equation (1) is a formula for obtaining the Wiener filter coefficients (C) that minimize the error between the reconstruction block (R) and the prediction block (P).

With respect to Equation (1), the inputs used to calculate the Wiener filter coefficients (C) change according to the filtering method. For example, adaptive loop filtering (ALF) computes the Wiener filter coefficients C using the original block O and the reconstruction block R as inputs, instead of the reconstruction block R and the prediction block P (see the description of FIG. 10). Prediction block filtering (PBF) uses the original block O and the prediction block P as inputs (see the description of FIG. 11(a)). Adaptive prediction block filtering (APBF) uses a reconstruction block R and a prediction block P as inputs (see the description of FIG. 11(b)). Bi-prediction block filtering (BPBF), proposed in this specification, uses the average block (Avg(P0, P1)) and a prediction block (P0 or P1) as inputs. Details of the BPBF will be described later.

Unlike PBF, APBF does not transmit the filter coefficient information to the decoder; the decoder derives the filter coefficients itself, which reduces the amount of additional information transmitted. However, since APBF uses filter coefficients derived from the information of adjacent blocks rather than the information of the current block, it is limited in how much it can increase the accuracy of the prediction block.

To further increase prediction accuracy, this specification proposes the following bi-prediction block filtering (BPBF) method, which derives the filter coefficients using information of the current block in inter prediction.

Bi-prediction block filtering (BPBF)

The decoder can improve prediction accuracy and coding efficiency by filtering the two prediction blocks obtained from different reference picture lists by bi-directional prediction so that each becomes similar to the average block of the two prediction blocks. This approach may be referred to as bi-prediction block filtering, bi-prediction-based filtering, and the like.

Bi-prediction block filtering (BPBF) performs filtering so that each of the prediction block P0 of reference picture list 0 and the prediction block P1 of reference picture list 1 becomes similar to the average block Avg(P0, P1) of the two prediction blocks, thereby improving prediction accuracy and coding efficiency.

In this specification, a prediction block obtained based on reference picture list 0 in bi-directional prediction may be referred to as P0 (or the P0 block), and a prediction block obtained based on reference picture list 1 may be referred to as P1 (or the P1 block). One prediction block composed of the average values of the P0 block and the P1 block may be referred to as Avg(P0, P1). Avg(P0, P1) corresponds to the average value (or average block) of the P0 block and the P1 block, and may also be referred to as the average prediction block. Also, in this specification, generating one block as the average of two blocks (i.e., the operation that produces the average block of two blocks) may be referred to as an average sum.

In the case of bidirectional prediction during inter prediction, one prediction block (i.e., Avg (P0, P1)) generated from the average values of the P0 block and the P1 block is used as the best prediction block of the current block. That is, the average block (Avg (P0, P1)) of the two prediction blocks can be regarded as a block most similar to the original block. Therefore, the prediction performance and the coding efficiency can be further improved by refining the P0 block and the P1 block to become more similar to the average prediction block (Avg (P0, P1)).

For this purpose, the proposed bi-prediction block filtering (BPBF) can further improve the accuracy of the prediction block by refining the two prediction blocks P0 and P1 so that they become similar to the average prediction block Avg(P0, P1).

Specifically, bi-prediction block filtering (BPBF) derives the Wiener filter coefficients that minimize the error between the average prediction block (Avg(P0, P1)) and the P0 block, and applies the derived filter coefficients to the P0 block to refine the P0 block. Similarly, BPBF derives the Wiener filter coefficients that minimize the error between the average prediction block (Avg(P0, P1)) and the P1 block, and applies the derived filter coefficients to the P1 block to refine the P1 block.
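The refinement of P0 and P1 toward their average can be sketched with a small least-squares fit. This is an illustrative sketch, not the method as implemented by the applicant: all function names are hypothetical, the filter is a 3×3 window (N = 1) with edge samples repeated, the coefficients are obtained by solving the normal equations directly, and a small regularization term `reg` is added only to keep the system solvable.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def window(block, x, y, n):
    """Samples in the (2n+1) x (2n+1) window around (x, y), edges repeated."""
    h, w = len(block), len(block[0])
    return [block[min(max(y + j, 0), h - 1)][min(max(x + i, 0), w - 1)]
            for j in range(-n, n + 1) for i in range(-n, n + 1)]


def wiener_coeffs(target, pred, n=1, reg=1e-3):
    """Coefficients minimizing the squared error between target and filtered pred."""
    taps = (2 * n + 1) ** 2
    ata = [[reg if r == c else 0.0 for c in range(taps)] for r in range(taps)]
    atb = [0.0] * taps
    for y in range(len(pred)):
        for x in range(len(pred[0])):
            v = window(pred, x, y, n)
            for r in range(taps):
                atb[r] += v[r] * target[y][x]
                for c in range(taps):
                    ata[r][c] += v[r] * v[c]
    return solve(ata, atb)


def apply_filter(pred, coeffs, n=1):
    """Filter every sample of pred with the given window coefficients."""
    return [[sum(c * s for c, s in zip(coeffs, window(pred, x, y, n)))
             for x in range(len(pred[0]))] for y in range(len(pred))]


def bpbf(p0, p1, n=1):
    """Refine P0 and P1 toward Avg(P0, P1); returns (avg, refined P0, refined P1)."""
    avg = [[(a + b) / 2 for a, b in zip(r0, r1)] for r0, r1 in zip(p0, p1)]
    f0 = apply_filter(p0, wiener_coeffs(avg, p0, n), n)
    f1 = apply_filter(p1, wiener_coeffs(avg, p1, n), n)
    return avg, f0, f1
```

Because the fit uses only P0, P1, and their average, a decoder could repeat the same derivation without receiving any filter coefficients.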

BPBF can increase the accuracy of prediction by using the refined prediction blocks, thereby reducing the amount of information in the residual signal and improving coding efficiency. In addition, BPBF can be applied adaptively to various sequences by determining whether refinement is applied in block units or in sample units.

Hereinafter, an embodiment for determining whether or not to apply the bi-predictive block filtering (BPBF) on a block-by-block basis will be described first, and then an embodiment for determining on a sample-by-sample basis will be described.

Embodiment 1

According to the present embodiment, the decoder can determine whether to apply the BPBF on a block-by-block basis.

The encoder can signal information indicating whether the BPBF is applied to the decoder on a block-by-block basis. The flag indicating whether or not the BPBF is applied may be referred to as a BPBF flag (bpbf_flag). The encoder can determine whether to signal the BPBF flag depending on whether the inter prediction mode of the current block is the AMVP mode or the merge mode.

When the AMVP mode is applied to the current block, the BPBF flag (bpbf_flag) is signaled to the decoder only when the prediction direction of the corresponding block is bidirectional prediction, and is not signaled if it is not bidirectional prediction.

When the merge mode is applied to the current block, whether to apply the BPBF of the current block can be determined according to 'bpbf_flag' of the selected candidate block.

If the selected merge candidate is a candidate generated by combining other candidates (e.g., a combined bi-predictive merging candidate), if the selected motion vector is a zero vector, if the candidate is derived on a divided sub-block basis for the block (or coding unit), or if the candidate corresponds to a temporal motion vector predictor (TMVP) or the like, the decoder may not apply the BPBF to the current block.
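The signaling rules above can be sketched as follows. This is a minimal illustration, not the normative syntax: the mode strings, the `MergeCandidate` type, and the candidate category labels are hypothetical names introduced here, and requiring bi-prediction in merge mode is an assumption based on BPBF needing two prediction blocks.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the candidate types the text excludes from BPBF.
EXCLUDED_MERGE_KINDS = {"combined", "zero", "subblock", "temporal"}


@dataclass
class MergeCandidate:
    kind: str        # e.g. "spatial", "combined", "zero", "subblock", "temporal"
    bpbf_flag: bool  # bpbf_flag stored with the candidate block


def encoder_signals_bpbf_flag(mode: str, is_bi_pred: bool) -> bool:
    """AMVP: signal bpbf_flag only for bi-predicted blocks. Merge: never signaled."""
    return mode == "amvp" and is_bi_pred


def decoder_applies_bpbf(mode: str, is_bi_pred: bool,
                         parsed_flag: Optional[bool] = None,
                         candidate: Optional[MergeCandidate] = None) -> bool:
    """Decide whether BPBF is applied to the current block."""
    if not is_bi_pred:
        return False  # assumption: BPBF needs both P0 and P1
    if mode == "amvp":
        return bool(parsed_flag)  # flag parsed from the bitstream
    if mode == "merge" and candidate is not None:
        if candidate.kind in EXCLUDED_MERGE_KINDS:
            return False
        return candidate.bpbf_flag  # inherited from the selected candidate
    return False
```

In this sketch the merge path never reads a flag from the bitstream, which mirrors the text: the decision is inherited from the selected candidate or disabled outright for the excluded candidate types.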

Referring to FIG. 12, the decoder may perform different motion compensation processes depending on whether the current block satisfies a specific condition, that is, depending on whether bi-prediction block filtering (BPBF) is applied. The following procedure is described with reference to a decoder, but it can be performed by an encoder as well.

First, the decoder confirms (or determines) whether or not the current block satisfies a predefined condition (S12010). Examples of conditions for determining whether to apply the BPBF will be described below.

As an example, the condition may be whether or not the BPBF flag (bpbf_flag) indicates that the BPBF is applied to the current block (condition 1). If the obtained 'bpbf_flag' indicates that the BPBF is applied to the current block, the decoder can decide to apply the BPBF to the current block. If no syntax for BPBF (e.g., 'bpbf_flag') is present, the following conditions may be used.

As an example, the condition may be whether or not the current block has been predicted in a bi-prediction mode (condition 2). The decoder can decide to apply the BPBF if the current block is predicted in the bi-prediction mode.

As an example, the condition may be whether the current block is predicted in the bi-prediction mode and the picture order counts (POCs) of the two reference pictures lie in different directions with respect to the current picture (condition 3). The POC corresponds to the display order. In other words, the decoder can decide to apply the BPBF to the current block if the two reference pictures are a past picture and a future picture, respectively, with respect to the time axis of the current picture.

As an example, the specific condition may be the size of the current block (condition 4). As a method for determining whether to apply the BPBF, the characteristics of the block can be considered. For example, the decoder may decide to apply the BPBF if the size of the current block (e.g., a coding unit) is greater than 8×8, and not to apply the BPBF if it is less than 8×8. If the current block is small, a relatively optimal prediction block can be generated through the motion estimation process alone, so the decoder may not apply the BPBF in view of the signaling overhead of the BPBF flag ('bpbf_flag').
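The example conditions can be expressed as simple predicates. The sketch below is illustrative only: the function names are hypothetical, combining conditions 2 to 4 with a logical AND when no flag is present is an assumption (the text lists them as alternative examples), and treating a block of exactly 8×8 as eligible is also an assumption, since the text only addresses sizes greater or less than 8×8.

```python
from typing import Optional


def refs_on_opposite_sides(poc_cur: int, poc_ref0: int, poc_ref1: int) -> bool:
    """Condition 3: one reference precedes and the other follows the current picture."""
    return (poc_ref0 - poc_cur) * (poc_ref1 - poc_cur) < 0


def block_large_enough(width: int, height: int, min_area: int = 64) -> bool:
    """Condition 4, compared by area; accepting exactly 8x8 is an assumption."""
    return width * height >= min_area


def should_apply_bpbf(bpbf_flag: Optional[bool], is_bi_pred: bool,
                      poc_cur: int, poc_ref0: int, poc_ref1: int,
                      width: int, height: int) -> bool:
    if bpbf_flag is not None:
        return bpbf_flag  # condition 1: explicit syntax decides
    return (is_bi_pred  # condition 2
            and refs_on_opposite_sides(poc_cur, poc_ref0, poc_ref1)  # condition 3
            and block_large_enough(width, height))  # condition 4
```

Passing `None` for `bpbf_flag` models the case where no BPBF syntax is present, so the decision falls back to the implicit conditions.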

The decoder decides to apply the BPBF to the current block if the current block satisfies a certain condition (or if the condition is true), and performs the following S12020 to S12050.

The decoder performs interpolation on a block having a size of (W + T_W) × (H + T_H) in each reference picture list (S12020).

In the course of interpolation, the shape of the Wiener filter is taken into account. For example, the shape of the Wiener filter may be M × 1, M × N, N × M, or M × M, and may also be a 5 × 5, 7 × 7, or 9 × 9 diamond shape.

In step S12020, W represents the width of the current block, and H represents the height of the current block. T_W and T_H represent values derived from the number of horizontal filter taps and the number of vertical filter taps, respectively. As an example, when the Wiener filter is 5×5 and the block is 8×8, (W + T_W) × (H + T_H) is 12×12, corresponding to (8 + 4) × (8 + 4). That is, T_W and T_H are both 4 in this case. As another example, when the Wiener filter is 7×5 and the block is 8×8, (W + T_W) × (H + T_H) is 14×12, corresponding to (8 + 6) × (8 + 4). That is, T_W is 6 and T_H is 4 in this case.
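The two worked examples are consistent with T = (number of taps) − 1 along each axis. That relation is inferred from the examples rather than stated in the text, so the short sketch below should be read under that assumption. The function names are hypothetical.

```python
from typing import Tuple


def filter_margin(taps: int) -> int:
    """Extra samples needed along one axis; taps - 1 is inferred from the examples."""
    return taps - 1


def interpolation_size(w: int, h: int, taps_w: int, taps_h: int) -> Tuple[int, int]:
    """Return (W + T_W, H + T_H) for a W x H block and a taps_w x taps_h filter."""
    return (w + filter_margin(taps_w), h + filter_margin(taps_h))


print(interpolation_size(8, 8, 5, 5))  # (12, 12)
print(interpolation_size(8, 8, 7, 5))  # (14, 12)
```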

The decoder repeats the process of S12020 in the reference picture list 0 (L0) and the reference picture list 1 (L1), respectively. Through this interpolation process, the P0 block and the P1 block used for generating the average prediction block (Avg (P0, P1)) are obtained.

Then, the decoder generates an average prediction block (Avg (P0, P1)) which is an average sum of the two prediction blocks obtained in step S12020 (S12030). That is, the decoder generates the average value (or average block) of the P0 block and the P1 block as an average prediction block (Avg (P0, P1)).
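As an illustration of the average sum in step S12030, the following sketch averages two equally sized integer blocks sample by sample. The round-half-up integer form `(a + b + 1) >> 1` is an assumption commonly used in video codecs, as the text does not specify the rounding. The function name is hypothetical.

```python
from typing import List

Block = List[List[int]]


def average_block(p0: Block, p1: Block) -> Block:
    """Return Avg(P0, P1) for two blocks of identical dimensions."""
    if len(p0) != len(p1) or any(len(r0) != len(r1) for r0, r1 in zip(p0, p1)):
        raise ValueError("P0 and P1 must have the same dimensions")
    # Integer average with rounding (assumed rounding rule).
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(p0, p1)]


print(average_block([[10, 20], [30, 40]], [[20, 20], [31, 44]]))  # [[15, 20], [31, 42]]
```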

The decoder then calculates the Wiener filter coefficients between each of the prediction blocks P0 and P1 and the average prediction block Avg (P0, P1), and applies the calculated filter coefficients to the prediction blocks P0 and P1 in the respective directions (S12040). Specifically, the decoder calculates the filter coefficients that minimize the error between the P0 block and the average prediction block (Avg (P0, P1)), and then applies the calculated filter coefficients to the P0 block. Further, the decoder calculates the filter coefficients that minimize the error between the P1 block and the average prediction block (Avg (P0, P1)), and then applies the calculated filter coefficients to the P1 block. Referring to Equation (1) for calculating the coefficients of the Wiener filter, bi-prediction block filtering (BPBF) calculates the coefficients that minimize the error between the average prediction block (Avg (P0, P1)) and each prediction block.

That is, in step S12040, the P0 block and the P1 block are filtered. The decoder performs refinement so that the P0 block and the P1 block are similar to the average prediction block (Avg (P0, P1)) by performing the calculation and application of the filter coefficients in the P0 block and the P1 block, respectively.
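The derivation and application of the filter coefficients in step S12040 can be sketched as a least-squares problem. The code below is a minimal illustration and not the procedure of Equation (1): it assumes a square `taps × taps` filter (3×3 by default for brevity), a prediction block padded by `taps − 1` samples per axis, a plain normal-equation solve, and a small ridge term `eps` to keep the system solvable. All function names are hypothetical.

```python
from typing import Iterator, List, Tuple

Block = List[List[float]]


def _patches(padded: Block, h: int, w: int, taps: int) -> Iterator[Tuple[int, int, List[float]]]:
    """Yield (y, x, flattened taps x taps neighbourhood) for each output sample."""
    for y in range(h):
        for x in range(w):
            yield y, x, [padded[y + dy][x + dx] for dy in range(taps) for dx in range(taps)]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def wiener_coefficients(padded: Block, target: Block, taps: int = 3, eps: float = 1e-6) -> List[float]:
    """Coefficients minimizing the squared error between filtered `padded` and `target`."""
    h, w = len(target), len(target[0])
    k = taps * taps
    r = [[0.0] * k for _ in range(k)]  # autocorrelation matrix
    p = [0.0] * k                      # cross-correlation vector
    for y, x, v in _patches(padded, h, w, taps):
        t = target[y][x]
        for i in range(k):
            p[i] += v[i] * t
            for j in range(k):
                r[i][j] += v[i] * v[j]
    for i in range(k):
        r[i][i] += eps  # small ridge term (assumption)
    return _solve(r, p)


def apply_filter(padded: Block, coeffs: List[float], h: int, w: int, taps: int = 3) -> Block:
    """Filter the padded prediction block, producing an h x w refined block."""
    out = [[0.0] * w for _ in range(h)]
    for y, x, v in _patches(padded, h, w, taps):
        out[y][x] = sum(c * s for c, s in zip(coeffs, v))
    return out
```

Under these assumptions, `wiener_coefficients` would be called once with the padded P0 block and once with the padded P1 block, each time with Avg (P0, P1) as `target`, and `apply_filter` would then produce the refined P0 and P1 blocks.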

Then, the decoder generates the final prediction block by calculating a second average sum (2nd average sum) of the filtered prediction blocks P0 and P1 (S12050).

The decoder determines not to apply the BPBF to the current block if the current block does not satisfy a certain condition (or if the condition is false), and performs steps S12060 to S12070.

Steps S12060 and S12070 are the same as those of the existing inter prediction. The decoder performs interpolation for each reference picture in the reference picture lists 0 and 1 (S12060). Thereafter, the decoder generates a prediction block by calculating an average sum of two reference blocks selected from the interpolated reference pictures (S12070). That is, the average value (average block) of the two reference predictions is generated as the final prediction block.
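The two branches of FIG. 12 can be summarized in a short control-flow sketch. Interpolation (S12020 and S12060) is omitted and the blocks are assumed to be already interpolated and of equal size. The `refine` callback stands in for the Wiener filtering of S12040 and is a hypothetical parameter, not part of the described method.

```python
from typing import Callable, List

Block = List[List[float]]


def average(a: Block, b: Block) -> Block:
    """Average sum of two equally sized blocks."""
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bi_predict(p0: Block, p1: Block, apply_bpbf: bool,
               refine: Callable[[Block, Block], Block]) -> Block:
    """Return the final prediction block for the current block."""
    if not apply_bpbf:
        return average(p0, p1)  # S12060-S12070: conventional bi-prediction
    avg = average(p0, p1)       # S12030: first average sum
    f0 = refine(p0, avg)        # S12040: refine P0 toward Avg(P0, P1)
    f1 = refine(p1, avg)        # S12040: refine P1 toward Avg(P0, P1)
    return average(f0, f1)      # S12050: second average sum
```

Keeping the refinement behind a callback makes it clear that the only difference between the two branches is the extra filtering pass followed by the second average.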

Embodiment 2

According to this embodiment, the decoder can determine whether to apply BPBF in units of samples (or pixels). We propose two methods to decide whether to apply BPBF in sample units. The following procedure will be described with reference to a decoder, but it can also be performed by an encoder as well.

For example, as a first method for determining whether the BPBF is applied in units of samples, the amount of variation of the samples within a certain region may be used. The decoder can decide not to apply the BPBF to a sample if the amount of change of the samples in a specific region (or window) around that sample in a block is greater than a predetermined threshold. As an example, the specific region may be a 5×5 area.

The threshold for the variation of the sample values within the specific region may differ depending on the quantization parameter (QP). When the threshold is adjusted according to the QP, a larger QP gives a smaller threshold, so the BPBF is skipped for more samples. The amount of change in the specific region can be calculated by the following methods.

For example, the first calculation method of calculating the variation of the sample in the specific region (window) is a method of calculating the sum of the difference between the average of the sample values in the window and the respective sample values as the variation amount of the sample. Equation (2) below is an example of a formula for calculating the amount of change of the sample using the first calculation method.

In Equation (2), x̄ in the parentheses on the right-hand side represents the average value in the window, and i and j represent positions in the window. The size of the window is N×N. The factor 1/N² may be omitted in consideration of computational complexity and the loss of information caused by down-scaling.

For example, a second calculation method for calculating the amount of change of a sample in a specific area (window) is a method of calculating, as the variation amount of the sample, the sum of the difference values between the sample value at the intermediate position (target sample) in the window and the other sample values in the window. Equation (3) below is an example of a formula for calculating the amount of change of a sample using the second calculation method.

In Equation (3), i and j denote positions in the window. Also, k and l represent the intermediate position in the window (or the position of the target sample). The size of the window is N×N. The factor 1/N² may be omitted in consideration of computational complexity and the loss of information caused by down-scaling.
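Both calculation methods can be sketched as follows. This is an illustration under stated assumptions rather than a transcription of Equations (2) and (3): absolute differences are assumed for the "difference" terms, the 1/N² factor is optional as described, and the function names are hypothetical.

```python
from typing import Callable, List

Window = List[List[float]]


def variation_from_mean(window: Window, normalize: bool = False) -> float:
    """First method: sum of |sample - window mean| over an N x N window."""
    n = len(window)
    samples = [v for row in window for v in row]
    mean = sum(samples) / len(samples)
    total = sum(abs(v - mean) for v in samples)
    return total / (n * n) if normalize else total


def variation_from_center(window: Window, normalize: bool = False) -> float:
    """Second method: sum of |sample - center sample| over an N x N window."""
    n = len(window)
    center = window[n // 2][n // 2]  # target sample at the intermediate position
    total = sum(abs(v - center) for row in window for v in row)
    return total / (n * n) if normalize else total


def apply_bpbf_to_sample(window: Window, threshold: float,
                         metric: Callable[[Window], float] = variation_from_mean) -> bool:
    """BPBF is skipped for the sample when the variation exceeds the threshold."""
    return metric(window) <= threshold
```

For a 3×3 window that is flat except for a center value of 10 among values of 1, the first method yields 16 and the second yields 72, so the second method reacts more strongly to an isolated target sample.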

Further, the decoder can compute the amount of change in the window by dividing the horizontal axis and the vertical axis. In this case, the decoder can change the shape of the Wiener filter in consideration of the amount of change in each direction, and apply the changed Wiener filter to the prediction block. That is, the decoder can determine the shape of the Wiener filter according to the variation of the sample value in the horizontal or vertical direction.

For example, when a 5×5 window is used, the decoder may change the shape of the Wiener filter from 5×5 to 7×5 when the variation along the horizontal axis (or in the horizontal direction) is large. Further, when the variation along the vertical axis (or in the vertical direction) is large, the shape of the Wiener filter can be changed from 5×5 to 5×7.

In addition, the directionality of the motion vector can be considered as a method for determining the shape of the Wiener filter. For example, if a 5×5 window is used and the x value of the motion vector (x, y) is large, the shape of the Wiener filter may be changed from 5×5 to 7×5. When the y value of the motion vector is large, the shape of the Wiener filter can be changed from 5×5 to 5×7.
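The shape selection can be sketched as below. The text does not define when a variation or motion vector component counts as "large", so the `threshold` parameter, the neighbour-difference measure of directional variation, and the tie-breaking rule are all assumptions made for illustration. Shapes are given as (horizontal taps, vertical taps).

```python
from typing import List, Tuple

Window = List[List[float]]
Shape = Tuple[int, int]  # (horizontal taps, vertical taps)


def directional_variation(window: Window) -> Tuple[float, float]:
    """Sum of absolute neighbour differences along rows and along columns (assumed measure)."""
    horiz = sum(abs(row[x + 1] - row[x]) for row in window for x in range(len(row) - 1))
    vert = sum(abs(window[y + 1][x] - window[y][x])
               for y in range(len(window) - 1) for x in range(len(window[0])))
    return horiz, vert


def filter_shape(horiz: float, vert: float, threshold: float, base: Shape = (5, 5)) -> Shape:
    """Extend the filter by two taps along the direction with the larger variation."""
    w, h = base
    if horiz > threshold and horiz >= vert:
        return (w + 2, h)  # e.g. 5x5 -> 7x5
    if vert > threshold and vert > horiz:
        return (w, h + 2)  # e.g. 5x5 -> 5x7
    return base


def filter_shape_from_mv(mv_x: int, mv_y: int, threshold: float, base: Shape = (5, 5)) -> Shape:
    """Same rule driven by the magnitudes of the motion vector components."""
    return filter_shape(abs(mv_x), abs(mv_y), threshold, base)
```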

The encoder / decoder determines whether to apply the bi-prediction-based filtering to the first prediction block and the second prediction block of the current block (S13010).

The bi-prediction-based filtering corresponds to filtering (i.e., BPBF filtering) for approximating each of the first and second prediction blocks to an average block generated based on the first prediction block and the second prediction block. For more information on the bi-prediction-based filtering, see the description of BPBF filtering above.

The first prediction block is generated by performing inter prediction on the basis of the list 0 reference picture, and the second prediction block is generated by performing inter prediction on the basis of the list 1 reference picture.

After that, if the encoder / decoder determines to perform the bi-prediction-based filtering, the bi-prediction-based filtering is applied to the first and second prediction blocks (S13020). For a specific description of the process of applying the bi-prediction-based filtering (BPBF filtering), refer to the description of FIG. 12 above.

Thereafter, the encoder / decoder generates a final prediction block of the current block using the filtered first prediction block and the filtered second prediction block (S13030). The encoder can generate a residual block using the final prediction block, and the decoder can generate a reconstructed block using the final prediction block.

Referring to Fig. 14, the inter prediction unit implements the functions, processes and / or methods proposed in the description related to Figs. 11 to 13 above. Specifically, the inter prediction unit may include a filtering determination unit 14010, a filtering application unit 14020, and a prediction block generation unit 14030.

The filtering determination unit 14010 may determine whether to apply the bi-prediction-based filtering to the first prediction block and the second prediction block of the current block. The bi-prediction-based filtering may correspond to filtering (i.e., BPBF filtering) for approximating each of the first and second prediction blocks to an average block generated based on the first prediction block and the second prediction block.

The filtering application unit 14020 may apply the bi-prediction-based filtering to the first prediction block and the second prediction block, respectively, if it is determined that the bi-prediction-based filtering is applied. Here, the first prediction block may be generated by performing inter prediction on the basis of the list 0 reference picture, and the second prediction block may be generated by performing inter prediction on the basis of the list 1 reference picture.

The prediction block generation unit 14030 may generate a final prediction block of the current block using the filtered first prediction block and the filtered second prediction block.

The filtering application unit 14020 may generate the average block using the first prediction block and the second prediction block, derive first Wiener filter coefficients that minimize a difference between the first prediction block and the average block, derive second Wiener filter coefficients that minimize a difference between the second prediction block and the average block, filter the first prediction block using the derived first Wiener filter coefficients, and filter the second prediction block using the derived second Wiener filter coefficients.

In generating the average block, the filtering application unit 14020 may generate a first interpolation block based on the size of the first prediction block and the number of taps of the Wiener filter, generate a second interpolation block based on the size of the second prediction block and the number of taps of the Wiener filter, and generate a mean value of the first interpolation block and the second interpolation block as the average block.

The filtering determination unit 14010 can obtain a bi-prediction-based filtering flag indicating whether the bi-prediction-based filtering is applied when the AMVP mode is applied to the current block. Then, the filtering determination unit 14010 can determine to apply the bi-prediction-based filtering to the first and second prediction blocks when the bi-prediction-based filtering flag indicates that the bi-prediction-based filtering is applied to the current block.

In one embodiment, when the merge mode is applied to the current block, the filtering determination unit 14010 may construct a merge candidate list based on motion information of neighboring blocks of the current block. Thereafter, the filtering determination unit 14010 may obtain a merge index indicating the selected merge candidate, and determine whether to apply the bi-prediction-based filtering to the first prediction block and the second prediction block based on the selected merge candidate indicated by the merge index.

In one embodiment, if the selected merge candidate is a merge candidate generated by combining other merge candidates, a zero motion vector, a candidate derived in units of subblocks, or a temporal merge candidate, the bi-prediction-based filtering may not be applied to the first prediction block and the second prediction block.

In one embodiment, the filtering determination unit 14010 may determine to apply the bi-prediction-based filtering to the first and second prediction blocks when the bi-prediction-based filtering is applied to the selected merge candidate.

In one embodiment, the filtering determination unit 14010 may determine to apply the bi-prediction-based filtering to the first and second prediction blocks when the current block is predicted in a bi-prediction mode.

In one embodiment, if the list 0 reference picture and the list 1 reference picture temporally correspond to a reference picture output before the current picture and a reference picture output after the current picture, respectively, the filtering determination unit 14010 may determine to apply the bi-prediction-based filtering to the first prediction block and the second prediction block.

In one embodiment, the filtering determination unit 14010 may determine to apply the bi-prediction-based filtering to the first and second prediction blocks if the size of the current block is greater than a predetermined threshold value.

In one embodiment, the filtering determination unit 14010 may determine whether to apply bi-prediction-based filtering of the current sample based on the amount of change of the sample value within a specific area around the current sample. The variation amount of the sample value may include a variation amount of the horizontal direction sample value in the specific region and a variation amount of the vertical direction sample value.

For example, the filtering determination unit 14010 can decide not to apply the bi-prediction-based filtering to the current sample if the sum of the difference values between the average of the sample values in the specific region and the respective sample values in the specific region is greater than a predetermined threshold value. Further, the filtering determination unit 14010 can decide not to apply the bi-prediction-based filtering to the current sample if the sum of the difference values between the sample value at the intermediate position in the specific region and each sample value within the specific region is greater than a predetermined threshold value. Here, the threshold value may be determined based on the quantization parameter value.

In one embodiment, the filtering determination unit 14010 can change the shape of the Wiener filter according to whichever of the horizontal-direction and vertical-direction sample value variations is larger.

Referring to FIG. 15, the content streaming system to which the present invention is applied may include an encoding server, a streaming server, a web server, a media repository, a user device, and a multimedia input device.

The encoding server compresses content input from multimedia input devices such as a smart phone, a camera, and a camcorder into digital data to generate a bit stream and transmit the bit stream to the streaming server. As another example, when a multimedia input device such as a smart phone, a camera, a camcorder, or the like directly generates a bitstream, the encoding server may be omitted.

The bitstream may be generated by an encoding method or a bitstream generating method to which the present invention is applied, and the streaming server may temporarily store the bitstream in the process of transmitting or receiving the bitstream.

The streaming server transmits multimedia data to a user device based on a user request through the web server, and the web server serves as a medium for informing the user of what services are available. When a user requests a desired service to the web server, the web server delivers it to the streaming server, and the streaming server transmits the multimedia data to the user. At this time, the content streaming system may include a separate control server. In this case, the control server controls commands / responses among the devices in the content streaming system.

The streaming server may receive content from a media repository and / or an encoding server. For example, when receiving the content from the encoding server, the content can be received in real time. In this case, in order to provide a smooth streaming service, the streaming server can store the bit stream for a predetermined time.

Examples of the user device include a mobile phone, a smart phone, a laptop computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, a slate PC, a tablet PC, an ultrabook, a wearable device (e.g., a smartwatch, smart glasses, or a head mounted display (HMD)), a digital TV, a desktop computer, and digital signage.

Each of the servers in the content streaming system can be operated as a distributed server. In this case, data received at each server can be distributed.

As described above, the embodiments described in the present invention can be implemented and executed on a processor, a microprocessor, a controller, or a chip. For example, the functional units depicted in the figures may be implemented and executed on a computer, processor, microprocessor, controller, or chip.

In addition, the decoder and encoder to which the present invention is applied can be included in a multimedia broadcasting transmitting and receiving device, a mobile communication terminal, a home cinema video device, a digital cinema video device, a surveillance camera, a video chatting device, a storage medium, a camcorder, a video on demand (VoD) service provision device, an OTT (over-the-top) video device, a three-dimensional (3D) video device, a video telephony device, a medical video device, and the like, and may be used to process video signals or data signals. For example, the OTT video device may include a game console, a Blu-ray player, an Internet access TV, a home theater system, a smart phone, a tablet PC, a DVR (Digital Video Recorder), and the like.

Further, the processing method to which the present invention is applied may be produced in the form of a computer-executed program, and may be stored in a computer-readable recording medium. The multimedia data having the data structure according to the present invention can also be stored in a computer-readable recording medium. The computer-readable recording medium includes all kinds of storage devices and distributed storage devices in which computer-readable data is stored. The computer-readable recording medium may be, for example, a Blu-ray Disc (BD), a Universal Serial Bus (USB), a ROM, a PROM, an EPROM, an EEPROM, a RAM, a CD- Data storage devices. In addition, the computer-readable recording medium includes media implemented in the form of a carrier wave (for example, transmission over the Internet). In addition, the bit stream generated by the encoding method can be stored in a computer-readable recording medium or transmitted over a wired or wireless communication network.

Further, an embodiment of the present invention may be embodied as a computer program product by program code, and the program code may be executed in a computer according to an embodiment of the present invention. The program code may be stored on a carrier readable by a computer.

Embodiments in accordance with the present invention may be implemented by various means, for example, hardware, firmware, software, or a combination thereof. In the case of hardware implementation, an embodiment of the present invention may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.

In the case of an implementation by firmware or software, an embodiment of the present invention may be implemented in the form of a module, a procedure, a function, or the like for performing the functions or operations described above. The software code can be stored in memory and driven by the processor. The memory is located inside or outside the processor and can exchange data with the processor by various means already known.

It will be apparent to those skilled in the art that various modifications, variations, substitutions, or additions can be made to the present invention without departing from the spirit or scope of the invention as defined by the appended claims.

Claims

In an inter prediction-based image processing method,

Determining whether to apply bi-prediction-based filtering to a first prediction block and a second prediction block of a current block;

Applying the bi-prediction-based filtering to the first prediction block and the second prediction block if it is determined to apply the bi-prediction-based filtering; And

Generating a final predicted block of the current block using the filtered first predictive block and the filtered second predictive block,

Wherein the bi-prediction-based filtering is filtering for approximating each of the first and second prediction blocks to an average block generated based on the first prediction block and the second prediction block, wherein the first prediction block is generated by performing inter prediction based on a list 0 reference picture, and the second prediction block is generated by performing inter prediction based on a list 1 reference picture.
The method according to claim 1,

Wherein the applying the bi-prediction-based filtering to the first prediction block and the second prediction block comprises:

Generating the average block using the first prediction block and the second prediction block;

Deriving first Wiener filter coefficients that minimize a difference between the first prediction block and the average block;

Deriving second Wiener filter coefficients that minimize a difference between the second prediction block and the average block;

Filtering the first prediction block using the derived first Wiener filter coefficients; And

And filtering the second prediction block using the derived second Wiener filter coefficients.
3. The method of claim 2,

Wherein generating the average block comprises:

Generating a first interpolation block based on the size of the first prediction block and the number of taps of the Wiener filter;

Generating a second interpolation block based on the size of the second prediction block and the number of taps of the Wiener filter; And

And generating an average value of the first interpolation block and the second interpolation block as the average block.
The method according to claim 1,

Wherein the step of determining whether to apply the bi-prediction-based filtering comprises:

Obtaining a bi-prediction-based filtering flag indicating whether to apply the bi-prediction-based filtering when the AMVP mode is applied to the current block, wherein the AMVP mode is a mode for deriving a motion vector prediction value of a block; And

Determining to apply the bi-prediction-based filtering to the first and second prediction blocks when the bi-prediction-based filtering flag indicates that the bi-prediction-based filtering is applied to the current block.
The method according to claim 1,

Wherein the step of determining whether to apply the bi-prediction-based filtering comprises:

Constructing a merge candidate list based on motion information of neighboring blocks of the current block when a merge mode is applied to the current block, wherein the merge mode is a mode in which the motion information of the current block is derived using that motion information;

Obtaining a merge index indicating the selected merge candidate; And

Determining whether to apply the bi-prediction-based filtering to the first and second prediction blocks based on the selected merge candidate indicated by the merge index.
6. The method of claim 5,

Wherein the bi-prediction-based filtering is not applied to the first and second prediction blocks if the selected merge candidate is a merge candidate generated by combining other merge candidates, a zero motion vector, a candidate derived in units of subblocks, or a temporal merge candidate.
6. The method of claim 5,

Wherein the bi-prediction based filtering is applied to the first prediction block and the second prediction block when the bi-prediction based filtering is applied to the selected merge candidate.
The method according to claim 1,

Wherein the bi-prediction-based filtering is applied to the first and second prediction blocks when the current block is predicted in a bi-prediction mode.
The method according to claim 1,

Wherein the bi-prediction-based filtering is applied to the first and second prediction blocks when the list 0 reference picture and the list 1 reference picture temporally correspond to a reference picture output before the current picture and a reference picture output after the current picture, respectively.
The method according to claim 1,

Wherein the bi-prediction-based filtering is applied to the first and second prediction blocks if the size of the current block is greater than a predetermined threshold.
In an inter prediction-based image processing apparatus,

A filtering determination unit for determining whether to apply bi-prediction-based filtering to a first prediction block and a second prediction block of a current block;

A filtering application unit for applying the bi-prediction-based filtering to each of the first and second prediction blocks if it is determined to apply the bi-prediction-based filtering; And

A prediction block generation unit for generating a final prediction block of the current block using the filtered first prediction block and the filtered second prediction block,

Wherein the bi-prediction-based filtering is filtering for approximating each of the first and second prediction blocks to an average block generated based on the first prediction block and the second prediction block, wherein the first prediction block is generated by performing inter prediction based on a list 0 reference picture, and the second prediction block is generated by performing inter prediction based on a list 1 reference picture.