CN109644276B - Image encoding/decoding method

Info

Publication number: CN109644276B
Application number: CN201780048129.1A
Authority: CN (China)
Prior art keywords: block, current block, transform, information, size
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN109644276A
Inventors: 全东山, 李镇浩, 姜晶媛, 高玄硕, 林成昶, 李河贤, 赵承眩, 金晖容, 崔振秀
Assignee (current and original): Electronics and Telecommunications Research Institute (ETRI)
Related applications claiming priority: CN116016910A, CN115052143A, CN115052142A, CN115914625A

Classifications

    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/12 Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H04N19/124 Quantisation
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/139 Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Abstract

The present invention relates to a method for motion compensation using motion vector prediction. An image decoding method according to the invention may include: obtaining a quantized residual signal for a current block; inverse-quantizing the quantized residual signal; and determining a transform scheme for inverse-transforming the residual signal. The inverse transform includes a primary transform and a secondary transform, and at least one of the primary transform scheme and the secondary transform scheme may be derived from the already-decoded reconstructed blocks around the current block.

Description

Image encoding/decoding method
Technical Field
The present invention relates to a method and apparatus for encoding/decoding an image. More particularly, the present invention relates to a method and apparatus for deriving coding information of a current block by using coding information of neighboring blocks.
Background
Recently, demand for high-resolution, high-quality images, such as High Definition (HD) images and Ultra High Definition (UHD) images, has increased in various application fields. However, higher-resolution, higher-quality image data involves a greater amount of data than conventional image data. Therefore, when image data is transmitted over a conventional wired or wireless broadband network, or stored on a conventional storage medium, transmission and storage costs increase. To solve these problems, which arise as the resolution and quality of image data improve, an efficient image encoding/decoding technique for higher-resolution, higher-quality images is required.
Image compression technology includes various techniques: an inter prediction technique of predicting pixel values included in a current picture from a previous or subsequent picture of the current picture; an intra prediction technique of predicting pixel values included in a current picture by using pixel information in the current picture; transform and quantization techniques for compacting the energy of the residual signal; an entropy coding technique that assigns short codes to values with a high frequency of occurrence and long codes to values with a low frequency of occurrence; and so on. Image data can be efficiently compressed, and then transmitted or stored, by using such image compression techniques.
In conventional motion compensation, only spatial motion vector candidates, temporal motion vector candidates, and a zero motion vector candidate are added to the motion vector candidate list, and only unidirectional prediction and bidirectional prediction are used, so there is a limit to improving encoding efficiency.
Disclosure of Invention
Technical problem
An object of the present invention is to provide a method and apparatus for deriving coding information of a current block from a reconstructed block adjacent to the current block.
Another object of the present invention is to provide a method and apparatus for encoding/decoding a difference between the motion vector difference of a block adjacent to a current block and the motion vector difference of the current block.
Technical scheme
According to the present invention, an image encoding method includes: generating a prediction signal of a current block; generating a residual signal of the current block based on the prediction signal; determining a transformation scheme for transforming the residual signal; and performing quantization on the residual signal. Here, the transform includes a primary transform and a secondary transform, and at least one of the primary transform scheme and the secondary transform scheme is derived from an encoded reconstructed block adjacent to the current block.
According to the present invention, an image decoding method includes: obtaining a quantized residual signal of a current block; performing inverse quantization on the quantized residual signal; and determining a transform scheme for inverse transforming the residual signal. Here, the inverse transform includes a primary transform and a secondary transform, and at least one of the primary transform scheme and the secondary transform scheme is derived from a decoded reconstructed block adjacent to the current block.
In the image encoding method or the image decoding method, when the prediction signal is generated through intra prediction, at least one of the primary transform scheme and the secondary transform scheme may be derived from a neighboring block having the same intra prediction mode as that of the current block.
In the image encoding method or the image decoding method, when a primary transform scheme of a neighboring block having the same intra prediction mode as that of the current block represents a transform skip, the primary transform scheme and a secondary transform scheme of the current block may be determined as the transform skip.
In the image encoding method or the image decoding method, the secondary transform scheme may be derived from a neighboring block whose primary transform scheme is the same as the primary transform scheme of the current block.
In the image encoding method or the image decoding method, when the prediction signal is generated through inter prediction, at least one of the primary transform scheme and the secondary transform scheme may be derived from a neighboring block having the same motion information as that of the current block.
In the image encoding method or the image decoding method, the motion information may include at least one of a motion vector, a reference picture index, and a reference picture direction.
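To make the derivation rules above concrete, the following is a minimal decoder-side sketch in Python. The Block fields and the candidate ordering are illustrative assumptions for this sketch only; they are not the patent's normative syntax.

```python
# A minimal sketch of deriving the current block's transform schemes from an
# already-reconstructed neighboring block, following the rules above.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Block:
    is_intra: bool
    intra_mode: Optional[int]          # intra prediction mode, if intra-coded
    motion_info: Optional[tuple]       # (motion vector, ref index, direction)
    primary_tx: str                    # e.g. "DCT-II", "DST-VII", "skip"
    secondary_tx: str

def derive_transform_schemes(current: Block,
                             neighbors: List[Block]) -> Optional[Tuple[str, str]]:
    for nb in neighbors:
        if current.is_intra and nb.is_intra and nb.intra_mode == current.intra_mode:
            # If the matching neighbor skipped the primary transform, both the
            # primary and secondary transform schemes of the current block are
            # determined as transform skip.
            if nb.primary_tx == "skip":
                return "skip", "skip"
            return nb.primary_tx, nb.secondary_tx
        if (not current.is_intra and not nb.is_intra
                and nb.motion_info == current.motion_info):
            return nb.primary_tx, nb.secondary_tx
    return None  # no matching neighbor: the schemes would be signaled explicitly
```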
Technical effects
According to the present invention, encoding/decoding efficiency can be improved by providing a method and apparatus for deriving encoding information of a current block from a reconstructed block adjacent to the current block.
According to the present invention, encoding/decoding efficiency can be improved by providing a method and apparatus for encoding/decoding a difference between the motion vector difference of a block adjacent to a current block and the motion vector difference of the current block.
Drawings
Fig. 1 is a block diagram showing a configuration of an encoding apparatus according to an embodiment of the present invention.
Fig. 2 is a block diagram showing a configuration of a decoding apparatus according to an embodiment of the present invention.
Fig. 3 is a diagram schematically showing a partition structure of an image when the image is encoded and decoded.
Fig. 4 is a diagram illustrating a form of a Prediction Unit (PU) that may be included in a Coding Unit (CU).
Fig. 5 is a diagram illustrating a form of a Transform Unit (TU) that may be included in a Coding Unit (CU).
Fig. 6 is a diagram for explaining an embodiment of a process of intra prediction.
Fig. 7 is a diagram for explaining an embodiment of a process of inter prediction.
Fig. 8 is a diagram for explaining a transform set according to an intra prediction mode.
Fig. 9 is a diagram for explaining the process of transformation.
Fig. 10 is a diagram for explaining scanning of quantized transform coefficients.
Fig. 11 is a diagram for explaining block partitioning.
Fig. 12 is a diagram illustrating an example of an encoding/decoding unit according to a partition form of a block.
Fig. 13 is a flowchart showing a process of determining whether to decode information of a binary tree partition.
Fig. 14 is a flowchart showing a process of determining whether to decode information of a binary tree partition.
Fig. 15 to 17 are diagrams showing an example of a case where binary tree partitioning is no longer performed for a block having a predetermined size or less.
Fig. 18 is a flowchart illustrating a process of determining whether to derive encoding information of a residual signal of a current block from neighboring blocks when the current block is encoded through intra prediction.
Fig. 19 is a flowchart illustrating a process of determining whether to derive encoding information of a residual signal of a current block from neighboring blocks when the current block is encoded through inter prediction.
Fig. 20 is a flowchart showing a decoding process of a motion vector of a current block.
Fig. 21 is a diagram showing an example of deriving a spatial motion vector candidate.
Fig. 22 is a diagram showing an example of deriving a temporal motion vector candidate.
Fig. 23 is a diagram illustrating the derivation of the second motion vector difference.
Detailed Description
Modes for carrying out the invention
Various modifications may be made to the present invention, and there are various embodiments of the present invention, examples of which will now be provided with reference to the accompanying drawings and described in detail. However, the present invention is not limited to these exemplary embodiments, and should be construed as including all modifications, equivalents, or alternatives within its technical spirit and scope. Like reference numerals refer to the same or similar functionality throughout. In the drawings, the shapes and sizes of elements may be exaggerated for clarity. In the following detailed description of the present invention, reference is made to the accompanying drawings that show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure. It is to be understood that the various embodiments of the disclosure, although different, are not necessarily mutually exclusive. For example, a particular feature, structure, or characteristic described herein in connection with one embodiment may be implemented within other embodiments without departing from the spirit and scope of the disclosure. In addition, it is to be understood that the location or arrangement of individual elements within each disclosed embodiment may be modified without departing from the spirit and scope of the disclosure. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present disclosure is defined only by the appended claims, appropriately interpreted, along with the full range of equivalents to which the claims are entitled.
The terms "first", "second", and the like, as used in the specification may be used to describe various components, but these components should not be construed as limited by the terms. The terms are used only to distinguish one component from another. For example, a "first" component may be termed a "second" component, and a "second" component may be similarly termed a "first" component, without departing from the scope of the present invention. The term "and/or" includes a combination of items or any of items.
It will be understood that, in the present specification, when an element is referred to simply as being "connected to" or "coupled to" another element, rather than "directly connected to" or "directly coupled to" another element, it can be directly connected or coupled to the other element, or intervening elements may be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, there are no intervening elements present.
Further, the constituent elements shown in the embodiments of the present invention are shown independently so as to exhibit their characteristic functions, which differ from one another. This does not mean that each constituent element is constituted by a separate hardware or software unit; each constituent element is merely listed separately for convenience. At least two of the constituent elements may be combined to form one element, or one element may be divided into a plurality of elements that each perform part of its function. An embodiment in which constituent elements are combined and an embodiment in which one element is divided are also included in the scope of the present invention, provided they do not depart from its essence.
The terminology used in the specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. An expression used in the singular encompasses the plural, unless it has a clearly different meaning in the context. In this specification, it will be understood that terms such as "comprises", "comprising", "has", "having", and the like are intended to specify the presence of stated features, quantities, steps, acts, elements, components, or combinations thereof, and do not preclude the presence or addition of one or more other features, quantities, steps, acts, elements, components, or combinations thereof. In other words, when a specific element is referred to as being "included", elements other than that element are not excluded; additional elements may be included in the embodiments or the scope of the present invention.
Further, some constituent elements may not be indispensable elements that perform the essential functions of the present invention, but optional elements that merely enhance its performance. The present invention may be implemented by including only the indispensable elements needed to realize its essence, excluding the elements used merely to enhance performance. A structure including only the indispensable elements, excluding the optional elements used only to enhance performance, is also included in the scope of the present invention.
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In describing exemplary embodiments of the present invention, well-known functions or constructions are not described in detail since they would unnecessarily obscure the understanding of the present invention. The same constituent elements in the drawings are denoted by the same reference numerals, and a repetitive description of the same elements will be omitted.
Further, hereinafter, an image may mean a picture constituting a video, or may mean the video itself. For example, "encoding or decoding, or encoding and decoding, an image" may mean "encoding or decoding, or encoding and decoding, a video", and may mean "encoding or decoding, or encoding and decoding, one image among the images of a video". Here, a picture and an image may have the same meaning.
Description of the terms
Encoder: may mean a device that performs encoding.
Decoder: may mean a device that performs decoding.
Parsing: may mean determining a value of a syntax element by performing entropy decoding, or may mean entropy decoding itself.
Block: may mean an M×N matrix of samples. Here, M and N are positive integers, and a block may mean a sample matrix in two-dimensional form.
Sample: is the basic unit of a block, and may indicate a value ranging from 0 to 2^Bd − 1 depending on the bit depth (Bd); for example, 0 to 255 when Bd is 8. A sample may be referred to as a pixel in the present invention.
Unit: may mean a unit of encoding and decoding an image. In encoding and decoding an image, a unit may be a region generated by partitioning one image. In addition, a unit may mean a sub-divided unit when one image is partitioned into a plurality of sub-divided units during encoding or decoding. In encoding and decoding an image, predetermined processing may be performed for each unit. A unit may be partitioned into sub-units having a size smaller than that of the unit. Depending on its function, a unit may mean a block, a macroblock, a coding tree unit, a coding tree block, a coding unit, a coding block, a prediction unit, a prediction block, a transform unit, a transform block, and the like. Further, in order to distinguish a unit from a block, a unit may include a luminance component block, the corresponding chrominance component blocks, and a syntax element for each color component block. Units may have various sizes and shapes; in particular, the shape of a unit may be a two-dimensional geometric figure, such as a rectangle, square, trapezoid, triangle, pentagon, and the like. In addition, unit information may include at least one of a unit type (indicating a coding unit, a prediction unit, a transform unit, etc.), a unit size, a unit depth, an order of encoding and decoding the unit, etc.
Reconstructed neighboring unit: may mean a unit that has already been encoded or decoded, and thus reconstructed, spatially or temporally around the encoding/decoding target unit. A reconstructed neighboring unit may also mean a reconstructed neighboring block.
Neighboring block: may mean a block adjacent to the encoding/decoding target block. A block adjacent to the encoding/decoding target block may mean a block whose boundary is in contact with the encoding/decoding target block. A neighboring block may mean a block located at an adjacent vertex of the encoding/decoding target block. A neighboring block may also mean a reconstructed neighboring block.
Unit depth: may mean the degree to which a unit is partitioned. In a tree structure, the root node is the highest node, and a leaf node is the lowest node.
Symbol: may mean syntax elements of the encoding/decoding target unit, encoding parameters, values of transform coefficients, and the like.
Parameter set: may mean header information in the structure of the bitstream. The parameter set may comprise at least one of a video parameter set, a sequence parameter set, a picture parameter set, or an adaptation parameter set. Also, the parameter set may mean slice header information, parallel block (tile) header information, and the like.
Bitstream: may mean a bit string including encoded image information.
Prediction unit: may mean a basic unit when performing inter prediction or intra prediction, and compensation for the prediction. One prediction unit may be partitioned into a plurality of partitions. In this case, each of the partitions may be a basic unit when performing prediction and compensation, and each partition obtained from the prediction unit may itself be a prediction unit. In addition, one prediction unit may be partitioned into a plurality of smaller prediction units. A prediction unit may have various sizes and shapes; in particular, the shape of a prediction unit may be a two-dimensional geometric figure, such as a rectangle, square, trapezoid, triangle, pentagon, and the like.
Prediction unit partition: may mean the shape into which a prediction unit is partitioned.
Reference picture list: may mean a list including at least one reference picture, where the reference pictures are used for inter prediction or motion compensation. The types of reference picture lists may be List Combined (LC), List 0 (L0), List 1 (L1), List 2 (L2), List 3 (L3), etc. At least one reference picture list may be used for inter prediction.
Inter prediction indicator: may mean one of the following: an inter prediction direction (unidirectional prediction, bidirectional prediction, etc.) of an encoding/decoding target block in the case of inter prediction, the number of reference pictures used to generate a prediction block by the encoding/decoding target block, and the number of reference blocks used to perform inter prediction or motion compensation by the encoding/decoding target block.
Reference picture index: may mean an index indicating a particular reference picture in a reference picture list.
Reference picture: may mean a picture that a particular unit refers to for inter prediction or motion compensation. A reference picture may also be referred to as a reference image.
Motion vector: is a two-dimensional vector for inter prediction or motion compensation, and may mean an offset between an encoding/decoding target picture and a reference picture. For example, (mvX, mvY) may indicate a motion vector, mvX may indicate a horizontal component, and mvY may indicate a vertical component.
Motion vector candidate: may mean a unit that becomes a prediction candidate when predicting a motion vector, or may mean the motion vector of that unit.
Motion vector candidate list: may mean a list configured by using motion vector candidates.
Motion vector candidate index: may mean an indicator indicating a motion vector candidate in the motion vector candidate list. A motion vector candidate index may also be referred to as an index of a motion vector predictor.
Motion information: may mean information including at least one of a motion vector, a reference picture index, an inter prediction indicator, reference picture list information, a reference picture, a motion vector candidate index, and the like.
Merge candidate list: may mean a list configured by using merge candidates.
Merge candidate: may include a spatial merge candidate, a temporal merge candidate, a combined bi-predictive merge candidate, a zero merge candidate, etc. A merge candidate may include motion information such as prediction type information, a reference picture index for each list, a motion vector, and the like.
Merge index: may mean information indicating a merge candidate in the merge candidate list. A merge index may also indicate the block, among the reconstructed blocks spatially/temporally adjacent to the current block, from which the merge candidate is derived. Further, a merge index may indicate at least one piece of the motion information of a merge candidate.
Transform unit: may mean a basic unit when performing encoding/decoding of a residual signal, such as transform, inverse transform, quantization, inverse quantization, and transform coefficient encoding/decoding. One transform unit may be partitioned into a plurality of smaller transform units. A transform unit may have various sizes and shapes; in particular, the shape of a transform unit may be a two-dimensional geometric figure, such as a rectangle, square, trapezoid, triangle, pentagon, and the like.
Scaling: may mean a process of multiplying a transform coefficient level by a factor, as a result of which transform coefficients are generated. Scaling may also be referred to as inverse quantization.
Quantization parameter: may mean a value used in scaling transform coefficient levels during quantization and inverse quantization. A quantization parameter may be a value mapped to a quantization step size.
Delta quantization parameter: may mean a difference between the quantization parameter of the encoding/decoding target unit and the predicted quantization parameter.
Scanning: may mean a method of ordering the coefficients within a block or matrix. For example, ordering a two-dimensional matrix into a one-dimensional matrix may be referred to as scanning, and ordering a one-dimensional matrix back into a two-dimensional matrix may also be referred to as scanning, or as inverse scanning.
Transform coefficient: may mean a coefficient value produced after the transform is performed. In the present invention, a quantized transform coefficient level (i.e., a transform coefficient to which quantization has been applied) may also be referred to as a transform coefficient.
Non-zero transform coefficient: may mean a transform coefficient whose value is not 0, or a transform coefficient level whose value is not 0.
Quantization matrix: may mean a matrix used in quantization and inverse quantization in order to improve the subjective or objective quality of an image. A quantization matrix may also be referred to as a scaling list.
Quantization matrix coefficient: may mean each element of a quantization matrix. A quantization matrix coefficient may also be referred to as a matrix coefficient.
Default matrix: may mean a quantization matrix predefined in the encoder and the decoder.
Non-default matrix: may mean a quantization matrix that is not predefined in the encoder and the decoder but is transmitted/received by a user.
Coding tree unit: may consist of one luma component (Y) coding tree block and two associated chroma component (Cb, Cr) coding tree blocks. Each coding tree unit may be partitioned by using at least one partitioning method (such as a quadtree, a binary tree, etc.) to form sub-units such as coding units, prediction units, transform units, etc. A coding tree unit may be used as a term for indicating a pixel block, which is a processing unit in the decoding/encoding process of an image, such as a partition of an input image.
Coding tree block: may be used as a term for indicating one of a Y coding tree block, a Cb coding tree block, and a Cr coding tree block.
Fig. 1 is a block diagram showing a configuration of an encoding apparatus according to an embodiment of the present invention.
The encoding apparatus 100 may be a video encoding apparatus or an image encoding apparatus. The video may include one or more images. The encoding apparatus 100 may encode one or more images of a video in a temporal order.
Referring to fig. 1, the encoding apparatus 100 may include a motion prediction unit 111, a motion compensation unit 112, an intra prediction unit 120, a switch 115, a subtractor 125, a transform unit 130, a quantization unit 140, an entropy encoding unit 150, an inverse quantization unit 160, an inverse transform unit 170, an adder 175, a filter unit 180, and a reference picture buffer 190.
The encoding apparatus 100 may encode an input picture in an intra mode or an inter mode, or both. Also, the encoding apparatus 100 may generate a bitstream by encoding the input picture, and may output the generated bitstream. When the intra mode is used as the prediction mode, the switch 115 may switch to intra. When the inter mode is used as the prediction mode, the switch 115 may switch to inter. Here, the intra mode may be referred to as an intra prediction mode, and the inter mode may be referred to as an inter prediction mode. The encoding apparatus 100 may generate a prediction block for an input block of the input picture. Further, after the prediction block is generated, the encoding apparatus 100 may encode the residual between the input block and the prediction block. The input picture may be referred to as the current image, which is the target of current encoding. The input block may be referred to as the current block, or as the encoding target block, which is the target of current encoding.
When the prediction mode is an intra mode, the intra prediction unit 120 may use pixel values of previously encoded blocks adjacent to the current block as reference pixels. The intra prediction unit 120 may perform spatial prediction by using the reference pixels and may generate prediction samples of the input block by using the spatial prediction. Here, the intra prediction may mean intra frame prediction.
When the prediction mode is the inter mode, the motion prediction unit 111 may search for a region that best matches the input block from a reference picture in the motion prediction process, and may derive a motion vector by using the searched region. The reference pictures may be stored in the reference picture buffer 190.
The motion compensation unit 112 may generate the prediction block by performing motion compensation using the motion vector. Here, the motion vector may be a two-dimensional vector for inter prediction. Further, the motion vector may indicate an offset between the current picture and the reference picture. Here, the inter prediction may mean inter frame prediction.
When the value of the motion vector is not an integer, the motion prediction unit 111 and the motion compensation unit 112 may generate the prediction block by applying an interpolation filter to a partial region of the reference picture. When inter prediction or motion compensation is performed on a coding unit basis, it may be determined which of a skip mode, a merge mode, an AMVP mode, and a current picture reference mode is used as the motion prediction and compensation method for a prediction unit in the coding unit, and inter prediction or motion compensation may be performed according to the determined mode. Here, the current picture reference mode may mean a prediction mode that uses a previously reconstructed region of the current picture, which includes the encoding target block. A motion vector for the current picture reference mode may be defined to indicate the previously reconstructed region. Whether the encoding target block is encoded in the current picture reference mode may be signaled by using the reference picture index of the encoding target block.
The subtractor 125 may generate a residual block by using a residual between the input block and the prediction block. The residual block may be referred to as a residual signal.
The transform unit 130 may generate transform coefficients by transforming the residual block and may output the transform coefficients. Here, the transform coefficient may be a coefficient value generated by transforming the residual block. In the transform skip mode, transform unit 130 may skip the transform of the residual block.
The quantized transform coefficient levels may be generated by applying quantization to the transform coefficients. Hereinafter, in an embodiment of the present invention, the quantized transform coefficient levels may be referred to as transform coefficients.
The quantization unit 140 may generate quantized transform coefficient levels by quantizing the transform coefficients according to a quantization parameter, and may output the quantized transform coefficient levels. Here, the quantization unit 140 may quantize the transform coefficient by using the quantization matrix.
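As an illustration of how a quantization parameter maps to a step size, the sketch below assumes the HEVC-style relationship Qstep ≈ 2^((QP − 4)/6); the exact mapping, and the way a quantization matrix modulates it, are codec-specific details not fixed by the description above.

```python
def quantization_step(qp: int) -> float:
    """HEVC-style mapping from quantization parameter to step size (an
    assumption for this sketch): the step size doubles every time QP
    increases by 6."""
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeff: float, qp: int) -> int:
    # Scalar quantization: divide by the step size and round to a level.
    return round(coeff / quantization_step(qp))

def dequantize(level: int, qp: int) -> float:
    # Inverse quantization ("scaling"): multiply the level by the step size.
    return level * quantization_step(qp)

# Example: the same transform coefficient at a low and a higher QP.
for qp in (22, 28):
    level = quantize(100.0, qp)
    print(qp, level, dequantize(level, qp))
```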
The entropy encoding unit 150 may generate a bitstream by performing entropy encoding on the values calculated by the quantization unit 140 or on encoding parameter values calculated in the encoding process, etc., according to the probability distribution, and may output the generated bitstream. The entropy encoding unit 150 may perform entropy encoding on information used to decode the image and entropy encoding on information of pixels of the image. For example, the information for decoding the image may include syntax elements and the like.
When entropy encoding is applied, symbols are represented by allocating a small number of bits to symbols having a high occurrence probability and a large number of bits to symbols having a low occurrence probability, thereby reducing the size of the bitstream for the encoding target symbols. Therefore, entropy encoding can improve the compression performance of image encoding. For entropy encoding, the entropy encoding unit 150 may use encoding methods such as exponential Golomb coding, context adaptive variable length coding (CAVLC), and context adaptive binary arithmetic coding (CABAC). For example, the entropy encoding unit 150 may perform entropy encoding by using a variable length coding (VLC) table. Further, the entropy encoding unit 150 may derive a binarization method for a target symbol and a probability model for the target symbol/bin, and then perform arithmetic encoding by using the derived binarization method or probability model.
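As a minimal sketch of one of the entropy coding tools named above, the following implements 0th-order exponential Golomb coding, which assigns shorter codewords to smaller (i.e., more probable) values; CAVLC and CABAC are considerably more involved and are not reproduced here.

```python
def exp_golomb_encode(value: int) -> str:
    """Encode a non-negative integer as a 0th-order exponential-Golomb code:
    value + 1 is written in binary, preceded by one fewer zeros than its
    bit length."""
    code = bin(value + 1)[2:]          # binary of value + 1, without '0b'
    return "0" * (len(code) - 1) + code

def exp_golomb_decode(bits: str) -> int:
    """Decode a single 0th-order exp-Golomb codeword from a bit string."""
    zeros = 0
    while bits[zeros] == "0":          # count leading zeros
        zeros += 1
    info = bits[zeros: 2 * zeros + 1]  # the '1' marker plus `zeros` info bits
    return int(info, 2) - 1

# Shorter codes go to smaller values: 0 -> '1', 1 -> '010', 2 -> '011', ...
for v in range(5):
    cw = exp_golomb_encode(v)
    assert exp_golomb_decode(cw) == v
    print(v, cw)
```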
In order to encode the transform coefficient levels, the entropy encoding unit 150 may change the coefficients from a two-dimensional block form into a one-dimensional vector form by using a transform coefficient scanning method. For example, by scanning the coefficients of a block with an up-right scan, the coefficients in two-dimensional form may be changed into a one-dimensional vector. Depending on the size of the transform unit and the intra prediction mode, a vertical scan that scans the coefficients of the two-dimensional block in the column direction, or a horizontal scan that scans them in the row direction, may be used instead of the up-right scan. That is, depending on the size of the transform unit and the intra prediction mode, it may be determined which of the up-right scan, the vertical scan, and the horizontal scan is used.
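The following sketch, assuming blocks stored as nested lists, illustrates the three scan orders just described; the exact diagonal ordering used by a particular codec may differ in detail.

```python
# Turn a 2-D block of quantized transform coefficients into a 1-D list
# using the up-right diagonal, vertical (column-wise), or horizontal
# (row-wise) scan.
def up_right_scan(block):
    h, w = len(block), len(block[0])
    out = []
    for s in range(h + w - 1):                 # anti-diagonals, DC outward
        for y in range(h - 1, -1, -1):
            x = s - y
            if 0 <= x < w:
                out.append(block[y][x])
    return out

def vertical_scan(block):
    return [row[x] for x in range(len(block[0])) for row in block]

def horizontal_scan(block):
    return [v for row in block for v in row]

block = [[9, 5, 1, 0],
         [4, 2, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(up_right_scan(block))    # non-zero coefficients cluster at the front
print(vertical_scan(block))
```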
The encoding parameters may include information, such as syntax elements, that is encoded by the encoder and signaled to the decoder, as well as information that may be derived during the encoding or decoding process, and may mean any information necessary for encoding or decoding an image. For example, the encoding parameters may include at least one value, or a combination of values, among: block size/depth/partition information, the prediction mode (intra or inter prediction), the intra prediction mode and direction, motion information such as a motion vector, a reference picture index, and an inter prediction indicator, transform information such as the transform scheme and transform size, quantization information such as the quantization parameter and quantization matrix, the coefficient scanning method, and in-loop filter information.
The residual signal may mean a difference between the original signal and the predicted signal. Alternatively, the residual signal may be a signal generated by transforming a difference between the original signal and the prediction signal. Alternatively, the residual signal may be a signal generated by transforming and quantizing the difference between the original signal and the prediction signal. The residual block may be a residual signal of a block unit.
When the encoding apparatus 100 performs encoding by using inter prediction, the encoded current picture may be used as a reference picture for another image to be processed subsequently. Accordingly, the encoding apparatus 100 may decode the encoded current picture and store the decoded image as a reference picture. To perform this decoding, inverse quantization and inverse transformation may be performed on the encoded current picture.
The quantized coefficients may be inverse quantized by the inverse quantization unit 160 and inverse transformed by the inverse transformation unit 170. The inverse quantized and inverse transformed coefficients may be added to the prediction block by adder 175, whereby a reconstructed block may be generated.
The reconstructed block may pass through the filter unit 180. Filter unit 180 may apply at least one of a deblocking filter, sample Adaptive Offset (SAO), and Adaptive Loop Filter (ALF) to the reconstructed block or reconstructed picture. The filter unit 180 may be referred to as a loop filter.
The deblocking filter may remove block distortion occurring at boundaries between blocks. Whether to apply the deblocking filter to the current block may be determined based on the pixels included in several rows or columns of the block. When the deblocking filter is applied to a block, a strong filter or a weak filter may be applied depending on the required deblocking filtering strength. Further, when the deblocking filter is applied, horizontal direction filtering and vertical direction filtering may be processed in parallel.
The sample adaptive offset may add an optimal offset value to the pixel values in order to compensate for coding errors. The sample adaptive offset may correct the offset between the deblocking-filtered image and the original picture on a per-pixel basis. In order to perform offset correction on a specific picture, a method of applying an offset in consideration of the edge information of each pixel may be used, or alternatively: the pixels of the image are partitioned into a predetermined number of regions, a region where offset correction is to be performed is determined, and the offset correction is applied to the determined region.
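As a simplified illustration of the region-based offset correction described above, the sketch below applies a band-type sample adaptive offset to 8-bit pixels; the 32-band classification and the clipping are assumptions in the style of HEVC, not the patent's normative procedure.

```python
def sao_band_offset(pixels, offsets):
    """pixels: list of 8-bit values; offsets: dict mapping band index -> offset.

    Each pixel is classified into one of 32 equal intensity bands, and the
    offset signaled for its band is added to compensate coding error.
    """
    out = []
    for p in pixels:
        band = p >> 3                      # 256 values / 32 bands = 8 per band
        p = p + offsets.get(band, 0)       # apply this band's offset, if any
        out.append(min(255, max(0, p)))    # clip back to the 8-bit range
    return out

# Example: brighten the mid-gray band slightly, darken a brighter band.
print(sao_band_offset([120, 124, 200], {15: 2, 25: -1}))
```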
The adaptive loop filter may perform filtering based on a value obtained by comparing the reconstructed picture with the original picture. The pixels of the image may be partitioned into predetermined groups, the filter to be applied to each group may be determined, and different filtering may be performed for each group. Information on whether the adaptive loop filter is applied to the luminance signal may be transmitted for each coding unit (CU). The shape and filter coefficients of the adaptive loop filter applied to each block may vary. Further, an adaptive loop filter of the same form (fixed form) may be applied regardless of the characteristics of the target block.
The reconstructed block passed through the filter unit 180 may be stored in the reference picture buffer 190.
Fig. 2 is a block diagram showing a configuration of a decoding apparatus according to an embodiment of the present invention.
The decoding apparatus 200 may be a video decoding apparatus or an image decoding apparatus.
Referring to fig. 2, the decoding apparatus 200 may include an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 230, an intra prediction unit 240, a motion compensation unit 250, an adder 255, a filter unit 260, and a reference picture buffer 270.
The decoding apparatus 200 may receive the bitstream output from the encoding apparatus 100. The decoding apparatus 200 may decode the bitstream in an intra mode or an inter mode. Also, the decoding apparatus 200 may generate a reconstructed picture by performing decoding, and may output the reconstructed picture.
When the prediction mode used in decoding is an intra mode, the switch may be switched to intra. When the prediction mode used in decoding is an inter mode, the switch may be switched to inter.
The decoding apparatus 200 may obtain a reconstructed residual block from an input bitstream and may generate a prediction block. When the reconstructed residual block and the prediction block are obtained, the decoding apparatus 200 may generate a reconstructed block, which is a decoding target block, by adding the reconstructed residual block to the prediction block. The decoding target block may be referred to as a current block.
The entropy decoding unit 210 may generate symbols by performing entropy decoding on the bitstream according to the probability distribution. The generated symbols may comprise symbols having quantized transform coefficient levels. Here, the method of entropy decoding may be similar to the method of entropy encoding described above. For example, the method of entropy decoding may be an inverse process of the above-described method of entropy encoding.
To decode the transform coefficient levels, the entropy decoding unit 210 may perform transform coefficient scanning, whereby coefficients in one-dimensional vector form are changed into two-dimensional block form. For example, by scanning the coefficients of a block with an up-right scan, the coefficients in one-dimensional vector form may be changed into two-dimensional block form. Depending on the size of the transform unit and the intra prediction mode, a vertical scan or a horizontal scan may be used instead of the up-right scan. That is, depending on the size of the transform unit and the intra prediction mode, it may be determined which of the up-right scan, the vertical scan, and the horizontal scan is used.
The quantized transform coefficient levels may be inverse quantized by the inverse quantization unit 220 and inverse transformed by the inverse transformation unit 230. The quantized transform coefficient levels are inverse quantized and inverse transformed to produce a reconstructed residual block. Here, the inverse quantization unit 220 may apply a quantization matrix to the quantized transform coefficient levels.
When the intra mode is used, the intra prediction unit 240 may generate a prediction block by performing spatial prediction using pixel values of a previously decoded block adjacent to the decoding target block.
When the inter mode is used, the motion compensation unit 250 may generate a prediction block by performing motion compensation using both a reference picture stored in the reference picture buffer 270 and a motion vector. When the value of the motion vector is not an integer, the motion compensation unit 250 may generate the prediction block by applying an interpolation filter to a partial region of the reference picture. When motion compensation is performed on a coding unit basis, it may be determined which of a skip mode, a merge mode, an AMVP mode, and a current picture reference mode is used as the motion compensation method for a prediction unit in the coding unit, and motion compensation may be performed according to the determined mode. Here, the current picture reference mode may mean a prediction mode that uses a previously reconstructed region within the current picture, which includes the decoding target block. The previously reconstructed region need not be adjacent to the decoding target block. A fixed vector may be used for the current picture reference mode to indicate the previously reconstructed region. Further, a flag or index indicating whether the decoding target block is a block decoded in the current picture reference mode may be signaled, and may be derived by using the reference picture index of the decoding target block. The current picture for the current picture reference mode may exist at a fixed position (for example, the position where the reference picture index is 0, or the last position) within the reference picture list for the decoding target block. Alternatively, the current picture may be located variably within the reference picture list, and to this end a reference picture index indicating the position of the current picture may be signaled.
The reconstructed residual block may be added to the prediction block by an adder 255. A block generated by adding the reconstructed residual block and the prediction block may pass through the filter unit 260. Filter unit 260 may apply at least one of a deblocking filter, a sample adaptive offset, and an adaptive loop filter to the reconstructed block or reconstructed picture. The filter unit 260 may output a reconstructed picture. The reconstructed picture may be stored in the reference picture buffer 270 and may be used for inter prediction.
Fig. 3 is a diagram schematically showing a partition structure of an image when the image is encoded and decoded. Fig. 3 schematically illustrates an embodiment of partitioning one unit into a plurality of sub-units.
In order to efficiently partition an image, a Coding Unit (CU) may be used in encoding and decoding. Here, the coding unit may mean a unit that performs coding. The unit may be a combination of 1) syntax elements and 2) blocks comprising image samples. For example, "partition of a unit" may mean "partition of a block associated with the unit". The block partition information may include information on a unit depth. The depth information may indicate a number of times the unit is partitioned or a degree to which the unit is partitioned, or both.
Referring to fig. 3, a picture 300 is sequentially partitioned for each maximum coding unit (LCU), and a partition structure is determined for each LCU. Here, the LCU and the Coding Tree Unit (CTU) have the same meaning. One unit may have depth information based on a tree structure and may be hierarchically partitioned. Each partitioned sub-unit may have depth information. The depth information indicates the number of times the unit is partitioned or the degree to which the unit is partitioned, or both, and thus, the depth information may include information on the size of the sub-unit.
The partition structure may mean the distribution of coding units (CUs) in the LCU 310. A CU may be a unit for efficiently encoding an image. The distribution may be determined based on whether one CU is to be partitioned into multiple CUs (a positive integer equal to or greater than 2, such as 2, 4, 8, 16, etc.). The width and height of a partitioned CU may be half the width and half the height of the original CU, respectively. Alternatively, depending on the number of partitions, the width and height of a partitioned CU may be smaller than those of the original CU. A partitioned CU may be recursively partitioned, by the same partitioning method, into a plurality of further-partitioned CUs whose width and height are smaller than those of the partitioned CU.
Here, the partitioning of the CU may be recursively performed up to a predetermined depth. The depth information may be information indicating a size of the CU, and may be stored in each CU. For example, the depth of the LCU may be 0, and the depth of the minimum coding unit (SCU) may be a predetermined maximum depth. Here, the LCU may be a coding unit having the maximum size as described above, and the SCU may be a coding unit having the minimum size.
Whenever the LCU 310 is partitioned, and the width and height of a CU are reduced by the partitioning operation, the depth of the CU increases by 1. At each depth, a CU that is not partitioned may have a size of 2N × 2N, while a CU that is partitioned may be split from a 2N × 2N size into a plurality of N × N sized CUs. The size of N is halved each time the depth increases by 1.
For example, when one coding unit is partitioned into four sub-coding units, the width size and the height size of one of the four sub-coding units may be half the width size and half the height size of the original coding unit, respectively. For example, when a coding unit of 32 × 32 size is partitioned into four sub-coding units, each of the four sub-coding units may have a size of 16 × 16. When one coding unit is partitioned into four sub-coding units, the coding units may be partitioned in a quadtree form.
For example, when one coding unit is partitioned into two sub-coding units, the width size or the height size of one of the two sub-coding units may be half the width size or half the height size of the original coding unit, respectively. For example, when a coding unit of 32 × 32 size is vertically partitioned into two sub-coding units, each of the two sub-coding units may have a size of 16 × 32. For example, when a coding unit of 32 × 32 size is horizontally partitioned into two sub-coding units, each of the two sub-coding units may have a size of 32 × 16. When one coding unit is partitioned into two sub-coding units, the coding unit may be partitioned in a binary tree form.
Referring to fig. 3, the LCU having the minimum depth of 0 may be 64 × 64 pixels in size, and the SCU having the maximum depth of 3 may be 8 × 8 pixels in size. Here, a CU having 64 × 64 pixels (i.e., LCU) may be represented by depth 0, a CU having 32 × 32 pixels may be represented by depth 1, a CU having 16 × 16 pixels may be represented by depth 2, and a CU having 8 × 8 pixels (i.e., SCU) may be represented by depth 3.
Further, information on whether a CU is to be partitioned may be represented by partition information of the CU. The partition information may be 1-bit information. The partition information may be included in all CUs except the SCU. For example, when the value of the partition information is 0, the CU may not be partitioned, and when the value of the partition information is 1, the CU may be partitioned.
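For illustration, the following Python sketch (hypothetical helper names, not part of the described syntax) shows how the 1-bit partition information could drive a recursive quad-tree split from an LCU of depth 0 down to the SCU, with the width and height halved and the depth increased by 1 at each split.

```python
# Minimal sketch (hypothetical helper, not from the disclosure): recursively
# split a CU according to 1-bit partition information. The LCU has depth 0;
# a CU at the SCU size carries no partition bit and is never split.

LCU_SIZE = 64  # depth 0
SCU_SIZE = 8   # maximum depth 3 in the Fig. 3 example

def parse_cu(x, y, size, depth, read_flag, leaves):
    """read_flag(x, y, size) -> 0 (keep) or 1 (split) for non-SCU blocks."""
    if size > SCU_SIZE and read_flag(x, y, size) == 1:
        half = size // 2  # width and height are both halved per split
        for dy in (0, half):
            for dx in (0, half):
                parse_cu(x + dx, y + dy, half, depth + 1, read_flag, leaves)
    else:
        leaves.append((x, y, size, depth))

# Example: split only the top-left CU at each depth.
flags = lambda x, y, size: 1 if (x, y) == (0, 0) else 0
leaves = []
parse_cu(0, 0, LCU_SIZE, 0, flags, leaves)
print(leaves[:3])  # [(0, 0, 8, 3), (8, 0, 8, 3), (0, 8, 8, 3)]
```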
Fig. 4 is a diagram illustrating a form of a Prediction Unit (PU) that may be included in a Coding Unit (CU).
A CU that is no longer partitioned among a plurality of CUs partitioned from the LCU may be partitioned into at least one Prediction Unit (PU). This process may also be referred to as partitioning.
A PU may be the basic unit for prediction. The PU may be encoded and decoded in any of a skip mode, an inter mode, and an intra mode. The PU may be partitioned in various forms depending on the mode.
In addition, the coding unit may not be partitioned into a plurality of prediction units, in which case the coding unit and the prediction unit may have the same size.
As shown in fig. 4, in skip mode, a CU may not be partitioned. In skip mode, a 2N × 2N mode 410 may be supported that has the same size as a CU that is not partitioned.
In inter mode, 8 partition modes may be supported in a CU. For example, in the inter mode, a 2N × 2N mode 410, a 2N × N mode 415, an N × 2N mode 420, an N × N mode 425, a 2N × nU mode 430, a 2N × nD mode 435, an nL × 2N mode 440, and an nR × 2N mode 445 may be supported. In intra mode, a 2N × 2N mode 410 and an N × N mode 425 may be supported.
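The PU geometries listed above can be summarized in a small sketch. The following Python function (an illustrative helper, not from the disclosure) maps each of the eight inter partition modes to the (width, height) of its PUs inside a 2N × 2N CU of side s, with the asymmetric modes using a 1/4 : 3/4 split.

```python
# Minimal sketch (illustrative, not from the disclosure): (width, height)
# geometry of each PU inside a 2N x 2N CU of side `s` for the eight inter
# partition modes of Fig. 4.

def pu_shapes(mode, s):
    q, h = s // 4, s // 2
    table = {
        "2Nx2N": [(s, s)],
        "2NxN":  [(s, h), (s, h)],
        "Nx2N":  [(h, s), (h, s)],
        "NxN":   [(h, h)] * 4,
        "2NxnU": [(s, q), (s, s - q)],
        "2NxnD": [(s, s - q), (s, q)],
        "nLx2N": [(q, s), (s - q, s)],
        "nRx2N": [(s - q, s), (q, s)],
    }
    return table[mode]

print(pu_shapes("2NxnU", 32))  # [(32, 8), (32, 24)]
```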
One coding unit may be partitioned into one or more prediction units. One prediction unit may be partitioned into one or more sub-prediction units.
For example, when one prediction unit is partitioned into four sub-prediction units, the width size and the height size of one of the four sub-prediction units may be half the width size and half the height size of the original prediction unit. For example, when a prediction unit of 32 × 32 size is partitioned into four sub-prediction units, each of the four sub-prediction units may have a size of 16 × 16. When one prediction unit is partitioned into four sub-prediction units, the prediction unit may be partitioned in a quadtree form.
For example, when one prediction unit is partitioned into two sub-prediction units, the width size or the height size of one of the two sub-prediction units may be half the width size or half the height size of the original prediction unit. For example, when a prediction unit of 32 × 32 size is vertically partitioned into two sub-prediction units, each of the two sub-prediction units may have a 16 × 32 size. For example, when a prediction unit of 32 × 32 size is horizontally partitioned into two sub-prediction units, each of the two sub-prediction units may have a size of 32 × 16. When one prediction unit is partitioned into two sub-prediction units, the prediction units may be partitioned in a binary tree form.
Fig. 5 is a diagram illustrating a form of a Transform Unit (TU) that may be included in a Coding Unit (CU).
A Transform Unit (TU) may be a basic unit for transform, quantization, inverse transform, and inverse quantization within a CU. The TU may have a square shape or a rectangular shape, etc. TUs may be independently determined according to the size of a CU or the form of a CU, or both.
A CU that is not partitioned any more among CUs partitioned from the LCU may be partitioned into at least one TU. Here, the partition structure of the TU may be a quad-tree structure. For example, as shown in FIG. 5, a CU 510 may be partitioned one or more times according to a quadtree structure. The case where one CU is partitioned at least once may be referred to as recursive partitioning. By performing partitioning, one CU 510 may be formed of TUs having various sizes. Alternatively, a CU may be partitioned into at least one TU depending on the number of vertical lines partitioning the CU or the number of horizontal lines partitioning the CU, or both. A CU may be partitioned into TUs that are symmetric to each other, or may be partitioned into TUs that are asymmetric to each other. In order to partition a CU into TUs that are symmetric to each other, information on the size/shape of the TU may be signaled, or may be derived from information on the size/shape of the CU.
In addition, the coding unit may not be partitioned into the transform units, and the coding unit and the transform units may have the same size.
One coding unit may be partitioned into at least one transform unit, and one transform unit may be partitioned into at least one sub-transform unit.
For example, when one transform unit is partitioned into four sub-transform units, the width size and the height size of one of the four sub-transform units may be half the width size and half the height size of the original transform unit, respectively. For example, when a transform unit of 32 × 32 size is partitioned into four sub-transform units, each of the four sub-transform units may have a size of 16 × 16. When one transform unit is partitioned into four sub-transform units, the transform unit may be partitioned in a quadtree form.
For example, when one transform unit is partitioned into two sub-transform units, a width size or a height size of one of the two sub-transform units may be a half width size or a half height size of an original transform unit, respectively. For example, when a transform unit of 32 × 32 size is vertically partitioned into two sub-transform units, each of the two sub-transform units may have a 16 × 32 size. For example, when a transform unit of 32 × 32 size is horizontally partitioned into two sub-transform units, each of the two sub-transform units may have a size of 32 × 16. When one transform unit is partitioned into two sub-transform units, the transform unit may be partitioned in a binary tree form.
When the transform is performed, the residual block may be transformed by using at least one of predetermined transform methods. For example, the predetermined transform methods may include Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), Karhunen-Loève Transform (KLT), and the like. Which transform method is applied to transform the residual block may be determined by using at least one of: inter prediction mode information of the prediction unit, intra prediction mode information of the prediction unit, and the size/shape of the transform block. Information indicating the transform method may be signaled.
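As a concrete illustration of a separable transform, the following Python sketch builds orthonormal DCT-II and DST-VII kernels from their standard textbook basis functions (an assumption; the disclosure does not fix a particular normalization) and applies them as a vertical and a horizontal 1D pass over a residual block.

```python
# Minimal sketch (illustrative, not from the disclosure): orthonormal
# DCT-II and DST-VII kernels applied as a separable 2D transform
# (1D pass over columns, then over rows) of a residual block.

import math

def dct2_matrix(n):
    # T[k][i] = a_k * sqrt(2/n) * cos(pi * (2i + 1) * k / (2n))
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def dst7_matrix(n):
    # T[k][i] = sqrt(4 / (2n + 1)) * sin(pi * (2k + 1) * (i + 1) / (2n + 1))
    return [[math.sqrt(4.0 / (2 * n + 1))
             * math.sin(math.pi * (2 * k + 1) * (i + 1) / (2 * n + 1))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(r) for r in zip(*m)]

def forward_2d(residual, t_vert, t_horz):
    # coeff = T_v * residual * T_h^T (vertical 1D pass, then horizontal)
    return matmul(matmul(t_vert, residual), transpose(t_horz))

residual = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]]
coeff = forward_2d(residual, dct2_matrix(4), dst7_matrix(4))
print(round(coeff[0][0], 2))  # the lowest-frequency bin dominates this smooth residual
```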
Fig. 6 is a diagram for explaining an embodiment of a process of intra prediction.
The intra prediction mode may be a non-directional mode or a directional mode. The non-directional mode may be a DC mode or a planar mode. The direction mode may be a prediction mode having a specific direction or angle, and the number of direction modes may be M equal to or greater than 1. The direction mode may be indicated as at least one of a mode number, a mode value, and a mode angle.
The number of intra prediction modes may be N equal to or greater than 1, including non-directional modes and directional modes.
The number of intra prediction modes may vary depending on the size of the block. For example, the number of intra prediction modes may be 67 when the size of the block is 4 × 4 or 8 × 8, 35 when the size of the block is 16 × 16, 19 when the size of the block is 32 × 32, and 7 when the size of the block is 64 × 64.
The number of intra prediction modes can be fixed to N regardless of the size of the block. For example, the number of intra prediction modes may be fixed to at least one of 35 or 67 regardless of the size of the block.
The number of intra prediction modes may vary depending on the type of color component. For example, the number of prediction modes may vary depending on whether the color component is a luminance signal or a chrominance signal.
The intra-frame encoding and/or decoding may be performed by using sample values or encoding parameters included in the reconstructed neighboring blocks.
In order to encode/decode a current block in intra prediction, it may be identified whether samples included in reconstructed neighboring blocks are available as reference samples of the encoding/decoding target block. When there are samples that cannot be used as reference samples of the encoding/decoding target block, their values may be filled by copying and/or interpolating at least one of the samples included in the reconstructed neighboring blocks, so that the filled samples can be used as reference samples of the encoding/decoding target block.
In the intra prediction, a filter may be applied to at least one of the reference samples or the prediction samples based on at least one of an intra prediction mode and a size of the encoding/decoding target block. Here, the encoding/decoding target block may mean a current block, and may mean at least one of an encoding block, a prediction block, and a transform block. The type of the filter applied to the reference samples or the prediction samples may vary depending on at least one of an intra prediction mode or a size/shape of the current block. The type of filter may vary depending on at least one of the number of filter taps, the filter coefficient values, or the filter strength.
In a non-directional plane mode among the intra prediction modes, when generating a prediction block of the encoding/decoding target block, a sample value in the prediction block may be generated by using a weighted sum of an upper reference sample of the current sample, a left reference sample of the current sample, an upper-right reference sample of the current block, and a lower-left reference sample of the current block according to a sample position.
In the non-directional DC mode among the intra prediction modes, when a prediction block of the encoding/decoding target block is generated, the prediction block may be generated by an average of the upper reference sample point of the current block and the left reference sample point of the current block. Further, filtering may be performed on one or more upper rows and one or more left columns adjacent to the reference sample in the encoding/decoding block by using the reference sample value.
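The planar and DC predictions described above can be sketched as follows; the exact weights and rounding below follow the well-known HEVC-style formulation and are an assumption, not a quotation from the disclosure.

```python
# Minimal sketch (HEVC-style formulas, assumed for illustration): planar
# prediction as a position-dependent weighted sum of the top, left,
# top-right, and bottom-left reference samples, and DC prediction as the
# rounded average of the top and left reference samples.

def planar_predict(top, left, n):
    # top[0..n]: row above the block, top[n] being the top-right reference;
    # left[0..n]: column to the left, left[n] being the bottom-left
    # reference. n must be a power of two.
    log2n = n.bit_length() - 1
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            pred[y][x] = ((n - 1 - x) * left[y] + (x + 1) * top[n]
                          + (n - 1 - y) * top[x] + (y + 1) * left[n]
                          + n) >> (log2n + 1)
    return pred

def dc_predict(top, left, n):
    # Average of the n top and n left reference samples, with rounding.
    dc = (sum(top[:n]) + sum(left[:n]) + n) >> (n.bit_length())
    return [[dc] * n for _ in range(n)]

n = 4
top = [100, 102, 104, 106, 108]   # top[4] is the top-right reference
left = [100, 101, 102, 103, 104]  # left[4] is the bottom-left reference
print(dc_predict(top, left, n)[0][0])      # 102
print(planar_predict(top, left, n)[0][0])  # 102
```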
In the case of the plurality of directional modes (angular modes) among the intra prediction modes, a prediction block may be generated by using the upper-right reference sample and/or the lower-left reference sample, and the plurality of directional modes may have different directions. To generate predicted sample values, interpolation in units of real numbers (i.e., at fractional sample positions) may be performed.
To perform the intra prediction method, an intra prediction mode of the current prediction block may be predicted from the intra prediction modes of neighboring prediction blocks adjacent to the current prediction block. When the intra prediction mode of the current prediction block is predicted by using mode information derived from the neighboring intra prediction modes, and the current prediction block and a neighboring prediction block have the same intra prediction mode, the fact that the two blocks have the same intra prediction mode may be transmitted by using predetermined flag information. When the intra prediction mode of the current prediction block is different from the intra prediction modes of the neighboring prediction blocks, the intra prediction mode information of the encoding/decoding target block may be encoded by performing entropy encoding.
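A minimal sketch of this mode signaling is given below; the syntax element names (prev_intra_pred_flag, mpm_idx, rem_intra_pred_mode) are illustrative simplifications, and entropy coding is abstracted away as plain values.

```python
# Minimal sketch (hypothetical simplification of most-probable-mode style
# signaling, not the disclosure's exact syntax): if the current mode equals
# a neighboring mode, send a flag plus a short candidate index; otherwise
# send the flag and the mode itself.

def encode_intra_mode(current_mode, neighbor_modes):
    candidates = list(dict.fromkeys(neighbor_modes))  # unique, ordered
    if current_mode in candidates:
        return {"prev_intra_pred_flag": 1,
                "mpm_idx": candidates.index(current_mode)}
    return {"prev_intra_pred_flag": 0, "rem_intra_pred_mode": current_mode}

def decode_intra_mode(syntax, neighbor_modes):
    candidates = list(dict.fromkeys(neighbor_modes))
    if syntax["prev_intra_pred_flag"]:
        return candidates[syntax["mpm_idx"]]
    return syntax["rem_intra_pred_mode"]

nbrs = [23, 1]  # e.g. left and above neighbor modes
sy = encode_intra_mode(23, nbrs)
assert decode_intra_mode(sy, nbrs) == 23
```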
Fig. 7 is a diagram for explaining an embodiment of a process of inter prediction.
The quadrangle illustrated in fig. 7 may indicate an image (or picture). Further, the arrows of fig. 7 may indicate the prediction direction. That is, an image may be encoded or decoded, or both, according to a prediction direction. Each image may be classified into an I picture (intra picture), a P picture (unidirectional predictive picture), a B picture (bidirectional predictive picture), etc., according to the coding type. Each picture can be encoded and decoded depending on the encoding type of each picture.
When the image as the encoding target is an I picture, the image itself may be intra-coded without inter prediction. When the image as the encoding target is a P picture, the image may be encoded by inter prediction or motion compensation using a reference picture only in the forward direction. When the image as the encoding target is a B picture, the image may be encoded by inter prediction or motion compensation using reference pictures in both the forward and reverse directions. Alternatively, the image may be encoded by inter prediction or motion compensation using a reference picture in one of the forward and reverse directions. Here, when the inter prediction mode is used, the encoder may perform inter prediction or motion compensation, and the decoder may perform motion compensation in response to the encoder. Images of P pictures and B pictures that are encoded or decoded, or both, by using a reference picture may be regarded as images for inter prediction.
Hereinafter, inter prediction according to an embodiment will be described in detail.
Inter prediction or motion compensation may be performed by using both reference pictures and motion information. In addition, the inter prediction may use the skip mode described above.
The reference picture may be at least one of a previous picture and a subsequent picture of the current picture. Here, inter prediction may predict a block of a current picture from a reference picture. Here, the reference picture may mean an image used in predicting a block. Here, the area within the reference picture can be designated by using a reference picture index (refIdx) indicating the reference picture, a motion vector, and the like.
Inter prediction may select a reference picture and a reference block within the reference picture that is related to the current block. A prediction block for the current block may be generated by using the selected reference block. The current block may be a block that is a current encoding target or a current decoding target among blocks of the current picture.
The motion information may be derived from the process of inter prediction by the encoding apparatus 100 and the decoding apparatus 200. Furthermore, the derived motion information may be used when performing inter prediction. Here, the encoding apparatus 100 and the decoding apparatus 200 may improve encoding efficiency or decoding efficiency or both by using motion information of reconstructed neighboring blocks or motion information of co-located blocks (col blocks), or both. The col block may be a block related to a spatial position of an encoding/decoding target block within a previously reconstructed co-located picture (col picture). The reconstructed neighboring blocks may be blocks within the current picture and blocks previously reconstructed by encoding or decoding or both. Further, the reconstructed block may be a block adjacent to the encoding/decoding target block, or a block located at an outer corner of the encoding/decoding target block, or both. Here, the blocks located at the outer corners of the encoding/decoding target block may be blocks vertically adjacent to the neighboring blocks horizontally adjacent to the encoding/decoding target block. Alternatively, the blocks located at the outer corners of the encoding/decoding target block may be blocks horizontally adjacent to the adjacent blocks vertically adjacent to the encoding/decoding target block.
The encoding apparatus 100 and the decoding apparatus 200 may determine a block existing at a position spatially related to an encoding/decoding target block within a col picture, respectively, and may determine a predefined relative position based on the determined block. The predefined relative position may be an inner position or an outer position or both of blocks existing at a position spatially correlated with the encoding/decoding target block. Further, the encoding apparatus 100 and the decoding apparatus 200 may respectively derive the col block based on the determined predefined relative position. Here, the col picture may be one picture among at least one reference picture included in the reference picture list.
The method of deriving motion information may vary according to a prediction mode of an encoding/decoding target block. For example, the prediction modes applied to the inter prediction may include Advanced Motion Vector Prediction (AMVP), merge mode, and the like. Here, the merge mode may be referred to as a motion merge mode.
For example, when AMVP is applied as a prediction mode, the encoding apparatus 100 and the decoding apparatus 200 may generate motion vector candidate lists by using a motion vector of a reconstructed neighboring block or a motion vector of a col block, or both, respectively. Either the motion vector of the reconstructed neighboring block or the motion vector of the col block or both may be used as a motion vector candidate. Here, the motion vector of the col block may be referred to as a temporal motion vector candidate, and the motion vector of the reconstructed neighboring block may be referred to as a spatial motion vector candidate.
The encoding apparatus 100 may generate a bitstream, and the bitstream may include a motion vector candidate index. That is, the encoding apparatus 100 may generate a bitstream by entropy-encoding the motion vector candidate index. The motion vector candidate index may indicate an optimal motion vector candidate selected from among motion vector candidates included in the motion vector candidate list. The motion vector candidate index may be transmitted from the encoding apparatus 100 to the decoding apparatus 200 through a bitstream.
The decoding apparatus 200 may entropy-decode the motion vector candidate index from the bitstream, and may select a motion vector candidate of the decoding target block among motion vector candidates included in the motion vector candidate list by using the entropy-decoded motion vector candidate index.
The encoding apparatus 100 may calculate a Motion Vector Difference (MVD) between the motion vector of the encoding target block and the motion vector candidate, and may entropy-encode the MVD. The bitstream may include the entropy-encoded MVD. The MVD may be transmitted from the encoding apparatus 100 to the decoding apparatus 200 through the bitstream. Here, the decoding apparatus 200 may entropy-decode the MVD received from the bitstream. The decoding apparatus 200 may derive the motion vector of the decoding target block through the sum of the decoded MVD and the motion vector candidate.
The bitstream may include a reference picture index indicating a reference picture, etc., and the reference picture index may be entropy-encoded and transmitted from the encoding apparatus 100 to the decoding apparatus 200 through the bitstream. The decoding apparatus 200 may predict a motion vector of the decoding target block by using motion information of neighboring blocks, and may derive a motion vector of the decoding target block by using the predicted motion vector and a motion vector difference. The decoding apparatus 200 may generate a prediction block of the decoding target block based on the derived motion vector and the reference picture index information.
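The decoder-side AMVP derivation described above amounts to selecting a predictor by index and adding the MVD. The following Python sketch (hypothetical candidate-list rules, simplified from the description) illustrates this.

```python
# Minimal sketch (illustrative, not the disclosure's syntax): the decoder
# side of AMVP. It picks a predictor from the candidate list using the
# entropy-decoded index and adds the decoded MVD to recover the motion
# vector used for motion compensation.

def build_mv_candidate_list(spatial_mvs, temporal_mv, max_cands=2):
    cands = []
    for mv in spatial_mvs + [temporal_mv]:
        if mv is not None and mv not in cands:  # prune duplicates
            cands.append(mv)
    while len(cands) < max_cands:
        cands.append((0, 0))  # pad with zero vectors
    return cands[:max_cands]

def derive_mv(cand_list, mvp_idx, mvd):
    mvp = cand_list[mvp_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

cands = build_mv_candidate_list([(4, -2), (4, -2)], (3, 0))
print(derive_mv(cands, mvp_idx=1, mvd=(1, 1)))  # (4, 1)
```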
As another method of deriving motion information, a merge mode is used. The merge mode may mean the merging of motions of a plurality of blocks. The merge mode may mean that motion information of one block is applied to another block. When the merge mode is applied, the encoding apparatus 100 and the decoding apparatus 200 may generate the merge candidate lists by using the motion information of the reconstructed neighboring blocks or the motion information of the col blocks, or both, respectively. The motion information may include at least one of: 1) a motion vector, 2) a reference picture index, and 3) an inter prediction indicator. The prediction indicator may indicate unidirectional (L0 prediction, L1 prediction) or bidirectional.
Here, the merge mode may be applied to each CU or each PU. When the merge mode is performed per CU or per PU, the encoding apparatus 100 may generate a bitstream by entropy-encoding predefined information and may transmit the bitstream to the decoding apparatus 200. The bitstream may include the predefined information. The predefined information may include: 1) a merge flag as information indicating whether the merge mode is performed for each block partition, and 2) a merge index as information indicating which block is merged among the neighboring blocks adjacent to the encoding target block. For example, the neighboring blocks adjacent to the encoding target block may include a left neighboring block of the encoding target block, an upper neighboring block of the encoding target block, a temporal neighboring block of the encoding target block, and the like.
The merge candidate list may indicate a list storing motion information. Further, the merge candidate list may be generated before the merge mode is performed. The motion information stored in the merge candidate list may be at least one of the following motion information: motion information of a neighboring block adjacent to the encoding/decoding target block, motion information of a co-located block in the reference picture with respect to the encoding/decoding target block, motion information newly generated by pre-combining motion information existing in the merged motion candidate list, and a zero merge candidate. Here, the motion information of the neighboring block adjacent to the encoding/decoding target block may be referred to as a spatial merge candidate. The motion information of the co-located block in the reference picture related to the encoding/decoding target block may be referred to as a temporal merging candidate.
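A merge candidate list of the kind described above could be built as in the following sketch; the candidate ordering, the pairwise combination rule, and the (mv, ref_idx, inter_dir) layout are illustrative simplifications, not the disclosure's exact rules.

```python
# Minimal sketch (illustrative ordering and a simplified combination rule):
# a merge candidate list filled with spatial candidates, a temporal
# candidate, pairwise-combined candidates, and zero merge candidates.
# Each candidate is (mv, ref_idx, inter_dir).

def build_merge_list(spatial, temporal, max_cands=5):
    merge = []
    for c in spatial + [temporal]:
        if c is not None and c not in merge:  # prune duplicates
            merge.append(c)
    # combine existing candidates pairwise (simplified combination)
    for i in range(len(merge)):
        for j in range(len(merge)):
            if len(merge) >= max_cands:
                break
            if i != j:
                combined = (merge[i][0], merge[j][1], "BI")
                if combined not in merge:
                    merge.append(combined)
    while len(merge) < max_cands:
        merge.append(((0, 0), 0, "L0"))  # zero merge candidate
    return merge[:max_cands]

spatial = [((4, -2), 0, "L0"), ((1, 3), 1, "L1"), None]
temporal = ((2, 0), 0, "L0")
print(build_merge_list(spatial, temporal))
```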
The skip mode may be a mode in which motion information of a neighboring block itself is applied to the encoding/decoding target block. The skip mode may be one of the modes used for inter prediction. When the skip mode is used, the encoding apparatus 100 may entropy-encode information regarding which block's motion information is used as the motion information of the encoding target block, and may transmit the information to the decoding apparatus 200 through a bitstream. The encoding apparatus 100 may not transmit other information (e.g., syntax element information) to the decoding apparatus 200. The syntax element information may include at least one of motion vector difference information, a coded block flag, and a transform coefficient level.
The residual signal generated after intra prediction or inter prediction may be transformed into the frequency domain through a transform process, as a part of the quantization process. Here, the first (primary) transform may use various DCT and DST kernels in addition to DCT type 2 (DCT-II). These transform kernels may perform a separable transform, performing a 1D transform on the residual signal in the horizontal and/or vertical direction, or may perform a 2D non-separable transform on the residual signal.
For example, in the case of a 1D transform, DCT-II, DCT-V, DCT-VIII, DST-I, and DST-VII may be used as the DCT and DST types, as shown in the following table. For example, as shown in Table 1 and Table 2, the DCT or DST type used in the transform may be derived by composing transform sets.
[Table 1] (definitions of the DCT-II, DCT-V, DCT-VIII, DST-I, and DST-VII transform types; reproduced as an image in the original document)
[ Table 2]
Transform set    Transforms
0                DST-VII, DCT-VIII, DST-I
1                DST-VII, DST-I, DCT-VIII
2                DST-VII, DCT-V, DST-I
For example, as shown in fig. 8, different transform sets may be defined for the horizontal direction and the vertical direction according to the intra prediction mode. Next, the encoder/decoder may perform a transform and/or an inverse transform by using the intra prediction mode of the current encoding/decoding target block and a transform of the relevant transform set. In this case, entropy encoding/decoding is not performed on the transform set itself; the encoder/decoder may define the transform sets according to the same rule. In this case, entropy encoding/decoding indicating which transform among the transforms of the transform set is used may be performed. For example, when the size of the block is equal to or less than 64 × 64, three transform sets are composed according to the intra prediction mode as shown in Table 2, and three transforms are used for each of the horizontal direction transform and the vertical direction transform, so that a total of nine (3 × 3) combined multi-transform methods can be performed. Next, the residual signal is encoded/decoded by using the optimal transform method among them, whereby the encoding efficiency can be improved. Here, in order to entropy-encode/decode the information regarding which of the three transforms of one transform set is used, truncated unary binarization may be used. Here, for at least one of the vertical transform and the horizontal transform, entropy encoding/decoding may be performed on the information indicating which transform of the transform set is used.
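The following sketch illustrates this mechanism under stated assumptions: the per-mode transform-set choice (set_for_mode) is a placeholder for the Fig. 8 mapping, while the sets themselves follow Table 2, and the chosen transform index is binarized with a truncated unary code.

```python
# Minimal sketch (hypothetical mode-to-set mapping; sets follow Table 2):
# each intra prediction mode selects one transform set per direction, and
# the index of the chosen transform inside the set is coded with truncated
# unary binarization (0 -> "0", 1 -> "10", 2 -> "11" for three candidates).

TRANSFORM_SETS = {
    0: ["DST-VII", "DCT-VIII", "DST-I"],
    1: ["DST-VII", "DST-I", "DCT-VIII"],
    2: ["DST-VII", "DCT-V", "DST-I"],
}

def set_for_mode(intra_mode, horizontal):
    # Placeholder for the Fig. 8 table (assumed mapping for illustration).
    return (intra_mode + (0 if horizontal else 1)) % 3

def truncated_unary(index, num_candidates):
    if index < num_candidates - 1:
        return "1" * index + "0"
    return "1" * index  # the last symbol drops the terminating 0

mode = 23
h_set = TRANSFORM_SETS[set_for_mode(mode, horizontal=True)]
print(h_set[1], truncated_unary(1, len(h_set)))  # DCT-V 10
```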
After the above-described first transform is completed, as shown in fig. 9, the encoder may perform a secondary transform on the transformed coefficients to improve energy compaction. The secondary transform may be a separable transform that performs a 1D transform in the horizontal and/or vertical direction, or may be a 2D non-separable transform. The used transform information may be transmitted, or may be derived by the encoder/decoder from the current coding information and the neighboring coding information. For example, as with the 1D transform, a transform set for the secondary transform may be defined. Entropy encoding/decoding is not performed on the transform set itself, and the encoder/decoder may define the transform set according to the same rule. In this case, information indicating which transform among the transforms of the transform set is used may be transmitted, and the information may be applied to at least one residual signal generated through intra prediction or inter prediction.
At least one of the number or the type of transform candidates may be different for each transform set. At least one of the number or the type of transform candidates may be variously determined based on at least one of: the location, size, partition form, and prediction mode (intra/inter mode) of a block (CU, PU, TU, etc.), and the directionality (directional/non-directional) of the intra prediction mode.
The decoder may perform a secondary inverse transform depending on whether the secondary transform was performed, and may then perform a primary inverse transform on the result of the secondary inverse transform depending on whether the primary transform was performed.
The above-described primary and secondary transforms may be applied to at least one signal component among the luma/chroma components, or may be applied according to the size/shape of an arbitrary coding block. Entropy encoding/decoding may be performed on an index indicating both whether the primary/secondary transform is used and which primary/secondary transform is used in the arbitrary coding block. Alternatively, the index may be implicitly derived by the encoder/decoder based on at least one piece of current/neighboring coding information.
A residual signal generated after intra prediction or inter prediction is subjected to quantization processing after being subjected to primary transformation and/or secondary transformation, and the quantized transform coefficient is subjected to entropy encoding processing. Here, as shown in fig. 10, the quantized transform coefficients may be scanned in a diagonal direction, a vertical direction, and a horizontal direction based on at least one of an intra prediction mode or a size/shape of a minimum block.
Also, the quantized transform coefficients on which entropy decoding is performed may be arranged in a block form by being inversely scanned, and at least one of inverse quantization or inverse transformation may be performed on the relevant block. Here, as a method of the inverse scanning, at least one of diagonal direction scanning, horizontal direction scanning, and vertical direction scanning may be performed.
For example, when the size of a current coding block is 8 × 8, first transform, second transform, and quantization may be performed on a residual signal for an 8 × 8 block, and then, scanning and entropy encoding may be performed on quantized transform coefficients for each of four 4 × 4 sub-blocks according to at least one of three scan order methods shown in fig. 10. Further, inverse scanning may be performed on the quantized transform coefficients by performing entropy decoding. The quantized transform coefficients on which the inverse scanning is performed become transform coefficients after being subjected to inverse quantization, and at least one of secondary inverse transform or primary inverse transform is performed, whereby a reconstructed residual signal can be generated.
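The three scan orders can be sketched as follows for one 4 × 4 sub-block; the diagonal order below walks the anti-diagonals from bottom-left to top-right, which is one common convention and is assumed here rather than quoted from Fig. 10.

```python
# Minimal sketch (illustrative, not the disclosure's exact tables): generate
# the three scan orders for a 4 x 4 sub-block of quantized transform
# coefficients. Horizontal and vertical scans are simple raster orders.

def scan_positions(order, n=4):
    if order == "horizontal":
        return [(x, y) for y in range(n) for x in range(n)]
    if order == "vertical":
        return [(x, y) for x in range(n) for y in range(n)]
    # diagonal: anti-diagonals of constant x + y, bottom-left to top-right
    return [(x, y) for s in range(2 * n - 1)
            for y in range(n - 1, -1, -1)
            for x in range(n) if x + y == s]

def scan_coeffs(block, order):
    return [block[y][x] for (x, y) in scan_positions(order, len(block))]

block = [[9, 5, 2, 0], [4, 3, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(scan_coeffs(block, "diagonal")[:6])  # [9, 4, 5, 1, 3, 2]
```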
In the video encoding process, one block may be partitioned as shown in fig. 11, and an indicator corresponding to the partition information may be signaled. Here, the partition information may be at least one of: a partition flag (split_flag), a quad/binary tree flag (QB_flag), a quad tree partition flag (quadtree_flag), a binary tree partition flag (binarytree_flag), and a binary tree partition type flag (Btype_flag). Here, split_flag is a flag indicating whether a block is partitioned, QB_flag is a flag indicating whether a block is partitioned in a quad tree form or a binary tree form, quadtree_flag is a flag indicating whether a block is partitioned in a quad tree form, binarytree_flag is a flag indicating whether a block is partitioned in a binary tree form, and Btype_flag is a flag indicating whether a block is vertically or horizontally partitioned in the case of partitioning in a binary tree form.
When the partition flag is 1, it may indicate that partitioning is performed, and when the partition flag is 0, it may indicate that partitioning is not performed. In the case of the quad/binary tree flag, a value of 0 may indicate a quad tree partition and a value of 1 may indicate a binary tree partition. Alternatively, 0 may indicate a binary tree partition and 1 may indicate a quad tree partition. In the case of the binary tree partition type flag, 0 may indicate a horizontal direction partition and 1 may indicate a vertical direction partition. Alternatively, 0 may indicate a vertical direction partition and 1 may indicate a horizontal direction partition.
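Under the first of the flag polarities described above (0 = quad tree / horizontal, 1 = binary tree / vertical), the flags could be interpreted as in the following sketch.

```python
# Minimal sketch (one possible polarity; the description allows either):
# interpreting split_flag, QB_flag, and Btype_flag for a block.

def interpret_partition(split_flag, qb_flag=None, btype_flag=None):
    if split_flag == 0:
        return "no split"
    # split_flag == 1: QB_flag selects quad tree (0) or binary tree (1)
    if qb_flag == 0:
        return "quad-tree split into four sub-blocks"
    # binary tree: Btype_flag selects horizontal (0) or vertical (1)
    return ("binary split, horizontal" if btype_flag == 0
            else "binary split, vertical")

print(interpret_partition(1, qb_flag=1, btype_flag=0))
# binary split, horizontal
```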
For example, the partition information of fig. 11 may be derived by signaling at least one of quadtree_flag, binarytree_flag, and Btype_flag, as shown in Table 3.
[Table 3] (example values of quadtree_flag, binarytree_flag, and Btype_flag for the partition structure of fig. 11; reproduced as an image in the original document)
For example, the partition information of fig. 11 may be derived by signaling at least one of split_flag, QB_flag, and Btype_flag, as shown in Table 4.
[Table 4] (example values of split_flag, QB_flag, and Btype_flag for the partition structure of fig. 11; reproduced as an image in the original document)
The partitioning method may be performed only in a quad tree form or only in a binary tree form according to the size/shape of the block. In this case, the split_flag may mean a flag indicating whether partitioning is performed in the quad tree form or the binary tree form. The size/shape of the block may be derived from the depth information of the block, and the depth information may be signaled.
When the size of the block is in a predetermined range, the partitioning may be performed only in a quad-tree form. Here, the predetermined range may be defined as at least one of the size of the largest block and the size of the smallest block that can be partitioned only in the quad-tree form. Information indicating the size of the maximum block/minimum block for which partitioning in the quad-tree form is allowed may be signaled through a bitstream, and may be signaled in units of at least one of a sequence, a picture parameter, or a slice (segment). Alternatively, the size of the maximum block/minimum block may be a fixed size preset in the encoder/decoder. For example, when the size of the block ranges from 256 × 256 to 64 × 64, the partitioning may be performed only in the quad-tree form. In this case, the split_flag may be a flag indicating whether partitioning is performed in the quad-tree form.
When the size of the block is in a predetermined range, the partitioning may be performed only in a binary tree form. Here, the predetermined range may be defined as at least one of the size of the largest block and the size of the smallest block that can be partitioned only in the binary tree form. Information indicating the size of the maximum block/minimum block for which partitioning in the binary tree form is allowed may be signaled through a bitstream, and may be signaled in units of at least one of a sequence, a picture parameter, or a slice (segment). Alternatively, the size of the maximum block/minimum block may be a fixed size preset in the encoder/decoder. For example, when the size of the block ranges from 16 × 16 to 8 × 8, the partitioning may be performed only in the binary tree form. In this case, the split_flag may be a flag indicating whether partitioning is performed in the binary tree form.
After partitioning one block in the binary tree form, partitioning may be performed only in the binary tree form when the partitioned block is further partitioned.
The at least one indicator may not be signaled when the width or the height of the partitioned block is such that the block cannot be further partitioned.
In addition to the quadtree-based binary tree partitioning, the quadtree-based partitioning may be performed after the binary tree partitioning.
When a block is partitioned based on a quad tree form or a binary tree form, or both, a block corresponding to a leaf node according to a final partition of the block may be set as a single encoding/decoding unit. In other words, when a block having an arbitrary size or an arbitrary form is no longer partitioned, encoding/decoding may be performed on the corresponding block. In one embodiment, encoding/decoding processes such as prediction (e.g., inter prediction or intra prediction), transformation, and the like may be performed for blocks having an arbitrary size or an arbitrary form and corresponding to binary leaf nodes generated by quad-tree-form partitioning or binary-tree-form partitioning or both quad-tree-form partitioning and binary-tree-form partitioning.
Fig. 12 is a diagram illustrating an example of an encoding/decoding unit according to the partition form of a block. In the example shown in fig. 12, the solid lines are used to distinguish blocks generated by quad-tree partitioning, and the dotted lines are used to distinguish blocks generated by binary-tree partitioning. When it is assumed that the structure of the coding block is determined as in the example shown in fig. 12, the nodes finally partitioned by the solid lines and the dotted lines may be defined as binary leaf nodes. Encoding/decoding (e.g., intra prediction or inter prediction, first transform, second transform, quantization, entropy encoding/decoding, etc.) may be performed on a block corresponding to a binary leaf node in the block size or block form corresponding to that leaf node, without the block being additionally partitioned into prediction blocks or transform sub-blocks.
For convenience of explanation, in an embodiment to be described later, a partition form based on a block in a quad tree form or a binary tree form, or both, is defined as a block structure.
When encoding/decoding, the block structure of each color component may be the same, or the block structure of each color component may be different. In one embodiment, the block structure may be the same for the luma component and the chroma component or may be different for the luma component and the chroma component, depending on arbitrary coding parameter conditions. Here, the same block structure for the luminance component and the chrominance component may mean that block structure information determined for the luminance component is inherited to the chrominance component, or block structure information determined for the chrominance component is inherited to the luminance component. For example, the luminance signal and the chrominance signal may have the same block structure within an intra picture or in an intra slice or different block structures depending on the type of a current encoded/decoded picture or slice. Here, whether the block structures of the luma component and the chroma component constituting the intra picture or the intra slice are set to be the same or different may be determined by an encoding process that derives a rate distortion cost function from each block structure and selects a block structure in which the cost function becomes minimum.
The encoding apparatus may entropy-encode information indicating whether the same block structure is used for each color component and transmit the information to the decoding apparatus. Here, the information may be encoded in at least one of a sequence level (e.g., a Sequence Parameter Set (SPS)), a picture level (e.g., a Picture Parameter Set (PPS)), a slice header, a largest coding unit (LCU or CTU), and a coding unit (or coding block).
For example, encoding parameter information indicating whether a block structure of each color component is the same for an intra picture, an intra slice, an inter picture, or an inter slice may be transmitted through SPS or PPS.
In addition, encoding parameter information indicating whether a block structure of each color component is the same within an intra slice or an inter slice may be transmitted through a slice header.
Also, coding parameter information indicating whether the block structure of each color component is the same for a maximum coding unit or a coding unit may be transmitted in units of a maximum coding unit or a coding unit.
When the encoding/decoding target block satisfies a predetermined condition, block partitioning of the encoding/decoding target block may not be allowed. Accordingly, encoding/decoding of the block partition information of a block satisfying the predetermined condition may be omitted. Here, the predetermined condition may relate to at least one of a block size, a block form, and a block partition depth, and may indicate the size, form, or depth of blocks for which partitioning in a quad-tree form or a binary-tree form, or both, is allowed or not allowed. The block size or form may be a base value representing the size, form, or depth of blocks for which partitioning in the quad-tree form or the binary-tree form, or both, is allowed or not allowed. The block depth may represent a threshold of the block depth for which partitioning in the quad-tree form or the binary-tree form, or both, is allowed or not allowed. The block depth may be incremented by 1 each time partitioning in the quad-tree form or the binary-tree form, or both, is performed.
The block partition information may include at least one of information (e.g., split_flag) indicating whether to perform block partitioning, information (e.g., quadtree_flag or QB_flag) indicating whether to perform quad-tree partitioning, information (e.g., binarytree_flag or QB_flag) indicating whether to perform binary-tree partitioning, and information (e.g., Btype_flag) indicating the type of binary-tree partitioning.
For example, when it is assumed that the predetermined condition indicates that the block size is equal to or less than the base value and binary tree partitioning is not allowed for blocks satisfying the predetermined condition, encoding/decoding of at least one piece of information related to binary tree partitioning, such as the quad/binary tree form flag (QB_flag), the binary tree partition flag (binarytree_flag), and the binary tree partition type flag (Btype_flag), may be omitted for blocks having a block size equal to or less than the base value. When encoding of the quad/binary tree form flag (QB_flag) is omitted, the partition flag (split_flag) may be used to indicate whether quad tree partitioning is performed on a block.
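A decoder-side sketch of this conditional omission, with a hypothetical base-size condition and a stubbed bit reader, is given below.

```python
# Minimal sketch (hypothetical decoder-side check): binary-tree-related
# flags are read only when the block does not satisfy the predetermined
# condition (here: size <= base_size means binary tree partitioning is
# disallowed, so its flags are never present in the bitstream).

def read_partition_info(block_size, base_size, read_bit):
    info = {"split_flag": read_bit()}
    if block_size <= base_size:
        # binary tree disallowed: split_flag alone indicates quad-tree split
        info["mode"] = "quad" if info["split_flag"] else "none"
        return info
    if info["split_flag"]:
        info["QB_flag"] = read_bit()
        if info["QB_flag"] == 1:          # binary tree chosen
            info["Btype_flag"] = read_bit()
    return info

bits = iter([1, 1, 0])
print(read_partition_info(64, 16, lambda: next(bits)))
# {'split_flag': 1, 'QB_flag': 1, 'Btype_flag': 0}
```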
Not limited to the above example, it may be set that quad-tree partitioning is not allowed for blocks satisfying a predetermined condition. Here, for a block satisfying the predetermined condition, encoding/decoding of at least one piece of information related to quad-tree partitioning, for example, the quad/binary tree form flag (QB_flag) or the quad tree partition flag (quadtree_flag), may be omitted. When encoding/decoding of the quad/binary tree form flag (QB_flag) is omitted, the partition flag (split_flag) may be used to indicate whether binary tree partitioning is performed on a block.
In another embodiment, partitioning of any form may not be allowed for blocks satisfying a predetermined condition. Here, for a block satisfying the predetermined condition, no piece of partition information may be encoded/decoded.
Referring to the drawings, a process of determining whether to omit encoding/decoding of partition information will be described in detail.
Fig. 13 is a flowchart showing a process of determining whether to decode information related to the binary tree partition. For convenience of explanation, in the present embodiment, it is assumed that partitioning based on a binary tree form is not allowed for blocks satisfying a predetermined condition.
First, in step S1301, information about a predetermined condition may be obtained. Here, the information related to the predetermined condition may include at least one of a block size, a block form, and a partition depth. The predetermined condition may be set as whether the block size is equal to or greater than a threshold value, whether the block size is equal to or less than a threshold value, whether the block form is a preset form, whether the block depth is equal to or greater than a threshold value, or whether the block depth is equal to or less than a threshold value, based on information of the predetermined condition.
The information on the predetermined condition may be predefined in the encoder and the decoder. Here, the information related to the predetermined condition may represent at least one of a block size, a block form, and a block depth defining the predetermined condition. In one embodiment, the block size/form or the partition depth, from which encoding/decoding of the partition information is omitted, may have a fixed value predefined in the encoder and the decoder. Alternatively, the information related to the predetermined condition may be variously determined by encoding parameters indicating the size/form of the encoding/decoding target block or the partition depth of the block.
In another embodiment, the information related to the predetermined condition may be encoded/decoded in a sequence level, a picture level, a slice header, or a predetermined coding region unit. Here, the predetermined coding region may have a size/form smaller than that of the current encoded/decoded picture or slice, and may include a maximum coding unit (LCU or CTU) or a block having an arbitrary size or an arbitrary form included in the maximum coding unit (e.g., a block generated by performing quad-tree partitioning on the maximum coding unit). The information related to the predetermined condition may be expressed as a maximum size of the block or a minimum size of the block or both, or may be expressed as a maximum depth of the block or a minimum depth of the block or both.
The encoder may determine the block structure by comparing the rate-distortion of a result obtained by encoding based on both the quad-tree form and the binary-tree form with the rate-distortion of a result obtained by encoding based on the quad-tree form only. The encoder may encode the information related to the predetermined condition in consideration of the size, form, or depth of blocks for which binary tree partitioning is no longer performed according to the determined block structure. Further, the decoder may decode, from the bitstream, the information related to the predetermined condition under which binary tree partitioning is not allowed, and may determine whether the current block satisfies the predetermined condition based on the decoded information.
In step S1302, the decoder may determine whether the current block satisfies a predetermined condition. As a result, when the current block satisfies the predetermined condition, decoding of information related to the binary tree partition of the current block may be omitted.
Alternatively, when the current block does not satisfy the predetermined condition, information regarding a binary tree partition of the current block may be decoded according to whether or not the quadtree partition is performed on the current block at step S1303. For example, when the quad-tree partition is not performed on the current block, information related to the binary-tree partition of the current block may be decoded.
In other words, whether to encode/decode the block partition information of the current block may be determined by comparing whether the size, form, or depth of the current block corresponds to the size, form, or depth of the block according to a predetermined condition.
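The decision of Fig. 13 can be summarized in a short sketch; encoding the predetermined condition as a maximum block size is one possible choice (the description also allows form- and depth-based conditions).

```python
# Minimal sketch (hypothetical condition encoding) of the Fig. 13 decision:
# decode the predetermined condition, then skip binary tree partition
# information for any block that satisfies it.

def should_decode_bt_info(block, condition, quadtree_split):
    # condition example: {"max_size": 32} -> blocks of 32 x 32 or smaller
    # never carry binary tree partition information (S1302).
    if block["size"] <= condition["max_size"]:
        return False                      # decoding omitted
    return not quadtree_split             # S1303: only if no quad-tree split

print(should_decode_bt_info({"size": 16}, {"max_size": 32}, False))  # False
print(should_decode_bt_info({"size": 64}, {"max_size": 32}, False))  # True
```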
In another embodiment of the present invention, information indicating whether block partitioning is allowed for a block having an arbitrary size, an arbitrary form, or an arbitrary depth may be encoded/decoded. Here, the information indicating whether block partitioning is allowed may include information indicating whether a quad-tree partition exists (e.g., NoPresent_QuadTree_flag) or information indicating whether a binary-tree partition exists (e.g., NoPresent_BinaryTree_flag).
When it is indicated that block partitioning is not allowed for a block having an arbitrary size, an arbitrary form, or an arbitrary depth, block partitioning may not be allowed for the lower layer blocks in addition to the corresponding block. Here, a lower layer block may include at least one of a block having a block size smaller than the corresponding block, a block having the same block form as the corresponding block, a block having a partition depth greater than the corresponding block, and a lower layer node block of the corresponding block.
In one embodiment, when information indicating whether a binary tree partition exists for a block having an arbitrary size/form is signaled, and the information indicates that no binary tree partition exists, encoding/decoding of information related to binary tree partitioning, such as information indicating whether to perform binary tree partitioning (e.g., at least one of the quad/binary tree flag (QB_flag), the binary tree partition flag (binarytree_flag), and the binary tree partition type flag (Btype_flag)), may be omitted for blocks, other than that block, having a size/form smaller than that block.
Without being limited to the above example, information indicating whether a quad-tree partition exists for a block having an arbitrary size/form may be signaled in the same manner.
Information indicating whether block partitioning is allowed may be transmitted according to a predetermined coding region. Here, the predetermined coding region may have a size/form smaller than that of the current encoded/decoded picture or slice, and may include a block having an arbitrary size or an arbitrary form included in a maximum coding unit (LCU or CTU) or a coding unit (e.g., a block generated by performing quad-tree partitioning on the maximum coding unit). The encoder may determine a block structure by comparing the rate-distortion of a result obtained by encoding a block having an arbitrary size/form based on both the quad-tree form and the binary-tree form with the rate-distortion of a result obtained by encoding based on the quad-tree form only, and may determine whether to encode the information indicating that binary tree partitioning is allowed according to the determined block structure.
The information indicating whether the binary tree partition is allowed or not may be encoded/decoded by layer. In one embodiment, when the signaled information of the higher layer block indicates that block partitioning is allowed, information indicating whether block partitioning is allowed for a lower layer block generated by partitioning the higher layer block may be encoded/decoded.
In another embodiment, information of the size, form, or depth of the block, in which information indicating whether block partitioning is performed is signaled, may be encoded/decoded at a higher level. In one embodiment, the information of the size, form, or depth of the block may be transmitted in at least one of a sequence level, a picture level, and a slice header. Here, for a block corresponding to the size, form, or depth of a block signaled through a higher level, or for a higher layer block in the higher level, information indicating whether block partitioning is allowed may be signaled.
Fig. 14 is a flowchart showing a process of determining whether to decode information related to the binary tree partition. For convenience of explanation, in the present embodiment, it is assumed that only information indicating whether binary tree partitioning is allowed for the current block is signaled.
First, in step S1401, information indicating whether to perform binary tree partitioning may be decoded.
In step S1402, when the information indicates that binary tree partitioning is not allowed, decoding of the binary tree partition information of the current block may be omitted. Also, binary tree partition information may not be decoded for the lower layer blocks generated by partitioning the current block in a quad tree form.
Meanwhile, when the information indicates that binary tree partitioning is allowed in step S1402, the information regarding the binary tree partition may be decoded according to whether the quadtree partition is performed on the current block in step S1403. For example, when the quad-tree partition is not performed on the current block, information related to the binary-tree partition for the current block may be decoded. Also, for a lower layer block generated by performing a quadtree type or binary tree partitioning on the current block, information regarding the binary tree partition may be decoded according to whether the quadtree partitioning is performed on the lower layer block.
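The Fig. 14 flow can be sketched as follows. NoPresent_BinaryTree_flag is the 1-bit flag named later in this description; the surrounding parsing, the flag polarity, and the one-bit stand-in for the binary tree information are assumptions for illustration.

```python
# Minimal sketch of the Fig. 14 flow (NoPresent_BinaryTree_flag polarity
# and the parsing around it are hypothetical): when the flag says binary
# tree partitioning is not allowed, binary tree information is skipped for
# the block and for all lower layer blocks produced by quad-tree splits.

def decode_block(size, bt_allowed, read_bit):
    if bt_allowed is None:
        bt_allowed = read_bit() == 0      # S1401: 1 = "not present"
    split_qt = size > 8 and read_bit()    # simplified quad-tree decision
    if split_qt:
        for _ in range(4):                # S1402: flag propagates downward
            decode_block(size // 2, bt_allowed, read_bit)
    elif bt_allowed:
        read_bit()                        # S1403: binary tree info decoded

bits = iter([1, 1, 0, 0, 0, 0])           # flag = 1: no binary tree anywhere
decode_block(128, None, lambda: next(bits))
```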
Fig. 15 to 17 are diagrams showing an example of a case where binary tree partitioning is no longer performed for a block having a predetermined size or less.
In the example shown in fig. 15, the size/form of the maximum coding unit is assumed to be 128 × 128, and it is assumed that, as a result of the rate-distortion optimization performed by the encoding apparatus, binary tree partitioning is not performed and only quad tree partitioning exists in the maximum coding unit.
As shown in the example illustrated in fig. 16, when information indicating that binary tree partitioning is not performed on blocks of a predetermined size is not encoded/decoded, information indicating whether binary tree partitioning is performed may be encoded/decoded for each block on which quad tree partitioning is no longer to be performed.
However, as shown in the example illustrated in fig. 17, when information indicating that binary tree partitioning is not performed on blocks having a size of 128 × 128 or less is encoded/decoded, information indicating whether to perform binary tree partitioning need not be encoded/decoded for each block having a size of 128 × 128 or less. Therefore, the amount of information to be encoded is reduced, and the encoding/decoding efficiency is improved.
As described above with reference to fig. 13, the encoder may encode information of the size (e.g., information representing 128 × 128), form, or depth of a block for which binary tree partitioning is not allowed, and transmit the encoded information to the decoding apparatus. The decoding apparatus may decode, from the bitstream, information of a block size in which the binary tree partition is not performed, and no longer decode information on the binary tree partition of a block having a block size equal to or smaller than a size indicated by the decoded information.
In another embodiment, as described above with reference to fig. 14, the encoding apparatus may encode information indicating that binary tree partitioning is not allowed for a block having an arbitrary size on which binary tree partitioning is not performed, and transmit the encoded information to the decoding apparatus. Here, the information may be a 1-bit flag (e.g., NoPresent_BinaryTree_flag), but is not limited thereto. In the example shown in fig. 17, the NoPresent_BinaryTree_flag is signaled for the block size of 128 × 128.
In fig. 16 and 17, for example, the flag value is set to 1 when quad-tree or binary-tree partitioning is performed, and is set to 0 otherwise. However, the opposite assignment is also possible.
Embodiments related to disallowing block partitioning may be applied to both the luma component and the chroma component. Here, information indicating that block partitioning is not allowed (e.g., information indicating the size, form, or depth of blocks for which block partitioning is not allowed, or information indicating whether block partitioning is allowed) may be commonly applied to the luma component and the chroma component, or may be independently signaled for the luma component and the chroma component. When the information is entropy-encoded/decoded, any one of a truncated Rice binarization method, a K-th order exponential Golomb binarization method, a limited K-th order exponential Golomb binarization method, a fixed-length binarization method, a unary binarization method, and a truncated unary binarization method may be used as the binarization method. Further, after the information is binarized, it may be finally encoded/decoded by using CABAC (ae(v)).
Next, transformation and scanning of a residual signal of the current block will be described.
When encoding/decoding a residual signal of a current block, at least one piece of encoding information of the residual signal of the current block may be implicitly derived in an encoder/decoder by encoding information of a residual signal of an encoding/decoding block adjacent to the current block. Here, the encoding information of the residual signal may include information on a transform scheme (e.g., a transform scheme for a first transform and a second transform) of the residual signal and information of transform coefficients for scan quantization. Here, the quantized transform coefficient may mean that transform (e.g., first transform and second transform) and quantization are performed on a residual signal generated after intra prediction.
In detail, when the current block is encoded through intra prediction, the encoding information of the current block may be derived from a neighboring block adjacent to the current block based on the intra prediction mode of the current block. Alternatively, when the current block is encoded through inter prediction, the encoding information of the current block may be derived from a neighboring block adjacent to the current block based on the motion information of the current block. Hereinafter, the process of deriving the encoding information of the residual signal of the current block from neighboring blocks will be described in detail with reference to fig. 18 and 19, for the case where the current block is encoded by intra prediction and the case where it is encoded by inter prediction.
Fig. 18 is a flowchart illustrating a process of determining whether to derive encoding information of a residual signal of a current block from neighboring blocks when the current block is encoded through intra prediction.
First, in step S1801, it may be determined whether there is a neighboring block encoded in the same intra prediction mode as that of the current block. Here, the neighboring blocks of the current block may be included in the same picture as the current block (in other words, the current picture) and represent blocks encoded/decoded before the current block. In one embodiment, the neighboring blocks may include blocks adjacent to the current block among blocks encoded/decoded before the current block. Here, the blocks adjacent to the current block may include at least one of blocks adjacent to a boundary (e.g., a left side boundary or an upper side boundary) of the current block and blocks adjacent to a corner (e.g., an upper left corner, an upper right corner, or a lower left corner) of the current block.
When there are neighboring blocks encoded in the same intra prediction mode as that of the current block, the encoding information of the residual signal for the corresponding neighboring blocks may be derived as the encoding information of the current block at step S1802. In detail, at least one of the first transform, the second transform, and the scan information of the current block may be derived from neighboring blocks having the same intra prediction mode as that of the current block.
In one embodiment, when the intra prediction mode of the current block is the same as the intra prediction modes of the neighboring blocks of the current block and the corresponding neighboring blocks skip the first transform (transform skip), the residual signal of the current coding block may also skip the first transform. When the first transform of the current block is skipped, the second transform of the current block may also be skipped.
Alternatively, when the intra prediction mode of the current block is the same as that of a neighboring block of the current block, the first transform of the current block in the horizontal and vertical directions may be set to be the same as the first transform applied to the neighboring block having the same intra prediction mode. Accordingly, encoding/decoding of the encoding information (e.g., the transform information or transform index of the first transform in the horizontal and vertical directions) required to perform the first transform on the residual signal of the current block may be omitted.
For example, when the intra prediction mode of the current block is determined as number 23 (mode 23), and the intra prediction mode of at least one neighboring block adjacent to the current block is determined as number 23 (mode 23), the first transform applied to the residual signal of the neighboring block having the intra prediction mode of number 23 may be used as the first transform for the residual signal of the current block. For example, when the first transform in the horizontal direction for the residual signal of the neighboring block having the same intra prediction mode as that of the current block is performed through the DCT-V and the first transform in the vertical direction for the residual signal is performed through the DST-VII, the first transform in the horizontal direction for the residual signal of the current block is performed by using the DCT-V and the first transform in the vertical direction is performed by using the DST-VII.
In another embodiment, when the intra prediction mode of the current block is the same as the intra prediction modes of the neighboring blocks of the current block, the secondary transform of the current block may be set to be the same as the secondary transform applied to the neighboring blocks having the same intra prediction mode as the intra prediction mode of the current block. Accordingly, encoding/decoding of encoding information (e.g., transform information (or transform index) of a secondary transform) required to perform the secondary transform on the residual signal of the current block may be omitted.
For example, when the intra prediction mode of the current block is determined as number 35 (mode 35), and the intra prediction mode of at least one neighboring block adjacent to the current block is also determined as number 35 (mode 35), the secondary transform applied to the residual signal of the neighboring block having the intra prediction mode of number 35 may be used as the secondary transform for the residual signal of the current block.
In another embodiment, when the intra prediction mode of the current block is the same as that of a neighboring block of the current block, the scan order of the current block may be set to be the same as the scan order of the neighboring block having the same intra prediction mode. Accordingly, encoding/decoding of the encoding information (e.g., a scan index representing one of a diagonal, horizontal, and vertical scan order) required to scan the quantized transform coefficients of the residual signal of the current block may be omitted.
The present embodiment is not limited to the above-described example; at least two of the first transform, the second transform, and the scan order of a neighboring block having the same intra prediction mode as that of the current block may be derived as the encoding information of the current block.
In one embodiment, the first transform and the second transform of the neighboring blocks having the same intra prediction mode as that of the current block may be applied to the current block, or the first transform and the scan order of the neighboring blocks or the second transform and the scan order of the neighboring blocks may be applied to the current block. Alternatively, all of the first transform, the second transform, and the scan order of neighboring blocks having the same intra prediction mode as that of the current block may be applied to the current block.
When a plurality of neighboring blocks having the same intra prediction mode as the current block are adjacent to the current block, the encoding information of the current block may be derived based on priorities between the neighboring blocks. In one embodiment, when both a block adjacent to the left side of the current block and a block adjacent to the upper side of the current block have the same intra prediction mode as that of the current block, and the left block has a higher priority than the upper block, the encoding information of the current block may be derived based on the encoding information of the block adjacent to the left side of the current block.
In another embodiment, when a plurality of neighboring blocks having the same intra prediction mode as that of the current block are adjacent to the current block, information for identifying the neighboring blocks, which are used to derive encoding information of the current block, may be signaled through a bitstream. Here, encoding information of a residual signal of the current block may be derived from neighboring blocks indicated by information (e.g., neighboring block indexes) for identifying the neighboring blocks.
When there is no neighboring block having the same intra prediction mode as that of the current block, the encoding information of the residual signal of the current block may be entropy-encoded/decoded at step S1803. In one embodiment, when there is no neighboring block having the same intra prediction mode as that of the current block, at least one of transform information (or transform index) of a first transform, transform information (or transform index) of a second transform, and information (or scan index) of a scan order of the current block may be entropy-encoded/decoded.
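The decision flow of fig. 18 (steps S1801 to S1803) may be sketched as follows; the data layout and helper names are illustrative assumptions, not a normative implementation:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ResidualInfo:
    first_transform: str    # e.g. "DCT-V (hor) / DST-VII (ver)"
    second_transform: str   # e.g. an index identifying a secondary transform
    scan_order: str         # "diag", "hor", or "ver"

def derive_intra_residual_info(cur_mode: int,
                               neighbors: List[Tuple[int, ResidualInfo]]
                               ) -> Optional[ResidualInfo]:
    """Fig. 18 sketch: scan the neighbors in priority order (e.g. left,
    then upper) and reuse the residual coding information of the first
    one whose intra prediction mode equals that of the current block
    (S1801/S1802). Returning None means the information must instead be
    entropy-decoded from the bitstream (S1803)."""
    for nb_mode, nb_info in neighbors:
        if nb_mode == cur_mode:
            return nb_info
    return None

# A mode-23 neighbor donates its DCT-V/DST-VII transforms and scan order.
nb = (23, ResidualInfo("DCT-V (hor) / DST-VII (ver)", "set-1", "diag"))
assert derive_intra_residual_info(23, [nb]) is nb[1]
```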
In the above-described embodiment, the case where the intra prediction mode of the current block is the same as that of a neighboring block and the encoding information of the residual signal of the current block is derived from that neighboring block has been described. In another embodiment, second encoding information of the residual signal of the current block may be derived from a neighboring block whose first encoding information of the residual signal is the same as the first encoding information of the current block. Here, the first encoding information and the second encoding information may each include at least one of information of the first transform, information of the second transform, and the scan order.
In one embodiment, when there is at least one neighboring block using the same first transform as the first transform determined for the current block, the secondary transform of the current block may be set to the secondary transform applied to that neighboring block. Here, encoding/decoding of the encoding information required to perform the secondary transform on the residual signal of the current block may be omitted. For example, it is assumed that the first transform in the horizontal direction for the residual signal of the current block is determined as DCT-V and the first transform in the vertical direction is determined as DST-VII. When DCT-V is determined as the first transform in the horizontal direction and DST-VII as the first transform in the vertical direction of at least one neighboring block of the current block, the secondary transform of that neighboring block, to which the same first transform as the first transform of the current block is applied, may be applied as the secondary transform of the current block.
Also, the scan order of a neighboring block using the same first transform as the first transform of the current block may be applied as the scan order of the current block. Also, the secondary transform and scan order of a neighboring block using the same first transform as the first transform of the current block may be applied as the secondary transform and scan order of the current block.
In the above-described embodiment, it has been described that at least one of the secondary transform and the scan order of the current block is derived from a neighboring block using the same first transform as the first transform of the current block. However, at least one of the first transform and the secondary transform of the current block may also be derived from a neighboring block using the same secondary transform as the secondary transform of the current block, or from a neighboring block using the same scan order as the scan order of the current block.
Second encoding information of the current block may be derived from a neighboring block having the same intra prediction mode and the same first encoding information as the current block.
In one embodiment, when there is at least one neighboring block using the same intra prediction mode and the same first transform as those determined for the current block, the secondary transform of the current block may be set to the secondary transform applied to that neighboring block. Here, encoding/decoding of the encoding information required to perform the secondary transform on the residual signal of the current block may be omitted.
Also, the scan order of a neighboring block using the same intra prediction mode and first transform as those of the current block may be applied as the scan order of the current block. Alternatively, the secondary transform and scan order of a neighboring block using the same intra prediction mode and first transform as those of the current block may be applied to the current block.
In the above-described embodiment, at least one of the secondary transform and the scan order of the current block is derived from neighboring blocks that have the same intra prediction mode as the current block and use the same primary transform as the primary transform of the current block. In addition to this, at least one of the first transform and the second transform of the current block may be derived from neighboring blocks having the same intra prediction mode as that of the current block and using the same second transform as that of the current block, or may be derived from neighboring blocks having the same intra prediction mode as that of the current block and using the same scan order as that of the current block.
FIG. 19 is a flowchart illustrating a process of determining whether to derive encoding information of a residual signal of a current block from neighboring blocks when the current block is encoded through inter prediction.
First, in step S1901, it may be determined whether the inter prediction mode of the current block is the merge mode. When the inter prediction mode of the current block is the merge mode, in order to derive motion information of the current block, a neighboring block merged with the current block may be determined in step S1902. In one embodiment, the neighbor block merged with the current block may be determined by a merge index representing a neighbor block to be merged with the current block in the merge candidate list. Here, the neighboring blocks of the current block may include neighboring blocks spatially neighboring the current block and neighboring blocks temporally neighboring the current block.
When the neighboring block merged with the current block is determined, encoding information of a residual signal of the neighboring block merged with the current block may be derived as encoding information of a residual signal of the current block in step S1903. In one embodiment, at least one of the first transform, the second transform, and the scan order of the current block may be identically set to at least one of the first transform, the second transform, and the scan order of the neighboring block merged with the current block.
When the inter prediction mode of the current block is not the merge mode, it may be determined whether there is a neighboring block having the same motion information as that of the current block among neighboring blocks of the current block in step S1904. Here, the motion information may include at least one of a motion vector, a reference picture index, and a reference picture direction.
When there is a neighboring block having the same motion information as that of the current block, the encoding information of the residual signal of that neighboring block may be derived as the encoding information of the residual signal of the current block in step S1905. In one embodiment, at least one of the first transform, the second transform, and the scan order of the current block may be set identically to at least one of the first transform, the second transform, and the scan order of a neighboring block having the same motion information (at least one of the motion vector, the reference picture index, and the reference picture direction) as the current block.
When there is no neighboring block having the same motion information as that of the current block, the encoding information of the residual signal of the current block may be entropy-encoded/decoded in step S1906. In one embodiment, when there is no neighboring block having the same motion information as that of the current block, at least one of transform information (or transform index) of a first transform, transform information (or transform index) of a second transform, and information (or scan index) of a scan order of the current block may be entropy-encoded/decoded.
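The decision flow of fig. 19 (steps S1901 to S1906) may be sketched in the same illustrative manner; the dictionary layout is an assumption made for readability:

```python
def derive_inter_residual_info(cur, neighbors, merge_target=None):
    """Fig. 19 sketch. Blocks are plain dicts with 'motion' and
    'residual_info' keys (an illustrative layout). In merge mode the
    neighbor merged with the current block donates its residual coding
    information (S1901-S1903); otherwise a neighbor whose motion
    information (motion vector, reference picture index, prediction
    direction) matches the current block is searched (S1904/S1905).
    None means the information is entropy-decoded instead (S1906)."""
    if cur["mode"] == "merge":                        # S1901
        return merge_target["residual_info"]          # S1902/S1903
    for nb in neighbors:                              # S1904
        if nb["motion"] == cur["motion"]:
            return nb["residual_info"]                # S1905
    return None                                       # S1906

cur = {"mode": "amvp", "motion": ((2, 0), 0, "L0")}
nb = {"motion": ((2, 0), 0, "L0"), "residual_info": "DST-VII/diag"}
assert derive_inter_residual_info(cur, [nb]) == "DST-VII/diag"
```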
In the example shown in fig. 19, the neighboring block used to derive the encoding information of the residual signal of the current block is adaptively determined according to whether the inter prediction mode of the current block is the merge mode. Unlike the example shown in fig. 19, the encoding information of the residual signal of the current block may be derived from a neighboring block only when the inter prediction mode of the current block is the merge mode. Alternatively, the encoding information of the current block may be derived from a neighboring block having the same motion information as that of the current block, regardless of whether the inter prediction mode of the current block is the merge mode.
In the above-described embodiments, the case where the motion vector of the current block is the same as the motion vector of a neighboring block and the encoding information of the residual signal of the current block is derived from that neighboring block has been described. In another embodiment, second encoding information of the residual signal of the current block may be derived from a neighboring block having the same first encoding information of the residual signal, or the same motion vector, as the current block.
After deriving the motion information of the current block as in the above-described embodiment, encoding information of the current block may be derived from neighboring blocks based on whether the motion information of the current block is the same as the motion information of the neighboring blocks. Furthermore, encoding information of the current block may be derived based on motion information of neighboring blocks without considering motion information of the current block.
The above-described encoding information (such as the first transform, the second transform, and the scan order) may be encoded/decoded based on at least one piece of information indicating whether any one of a predefined type (e.g., a predefined transform type or a predefined scan type) and a residual type (e.g., a residual transform type or a residual scan type) other than the predefined type is used.
In one embodiment, when the residual signal is generated by intra prediction or inter prediction, or both, information indicating whether a predefined transform type is applied to the residual signal may be encoded. Here, the predefined transform type may be a transform type (e.g., DCT-II) mainly used when performing a transform on the residual signal, but is not limited thereto. The information may be a 1-bit flag (e.g., transform flag, TM flag). In one embodiment, when the TM flag is 0 (or 1), it may indicate that a predefined transform type is applied to the residual signal. When the TM flag is 1 (or 0), it may indicate that other transform types than the predefined transform type are applied to the residual signal. Further, the information may be configured with a flag having 2 bits or more, a first bit may indicate whether a predefined transform type is used for a first transform, and a second bit may indicate whether the predefined transform type is used for a second transform.
When the information indicates that another transform type other than the predefined transform type is applied to the residual signal, information for specifying any one of the residual transform types may be encoded. Here, the residual transform type may mean a remaining transform type except for a predefined transform type among transform types that may be applied to the residual signal. For example, when the predefined transform type is DCT-II, the residual transform type may include at least one of DCT-V, DCT-VIII, DST-I, and DST-VII. The information may be index information (TM idx) specifying any one of the residual transform types, and the index information may be any positive integer. For example, TM idx 1 may represent DCT-V, TM idx 2 may represent DCT-VIII, TM idx 3 may represent DST-I, and TM idx 4 may represent DST-VII.
The index information may indicate a transform type combination for the horizontal direction and the vertical direction of the residual signal. In other words, the 1D transform type for the horizontal/vertical direction may be determined by a single piece of index information. For example, when the TM flag is 1 while the TM idx is 1, a transform type combination matching the TM idx 1 may be determined as a transform type for the horizontal direction and the vertical direction of the current block. In one embodiment, when TM idx indicates DCT-V for the horizontal direction and indicates DCT-VIII for the vertical direction, DCT-V and DCT-VIII may be determined as the horizontal direction transform type and the vertical direction transform type of the current block, respectively.
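The TM flag / TM idx signaling described above amounts to a two-level lookup, sketched below with the example index mapping given above (TM idx 1 to 4 for DCT-V, DCT-VIII, DST-I, and DST-VII); the function name is illustrative. In the combination variant just described, the dictionary values would instead be (horizontal, vertical) pairs:

```python
from typing import Optional

PREDEFINED_TRANSFORM = "DCT-II"
RESIDUAL_TRANSFORMS = {1: "DCT-V", 2: "DCT-VIII", 3: "DST-I", 4: "DST-VII"}

def select_transform(tm_flag: int, tm_idx: Optional[int]) -> str:
    """TM flag == 0: the predefined type is used and no index is coded.
    TM flag == 1: TM idx selects one of the residual transform types
    (the opposite flag polarity is equally possible, as noted above)."""
    if tm_flag == 0:
        return PREDEFINED_TRANSFORM
    return RESIDUAL_TRANSFORMS[tm_idx]

assert select_transform(0, None) == "DCT-II"
assert select_transform(1, 4) == "DST-VII"
```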
When determining the encoding parameters of the current block, information specifying whether the predefined type is used, or which residual type is used, may be derived from a neighboring block of the current block. For example, at least one of the information (TM flag) indicating whether the predefined transform type is applied to the current block and the information (TM idx) specifying one of the residual transform types may be derived from a neighboring block of the current block.
In one embodiment, at least one of the TM flag and TM idx of the current block may be derived as the same value as that of the neighboring block of the current block.
Also, when the TM flag of at least one neighboring block of the current block is 1, encoding/decoding may be performed by implicitly assuming that the TM flag of the current block is 1. Here, the TM idx of the current block may be explicitly transmitted through the bitstream or may be implicitly derived from neighboring blocks.
As described by way of example, at least one of the information (TM flag) indicating whether the predefined transform type is applied to the current block and the information (TM idx) specifying one of the residual transform types may be derived from a neighboring block used when performing intra prediction or inter prediction on the current block.
In one embodiment, when the inter prediction mode of the current block is the merge mode, the merge candidate list may be newly configured in consideration of at least one of the TM flag and the TM idx. The newly configured merge candidate list may include merge candidates in which at least one of the TM flag and the TM idx has a different value. In one embodiment, the merge candidate list may be configured to have a first merge candidate and a second merge candidate, where the first and second merge candidates have the same motion information but differ in the TM flag, the TM idx, or both. At least one of the TM flag and the TM idx of the current block may be determined to be the same as that of the merge candidate indicated by the merge index (Merge_idx). Accordingly, the motion information (motion vector, reference picture index, inter prediction direction indicator) and the TM flag or TM idx or both of the current block may be encoded/decoded based on the merge mode.
Here, information indicating that the merge candidate list is newly configured may be explicitly transmitted through the bitstream. The transmitted information may be a 1-bit flag, but is not limited thereto. Further, when the TM flag of at least one neighboring block of the current block is 1, it may be implicitly recognized that the merge candidate list is newly configured. Here, the neighboring block may be the block whose TM flag first becomes 1 according to a predetermined neighboring-block scanning order, or may be a block at a predefined location.
In the above-described embodiments, a method of deriving information for determining a transform type (e.g., the TM flag or TM idx or both) from neighboring blocks of the current block has been described. The embodiments may be applied to determining the transform type of the first transform, the transform type of the second transform, or both, of the current block. In other words, at least one of the transform information of the first transform (e.g., a first TM flag or first TM idx or both) and the transform information of the second transform (e.g., a second TM flag or second TM idx or both) may be derived from neighboring blocks of the current block.
Also, in addition to the merge candidate list generated based on motion information, a merge candidate list may be generated based on transform information. In one embodiment, when the merge candidate list generated based on the motion information of neighboring blocks is defined as a "first merge candidate list" and the merge candidate list generated based on the transform information of neighboring blocks is defined as a "second merge candidate list", the motion information of the current block may be derived from the merge candidate within the first merge candidate list specified by a first merge index, while the transform information of the current block may be derived from the merge candidate within the second merge candidate list specified by a second merge index.
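A minimal sketch of this two-list variant follows; the list contents and indices are illustrative:

```python
def derive_from_two_merge_lists(first_list, second_list,
                                merge_idx_1, merge_idx_2):
    """Sketch: the first merge candidate list (built from neighbors'
    motion information) supplies the motion information via the first
    merge index, while the second merge candidate list (built from
    neighbors' transform information) supplies the transform
    information via the independently signaled second merge index."""
    motion_info = first_list[merge_idx_1]
    transform_info = second_list[merge_idx_2]
    return motion_info, transform_info

motion, transform = derive_from_two_merge_lists(
    [((1, 1), 0)], ["DCT-V/DST-VII"], 0, 0)
```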
In addition to this, information for determining the scan order of the current block (e.g., a scan flag or scan idx or both) may be derived from neighboring blocks of the current block. Here, the scan flag may indicate whether the scan order of the current block is the same as a predefined scan order, and the scan idx may be information indicating one of the residual scan orders.
According to another embodiment of the present invention, the same coding information may be applied to all blocks located within a signaling block within a current coded/decoded picture or slice. Here, the signaling block may represent an area having a size smaller than at least one of a horizontal resolution or a vertical resolution of the current picture or the current slice. In other words, a signaling block may be defined as a predetermined area having a size smaller than that of a current picture or a current slice.
The information of the signaling block may be transmitted through at least one of a sequence unit, a picture unit, and a slice header. In one embodiment, at least one of the size, form, or position of the signaling block may be transmitted through at least one of a sequence parameter set, a picture parameter set, and a slice header. Alternatively, the information of the signaling block may be implicitly derived from the coding information of the current block or a neighboring block adjacent to the current block. The signaling block may have a square or rectangular form, but is not limited thereto.
The coding information of the signaling block may be applied to all blocks included in the signaling block. In one embodiment, at least one of the first transformation, the second transformation, and the scanning order may be identically set to all blocks included in the signaling block. The coding information applied to all blocks included in the signaling block may be transmitted through a bitstream. Alternatively, the coding information of a specific location block within the signaling block may be applied to all blocks included in the signaling block.
In the above-described embodiments, it has been described that all blocks included in a signaling block have the same coding information. In another embodiment, blocks satisfying a predetermined condition among blocks included in the signaling block may be set to have the same encoding information. Here, the predetermined condition may be defined according to at least one of a size, a form, or a depth of the block. In one embodiment, at least one of the first transformation, the second transformation, and the scanning order may be identically set to a block having a predetermined size or less (e.g., a block having a size of 4 × 4 or less) among all blocks included in the signaling block.
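The following sketch illustrates applying shared coding information within a signaling block under a size condition; the size threshold follows the 4 × 4 example given above, and the data layout is an illustrative assumption:

```python
def apply_signaling_block_info(blocks, shared_info,
                               max_width=4, max_height=4):
    """Sketch: within one signaling block, every contained block that
    satisfies the predetermined condition (here, a size of 4x4 or less)
    inherits the same first transform, secondary transform, and scan
    order, so nothing is signaled per block."""
    for blk in blocks:
        if blk["w"] <= max_width and blk["h"] <= max_height:
            blk["residual_info"] = shared_info
    return blocks

blocks = [{"w": 4, "h": 4}, {"w": 8, "h": 8}]
apply_signaling_block_info(blocks, "DCT-II/diag")
assert "residual_info" in blocks[0] and "residual_info" not in blocks[1]
```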
The described embodiments of obtaining the encoding information of the current block may be applied to the luminance component and the chrominance component. Further, by using at least one of the embodiments, information indicating that at least one of the first transform, the secondary transform, and the scan is performed on the residual signal of the current block may be encoded/decoded. When the above information is entropy-encoded/decoded, at least one of a truncated Rice binarization method, a K-th order exponential Golomb binarization method, a limited K-th order exponential Golomb binarization method, a fixed-length binarization method, a unary binarization method, and a truncated unary binarization method may be used as the entropy encoding method. Further, after the information is binarized, the information may be finally encoded/decoded by using CABAC (ae(v)). Alternatively, the encoding information of the current block may be implicitly derived by using at least one of the size and the form of the current block.
Next, encoding/decoding of motion vector information will be described in detail.
When the current block is encoded through inter prediction, the encoder may transmit, to the decoder, a motion vector difference (MVD) representing the difference between the motion vector of the current block and a motion vector of an encoded block adjacent to the current block.
The decoder may derive a motion vector encoded adjacent to the current block as a motion vector candidate of the current block. In detail, the decoder may derive motion vector candidates from at least one of a decoded temporal motion vector and a decoded spatial motion vector of the current block, and configure a motion vector candidate list (MVP list).
The encoder may transmit information (e.g., an MVP list index) indicating which of the motion vector candidates included in the motion vector candidate list is used as the motion vector predictor from which the motion vector difference is derived. Then, the decoding apparatus may determine the motion vector candidate indicated by the MVP list index as the motion vector predictor, and derive the motion vector of the current block by using the motion vector predictor and the motion vector difference.
Based on the above explanation, a method of encoding/decoding motion vector information of a current block according to the present invention will be described in detail.
Fig. 20 is a flowchart showing a decoding process of a motion vector of a current block.
First, in step S2001, a spatial motion vector candidate of the current block may be derived. The spatial motion vector candidate of the current block may be derived from an encoded/decoded block included in the same picture as the picture including the current block.
Fig. 21 is a diagram showing an example of deriving a spatial motion vector candidate.
As an example shown in fig. 21, the spatial motion vector of the current block may be derived from a block B1 adjacent to the upper side of the current block X, a block A1 adjacent to the left side of the current block, a block B0 adjacent to the upper right corner of the current block, a block B2 located at the upper left corner, and a block A0 adjacent to the lower left corner of the current block. A spatial motion vector derived from a neighboring block of the current block may be determined as a spatial motion vector candidate of the current block.
Here, the spatial motion vector candidates may be derived in a predetermined order. In one embodiment, whether a motion vector exists may be determined for each block in the order of A0, A1, B0, B1, and B2. When a motion vector exists in a neighboring block, the motion vector of the corresponding neighboring block may be determined as a spatial motion vector candidate.
When the reference picture of a neighboring block and the reference picture of the current block are different, a motion vector obtained by scaling the motion vector of the neighboring block, using the distance between the current picture and the reference picture referred to by the neighboring block and the distance between the current picture and the reference picture referred to by the current block, may be determined as a spatial motion vector candidate of the current block.
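This scaling weights the neighboring motion vector by the ratio of the two temporal distances. The following simplified sketch uses floating-point arithmetic; actual codecs implement it with fixed-point arithmetic and clipping:

```python
def scale_spatial_mv(nb_mv, cur_poc, cur_ref_poc, nb_ref_poc):
    """Scale a neighbor's motion vector by the ratio of temporal
    distances: tb = distance(current picture, reference picture of the
    current block), td = distance(current picture, reference picture of
    the neighboring block). A simplified sketch of the scaling above."""
    tb = cur_poc - cur_ref_poc
    td = cur_poc - nb_ref_poc
    scale = tb / td
    return (round(nb_mv[0] * scale), round(nb_mv[1] * scale))

# A neighbor vector (8, -4) pointing 2 pictures back, rescaled to a
# reference 1 picture back, becomes (4, -2).
assert scale_spatial_mv((8, -4), cur_poc=10, cur_ref_poc=9,
                        nb_ref_poc=8) == (4, -2)
```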
Subsequently, in step S2002, a temporal motion vector candidate for the current block may be derived. The temporal motion vector of the current block may be derived from reconstructed blocks within the co-located picture.
Fig. 22 is a diagram showing an example of deriving a temporal motion vector candidate.
As an example shown in fig. 22, the temporal motion vector of the current block may be derived from a block existing at the H position outside the co-located block C, which corresponds to the spatially same position as the current block X within the co-located picture of the current picture, or from a block existing at the C3 position inside the co-located block C. The temporal motion vector candidate may be derived from the block at the H position and the block at the C3 position in order. In one embodiment, when a motion vector can be derived from the block at the H position, the temporal motion vector candidate is derived from the block at the H position. Otherwise, when a motion vector cannot be derived from the block at the H position, the temporal motion vector candidate may be derived from the block at the C3 position. When the block at the H position or the C3 position is encoded by intra prediction, no temporal motion vector candidate of the current block is derived.
In addition to the example shown in fig. 22, at least one temporal motion vector candidate of the current block may be derived from a co-located block, or a neighboring block of the co-located block, included in the co-located picture indicated by motion information obtained for the current block. Here, the motion information may include at least one of a picture index indicating the co-located picture and a motion vector indicating the co-located block within the co-located picture. The motion information for specifying the co-located picture and the co-located block may be additionally signaled for the current block.
The temporal motion vector candidate of the current block may be obtained in a sub-block unit having a size smaller than that of the current block. For example, when the size of the current block is 8 × 8, the temporal motion vector candidates may be obtained in sub-block units having a smaller size (such as 2 × 2, 4 × 4, 8 × 4, 4 × 8, etc.) than the current block. The sub-blocks may have a square or rectangular form. In addition, the size or form of the sub-block may be preset in the encoder/decoder, or may be determined according to the size or form of the current block.
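The H-then-C3 fallback described with reference to fig. 22 may be sketched as follows; motion data is modeled as a simple dictionary for illustration:

```python
def derive_temporal_mv(col_motion):
    """Fig. 22 sketch: the H position (outside the co-located block) is
    checked first; if no motion vector can be derived there (stored as
    None, e.g. the block was intra coded), the C3 position inside the
    co-located block is tried. Returning None means no temporal
    candidate is derived for the current block."""
    for pos in ("H", "C3"):
        mv = col_motion.get(pos)
        if mv is not None:
            return mv
    return None

# H is intra coded, so the C3 vector is used.
assert derive_temporal_mv({"H": None, "C3": (3, 1)}) == (3, 1)
```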
Subsequently, in step S2003, a motion vector candidate list including at least one of a spatial motion vector candidate and a temporal motion vector candidate may be generated.
Here, the motion vector candidate list may be configured to include at least one temporal motion vector candidate. In one embodiment, when the number of motion vector candidates that can be included in the motion vector candidate list is N (here, N is a positive integer greater than 0), the motion vector candidate list may be configured to always include at least one temporal motion vector candidate. Although a maximum of N spatial motion vector candidates different from each other may be derived, at least one of the N spatial motion vector candidates may be removed from the motion vector candidate list by an arbitrary similarity determination, so that a temporal motion vector candidate can be included in the motion vector candidate list. Here, the arbitrary similarity determination may represent a method of combining at least two spatial motion vectors into a single spatial motion vector, by using a maximum value, a minimum value, an average value, a median value, or an arbitrary weighted sum, when the spatial motion vectors have different values but the difference between them is not large. The number of spatial motion vector candidates can be reduced by using such a similarity determination.
Alternatively, when the N spatial motion vector candidates are included in the motion vector candidate list according to a predetermined priority, at least one of the spatial motion vector candidates may be removed from the motion vector candidate list in a reverse order of the predetermined priority. In other words, at least one of the spatial motion vector candidates may be removed from the motion vector candidate list in order from back to front. Thus, the temporal motion vector candidate may be included in the motion vector candidate list.
Whether to remove a spatial motion vector candidate from the motion vector candidate list described above may be determined according to whether a temporal motion vector candidate is used. Further, the number of spatial motion vector candidates removed from the motion vector candidate list may be determined according to the number of temporal motion vector candidates for the current block or the number of temporal motion vector candidates available for the current block.
Further, the number of motion vector candidates that can be included in the motion vector candidate list may be increased by 1 (in other words, increased to N + 1), so that temporal motion vector candidates are included in the motion vector candidate list.
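The list construction of step S2003, with the guarantee that a temporal candidate can be included, may be sketched as follows; exact-match pruning stands in for the similarity determination described above:

```python
def build_mvp_list(spatial, temporal, n=2):
    """Sketch of S2003: duplicate spatial candidates are pruned first;
    if the list is still full, the lowest-priority spatial candidate is
    dropped so that at least one temporal candidate is always included."""
    mvp = []
    for mv in spatial:
        if mv not in mvp:          # similarity check (here: exact match)
            mvp.append(mv)
    mvp = mvp[:n]
    if temporal:
        if len(mvp) == n:
            mvp.pop()              # remove in reverse priority order
        mvp.append(temporal[0])
    return mvp

# Two identical spatial candidates collapse to one; the temporal
# candidate then fits without dropping anything.
assert build_mvp_list([(1, 0), (1, 0)], [(2, 2)]) == [(1, 0), (2, 2)]
```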
Subsequently, in step S2004, any one of the motion vector candidates included in the motion vector candidate list may be determined as a motion vector predictor. In one embodiment, the decoder may determine the motion vector predictor of the current block based on information (e.g., MVP list index) specifying any one of the motion vector candidates included in the motion vector candidate list.
In step S2005, when the motion vector predictor of the current block is determined, the motion vector of the current block may be obtained by using the motion vector difference. The motion vector difference may represent a difference between a motion vector of the current block and a motion vector predictor of the current block. The motion vector difference of the current block may be entropy-encoded/decoded.
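Steps S2004 and S2005 reduce to an index lookup followed by an addition, as the following sketch shows:

```python
def reconstruct_mv(mvp_list, mvp_idx, mvd):
    """S2004/S2005 sketch: the MVP list index selects the motion vector
    predictor; adding the decoded motion vector difference yields the
    motion vector of the current block."""
    mvp = mvp_list[mvp_idx]                      # S2004
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])    # S2005

assert reconstruct_mv([(4, 4), (0, 0)], 0, (1, -1)) == (5, 3)
```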
According to an embodiment of the present invention, in order to reduce the amount of information of the motion vector difference, the motion vector difference of the current block may be encoded by using the motion vector difference of a reconstructed block that is adjacent to the current block and encoded through inter prediction. In one embodiment, a second motion vector difference of the current block may be encoded, where the second motion vector difference represents the difference between the motion vector difference of the current block (the difference between the motion vector of the current block and the motion vector predictor) and the motion vector difference of a reconstructed block adjacent to the current block and encoded through inter prediction.
Fig. 23 is a diagram illustrating the derivation of the second motion vector difference.
Assume that the Motion Vector Difference (MVD) of the current block (block 2) is (5,5). Here, the second motion vector difference of the current block may be encoded by using a motion vector difference of an upper block (block 1) located at an upper side of the current block.
In one embodiment, when it is assumed that the motion vector difference of the upper side block is (5,5), since the motion vector difference of the current block and the motion vector difference of the upper side block are the same, the second motion vector difference of the current block becomes (0,0). When the second motion vector difference (0,0) is encoded instead of the motion vector difference (5,5), the amount of information for encoding the motion vector difference of the current block may be reduced.
Also, when there is a block having the same motion vector difference as that of the current block, the motion vector difference of the current block may be derived from neighboring blocks without transmitting the motion vector difference of the current block.
As in the above-described example, the position of the neighboring block used to derive the second motion vector difference of the current block, or information indicating the position of the neighboring block having the same motion vector difference as that of the current block, may be explicitly transmitted through the bitstream. In one embodiment, information (e.g., an MVD index) for identifying, among the neighboring blocks of the current block, the neighboring block used to derive the second motion vector difference or the neighboring block having the same motion vector difference as that of the current block may be transmitted to the decoder through the bitstream.
In another embodiment, the position of the neighboring block used to derive the second motion vector difference of the current block, or information indicating the position of the neighboring block having the same motion vector difference as that of the current block, may be implicitly derived in the encoder/decoder through the same process. In one embodiment, the motion vector difference of the neighboring block used as the motion vector predictor (MVP) of the current block may be used as the motion vector difference predictor (MVD predictor) for deriving the second motion vector difference of the current block.
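The derivation of fig. 23 may be sketched as follows; the decoder inverts it by adding the MVD predictor back:

```python
def second_mvd(cur_mvd, mvd_predictor):
    """Fig. 23 sketch: only the difference between the current block's
    MVD and the MVD of a neighboring block (the MVD predictor) is
    coded; the decoder adds the predictor back to recover the MVD."""
    return (cur_mvd[0] - mvd_predictor[0], cur_mvd[1] - mvd_predictor[1])

# With MVD (5, 5) for both block 2 and the upper block (block 1),
# the coded second MVD becomes (0, 0).
assert second_mvd((5, 5), (5, 5)) == (0, 0)
```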
When the current block is encoded by bidirectional prediction, information indicating whether the motion vector differences of the reference picture List 0 (List 0) and the reference picture List 1 (List 1) are the same may be encoded. Here, the same motion vector difference may indicate that the sign and magnitude of the motion vector difference are the same, or may indicate that the magnitude of the motion vector difference is the same but the sign of the motion vector difference is different. When the motion vector differences of the reference picture list 0 and the reference picture list 1 are the same, encoding/decoding of any one of the motion vector differences of the reference picture list 0 and the reference picture list 1 may be omitted.
According to another embodiment of the present invention, all blocks located within a signaling block of a current coded/decoded picture or slice may have at least one same Motion Vector Predictor (MVP) to derive an optimal Motion Vector Difference (MVD). Alternatively, according to another embodiment of the present invention, all blocks located within a signaling block of a current coded/decoded picture or slice may have at least one same motion vector difference prediction value (MVD predictor) to derive an optimal second motion vector difference. Here, the motion vector predictor or the motion vector difference predictor may be transmitted for each signaling block or may be implicitly derived by using coding information of neighboring blocks adjacent to the signaling block. Here, the signaling block may represent an area having a size smaller than at least one of a horizontal resolution and a vertical resolution of the current picture or the current slice. In other words, a signaling block may be defined as a predetermined area having a size smaller than that of a current picture or a current slice.
The information of the signaling block may be transmitted through at least one of a sequence unit, a picture unit, and a slice header. In one embodiment, at least one of the size, form, or position of the signaling block may be transmitted through at least one of a sequence parameter set, a picture parameter set, and a slice header. Alternatively, the information of the signaling block may be implicitly derived from the coding information of the current block or a neighboring block adjacent to the current block. The signaling block may have a square or rectangular form, but is not limited thereto.
The above inter-coding/decoding process may be performed for each of the luminance signal and the chrominance signal. For example, at least one of the methods of obtaining an inter prediction indicator, generating a motion vector candidate list, deriving a motion vector, and performing motion compensation of the above inter encoding/decoding process may be differently applied to a luminance signal and a chrominance signal.
The above inter-coding/decoding process may be equally performed for the luminance signal and the chrominance signal. For example, at least one of an inter prediction indicator, a motion vector candidate list, a motion vector candidate, a motion vector, and a reference picture, which is applied to a luminance signal when the above inter encoding/decoding process is performed, may be equally applied to a chrominance signal.
The above method can be performed in the encoder and the decoder in the same way. For example, at least one of the methods of deriving a motion vector candidate list, deriving a motion vector candidate, deriving a motion vector, and performing motion compensation of the above inter-encoding/decoding process may be equally applied to an encoder and a decoder. Alternatively, the order of the above methods may be applied differently to the encoder and the decoder.
The above embodiments of the present invention may be applied according to the size of at least one of the coding block, the prediction block, the block, and the unit. Here, the size may be defined as a minimum size or a maximum size or both to which the above embodiments are applied, or may be defined as a fixed size to which the embodiments are applied. Further, among the above embodiments, a first embodiment may be applied at a first size, and a second embodiment may be applied at a second size. In other words, the embodiments may be applied in combination according to the size. Further, the above embodiments of the present invention may be applied only when the size is equal to or larger than a minimum size and equal to or smaller than a maximum size. In other words, the above embodiments may be applied only to block sizes included within a predetermined range.
For example, the above embodiment may be applied when the size of the encoding/decoding target block is 8 × 8 or more. For example, the above embodiment may be applied when the size of the encoding/decoding target block is 16 × 16 or more. For example, the above embodiment may be applied when the size of the encoding/decoding target block is 32 × 32 or more. For example, the above embodiment may be applied when the size of the encoding/decoding target block is 64 × 64 or more. For example, the above embodiment may be applied when the size of the encoding/decoding target block is 128 × 128 or more. For example, the above embodiment may be applied when the size of the encoding/decoding target block is 4 × 4. For example, the above embodiment may be applied when the size of the encoding/decoding target block is 8 × 8 or less. For example, the above embodiment may be applied when the size of the encoding/decoding target block is 16 × 16 or less. For example, the above embodiment may be applied when the size of the encoding/decoding target block is 8 × 8 or more and 16 × 16 or less. For example, the above embodiment may be applied when the size of the encoding/decoding target block is 16 × 16 or more and is 64 × 64 or less.
The above embodiments of the present invention may be applied according to temporal layers. Additional identifiers for identifying temporal layers to which the above embodiments may be applied may be signaled, and the above embodiments may be applied to temporal layers indicated by the respective identifiers. Here, the identifier may be defined to indicate a minimum layer or a maximum layer or both of the minimum layer and the maximum layer to which the embodiment may be applied, or may be defined to indicate a specific layer to which the above embodiment may be applied.
For example, the above embodiment can be applied when the temporal layer of the current picture is the lowest layer. For example, the above embodiment can be applied when the temporal layer identifier of the current picture is 0. For example, when the temporal layer identifier of the current picture is 1, the above embodiment can be applied. For example, the above embodiment may be applied when the temporal layer of the current picture is the highest layer.
As in the above embodiments of the present invention, the reference picture set used when generating the reference picture list and modifying the reference picture list may use at least one of the reference picture lists L0, L1, L2, and L3.
According to an embodiment of the present invention, at least one to at most N motion vectors of an encoding/decoding target block may be used when calculating the boundary strength in the deblocking filter. Here, N is a positive integer equal to or greater than 1, and may be 2, 3, 4, etc.
The above embodiment of the present invention can be applied when the motion vector has at least one of the following units when predicting the motion vector: 16-pixel (16-pel), 8-pixel (8-pel), 4-pixel (4-pel), integer-pixel (integer-pel), 1/2-pixel (1/2-pel), 1/4-pixel (1/4-pel), 1/8-pixel (1/8-pel), 1/16-pixel (1/16-pel), 1/32-pixel (1/32-pel), and 1/64-pixel (1/64-pel). Further, when predicting a motion vector, the motion vector may be optionally used in the above pixel unit.
The slice type to which the above embodiments of the present invention are applied may be defined, and the above embodiments of the present invention may be applied according to the corresponding slice type.
For example, when the slice type is a T (three-way prediction) slice, a prediction block may be generated by using at least three motion vectors, and a weighted sum of the at least three prediction blocks may be calculated and used as the final prediction block of the encoding/decoding target block. For example, when the slice type is a Q (four-way prediction) slice, a prediction block may be generated by using at least four motion vectors, and a weighted sum of the at least four prediction blocks may be calculated and used as the final prediction block of the encoding/decoding target block.
The above embodiments of the present invention can be applied to inter prediction and motion compensation methods using motion vector prediction, and can be applied to inter prediction and motion compensation methods using a skip mode or a merge mode.
The block form to which the above embodiments of the present invention can be applied may have a square form or a non-square form.
In the above embodiments, the method is described based on a flowchart having a series of steps or units, but the present invention is not limited to the order of the steps. Rather, some steps may be performed concurrently with other steps, or may be performed in a different order than other steps. Further, it will be understood by those of ordinary skill in the art that steps in the flowcharts are not mutually exclusive, and that other steps may be added to the flowcharts, or some steps may be deleted from the flowcharts, without affecting the scope of the present invention.
The embodiments that have been described above include examples of various aspects. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the various aspects, but one of ordinary skill in the art may recognize that many further combinations and permutations are possible. Accordingly, the specification is intended to embrace all such alternatives, modifications and variances which fall within the spirit and scope of the appended claims.
Computer readable storage media may include program instructions, data files, data structures, etc. alone or in combination. The program instructions recorded in the computer-readable storage medium may be specially designed and constructed for the present invention, or may be program instructions well known to those skilled in the computer software technology field. Examples of computer-readable storage media include: magnetic recording media (such as hard disks, floppy disks, and magnetic tape); optical data storage media (such as CD-ROM or DVD-ROM); magneto-optical media (such as floptical disks); and hardware devices that are specifically configured to store and implement program instructions, such as read-only memory (ROM), random-access memory (RAM), and flash memory. Examples of program instructions include not only machine language code generated by a compiler but also high-level language code that may be executed by the computer using an interpreter. The hardware device may be configured to be operated by one or more software modules to perform the processing according to the present invention, and vice versa.
Although the present invention has been described in terms of specific terms (such as detailed elements) and limited embodiments and drawings, they are provided only to assist in a more general understanding of the present invention, and the present invention is not limited to the above embodiments. It will be understood by those skilled in the art that various modifications and changes may be made from the above description.
Therefore, the spirit of the present invention should not be limited to the above-described embodiments, and the full scope of the appended claims and their equivalents will fall within the scope and spirit of the present invention.
Industrial applicability
The present invention is applicable to an apparatus for encoding/decoding an image.

Claims (14)

1. An image decoding method performed by an image decoding apparatus, the method comprising:
obtaining a current block by partitioning an image;
partitioning the current block using a partitioning method determined based on a size of the current block;
generating a prediction block of a coding block obtained based on the partitioning method; and
generating a reconstructed block using the prediction block,
wherein, when the size of the current block is greater than 64 x 64, the partition method is implicitly determined as a quad-tree partition without signaling partition information,
wherein the partitioning method is explicitly determined based on the signaled partition information when the size of the current block is less than or equal to 64 x 64,
wherein the current block is recursively and repeatedly partitioned until the size of the current block becomes 64 x 64 or less.
2. The image decoding method according to claim 1,
wherein the generating of the prediction block comprises: disallowing binary tree partitioning in a case that the coding block satisfies a predetermined condition.
3. The image decoding method as set forth in claim 2,
wherein the predetermined condition is derived based on a size of the coding block.
4. The image decoding method as set forth in claim 1,
wherein the generating of the prediction block comprises: filtering the prediction block based on at least one of an intra prediction mode of the coding block and a size of the coding block.
5. The image decoding method according to claim 4,
wherein the filtering the prediction block comprises: determining weights of reference samples based on positions of prediction samples included in the prediction block in a case where the intra prediction mode is a predetermined mode,
filtering the prediction block using the determined weights.
6. The image decoding method according to claim 5,
wherein the predetermined mode is a planar mode.
7. The image decoding method according to claim 5,
wherein the reference sample is a sample adjacent to at least one of a left side and an upper side of the prediction block.
8. An image encoding method performed by an image encoding apparatus, the method comprising:
obtaining a current block by partitioning an image;
partitioning the current block using a partitioning method determined based on a size of the current block;
generating a prediction block of a coding block obtained based on the partitioning method; and
encoding the coding block using the prediction block,
wherein, when the size of the current block is greater than 64 x 64, the partitioning method is implicitly determined to be quad-tree partitioning, without partition information being signaled,
wherein the partitioning method is explicitly determined based on the signaled partition information when the size of the current block is less than or equal to 64 x 64,
wherein the current block is recursively partitioned until the size of the current block becomes 64 x 64 or less.
9. The image encoding method as set forth in claim 8,
wherein the generating of the prediction block comprises: not allowing binary-tree partitioning in a case where the coding block satisfies a predetermined condition.
10. The image encoding method as set forth in claim 9,
wherein the predetermined condition is derived based on a size of the coding block.
11. The image encoding method as set forth in claim 8,
wherein the generating of the prediction block comprises: filtering the prediction block based on at least one of an intra prediction mode of the coding block and a size of the coding block.
12. The image encoding method as set forth in claim 11,
wherein the filtering of the prediction block comprises: determining, in a case where the intra prediction mode is a predetermined mode, weights of reference samples based on positions of prediction samples included in the prediction block; and
filtering the prediction block using the determined weights.
13. The image encoding method as set forth in claim 12,
wherein the predetermined mode is a planar mode.
14. A method of transmitting a bitstream generated by an image encoding method, comprising:
transmitting the bitstream generated by the image encoding method,
wherein the image encoding method comprises:
obtaining a current block by partitioning an image;
partitioning the current block using a partitioning method determined based on a size of the current block;
generating a prediction block of a coding block obtained based on the partitioning method; and
encoding the coding block using the prediction block,
wherein, when the size of the current block is greater than 64 x 64, the partitioning method is implicitly determined to be quad-tree partitioning, without partition information being signaled,
wherein the partitioning method is explicitly determined based on the signaled partition information when the size of the current block is less than or equal to 64 x 64,
wherein the current block is recursively partitioned until the size of the current block becomes 64 x 64 or less.
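
For orientation, the size-dependent partition rule recited in claims 1, 8 and 14 can be sketched in code. This is a minimal sketch under stated assumptions: the Block type, the read_split_flag callback (standing in for parsing of the signaled partition information) and the constant MAX_SIGNALED_SIZE are illustrative names, not identifiers from the patent, and the explicit branch shows only a quad split for brevity, although the signaled information may select other methods (e.g. binary-tree partitioning, subject to the condition of claims 2 and 9).

```python
# Illustrative sketch only: names below are assumptions, not from the patent.
from dataclasses import dataclass

MAX_SIGNALED_SIZE = 64  # threshold stated in the claims: 64 x 64

@dataclass
class Block:
    x: int
    y: int
    width: int
    height: int

    def quad_split(self):
        """Split into four equal sub-blocks (quad-tree partition)."""
        hw, hh = self.width // 2, self.height // 2
        return [Block(self.x,      self.y,      hw, hh),
                Block(self.x + hw, self.y,      hw, hh),
                Block(self.x,      self.y + hh, hw, hh),
                Block(self.x + hw, self.y + hh, hw, hh)]

def partition(block, read_split_flag):
    """Recursively partition `block` until every leaf is 64 x 64 or smaller."""
    if block.width > MAX_SIGNALED_SIZE or block.height > MAX_SIGNALED_SIZE:
        # Greater than 64 x 64: the quad-tree split is implicit, so no
        # partition information is read from the bitstream for this decision.
        return [leaf for sub in block.quad_split()
                for leaf in partition(sub, read_split_flag)]
    # 64 x 64 or smaller: the partitioning method is determined explicitly
    # from signaled partition information (modeled here by the callback).
    if read_split_flag(block):
        return [leaf for sub in block.quad_split()
                for leaf in partition(sub, read_split_flag)]
    return [block]

# A 128 x 128 block is quad-split once implicitly; the resulting 64 x 64
# blocks are then split, or not, according to the signaled flags.
leaves = partition(Block(0, 0, 128, 128), read_split_flag=lambda b: False)
print([(b.width, b.height) for b in leaves])  # four (64, 64) leaves
```

The recursion mirrors the claim language: nothing is parsed while the block is larger than 64 x 64, and explicit signaling takes over once the recursion reaches that size.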
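
Likewise, the position-dependent filtering of claims 4 to 7 (mirrored in claims 11 to 13) can be illustrated as follows. The particular weight formula is an assumption chosen so that reference weights decay with distance from the block boundary; the claims specify only that weights of reference samples are determined from the positions of the prediction samples when the mode is the planar mode.

```python
# Illustrative sketch only: the weight formula is an assumption, not the
# exact rule claimed in the patent.

def filter_planar_prediction(pred, left_ref, top_ref):
    """Blend an N x N planar-mode prediction with its reference samples.

    left_ref[y] is the reconstructed sample to the left of row y, and
    top_ref[x] is the sample above column x.
    """
    n = len(pred)
    out = [row[:] for row in pred]
    for y in range(n):
        for x in range(n):
            w_left = 16 >> ((x << 1) // n)  # assumed: shrinks as x grows
            w_top = 16 >> ((y << 1) // n)   # assumed: shrinks as y grows
            w_pred = 64 - w_left - w_top
            out[y][x] = (w_left * left_ref[y] + w_top * top_ref[x]
                         + w_pred * pred[y][x] + 32) >> 6  # /64 with rounding
    return out

# The corner sample (0, 0) of a flat 4 x 4 prediction is pulled toward the
# adjacent left and above reference samples.
pred = [[128] * 4 for _ in range(4)]
print(filter_planar_prediction(pred, [100, 102, 104, 106],
                               [90, 92, 94, 96])[0][0])  # prints 112
```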
CN201780048129.1A 2016-08-01 2017-07-18 Image encoding/decoding method Active CN109644276B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202211602603.XA CN116016910A (en) 2016-08-01 2017-07-18 Image encoding/decoding method
CN202210817476.9A CN115052143A (en) 2016-08-01 2017-07-18 Image encoding/decoding method
CN202210817457.6A CN115052142A (en) 2016-08-01 2017-07-18 Image encoding/decoding method
CN202211602572.8A CN115914625A (en) 2016-08-01 2017-07-18 Image encoding/decoding method

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
KR10-2016-0098096 2016-08-01
KR20160098096 2016-08-01
PCT/KR2017/007734 WO2018026118A1 (en) 2016-08-01 2017-07-18 Image encoding/decoding method

Related Child Applications (4)

Application Number Title Priority Date Filing Date
CN202211602603.XA Division CN116016910A (en) 2016-08-01 2017-07-18 Image encoding/decoding method
CN202210817457.6A Division CN115052142A (en) 2016-08-01 2017-07-18 Image encoding/decoding method
CN202210817476.9A Division CN115052143A (en) 2016-08-01 2017-07-18 Image encoding/decoding method
CN202211602572.8A Division CN115914625A (en) 2016-08-01 2017-07-18 Image encoding/decoding method

Publications (2)

Publication Number Publication Date
CN109644276A CN109644276A (en) 2019-04-16
CN109644276B true CN109644276B (en) 2022-12-30

Family

ID=61073484

Family Applications (5)

Application Number Title Priority Date Filing Date
CN202211602572.8A Pending CN115914625A (en) 2016-08-01 2017-07-18 Image encoding/decoding method
CN201780048129.1A Active CN109644276B (en) 2016-08-01 2017-07-18 Image encoding/decoding method
CN202211602603.XA Pending CN116016910A (en) 2016-08-01 2017-07-18 Image encoding/decoding method
CN202210817476.9A Pending CN115052143A (en) 2016-08-01 2017-07-18 Image encoding/decoding method
CN202210817457.6A Pending CN115052142A (en) 2016-08-01 2017-07-18 Image encoding/decoding method

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202211602572.8A Pending CN115914625A (en) 2016-08-01 2017-07-18 Image encoding/decoding method

Family Applications After (3)

Application Number Title Priority Date Filing Date
CN202211602603.XA Pending CN116016910A (en) 2016-08-01 2017-07-18 Image encoding/decoding method
CN202210817476.9A Pending CN115052143A (en) 2016-08-01 2017-07-18 Image encoding/decoding method
CN202210817457.6A Pending CN115052142A (en) 2016-08-01 2017-07-18 Image encoding/decoding method

Country Status (3)

Country Link
KR (4) KR102321394B1 (en)
CN (5) CN115914625A (en)
WO (1) WO2018026118A1 (en)

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3758374A4 (en) * 2018-02-22 2021-09-01 LG Electronics Inc. Image decoding method and apparatus according to block division structure in image coding system
CN117156155A (en) * 2018-03-14 2023-12-01 LX Semicon Co., Ltd. Image encoding/decoding method, storage medium, and transmission method
CN116866565A (en) * 2018-03-21 2023-10-10 LX Semicon Co., Ltd. Image encoding/decoding method, storage medium, and image data transmission method
CN112106373A (en) * 2018-03-28 2020-12-18 Electronics and Telecommunications Research Institute Method and apparatus for image encoding/decoding and recording medium storing bitstream
WO2019190282A1 (en) * 2018-03-29 2019-10-03 LG Electronics Inc. Method and device for processing video signal on basis of multiplication-free rotation-based transform
KR102631119B1 (en) 2018-04-01 2024-01-29 LG Electronics Inc. Method and device for processing video signal by using reduced secondary transform
CN116847081A (en) 2018-04-01 2023-10-03 LG Electronics Inc. Image decoding apparatus, image encoding apparatus, and apparatus for transmitting video signal
CN112055967B (en) * 2018-04-02 2024-03-26 LG Electronics Inc. Image coding method and device based on motion vector
ES2949998T3 (en) 2018-06-03 2023-10-04 LG Electronics Inc. Method and device for processing a video signal using a reduced transform
MX2020012657A (en) * 2018-06-08 2021-02-02 KT Corp Method and apparatus for processing video signal.
JP7141463B2 (en) * 2018-06-27 2022-09-22 LG Electronics Inc. Video processing method based on inter-prediction mode and apparatus therefor
CN110876065A (en) * 2018-08-29 2020-03-10 Huawei Technologies Co., Ltd. Construction method of candidate motion information list, and inter-frame prediction method and device
CN112004098B (en) 2018-08-28 2021-06-29 Huawei Technologies Co., Ltd. Construction method of candidate motion information list, and inter-frame prediction method and device
CN110868589B (en) * 2018-08-28 2023-10-20 Huawei Technologies Co., Ltd. Inter-frame prediction method and device and coding/decoding method and device applied by same
JP2021509789A (en) 2018-09-02 2021-04-01 LG Electronics Inc. Video signal coding/decoding method and apparatus therefor
EP3723374A4 (en) * 2018-09-05 2021-02-24 LG Electronics Inc. Method and apparatus for processing video signal
AU2019342803B2 (en) * 2018-09-21 2023-07-13 Huawei Technologies Co., Ltd. Apparatus and method for inverse quantization
US11736692B2 (en) * 2018-12-21 2023-08-22 Samsung Electronics Co., Ltd. Image encoding device and image decoding device using triangular prediction mode, and image encoding method and image decoding method performed thereby
CN113632493A (en) * 2019-03-13 2021-11-09 Beijing Bytedance Network Technology Co., Ltd. Sub-block transform in transform skip mode
WO2020211765A1 (en) * 2019-04-17 2020-10-22 Huawei Technologies Co., Ltd. An encoder, a decoder and corresponding methods harmonizing matrix-based intra prediction and secondary transform core selection
WO2020220037A1 (en) * 2019-04-25 2020-10-29 Beijing Dajia Internet Information Technology Co., Ltd. Methods and apparatuses for video coding with triangle prediction
JP7256296B2 (en) * 2019-05-08 2023-04-11 LG Electronics Inc. Image encoding/decoding method, apparatus and method for transmitting bitstream with MIP and LFNST
JP7267461B2 (en) * 2019-05-10 2023-05-01 Beijing Bytedance Network Technology Co., Ltd. Video data processing method, apparatus, storage medium and storage method
EP3967032A4 (en) 2019-06-07 2022-07-27 Beijing Bytedance Network Technology Co., Ltd. Conditional signaling of reduced secondary transform in video bitstreams
KR20200141399A (en) * 2019-06-10 2020-12-18 XRIS Co., Ltd. Method for encoding/decoding video signal and apparatus therefor
CN114342393A (en) 2019-07-12 2022-04-12 LG Electronics Inc. Image coding method based on transform and device therefor
CN116405697A (en) * 2019-07-23 2023-07-07 Hangzhou Hikvision Digital Technology Co., Ltd. Encoding and decoding method, device and equipment thereof
CN114208183A (en) 2019-08-03 2022-03-18 Beijing Bytedance Network Technology Co., Ltd. Position-based mode derivation in reduced secondary transforms of video
WO2021032045A1 (en) 2019-08-17 2021-02-25 Beijing Bytedance Network Technology Co., Ltd. Context modeling of side information for reduced secondary transforms in video
WO2021071297A1 (en) * 2019-10-08 2021-04-15 LG Electronics Inc. Method and device for transform-based image coding
WO2021110045A1 (en) * 2019-12-03 2021-06-10 Huawei Technologies Co., Ltd. Coding method, device, system with merge mode
CN115606182A (en) * 2020-03-25 2023-01-13 Douyin Vision Co., Ltd. (CN) Codec video processing using enhanced secondary transform
CN113473129B (en) * 2020-03-30 2022-12-23 Hangzhou Hikvision Digital Technology Co., Ltd. Encoding and decoding method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110115986A (en) * 2010-04-16 2011-10-24 SK Telecom Co., Ltd. Video coding and decoding method and apparatus
CN103650504A (en) * 2011-05-24 2014-03-19 Qualcomm Inc. Control of video encoding based on image capture parameters
KR20140053155A (en) * 2011-07-01 2014-05-07 Samsung Electronics Co., Ltd. Mode-dependent transforms for residual coding with low latency
CN103780911A (en) * 2011-06-20 2014-05-07 Electronics and Telecommunications Research Institute Video decoding apparatus
CN104159108A (en) * 2014-08-11 2014-11-19 Zhejiang University Real-time lossless electrocardiosignal compression method and device based on adaptive prediction and improved variable-length coding
CN104980755A (en) * 2010-01-14 2015-10-14 Samsung Electronics Co., Ltd. Method And Apparatus For Encoding And Decoding Video By Using Pattern Information In Hierarchical Data Unit

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100612849B1 (en) * 2003-07-18 2006-08-14 Samsung Electronics Co., Ltd. Apparatus and method for encoding and decoding image
KR20100071865A (en) * 2008-12-19 2010-06-29 Samsung Electronics Co., Ltd. Method for constructing and decoding a video frame in a video signal processing apparatus using multi-core processor and apparatus thereof
US9247247B2 (en) * 2010-04-13 2016-01-26 Samsung Electronics Co., Ltd. Video-encoding method and video-encoding apparatus using prediction units based on encoding units determined in accordance with a tree structure, and video-decoding method and video-decoding apparatus using prediction units based on encoding units determined in accordance with a tree structure
RU2715382C2 (en) * 2011-10-18 2020-02-27 KT Corporation Video signal decoding method
KR20140008503A (en) * 2012-07-10 2014-01-21 Electronics and Telecommunications Research Institute Method and apparatus for image encoding/decoding
WO2014058280A1 (en) * 2012-10-12 2014-04-17 Electronics and Telecommunications Research Institute Image encoding/decoding method and device using same
KR20160045243A (en) * 2014-10-17 2016-04-27 Korea Aerospace University Industry-Academic Cooperation Foundation Video coding apparatus for planar intra prediction and method thereof

Also Published As

Publication number Publication date
KR102549022B1 (en) 2023-06-27
KR102321394B1 (en) 2021-11-03
KR20180014655A (en) 2018-02-09
KR102400315B1 (en) 2022-05-23
CN116016910A (en) 2023-04-25
CN115914625A (en) 2023-04-04
CN115052142A (en) 2022-09-13
KR20210133202A (en) 2021-11-05
CN109644276A (en) 2019-04-16
KR20220068974A (en) 2022-05-26
WO2018026118A1 (en) 2018-02-08
KR20230096953A (en) 2023-06-30
CN115052143A (en) 2022-09-13

Similar Documents

Publication Publication Date Title
CN109644276B (en) Image encoding/decoding method
US11743470B2 (en) Image encoding/decoding method and recording medium for same
US11743473B2 (en) Method and apparatus for encoding/decoding a video using a motion compensation
US11509930B2 (en) Image encoding/decoding method and recording medium therefor
CN109417629B (en) Image encoding/decoding method and recording medium therefor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant