CN116546211A - Video encoding method, video encoding device, computer equipment and storage medium - Google Patents


Info

Publication number
CN116546211A
CN116546211A
Authority
CN
China
Prior art keywords
list
block
motion information
candidate
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210098184.4A
Other languages
Chinese (zh)
Inventor
张洪彬
李翔
王玉伟
曹志强
刘冠志
王金华
高剑林
张贤国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210098184.4A priority Critical patent/CN116546211A/en
Publication of CN116546211A publication Critical patent/CN116546211A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: … using adaptive coding
    • H04N19/102: … characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117: Filters, e.g. for pre-processing or post-processing
    • H04N19/134: … characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136: Incoming video signal characteristics or properties
    • H04N19/146: Data rate or code amount at the encoder output
    • H04N19/169: … characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: … the unit being an image region, e.g. an object
    • H04N19/172: … the region being a picture, frame or field
    • H04N19/50: … using predictive coding
    • H04N19/503: … involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/57: Motion estimation characterised by a search window with variable size or shape
    • H04N19/573: Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
    • H04N19/90: … using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96: Tree coding, e.g. quad-tree coding

Abstract

The embodiments of the invention disclose a video encoding method, a video encoding device, computer equipment, and a storage medium. The method comprises the following steps: obtaining a second video frame to be encoded from a target video, wherein the target video comprises an already-encoded first video frame, the reconstructed image frame of the first video frame comprises one or more image blocks, and the reconstructed image frame of the first video frame corresponds to a target filtering-related parameter; when the second video frame is encoded, obtaining a target image block from the reconstructed image frame of the first video frame; filtering the target image block according to the target filtering-related parameter to obtain a filtered target image block; and encoding the second video frame using the filtered target image block as encoding reference information of the second video frame. This can improve filtering efficiency and thereby improve the encoding efficiency of video frames.

Description

Method and apparatus for deriving motion prediction information
The present application is a divisional application of the invention patent application No. 201780035615.X, titled "Method and apparatus for deriving motion prediction information", filed on April 7, 2017.
Technical Field
The following embodiments relate generally to a video decoding method and apparatus and a video encoding method and apparatus, and more particularly, to a method and apparatus for deriving motion prediction information and performing encoding and/or decoding on video using the derived motion prediction information.
The present application claims the benefit of Korean patent application No. 10-2016-0043249, filed on April 8, 2016, which is hereby incorporated by reference in its entirety.
Background
With the continued development of the information and communication industry, broadcast services with High Definition (HD) resolution have been popular throughout the world. Through this popularity, a large number of users have become accustomed to high resolution and high definition images and/or videos.
In order to meet users' demands for high definition, many institutions have accelerated the development of next-generation imaging devices. In addition to users' increased interest in High Definition TV (HDTV) and Full High Definition (FHD) TV, interest in Ultra High Definition (UHD) TV, which has a resolution four or more times that of FHD TV, has also grown. With this growing interest, image encoding/decoding techniques for images with higher resolution and higher definition are required.
The image encoding/decoding apparatus and method may use an inter prediction technique, an intra prediction technique, an entropy encoding technique, or the like in order to perform encoding/decoding on high resolution and high definition images. The inter prediction technique may be a technique for predicting values of pixels included in a current picture using a temporally preceding picture and/or a temporally following picture. The intra prediction technique may be a technique for predicting a value of a pixel included in a current picture using information about the pixel in the current picture. Entropy coding techniques may be techniques for assigning short codes to more frequently occurring symbols and long codes to less frequently occurring symbols.
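The entropy-coding principle mentioned above (short codes for frequent symbols, long codes for rare ones) can be illustrated with a toy Huffman-style construction that computes code lengths. This is a generic sketch for illustration only, not the entropy coder of any particular codec:

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Compute Huffman code lengths for a symbol sequence: frequently
    occurring symbols receive shorter codes, rare symbols longer ones."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate one-symbol case
        return {next(iter(freq)): 1}
    # Heap entries are (frequency, tiebreak index, {symbol: code length}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)      # two least frequent subtrees
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every code beneath them.
        merged = {s: length + 1 for s, length in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]
```

For the sequence "aaabbc", the most frequent symbol "a" gets a 1-bit code while "b" and "c" get 2-bit codes.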
In image encoding and decoding processes, prediction may mean generating a predicted signal similar to the original signal. Predictions can be categorized primarily as: prediction of reference spatially reconstructed images, prediction of reference temporally reconstructed images, and prediction with reference to other symbols. In other words, the temporal reference may represent that the temporally reconstructed image is referenced and the spatial reference may represent that the spatially reconstructed image is referenced.
Inter prediction may be a technique for predicting a target block using a temporal reference and a spatial reference. Intra prediction may be a technique for predicting a target block using only spatial references.
When pictures constituting a video are encoded, each picture may be partitioned into a plurality of portions, and the plurality of portions may be encoded. In this case, in order for the decoding apparatus to decode the partitioned picture, information about the partition of the picture may be required.
In order to increase the encoding processing speed, pictures may be encoded in parallel using a parallel encoding method. In addition, in order to increase the decoding processing speed, the pictures may be decoded in parallel using a parallel decoding method.
Parallel coding methods include a variety of picture partition coding methods. A slice-based picture partition coding method and a parallel-block-based picture partition coding method are provided as examples of such methods.
Conventional picture partition coding methods do not allow reference between sections of a partitioned picture when coding using intra prediction. On the other hand, the conventional picture partition coding method allows reference between sections of a partitioned picture when coding using inter prediction.
Therefore, when it is desired to perform parallel encoding on each picture partition unit using the conventional picture partition encoding method, synchronization must be achieved for each picture. The efficiency of parallel processing by the encoding apparatus in the case where synchronization is required for each picture is inevitably lower than that of parallel processing by the encoding apparatus in the case where synchronization is not required.
Disclosure of Invention
Technical problem
Embodiments are directed to a method and apparatus for preventing inter-section reference from occurring when a picture partitioned into sections is encoded or decoded.
Embodiments aim to provide a method and apparatus for performing parallel encoding or parallel decoding on sections by preventing inter-section references.
Embodiments are directed to a method and apparatus for performing encoding or decoding that does not refer to other sections when performing inter prediction on a target block in one section.
Embodiments aim to provide a method and apparatus that generates a list of motion information such that one section is not referenced when inter prediction is performed on a target block in another section.
Embodiments are directed to a method and apparatus that allow only a region corresponding to inter prediction to be referenced when performing encoding using inter prediction.
Embodiments aim to provide a method and apparatus that excludes motion information that causes a target block to reference a location outside the boundary of an area from a list.
Solution scheme
According to an aspect, there is provided a list generation method for generating a list for inter prediction of a target block, including: determining whether motion information of the candidate block is to be added to the list; if it is determined that the motion information is to be added to the list, the motion information is added to the list, wherein it is determined whether the motion information is to be added to the list based on the information about the target block and the motion information.
The information about the target block may be a location of the target block.
Whether the motion information is to be added to the list may be determined based on a motion vector of the motion information.
Whether the motion information is to be added to the list may be determined based on a position indicated by a motion vector of the motion information applied to the target block.
The position indicated by the motion vector may be a position in a reference picture that is referenced by the target block.
The motion information may be added to the list if the location is within an area, and conversely, the motion information is not added to the list if the location is outside the area.
The region may be the region of a slice including the target block, the region of a parallel block (tile) including the target block, or the region of a motion-constrained parallel block set (MCTS) including the target block.
If the location is not outside the boundary, the motion information may be added to the list, whereas if the location is outside the boundary, the motion information may not be added to the list.
The boundary may comprise a boundary of a picture.
The boundaries may include boundaries between slices, boundaries between parallel blocks, or boundaries between MCTSs.
The inter prediction mode of the target block may be a merge mode or a skip mode.
The list may be a merged list.
The inter prediction mode of the target block may be an advanced motion vector predictor (AMVP) mode.
The list may be a predicted motion vector candidate list.
The candidate block may include a plurality of spatial candidates and a plurality of temporal candidates.
The motion information of the candidate block may be added to the list if the candidate block is available and the motion information of the candidate block is not repeated with other motion information present in the list.
Even if the candidate block is the first available candidate block, the motion information of the candidate block may not be added to the list when the information on the target block and the motion information satisfy a specific condition.
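The list generation method described above, adding a candidate's motion information only when it is available, not a duplicate, and does not cause the target block to reference a position outside the region, can be sketched as follows. The helper names, coordinate convention, and maximum list length are illustrative assumptions, not part of the original disclosure:

```python
from typing import List, Optional, Tuple

Region = Tuple[int, int, int, int]   # (x, y, width, height)
MV = Tuple[int, int]                 # motion vector (mv_x, mv_y)

def mv_inside_region(target_xy: Tuple[int, int], block_wh: Tuple[int, int],
                     mv: MV, region: Region) -> bool:
    """True if the block referenced by applying `mv` to the target block
    lies entirely inside `region` (a slice, parallel block, or MCTS)."""
    rx, ry, rw, rh = region
    x = target_xy[0] + mv[0]
    y = target_xy[1] + mv[1]
    return (rx <= x and x + block_wh[0] <= rx + rw and
            ry <= y and y + block_wh[1] <= ry + rh)

def build_merge_list(target_xy: Tuple[int, int], block_wh: Tuple[int, int],
                     candidates: List[Optional[MV]], region: Region,
                     max_len: int = 5) -> List[MV]:
    """Build the motion-information list with the boundary check applied."""
    merge_list: List[MV] = []
    for mv in candidates:
        if mv is None:                   # candidate block is unavailable
            continue
        if mv in merge_list:             # duplicated motion information
            continue
        if not mv_inside_region(target_xy, block_wh, mv, region):
            continue                     # points outside the region: excluded
        merge_list.append(mv)
        if len(merge_list) == max_len:
            break
    return merge_list
```

For an 8x8 target block at (32, 32) inside a 64x64 region, a candidate MV of (40, 0) would reference a block crossing the region's right boundary and is therefore excluded from the list.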
According to another aspect, there is provided a list generating apparatus for generating a list for inter prediction of a target block, comprising: a processing unit configured to determine whether or not the motion information of a candidate block is to be added to the list based on information on the target block and motion information of the candidate block; and the storage unit is used for storing the list.
According to another aspect, there is provided a method for setting availability of candidate blocks for inter prediction of a target block, comprising: determining whether the candidate block is available; the availability of the candidate block is set based on a result of the determination, wherein the availability is set based on information on a target block and motion information of an object including the candidate block.
The object may be a prediction unit PU.
Whether a candidate block is available may be determined based on a motion vector of the motion information.
Whether the candidate block is available may be determined based on a position indicated by a motion vector of the motion information applied to the target block.
The candidate block may be set to be available if the location exists within the region, and conversely, the candidate block may be set to be unavailable if the location is outside the region.
Advantageous effects
A method and apparatus for preventing inter-section reference from occurring when a picture partitioned into sections is encoded or decoded are provided.
A method and apparatus for performing parallel encoding or parallel decoding on a section by preventing inter-section reference are provided.
A method and apparatus for performing encoding or decoding that does not refer to other sections when performing inter prediction on a target block in one section are provided.
A method and apparatus are provided for generating a list of motion information such that one section is not referenced when inter prediction is performed on a target block in another section.
A method and apparatus are provided that allow only a region corresponding to inter prediction to be referenced when encoding using inter prediction is performed.
A method and apparatus are provided for excluding motion information that causes a target block to reference a position outside a boundary of an area from a list.
Drawings
Fig. 1 is a block diagram showing the configuration of an embodiment of an encoding apparatus to which the present invention is applied;
fig. 2 is a block diagram showing the configuration of an embodiment of a decoding apparatus to which the present invention is applied;
fig. 3 is a diagram schematically showing a partition structure of an image when the image is encoded and decoded;
fig. 4 is a diagram showing the shape of a Prediction Unit (PU) that a Coding Unit (CU) can include;
fig. 5 is a diagram illustrating the shape of a Transform Unit (TU) that can be included in a CU;
fig. 6 is a diagram for explaining an embodiment of an intra prediction process;
fig. 7 is a diagram for explaining the position of a reference sample point used in an intra prediction process;
fig. 8 is a diagram for explaining an embodiment of an inter prediction process;
FIG. 9 illustrates spatial candidates according to an embodiment;
fig. 10 illustrates an order of adding motion information of spatial candidates to a merge list according to an embodiment;
FIG. 11 illustrates partitioning a picture using parallel blocks (tiles) according to an embodiment;
FIG. 12 illustrates partitioning a picture using slices according to an embodiment;
FIG. 13 illustrates distributively encoding temporally and spatially partitioned pictures according to an embodiment;
FIG. 14 illustrates processing of a motion constrained parallel block set (MCTS) according to an embodiment;
FIG. 15 illustrates PUs adjacent to the boundary of a slice, according to an embodiment;
FIG. 16 illustrates a merge list according to an embodiment;
fig. 17 is a flowchart of an inter prediction method according to an embodiment;
FIG. 18 is a flowchart of a method for generating a merge list for inter prediction of a target block, according to an embodiment;
FIG. 19 is a flowchart of a method for generating a predicted motion vector candidate list for inter prediction of a target block, according to an embodiment;
FIG. 20 is a flowchart of a method for determining availability of candidate blocks for inter prediction of a target block, according to an embodiment;
FIG. 21 shows a merge list to which motion prediction boundary checking is applied according to an embodiment;
fig. 22 is a configuration diagram of an electronic device implementing an encoding apparatus according to an embodiment;
fig. 23 is a configuration diagram of an electronic device implementing a decoding apparatus according to an embodiment.
Best mode for carrying out the invention
The following exemplary embodiments will be described in detail with reference to the accompanying drawings showing specific embodiments.
In the drawings, like reference numerals are used to designate the same or similar functions in all respects. The shapes, sizes, etc. of components in the drawings may be exaggerated to make the description clear.
It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Furthermore, it should be noted that, in the exemplary embodiments, the expression for describing "including" a specific component means that another component may be included in the practical scope or technical spirit of the exemplary embodiments, but does not exclude the presence of components other than the specific component.
For convenience of description, the respective components are separately arranged. For example, at least two of the plurality of components may be integrated into a single component. Instead, one component may be divided into a plurality of components. Embodiments in which multiple components are integrated or embodiments in which some components are separated are included in the scope of the present specification as long as they do not depart from the essence of the present specification.
The embodiments will be described in detail below with reference to the drawings so that those skilled in the art to which the embodiments pertain can easily practice the embodiments. In the following description of the embodiments, a detailed description of known functions or configurations that are considered to obscure the gist of the present specification will be omitted.
Hereinafter, "image" may represent a single picture that forms part of a video, or may represent the video itself. For example, "encoding and/or decoding an image" may mean "encoding and/or decoding a video" and may also mean "encoding and/or decoding any one of a plurality of images constituting a video".
Hereinafter, "video" and "moving picture" may be used to have the same meaning and may be used interchangeably with each other.
Hereinafter, the target image may be an encoding target image that is a target to be encoded and/or a decoding target image that is a target to be decoded. Further, the target image may be an input image to be input to the encoding apparatus or an input image to be input to the decoding apparatus.
Hereinafter, the terms "image", "picture", "frame", and "screen" may be used to have the same meaning and may be used interchangeably with each other.
Hereinafter, the target block may be an encoding target block that is a target to be encoded and/or a decoding target block that is a target to be decoded. Further, the target block may be a current block that is a target of being currently encoded and/or decoded. In other words, the "target block" and the "current block" may be used as having the same meaning, and may be used interchangeably with each other.
Hereinafter, "block" and "unit" may be used as having the same meaning, and may be used interchangeably with each other. Alternatively, "block" may represent a particular unit.
Hereinafter, "region" and "section" may be used interchangeably with each other.
Hereinafter, the specific signal may be a signal indicating a specific block. For example, the original signal may be a signal indicating a target block. The prediction signal may be a signal indicating a prediction block. The residual signal may be a signal indicating a residual block.
In the following embodiments, specific information, data, flags, elements, and attributes may have their respective values. A value of 0 corresponding to each of the information, data, flags, elements, and attributes may indicate a logical false or a first predefined value. In other words, the values "0", false, logical false and first predefined value may be used interchangeably with each other. The value "1" corresponding to each of the information, data, flags, elements, and attributes may indicate a logical true or a second predefined value. In other words, the values "1", true, logical true and second predefined values may be used interchangeably with each other.
When a variable such as i or j is used to indicate a row, column, or index, the value i may be an integer of 0 or an integer greater than 0, or may be an integer of 1 or an integer greater than 1. In other words, in an embodiment, each of the rows, columns, and indexes may be counted starting from 0, or may be counted starting from 1.
Next, terms to be used in the embodiments will be described.
A unit: "unit" may mean a unit of image encoding and decoding. The meaning of the terms "unit" and "block" may be identical to each other. Furthermore, the terms "unit" and "block" are used interchangeably.
A unit may be an M×N sample matrix, where M and N are each positive integers. The term "unit" may generally refer to a two-dimensional (2D) array of samples. A "sample" may be a pixel or a pixel value.
The terms "pixel" and "sample" may be used with the same meaning and are used interchangeably with each other.
During the encoding and decoding of an image, a "unit" may be a region created by partitioning the image. A single image may be partitioned into multiple units. In encoding and decoding an image, a predefined process may be performed for each unit according to the type of the unit. The types of units may be classified into macro units, Coding Units (CUs), Prediction Units (PUs), and Transform Units (TUs) according to their functions. An individual unit may be further partitioned into lower-level units having smaller dimensions than the unit itself.
The unit partition information may comprise information about the depth of the unit. The depth information may represent the number and/or degree to which the unit is partitioned.
A single unit may be hierarchically partitioned into a plurality of lower-layer units, wherein the lower-layer units have tree-structure-based depth information. In other words, a unit and a lower-level unit generated by partitioning that unit may correspond to a node and a child node of the node, respectively. Each partitioned lower-layer unit may have depth information. The depth information of a unit indicates the number of times and/or degree to which the unit has been partitioned; thus, the partition information of a lower-layer unit may include information about the size of the lower-layer unit.
In a tree structure, the top node may correspond to the initial node before partitioning. The top node may be referred to as the "root node". Further, the root node may have a minimum depth value. Here, the depth of the top node may be level "0".
A node of depth level "1" may represent a cell that is generated when the initial cell is partitioned once. A node of depth level "2" may represent a cell that is generated when an initial cell is partitioned twice.
Leaf nodes of depth level "n" may represent units that are generated when an initial unit is partitioned n times.
The leaf node may be a bottom node, which cannot be partitioned further. The depth of the leaf node may be a maximum level. For example, the predefined value for the maximum level may be 3.
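The relationship between depth and unit size described above can be illustrated as follows, assuming a square root unit (e.g. 64x64) in which each partition halves both the width and the height:

```python
def unit_size_at_depth(root_size: int, depth: int) -> int:
    """Side length of a (square) unit at the given tree depth: each
    partition halves the width and the height, so side = root_size / 2**depth."""
    return root_size >> depth

# With a 64x64 root unit (depth 0) and a maximum depth of 3,
# the depths yield sides 64, 32, 16, and 8, so leaf units are 8x8.
```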
-a Transform Unit (TU): the TUs may be basic units of residual signal encoding and/or residual signal decoding (such as transform, inverse transform, quantization, inverse quantization, transform coefficient encoding, and transform coefficient decoding). A single TU may be partitioned into multiple TUs, wherein each of the multiple TUs has a smaller size.
-a Prediction Unit (PU): a PU may be a basic unit in the execution of prediction or compensation. A PU may be partitioned into multiple partitions by performing the partitioning. The plurality of partitions may also be basic units in the execution of prediction or compensation. The partition generated via partitioning the PU may also be a prediction unit.
-reconstructed neighboring units: the reconstructed neighboring cells may be previously decoded and reconstructed cells that are in the vicinity of the target cell. The reconstructed neighboring cell may be a cell that is spatially neighboring the target cell or a cell that is temporally neighboring the target cell.
-prediction unit partition: the prediction unit partition may represent the shape of the PU being partitioned.
-parameter set: the parameter set may correspond to information about a header of a structure of the bitstream. For example, the parameter sets may include a sequence parameter set, a picture parameter set, an adaptation parameter set, and the like.
Rate distortion optimization: the encoding device may use rate distortion optimization to provide higher encoding efficiency by utilizing a combination of: the size of a CU, prediction mode, size of a prediction unit, motion information, and size of a TU.
-rate distortion optimization scheme: the scheme may calculate the rate distortion costs for each combination to select the optimal combination from among the combinations. The rate distortion cost may be calculated using equation 1 below. In general, the combination that minimizes the rate-distortion cost may be selected as the optimal combination under the rate-distortion optimization method.
[Equation 1]
D + λ * R
Here, D may represent distortion. D may be the average of the squared differences (i.e., the mean squared error) between the original transform coefficients and the reconstructed transform coefficients in the transform unit.
R represents a code rate, which may represent a bit rate using the relevant context information.
Lambda represents the lagrangian multiplier. R may include not only coding parameter information such as a prediction mode, motion information, and a coding block flag, but also bits generated due to coding of transform coefficients.
The encoding device may perform processes such as inter-prediction and/or intra-prediction, transformation, quantization, entropy coding, inverse quantization, and inverse transformation in order to calculate accurate D and R. These processes can greatly increase the complexity of the encoding device.
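The cost computation of Equation 1 and the selection of the minimizing combination can be sketched as follows; the combination labels, distortion, and rate values are purely illustrative:

```python
def rd_cost(distortion: float, rate: float, lmbda: float) -> float:
    """Rate-distortion cost from Equation 1: J = D + lambda * R."""
    return distortion + lmbda * rate

def pick_best(combinations, lmbda):
    """Select the coding-parameter combination minimising the cost.
    `combinations` holds (label, D, R) tuples (hypothetical values)."""
    return min(combinations, key=lambda c: rd_cost(c[1], c[2], lmbda))
```

Note that the chosen combination depends on lambda: a small lambda favours low distortion, a large lambda favours a low bit rate.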
-a reference picture: the reference picture may be an image used for inter prediction or motion compensation. The reference picture may be a picture including a reference unit that is referenced by the target unit to perform inter prediction or motion compensation.
-reference picture list: the reference picture list may be a list including reference pictures used for inter prediction or motion compensation. The type of reference picture list may be a merged List (LC), list 0 (L0), list 1 (L1), etc.
-Motion Vector (MV): the MV may be a 2D vector for inter prediction. For example, it may be as follows (mv) x ,mv y ) Is expressed as MV. mv (mv) x Can indicate the horizontal component, mv y The vertical component may be indicated.
The MV may represent an offset between the target picture and the reference picture.
Search range: the search range may be a 2D region in which a search for MVs is performed during inter prediction. For example, the size of the search range may be mxn. M and N may each be positive integers.
Fig. 1 is a block diagram showing the configuration of an embodiment of an encoding apparatus to which the present invention is applied.
The encoding apparatus 100 may be a video encoding apparatus or an image encoding apparatus. The video may include one or more images (pictures). The encoding device 100 may encode one or more images of the video sequentially over time.
Referring to fig. 1, the encoding apparatus 100 includes an inter prediction unit 110, an intra prediction unit 120, a switcher 115, a subtractor 125, a transform unit 130, a quantization unit 140, an entropy encoding unit 150, an inverse quantization unit 160, an inverse transform unit 170, an adder 175, a filtering unit 180, and a reference picture buffer 190.
The encoding apparatus 100 may perform encoding on the target image using intra mode and inter mode.
Further, the encoding apparatus 100 may generate a bitstream including information about encoding by encoding a target image, and may output the generated bitstream.
When the intra mode is used, the switcher 115 can switch to the intra mode. When the inter mode is used, the switcher 115 can switch to the inter mode.
The encoding apparatus 100 may generate a prediction block for a target block. Further, after the prediction block is generated, the encoding apparatus 100 may encode a residual between the target block and the prediction block.
When the prediction mode is an intra mode, the intra prediction unit 120 may use pixels of previously encoded neighboring blocks around the target block as reference pixels. Intra-prediction unit 120 may perform spatial prediction on the target block using the reference pixels and generate prediction samples for the target block via spatial prediction.
The inter prediction unit 110 may include a motion prediction unit and a motion compensation unit.
When the prediction mode is an inter mode, the motion prediction unit may search for a region in the reference image that best matches the target block in the motion prediction process, and may derive a motion vector for the target block and the found region. The reference picture may be stored in a reference picture buffer 190. More specifically, when encoding and/or decoding of the reference picture has been processed, the reference picture may be stored in the reference picture buffer 190.
The motion compensation unit may generate a prediction block for the target block by performing motion compensation using the motion vector. Here, the motion vector may be a two-dimensional (2D) vector for inter prediction. Further, the motion vector may represent an offset between the target image and the reference image.
The subtractor 125 may generate a residual block, wherein the residual block is a residual between the target block and the prediction block.
The transform unit 130 may generate transform coefficients by transforming the residual block, and may output the generated transform coefficients. Here, the transform coefficient may be a coefficient value generated by transforming the residual block. When the transform skip mode is used, the transform unit 130 may omit an operation of transforming the residual block.
By quantizing the transform coefficients, quantized transform coefficient levels may be generated. Here, in an embodiment, the quantized transform coefficient level may also be referred to as a "transform coefficient".
The quantization unit 140 may generate quantized transform coefficient levels by quantizing the transform coefficients according to quantization parameters. The quantization unit 140 may output quantized transform coefficient levels. In this case, the quantization unit 140 may quantize the transform coefficient using a quantization matrix.
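A simplified sketch of the quantization/inverse-quantization round trip follows. It assumes an HEVC-style step size that doubles every six QP values; this is an assumption for illustration — real codecs implement quantization with integer scaling tables and an optional quantization matrix rather than floating-point division.

```python
# Scalar quantization sketch: level = round(coeff / Qstep), recon = level * Qstep.
# The Qstep formula is an HEVC-style approximation, used here for illustration.

def q_step(qp: int) -> float:
    """Quantization step; roughly doubles every 6 QP values."""
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp):
    """Map transform coefficients to quantized transform coefficient levels."""
    step = q_step(qp)
    return [int(round(c / step)) for c in coeffs]

def dequantize(levels, qp):
    """Reconstruct coefficients from levels (lossy: rounding is irreversible)."""
    step = q_step(qp)
    return [lvl * step for lvl in levels]
```

For QP 22 the step is 8.0, so coefficients [96.0, -43.0, 7.0, 0.5] quantize to levels [12, -5, 1, 0]; dequantizing yields [96.0, -40.0, 8.0, 0.0], illustrating the quantization error.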
The entropy encoding unit 150 may generate a bitstream by performing entropy encoding, based on a probability distribution, on the values calculated by the quantization unit 140 and/or the encoding parameter values calculated in the encoding process. The entropy encoding unit 150 may output the generated bitstream.
In addition to pixel information of an image, the entropy encoding unit 150 may perform entropy encoding on information required to decode the image. For example, the information required for decoding an image may include syntax elements and the like.
The encoding parameters may be information required for encoding and/or decoding. The encoding parameters may include information encoded by the encoding device 100 and transmitted from the encoding device 100 to the decoding device, and may also include information derived during encoding or decoding. For example, the information transmitted to the decoding device may include syntax elements.
For example, the encoding parameters may include values or statistics such as the prediction mode, motion vectors, reference picture indices, coded block patterns, the presence or absence of residual signals, transform coefficients, quantized transform coefficients, quantization parameters, block sizes, and block partition information. The prediction mode may be an intra prediction mode or an inter prediction mode.
The residual signal may represent a difference between the original signal and the predicted signal. Alternatively, the residual signal may be a signal generated by transforming a difference between the original signal and the predicted signal. Alternatively, the residual signal may be a signal generated by transforming and quantizing a difference between the original signal and the predicted signal.
When entropy coding is applied, fewer bits may be allocated to more frequently occurring symbols and more bits may be allocated to less frequently occurring symbols. When symbols are represented through this allocation, the size of the bit string for the target symbols to be encoded can be reduced. Accordingly, the compression performance of video coding can be improved by entropy coding.
Further, for entropy encoding, a coding method such as exponential Golomb, Context-Adaptive Variable Length Coding (CAVLC), or Context-Adaptive Binary Arithmetic Coding (CABAC) may be used. For example, the entropy encoding unit 150 may perform entropy encoding using a Variable Length Coding/Code (VLC) table. For example, the entropy encoding unit 150 may derive a binarization method for a target symbol. Furthermore, the entropy encoding unit 150 may derive a probability model for the target symbol/bin. The entropy encoding unit 150 may then perform entropy encoding using the derived binarization method or probability model.
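The bit-allocation principle above can be illustrated with ideal (Shannon) code lengths, where a symbol of probability p ideally costs -log2(p) bits. This is only a conceptual sketch; CAVLC and CABAC approximate this behavior with code tables and adaptive context models, not with this formula directly.

```python
# Ideal code lengths: frequent symbols get short codes, rare symbols long ones.
import math

def ideal_code_lengths(freqs):
    """Map each symbol to its ideal code length -log2(p) in bits."""
    total = sum(freqs.values())
    return {s: -math.log2(n / total) for s, n in freqs.items()}

def average_bits(freqs):
    """Expected bits per symbol under the ideal code (the entropy)."""
    total = sum(freqs.values())
    lengths = ideal_code_lengths(freqs)
    return sum((n / total) * lengths[s] for s, n in freqs.items())

# Hypothetical symbol counts: "a" occurs half the time.
freqs = {"a": 8, "b": 4, "c": 2, "d": 2}
lengths = ideal_code_lengths(freqs)  # a: 1 bit, b: 2 bits, c and d: 3 bits
```

Here the average cost is 1.75 bits per symbol, versus 2 bits for a fixed-length code over four symbols, which is the compression gain entropy coding exploits.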
Since the encoding apparatus 100 performs encoding via inter prediction, a target image may be used as a reference image for another image to be subsequently processed. Accordingly, the encoding apparatus 100 may decode the encoded target image and store the decoded image as a reference image in the reference picture buffer 190. For decoding, inverse quantization and inverse transformation of the encoded target image may be performed.
The quantized coefficients may be inversely quantized by the inverse quantization unit 160 and may be inversely transformed by the inverse transform unit 170. The adder 175 may add the inversely quantized and inversely transformed coefficients to the prediction block, thereby generating a reconstructed block.
The reconstructed block may be filtered by a filtering unit 180. The filtering unit 180 may apply one or more of a deblocking filter, a Sample Adaptive Offset (SAO) filter, and an Adaptive Loop Filter (ALF) to the reconstructed block or the reconstructed picture. The filtering unit 180 may also be referred to as an "adaptive in-loop filter".
The deblocking filter may remove block distortion that occurs at the boundaries of the blocks. The SAO filter may add the appropriate offset value to the pixel value to compensate for the coding error. The ALF may perform filtering based on a comparison result between the reconstructed block and the original block. The reconstructed block filtered by the filtering unit 180 may be stored in a reference picture buffer 190. The reconstructed block filtered by the filtering unit 180 may be a part of a reference picture. In other words, the reference picture may be a picture composed of a reconstructed block filtered by the filtering unit 180. The stored reference pictures may then be used for inter prediction.
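As one concrete example of such filtering, the following sketches an HEVC-style SAO band offset, under the assumption of 32 equal bands selected by the top five bits of the sample value; the offset values used below are illustrative, not offsets signalled from a real bitstream.

```python
# SAO band offset sketch: the sample range is split into 32 equal bands and
# each band may receive an offset (here given as a dict band -> offset).

def sao_band_offset(pixels, offsets, bit_depth=8):
    """Add the band's offset to each sample and clip to the valid range."""
    shift = bit_depth - 5  # 32 bands -> the top 5 bits select the band
    out = []
    for p in pixels:
        band = p >> shift
        q = p + offsets.get(band, 0)
        out.append(max(0, min((1 << bit_depth) - 1, q)))  # clip to [0, 2^d - 1]
    return out
```

With 8-bit samples each band spans 8 values, so an offset signalled for band 1 adjusts samples 8..15 while leaving other samples untouched.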
Fig. 2 is a block diagram showing the configuration of an embodiment of a decoding apparatus to which the present invention is applied.
The decoding apparatus 200 may be a video decoding apparatus or an image decoding apparatus.
Referring to fig. 2, the decoding apparatus 200 may include an entropy decoding unit 210, an inverse quantization unit 220, an inverse transformation unit 230, an intra prediction unit 240, an inter prediction unit 250, an adder 255, a filtering unit 260, and a reference picture buffer 270.
The decoding apparatus 200 may receive the bit stream output from the encoding apparatus 100. The decoding apparatus 200 may perform decoding on the bit stream in an intra mode and/or an inter mode. Further, the decoding apparatus 200 may generate a reconstructed image via decoding, and may output the reconstructed image.
For example, an operation of switching to an intra mode or an inter mode based on a prediction mode for decoding may be performed by a switch. When the prediction mode for decoding is an intra mode, the switch may be operated to switch to the intra mode. When the prediction mode for decoding is an inter mode, the switch may be operated to switch to the inter mode.
The decoding apparatus 200 may acquire a reconstructed residual block from an input bitstream and may generate a prediction block. When the reconstructed residual block and the prediction block are acquired, the decoding apparatus 200 may generate the reconstructed block by adding the reconstructed residual block and the prediction block.
The entropy decoding unit 210 may generate symbols by performing entropy decoding on the bitstream based on the probability distribution. The generated symbols may include quantized coefficient format symbols. Here, the entropy decoding method may be similar to the entropy encoding method described above. That is, the entropy decoding method may be an inverse of the entropy encoding method described above.
The quantized coefficients may be dequantized by dequantization unit 220. Further, the inversely quantized coefficients may be inversely transformed by the inverse transformation unit 230. As a result of the inverse quantization and inverse transformation of the quantized coefficients, a reconstructed residual block may be generated. Here, the inverse quantization unit 220 may apply a quantization matrix to the quantized coefficients.
When using the intra mode, the intra prediction unit 240 may generate a prediction block by performing spatial prediction using pixel values of previously decoded neighboring blocks around the target block.
The inter prediction unit 250 may include a motion compensation unit. When the inter mode is used, the motion compensation unit may generate a prediction block by performing motion compensation using a motion vector and a reference image. The reference image may be stored in the reference picture buffer 270.
The reconstructed residual block and the prediction block may be added to each other by an adder 255. The adder 255 may generate a reconstructed block by adding the reconstructed residual block and the prediction block.
The reconstructed block may be filtered by a filtering unit 260. The filtering unit 260 may apply one or more of a deblocking filter, an SAO filter, and an ALF to the reconstructed block or the reconstructed picture. The reconstructed block filtered by the filtering unit 260 may be stored in a reference picture buffer 270. The reconstructed block filtered by the filtering unit 260 may be part of a reference picture. In other words, the reference picture may be a picture composed of reconstructed blocks filtered by the filtering unit 260. The stored reference pictures may then be used for inter prediction.
Fig. 3 is a diagram schematically showing an image partition structure when an image is encoded and decoded.
In order to partition an image efficiently, a Coding Unit (CU) may be used in encoding and decoding. The term "unit" may be used to collectively designate 1) a block including image samples and 2) a syntax element. For example, "partition of a unit" may represent "partition of a block corresponding to the unit".
Referring to fig. 3, an image 300 is sequentially partitioned into units corresponding to a Largest Coding Unit (LCU), and the partition structure of the image 300 may be determined according to the LCUs. Here, an LCU may be used with the same meaning as a Coding Tree Unit (CTU).
The partition structure may represent a distribution of Coding Units (CUs) in LCU 310 for efficiently encoding an image. Such a distribution may be determined according to whether a single CU is to be partitioned into four CUs. The horizontal and vertical sizes of each CU resulting from partitioning may be half the horizontal and vertical sizes of the CU before being partitioned. Each partitioned CU may be recursively partitioned into four CUs, and in the same manner, the horizontal and vertical sizes of the four CUs are halved.
Here, partitioning of CUs may be performed recursively until a predefined depth. The depth information may be information indicating the size of the CU. Depth information may be stored for each CU. For example, the depth of the LCU may be 0 and the depth of the Smallest Coding Unit (SCU) may be a predefined maximum depth. Here, as described above, the LCU may be a CU having a maximum coding unit size, and the SCU may be a CU having a minimum coding unit size.
Partitioning begins at LCU 310, and the depth of a CU may be increased by 1 each time the horizontal and vertical sizes of the CU are halved by partitioning. For each depth, a CU that is not partitioned may have a size of 2N×2N. Further, in the case where a CU is partitioned, a CU having a size of 2N×2N may be partitioned into four CUs each having a size of N×N. The dimension N may be halved each time the depth increases by 1.
Referring to fig. 3, an LCU of depth 0 may have 64×64 pixels. 0 may be the minimum depth. An SCU of depth 3 may have 8 x 8 pixels. 3 may be the maximum depth. Here, a CU having 64×64 pixels as an LCU may be represented by a depth of 0. A CU with 32 x 32 pixels may be represented by a depth of 1. A CU with 16 x 16 pixels may be represented by a depth of 2. A CU with 8×8 pixels as SCU may be represented by depth 3.
Further, information on whether the corresponding CU is partitioned may be represented by partition information of the CU. The partition information may be 1-bit information. All CUs except SCU may include partition information. For example, when a CU is not partitioned, the value of partition information of the CU may be 0. When a CU is partitioned, the value of partition information of the CU may be 1.
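The recursive quadtree partitioning driven by per-CU partition information can be sketched as follows. The `split_flags` callback stands in for the decoded 1-bit partition information (SCUs carry no flag and are never split); the particular split decisions in the usage below are hypothetical.

```python
# Recursive quadtree CU partitioning: each visited CU is either a leaf or is
# split into four equally sized child CUs, down to the minimum CU size.

def partition(x, y, size, min_size, split_flags, leaves):
    """Collect the leaf CUs of the quadtree as (x, y, size) tuples."""
    if size > min_size and split_flags(x, y, size):
        half = size // 2
        for dy in (0, half):        # raster order: top-left, top-right,
            for dx in (0, half):    # bottom-left, bottom-right
                partition(x + dx, y + dy, half, min_size, split_flags, leaves)
    else:
        leaves.append((x, y, size))
```

For example, splitting a 64×64 LCU once and then splitting only its top-left 32×32 child yields four 16×16 leaves plus three 32×32 leaves, seven CUs in total.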
Fig. 4 is a diagram illustrating the shape of a Prediction Unit (PU) that an encoding unit (CU) can include.
Among CUs partitioned from LCUs, a CU that is no longer partitioned may be partitioned into one or more Prediction Units (PUs). This partitioning may also be referred to as "partitioning".
A PU may be a base unit for prediction. The PU may be encoded and decoded in any one of a skip mode, an inter mode, and an intra mode. The PU may be partitioned into various shapes according to various modes. For example, the target blocks described above with reference to fig. 1 and the target blocks described above with reference to fig. 2 may each be a PU.
In skip mode, no partition may be present in the CU. In skip mode, the 2N×2N mode 410 may be supported without partitioning, wherein the size of the PU and the size of the CU are the same as each other.
In inter mode, there may be 8 types of partition shapes in a CU. For example, in inter mode, the 2N×2N mode 410, 2N×N mode 415, N×2N mode 420, N×N mode 425, 2N×nU mode 430, 2N×nD mode 435, nL×2N mode 440, and nR×2N mode 445 may be supported.
In intra mode, the 2N×2N mode 410 and the N×N mode 425 may be supported.
In the 2N×2N mode 410, a PU of size 2N×2N may be encoded. A PU of size 2N×2N may represent a PU of the same size as the CU. For example, a PU of size 2N×2N may have a size of 64×64, 32×32, 16×16, or 8×8.
In the N×N mode 425, a PU of size N×N may be encoded.
For example, in intra prediction, four partitioned PUs may be encoded when the PU size is 8×8. The size of each partitioned PU may be 4×4.
When encoding a PU in intra mode, the PU may be encoded using any of a plurality of intra prediction modes. For example, HEVC techniques may provide 35 intra-prediction modes, and a PU may be encoded in any of the 35 intra-prediction modes.
Which of the 2N×2N mode 410 and the N×N mode 425 is to be used to encode the PU may be determined based on the rate distortion cost.
The encoding apparatus 100 may perform an encoding operation on a PU of size 2N×2N. Here, the encoding operation may be an operation of encoding the PU in each of the multiple intra prediction modes that the encoding apparatus 100 can use. Through the encoding operation, the optimal intra prediction mode for the PU of size 2N×2N can be obtained. The optimal intra prediction mode may be the intra prediction mode that yields the minimum rate distortion cost when encoding the PU of size 2N×2N, among the multiple intra prediction modes that the encoding apparatus 100 can use.
Further, the encoding apparatus 100 may sequentially perform encoding operations on the respective PUs obtained by N×N partitioning. Here, the encoding operation may be an operation of encoding each PU in each of the multiple intra prediction modes that the encoding apparatus 100 can use. Through the encoding operation, the optimal intra prediction mode for a PU of size N×N can be obtained. The optimal intra prediction mode may be the intra prediction mode that yields the minimum rate distortion cost when encoding the PU of size N×N, among the multiple intra prediction modes that the encoding apparatus 100 can use.
The encoding apparatus 100 may determine which of the PU of size 2N×2N and the PUs of size N×N is to be encoded based on a comparison between the rate distortion cost of the PU of size 2N×2N and the rate distortion cost of the PUs of size N×N.
Fig. 5 is a diagram illustrating the shape of a Transform Unit (TU) that can be included in a CU.
A Transform Unit (TU) may be a basic unit in a CU used for processes such as transform, quantization, inverse transform, inverse quantization, entropy encoding, and entropy decoding. TUs may have a square or rectangular shape.
Among CUs partitioned from LCUs, a CU that is no longer partitioned into CUs may be partitioned into one or more TUs. Here, the partition structure of the TUs may be a quadtree structure. For example, as shown in FIG. 5, a single CU 510 may be partitioned one or more times according to a quadtree structure. With such partitioning, a single CU 510 may be composed of TUs having various sizes.
In the encoding apparatus 100, a Coding Tree Unit (CTU) having a size of 64×64 may be partitioned into a plurality of smaller CUs by a recursive quadtree structure. A single CU may be partitioned into four CUs having the same size. Each CU may be recursively partitioned and may have a quadtree structure.
A CU may have a given depth. When a CU is partitioned, the depth of the CU generated by the partition may be increased by 1 from the depth of the partitioned CU.
For example, the depth of a CU may have a value ranging from 0 to 3. The size of a CU may range from 64×64 to 8×8 depending on the depth of the CU.
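The depth-to-size relationship can be expressed directly, assuming a 64×64 LCU with the size halved per depth step:

```python
# CU depth <-> size mapping for a quadtree starting from a 64x64 LCU.

def cu_size(depth: int, lcu_size: int = 64) -> int:
    """Edge length of a CU at the given quadtree depth (halved per level)."""
    return lcu_size >> depth

def cu_depth(size: int, lcu_size: int = 64) -> int:
    """Quadtree depth of a CU with the given edge length."""
    return (lcu_size // size).bit_length() - 1
```

So depths 0 through 3 correspond to sizes 64, 32, 16, and 8, matching the range described above.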
By recursive partitioning of the CU, the best partitioning method can be selected where the minimum rate distortion cost occurs.
Fig. 6 is a diagram for explaining an embodiment of an intra prediction process.
The arrow extending radially from the center of the graph in fig. 6 may represent the prediction direction of the intra prediction mode. Further, numbers shown near the arrow may represent examples of mode values assigned to intra prediction modes or prediction directions of intra prediction modes.
Intra-coding and/or decoding may be performed using reference samples of blocks adjacent to the target block. The neighboring block may be a neighboring reconstructed block. For example, intra-frame encoding and/or decoding may be performed using values of reference samples included in each neighboring reconstructed block or encoding parameters of the neighboring reconstructed blocks.
The encoding apparatus 100 and/or the decoding apparatus 200 may generate a prediction block by performing intra prediction on a target block based on information about samples in the target image. When intra prediction is performed, the encoding apparatus 100 and/or the decoding apparatus 200 may perform directional prediction and/or non-directional prediction based on at least one reconstructed reference sample.
The prediction block may represent a block generated as a result of performing intra prediction. The prediction block may correspond to at least one of a CU, PU, and TU.
The unit of the prediction block may have a size corresponding to at least one of a CU, a PU, and a TU. The prediction block may have a square shape with a size of 2N×2N or N×N. The size N×N may include 4×4, 8×8, 16×16, 32×32, 64×64, etc.
Alternatively, the prediction block may be a square block of size 2×2, 4×4, 16×16, 32×32, 64×64, etc. or a rectangular block of size 2×8, 4×8, 2×16, 4×16, 8×16, etc.
Intra prediction may be performed according to an intra prediction mode for a target block. The number of intra prediction modes that the target block may have may be a predefined fixed value, and may be a value that is differently determined according to the properties of the prediction block. For example, the attribute of the prediction block may include the size of the prediction block, the type of the prediction block, and the like.
For example, the number of intra prediction modes may be fixed to 35 regardless of the size of the prediction block. Alternatively, the number of intra prediction modes may be, for example, 3, 5, 9, 17, 34, 35, or 36.
As shown in fig. 6, the intra prediction modes may include two non-directional modes and 33 directional modes. The two non-directional modes may include a DC mode and a planar mode.
For example, in a vertical mode with a mode value of 26, prediction may be performed in the vertical direction based on the pixel value of the reference sample. For example, in a horizontal mode with a mode value of 10, prediction may be performed in the horizontal direction based on the pixel value of the reference sample.
Even in the directional mode other than the above-described modes, the encoding apparatus 100 and the decoding apparatus 200 can perform intra prediction on the target unit using the reference samples according to the angle corresponding to the directional mode.
The intra prediction modes located to the right of the vertical mode may be referred to as "vertical-right modes". The intra prediction modes located below the horizontal mode may be referred to as "horizontal-below modes". For example, in fig. 6, the intra prediction modes whose mode values are one of 27, 28, 29, 30, 31, 32, 33, and 34 may be vertical-right modes 613. The intra prediction modes whose mode values are one of 2, 3, 4, 5, 6, 7, 8, and 9 may be horizontal-below modes 616.
The non-directional modes may include a DC mode and a planar mode. For example, the mode value of the DC mode may be 1. The mode value of the planar mode may be 0.
The orientation pattern may include an angular pattern. Among the plurality of intra prediction modes, modes other than the DC mode and the plane mode may be a directional mode.
In the DC mode, a prediction block may be generated based on an average of pixel values of a plurality of reference samples. For example, the pixel values of the prediction block may be determined based on an average of the pixel values of the plurality of reference samples.
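A minimal sketch of DC-mode prediction under these definitions, assuming the left and above reference sample lines are available and using integer averaging with rounding, as is common in practice:

```python
# DC intra prediction: fill the whole prediction block with the average of
# the left and above reference samples.

def dc_prediction(left, above, size):
    """Return a size x size prediction block filled with the DC value."""
    refs = list(left) + list(above)
    dc = (sum(refs) + len(refs) // 2) // len(refs)  # rounded integer average
    return [[dc] * size for _ in range(size)]
```

For example, with four left samples of value 100 and four above samples of value 104, every predicted pixel becomes 102.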
The number of intra prediction modes and the mode values of the respective intra prediction modes described above are merely exemplary. The number of intra prediction modes described above and the mode values of the respective intra prediction modes may be defined differently according to embodiments, implementations, and/or requirements.
The number of intra prediction modes may be different according to the type of color component. For example, the number of prediction modes may be different depending on whether the color component is a luminance (luma) signal or a chrominance (chroma) signal.
Fig. 7 is a diagram for explaining the positions of reference samples used in an intra prediction process.
Fig. 7 shows the positions of the reference samples used for intra prediction of a target block. Referring to fig. 7, the reconstructed reference samples used for intra prediction of the target block may include, for example, lower left reference samples 731, left reference samples 733, an upper left corner reference sample 735, upper reference samples 737, and upper right reference samples 739.
For example, the left reference samples 733 may represent the reconstructed reference pixels adjacent to the left side of the target block. The upper reference samples 737 may represent the reconstructed reference pixels adjacent to the top of the target block. The upper left corner reference sample 735 may represent the reconstructed reference pixel located at the upper left corner of the target block. The lower left reference samples 731 may represent the reference samples located below the left sample line formed by the left reference samples 733, among the samples located on the same line as the left sample line. The upper right reference samples 739 may represent the reference samples located to the right of the upper sample line formed by the upper reference samples 737, among the samples located on the same line as the upper sample line.
When the size of the target block is N×N, the numbers of lower left reference samples 731, left reference samples 733, upper reference samples 737, and upper right reference samples 739 may each be N.
By performing intra prediction on a target block, a prediction block may be generated. The generation of the prediction block may include determination of pixel values in the prediction block. The size of the target block and the size of the prediction block may be the same.
The reference points for intra prediction of the target block may vary depending on the intra prediction mode of the target block. The direction of the intra prediction mode may represent a dependency relationship between the reference sample point and the pixels of the prediction block. For example, the value of a particular reference sample may be used as the value of one or more particular pixels in the prediction block. In this case, the specific reference sample point and the one or more specific pixels in the prediction block may be samples and pixels located on a straight line along the direction of the intra prediction mode. In other words, the value of the specific reference sample point may be copied to be used as a value of a pixel located in a direction opposite to the direction of the intra prediction mode. Alternatively, the value of a pixel in the prediction block may be the value of a reference sample located in the direction of the intra prediction mode with respect to the position of the pixel.
In an example, when the intra prediction mode of the target block is a vertical mode with a mode value of 26, the upper reference sample 737 may be used for intra prediction. When the intra prediction mode is a vertical mode, the value of a pixel in the prediction block may be the value of a reference pixel located vertically above the position of the pixel. Thus, the upper reference sample 737 adjacent to the top of the target block may be used for intra prediction. Furthermore, the values of the pixels in a row of the prediction block may be the same as the values of the pixels of the upper reference sample 737.
In an example, when the intra prediction mode of the current block is a horizontal mode with a mode value of 10, the left reference sample 733 may be used for intra prediction. When the intra prediction mode is a horizontal mode, the value of a pixel in the prediction block may be the value of a reference pixel horizontally located to the left of the pixel. Accordingly, the left reference sample 733 adjacent to the left side of the target block may be used for intra prediction. Further, the values of pixels in a column of the prediction block may be the same as the values of pixels of the left reference sample 733.
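The vertical and horizontal copy rules above can be sketched directly; here `above` and `left` stand for the reconstructed upper and left reference sample lines:

```python
# Directional intra prediction for the two axis-aligned modes:
# vertical (mode 26) copies the above reference line down each column,
# horizontal (mode 10) copies each left reference sample across its row.

def vertical_prediction(above, size):
    """Every row of the prediction block equals the above reference line."""
    return [list(above[:size]) for _ in range(size)]

def horizontal_prediction(left, size):
    """Every column of the prediction block equals the left reference line."""
    return [[left[y]] * size for y in range(size)]
```

For a 4×4 block with above samples [1, 2, 3, 4], vertical prediction repeats that row four times; with left samples [5, 6, 7, 8], horizontal prediction fills row y entirely with left[y].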
In an example, when the mode value of the intra prediction mode of the current block is 18, at least some of the left reference samples 733, the upper left corner reference sample 735, and at least some of the upper reference samples 737 may be used for intra prediction. When the mode value of the intra prediction mode is 18, the value of a pixel in the prediction block may be the value of the reference sample located diagonally above and to the left of the pixel.
Further, when an intra prediction mode having a mode value corresponding to 27, 28, 29, 30, 31, 32, 33, or 34 is used, at least some of the upper right reference samples 739 may be used for intra prediction.
In addition, when an intra prediction mode having a mode value corresponding to 2, 3, 4, 5, 6, 7, 8, or 9 is used, at least some of the lower left reference samples 731 may be used for intra prediction.
In addition, when an intra prediction mode having a mode value corresponding to any one of 11 to 25 is used, the upper left corner reference sample 735 may be used for intra prediction.
The number of reference samples used to determine the pixel value of one pixel in the prediction block may be 1 or 2 or more.
As described above, the pixel value of a pixel in a prediction block may be determined according to the position of the pixel and the position of a reference sample point indicated by the direction of the intra prediction mode. When the position of the pixel and the position of the reference sample indicated by the direction of the intra prediction mode are integer positions, the value of one reference sample indicated by the integer position may be used to determine the pixel value of the pixel in the prediction block.
When the position of the reference sample indicated by the direction of the intra prediction mode, relative to the position of the pixel, is not an integer position, an interpolated reference sample may be generated based on the two reference samples nearest to that position. The value of the interpolated reference sample may be used to determine the pixel value of the pixel in the prediction block. In other words, when the position of a pixel in the prediction block and the direction of the intra prediction mode indicate a position between two reference samples, an interpolated value based on the values of the two reference samples may be generated.
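A two-tap linear interpolation between the two nearest reference samples, as described above, can be sketched as follows. The fractional position is given in units of samples; this is a conceptual sketch, and real codecs typically use fixed-point filters at, e.g., 1/32-sample precision instead of floating point.

```python
# Linear interpolation of a reference sample at a fractional position.

def interpolate_reference(refs, pos):
    """Interpolate between the two reference samples nearest to pos."""
    i = int(pos)
    frac = pos - i
    if frac == 0:
        return float(refs[i])  # integer position: use the sample directly
    return (1 - frac) * refs[i] + frac * refs[i + 1]
```

For reference samples [10, 20, 30], position 1.25 lies a quarter of the way from the second sample to the third, giving 0.75 * 20 + 0.25 * 30 = 22.5.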
The prediction block generated via prediction may be different from the original target block. In other words, there may be a prediction error that is a difference between the target block and the prediction block, and there may also be a prediction error between the pixels of the target block and the pixels of the prediction block. For example, in the case of directional intra prediction, the longer the distance between the pixels of the prediction block and the reference sample, the greater the prediction error that may occur. Such prediction errors may cause discontinuities between the generated prediction block and neighboring blocks.
To reduce prediction errors, filtering for the prediction block may be used. The filtering may be configured to adaptively apply the filter to regions of the prediction block that are considered to have large prediction errors. For example, the region considered to have a large prediction error may be a boundary of a prediction block. In addition, regions in the prediction block that are considered to have a large prediction error may differ according to an intra prediction mode, and the characteristics of the filter may also differ according to an intra prediction mode.
Fig. 8 is a diagram for explaining an embodiment of an inter prediction process.
The rectangle shown in fig. 8 may represent an image (or screen). Further, in fig. 8, an arrow may indicate a prediction direction. That is, each image may be encoded and/or decoded according to a prediction direction.
The image may be classified into an intra picture (I picture), a unidirectional predicted picture or a predicted coded picture (P picture), and a bidirectional predicted picture or a bidirectional predicted coded picture (B picture) according to the type of encoding. Each picture may be encoded according to the type of encoding of each picture.
When the target image that is the target to be encoded is an I picture, the target image can be encoded using data contained in the target image itself without inter prediction with reference to other images. For example, an I picture may be encoded via intra prediction only.
When the target image is a P picture, the target image may be encoded via inter prediction using only reference pictures in the forward direction.
When the target image is a B picture, the image may be encoded via inter prediction using reference pictures in both the forward direction and the reverse direction, or may be encoded via inter prediction using reference pictures in one of the forward direction and the reverse direction.
P-pictures and B-pictures encoded and/or decoded using reference pictures may be considered as pictures using inter-prediction.
Hereinafter, inter prediction in inter mode according to an embodiment will be described in detail.
In the inter mode, the encoding apparatus 100 and the decoding apparatus 200 may perform prediction and/or motion compensation on the target block.
For example, the encoding apparatus 100 or the decoding apparatus 200 may perform prediction and/or motion compensation by using motion information of spatial candidates and/or temporal candidates as motion information of a target block. The target block may represent a PU and/or a PU partition.
The spatial candidates may be reconstructed blocks spatially adjacent to the target block.
The temporal candidates may be reconstructed blocks corresponding to the target block in a previously reconstructed co-located picture (col picture).
In the inter prediction, the encoding apparatus 100 and the decoding apparatus 200 may improve encoding efficiency and decoding efficiency by using motion information of spatial candidates and/or temporal candidates. The motion information of the spatial candidate may be referred to as "spatial motion information". The motion information of the temporal candidates may be referred to as "temporal motion information".
Hereinafter, the motion information of a spatial candidate may be the motion information of the PU that includes the spatial candidate. The motion information of a temporal candidate may be the motion information of the PU that includes the temporal candidate. The motion information of a candidate block may be the motion information of the PU that includes the candidate block.
Inter prediction may be performed using a reference picture.
The reference picture may be at least one of a picture preceding the target picture and a picture following the target picture. The reference picture may be an image for predicting the target block.
In inter prediction, a region in a reference picture may be specified by using a reference picture index (or refIdx) indicating the reference picture, a motion vector to be described later, or the like. Here, the region specified in the reference picture may represent a reference block.
Inter prediction may select a reference picture, and may also select a reference block corresponding to a target block in the reference picture. Furthermore, inter prediction may use the selected reference block to generate a prediction block for the target block.
The motion information may be derived by each of the encoding apparatus 100 and the decoding apparatus 200 during inter prediction. The spatial candidate block may be: 1) blocks present in the target picture, 2) blocks that have been previously reconstructed via encoding and/or decoding, and 3) blocks adjacent to or located at corners of the target block. Here, the "block located at the corner of the target block" may be a block vertically adjacent to an adjacent block horizontally adjacent to the target block, or a block horizontally adjacent to an adjacent block vertically adjacent to the target block. Further, "a block located at a corner of a target block" may have the same meaning as "a block adjacent to a corner of the target block". The "block located at the corner of the target block" may be included in the "block adjacent to the target block".
For example, the spatial candidate may be a reconstructed block located to the left of the target block, a reconstructed block located above the target block, a reconstructed block located at the lower left corner of the target block, a reconstructed block located at the upper right corner of the target block, or a reconstructed block located at the upper left corner of the target block.
Each of the encoding apparatus 100 and the decoding apparatus 200 may identify a block that exists in the col picture at a position spatially corresponding to the target block. The position of the target block in the target picture and the position of the identified block in the col picture may correspond to each other.
Each of the encoding apparatus 100 and the decoding apparatus 200 may determine a col block existing at a predefined relative position for the identified block as the temporal candidate. The predefined relative position may be a position inside and/or outside the identified block.
For example, the col blocks may include a first col block and a second col block. When the coordinates of the identified block are (xP, yP) and the size of the identified block is expressed by (nPSW, nPSH), the first col block may be a block located at coordinates (xP+nPSW, yP+nPSH). The second col block may be a block located at coordinates (xP+(nPSW>>1), yP+(nPSH>>1)). The second col block may be selectively used when the first col block is unavailable.
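The two col block positions may be computed directly from the coordinates and size of the identified block, as in this sketch:

```python
def col_block_positions(xP, yP, nPSW, nPSH):
    """Coordinates of the first and second col blocks for an identified
    block at (xP, yP) of size (nPSW, nPSH)."""
    first = (xP + nPSW, yP + nPSH)                 # outside the bottom-right corner
    second = (xP + (nPSW >> 1), yP + (nPSH >> 1))  # center of the identified block
    return first, second
```

For a 16x8 block at (64, 32), the first col block lies at (80, 40) and the second at (72, 36).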
The motion vector of the target block may be determined based on the motion vector of the col block. Each of the encoding apparatus 100 and the decoding apparatus 200 may scale the motion vector of the col block. The scaled motion vector of the col block may be used as the motion vector of the target block. Further, the motion vector of the motion information for the temporal candidate stored in the list may be a scaled motion vector.
The ratio of the motion vector of the target block to the motion vector of the col block may be the same as the ratio of the first distance to the second distance. The first distance may be a distance between a reference picture of the target block and the target picture. The second distance may be a distance between a reference picture of the col block and the col picture.
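The scaling rule above may be sketched as follows (floating-point arithmetic for clarity; practical codecs use fixed-point scaling):

```python
def scale_col_motion_vector(mv_col, dist_target, dist_col):
    """Scale the col block's motion vector so that the ratio of the
    target MV to the col MV equals the ratio of the first distance
    (target picture to its reference) to the second distance (col
    picture to its reference)."""
    scale = dist_target / dist_col
    return (mv_col[0] * scale, mv_col[1] * scale)
```

For example, a col motion vector (8, -4) with a first distance of 2 and a second distance of 4 scales to (4.0, -2.0).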
The scheme for deriving motion information may vary according to the inter prediction mode of the target block. For example, as an inter prediction mode applied to inter prediction, there may be an Advanced Motion Vector Predictor (AMVP) mode, a merge mode, a skip mode, and the like. The respective modes will be described in detail below.
1) AMVP mode
When the AMVP mode is used, the encoding apparatus 100 may search for similar blocks in a neighboring area of the target block. The encoding apparatus 100 may acquire a prediction block by performing prediction on a target block using motion information of the found similar block. The encoding apparatus 100 may encode a residual block, which is a difference between the target block and the prediction block.
1-1) Generation of a list of predicted motion vector candidates
When AMVP mode is used as the prediction mode, each of the encoding apparatus 100 and the decoding apparatus 200 may create a list of prediction motion vector candidates using motion vectors of spatial candidates and/or motion vectors of temporal candidates. The motion vector of the spatial candidate and/or the motion vector of the temporal candidate may be used as a predicted motion vector candidate.
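A minimal sketch of this list construction follows; the duplicate check and the maximum list size of 2 are assumptions, not requirements stated in the text:

```python
def build_amvp_candidate_list(spatial_mvs, temporal_mvs, max_candidates=2):
    """Collect motion vectors of spatial and temporal candidates as
    predicted-motion-vector candidates."""
    candidates = []
    for mv in list(spatial_mvs) + list(temporal_mvs):
        if mv is None or mv in candidates:
            continue                        # skip unavailable or duplicate MVs
        if len(candidates) < max_candidates:
            candidates.append(mv)
    return candidates
```

Given spatial motion vectors [(1, 0), None, (1, 0)] and a temporal motion vector (0, 2), the resulting candidate list is [(1, 0), (0, 2)].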
The predicted motion vector candidates may be motion vector predictors for predicting motion vectors. Further, in the encoding apparatus 100, each predicted motion vector candidate may be an initial search position for a motion vector.
1-2) searching for motion vectors using a list of predicted motion vector candidates
The encoding apparatus 100 may determine a motion vector to be used for encoding the target block within the search range using the list of predicted motion vector candidates. Further, the encoding apparatus 100 may determine a predicted motion vector candidate to be used as a predicted motion vector of the target block among a plurality of predicted motion vector candidates existing in the list of predicted motion vector candidates.
The motion vector to be used for encoding the target block may be a motion vector encoded at a minimum cost.
Further, the encoding apparatus 100 may determine whether to encode the target block using the AMVP mode.
1-3) transmission of inter prediction information
The encoding apparatus 100 may generate a bitstream including inter prediction information required for inter prediction. The decoding apparatus 200 may perform inter prediction on the target block using the inter prediction information in the bitstream.
The inter prediction information may include: 1) mode information indicating whether AMVP mode is used, 2) a prediction motion vector index, 3) a Motion Vector Difference (MVD), 4) a reference direction, and 5) a reference picture index.
In addition, the inter prediction information may include a residual signal.
The decoding apparatus 200 may acquire the prediction motion vector index, the MVD, the reference direction, and the reference picture index from the bitstream only when the mode information indicates that the AMVP mode is used.
The prediction motion vector index may indicate a prediction motion vector candidate to be used for predicting the target block among a plurality of prediction motion vector candidates included in the list of prediction motion vector candidates.
1-4) inter prediction in AMVP mode using inter prediction information
The decoding apparatus 200 may select, as the predicted motion vector of the target block, the predicted motion vector candidate indicated by the predicted motion vector index from among the predicted motion vector candidates included in the list of predicted motion vector candidates.
The motion vector that will actually be used for inter prediction of the target block may not match the predicted motion vector. In order to indicate the difference between the motion vector that will actually be used for inter prediction of the target block and the predicted motion vector, MVD may be used. The encoding apparatus 100 may derive a prediction motion vector similar to a motion vector that will be actually used for inter prediction of a target block in order to use as small MVD as possible.
The MVD may be the difference between the motion vector of the target block and the predicted motion vector. The encoding apparatus 100 may calculate the MVD and may encode the MVD.
The MVD may be transmitted from the encoding apparatus 100 to the decoding apparatus 200 through a bitstream. The decoding apparatus 200 may decode the received MVD. The decoding apparatus 200 may derive a motion vector of the target block using the sum of the decoded MVD and the predicted motion vector.
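The two sides of the MVD mechanism may be sketched as a pair of inverse operations:

```python
def encode_mvd(mv, pmv):
    """Encoder side: the MVD is the difference between the target
    block's motion vector and the predicted motion vector."""
    return (mv[0] - pmv[0], mv[1] - pmv[1])

def derive_mv(pmv, mvd):
    """Decoder side: the target block's motion vector is the sum of
    the decoded MVD and the predicted motion vector."""
    return (pmv[0] + mvd[0], pmv[1] + mvd[1])
```

With a motion vector (5, 3) and a predicted motion vector (4, 1), the encoder signals the small MVD (1, 2), and the decoder recovers (5, 3) exactly.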
The reference direction may indicate a list of reference pictures to be used for predicting the target block. For example, the reference direction may indicate one of the reference picture list L0 and the reference picture list L1.
The reference direction only indicates a reference picture list to be used for prediction of the target block, and may not indicate that the direction of the reference picture is limited to a forward direction or a backward direction. In other words, each of the reference picture list L0 and the reference picture list L1 may include pictures in a forward direction and/or a backward direction.
The reference direction as unidirectional may mean that a single reference picture list is used. The reference direction as bi-directional may mean that two reference picture lists are used. In other words, the reference direction may indicate one of the following: a case where only the reference picture list L0 is used, a case where only the reference picture list L1 is used, and a case where two reference picture lists are used.
The reference picture index may indicate a reference picture to be used for predicting the target block among a plurality of reference pictures in the reference picture list.
When two reference picture lists are to be used for predicting the target block, a single reference picture index and a single motion vector may be used for each of the two reference picture lists. Further, when two reference picture lists are used to predict the target block, two prediction blocks may be specified for the target block. For example, the average or weighted sum of the two prediction blocks for the target block may be used to generate the (final) prediction block of the target block.
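The averaging of the two prediction blocks may be sketched as follows; the rounded integer average shown here is one possible choice (a weighted sum could be used instead):

```python
def average_prediction_blocks(block_l0, block_l1):
    """Combine the two prediction blocks of bi-prediction into the
    final prediction block by a rounded per-pixel average."""
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(block_l0, block_l1)]
```

For example, averaging the rows [10, 20] and [30, 21] yields [20, 21].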
The motion vector of the target block may be specified using the prediction motion vector index, the MVD, the reference direction, and the reference picture index.
The decoding apparatus 200 may generate a prediction block for the target block based on the derived motion vector and the reference picture index information. For example, the prediction block may be a reference block indicated by a derived motion vector in a reference picture indicated by reference picture index information.
Since the prediction motion vector index and the MVD are encoded and the motion vector of the target block is not itself encoded, the number of bits transmitted from the encoding apparatus 100 to the decoding apparatus 200 can be reduced and the encoding efficiency can be improved.
For the target block, motion information of the reconstructed neighboring block may be used. In the specific inter prediction mode, the encoding apparatus 100 may not separately encode actual motion information of the target block. The motion information of the target block is not encoded and may instead be encoded with additional information that enables the motion information of the target block to be derived using the reconstructed motion information of neighboring blocks. Since the additional information is encoded, the number of bits transmitted to the decoding apparatus 200 may be reduced, and encoding efficiency may be improved.
For example, there may be a skip mode and/or a merge mode that is an inter prediction mode that does not directly encode motion information of a target block. Here, each of the encoding apparatus 100 and the decoding apparatus 200 may use an identifier and/or index indicating one unit among the reconstructed neighboring units, wherein motion information of the one unit is to be used as motion information of the target unit.
2) Merge mode
There is a merging method as a scheme for deriving motion information of a target block. The term "merge" may refer to the merging of motions of multiple blocks. "merge" may mean that motion information of one block is also applied to other blocks.
When the merge mode is used, the encoding apparatus 100 may predict motion information of the target block using motion information of the spatial candidate and/or motion information of the temporal candidate. The encoding apparatus 100 may acquire a prediction block via prediction. The encoding apparatus 100 may encode a residual block, which is a difference between the target block and the prediction block.
2-1) generation of merge candidate list
When the merge mode is used, each of the encoding apparatus 100 and the decoding apparatus 200 may generate a merge candidate list using motion information of spatial candidates and/or motion information of temporal candidates. The motion information may include 1) a motion vector, 2) a reference picture index, and 3) a reference direction. The reference direction may be unidirectional or bidirectional.
The merge candidate list may include merge candidates. The merge candidate may be motion information. In other words, the merge candidate may be a plurality of pieces of motion information of temporal candidates and/or spatial candidates. Further, the merge candidate list may include new merge candidates generated by combining merge candidates already existing in the merge candidate list. Further, the merge candidate list may include motion information of a zero vector.
Each merge candidate may include 1) a motion vector, 2) a reference picture index, and 3) a reference direction.
The merge candidate list may be generated before prediction in the merge mode is performed.
The number of merge candidates in the merge candidate list may be predefined. Each of the encoding apparatus 100 and the decoding apparatus 200 may add the merge candidates to the merge candidate list according to a predetermined scheme and a predetermined priority such that the merge candidate list has a predetermined number of merge candidates. The merge candidate list of the encoding device 100 and the merge candidate list of the decoding device 200 may be made identical to each other using a predetermined scheme and a predetermined priority.
Merging may be applied on a CU basis or a PU basis. When merging is performed on a CU basis or a PU basis, the encoding apparatus 100 may transmit a bitstream including predefined information to the decoding apparatus 200. For example, the predefined information may include: 1) Information indicating whether to perform merging for each block partition, and 2) information about a block to be used for performing merging among a plurality of blocks that are spatial candidates and/or temporal candidates for a target block.
2-2) searching for motion vectors using merge candidate list
The encoding apparatus 100 may determine a merge candidate to be used for encoding the target block. For example, the encoding apparatus 100 may perform prediction on the target block using the merge candidates in the merge candidate list, and may generate a residual block for the merge candidates. The encoding apparatus 100 may encode the target block using a merge candidate that generates the minimum cost in prediction and encoding of the residual block.
Further, the encoding apparatus 100 may determine whether to encode the target block using the merge mode.
2-3) transmission of inter prediction information
The encoding apparatus 100 may generate a bitstream including inter prediction information required for inter prediction. The decoding apparatus 200 may perform inter prediction on the target block using inter prediction information of the bitstream.
The inter prediction information may include 1) mode information indicating whether to use a merge mode, and 2) a merge index.
Furthermore, the inter prediction information may include a residual signal.
The decoding apparatus 200 may acquire the merge index from the bitstream only when the mode information indicates that the merge mode is used.
The merge index may indicate a merge candidate to be used for predicting the target block among a plurality of merge candidates included in the merge candidate list.
2-4) inter prediction using a merge mode of inter prediction information
The decoding apparatus 200 may perform prediction on the target block using a merge candidate indicated by the merge index among a plurality of merge candidates included in the merge candidate list.
The motion vector of the target block may be specified by the motion vector of the merge candidate indicated by the merge index, the reference picture index, and the reference direction.
3) Skip mode
The skip mode may be a mode in which motion information of a spatial candidate or motion information of a temporal candidate is applied to a target block without change. In addition, the skip mode may be a mode in which a residual signal is not used. In other words, when the skip mode is used, the reconstructed block may be a prediction block.
The difference between the merge mode and the skip mode is whether to transmit or use a residual signal. That is, the skip mode may be similar to the merge mode except that the residual signal is not transmitted or used.
When the skip mode is used, the encoding apparatus 100 may transmit, to the decoding apparatus 200 through the bitstream, only information indicating which block, among the blocks that are spatial candidates or temporal candidates, has motion information that will be used as the motion information of the target block. In addition, when the skip mode is used, the encoding apparatus 100 may not transmit other syntax information (such as the MVD) to the decoding apparatus 200.
3-1) generation of merge candidate list
The skip mode may also use a merge candidate list. In other words, the merge candidate list may be used in the merge mode as well as in the skip mode. In this regard, the merge candidate list may also be referred to as a "skip candidate list" or a "merge/skip candidate list".
Alternatively, the skip mode may use an additional candidate list different from that of the merge mode. In this case, in the following description, the merge candidate list and the merge candidate may be replaced with a skip candidate list and a skip candidate, respectively.
The merge candidate list may be generated before prediction in the skip mode is performed.
3-2) searching for motion vectors using merge candidate list
The encoding apparatus 100 may determine a merge candidate to be used for encoding the target block. For example, the encoding apparatus 100 may perform prediction on the target block using the merge candidates in the merge candidate list. The encoding apparatus 100 may encode the target block using the merge candidate that generates the minimum cost in prediction.
Further, the encoding apparatus 100 may determine whether to encode the target block using the skip mode.
3-3) transmission of inter prediction information
The encoding apparatus 100 may generate a bitstream including inter prediction information required for inter prediction. The decoding apparatus 200 may perform inter prediction on the target block using inter prediction information of the bitstream.
The inter prediction information may include 1) mode information indicating whether to use the skip mode, and 2) a skip index.
The skip index may be the same as the merge index described above.
When the skip mode is used, the target block may be encoded without using the residual signal. The inter prediction information may not include a residual signal. Alternatively, the bitstream may not include a residual signal.
The decoding apparatus 200 may obtain the skip index from the bitstream only when the mode information indicates that the skip mode is used. As described above, the merge index and the skip index may be identical to each other. The decoding apparatus 200 may obtain the skip index from the bitstream only when the mode information indicates that the merge mode or the skip mode is used.
The skip index may indicate a merge candidate to be used for predicting the target block among a plurality of merge candidates included in the merge candidate list.
3-4) inter prediction in skip mode using inter prediction information
The decoding apparatus 200 may perform prediction on the target block using a merge candidate indicated by the skip index among a plurality of merge candidates included in the merge candidate list.
The motion vector of the target block may be specified by the motion vector of the merge candidate indicated by the skip index, the reference picture index, and the reference direction.
In the AMVP mode, the merge mode, and the skip mode described above, motion information to be used for predicting a target block may be specified among pieces of motion information in a list using an index of the list.
In order to improve the encoding efficiency, the encoding apparatus 100 may signal only an index of an element that generates the minimum cost when inter-predicting a target block among a plurality of elements in a list. The encoding apparatus 100 may encode the index and may signal the encoded index.
Therefore, the above-described list (i.e., the predicted motion vector candidate list and the merge candidate list) must be able to be derived by the encoding apparatus 100 and the decoding apparatus 200 using the same scheme based on the same data. Here, the same data may include a reconstructed picture and a reconstructed block. Further, in order to specify elements using indexes, it is necessary to fix the order of elements in the list.
Fig. 9 illustrates spatial candidates according to an embodiment.
In fig. 9, the positions of the spatial candidates are shown.
The large block at the center of the figure may represent the target block. The five small blocks may represent the spatial candidates.
The coordinates of the target block may be (xP, yP), and the size of the target block may be expressed in terms of (nPSW, nPSH).
Spatial candidate A0 may be a block adjacent to the lower left corner of the target block. A0 may be a block that occupies the pixel located at coordinates (xP-1, yP+nPSH+1).
Spatial candidate A1 may be a block adjacent to the left side of the target block. A1 may be the lowest block among the blocks adjacent to the left side of the target block. Alternatively, A1 may be a block adjacent to the top of A0. A1 may be a block that occupies the pixel located at coordinates (xP-1, yP+nPSH).
Spatial candidate B0 may be a block adjacent to the upper right corner of the target block. B0 may be a block that occupies the pixel located at coordinates (xP+nPSW+1, yP-1).
Spatial candidate B1 may be a block adjacent to the top of the target block. B1 may be the rightmost block among the blocks adjacent to the top of the target block. Alternatively, B1 may be a block adjacent to the left side of B0. B1 may be a block that occupies the pixel located at coordinates (xP+nPSW, yP-1).
Spatial candidate B2 may be a block adjacent to the upper left corner of the target block. B2 may be a block that occupies the pixel located at coordinates (xP-1, yP-1).
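The five candidate positions may be collected in one helper, following the coordinates given above:

```python
def spatial_candidate_positions(xP, yP, nPSW, nPSH):
    """Pixel coordinates occupied by the five spatial candidates of
    fig. 9, for a target block at (xP, yP) of size (nPSW, nPSH)."""
    return {
        "A0": (xP - 1, yP + nPSH + 1),  # adjacent to the lower left corner
        "A1": (xP - 1, yP + nPSH),      # lowest block on the left side
        "B0": (xP + nPSW + 1, yP - 1),  # adjacent to the upper right corner
        "B1": (xP + nPSW, yP - 1),      # rightmost block along the top
        "B2": (xP - 1, yP - 1),         # adjacent to the upper left corner
    }
```

For an 8x8 target block at (16, 16), A0 occupies (15, 25) and B2 occupies (15, 15).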
Determination of availability of spatial and temporal candidates
In order to include motion information of a spatial candidate or motion information of a temporal candidate in a list, it is necessary to determine whether the motion information of the spatial candidate or the motion information of the temporal candidate is available.
Hereinafter, the candidate block may include a spatial candidate and a temporal candidate.
The determination may be performed, for example, by sequentially applying the following steps 1) to 4).
Step 1) when a PU including a candidate block is outside the boundary of a picture, the availability of the candidate block may be set to "false". The "availability set to false" may have the same meaning as "set to unavailable".
Step 2) when the PU that includes the candidate block is outside the boundary of a slice, the availability of the candidate block may be set to "false". When the target block and the candidate block are located in different slices, the availability of the candidate block may be set to "false".
Step 3) when the PU that includes the candidate block is outside the boundary of a tile, the availability of the candidate block may be set to "false". When the target block and the candidate block are located in different tiles, the availability of the candidate block may be set to "false".
Step 4) when the prediction mode of the PU including the candidate block is an intra prediction mode, the availability of the candidate block may be set to "false". When the PU including the candidate block does not use inter prediction, the availability of the candidate block may be set to "false".
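The four steps may be sketched as sequential predicates. The dict keys used here (`inside_picture`, `slice`, `tile`, `pred_mode`) are hypothetical names chosen for illustration:

```python
def is_candidate_available(cand, target):
    """Apply steps 1) to 4) to a candidate block. `cand` and `target`
    are dicts describing the PU that contains each block."""
    if cand is None or not cand["inside_picture"]:  # step 1: picture boundary
        return False
    if cand["slice"] != target["slice"]:            # step 2: different slices
        return False
    if cand["tile"] != target["tile"]:              # step 3: different tiles
        return False
    if cand["pred_mode"] == "intra":                # step 4: no inter motion info
        return False
    return True
```

A candidate in the same slice and tile that uses inter prediction is available; one coded in an intra prediction mode is not.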
Fig. 10 illustrates a sequence of adding motion information of a spatial candidate to a merge list according to an embodiment.
As shown in fig. 10, when the pieces of motion information of the spatial candidates are added to the merge list, the order of A1, B1, B0, A0, and B2 may be used. That is, the pieces of motion information of available spatial candidates are added to the merge list in the order of A1, B1, B0, A0, and B2.
Method for deriving a merge list in merge mode and skip mode
As described above, the maximum number of merge candidates in the merge list may be set. The set maximum number is denoted by "N". The set number may be transmitted from the encoding apparatus 100 to the decoding apparatus 200. The slice header may include N. In other words, the maximum number of merge candidates in the merge list for the target block of a slice may be set by the slice header. For example, the value of N may be 5 by default.
The pieces of motion information (i.e., the plurality of merge candidates) may be added to the merge list in the order of steps 1) to 4) below.
Step 1) among the plurality of spatial candidates, the available spatial candidates may be added to the merge list. Pieces of motion information of available spatial candidates may be added to the merge list in the order shown in fig. 10. Here, when the motion information of the available spatial candidates is repeated with other motion information already existing in the merge list, the motion information may not be added to the merge list. The operation for checking whether the corresponding motion information is repeated with other motion information present in the list may be simply referred to as "repetition check".
The maximum number of pieces of motion information added may be N.
Step 2) when the number of pieces of motion information in the merge list is less than N and a time candidate is available, the motion information of the time candidate may be added to the merge list. Here, when the motion information of the available time candidate is repeated with other motion information already existing in the merge list, the motion information may not be added to the merge list.
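Steps 1) and 2) may be sketched as follows, with unavailable candidates represented by `None` and the repetition check performed before each insertion:

```python
def build_merge_list(spatial_candidates, temporal_candidate, n):
    """Add available spatial candidates (already ordered A1, B1, B0,
    A0, B2), then the temporal candidate, skipping repeated motion
    information, until the list holds at most N entries."""
    merge_list = []
    for info in spatial_candidates:
        if info is not None and info not in merge_list and len(merge_list) < n:
            merge_list.append(info)
    if (temporal_candidate is not None
            and temporal_candidate not in merge_list
            and len(merge_list) < n):
        merge_list.append(temporal_candidate)
    return merge_list
```

For spatial candidates [(1, 0), (1, 0), None, (2, 0)] and temporal candidate (3, 0) with N = 5, the repeated (1, 0) and the unavailable candidate are skipped, giving the list [(1, 0), (2, 0), (3, 0)].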
Step 3) when the number of pieces of motion information in the merge list is less than N and the type of the target slice is "B", combined motion information generated by combined bi-prediction may be added to the merge list.
The target slice may be the slice that includes the target block.
The combined motion information may be a combination of L0 motion information and L1 motion information. The L0 motion information may be motion information referring to only the reference picture list L0, and the L1 motion information may be motion information referring to only the reference picture list L1.
In the merge list, there may be one or more pieces of L0 motion information. Further, in the merge list, there may be one or more pieces of L1 motion information.
The combined motion information may include one or more pieces of combined motion information. When the combined motion information is generated, L0 motion information and L1 motion information to be used to generate the combined motion information among the one or more pieces of L0 motion information and the one or more pieces of L1 motion information may be predefined. One or more pieces of combined motion information may be generated in a predefined order via combined bi-prediction, wherein the combined bi-prediction uses a pair of different motion information in a merged list. One of the pair of different motion information may be L0 motion information and the other of the pair of different motion information may be L1 motion information.
For example, the combined motion information added with the highest priority may be a combination of the L0 motion information with a merge index of 0 and the L1 motion information with a merge index of 1. When the motion information at merge index 0 is not L0 motion information, or when the motion information at merge index 1 is not L1 motion information, the combined motion information is neither generated nor added. Next, the combined motion information added with the next priority may be a combination of the L0 motion information with a merge index of 1 and the L1 motion information with a merge index of 0. Subsequent combinations may follow other predefined orders used in the field of video encoding/decoding.
Here, when the combined motion information is repeated with other motion information already existing in the merge list, the combined motion information may not be added to the merge list.
Step 4) when the number of pieces of motion information in the merge list is less than N, motion information of a zero vector may be added to the merge list.
The zero vector motion information may be motion information with a motion vector of zero vector.
The number of pieces of zero vector motion information may be one or more pieces. The reference picture indexes of one or more pieces of zero vector motion information may be different from each other. For example, the value of the reference picture index of the first zero vector motion information may be 0. The value of the reference picture index of the second zero vector motion information may be 1.
The number of zero vector motion information pieces may be the same as the number of reference pictures in the reference picture list.
The reference direction of the zero vector motion information may be bi-directional. Both motion vectors may be zero vectors. The number of pieces of zero vector motion information may be the smaller of the number of reference pictures in reference picture list L0 and the number of reference pictures in reference picture list L1. Alternatively, when the numbers of reference pictures in reference picture lists L0 and L1 differ from each other, a uni-directional reference direction may be used for a reference picture index that is applicable to only a single reference picture list.
The encoding apparatus 100 and/or the decoding apparatus 200 may sequentially add zero vector motion information to the merge list while changing the reference picture index.
When the zero vector motion information is repeated with other motion information already present in the merge list, the zero vector motion information may not be added to the merge list.
The order of steps 1) to 4) described above is merely exemplary and may be changed. Furthermore, some of the above steps may be omitted depending on predefined conditions.
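The four steps above can be sketched in Python as follows. This is a minimal illustration, not the normative derivation: the data shapes, the combined-bi-prediction pair order beyond the first two pairs, and the zero-vector padding rule are assumptions made for the sketch.

```python
def build_merge_list(spatial, temporal, slice_type, num_ref_pics, n_max=5):
    """Each entry is (l0_info, l1_info); each info is ((mvx, mvy), ref_idx) or None."""
    merge_list = []

    def try_add(entry):
        # Repetition check: an entry already present in the list is not added again.
        if entry is not None and entry not in merge_list and len(merge_list) < n_max:
            merge_list.append(entry)

    # Step 1) available spatial candidates, in their predefined order.
    for entry in spatial:
        try_add(entry)

    # Step 2) the temporal candidate, when the list is not yet full.
    try_add(temporal)

    # Step 3) combined bi-prediction for B slices: the L0 motion of one entry
    # is paired with the L1 motion of another, in a predefined index order.
    if slice_type == "B":
        for i0, i1 in [(0, 1), (1, 0), (0, 2), (2, 0)]:  # illustrative order
            if len(merge_list) >= n_max or max(i0, i1) >= len(merge_list):
                break
            l0, l1 = merge_list[i0][0], merge_list[i1][1]
            if l0 is not None and l1 is not None:
                try_add((l0, l1))

    # Step 4) zero-vector entries with increasing reference picture index.
    for ref_idx in range(num_ref_pics):
        try_add((((0, 0), ref_idx), None))
        if len(merge_list) >= n_max:
            break

    return merge_list
```

Note how the repetition check in `try_add` implements the rule, stated in each step above, that motion information duplicating an existing list entry is not added.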
Method for deriving predicted motion vector candidates in AMVP mode
The maximum number of predicted motion vector candidates in the predicted motion vector candidate list may be predefined. The predefined maximum number may be denoted by N. For example, the predefined maximum number may be 2.
A plurality of pieces of motion information (i.e., a plurality of predicted motion vector candidates) may be added to the predicted motion vector candidate list in the order of the following steps 1) to 3).
Step 1) may add an available spatial candidate among the plurality of spatial candidates to the predicted motion vector candidate list. The plurality of spatial candidates may include a first spatial candidate and a second spatial candidate.
The first spatial candidate may be one of A0, A1, scaled A0, and scaled A1. The second spatial candidate may be one of B0, B1, B2, scaled B0, scaled B1, and scaled B2.
The pieces of motion information of the available spatial candidates may be added to the predicted motion vector candidate list in the order of the first spatial candidate and the second spatial candidate. In this case, when the motion information of the available spatial candidates is repeated with other motion information already existing in the predicted motion vector candidate list, the motion information may not be added to the predicted motion vector candidate list. In other words, when the N value is 2, if the motion information of the second spatial candidate is identical to the motion information of the first spatial candidate, the motion information of the second spatial candidate may not be added to the predicted motion vector candidate list.
The maximum number of motion information added may be N.
Step 2) when the number of pieces of motion information in the predicted motion vector candidate list is less than N and the temporal candidate is available, the motion information of the temporal candidate may be added to the predicted motion vector candidate list. In this case, when the motion information of the available temporal candidate is repeated with other motion information already existing in the predicted motion vector candidate list, the motion information may not be added to the predicted motion vector candidate list.
Step 3) when the number of pieces of motion information in the predicted motion vector candidate list is less than N, zero motion information may be added to the predicted motion vector candidate list.
The zero-motion information may include one or more pieces of zero-motion information. The reference picture indexes of the one or more pieces of zero motion information may be different from each other.
The encoding apparatus 100 and/or the decoding apparatus 200 may sequentially add a plurality of pieces of zero motion information to the predicted motion vector candidate list while changing the reference picture index.
When the zero motion information is repeated with other motion information already existing in the predicted motion vector candidate list, the zero motion information may not be added to the predicted motion vector candidate list.
The description of zero vector motion information described above with respect to the merge list may also be applied to zero motion information. A repetitive description thereof will be omitted.
The order of steps 1) to 3) described above is merely exemplary and may be changed. Furthermore, some steps may be omitted depending on predefined conditions.
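The three steps above can be sketched as follows, with N = 2 as in the text. The function and variable names are illustrative, and reference picture indices for the zero entries are omitted for brevity.

```python
def build_pmv_candidate_list(first_spatial, second_spatial, temporal, n_max=2):
    """Each candidate is a motion vector (mvx, mvy), or None if unavailable."""
    candidates = []

    def try_add(mv):
        # Repetition check against candidates already in the list.
        if mv is not None and mv not in candidates and len(candidates) < n_max:
            candidates.append(mv)

    # Step 1) the spatial candidates, first candidate before second.
    try_add(first_spatial)
    try_add(second_spatial)

    # Step 2) the temporal candidate, when the list is still short.
    try_add(temporal)

    # Step 3) pad with zero motion vectors (the reference picture indices,
    # which would differ between zero entries, are omitted in this sketch).
    while len(candidates) < n_max:
        candidates.append((0, 0))

    return candidates
```

With identical first and second spatial candidates, the second one is dropped by the repetition check and a zero vector fills the remaining slot, matching the N = 2 example in the text.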
Fig. 11 illustrates partitioning a picture using parallel blocks according to an embodiment.
In fig. 11, the picture is indicated by a solid line, and the parallel block is indicated by a broken line. A picture may be partitioned into multiple parallel blocks.
Each parallel block may be one of the entities that serves as a partition unit of the picture. The parallel block may be a partition unit of a picture. Alternatively, the parallel block may be a unit of picture partition coding.
Information about the parallel blocks may be signaled by a Picture Parameter Set (PPS). The PPS may contain information about parallel blocks of a picture or information required to partition a picture into multiple parallel blocks.
Table 1 below shows an example of the structure of pic_parameter_set_rbsp. The picture partition information may be pic_parameter_set_rbsp or may include pic_parameter_set_rbsp.
TABLE 1
"pic_parameter_set_rbsp" may include the following elements.
-tiles_enabled_flag: the "tiles_enabled_flag" may be a parallel block presence indication flag indicating whether one or more parallel blocks exist in a picture referencing PPS.
For example, a tiles_enabled_flag value of "0" may indicate that there are no parallel blocks in the picture referencing the PPS. A tiles_enabled_flag value of "1" may indicate that there are one or more parallel blocks in the picture referencing the PPS.
The values of the parallel block presence indication flag tiles_enabled_flag of all activated PPS in a single Coded Video Sequence (CVS) may be identical to each other.
- num_tile_columns_minus1: "num_tile_columns_minus1" may be information on the number of parallel-block columns, corresponding to the number of parallel blocks arranged in the horizontal direction of the partitioned picture. For example, the value of "num_tile_columns_minus1 + 1" may represent the number of horizontally arranged parallel blocks in the partitioned picture. In other words, the value of "num_tile_columns_minus1 + 1" may represent the number of parallel blocks in a row.
- num_tile_rows_minus1: "num_tile_rows_minus1" may be information on the number of parallel-block rows, corresponding to the number of parallel blocks arranged in the vertical direction of the partitioned picture. For example, the value of "num_tile_rows_minus1 + 1" may represent the number of vertically arranged parallel blocks in the partitioned picture. In other words, the value of "num_tile_rows_minus1 + 1" may represent the number of parallel blocks in a column.
- uniform_spacing_flag: "uniform_spacing_flag" may be an equal-division indication flag indicating whether the picture is equally divided into parallel blocks in the horizontal and vertical directions. In other words, the uniform_spacing_flag may indicate whether the parallel blocks in the picture all have the same size. For example, a uniform_spacing_flag value of "0" may indicate that the picture is not equally partitioned in the horizontal and/or vertical direction. A uniform_spacing_flag value of "1" may indicate that the picture is equally partitioned in both the horizontal and vertical directions. When the uniform_spacing_flag value is "0", elements that define the partition in more detail, such as column_width_minus1[i] and row_height_minus1[i] described below, may additionally be required in order to partition the picture.
- column_width_minus1[i]: "column_width_minus1[i]" may be parallel-block width information corresponding to the width of the parallel blocks in the i-th column. Here, i may be an integer equal to or greater than 0 and less than the number of parallel-block columns. For example, "column_width_minus1[i] + 1" may represent the width of the parallel blocks in column i+1. The width may be expressed in a predetermined unit; for example, the unit of width may be the Coding Tree Block (CTB).
- row_height_minus1[i]: "row_height_minus1[i]" may be parallel-block height information corresponding to the height of the parallel blocks in the i-th row. Here, i may be an integer equal to or greater than 0 and less than the number of parallel-block rows. For example, "row_height_minus1[i] + 1" may represent the height of the parallel blocks in row i+1. The height may be expressed in a predetermined unit; for example, the unit of height may be the Coding Tree Block (CTB).
In an example, the picture partition information may be included in the PPS and may be transmitted as part of the PPS when the PPS is transmitted. The decoding apparatus may obtain picture partition information required for partitioning a picture by referring to PPS of the picture.
In order to signal picture partition information different from information that has been previously transmitted, the encoding apparatus may transmit a new PPS to the decoding apparatus, wherein the new PPS includes the new picture partition information and the new PPS ID. Subsequently, the encoding device may send the slice header containing the PPS ID to the decoding device.
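A decoder's use of these PPS elements can be sketched by deriving the parallel-block column widths, in CTB units, from the signaled values. The uniform-split formula below follows the HEVC-style even distribution; the function name and argument shapes are assumptions of this sketch.

```python
def tile_column_widths(pic_width_in_ctbs, num_tile_columns_minus1,
                       uniform_spacing_flag, column_width_minus1=None):
    """Derive the width (in CTBs) of each parallel-block column."""
    n = num_tile_columns_minus1 + 1
    if uniform_spacing_flag:
        # Equal division: distribute pic_width_in_ctbs as evenly as possible,
        # so consecutive columns differ in width by at most one CTB.
        return [(i + 1) * pic_width_in_ctbs // n - i * pic_width_in_ctbs // n
                for i in range(n)]
    # Explicit widths are signaled for all but the last column; the last
    # column takes the remainder of the picture width.
    widths = [w + 1 for w in column_width_minus1]
    widths.append(pic_width_in_ctbs - sum(widths))
    return widths
```

Row heights would be derived the same way from num_tile_rows_minus1 and row_height_minus1[i].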
Fig. 12 illustrates partitioning a picture using stripes according to an embodiment.
In fig. 12, the picture is indicated by a solid line, the band is indicated by a thick dotted line, and the Coding Tree Unit (CTU) is indicated by a thin dotted line. As shown in the drawings, a picture may be partitioned into a plurality of slices. A stripe may consist of one or more consecutive CTUs.
A stripe may be one of the entities used as a partition unit of a picture. The stripe may be a partition unit of a picture. Alternatively, the slice may be a unit of picture partition coding.
Information about the stripe may be signaled by the stripe segment header. The stripe segment header may contain information about the stripe.
When a slice is a unit of picture partition coding, the picture partition information may define a start address of each of one or more slices.
The unit of the start address of each stripe may be a CTU. The picture partition information may define a starting CTU address for each of the one or more slices. The partition shape of a picture may be defined by the starting address of a stripe.
Table 2 below shows an example of the structure of the slice_segment_header. The picture partition information may be or may include slice_segment_header.
TABLE 2
The "slice_segment_header" may include the following elements.
- first_slice_segment_in_pic_flag: "first_slice_segment_in_pic_flag" may be a first-slice indication flag indicating whether the slice indicated by the slice_segment_header is the first slice in the picture.
For example, a first_slice_segment_in_pic_flag value of "0" may indicate that the corresponding slice is not the first slice in the picture. A first_slice_segment_in_pic_flag value of "1" may indicate that the corresponding slice is the first slice in the picture.
- dependent_slice_segment_flag: "dependent_slice_segment_flag" may be a dependent-slice indication flag indicating whether the slice indicated by the slice_segment_header is a dependent slice.
For example, a dependent_slice_segment_flag value of "0" may indicate that the corresponding slice is not a dependent slice. A dependent_slice_segment_flag value of "1" may indicate that the corresponding slice is a dependent slice.
For example, the substream slices used for Wavefront Parallel Processing (WPP) may be dependent slices. Each dependent slice may have a corresponding independent slice. When the slice indicated by the slice_segment_header is a dependent slice, at least one element of the slice_segment_header may be absent. In other words, the values of such elements are not defined in the slice_segment_header. For an element whose value is not defined in the dependent slice, the value of the corresponding element of the independent slice may be used. In other words, the value of a specific element that is absent from the slice_segment_header of a dependent slice may be equal to the value of that element in the slice_segment_header of the independent slice corresponding to the dependent slice. That is, a dependent slice may inherit the element values of its corresponding independent slice and may redefine at least some of those values.
-slice_segment_address: the "slice_segment_address" may be start address information indicating a start address of a slice indicated by the slice_segment_header. The unit of the start address information may be CTB.
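The inheritance rule for dependent slices described above can be sketched as a simple merge: every element is inherited from the corresponding independent slice, and elements actually signaled in the dependent slice's header override the inherited values. The element names used in the test ("qp_delta", "ref_idx") are hypothetical placeholders, not actual syntax elements.

```python
def resolve_slice_header(independent_header, dependent_header_overrides):
    """Resolve a dependent slice's header against its independent slice."""
    resolved = dict(independent_header)          # inherit every element
    resolved.update(dependent_header_overrides)  # redefine the signaled ones
    return resolved
```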
Methods for partitioning a picture into one or more slices may include methods 1) through 3) below.
Method 1): the first method may be a method for partitioning a picture by a maximum size of a bitstream that one slice can include.
Method 2): the second method may be a method for partitioning a picture by the maximum number of CTUs that one slice can include.
Method 3): the third method may be a method for partitioning a picture by the maximum number of parallel blocks that one stripe can include.
When the encoding apparatus intends to perform parallel encoding on a stripe basis, a second method and a third method among the three methods may be generally used.
In the case of the first method, the size of the bitstream is known only after encoding has been completed, and thus it may be difficult to define the slices to be processed in parallel before encoding starts. Accordingly, the picture partition methods suitable for slice-based parallel encoding are the second method, which uses a maximum number of CTUs, and the third method, which uses a maximum number of parallel blocks.
When the second method and the third method are used, the partition size of the picture may be predefined before the picture is encoded in parallel. Further, from the defined size, a slice_segment_address may be calculated. When the encoding apparatus uses slices as units of parallel encoding, there is generally a tendency that the slice_segment_address is not changed for each picture but is repeated at a fixed period and/or according to a specific rule.
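Under the second method, the slice_segment_address values can be computed before encoding starts, which is what makes the fixed, repeating addresses described above possible. The sketch below assumes raster-scan CTU addressing; the helper name is illustrative.

```python
def slice_start_addresses(pic_size_in_ctus, max_ctus_per_slice):
    """slice_segment_address of each slice when a picture is split by a
    maximum CTU count per slice (the second method)."""
    # One slice begins every max_ctus_per_slice CTUs in raster-scan order;
    # the last slice simply takes whatever CTUs remain.
    return list(range(0, pic_size_in_ctus, max_ctus_per_slice))
```

Because the addresses depend only on the picture size and the CTU budget, they stay identical from picture to picture, matching the tendency noted above.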
Fig. 13 illustrates distributively encoding a temporally and spatially partitioned picture according to an embodiment.
In fig. 13, a configuration in which each picture is partitioned into four slices is shown. Each of the four pictures is partitioned into four slices, and each picture may include slice 0, slice 1, slice 2, and slice 3.
In other words, the video may be partitioned temporally and spatially. Each picture of the video may be partitioned into a particular number of slices.
The slices of each picture may be processed by the encoding node.
The co-located slices of the multiple pictures may be grouped in units of an intra period. The multiple slices of the multiple pictures may be encoded in parallel by multiple encoding nodes distributed across the network.
For example, as shown in fig. 13, each slice 0 of the plurality of pictures may be processed by encoding node 0, each slice 1 of the plurality of pictures may be processed by encoding node 1, each slice 2 of the plurality of pictures may be processed by encoding node 2, and each slice 3 of the plurality of pictures may be processed by encoding node 3.
In parallel encoding, inter-picture reference between blocks belonging to different slices is not allowed; thus, communication between nodes can be reduced and the efficiency of parallel encoding can be improved.
Fig. 14 illustrates the processing of a motion-constrained parallel block set (MCTS) according to an embodiment.
An MCTS may be a set of one or more parallel blocks, where the set limits the scope of inter prediction to a particular region in a picture.
For example, when a region of interest (ROI) in a picture is set as an MCTS, a region of the picture outside the boundary of the MCTS may not be used for inter prediction.
In fig. 14, inter prediction for picture 2 is shown as using only the region of the MCTS in picture 1. Further, inter prediction for picture 3 is shown as using only the region of the MCTS in picture 1 and the region of the MCTS in picture 2.
Fig. 15 illustrates PUs adjacent to the boundary of a slice, according to an embodiment.
In fig. 15, a target picture is shown. The target picture is partitioned into two slices; within the target picture, there is a slice boundary between the two slices.
Further, the target PU, which is the target block, is adjacent to both the slice boundary and the picture boundary.
When the above-described scheme for adding motion information of a spatial candidate to a list is used to generate the list, it may often happen that the motion information in the list cannot be used as the motion information of the target PU. This will be described in detail below with reference to fig. 16.
Fig. 16 shows a merge list according to an embodiment.
The merge list of FIG. 16 may be a merge list generated for the target PU of FIG. 15.
The merge list of FIG. 16 may be generated using the merge list generation method described above.
The maximum number of motion information in the merge list may be 5.
Each row in the merge list may indicate motion information. For example, the first row 1610 may indicate motion information with a merge index value of 0.
The first column in the merge list may indicate the merge index. The second and third columns may indicate a reference picture list for motion information. For motion information using the reference picture list L0, a motion vector and a reference picture index may be described in the second column. For motion information using the reference picture list L1, a motion vector and a reference picture index may be described in the third column. For pieces of motion information using the reference picture list L0 and the reference picture list L1, respective motion vectors and respective reference picture indexes may be described in the second and third columns, respectively.
The expression "(X, Y), Z" may indicate a motion vector (X, Y) and a reference picture index Z.
For example, "(-1, -2), 0" and "-" in the first row 1610 may indicate that the first motion information corresponds to the motion vector (-1, -2), reference picture list L0, and reference picture index 0, and that reference picture list L1 is not used. The motion information in the first row 1610 may indicate the reference picture with index 0 among the reference pictures in reference picture list L0, and a motion vector shifted one column to the left and two rows upward.
Further, the fourth row 1640 may represent bi-prediction motion information that uses both reference picture list L0 and reference picture list L1.
Since the target PU of fig. 15 is located at the lower right portion of the slice, if one of the x value and the y value in the motion vector of the motion information is equal to or greater than 1, the position indicated by the motion vector applied to the target PU may be outside the boundary of the picture or outside the boundary of the slice. Thus, such motion information cannot be used for the target PU unless another control method (such as control based on the use of MVD) is used.
For example, the motion information in the second row 1620 may come from the spatial candidate B1. For B1 itself, the motion vector (-1, 1) may be a valid motion vector that does not point outside the slice boundary or the picture boundary. For the target PU, however, the motion vector (-1, 1) points outside the slice boundary. That is, the motion vector (-1, 1) is a motion vector that cannot be used for the target PU, and the motion information in the second row is therefore unusable.
For example, the motion vector (1, 0) in the third row 1630 may come from the spatial candidate B2. However, for the target PU, the motion vector (1, 0) points outside the picture boundary.
For example, the motion vector (1, 1) in the fourth row 1640 may come from a temporal candidate. However, for the target PU, the motion vector (1, 1) points outside the picture boundary.
For example, the motion information in the fifth row 1650 may be combined motion information, generated via combined bi-prediction of the motion information in the first row 1610 and the motion information in the second row 1620. However, since the motion information in the second row 1620 cannot be used for the target PU, the combined motion information in the fifth row 1650 cannot be generated either.
As described above, in some cases, a large amount of motion information in the merge list may not be used for the target block. In addition, such unavailable motion information may prevent other secondary motion information from being added to the merge list.
In this case, the encoding apparatus 100 cannot use motion information that causes the determined position to be outside the slice boundary or the picture boundary among the pieces of motion information in the merge list. In certain cases, virtually all of the motion information in the merge list may not be used.
When the encoding apparatus 100 selects the optimal inter prediction mode for the target block, encoding efficiency may deteriorate because the use of at least some of the pieces of motion information in the merge list is limited. Furthermore, certain motion information may incur overhead, such as an MVD.
In the following embodiments, a motion prediction boundary checking method for improving coding efficiency while limiting the range of inter prediction is proposed.
When it is desired to add motion information of a candidate block to a list or determine availability of the candidate block, processing for motion prediction boundary checking may be performed.
The motion prediction boundary check may be configured to check whether a position determined using motion information of the candidate block is outside a region or boundary. In other words, the motion prediction boundary check may be configured to check whether a position referenced by the target block based on the motion vector of the motion information exists within the corresponding region. In other words, in inter prediction, the position referenced by the target block may be limited to the inside of the region. Motion information that has passed the motion prediction boundary check may be used for motion prediction of the target block.
The term "determined position" may refer to the position indicated by the motion vector of the motion information when that motion vector is applied to the target block. Here, the position indicated by the motion vector may be the position of the target block plus the motion vector.
Based on the motion prediction boundary check, motion information of the candidate block may be added to the list as a motion information candidate for the target block only when the determined position exists within the region (or when the determined position is not outside the boundary).
The region may be a region of a stripe including a target block, a region of a parallel block including a target block, or a region of an MCTS including a target block. In other words, the region may be a unit including the target block among a plurality of partition units of the picture.
The boundary may comprise a boundary of a picture. Further, the boundaries may include boundaries between stripes, boundaries between parallel blocks, or boundaries between MCTSs. In other words, the boundary may represent 1) a boundary of a picture and 2) a boundary between a unit including a target block and another unit among a plurality of partition units of the picture.
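The motion prediction boundary check defined above can be sketched as follows, under the assumption that a region (slice, parallel block, or MCTS containing the target block) is an axis-aligned rectangle and that the entire block referenced via the motion vector must lie inside it. All names are illustrative; this is not any codec's actual API.

```python
def reference_position(block_pos, mv):
    """The determined position: the target block's position plus the motion vector."""
    return (block_pos[0] + mv[0], block_pos[1] + mv[1])

def passes_boundary_check(block_pos, block_size, mv, region):
    """region = (left, top, right, bottom), inclusive bounds.
    True when the referenced block lies entirely inside the region."""
    x, y = reference_position(block_pos, mv)
    w, h = block_size
    left, top, right, bottom = region
    return (left <= x and top <= y and
            x + w - 1 <= right and y + h - 1 <= bottom)
```

The tests below mirror the Fig. 15/16 situation with illustrative coordinates: an 8x8 PU in the lower-right corner of the lower slice of a 64x64 picture, where (-1, -2) stays inside the slice but (1, 0) and (-1, 1) cross the picture and slice boundaries, respectively.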
Fig. 17 is a flowchart of an inter prediction method according to an embodiment.
In step 1710, the inter prediction unit 250 may check whether inter prediction is used for prediction of the target block.
For example, when the prediction information of the bitstream indicates inter prediction, the inter prediction unit 250 may check whether inter prediction is used for the target block.
At step 1720, the inter prediction unit 250 may obtain inter prediction information from the bitstream.
The inter prediction information may include mode information. The mode information may indicate which of 1) AMVP mode, 2) merge mode, and 3) skip mode is used for inter prediction of the target block.
The pattern information may include a plurality of pieces of pattern information. For example, the inter prediction information may include skip mode information. The skip mode information may indicate that the skip mode is used for inter prediction of the target block.
The inter prediction information may be different according to the mode information.
At step 1730, inter prediction unit 250 may generate a list.
The list may be a predicted motion vector candidate list or a merged list.
The list may be a list corresponding to a mode indicated by the inter prediction information. For example, when the inter prediction information indicates that AMVP mode is used, the generated list may be a predicted motion vector candidate list. When the inter prediction information indicates that a merge mode or a skip mode is used, the generated list may be a merge list.
The generation of the list will be described in more detail later with reference to fig. 18 and 19.
In step 1740, the inter prediction unit 250 may generate motion information of the target block based on the list and the inter prediction information.
In step 1750, the inter prediction unit 250 may perform inter prediction on the target block based on the motion information of the target block.
At least some of the steps 1710, 1720, 1730, 1740 and 1750 may also be performed by the inter prediction unit 110 of the encoding device 100. For example, the step 1730 of generating the list may also be performed by the encoding apparatus 100 in the same manner. In the following description related to the steps, the inter prediction unit 250 may be replaced by the inter prediction unit 110.
Steps 1710, 1720, 1730, 1740, and 1750 may be combined with the operation of other components of the encoding device 100 described above with reference to fig. 1. Further, steps 1710, 1720, 1730, 1740, and 1750 may be combined with the operation of other components of decoding device 200 described above with reference to fig. 2.
Fig. 18 is a flowchart of a method for generating a merge list for inter prediction of a target block, according to an embodiment.
Step 1730 described above with reference to fig. 17 may include steps 1810, 1820, 1830, 1840, 1850, 1860, 1870 and 1880, which will be described below.
In the present embodiment, the inter prediction mode for the target block may be a merge mode or a skip mode. The list may be a merged list. The motion information of the candidate block may correspond to a merge candidate.
At step 1810, the inter prediction unit 230 may determine whether motion information of a spatial candidate is to be added to the list.
If it is determined that motion information for a spatial candidate is to be added to the list, step 1820 may be performed.
If it is determined that motion information for a spatial candidate is not to be added to the list, step 1830 may be performed.
In step 1820, if it is determined that motion information of a spatial candidate is to be added to the list, the inter prediction unit 230 may add motion information of a spatial candidate to the list.
At steps 1810 and 1820, motion information for the spatial candidate may be added to the list.
In an embodiment, the inter prediction unit 230 may determine whether motion information of a spatial candidate is to be added to the list based on information about a target block and the motion information of the spatial candidate in steps 1810 and 1820.
In an embodiment, the information about the target block may be a location of the target block. The inter prediction unit 230 may determine whether motion information of a spatial candidate is to be added to the list based on the position of the target block and the motion vector of the spatial candidate.
In an embodiment, inter prediction unit 230 may determine whether motion information for a spatial candidate is to be added to the list based on the target block and a motion prediction boundary check for the spatial candidate.
In an embodiment, the inter prediction unit 230 may determine whether motion information of a spatial candidate is to be added to the list based on a position indicated by a motion vector applied to the target block. Here, the motion vector to be applied may be a motion vector of motion information of a spatial candidate.
Here, the position indicated by the motion vector may be a position determined by adding the position of the target block to the motion vector.
Further, the position indicated by the motion vector applied to the target block may be a reference position of the target block. Hereinafter, the position indicated by the motion vector applied to the target block will be simply referred to as "reference position of the target block". The reference location may indicate a reference block of the target block.
The position indicated by the motion vector or the reference position may be a position in the reference picture that is referenced by the target block.
In an embodiment, if the reference position of the target block exists within the region, the inter prediction unit 230 may add motion information of the spatial candidate to the list. If the reference position of the target block is located outside the region, the inter prediction unit 230 may not add the motion information of the spatial candidate to the list.
The region may be the region of the slice that includes the target block, the region of the tile that includes the target block, or the region of the MCTS that includes the target block.
In an embodiment, if the reference position of the target block is not outside the boundary, the inter prediction unit 230 may add the motion information of the spatial candidate to the list. If the reference position of the target block is outside the boundary, the inter prediction unit 230 may not add the motion information of the spatial candidate to the list.
The boundary may include a boundary of a picture. Further, the boundaries may include boundaries between slices, boundaries between tiles, or boundaries between MCTSs.
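The reference-position check described in the preceding paragraphs can be sketched in Python. This is a minimal illustration, not the embodiment's implementation; the coordinate convention, the `(left, top, right, bottom)` region tuple, and the function names are assumptions:

```python
def reference_position(block_pos, mv):
    # Reference position = position of the target block plus the candidate's motion vector.
    x, y = block_pos
    dx, dy = mv
    return (x + dx, y + dy)

def passes_boundary_check(block_pos, mv, region):
    # region is (left, top, right, bottom); left/top inclusive, right/bottom exclusive.
    # The motion information passes the check only when the reference position
    # lies inside the region (e.g., the slice, tile, or MCTS of the target block).
    rx, ry = reference_position(block_pos, mv)
    left, top, right, bottom = region
    return left <= rx < right and top <= ry < bottom
```

Motion information whose reference position falls outside the region would not be added to the list.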
The spatial candidates may include a plurality of spatial candidates. The plurality of spatial candidates may be A1, B1, B0, A0, and B2.
If the number of pieces of motion information in the list is less than the preset maximum number of pieces of motion information, steps 1810 and 1820 may be sequentially and repeatedly performed on the plurality of spatial candidates.
In step 1830, inter prediction unit 230 may determine whether motion information of the temporal candidate is to be added to the list.
If it is determined that motion information of a temporal candidate is to be added to the list, step 1840 may be performed.
If it is determined that motion information of a temporal candidate is not to be added to the list, step 1850 may be performed.
In step 1840, if it is determined that the motion information of the temporal candidate is to be added to the list, the inter prediction unit 230 may add the motion information of the temporal candidate to the list.
At steps 1830 and 1840, motion information for the temporal candidates may be added to the list.
In an embodiment, at steps 1830 and 1840, the inter prediction unit 230 may determine whether motion information of a temporal candidate is to be added to the list based on information about a target block and the motion information of the temporal candidate.
Hereinafter, the motion vector of the temporal candidate may be a scaled motion vector.
In an embodiment, the information about the target block may be a location of the target block. The inter prediction unit 230 may determine whether motion information of a temporal candidate is to be added to the list based on the position of the target block and the motion vector of the temporal candidate.
In an embodiment, the inter prediction unit 230 may determine whether motion information of a temporal candidate is to be added to the list based on the target block and a motion prediction boundary check for the temporal candidate.
In an embodiment, the inter prediction unit 230 may determine whether motion information of a temporal candidate is to be added to the list based on a position indicated by a motion vector applied to the target block. Here, the motion vector to be applied may be a motion vector of motion information of a temporal candidate.
The position indicated by the motion vector or the reference position may be a position in the reference picture that is referenced by the target block.
In an embodiment, if the reference position of the target block exists within the region, the inter prediction unit 230 may add motion information of the temporal candidate to the list. If the reference position of the target block is outside the region, the inter prediction unit 230 may not add motion information of the temporal candidate to the list.
In an embodiment, if the reference position of the target block is not outside the boundary, the inter prediction unit 230 may add the motion information of the temporal candidate to the list. If the reference position of the target block is outside the boundary, the inter prediction unit 230 may not add the motion information of the temporal candidate to the list.
The temporal candidate may be the first col block or the second col block described above. When the first col block is available, the temporal candidate may be the first col block. When the second col block is available and the first col block is not available, the temporal candidate may be the second col block. In other words, the first col block may be used with a higher priority than the second col block.
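The priority between the two col blocks can be stated as a one-line rule (Python sketch; the function name and the use of `None` for "unavailable" are assumptions):

```python
def select_temporal_candidate(first_col_block, second_col_block):
    # The first col block has higher priority; the second col block is used
    # only when the first col block is unavailable (represented here as None).
    return first_col_block if first_col_block is not None else second_col_block
```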
If the number of pieces of motion information in the list is already equal to the preset maximum number of pieces of motion information before step 1830 is performed, steps 1830, 1840, 1850, 1860, 1870, and 1880 may not be performed, and the motion information of the temporal candidate may not be included in the list.
For multiple spatial candidates and temporal candidates, steps 1810, 1820, 1830 and 1840 described above may be replaced with the first step and the second step.
In a first step, the inter prediction unit 230 may determine whether motion information of the candidate block is to be added to the list.
In a second step, if it is determined that motion information of the candidate block is to be added to the list, the motion information may be added to the list.
The candidate block may include a plurality of spatial candidates and a plurality of temporal candidates.
The first step and the second step may be sequentially and repeatedly performed for the plurality of spatial candidates and the plurality of temporal candidates. The first step and the second step may be repeated until the first step has been performed for all of the plurality of spatial candidates and the plurality of temporal candidates, or until the number of pieces of motion information in the list reaches the preset maximum number.
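The generalized first and second steps can be sketched as a single loop (Python; `is_addable` stands in for the availability, duplication, and boundary checks described in this document, and the names are assumptions):

```python
def build_candidate_list(candidates, is_addable, max_count):
    # candidates: the plurality of spatial candidates followed by the temporal candidates.
    # is_addable(candidate, current_list): the first step -- decide whether to add.
    candidate_list = []
    for candidate in candidates:
        if len(candidate_list) >= max_count:
            break  # the preset maximum number has been reached
        if is_addable(candidate, candidate_list):
            candidate_list.append(candidate)  # the second step
    return candidate_list
```

Note that once the list is full, the remaining candidates are never examined, matching the early-termination behavior described above.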
In an embodiment, the inter prediction unit 230 may determine whether motion information of a candidate block is to be added to the list based on the availability of the candidate block. If the candidate block is not available, the inter prediction unit 230 may not add the motion information of the candidate block to the list. If the candidate block is available and the motion information of the candidate block does not duplicate other motion information present in the list, the inter prediction unit 230 may add the motion information of the candidate block to the list.
In an embodiment, the determination of whether the motion vector is outside the boundary or a determination corresponding thereto may be related to the determination of the availability. For example, even if the motion vector of the candidate block satisfies other conditions related to availability, the inter prediction unit 230 may determine whether the candidate block is available according to the result of the motion prediction boundary check. The determination of availability will be described in detail later with reference to fig. 20.
In an embodiment, the determination of whether the motion vector is outside the boundary or the determination corresponding thereto may be separate from the determination of availability. For example, even if a candidate block is available, the inter prediction unit 230 that determines the availability of the candidate block may determine whether motion information of the candidate block is to be added to the list according to the result of the motion prediction boundary check.
If the number of pieces of motion information in the list reaches the preset maximum number before the first step and the second step have been performed on all of the plurality of spatial candidates and the plurality of temporal candidates, the availability check may not be performed on the remaining candidates.
In step 1850, the inter prediction unit 230 may determine whether the combined motion information generated by the combined bi-prediction is to be added to the list.
If it is determined that combined motion information is to be added to the list, step 1860 may be performed.
If it is determined that the combined motion information is not to be added to the list, step 1870 may be performed.
In step 1860, if it is determined that the combined motion information is to be added to the list, the inter prediction unit 230 may add the combined motion information to the list.
The inter prediction unit 230 may add the combined motion information to the list 1) if the number of pieces of motion information in the list is less than the preset maximum number, 2) if the combined motion information can be generated by combined bi-prediction using the pieces of motion information in the list, and 3) if the combined motion information does not duplicate other motion information in the list.
Here, each of the pieces of motion information in the list may be motion information that has passed the motion prediction boundary check. Thus, the combined motion information resulting from combined bi-prediction using the pieces of motion information in the list may also pass the motion prediction boundary check. Alternatively, the inter prediction unit 230 may perform the motion prediction boundary check even on the combined motion information, and may add only combined motion information that has passed the motion prediction boundary check to the list.
Steps 1850 and 1860 may be performed only if the type of the target slice is "B".
The combined motion information may include a plurality of pieces of combined motion information.
As described above, a plurality of pieces of combined motion information may be generated according to a predefined order. Steps 1850 and 1860 may be performed sequentially and repeatedly on multiple pieces of combined motion information. Steps 1850 and 1860 may be repeated until all possible pieces of combined motion information have been added to the list, or until the number of pieces of motion information in the list reaches a preset maximum number.
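The generation of combined motion information can be sketched as follows (Python; representing a candidate as `(mv_l0, ref_l0, mv_l1, ref_l1)` with `None` for an unused prediction list, and the example pair order, are illustrative assumptions, not the embodiment's definitions):

```python
def add_combined_bi_pred(merge_list, max_count, pair_order):
    # Each entry is (mv_l0, ref_l0, mv_l1, ref_l1); None marks an unused list.
    for i, j in pair_order:  # predefined order of candidate-index pairs
        if len(merge_list) >= max_count:
            break  # the list already holds the preset maximum number
        if i >= len(merge_list) or j >= len(merge_list):
            continue
        l0_part = merge_list[i][:2]  # L0 motion of candidate i
        l1_part = merge_list[j][2:]  # L1 motion of candidate j
        if l0_part[0] is None or l1_part[0] is None:
            continue  # this combined motion information cannot be generated
        combined = l0_part + l1_part
        if combined not in merge_list:  # no duplicate of existing motion information
            merge_list.append(combined)
    return merge_list
```

For example, combining the L0 motion of a first uni-directional candidate with the L1 motion of a second one yields a new bi-predictive entry.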
In step 1870, the inter prediction unit 230 may determine whether zero vector motion information is to be added to the list.
If it is determined that zero vector motion information is to be added to the list, step 1880 may be performed.
If it is determined that zero vector motion information is not to be added to the list, the process may be terminated.
In step 1880, if it is determined that zero vector motion information is to be added to the list, the inter prediction unit 230 may add zero vector motion information to the list.
The inter prediction unit 230 may add the zero vector motion information to the list 1) if the number of pieces of motion information in the list is less than the preset maximum number, 2) if the zero vector motion information can be generated, and 3) if the zero vector motion information does not duplicate other motion information in the list.
The zero vector motion information may include a plurality of pieces of zero vector motion information.
Steps 1870 and 1880 may be sequentially and repeatedly performed on a plurality of pieces of zero vector motion information. Steps 1870 and 1880 may be repeated until all possible pieces of zero vector motion information have been added to the list, or until the number of pieces of motion information in the list reaches a preset maximum number.
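Steps 1870 and 1880 can be sketched as follows (Python; the `(mv, ref_idx)` representation is an assumption). One zero-vector entry is attempted per reference picture index, matching the repetition described above:

```python
def add_zero_vector_motion_info(candidate_list, max_count, num_ref_pictures):
    for ref_idx in range(num_ref_pictures):
        if len(candidate_list) >= max_count:
            break  # the list already holds the preset maximum number
        zero_info = ((0, 0), ref_idx)
        if zero_info not in candidate_list:  # avoid duplicate motion information
            candidate_list.append(zero_info)
    return candidate_list
```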
Fig. 19 is a flowchart of a method for generating a predicted motion vector candidate list for inter prediction of a target block according to an embodiment.
Step S1730 described above with reference to fig. 17 may include steps 1910, 1920, 1930, 1940, 1970, and 1980, which will be described below.
In the present embodiment, the inter prediction mode for the target block may be AMVP mode. The list may be a predicted motion vector candidate list. The motion information of the candidate block may correspond to a predicted motion vector candidate.
Steps 1910, 1920, 1930, 1940, 1970 and 1980 may correspond to steps 1810, 1820, 1830, 1840, 1870 and 1880, respectively, described above with reference to fig. 18. In other words, the descriptions of steps 1810, 1820, 1830, 1840, 1870 and 1880 are also applicable to steps 1910, 1920, 1930, 1940, 1970 and 1980, respectively. A repetitive description thereof will be omitted, and the following description will be made mainly based on differences between steps 1810, 1820, 1830, 1840, 1870 and 1880 and steps 1910, 1920, 1930, 1940, 1970 and 1980.
At step 1910, the inter prediction unit 230 may determine whether motion information of a spatial candidate is to be added to the list.
If it is determined that motion information for a spatial candidate is to be added to the list, step 1920 may be performed.
If it is determined that motion information for a spatial candidate is not to be added to the list, step 1930 may be performed.
The spatial candidates may include a plurality of spatial candidates. The plurality of spatial candidates may include a first spatial candidate and a second spatial candidate. The first spatial candidate may be one of A0, A1, scaled A0, and scaled A1. The second spatial candidate may be one of B0, B1, B2, scaled B0, scaled B1, and scaled B2.
If the number of pieces of motion information in the list is less than the preset maximum number of pieces of motion information, steps 1910 and 1920 may be sequentially and repeatedly performed on the plurality of spatial candidates.
At step 1930, inter prediction unit 230 may determine whether motion information of the temporal candidate is to be added to the list.
If it is determined that motion information of a temporal candidate is to be added to the list, step 1940 may be performed.
If it is determined that motion information of a temporal candidate is not to be added to the list, step 1970 may be performed.
In step 1940, if it is determined that the motion information of the temporal candidate is to be added to the list, the inter prediction unit 230 may add the motion information of the temporal candidate to the list.
At steps 1930 and 1940, motion information of the temporal candidate may be added to the list.
If the number of pieces of motion information in the list is already equal to the preset maximum number before step 1930 is performed, steps 1930, 1940, 1970, and 1980 may not be performed, and the motion information of the temporal candidate may not be included in the list.
For example, if both the first spatial candidate and the second spatial candidate are available and the motion information of the first spatial candidate and the motion information of the second spatial candidate do not overlap each other, both the motion information of the first spatial candidate and the motion information of the second spatial candidate may be added to the list. In this case, if the preset maximum number is 2, the temporal candidate may not be derived, and the motion information of the temporal candidate may not be added to the list.
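The AMVP-specific behavior, including the early termination that skips the temporal candidate when two spatial candidates already fill the list, can be sketched as follows (Python; the maximum of two predicted motion vector candidates matches the example above, `None` marks an unavailable candidate, and the names are assumptions):

```python
def build_amvp_list(spatial_mvs, temporal_mv, max_count=2, zero_mv=(0, 0)):
    candidate_list = []
    for mv in spatial_mvs:
        if mv is not None and mv not in candidate_list and len(candidate_list) < max_count:
            candidate_list.append(mv)
    # The temporal candidate is derived only when the list is not yet full.
    if len(candidate_list) < max_count and temporal_mv is not None \
            and temporal_mv not in candidate_list:
        candidate_list.append(temporal_mv)
    while len(candidate_list) < max_count:  # pad with zero vector motion information
        candidate_list.append(zero_mv)
    return candidate_list
```

With two distinct available spatial candidates, the temporal motion vector never enters the list; when a spatial candidate is missing, the temporal candidate takes its place.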
For multiple spatial candidates and temporal candidates, the steps 1910, 1920, 1930, and 1940 described above may be replaced by the first step and the second step.
In a first step, the inter prediction unit 230 may determine whether motion information of the candidate block is to be added to the list.
In a second step, if it is determined that motion information of the candidate block is to be added to the list, the motion information may be added to the list.
In step 1970, the inter prediction unit 230 may determine whether zero vector motion information is to be added to the list.
If it is determined that zero vector motion information is to be added to the list, step 1980 may be performed.
If it is determined that zero vector motion information is not to be added to the list, the process may be terminated.
In step 1980, if it is determined that zero vector motion information is to be added to the list, the inter prediction unit 230 may add zero vector motion information to the list.
Steps 1970 and 1980 may be sequentially and repeatedly performed for a plurality of pieces of zero vector motion information. Steps 1970 and 1980 may be repeated until all possible pieces of zero vector motion information have been added to the list, or until the number of pieces of motion information in the list reaches a preset maximum number.
Fig. 20 is a flowchart of a method for determining availability of candidate blocks for inter prediction of a target block, according to an embodiment.
The candidate block may include the spatial candidates and temporal candidates described above.
In step 2010, the inter prediction unit 230 may check whether the samples of the candidate block exist within the boundary of the picture.
If the samples of the candidate block exist within the boundary of the picture, step 2020 may be performed.
If the samples of the candidate block do not exist within the boundary of the picture, step 2060 may be performed.
In step 2020, the inter prediction unit 230 may check whether an object including a candidate block exists within a boundary of the region.
The object comprising the candidate block may be a PU. In other words, the entity providing the motion information may be a PU.
The region may be the region of the slice that includes the target block, the region of the tile that includes the target block, or the region of the MCTS that includes the target block.
If there is an object including a candidate block within the boundary of the region, step 2030 may be performed.
If there are no objects within the boundaries of the region that include candidate blocks, step 2060 may be performed.
The region may correspond to a plurality of regions among the region of the slice that includes the target block, the region of the tile that includes the target block, and the region of the MCTS that includes the target block. In this case, step 2030 may be performed when the object including the candidate block exists within all of the boundaries of the plurality of regions. Step 2060 may be performed when the object including the candidate block does not exist within at least one of the boundaries of the plurality of regions.
In step 2030, the inter prediction unit 230 may check whether the prediction mode of the object including the candidate block is an inter mode.
If the prediction mode of the object including the candidate block is an inter mode, step 2040 may be performed.
If the prediction mode of the object including the candidate block is not inter mode, step 2060 may be performed.
In step 2040, the inter prediction unit 230 may determine whether a point indicated by a motion vector of an object including a candidate block exists within a boundary of the region.
If it is determined that the point indicated by the motion vector of the object comprising the candidate block exists within the boundary of the region, step 2050 may be performed.
If it is determined that the point indicated by the motion vector of the object comprising the candidate block does not exist within the boundary of the region, step 2060 may be performed.
In step 2050, the inter prediction unit 230 may set the availability of the candidate block to "true". In other words, the inter prediction unit 230 may set the candidate block to be available.
In step 2060, inter prediction unit 230 may set the availability of the candidate block to "false". In other words, the inter prediction unit 230 may set the candidate block to be unavailable.
In other words, in steps 2010, 2020, 2030 and 2040, the inter prediction unit 230 may determine whether a candidate block is available, and in steps 2050 and 2060, the inter prediction unit 230 may set the availability of the candidate block based on the result of the determination.
At step 2040, availability of the candidate block may be determined based on both the information about the target block and the motion information of the object including the candidate block.
In an embodiment, the information about the target block may be a location of the target block. The inter prediction unit 230 may determine whether a candidate block is available based on the position of the target block and motion information of the object.
In an embodiment, inter prediction unit 230 may determine whether a candidate block is available based on the target block and a motion prediction boundary check for the object.
In an embodiment, the inter prediction unit 230 may determine whether the candidate block is available based on a position indicated by a motion vector applied to the target block. Here, the motion vector to be applied may be a motion vector of motion information of the object.
Here, the position indicated by the motion vector may be a position determined by adding the position of the target block to the motion vector.
Further, the position indicated by the motion vector applied to the target block may be a reference position of the target block.
The position indicated by the motion vector or the reference position may be a position in the reference picture that is referenced by the target block.
In an embodiment, if the reference position of the target block exists within the region, the inter prediction unit 230 may determine that the candidate block is available. If the reference position of the target block is outside the region, the inter prediction unit 230 may determine that the candidate block is not available.
The region may be the region of the slice that includes the target block, the region of the tile that includes the target block, or the region of the MCTS that includes the target block.
In an embodiment, if the reference position of the target block is not outside the boundary, the inter prediction unit 230 may determine that the candidate block is available. If the reference position of the target block is outside the boundary, the inter prediction unit 230 may determine that the candidate block is not available.
The boundary may include a boundary of a picture. The boundaries may include boundaries between slices, boundaries between tiles, or boundaries between MCTSs.
The order of steps 2010, 2020, 2030 and 2040 described above is merely exemplary and may be arbitrarily changed.
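Steps 2010 through 2060 can be condensed into one function (Python sketch; the boolean inputs stand for the results of the picture-boundary, region, and prediction-mode checks, and all names are assumptions):

```python
def candidate_available(sample_in_picture, object_in_region, prediction_mode,
                        target_block_pos, mv, region):
    if not sample_in_picture:       # step 2010
        return False                # step 2060: availability = "false"
    if not object_in_region:        # step 2020
        return False
    if prediction_mode != "inter":  # step 2030
        return False
    # Step 2040: motion prediction boundary check on the reference position.
    rx = target_block_pos[0] + mv[0]
    ry = target_block_pos[1] + mv[1]
    left, top, right, bottom = region
    return left <= rx < right and top <= ry < bottom  # step 2050 when True
```

As the text notes, the order of the four checks is exemplary; any failing check makes the candidate unavailable regardless of order.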
Fig. 21 shows a merge list to which a motion prediction boundary check according to an example is applied.
As described above with reference to fig. 16, the pieces of motion information in the second, third, and fourth rows 1620, 1630, and 1640 of the merge list of fig. 16 fail the motion prediction boundary check. Accordingly, the pieces of motion information in the second, third, and fourth rows 1620, 1630, and 1640 may not be added to the merge list of fig. 21.
Further, the motion information in the fifth row 1650 is combined motion information resulting from combined bi-prediction of the motion information in the first row 1610 and the motion information in the second row 1620. Since the motion information in the second row 1620 fails the motion prediction boundary check, the combined motion information in the fifth row 1650 cannot be generated.
Thus, in the merge list of fig. 21, only the motion information corresponding to the first row 1610 may exist.
Since the number of pieces of motion information added to the merge list is only 1 among the pieces of motion information of the spatial candidates, the pieces of motion information of the temporal candidates, and the pieces of combined motion information, zero vector motion information can be added to the merge list.
When the number of reference pictures is 2, zero vector motion information of reference picture index 0 and zero vector motion information of reference picture index 1 may be added to the merge list.
As shown in fig. 21, with the above-described embodiment, only motion information that is not outside the boundary may be added to the list, so the merge list may include only usable motion information. In other words, all three pieces of motion information in the merge list of fig. 21 can be effectively used, whereas the merge list of fig. 16 includes only one piece of motion information that can actually be used. Therefore, coding efficiency can be improved by the merge list of fig. 21.
Fig. 22 is a configuration diagram of an electronic device implementing an encoding apparatus according to an embodiment.
In an embodiment, at least some of the inter prediction unit 110, the intra prediction unit 120, the switcher 115, the subtractor 125, the transform unit 130, the quantization unit 140, the entropy encoding unit 150, the inverse quantization unit 160, the inverse transform unit 170, the adder 175, the filtering unit 180, and the reference picture buffer 190 of the encoding apparatus 100 may be program modules and may communicate with external devices or systems. The program modules may be included in the encoding apparatus 100 in the form of an operating system, application program modules, and other program modules.
Program modules may be physically stored in various types of well known storage devices. Furthermore, at least some of the program modules may also be stored in a remote storage device capable of communicating with the encoding apparatus 100.
Program modules may include, but are not limited to, routines, subroutines, programs, objects, components, and data structures for performing functions or operations according to embodiments or for implementing abstract data types according to embodiments.
Program modules may be implemented using instructions or code executed by at least one processor of the encoding apparatus 100.
The encoding apparatus 100 may be implemented as an electronic device 2200 as shown in fig. 22. The electronic device 2200 may be a general-purpose computer system serving as the encoding apparatus 100.
As shown in fig. 22, the electronic device 2200 may include a processing unit 2210, a memory 2230, a User Interface (UI) input device 2250, a UI output device 2260, and a storage 2240 in communication with each other through a bus 2290. The electronic device 2200 may also include a communication unit 2220 connected to the network 2299.
The processing unit 2210 may be a Central Processing Unit (CPU) or a semiconductor device for executing processing instructions stored in the memory 2230 or the storage 2240. The processing unit 2210 may be at least one hardware processor.
The processing unit 2210 may generate and process signals, data, or information input to the electronic device 2200 or output from the electronic device 2200, and may perform checking, comparing, and determining related to the signals, data, or information. In other words, in an embodiment, the generation and processing of data or information, and the checking, comparing, and determining related to the data or information may be performed by the processing unit 2210.
For example, the processing unit 2210 may perform the steps in fig. 17, 18, 19, and 20.
The storage unit may represent the memory 2230 and/or the storage 2240. Each of the memory 2230 and the storage 2240 may be any of various types of volatile or nonvolatile storage media. For example, the memory 2230 may include at least one of Read Only Memory (ROM) 2231 and Random Access Memory (RAM) 2232.
The storage unit may store data or information for operation of the electronic device 2200. In an embodiment, data or information of the electronic device 2200 may be stored in the storage unit.
For example, the storage unit may store pictures, blocks, lists, motion information, inter prediction information, bitstreams, and the like.
The electronic device 2200 may be implemented in a computer system comprising a computer readable storage medium.
The storage medium may store at least one module required to use the electronic device 2200 as the encoding apparatus 100. The memory 2230 may store at least one module and may be configured to be executed by the processing unit 2210.
The functions related to the communication of data or information of the electronic device 2200 may be performed by the communication unit 2220.
For example, the communication unit 2220 may transmit a bitstream including inter prediction information or the like to the decoding apparatus 200.
Fig. 23 is a configuration diagram of an electronic device implementing a decoding apparatus according to an embodiment.
In an embodiment, at least some of the entropy decoding unit 210, the inverse quantization unit 220, the inverse transform unit 230, the intra prediction unit 240, the inter prediction unit 250, the adder 255, the filtering unit 260, and the reference picture buffer 270 of the decoding apparatus 200 may be program modules and may communicate with external devices or systems. The program modules may be included in the decoding apparatus 200 in the form of an operating system, application program modules, and other program modules.
Program modules may be physically stored in various types of well known storage devices. Furthermore, at least some of the program modules may also be stored in a remote storage device capable of communicating with the decoding apparatus 200.
Program modules may include, but are not limited to, routines, subroutines, programs, objects, components, and data structures for performing functions or operations according to embodiments or for implementing abstract data types according to embodiments.
Program modules may be implemented using instructions or code that are executed by at least one processor of decoding device 200.
The decoding apparatus 200 may be implemented as the electronic device 2300 shown in fig. 23. The electronic device 2300 may be a general-purpose computer system serving as the decoding apparatus 200.
As shown in fig. 23, the electronic device 2300 may include a processing unit 2310, a memory 2330, a UI input device 2350, a UI output device 2360, and a storage 2340 that communicate with each other through a bus 2390. The electronic device 2300 may also include a communication unit 2320 that connects to the network 2399.
The processing unit 2310 may be a CPU or semiconductor device for executing processing instructions stored in the memory 2330 or the storage 2340. The processing unit 2310 may be at least one hardware processor.
The processing unit 2310 may generate and process signals, data, or information input to the electronic device 2300 or output from the electronic device 2300, and may perform checking, comparison, and determination related to the signals, data, or information. In other words, in an embodiment, the generation and processing of data or information, and the checking, comparing, and determining related to the data or information may be performed by the processing unit 2310.
For example, processing unit 2310 may perform the steps of fig. 17, 18, 19, and 20.
The storage unit may represent the memory 2330 and/or the storage 2340. Each of the memory 2330 and the storage 2340 may be any of various types of volatile or nonvolatile storage media. For example, the memory 2330 may include at least one of Read Only Memory (ROM) 2331 and Random Access Memory (RAM) 2332.
The storage unit may store data or information for operation of the electronic device 2300. In an embodiment, data or information of the electronic device 2300 may be stored in the storage unit.
For example, the storage unit may store pictures, blocks, lists, motion information, inter prediction information, bitstreams, and the like.
The electronic device 2300 may be implemented in a computer system that includes a computer-readable storage medium.
The storage medium may store at least one module required for the electronic device 2300 to operate as the decoding apparatus 200. The memory 2330 may store the at least one module, which may be configured to be executed by the processing unit 2310.
Functions related to communication of data or information of the electronic device 2300 may be performed by the communication unit 2320.
For example, the communication unit 2320 may receive a bitstream including inter prediction information or the like from the encoding apparatus 100.
In the above-described embodiments, although the methods have been described based on flowcharts as a series of steps or units, the present invention is not limited to the described order of the steps; some steps may be performed in a different order, or simultaneously with other steps. Furthermore, those skilled in the art will appreciate that the steps shown in the flowcharts are not exclusive: other steps may be included, or one or more steps in the flowcharts may be deleted, without departing from the scope of the present invention.
The embodiments according to the present invention described above may be implemented as programs executable by various computer apparatuses, and may be recorded on a computer-readable storage medium. The computer-readable storage medium may include program instructions, data files, and data structures, alone or in combination. The program instructions recorded on the storage medium may be specially designed and configured for the present invention, or may be known and available to those having ordinary skill in the computer software arts. Examples of computer-readable storage media include hardware devices specially configured to store and execute program instructions, such as magnetic media (e.g., hard disks, floppy disks, and magnetic tape), optical media (e.g., CD-ROMs and DVDs), magneto-optical media (e.g., floptical disks), and semiconductor memories (e.g., ROM, RAM, and flash memory). Examples of program instructions include both machine code, such as that produced by a compiler, and high-level language code that can be executed by a computer using an interpreter. The hardware devices may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.
As described above, although the present invention has been described with reference to specific details, such as detailed components and a limited number of embodiments and drawings, these details are provided only for easy understanding of the present invention; the present invention is not limited to these embodiments, and those skilled in the art will practice various changes and modifications in light of the above description.
It is, therefore, to be understood that the spirit of the present invention is not limited to the above-described embodiments, and that the appended claims, their equivalents, and modifications thereto fall within the scope of the invention.
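The list-based motion-information derivation described in the embodiments above can be sketched in a few lines. The sketch below is illustrative only: the names (`Candidate`, `build_candidate_list`, `derive_motion_info`) and the list size `MAX_CANDIDATES` are assumptions and are not taken from this document.

```python
# Illustrative sketch: build a candidate list for a target block from spatial
# and temporal candidates, prune duplicates, and select one candidate via a
# signalled index. All names and the list size are assumptions.
from dataclasses import dataclass
from typing import List, Optional

MAX_CANDIDATES = 6  # assumed maximum list size

@dataclass(frozen=True)
class Candidate:
    mv_x: int      # motion vector, horizontal component
    mv_y: int      # motion vector, vertical component
    ref_idx: int   # reference picture index

def add_if_new(lst: List[Candidate], cand: Optional[Candidate]) -> None:
    """Add motion information only if it does not duplicate an existing entry."""
    if cand is not None and cand not in lst and len(lst) < MAX_CANDIDATES:
        lst.append(cand)

def build_candidate_list(spatial: List[Optional[Candidate]],
                         temporal: Optional[Candidate]) -> List[Candidate]:
    """Generate the list for the target block: spatial candidates first,
    then the temporal candidate, each subject to the duplicate check."""
    lst: List[Candidate] = []
    for cand in spatial:
        add_if_new(lst, cand)
    add_if_new(lst, temporal)
    return lst

def derive_motion_info(lst: List[Candidate], index: int) -> Candidate:
    """The decoded index indicates the candidate used for prediction."""
    return lst[index]
```

Because duplicate motion information is pruned before insertion, the decoded index always selects a distinct candidate from the list.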

Claims (29)

1. A decoding method, comprising:
generating a list for the target block;
determining motion information of the target block using the list and an index, wherein
the index indicates a candidate for prediction among a plurality of candidates in the list.
2. The decoding method of claim 1, wherein
whether motion information of a candidate block is added to the list is determined based on information about the candidate block.
3. The decoding method of claim 2, wherein
whether the motion information of the candidate block is added to the list is determined based on whether a location of the candidate block is within a boundary.
4. The decoding method of claim 2, wherein
the candidate block is a temporal candidate block.
5. The decoding method of claim 2, wherein
the motion information of the candidate block is added to the list as a candidate only when it does not overlap with any of the one or more candidates already included in the list.
6. The decoding method of claim 5, wherein,
the candidate block is adjacent to a top edge of a block diagonally adjacent to a lower left corner of the target block.
7. A method of encoding, comprising:
generating a list for the target block;
generating an index for the list, wherein
the index indicates a candidate for prediction among a plurality of candidates in the list.
8. The encoding method of claim 7, wherein
whether motion information of a candidate block is added to the list is determined based on information about the candidate block.
9. The encoding method of claim 8, wherein
whether the motion information of the candidate block is added to the list is determined based on whether a location of the candidate block is within a boundary.
10. The encoding method of claim 8, wherein,
the candidate block is a temporal candidate block.
11. The encoding method of claim 8, wherein
the motion information of the candidate block is added to the list as a candidate only when it does not overlap with any of the one or more candidates already included in the list.
12. The encoding method of claim 11, wherein,
the candidate block is adjacent to a top edge of a block diagonally adjacent to a lower left corner of the target block.
13. A computer-readable recording medium storing a bitstream generated by the encoding method of claim 7.
14. A computer-readable recording medium storing a bitstream generated by a video encoding method, wherein the video encoding method comprises:
generating a list for the target block;
generating an index for the list, wherein
the index indicates a candidate for prediction among a plurality of candidates in the list.
15. A computer readable recording medium storing a bitstream comprising computer executable code, wherein the computer executable code, when executed by a processor of a video decoding device, causes the processor to perform the steps of:
decoding index information in the computer executable code;
generating a list for the target block; and
determining motion information of the target block using the list and an index, wherein
the index is determined using the index information,
the index indicates a candidate for prediction among a plurality of candidates in the list.
16. The computer-readable recording medium of claim 15, wherein
whether motion information of a candidate block is added to the list is determined based on information about the candidate block.
17. The computer-readable recording medium of claim 16, wherein
whether the motion information of the candidate block is added to the list is determined based on whether a location of the candidate block is within a boundary.
18. The computer-readable recording medium of claim 16, wherein,
the candidate block is a temporal candidate block.
19. The computer-readable recording medium of claim 16, wherein
the motion information of the candidate block is added to the list as a candidate only when it does not overlap with any of the one or more candidates already included in the list.
20. The computer-readable recording medium of claim 19, wherein
the candidate block is adjacent to a top edge of a block diagonally adjacent to a lower left corner of the target block.
21. A computer readable recording medium storing a bitstream comprising computer executable code, wherein the computer executable code, when executed by a processor of a video decoding device, causes the processor to perform the steps of:
generating a list for the target block;
determining motion information for the target block using the list and an index, wherein the index is determined based on index information in the computer executable code,
wherein the index indicates a candidate for prediction among a plurality of candidates in the list.
22. A decoding apparatus comprising:
a memory for storing a bitstream including index information indicating an index; and
a processor for generating a list for the target block using the bitstream,
wherein
the processor uses the list and index to determine motion information for the target block,
the index indicates a candidate for prediction among a plurality of candidates in the list.
23. An encoding apparatus, comprising:
a processor for generating a bitstream including index information indicating an index; and
a memory for storing the bitstream, wherein
the processor generates a list for the target block,
the index indicates a candidate for prediction among a plurality of candidates in the list.
24. A method of transmitting a bitstream, wherein the bitstream is generated by a video encoding device,
wherein the method comprises: transmitting the bitstream,
wherein the bitstream includes index information,
the index information is used to indicate an index during decoding,
the index is used to determine motion information for a target block during decoding using a list for the target block,
the index indicates a candidate for prediction among a plurality of candidates in the list.
25. The method of claim 24, wherein
whether motion information of a candidate block is added to the list is determined based on information about the candidate block.
26. The method of claim 25, wherein
whether the motion information of the candidate block is added to the list is determined based on whether a location of the candidate block is within a boundary.
27. The method of claim 25, wherein,
the candidate block is a temporal candidate block.
28. The method of claim 25, wherein
the motion information of the candidate block is added to the list as a candidate only when it does not overlap with any of the one or more candidates already included in the list.
29. The method of claim 28, wherein,
the candidate block is adjacent to a top edge of a block diagonally adjacent to a lower left corner of the target block.
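As an illustration of the boundary check recited in claims 3, 9, 17, and 26, combined with the duplicate check of claims 5, 11, 19, and 28, the following sketch adds a candidate's motion information to the list only when its location lies within a boundary and the motion information is not already present. The choice of the picture boundary as the boundary, and all names, are assumptions for illustration only; the patent does not fix them.

```python
# Hypothetical sketch of the claimed conditions for adding a (temporal)
# candidate's motion information to the list: a location-within-boundary
# check followed by a duplicate check. Names and the picture-boundary
# interpretation are illustrative assumptions.
from typing import List, Tuple

MotionInfo = Tuple[int, int]  # (mv_x, mv_y), simplified for illustration

def position_within_boundary(x: int, y: int,
                             pic_width: int, pic_height: int) -> bool:
    """True when the candidate location (x, y) lies inside the picture."""
    return 0 <= x < pic_width and 0 <= y < pic_height

def maybe_add_candidate(lst: List[MotionInfo], motion_info: MotionInfo,
                        x: int, y: int,
                        pic_width: int, pic_height: int) -> List[MotionInfo]:
    """Append motion_info only if the location is within the boundary and
    the motion information does not overlap an existing list entry."""
    if (position_within_boundary(x, y, pic_width, pic_height)
            and motion_info not in lst):
        lst.append(motion_info)
    return lst
```

A candidate at a location outside the boundary, or whose motion information duplicates an existing entry, leaves the list unchanged.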

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210098184.4A CN116546211A (en) 2022-01-26 2022-01-26 Video encoding method, video encoding device, computer equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116546211A true CN116546211A (en) 2023-08-04

Family

ID=87447720




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40091104

Country of ref document: HK