CN115552896A - Image encoding/decoding method and apparatus for selectively encoding size information of rectangular slice, and method of transmitting bitstream


Info

Publication number
CN115552896A
Authority
CN
China
Prior art keywords
slice
tile
information
picture
current
Prior art date
Legal status
Pending
Application number
CN202180033162.3A
Other languages
Chinese (zh)
Inventor
Hendry Hendry
Seunghwan Kim
S. Paluri
Current Assignee
LG Electronics Inc
Original Assignee
LG Electronics Inc
Priority date
Filing date
Publication date
Application filed by LG Electronics Inc
Publication of CN115552896A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/136: Incoming video signal characteristics or properties
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/122: Selection of transform size, e.g. 8x8 or 2x4x8 DCT; selection of sub-band transforms of varying structure or type
    • H04N19/172: Coding unit being an image region, the region being a picture, frame or field
    • H04N19/174: Coding unit being an image region, the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/176: Coding unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/184: Coding unit being bits, e.g. of the compressed video stream
    • H04N19/70: Characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/11: Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Provided are an image encoding/decoding method and apparatus. The image decoding method performed by the image decoding apparatus according to the present disclosure may include: obtaining, from a bitstream, size information indicating a size of a current slice corresponding to at least a portion of a current picture; and determining the size of the current slice based on the size information.

Description

Image encoding/decoding method and apparatus for selectively encoding size information of rectangular slice, and method of transmitting bitstream
Technical Field
The present disclosure relates to an image encoding/decoding method and apparatus, and more particularly, to an image encoding/decoding method and apparatus that selectively encode size information of a slice, and to a method of transmitting a bitstream generated by the image encoding method/apparatus of the present disclosure.
Background
Recently, demand for high-resolution and high-quality images, such as High Definition (HD) images and Ultra High Definition (UHD) images, is increasing in various fields. As the resolution and quality of image data improve, the amount of transmitted information or bits increases relative to existing image data, and this increase leads to higher transmission and storage costs.
Accordingly, efficient image compression techniques are needed to efficiently transmit, store, and reproduce information on high-resolution and high-quality images.
Disclosure of Invention
Technical problem
An object of the present disclosure is to provide an image encoding/decoding method and apparatus with improved encoding/decoding efficiency.
Another object of the present disclosure is to provide an image encoding/decoding method and apparatus for improving encoding/decoding efficiency by selectively encoding size information of a slice.
Another object of the present disclosure is to provide a method of transmitting a bitstream generated by an image encoding method or apparatus according to the present disclosure.
Another object of the present disclosure is to provide a recording medium storing a bitstream generated by an image encoding method or apparatus according to the present disclosure.
Another object of the present disclosure is to provide a recording medium storing a bitstream that is received and decoded by the image decoding apparatus according to the present disclosure and used to reconstruct an image.
The technical problems solved by the present disclosure are not limited to the above technical problems, and other technical problems not described herein will be apparent to those skilled in the art from the following description.
Technical Solution
An image decoding method performed by an image decoding apparatus according to an aspect of the present disclosure may include: obtaining, from a bitstream, size information indicating a size of a current slice corresponding to at least a portion of a current picture; and determining the size of the current slice based on the size information. Here, the size information may include width information indicating the width of the current slice in units of tile columns and height information indicating the height of the current slice in units of tile rows, and obtaining the size information from the bitstream may be performed based on whether the current slice belongs to the last tile column or the last tile row of the current picture.
In addition, an image decoding apparatus according to an aspect of the present disclosure may include a memory and at least one processor. The at least one processor may obtain, from a bitstream, size information indicating a size of a current slice corresponding to at least a portion of a current picture, and determine the size of the current slice based on the size information. Here, the size information may include width information indicating the width of the current slice in units of tile columns and height information indicating the height of the current slice in units of tile rows, and the size information may be obtained based on whether the current slice belongs to the last tile column or the last tile row of the current picture.
An image encoding method performed by an image encoding apparatus according to another aspect of the present disclosure may include: determining a current slice corresponding to at least a portion of a current picture; and generating a bitstream including size information of the current slice. Here, the size information may include width information indicating the width of the current slice in units of tile columns and height information indicating the height of the current slice in units of tile rows, and generating the bitstream including the size information of the current slice may be performed based on whether the current slice belongs to the last tile column or the last tile row of the current picture.
In addition, the transmission method according to another aspect of the present disclosure may transmit a bitstream generated by the image encoding apparatus or the image encoding method of the present disclosure.
In addition, a computer-readable recording medium according to another aspect of the present disclosure may store a bitstream generated by the image encoding apparatus or the image encoding method of the present disclosure.
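By way of illustration, the conditional acquisition of the size information described in the above aspects may be sketched as follows. This is a minimal C++ sketch under assumed names: BitReader, read_ue() (unsigned exponential-Golomb parsing), and the inference of an absent value as a single tile column/row are modeled loosely on VVC-style rectangular-slice syntax, not on the literal syntax of this disclosure.

```cpp
#include <cstdint>

// Hypothetical bitstream reader; read_ue() is assumed to parse an
// unsigned exponential-Golomb value (ue(v)) from the bitstream.
struct BitReader {
    uint32_t read_ue() { /* stubbed for the sketch */ return 0; }
};

// Sketch: the width/height of a rectangular slice, in units of tile
// columns/rows, is read from the bitstream only when the slice does
// not belong to the last tile column/row of the picture; otherwise it
// is inferred (here as one tile column/row), so no bits are spent.
void parse_slice_size(BitReader& br,
                      int topLeftTileCol, int topLeftTileRow,
                      int numTileCols, int numTileRows,
                      int& sliceWidthInTiles, int& sliceHeightInTiles)
{
    if (topLeftTileCol < numTileCols - 1)
        sliceWidthInTiles = static_cast<int>(br.read_ue()) + 1;  // explicit
    else
        sliceWidthInTiles = 1;   // inferred: slice is in the last tile column

    if (topLeftTileRow < numTileRows - 1)
        sliceHeightInTiles = static_cast<int>(br.read_ue()) + 1; // explicit
    else
        sliceHeightInTiles = 1;  // inferred: slice is in the last tile row
}
```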
The features summarized above with respect to the present disclosure are merely exemplary aspects of the following detailed description of the disclosure, and do not limit the scope of the disclosure.
Advantageous effects
According to the present disclosure, an image encoding/decoding method and apparatus having improved encoding/decoding efficiency may be provided.
Also, according to the present disclosure, an image encoding/decoding method and apparatus that improve encoding/decoding efficiency by selectively encoding size information of a slice may be provided.
Further, according to the present disclosure, a method of transmitting a bitstream generated by an image encoding method or apparatus according to the present disclosure may be provided.
Further, according to the present disclosure, a recording medium storing a bitstream generated by an image encoding method or apparatus according to the present disclosure may be provided.
Further, according to the present disclosure, it is possible to provide a recording medium storing a bitstream that is received and decoded by the image decoding apparatus according to the present disclosure and used to reconstruct an image.
Those skilled in the art will appreciate that the effects that can be achieved by the present disclosure are not limited to what has been particularly described hereinabove and that other advantages of the present disclosure will be more clearly understood from the detailed description.
Drawings
Fig. 1 is a view schematically showing a video encoding system to which an embodiment of the present disclosure is applied.
Fig. 2 is a view schematically showing an image encoding apparatus to which an embodiment of the present disclosure is applied.
Fig. 3 is a view schematically showing an image decoding apparatus to which an embodiment of the present disclosure is applied.
Fig. 4 is a view illustrating a segmentation structure of an image according to an embodiment.
Fig. 5 is a view illustrating an embodiment of partition types for blocks according to a multi-type tree structure.
Fig. 6 is a view illustrating a signaling mechanism of block partition information in a quad tree having a nested multi-type tree structure according to the present disclosure.
Fig. 7 is a view showing an embodiment of dividing a CTU into a plurality of CUs.
Fig. 8 is a view illustrating an adjacent reference sample according to an embodiment.
Fig. 9 to 10 are views illustrating intra prediction according to an embodiment.
Fig. 11 is a view illustrating an encoding method using inter prediction according to an embodiment.
Fig. 12 is a view illustrating a decoding method using inter prediction according to an embodiment.
Fig. 13 is a block diagram of CABAC according to an embodiment for encoding one syntax element.
Fig. 14 to 17 are views illustrating entropy encoding and entropy decoding according to an embodiment.
Fig. 18 and 19 are views illustrating an example of an image decoding and encoding process according to an embodiment.
Fig. 20 is a view illustrating a layer structure of an encoded image according to an embodiment.
Fig. 21 to 24 are views illustrating an embodiment of dividing a picture using tiles, slices, and subpictures.
Fig. 25 is a view showing an embodiment of syntax of a sequence parameter set.
Fig. 26 is a view showing an embodiment of syntax of a picture parameter set.
Fig. 27 is a view showing an embodiment of syntax of a slice header.
Fig. 28 and 29 are views illustrating embodiments of an encoding method and a decoding method.
Fig. 30 and 31 are views illustrating another embodiment of a picture parameter set.
Fig. 32 is a view showing an embodiment of a decoding method.
Fig. 33 and 34 are views showing an algorithm for determining SliceTopLeftTileIdx.
Fig. 35 is a view showing an embodiment of an encoding method.
Fig. 36 is a view showing a content streaming system to which an embodiment of the present disclosure is applied.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings to facilitate implementation by those skilled in the art. However, the present disclosure may be embodied in various different forms and is not limited to the embodiments described herein.
In describing the present disclosure, if it is determined that a detailed description of related known functions or configurations unnecessarily obscures the scope of the present disclosure, the detailed description thereof will be omitted. In the drawings, portions irrelevant to the description of the present disclosure are omitted, and like reference numerals are given to like portions.
In the present disclosure, when one component is "connected", "coupled" or "linked" to another component, this may include not only a direct connection but also an indirect connection in which an intermediate component exists. In addition, when a component "comprises" or "has" another component, this means that still other components may be further included rather than excluded, unless stated otherwise.
In the present disclosure, the terms first, second, etc. may be used only for the purpose of distinguishing one component from other components, and do not limit the order or importance of the components unless otherwise specified. Accordingly, within the scope of the present disclosure, a first component in one embodiment may be referred to as a second component in another embodiment, and similarly, a second component in one embodiment may be referred to as a first component in another embodiment.
In the present disclosure, the components distinguished from each other are intended to clearly describe each feature, and do not mean that the components must be separated. That is, a plurality of components may be integrated and implemented in one hardware or software unit, or one component may be distributed and implemented in a plurality of hardware or software units. Accordingly, embodiments in which components are integrated or distributed are included within the scope of the present disclosure, even if not otherwise stated.
In the present disclosure, components described in the respective embodiments are not necessarily indispensable components, and some components may be optional components. Accordingly, embodiments consisting of a subset of the components described in the embodiments are also included within the scope of the present disclosure. Additionally, embodiments that include other components in addition to those described in the various embodiments are included within the scope of the present disclosure.
The present disclosure relates to encoding and decoding of images, and terms used in the present disclosure may have general meanings commonly used in the art to which the present disclosure belongs, unless re-defined in the present disclosure.
In this disclosure, "video" may refer to a series of images over time. A "picture" generally refers to the unit representing one image at a particular time, and a slice/tile is a coding unit constituting part of a picture in the coding process. One picture may be composed of one or more slices/tiles, and a slice/tile may include one or more Coding Tree Units (CTUs). One picture may include one or more tile groups, and a tile group may include one or more tiles. A brick may represent a rectangular region of CTU rows within a tile in a picture. One tile may be divided into a plurality of bricks, and each brick may consist of one or more CTU rows within the tile. A tile that is not divided into multiple bricks may also be regarded as a brick.
"pixel" or "pel (pel)" may mean the smallest unit that constitutes a picture (or image). In addition, "sample" may be used as a term corresponding to a pixel. The samples may generally represent pixels or values of pixels, as well as pixels/pixel values representing only luminance components or pixels/pixel values representing only chrominance components.
In the present disclosure, a "unit" may represent a basic unit of image processing. A unit may include at least one of a specific region of a picture and information related to the region. One unit may include one luminance block and two chrominance blocks (e.g., Cb and Cr). In some cases, "unit" may be used interchangeably with terms such as "sample array", "block", or "region". In general, an M×N block may include a set (or array) of samples (or a sample array) or transform coefficients consisting of M columns and N rows.
In the present disclosure, "current block" may mean one of "current encoding block", "current encoding unit", "encoding target block", "decoding target block", or "processing target block". When prediction is performed, "current block" may mean "current prediction block" or "prediction target block". When transform (inverse transform)/quantization (dequantization) is performed, the "current block" may mean a "current transform block" or a "transform target block". When performing filtering, "current block" may mean "filtering target block".
In addition, in the present disclosure, the "current block" may mean "a luminance block of the current block" unless explicitly stated as a chrominance block. The "chroma block of the current block" may be expressed by including an explicit description of a chroma block such as "chroma block" or "current chroma block".
In this disclosure, a slash "/" or a comma should be interpreted as indicating "and/or". For example, the expressions "A/B" and "A, B" may mean "A and/or B". Further, "A/B/C" and "A, B, C" may mean "at least one of A, B and/or C".
In this disclosure, the term "or" should be interpreted as indicating "and/or". For example, the expression "A or B" may include 1) only "A", 2) only "B", and/or 3) "both A and B". In other words, in the present disclosure, the term "or" should be interpreted as indicating "additionally or alternatively".
Overview of a video coding System
Fig. 1 is a view illustrating a video encoding system according to the present disclosure.
A video encoding system according to an embodiment may include a source device 10 and a sink device 20. Source device 10 may deliver the encoded video and/or image information or data to sink device 20 in the form of a file or stream via a digital storage medium or a network.
The source device 10 according to an embodiment may include a video source generator 11, an encoding device 12, and a transmitter 13. The receiving apparatus 20 according to an embodiment may include a receiver 21, a decoding apparatus 22, and a renderer 23. Encoding device 12 may be referred to as a video/image encoding device and decoding device 22 may be referred to as a video/image decoding device. The transmitter 13 may be included in the encoding device 12. The receiver 21 may be comprised in the decoding means 22. The renderer 23 may include a display and the display may be configured as a separate device or an external component.
The video source generator 11 may acquire the video/image through a process of capturing, synthesizing, or generating the video/image. The video source generator 11 may comprise a video/image capturing device and/or a video/image generating device. The video/image capture device may include, for example, one or more cameras, video/image archives including previously captured video/images, and the like. The video/image generating device may include, for example, a computer, a tablet computer, and a smartphone, and may generate (electronically) a video/image. For example, the virtual video/image may be generated by a computer or the like. In this case, the video/image capturing process may be replaced by a process of generating the relevant data.
The encoding device 12 may encode the input video/image. For compression and coding efficiency, the encoding device 12 may perform a series of processes such as prediction, transformation, and quantization. The encoding device 12 may output the encoded data (encoded video/image information) in the form of a bitstream.
The transmitter 13 may transmit the encoded video/image information or data output in the form of a bitstream to the receiver 21 of the reception apparatus 20 in the form of a file or a stream through a digital storage medium or a network. The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, and the like. The transmitter 13 may include elements for generating a media file through a predetermined file format and may include elements for transmission through a broadcast/communication network. The receiver 21 may extract/receive a bitstream from a storage medium or a network and transmit the bitstream to the decoding apparatus 22.
The decoding device 22 may perform decoding on the video/image by performing a series of processes corresponding to the operations of the encoding device 12, such as dequantization, inverse transformation, and prediction.
The renderer 23 may render the decoded video/image. The rendered video/image may be displayed by a display.
Overview of image encoding apparatus
Fig. 2 is a view schematically showing an image encoding apparatus to which an embodiment of the present disclosure is applied.
As shown in Fig. 2, the image encoding apparatus 100 may include an image divider 110, a subtractor 115, a transformer 120, a quantizer 130, a dequantizer 140, an inverse transformer 150, an adder 155, a filter 160, a memory 170, an inter predictor 180, an intra predictor 185, and an entropy encoder 190. The inter predictor 180 and the intra predictor 185 may be collectively referred to as a "predictor". The transformer 120, the quantizer 130, the dequantizer 140, and the inverse transformer 150 may be included in a residual processor. The residual processor may also include the subtractor 115.
In some embodiments, all or at least some of the components configuring the image encoding apparatus 100 may be configured by one hardware component (e.g., an encoder or a processor). In addition, the memory 170 may include a Decoded Picture Buffer (DPB) and may be configured by a digital storage medium.
The image divider 110 may divide an input image (or picture or frame) input to the image encoding apparatus 100 into one or more processing units. For example, a processing unit may be referred to as a Coding Unit (CU). The coding unit may be acquired by recursively partitioning a Coding Tree Unit (CTU) or a Largest Coding Unit (LCU) according to a quadtree/binary tree/ternary tree (QT/BT/TT) structure. For example, one coding unit may be divided into a plurality of coding units of deeper depths based on a quadtree structure, a binary tree structure, and/or a ternary tree structure. For the partitioning of the coding unit, a quadtree structure may be applied first, and a binary tree structure and/or a ternary tree structure may be applied later. The encoding process according to the present disclosure may be performed based on the final coding unit that is not divided any more. The maximum coding unit may be used as the final coding unit, and a coding unit of a deeper depth obtained by dividing the maximum coding unit may also be used as the final coding unit. Here, the encoding process may include processes of prediction, transformation, and reconstruction, which will be described later. As another example, the processing unit of the encoding process may be a Prediction Unit (PU) or a Transform Unit (TU). The prediction unit and the transform unit may be divided or partitioned from the final coding unit. The prediction unit may be a sample prediction unit, and the transform unit may be a unit for deriving transform coefficients and/or a unit for deriving residual signals from the transform coefficients.
The predictor (the inter predictor 180 or the intra predictor 185) may perform prediction on a block to be processed (a current block) and generate a prediction block including predicted samples of the current block. The predictor may determine whether to apply intra prediction or inter prediction on the basis of the current block or CU. The predictor may generate various information related to prediction of the current block and transmit the generated information to the entropy encoder 190. The information on the prediction may be encoded in the entropy encoder 190 and output in the form of a bitstream.
The intra predictor 185 may predict the current block by referring to samples in the current picture. The reference samples may be located in the neighborhood of the current block or may be located apart from the current block, depending on the intra prediction mode and/or intra prediction technique. The intra prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The non-directional modes may include, for example, a DC mode and a planar mode. Depending on the degree of detail of the prediction direction, the directional modes may include, for example, 33 directional prediction modes or 65 directional prediction modes. However, this is merely an example, and more or fewer directional prediction modes may be used depending on the setting. The intra predictor 185 may determine a prediction mode applied to the current block by using a prediction mode applied to a neighboring block.
The inter predictor 180 may derive a prediction block of the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture. In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, sub-blocks, or samples based on the correlation of motion information between neighboring blocks and the current block. The motion information may include a motion vector and a reference picture index. The motion information may also include inter prediction direction (L0 prediction, L1 prediction, bi prediction, etc.) information. In the case of inter prediction, the neighboring blocks may include spatially neighboring blocks existing in a current picture and temporally neighboring blocks existing in a reference picture. The reference picture including the reference block and the reference picture including the temporally adjacent block may be the same or different. The temporally neighboring blocks may be referred to as collocated reference blocks, collocated CUs (colcus), etc. A reference picture including temporally adjacent blocks may be referred to as a collocated picture (colPic). For example, the inter predictor 180 may configure a motion information candidate list based on neighboring blocks and generate information indicating which candidate is used to derive a motion vector and/or a reference picture index of the current block. Inter prediction may be performed based on various prediction modes. For example, in case of the skip mode and the merge mode, the inter predictor 180 may use motion information of neighboring blocks as motion information of the current block. In case of the skip mode, unlike the merge mode, the residual signal may not be transmitted. In case of a Motion Vector Prediction (MVP) mode, motion vectors of neighboring blocks may be used as a motion vector predictor, and a motion vector of a current block may be signaled by encoding a motion vector difference and an indicator of the motion vector predictor. The motion vector difference may mean a difference between a motion vector of the current block and the motion vector predictor.
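The MVP relationship above reduces to simple vector arithmetic; a small illustrative sketch (the values in the example are made up):

```cpp
struct MotionVector { int x, y; }; // e.g., in quarter-sample units

// MVD relationship described above: the encoder transmits only the
// difference between the current block's motion vector and the
// predictor taken from a neighboring block.
MotionVector mvd(MotionVector mv, MotionVector mvp) {
    return { mv.x - mvp.x, mv.y - mvp.y };
}
// The decoder reverses it: mv = mvp + mvd.
// e.g., mv = (18, -7), mvp = (16, -4)  ->  mvd = (2, -3)
```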
The predictor may generate a prediction signal based on various prediction methods and prediction techniques described below. For example, the predictor may apply not only intra prediction or inter prediction, but also both intra prediction and inter prediction at the same time to predict the current block. A prediction method of predicting a current block by applying both intra prediction and inter prediction at the same time may be referred to as Combined Inter and Intra Prediction (CIIP). In addition, the predictor may perform Intra Block Copy (IBC) to predict the current block. Intra block copy may be used for content image/video coding of games and the like, e.g., Screen Content Coding (SCC). IBC is a method of predicting a current picture using a previously reconstructed reference block in the current picture at a position spaced apart from a current block by a predetermined distance. When IBC is applied, the position of the reference block in the current picture may be encoded as a vector (block vector) corresponding to a predetermined distance. IBC basically performs prediction in a current picture, but may be performed similarly to inter prediction because a reference block is derived within the current picture. That is, IBC may use at least one of the inter prediction techniques described in this disclosure.
The prediction signal generated by the predictor may be used to generate a reconstructed signal or to generate a residual signal. The subtractor 115 may generate a residual signal (residual block or residual sample array) by subtracting a prediction signal (prediction block or prediction sample array) output from the predictor from an input image signal (original block or original sample array). The generated residual signal may be transmitted to the transformer 120.
The transformer 120 may generate the transform coefficient by applying a transform technique to the residual signal. For example, the transform technique may include at least one of a Discrete Cosine Transform (DCT), a Discrete Sine Transform (DST), a karhunen-lo eve transform (KLT), a graph-based transform (GBT), or a Conditional Nonlinear Transform (CNT). Here, the GBT refers to a transform obtained from a graph when relationship information between pixels is represented by the graph. CNT refers to a transform obtained based on a prediction signal generated using all previously reconstructed pixels. In addition, the transform process may be applied to square pixel blocks having the same size or may be applied to blocks having a variable size other than a square.
The quantizer 130 may quantize the transform coefficients and send them to the entropy encoder 190. The entropy encoder 190 may encode a quantized signal (information on quantized transform coefficients) and output a bitstream. Information on the quantized transform coefficients may be referred to as residual information. The quantizer 130 may rearrange the quantized transform coefficients of the block type into a one-dimensional vector form based on the coefficient scan order, and generate information about the quantized transform coefficients based on the quantized transform coefficients of the one-dimensional vector form.
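To illustrate the rearrangement described above, the following sketch flattens an N×N block of quantized coefficients along an up-right diagonal order; this is one common scan order, and the order actually used depends on the codec and coding mode:

```cpp
#include <vector>

// Sketch of rearranging an N x N block of quantized transform
// coefficients into a one-dimensional vector along anti-diagonals.
std::vector<int> diagonal_scan(const std::vector<std::vector<int>>& block)
{
    const int n = static_cast<int>(block.size());
    std::vector<int> out;
    out.reserve(n * n);
    for (int d = 0; d < 2 * n - 1; ++d)    // visit each anti-diagonal
        for (int y = 0; y < n; ++y) {
            const int x = d - y;           // x + y == d on this diagonal
            if (x >= 0 && x < n)
                out.push_back(block[y][x]);
        }
    return out;
}
```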
The entropy encoder 190 may perform various encoding methods, such as, for example, exponential Golomb coding, Context-Adaptive Variable Length Coding (CAVLC), Context-Adaptive Binary Arithmetic Coding (CABAC), and so on. The entropy encoder 190 may encode information (e.g., values of syntax elements, etc.) required for video/image reconstruction other than the quantized transform coefficients together or separately. Encoded information (e.g., encoded video/image information) may be transmitted or stored in Network Abstraction Layer (NAL) units in the form of a bitstream. The video/image information may also include information on various parameter sets, such as an Adaptive Parameter Set (APS), a Picture Parameter Set (PPS), a Sequence Parameter Set (SPS), or a Video Parameter Set (VPS). In addition, the video/image information may also include general constraint information. The signaled information, the transmitted information, and/or the syntax elements described in this disclosure may be encoded by the above-described encoding process and included in the bitstream.
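Of the methods listed above, exponential Golomb coding is compact enough to show directly. A minimal sketch of a 0th-order unsigned exponential-Golomb (ue(v)) encoder, returning the codeword as a bit string for readability:

```cpp
#include <cstdint>
#include <string>

// ue(v) codes a value v as [k zeros][(k+1)-bit binary of v + 1],
// where k = floor(log2(v + 1)); small values get short codewords.
std::string encode_ue(uint32_t v)
{
    const uint64_t code = static_cast<uint64_t>(v) + 1; // 1-based codeword
    int nbits = 0;
    for (uint64_t t = code; t > 0; t >>= 1) ++nbits;    // bit length of code
    std::string s(nbits - 1, '0');                      // k leading zeros
    for (int i = nbits - 1; i >= 0; --i)                // then the codeword bits
        s += ((code >> i) & 1) ? '1' : '0';
    return s;
}
// e.g., 0 -> "1", 1 -> "010", 2 -> "011", 3 -> "00100"
```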
The bitstream may be transmitted through a network or may be stored in a digital storage medium. The network may include a broadcasting network and/or a communication network, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. A transmitter (not shown) transmitting the signal output from the entropy encoder 190 and/or a storage unit (not shown) storing the signal may be included as an internal/external element of the image encoding apparatus 100. Alternatively, a transmitter may be provided as a component of the entropy encoder 190.
The quantized transform coefficients output from the quantizer 130 may be used to generate a residual signal. For example, a residual signal (residual block or residual sample) may be reconstructed by applying dequantization and inverse transform to the quantized transform coefficients by the dequantizer 140 and the inverse transformer 150.
The adder 155 adds the reconstructed residual signal to a prediction signal output from the inter predictor 180 or the intra predictor 185 to generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array). If the block to be processed has no residual, such as the case where the skip mode is applied, the prediction block may be used as a reconstructed block. The adder 155 may be referred to as a reconstructor or a reconstruction block generator. The generated reconstructed signal may be used for intra prediction of a next block to be processed in a current picture and may be used for inter prediction of a next picture through filtering as described below.
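The addition performed here can be summarized in one line per sample; a sketch with the usual clipping to the valid sample range (the bit depth is an assumed parameter):

```cpp
#include <algorithm>
#include <cstdint>

// Reconstruction described above: reconstructed sample = prediction +
// residual, clipped to the valid range (e.g., [0, 1023] for 10-bit video).
uint16_t reconstruct(int pred, int resid, int bitDepth = 10)
{
    const int maxVal = (1 << bitDepth) - 1;
    return static_cast<uint16_t>(std::clamp(pred + resid, 0, maxVal));
}
```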
The filter 160 may improve subjective/objective image quality by applying filtering to the reconstructed signal. For example, the filter 160 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture and store the modified reconstructed picture in the memory 170, and in particular, in the DPB of the memory 170. Various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filtering, bilateral filtering, and so on. The filter 160 may generate various information related to filtering and transmit the generated information to the entropy encoder 190, as described later in the description of each filtering method. The information related to the filtering may be encoded by the entropy encoder 190 and output in the form of a bitstream.
The modified reconstructed picture sent to the memory 170 may be used as a reference picture in the inter predictor 180. When inter prediction is applied through the image encoding apparatus 100, prediction mismatch between the image encoding apparatus 100 and the image decoding apparatus may be avoided and encoding efficiency may be improved.
The DPB of the memory 170 may store the modified reconstructed picture to be used as a reference picture in the inter predictor 180. The memory 170 may store motion information of a block for deriving (or encoding) motion information in a current picture and/or motion information of an already reconstructed block in the picture. The stored motion information may be transmitted to the inter predictor 180 and used as motion information of a spatially neighboring block or motion information of a temporally neighboring block. The memory 170 may store reconstructed samples of the reconstructed block in the current picture and may transfer the reconstructed samples to the intra predictor 185.
Overview of image decoding apparatus
Fig. 3 is a view schematically showing an image decoding apparatus to which an embodiment of the present disclosure is applied.
As shown in Fig. 3, the image decoding apparatus 200 may include an entropy decoder 210, a dequantizer 220, an inverse transformer 230, an adder 235, a filter 240, a memory 250, an inter predictor 260, and an intra predictor 265. The inter predictor 260 and the intra predictor 265 may be collectively referred to as a "predictor". The dequantizer 220 and the inverse transformer 230 may be included in a residual processor.
According to an embodiment, all or at least some of the components configuring the image decoding apparatus 200 may be configured by a hardware component (e.g., a decoder or a processor). In addition, the memory 250 may include a Decoded Picture Buffer (DPB) or may be configured by a digital storage medium.
The image decoding apparatus 200, which has received a bitstream including video/image information, may reconstruct an image by performing a process corresponding to the process performed by the image encoding apparatus 100 of Fig. 2. For example, the image decoding apparatus 200 may perform decoding using a processing unit applied in the image encoding apparatus. Thus, the processing unit of decoding may be, for example, a coding unit. The coding unit may be acquired by dividing a coding tree unit or a maximum coding unit. The reconstructed image signal decoded and output by the image decoding apparatus 200 may be reproduced by a reproducing device (not shown).
The image decoding apparatus 200 may receive a signal output from the image encoding apparatus of Fig. 2 in the form of a bitstream. The received signal may be decoded by the entropy decoder 210. For example, the entropy decoder 210 may parse the bitstream to derive information (e.g., video/image information) required for image reconstruction (or picture reconstruction). The video/image information may also include information on various parameter sets, such as an Adaptive Parameter Set (APS), a Picture Parameter Set (PPS), a Sequence Parameter Set (SPS), or a Video Parameter Set (VPS). In addition, the video/image information may also include general constraint information. The image decoding apparatus may also decode the picture based on the information on the parameter set and/or the general constraint information. The signaled/received information and/or syntax elements described in this disclosure may be decoded and obtained from the bitstream by a decoding process. For example, the entropy decoder 210 decodes information in a bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and outputs values of syntax elements required for image reconstruction and quantized values of transform coefficients of a residual. More specifically, the CABAC entropy decoding method may receive a bin corresponding to each syntax element in a bitstream, determine a context model using decoding target syntax element information, decoding information of a neighboring block and the decoding target block, or information of a symbol/bin decoded in a previous stage, perform arithmetic decoding on the bin by predicting an occurrence probability of the bin according to the determined context model, and generate a symbol corresponding to a value of each syntax element. In this case, the CABAC entropy decoding method may update the context model by using information of the decoded symbol/bin for the context model of the next symbol/bin after determining the context model. Information related to prediction among the information decoded by the entropy decoder 210 may be provided to the predictors (the inter predictor 260 and the intra predictor 265), and residual values (that is, quantized transform coefficients and related parameter information) on which entropy decoding is performed in the entropy decoder 210 may be input to the dequantizer 220. In addition, information regarding filtering among information decoded by the entropy decoder 210 may be provided to the filter 240. In addition, a receiver (not shown) for receiving a signal output from the image encoding apparatus may be further configured as an internal/external element of the image decoding apparatus 200, or the receiver may be a component of the entropy decoder 210.
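The context-model adaptation described above can be illustrated with a deliberately simplified probability estimator; this is not the standard's exact state machine or table set, only the general idea that each context tracks how likely a "1" bin is and adapts with every decoded bin:

```cpp
#include <cstdint>

// Simplified context model: an estimate of P(bin == 1) in Q15 fixed
// point, nudged toward each decoded bin so that frequently observed
// values become cheaper to code.
struct ContextModel {
    uint16_t probOne = 1u << 14;    // start near 0.5

    void update(int bin)            // bin is 0 or 1
    {
        const int kShift = 5;       // adaptation rate (window size)
        const int target = bin << 15;               // 0.0 or 1.0 in Q15
        probOne = static_cast<uint16_t>(
            probOne + ((target - static_cast<int>(probOne)) >> kShift));
    }
};
```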
Further, the image decoding apparatus according to the present disclosure may be referred to as a video/image/picture decoding apparatus. The image decoding apparatus can be classified into an information decoder (video/image/picture information decoder) and a sample decoder (video/image/picture sample decoder). The information decoder may include an entropy decoder 210. The sample decoder may include at least one of a dequantizer 220, an inverse transformer 230, an adder 235, a filter 240, a memory 250, an inter predictor 260, or an intra predictor 265.
The dequantizer 220 may dequantize the quantized transform coefficients and output the transform coefficients. The dequantizer 220 may rearrange the quantized transform coefficients in the form of a two-dimensional block. In this case, the rearrangement may be performed based on the coefficient scanning order performed in the image encoding apparatus. The dequantizer 220 may perform dequantization on the quantized transform coefficient by using a quantization parameter (e.g., quantization step information) and obtain a transform coefficient.
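As a rough illustration of this scaling, in H.26x-style codecs the quantization step approximately doubles every 6 QP values; the sketch below uses that floating-point approximation, whereas the actual process is defined with integer scaling tables and shifts:

```cpp
#include <cmath>

// Illustrative dequantization: each quantized level is scaled back by
// a step derived from the quantization parameter (QP).
double dequantize(int level, int qp)
{
    const double step = std::pow(2.0, (qp - 4) / 6.0); // approximate Qstep(QP)
    return level * step;
}
```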
Inverse transformer 230 may inverse transform the transform coefficients to obtain a residual signal (residual block, residual sample array).
The predictor may perform prediction on the current block and generate a prediction block including prediction samples of the current block. The predictor may determine whether to apply intra prediction or inter prediction to the current block based on information on prediction output from the entropy decoder 210, and may determine a specific intra/inter prediction mode (prediction technique).
As described for the predictor of the image encoding apparatus 100, the predictor may generate a prediction signal based on various prediction methods (techniques) which will be described later.
The intra predictor 265 can predict the current block by referring to samples in the current picture. The description of the intra predictor 185 is equally applicable to the intra predictor 265.
The inter predictor 260 may derive a prediction block of the current block based on a reference block (reference sample array) on a reference picture specified by a motion vector. In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, sub-blocks, or samples based on the correlation of motion information between neighboring blocks and the current block. The motion information may include a motion vector and a reference picture index. The motion information may also include inter prediction direction (L0 prediction, L1 prediction, bi prediction, etc.) information. In the case of inter prediction, the neighboring blocks may include spatially neighboring blocks existing in a current picture and temporally neighboring blocks existing in a reference picture. For example, the inter predictor 260 may configure a motion information candidate list based on neighboring blocks and derive a motion vector and/or a reference picture index of the current block based on the received candidate selection information. Inter prediction may be performed based on various prediction modes, and the information regarding prediction may include information indicating an inter prediction mode of the current block.
The adder 235 may generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the obtained residual signal to a prediction signal (prediction block, predicted sample array) output from a predictor (including the inter predictor 260 and/or the intra predictor 265). If the block to be processed has no residual, such as when skip mode is applied, the predicted block may be used as a reconstructed block. The description of adder 155 applies equally to adder 235. Adder 235 may be referred to as a reconstructor or reconstruction block generator. The generated reconstructed signal may be used for intra prediction of a next block to be processed in a current picture and may be used for inter prediction of a next picture through filtering as described below.
The filter 240 may improve subjective/objective image quality by applying filtering to the reconstructed signal. For example, the filter 240 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture and store the modified reconstructed picture in the memory 250, and in particular, in the DPB of the memory 250. Various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filtering, bilateral filtering, etc.
The (modified) reconstructed pictures stored in the DPB of the memory 250 may be used as reference pictures in the inter predictor 260. The memory 250 may store motion information for a block from which motion information in a current picture is derived (or decoded) and/or motion information for blocks in a picture that have been reconstructed. The stored motion information may be transmitted to the inter predictor 260 to be used as motion information of a spatially neighboring block or motion information of a temporally neighboring block. The memory 250 may store reconstructed samples of a reconstructed block in a current picture and transfer the reconstructed samples to the intra predictor 265.
In the present disclosure, the embodiments described for the filter 160, the inter predictor 180, and the intra predictor 185 of the image encoding apparatus 100 may be equally or correspondingly applied to the filter 240, the inter predictor 260, and the intra predictor 265 of the image decoding apparatus 200.
Overview of image segmentation
The video/image encoding method according to the present disclosure may be performed based on the following image segmentation structure. In particular, the processes of prediction, residual processing ((inverse) transform, (de)quantization, etc.), syntax element encoding, and filtering, which will be described later, may be performed based on CTUs, CUs (and/or TUs, PUs) derived from the image segmentation structure. The image may be segmented in block units, and the block segmentation process may be performed in the image divider 110 of the encoding apparatus. The segmentation-related information may be encoded by the entropy encoder 190 and transmitted to the decoding apparatus in the form of a bitstream. The entropy decoder 210 of the decoding apparatus may derive a block division structure of a current picture based on division-related information obtained from a bitstream, and based on this, a series of processes (e.g., prediction, residual processing, block/picture reconstruction, in-loop filtering, etc.) may be performed for image decoding.
A picture may be partitioned into a sequence of Coding Tree Units (CTUs). Fig. 4 shows an example in which a picture is divided into CTUs. The CTU may correspond to a Coding Tree Block (CTB). Alternatively, the CTU may include a coding tree block of luma samples and two coding tree blocks of corresponding chroma samples. For example, for a picture containing three arrays of samples, a CTU may include an N×N block of luma samples and two corresponding blocks of chroma samples. The maximum allowable size of the CTU for encoding and prediction may be different from the maximum allowable size of the CTU for transform. For example, even if the maximum size of a luminance transform block is 64 × 64, the maximum allowable size of a luminance block in a CTU may be 128 × 128.
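For example, the number of CTUs covering a picture follows directly from rounding the picture dimensions up to whole CTUs:

```cpp
// Partial CTUs at the right and bottom picture edges still count,
// hence the rounding up.
int ctus_in_picture(int picWidth, int picHeight, int ctuSize /* e.g. 128 */)
{
    const int cols = (picWidth  + ctuSize - 1) / ctuSize;
    const int rows = (picHeight + ctuSize - 1) / ctuSize;
    return cols * rows;
}
// e.g., a 1920 x 1080 picture with 128 x 128 CTUs -> 15 x 9 = 135 CTUs
```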
Overview of CTU segmentation
As described above, the coding unit may be acquired by recursively dividing a Coding Tree Unit (CTU) or a Largest Coding Unit (LCU) according to a quadtree/binary tree/ternary tree (QT/BT/TT) structure. For example, a CTU may be first partitioned into a quadtree structure. Thereafter, the leaf nodes of the quadtree structure may be further partitioned by the multi-type tree structure.
Partitioning according to a quadtree means that the current CU (or CTU) is equally partitioned into four. By the division according to the quadtree, the current CU can be divided into four CUs having the same width and the same height. When the current CU is no longer partitioned into a quadtree structure, the current CU corresponds to a leaf node of the quadtree structure. CUs corresponding to leaf nodes of the quadtree structure may no longer be partitioned and may be used as the above-described final coding unit. Alternatively, the CUs corresponding to the leaf nodes of the quadtree structure may be further divided by a multi-type tree structure.
FIG. 5 is a diagram illustrating an embodiment of partition types for blocks according to a multi-type tree structure. The partitioning according to the multi-type tree structure may include two types of partitioning according to a binary tree structure and two types of partitioning according to a ternary tree structure.
Two types of splitting according to the binary tree structure may include a vertical binary split (SPLIT_BT_VER) and a horizontal binary split (SPLIT_BT_HOR). The vertical binary split (SPLIT_BT_VER) means that the current CU is split equally into two in the vertical direction. As shown in fig. 5, by the vertical binary split, two CUs having the same height as the current CU and half the width of the current CU may be generated. The horizontal binary split (SPLIT_BT_HOR) means that the current CU is split equally into two in the horizontal direction. As shown in fig. 5, by the horizontal binary split, two CUs having half the height of the current CU and the same width as the current CU may be generated.
Two types of splitting according to the ternary tree structure may include a vertical ternary split (SPLIT_TT_VER) and a horizontal ternary split (SPLIT_TT_HOR). In the vertical ternary split (SPLIT_TT_VER), the current CU is split in the vertical direction at a ratio of 1:2:1. As shown in fig. 5, by the vertical ternary split, two CUs having the same height as the current CU and a width of 1/4 of the width of the current CU, and one CU having the same height as the current CU and a width of half the width of the current CU, may be generated. In the horizontal ternary split (SPLIT_TT_HOR), the current CU is split in the horizontal direction at a ratio of 1:2:1. As shown in fig. 5, by the horizontal ternary split, two CUs having a height of 1/4 of the height of the current CU and the same width as the current CU, and one CU having a height of half the height of the current CU and the same width as the current CU, may be generated.
Fig. 6 is a view illustrating a signaling mechanism of block partition information in a quad tree having a nested multi-type tree structure according to the present disclosure.
Here, the CTU is treated as the root node of the quadtree and is first partitioned into a quadtree structure. Information (e.g., qt_split_flag) indicating whether quadtree splitting is performed on the current CU (CTU or node (QT_node) of the quadtree) may be signaled. For example, when qt_split_flag has a first value (e.g., "1"), the current CU may be quadtree-split. In addition, when qt_split_flag has a second value (e.g., "0"), the current CU is not quadtree-split, but becomes a leaf node (QT_leaf_node) of the quadtree. Each quadtree leaf node may then be further partitioned into a multi-type tree structure. That is, a leaf node of the quadtree may become a node (MTT_node) of the multi-type tree. In the multi-type tree structure, a first flag (e.g., mtt_split_cu_flag) may be signaled to indicate whether the current node is additionally partitioned. If the corresponding node is additionally partitioned (e.g., if the first flag is 1), a second flag (e.g., mtt_split_cu_vertical_flag) may be signaled to indicate the splitting direction. For example, the splitting direction may be the vertical direction when the second flag is 1, and may be the horizontal direction when the second flag is 0. Then, a third flag (e.g., mtt_split_cu_binary_flag) may be signaled to indicate whether the split type is a binary split type or a ternary split type. For example, the split type may be the binary split type when the third flag is 1, and may be the ternary split type when the third flag is 0. A node of the multi-type tree obtained by binary or ternary splitting may be further partitioned into a multi-type tree structure. However, a node of the multi-type tree may not be partitioned into a quadtree structure. If the first flag is 0, the corresponding node of the multi-type tree is no longer split, but becomes a leaf node (MTT_leaf_node) of the multi-type tree. A CU corresponding to a leaf node of the multi-type tree may be used as the above-described final coding unit.
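For illustration only, the signaling order described above can be sketched in Python as follows; read_flag is an assumed bit-reader callback, the logic is simplified (qt_split_flag is only read while the node is still a quadtree node), and nothing here is a normative decoding process.

```python
# A minimal sketch of the flag signaling order for one quadtree node.
def parse_node_flags(read_flag):
    """Read the split flags of one node in the order described in the text."""
    if read_flag():                 # qt_split_flag == 1
        return {"split": "QT"}      # quadtree split: recurse on four children
    if not read_flag():             # mtt_split_cu_flag == 0
        return {"split": None}      # leaf node: final coding unit
    vertical = read_flag()          # mtt_split_cu_vertical_flag
    binary = read_flag()            # mtt_split_cu_binary_flag
    return {"split": "MTT", "vertical": vertical, "binary": binary}

bits = iter([0, 1, 1, 0])           # example: not QT-split, MTT vertical ternary
read_flag = lambda: next(bits)
print(parse_node_flags(read_flag))  # {'split': 'MTT', 'vertical': 1, 'binary': 0}
```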
Based on mtt_split_cu_vertical_flag and mtt_split_cu_binary_flag, the multi-type tree split mode (MttSplitMode) of a CU may be derived as shown in Table 1 below. In the following description, the multi-type tree split mode may be referred to as a multi-tree split type or a split type.
[ Table 1]
MttSplitMode mtt_split_cu_vertical_flag mtt_split_cu_binary_flag
SPLIT_TT_HOR 0 0
SPLIT_BT_HOR 0 1
SPLIT_TT_VER 1 0
SPLIT_BT_VER 1 1
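Table 1 can be rendered directly as a lookup from the two flags to the split mode; the following is just a transcription of the table above into Python.

```python
# Table 1 as a dictionary:
# (mtt_split_cu_vertical_flag, mtt_split_cu_binary_flag) -> MttSplitMode
MTT_SPLIT_MODE = {
    (0, 0): "SPLIT_TT_HOR",
    (0, 1): "SPLIT_BT_HOR",
    (1, 0): "SPLIT_TT_VER",
    (1, 1): "SPLIT_BT_VER",
}

print(MTT_SPLIT_MODE[(1, 1)])  # SPLIT_BT_VER
```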
Fig. 7 is a view showing an example of dividing a CTU into CUs by applying a multi-type tree after applying a quadtree. In fig. 7, the bold block edges 710 represent quadtree partitioning, and the remaining edges 720 represent multi-type tree partitioning. A CU may correspond to a Coding Block (CB). In an embodiment, a CU may include a coding block of luma samples and two coding blocks of chroma samples corresponding to the luma samples. The chroma component (sample) CB or TB size may be derived based on the luma component (sample) CB or TB size according to the component ratio of the color format (chroma format, e.g., 4:4:4, 4:2:2, or 4:2:0) of the picture/image. In the case of the 4:4:4 chroma format, the chroma CB/TB size may be set equal to the luma CB/TB size. In the case of the 4:2:2 chroma format, the width of the chroma CB/TB may be set to half the width of the luma CB/TB, and the height of the chroma CB/TB may be set equal to the height of the luma CB/TB. In the case of the 4:2:0 chroma format, the width and height of the chroma CB/TB may be set to half the width and height of the luma CB/TB.
In an embodiment, when the size of the CTU is 128 based on the luma sample unit, the size of the CU may range from 128×128, which is the same size as the CTU, to 4×4. In an embodiment, in the case of the 4:2:0 chroma format, the chroma CB size may range from 64×64 to 2×2.
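The chroma size derivation described above can be sketched as follows; the subsampling factors per chroma format are standard, but the function itself is an illustrative assumption rather than spec text.

```python
# Horizontal/vertical subsampling factors per chroma format.
SUBSAMPLING = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}

def chroma_block_size(luma_w: int, luma_h: int, chroma_format: str):
    """Derive a chroma block size from the corresponding luma block size."""
    sw, sh = SUBSAMPLING[chroma_format]
    return luma_w // sw, luma_h // sh

print(chroma_block_size(128, 128, "4:2:0"))  # (64, 64)
print(chroma_block_size(4, 4, "4:2:0"))      # (2, 2)
```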
Furthermore, in an embodiment, the CU size and TU size may be the same. Alternatively, there may be multiple TUs in a CU region. The TU size may generally represent a luminance component (sample) Transform Block (TB) size.
The TU size may be derived based on the maximum allowed TB size maxTbSize as a predetermined value. For example, when the CU size is larger than maxTbSize, a plurality of TUs (TBs) having maxTbSize may be derived from the CU, and transform/inverse transform may be performed in units of TUs (TBs). For example, the maximum allowable luminance TB size may be 64 × 64 and the maximum allowable chrominance TB size may be 32 × 32. If the width or height of a CB split according to the tree structure is greater than the maximum transform width or height, the CB may be automatically (or implicitly) split until TB size limits in the horizontal and vertical directions are met.
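The implicit splitting into transform blocks can be sketched as follows, under the assumption that an oversized CB is simply tiled with TBs of the maximum allowed size; the tiling order and names are illustrative.

```python
# A sketch of implicit TB splitting when a CB exceeds the maximum TB size.
def split_into_tbs(cb_w: int, cb_h: int, max_tb_size: int = 64):
    """Yield (x, y, w, h) transform blocks covering a coding block."""
    tb_w = min(cb_w, max_tb_size)
    tb_h = min(cb_h, max_tb_size)
    for y in range(0, cb_h, tb_h):
        for x in range(0, cb_w, tb_w):
            yield x, y, tb_w, tb_h

# A 128x64 CB with maxTbSize 64 -> two 64x64 TBs
print(list(split_into_tbs(128, 64)))  # [(0, 0, 64, 64), (64, 0, 64, 64)]
```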
In addition, for example, when intra prediction is applied, an intra prediction mode/type may be derived in units of CU (or CB), and the neighboring reference sample derivation and prediction sample generation process may be performed in units of TU (or TB). In this case, there may be one or more TUs (or TBs) in one CU (or CB) region, and in this case, a plurality of TUs or (TBs) may share the same intra prediction mode/type.
Further, for the quadtree coding tree scheme with the nested multi-type tree, the following parameters may be signaled from the encoding apparatus to the decoding apparatus as SPS syntax elements. For example, at least one of CTU size, a parameter indicating the root node size of the quadtree; MinQTSize, a parameter indicating the minimum allowable quadtree leaf node size; MaxBtSize, a parameter indicating the maximum allowable binary tree root node size; MaxTtSize, a parameter indicating the maximum allowable ternary tree root node size; MaxMttDepth, a parameter indicating the maximum allowable hierarchy depth of multi-type tree splitting from a quadtree leaf node; MinBtSize, a parameter indicating the minimum allowable binary tree leaf node size; or MinTtSize, a parameter indicating the minimum allowable ternary tree leaf node size, may be signaled.
As an embodiment of the quadtree coding tree structure with the nested multi-type tree, the CTU size may be set to 128×128 luma sample blocks and two corresponding 64×64 chroma sample blocks (in the 4:2:0 chroma format). In this case, MinQTSize may be set to 16×16, MaxBtSize may be set to 128×128, MaxTtSize may be set to 64×64, MinBtSize and MinTtSize may be set to 4×4, and MaxMttDepth may be set to 4. Quadtree partitioning may be applied to the CTU to generate quadtree leaf nodes. The leaf nodes of the quadtree may be referred to as leaf QT nodes. The size of the quadtree leaf nodes may range from a 16×16 size (e.g., MinQTSize) to a 128×128 size (e.g., the CTU size). If the leaf QT node is 128×128, it may not be additionally split into a binary/ternary tree. This is because, in this case, even if it is split, it exceeds MaxBtSize and MaxTtSize (e.g., 64×64). In other cases, the leaf QT node may be further split into the multi-type tree. Therefore, the leaf QT node is the root node of the multi-type tree, and the leaf QT node may have a multi-type tree depth (mttDepth) of 0. If the multi-type tree depth reaches MaxMttDepth (e.g., 4), further splitting may not be considered. If the width of the multi-type tree node is equal to MinBtSize and less than or equal to 2×MinTtSize, further horizontal splitting may not be considered. If the height of the multi-type tree node is equal to MinBtSize and less than or equal to 2×MinTtSize, further vertical splitting may not be considered. When splitting is not considered, the encoding apparatus may skip the signaling of the split information. In this case, the decoding apparatus may derive the split information with a predetermined value.
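The size-based restrictions above can be sketched as a check of which further splits remain possible for a multi-type tree node; the parameter defaults mirror the example values in the text, and the logic is a simplified illustration, not the normative availability derivation.

```python
# A hedged sketch of the split restrictions described in the text.
def further_splits_allowed(w, h, mtt_depth,
                           max_mtt_depth=4, min_bt=4, min_tt=4):
    """Return which split directions are still considered for an MTT node."""
    if mtt_depth >= max_mtt_depth:
        return []                        # no further MTT split considered
    allowed = []
    if not (w == min_bt and w <= 2 * min_tt):
        allowed.append("horizontal")     # horizontal splitting still considered
    if not (h == min_bt and h <= 2 * min_tt):
        allowed.append("vertical")       # vertical splitting still considered
    return allowed

print(further_splits_allowed(4, 32, 1))  # ['vertical']
```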
Further, one CTU may include a coding block of luma samples (hereinafter referred to as a "luma block") and two coding blocks of chroma samples corresponding thereto (hereinafter referred to as "chroma blocks"). The above-described coding tree scheme may be applied equally or separately to the luma block and the chroma blocks of the current CU. Specifically, the luma block and the chroma blocks in one CTU may be partitioned into the same block TREE structure, and the TREE structure in this case may be represented as SINGLE_TREE. Alternatively, the luma block and the chroma blocks in one CTU may be partitioned into separate block TREE structures, and the TREE structure in this case may be represented as DUAL_TREE. That is, when the CTU is partitioned into dual trees, a block tree structure for the luma block and a block tree structure for the chroma blocks may exist separately. In this case, the block TREE structure for the luma block may be referred to as DUAL_TREE_LUMA, and the block TREE structure for the chroma components may be referred to as DUAL_TREE_CHROMA. For P and B slice/tile groups, the luma block and the chroma blocks in one CTU may be constrained to have the same coding tree structure. However, for I slice/tile groups, the luma block and the chroma blocks may have separate block tree structures. If the separate block tree structure is applied, the luma CTB may be partitioned into CUs based on a particular coding tree structure, and the chroma CTB may be partitioned into chroma CUs based on another coding tree structure. That is, a CU in an I slice/tile group to which the separate block tree structure is applied may include a coding block of the luma component or coding blocks of two chroma components, and a CU in a P or B slice/tile group may include blocks of three color components (one luma component and two chroma components).
Although the quadtree coding tree structure having nested multi-type trees has been described, the structure of partitioning CUs is not limited thereto. For example, a BT structure and a TT structure may be interpreted as concepts included in a multi-partition tree (MPT) structure, and a CU may be interpreted as being partitioned by a QT structure and an MPT structure. In an example of partitioning a CU by a QT structure and an MPT structure, a syntax element (e.g., MPT _ split _ type) including information on how many blocks the leaf node of the QT structure is partitioned into and a syntax element (e.g., MPT _ split _ mode) including information on which of the vertical direction and the horizontal direction the leaf node of the QT structure is partitioned into may be signaled to determine the partition structure.
In another example, a CU may be partitioned in a different manner than a QT structure, a BT structure, or a TT structure. That is, unlike the splitting of a lower-depth CU into 1/4 of a higher-depth CU according to the QT structure, the splitting of a lower-depth CU into 1/2 of a higher-depth CU according to the BT structure, or the splitting of a lower-depth CU into 1/4 or 1/2 of a higher-depth CU according to the TT structure, a lower-depth CU may be split into 1/5, 1/3, 3/8, 3/5, 2/3, or 5/8 of a higher-depth CU in some cases, and the method of splitting a CU is not limited thereto.
A quadtree coding block structure with a multi-type tree can provide a very flexible block partitioning structure. Due to the partition types supported in the multi-type tree, different partition patterns may potentially result in the same coding block structure in some cases. In the encoding apparatus and the decoding apparatus, by limiting the occurrence of such redundant division patterns, the data amount of the division information can be reduced.
In addition, in encoding and decoding of video/images according to the present disclosure, the image processing unit may have a hierarchical structure. A picture may be divided into one or more tiles, bricks, slices, and/or tile groups. A slice may include one or more bricks. A brick may include one or more CTU rows within a tile. A slice may include an integer number of bricks of a picture. A tile group may include one or more tiles. A brick may include one or more CTUs. A CTU may be divided into one or more CUs. A tile may be a rectangular region of CTUs within a particular tile column and a particular tile row in a picture. A tile group may include an integer number of tiles according to the tile raster scan in the picture. The slice header may carry information/parameters applicable to the corresponding slice (the bricks in the slice). When the encoding apparatus or the decoding apparatus has a multi-core processor, the encoding/decoding processes for the tiles, slices, bricks, and/or tile groups may be performed in parallel.
In the present disclosure, the terms slice and tile group may be used interchangeably. That is, the tile group header may be referred to as the slice header. Here, the slice may have one of slice types including an intra (I) slice, a predictive (P) slice, and a bi-predictive (B) slice. For blocks in an I slice, inter prediction is not used for prediction, and only intra prediction may be used. Of course, even in this case, the original sample values may be encoded and signaled without prediction. For blocks in a P slice, intra prediction or inter prediction may be used, and when inter prediction is used, only uni-prediction may be used. For blocks in a B slice, intra prediction or inter prediction may be used, and when inter prediction is used, up to bi-prediction may be used.
The encoding apparatus may determine the tile/tile group, brick, slice, and maximum and minimum coding unit sizes according to the characteristics (e.g., resolution) of the video image or in consideration of coding efficiency and parallel processing. In addition, information about this, or information from which this can be derived, may be included in the bitstream.
The decoding apparatus may acquire information indicating whether the CTUs in a tile/tile group, brick, or slice of the current picture are divided into a plurality of coding units. The encoding apparatus and the decoding apparatus may improve coding efficiency by signaling such information under specific conditions.
The slice header (slice header syntax) may include information/parameters that may be commonly applied to slices. The APS (APS syntax) or PPS (PPS syntax) may include information/parameters that may be commonly applied to one or more pictures. SPS (SPS syntax) may include information/parameters that may be commonly applied to one or more sequences. The VPS (VPS syntax) may include information/parameters that can be commonly applied to a plurality of layers. DPS (DPS syntax) may include information/parameters that may be commonly applied to the entire video. The DPS may include information/parameters associated with a combination of Coded Video Sequences (CVSs).
In addition, for example, information on the division and configuration of the tiles/tile groups/bricks/slices may be constructed through a high level syntax at the encoding stage and transmitted to the decoding apparatus in the form of a bitstream.
Overview of Intra prediction
Hereinafter, the intra prediction performed by the above-described encoding apparatus and decoding apparatus will be described in more detail. The intra prediction may mean prediction for generating prediction samples of the current block based on reference samples in a picture to which the current block belongs (hereinafter, referred to as a current picture).
A description will be given with reference to fig. 8. When intra prediction is applied to the current block 801, neighboring reference samples to be used for intra prediction of the current block 801 may be derived. The neighboring reference samples of the current block of size nW×nH may include: a total of 2×nH samples including samples 811 adjacent to the left boundary of the current block and samples 812 neighboring the bottom-left, a total of 2×nW samples including samples 821 adjacent to the top boundary of the current block and samples 822 neighboring the top-right, and one sample 831 neighboring the top-left of the current block. Alternatively, the neighboring reference samples of the current block may include a plurality of columns of top neighboring samples and a plurality of rows of left neighboring samples.
In addition, the neighboring reference samples of the current block of size nW×nH may include: a total of nH samples 841 adjacent to the right boundary of the current block, a total of nW samples 851 adjacent to the bottom boundary of the current block, and one sample 842 neighboring the bottom-right of the current block.
However, some neighboring reference samples of the current block may not have been decoded yet or may not be available. In this case, the decoding apparatus may construct the neighboring reference samples to be used for prediction by substituting available samples for the unavailable samples. Alternatively, the neighboring reference samples to be used for prediction may be constructed through interpolation of the available samples.
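A possible sketch of the substitution step is shown below; the scan order and the mid-range fallback value are assumptions for illustration, not the normative reference sample padding process.

```python
# A minimal sketch: None marks reference sample positions that are unavailable.
def substitute_unavailable(samples, default=512):
    """Replace unavailable reference samples with the nearest available one.

    If no sample is available at all, fall back to a mid-range default
    (e.g. 1 << (bitDepth - 1)); the scan order here is illustrative.
    """
    if all(s is None for s in samples):
        return [default] * len(samples)
    out = list(samples)
    last = next(s for s in out if s is not None)  # first available sample
    for i, s in enumerate(out):
        if s is None:
            out[i] = last          # propagate the last available sample
        else:
            last = s
    return out

print(substitute_unavailable([None, 100, None, None, 104]))
# [100, 100, 100, 100, 104]
```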
When the neighboring reference samples are derived, (i) the prediction sample may be derived based on an average or interpolation of the neighboring reference samples of the current block, and (ii) the prediction sample may be derived based on a reference sample that exists in a specific (prediction) direction with respect to the prediction sample among the neighboring reference samples of the current block. The case of (i) may be referred to as a non-directional mode or a non-angular mode, and the case of (ii) may be referred to as a directional mode or an angular mode. In addition, the prediction sample may be generated through interpolation between a first neighboring sample located in the prediction direction of the intra prediction mode of the current block and a second neighboring sample located in the opposite direction, with respect to the prediction sample of the current block among the neighboring reference samples. The above case may be referred to as linear interpolation intra prediction (LIP). In addition, chroma prediction samples may be generated based on luma samples using a linear model. This case may be referred to as an LM mode. In addition, a temporary prediction sample of the current block may be derived based on the filtered neighboring reference samples, and the prediction sample of the current block may be derived by a weighted sum of the temporary prediction sample and at least one reference sample derived according to the intra prediction mode among the existing neighboring reference samples (that is, the unfiltered neighboring reference samples). The above case may be referred to as position dependent intra prediction (PDPC). In addition, a reference sample line having the highest prediction accuracy may be selected from among a plurality of neighboring reference sample lines of the current block, and the prediction sample may be derived using a reference sample located in the prediction direction on the corresponding line. At this time, intra prediction encoding may be performed by indicating (signaling) the used reference sample line to the decoding apparatus. The above case may be referred to as multi-reference line (MRL) intra prediction or MRL-based intra prediction. In addition, the current block may be partitioned into vertical or horizontal sub-partitions, intra prediction may be performed based on the same intra prediction mode, and neighboring reference samples may be derived and used in units of sub-partitions. That is, in this case, the intra prediction mode of the current block is equally applied to the sub-partitions, and the neighboring reference samples are derived and used in units of sub-partitions, thereby increasing intra prediction performance in some cases. Such a prediction method may be referred to as intra sub-partitions (ISP) or ISP-based intra prediction. Such intra prediction methods may be referred to as intra prediction types to be distinguished from the intra prediction mode (e.g., DC mode, planar mode, or directional mode). The intra prediction type may be referred to by various terms such as intra prediction technique or additional intra prediction mode. For example, the intra prediction type (or the additional intra prediction mode, etc.) may include at least one of the above-described LIP, PDPC, MRL, or ISP. A general intra prediction method excluding a specific intra prediction type such as LIP, PDPC, MRL, or ISP may be referred to as a general intra prediction type.
The general intra prediction type may refer to a case where a specific intra prediction type is not applied, and prediction may be performed based on the above intra prediction mode. Furthermore, post-filtering may be performed on the derived prediction samples as needed.
Specifically, the intra prediction process may include an intra prediction mode/type determining step, an adjacent reference sample deriving step, and a prediction sample deriving step based on the intra prediction mode/type. In addition, post-filtering may be performed on the derived prediction samples as needed.
Also, in addition to the above-described intra prediction types, affine Linear Weighted Intra Prediction (ALWIP) may be used. ALWIP may be referred to as Linear Weighted Intra Prediction (LWIP), matrix weighted intra prediction (MIP), or matrix based intra prediction. When MIP is applied to the current block, i) neighboring reference samples subjected to an averaging process are used, ii) a matrix vector multiplication process may be performed, and iii) a horizontal/vertical interpolation process may be further performed as necessary, thereby deriving prediction samples of the current block. The intra prediction mode for MIP may be constructed differently from the intra prediction modes used in the above-described LIP, PDPC, MRL, ISP intra prediction, or general intra prediction. The intra prediction mode of MIP may be referred to as MIP intra prediction mode, MIP prediction mode, or MIP mode. For example, the matrix and the offset used in the matrix vector multiplication may be differently set according to the intra prediction mode of MIP. Here, the matrix may be referred to as a (MIP) weight matrix, and the offset may be referred to as a (MIP) offset vector or a (MIP) bias vector. The detailed MIP method will be described below.
For example, the intra-prediction-based block reconstruction process and the intra predictor in the encoding apparatus may schematically include the following. S910 may be performed by the intra predictor 185 of the encoding apparatus, and S920 may be performed by the residual processor of the encoding apparatus including at least one of the subtractor 115, the transformer 120, the quantizer 130, the dequantizer 140, and the inverse transformer 150. Specifically, S920 may be performed by the subtractor 115 of the encoding apparatus. In S930, the prediction information may be derived by the intra predictor 185 and encoded by the entropy encoder 190. In S930, the residual information may be derived by the residual processor and encoded by the entropy encoder 190. The residual information is information about the residual samples. The residual information may include information about quantized transform coefficients of the residual samples. As described above, the residual samples may be derived as transform coefficients by the transformer 120 of the encoding apparatus, and the transform coefficients may be derived as quantized transform coefficients by the quantizer 130. Information about the quantized transform coefficients may be encoded by the entropy encoder 190 through a residual coding process.
The encoding apparatus may perform intra prediction on the current block (S910). The encoding device may derive an intra prediction mode/type of a current block, derive neighboring reference samples of the current block, and generate prediction samples in the current block based on the intra prediction mode/type and the neighboring reference samples. Here, the process for determining the intra prediction mode/type, the process for deriving the neighboring reference samples, and the process for generating the prediction samples may be performed simultaneously, or any one of the processes may be performed before another process. For example, although not shown, the intra predictor 185 of the encoding apparatus may include an intra prediction mode/type determination unit, a reference sample derivation unit, a prediction sample derivation unit. The intra prediction mode/type determination unit may determine an intra prediction mode/type of the current block, the reference sample derivation unit may derive neighboring reference samples of the current block, and the prediction sample derivation unit may derive prediction samples of the current block. In addition, the intra predictor 185 may further include a prediction sample filter when performing a prediction sample filtering process described below. The encoding apparatus may determine a mode/type applied to the current block from among a plurality of intra prediction modes/types. The encoding device may compare the RD costs of the intra prediction modes/types and determine the optimal intra prediction mode/type for the current block.
Further, the encoding apparatus may perform a prediction sample filtering process. The prediction sample filtering may be referred to as post-filtering. Some or all of the prediction samples may be filtered by a prediction sample filtering process. In some cases, the prediction sample filtering process may be omitted.
The encoding apparatus may generate residual samples of the current block based on the (filtered) prediction samples (S920). The encoding device may compare the prediction samples with the original samples of the current block based on the phase and derive residual samples.
The encoding apparatus may encode image information including information regarding intra prediction (prediction information) and residual information of residual samples (S930). The prediction information may include intra prediction mode information and intra prediction type information. The encoding apparatus may output the encoded image information in the form of a bitstream. The output bitstream may be transmitted to a decoding apparatus through a storage medium or a network.
The residual information may include the following residual coding syntax. The encoding device may transform/quantize the residual samples to derive quantized transform coefficients. The residual information may include information on the quantized transform coefficients.
Further, as described above, the encoding apparatus may generate a reconstructed picture (including reconstructed samples and a reconstructed block). To this end, the encoding apparatus may perform dequantization/inverse transform again on the quantized transform coefficients to derive (modified) residual samples. The residual samples are transformed/quantized and then dequantized/inverse transformed in order to derive the same residual samples as the residual samples derived in the decoding apparatus, as described above. The encoding apparatus may generate a reconstructed block including the reconstructed samples of the current block based on the prediction samples and the (modified) residual samples. A reconstructed picture of the current picture may be generated based on the reconstructed block. As described above, the in-loop filtering process may be further applied to the reconstructed picture.
For example, the video/image decoding process based on intra prediction and intra predictor in the decoding apparatus may illustratively include the following. The decoding apparatus may perform an operation corresponding to an operation performed in the encoding apparatus.
S1010 to S1030 may be performed by the intra predictor 265 of the decoding apparatus, and the prediction information of S1010 and the residual information of S1040 may be acquired from the bitstream by the entropy decoder 210 of the decoding apparatus. The residual processor of the decoding apparatus, including at least one of the dequantizer 220 and the inverse transformer 230, may derive the residual samples of the current block based on the residual information. Specifically, the dequantizer 220 of the residual processor may perform dequantization based on the quantized transform coefficients derived from the residual information to derive the transform coefficients, and the inverse transformer 230 of the residual processor may perform inverse transform on the transform coefficients to derive the residual samples of the current block. S1050 may be performed by the adder 235 or the reconstructor of the decoding apparatus.
Specifically, the decoding apparatus may derive the intra prediction mode/type of the current block based on the received prediction information (intra prediction mode/type information) (S1010). The decoding apparatus may derive the neighboring reference samples of the current block (S1020). The decoding apparatus may generate the prediction samples in the current block based on the intra prediction mode/type and the neighboring reference samples (S1030). In this case, the decoding apparatus may perform a prediction sample filtering process. The prediction sample filtering may be referred to as post-filtering. Some or all of the prediction samples may be filtered by the prediction sample filtering process. In some cases, the prediction sample filtering process may be omitted.
The decoding apparatus may generate the residual samples of the current block based on the received residual information. The decoding apparatus may generate the reconstructed samples of the current block based on the prediction samples and the residual samples, and derive a reconstructed block including the reconstructed samples (S1040). A reconstructed picture of the current picture may be generated based on the reconstructed block. As described above, the in-loop filtering process may be further applied to the reconstructed picture.
Here, although not shown, the intra predictor 265 of the decoding apparatus may include an intra prediction mode/type determination unit, which may determine an intra prediction mode/type of the current block based on the intra prediction mode/type information acquired by the entropy decoder 210, a reference sample derivation unit, which may derive neighboring reference samples of the current block, and a predicted sample derivation unit, which may derive predicted samples of the current block. In addition, when the above-described prediction sample filtering process is performed, the intra predictor 265 may further include a prediction sample filter.
The intra prediction mode information may include flag information (e.g., intra_luma_mpm_flag) indicating whether the most probable mode (MPM) or a remaining mode is applied to the current block, and, when the MPM is applied to the current block, the prediction mode information may further include index information (e.g., intra_luma_mpm_idx) indicating one of the intra prediction mode candidates (MPM candidates). The intra prediction mode candidates (MPM candidates) may be configured as an MPM candidate list or an MPM list. In addition, when the MPM is not applied to the current block, the intra prediction mode information may further include remaining mode information (e.g., intra_luma_mpm_remainder) indicating one of the remaining intra prediction modes other than the intra prediction mode candidates (MPM candidates). The decoding apparatus may determine the intra prediction mode of the current block based on the intra prediction mode information. A separate MPM list may be configured for the above-described MIP.
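The MPM-based mode signaling can be sketched as follows from the decoder's perspective; the 6-entry list contents, the 67-mode count, and the ordering of the remaining modes are illustrative assumptions, not the normative list construction.

```python
# An illustrative decoder-side sketch of the MPM signaling described above.
def derive_intra_mode(mpm_flag: int, mpm_idx: int, mpm_remainder: int,
                      mpm_list):
    if mpm_flag:                          # intra_luma_mpm_flag == 1
        return mpm_list[mpm_idx]          # intra_luma_mpm_idx selects a candidate
    # Otherwise index the remaining modes in increasing order,
    # skipping the modes already present in the MPM list.
    remaining = [m for m in range(67) if m not in mpm_list]  # 67 modes assumed
    return remaining[mpm_remainder]

mpm_list = [0, 1, 50, 18, 46, 54]         # hypothetical 6-entry MPM list
print(derive_intra_mode(1, 2, 0, mpm_list))   # 50 (third MPM candidate)
print(derive_intra_mode(0, 0, 0, mpm_list))   # 2 (first non-MPM mode)
```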
In addition, the intra prediction type information may be implemented in various forms. For example, the intra prediction type information may include intra prediction type index information indicating one of the intra prediction types. As another example, the intra prediction type information may include at least one of reference sample line information (e.g., intra_luma_ref_idx) indicating whether the MRL is applied to the current block and, if applied, which reference sample line is used, ISP flag information (e.g., intra_subpartitions_mode_flag) indicating whether the ISP is applied to the current block, ISP type information (e.g., intra_subpartitions_split_flag) indicating the split type of the sub-partitions when the ISP is applied, flag information indicating whether the PDPC is applied, or flag information indicating whether the LIP is applied. In addition, the intra prediction type information may include an MIP flag indicating whether MIP is applied to the current block.
The intra prediction mode information and/or the intra prediction type information may be encoded/decoded by the encoding method described in the present disclosure. For example, the intra prediction mode information and/or the intra prediction type information may be encoded/decoded by entropy encoding (e.g., CABAC or CAVLC) based on a truncated (rice) binary code.
Overview of inter prediction
Hereinafter, detailed techniques of the inter prediction described above with reference to fig. 2 and 3 will be described. In the case of the decoding apparatus, an inter-prediction-based video/image decoding method and the inter predictor in the decoding apparatus may operate according to the following description. In the case of the encoding apparatus, an inter-prediction-based video/image encoding method and the inter predictor in the encoding apparatus may operate according to the following description. In addition, data encoded according to the following description may be stored in the form of a bitstream.
A predictor of the encoding/decoding apparatus may perform inter prediction in units of blocks to derive prediction samples. Inter prediction may refer to prediction derived by a method that depends on data elements (e.g., sample values, motion information, etc.) of pictures other than the current picture. When inter prediction is applied to a current block, a prediction block (prediction sample array) of the current block may be derived based on a reference block (reference sample array) specified by a motion vector on a reference picture indicated by a reference picture index. In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information of the current block may be predicted in units of blocks, sub-blocks, or samples based on the correlation of motion information between neighboring blocks and the current block. The motion information may include a motion vector and a reference picture index. The motion information may also include inter prediction type (L0 prediction, L1 prediction, bi prediction, etc.) information. When inter prediction is applied, the neighboring blocks may include spatially neighboring blocks located in a current picture and temporally neighboring blocks located in a reference picture. The reference picture including the reference block and the reference picture including the temporally adjacent block may be the same or different. The temporally neighboring blocks may be referred to as collocated reference blocks or collocated CUs or colCU, and the reference pictures including the temporally neighboring blocks may be referred to as collocated pictures (colPic). For example, a motion information candidate list may be configured based on neighboring blocks of the current block, and a flag or index information indicating which candidate is selected (used) to derive a motion vector and/or a reference picture index of the current block may be signaled. Inter prediction may be performed based on various prediction modes, and for example, in the case of a skip mode and a merge mode, motion information of a current block may be equal to motion information of a selected neighboring block. In case of the skip mode, unlike the merge mode, the residual signal may not be transmitted. In the case of a Motion Vector Prediction (MVP) mode, the motion vectors of the selected neighboring blocks may be used as motion vector predictors, and the motion vector differences may be signaled. In this case, the motion vector of the current block may be derived using the sum of the motion vector predictor and the motion vector difference.
The motion information may include L0 motion information and/or L1 motion information according to the inter prediction type (L0 prediction, L1 prediction, Bi prediction, etc.). A motion vector in the L0 direction may be referred to as an L0 motion vector or MVL0, and a motion vector in the L1 direction may be referred to as an L1 motion vector or MVL1. Prediction based on the L0 motion vector may be referred to as L0 prediction, prediction based on the L1 motion vector may be referred to as L1 prediction, and prediction based on both the L0 motion vector and the L1 motion vector may be referred to as Bi prediction. Here, the L0 motion vector may represent a motion vector associated with the reference picture list L0 (L0), and the L1 motion vector may represent a motion vector associated with the reference picture list L1 (L1). The reference picture list L0 may include, as reference pictures, pictures preceding the current picture in output order, and the reference picture list L1 may include pictures following the current picture in output order. The preceding pictures may be referred to as forward (reference) pictures, and the following pictures may be referred to as backward (reference) pictures. The reference picture list L0 may further include, as reference pictures, pictures following the current picture in output order. In this case, in the reference picture list L0, the preceding pictures are indexed first, and the following pictures are indexed next. The reference picture list L1 may further include, as reference pictures, pictures preceding the current picture in output order. In this case, in the reference picture list L1, the following pictures may be indexed first, and the preceding pictures may be indexed next. Here, the output order may correspond to the picture order count (POC) order.
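A simplified sketch of the list ordering described above, keyed only by POC, is shown below; real list construction involves reference picture marking and explicit signaling, which is omitted here.

```python
# A simplified sketch of L0/L1 reference picture list ordering by POC.
def build_ref_lists(current_poc: int, available_pocs):
    before = sorted((p for p in available_pocs if p < current_poc),
                    reverse=True)                      # nearest past first
    after = sorted(p for p in available_pocs if p > current_poc)
    l0 = before + after    # past pictures first, then future pictures
    l1 = after + before    # future pictures first, then past pictures
    return l0, l1

print(build_ref_lists(4, [0, 2, 8, 16]))
# ([2, 0, 8, 16], [8, 16, 2, 0])
```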
For example, a video/image encoding process based on inter prediction and inter predictor in an encoding apparatus may be schematically as follows. A description will be given with reference to fig. 11. The encoding apparatus performs inter prediction on the current block (S1110). The encoding apparatus may derive inter prediction mode and motion information of the current block and generate prediction samples of the current block. Here, the inter prediction mode determination, motion information derivation, and prediction sample derivation processes may be performed simultaneously, and any process may be performed before another process. For example, an inter predictor of an encoding apparatus may include a prediction mode determination unit that may determine a prediction mode of a current block, a motion information derivation unit that may derive motion information of the current block, and a prediction sample derivation unit that may derive prediction samples of the current block. For example, an inter predictor of an encoding apparatus may search for a block similar to a current block within a specific area (search area) of a reference picture through motion estimation, and derive a reference block having a difference from the current block that is minimum or less than or equal to a specific criterion. Based on this, a reference picture index indicating a reference picture in which the reference block is located may be derived, and a motion vector may be derived based on a position difference between the reference block and the current block. The encoding apparatus may determine a mode applied to the current block among various prediction modes. The encoding device may compare the RD costs of the various prediction modes and determine the best prediction mode for the current block.
For example, when the skip mode or the merge mode is applied to the current block, the encoding apparatus may construct a merge candidate list, which will be described below, and derive a reference block having a minimum difference from the current block or less than or equal to a certain criterion among reference blocks indicated by merge candidates included in the merge candidate list. In this case, a merge candidate associated with the derived reference block may be selected, and merge index information indicating the selected merge candidate may be generated and signaled to the decoding apparatus. Motion information of the current block may be derived using motion information of the selected merge candidate.
As another example, when the (a) MVP mode is applied to the current block, the encoding apparatus may construct an (a) MVP candidate list, which will be described below, and use a motion vector of a Motion Vector Predictor (MVP) candidate selected from among MVP candidates included in the (a) MVP candidate list as MVP of the current block. In this case, a motion vector indicating a reference block derived through motion estimation may be used as the motion vector of the current block, and an mvp candidate having a motion vector having the smallest difference from the motion vector of the current block among mvp candidates may be the selected mvp candidate. A Motion Vector Difference (MVD) obtained by subtracting mvp from a motion vector of the current block may be derived. In this case, information on the MVD may be signaled to the decoding apparatus. In addition, when the (a) MVP mode is applied, the value of the reference picture index may be constructed as reference picture index information and signaled to the decoding apparatus.
The encoding apparatus may derive residual samples based on the prediction samples (S1120). The encoding apparatus may derive residual samples through comparison between original samples and prediction samples of the current block.
The encoding apparatus encodes image information including prediction information and residual information (S1130). The encoding apparatus may output the encoded image information in the form of a bitstream. The prediction information may include prediction mode information (e.g., a skip flag, a merge flag, a mode index, etc.) and motion information as information related to a prediction process. The motion information may include candidate selection information (e.g., a merge index, an mvp flag, or an mvp index) as information for deriving the motion vector. In addition, the motion information may include information about the above-described MVDs and/or reference picture index information. In addition, the motion information may include information indicating whether to apply L0 prediction, L1 prediction, or bi prediction. The residual information is information about residual samples. The residual information may include information on quantized transform coefficients of the residual samples.
The output bitstream may be stored in a (digital) storage medium and transmitted to the decoding device, or may be transmitted to the decoding device via a network.
Further, as described above, the encoding apparatus may generate a reconstructed picture (including reconstructed samples and a reconstructed block) based on the prediction samples and the residual samples. This enables the decoding apparatus to derive the same prediction result as that performed in the encoding apparatus, thereby improving coding efficiency. Accordingly, the encoding apparatus may store the reconstructed picture (or the reconstructed samples or reconstructed block) in the memory and use it as a reference picture for inter prediction. As described above, the in-loop filtering process may be further applied to the reconstructed picture.
For example, the video/image decoding process based on inter prediction and inter predictor in the decoding apparatus may illustratively include the following.
The decoding apparatus may perform an operation corresponding to the operation performed by the encoding apparatus. The decoding apparatus may perform prediction on the current block based on the received prediction information and derive prediction samples.
Specifically, the decoding apparatus may determine a prediction mode of the current block based on the received prediction information (S1210). The decoding apparatus may determine which inter prediction mode is applied to the current block based on the prediction mode information in the prediction information.
For example, whether the merge mode or (a) MVP mode is applied to the current block may be determined based on the merge flag. Alternatively, one of various inter prediction mode candidates may be selected based on the mode index. The inter prediction mode candidates may include a skip mode, a merge mode, and/or (a) an MVP mode, or may include various inter prediction modes as will be described below.
The decoding apparatus derives motion information of the current block based on the determined inter prediction mode (S1220). For example, when the skip mode or the merge mode is applied to the current block, the decoding apparatus may construct a merge candidate list described below and select one of the merge candidates included in the merge candidate list. The selection may be performed based on the above-described selection information (merge index). Motion information of the current block may be derived using motion information of the selected merge candidate. The motion information of the selected merge candidate may be used as the motion information of the current block.
As another example, when the (a) MVP mode is applied to the current block, the decoding apparatus may construct an (a) MVP candidate list, which will be described below, and use a motion vector of a Motion Vector Predictor (MVP) candidate selected from among MVP candidates included in the (a) MVP candidate list as MVP of the current block. The selection may be performed based on the above-described selection information (mvp flag or mvp index). In this case, the MVD of the current block may be derived based on the information on the MVD, and the motion vector of the current block may be derived based on the MVD and mvp of the current block. In addition, a reference picture index of the current block may be derived based on the reference picture index information. The picture indicated by the reference picture index in the reference picture list of the current block may be derived as a reference picture to which inter prediction of the current block refers.
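The MVP-mode reconstruction reduces to adding the signaled MVD to the selected predictor, as the following sketch shows; the candidate values and the signaled selection are hypothetical.

```python
# A minimal sketch of decoder-side motion vector reconstruction in MVP mode.
def reconstruct_mv(mvp, mvd):
    """mvp and mvd are (x, y) vectors; their sum is the current block's MV."""
    return mvp[0] + mvd[0], mvp[1] + mvd[1]

mvp_candidates = [(12, -3), (8, 0)]   # hypothetical (A)MVP candidate list
mvp_idx, mvd = 0, (2, 1)              # signaled selection info and MVD
print(reconstruct_mv(mvp_candidates[mvp_idx], mvd))  # (14, -2)
```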
Also, as described above, motion information of the current block may be derived without constructing a candidate list. In this case, the motion information of the current block may be derived according to a procedure in a prediction mode, which will be described below. In this case, the construction of the candidate list as described above may be omitted.
The decoding apparatus may generate a prediction sample of the current block based on the motion information of the current block (S1230). In this case, a reference picture may be derived based on a reference picture index of the current block, and prediction samples of the current block may be derived using samples of a reference block indicated by a motion vector of the current block on the reference picture. In this case, as described above, in some cases, a prediction sample filtering process for all or some prediction samples of the current block may be further performed.
For example, the inter predictor of the decoding apparatus may include a prediction mode determination unit that may determine a prediction mode of the current block based on received prediction mode information, a motion information derivation unit that may derive motion information (a motion vector and/or a reference picture index) of the current block based on the received motion information, and a prediction sample derivation unit that may derive prediction samples of the current block.
The decoding apparatus generates residual samples of the current block based on the received residual information (S1240). The decoding apparatus may generate reconstructed samples of the current block based on the prediction samples and the residual samples and generate a reconstructed picture based thereon (S1250). As described above, the in-loop filtering process may be further applied to the reconstructed picture.
As described above, the inter prediction process may include a step of determining an inter prediction mode, a step of deriving motion information according to the determined prediction mode, and a step of performing prediction (prediction sample generation) based on the derived motion information. As described above, the inter prediction process may be performed in the encoding apparatus and the decoding apparatus.
Quantization/dequantization
As described above, the quantizer of the encoding apparatus may derive the quantized transform coefficient by applying quantization to the transform coefficient, and the dequantizer of the encoding apparatus or the dequantizer of the decoding apparatus may derive the transform coefficient by applying dequantization to the quantized transform coefficient.
In encoding and decoding of moving images/still images, the quantization ratio may be changed, and the compression ratio may be adjusted using the changed quantization ratio. From an implementation point of view, in consideration of complexity, a quantization parameter (QP) may be used instead of using the quantization ratio directly. For example, quantization parameters having integer values from 0 to 63 may be used, and each quantization parameter value may correspond to an actual quantization ratio. In addition, the quantization parameter QP_Y for the luma component (luma sample) and the quantization parameter QP_C for the chroma components (chroma samples) may be set differently.
In the quantization process, a transform coefficient C may be received and divided by a quantization ratio Qstep, thereby obtaining a quantized transform coefficient C′. In this case, in consideration of computational complexity, the quantization ratio may be multiplied by a scale to form an integer, and a shift operation may be performed by a value corresponding to the scale value. A quantization scale may be derived based on the product of the quantization ratio and the scale value. That is, the quantization scale may be derived according to the QP. By applying the quantization scale to the transform coefficient C, the quantized transform coefficient C′ may be derived.
The dequantization process is the inverse of the quantization process. The reconstructed transform coefficient C ″ may be obtained by multiplying the quantized transform coefficient C' by the quantization ratio Qstep. In addition, a level scale may be derived from the quantization parameter, and the level scale may be applied to the quantized transform coefficient C', thereby deriving a reconstructed transform coefficient C ″. The reconstructed transform coefficient C "may be slightly different from the original transform coefficient C due to losses in the transform and/or quantization process. Therefore, even in the encoding apparatus, dequantization can be performed in the same manner as in the decoding apparatus.
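As a numeric illustration of the relation between QP, Qstep, quantization, and dequantization, the following sketch uses the commonly cited mapping in which Qstep doubles every 6 QP steps; this mapping and the floating-point arithmetic are simplifying assumptions, since real implementations use integer scales and shifts as noted above.

```python
# A hedged sketch of scalar quantization/dequantization with Qstep.
def qstep(qp: int) -> float:
    # Assumed mapping: Qstep doubles every 6 QP steps (not quoted spec text).
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(c: float, qp: int) -> int:
    return round(c / qstep(qp))          # quantized coefficient C'

def dequantize(c_q: int, qp: int) -> float:
    return c_q * qstep(qp)               # reconstructed coefficient C''

qp, c = 22, 100.0                        # qstep(22) == 8.0
cq = quantize(c, qp)
print(cq, dequantize(cq, qp))            # 12 96.0 -- C'' differs slightly from C
```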
Furthermore, an adaptive frequency weighting quantization technique that adjusts the quantization strength according to frequency may be applied. The adaptive frequency weighting quantization technique refers to a method of applying the quantization strength differently according to frequency. In adaptive frequency weighting quantization, the quantization strength may be applied differently according to frequency using a predefined quantization scaling matrix. That is, the quantization/dequantization process described above may be performed based on the quantization scaling matrix. For example, different quantization scaling matrices may be used according to the size of the current block and/or whether the prediction mode applied to the current block is inter prediction or intra prediction, in order to generate the residual signal of the current block. The quantization scaling matrix may be referred to as a quantization matrix or a scaling matrix. The quantization scaling matrix may be predefined. In addition, for frequency adaptive scaling, frequency quantization scale information for the quantization scaling matrix may be constructed/encoded in the encoding apparatus and signaled to the decoding apparatus. The frequency quantization scale information may be referred to as quantization scale information. The frequency quantization scale information may include scaling list data (scaling_list_data). A (modified) quantization scaling matrix may be derived based on the scaling list data. In addition, the frequency quantization scale information may include a presence flag indicating whether the scaling list data is present. Alternatively, when the scaling list data is signaled at a higher level (e.g., SPS), information indicating whether the scaling list data is modified at a lower level (e.g., PPS or tile group header, etc.) may also be included.
Transformation/inverse transformation
As described above, the encoding apparatus may derive a residual block (residual samples) based on a block (prediction block) predicted through intra/inter/IBC prediction, and derive quantized transform coefficients by applying transform and quantization to the derived residual samples. Information about the quantized transform coefficients (residual information) may be included and encoded in the residual coding syntax and output in the form of a bitstream. The decoding apparatus may acquire and decode the information about the quantized transform coefficients (residual information) from the bitstream to derive the quantized transform coefficients. The decoding apparatus may derive the residual samples through dequantization/inverse transform based on the quantized transform coefficients. As described above, the quantization/dequantization and/or the transform/inverse transform may be skipped. When the transform/inverse transform is skipped, the transform coefficients may be referred to as coefficients or residual coefficients, and may still be referred to as transform coefficients for consistency of expression. Whether the transform/inverse transform is skipped may be signaled based on a transform skip flag (e.g., transform_skip_flag).
The transform/inverse transform may be performed based on a transform kernel. For example, a multiple transform selection (MTS) scheme may be applied for performing the transform/inverse transform. In this case, some of a plurality of transform kernel sets may be selected and applied to the current block. The transform kernel may be referred to by various terms such as transform matrix or transform type. For example, a transform kernel set may indicate a combination of a vertical-direction transform kernel (vertical transform kernel) and a horizontal-direction transform kernel (horizontal transform kernel).
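A transform kernel set can be illustrated as a pair of (horizontal, vertical) transform types selected by an index; the specific index-to-pair mapping below is an assumption for illustration, not a quotation of the specification.

```python
# An illustrative rendering of MTS kernel sets as (horizontal, vertical)
# transform-type pairs, indexed by a hypothetical MTS index.
MTS_KERNEL_SETS = {
    0: ("DCT2", "DCT2"),
    1: ("DST7", "DST7"),
    2: ("DCT8", "DST7"),
    3: ("DST7", "DCT8"),
    4: ("DCT8", "DCT8"),
}

def select_kernels(mts_index: int):
    hor, ver = MTS_KERNEL_SETS[mts_index]
    return hor, ver  # applied to rows and columns of the residual block

print(select_kernels(2))  # ('DCT8', 'DST7')
```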
The transform/inverse transform may be performed in units of CUs or TUs. That is, the transform/inverse transform may be applied to residual samples in a CU or residual samples in a TU. The CU size may be equal to the TU size, or a plurality of TUs may exist in the CU region. Further, the CU size may generally indicate the luma component (sample) CB size. The TU size may generally indicate the luma component (sample) TB size. The chroma component (sample) CB or TB size may be derived based on the luma component (sample) CB or TB size according to the component ratio of the color format (chroma format) (e.g., 4:4:4, 4:2:2, 4:2:0, etc.). The TU size may be derived based on maxTbSize. For example, when the CU size is larger than maxTbSize, a plurality of TUs (TBs) having maxTbSize may be derived from the CU, and the transform/inverse transform may be performed in units of TUs (TBs). maxTbSize may be considered in determining whether to apply various intra prediction types (such as ISP). The information on maxTbSize may be predetermined, or may be generated and encoded in the encoding apparatus and signaled to the decoding apparatus.
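As a non-normative sketch of the splitting just described, the helper below tiles a CU's residual area into TUs of at most maxTbSize in each dimension; the function name and return format are invented for the example.

```python
# A sketch (not the normative derivation) of splitting a CU's residual area
# into TUs no larger than maxTbSize, as described above.

def derive_tu_grid(cu_width, cu_height, max_tb_size):
    """Return (x, y, w, h) tuples covering the CU with TUs of at most
    max_tb_size in each dimension."""
    tus = []
    for y in range(0, cu_height, max_tb_size):
        for x in range(0, cu_width, max_tb_size):
            w = min(max_tb_size, cu_width - x)
            h = min(max_tb_size, cu_height - y)
            tus.append((x, y, w, h))
    return tus

# Example: a 128x128 CU with maxTbSize = 64 yields four 64x64 TUs.
assert len(derive_tu_grid(128, 128, 64)) == 4
```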
Entropy coding
As described above with reference to fig. 2, all or some of the video/image information may be entropy-encoded by the entropy encoder 190, and all or some of the video/image information described with reference to fig. 3 may be entropy-decoded by the entropy decoder 310. In this case, the video/image information may be encoded/decoded in units of syntax elements. In the present disclosure, encoding/decoding of information may include encoding/decoding by the method described in this paragraph.
Fig. 13 is a block diagram of CABAC for encoding one syntax element. In the encoding process of CABAC, first, when an input signal is a syntax element rather than a binary value, the input signal may be transformed into a binary value by binarization. When the input signal is already a binary value, binarization may be bypassed. Here, each binary number 0 or 1 constituting the binary value may be referred to as a bin. For example, when a bin string after binarization is 110, each of 1, 1, and 0 may be referred to as one bin. The bin(s) of one syntax element may represent the value of the corresponding syntax element.
The binarized bins may be input to a regular encoding engine or a bypass encoding engine. The regular encoding engine may assign a context model reflecting a probability value to the corresponding bin and encode the corresponding bin based on the assigned context model. The regular encoding engine may encode each bin and then update the probability model for the corresponding bin. Bins encoded in this way may be referred to as context-coded bins. The bypass encoding engine may bypass the procedure for estimating the probability of an input bin and the procedure for updating, after encoding, the probability model applied to the corresponding bin. The bypass encoding engine may encode the input bin by applying a uniform probability distribution (e.g., 50:50) instead of assigning a context. Bins encoded in this manner may be referred to as bypass bins. A context model may be allocated and updated for each context-coded (regular-coded) bin, and may be indicated based on ctxIdx or ctxInc. ctxIdx may be derived based on ctxInc. Specifically, for example, the context index ctxIdx indicating the context model of each regular-coded bin may be derived as the sum of a context index increment (ctxInc) and a context index offset (ctxIdxOffset). Here, ctxInc may be derived differently for each bin. ctxIdxOffset may be represented by the lowest value of ctxIdx. The lowest value of ctxIdx may be referred to as the initial value (initValue) of ctxIdx. ctxIdxOffset is a value generally used to distinguish the context models of one syntax element from those of other syntax elements, and the context model within one syntax element may be distinguished/derived based on ctxInc.
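Purely for illustration, the toy sketch below mirrors this bookkeeping: ctxIdx is the sum of an offset and an increment, and a regular-coded (context-coded) bin updates its model, whereas a bypass bin would not. The counter-based model and all numeric values are assumptions, not the standard's probability-state machine.

```python
# A toy sketch of the ctxIdx bookkeeping described above. The model here is
# a simple counter, not the standard's probability-state machine, and all
# offsets/increments are illustrative.

class ToyContextModel:
    def __init__(self):
        self.ones = 1
        self.total = 2  # start from a uniform estimate

    def p_one(self):
        return self.ones / self.total

    def update(self, bin_val):
        self.ones += bin_val
        self.total += 1

def ctx_idx(ctx_idx_offset, ctx_inc):
    # ctxIdx = ctxIdxOffset + ctxInc; the offset separates one syntax
    # element's models from another's, ctxInc selects within the element.
    return ctx_idx_offset + ctx_inc

models = [ToyContextModel() for _ in range(8)]
for b in (1, 1, 0):                 # bins of a bin string such as "110"
    m = models[ctx_idx(ctx_idx_offset=4, ctx_inc=0)]
    m.update(b)                     # regular (context-coded) bin: model adapts
```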
In the entropy encoding process, whether to perform encoding through the regular encoding engine or through the bypass encoding engine may be determined, and the encoding path may be switched accordingly. Entropy decoding may be performed in the reverse order of the same process as the entropy encoding.
For example, the above entropy encoding may be performed as in fig. 14 and 15. Referring to fig. 14 and 15, an encoding apparatus (entropy encoder) may perform an entropy encoding process with respect to image/video information. The image/video information may include segmentation-related information, prediction-related information (e.g., inter/intra prediction distinction information, intra prediction mode information, inter prediction mode information, etc.), residual information, in-loop filtering-related information, etc., or may include various syntax elements related thereto. Entropy encoding may be performed in units of syntax elements. Steps S1410 to S1420 of fig. 14 may be performed by the entropy encoder 190 of the encoding apparatus of fig. 2.
The encoding apparatus may perform binarization on the target syntax element (S1410). Here, the binarization may be based on various binarization methods such as a truncated Rice binarization process, a fixed-length binarization process, and the like, and the binarization method for the target syntax element may be predefined. The binarization procedure may be performed by the binarization unit 191 in the entropy encoder 190.
The encoding apparatus may entropy-encode the target syntax element (S1420). The encoding apparatus may perform regular-coding-based (context-based) or bypass-coding-based encoding of the bin string of the target syntax element based on an entropy encoding technique such as CABAC (context-adaptive binary arithmetic coding) or CAVLC (context-adaptive variable length coding), and the output may be included in the bitstream. The entropy encoding procedure may be performed by the entropy encoding processor 192 in the entropy encoder 190. As described above, the bitstream may be transmitted to the decoding apparatus through a (digital) storage medium or a network.
Referring to figs. 16 and 17, a decoding apparatus (entropy decoder) may decode encoded image/video information. The image/video information may include segmentation-related information, prediction-related information (e.g., inter/intra prediction distinction information, intra prediction mode information, inter prediction mode information, etc.), residual information, in-loop filtering-related information, etc., or may include various syntax elements related thereto. Entropy decoding may be performed in units of syntax elements. Steps S1610 to S1620 may be performed by the entropy decoder 210 of the decoding apparatus of fig. 3.
The decoding apparatus may perform binarization on the target syntax element (S1610). Here, the binarization may be performed based on various binarization methods such as a truncated Rice binarization process, a fixed-length binarization process, and the like, and the binarization method for the target syntax element may be predefined. The decoding apparatus may derive the available bin strings (bin string candidates) for the available values of the target syntax element through the binarization procedure. The binarization procedure may be performed by the binarization unit 211 in the entropy decoder 210.
The decoding apparatus may perform entropy decoding on the target syntax element (S1620). While sequentially decoding and parsing the bins of the target syntax element from the input bits in the bitstream, the decoding apparatus may compare the derived bin string with the available bin strings of the corresponding syntax element. If the derived bin string is equal to one of the available bin strings, the value corresponding to that bin string may be derived as the value of the corresponding syntax element. If not, the above procedure may be performed again after further parsing the next bit in the bitstream. Through this procedure, the corresponding information is signaled using variable-length bits without using a start bit or an end bit for specific information (a specific syntax element) in the bitstream. In this way, relatively few bits may be assigned to low values, and overall coding efficiency may be improved.
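The comparison loop described above can be sketched as follows, with the bitstream simplified to a string of '0'/'1' characters and a truncated-unary-style table standing in for the available bin strings; all names are invented for the example.

```python
# Sketch of the comparison loop described above: bins are read one at a time
# and matched against the available bin strings of the syntax element.

def decode_syntax_element(bits, pos, available):
    """available maps bin string -> syntax element value
    (e.g., a truncated-unary table like {'0': 0, '10': 1, '110': 2})."""
    derived = ""
    while pos < len(bits):
        derived += bits[pos]
        pos += 1
        if derived in available:
            # No start/end marker is needed; the prefix-free structure of
            # the bin strings delimits the element.
            return available[derived], pos
    raise ValueError("bitstream ended inside a bin string")

value, pos = decode_syntax_element("10110", 0, {"0": 0, "10": 1, "110": 2})
assert value == 1 and pos == 2   # "10" matched; decoding resumes at bit 2
```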
The decoding device may perform context-based or bypass-based decoding for bins of a bin string from the bitstream based on entropy encoding techniques such as CABAC or CAVLC. The entropy decoding process may be performed by the entropy decoding processor 212 in the entropy decoder 210. The bitstream may include various information for image/video decoding as described above. As described above, the bitstream may be transmitted to the decoding apparatus through a (digital) storage medium or a network.
In the present disclosure, a table including syntax elements (syntax table) may be used for signaling of information from an encoding apparatus to a decoding apparatus. The order of syntax elements of the table including syntax elements used in the present disclosure may indicate an order in which the syntax elements are parsed from the bitstream. The encoding apparatus may construct and encode the syntax table such that the decoding apparatus parses the syntax elements in a parsing order, and the decoding apparatus may parse and decode the syntax elements of the corresponding syntax table from the bitstream according to the parsing order and obtain values of the syntax elements.
General image/video coding process
In image/video coding, pictures constituting an image/video may be encoded/decoded according to a decoding order. The picture order corresponding to the output order of the decoded pictures may be set to be different from the decoding order, and based on this, not only forward prediction but also backward prediction may be performed during inter prediction.
Fig. 18 shows an example of an exemplary picture decoding process to which the embodiments of the present disclosure are applied. In fig. 18, S1810 may be performed in the entropy decoder 210 of the decoding apparatus, S1820 may be performed in the predictor including the intra predictor 265 and the inter predictor 260, S1830 may be performed in the residual processor including the dequantizer 220 and the inverse transformer 230, S1840 may be performed in the adder 235, and S1850 may be performed in the filter 240. S1810 may include an information decoding process described in the present disclosure, S1820 may include an inter/intra prediction process described in the present disclosure, S1830 may include a residual processing process described in the present disclosure, S1840 may include a block/picture reconstruction process described in the present disclosure, and S1850 may include an in-loop filtering process described in the present disclosure.
Referring to fig. 18, the picture decoding process may illustratively include a process of obtaining image/video information from a bitstream (through decoding) (S1810), a picture reconstruction process (S1820 to S1840), and an in-loop filtering process for the reconstructed picture (S1850). The picture reconstruction process may be performed based on the prediction samples and residual samples obtained through the inter/intra prediction (S1820) and residual processing (S1830) (dequantization and inverse transform of the quantized transform coefficients) described in the present disclosure. For a reconstructed picture generated by the picture reconstruction process, a modified reconstructed picture may be generated through the in-loop filtering process, which may be output as a decoded picture, stored in the decoded picture buffer or memory 250 of the decoding apparatus, and used as a reference picture in the inter prediction process when a picture is decoded later. In some cases, the in-loop filtering process may be omitted. In this case, the reconstructed picture may be output as a decoded picture, stored in the decoded picture buffer or memory 250 of the decoding apparatus, and used as a reference picture in the inter prediction process when a picture is decoded later. The in-loop filtering process (S1850) may include a deblocking filtering process, a sample adaptive offset (SAO) process, an adaptive loop filter (ALF) process, and/or a bilateral filter process, some or all of which may be omitted, as described above. In addition, one or some of the deblocking filtering process, the SAO process, the ALF process, and the bilateral filter process may be sequentially applied, or all of them may be sequentially applied. For example, after the deblocking filtering process is applied to the reconstructed picture, the SAO process may be performed. Alternatively, for example, after the deblocking filtering process is applied to the reconstructed picture, the ALF process may be performed. This may be performed similarly in the encoding apparatus as well.
Fig. 19 shows an example of an exemplary picture coding process to which the embodiments of the present disclosure are applied. In fig. 19, S1910 may be performed in a predictor including the intra predictor 185 or the inter predictor 180 of the encoding apparatus described above with reference to fig. 2, S1920 may be performed in a residual processor including the transformer 120 and/or the quantizer 130, and S1930 may be performed in the entropy encoder 190. S1910 may include an inter/intra prediction process described in the present disclosure, S1920 may include a residual processing process described in the present disclosure, and S1930 may include an information encoding process described in the present disclosure.
Referring to fig. 19, the picture coding process may illustratively include not only a process for encoding information (e.g., prediction information, residual information, partition information, etc.) for picture reconstruction and outputting the information in the form of a bitstream, but also a process for generating a reconstructed picture of a current picture and an (optional) process for applying in-loop filtering to the reconstructed picture, as described with respect to fig. 2. The encoding apparatus may derive (modified) residual samples from the quantized transform coefficients through the dequantizer 140 and the inverse transformer 150, and generate a reconstructed picture based on the prediction samples and the (modified) residual samples, which are the output of S1910. The reconstructed picture generated in this manner may be equal to the reconstructed picture generated in the decoding apparatus. The modified reconstructed picture may be generated by an in-loop filtering process for the reconstructed picture, may be stored in the decoded picture buffer or memory 170, and may be used as a reference picture in an inter prediction process when the picture is later encoded, similar to the case in the decoding apparatus. As noted above, in some cases, some or all of the in-loop filtering process may be omitted. When performing the in-loop filtering process, the (in-loop) filtering related information (parameters) may be encoded in the entropy encoder 190 and output in the form of a bitstream, and the decoding apparatus may perform the in-loop filtering process based on the filtering related information using the same method as the encoding apparatus.
By such in-loop filtering process, noise generated during image/video encoding, such as blocking artifacts and ringing artifacts, can be reduced, and subjective/objective visual quality can be improved. In addition, by performing the in-loop filtering process in the encoding apparatus and the decoding apparatus, the encoding apparatus and the decoding apparatus can derive the same prediction result, the reliability of picture coding can be increased, and the amount of data transmitted for picture coding can be reduced.
As described above, the picture reconstruction process can be performed not only in the decoding apparatus but also in the encoding apparatus. A reconstructed block may be generated based on intra prediction/inter prediction in units of blocks, and a reconstructed picture including the reconstructed block may be generated. When the current picture/slice/tile group is an I-picture/slice/tile group, blocks included in the current picture/slice/tile group may be reconstructed based on only intra prediction. Also, when the current picture/slice/tile group is a P or B picture/slice/tile group, blocks included in the current picture/slice/tile group may be reconstructed based on intra prediction or inter prediction. In this case, inter prediction may be applied to some blocks in the current picture/slice/tile group, and intra prediction may be applied to the remaining blocks. The color components of a picture may include a luminance component and a chrominance component, and the methods and embodiments of the present disclosure may be applied to the luminance component and the chrominance component unless the present disclosure is expressly limited.
Examples of coding layers and structures
The encoded video/images according to the present disclosure may be processed, for example, according to encoding layers and structures as will be described below.
Fig. 20 is a view showing a layer structure of an encoded image. An encoded image may be classified into a video coding layer (VCL) that handles the image decoding process and the image itself, a lower system that transmits and stores the encoded information, and a network abstraction layer (NAL) that exists between the VCL and the lower system and is responsible for a network adaptation function.
In the VCL, VCL data including compressed image data (slice data) may be generated, or a parameter set including information such as a picture parameter set (PPS), a sequence parameter set (SPS) or a video parameter set (VPS), or a supplemental enhancement information (SEI) message additionally required for the image decoding process may be generated.
In the NAL, header information (NAL unit header) may be added to a Raw Byte Sequence Payload (RBSP) generated in the VCL to generate a NAL unit. In this case, RBSP refers to slice data, parameter sets, and SEI messages generated in the VCL. The NAL unit header may include NAL unit type information specified according to RBSP data included in a corresponding NAL unit.
As shown, NAL units can be classified into VCL NAL units and non-VCL NAL units according to RBSPs generated in VCL. A VCL NAL unit may refer to a NAL unit including information (slice data) on a picture, and a non-VCL NAL unit may refer to a NAL unit including information (parameter set or SEI message) required to decode a picture.
The VCL NAL units and non-VCL NAL units may be accompanied by header information and transmitted over the network according to the lower system data standard. For example, NAL units may be modified into a data form of a predetermined standard such as h.266/VVC file format, RTP (real-time transport protocol), or TS (transport stream), and transmitted through various networks.
As described above, in a NAL unit, a NAL unit type may be specified according to an RBSP data structure included in the corresponding NAL unit, and information on the NAL unit type may be stored in a NAL unit header and signaled.
For example, NAL units can be roughly classified into a VCL NAL unit type and a non-VCL NAL unit type according to whether the NAL unit includes information on a picture (slice data). The VCL NAL unit types may be classified according to the characteristics and types of the pictures included in the VCL NAL units, and the non-VCL NAL unit types may be classified according to the types of parameter sets.
Examples of NAL unit types specified according to the type of parameter sets/information included in non-VCL NAL unit types are listed below.
DCI (decoding capability information) NAL unit: type of NAL unit including DCI
VPS (video parameter set) NAL unit: type of NAL unit including VPS
SPS (sequence parameter set) NAL unit: types of NAL units including SPS
PPS (picture parameter set) NAL unit: type of NAL unit including PPS
APS (adaptation parameter set) NAL unit: type of NAL unit including APS
PH (picture header) NAL unit: type of NAL unit including PH
The NAL unit type may have syntax information for the NAL unit type, and the syntax information may be stored in a NAL unit header and signaled. For example, the syntax information may be nal_unit_type, and the NAL unit type may be specified by a nal_unit_type value.
Further, as described above, one picture may include a plurality of slices, and one slice may include a slice header and slice data. In this case, one picture header may be further added to the plurality of slices (sets of a slice header and slice data) in one picture. The picture header (picture header syntax) may include information/parameters commonly applicable to the picture.
The slice header (slice header syntax) may include information/parameters commonly applicable to slices. The APS (APS syntax) or PPS (PPS syntax) may include information/parameters commonly applicable to one or more slices or pictures. SPS (SPS syntax) may include information/parameters that are commonly applicable to one or more sequences. The VPS (VPS syntax) may include information/parameters commonly applicable to a plurality of layers. The DCI (DCI syntax) may include information/parameters commonly applied to the entire video. The DCI may include information/parameters related to decoding capability. In the present disclosure, the High Level Syntax (HLS) may include at least one of an APS syntax, a PPS syntax, an SPS syntax, a VPS syntax, a DCI syntax, a picture header syntax, or a slice header syntax. Also, in the present disclosure, the Low Level Syntax (LLS) may include, for example, a slice data syntax, a CTU syntax, a coding unit syntax, a transform unit syntax, and the like.
In the present disclosure, the image/video information encoded in the encoding apparatus and signaled to the decoding apparatus in the form of a bitstream may include not only intra-picture division related information, intra/inter prediction information, residual information, in-loop filtering information, but also information on a slice header, information on a picture header, information on an APS, information on a PPS, information on an SPS, information on a VPS, and/or information on a DCI. In addition, the picture/video information may also include general constraint information and/or information on NAL unit headers.
Segmenting a picture using subpictures, slices and tiles
One picture can be divided into at least one tile row and at least one tile column. One tile may consist of a sequence of CTUs and may cover a rectangular area of one picture.
A slice may consist of an integer number of complete tiles or an integer number of consecutive complete CTU rows within a tile of a picture.
For slices, two modes may be supported: one mode may be referred to as a raster scan slice mode and the other mode may be referred to as a rectangular slice mode. In the raster scan slice mode, a slice may include a sequence of complete tiles in the tile raster scan order of a picture. In the rectangular slice mode, one slice may include a plurality of complete tiles that are assembled to form a rectangular region of the picture, or a plurality of consecutive complete CTU rows of one tile that are assembled to form a rectangular region of the picture. Tiles in a rectangular slice may be scanned in tile raster scan order within the rectangular region corresponding to that slice. A subpicture may include at least one slice, and the slices are assembled to cover a rectangular region of the picture.
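By way of illustration only, the following sketch shows how the two slice modes group tiles, under the simplifying assumption that tiles are addressed by raster-scan index in a num_tile_cols × num_tile_rows grid; all names are invented for the example.

```python
# Sketch of how the two slice modes group tiles, assuming tiles are
# addressed by raster-scan index in a grid of num_tile_cols x num_tile_rows.

def raster_scan_slice(first_tile_idx, num_tiles):
    """A raster-scan slice is a run of consecutive tiles in raster order."""
    return list(range(first_tile_idx, first_tile_idx + num_tiles))

def rectangular_slice(top_left_idx, width_in_tiles, height_in_tiles,
                      num_tile_cols):
    """A rectangular slice covers a width x height rectangle of tiles."""
    col0, row0 = top_left_idx % num_tile_cols, top_left_idx // num_tile_cols
    return [(row0 + r) * num_tile_cols + (col0 + c)
            for r in range(height_in_tiles) for c in range(width_in_tiles)]

# In a 6x4 tile grid, a 2x2 rectangular slice at tile 8 covers tiles 8, 9, 14, 15.
assert rectangular_slice(8, 2, 2, 6) == [8, 9, 14, 15]
```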
In order to describe the partitioning relationship of a picture in more detail, a description will be given with reference to figs. 21 to 24. Figs. 21 to 24 illustrate embodiments of dividing a picture using tiles, slices, and subpictures. Fig. 21 shows an example of a picture that is divided into 12 tiles and three raster-scan slices. Fig. 22 shows an example of a picture that is partitioned into 24 tiles (6 tile columns and 4 tile rows) and 9 rectangular slices. Fig. 23 shows an example of a picture that is divided into four tiles (two tile columns and two tile rows) and four rectangular slices.
Fig. 24 shows an example of dividing a picture into subpictures. In fig. 24, the picture may be divided into 12 tiles on the left, each covering one slice composed of 4 × 4 CTUs, and 6 tiles on the right, each covering two vertically assembled slices composed of 2 × 2 CTUs, so that one picture is divided into 24 slices and 24 subpictures having different areas. In the example of fig. 24, individual slices correspond to individual subpictures.
HLS (high level syntax) signaling and semantics
As described above, HLSs may be encoded and/or signaled for video and/or image encoding. As described above, in the present disclosure, video/image information may be included in the HLS. In addition, an image/video encoding method may be performed based on such image/video information.
Picture header and slice header
An encoded picture may be composed of at least one slice. Parameters describing the encoded picture may be signaled in the picture header (PH), and parameters describing a slice may be signaled in the slice header (SH). The PH may be transmitted in its own NAL unit type. The SH may be provided at the start of the NAL unit carrying the payload of the slice (e.g., slice data).
Picture segmentation signaling
In one embodiment, a picture may be partitioned into a plurality of subpictures, tiles, and/or slices. Signaling of subpictures may be provided in the sequence parameter set. Signaling of tiles and rectangular slices may be provided in the picture parameter set. In addition, signaling of raster scan slices may be provided in the slice header.
Fig. 25 illustrates an embodiment of a syntax of a sequence parameter set. In the syntax of fig. 25, syntax elements are as follows.
The subpic_info_present_flag syntax element may indicate whether subpicture information is present. For example, a first value (e.g., 0) of subpic_info_present_flag may indicate that subpicture information for the coded layer video sequence (CLVS) is not present in the bitstream and that only one subpicture is present in each picture of the CLVS. A second value (e.g., 1) of subpic_info_present_flag may indicate that subpicture information for the CLVS is present in the bitstream and that at least one subpicture may be present in each picture of the CLVS.
Here, a CLVS may mean a coded video sequence of one layer. A CLVS may be a sequence of picture units (PUs) having the same nuh_layer_id, which starts with a PU of a gradual decoding refresh (GDR) picture or an intra random access point (IRAP) picture that is not output before recovery is completed.
The syntax element sps_num_subpics_minus1 may indicate the number of subpictures. For example, a value obtained by adding 1 thereto may represent the number of subpictures belonging to each picture of the CLVS. The value of sps_num_subpics_minus1 may have a value from 0 to Ceil(pic_width_max_in_luma_samples ÷ CtbSizeY) × Ceil(pic_height_max_in_luma_samples ÷ CtbSizeY) − 1. When the value of sps_num_subpics_minus1 is not present, the value of sps_num_subpics_minus1 may be derived as 0.
A value of 1 of the syntax element sps_independent_subpics_flag may indicate that, in the CLVS, intra prediction is not performed across subpicture boundaries, inter prediction is not performed across subpicture boundaries, and the in-loop filtering operation is not performed across subpicture boundaries.
A value of 0 of the syntax element sps_independent_subpics_flag may indicate that inter prediction or the in-loop filtering operation may be performed across subpicture boundaries in the CLVS. When the value of sps_independent_subpics_flag is not present, the value of sps_independent_subpics_flag may be derived as 0.
The syntax element subpic_ctu_top_left_x[i] may indicate the horizontal position of the top-left CTU of the i-th subpicture in units of CtbSizeY. The length of the subpic_ctu_top_left_x[i] syntax element may be Ceil(Log2((pic_width_max_in_luma_samples + CtbSizeY − 1) >> CtbLog2SizeY)) bits. When subpic_ctu_top_left_x[i] is not present, its value may be derived as 0. Here, pic_width_max_in_luma_samples may be a variable indicating the maximum width of a picture expressed in units of luma samples. CtbSizeY may be a variable indicating the size of a CTB in units of luma samples. CtbLog2SizeY may be a variable indicating the value obtained by taking Log2 of the CTB size in units of luma samples.
The syntax element subpic_ctu_top_left_y[i] may indicate the vertical position of the top-left CTU of the i-th subpicture in units of CtbSizeY. The length of the subpic_ctu_top_left_y[i] syntax element may be Ceil(Log2((pic_height_max_in_luma_samples + CtbSizeY − 1) >> CtbLog2SizeY)) bits. Here, pic_height_max_in_luma_samples may be a variable indicating the maximum height of a picture expressed in units of luma samples. When subpic_ctu_top_left_y[i] is not present, its value may be derived as 0.
A value obtained by adding 1 to the syntax element subpic_width_minus1[i] may indicate the width of the i-th subpicture, in units of CtbSizeY. The length of subpic_width_minus1[i] may be Ceil(Log2((pic_width_max_in_luma_samples + CtbSizeY − 1) >> CtbLog2SizeY)) bits. When the value of subpic_width_minus1[i] is not present, the value of subpic_width_minus1[i] may be calculated as ((pic_width_max_in_luma_samples + CtbSizeY − 1) >> CtbLog2SizeY) − subpic_ctu_top_left_x[i] − 1.
A value obtained by adding 1 to the syntax element subpic_height_minus1[i] may indicate the height of the i-th subpicture, in units of CtbSizeY. The length of subpic_height_minus1[i] may be Ceil(Log2((pic_height_max_in_luma_samples + CtbSizeY − 1) >> CtbLog2SizeY)) bits. When subpic_height_minus1[i] is not present, the value of subpic_height_minus1[i] may be calculated as ((pic_height_max_in_luma_samples + CtbSizeY − 1) >> CtbLog2SizeY) − subpic_ctu_top_left_y[i] − 1.
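As an illustration of the inference rules quoted above for subpic_width_minus1[i] and subpic_height_minus1[i], the sketch below computes the default values when the syntax elements are absent; the packaging into a function is an assumption for the example.

```python
# Sketch of the inference for absent subpic_width_minus1[i] and
# subpic_height_minus1[i]; variable names mirror the text above.

def infer_subpic_size(pic_w_max, pic_h_max, ctb_size_y, ctb_log2_size_y,
                      top_left_x, top_left_y):
    pic_w_ctbs = (pic_w_max + ctb_size_y - 1) >> ctb_log2_size_y
    pic_h_ctbs = (pic_h_max + ctb_size_y - 1) >> ctb_log2_size_y
    # An absent width/height defaults to "the rest of the picture".
    width_minus1 = pic_w_ctbs - top_left_x - 1
    height_minus1 = pic_h_ctbs - top_left_y - 1
    return width_minus1, height_minus1

# A 1920x1080 picture with 128x128 CTUs spans 15x9 CTBs; a subpicture whose
# top-left CTU is (0, 0) therefore defaults to the whole picture.
assert infer_subpic_size(1920, 1080, 128, 7, 0, 0) == (14, 8)
```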
A value of 1 of the syntax element subpic_treated_as_pic_flag[i] indicates that the i-th subpicture of each coded picture in the CLVS is treated as one picture, except for the in-loop filtering operation. A value of 0 of subpic_treated_as_pic_flag[i] indicates that the i-th subpicture of each coded picture in the CLVS is not treated as one picture, except for the in-loop filtering operation. When subpic_treated_as_pic_flag[i] is not present, the value of subpic_treated_as_pic_flag[i] may be set to the value of sps_independent_subpics_flag.
A value of 1 of the syntax element loop_filter_across_subpic_enabled_flag[i] may indicate that in-loop filtering may be performed across the boundaries of the i-th subpicture in each coded picture in the CLVS. A value of 0 of loop_filter_across_subpic_enabled_flag[i] may indicate that in-loop filtering is not performed across the boundaries of the i-th subpicture in each coded picture in the CLVS. When the value of loop_filter_across_subpic_enabled_flag[i] is not present, the value of loop_filter_across_subpic_enabled_flag[i] may be determined as 1 − sps_independent_subpics_flag.
Fig. 26 is a view showing an embodiment of syntax of a picture parameter set. In the syntax of fig. 26, syntax elements are as follows.
A first value (e.g., 0) of the syntax element no_pic_partition_flag may indicate that each picture referring to the PPS may be partitioned into two or more tiles or slices. A second value (e.g., 1) of no_pic_partition_flag may indicate that no picture partitioning is applied to each picture referring to the PPS.
A value obtained by adding 5 to the syntax element pps_log2_ctu_size_minus5 may indicate the luma coding tree block size of each CTU. The value of pps_log2_ctu_size_minus5 may be restricted to be equal to sps_log2_ctu_size_minus5, which indicates the same value in the sequence parameter set.
A value obtained by adding 1 to the syntax element num_exp_tile_columns_minus1 indicates the number of explicitly provided tile column widths. The value of num_exp_tile_columns_minus1 may have a value from 0 to PicWidthInCtbsY − 1. When the value of no_pic_partition_flag is 1, the value of num_exp_tile_columns_minus1 may be derived as 0.
A value obtained by adding 1 to the syntax element num_exp_tile_rows_minus1 may indicate the number of explicitly provided tile row heights. The value of num_exp_tile_rows_minus1 may have a value from 0 to PicHeightInCtbsY − 1. When the value of no_pic_partition_flag is 1, the value of num_exp_tile_rows_minus1 may be derived as 0.
A value obtained by adding 1 to the syntax element tile_column_width_minus1[i] may indicate the width of the i-th tile column in units of CTBs. Here, i may have a value from 0 to num_exp_tile_columns_minus1 − 1. tile_column_width_minus1[num_exp_tile_columns_minus1] may be used to derive the width of the tile columns having an index equal to or greater than num_exp_tile_columns_minus1. The value of tile_column_width_minus1[i] may have a value from 0 to PicWidthInCtbsY − 1. When tile_column_width_minus1[i] is not provided from the bitstream, the value of tile_column_width_minus1[0] may be set to the value of PicWidthInCtbsY − 1.
A value obtained by adding 1 to the syntax element tile_row_height_minus1[i] may indicate the height of the i-th tile row in units of CTBs. Here, i may have a value from 0 to num_exp_tile_rows_minus1 − 1. tile_row_height_minus1[num_exp_tile_rows_minus1] may be used to derive the height of the tile rows having an index equal to or greater than num_exp_tile_rows_minus1. The value of tile_row_height_minus1[i] may have a value from 0 to PicHeightInCtbsY − 1. When tile_row_height_minus1[i] is not provided from the bitstream, the value of tile_row_height_minus1[0] may be set to the value of PicHeightInCtbsY − 1.
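By way of illustration, the sketch below shows one way the full list of tile column widths could be derived from the explicitly signaled entries, with the remaining picture width filled by repeating the last explicit width; this mirrors the description above but is not the normative derivation, and the same idea applies to tile row heights.

```python
# Sketch of deriving the full list of tile column widths (in CTBs) from the
# explicitly signaled entries: columns beyond the last explicit one reuse its
# width until the picture width is exhausted.

def derive_tile_column_widths(explicit_widths_minus1, pic_width_in_ctbs):
    widths = [w + 1 for w in explicit_widths_minus1]
    remaining = pic_width_in_ctbs - sum(widths)
    last = widths[-1]
    while remaining >= last:           # implicit columns repeat the last width
        widths.append(last)
        remaining -= last
    if remaining > 0:                  # a final, narrower column if needed
        widths.append(remaining)
    return widths

# 20 CTB columns with one explicit width of 6 CTBs -> columns of 6, 6, 6, 2.
assert derive_tile_column_widths([5], 20) == [6, 6, 6, 2]
```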
A value of 0 of the syntax element rect_slice_flag may indicate that the tiles in each slice are in raster scan order and slice information is not signaled through the picture parameter set. A value of 1 of rect_slice_flag may indicate that the tiles in each slice cover a rectangular region of the picture and slice information is signaled through the picture parameter set. Here, the variable NumTilesInPic may represent the number of tiles present in the picture. When rect_slice_flag is not present in the bitstream, the value of rect_slice_flag may be derived as 1. In addition, when the value of subpic_info_present_flag is 1, the value of rect_slice_flag may be forced to 1.
A value of 1 of the syntax element single_slice_per_subpic_flag may indicate that each subpicture consists of only one rectangular slice. A value of 0 of single_slice_per_subpic_flag may indicate that each subpicture consists of at least one rectangular slice. When single_slice_per_subpic_flag is not present in the bitstream, the value of single_slice_per_subpic_flag may be derived as 0.
A value obtained by adding 1 to the syntax element num_slices_in_pic_minus1 may indicate the number of slices in a picture. A value of 0 of the syntax element tile_idx_delta_present_flag may indicate that the tile_idx_delta[i] syntax element is not present in the picture parameter set and that all pictures referring to the picture parameter set are partitioned into rectangular slice rows and rectangular slice columns according to the slice raster scan order. A value of 1 of tile_idx_delta_present_flag may indicate that tile_idx_delta[i] syntax elements may be present in the picture parameter set and that all rectangular slices of a picture referring to the picture parameter set are specified, according to increasing values of i, in the order indicated by the values of tile_idx_delta[i]. When tile_idx_delta_present_flag is not present, the value of tile_idx_delta_present_flag may be derived as 0.
A value obtained by adding 1 to the syntax element slice_width_in_tiles_minus1[i] may indicate the width of the i-th rectangular slice in units of tile columns. The value of slice_width_in_tiles_minus1[i] may have a value from 0 to NumTileColumns − 1. Here, when i is smaller than num_slices_in_pic_minus1 and the value of NumTileColumns is 1, the value of slice_width_in_tiles_minus1[i] may be derived as 0. Here, the variable NumTileColumns may indicate the number of tile columns present in the current picture, and the variable NumTileRows may indicate the number of tile rows present in the current picture.
When the value of num_exp_slices_in_tile[i] is 0, a value obtained by adding 1 to the syntax element slice_height_in_tiles_minus1[i] may indicate the height of the i-th rectangular slice in units of tile rows. The value of slice_height_in_tiles_minus1[i] may have a value from 0 to NumTileRows − 1. When the value of i is less than num_slices_in_pic_minus1 and the value of slice_height_in_tiles_minus1[i] is not obtained from the bitstream, the value of slice_height_in_tiles_minus1[i] may be derived by the following formula.
[Formula 1]
slice_height_in_tiles_minus1[i] = NumTileRows == 1 ? 0 : slice_height_in_tiles_minus1[i − 1]
SliceTopLeftTileIdx may be a variable indicating the index of the top left tile of the slice.
The syntax element num_exp_slices_in_tile[i] may indicate the number of explicitly provided slice heights for the slices in the tile containing the i-th slice (e.g., the tile having the same tile index as SliceTopLeftTileIdx[i]). The value of num_exp_slices_in_tile[i] may have a value from 0 to RowHeight[SliceTopLeftTileIdx[i] / NumTileColumns] − 1. When num_exp_slices_in_tile[i] is not provided from the bitstream, the value of num_exp_slices_in_tile[i] may be derived as 0. Here, RowHeight[i] may be a variable indicating the height of the i-th tile row in units of CTBs. Here, when the value of num_exp_slices_in_tile[i] is 0, the tile containing the i-th slice may not be divided into a plurality of slices.
A value obtained by adding 1 to the syntax element exp_slice_height_in_ctus_minus1[i][j] may indicate the height of the j-th rectangular slice, in units of CTUs, in the tile containing the i-th slice. The value of exp_slice_height_in_ctus_minus1[i][j] may have a value from 0 to RowHeight[SliceTopLeftTileIdx[i] / NumTileColumns] − 1.
The variable NumSlicesInTile[i] may indicate the number of slices present in the tile containing the i-th slice.
The syntax element tile_idx_delta[i] may indicate the difference between the tile index of the tile containing the first CTU of the (i + 1)-th rectangular slice and the tile index of the tile containing the first CTU of the i-th rectangular slice. The value of tile_idx_delta[i] may have a value from −NumTilesInPic + 1 to NumTilesInPic − 1. When the value of tile_idx_delta[i] is not present in the bitstream, the value of tile_idx_delta[i] may be derived as 0. When the value of tile_idx_delta[i] is present, the value of tile_idx_delta[i] may be forced to a non-zero value.
A value of 1 of the syntax element loop_filter_across_tiles_enabled_flag may indicate that the in-loop filtering operation may be performed across tile boundaries in pictures referring to the picture parameter set. A value of 0 of loop_filter_across_tiles_enabled_flag may indicate that the in-loop filtering operation is not performed across tile boundaries in pictures referring to the picture parameter set.
The in-loop filtering operation may include any of the deblocking filter, the sample adaptive offset (SAO) filter, or the adaptive loop filter (ALF). When loop_filter_across_tiles_enabled_flag is not present in the bitstream, the value of loop_filter_across_tiles_enabled_flag may be derived as 1.
A value of 1 of the syntax element loop_filter_across_slices_enabled_flag may indicate that the in-loop filtering operation may be performed across slice boundaries in pictures referring to the picture parameter set. A value of 0 of loop_filter_across_slices_enabled_flag may indicate that the in-loop filtering operation is not performed across slice boundaries in pictures referring to the picture parameter set. The in-loop filtering operation may include any of the deblocking filter, the sample adaptive offset (SAO) filter, or the adaptive loop filter (ALF). When loop_filter_across_slices_enabled_flag is not present in the bitstream, the value of loop_filter_across_slices_enabled_flag may be derived as 1.
Fig. 27 is a view showing an embodiment of syntax of a slice header. In the syntax of fig. 27, syntax elements are as follows.
The syntax element slice_subpic_id may indicate the subpicture ID of the subpicture containing the slice. When the value of slice_subpic_id is present in the bitstream, the value of the variable CurrSubpicIdx may be derived such that SubpicIdVal[CurrSubpicIdx] is equal to the value of slice_subpic_id. Otherwise (slice_subpic_id is not present in the bitstream), the value of CurrSubpicIdx may be derived as 0. The length of slice_subpic_id may be sps_subpic_id_len_minus1 + 1 bits. Here, NumSlicesInSubpic[i] may be a variable indicating the number of slices in the i-th subpicture. The variable CurrSubpicIdx may indicate the index of the current subpicture.
The syntax element slice_address indicates the slice address of the slice. When slice_address is not provided, the value of slice_address may be derived as 0.
Further, when the value of rect_slice_flag is 0, slice_address may be equal to the raster scan tile index of the first tile in the slice, the length of the slice_address syntax element may be Ceil(Log2(NumTilesInPic)) bits, and slice_address may have a value from 0 to NumTilesInPic − 1. Otherwise (when the value of rect_slice_flag is a non-zero value, e.g., 1), the address of the slice may be the subpicture-level slice index of the slice, the length of the slice_address syntax element may be Ceil(Log2(NumSlicesInSubpic[CurrSubpicIdx])) bits, and the slice_address syntax element may have a value from 0 to NumSlicesInSubpic[CurrSubpicIdx] − 1.
The syntax element sh_extra_bit[i] may have a value of 0 or 1. The decoding apparatus may perform decoding regardless of the value of sh_extra_bit[i]. To this end, the encoding apparatus needs to generate the bitstream such that decoding is performed regardless of the value of sh_extra_bit[i]. Here, NumExtraShBits may be a variable indicating the number of bits for signaling further information in the slice header.
A value obtained by adding 1 to the syntax element num_tiles_in_slice_minus1 may indicate the number of tiles in the slice (if present). The value of num_tiles_in_slice_minus1 may have a value from 0 to NumTilesInPic − 1.
The variable NumCtusInCurrSlice indicating the number of CTUs in the current slice and the list CtbAddrInCurrSlice[i] indicating the picture raster scan address of the i-th CTB in the slice (where i has a value from 0 to NumCtusInCurrSlice − 1) may be derived as follows.
[Table 2] (the derivation algorithm is presented as an image in the original publication)
The variables SubpicLeftBoundaryPos, SubpicTopBoundaryPos, SubpicRightBoundaryPos, and SubpicBotBoundaryPos may be derived according to the following algorithm.
[Table 3] (the derivation algorithm is presented as an image in the original publication)
Improvements in picture segmentation signalling
The above-described signaling related to picture partitioning has a problem in that unnecessary information is signaled when a slice is a rectangular slice. For example, when the slices are quadrilateral (e.g., rectangular) slices, the width of an individual slice may be signaled in units of tiles. However, when the top-left tile of a slice is a tile of the last tile column, the width of the slice cannot have a value other than one tile unit. For example, in this case, the width of the slice may only have the width value derived in units of one tile. Thus, the width of such a slice may not be signaled, or may be limited to one tile unit.
Similarly, when the slices are quadrilateral (e.g., rectangular) slices, the height of an individual slice may be signaled in units of tiles. However, when the tile at the top-left position of the slice is a tile of the last tile row, the height of the slice cannot have a value other than one tile unit. Thus, the height of such a slice may not be signaled, or may be limited to one tile unit.
In order to solve the above problem, the following methods may be applied. The following embodiments are applicable when the slices are quadrilateral (e.g., rectangular) slices and the width and/or height of an individual slice is signaled in units of tiles. The following methods may be applied alone or may be combined with at least one other embodiment (a combined parsing sketch is given after Method 4 below).
Method 1. When the first tile of a rectangular slice (e.g., the tile in the upper-left corner) is a tile located in the last tile column of the picture, signaling of the width of the slice may not be provided. In this case, the width of the slice may be derived as one tile unit.
For example, the syntax element slice_width_in_tiles_minus1[i] may not be present in the bitstream. The value of the syntax element slice_width_in_tiles_minus1[i] may be derived as 0.
Method 2. Signaling of the width of a slice can be provided even when the first tile of a rectangular slice (e.g., the top left tile) is the tile that is located in the last tile column of the picture. However, in this case, the width of the slice may be limited to one tile unit.
For example, the syntax element slice_width_in_tiles_minus1[i] may be present in the bitstream and thus parsed. However, the value of the syntax element slice_width_in_tiles_minus1[i] may be limited to 0.
Method 3. When the first tile of a rectangular slice (e.g., the tile in the upper-left corner) is a tile located in the last tile row of the picture, signaling of the height of the slice may not be provided. In this case, the height of the slice may be derived as one tile unit.
For example, the syntax element slice_height_in_tiles_minus1[i] may not be present in the bitstream. The value of the syntax element slice_height_in_tiles_minus1[i] may be derived as 0.
Method 4. When the first tile of a rectangular slice (e.g., the tile in the upper left corner) is the tile located in the last tile row of the picture, signaling of the height of the slice can be provided. However, in this case, the height of the slice may be limited to one tile unit.
For example, the syntax element slice_height_in_tiles_minus1[i] may be present in the bitstream and thus parsed. However, the value of the syntax element slice_height_in_tiles_minus1[i] may be restricted to be equal to 0.
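The sketch below combines Methods 1 to 4 into a single hypothetical parsing routine: the width/height syntax element is skipped (Methods 1 and 3) or constrained to 0 (Methods 2 and 4) when the slice's top-left tile lies in the last tile column/row. read_ue() is a stand-in for the actual entropy-decoding call, and the flag selecting between the two behaviors is an assumption for the example.

```python
# Combined sketch of Methods 1-4; not the normative parsing process.

def parse_rect_slice_size(read_ue, top_left_tile_idx,
                          num_tile_cols, num_tile_rows,
                          signal_when_redundant=False):
    in_last_col = top_left_tile_idx % num_tile_cols == num_tile_cols - 1
    in_last_row = top_left_tile_idx // num_tile_cols == num_tile_rows - 1

    if not in_last_col:
        width_minus1 = read_ue()       # slice_width_in_tiles_minus1[i]
    elif signal_when_redundant:
        width_minus1 = read_ue()
        assert width_minus1 == 0       # Method 2: value constrained to 0
    else:
        width_minus1 = 0               # Method 1: inferred, not signaled

    if not in_last_row:
        height_minus1 = read_ue()      # slice_height_in_tiles_minus1[i]
    elif signal_when_redundant:
        height_minus1 = read_ue()
        assert height_minus1 == 0      # Method 4: value constrained to 0
    else:
        height_minus1 = 0              # Method 3: inferred, not signaled

    return width_minus1, height_minus1
```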
In one embodiment, the above embodiments are applicable to the encoding method and the decoding method shown in figs. 28 and 29. The encoding apparatus according to one embodiment may derive the slices and/or tiles in the current picture (S2810). In addition, the encoding apparatus may encode the current picture based on the derived slices and/or tiles (S2820).
Similarly, the decoding apparatus according to the embodiment may acquire video/image information from the bitstream (S2910). In addition, the decoding apparatus may derive the slices and/or tiles present in the current picture based on the video/image information, including information on the slices and/or tiles (S2920). In addition, the decoding apparatus may reconstruct and/or decode the current picture based on the slice and/or the tile (S2930).
For the above-described processes of the encoding apparatus and the decoding apparatus, the information on the slice and/or the tile may include the above-described information and syntax. The video or image information may include HLS. The HLS may include information about the slices and/or information about the patches. The HLS may also include information about the sprite. The information on the slice may include information specifying at least one slice belonging to the current picture. In addition, the information about the tiles may include information specifying at least one tile belonging to the current picture. The information on the sub-picture may include information specifying at least one sub-picture belonging to the current picture. A tile including at least one slice may exist in one picture.
For example, at S2930 of fig. 29, the current picture may be reconstructed and/or decoded based on the derived slices and/or patches. By dividing one picture, encoding and decoding efficiency can be obtained in various aspects.
For example, a picture may be split for parallel processing and error resilience. In the case of parallel processing, some embodiments running on a multi-core CPU may require partitioning of the source picture into tiles and/or slices. Individual slices and/or tiles may then be processed in parallel on different cores. This is very efficient for performing high-resolution real-time video coding, which may not be achievable otherwise. In addition, by reducing the information shared between tiles, this partitioning has the advantage of reducing memory constraints. Such a partitioning mechanism is useful for parallel architectures, since tiles can be distributed to different threads while parallel processing is performed. For example, in deriving motion information in inter prediction, the use of neighboring blocks that exist in different slices and/or tiles may be restricted. The context information used to encode information and/or syntax elements may be initialized for each slice and/or tile.
Error recovery may be achieved by applying Unequal Error Protection (UEP) to coded tiles and/or slices.
Embodiment 1
Hereinafter, embodiments based on the above-described methods 1 and 3 will be described. The following embodiments may be applied to improve encoding/decoding techniques, such as the VVC specification.
In one embodiment, a syntax table for signaling a picture parameter set may be set as shown in fig. 30. In another embodiment, a syntax table for signaling a picture parameter set may be set as shown in fig. 31.
In the embodiment of fig. 30, for i having a value from 0 to num_slices_in_pic_minus1 − 1, when the value of NumTileColumns is greater than 1 and the value of SliceTopLeftTileIdx[i] % NumTileColumns is not NumTileColumns − 1, the syntax element slice_width_in_tiles_minus1[i] may be sequentially acquired for each such i.
In addition, for i having a value from 0 to num_slices_in_pic_minus1 − 1, when the value of NumTileRows is greater than 1, tile_idx_delta_present_flag is 1 or the value of SliceTopLeftTileIdx[i] % NumTileColumns is 0, and the value of SliceTopLeftTileIdx[i] / NumTileColumns is not NumTileRows − 1, the syntax element slice_height_in_tiles_minus1[i] may be sequentially acquired for each such i.
In the embodiments of figs. 30 and 31, the syntax element slice_width_in_tiles_minus1[i] may be a syntax element indicating the width of the i-th rectangular slice. For example, a value obtained by adding 1 to slice_width_in_tiles_minus1[i] may indicate the width of the i-th rectangular slice in units of tile columns. The value of slice_width_in_tiles_minus1[i] may have a value from 0 to NumTileColumns − 1. When slice_width_in_tiles_minus1[i] is not acquired from the bitstream, the value of slice_width_in_tiles_minus1[i] may be derived as 0.
When the definition of slice_width_in_tiles_minus1[i] is changed as described above, the constraint that "when i is smaller than num_slices_in_pic_minus1 and the value of NumTileColumns is equal to 1, the value of slice_width_in_tiles_minus1[i] is derived as 0" may be omitted. Accordingly, as in the embodiment of fig. 31, the condition "NumTileColumns > 1" may be deleted from the picture parameter set syntax.
slice_height_in_tiles_minus1[i] may be a syntax element indicating the height of the i-th rectangular slice. For example, when the value of num_exp_slices_in_tile[i] is 0, a value obtained by adding 1 to slice_height_in_tiles_minus1[i] may indicate the height of the i-th rectangular slice in units of tile rows. The value of slice_height_in_tiles_minus1[i] may have a value from 0 to NumTileRows − 1.
When slice_height_in_tiles_minus1[i] is not obtained from the bitstream, the value of slice_height_in_tiles_minus1[i] may be derived as follows.
First, when the value of NumTileRows is 1 or the value of SliceTopLeftTileIdx[i] % NumTileColumns is NumTileColumns − 1, the value of slice_height_in_tiles_minus1[i] may be derived as 0.
Otherwise (e.g., when the value of NumTileRows is not 1 and the value of SliceTopLeftTileIdx[i] % NumTileColumns is not NumTileColumns − 1), the value of slice_height_in_tiles_minus1[i] may be derived as slice_height_in_tiles_minus1[i − 1]. For example, the value of slice_height_in_tiles_minus1[i] may be set to slice_height_in_tiles_minus1[i − 1], the height value of the previous slice. For example, the values of slice_height_in_tiles_minus1[i] for all slices in one tile may be set equally.
When the definition of slice_height_in_tiles_minus1[i] is changed as described above, the existing constraint that "when i is smaller than num_slices_in_pic_minus1 and the value of NumTileRows is equal to 1, the value of slice_height_in_tiles_minus1[i] is derived as 0" may be omitted. Accordingly, as in the embodiment of fig. 31, the condition "NumTileRows > 1" may be removed from the picture parameter set syntax.
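A minimal sketch of this inference, mirroring the conditions as stated in the text above (the function packaging and names are assumptions):

```python
# Sketch of the inference for slice_height_in_tiles_minus1[i] described in
# Embodiment 1; the conditions mirror the text above, not any other wording.

def infer_slice_height_in_tiles_minus1(i, prev_heights, slice_top_left_tile_idx,
                                       num_tile_columns, num_tile_rows):
    if num_tile_rows == 1 or \
       slice_top_left_tile_idx % num_tile_columns == num_tile_columns - 1:
        return 0                      # single tile row, or last tile column
    return prev_heights[i - 1]        # reuse the previous slice's height
```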
Embodiment 2
Hereinafter, embodiments based on the above-described methods 2 and 4 will be described. The following embodiments may be applied to improve encoding/decoding techniques, such as the VVC specification.
In one embodiment, slice_width_in_tiles_minus1[i] may be a syntax element indicating the width of the i-th rectangular slice. For example, a value obtained by adding 1 to slice_width_in_tiles_minus1[i] may indicate the width of the i-th rectangular slice in units of tile columns. The value of slice_width_in_tiles_minus1[i] may have a value from 0 to NumTileColumns − 1. When slice_width_in_tiles_minus1[i] is not acquired from the bitstream, the value of slice_width_in_tiles_minus1[i] may be derived as 0.
At this time, when i is less than num_slices_in_pic_minus1 and the value of NumTileColumns is equal to 1, the value of slice_width_in_tiles_minus1[i] may be derived as 0. In addition, for bitstream conformance, the value of slice_width_in_tiles_minus1[i] may be forced to 0 when the first tile of the i-th rectangular slice belongs to the last tile column.
slice_height_in_tiles_minus1[i] may be a syntax element indicating the height of the i-th rectangular slice. For example, when the value of num_exp_slices_in_tile[i] is 0, a value obtained by adding 1 to slice_height_in_tiles_minus1[i] may indicate the height of the i-th rectangular slice in units of tile rows. The value of slice_height_in_tiles_minus1[i] may have a value from 0 to NumTileRows − 1.
At this time, when i is less than num_slices_in_pic_minus1 and the value of slice_height_in_tiles_minus1[i] is not acquired from the bitstream, the value of slice_height_in_tiles_minus1[i] may be determined according to the value of NumTileRows. For example, it may be determined as shown below.
[Formula 2]
slice_height_in_tiles_minus1[i] = NumTileRows == 1 ? 0 : slice_height_in_tiles_minus1[i − 1]
In addition, for bitstream conformance, the value of slice_height_in_tiles_minus1[i] may be forced to 0 when the first tile of the i-th rectangular slice belongs to the last tile row.
Encoding and decoding methods
Hereinafter, an image encoding method performed by the image encoding apparatus according to the embodiment and an image decoding method performed by the image decoding apparatus will be described.
First, the operation of the decoding apparatus will be described. The image decoding apparatus according to an embodiment may include a memory and a processor, and may perform decoding through the operation of the processor. Fig. 32 is a view illustrating a decoding method according to an embodiment.
The decoding apparatus according to an embodiment may acquire, from the bitstream, a syntax element no_pic_partition_flag indicating the availability of partitioning of the current picture. As described above, the decoding apparatus may determine the availability of partitioning of the current picture based on the value of no_pic_partition_flag (S3210).
When the segmentation of the current picture is available, the decoding apparatus may acquire, from the bitstream, a syntax element num_exp_tile_rows_minus1 indicating the number of tile rows segmenting the current picture and a syntax element num_exp_tile_columns_minus1 indicating the number of tile columns, and may determine the numbers of tile rows and tile columns therefrom as described above (S3220).
Based on the number of tile columns, the decoding apparatus may acquire, from the bitstream, a syntax element tile_column_width_minus1[i] indicating the width of each tile column segmenting the current picture, and may determine the width of each tile column therefrom as described above (S3230).
Based on the number of tile rows, the decoding apparatus may acquire, from the bitstream, a syntax element tile_row_height_minus1[i] indicating the height of each tile row partitioning the current picture, and may determine the height of each tile row therefrom (S3240). In addition, the decoding apparatus may calculate the number of tiles segmenting the current picture by multiplying the number of tile columns by the number of tile rows.
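A minimal decoder-side sketch of steps S3220 to S3240 follows. It adopts the simplification used in the text (every tile column width and tile row height is signaled explicitly), and read_ue is a hypothetical exp-Golomb reader standing in for the real bitstream parser.

    def parse_tile_grid(read_ue):
        num_tile_columns = read_ue() + 1  # num_exp_tile_columns_minus1 + 1
        num_tile_rows = read_ue() + 1     # num_exp_tile_rows_minus1 + 1
        column_widths = [read_ue() + 1 for _ in range(num_tile_columns)]  # tile_column_width_minus1[i] + 1
        row_heights = [read_ue() + 1 for _ in range(num_tile_rows)]       # tile_row_height_minus1[i] + 1
        num_tiles_in_pic = num_tile_columns * num_tile_rows  # tiles = columns x rows
        return column_widths, row_heights, num_tiles_in_pic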
Next, the decoding apparatus may acquire a syntax element rect_slice_flag indicating whether the current picture is divided into rectangular slices, based on whether the number of tiles dividing the current picture is greater than 1, and may determine whether the current picture is divided into rectangular slices from the value thereof as described above (S3250).
Next, the decoding apparatus may acquire, from the bitstream, a syntax element num_slices_in_pic_minus1 indicating the number of slices dividing the current picture, based on whether the current picture is divided into rectangular slices, and may determine the number of slices dividing the current picture therefrom as described above (S3260).
Next, the decoding apparatus may acquire, from the bitstream, size information indicating the size of each slice dividing the current picture, as many times as the number of slices dividing the current picture (S3270).
Here, the size information may include a syntax element slice_width_in_tiles_minus1[i], which is width information indicating the width of the slice, and a syntax element slice_height_in_tiles_minus1[i], which is height information indicating the height of the slice. slice_width_in_tiles_minus1[i] may indicate the width of the slice in units of tile columns, and slice_height_in_tiles_minus1[i] may indicate the height of the slice in units of tile rows.
Here, when the decoding apparatus acquires the size information of the current slice (e.g., the i-th slice) from the bitstream, the decoding apparatus may acquire slice_width_in_tiles_minus1[i] from the bitstream based on whether the top-left tile of the current slice belongs to the last tile column of the current picture.
For example, when the top-left tile index (e.g., SliceTopLeftTileIdx) of the current slice is not a tile index corresponding to the last of the tile columns belonging to the current picture, slice_width_in_tiles_minus1[i] may be acquired from the bitstream. However, when the top-left tile index of the current slice is a tile index corresponding to the last of the tile columns belonging to the current picture, slice_width_in_tiles_minus1[i] may not be acquired from the bitstream and may be determined to be 0.
Similarly, the decoding apparatus may acquire slice_height_in_tiles_minus1[i] from the bitstream based on whether the top-left tile of the current slice belongs to the last tile row of the current picture.
For example, when the top-left tile index of the current slice is not a tile index corresponding to the last of the tile rows belonging to the current picture, slice_height_in_tiles_minus1[i] may be acquired from the bitstream. However, when the top-left tile index of the current slice is a tile index corresponding to the last of the tile rows belonging to the current picture, slice_height_in_tiles_minus1[i] may not be acquired from the bitstream and may be determined to be 0.
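The conditional acquisition described above may be sketched as follows. read_ue is again a hypothetical bitstream reader, and the index arithmetic mirrors the text: SliceTopLeftTileIdx[i] % NumTileColumns gives the tile column, and SliceTopLeftTileIdx[i] / NumTileColumns gives the tile row, of the top-left tile of the i-th slice.

    def parse_slice_size(read_ue, i, SliceTopLeftTileIdx, NumTileColumns, NumTileRows):
        col = SliceTopLeftTileIdx[i] % NumTileColumns
        row = SliceTopLeftTileIdx[i] // NumTileColumns
        # Width is absent (and inferred to be 0) when the top-left tile is in the last tile column.
        width_minus1 = read_ue() if col != NumTileColumns - 1 else 0
        # Height is absent (and inferred to be 0) when the top-left tile is in the last tile row.
        height_minus1 = read_ue() if row != NumTileRows - 1 else 0
        return width_minus1, height_minus1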
Next, the decoding apparatus may determine the size of each slice dividing the current picture based on the size information and decode the determined slices, thereby decoding the image. For example, the CTUs included in a slice having the determined size may be decoded using inter prediction or intra prediction, thereby decoding the slice (S3280).
Further, SliceTopLeftTileIdx may be a variable indicating the top-left tile of a slice, and may be determined by the algorithm of figs. 33 and 34. The algorithms of figs. 33 and 34 constitute one continuous algorithm.
Next, the operation of the encoding apparatus will be described. The image encoding apparatus according to the embodiment may include a memory and a processor, and the encoding apparatus may perform encoding, by operation of the processor, in a manner corresponding to the decoding by the decoding apparatus. For example, as shown in fig. 35, the encoding apparatus may encode the current picture. First, the encoding apparatus may determine the tile columns and tile rows of the current picture (S3510). Next, the slices dividing the current picture may be determined (S3520). Next, the encoding apparatus may generate a bitstream including predetermined information including the size information of the slices (S3530). For example, the encoding apparatus may generate a bitstream including no_pic_partition_flag, num_exp_tile_rows_minus1, num_exp_tile_columns_minus1, tile_column_width_minus1[i], tile_row_height_minus1[i], rect_slice_flag, num_slices_in_pic_minus1, slice_width_in_tiles_minus1[i], and slice_height_in_tiles_minus1[i], which are the syntax elements acquired from the bitstream by the decoding apparatus.
At this time, the size information may be included in the bitstream based on whether the current slice belongs to the last tile column or the last tile row of the current picture. For example, the encoding apparatus may encode the bitstream such that slice_width_in_tiles_minus1[i] and slice_height_in_tiles_minus1[i] correspond to the description of the decoding apparatus, based on whether the top-left tile of the current slice belongs to the last tile column and/or the last tile row. Here, the current slice may be a rectangular slice.
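Mirroring the decoder, the encoder-side behavior of step S3530 may be sketched as follows. write_ue is a hypothetical exp-Golomb writer, and width_in_tiles / height_in_tiles are assumed slice sizes in tile units; the size syntax elements are written only when the decoder would actually read them.

    def write_slice_size(write_ue, i, SliceTopLeftTileIdx, NumTileColumns, NumTileRows,
                         width_in_tiles, height_in_tiles):
        if SliceTopLeftTileIdx[i] % NumTileColumns != NumTileColumns - 1:
            write_ue(width_in_tiles[i] - 1)   # slice_width_in_tiles_minus1[i]
        if SliceTopLeftTileIdx[i] // NumTileColumns != NumTileRows - 1:
            write_ue(height_in_tiles[i] - 1)  # slice_height_in_tiles_minus1[i]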
Application embodiments
While the exemplary methods of the present disclosure are described as a series of operations for clarity of description, this is not intended to limit the order in which the steps are performed; the steps may be performed simultaneously or in a different order as necessary. To implement the method according to the present disclosure, additional steps may be included in addition to the described steps, some of the described steps may be omitted, or some of the described steps may be omitted while additional steps are included.
In the present disclosure, an image encoding apparatus or an image decoding apparatus that performs a predetermined operation (step) may perform an operation (step) of confirming an execution condition or situation of the corresponding operation (step). For example, if it is described that a predetermined operation is performed when a predetermined condition is satisfied, the image encoding apparatus or the image decoding apparatus may perform the predetermined operation after determining whether the predetermined condition is satisfied.
The various embodiments of the present disclosure are not a list of all possible combinations and are intended to describe representative aspects of the present disclosure, and the items described in the various embodiments may be applied independently or in combinations of two or more.
Various embodiments of the present disclosure may be implemented in hardware, firmware, software, or a combination thereof. In the case of implementing the present disclosure by hardware, the present disclosure may be implemented by application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, and the like.
In addition, the image decoding apparatus and the image encoding apparatus to which embodiments of the present disclosure are applied may be included in a multimedia broadcast transmitting and receiving device, a mobile communication terminal, a home theater video device, a digital cinema video device, a surveillance camera, a video chat device, a real-time communication device such as video communication, a mobile streaming device, a storage medium, a video camera, a video-on-demand (VoD) service providing device, an over-the-top (OTT) video device, an Internet streaming service providing device, a three-dimensional (3D) video device, a video telephony device, a medical video device, and the like, and may be used to process a video signal or a data signal. For example, OTT video devices may include game consoles, Blu-ray players, Internet-access televisions, home theater systems, smartphones, tablet PCs, digital video recorders (DVRs), and the like.
Fig. 36 is a view showing a content streaming system to which an embodiment of the present disclosure can be applied.
As shown in fig. 36, a content streaming system to which an embodiment of the present disclosure is applied may mainly include an encoding server, a streaming server, a web server, a media storage device, a user device, and a multimedia input device.
The encoding server compresses content input from a multimedia input device such as a smart phone, a camera, a camcorder, etc., into digital data to generate a bitstream and transmits the bitstream to the streaming server. As another example, when a multimedia input device such as a smart phone, a camera, a camcorder, etc. directly generates a bitstream, an encoding server may be omitted.
The bitstream may be generated by an image encoding method or an image encoding apparatus to which the embodiments of the present disclosure are applied, and the streaming server may temporarily store the bitstream in the course of transmitting or receiving the bitstream.
The streaming server transmits multimedia data to the user device based on a user request made through the web server, and the web server serves as a medium informing the user of available services. When the user requests a desired service from the web server, the web server delivers the request to the streaming server, and the streaming server transmits multimedia data to the user. In this case, the content streaming system may include a separate control server, which serves to control commands/responses between devices in the content streaming system.
The streaming server may receive content from a media storage device and/or an encoding server. For example, when receiving content from an encoding server, the content may be received in real time. In this case, in order to provide a smooth streaming service, the streaming server may store the bitstream for a predetermined time.
Examples of user devices may include mobile phones, smart phones, laptop computers, digital broadcast terminals, personal Digital Assistants (PDAs), portable Multimedia Players (PMPs), navigation devices, slate PCs, tablet PCs, ultrabooks, wearable devices (e.g., smart watches, smart glasses, head-mounted displays), digital televisions, desktop computers, digital signage, and so forth.
Each server in the content streaming system may operate as a distributed server, in which case data received from each server may be distributed.
The scope of the present disclosure includes software or machine-executable instructions (e.g., operating systems, applications, firmware, programs, etc.) that enable operations of methods according to various embodiments to be performed on a device or computer, and non-transitory computer-readable media on which such software or instructions are stored and executable on a device or computer.
Industrial applicability
Embodiments of the present disclosure may be used to encode or decode an image.

Claims (15)

1. An image decoding method performed by an image decoding apparatus, the image decoding method comprising the steps of:
obtaining size information indicating a size of a current slice corresponding to at least a portion of a current picture from a bitstream; and
determining the size of the current slice based on the size information,
wherein the size information includes width information indicating a width of the current slice in a unit of tile columns and height information indicating a height of the current slice in a unit of tile rows, and
wherein the step of obtaining the size information is performed based on whether the current slice belongs to a last tile column or a last tile row of the current picture.
2. The image decoding method according to claim 1, wherein the width information of the current slice is not obtained from the bitstream based on a top-left tile of the current slice belonging to a last tile column of the current picture.
3. The image decoding method of claim 1, wherein the width information of the current slice is obtained from the bitstream based on a top-left tile of the current slice not belonging to a last tile column of the current picture.
4. The image decoding method according to claim 1, wherein the width information of the current slice is not obtained from the bitstream based on a top-left tile of the current slice belonging to a last tile column of the current picture, and the width information is determined to be a predetermined value.
5. The image decoding method according to claim 4, wherein the predetermined value indicates one tile column.
6. The image decoding method according to claim 1, wherein the height information of the current slice is not obtained from the bitstream based on a top-left tile of the current slice belonging to a last tile row of the current picture.
7. The image decoding method of claim 1, wherein the height information of the current slice is obtained from the bitstream based on a top-left tile of the current slice not belonging to a last tile row of the current picture.
8. The image decoding method according to claim 1, wherein the height information of the current slice is not obtained from the bitstream based on a top-left tile of the current slice belonging to a last tile row of the current picture, and the height information is determined to be a predetermined value.
9. The image decoding method according to claim 8, wherein the predetermined value indicates one tile row.
10. The image decoding method according to any one of claims 5 and 9, wherein the current slice is a rectangular slice.
11. The image decoding method according to claim 1,
wherein the step of acquiring the size information is performed based on a number of slices dividing the current picture,
wherein the number of slices that segment the current picture is determined by:
determining availability of segmentation of the current picture;
determining a number of tile rows and a number of tile columns that segment the current picture based on the availability of the segmentation of the current picture;
determining a width of each tile column segmenting the current picture based on the number of tile columns;
determining a height of each tile row partitioning the current picture based on the number of tile rows;
determining whether the current picture is divided into rectangular slices based on the number of tiles dividing the current picture; and
obtaining, from the bitstream, the number of slices dividing the current picture based on whether the current picture is divided into rectangular slices.
12. An image decoding apparatus comprising a memory and at least one processor,
wherein the at least one processor performs the following:
obtaining size information indicating a size of a current slice corresponding to at least a portion of a current picture from a bitstream; and
determining the size of the current slice based on the size information,
wherein the size information includes width information indicating a width of the current slice in a unit of tile columns and height information indicating a height of the current slice in a unit of tile rows, and
wherein the size information is acquired based on whether the current slice belongs to a last tile column or a last tile row of the current picture.
13. An image encoding method performed by an image encoding apparatus, the image encoding method comprising the steps of:
determining a current slice corresponding to at least a portion of a current picture; and
generating a bitstream including size information of the current slice,
wherein the size information includes width information indicating a width of the current slice in a unit of tile columns and height information indicating a height of the current slice in a unit of tile rows, and
wherein the step of generating the bitstream is performed based on whether the current slice belongs to a last tile column or a last tile row of the current picture.
14. The image encoding method of claim 13, wherein the current slice is a rectangular slice.
15. A computer-readable recording medium storing a bitstream that causes a decoding apparatus to execute the image decoding method according to claim 1.
CN202180033162.3A 2020-03-09 2021-03-08 Image encoding/decoding method and apparatus for selectively encoding size information of rectangular slice, and method of transmitting bitstream Pending CN115552896A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202062987336P 2020-03-09 2020-03-09
US62/987,336 2020-03-09
PCT/KR2021/002822 WO2021182816A1 (en) 2020-03-09 2021-03-08 Image encoding/decoding method and apparatus for selectively encoding size information of rectangular slice, and method for transmitting bitstream

Publications (1)

Publication Number Publication Date
CN115552896A 2022-12-30

Family

ID=77671921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180033162.3A Pending CN115552896A (en) 2020-03-09 2021-03-08 Image encoding/decoding method and apparatus for selectively encoding size information of rectangular slice, and method of transmitting bitstream

Country Status (6)

Country Link
US (2) US11743469B2 (en)
EP (1) EP4120681A4 (en)
JP (1) JP2023517623A (en)
KR (1) KR20220145407A (en)
CN (1) CN115552896A (en)
WO (1) WO2021182816A1 (en)

Also Published As

Publication number Publication date
US20230036189A1 (en) 2023-02-02
KR20220145407A (en) 2022-10-28
JP2023517623A (en) 2023-04-26
US11743469B2 (en) 2023-08-29
EP4120681A4 (en) 2024-03-06
EP4120681A1 (en) 2023-01-18
WO2021182816A1 (en) 2021-09-16
US20240031575A1 (en) 2024-01-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination