KR20150037660A - A method and an apparatus for encoding and decoding a multi-layer video signal - Google Patents

A method and an apparatus for encoding and decoding a multi-layer video signal

Info

Publication number
KR20150037660A
Authority
KR
South Korea
Prior art keywords
layer
tile
prediction
inter
picture
Prior art date
Application number
KR20140130927A
Other languages
Korean (ko)
Inventor
이배근
김주영
Original Assignee
주식회사 케이티
Priority date
Filing date
Publication date
Application filed by 주식회사 케이티 filed Critical 주식회사 케이티
Publication of KR20150037660A publication Critical patent/KR20150037660A/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/34Scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]


Abstract

A method for decoding a multi-layer video signal according to the present invention includes determining a corresponding picture belonging to a lower layer that is used for inter-layer prediction of a current picture belonging to an upper layer, and performing inter-layer prediction of the current picture using the determined corresponding picture. The current picture performs inter-layer prediction restrictively, based on a constrained prediction identifier.

Description

METHOD AND APPARATUS FOR ENCODING AND DECODING A MULTI-LAYER VIDEO SIGNAL

The present invention relates to a multi-layer video signal encoding / decoding method and apparatus.

Recently, demand for high-resolution, high-quality images such as high definition (HD) and ultra high definition (UHD) images has been increasing in various applications. As image data becomes higher in resolution and quality, the amount of data increases relative to existing image data. Therefore, when such image data is transmitted over a medium such as a wired/wireless broadband line or stored on an existing storage medium, transmission and storage costs increase. High-efficiency image compression techniques can be used to solve these problems that accompany high-resolution, high-quality image data.

Image compression techniques include inter-picture prediction, which predicts pixel values in the current picture from a picture preceding or following the current picture; intra-picture prediction, which predicts pixel values in the current picture using pixel information within the current picture; and entropy encoding, which assigns short codes to values with a high frequency of appearance and long codes to values with a low frequency of appearance. Image data can be effectively compressed, transmitted, and stored using such image compression techniques.
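The entropy encoding principle above can be made concrete with a small example. The following is a minimal, illustrative Python sketch (not part of the patent text) that assigns shorter prefix-free codes to more frequently appearing values; the function name and the simple unary-style code construction are assumptions for illustration, not an actual entropy coder.

```python
# Minimal sketch: shorter codes for more frequent symbols, the principle
# behind the entropy encoding described above.
from collections import Counter

def frequency_ranked_codes(symbols):
    """Map each symbol to a prefix-free code; frequent symbols get shorter codes."""
    ranked = [s for s, _ in Counter(symbols).most_common()]
    # Rank 0 -> "0", rank 1 -> "10", rank 2 -> "110", ... (a simple prefix-free code)
    return {s: "1" * i + "0" for i, s in enumerate(ranked)}

codes = frequency_ranked_codes([0, 0, 0, 1, 1, 5])
# e.g. {0: '0', 1: '10', 5: '110'} -- the most frequent value 0 gets the shortest code
```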

Meanwhile, along with the increasing demand for high-resolution images, demand for stereoscopic image content as a new image service is also increasing. Video compression techniques for effectively providing high-resolution and ultra-high-resolution stereoscopic content are under discussion.

An object of the present invention is to provide a method and apparatus for determining an interlayer reference picture of a current picture of an upper layer in encoding / decoding a multi-layer video signal.

It is an object of the present invention to provide a method and apparatus for up-sampling a picture of a lower layer in encoding / decoding a multi-layer video signal.

An object of the present invention is to provide a method and apparatus for effectively encoding texture information of an upper layer through inter-layer prediction in encoding / decoding a multi-layer video signal.

An object of the present invention is to provide a limited inter-layer prediction method and apparatus based on inter-layer tile alignment in encoding / decoding a multi-layer video signal.

A method and apparatus for decoding a multi-layer video signal according to the present invention determine a corresponding picture belonging to a lower layer that is used for inter-layer prediction of a current picture belonging to an upper layer, and perform inter-layer prediction of the current picture using the determined corresponding picture.

In the method and apparatus for decoding a multi-layer video signal according to the present invention, the current picture restrictively performs inter-layer prediction based on a constrained prediction identifier.

In the method and apparatus for decoding a multi-layer video signal according to the present invention, the constrained prediction identifier specifies one of a first constrained prediction mode, a second constrained prediction mode, and a third constrained prediction mode.

The first constrained prediction mode is a mode in which a block in a tile set belonging to the current picture does not perform inter-layer prediction using the corresponding picture of the lower layer. The second constrained prediction mode is a mode in which a block in a tile set belonging to the current picture performs inter-layer prediction using only samples of the tile set belonging to the corresponding picture of the lower layer. The third constrained prediction mode is a mode in which inter-layer prediction is performed using the corresponding picture of the lower layer without the restrictions of the first and second constrained prediction modes.
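To make the three constrained prediction modes concrete, the following is a hypothetical Python sketch of the reference-sample restriction each mode imposes; the enum and function names are illustrative assumptions, not syntax from the patent.

```python
# Hypothetical sketch of the three constrained prediction modes described above.
from enum import Enum

class ConstrainedILPMode(Enum):
    NO_ILP_FOR_TILE_SET = 1   # first mode: tile-set blocks skip inter-layer prediction
    ILP_WITHIN_TILE_SET = 2   # second mode: references only the co-located tile set
    UNCONSTRAINED_ILP = 3     # third mode: no tile-set restriction

def reference_sample_allowed(mode, sample_in_tile_set, ref_in_corresponding_tile_set):
    """Return True if a lower-layer sample may be used for a current-picture sample."""
    if mode == ConstrainedILPMode.NO_ILP_FOR_TILE_SET:
        return not sample_in_tile_set          # tile-set samples may not use the lower layer
    if mode == ConstrainedILPMode.ILP_WITHIN_TILE_SET:
        return ref_in_corresponding_tile_set   # only samples inside the corresponding tile set
    return True                                # third mode: unrestricted
```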

In the multi-layer video signal decoding method and apparatus according to the present invention, the tile set is determined based on an inter-layer constrained tile sets SEI message.

In the method and apparatus for decoding a multi-layer video signal according to the present invention, the inter-layer constrained tile sets SEI message includes tile count information, an upper-left tile index, and a lower-right tile index.

In the method and apparatus for decoding a multi-layer video signal according to the present invention, the tile count information is a value obtained by subtracting 1 from the number of rectangular areas belonging to the tile set, the upper-left tile index indicates the position of the tile located at the upper-left corner of a rectangular area belonging to the tile set, and the lower-right tile index indicates the position of the tile located at the lower-right corner of a rectangular area belonging to the tile set.
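A minimal sketch of how a decoder might recover the tile-set rectangles from the signalled values described above, assuming a hypothetical bitstream reader `r` with a `read_ue()` method for unsigned Exp-Golomb values; the element names are illustrative and do not reproduce the exact SEI syntax.

```python
# Illustrative decoding of the tile-set geometry carried in the SEI message.
def read_tile_set_rectangles(r):
    num_rects = r.read_ue() + 1     # tile count information is (number of rectangles - 1)
    rects = []
    for _ in range(num_rects):
        top_left_tile_idx = r.read_ue()      # tile at the upper-left corner of the rectangle
        bottom_right_tile_idx = r.read_ue()  # tile at the lower-right corner of the rectangle
        rects.append((top_left_tile_idx, bottom_right_tile_idx))
    return rects
```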

A method and apparatus for encoding a multi-layer video signal according to the present invention determine a corresponding picture belonging to a lower layer that is used for inter-layer prediction of a current picture belonging to an upper layer, and perform inter-layer prediction of the current picture using the determined corresponding picture.

In the method and apparatus for encoding a multi-layer video signal according to the present invention, the current picture restrictively performs inter-layer prediction based on a constrained prediction identifier.

In the method and apparatus for encoding a multi-layer video signal according to the present invention, the constrained prediction identifier specifies one of a first constrained prediction mode, a second constrained prediction mode, and a third constrained prediction mode.

The first constrained prediction mode is a mode in which a block in a tile set belonging to the current picture does not perform inter-layer prediction using the corresponding picture of the lower layer. The second constrained prediction mode is a mode in which a block in a tile set belonging to the current picture performs inter-layer prediction using only samples of the tile set belonging to the corresponding picture of the lower layer. The third constrained prediction mode is a mode in which inter-layer prediction is performed using the corresponding picture of the lower layer without the restrictions of the first and second constrained prediction modes.

In the multi-layer video signal encoding method and apparatus according to the present invention, the tile set is determined based on an inter-layer constrained tile sets SEI message.

In the method and apparatus for encoding a multi-layer video signal according to the present invention, the inter-layer constrained tile sets SEI message includes tile count information, an upper-left tile index, and a lower-right tile index.

In the method and apparatus for encoding a multi-layer video signal according to the present invention, the tile count information is a value obtained by subtracting 1 from the number of rectangular areas belonging to the tile set, the upper-left tile index indicates the position of the tile located at the upper-left corner of a rectangular area belonging to the tile set, and the lower-right tile index indicates the position of the tile located at the lower-right corner of a rectangular area belonging to the tile set.

According to the present invention, memory can be effectively managed by adaptively using a picture of a lower layer as an inter-layer reference picture for the current picture of an upper layer.

According to the present invention, a picture of a lower layer can be effectively upsampled.

According to the present invention, texture information of an upper layer can be effectively derived through inter-layer prediction.

According to the present invention, coding efficiency of a video signal can be improved by limiting inter-layer prediction based on tile alignment between layers in a multi-layer structure.

FIG. 1 is a block diagram schematically illustrating an encoding apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram schematically illustrating a decoding apparatus according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a process of inter-layer prediction of an upper layer using a corresponding picture of a lower layer according to an embodiment of the present invention.
FIG. 4 illustrates a method of determining a corresponding picture of a lower layer based on a reference active flag, according to an embodiment to which the present invention is applied.
FIG. 5 illustrates a method of obtaining interlayer reference information for a current picture according to an embodiment of the present invention.
FIG. 6 shows a syntax table of interlayer reference information according to an embodiment to which the present invention is applied.
FIG. 7 illustrates a relationship between a slice and a tile according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating a method of performing constrained inter-layer prediction according to whether tiles are aligned between layers, according to an embodiment of the present invention.
FIGS. 9 and 10 illustrate the syntax of a tile boundary alignment flag according to an embodiment to which the present invention is applied.
FIGS. 11 and 12 illustrate the syntax of an inter-layer constrained tile sets SEI message according to an embodiment to which the present invention is applied.
FIG. 13 is a flowchart illustrating a method of upsampling a corresponding picture of a lower layer according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings; rather, based on the principle that an inventor may appropriately define terms to best describe the invention, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely preferred examples and do not represent all of the technical ideas of the present invention, and it should be understood that various equivalents and modifications may exist.

When an element is referred to herein as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, the description that a specific configuration is "included" does not exclude other configurations; additional configurations may be included within the practice or the scope of the present invention.

The terms first, second, etc. may be used to describe various configurations, but the configurations are not limited by these terms. The terms are used only to distinguish one configuration from another. For example, without departing from the scope of the present invention, a first configuration may be referred to as a second configuration, and similarly, a second configuration may be referred to as a first configuration.

In addition, the components shown in the embodiments of the present invention are shown independently to represent distinct characteristic functions; this does not mean that each component is composed of a separate hardware or software unit. That is, the components are listed separately for convenience of explanation, and at least two components may be combined into one component, or one component may be divided into a plurality of components each performing a function. Both the integrated and the separated embodiments of each component are included in the scope of the present invention as long as they do not depart from the essence of the present invention.

In addition, some components may not be essential components that perform the essential functions of the present invention but optional components intended only to improve performance. The present invention can be implemented with only the components essential for realizing its essence, excluding the components used merely for performance improvement, and a structure including only the essential components without the optional performance-improvement components is also included in the scope of the present invention.

The coding and decoding of video supporting a plurality of layers (multi-layers) in a bitstream is referred to as scalable video coding. Since there is a strong correlation between the plurality of layers, redundant data elements can be removed and the coding performance of an image can be improved by performing prediction using this correlation. Hereinafter, predicting the current layer using information of another layer is referred to as inter-layer prediction.

The plurality of layers may have different resolutions, where the resolution may refer to at least one of spatial resolution, temporal resolution, and image quality. Resampling such as up-sampling or down-sampling of a layer may be performed to adjust the resolution in the inter-layer prediction.

1 is a block diagram schematically illustrating an encoding apparatus according to an embodiment of the present invention.

The encoding apparatus 100 according to the present invention includes an encoding unit 100a for an upper layer and an encoding unit 100b for a lower layer.

The upper layer may be expressed as a current layer or an enhancement layer, and the lower layer may be expressed as an enhancement layer having a lower resolution than the upper layer, a base layer, or a reference layer. The upper layer and the lower layer may differ in spatial resolution, in temporal resolution according to the frame rate, and in image quality according to the color format or the quantization size. Up-sampling or down-sampling of a layer may be performed when a resolution change is required to perform inter-layer prediction.

The encoding unit 100a of the upper layer includes a partitioning unit 110, a prediction unit 120, a transform unit 130, a quantization unit 140, a reordering unit 150, an entropy encoding unit 160, an inverse quantization unit 170, an inverse transform unit 180, a filter unit 190, and a memory 195.

The lower layer encoding unit 100b includes a partitioning unit 111, a predicting unit 125, a transforming unit 131, a quantizing unit 141, a reordering unit 151, an entropy coding unit 161, an inverse quantization unit 171, an inverse transform unit 181, a filter unit 191, and a memory 196.

The encoding unit may be implemented by the image encoding method described in the embodiments of the present invention, but the operations of some components may be omitted to lower the complexity of the encoding apparatus or to enable fast real-time encoding. For example, when the prediction unit performs intra-picture prediction, instead of selecting an optimal intra-picture coding method by evaluating all intra-picture prediction modes, a method of selecting one mode from a limited number of intra-picture prediction modes as the final intra-picture prediction mode may be used for real-time encoding. As another example, the types of prediction blocks used in intra-picture or inter-picture prediction may be used restrictively.

The unit of a block processed by the encoding apparatus may be a coding unit for performing encoding, a prediction unit for performing prediction, or a transform unit for performing transformation. The coding unit may be expressed as a CU (Coding Unit), the prediction unit as a PU (Prediction Unit), and the transform unit as a TU (Transform Unit).

The partitioning units 110 and 111 may divide a layer image into a plurality of combinations of coding blocks, prediction blocks, and transform blocks, and may select one combination of coding, prediction, and transform blocks according to a predetermined criterion (for example, a cost function) to divide the layer. For example, a recursive tree structure such as a quad-tree structure may be used to divide a layer image into coding units. Hereinafter, in the embodiments of the present invention, a coding block may mean not only a block to be encoded but also a block to be decoded.
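The quad-tree division mentioned above can be sketched as follows; the `should_split` callback is a placeholder for the encoder's actual decision criterion (for example, a rate-distortion cost), so this is an illustrative sketch rather than the patent's method.

```python
# Minimal sketch of recursive quad-tree division of a square coding block.
def quadtree_split(x, y, size, min_size, should_split, out):
    """Recursively divide a square block into four equal sub-blocks."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
            quadtree_split(x + dx, y + dy, half, min_size, should_split, out)
    else:
        out.append((x, y, size))   # leaf: a coding unit at (x, y) of this size

leaves = []
quadtree_split(0, 0, 64, 8, lambda x, y, s: s > 32, leaves)
# A 64x64 coding tree unit split once into four 32x32 coding units.
```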

The prediction block may be a unit for performing prediction, such as intra-picture or inter-picture prediction. The block for intra-picture prediction may be a square block such as 2Nx2N or NxN. For inter-picture prediction, there are square blocks such as 2Nx2N and NxN, rectangular blocks such as 2NxN and Nx2N, and asymmetric shapes obtained by the prediction block partitioning method using AMP (Asymmetric Motion Partitioning). The transform method applied by the transform units 130 and 131 may vary depending on the type of the prediction block.

The prediction units 120 and 125 of the encoding units 100a and 100b include intra prediction units 121 and 126 for performing intra-picture prediction and inter prediction units 122 and 127 for performing inter-picture prediction. The prediction unit 120 of the upper layer encoding unit 100a may further include an inter-layer prediction unit 123 that performs prediction of the upper layer using information of the lower layer.

The prediction units 120 and 125 may determine whether to perform inter-picture prediction or intra-picture prediction for a prediction block. When intra-picture prediction is performed, the intra prediction mode may be determined in units of prediction blocks, and the intra-picture prediction based on the determined mode may be performed in units of transform blocks. The residual (residual block) between the generated prediction block and the original block may be input to the transform units 130 and 131. In addition, the prediction mode information, motion information, and the like used for prediction may be encoded by the entropy encoding units 160 and 161 and transmitted to the decoding apparatus together with the residual.

When the PCM (Pulse Coded Modulation) coding mode is used, the original block may be encoded as it is and transmitted to the decoding unit without performing prediction through the prediction units 120 and 125.

The intra prediction units 121 and 126 may generate a prediction block based on reference pixels existing in the vicinity of the current block (the block to be predicted). The intra-picture prediction modes may include directional prediction modes, which use reference pixels according to a prediction direction, and non-directional modes, which do not consider a prediction direction. The mode for predicting luma information and the mode for predicting chroma information may be of different types, and the intra prediction mode used to predict luma information, or the predicted luma information itself, may be utilized to predict chroma information. If a reference pixel is not available, the unavailable reference pixel may be replaced with another pixel and used to generate the prediction block.

The prediction block may include a plurality of transform blocks. When intra-picture prediction is performed and the size of the prediction block is the same as the size of the transform block, intra-picture prediction for the prediction block may be performed based on the pixels on the left side, the upper-left corner, and the top of the prediction block. However, when intra-picture prediction is performed and the size of the prediction block differs from the size of the transform block, so that the prediction block includes a plurality of transform blocks, intra-picture prediction may be performed using the neighboring pixels adjacent to each transform block as reference pixels. Here, the neighboring pixels adjacent to the transform block may include at least one of the neighboring pixels adjacent to the prediction block and the pixels already decoded within the prediction block.

The intra-picture prediction method may generate the prediction block after applying a mode-dependent intra smoothing (MDIS) filter to the reference pixels according to the intra-picture prediction mode. The type of MDIS filter applied to the reference pixels may vary. The MDIS filter, an additional filter applied to the intra-predicted block, can be used to reduce the residual between the reference pixels and the intra-predicted block generated after intra prediction is performed. In performing MDIS filtering, the filtering of the reference pixels and of some columns included in the intra-predicted block may be performed according to the direction of the intra prediction mode.

The inter-picture prediction units 122 and 127 can perform prediction by referring to information of a block included in at least one of a previous picture of a current picture or a following picture. The inter-picture prediction units 122 and 127 may include a reference picture interpolating unit, a motion predicting unit, and a motion compensating unit.

The reference picture interpolation unit may receive reference picture information from the memories 195 and 196 and generate pixel information at sub-integer positions in the reference picture. For luma pixels, a DCT-based 8-tap interpolation filter with varying filter coefficients may be used to generate sub-integer pixel information in units of 1/4 pixel. For chroma signals, a DCT-based 4-tap interpolation filter with varying filter coefficients may be used to generate sub-integer pixel information in units of 1/8 pixel.
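As an illustration of such DCT-based interpolation, the sketch below computes a luma half-sample value with an 8-tap filter. The coefficients are those of the well-known HEVC half-pel luma filter; treating them as the exact filter intended here is an assumption.

```python
# Sketch of 8-tap interpolation for a luma half-sample position.
HALF_PEL_TAPS = [-1, 4, -11, 40, 40, -11, 4, -1]  # sum = 64 (HEVC half-pel luma filter)

def interpolate_half_pel(row, x):
    """Half-sample value between row[x] and row[x+1]; row must cover x-3..x+4."""
    acc = sum(c * row[x - 3 + i] for i, c in enumerate(HALF_PEL_TAPS))
    return (acc + 32) >> 6   # round and normalize by 64

row = [10, 10, 10, 10, 20, 20, 20, 20]
print(interpolate_half_pel(row, 3))   # value at the 10->20 step: 15
```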

The inter-picture prediction units 122 and 127 may perform motion prediction based on the reference picture interpolated by the reference picture interpolation unit. Various methods, such as the full search-based block matching algorithm (FBMA), the three-step search (TSS), and the new three-step search algorithm (NTS), may be used to calculate motion vectors. The motion vector may have a value in units of 1/2 or 1/4 pixel based on the interpolated pixels. The inter-picture prediction units 122 and 127 may perform prediction on the current block by applying one of various inter-picture prediction methods.

As the inter-picture prediction method, various methods such as a skip method, a merge method, and a method using a motion vector predictor (MVP) can be used.

In inter-picture prediction, motion information such as the reference index and motion vector, together with the residual signal, is entropy-encoded and transmitted to the decoding unit. When the skip mode is applied, no residual signal is generated, so the transform and quantization processes for the residual signal may be omitted.

The inter-layer predicting unit 123 performs inter-layer prediction for predicting an upper layer using information of the lower layer. The inter-layer predicting unit 123 may perform inter-layer prediction using texture information and motion information of a lower layer.

Inter-layer prediction may predict the current block of the upper layer using a picture of the lower layer (reference layer) as a reference picture, together with motion information about that lower-layer picture. The picture of the reference layer used as a reference picture in inter-layer prediction may be a picture sampled according to the resolution of the current layer. In addition, the motion information may include a motion vector and a reference index. At this time, the value of the motion vector for the picture of the reference layer may be set to (0, 0).
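A minimal sketch of this form of inter-layer prediction, in which the co-located block of the (resampled) lower-layer picture is copied with a zero motion vector; the array layout and names are assumptions for illustration.

```python
# Hedged sketch: using the lower-layer picture as a reference with MV = (0, 0).
def inter_layer_predict_block(cur_x, cur_y, w, h, ref_layer_picture):
    """Copy the co-located block from the inter-layer reference picture."""
    mvx, mvy = 0, 0   # motion vector for an inter-layer reference is set to (0, 0)
    return [row[cur_x + mvx : cur_x + mvx + w]
            for row in ref_layer_picture[cur_y + mvy : cur_y + mvy + h]]
```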

As an example of inter-layer prediction, a prediction method of using a picture of a lower layer as a reference picture has been described, but the present invention is not limited to this. The inter-layer predicting unit 123 may perform inter-layer texture prediction, inter-layer motion prediction, inter-layer syntax prediction, inter-layer difference prediction, and the like.

Inter-layer texture prediction can derive the texture of the current layer based on the texture of the reference layer. The texture of the reference layer can be sampled according to the resolution of the current layer, and the inter-layer predicting unit 123 can predict the texture of the current layer based on the texture of the sampled reference layer.

The inter-layer motion prediction can derive the motion vector of the current layer based on the motion vector of the reference layer. At this time, the motion vector of the reference layer can be scaled according to the resolution of the current layer. In the inter-layer syntax prediction, the syntax of the current layer can be predicted based on the syntax of the reference layer. For example, the inter-layer predicting unit 123 may use the syntax of the reference layer as the syntax of the current layer. In the inter-layer difference prediction, the picture of the current layer can be restored by using the difference between the restored image of the reference layer and the restored image of the current layer.

A residual block including residual information, which is the difference between the prediction block generated by the prediction units 120 and 125 and the original block of that prediction block, is generated, and the residual block is input to the transform units 130 and 131.

The transforming units 130 and 131 can transform the residual block using a transform method such as DCT (Discrete Cosine Transform) or DST (Discrete Sine Transform). Whether to apply the DCT or the DST to transform the residual block can be determined based on the intra prediction mode information and the prediction block size information of the prediction block used to generate the residual block. That is, the transforming units 130 and 131 can apply the transforming method differently according to the size of the prediction block and the prediction method.

The quantization units 140 and 141 may quantize the values transformed into the frequency domain by the transform units 130 and 131. The quantization coefficient may vary depending on the block or the importance of the image. The values calculated by the quantization units 140 and 141 may be provided to the inverse quantization units 170 and 171 and the reordering units 150 and 151.

The reordering units 150 and 151 may reorder the coefficient values of the quantized residual. The reordering units 150 and 151 may change two-dimensional block-form coefficients into a one-dimensional vector form through a coefficient scanning method. For example, the reordering units 150 and 151 may scan from the DC coefficient to coefficients of the high-frequency region using a zig-zag scanning method, changing them into a one-dimensional vector form. Depending on the size of the transform block and the intra-picture prediction mode, a vertical scanning method that scans two-dimensional block-form coefficients in the column direction, or a horizontal scanning method that scans them in the row direction, may be used instead of zig-zag scanning. That is, which of the zig-zag, vertical, and horizontal scanning methods is used may be determined according to the size of the transform block and the intra prediction mode.
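The three scan patterns named above can be illustrated as follows; the zig-zag construction is standard, while the actual rule for switching between the patterns would depend on the transform block size and intra prediction mode as described.

```python
# Illustrative generation of the horizontal, vertical, and zig-zag scan orders.
def scan_order(n, mode):
    if mode == "horizontal":   # row by row
        return [(r, c) for r in range(n) for c in range(n)]
    if mode == "vertical":     # column by column
        return [(r, c) for c in range(n) for r in range(n)]
    # zig-zag: walk anti-diagonals outward from the DC coefficient at (0, 0)
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

print(scan_order(2, "zigzag")[:4])   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```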

The entropy encoding units 160 and 161 may perform entropy encoding based on the values calculated by the reordering units 150 and 151. For entropy encoding, various encoding methods such as exponential Golomb, CAVLC (Context-Adaptive Variable Length Coding), and CABAC (Context-Adaptive Binary Arithmetic Coding) may be used.

The entropy encoding units 160 and 161 may receive various information from the reordering units 150 and 151 and the prediction units 120 and 125, such as residual coefficient information and block type information of a coding block, prediction mode information, partitioning unit information, prediction block information and transmission unit information, motion information, reference frame information, block interpolation information, and filtering information, and may perform entropy encoding based on a predetermined encoding method. In addition, the entropy encoding units 160 and 161 may entropy-encode the coefficient values of the coding unit input from the reordering units 150 and 151.

The entropy encoding units 160 and 161 may encode the intra-picture prediction mode information of the current block by binarizing it. The entropy encoding units 160 and 161 may include a codeword mapping unit for performing the binarization operation, and the binarization may be performed differently depending on the size of the prediction block on which intra-picture prediction is performed. In the codeword mapping unit, a codeword mapping table may be generated adaptively through the binarization operation or stored in advance. In another embodiment, the entropy encoding units 160 and 161 may express the current intra prediction mode information using a code number mapping unit that performs code number mapping and a codeword mapping unit that performs codeword mapping; in the code number mapping unit and the codeword mapping unit, a code number mapping table and a codeword mapping table may be generated or stored, respectively.

The inverse quantization units 170 and 171 and the inverse transform units 180 and 181 inversely quantize the values quantized by the quantization units 140 and 141 and inversely transform the values transformed by the transform units 130 and 131. The residual values generated by the inverse quantization units 170 and 171 and the inverse transform units 180 and 181 may be combined with the prediction block predicted through the motion estimation unit, the motion compensation unit, and the intra prediction unit included in the prediction units 120 and 125 to generate a reconstructed block.

The filter units 190 and 191 may include at least one of a deblocking filter and an offset correcting unit.

The deblocking filter may remove block distortion caused by boundaries between blocks in the reconstructed picture. Whether to apply the deblocking filter to the current block may be determined based on the pixels included in a few columns or rows of the block. When the deblocking filter is applied to a block, a strong filter or a weak filter may be applied according to the required deblocking filtering strength. Also, in applying the deblocking filter, horizontal filtering and vertical filtering may be processed in parallel.

The offset correction unit may correct, in units of pixels, the offset between the deblocked image and the original image. To perform offset correction for a specific picture, a method of dividing the pixels of the image into a predetermined number of areas, determining the area to which the offset is to be applied, and applying the offset to that area, or a method of applying the offset considering edge information of each pixel, may be used.

The filter units 190 and 191 may apply only the deblocking filter without the offset correction, or may apply both the deblocking filter and the offset correction.

The memories 195 and 196 may store the reconstructed block or picture calculated through the filter units 190 and 191, and the stored reconstructed block or picture may be provided to the prediction units 120 and 125.

The information output from the entropy encoding unit 100b of the lower layer and the information output from the entropy encoding unit 100a of the upper layer can be multiplexed by the MUX 197 and output as a bitstream.

The MUX 197 may be included in the encoding unit 100a of the upper layer or the encoding unit 100b of the lower layer, or may be implemented as an independent device or module separate from the encoding apparatus 100.

2 is a block diagram schematically illustrating a decoding apparatus according to an embodiment of the present invention.

As shown in FIG. 2, the decoding apparatus 200 includes a decoding unit 200a of an upper layer and a decoding unit 200b of a lower layer.

The decoding unit 200a of the upper layer includes an entropy decoding unit 210, a reordering unit 220, an inverse quantization unit 230, an inverse transform unit 240, a prediction unit 250, a filter unit 260, and a memory 270.

The decoding unit 200b of the lower layer includes an entropy decoding unit 211, a reordering unit 221, an inverse quantization unit 231, an inverse transform unit 241, a prediction unit 251, a filter unit 261, and a memory 271.

When a bitstream including a plurality of layers is transmitted from the encoding apparatus, the DEMUX 280 demultiplexes information for each layer and transmits the demultiplexed information to the decoding units 200a and 200b for the respective layers. The input bitstream can be decoded in a procedure opposite to that of the encoding apparatus.

The entropy decoding units 210 and 211 may perform entropy decoding in a procedure opposite to that of the entropy encoding unit of the encoding apparatus. Among the information decoded by the entropy decoding units 210 and 211, information for generating a prediction block is provided to the prediction units 250 and 251, and the residual values obtained through entropy decoding may be input to the reordering units 220 and 221.

As with the entropy encoding units 160 and 161, the entropy decoding units 210 and 211 may use at least one of CABAC and CAVLC.

The entropy decoding units 210 and 211 may decode information related to the intra-picture prediction and inter-picture prediction performed by the encoding apparatus. The entropy decoding units 210 and 211 may include a codeword mapping unit having a codeword mapping table for generating an intra-picture prediction mode number from a received codeword. The codeword mapping table may be stored in advance or generated adaptively. When a code number mapping table is used, a code number mapping unit for performing code number mapping may additionally be provided.

The reordering units 220 and 221 may reorder the bitstreams entropy-decoded by the entropy decoding units 210 and 211 based on the reordering method used by the encoding unit. The coefficients expressed in one-dimensional vector form may be reordered by reconstructing them into coefficients in two-dimensional block form. The reordering units 220 and 221 may receive information related to the coefficient scanning performed by the encoding unit and perform reordering through a reverse scanning method based on the scanning order performed by the encoding unit.

The inverse quantization units 230 and 231 may perform inverse quantization based on the quantization parameters provided by the encoding apparatus and the coefficient values of the re-arranged blocks.

The inverse transform units 240 and 241 may perform inverse DCT or inverse DST on the quantization result obtained by the encoding apparatus, corresponding to the DCT or DST performed by the transform units 130 and 131. The inverse transform may be performed based on the transmission unit determined by the encoding apparatus. In the transform unit of the encoding apparatus, DCT and DST may be selectively performed according to a plurality of pieces of information such as the prediction method, the size of the current block, and the prediction direction, and the inverse transform units 240 and 241 of the decoding apparatus may perform the inverse transform based on the transform information used by the transform unit of the encoding apparatus. The transform may be performed based on the coding block rather than the transform block.

The prediction units 250 and 251 may generate a prediction block based on the prediction-block-generation-related information provided by the entropy decoding units 210 and 211 and the previously decoded block or picture information provided from the memories 270 and 271.

The prediction units 250 and 251 may include a prediction unit determination unit, an inter-frame prediction unit, and an intra-frame prediction unit.

The prediction unit determination unit receives various information, such as prediction unit information input from the entropy decoding unit, prediction mode information of the intra-picture prediction method, and motion-prediction-related information of the inter-picture prediction method, identifies the prediction block in the current coding block, and determines whether inter-picture prediction or intra-picture prediction is performed on the prediction block.

The inter-picture prediction unit may perform inter-picture prediction on the current prediction block based on information included in at least one of the pictures preceding or following the current picture, using the information required for inter-picture prediction of the current prediction block provided by the encoding apparatus. To perform inter-picture prediction, it may be determined, based on the corresponding coding block, whether the motion prediction method of the prediction block included in the coding block is the skip mode, the merge mode, or the mode using a motion vector predictor (MVP).

The intra prediction unit may generate a prediction block based on reconstructed pixel information in the current picture. When the prediction block is a block on which intra-picture prediction has been performed, intra-picture prediction may be performed based on the intra prediction mode information of the prediction block provided by the encoding apparatus. The intra-picture prediction unit may include an MDIS filter that performs filtering on the reference pixels of the current block, a reference pixel interpolation unit that interpolates reference pixels to generate reference pixels in units of less than an integer pixel, and a DC filter that generates a prediction block through filtering when the prediction mode of the current block is the DC mode.

The predicting unit 250 of the upper layer decoding unit 200a may further include an inter-layer predicting unit for performing inter-layer prediction for predicting an upper layer using information of a lower layer.

The inter-layer prediction unit may perform inter-layer prediction using intra-picture prediction mode information, motion information, and the like.

Inter-layer prediction can predict a current block of an upper layer using motion information on a lower layer (reference layer) picture using a picture of a lower layer as a reference picture.

A picture of a reference layer used as a reference picture in inter-layer prediction may be a picture sampled according to the resolution of the current layer. In addition, the motion information may include a motion vector and a reference index. At this time, the value of the motion vector for the picture of the reference layer can be set to (0, 0).

As an example of inter-layer prediction, a prediction method using a picture of a lower layer as a reference picture has been described, but the present invention is not limited thereto. The inter-layer prediction unit may also perform inter-layer texture prediction, inter-layer motion prediction, inter-layer syntax prediction, inter-layer difference prediction, and the like.

Inter-layer texture prediction can derive the texture of the current layer based on the texture of the reference layer. The texture of the reference layer can be sampled to the resolution of the current layer, and the inter-layer prediction unit can predict the texture of the current layer based on the sampled texture. Inter-layer motion prediction can derive the motion vector of the current layer based on the motion vector of the reference layer; at this time, the motion vector of the reference layer can be scaled according to the resolution of the current layer. In inter-layer syntax prediction, the syntax of the current layer can be predicted based on the syntax of the reference layer; for example, the inter-layer prediction unit may use the syntax of the reference layer as the syntax of the current layer. In inter-layer difference prediction, the picture of the current layer can be restored using the difference between the restored image of the reference layer and the restored image of the current layer.

The reconstructed block or picture may be provided to the filter units 260 and 261. The filter units 260 and 261 may include a deblocking filter and an offset correction unit.

The encoding apparatus may provide information on whether a deblocking filter was applied to the corresponding block or picture and, when the deblocking filter was applied, information on whether a strong filter or a weak filter was applied. The deblocking filter of the decoding apparatus receives the deblocking-filter-related information provided by the encoding apparatus, and the decoding apparatus may perform deblocking filtering on the corresponding block.

The offset correction unit may perform offset correction on the reconstructed image based on the type of offset correction applied to the image and the offset value information during encoding.

The memories 270 and 271 can store the reconstructed picture or block to be used as a reference picture or a reference block, and can also output the reconstructed picture.

The encoding apparatus and the decoding apparatus may perform encoding and decoding on three or more layers rather than two. In this case, a plurality of encoding units and decoding units for the upper layers may be provided, corresponding to the number of upper layers.

In SVC (Scalable Video Coding), which supports a multi-layer structure, there is a correlation between layers. By performing prediction using this correlation, redundant data elements can be removed and image coding performance can be enhanced.

Therefore, in predicting a picture (image) of the current layer (enhancement layer) to be encoded/decoded, not only inter prediction or intra prediction using information of the current layer but also inter-layer prediction using information of another layer may be performed.

In performing inter-layer prediction, the current layer may generate a prediction sample of a current layer using a decoded picture of a reference layer used for inter-layer prediction as a reference picture.

At this time, since at least one of the spatial resolution, temporal resolution, and image quality may differ between the current layer and the reference layer (that is, due to the scalability difference between the layers), the decoded picture of the reference layer may be resampled and then used as a reference picture for inter-layer prediction of the current layer. Resampling means up-sampling or down-sampling the samples of the reference layer picture to match the picture size of the current layer.
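As a simplified illustration of resampling, the sketch below uses nearest-neighbour sampling to match a reference-layer picture to the current-layer size; an actual codec would use the normative upsampling filter described later with reference to FIG. 13, so this is only a stand-in.

```python
# Minimal nearest-neighbour resampling sketch (not the normative filter).
def resample(picture, dst_w, dst_h):
    src_h, src_w = len(picture), len(picture[0])
    return [[picture[y * src_h // dst_h][x * src_w // dst_w]
             for x in range(dst_w)] for y in range(dst_h)]

low = [[1, 2], [3, 4]]
print(resample(low, 4, 4))   # 2x spatial upsampling of a 2x2 picture
```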

In this specification, the current layer refers to the layer on which encoding or decoding is currently performed, and may be an enhancement layer or an upper layer. A reference layer is a layer referred to by the current layer for inter-layer prediction, and may be a base layer or a lower layer. A picture of the reference layer used for inter-layer prediction of the current layer (that is, a reference picture) may be referred to as an inter-layer reference picture.

FIG. 3 is a flowchart illustrating a process of inter-layer prediction of an upper layer using a corresponding picture of a lower layer according to an embodiment of the present invention.

Referring to FIG. 3, a corresponding picture belonging to a lower layer used for inter-layer prediction of a current picture belonging to an upper layer may be determined (S300).

Here, the lower layer may denote a base layer or another enhancement layer having a lower resolution than the upper layer. The corresponding picture may mean a picture located at the same time instance as the current picture of the upper layer.

For example, the corresponding picture may be a picture having the same picture order count (POC) information as the current picture of the upper layer. The corresponding picture may belong to the same access unit (AU) as the current picture of the upper layer. The corresponding picture may have the same temporal level identifier (TemporalID) as the current picture of the upper layer. Here, the temporal level identifier may mean an identifier specifying each of a plurality of layers scalably coded according to temporal resolution.
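A hedged sketch of locating the corresponding picture by matching the picture order count, as described above; the `Picture` type and its fields are assumptions introduced for illustration.

```python
# Locating the corresponding lower-layer picture: same POC as the current picture.
from dataclasses import dataclass

@dataclass
class Picture:
    poc: int          # picture order count
    temporal_id: int  # TemporalID
    layer_id: int

def find_corresponding_picture(current, lower_layer_dpb):
    for pic in lower_layer_dpb:
        if pic.poc == current.poc:   # same time instance as the current picture
            return pic
    return None
```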

The current picture may use the corresponding pictures of one or more lower layers for inter-layer prediction. A method of determining the corresponding picture will be described with reference to FIGS. 4 to 6.

Inter-layer prediction of the current picture may be performed using the corresponding picture of the lower layer determined in step S300 (S310).

Specifically, a reference block belonging to the corresponding picture of the lower layer may be specified based on the motion vector of the current block, and the sample value or texture information of the current block may be predicted using the reconstructed sample value or texture information of the specified reference block.

Alternatively, a block in the corresponding picture of the lower layer at the same position as the current block may be used as the reference block. For this purpose, when the reference picture of the current block is an inter-layer reference picture used for inter-layer prediction, the motion vector of the current block may be set to (0, 0).

Alternatively, inter-layer prediction of the current picture of the upper layer may be restricted depending on whether tiles are aligned between the multiple layers; this will be described with reference to FIGS. 8 to 12.

On the other hand, if the current picture of the upper layer and the corresponding picture of the lower layer have different spatial resolutions, the corresponding picture of the lower layer may be upsampled and used as an inter-layer reference picture. A method of upsampling the corresponding picture of the lower layer will be described with reference to FIG. 13.

FIG. 4 illustrates a method of determining a corresponding picture of a lower layer based on a reference active flag, according to an embodiment to which the present invention is applied.

Referring to FIG. 4, a reference active flag may be obtained from the bitstream (S400).

The reference active flag (all_ref_layers_active_flag) may indicate whether the constraint that the corresponding pictures of all layers having direct dependency with the upper layer are used for inter-layer prediction of the current picture is applied. The reference active flag may be obtained from the video parameter set of the bitstream.

Here, whether a layer has direct dependency with the upper layer may be determined based on a direct dependency flag (direct_dependency_flag[i][j]). The direct dependency flag (direct_dependency_flag[i][j]) may indicate whether the j-th layer is used for inter-layer prediction of the i-th upper layer.

For example, when the value of the direct dependency flag is 1, the j-th layer may be used for inter-layer prediction of the i-th upper layer; when the value of the direct dependency flag is 0, the j-th layer is not used for inter-layer prediction of the i-th upper layer.
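The derivation of the direct reference layers from these flags can be sketched as follows; the function and variable names are illustrative, not the standard's exact derivation code.

```python
# Illustrative derivation of layer i's direct reference layers
# from direct_dependency_flag[i][j].
def direct_ref_layers(direct_dependency_flag, i):
    """Layers j that may be used for inter-layer prediction of layer i."""
    return [j for j, used in enumerate(direct_dependency_flag[i]) if used]

flags = [
    [0, 0, 0],   # layer 0 (base layer) references nothing
    [1, 0, 0],   # layer 1 depends directly on layer 0
    [1, 1, 0],   # layer 2 depends directly on layers 0 and 1
]
print(direct_ref_layers(flags, 2))   # [0, 1] -> NumDirectRefLayers = 2
```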

It is possible to check whether the value of the reference active flag is 1 (S410).

When the value of the reference active flag is 1, the constraint that the corresponding pictures of all layers having direct dependency with the upper layer are used for inter-layer prediction of the current picture is applied. In this case, the corresponding pictures of all layers having direct dependency with the upper layer may be included in the reference picture list of the current picture. Therefore, the corresponding pictures of all layers having direct dependency with the upper layer may be determined as the corresponding pictures used for inter-layer prediction of the current picture (S420).

On the other hand, when the value of the reference active flag is 0, that constraint is not applied. That is, the current picture of the upper layer may perform inter-layer prediction using the corresponding pictures of all layers having direct dependency with the upper layer, or using only some of them selectively. In other words, when the value of the reference active flag is 0, the reference picture list of the current picture may include the corresponding pictures of all layers having direct dependency with the upper layer, or only some of the corresponding pictures selectively. Therefore, it is necessary to specify the corresponding pictures used for inter-layer prediction of the current picture among the layers having direct dependency with the upper layer. For this purpose, interlayer reference information for the current picture may be obtained (S430).

Here, the interlayer reference information may include at least one of an interlayer prediction flag, number information of reference pictures, and a reference layer identifier.

Specifically, the interlayer prediction flag may indicate whether inter-layer prediction is used in the decoding process of the current picture. The number information of reference pictures may indicate the number of corresponding pictures used for inter-layer prediction of the current picture. For coding efficiency, the number information of reference pictures may be encoded as a value obtained by subtracting 1 from the number of corresponding pictures used for inter-layer prediction of the current picture. The reference layer identifier may mean the layer identifier (layerId) of a layer including a corresponding picture used for inter-layer prediction of the current picture.

A method of obtaining the interlayer reference information will be described with reference to FIGS. 5 and 6.

Based on the interlayer reference information obtained in S430, a corresponding picture used for inter-layer prediction can be determined (S440).

For example, when the value of the interlayer prediction flag of the current picture is 1, it means that the current picture performs inter-layer prediction. In this case, among the layers having direct dependency with the upper layer, the corresponding picture of the layer specified by the reference layer identifier may be determined as the corresponding picture used for inter-layer prediction of the current picture.

On the other hand, when the value of the interlayer prediction flag of the current picture is 0, the current picture does not perform inter-layer prediction, so none of the corresponding pictures of the layers having direct dependency with the upper layer is used for inter-layer prediction of the current picture.

FIG. 5 illustrates a method of obtaining interlayer reference information for a current picture according to an embodiment of the present invention, and FIG. 6 shows an example of a syntax table of the interlayer reference information.

Referring to FIG. 5, the interlayer prediction flag may be obtained based on the reference active flag (S500).

Referring to FIG. 6, the interlayer prediction flag inter_layer_pred_enabled_flag may be obtained only when the value of the reference active flag (all_ref_layers_active_flag) is 0 (S600).

When the value of the reference active flag is 1, it means that the corresponding pictures of all layers having direct dependency with the upper layer are used for inter-layer prediction of the current picture. Therefore, in this case, there is no need to signal the interlayer prediction flag in the header information of the current picture (for example, the slice segment header).

Referring to FIG. 6, the interlayer prediction flag may be obtained only when the layer identifier (nuh_layer_id) of the upper layer including the current picture is greater than 0. This is because, when the layer identifier of the upper layer is 0, the upper layer corresponds to the base layer, which does not perform inter-layer prediction among the multiple layers.

Also, referring to FIG. 6, the interlayer prediction flag may be obtained when the number of layers having direct dependency with the upper layer (NumDirectRefLayers) is at least one. This is because, if there is no layer having direct dependency with the upper layer, no picture of the upper layer performs inter-layer prediction.

Referring to FIG. 5, it can be checked whether the value of the interlayer prediction flag obtained in S500 is 1 (S510).

As a result of the check in S510, when the value of the interlayer prediction flag is 1, the number information of the reference pictures can be obtained (S520).

As described with reference to FIG. 4, the number information of reference pictures may indicate the number of corresponding pictures used for inter-layer prediction of the current picture among the corresponding pictures of the layers having direct dependency with the upper layer.

Referring to FIG. 6, when the number of layers having direct dependency with the upper layer (NumDirectRefLayers) is 1, the number of corresponding pictures used for inter-layer prediction of the current picture cannot exceed 1, so there is no need to signal the number information of reference pictures (num_inter_layer_ref_pics_minus1). In this case, the number information of reference pictures is not obtained, and the number of corresponding pictures used for inter-layer prediction of the current picture may be derived as 1.

On the other hand, the number information of the reference pictures can be limitedly obtained based on the maximum active reference flag.

Here, the maximum active reference flag may indicate whether at most one corresponding picture is used for inter-layer prediction of the current picture. For example, when the value of the maximum active reference flag is 1, the current picture performs inter-layer prediction using at most one corresponding picture; when the value of the maximum active reference flag is 0, the current picture can perform inter-layer prediction using a plurality of corresponding pictures.

Referring to FIG. 6, the number information of reference pictures can be obtained only when the value of the maximum active reference flag (max_one_active_ref_layer_flag) is zero. That is, when the value of the maximum active reference flag is 1, the number of corresponding pictures used in the inter-layer prediction of the current picture is limited to one, so it is not necessary to signal the number information of the reference picture.

Referring to FIG. 5, a reference layer identifier may be obtained based on the number information of reference pictures obtained in S520 (S530).

Referring to FIG. 6, the reference layer identifier can be obtained only when the number of corresponding pictures used for the inter-layer prediction of the current picture (NumActiveRefLayerPics) differs from the number of layers having direct dependency with the upper layer (NumDirectRefLayers). Here, the variable NumActiveRefLayerPics is derived from the number information of the reference pictures. For example, when the number information of the reference pictures is coded as the number of corresponding pictures used for inter-layer prediction of the current picture minus 1, the variable NumActiveRefLayerPics is derived by adding 1 to the number information of the reference pictures obtained in S520.

If the variable NumActiveRefLayerPics and the variable NumDirectRefLayers are equal, every corresponding picture of the layers having direct dependency with the upper layer is a corresponding picture used for inter-layer prediction of the current picture. Therefore, the reference layer identifier need not be signaled.
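
For illustration only, the derivations of S520 and S530 may be sketched as follows, assuming the minus-1 coding convention described above; all names are illustrative:

    def derive_num_active_ref_layer_pics(inter_layer_pred_enabled_flag,
                                         num_direct_ref_layers,
                                         max_one_active_ref_layer_flag,
                                         num_inter_layer_ref_pics_minus1=None):
        # No inter-layer prediction -> no active corresponding pictures.
        if not inter_layer_pred_enabled_flag or num_direct_ref_layers == 0:
            return 0
        # Count information is not signaled when it can only be 1.
        if num_direct_ref_layers == 1 or max_one_active_ref_layer_flag:
            return 1
        # Otherwise NumActiveRefLayerPics = signaled minus-1 value + 1.
        return num_inter_layer_ref_pics_minus1 + 1

    def ref_layer_idc_present(num_active_ref_layer_pics,
                              num_direct_ref_layers):
        # The reference layer identifier is signaled only when the active
        # pictures are a proper subset of the direct reference layers.
        return num_active_ref_layer_pics != num_direct_ref_layers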

FIG. 7 illustrates a relationship between a slice and a tile according to an embodiment of the present invention.

One picture may be divided into at least one slice. A slice can be a basic unit on which entropy decoding can be performed independently. One slice may be composed of a plurality of slice segments.

Also, one picture may be divided into at least one tile. Here, a tile is a rectangular area composed of a plurality of coding tree units, and entropy decoding can be performed on a tile-by-tile basis. Further, parallel processing, in which a plurality of tiles are decoded simultaneously, is enabled. The encoder can determine an optimum tile size or tile unit and transmit it to the decoder.

Alternatively, the size or unit of the tile of the upper layer may be derived based on the inter-layer tile alignment, that is, the size or unit of the tile of the lower layer.

FIG. 7(a) shows a case where one picture is divided into one independent slice segment and four dependent slice segments. Here, an independent slice segment includes a slice segment header, whereas a dependent slice segment does not include a slice segment header and can use the header of the independent slice segment. A slice segment is composed of a plurality of coding tree units, and a coding tree unit corresponds to the maximum size of a coding unit, which is the basic unit of video signal processing.

Referring to FIG. 7(a), one tile may include a plurality of slice segments, and one slice segment may exist within one tile. Alternatively, a plurality of tiles may exist in one slice.

FIG. 7(b) shows a case where one tile is composed of two or more slices. Referring to FIG. 7(b), slice 0 is composed of independent slice segment 0 and dependent slice segment 1, and slice 1 can be composed of independent slice segment 1 and dependent slice segment 2. Slice 0 and slice 1 can be included in one tile (tile 0).

FIG. 8 is a flowchart illustrating a method of performing limited inter-layer prediction according to whether tiles are aligned between layers, according to an embodiment of the present invention.

Referring to FIG. 8, whether tiles are aligned between the upper layer and the lower layer can be confirmed (S800).

For example, whether tiles are aligned between the upper layer and the lower layer can be confirmed based on the tile boundary alignment flag (tile_boundaries_aligned_flag[i][j]).

Specifically, when the value of the tile boundary alignment flag (tile_boundaries_aligned_flag[i][j]) is 1, if two samples of the current picture belonging to the i-th layer (that is, the upper layer) belong to one tile, the two collocated samples of the corresponding picture belonging to the j-th layer also belong to one tile; and if two samples of the current picture belonging to the i-th layer belong to different tiles, the two collocated samples of the corresponding picture belonging to the j-th layer also belong to different tiles.

Accordingly, when the value of the tile boundary alignment flag is 1, it means that the tile size or tile unit is aligned between the current picture of the upper layer and the corresponding picture of the lower layer. Conversely, when the value of the tile boundary alignment flag is 0, it may mean that there is no tile alignment between the layers.

Here, the j-th layer may mean a layer having an i-th layer and a direct dependency. Whether or not the layer has a direct dependency with the upper layer can be determined based on the direct dependency flag (direct_dependency_flag [i] [j]). The direct dependency flag (direct_dependency_flag [i] [j]) may indicate whether the jth layer is used for inter-layer prediction of the i-th upper layer.

For example, when the value of the direct dependency flag is 1, the j-th layer can be used for inter-layer prediction of the i-th upper layer. When the value of the direct dependency flag is 0, the j-th layer may not be used for inter-layer prediction of the i-th upper layer.

Further, two samples of the corresponding picture belonging to the jth layer may mean samples at the same position as the two samples of the current picture.
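
For illustration only, the constraint asserted by a tile boundary alignment flag equal to 1 may be expressed as the following Python predicate; the position list, tile-index functions, and collocation mapping are assumptions introduced for the example:

    def tile_alignment_holds(positions, tile_id_upper, tile_id_lower, collocate):
        # Collocated samples must induce a one-to-one mapping between
        # upper-layer tiles and lower-layer tiles: same tile -> same tile,
        # different tiles -> different tiles.
        mapping, used_lower = {}, set()
        for pos in positions:
            up = tile_id_upper(pos)
            low = tile_id_lower(collocate(pos))
            if up in mapping:
                if mapping[up] != low:
                    return False   # same upper tile, different lower tiles
            elif low in used_lower:
                return False       # different upper tiles, same lower tile
            else:
                mapping[up] = low
                used_lower.add(low)
        return True

    # Example: 2x scalability with matching two-column tile grids.
    pos = [(x, 0) for x in range(16)]
    print(tile_alignment_holds(pos,
                               lambda p: p[0] // 8,            # upper tile id
                               lambda p: p[0] // 4,            # lower tile id
                               lambda p: (p[0] // 2, p[1] // 2)))  # True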

The tile boundary alignment flag may be obtained from video usability information (VUI) belonging to a video parameter set. The video usability information is not used for decoding the luminance and chrominance components, but refers to information used for decoder conformance or output timing conformance.

On the other hand, the tile boundary alignment flag may be obtained limitedly when at least one picture belonging to each of the upper layer (the i-th layer) and the lower layer (the j-th layer) uses tiles. A method of obtaining the tile boundary alignment flag will be described with reference to FIGS. 9 and 10.

Referring to FIG. 8, inter-layer prediction can be limited based on the result of the check in step S800 (S810).

If the tiles between the upper layer and the lower layer are determined to be aligned according to the tile boundary alignment flag of the current picture belonging to the upper layer, samples of a specific area belonging to the current picture can perform inter-layer prediction restrictively.

Specifically, a block of the specific area belonging to the current picture may be restricted from performing inter-layer prediction using the corresponding picture of the lower layer (first limited prediction mode). Alternatively, a sample of the specific area belonging to the current picture may be restricted to performing inter-layer prediction using only samples of the specific area belonging to the corresponding picture of the lower layer (second limited prediction mode). Alternatively, a sample of the specific area belonging to the current picture may perform inter-layer prediction without the restrictions of the first limited prediction mode and the second limited prediction mode (third limited prediction mode). That is, a sample of the specific area belonging to the current picture may not perform inter-layer prediction using the corresponding picture of the lower layer, may perform inter-layer prediction using only samples of the specific area belonging to the corresponding picture of the lower layer, or may perform inter-layer prediction using the entire area of the corresponding picture of the lower layer.

The limited prediction identifier (ilc_idc) may be signaled to specify any one of the first to third limited prediction modes described above. That is, any one of the first to third limited prediction modes can be selectively used based on the limited prediction identifier.
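
For illustration only, the selection among the three modes may be sketched as follows; the mapping of ilc_idc values to modes is an assumption made for this example, since the actual value assignment is a signaling detail:

    def inter_layer_prediction_allowed(ilc_idc, block_in_tile_set,
                                       ref_samples_in_tile_set):
        # Assumed mapping: 1 -> first limited prediction mode,
        # 2 -> second limited prediction mode, 0 -> third (unrestricted).
        if ilc_idc == 1 and block_in_tile_set:
            return False                    # no inter-layer prediction
        if ilc_idc == 2 and block_in_tile_set:
            return ref_samples_in_tile_set  # only the collocated tile set
        return True                         # unrestricted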

On the other hand, the specific area may mean a tile set, and the tile set may be determined based on an inter-layer constrained tile sets SEI message. A method of acquiring the inter-layer constrained tile sets SEI message will be described with reference to FIGS. 11 and 12.

FIGS. 9 and 10 illustrate the syntax of a tile boundary alignment flag according to an embodiment to which the present invention is applied.

Referring to FIG. 9, a tile boundary alignment flag (tile_boundaries_aligned_flag[i][j]) may be obtained (S900).

As previously discussed, the tile boundary alignment flag (tile_boundaries_aligned_flag[i][j]) may indicate whether the tile size or tile unit of the i-th layer is aligned with that of the j-th layer. Here, the j-th layer means a layer having direct dependency with the i-th layer among the plurality of layers included in the video sequence, that is, a layer used for inter-layer prediction of the i-th layer. Therefore, as many tile boundary alignment flags can be obtained as there are layers having direct dependency with the i-th layer (NumDirectRefLayers_id_in_nuh[i]).

On the other hand, inter-layer tile alignment may not be used in any of the layers in a video sequence, and for this purpose a non-tile alignment flag (tile_boundaries_non_aligned_flag) may be signaled.

Referring to FIG. 10, a non-tile alignment flag tile_boundaries_non_aligned_flag may be obtained (S1000).

Here, the non-tile alignment flag may indicate whether the inter-layer tile alignment in the layer in the video sequence is restricted.

Specifically, when the value of the non-tile alignment flag is 1, a restriction is imposed that tile alignment between layers in the video sequence is not performed. For example, if the pictures belonging to the layers in the video sequence do not use tiles, tile alignment between layers cannot be performed. In this case, the value of the non-tile alignment flag is encoded as 1, and the restriction that interlayer tile alignment is not performed applies.

On the other hand, if the value of the non-tile alignment flag is 0, this means that there is no restriction that layer-to-layer tile alignment in the video sequence is not performed. That is, when the value of the non-tile alignment flag is 0, it means that the interlayer tile alignment can be performed in at least one of the layers in the video sequence. However, in this case, it is needless to say that the inter-layer tile alignment can be performed when the picture belonging to the layer in the video sequence uses the tile.

Thus, the non-tile alignment flag may indicate whether a tile boundary alignment flag is present or whether a tile boundary alignment flag is extracted from the bitstream.

Referring to FIG. 10, the tile boundary alignment flag (tile_boundaries_aligned_flag[i][j]) may be limitedly obtained only when the value of the non-tile alignment flag is zero (S1010).

That is, when the value of the non-tile alignment flag is 1, no layer in the video sequence performs interlayer tile alignment, so the tile boundary alignment flag need not be signaled for each layer.
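
For illustration only, the parsing order of FIGS. 9 and 10 may be sketched as follows, where read_flag models reading one flag from the bitstream and num_direct_ref_layers[i] stands for the number of layers having direct dependency with the i-th layer; this is a sketch, not reference software:

    def parse_tile_alignment_flags(read_flag, num_layers, num_direct_ref_layers):
        aligned = {}
        non_aligned = read_flag()        # tile_boundaries_non_aligned_flag
        if not non_aligned:              # alignment may be used (S1010)
            for i in range(1, num_layers):
                for j in range(num_direct_ref_layers[i]):
                    # tile_boundaries_aligned_flag[i][j]
                    aligned[(i, j)] = read_flag()
        return non_aligned, aligned

    # Example: two layers, layer 1 has one direct reference layer.
    bits = iter([0, 1])
    print(parse_tile_alignment_flags(lambda: next(bits), 2, {1: 1}))
    # (0, {(1, 0): 1})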


FIGS. 11 and 12 illustrate the syntax of an inter-layer constrained tile sets SEI message according to an embodiment to which the present invention is applied.

First Embodiment

Referring to FIG. 11, the tile number flag il_one_tile_per_tile_set_flag may be obtained (S1100).

The tile number flag may indicate whether only one tile exists in each tile set. For example, when the value of the tile number flag is 1, only one tile exists in one tile set.

The tile set number information (il_num_sets_in_message_minus1) can be obtained based on the tile number flag obtained in S1100 (S1110).

More specifically, the tile set number information can be obtained when the value of the tile number flag is 0 (that is, when each tile set is not limited to containing only one tile). Here, a value obtained by adding 1 to the tile set number information may mean the total number of tile sets in the SEI message.

If the value of the tile set number information is not 0, the tile number information (il_num_tile_rects_in_set_minus1[i]) can be obtained (S1120).

The case where the value of the tile set number information is not 0 means that the total number of tile sets in the SEI message is at least 2. A value obtained by adding 1 to the tile number information may indicate the number of rectangular regions belonging to the i-th tile set.

If the value of the tile set number information is not 0, the upper left tile index (il_top_left_tile_index[i][j]) and the lower right tile index (il_bottom_right_tile_index[i][j]) may be obtained for each of the rectangular areas according to the tile number information (S1130, S1140).

The upper left tile index (il_top_left_tile_index[i][j]) can indicate the position of the tile located at the upper left corner in the j-th rectangular area belonging to the i-th tile set, and the lower right tile index (il_bottom_right_tile_index[i][j]) can indicate the position of the tile located at the lower right corner in the j-th rectangular area belonging to the i-th tile set. In this case, the position of a tile can be represented by an index according to a raster scan order.

Based on the inter-layer constrained tile sets SEI message described above, a tile set belonging to the current picture of the upper layer and/or the corresponding picture of the lower layer can be determined.
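
For illustration only, the parsing of the first embodiment may be sketched as follows, where read_flag and read_ue model reading a flag and an unsigned Exp-Golomb value; only the syntax elements discussed above are covered, and an actual SEI message carries further fields:

    def parse_il_constrained_tile_sets(read_flag, read_ue):
        tile_sets = []
        one_tile_per_set = read_flag()        # il_one_tile_per_tile_set_flag
        if not one_tile_per_set:
            num_sets_minus1 = read_ue()       # il_num_sets_in_message_minus1
            if num_sets_minus1 != 0:          # at least two tile sets
                for i in range(num_sets_minus1 + 1):
                    rects = []
                    # il_num_tile_rects_in_set_minus1[i]
                    for j in range(read_ue() + 1):
                        top_left = read_ue()      # il_top_left_tile_index[i][j]
                        bottom_right = read_ue()  # il_bottom_right_tile_index[i][j]
                        rects.append((top_left, bottom_right))
                    tile_sets.append(rects)
        return tile_sets

    # Example: two tile sets with one rectangular region each.
    vals = iter([0, 1, 0, 2, 7, 0, 10, 15])
    read = lambda: next(vals)
    print(parse_il_constrained_tile_sets(read, read))  # [[(2, 7)], [(10, 15)]]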

Second Embodiment

When indicating the position of the lower right tile in a rectangular area belonging to a tile set, the raster-scan index of the lower right tile may not be signaled directly; instead, the difference between the index of the upper left tile and the index of the lower right tile may be signaled.

Referring to FIG. 12, the tile number flag il_one_tile_per_tile_set_flag may be obtained (S1200).

The tile number flag may indicate whether only one tile exists in each tile set. For example, if the value of the tile number flag is 1, only one tile exists in one tile set.

The tile set number information (il_num_sets_in_message_minus1) can be obtained based on the tile number flag in step S1200 (S1210).

More specifically, the tile set number information can be obtained when the value of the tile number flag is 0 (that is, when each tile set is not limited to containing only one tile). Here, a value obtained by adding 1 to the tile set number information may mean the total number of tile sets in the SEI message.

If the value of the tile set number information is not 0, the tile number information (il_num_tile_rects_in_set_minus1[i]) can be obtained (S1220).

The case where the value of the tile set number information is not 0 means that the total number of tile sets in the SEI message is at least 2. A value obtained by adding 1 to the tile number information may indicate the number of rectangular regions belonging to the i-th tile set.

If the value of the tile set number information is not 0, the upper left tile index (il_top_left_tile_index[i][j]) and the difference tile index (il_delta_from_top_left_tile_index[i][j]) may be obtained for each of the rectangular areas according to the tile number information (S1230, S1240).

Here, the upper left tile index (il_top_left_tile_index [i] [j]) indicates the position of the tile located at the upper left in the jth rectangular area belonging to the i-th tile set. The difference tile index (il_delta_from_top_left_tile_index [i] [j]) represents the difference between the position of the tile located at the upper left and the position of the tile located at the lower right in the jth rectangular area belonging to the i-th tile set. In this case, the position of the tile can be represented by an index according to a raster scan order.

Therefore, the position of the tile located at the lower right corner in the j-th rectangular area belonging to the i-th tile set can be derived as the sum of the upper left tile index and the difference tile index. This reduces the number of bits required to signal the inter-layer constrained tile sets SEI message.
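
For illustration only, with raster-scan tile indices the decoder recovers the lower right index by a single addition (the values below are hypothetical):

    il_top_left_tile_index = 5               # signaled
    il_delta_from_top_left_tile_index = 12   # signaled instead of the index 17
    il_bottom_right_tile_index = (il_top_left_tile_index
                                  + il_delta_from_top_left_tile_index)
    print(il_bottom_right_tile_index)        # 17, derived rather than signaled

Since the difference cannot exceed the lower right index itself, coding the difference with a variable-length code never takes more bits than coding the index directly.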

Further, based on the inter-layer constrained tile sets SEI message, a tile set belonging to the current picture of the upper layer and/or the corresponding picture of the lower layer can be determined.

FIG. 13 is a flowchart illustrating a method of upsampling a corresponding picture of a lower layer according to an embodiment of the present invention.

Referring to FIG. 13, a reference sample position of a lower layer corresponding to a current sample position of an upper layer may be derived (S1300).

Since the resolution of the upper layer and the resolution of the lower layer may be different, the reference sample position corresponding to the current sample position can be derived in consideration of the resolution difference therebetween. That is, the horizontal / vertical ratio between the picture of the upper layer and the picture of the lower layer can be considered. In addition, since an upsampled picture of a lower layer may not coincide in size with a picture of an upper layer, an offset for correcting the upsampled picture may be required.

For example, the reference sample position may be derived taking into account the scale factor and the upsampled lower layer offset.

Here, the scale factor can be calculated based on the ratio of the width and height between the current picture of the upper layer and the corresponding picture of the lower layer.

The upsampled lower layer offset may mean position difference information between any one of the samples located at the edge of the current picture and any one of the samples located at the edge of the interlayer reference picture. For example, the upsampled lower layer offset may include horizontal/vertical position difference information between the upper left sample of the current picture and the upper left sample of the interlayer reference picture, and horizontal/vertical position difference information between the lower right sample of the current picture and the lower right sample of the interlayer reference picture.

The upsampled lower layer offset may be obtained from the bitstream. For example, the upsampled lower layer offset may be obtained from at least one of a video parameter set, a sequence parameter set, a picture parameter set, and a slice header.
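
For illustration only, a simplified fixed-point sketch of the position derivation follows; the rounding and offset handling of an actual codec are more involved, so the constants here are assumptions:

    def ref_sample_position(x, y, w_up, h_up, w_low, h_low,
                            off_left=0, off_top=0):
        # Scale factors: width/height ratio in 1/65536 units (S1300).
        scale_x = ((w_low << 16) + (w_up >> 1)) // w_up
        scale_y = ((h_low << 16) + (h_up >> 1)) // h_up
        # Reference position in 1/16-sample units; the low 4 bits are the
        # phase used in S1310 to select the filter coefficients.
        x_ref16 = ((x - off_left) * scale_x) >> 12
        y_ref16 = ((y - off_top) * scale_y) >> 12
        return x_ref16, y_ref16

    # 2x scalability: upper-layer sample (3, 0) maps to 1.5 = 24/16.
    print(ref_sample_position(3, 0, 1920, 1080, 960, 540))  # (24, 0)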

The filter coefficient of the upsampling filter may be determined considering the phase of the reference sample position derived in step S1300 (S1310).

Here, the up-sampling filter may use either a fixed up-sampling filter or an adaptive up-sampling filter.

1. Fixed Upsampling Filter

The fixed up-sampling filter may refer to an up-sampling filter having a predetermined filter coefficient without considering the characteristics of the image. A tap filter can be used as the fixed up-sampling filter, which can be defined for the luminance component and the chrominance component, respectively. A fixed up-sampling filter having an accuracy of 1/16 sample units will be described with reference to Tables 1 to 2 below.

Phase p |                Interpolation filter coefficients
        | f[p,0] f[p,1] f[p,2] f[p,3] f[p,4] f[p,5] f[p,6] f[p,7]
--------+--------------------------------------------------------
   0    |    0      0      0     64      0      0      0      0
   1    |    0      1     -3     63      4     -2      1      0
   2    |   -1      2     -5     62      8     -3      1      0
   3    |   -1      3     -8     60     13     -4      1      0
   4    |   -1      4    -10     58     17     -5      1      0
   5    |   -1      4    -11     52     26     -8      3     -1
   6    |   -1      3     -9     47     31    -10      4     -1
   7    |   -1      4    -11     45     34    -10      4     -1
   8    |   -1      4    -11     40     40    -11      4     -1
   9    |   -1      4    -10     34     45    -11      4     -1
  10    |   -1      4    -10     31     47     -9      3     -1
  11    |   -1      3     -8     26     52    -11      4     -1
  12    |    0      1     -5     17     58    -10      4     -1
  13    |    0      1     -4     13     60     -8      3     -1
  14    |    0      1     -3      8     62     -5      2     -1
  15    |    0      1     -2      4     63     -3      1      0

Table 1 is a table defining the filter coefficients of the fixed up-sampling filter with respect to the luminance component.

As shown in Table 1, in the case of upsampling on the luminance component, an 8-tap filter is applied. That is, interpolation can be performed using a reference sample of a reference layer corresponding to the current sample of the upper layer and a neighboring sample adjacent to the reference sample. Here, the neighbor samples can be specified according to the direction in which the interpolation is performed. For example, when interpolation is performed in the horizontal direction, the neighboring sample may include three consecutive samples to the left and four consecutive samples to the right based on the reference sample. Alternatively, when interpolation is performed in the vertical direction, the neighboring sample may include three consecutive samples at the top and four consecutive samples at the bottom based on the reference sample.

Since interpolation is performed with an accuracy of 1/16 sample units, there are a total of 16 phases. This is to support resolution of various magnifications such as 2 times and 1.5 times.

In addition, the fixed up-sampling filter may use different filter coefficients for each phase p. The magnitude of each filter coefficient may be defined to fall within the range of 0 to 63, except when the phase p is zero. This means that the filtering is performed with a precision of 6 bits. Here, a phase p of 0 means an integer sample position, that is, a position at an integer multiple of n when interpolation is performed in 1/n sample units.
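
For illustration only, the following sketch applies one row of Table 1 at an integer position to make the 6-bit precision concrete; clipping and picture-boundary padding are omitted, and only two phases are reproduced:

    # Two rows of Table 1 (phase 0 and phase 8); coefficients sum to 64.
    LUMA_FILTER = {
        0: (0, 0, 0, 64, 0, 0, 0, 0),
        8: (-1, 4, -11, 40, 40, -11, 4, -1),
    }

    def interpolate_luma(samples, i, phase):
        # 8-tap filtering over 3 samples left and 4 right of position i,
        # followed by rounding and a 6-bit shift (division by 64).
        coeffs = LUMA_FILTER[phase]
        acc = sum(c * samples[i - 3 + k] for k, c in enumerate(coeffs))
        return (acc + 32) >> 6

    print(interpolate_luma([100] * 16, 8, 8))  # flat signal stays 100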

Phase p | f[p,0] f[p,1] f[p,2] f[p,3]
--------+----------------------------
   0    |    0     64      0      0
   1    |   -2     62      4      0
   2    |   -2     58     10     -2
   3    |   -4     56     14     -2
   4    |   -4     54     16     -2
   5    |   -6     52     20     -2
   6    |   -6     46     28     -4
   7    |   -4     42     30     -4
   8    |   -4     36     36     -4
   9    |   -4     30     42     -4
  10    |   -4     28     46     -6
  11    |   -2     20     52     -6
  12    |   -2     16     54     -4
  13    |   -2     14     56     -4
  14    |   -2     10     58     -2
  15    |    0      4     62     -2

Table 2 defines the filter coefficients of the fixed up-sampling filter for the chrominance components.

As shown in Table 2, in the case of up-sampling for the chrominance components, a 4-tap filter can be applied, unlike for the luminance component. That is, interpolation can be performed using a reference sample of the reference layer corresponding to the current sample of the upper layer and neighboring samples adjacent to the reference sample. Here, the neighboring samples can be specified according to the direction in which the interpolation is performed. For example, when interpolation is performed in the horizontal direction, the neighboring samples may include one sample to the left and two consecutive samples to the right of the reference sample. Likewise, when interpolation is performed in the vertical direction, the neighboring samples may include one sample at the top and two consecutive samples at the bottom of the reference sample.

On the other hand, as in the case of the luminance component, interpolation is performed with an accuracy of 1/16 sample units, so there are a total of 16 phases, and different filter coefficients can be used for each phase p. The magnitude of each filter coefficient can be defined to fall within the range of 0 to 62, except when the phase p is zero. This also means that filtering is performed with a precision of 6 bits.

In the above, an 8-tap filter is applied to the luminance component and a 4-tap filter to the chrominance component; however, the present invention is not limited thereto, and the order of the tap filter may, of course, be determined variably in consideration of coding efficiency.

2. Adaptive up-sampling filter

Instead of using fixed filter coefficients, the encoder may determine the optimum filter coefficients in consideration of the characteristics of the image and signal them to the decoder; a filter that uses filter coefficients adaptively determined in the encoder in this way is the adaptive up-sampling filter. Since the characteristics of an image differ on a picture-by-picture basis, coding efficiency can be improved by using an adaptive up-sampling filter that can represent the characteristics of the image well, rather than using the same fixed up-sampling filter in all cases.

The inter-layer reference picture can be generated by applying the filter coefficient determined in step S1310 to the corresponding picture of the lower layer (S1320).

Specifically, interpolation may be performed by applying the filter coefficients of the determined up-sampling filter to the samples of the corresponding picture. Here, the interpolation may be performed first in the horizontal direction, and then in the vertical direction on the samples generated by the horizontal interpolation.
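
For illustration only, the two-pass order may be sketched abstractly as follows; interp_h and interp_v stand in for the phase-dependent tap filtering of S1310, and the sample-doubling "filter" in the example merely shows the data flow:

    def upsample_separable(picture, interp_h, interp_v):
        rows = [interp_h(row) for row in picture]           # horizontal pass
        cols = [interp_v(list(col)) for col in zip(*rows)]  # vertical pass
        return [list(row) for row in zip(*cols)]            # back to rows

    double = lambda line: [s for s in line for _ in range(2)]
    print(upsample_separable([[1, 2], [3, 4]], double, double))
    # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]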

Claims (15)

Determining a corresponding picture belonging to a lower layer used for inter-layer prediction of a current picture belonging to an upper layer; And
And performing inter-layer prediction of the current picture using a corresponding picture of the determined lower layer,
Wherein the current picture restrictively performs inter-layer prediction based on the limited prediction identifier.
The method of claim 1, wherein the limited prediction identifier specifies one of a first limited prediction mode, a second limited prediction mode, and a third limited prediction mode,
Wherein the first limited prediction mode is a mode in which a block of a tile set belonging to the current picture does not perform inter-layer prediction using a corresponding picture of the lower layer,
Wherein the second limited prediction mode is a mode in which a sample of a tile set belonging to the current picture performs inter-layer prediction using only samples of a tile set belonging to a corresponding picture of the lower layer, and
Wherein the third limited prediction mode is a mode in which inter-layer prediction is performed using a corresponding picture of the lower layer without the restrictions of the first limited prediction mode and the second limited prediction mode.
3. The method of claim 2, wherein the tile set is determined based on an inter-layer constrained tile sets SEI message,
Wherein the inter-layer constrained tile sets SEI message includes tile number information, an upper left tile index, and a lower right tile index.
The method of claim 3,
Wherein the tile number information is a value obtained by subtracting 1 from the number of rectangular areas belonging to the tile set,
The upper left tile index indicates a position of the tile located at the upper left corner in the rectangular area belonging to the set of tiles,
Wherein the lower-right tile index indicates a position of a tile positioned at a lower right end in a rectangular area belonging to the set of tiles.
A prediction unit for determining a corresponding picture belonging to a lower layer used for inter-layer prediction of a current picture belonging to an upper layer, and for performing inter-layer prediction of the current picture using the corresponding picture of the determined lower layer,
Wherein the prediction unit restrictively performs inter-layer prediction of the current picture based on the limited prediction identifier.
6. The apparatus of claim 5, wherein the limited prediction identifier specifies one of a first limited prediction mode, a second limited prediction mode, and a third limited prediction mode,
Wherein the first limited prediction mode is a mode in which a block of a tile set belonging to the current picture does not perform inter-layer prediction using a corresponding picture of the lower layer,
Wherein the second limited prediction mode is a mode in which a sample of a tile set belonging to the current picture performs inter-layer prediction using only samples of a tile set belonging to a corresponding picture of the lower layer, and
Wherein the third limited prediction mode is a mode in which inter-layer prediction is performed using a corresponding picture of the lower layer without the restrictions of the first limited prediction mode and the second limited prediction mode.
7. The apparatus of claim 6, wherein the tile set is determined based on an inter-layer constrained tile sets SEI message,
Wherein the inter-layer constrained tile sets SEI message includes tile number information, an upper left tile index, and a lower right tile index.
8. The apparatus of claim 7,
Wherein the tile number information is a value obtained by subtracting 1 from the number of rectangular areas belonging to the tile set,
The upper left tile index indicates a position of the tile located at the upper left corner in the rectangular area belonging to the set of tiles,
Wherein the lower-right tile index indicates a position of a tile located at a lower right end in a rectangular area belonging to the set of tiles.
Determining a corresponding picture belonging to a lower layer used for inter-layer prediction of a current picture belonging to an upper layer; And
And performing inter-layer prediction of the current picture using a corresponding picture of the determined lower layer,
Wherein the current picture restrictively performs inter-layer prediction based on the limited prediction identifier.
The method of claim 9, wherein the limited prediction identifier specifies one of a first limited prediction mode, a second limited prediction mode, and a third limited prediction mode,
Wherein the first limited prediction mode is a mode in which a block of a tile set belonging to the current picture does not perform inter-layer prediction using a corresponding picture of the lower layer,
Wherein the second limited prediction mode is a mode in which a sample of a tile set belonging to the current picture performs inter-layer prediction using only samples of a tile set belonging to a corresponding picture of the lower layer, and
Wherein the third limited prediction mode is a mode in which inter-layer prediction is performed using a corresponding picture of the lower layer without the restrictions of the first limited prediction mode and the second limited prediction mode.
11. The method of claim 10, wherein the tile set is determined based on an inter-layer constrained tile sets SEI message,
Wherein the inter-layer constrained tile sets SEI message includes tile number information, an upper left tile index, and a lower right tile index.
12. The method of claim 11, wherein the tile number information is a value obtained by subtracting 1 from the number of rectangular areas belonging to the tile set,
The upper left tile index indicates a position of the tile located at the upper left corner in the rectangular area belonging to the set of tiles,
Wherein the lower-right tile index indicates a position of a tile located at a lower right end in a rectangular area belonging to the set of tiles.
A prediction unit for determining a corresponding picture belonging to a lower layer used for inter-layer prediction of a current picture belonging to an upper layer, and for performing inter-layer prediction of the current picture using the corresponding picture of the determined lower layer,
Wherein the prediction unit restrictively performs inter-layer prediction of the current picture based on the limited prediction identifier.
14. The apparatus of claim 13, wherein the limited prediction identifier specifies one of a first limited prediction mode, a second limited prediction mode, and a third limited prediction mode,
Wherein the first limited prediction mode is a mode in which a block of a tile set belonging to the current picture does not perform inter-layer prediction using a corresponding picture of the lower layer,
Wherein the second limited prediction mode is a mode in which a sample of a tile set belonging to the current picture performs inter-layer prediction using only samples of a tile set belonging to a corresponding picture of the lower layer, and
Wherein the third limited prediction mode is a mode in which inter-layer prediction is performed using a corresponding picture of the lower layer without the restrictions of the first limited prediction mode and the second limited prediction mode.
15. The apparatus of claim 14, wherein the tile set is determined based on an inter-layer constrained tile sets SEI message,
Wherein the inter-layer constrained tile sets SEI message includes tile number information, an upper left tile index, and a lower right tile index.