KR20150043977A - Method and apparatus for video encoding/decoding based on multi-layer - Google Patents

Method and apparatus for video encoding/decoding based on multi-layer Download PDF

Info

Publication number
KR20150043977A
Authority
KR
South Korea
Prior art keywords
poc
picture
value
reset
layer
Prior art date
Application number
KR20140135694A
Other languages
Korean (ko)
Inventor
이하현
강정원
이진호
최진수
김진웅
Original Assignee
한국전자통신연구원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국전자통신연구원 filed Critical 한국전자통신연구원
Priority to US14/511,333 priority Critical patent/US20150103912A1/en
Publication of KR20150043977A publication Critical patent/KR20150043977A/en

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Disclosed are a method and an apparatus for encoding an image supporting a plurality of layers. The image encoding method comprises the steps of: encoding POC reset information indicating whether or not the POC (Picture Order Count) of a current picture has been reset to 0; calculating, based on the POC reset information, the POC value of the current picture and the POC value of each of a long-term reference picture and a short-term reference picture in the DPB (Decoded Picture Buffer) referenced by the current picture; and constructing an RPS (Reference Picture Set) for inter prediction of the current picture based on the POC values of the long-term reference picture and the short-term reference picture.

Description

TECHNICAL FIELD [0001] The present invention relates to a multi-layer based video encoding/decoding method and apparatus.

More particularly, the present invention relates to a method of encoding/decoding pictures of a multi-layer structure, and to a method of setting the picture order count (POC) of pictures in the same AU (Access Unit) and calculating the POC of reference pictures in the DPB (Decoded Picture Buffer).

Recently, as a multimedia environment has been established, a variety of terminals and networks have been used, and user demands have been diversified accordingly.

For example, as the performance and computing capability of a terminal are diversified, the performance to be supported varies depending on a device. In addition, the network in which the information is transmitted is also diversified not only by the external structure such as a wired / wireless network, but also by the type of information to be transmitted, information amount and speed, and the like. The user selects the terminal and the network to be used according to the desired function, and the spectrum of the terminal and the network provided by the enterprise to the user is also diversified.

In this regard, recently, broadcasting having a high definition (HD) resolution has been expanded not only in the domestic market but also in the world, so that many users are accustomed to high resolution and high quality video. Accordingly, many video service related organizations are making efforts to develop next generation video equipment.

In addition, with the increasing interest in UHD (Ultra High Definition), which has a resolution more than four times that of HDTV in addition to HDTV, there is a growing demand for a technology for compressing and processing higher resolution and higher quality images.

To compress and process an image, an inter prediction technique for predicting a pixel value included in a current picture from a temporally preceding and/or following picture, an intra prediction technique for predicting a pixel value included in a current picture using pixel information in the current picture, and an entropy encoding technique for assigning a short code to a symbol having a high appearance frequency and a long code to a symbol having a low appearance frequency can be used.

As described above, considering the requirements of terminals, networks, and diversified users with different functions to be supported, the quality, size, and frame rate of supported images need to be diversified accordingly.

As described above, scalability that supports various image quality, resolution, size, frame rate, and viewpoint due to heterogeneous communication networks and various functions and types of terminals has become an important function of a video format.

Therefore, in order to provide a service required by a user in various environments based on a highly efficient video coding method, it is necessary to provide a scalability function so as to enable efficient video encoding and decoding in terms of time, space, image quality, and viewpoint .

The present invention provides a method and apparatus for equally setting the POC of pictures in an AU in scalable video coding comprising a plurality of layers.

The present invention provides a method and apparatus for calculating the POC of reference pictures in the DPB referenced by a current picture when the POC of the current picture is reset, in scalable video coding comprising a plurality of layers.

The present invention provides a method and apparatus for signaling whether a POC of a current picture has been reset in scalable video coding comprising a plurality of layers.

According to an embodiment of the present invention, an image decoding method supporting a plurality of layers is provided. The image decoding method includes: decoding POC reset information indicating whether or not the picture order count (POC) of a current picture has been reset to 0; calculating, based on the POC reset information, the POC value of the current picture and the POC value of each of a long-term reference picture and a short-term reference picture in the DPB (Decoded Picture Buffer) referenced by the current picture; and constructing a reference picture set (RPS) for inter prediction of the current picture based on the POC value of the long-term reference picture and the POC value of the short-term reference picture.

According to another embodiment of the present invention, an image decoding apparatus supporting a plurality of layers is provided. The image decoding apparatus includes: a decoding unit that decodes POC reset information indicating whether the picture order count (POC) of a current picture has been reset to 0; and a prediction unit that calculates, based on the POC reset information, the POC value of the current picture and the POC value of each of a long-term reference picture and a short-term reference picture in the DPB (Decoded Picture Buffer) referenced by the current picture, and constructs a reference picture set (RPS) for inter prediction of the current picture based on the POC value of the long-term reference picture and the POC value of the short-term reference picture.

According to the present invention, there is provided a method for equally resetting the POC of pictures in an AU when the POCs of the pictures in the same AU are not the same. In addition, even if the POC value of the current picture is reset, the reference pictures in the decoded picture buffer referenced by the current picture can be normally identified.

1 is a block diagram illustrating a configuration of an image encoding apparatus according to an embodiment of the present invention.
2 is a block diagram illustrating a configuration of an image decoding apparatus according to an embodiment of the present invention.
3 is a conceptual diagram schematically showing an embodiment of a scalable video coding structure using a plurality of layers to which the present invention can be applied.
FIG. 4 is a flowchart schematically showing a method of resetting the POC of pictures in a scalable video coding structure including a plurality of layers and constructing a reference picture set for inter prediction based on the reset POC, according to an embodiment of the present invention.
FIG. 5 is an example of a scalable video structure including a plurality of layers shown to explain a process of resetting POC values of pictures in an AU according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a process of resetting the POC values of reference pictures in the DPB based on information (e.g., poc_reset_flag) indicating whether or not the POC value of the current picture has been reset to 0, according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a method for calculating a POC value of long term reference pictures in a DPB according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In describing the embodiments of the present invention, if the detailed description of related known structures or functions is deemed to obscure the subject matter of the present specification, the description may be omitted.

When an element is referred to herein as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, the description of "including" a specific component in this specification does not exclude other components, and means that additional components may be included in the scope of the embodiments or the technical idea of the present invention.

The terms first, second, etc. may be used to describe various configurations, but the configurations are not limited by the term. The terms are used for the purpose of distinguishing one configuration from another. For example, without departing from the scope of the present invention, the first configuration may be referred to as the second configuration, and similarly, the second configuration may be named as the first configuration.

In addition, the constituent elements shown in the embodiments of the present invention are illustrated independently to represent different characteristic functions, which does not mean that each constituent element is implemented as separate hardware or a separate software unit. That is, the constituent units are listed separately for convenience of explanation, and at least two constituent units may be combined into one constituent unit, or one constituent unit may be divided into a plurality of constituent units to perform a function. The integrated embodiments and the separate embodiments of each constituent unit are also included in the scope of the present invention as long as they do not depart from the essence of the present invention.

In addition, some components may not be essential components that perform essential functions of the present invention, but may be optional components that merely improve performance. The present invention can be implemented with only the components essential for realizing the essence of the present invention, excluding the components used merely for performance improvement, and a structure including only such essential components, excluding the optional components used for performance improvement, is also included in the scope of the present invention.

1 is a block diagram illustrating a configuration of an image encoding apparatus according to an embodiment of the present invention.

A scalable video encoding apparatus supporting a multi-layer structure can be implemented by extending a general video encoding apparatus having a single layer structure. The block diagram of FIG. 1 shows an embodiment of an image encoding apparatus that can be the basis of a scalable video encoding apparatus applicable to a multi-layer structure.

Referring to FIG. 1, an image encoding apparatus 100 includes an inter prediction unit 110, an intra prediction unit 120, a switch 115, a subtractor 125, a transform unit 130, a quantization unit 140, an entropy encoding unit 150, an inverse quantization unit 160, an inverse transform unit 170, an adder 175, a filter unit 180, and a reference picture buffer 190.

The image encoding apparatus 100 may encode an input image in an intra mode or an inter mode and output a bit stream.

In the intra mode, the switch 115 is switched to intra, and in the inter mode, the switch 115 can be switched to inter. The intra mode means an intra-picture prediction mode, and the inter mode means an inter-picture prediction mode. The image encoding apparatus 100 may generate a prediction block for an input block of an input image and then encode the residual between the input block and the prediction block. At this time, the input image may mean an original picture.

In the intra mode, the intraprediction unit 120 can use the sample value of the already coded / decoded block around the current block as a reference sample. The intra prediction unit 120 may perform spatial prediction using the reference samples and generate prediction samples for the current block.

In the inter mode, the inter prediction unit 110 can obtain, in the motion prediction process, a motion vector specifying the reference block having the smallest difference from the input block (current block) among the reference pictures stored in the reference picture buffer 190. The inter prediction unit 110 can generate a prediction block for the current block by performing motion compensation using the motion vector and the reference picture stored in the reference picture buffer 190.

In the case of a multi-layer structure, inter prediction that is applied in inter mode may include inter-layer prediction. The inter-prediction unit 110 may construct an inter-layer reference picture by sampling a picture of the reference layer, and may perform inter-layer prediction by including an inter-layer reference picture in the reference picture list. A reference relationship between layers can be signaled through information that specifies dependencies between layers.

On the other hand, when the current layer picture and the reference layer picture are the same size, sampling applied to the reference layer picture may mean generation of a reference sample by sample copy from the reference layer picture. The sampling applied to the reference layer picture may mean upsampling when the resolution of the current layer picture and the reference layer picture are different.

For example, in a case where the inter-layer resolution is different, an inter-layer reference picture may be constructed by up-sampling a reconstructed picture of a reference layer between layers supporting scalability regarding resolution.

Which layer picture to use to construct an interlayer reference picture can be determined in consideration of coding cost and the like. The encoding apparatus can transmit to the decoding apparatus information specifying a layer to which a picture to be used as an interlayer reference picture belongs.

The layer to be referred to in the interlayer prediction, that is, the picture used for prediction of the current block in the reference layer may be a picture of the same AU (Access Unit) as the current picture (current intra-layer prediction picture).

The subtractor 125 may generate a residual block by a difference between the input block and the generated prediction block.

The transforming unit 130 may perform a transform on the residual block to output a transform coefficient. Here, the transform coefficient may mean a coefficient value generated by performing a transform on a residual block and / or a residual signal. Hereinafter, a quantized transform coefficient level generated by applying quantization to a transform coefficient may also be referred to as a transform coefficient.

When the transform skip mode is applied, the transforming unit 130 may omit the transform for the residual block.

The quantization unit 140 may quantize the input transform coefficient according to a quantization parameter to output a quantized coefficient. The quantized coefficient may be referred to as a quantized transform coefficient level. At this time, the quantization unit 140 can quantize the input transform coefficient using a quantization matrix.

The entropy encoding unit 150 can output a bitstream by entropy encoding the values calculated by the quantization unit 140 or the encoding parameter values calculated in the encoding process according to the probability distribution. The entropy encoding unit 150 may entropy encode information for video decoding (e.g., a syntax element or the like) in addition to the pixel information of the video.

The encoding parameters are information necessary for encoding and decoding, and may include not only information encoded by the encoding apparatus and delivered to the decoding apparatus, such as syntax elements, but also information that can be inferred during encoding or decoding.

For example, the encoding parameters may include values or statistics such as the intra/inter prediction mode, motion vector, reference picture index, coded block pattern, presence or absence of a residual signal, transform coefficient, quantized transform coefficient, quantization parameter, and block size.

The residual signal can mean the difference between the original signal and the predicted signal, a signal obtained by transforming the difference between the original signal and the predicted signal, or a signal obtained by transforming and quantizing the difference between the original signal and the predicted signal. The residual signal may be referred to as a residual block in block units.

When entropy encoding is applied, a small number of bits are allocated to a symbol having a high probability of occurrence and a large number of bits are allocated to a symbol having a low probability of occurrence, so that the size of the bit string for the symbols to be encoded can be reduced. Therefore, the compression performance of image encoding can be enhanced through entropy encoding.

The entropy encoding unit 150 may use an encoding method such as exponential Golomb, context-adaptive variable length coding (CAVLC), or context-adaptive binary arithmetic coding (CABAC) for entropy encoding. For example, the entropy encoding unit 150 may perform entropy encoding using a variable length coding (VLC) table. Further, the entropy encoding unit 150 may derive a binarization method of a target symbol and a probability model of a target symbol/bin, and then perform entropy encoding using the derived binarization method or probability model.

Since the image encoding apparatus 100 according to the embodiment of FIG. 1 performs inter prediction encoding, that is, inter-picture prediction encoding, the currently encoded image needs to be decoded and stored for use as a reference image. Accordingly, the quantized coefficient is inversely quantized in the inverse quantization unit 160 and inversely transformed in the inverse transform unit 170. The inversely quantized and inversely transformed coefficients are added to the prediction block through the adder 175, and a reconstructed block is generated.

The reconstructed block passes through the filter unit 180, and the filter unit 180 can apply at least one of a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) to the reconstructed block or the reconstructed picture. The filter unit 180 may be referred to as an adaptive in-loop filter. The deblocking filter can remove block distortion occurring at the boundary between blocks. The SAO can add a proper offset value to the pixel value to compensate for coding errors. The ALF can perform filtering based on a comparison between the reconstructed image and the original image. The reconstructed block that has passed through the filter unit 180 can be stored in the reference picture buffer 190.

2 is a block diagram illustrating a configuration of an image decoding apparatus according to an embodiment of the present invention.

A scalable video decoding apparatus supporting a multi-layer structure can be implemented by extending a general video decoding apparatus having a single layer structure. The block diagram of FIG. 2 shows an embodiment of an image decoding apparatus that can be the basis of a scalable video decoding apparatus applicable to a multi-layer structure.

Referring to FIG. 2, the image decoding apparatus 200 includes an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 230, an intra prediction unit 240, an inter prediction unit 250, an adder 255, a filter unit 260, and a reference picture buffer 270.

The image decoding apparatus 200 receives the bitstream output from the encoder, decodes it in the intra mode or the inter mode, and outputs a reconstructed image, that is, a restored image.

In the intra mode, the switch is switched to the intra mode, and in the inter mode, the switch can be switched to the inter mode.

The image decoding apparatus 200 may obtain a reconstructed residual block from the input bitstream, generate a prediction block, and add the reconstructed residual block and the prediction block to generate a reconstructed block, that is, a restored block.

The entropy decoding unit 210 entropy-decodes the input bitstream according to the probability distribution, and outputs information such as quantized coefficients and syntax elements.

The quantized coefficients are inversely quantized in the inverse quantization unit 220 and inversely transformed in the inverse transformation unit 230. As a result that the quantized coefficients are dequantized / inverse transformed, a reconstructed residual block can be generated. At this time, the inverse quantization unit 220 can apply the quantization matrix to the quantized coefficients.

In the intra mode, the intra predictor 240 can perform spatial prediction using the sample values of the already decoded blocks around the current block, and generate prediction samples for the current block.

In the inter mode, the inter prediction unit 250 can generate a prediction block for the current block by performing motion compensation using a motion vector and a reference picture stored in the reference picture buffer 270.

In the case of a multi-layer structure, inter prediction that is applied in inter mode may include inter-layer prediction. The inter-prediction unit 250 may construct an inter-layer reference picture by sampling a picture of the reference layer, and may perform inter-layer prediction by including an inter-layer reference picture in the reference picture list. A reference relationship between layers can be signaled through information that specifies dependencies between layers.

On the other hand, when the current layer picture and the reference layer picture are the same size, sampling applied to the reference layer picture may mean generation of a reference sample by sample copy from the reference layer picture. The sampling applied to the reference layer picture may mean upsampling when the resolution of the current layer picture and the reference layer picture are different.

For example, if inter-layer resolution is different, and inter-layer prediction is applied between layers supporting scalability regarding resolution, an inter-layer reference picture may be constructed by up-sampling reconstructed pictures of a reference layer.

At this time, information specifying a layer to which a picture to be used as an interlayer reference picture belongs can be transmitted from the encoding apparatus to the decoding apparatus.

The layer to be referred to in the interlayer prediction, that is, the picture used for prediction of the current block in the reference layer may be a picture of the same AU (Access Unit) as the current picture (current intra-layer prediction picture).

The restored residual block and the prediction block are added by the adder 255, and a restoration block is generated. In other words, the residual sample and the prediction sample are added together to generate a reconstructed sample or reconstructed picture.

The restored picture is filtered by the filter unit 260. The filter unit 260 may apply at least one of a deblocking filter, SAO, and ALF to a restoration block or a restored picture. The filter unit 260 outputs a reconstructed picture that has been modified or filtered. The reconstructed image is stored in the reference picture buffer 270 and can be used for inter prediction.

The video decoding apparatus 200 may further include a parser (not shown) parsing information related to the encoded video included in the bitstream. The parsing unit may include an entropy decoding unit 210 or may be included in the entropy decoding unit 210. Such a parsing unit may also be implemented as one component of the decoding unit.

In FIG. 1 and FIG. 2, it is described that one encoding / decoding apparatus processes all of encoding / decoding for a multi-layer. However, this is for convenience of explanation, and the encoding / decoding apparatus may be configured for each layer.

In this case, the upper-layer encoding/decoding apparatus can perform encoding/decoding of the upper layer using information of the upper layer and information of the lower layer. For example, the prediction unit (inter prediction unit) of the upper layer may perform intra prediction or inter prediction on the current block using pixel information or picture information of the upper layer, or may receive reconstructed picture information from the lower layer and perform inter prediction (inter-layer prediction) on the current block of the upper layer using it. Here, only inter-layer prediction has been described as an example, but the encoding/decoding apparatus can perform encoding/decoding on the current layer using information of another layer, regardless of whether the apparatus is configured per layer or one apparatus processes the multi-layer.

In the present invention, a layer may include a view. In this case, in inter-layer prediction, prediction of the upper layer is not simply performed using information of a lower layer; rather, inter-layer prediction may be performed using information of another layer among the layers specified as dependent by the information specifying inter-layer dependency.

3 is a conceptual diagram schematically showing an embodiment of a scalable video coding structure using a plurality of layers to which the present invention can be applied. In FIG. 3, a GOP (Group of Pictures) represents a picture group, that is, a group of pictures.

In order to transmit video data, a transmission medium is required, and the performance of the transmission medium varies depending on various network environments. A scalable video coding method may be provided for application to these various transmission media or network environments.

A video coding method supporting scalability (hereinafter referred to as 'scalable coding' or 'scalable video coding') removes redundancy between layers by utilizing texture information, motion information, residual signals, and the like between layers, thereby improving encoding and decoding performance. The scalable video coding method can provide various scalabilities in terms of space, time, image quality (or quality), and view, according to surrounding conditions such as transmission bit rate, transmission error rate, and system resources.

Scalable video coding can be performed using a multi-layer structure to provide a bitstream applicable to various network situations. For example, the scalable video coding structure may include a base layer in which image data is compressed and processed using a general image encoding/decoding method, and an enhancement layer in which image data is compressed and processed using both the encoding/decoding information of the base layer and a general image encoding/decoding method.

The base layer may be referred to as a base layer or may be referred to as a lower layer. The enhancement layer may be referred to as an enhancement layer or a higher layer. In this case, the lower layer may mean a layer supporting lower scalability than the specific layer, and the upper layer may mean a layer supporting higher scalability than a specific layer. In addition, a layer to be referred to in coding / decoding of another layer is referred to as a reference layer (reference layer), and a layer to be encoded / decoded using another layer may be referred to as a current layer (current layer). The reference layer may be a lower layer than the current layer, and the current layer may be a layer higher than the reference layer.

Here, a layer means a set of images and bitstreams that are distinguished based on space (e.g., image size), time (e.g., decoding order, image output order, frame rate), image quality, complexity, and the like.

Referring to FIG. 3, for example, the base layer may be defined by standard definition (SD), a frame rate of 15 Hz, and a bit rate of 1 Mbps; the first enhancement layer by high definition (HD) and a frame rate of 30 Hz; and the second enhancement layer by 4K-UHD (ultra high definition), a frame rate of 60 Hz, and a bit rate of 27.2 Mbps.

The format, the frame rate, the bit rate, and the like are one example, and can be determined as needed. Also, the number of layers to be used is not limited to the present embodiment, but can be otherwise determined depending on the situation. For example, if the transmission bandwidth is 4 Mbps, the frame rate of the first enhancement layer HD may be reduced to 15 Hz or less.

The scalable video coding method can provide temporal, spatial, quality, and view scalability by the above-described method of the embodiment of FIG. 3. In this specification, scalable video coding has the same meaning as scalable video encoding from the encoding point of view and scalable video decoding from the decoding point of view.

Meanwhile, pictures in the same AU (Access Unit) have the same picture order count (POC) value.

The POC is a value that can identify pictures in the same layer and may be a value indicating an output order of decoded pictures output from a decoded picture buffer (DPB).

The AU includes coded pictures having the same output time. For example, in a scalable video coding structure composed of a plurality of layers, when a picture A of a first layer and a picture B of a second layer have the same output time, the picture A of the first layer and the picture B of the second layer may be included in the same AU.

If the pictures in the same AU have different picture types, pictures in the same AU may have different POC values. Thus, if the POCs of the pictures in the same AU are not the same, a method of setting the pictures in the AU to have the same POC value is needed. In addition, by resetting the POCs of the pictures in the AU, there is a need for a method that can calculate the POC of the reference pictures in the DPB so that existing reference pictures in the DPB can be normally identified.

In the present invention, when pictures having different POC values are included in the same AU in the multi-layer-based image encoding/decoding process, a method of resetting the POC values of the pictures in the AU and the POC values of the reference pictures in the DPB, and a method of constructing a reference picture set (RPS) based on the reset POC values, are described.

The present invention relates to encoding and decoding of an image including a plurality of layers or views, and the plurality of layers or views may be expressed as a first, second, third, ..., n-th layer or a first, second, third, ..., n-th view.

Hereinafter, in the embodiments of the present invention, an image in which a first layer and a second layer exist is described as an example for convenience of explanation, but the same method can be applied to an image in which more layers or views exist. Also, the first layer may be expressed as a base layer or a reference layer, and the second layer may be expressed as an enhancement layer or a current layer.

FIG. 4 is a flowchart schematically showing a method of resetting the POC of pictures in a scalable video coding structure including a plurality of layers and constructing a reference picture set for inter prediction based on the reset POC, according to an embodiment of the present invention. The method of FIG. 4 can be performed in the image encoding apparatus of FIG. 1 and the image decoding apparatus of FIG. 2 described above.

Referring to FIG. 4, the encoding / decoding apparatus calculates a POC value of a current picture to be coded / decoded (hereinafter referred to as a current picture) (S410).

As described above, the POC is an identifier for identifying pictures in a layer having the same layer identifier (nuh_layer_id) value in a coded video bit stream, and may be a value indicating an output order of pictures output from the DPB .

For example, the value of the POC may increase as the order output from the DPB is delayed, and the POC value may be 0 in the case of a specific picture.

The specific picture may be an IRAP (Intra Random Access Point) picture that becomes the first picture in the bitstream in decoding order, and the POC value of the IRAP picture may be 0. In other words, since the IRAP picture is a picture that can be decoded without decoding pictures preceding it in decoding order, the POC value can be 0 for the IRAP picture. The IRAP picture is a picture that becomes a random access point, includes only I (intra) slices (slices decoded using only intra prediction), and includes an instantaneous decoding refresh (IDR) picture, a clean random access (CRA) picture, and a broken link access (BLA) picture. The IDR picture may be the first picture in the bitstream in decoding order, or may be in the middle of the bitstream. The CRA picture may be the first picture in the bitstream in decoding order, or may be in the middle of the bitstream for normal play. A BLA picture has functions and properties similar to those of a CRA picture, and refers to a picture that exists in the middle of a bitstream as a random access point when a coded picture is spliced or a bitstream is interrupted.

The POC value can be calculated using the most significant bit (MSB) value (POC_MSB) of the POC value and the LSB (least significant bit) value (POC_LSB) of the POC value.

At this time, the POC_LSB value can be transmitted in a slice segment header of the corresponding picture, and the POC_MSB value can be calculated according to the type of the corresponding picture in the following manner.

(1-1) In the case of a picture that is not an IRAP picture, that is, a non-IRAP picture

The POC_MSB value of a non-IRAP picture can be calculated using the POC_LSB value (prevPOCLSB) and POC_MSB value (prevPOCMSB) of the previous picture (prevPOC), which is the previously decoded picture with temporal sub-layer identifier (temporal_id) equal to 0 that is closest to the current picture (i.e., whose POC difference from the current picture is small), the LSB value (MaxPicOrderCntLsb) of the maximum POC transmitted in the SPS (Sequence Parameter Sets), and the POC_LSB value (slice_pic_order_cnt_lsb) of the current picture signaled in the slice segment header of the current picture.
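As an illustrative sketch (not a reproduction of the disclosed syntax), the following Python code shows how a decoder could combine these quantities into a POC value, following the HEVC-style MSB wrap-around derivation assumed above; the function and variable names are hypothetical.

    def derive_poc(slice_pic_order_cnt_lsb, prev_poc_lsb, prev_poc_msb, max_poc_lsb):
        # Derive POC_MSB of the current non-IRAP picture from the POC_LSB/POC_MSB
        # of the previous temporal_id == 0 picture and MaxPicOrderCntLsb (from SPS).
        if (slice_pic_order_cnt_lsb < prev_poc_lsb and
                prev_poc_lsb - slice_pic_order_cnt_lsb >= max_poc_lsb // 2):
            poc_msb = prev_poc_msb + max_poc_lsb      # LSB wrapped around upward
        elif (slice_pic_order_cnt_lsb > prev_poc_lsb and
                slice_pic_order_cnt_lsb - prev_poc_lsb > max_poc_lsb // 2):
            poc_msb = prev_poc_msb - max_poc_lsb      # LSB wrapped around downward
        else:
            poc_msb = prev_poc_msb
        # POC value = POC_MSB + POC_LSB
        return poc_msb + slice_pic_order_cnt_lsb

    # Example: previous picture POC 30 (MSB 0, LSB 30), MaxPicOrderCntLsb 32,
    # current slice_pic_order_cnt_lsb 2 -> POC = 32 + 2 = 34.
    assert derive_poc(2, 30, 0, 32) == 34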

(1-2) In the case of an IRAP picture

The POC value of the IDR picture is always assumed to be '0'.

If the first picture in the bitstream is a CRA picture or a BLA picture, the POC_MSB value of the CRA picture or BLA picture is assumed to be '0', and the POC_LSB (slice_pic_order_cnt_lsb) value signaled in the slice segment header of the current picture can be used as the POC value of the CRA picture or BLA picture.

If the CRA picture is not the first picture in the bitstream, the POC value of the CRA picture can be calculated to be the same as the POC value of the non-IRAP picture.

If there is a picture in the AU having a POC different from the POC of the current picture, that is, if the pictures in the AU have different POC values, the encoding/decoding apparatus resets the POC values so that the pictures in the AU have the same POC value. The process of resetting the POC values of the pictures in the AU will be described with reference to FIGS. 5 and 6.

FIG. 5 is an example of a scalable video structure including a plurality of layers shown to explain a process of resetting POC values of pictures in an AU according to an embodiment of the present invention.

The scalable video shown in FIG. 5 may be an image including a first layer (Layer 0) and a second layer (Layer 1). For example, the first layer (Layer 0) may be a lower layer and the second layer (Layer 1) may be an upper layer. The second layer (Layer 1) may provide a higher scalability than the first layer (Layer 0).

Referring to FIG. 5, when there are IRAP pictures and non-IRAP pictures in the same AU, as in AU 'A' and AU 'B', pictures in the same AU may have different POC values.

At this time, the encoding / decoding apparatus can reset the POC values of the pictures in the AU so that all the pictures in the AU have the same POC value. For example, the encoding / decoding apparatus can reset the POC value of the picture to a predetermined value. The predetermined value may be '0'.

The encoding apparatus can signal to the decoding apparatus that the POC value of the picture has been reset to a predetermined value (e.g., 0). For example, the encoding apparatus can transmit information indicating whether or not the POC value of the picture has been reset to 0 to the decoder through the slice segment header.

Tables 1 and 2 are examples of slice segment header syntax for signaling POC reset information indicating whether the POC value of a picture according to an embodiment of the present invention has been reset to zero.

[Table 1] Slice segment header syntax including poc_reset_flag (table image not reproduced)

[Table 2] Slice segment header syntax including poc_reset_flag (table image not reproduced)

Referring to Table 1 and Table 2, poc_reset_flag indicates whether the POC value of the current picture is reset to 0 or not. For example, if the value of poc_reset_flag is 1, it indicates that the POC value of the current picture has been reset to 0, and if the value of poc_reset_flag is 0, it indicates that the POC value of the current picture has not been reset to 0.

The poc_reset_flag can be transmitted through the slice segment header according to the value of cross_layer_irap_aligned_flag signaled in the VPS (Video Parameter Sets) extension. For example, if the value of cross_layer_irap_aligned_flag signaled in the VPS extension is zero, poc_reset_flag may be transmitted via the slice segment header.

cross_layer_irap_aligned_flag is information indicating whether, when a picture A of layer A in an AU is an IRAP picture, a picture B in the same AU belonging to a reference layer of layer A is also an IRAP picture. For example, when the value of cross_layer_irap_aligned_flag is 1, it indicates that if there is an IRAP picture in an AU, all pictures in that AU are IRAP pictures. At this time, the network abstraction layer (NAL) unit types of the IRAP pictures in the same AU may all be the same.

If the current picture is an IDR picture, poc_reset_flag may not be signaled.

If poc_reset_flag does not exist, the value of poc_reset_flag can be deduced to zero.

poc_reset_flag can be defined as a rule that all slices constituting a picture must have the same value.
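A minimal sketch of the presence and inference rules described above, assuming a simplified slice segment header parser; the reader object and its read_u1 method are hypothetical helpers, not syntax defined in this disclosure.

    def parse_poc_reset_flag(reader, cross_layer_irap_aligned_flag, is_idr_picture):
        # poc_reset_flag may be carried in the slice segment header only when IRAP
        # alignment across layers is not guaranteed and the picture is not an IDR picture.
        if cross_layer_irap_aligned_flag == 0 and not is_idr_picture:
            return reader.read_u1()  # hypothetical 1-bit read from the bitstream
        # When poc_reset_flag is not present, its value is inferred to be 0.
        return 0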

Referring again to FIG. 5, AU 'A' includes an IRAP picture (e.g., an IDR picture) of the first layer (Layer 0) and a non-IRAP picture of the second layer (Layer 1). As described above, since the POC value of an IDR picture is 0, the POC value of the IDR picture of the first layer (Layer 0) can be derived as 0. The POC value of the non-IRAP picture of the second layer (Layer 1) can be calculated using the MSB and LSB of the POC value as described above (methods (1-1) and (1-2)) and can be derived as a non-zero value. In other words, when at least one picture in AU 'A' is an IRAP picture with a POC of 0 and the POC of the remaining picture is not 0, the pictures in AU 'A' have different POC values. In this case, the encoding apparatus can reset the POC values of the pictures in AU 'A' to 0, set POC reset information (e.g., poc_reset_flag) indicating whether the POC value has been reset to 0, and signal it to the decoding apparatus through the slice segment header.

For example, since the POC value of the IDR picture of the first layer (Layer 0) in AU 'A' is already 0, the encoding apparatus does not need to reset the POC value of the IDR picture to 0, and its poc_reset_flag value is not set to 1. Since the POC value of the non-IRAP picture of the second layer (Layer 1) in AU 'A' is not 0, the encoding apparatus can reset the POC value of the non-IRAP picture of the second layer (Layer 1) to 0, equal to the POC of the IDR picture of the first layer (Layer 0), and set its poc_reset_flag value to 1.

AU 'B' includes a non-IRAP picture of the first layer (Layer 0) and an IRAP picture (e.g., a CRA picture) of the second layer (Layer 1). The POC values of the non-IRAP picture of the first layer and the CRA picture of the second layer can be calculated using the MSB and LSB of the POC value as described above (methods (1-1) and (1-2)) and can be derived as non-zero values. In this case, since the pictures in AU 'B' may have different POC values, the encoding apparatus can reset the POC values of the pictures in AU 'B', set POC reset information (e.g., poc_reset_flag) indicating whether the POC value has been reset to 0, and signal it to the decoding apparatus through the slice segment header.

For example, if the non-IRAP picture of the first layer (Layer 0) and the CRA picture of the second layer (Layer 1) in AU 'B' have non-zero and different POC values, the encoding apparatus resets the POC of the non-IRAP picture of the first layer (Layer 0) and the POC of the CRA picture of the second layer (Layer 1) to 0 in order to equalize the POC values of the pictures in AU 'B', and can set the poc_reset_flag of the non-IRAP picture of the first layer (Layer 0) and the poc_reset_flag of the CRA picture of the second layer (Layer 1) to 1.

On the other hand, the decoding apparatus receives the slice segment header from the encoding apparatus and can reset the POC value of the current picture to 0 based on the POC reset information (e.g., poc_reset_flag) parsed from the slice segment header, which indicates whether the POC value of the current picture has been reset to 0. At this time, if there are reference pictures in the DPB for the current picture, the POC values of the reference pictures in the DPB referenced by the current picture also need to be reset as the POC value of the current picture is reset. The decoding apparatus can calculate the POC values of the reference pictures in the DPB as in the embodiment of FIG. 6.

FIG. 6 is a diagram illustrating a process of resetting the POC values of reference pictures in the DPB based on POC reset information (e.g., poc_reset_flag) indicating whether or not the POC value of the current picture has been reset to 0, according to an embodiment of the present invention.

Referring to FIG. 6, when the poc_reset_flag value parsed from the slice segment header is 1, that is, when the POC reset information indicates that the POC value of the current picture has been reset to 0, the decoding apparatus can reset the POC values of the reference pictures in the DPB based on the POC value of the current picture.

For example, the decoding apparatus can calculate the POC value of the current picture using the MSB and the LSB of the POC value (methods (1-1) and (1-2)) (S610). The decoding apparatus then resets the POC values of the reference pictures in the DPB based on the POC value of the decoded current picture (S620), and resets the POC value of the current picture to 0 (S630).
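The following sketch illustrates steps S610 to S630, assuming that the reset of the DPB reference POCs is performed by subtracting the decoded POC of the current picture from each reference POC (an interpretation consistent with the worked example of FIG. 7 below); the function name and the list representation of the DPB are illustrative.

    def reset_poc_values(current_poc, dpb_reference_pocs):
        # S620: shift the POC of every reference picture in the DPB by the POC of
        # the decoded current picture so that their relative distances are preserved.
        shifted = [poc - current_poc for poc in dpb_reference_pocs]
        # S630: the current picture itself takes POC 0.
        return 0, shifted

    # Example: current POC 331 and DPB reference POCs [300, 320, 330]
    # become current POC 0 and reference POCs [-31, -11, -1].
    assert reset_poc_values(331, [300, 320, 330]) == (0, [-31, -11, -1])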

Referring again to FIG. 4, the encoding/decoding apparatus constructs a reference picture set for inter prediction of the current picture based on the POC reset information (e.g., poc_reset_flag) indicating whether or not the POC value of the current picture has been reset to 0 (S420).

The reference picture set refers to a set of reference pictures of the current picture, and may be composed of reference pictures preceding the current picture in decoding order. The reference picture can be used for inter prediction of the current picture.

The reference picture set may consist of a forward short-term reference picture set (PocStCurrBefore) referenced by the current picture, a backward short-term reference picture set (PocStCurrAfter) referenced by the current picture, a short-term reference picture set (PocStFoll) not referenced by the current picture, a long-term reference picture set (PocLtCurr) referenced by the current picture, and a long-term reference picture set (PocLtFoll) not referenced by the current picture.

The encoding / decoding apparatus can derive the POC value of the reference picture constituting the reference picture set differently according to the POC reset information (for example, poc_reset_flag) indicating whether the POC value of the current picture is reset to 0.

(2-1) When the poc_reset_flag value parsed from the slice segment header is 0 (i.e., when the POC reset information indicates that the POC value of the current picture has not been reset to 0), the encoding/decoding apparatus can calculate the POC values of the reference pictures referenced by the slices constituting the current picture as follows.

In the case of a short-term reference picture, the POC value of the short-term reference picture can be calculated using the delta_poc value indicating each short-term reference picture signaled in the slice segment header and the POC value of the decoded current picture. In this case, the delta_poc value may be the POC difference value between the current picture and the i-th short-term reference picture, or the POC difference value between the (i+1)-th short-term reference picture and the i-th short-term reference picture.
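As a sketch of the two delta_poc conventions mentioned above (a direct difference from the current picture, or a chained difference from the previous short-term reference picture), with hypothetical helper names:

    def short_term_ref_pocs(current_poc, delta_pocs, chained=True):
        # current_poc: POC of the decoded current picture.
        # delta_pocs: signed delta_poc values signaled in the slice segment header.
        pocs = []
        ref = current_poc
        for d in delta_pocs:
            if chained:
                ref = ref + d            # delta relative to the previous reference
            else:
                ref = current_poc + d    # delta relative to the current picture
            pocs.append(ref)
        return pocs

    # Example: current POC 34 with chained deltas [-1, -2] gives reference POCs [33, 31].
    assert short_term_ref_pocs(34, [-1, -2]) == [33, 31]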

In the case of a long-term reference picture, the POC_LSB value or the POC value of the long-term reference picture can be calculated using the POC_LSB value (pocLsbLt[i]) indicating the LSB of each long-term reference picture POC signaled in the slice segment header, the value (delta_poc_msb_cycle_lt[i]) for calculating the POC_MSB value of each long-term reference picture, and the POC value and POC_LSB value of the decoded current picture.

Although a long-term reference picture can be identified by only POC_LSB basically, there may be a case where there are reference pictures having the same POC_LSB of a long-term reference picture among reference pictures. In this case, the value (delta_poc_msb_cycle_lt) for calculating the POC_MSB value of the long reference picture is additionally signaled so that the reference pictures can be distinguished.

PocLt[i] = pocLsbLt[i] + PicOrderCntVal - DeltaPocMsbCycleLt[i] * MaxPicOrderCntLsb - (PicOrderCntVal & (MaxPicOrderCntLsb - 1))   ..... Equation (1)

In Equation (1), pocLsbLt[i] is the POC_LSB value of the i-th long-term reference picture signaled in the slice segment header, PicOrderCntVal is the POC value of the decoded current picture, and MaxPicOrderCntLsb is the value signaled in the SPS (Sequence Parameter Sets). DeltaPocMsbCycleLt[i] is a value derived from delta_poc_msb_cycle_lt signaled in the slice segment header, and can be derived as shown in Equation (2).

DeltaPocMsbCycleLt[i] = delta_poc_msb_cycle_lt[i], if (i == 0 || i == num_long_term_sps)
DeltaPocMsbCycleLt[i] = delta_poc_msb_cycle_lt[i] + DeltaPocMsbCycleLt[i - 1], otherwise   ..... Equation (2)

In Equation (2), the condition (i == 0 || i == num_long_term_sps) denotes the case where i is the 0-th long-term reference picture or i is equal to the number (num_long_term_sps) of long-term reference pictures derived from the SPS.
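A sketch of Equations (1) and (2) for the case where poc_reset_flag is 0; the variable names mirror those used in the text, and the equation forms follow the HEVC-style derivation assumed above.

    def long_term_ref_poc(i, poc_lsb_lt, delta_poc_msb_cycle_lt, num_long_term_sps,
                          pic_order_cnt_val, max_poc_lsb, prev_delta_msb_cycle=0):
        # Equation (2): DeltaPocMsbCycleLt[i] accumulates across entries except for
        # the first one (i == 0 or i == num_long_term_sps).
        if i == 0 or i == num_long_term_sps:
            delta_msb_cycle = delta_poc_msb_cycle_lt
        else:
            delta_msb_cycle = delta_poc_msb_cycle_lt + prev_delta_msb_cycle
        # Equation (1): POC of the i-th long-term reference picture.
        poc_lt = (poc_lsb_lt + pic_order_cnt_val
                  - delta_msb_cycle * max_poc_lsb
                  - (pic_order_cnt_val & (max_poc_lsb - 1)))
        return poc_lt, delta_msb_cycle

    # Example consistent with FIG. 7 prior to any reset: pocLsbLt = 20,
    # PicOrderCntVal = 331, MaxPicOrderCntLsb = 32, DeltaPocMsbCycleLt = 8
    # would give a long-term reference POC of 84 (= 331 - 247).
    assert long_term_ref_poc(0, 20, 8, 0, 331, 32)[0] == 84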

(2-2) When the poc_reset_flag value parsed from the slice segment header is 1 (i.e., when the POC reset information indicates that the POC value of the current picture has been reset to 0), the encoding/decoding apparatus can calculate the POC values of the reference pictures referenced by the slices constituting the current picture as follows.

In the case of a short-term reference picture, the POC value of the short-term reference picture can be calculated using the delta_poc value indicating each short-term reference picture signaled in the slice segment header and the reset POC value (= 0) of the current picture. In this case, the delta_poc value may be the POC difference value between the current picture and the i-th short-term reference picture, or the POC difference value between the (i+1)-th short-term reference picture and the i-th short-term reference picture.

In the case of a long-term reference picture, the POC_LSB value or the POC value of the long-term reference picture can be calculated by Equation (3), using the difference value (delta_poc_lsb) between the POC_LSB value (pocLsbLt) indicating the LSB of the long-term reference picture POC signaled in the slice segment header and the POC_LSB value (slice_pic_order_cnt_lsb) of the current picture. The long-term reference picture can be identified by the PocLt value derived by Equation (3).

delta_poc_lsb[i] = pocLsbLt[i] - slice_pic_order_cnt_lsb
PocLt[i] = delta_poc_lsb[i] & (MaxPicOrderCntLsb - 1)   ..... Equation (3)

In Equation (3), the difference value delta_poc_lsb between the POC_LSB value (pocLsbLt[i]) of the long-term reference picture signaled in the slice segment header and the POC_LSB value of the current picture may have a value within the range from 0 to MaxPicOrderCntLsb - 1.

When there are reference pictures having the same POC_LSB (pocLsbLt) value among the long-term reference pictures, the POC value of the long-term reference picture can be calculated by Equation (4), using the delta_poc_lsb value derived by Equation (3) and the value (delta_poc_msb_cycle_lt) for calculating the POC_MSB value.

PocLt[i] = PocLt[i] - DeltaPocMsbCycleLt[i] * MaxPicOrderCntLsb, when delta_poc_msb_present_flag[i] is 1   ..... Equation (4)

Although a long-term reference picture can be identified by only POC_LSB basically, there may be a case where there are reference pictures having the same POC_LSB of a long-term reference picture among reference pictures. In this case, the value (delta_poc_msb_cycle_lt) for calculating the POC_MSB value of the long reference picture is additionally signaled so that the reference pictures can be distinguished.
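A sketch of Equations (3) and (4) for the case where poc_reset_flag is 1; the helper name is illustrative, and the worked example of FIG. 7 below can be reproduced with it.

    def long_term_ref_poc_after_reset(poc_lsb_lt, slice_pic_order_cnt_lsb,
                                      max_poc_lsb, delta_poc_msb_present_flag,
                                      delta_poc_msb_cycle):
        # Equation (3): LSB difference between the long-term reference picture and
        # the current picture, wrapped into the range [0, MaxPicOrderCntLsb - 1].
        delta_poc_lsb = poc_lsb_lt - slice_pic_order_cnt_lsb
        poc_lt = delta_poc_lsb & (max_poc_lsb - 1)
        # Equation (4): when the LSB alone is ambiguous among the reference pictures,
        # extend the value with the signaled MSB cycle.
        if delta_poc_msb_present_flag:
            poc_lt = poc_lt - delta_poc_msb_cycle * max_poc_lsb
        return poc_lt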

As described above, the encoding/decoding apparatus can calculate the POC value of the reference picture using different methods according to the POC reset information (e.g., poc_reset_flag) indicating whether or not the POC value of the current picture has been reset to 0.

The encoding / decoding apparatus can construct a reference picture set based on the POC of the derived short-term reference picture and the POC of the long-term reference picture, and can perform inter-prediction of the current picture using the reference picture set.
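As a hedged sketch of how the derived POC values could be grouped into the five RPS subsets listed earlier, assuming the common convention that reference pictures preceding the current picture in output order form the forward set and the others the backward set; the used_by_curr flags are hypothetical inputs.

    def build_rps(current_poc, short_term_pocs, st_used_by_curr,
                  long_term_pocs, lt_used_by_curr):
        rps = {"PocStCurrBefore": [], "PocStCurrAfter": [], "PocStFoll": [],
               "PocLtCurr": [], "PocLtFoll": []}
        for poc, used in zip(short_term_pocs, st_used_by_curr):
            if not used:
                rps["PocStFoll"].append(poc)          # not referenced by the current picture
            elif poc < current_poc:
                rps["PocStCurrBefore"].append(poc)    # forward short-term reference
            else:
                rps["PocStCurrAfter"].append(poc)     # backward short-term reference
        for poc, used in zip(long_term_pocs, lt_used_by_curr):
            rps["PocLtCurr" if used else "PocLtFoll"].append(poc)
        return rps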

FIG. 7 is a diagram illustrating a method for calculating a POC value of long term reference pictures in a DPB according to an embodiment of the present invention.

Referring to FIG. 7, when the poc_reset_flag value parsed from the slice segment header is 1, that is, when the POC reset information indicates that the POC value of the current picture has been reset to 0, the decoding apparatus can calculate the POC value of the long-term reference picture in the DPB using the POC value and POC_LSB value of the current picture and the information related to the long-term reference picture transmitted in the slice segment header of the current picture.

For example, assume that the poc_reset_flag value of the current picture is 1 and the POC value of the current picture is 331. At this time, the POC of the long-term reference picture corresponding to i = 2 in the DPB can be calculated as follows, using Equations (3) and (4) described in (2-2).

delta_poc_lsb [2] = PocLsbLt [2] - slice_pic_order_cnt_lsb = 20 - 11 = 9

pocLt [2] = delta_poc_lsb [2] & (MaxPicOrderCntLsb-1) = 9 & (32-1) = 9, where MaxPicOrderCntLsb is 32.

Since delta_poc_msb_present_flag is 1, calculate POC using delta_poc_msb_cycle_lt [i]. Since delta_poc_lsb [i] has a value larger than 0, pocLt = pocLt [2] - (DeltaPocMsbCycle) * (MaxPicOrderCntLsb) = 9-8 * 32 = -247, where DeltaPocMsbCycle can be obtained by Equation (2).

The decoding apparatus resets the POC value of the long-term reference picture corresponding to i = 2 in the DPB to -247, and can identify the long-term reference picture corresponding to i = 2 among the pictures in the DPB using the reset POC value.
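Plugging the numbers of FIG. 7 into the sketch given after Equation (4) reproduces this result (an illustrative check, not part of the disclosure):

    # pocLsbLt[2] = 20, slice_pic_order_cnt_lsb = 11, MaxPicOrderCntLsb = 32,
    # delta_poc_msb_present_flag = 1, DeltaPocMsbCycleLt[2] = 8
    assert long_term_ref_poc_after_reset(20, 11, 32, 1, 8) == -247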

The method according to the present invention may be implemented as a program executable on a computer and stored in a computer-readable recording medium. Examples of the computer-readable recording medium include a ROM, a RAM, a CD-ROM, a floppy disk, and an optical data storage device, and the method may also be implemented in the form of a carrier wave (for example, transmission over the Internet).

The computer-readable recording medium may be distributed over networked computer systems so that the computer-readable code can be stored and executed in a distributed manner. In addition, functional programs, codes, and code segments for implementing the above method can be easily inferred by programmers skilled in the art to which the present invention pertains.

In the above-described embodiments, the methods are described on the basis of flowcharts as a series of steps or blocks, but the present invention is not limited to the order of the steps, and some steps may occur in a different order from or simultaneously with other steps. It will also be understood by those skilled in the art that the steps shown in the flowcharts are not exclusive, that other steps may be included, or that one or more steps in the flowcharts may be deleted without affecting the scope of the present invention.

The foregoing description is merely illustrative of the technical idea of the present invention, and various changes and modifications may be made by those skilled in the art without departing from the essential characteristics of the present invention. Therefore, the embodiments disclosed in the present invention are intended to illustrate rather than limit the scope of the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments. The scope of protection of the present invention should be construed according to the claims, and all technical ideas within the scope of the claims should be construed as being included in the scope of the present invention.

Claims (14)

A video decoding method supporting a plurality of layers, the method comprising:
decoding POC reset information indicating whether a picture order count (POC) of a current picture has been reset to 0;
calculating, based on the POC reset information, a POC value of the current picture and a POC value of each of a long-term reference picture and a short-term reference picture in a DPB (Decoded Picture Buffer) referenced by the current picture; and
constructing a reference picture set (RPS) for inter prediction of the current picture based on the POC value of the long-term reference picture and the POC value of the short-term reference picture.
The method according to claim 1,
wherein the POC value of the current picture is reset to 0 when the POC reset information indicates that the POC of the current picture has been reset to 0.
3. The method of claim 2,
wherein, when the POC reset information indicates that the POC of the current picture has been reset to 0, the POC value of the short-term reference picture is calculated using the reset POC value of the current picture and a POC difference value between the current picture and the short-term reference picture.
The method according to claim 1,
wherein, when the POC reset information indicates that the POC of the current picture has been reset to 0, the POC value of the long-term reference picture is calculated using a POC LSB difference value between a POC LSB value indicating the LSB (Least Significant Bit) of the long-term reference picture POC and a POC LSB value indicating the LSB of the current picture POC.
5. The method of claim 4,
wherein, when there are reference pictures in the DPB having the same POC LSB value indicating the LSB of the long-term reference picture POC, the POC value of the long-term reference picture is calculated using the POC LSB difference value and a value used for determining the MSB (Most Significant Bit) value of the long-term reference picture POC.
3. The method of claim 2,
wherein the POC reset information is information that is signaled by an encoder when an IRAP picture and a non-IRAP picture, which is not an IRAP picture, are included together in an AU (Access Unit).
The method according to claim 6,
wherein the current picture is a non-IRAP picture included in the AU.
An image decoding apparatus supporting a plurality of layers, the apparatus comprising:
a decoding unit configured to decode POC reset information indicating whether a picture order count (POC) of a current picture has been reset to 0; and
a prediction unit configured to calculate, based on the POC reset information, a POC value of the current picture and a POC value of each of a long-term reference picture and a short-term reference picture in a DPB (Decoded Picture Buffer) referenced by the current picture,
and to construct a reference picture set (RPS) for inter prediction of the current picture based on the POC value of the long-term reference picture and the POC value of the short-term reference picture.
The apparatus of claim 8,
wherein the POC value of the current picture is reset to 0 when the POC reset information indicates that the POC of the current picture has been reset to 0.
The apparatus of claim 9,
wherein, when the POC reset information indicates that the POC of the current picture has been reset to 0, the POC value of the short-term reference picture is calculated using the reset POC value of the current picture and a POC difference value between the current picture and the short-term reference picture.
The apparatus of claim 8,
wherein, when the POC reset information indicates that the POC of the current picture has been reset to 0, the POC value of the long-term reference picture is calculated using a POC LSB difference value between a POC LSB value indicating the LSB (Least Significant Bit) of the long-term reference picture POC and a POC LSB value indicating the LSB of the current picture POC.
The apparatus of claim 11,
wherein, when there are reference pictures in the DPB having the same POC LSB value indicating the LSB of the long-term reference picture POC, the POC value of the long-term reference picture is calculated using the POC LSB difference value and a value used for determining the MSB (Most Significant Bit) value of the long-term reference picture POC.
The apparatus of claim 9,
wherein the POC reset information is information that is signaled by an encoder when an IRAP picture and a non-IRAP picture, which is not an IRAP picture, are included together in an AU (Access Unit).
The apparatus of claim 13,
wherein the current picture is a non-IRAP picture included in the AU.
KR20140135694A 2013-10-11 2014-10-08 Method and apparatus for video encoding/decoding based on multi-layer KR20150043977A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/511,333 US20150103912A1 (en) 2013-10-11 2014-10-10 Method and apparatus for video encoding/decoding based on multi-layer

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR20130121133 2013-10-11
KR1020130121133 2013-10-11

Publications (1)

Publication Number Publication Date
KR20150043977A true KR20150043977A (en) 2015-04-23

Family

ID=53036360

Family Applications (1)

Application Number Title Priority Date Filing Date
KR20140135694A KR20150043977A (en) 2013-10-11 2014-10-08 Method and apparatus for video encoding/decoding based on multi-layer

Country Status (1)

Country Link
KR (1) KR20150043977A (en)

Similar Documents

Publication Publication Date Title
JP7171654B2 (en) Interlayer prediction method and apparatus based on temporal sublayer information
JP7371181B2 (en) Picture decoding method and picture encoding method
US10306244B2 (en) Method for encoding/decoding image and device using same
US20150103912A1 (en) Method and apparatus for video encoding/decoding based on multi-layer
KR20140121355A (en) Method and apparatus for image encoding/decoding
KR20150043986A (en) Method and apparatus for video encoding/decoding based on multi-layer
KR102431741B1 (en) Method and apparatus for image encoding/decoding
KR102226893B1 (en) Video decoding method and apparatus using the same
KR102412637B1 (en) Method and apparatus for image encoding/decoding
KR20140043240A (en) Method and apparatus for image encoding/decoding
KR102246634B1 (en) Video encoding and decoding method and apparatus using the same
KR20210013254A (en) Method and apparatus for image encoding/decoding
KR20150043974A (en) Video encoding and decoding method and apparatus using the same
KR20140088002A (en) Video encoding and decoding method and apparatus using the same
KR102722391B1 (en) Method and apparatus for image encoding/decoding
KR102271878B1 (en) Video encoding and decoding method and apparatus using the same
KR102418524B1 (en) Method and apparatus for image encoding/decoding
KR20150043977A (en) Method and apparatus for video encoding/decoding based on multi-layer

Legal Events

Date Code Title Description
WITN Withdrawal due to no request for examination