WO2014017809A1

WO2014017809A1 - Method of decoding images and device using same

Info

Publication number: WO2014017809A1
Application number: PCT/KR2013/006596
Authority: WO
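Inventors: 이하현 (Ha Hyun Lee); 강정원 (Jung Won Kang); 이진호 (Jin Ho Lee); 최진수 (Jin Soo Choi); 김진웅 (Jin Woong Kim)
Original assignee: 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)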
Priority date: 2012-07-24
Filing date: 2013-07-23
Publication date: 2014-01-30

Abstract

Scalable video encoding uses inter-layer texture prediction, inter-layer motion information prediction, and inter-layer residual signal prediction to remove redundancy between layers. To increase the accuracy of inter-layer prediction, the present invention may search the reference-layer images both for the reference-layer block at the position corresponding to the current target block and for the block most similar to the samples of the current target block, and use them as a prediction signal. In inter-layer prediction, a prediction signal obtained from an image in the same layer as the current target block and a prediction signal obtained from a reference-layer image may also be weighted and combined into a prediction signal.

Description

Image decoding method and apparatus using same

The present invention relates to encoding and decoding processing of an image, and more particularly, to a method and apparatus for encoding and decoding an image that supports a plurality of layers in a bitstream.

Recently, as broadcasting services with high-definition (HD) resolution have expanded not only in Korea but worldwide, many users have become accustomed to high-resolution, high-quality images, and many organizations are accelerating the development of next-generation video equipment. In addition, as interest grows in Ultra High Definition (UHD), which has four times the resolution of HDTV, compression technology for even higher-resolution, higher-quality images is required.

Image compression may use several techniques: inter prediction, which predicts pixel values in the current picture from temporally previous and/or subsequent pictures; intra prediction, which predicts pixel values in the current picture using pixel information within the current picture; and entropy encoding, which assigns short codes to frequently occurring symbols and long codes to rarely occurring symbols.

Conventional video compression technology assumes a constant network bandwidth within a limited hardware operating environment and does not consider a fluctuating network environment. A new compression technique is therefore required for image data transmitted over networks whose bandwidth changes frequently, and a scalable video encoding/decoding method may be used for this purpose.

In the present invention, to obtain the prediction signal of the current target block, not only the reference layer block at the position corresponding to the current target block but also the block in the reference-layer image that is most similar to the samples of the current target block can be used as the prediction signal.

In inter-layer prediction, a weighted sum of the prediction signal obtained from the intra-layer image to which the current target block belongs and the prediction signal obtained from the reference layer image is also used as the prediction signal.

Accordingly, the purpose of the present invention is to improve the accuracy of the prediction signal and to improve the encoding and decoding efficiency by minimizing the residual signal.

According to an aspect of the present invention, there is provided a method of decoding an image supporting a plurality of layers, the method comprising: receiving prediction method information on a prediction method of a decoding target block; and generating a prediction signal of the target block based on the received information, wherein the prediction method information may indicate that the target block is predicted using a reconstructed lower layer.

The generating of the prediction signal may perform motion compensation in the lower layer direction.

The prediction method information may include a motion vector derived from motion prediction performed on a lower layer image decoded by an encoder.

The generating of the prediction signal may generate a reconstruction value of a reference block corresponding to the target block in the lower layer as the prediction signal.

The generating of the prediction signal may perform motion compensation using both a reconstructed reference picture in the same layer as the target block and a reconstructed picture of the layer referenced by the current decoding target block.

The generating of the prediction signal may obtain a weighted sum of the prediction signal obtained from the forward reference picture and the prediction signal obtained from the lower layer reference picture.

The generating of the prediction signal may obtain a weighted sum of the prediction signal obtained from the backward reference picture and the prediction signal obtained from the lower layer reference picture.

The generating of the prediction signal may obtain a weighted sum of the prediction signal obtained from the forward reference picture, the prediction signal obtained from the backward reference picture, and the prediction signal obtained from the lower layer reference picture.

The generating of the prediction signal may obtain a weighted sum of the prediction signal obtained from the reference sample included in the reconstructed neighboring block adjacent to the target block and the prediction signal obtained from the lower layer reference picture.
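The weighted sums in the preceding paragraphs can be sketched as a per-sample combination of two or more prediction signals. This is a minimal sketch under stated assumptions: the function name `weighted_prediction`, the example weights, 8-bit samples, and round-to-nearest with clipping are hypothetical choices; the text does not specify how the weights are chosen or signaled.

```python
def weighted_prediction(signals, weights, bit_depth=8):
    """Combine several prediction signals sample by sample (hypothetical helper).

    signals: equally sized 2-D lists, e.g. forward, backward and lower-layer
             prediction signals.
    weights: one weight per signal; the values are illustrative assumptions.
    """
    if len(signals) != len(weights):
        raise ValueError("one weight per prediction signal is required")
    max_val = (1 << bit_depth) - 1
    height, width = len(signals[0]), len(signals[0][0])
    combined = []
    for y in range(height):
        row = []
        for x in range(width):
            value = sum(w * s[y][x] for s, w in zip(signals, weights))
            # Round to nearest and clip to the valid sample range.
            row.append(min(max(int(value + 0.5), 0), max_val))
        combined.append(row)
    return combined


forward = [[100, 102]]
backward = [[104, 106]]
lower_layer = [[110, 110]]
print(weighted_prediction([forward, backward, lower_layer], [0.25, 0.25, 0.5]))
# prints [[106, 107]]
```

Setting one weight to 1 and the others to 0 reduces this to ordinary single-source prediction, so the same routine covers both the conventional and the combined cases.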

The prediction method information may further include information indicating which of an intra prediction method, an inter prediction method, a lower-layer direction prediction method, and a prediction method using reconstructed reference pictures of both the same layer and a lower layer is used as the prediction method of the target block.

According to another aspect of the present invention, an image decoding apparatus supporting a plurality of layers includes: a receiving unit that receives prediction method information on a prediction method of a target block to be decoded; and a prediction unit that generates a prediction signal of the target block based on the received information, wherein the prediction method information may indicate that the target block is predicted using a reconstructed lower layer.

According to an embodiment of the present invention, an image decoding method and an apparatus using the same are provided that, in obtaining the prediction signal of the current target block, can find and use as the prediction signal not only the reference layer block at the position corresponding to the current target block but also the block in the reference-layer image that is most similar to the samples of the current target block.

In inter-layer prediction, there is provided an image decoding method and an apparatus using the same, in which a weighted sum of a prediction signal obtained from an intra-layer image to which a current target block belongs and a prediction signal obtained from a reference layer image can also be used as a prediction signal.

As a result, by increasing the accuracy of the prediction signal, the residual signal is minimized to improve the encoding and decoding efficiency.

FIG. 1 is a block diagram illustrating a configuration of an image encoding apparatus according to an embodiment.

FIG. 2 is a block diagram illustrating a configuration of an image decoding apparatus according to an embodiment.

FIG. 3 is a conceptual diagram schematically illustrating an embodiment of a scalable video coding structure using multiple layers to which the present invention can be applied.

FIG. 4 is a diagram illustrating an embodiment of an intra prediction mode.

FIG. 5 is a diagram illustrating an embodiment of neighboring blocks and neighboring samples used in an intra prediction mode.

FIG. 6 is a conceptual diagram illustrating generation of a prediction signal using a reference layer according to an embodiment of the present invention.

FIG. 7 is a conceptual diagram illustrating generation of a prediction signal using a reference layer according to another embodiment of the present invention.

FIG. 8 is a control flowchart illustrating a method of generating a prediction signal of a target block according to the present invention.

Embodiments of the Invention

Hereinafter, embodiments of the present invention are described in detail with reference to the drawings. In describing the embodiments of this specification, when a detailed description of a related well-known configuration or function is determined to obscure the gist of this specification, that description is omitted.

When a component is said to be “connected” or “coupled” to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may also exist in between. In addition, describing the present invention as “including” a specific configuration does not exclude configurations other than that configuration; it means that additional configurations may be included within the scope of the practice or the technical spirit of the present invention.

Terms such as first and second may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as the second component, and similarly, the second component may also be referred to as the first component.

In addition, the components shown in the embodiments of the present invention are shown independently to represent different characteristic functions; this does not mean that each component consists of separate hardware or a single software unit. In other words, the components are listed separately for convenience of description, and at least two of them may be combined into one component, or one component may be divided into a plurality of components that perform its function. Integrated and separated embodiments of the components are also included within the scope of the present invention as long as they do not depart from its essence.

In addition, some components may not be essential for performing the essential functions of the present invention but may be optional components merely for improving performance. The present invention may be implemented with only the components essential to its essence, excluding those used only for improving performance, and a structure including only the essential components, without the optional performance-improving components, is also included within the scope of the present invention.

FIG. 1 is a block diagram illustrating a configuration of an image encoding apparatus according to an embodiment. A scalable video encoding/decoding method or apparatus may be implemented as an extension of a general video encoding/decoding method or apparatus that does not provide scalability, and the block diagram of FIG. 1 illustrates an embodiment of an image encoding apparatus that may serve as the basis of a scalable video encoding apparatus.

Referring to FIG. 1, the image encoding apparatus 100 may include a motion predictor 111, a motion compensator 112, an intra predictor 120, a switch 115, a subtractor 125, a transform unit 130, a quantization unit 140, an entropy encoding unit 150, an inverse quantization unit 160, an inverse transform unit 170, an adder 175, a filter unit 180, and a reference image buffer 190.

The image encoding apparatus 100 may encode an input image in intra mode or inter mode and output a bitstream. Intra prediction means intra-picture prediction, and inter prediction means inter-picture prediction. In intra mode the switch 115 is set to intra, and in inter mode the switch 115 is set to inter. The image encoding apparatus 100 may generate a prediction block for an input block of the input image and then encode the difference between the input block and the prediction block.

In the intra mode, the intra predictor 120 may generate a prediction block by performing spatial prediction using pixel values of blocks that are already encoded around the current block.

In the inter mode, the motion predictor 111 may obtain a motion vector by searching for a region that best matches an input block in the reference image stored in the reference image buffer 190 during the motion prediction process. The motion compensator 112 may generate a prediction block by performing motion compensation using the motion vector and the reference image stored in the reference image buffer 190.
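The motion search performed by the motion predictor 111 can be illustrated with an exhaustive integer-sample block-matching search that minimizes the sum of absolute differences (SAD). This is a minimal sketch: the SAD criterion, the square search window, and the names `sad` and `full_search` are assumptions, and practical encoders typically use fast search patterns and fractional-sample refinement instead.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the size x size block at (bx, by)
    in cur and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Exhaustive integer-sample motion search; returns (best_mv, best_cost)."""
    pic_h, pic_w = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that would fall outside the reference picture.
            if not (0 <= bx + dx and bx + dx + size <= pic_w
                    and 0 <= by + dy and by + dy + size <= pic_h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


ref = [[8 * y + x for x in range(8)] for y in range(8)]
# Every current sample equals the reference sample 2 to the right and 1 below.
cur = [[8 * y + x + 10 for x in range(8)] for y in range(8)]
print(full_search(cur, ref, 2, 2, 2, 2))  # prints ((2, 1), 0)
```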

The subtractor 125 may generate a residual block by the difference between the input block and the generated prediction block. The transform unit 130 may output a transform coefficient by performing transform on the residual block. The quantization unit 140 may output the quantized coefficient by quantizing the input transform coefficient according to the quantization parameter.
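The subtraction and quantization steps can be sketched as below. The transform is skipped for brevity, so the residual is quantized directly, and the step-size rule (the step doubles every 6 QP values and equals 1 at QP 4) is the HEVC-style relationship assumed here. The function names are hypothetical.

```python
def residual_block(original, prediction):
    """Subtractor: residual = original - prediction, sample by sample."""
    return [o - p for o, p in zip(original, prediction)]


def quantization_step(qp):
    """Assumed HEVC-style step size: doubles every 6 QP, equals 1.0 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Uniform scalar quantization, rounding magnitudes to nearest."""
    step = quantization_step(qp)
    return [int(abs(c) / step + 0.5) * (1 if c >= 0 else -1) for c in coeffs]


def dequantize(levels, qp):
    """Inverse quantization: scale each level back by the step size."""
    step = quantization_step(qp)
    return [level * step for level in levels]


# Transform skipped for brevity; real codecs quantize transform coefficients.
res = residual_block([110, 93, 103, 100], [100, 100, 100, 100])
levels = quantize(res, qp=10)  # step size 2.0
print(res, levels, dequantize(levels, qp=10))
# prints [10, -7, 3, 0] [5, -4, 2, 0] [10.0, -8.0, 4.0, 0.0]
```

The gap between `res` and the dequantized values (here -7 versus -8.0) is the quantization error, which grows with QP.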

The entropy encoding unit 150 may generate and output a bitstream by entropy-encoding symbols according to a probability distribution, based on the values calculated by the quantization unit 140 or the encoding parameter values calculated during encoding. Entropy encoding receives symbols having various values and expresses them as a decodable bit string while removing statistical redundancy.

Here, a symbol means a syntax element to be encoded/decoded, a coding parameter, a residual signal value, or the like. Encoding parameters are the information necessary for encoding and decoding; they include not only information such as syntax elements that is encoded by the encoder and transmitted to the decoder, but also information that can be inferred during encoding or decoding. Coding parameters may include, for example, intra/inter prediction modes, motion vectors, reference picture indexes, coded block patterns, the presence or absence of a residual signal, transform coefficients, quantized transform coefficients, quantization parameters, block sizes, and block partitioning information, as well as statistics of these. In addition, the residual signal may mean the difference between the original signal and the prediction signal, that difference after transformation, or that difference after transformation and quantization. In block units, the residual signal may be referred to as a residual block.

When entropy encoding is applied, fewer bits are allocated to symbols with a high probability of occurrence and more bits to symbols with a low probability of occurrence, so the size of the bit string for the symbols to be encoded can be reduced. Entropy encoding can therefore increase the compression performance of image encoding.

For entropy coding, methods such as exponential Golomb coding, context-adaptive variable length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC) may be used. For example, the entropy encoder 150 may store a table for entropy encoding, such as a variable length coding (VLC) table, and perform entropy encoding using the stored VLC table. Alternatively, the entropy encoder 150 may derive a binarization method for a target symbol and a probability model for a target symbol/bin, and then perform entropy encoding using the derived binarization method or probability model.
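Of the methods listed, order-0 exponential Golomb coding is simple enough to show in full. This sketch encodes a non-negative integer as a bit string and decodes it back. Representing bits as Python strings is purely for readability, and the function names are hypothetical.

```python
def exp_golomb_encode(value):
    """Order-0 Exp-Golomb codeword for value >= 0, as a string of '0'/'1'."""
    if value < 0:
        raise ValueError("order-0 Exp-Golomb codes non-negative integers only")
    info = bin(value + 1)[2:]            # binary of value + 1 without '0b'
    return "0" * (len(info) - 1) + info  # prefix of len(info) - 1 zeros


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


stream = "".join(exp_golomb_encode(v) for v in [0, 1, 2, 3])
print(stream)  # prints 101001100100
```

Small values get short codewords (0 costs one bit, 3 costs five), which matches the principle that frequently occurring symbols receive fewer bits.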

The quantized coefficients may be inversely quantized by the inverse quantizer 160 and inversely transformed by the inverse transformer 170. The inverse quantized and inverse transformed coefficients are added to the prediction block through the adder 175 and a reconstruction block can be generated.
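The adder step can be sketched as a sample-wise sum of the prediction block and the reconstructed residual. Clipping the result to the valid sample range for an assumed bit depth of 8 is common practice but is an assumption here, as is the function name.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Adder: reconstructed sample = clip(prediction + residual)."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


print(reconstruct_block([[250, 3], [128, 128]], [[10, -5], [2, -2]]))
# prints [[255, 0], [130, 126]]
```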

The reconstructed block passes through the filter unit 180, which may apply at least one of a deblocking filter, sample adaptive offset (SAO), and an adaptive loop filter (ALF) to the reconstructed block or reconstructed picture. The reconstructed block that has passed through the filter unit 180 may be stored in the reference image buffer 190.

FIG. 2 is a block diagram illustrating a configuration of an image decoding apparatus according to an embodiment. As described above with reference to FIG. 1, a scalable video encoding/decoding method or apparatus may be implemented as an extension of a general video encoding/decoding method or apparatus that does not provide scalability, and the block diagram of FIG. 2 shows an embodiment of an image decoding apparatus that may serve as the basis of a scalable video decoding apparatus.

Referring to FIG. 2, the image decoding apparatus 200 may include an entropy decoder 210, an inverse quantizer 220, an inverse transformer 230, an intra predictor 240, a motion compensator 250, a filter unit 260, and a reference picture buffer 270.

The image decoding apparatus 200 may receive the bitstream output from the encoder, decode it in intra mode or inter mode, and output a reconstructed image. In intra mode the switch may be set to intra, and in inter mode to inter. The image decoding apparatus 200 may obtain a reconstructed residual block from the received bitstream, generate a prediction block, and add the reconstructed residual block and the prediction block to generate a reconstructed block.

The entropy decoder 210 may entropy-decode the input bitstream according to a probability distribution to generate symbols, including symbols in the form of quantized coefficients. Entropy decoding receives a binary string and generates each symbol from it; it is the counterpart of the entropy encoding method described above.

The quantized coefficients are inversely quantized by the inverse quantizer 220 and inversely transformed by the inverse transformer 230, and as a result of the inverse quantization / inverse transformation of the quantized coefficients, a reconstructed residual block may be generated.

In intra mode, the intra predictor 240 may generate a prediction block by performing spatial prediction using the pixel values of already decoded blocks around the current block. In inter mode, the motion compensator 250 may generate a prediction block by performing motion compensation using the motion vector and the reference image stored in the reference picture buffer 270.

The reconstructed residual block and the prediction block are added by the adder 255, and the resulting block passes through the filter unit 260. The filter unit 260 may apply at least one of the deblocking filter, SAO, and ALF to the reconstructed block or reconstructed picture, and outputs the reconstructed image. The reconstructed picture may be stored in the reference picture buffer 270 for use in inter prediction.

Among the components of the image decoding apparatus 200, those directly involved in decoding the image, for example the entropy decoder 210, the inverse quantizer 220, the inverse transformer 230, the intra predictor 240, the motion compensator 250, the filter unit 260, and the reference picture buffer 270, may be distinguished from the other components and collectively referred to as a decoder or decoding unit.

Also, the image decoding apparatus 200 may further include a parsing unit (not shown) which parses information related to an encoded image included in a bitstream. The parser may include the entropy decoder 210 or may be included in the entropy decoder 210. Such a parser may also be implemented as one component of the decoder.

FIG. 3 is a conceptual diagram schematically illustrating an embodiment of a scalable video coding structure using multiple layers to which the present invention can be applied. In FIG. 3, GOP (Group of Pictures) denotes a group of pictures.

Transmitting image data requires a transmission medium, and its performance varies with the network environment. A scalable video coding method may be provided so that image data can be adapted to such various transmission media and network environments.

The scalable video coding method is a coding method that improves encoding / decoding performance by removing redundancy between layers by using texture information, motion information, and residual signals between layers. The scalable video coding method may provide various scalability in terms of spatial, temporal, and image quality according to ambient conditions such as a transmission bit rate, a transmission error rate, and a system resource.

Scalable video coding may use a multi-layer structure to provide a bitstream applicable to various network conditions. For example, a scalable video coding structure may include a base layer that compresses and processes image data using a general image encoding method, and an enhancement layer that compresses and processes image data using both the encoding information of the base layer and a general image encoding method.

Here, a layer means a set of images and bitstreams divided on the basis of space (e.g., image size), time (e.g., coding order, output order, frame rate), image quality, complexity, and the like. The base layer may also be called a lower layer or a reference layer, and the enhancement layer may also be called an upper layer. The plurality of layers may be dependent on one another.

Referring to FIG. 3, for example, the base layer may be defined as standard definition (SD) at a frame rate of 15 Hz and a bit rate of 1 Mbps, the first enhancement layer as high definition (HD) at 30 Hz and 3.9 Mbps, and the second enhancement layer as ultra high definition (4K-UHD) at 60 Hz and 27.2 Mbps. These formats, frame rates, and bit rates are exemplary and may be set differently as needed. Likewise, the number of layers is not limited to this embodiment and may vary with the situation.

For example, if the transmission bandwidth is 4 Mbps, the first enhancement layer (HD) may be transmitted at a reduced frame rate of 15 Hz or less. In this way, the scalable video coding method can provide temporal, spatial, and quality scalability, as described for the embodiment of FIG. 3.

The term scalable video coding means scalable video encoding from the encoding perspective and scalable video decoding from the decoding perspective.

Hereinafter, a method is described for generating a prediction block, i.e., a prediction signal, for a block to be encoded or decoded in a higher layer (hereinafter referred to as the current block or target block) in scalable video coding, that is, in image encoding and decoding using a multi-layer structure. Below, the lower layer referenced by the upper layer is called the reference layer.

First, a prediction signal of a target block may be generated through normal intra prediction.

In intra prediction, a prediction mode may be largely classified into a directional mode and a non-directional mode according to the direction in which reference pixels used for pixel value prediction are located and a prediction method. For convenience of description, this prediction mode may be specified using a predetermined angle and mode number.

FIG. 4 is a diagram illustrating an example of an intra prediction mode.

The number of intra prediction modes may be fixed at a predetermined number regardless of the size of the prediction block, for example at 35 as shown in FIG. 4.

Referring to FIG. 4, the intra prediction modes may include 33 directional modes and two non-directional modes. The directional modes run clockwise from intra prediction mode 2 in the lower-left direction to intra prediction mode 34.

The number of prediction modes may differ depending on whether the color component is a luma signal or a chroma signal. In addition, “Intra_FromLuma” in FIG. 4 may refer to a specific mode that predicts the chroma signal from the luma signal.

Planar mode (Intra_Planar) and DC mode (Intra_DC), which are non-directional modes, may be allocated to intra prediction modes 0 and 1, respectively.

In DC mode, a single fixed value, for example the average of the neighboring reconstructed pixel values, is used as the prediction value. In planar mode, vertical and horizontal interpolation are performed using the pixel values vertically and horizontally adjacent to the current block, and their average is used as the prediction value.
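The DC and planar predictions described above can be sketched as follows. The integer formulas follow the HEVC style (rounded averages computed with shifts, planar as the rounded mean of a horizontal and a vertical linear interpolation), which is an assumption about the exact arithmetic. The block size N is assumed to be a power of two, the reference samples are passed as plain lists, and HEVC's DC boundary filter is omitted.

```python
def intra_dc(top, left):
    """DC mode: every sample is the rounded mean of the N top and N left samples."""
    n = len(top)
    shift = n.bit_length()  # log2(N) + 1 for power-of-two N, i.e. divide by 2N
    dc = (sum(top) + sum(left) + n) >> shift
    return [[dc] * n for _ in range(n)]


def intra_planar(top, left, top_right, bottom_left):
    """Planar mode: average of a horizontal and a vertical linear interpolation."""
    n = len(top)
    shift = n.bit_length()  # log2(N) + 1
    pred = []
    for y in range(n):
        row = []
        for x in range(n):
            # Horizontal: between the left sample of this row and the top-right sample.
            horizontal = (n - 1 - x) * left[y] + (x + 1) * top_right
            # Vertical: between the top sample of this column and the bottom-left sample.
            vertical = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            row.append((horizontal + vertical + n) >> shift)
        pred.append(row)
    return pred


print(intra_dc([10, 20, 30, 40], [50, 60, 70, 80])[0][0])  # prints 45
```

Each interpolation uses weights summing to N, so the two together sum to 2N; a single right shift by log2(N) + 1 therefore performs the averaging.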

The directional modes (Intra_Angular) each indicate a direction, expressed as an angle, between a reference pixel located in a predetermined direction and the current pixel, and include a horizontal mode and a vertical mode. In the vertical mode, the pixel values vertically adjacent to the current block may be used as the prediction values of the current block, and in the horizontal mode, the horizontally adjacent pixel values may be used as the prediction values of the current block.
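The pure vertical and horizontal cases of the directional modes are simple copies of the neighboring samples. In this sketch the mode numbers 10 (horizontal) and 26 (vertical) follow the HEVC numbering of the 35-mode scheme in FIG. 4; treat them as an assumption, since the text above does not state them. Other angular modes, which interpolate between reference samples, and HEVC's edge filtering are not shown.

```python
PLANAR, DC = 0, 1
HORIZONTAL, VERTICAL = 10, 26  # assumed HEVC numbering of the pure directions


def mode_category(mode):
    """Classify a mode number of the assumed 35-mode scheme."""
    if mode == PLANAR:
        return "planar"
    if mode == DC:
        return "dc"
    if 2 <= mode <= 34:
        return "angular"
    raise ValueError("mode outside the 35-mode range")


def intra_pure_direction(mode, top, left):
    """Vertical: copy the row above downward; horizontal: copy the left column across."""
    n = len(top)
    if mode == VERTICAL:
        return [list(top) for _ in range(n)]
    if mode == HORIZONTAL:
        return [[left[y]] * n for y in range(n)]
    raise ValueError("only the pure vertical and horizontal modes are sketched")


print(intra_pure_direction(VERTICAL, [1, 2], [7, 8]))    # prints [[1, 2], [1, 2]]
print(intra_pure_direction(HORIZONTAL, [1, 2], [7, 8]))  # prints [[7, 7], [8, 8]]
```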

The prediction block formed by the prediction values, i.e., the prediction signal, may be square, such as 4x4, 8x8, 16x16, 32x32, or 64x64, or rectangular, such as 2x8, 4x8, 2x16, 4x16, or 8x16. The size of the prediction block may correspond to at least one of a coding block (CB), a prediction block (PB), and a transform block (TB).

Intra encoding/decoding may use sample values or encoding parameters of neighboring reconstructed blocks. FIG. 5 is a diagram illustrating an embodiment of neighboring blocks and neighboring samples used in an intra prediction mode.

Depending on the encoding/decoding order, the neighboring reconstructed blocks may be, for example, blocks EA, EB, EC, ED, or EG in FIG. 5, and the sample values corresponding to 'above', 'above_left', 'above_right', 'left', and 'bottom_left' may be the reference samples used for intra prediction of the target block. The encoding parameter may be at least one of an encoding mode (intra or inter), an intra prediction mode, an inter prediction mode, a block size, a quantization parameter (QP), and a coded block flag (CBF).

In FIG. 5, each block may be divided into smaller blocks; even in this case, intra encoding/decoding may be performed using the sample values or encoding parameters corresponding to each divided block.

In addition, the prediction signal of the target block may be generated through inter prediction.

The inter prediction may use at least one of a previous picture or a subsequent picture of the current picture as a reference picture and perform prediction on the current block based on the reference picture. An image used for prediction of the current block is called a reference picture or a reference frame.

A region in the reference picture may be indicated by a reference picture index (refIdx) identifying the reference picture, a motion vector, and the like.

In inter prediction, a reference picture and a reference block corresponding to the current block within that reference picture may be selected, and a prediction block for the current block may be generated from them.
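Generating the prediction block from a reference picture index and a motion vector can be sketched as an integer-sample block copy. Coordinates that fall outside the picture are clamped to the nearest edge sample, which emulates reference-picture padding; the fractional-sample interpolation used by real codecs is omitted. All names are hypothetical.

```python
def motion_compensate(ref_pictures, ref_idx, mv, bx, by, width, height):
    """Copy the width x height block at (bx, by) displaced by mv = (dx, dy)
    from ref_pictures[ref_idx], clamping coordinates at the picture edges."""
    ref = ref_pictures[ref_idx]
    pic_h, pic_w = len(ref), len(ref[0])
    dx, dy = mv
    block = []
    for y in range(height):
        ry = min(max(by + dy + y, 0), pic_h - 1)
        block.append([ref[ry][min(max(bx + dx + x, 0), pic_w - 1)]
                      for x in range(width)])
    return block


ref = [[10 * y + x for x in range(4)] for y in range(4)]
print(motion_compensate([ref], 0, (1, 1), 0, 0, 2, 2))  # prints [[11, 12], [21, 22]]
```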

In inter prediction, the encoder and the decoder may derive the motion information of the current block and then perform inter prediction and/or motion compensation based on the derived motion information. In this case, the encoder and the decoder can improve encoding/decoding efficiency by using the motion information of already reconstructed neighboring blocks and/or of the collocated (col) block corresponding to the current block in an already reconstructed col picture.

Here, a reconstructed neighboring block is a block in the current picture that has already been encoded and/or decoded and reconstructed, and may include a block adjacent to the current block and/or a block located at an outer corner of the current block. In addition, the encoder and the decoder may determine a predetermined relative position with respect to the block located at the position spatially corresponding to the current block in the col picture, and derive the col block from that predetermined relative position, which may lie inside and/or outside the spatially corresponding block. As an example, the col picture may be one of the reference pictures included in a reference picture list.

In inter prediction, the prediction block may be generated so that both the residual signal relative to the current block and the magnitude of the motion vector are minimized.
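One common way to trade off residual size against motion vector size is a Lagrangian cost J = SAD + lambda * R(mv). The sketch below approximates R by the lengths of the signed Exp-Golomb codewords of the two vector components. The cost form, the lambda values, and this rate model are assumptions; real encoders usually charge the bits of the difference from the predicted motion vector rather than of the vector itself.

```python
def signed_exp_golomb_bits(v):
    """Length in bits of the signed Exp-Golomb codeword for integer v."""
    code_num = 2 * v - 1 if v > 0 else -2 * v  # maps 0, 1, -1, 2, ... to 0, 1, 2, 3, ...
    return 2 * (code_num + 1).bit_length() - 1


def motion_cost(sad, mv, lam):
    """Lagrangian cost J = SAD + lam * (estimated motion vector bits)."""
    return sad + lam * (signed_exp_golomb_bits(mv[0]) + signed_exp_golomb_bits(mv[1]))


def choose_motion(candidates, lam):
    """candidates: list of (mv, sad) pairs; return the pair with the lowest cost."""
    return min(candidates, key=lambda c: motion_cost(c[1], c[0], lam))


candidates = [((0, 0), 120), ((8, -8), 100)]
print(choose_motion(candidates, lam=0))  # prints ((8, -8), 100)
print(choose_motion(candidates, lam=4))  # prints ((0, 0), 120)
```

With lambda = 0 only the residual matters and the distant vector wins; a larger lambda penalizes the long vector enough that the zero vector is preferred despite its larger SAD.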

Meanwhile, the motion information derivation scheme may vary depending on the prediction mode of the current block. Prediction modes applied for inter prediction may include Advanced Motion Vector Predictor (AMVP), merge, and the like.

For example, when advanced motion vector predictor (AMVP) is applied, the encoder and the decoder may generate a predicted motion vector candidate list using the motion vectors of reconstructed neighboring blocks and/or the motion vector of the col block. That is, these motion vectors may serve as predicted motion vector candidates. The encoder may transmit to the decoder a predicted motion vector index indicating the optimal predicted motion vector selected from the candidates in the list, and the decoder may use that index to select the predicted motion vector of the current block from the candidate list.

The encoder may obtain the motion vector difference (MVD) between the motion vector of the current block and the predicted motion vector, encode it, and transmit it to the decoder. The decoder may then decode the received motion vector difference and derive the motion vector of the current block as the sum of the decoded motion vector difference and the predicted motion vector.
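The AMVP exchange can be sketched as follows: the encoder picks the candidate that leaves the smallest difference and sends its index together with the MVD, and the decoder adds the MVD back to the indexed predictor. How the candidate list is built from neighboring and col blocks is not modeled (the list is passed in ready-made), and choosing the predictor by the smallest absolute difference is a simplifying assumption; the function names are hypothetical.

```python
def amvp_encode(mvp_candidates, mv):
    """Return (mvp_idx, mvd) for the predictor closest to mv (L1 distance)."""
    def distance(i):
        px, py = mvp_candidates[i]
        return abs(mv[0] - px) + abs(mv[1] - py)

    mvp_idx = min(range(len(mvp_candidates)), key=distance)
    px, py = mvp_candidates[mvp_idx]
    return mvp_idx, (mv[0] - px, mv[1] - py)


def amvp_decode(mvp_candidates, mvp_idx, mvd):
    """Decoder: motion vector = selected predictor + decoded difference."""
    px, py = mvp_candidates[mvp_idx]
    return (px + mvd[0], py + mvd[1])


candidates = [(4, -2), (10, 3)]
idx, mvd = amvp_encode(candidates, (9, 4))
print(idx, mvd, amvp_decode(candidates, idx, mvd))  # prints 1 (-1, 1) (9, 4)
```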

The encoder may also transmit a reference picture index or the like indicating the reference picture to the decoder.

The decoder may predict the motion vector of the current block using the motion information of neighboring blocks and derive the motion vector of the current block using the motion vector difference (residual) received from the encoder. The decoder may then generate a prediction block for the current block based on the derived motion vector and the reference picture index information received from the encoder.

As another example, when merge is applied, the encoder and the decoder may generate a merge candidate list using the motion information of reconstructed neighboring blocks and/or of the col block. That is, the motion information of the reconstructed neighboring blocks and/or the col block may be used as merge candidates for the current block.

The encoder may select, from the merge candidates in the merge candidate list, the candidate that provides the best encoding efficiency as the motion information of the current block. A merge index indicating the selected merge candidate may be included in the bitstream and transmitted to the decoder. The decoder may select one of the merge candidates in the merge candidate list using the transmitted merge index and use it as the motion information of the current block. Therefore, when merge mode is applied, the motion information of a reconstructed neighboring block and/or the collocated block may be used as-is as the motion information of the current block. The decoder may reconstruct the current block by adding the prediction block and the residual transmitted from the encoder.
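A minimal sketch of the decoder-side merge step follows, assuming the merge candidate list has already been constructed. The MotionInfo type and the function names are hypothetical, and the reconstruction clips to an assumed 8-bit sample range; the text does not specify either.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MotionInfo:
    mv: Tuple[int, int]  # motion vector (x, y)
    ref_idx: int         # reference picture index


def merge_motion_info(merge_candidates: List[MotionInfo],
                      merge_index: int) -> MotionInfo:
    """Merge mode: reuse the signalled candidate's motion information unchanged."""
    return merge_candidates[merge_index]


def reconstruct(pred: List[int], residual: List[int],
                bit_depth: int = 8) -> List[int]:
    """Reconstructed sample = prediction + residual, clipped to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [min(max(p + r, 0), max_val) for p, r in zip(pred, residual)]


# Hypothetical list: one spatial neighbour and one collocated candidate.
candidates = [MotionInfo((2, 0), 0), MotionInfo((-1, 3), 1)]
print(merge_motion_info(candidates, 1))          # MotionInfo(mv=(-1, 3), ref_idx=1)
print(reconstruct([120, 250, 3], [5, 10, -7]))   # [125, 255, 0]
```

Unlike AMVP, no motion vector difference is added: the whole motion record (vector and reference index) is inherited from the chosen candidate.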

In the AMVP and merge modes described above, the motion information of the reconstructed neighboring blocks and/or the collocated block may be used to derive the motion information of the current block.

Skip mode is another mode used for inter prediction, in which the information of a neighboring block is used as-is for the current block. Therefore, in skip mode the encoder transmits no syntax information such as a residual to the decoder, other than information indicating which block's motion information is to be used as the motion information of the current block.

The encoder and the decoder may generate the prediction block of the current block by performing motion compensation on the current block based on the derived motion information. Here, the prediction block means the motion-compensated block generated as a result of performing motion compensation on the current block. A plurality of motion-compensated blocks may together constitute one motion-compensated picture.
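To make the notion of a motion-compensated prediction block concrete, the sketch below copies the block displaced by an integer motion vector from a reference picture. It is a simplification under stated assumptions: only integer-pel vectors are handled (practical codecs also interpolate fractional positions), positions outside the picture are padded by clamping to the border, and the function name is hypothetical.

```python
from typing import List, Tuple

Picture = List[List[int]]  # indexed as picture[y][x]


def motion_compensate(ref: Picture, x0: int, y0: int, w: int, h: int,
                      mv: Tuple[int, int]) -> Picture:
    """Return the w x h block at (x0, y0) displaced by the integer vector mv.

    Reference positions outside the picture are clamped to the nearest border
    sample, which emulates border padding.
    """
    height, width = len(ref), len(ref[0])
    dx, dy = mv
    block = []
    for y in range(h):
        ry = min(max(y0 + y + dy, 0), height - 1)
        block.append([ref[ry][min(max(x0 + x + dx, 0), width - 1)]
                      for x in range(w)])
    return block


ref = [[10 * y + x for x in range(4)] for y in range(4)]
print(motion_compensate(ref, 0, 0, 2, 2, (1, 1)))  # [[11, 12], [21, 22]]
```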

The decoder may check the information received from the encoder, such as the skip flag and the merge flag, and derive the motion information required for inter prediction of the current block, such as the motion vector and the reference picture index.

The processing unit in which prediction is performed may differ from the processing unit in which the prediction method and its details are determined. For example, the prediction mode may be determined in units of PUs and prediction performed in units of PUs, or the prediction mode may be determined in units of PUs and intra prediction performed in units of TUs.

In an image supporting multiple layers, in addition to the intra prediction and inter prediction methods described above, the prediction signal of a target block in a higher layer may be generated using the reconstructed image of a lower layer, that is, a reference layer that the target block may refer to.

As shown in FIG. 6, let Pc [x, y] denote the prediction signal of the target block 601 currently being encoded or decoded in the upper layer 600, that is, the sample values of its prediction block, and let P2 [x, y] denote the reconstructed sample values of the reconstructed image of the reference layer 610. Pc [x, y] may then be generated based on P2 [x, y].

After reconstruction, the reference layer 610 may be upsampled to the resolution of the higher layer, in which case P2 [x, y] may be the upsampled sample values.

When the reference block 615 located at the position corresponding to the current target block 601 in the reference layer 610 is referred to, P2 [x, y] may be the reconstructed sample values of the reference block 615.
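The sketch below illustrates how P2 [x, y] could be formed from a lower-resolution reference layer: the reconstructed reference layer is upsampled and the block at the target block's position is cut out. The text does not specify the upsampling filter; nearest-neighbour upsampling by an integer factor is used here purely as a placeholder, and both function names are hypothetical.

```python
from typing import List

Picture = List[List[int]]  # indexed as picture[y][x]


def upsample_nearest(pic: Picture, scale: int) -> Picture:
    """Nearest-neighbour upsampling by an integer factor (placeholder filter)."""
    return [[pic[y // scale][x // scale] for x in range(len(pic[0]) * scale)]
            for y in range(len(pic) * scale)]


def collocated_block(up_ref: Picture, x0: int, y0: int,
                     w: int, h: int) -> Picture:
    """P2 [x, y]: upsampled reference-layer samples at the target block position."""
    return [row[x0:x0 + w] for row in up_ref[y0:y0 + h]]


base = [[1, 2], [3, 4]]          # reconstructed reference layer (2 x 2)
up = upsample_nearest(base, 2)   # 4 x 4, matching a 2x-resolution upper layer
print(collocated_block(up, 2, 0, 2, 2))  # [[2, 2], [2, 2]]
```

A real resampling filter would interpolate between base-layer samples, and its phase alignment is exactly what determines how well the collocated block matches the target block.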

One method of obtaining a prediction signal from the reconstructed reference layer 610 is to apply inter prediction with reference to the reconstructed reference layer 610, as shown in FIG. 6. That is, the encoder performs motion prediction and motion compensation on the reference layer 610 and uses the resulting prediction signal as the prediction signal of the current encoding target block. The decoder may perform motion compensation using the motion vector that the encoder derived through motion prediction on the decoded lower layer image.

The encoder may encode and transmit the obtained motion information, and the decoder may decode the received motion information and perform inter prediction with reference to the reference layer 610. The motion information may include a reference picture index refIdx indicating the reference picture and a motion vector MV.
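Motion prediction on the reconstructed reference layer can be illustrated with a simple full-search block matcher. This is a minimal sketch, assuming integer-pel search, a sum-of-absolute-differences (SAD) cost, and a square search window; the text does not prescribe the search strategy or the cost function, and the names sad and full_search are hypothetical. The returned vector plays the role of the motion vector MV described above.

```python
from typing import List, Tuple

Picture = List[List[int]]  # indexed as picture[y][x]


def sad(block: Picture, ref: Picture, x0: int, y0: int) -> int:
    """Sum of absolute differences between block and ref at offset (x0, y0)."""
    return sum(abs(block[y][x] - ref[y0 + y][x0 + x])
               for y in range(len(block)) for x in range(len(block[0])))


def full_search(block: Picture, ref: Picture, bx: int, by: int,
                search_range: int) -> Tuple[Tuple[int, int], int]:
    """Return (best_mv, best_sad) over integer vectors within +/-search_range.

    (bx, by) is the target block position, assumed to lie fully inside ref;
    candidate positions that would leave ref are skipped.
    """
    h, w = len(block), len(block[0])
    best_mv, best_cost = (0, 0), sad(block, ref, bx, by)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= len(ref[0]) - w and 0 <= y <= len(ref) - h:
                cost = sad(block, ref, x, y)
                if cost < best_cost:
                    best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


ref = [[10 * y + x for x in range(6)] for y in range(6)]
block = [row[3:5] for row in ref[2:4]]    # content located at (3, 2)
print(full_search(block, ref, 2, 2, 2))   # ((1, 0), 0)
```

In this example the collocated position (2, 2) is off by one sample, which is the kind of mismatch that searching the reference layer, rather than using only the collocated block, is meant to absorb.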

Meanwhile, when the reference layer 610 is used for inter prediction, a reference picture index refIdx indicating a reference picture among motion information to be encoded may not be transmitted.

The encoder may predict the motion vector of the current target block using the motion information of neighboring blocks adjacent to the target block 601, encode the difference between the motion vector of the target block and the predicted motion vector, and transmit it as the motion vector MV_l2 [x, y]. In this case, the neighboring blocks used for motion prediction of the target block 601 may be blocks encoded with reference to the reconstructed image of the reference layer. That is, the encoder can derive the motion vector of the target block 601 using the motion information of those neighboring blocks that were encoded with reference to the reconstructed image of the reference layer. The encoder may also encode information indicating which block's motion information is used and transmit it to the decoder.

If none of the neighboring blocks is encoded with the reconstructed picture of the reference layer, (0,0) may be used as a motion vector prediction candidate.
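The candidate restriction and the (0,0) fallback can be sketched as below. The Neighbor type and its uses_reference_layer field are hypothetical stand-ins for however a codec records that a neighboring block was coded from the reconstructed reference layer; candidate ordering and pruning are not specified in the text and are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MV = Tuple[int, int]


@dataclass
class Neighbor:
    mv: MV
    uses_reference_layer: bool  # True if coded from the reconstructed reference layer


def inter_layer_mvp_candidates(neighbors: List[Optional[Neighbor]]) -> List[MV]:
    """Keep only neighbours coded from the reference layer; else fall back to (0, 0).

    None entries model unavailable neighbours (e.g. outside the picture).
    """
    cands = [n.mv for n in neighbors
             if n is not None and n.uses_reference_layer]
    return cands if cands else [(0, 0)]


print(inter_layer_mvp_candidates([Neighbor((1, 2), True), None,
                                  Neighbor((3, 4), False)]))  # [(1, 2)]
```

Filtering on the reference type keeps same-layer temporal vectors, which point into a different picture, out of the inter-layer predictor list.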

In an image supporting a plurality of layers in the bitstream, when the prediction signal of the current target block is obtained through inter-layer prediction, prediction may be performed using only the reference layer block at the position corresponding to the current target block. Because picture sizes generally differ between layers, the reference layer is upsampled. Upsampling can introduce a phase difference between the pixels of the two layers, so when only the corresponding reference layer block is used, the prediction error caused by that phase difference may not be reduced. To overcome this problem, the present embodiment does not rely solely on the corresponding block of the reference layer, but also performs motion prediction on the reference layer to obtain a prediction value closer to the target block being encoded or decoded.

Besides obtaining the prediction signal through motion prediction on the reconstructed image of the reference layer, the encoder may use the reconstructed sample values of the reference block 615 directly as the prediction signal of the target block 601. This can be expressed as follows.

Pc [x, y] = P2 [x, y]

The encoder may either generate the prediction signal through motion prediction referring to the reconstructed reference layer 610, or use the reconstructed sample values of the reference block 615 corresponding to the target block 601 as the prediction signal. When the encoder generates a prediction signal using the reference layer, it may encode information indicating which of these methods is used and transmit it to the decoder.

According to another embodiment, when encoding and decoding the target block, the prediction signal of the current encoding target block may be obtained by using not only an image in the layer to which the target block belongs but also a reconstructed image of the reference layer.

Referring to FIG. 7, a target block 701 to be encoded or decoded in the current picture 700 may refer to a forward reference picture 710 or a backward reference picture 720 belonging to the same layer, and may also refer to a lower layer reference picture 730 belonging to another layer. The forward reference picture 710, the backward reference picture 720, and the lower layer reference picture 730 may be reconstructed pictures.

Let Pc [x, y] denote the prediction signal of the target block 701. Pc [x, y] may be generated in various ways depending on which pictures the target block 701 refers to. For example, Pc [x, y] may be generated as the average or a weighted sum (weighted average) of the prediction values obtained from the pictures that the target block 701 may refer to.

(Method 1)

Let P0 [x, y] denote the prediction signal obtained from the forward reference picture 710 and P2 [x, y] the prediction signal obtained from the lower layer reference picture 730. Pc [x, y] may then be obtained as a weighted sum of P0 [x, y] and P2 [x, y]. An example of the weighted sum is shown in Equation 2.

Pc [x, y] = {(a) * P0 [x, y] + (b) * P2 [x, y]} / 2

In Equation 2, (a) and (b) are parameters for weighted summation, and (a) and (b) may have the same value or may have different values. (a) may be larger than (b), and conversely, (b) may be larger than (a). (a) and (b) may be set to enable integer arithmetic, or may be set to a value independent of integer arithmetic. (a) and (b) may be integers or rational numbers.

The encoder may add a predetermined offset value so that the prediction signal Pc [x, y] is an integer.
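Equation 2 and the offset described above can be sketched in integer arithmetic as follows. The sketch assumes the offset is divisor // 2 (so the integer division rounds to nearest instead of truncating), that the divisor defaults to 2 as in Equation 2 but is exposed as a parameter, that the weights keep the result in range (for example (a) = (b) = 1), and an 8-bit clip; these choices and the function name weighted_prediction are illustrative rather than taken from the patent.

```python
from typing import List

Block = List[List[int]]


def weighted_prediction(p0: Block, p2: Block, a: int = 1, b: int = 1,
                        divisor: int = 2, bit_depth: int = 8) -> Block:
    """Equation 2: Pc = (a*P0 + b*P2) / divisor, in integer arithmetic.

    divisor // 2 is added as a rounding offset so the integer division rounds
    to nearest; the result is clipped to the sample range for bit_depth.
    """
    offset = divisor // 2
    max_val = (1 << bit_depth) - 1
    return [[min(max((a * s0 + b * s2 + offset) // divisor, 0), max_val)
             for s0, s2 in zip(r0, r2)]
            for r0, r2 in zip(p0, p2)]


print(weighted_prediction([[100, 101]], [[103, 102]]))  # [[102, 102]]
print(weighted_prediction([[100, 101]], [[103, 102]],
                          a=3, b=1, divisor=4))         # [[101, 101]]
```

The same arithmetic applies to the other two-signal weighted sums in this description: only the source of the first prediction signal changes.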

The encoder may transmit to the decoder the motion vector MV_l0 [x, y] obtained through motion prediction with reference to the forward reference picture 710 and the motion vector MV_l2 [x, y] obtained through motion prediction on the lower layer reference picture 730.

If the reference block at the position corresponding to the current target block is taken from the reconstructed image of the lower layer and its reconstructed sample values are used as the prediction signal P2 [x, y], the encoder may omit transmission of motion information for the lower layer image.

(Method 2)

Let P1 [x, y] denote the prediction signal obtained from the backward reference picture 720. Pc [x, y] may then be generated as a weighted sum of P1 [x, y] and the prediction signal P2 [x, y] obtained from the lower layer reference picture 730. An example of the weighted sum is shown in Equation 3.

Pc [x, y] = {(a) * P1 [x, y] + (b) * P2 [x, y]} / 2

(a) and (b) are parameters for weighted summation, and (a) and (b) may have the same value or may have different values from each other. (a) may be larger than (b), and conversely, (b) may be larger than (a). (a) and (b) may be set to enable integer arithmetic, or may be set to a value independent of integer arithmetic. (a) and (b) may be integers or rational numbers.

The encoder may transmit to the decoder the motion vector MV_l1 [x, y] obtained through motion prediction with reference to the backward reference picture 720 and the motion vector MV_l2 [x, y] obtained through motion prediction on the lower layer reference picture 730.

In this case as well, if the reference block at the position corresponding to the current target block is taken from the reconstructed image of the lower layer and its reconstructed sample values are used as the prediction signal P2 [x, y], the encoder may omit transmission of motion information for the lower layer image.

(Method 3)

Pc [x, y] may be derived as a weighted sum of the prediction signal P0 [x, y] obtained from the forward reference picture 710, the prediction signal P1 [x, y] obtained from the backward reference picture 720, and the prediction signal P2 [x, y] obtained from the lower layer reference picture 730. An example of the weighted sum is shown in Equation 4.

(Equation 4)

Pc [x, y] = {(a) * P0 [x, y] + (b) * P1 [x, y] + (c) * P2 [x, y]} / 3

(a), (b), and (c) are parameters for the weighted sum and may all have the same value or different values. They may be set to enable integer arithmetic or to values independent of integer arithmetic, and may be integers or rational numbers.
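Equation 4 generalizes to any number of prediction signals. The sketch below, under illustrative assumptions (rounding offset divisor // 2, non-negative sample values, hypothetical function name multi_hypothesis_prediction), forms the weighted sum sample by sample and divides by 3 when three prediction signals are combined. Because 3 is not a power of two, that division cannot be replaced by a bit shift, which is one reason the weights might instead be chosen to permit shift-based integer arithmetic.

```python
from typing import List, Sequence


def multi_hypothesis_prediction(predictions: Sequence[List[int]],
                                weights: Sequence[int],
                                divisor: int) -> List[int]:
    """Pc = (w0*P0 + w1*P1 + ...) / divisor per sample, rounded to nearest.

    predictions holds one flattened block (list of samples) per hypothesis.
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per prediction signal is required")
    offset = divisor // 2
    return [(sum(w * s for w, s in zip(weights, samples)) + offset) // divisor
            for samples in zip(*predictions)]


# Equation 4 with (a) = (b) = (c) = 1: P0 (forward), P1 (backward), P2 (lower layer).
print(multi_hypothesis_prediction([[10, 50], [11, 52], [13, 54]],
                                  [1, 1, 1], 3))  # [11, 52]
```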

The encoder may transmit to the decoder the motion vectors MV_l0 [x, y] and MV_l1 [x, y] obtained through motion prediction with reference to the forward reference picture 710 and the backward reference picture 720, respectively, and the motion vector MV_l2 [x, y] obtained through motion prediction on the lower layer reference picture 730.

Here too, if the reference block at the position corresponding to the current target block is taken from the reconstructed image of the lower layer and its reconstructed sample values are used as the prediction signal P2 [x, y], the encoder may omit transmission of motion information for the lower layer image.

(Method 4)

Pc [x, y] may be generated as a weighted sum of the prediction signal P0 [x, y] obtained from the reference samples included in the reconstructed neighboring blocks adjacent to the current encoding target block and the prediction signal P2 [x, y] obtained from the lower layer reference picture 730. An example of the weighted sum is shown in Equation 5.

Pc [x, y] = {(a) * P0 [x, y] + (b) * P2 [x, y]} / 2
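Method 4 can be illustrated by pairing a simple intra predictor with the lower layer prediction. The text leaves the intra prediction mode open; DC prediction (the rounded mean of the top and left neighboring reconstructed samples) is assumed here only as an example, with (a) = (b) = 1 in Equation 5. All names are hypothetical.

```python
from typing import List

Block = List[List[int]]


def dc_intra_prediction(top: List[int], left: List[int],
                        w: int, h: int) -> Block:
    """DC intra prediction: fill the block with the rounded mean of the neighbours."""
    n = len(top) + len(left)
    dc = (sum(top) + sum(left) + n // 2) // n
    return [[dc] * w for _ in range(h)]


def method4_prediction(top: List[int], left: List[int], p2: Block,
                       a: int = 1, b: int = 1) -> Block:
    """Equation 5: Pc = (a*P0 + b*P2) / 2, with P0 from DC intra prediction."""
    p0 = dc_intra_prediction(top, left, len(p2[0]), len(p2))
    return [[(a * s0 + b * s2 + 1) // 2 for s0, s2 in zip(r0, r2)]
            for r0, r2 in zip(p0, p2)]


print(method4_prediction([100, 100], [104, 104], [[110, 90], [100, 100]]))
# [[106, 96], [101, 101]]
```

The flat intra prediction supplies the local brightness of the current layer, while P2 [x, y] contributes texture detail from the reference layer.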

The encoder may encode and transmit the intra prediction mode used to obtain P0 [x, y] from the neighboring reconstructed reference samples, together with the motion information MV_l2 [x, y] obtained through motion prediction on the lower layer reference picture 730.

In this case as well, regardless of how the prediction signal P0 [x, y] is obtained from the reference samples in the neighboring blocks, if the reconstructed sample values of the block at the position corresponding to the current target block in the reconstructed image of the lower layer are used as the prediction signal P2 [x, y], transmission of motion information for the lower layer image may be omitted.

The weight coefficients (a), (b), and (c) used in Equations 2 to 5 may be signaled as encoding parameters. Encoding parameters refer to information necessary for encoding or decoding an image; they include not only information such as syntax elements that the encoder encodes and transmits to the decoder, but also information that can be inferred during the encoding or decoding process.

The weight coefficients (a), (b), (c), and so on may be included in a VPS (Video Parameter Set), SPS (Sequence Parameter Set), PPS (Picture Parameter Set), APS (Adaptation Parameter Set), slice header, or the like, encoded, and transmitted to the decoder.

Alternatively, the weight coefficients (a), (b), (c), and so on may be set according to a predetermined rule so that the encoder and the decoder use the same coefficient values.

In encoding motion information of a lower layer image, transmission of a reference picture index refIdx indicating a reference picture among motion information may be omitted.

The encoder may predict the motion vector of the current target block using the motion information of neighboring blocks adjacent to the target block, encode the difference between the motion vector of the target block and the predicted motion vector, and transmit it as the motion vector MV_l2 [x, y]. In this case, the neighboring blocks used for motion prediction of the target block may be blocks encoded with reference to the reconstructed image of the lower layer. That is, the encoder may derive the motion vector of the target block using the motion information of those neighboring blocks that were encoded with reference to the reconstructed image of the lower layer. The encoder may also encode information indicating which block's motion information is used and transmit it to the decoder.

If none of the neighboring blocks is encoded with the reconstructed image of the lower layer, (0,0) may be used as the motion vector prediction candidate.

The encoder may obtain the prediction signal of the current encoding target block using at least one of the methods described above. That is, from a rate-distortion point of view, the encoder may select the optimal prediction method from among intra prediction using reference samples of the same picture as the target block, inter prediction using a reference picture of the same layer, inter prediction using the lower layer, and a method that performs inter prediction on a plurality of reference pictures of the lower layer and the higher layer and then uses a weighted sum of the resulting prediction values. The encoder may then encode and transmit information about the selected method.
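Rate-distortion selection is commonly expressed with the Lagrangian cost J = D + λ·R, where D is the distortion, R the rate in bits, and λ a Lagrange multiplier. The sketch below assumes that form and that D and R have already been measured for each candidate method; the method labels, the numbers, and the function name are hypothetical.

```python
from typing import Dict, Tuple


def select_prediction_method(costs: Dict[str, Tuple[float, float]],
                             lam: float) -> str:
    """Return the method with the smallest J = D + lam * R.

    costs maps a method label to (distortion, rate_in_bits).
    """
    return min(costs, key=lambda m: costs[m][0] + lam * costs[m][1])


measured = {  # hypothetical (distortion, bits) per candidate method
    "intra": (1000.0, 40.0),
    "inter_same_layer": (800.0, 60.0),
    "inter_lower_layer": (700.0, 90.0),
    "weighted_same_and_lower": (650.0, 120.0),
}
print(select_prediction_method(measured, 4.0))  # inter_same_layer
print(select_prediction_method(measured, 1.0))  # weighted_same_and_lower
```

A larger λ favours methods that are cheap to signal, while a smaller λ favours the lowest distortion, so the weighted multi-reference method wins only when its extra motion data pays for itself.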

For a target block for which intra prediction is not selected as the prediction method, the information about the selected method may be encoded as shown in Table 1. Table 1 shows the syntax element inter_pred_idc, which indicates the inter prediction direction for each slice type of the higher layer and is used to signal the prediction method.

Table 1

In Table 1, the number assigned to each prediction method may vary according to its probability of occurrence: a small number may be assigned to the most frequently occurring prediction method and a larger number to less frequently occurring methods.
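The sketch below illustrates the index assignment rule, assuming the encoder has simple occurrence statistics for each method. It also shows a truncated unary binarization, chosen here only as an example of a code in which smaller indices get shorter codewords; the text does not state which binarization is used, and the method labels and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def assign_indices_by_frequency(observed: Iterable[str]) -> Dict[str, int]:
    """Map each prediction method to an index, most frequent first (index 0).

    Ties are broken alphabetically so the mapping is deterministic.
    """
    counts = Counter(observed)
    ranked = sorted(counts, key=lambda m: (-counts[m], m))
    return {method: idx for idx, method in enumerate(ranked)}


def truncated_unary(idx: int, max_idx: int) -> str:
    """idx ones followed by a terminating zero, omitted when idx == max_idx."""
    return "1" * idx + ("" if idx == max_idx else "0")


mapping = assign_indices_by_frequency(["L0", "L0+L2", "L0", "L2", "L0", "L0+L2"])
print(mapping)                                     # {'L0': 0, 'L0+L2': 1, 'L2': 2}
print([truncated_unary(i, 2) for i in range(3)])   # ['0', '10', '11']
```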

Hereinafter, a method of generating a prediction signal in a target block to be decoded by the decoder will be described.

The method used to generate the prediction signal of the current decoding target block may be selected according to the prediction method information transmitted from the encoder.

When the prediction signal of the current decoding target block is generated by the intra prediction described with reference to FIGS. 4 and 5, the prediction signal may be generated by performing intra prediction from the reconstructed sample values neighboring the current target block.

In this case, the decoder may generate the prediction signal through the conventional intra prediction decoding process and reconstruct the current block by adding the residual transmitted from the encoder to the prediction signal.

When the prediction signal of the current decoding target block is generated by the inter prediction described above, the prediction signal may be generated by performing motion compensation with reference to previous or subsequent pictures relative to the picture containing the current decoding target block.

That is, the decoder may generate the prediction signal through the conventional inter prediction decoding process and reconstruct the current block by adding the residual transmitted from the encoder to the prediction signal.

When the prediction signal of the current decoding target block is generated using the reference layer as shown in FIG. 6, the prediction signal may be generated by performing motion compensation on the reconstructed image of the layer referenced by the current decoding target block.

After decoding the motion information transmitted from the encoder, the decoder may generate a prediction signal by performing motion compensation on the reconstructed image of the reference layer.

When decoding the motion information, the decoder may construct motion vector prediction candidates from the neighboring blocks of the current decoding target block in the same way as the encoder. Only neighboring blocks decoded with reference to the reconstructed picture of the reference layer may be used as prediction candidates. If none of the neighboring blocks was decoded with reference to the reconstructed picture of the reference layer, (0,0) may be used as the motion vector prediction candidate.

The decoder may parse the optimal prediction candidate information transmitted from the encoder and then add the selected motion vector prediction value and the decoded motion vector difference signal to obtain a motion vector value MV_l2 [x, y] used for motion compensation.

If the encoder signals an indicator to refer to the same position as the current decoding target block, the decoder infers the motion vector for the reconstructed image of the reference layer to be (0,0) and generates the prediction signal from the reference layer reconstruction block at the position corresponding to the current decoding target block.

Alternatively, the decoder may generate the prediction signal from the reference layer reconstruction block at the position corresponding to the current decoding target block according to a predetermined protocol.
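The decoder-side derivation of MV_l2 [x, y] described in the preceding paragraphs can be sketched as follows. The zero_indicator argument stands in for the indicator (or predetermined protocol) that selects the collocated reference layer block; the parameter names and the function name derive_mv_l2 are hypothetical, and the candidate list is assumed to be built as described above.

```python
from typing import List, Optional, Tuple

MV = Tuple[int, int]


def derive_mv_l2(zero_indicator: bool,
                 candidates: List[MV],
                 cand_idx: Optional[int] = None,
                 mvd: Optional[MV] = None) -> MV:
    """Return MV_l2 for motion compensation on the reference layer.

    With the indicator set, no motion data is parsed and the collocated block
    is used, i.e. the motion vector is inferred as (0, 0).
    """
    if zero_indicator:
        return (0, 0)
    if cand_idx is None or mvd is None:
        raise ValueError("candidate index and MVD are required without the indicator")
    mvp = candidates[cand_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


print(derive_mv_l2(True, []))                           # (0, 0)
print(derive_mv_l2(False, [(2, -1), (0, 0)], 0, (1, 1)))  # (3, 0)
```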

The decoder may reconstruct the current block by adding the residual transmitted from the encoder to the prediction signal generated as described above.

According to another embodiment, when the prediction signal of the decoding target block is generated using both a picture of the same layer and a picture of the reference layer as shown in FIG. 7, the decoder may generate the prediction signal by performing motion compensation on the reference picture in the same layer and on the reconstructed image of the layer referenced by the current decoding target block.

The decoder decodes the motion information for the reference picture of the same layer (or the intra prediction mode) and the motion information for the reference layer transmitted from the encoder. It then generates the prediction signal in the same manner as the encoder, by performing motion compensation on the reference picture of the same layer (or intra prediction from the reference samples included in the neighboring reconstructed blocks) and motion compensation on the reconstructed image of the reference layer.

Alternatively, the decoder decodes only the motion information for the reference picture of the same layer (or the intra prediction mode) transmitted from the encoder, performs motion compensation on that reference picture (or intra prediction from the reference samples included in the neighboring reconstructed blocks), and takes the other prediction signal from the reconstructed block of the reference layer at the position corresponding to the current decoding target block, thereby generating the prediction signal in the same manner as the encoder.

For example, when the slice type of the current decoding target block is the EP slice of Table 1 and the decoded inter_pred_idc value is 4, the decoder may generate the prediction signal using the forward reference picture and the reconstructed picture of the reference layer.

In this case, the decoded motion information may include the motion information for the forward reference picture and for the reference layer.

The prediction signal Pc [x, y] of the current decoding target block may then be obtained as a weighted sum of the prediction signal P0 [x, y] obtained through motion compensation from the forward reference picture and the prediction signal P2 [x, y] obtained through motion compensation from the reference layer image.

If there is an indicator to refer to the same position as the position of the current decoding target block from the encoder, the decoder infers a motion vector of the reconstructed image of the reference layer as (0,0) and determines the position corresponding to the position of the current decoding target block. A prediction signal may be generated from the reference layer block.

Alternatively, the decoder may generate a prediction signal from a reference layer block at a position corresponding to the position of the current decoding target block according to a predetermined protocol.

Table 2 shows an embodiment of a syntax structure for a coding unit (CU) of a higher layer that can be applied to an image encoding/decoding apparatus that encodes and decodes a multi-layer structure according to the present invention.

Table 2

Referring to Table 2, adaptive_base_mode_flag may be located in a video parameter set (VPS), sequence parameter set (SPS), picture parameter set (PPS), adaptation parameter set (APS), or slice header. base_mode_flag may have a value of "1" or "0".

When adaptive_base_mode_flag has a value of "0", the base_mode_flag value may be determined by the default_base_mode_flag value.

default_base_mode_flag may be located in a VPS (Video Parameter Set), SPS (Sequence Parameter Set), PPS (Picture Parameter Set), APS (Adaptation Parameter Set), or slice header. When default_base_mode_flag has a value of "1", base_mode_flag always has a value of "1"; when default_base_mode_flag has a value of "0", base_mode_flag always has a value of "0".
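The flag relationship described above can be sketched as follows. Because Table 2 is not reproduced here, the assumption that base_mode_flag is read from the coding unit when adaptive_base_mode_flag is "1" is an interpretation of the surrounding text, and the function and parameter names are hypothetical.

```python
from typing import Optional


def resolve_base_mode_flag(adaptive_base_mode_flag: int,
                           default_base_mode_flag: int,
                           parsed_cu_flag: Optional[int] = None) -> int:
    """Return the effective base_mode_flag for a coding unit.

    adaptive_base_mode_flag == 0: not signalled per CU, copied from
    default_base_mode_flag. Otherwise the CU-level value is assumed parsed.
    """
    if adaptive_base_mode_flag == 0:
        return default_base_mode_flag
    if parsed_cu_flag is None:
        raise ValueError("base_mode_flag must be parsed when adaptive_base_mode_flag is 1")
    return parsed_cu_flag
```

Placing the default at a parameter-set level lets an encoder switch inter-layer coding on or off for a whole sequence, picture, or slice without spending a flag in every coding unit.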

When base_mode_flag has a value of "1", the coding unit may be encoded using the reference layer as illustrated in FIGS. 6 and 7. When base_mode_flag has a value of "0", the coding unit may be encoded by conventional intra or inter prediction using the current layer.

Table 3 shows an embodiment of a syntax structure for a prediction unit (PU) of a higher layer that can be applied to an image encoding/decoding apparatus that encodes and decodes a multi-layer structure according to the present invention.

Table 3

Referring to Table 3, when base_mode_flag of the coding unit has a value of "1" and combined_pred_flag [x0] [y0] has a value of "1", the prediction signal for the prediction unit may be generated as in FIG. 7. When combined_pred_flag [x0] [y0] has a value of "0", the prediction signal for the prediction unit may be generated as in FIG. 6.

mv_l2_zero_flag may be present in a VPS (Video Parameter Set), SPS (Sequence Parameter Set), PPS (Picture Parameter Set), APS (Adaptation Parameter Set), slice header, or coding unit. When it has a value of "1", the decoder may infer the motion information for the reconstructed picture of the reference layer to be (0,0), and no motion information need be transmitted for the reconstructed picture of the reference layer.
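The prediction-unit decisions driven by base_mode_flag, combined_pred_flag, and mv_l2_zero_flag can be sketched as a small decision function. The returned labels and the function name are hypothetical; the sketch only restates the conditions in the text, since Table 3 itself is not reproduced.

```python
from typing import Tuple


def pu_inter_layer_decision(base_mode_flag: int, combined_pred_flag: int,
                            mv_l2_zero_flag: int) -> Tuple[str, bool]:
    """Return (prediction structure, whether MV_l2 must be parsed)."""
    if base_mode_flag == 0:
        # Conventional intra/inter prediction within the current layer.
        return "current_layer", False
    if combined_pred_flag == 1:
        structure = "fig7_same_and_reference_layer"
    else:
        structure = "fig6_reference_layer"
    # mv_l2_zero_flag == 1: reference-layer motion is inferred as (0, 0).
    return structure, mv_l2_zero_flag == 0
```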

FIG. 8 is a control flowchart illustrating a method of generating a prediction signal of a target block according to the present invention. For convenience of description, an example in which the decoder generates the prediction signal and reconstructs the target block is described with reference to FIG. 8.

First, the decoder receives prediction method information, based on Tables 2 and 3, indicating which prediction method is used to predict the target block (S801).

If the prediction method for the target block is intra prediction, the decoder may generate a prediction signal from the reconstructed sample values in the vicinity of the target block (S803).

The decoder may reconstruct the target block by adding the residual transmitted from the encoder to the generated prediction signal (S804).

If the prediction method for the target block is conventional inter prediction (S805), the decoder may generate the prediction signal by performing motion compensation with reference to previous or subsequent pictures relative to the picture containing the target block (S806).

Even in this case, the decoder may reconstruct the target block by adding the residual transmitted from the encoder to the generated prediction signal (S804).

If the prediction method for the target block performs motion compensation on the reference layer, that is, the reconstructed lower layer (S807), the decoder may generate the prediction signal by performing motion compensation in the lower layer direction (S808).

The motion vector in the motion information received from the encoder for motion compensation may be one of the motion vectors derived from the neighboring blocks of the current target block, and these neighboring blocks may include blocks decoded with reference to the reconstructed image of the lower layer.

If the prediction method for the target block uses a picture of the same layer together with a picture of the lower layer (S809), the decoder may generate the prediction signal by performing motion compensation on the reference picture in the same layer and on the reconstructed image of the layer referenced by the current decoding target block (S810).

The prediction signal is added to the residual received from the encoder, which becomes a reconstructed value of the target block (S804).
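The control flow of FIG. 8 can be summarized as a dispatch on the signalled prediction method followed by the common reconstruction step S804. This is a sketch only: the method labels, the predictor callables, and the 8-bit clipping are hypothetical placeholders for the processes described above.

```python
from typing import Callable, Dict, List

Block = List[List[int]]

# Hypothetical labels for the branches of FIG. 8 and their step numbers.
STEPS = {"intra": "S803", "inter": "S806",
         "lower_layer": "S808", "combined": "S810"}


def decode_target_block(method: str,
                        predictors: Dict[str, Callable[[], Block]],
                        residual: Block, bit_depth: int = 8) -> Block:
    """Generate the prediction for the signalled method, then add the residual (S804)."""
    if method not in STEPS or method not in predictors:
        raise ValueError(f"unsupported prediction method: {method}")
    pred = predictors[method]()
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(pr, rr)]
            for pr, rr in zip(pred, residual)]


predictors = {
    "intra": lambda: [[100, 100], [100, 100]],    # e.g. from neighbouring samples
    "lower_layer": lambda: [[90, 92], [94, 96]],  # e.g. MC on the lower layer
}
print(decode_target_block("lower_layer", predictors, [[1, -2], [0, 200]]))
# [[91, 90], [94, 255]]
```

Whatever branch is taken, every path converges on the same residual addition, which is why the flowchart returns to S804 from each prediction step.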

In the above-described embodiments, the methods are described based on flowcharts as a series of steps or blocks, but the present invention is not limited to the order of the steps, and some steps may occur in a different order from, or simultaneously with, other steps described above. In addition, those of ordinary skill in the art will understand that the steps shown in the flowcharts are not exclusive, that other steps may be included, and that one or more steps in the flowcharts may be deleted without affecting the scope of the present invention.

The above-described embodiments include examples of various aspects. While not all possible combinations may be described to represent the various aspects, one of ordinary skill in the art will recognize that other combinations are possible. Accordingly, the invention is intended to embrace all other replacements, modifications and variations that fall within the scope of the following claims.

Claims

A method of decoding an image supporting a plurality of layers, the method comprising:

receiving prediction method information on a prediction method of a target block to be decoded; and

generating a prediction signal of the target block based on the received information,

wherein the prediction method information includes information indicating that the target block is predicted using a reconstructed lower layer.
The method of claim 1,

wherein generating the prediction signal comprises performing motion compensation in the lower layer direction.
The method of claim 2,

wherein the prediction method information includes a motion vector derived from motion prediction performed by an encoder on a decoded lower layer image.
The method of claim 1,

wherein generating the prediction signal comprises generating, as the prediction signal, a reconstructed value of a reference block corresponding to the target block in the lower layer.
The method of claim 1,

wherein generating the prediction signal comprises performing motion compensation on a reference picture in the same layer as the target block and on the reconstructed image of the layer referenced by the current decoding target block.
The method of claim 5,

wherein generating the prediction signal comprises obtaining a weighted sum of a prediction signal obtained from a forward reference picture and a prediction signal obtained from a lower layer reference picture.
The method of claim 5,

wherein generating the prediction signal comprises obtaining a weighted sum of a prediction signal obtained from a backward reference picture and a prediction signal obtained from a lower layer reference picture.
The method of claim 5,

wherein generating the prediction signal comprises obtaining a weighted sum of a prediction signal obtained from a forward reference picture, a prediction signal obtained from a backward reference picture, and a prediction signal obtained from a lower layer reference picture.
The method of claim 5,

wherein generating the prediction signal comprises obtaining a weighted sum of a prediction signal obtained from a reference sample included in a reconstructed neighboring block adjacent to the target block and a prediction signal obtained from a lower layer reference picture.
The method of claim 1,

wherein the prediction method information indicates, as the prediction method of the target block, one of an intra prediction method, an inter prediction method, a lower layer direction prediction method, and a prediction method using reconstructed reference pictures of both the same layer and a lower layer.
A video decoding apparatus supporting a plurality of layers, comprising:

a receiving unit that receives prediction method information on a prediction method of a target block to be decoded; and

a prediction unit that generates a prediction signal of the target block based on the received information,

wherein the prediction method information indicates that the target block is predicted using a reconstructed lower layer.
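The apparatus consists of two units, and each can be sketched as a small class. Everything concrete here is an assumption. The prediction method information is reduced to a hypothetical one-bit flag read from a symbol iterator. The lower-layer path copies the co-located reconstructed block and assumes both layers have equal resolution.

```python
from dataclasses import dataclass
from typing import Iterator, List

Block = List[List[int]]


@dataclass
class PredictionMethodInfo:
    uses_lower_layer: bool  # hypothetical one-bit flag per target block


class ReceivingUnit:
    """Receives prediction method information, one symbol per target block."""

    def __init__(self, symbols: Iterator[int]):
        self._symbols = symbols

    def receive(self) -> PredictionMethodInfo:
        return PredictionMethodInfo(uses_lower_layer=next(self._symbols) == 1)


class PredictionUnit:
    """Generates the prediction signal of a target block from the received info."""

    def __init__(self, lower_layer_recon: Block):
        self.lower_layer_recon = lower_layer_recon

    def predict(self, info: PredictionMethodInfo, bx: int, by: int,
                width: int, height: int, same_layer_pred: Block) -> Block:
        if info.uses_lower_layer:
            # Co-located reconstructed lower-layer block (equal resolution assumed).
            return [row[bx:bx + width]
                    for row in self.lower_layer_recon[by:by + height]]
        return same_layer_pred


# Usage: two blocks, the first predicted from the lower layer.
receiver = ReceivingUnit(iter([1, 0]))
predictor = PredictionUnit([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(predictor.predict(receiver.receive(), 1, 1, 2, 2, [[0, 0], [0, 0]]))  # [[5, 6], [8, 9]]
print(predictor.predict(receiver.receive(), 1, 1, 2, 2, [[0, 0], [0, 0]]))  # [[0, 0], [0, 0]]
```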
The apparatus of claim 11,

wherein the prediction unit performs motion compensation in the lower-layer direction.
The apparatus of claim 12,

wherein the prediction method information includes a motion vector derived from motion prediction performed by the encoder on a decoded lower-layer image.
The apparatus of claim 11,

wherein the prediction unit generates, as the prediction signal, a reconstructed value of a reference block in the lower layer that corresponds to the target block.
The apparatus of claim 11,

wherein the prediction unit performs motion compensation on a reference picture in the same layer as the target block and on a reconstructed picture of the layer referenced by the current decoding target block.
The apparatus of claim 15,

wherein the prediction unit obtains a weighted sum of a prediction signal obtained from a forward reference picture and a prediction signal obtained from a lower-layer reference picture.
The apparatus of claim 15,

wherein the prediction unit obtains a weighted sum of a prediction signal obtained from a backward reference picture and a prediction signal obtained from a lower-layer reference picture.
The apparatus of claim 15,

wherein the prediction unit obtains a weighted sum of a prediction signal obtained from a forward reference picture, a prediction signal obtained from a backward reference picture, and a prediction signal obtained from a lower-layer reference picture.
The apparatus of claim 15,

wherein the prediction unit obtains a weighted sum of a prediction signal obtained from reference samples included in a reconstructed neighboring block adjacent to the target block and a prediction signal obtained from a lower-layer reference picture.
The apparatus of claim 11,

wherein the prediction method information indicates, as the prediction method of the target block, one of an intra prediction method, an inter prediction method, a lower-layer-direction prediction method, and a prediction method using reconstructed reference pictures of both the same layer and a lower layer.