WO2013147455A1

WO2013147455A1 - Inter-layer prediction method and apparatus using same

Info

Publication number: WO2013147455A1
Application number: PCT/KR2013/002331
Authority: WO
Inventors: 박준영; 전병문; 박승욱; 임재현; 전용준; 박내리; 김철근
Original assignee: 엘지전자 주식회사
Priority date: 2012-03-29
Filing date: 2013-03-21
Publication date: 2013-10-03

Abstract

An inter-layer prediction method according to the present invention comprises the steps of: generating a differential image for a reference picture by acquiring a difference between a reconstructed image for an enhancement layer of the reference picture to which a current picture refers and an upsampled image of a reconstructed image for a base layer of the reference picture; clipping a pixel value of the differential image that is outside a preset clipping range to the clipping range, such that the pixel values of the differential image fall within the clipping range; and inter-predicting the current picture using the clipped differential image. As a result, the present invention provides an inter-layer prediction method, and an apparatus using the same, that can reduce storage capacity by storing the image in consideration of the increased bit depth.

Description

Interlayer prediction method and apparatus using same

TECHNICAL FIELD The present invention relates to video compression technology, and more particularly, to a method and apparatus for storing image information in scalable video coding.

Recently, the demand for high resolution and high quality images is increasing in various applications. As an image has a high resolution and high quality, the amount of information on the image also increases.

As information volume increases, devices with various performances and networks with various environments are emerging. With the emergence of devices of varying performance and networks of different environments, the same content is available in different qualities.

In detail, as the video quality that terminal devices can support and the network environments become more diverse, video of ordinary quality may be used in one environment, while higher-quality video may be used in another.

For example, a consumer who purchases video content on a mobile terminal can view the same video content on a larger screen and at a higher resolution through a large display in the home.

In recent years, broadcasts with high definition (HD) resolution have been in service, and many users are already accustomed to high-resolution, high-quality video. Service providers are also showing interest in Ultra High Definition (UHD) services, which offer more than four times the resolution of HDTV.

Therefore, in order to provide the video services that users require in various environments at the appropriate quality, it is necessary to provide scalability in video quality, for example, in image quality, image resolution, image size, and video frame rate, based on high-efficiency encoding/decoding methods for high-capacity video. In addition, various image processing methods associated with such scalability should be discussed.

An embodiment of the present invention provides an interlayer prediction method for clipping information on a differential image, and an apparatus using the same.

In addition, an embodiment of the present invention provides an interlayer prediction method and an apparatus using the same, which can reduce storage capacity by storing an image in consideration of an increased bit depth.

In addition, an embodiment of the present invention provides a prediction method and an apparatus using the same that can perform inter-layer prediction without changing the bit depth of the residual by clipping the pixel value of the differential image.

An embodiment of the present invention includes: generating a difference image for a reference picture by obtaining a difference between a reconstructed image of an enhancement layer of the reference picture referenced by a current picture and an image upsampled from a reconstructed image of a base layer of the reference picture; clipping a pixel value of the difference image that is outside a clipping range to the clipping range, such that the range of pixel values of the difference image is the preset clipping range; and performing inter prediction of the current picture using the clipped difference image.

The bit depth of the pixel value of the clipped differential image may be equal to or less than the bit depth of the pixel value of the reconstructed image with respect to the enhancement layer.

If the bit depth of the pixel values of the reconstructed image is n and the range of the pixel values of the difference image is -((1 << n) - 1) to ((1 << n) - 1), the clipping step may clip the pixel values such that the range of pixel values of the difference image is -(1 << (n - number of clipping bits)) to ((1 << (n - number of clipping bits)) - 1), where the number of clipping bits may be the number of bits by which the bit depth of the pixel values of the difference image is to be reduced.

The clipping step may clip pixel values of the difference image in the range of -((1 << n) - 1) to -((1 << (n - number of clipping bits)) + 1) to -(1 << (n - number of clipping bits)), and may clip pixel values of the difference image in the range of (1 << (n - number of clipping bits)) to ((1 << n) - 1) to ((1 << (n - number of clipping bits)) - 1).

The method may further include adding a preset offset to the pixel values of the difference image such that the range of the pixel values of the clipped difference image is shifted to 0 to ((1 << (n - number of clipping bits + 1)) - 1).
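As a minimal sketch of the clipping and offset steps described above, the following Python functions clip one difference-image sample and then shift it into a non-negative range. The function names are hypothetical, and the offset value 1 << (n - number of clipping bits) is an assumption made for illustration; the text only states that the offset is preset.

```python
def clip_difference(value: int, n: int, clipping_bits: int) -> int:
    """Clip a sample to [-(1 << (n - c)), (1 << (n - c)) - 1], c = clipping_bits."""
    half = 1 << (n - clipping_bits)
    return max(-half, min(half - 1, value))


def clip_and_offset(value: int, n: int, clipping_bits: int) -> int:
    """Clip, then add an offset so the result lies in [0, (1 << (n - c + 1)) - 1].

    The offset 1 << (n - c) is an assumed value that maps the lower clipping
    bound to 0; the source only calls it a preset offset.
    """
    offset = 1 << (n - clipping_bits)
    return clip_difference(value, n, clipping_bits) + offset
```

For example, with n = 8 and one clipping bit, the clipped range is -128 to 127 and the shifted range is 0 to 255, so each stored sample fits in 8 bits.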

If the bit depth of the pixel values of the reconstructed image is n and the range of the pixel values of the difference image is -((1 << n) - 1) to ((1 << n) - 1), the clipping step may clip the pixel values such that the range of the pixel values of the difference image is -((1 << (n - number of clipping bits)) - 1) to (1 << (n - number of clipping bits)).

The clipping step may clip pixel values of the difference image in the range of -((1 << n) - 1) to -(1 << (n - number of clipping bits)) to -((1 << (n - number of clipping bits)) - 1), and may clip pixel values of the difference image in the range of ((1 << (n - number of clipping bits)) + 1) to ((1 << n) - 1) to (1 << (n - number of clipping bits)).
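The alternative clipping range described above, which extends one step further on the positive side instead of the negative side, can be sketched as follows. The function name is hypothetical and the code is only an illustration of the stated bounds.

```python
def clip_difference_alt(value: int, n: int, clipping_bits: int) -> int:
    """Clip a sample to [-((1 << (n - c)) - 1), 1 << (n - c)], c = clipping_bits."""
    half = 1 << (n - clipping_bits)
    return max(-(half - 1), min(half, value))
```

With n = 8 and one clipping bit, the resulting range is -127 to 128, which still contains 256 distinct values.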

The method may further include storing the clipped difference image.

According to an embodiment of the present invention, an interlayer prediction method for clipping information on a differential image and an apparatus using the same are provided.

According to an embodiment of the present invention, there is provided an interlayer prediction method capable of reducing storage capacity by storing an image in consideration of an increased bit depth, and an apparatus using the same.

According to an embodiment of the present invention, there is provided a prediction method and an apparatus using the same that can perform interlayer prediction without changing the bit depth of the residual by clipping the pixel value of the differential image.

FIG. 1 is a block diagram schematically illustrating a video encoding apparatus supporting scalability according to an embodiment of the present invention.

FIG. 2 is a block diagram schematically illustrating a video decoding apparatus supporting scalability according to an embodiment of the present invention.

FIG. 3 is a diagram schematically illustrating a method of performing intra prediction when an inter-layer difference mode is applied according to the present invention.

FIG. 4 is a diagram for describing clipping of a differential image, according to an exemplary embodiment.

FIG. 5 is a diagram for describing clipping of a difference image, according to another exemplary embodiment.

FIG. 6 is a diagram for describing clipping of a differential image, according to another exemplary embodiment.

FIG. 7 is a control flowchart illustrating a clipping method of a differential image according to the present invention.

As the present invention allows for various changes and numerous embodiments, particular embodiments will be illustrated in the drawings and described in detail in the written description. However, this is not intended to limit the invention to the specific embodiments. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the spirit of the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification exists, and should be understood not to exclude in advance the possibility of the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Meanwhile, the components in the drawings described in the present invention are shown independently for convenience in describing their different characteristic functions in the video encoding apparatus / decoding apparatus; this does not mean that each component is implemented as separate hardware or separate software. For example, two or more components may be combined into one component, or one component may be divided into a plurality of components. Embodiments in which components are integrated and / or separated are also included in the scope of the present invention, provided that they do not depart from its spirit.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The same reference numerals are used for the same components in the drawings, and redundant descriptions of the same components are omitted.

In a video coding method supporting scalability (hereinafter referred to as 'scalable coding'), input signals may be processed for each layer. Depending on the layer, the input signals (input images) may differ in at least one of resolution, frame rate, bit depth, color format, and aspect ratio.

In the present specification, scalable coding includes scalable encoding and scalable decoding.

In scalable encoding / decoding, prediction between layers is performed by using differences between layers, that is, based on scalability, thereby reducing overlapping transmission / processing of information and increasing compression efficiency.

Referring to FIG. 1, the encoding apparatus 100 includes an encoding unit 105 for layer 1 and an encoding unit 135 for layer 0.

Layer 0 may be a base layer, a reference layer, or a lower layer, and layer 1 may be an enhancement layer, a current layer, or an upper layer.

The encoding unit 105 of the layer 1 includes a prediction unit 110, a transform / quantization unit 115, a filtering unit 120, a memory 125, an entropy coding unit 130, and a multiplexer (MUX) 165.

The encoding unit 135 of the layer 0 includes a prediction unit 140, a transform / quantization unit 145, a filtering unit 150, a memory 155, and an entropy coding unit 160.

The prediction units 110 and 140 may perform inter prediction and intra prediction on the input image. The prediction units 110 and 140 may perform prediction in predetermined processing units. The processing unit for prediction may be a coding unit (CU), a prediction unit (PU), or a transform unit (TU).

For example, the prediction units 110 and 140 may determine whether to apply inter prediction or intra prediction on a CU basis, determine the prediction mode on a PU basis, and perform prediction on a PU or TU basis. The prediction performed includes generation of a prediction block and generation of a residual block (residual signal).

Through inter prediction, a prediction block may be generated by performing prediction based on information of at least one picture of a previous picture and / or a subsequent picture of the current picture. Through intra prediction, prediction blocks may be generated by performing prediction based on pixel information in a current picture.

As a mode or method of inter prediction, there are a skip mode, a merge mode, a motion vector predictor (MVP) mode method, and the like. In inter prediction, a reference picture may be selected with respect to the current PU that is a prediction target, and a reference block corresponding to the current PU may be selected within the reference picture. The prediction unit 110 may generate a prediction block based on the reference block.

The prediction block may be generated in integer sample units or in sub-integer (fractional) sample units. In this case, the motion vector may also be expressed in integer sample units or in sub-integer sample units.

In inter prediction, motion information, that is, information such as an index of a reference picture, a motion vector, and a residual signal, is entropy encoded and transmitted to a decoding apparatus. When the skip mode is applied, residuals may not be generated, transformed, quantized, or transmitted.

In intra prediction, there may be 33 directional prediction modes and at least two non-directional modes. The non-directional modes may include a DC prediction mode and a planar mode. In intra prediction, a prediction block may be generated after a filter is applied to the reference samples.

The PU may be a block of various sizes / types, for example, in the case of inter prediction, the PU may be a 2N × 2N block, a 2N × N block, an N × 2N block, an N × N block (N is an integer), or the like. In the case of intra prediction, the PU may be a 2N × 2N block or an N × N block (where N is an integer). In this case, the PU of the N × N block size may be set to apply only in a specific case. For example, the NxN block size PU may be used only for the minimum size CU or only for intra prediction. In addition to the above-described PUs, PUs such as N × mN blocks, mN × N blocks, 2N × mN blocks, or mN × 2N blocks (m <1) may be further defined and used.

In addition, the prediction units 110 and 140 may perform prediction on the layer 1 by using the information of the layer 0. In the present specification, a method of predicting information of a current layer using information of another layer is referred to as inter-layer prediction for convenience of description.

Information of the current layer that is predicted using information of another layer (ie, predicted by inter-layer prediction) may include texture, motion information, unit information, predetermined parameters (eg, filtering parameters, etc.).

In addition, information of another layer used for prediction for the current layer (ie, used for inter-layer prediction) may include texture, motion information, unit information, and predetermined parameters (eg, filtering parameters).

As an example of inter-layer prediction, inter-layer motion prediction is also referred to as inter-layer inter prediction. According to inter-layer inter prediction, prediction of a current block of layer 1 (current layer or enhancement layer) may be performed using motion information of layer 0 (reference layer or base layer).

In case of applying inter-layer inter prediction, motion information of a reference layer may be scaled.
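One way to picture the scaling of reference-layer motion information is to multiply each motion vector component by the resolution ratio between the layers. The sketch below assumes exactly that; the function name, the per-axis ratio, and the rounding rule are illustrative assumptions, since the text only states that the motion information may be scaled.

```python
from fractions import Fraction


def scale_motion_vector(mv, ref_size, cur_size):
    """Scale a reference-layer motion vector (mvx, mvy) to the current layer.

    ref_size and cur_size are (width, height) of the reference and current
    layer pictures. Scaling by the per-axis resolution ratio is an assumption.
    """
    mvx, mvy = mv
    ref_w, ref_h = ref_size
    cur_w, cur_h = cur_size
    scale_x = Fraction(cur_w, ref_w)
    scale_y = Fraction(cur_h, ref_h)
    # round() on a Fraction returns the nearest integer (ties go to even).
    return (round(mvx * scale_x), round(mvy * scale_y))
```

For 2x spatial scalability (for example 960x540 to 1920x1080), a base-layer vector (4, -6) would map to (8, -12) under this assumption.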

As another example of inter-layer prediction, inter-layer texture prediction is also called inter-layer intra prediction or intra base layer (BL) prediction. Inter layer texture prediction may be applied when a reference block in a reference layer is reconstructed by intra prediction.

In inter-layer intra prediction, the texture of the reference block in the reference layer may be used as a prediction value for the current block of the enhancement layer. In this case, the texture of the reference block may be scaled by upsampling.
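The idea of using an upsampled reference-layer texture as the prediction can be sketched as follows. Nearest-neighbour repetition with an integer factor is used here only as a simple stand-in; the source does not specify the upsampling filter, and practical codecs typically use interpolation filters. All names are hypothetical.

```python
def upsample_nearest(block, factor):
    """Upsample a 2D list of samples by an integer factor (nearest neighbour)."""
    out = []
    for row in block:
        # Repeat each sample horizontally, then repeat the widened row vertically.
        wide = [sample for sample in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def inter_layer_intra_prediction(ref_block, factor):
    """Use the upsampled reference-layer texture as the prediction block."""
    return upsample_nearest(ref_block, factor)
```

The residual for the enhancement-layer block would then be formed against this prediction block in the same way as for other prediction modes.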

As another example of inter-layer prediction, inter-layer unit parameter prediction derives unit (CU, PU, and / or TU) information of the base layer and uses it as unit information of the enhancement layer, or determines unit information of the enhancement layer based on unit information of the base layer.

In addition, the unit information may include information at each unit level. For example, CU information may include information on partitions (CU, PU, and / or TU), information on transform, information on prediction, and information on coding. PU information may include information on the PU partition and information on prediction (eg, motion information, information on a prediction mode, etc.). TU information may include information on the TU partition and information on transform (transform coefficients, transform method, etc.).

In addition, the unit information may include only the partition information of the processing unit (eg, CU, PU, TU, etc.).

In another example of inter-layer prediction, inter-layer parameter prediction may derive a parameter used in the base layer to reuse it in the enhancement layer or predict a parameter for the enhancement layer based on the parameter used in the base layer.

As an example of interlayer prediction, interlayer texture prediction, interlayer motion prediction, interlayer unit information prediction, and interlayer parameter prediction have been described. However, the interlayer prediction applicable to the present invention is not limited thereto.

For example, the prediction unit may use, as inter-layer prediction, inter-layer residual prediction, which predicts the residual of the current layer using residual information of another layer and performs prediction on the current block in the current layer based on the predicted residual.

In addition, the prediction unit may use, as inter-layer prediction, inter-layer difference prediction, which performs prediction on the current block in the current layer by using a difference (difference image) between the reconstructed picture of the current layer and the resampled picture of another layer.

In addition, the prediction unit may use, as inter-layer prediction, inter-layer syntax prediction, which predicts or generates the texture of the current block using syntax information of another layer. In this case, the syntax information of the reference layer used for prediction of the current block may be information about an intra prediction mode, motion information, and the like.

In this case, inter-layer syntax prediction may be performed by referring to the intra prediction mode of a block in the reference layer to which an intra prediction mode is applied, and by referring to the motion information of a block to which an inter prediction mode is applied.

For example, even if the reference layer is a P slice or a B slice, a reference block in the slice may be a block to which an intra prediction mode is applied. In this case, when inter-layer syntax prediction is applied, inter-layer prediction may be performed to generate / predict a texture for the current block by using the intra prediction mode of the reference block among the syntax information of the reference layer.

The transform / quantization units 115 and 145 may transform the residual block in transform block units to generate transform coefficients, and may quantize the transform coefficients.

The transform block is a block of samples and is a block to which the same transform is applied. The transform block can be a transform unit (TU) and can have a quad tree structure.

The transform / quantization units 115 and 145 may generate a 2D array of transform coefficients by performing a transform according to the prediction mode applied to the residual block and the size of the block. For example, if intra prediction is applied to a residual block and the block is a 4x4 residual array, the residual block may be transformed using a discrete sine transform (DST); otherwise, the residual block may be transformed using a discrete cosine transform (DCT).
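The selection rule in the example above can be written as a small decision function. This is only a sketch of that one rule; the function name and the string labels are hypothetical, and it does not implement the transforms themselves.

```python
def select_transform(is_intra: bool, width: int, height: int) -> str:
    """Return "DST" for a 4x4 intra-predicted residual block, otherwise "DCT"."""
    if is_intra and width == 4 and height == 4:
        return "DST"
    return "DCT"
```

Because the rule depends only on the prediction mode and the block size, a decoder can apply the same function to choose the matching inverse transform without extra signalling for this case.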

The transform / quantization units 115 and 145 may quantize the transform coefficients to generate quantized transform coefficients.

The transform / quantization units 115 and 145 may transfer the quantized transform coefficients to the entropy coding units 130 and 160. In this case, the transform / quantization units 115 and 145 may rearrange the two-dimensional array of quantized transform coefficients into a one-dimensional array according to a predetermined scan order and transfer it to the entropy coding units 130 and 160. In addition, the transform / quantization units 115 and 145 may transfer the reconstructed block generated based on the residual and the prediction block to the filtering units 120 and 150 for inter prediction.

Meanwhile, the transform / quantization units 115 and 145 may skip the transform and perform only quantization, or may omit both transform and quantization, as necessary. For example, the transform / quantization unit 115 or 145 may omit the transform for a block having a specific prediction method, a block of a specific size, or a block of a specific size to which a specific prediction method is applied.

The entropy coding units 130 and 160 may perform entropy encoding on the quantized transform coefficients. Entropy encoding may use, for example, Exponential Golomb coding or Context-Adaptive Binary Arithmetic Coding (CABAC).

The filtering units 120 and 150 may apply a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) to the reconstructed picture.

The deblocking filter may remove distortion generated at the boundary between blocks in the reconstructed picture. The adaptive loop filter (ALF) may perform filtering based on a value obtained by comparing the reconstructed image with the original image after the block is filtered through the deblocking filter. The SAO restores the offset difference from the original image on a pixel-by-pixel basis to the residual block to which the deblocking filter is applied, and is applied in the form of a band offset and an edge offset.

Rather than applying all of the deblocking filter, the ALF, and the SAO, the filtering units 120 and 150 may apply only the deblocking filter, only the deblocking filter and the ALF, or only the deblocking filter and the SAO.

The memories 125 and 155 may receive the reconstructed block or the reconstructed picture from the filtering units 120 and 150 and store it. The memories 125 and 155 may provide the reconstructed blocks or pictures to the prediction units 110 and 140 that perform inter prediction.

Information output from the entropy coding unit 160 of layer 0 and information output from the entropy coding unit 130 of layer 1 may be multiplexed by the MUX 185 and output as a bitstream.

Meanwhile, for convenience of description, the encoding unit 105 of the layer 1 has been described as including the MUX 165. However, the MUX may be a device or module separate from the encoding unit 105 of the layer 1 and the encoding unit 135 of the layer 0.

FIG. 2 is a block diagram illustrating an example of interlayer prediction in an encoding apparatus that performs scalable coding according to the present invention.

Referring to FIG. 2, the decoding apparatus 200 includes a decoding unit 210 for layer 1 and a decoding unit 250 for layer 0.

The decoding unit 210 of the layer 1 may include an entropy decoding unit 215, a reordering unit 220, an inverse quantization unit 225, an inverse transform unit 230, a prediction unit 235, a filtering unit 240, and a memory 245.

The decoding unit 250 of the layer 0 may include an entropy decoding unit 255, a reordering unit 260, an inverse quantization unit 265, an inverse transform unit 270, a filtering unit 280, and a memory 285.

When the bitstream including the image information is transmitted from the encoding device, the DEMUX 205 may demultiplex the information for each layer and deliver the information to the decoding device for each layer.

The entropy decoding units 215 and 255 may perform entropy decoding corresponding to the entropy coding scheme used in the encoding apparatus. For example, when CABAC is used in the encoding apparatus, the entropy decoding units 215 and 255 may also perform entropy decoding using CABAC.

Among the information decoded by the entropy decoding units 215 and 255, information for generating a prediction block is provided to the prediction units 235 and 275, and the residual values entropy-decoded by the entropy decoding units 215 and 255, that is, the quantized transform coefficients, may be input to the reordering units 220 and 260.

The reordering units 220 and 260 may rearrange the bitstream information entropy-decoded by the entropy decoding units 215 and 255, that is, the quantized transform coefficients, based on the reordering method used in the encoding apparatus.

For example, the reordering units 220 and 260 may rearrange the quantized transform coefficients of a one-dimensional array into coefficients of a two-dimensional array. The reordering units 220 and 260 may generate a two-dimensional array of coefficients (quantized transform coefficients) by performing scanning based on the prediction mode applied to the current block (transform block) and / or the size of the transform block.
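The rearrangement between a two-dimensional coefficient block and a one-dimensional array can be sketched as follows. An anti-diagonal scan of a square block is assumed here purely as one possible predetermined order; the text does not fix the scan, and the function names are hypothetical.

```python
def diagonal_scan_order(size):
    """Return (row, col) positions of a size x size block in anti-diagonal order."""
    order = []
    for d in range(2 * size - 1):
        for row in range(size):
            col = d - row
            if 0 <= col < size:
                order.append((row, col))
    return order


def block_to_vector(block):
    """Encoder side: flatten a square 2D block into a 1D list along the scan."""
    return [block[r][c] for r, c in diagonal_scan_order(len(block))]


def vector_to_block(vector, size):
    """Decoder side: rebuild the square 2D block from the 1D list."""
    block = [[0] * size for _ in range(size)]
    for value, (r, c) in zip(vector, diagonal_scan_order(size)):
        block[r][c] = value
    return block
```

As long as the encoder and decoder use the same scan order, `vector_to_block(block_to_vector(b), len(b))` returns the original block.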

The inverse quantization units 225 and 265 may generate transform coefficients by performing inverse quantization based on the quantization parameter provided by the encoding apparatus and the coefficient values of the rearranged block.

The inverse transform units 230 and 270 may perform the inverse of the transform performed by the transform unit of the encoding apparatus. The inverse transform units 230 and 270 may perform an inverse DCT and / or an inverse DST corresponding to the discrete cosine transform (DCT) and the discrete sine transform (DST) performed by the encoding apparatus.

The DCT and / or DST in the encoding apparatus may be selectively performed according to several pieces of information, such as the prediction method, the size of the current block, and the prediction direction, and the inverse transform units 230 and 270 of the decoding apparatus may perform the inverse transform based on the transform information used in the encoding apparatus.

For example, the inverse transform units 230 and 270 may apply an inverse DCT or an inverse DST according to the prediction mode / block size. For example, the inverse transform units 230 and 270 may apply an inverse DST to a 4x4 luma block to which intra prediction is applied.

In addition, the inverse transform units 230 and 270 may use a fixed inverse transform method regardless of the prediction mode / block size. For example, the inverse transform units 230 and 270 may apply only the inverse DST to all transform blocks. Alternatively, the inverse transform units 230 and 270 may apply only the inverse DCT to all transform blocks.

The inverse transform units 230 and 270 may generate a residual or a residual block by inversely transforming the transform coefficients or the block of transform coefficients.

The inverse transform units 230 and 270 may also skip the transform as needed or in accordance with the manner of encoding in the encoding apparatus. For example, the inverse transform units 230 and 270 may omit the transform for a block having a specific prediction method, a block of a specific size, or a block of a specific size to which a specific prediction method is applied.

The prediction units 235 and 275 may generate a prediction block for the current block based on the prediction block generation information transmitted from the entropy decoding units 215 and 255 and on previously decoded block and / or picture information provided by the memories 245 and 285.

When the prediction mode for the current block is an intra prediction mode, the prediction units 235 and 275 may perform intra prediction on the current block based on pixel information in the current picture.

When the prediction mode for the current block is an inter prediction mode, the prediction units 235 and 275 may perform inter prediction on the current block based on information included in at least one of a previous picture or a subsequent picture of the current picture. Some or all of the motion information required for inter prediction may be derived based on the information received from the encoding apparatus.

When the skip mode is applied as the inter prediction mode, no residual is transmitted from the encoding apparatus, and the prediction block may be used as the reconstructed block.

Meanwhile, the prediction unit 235 of layer 1 may perform inter prediction or intra prediction using only information in layer 1, or may perform inter layer prediction using information of another layer (layer 0).

For example, the prediction unit 235 of the layer 1 may perform prediction on the current block by using one of the motion information of the layer 1, the texture information of the layer 1, the unit information of the layer 1, and the parameter information of the layer 1. In addition, the prediction unit 235 of the layer 1 may perform prediction on the current block by using a plurality of pieces of information among the motion information of the layer 1, the texture information of the layer 1, the unit information of the layer 1, and the parameter information of the layer 1.

The prediction unit 235 of the layer 1 may receive motion information of the layer 1 from the prediction unit 275 of the layer 0 to perform motion prediction. Inter-layer motion prediction is also called inter-layer inter prediction. By inter-layer motion prediction, prediction of a current block of the current layer (enhancement layer) may be performed using motion information of the reference layer (base layer). The prediction unit 235 may scale and use motion information of the reference layer when necessary.

The predictor 235 of the layer 1 may receive texture information of the layer 1 from the predictor 275 of the layer 0 to perform texture prediction. Texture prediction is also called inter layer intra prediction or intra base layer (BL) prediction. Texture prediction may be applied when the reference block of the reference layer is reconstructed by intra prediction. In inter-layer intra prediction, the texture of the reference block in the reference layer may be used as a prediction value for the current block of the enhancement layer. In this case, the texture of the reference block may be scaled by upsampling.

The predictor 235 of the layer 1 may receive unit parameter information of the layer 1 from the predictor 275 of the layer 0 to perform unit parameter prediction. By unit parameter prediction, unit (CU, PU, and / or TU) information of the base layer may be used as unit information of the enhancement layer, or unit information of the enhancement layer may be determined based on unit information of the base layer.

The predictor 235 of the layer 1 may receive parameter information regarding the filtering of the layer 1 from the predictor 275 of the layer 0 to perform parameter prediction. By parameter prediction, the parameters used in the base layer can be derived and reused in the enhancement layer, or the parameters for the enhancement layer can be predicted based on the parameters used in the base layer.

The adders 290 and 295 may generate reconstructed blocks using the prediction blocks generated by the prediction units 235 and 275 and the residual blocks generated by the inverse transform units 230 and 270. In this case, the adders 290 and 295 may be regarded as separate units (reconstructed block generation units) that generate the reconstructed blocks.
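The operation of the adders can be sketched as a sample-wise sum of the prediction block and the residual block. Clipping the sum to the valid sample range for the given bit depth is an assumption added here for illustration, as are the function name and the default bit depth of 8.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add a residual block to a prediction block, sample by sample.

    Results are clipped to [0, (1 << bit_depth) - 1]; this clipping is an
    assumption for the sketch and is not stated in the text.
    """
    max_value = (1 << bit_depth) - 1
    return [
        [max(0, min(max_value, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```

When the skip mode is applied there is no residual, which corresponds to calling this function with an all-zero residual block.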

Blocks and / or pictures reconstructed by the adders 290 and 295 may be provided to the filtering units 240 and 280.

Referring to the example of FIG. 2, the filtering unit 240 of the layer 1 may also filter the reconstructed picture by using parameter information transmitted from the predictor 235 of the layer 1 and / or the filtering unit 280 of the layer 0. For example, in layer 1, the filtering unit 240 may apply filtering to layer 1 or between layers using parameters predicted from the parameters of the filtering applied in the layer 0.

The memories 245 and 285 may store the reconstructed picture or block for use as a reference picture or reference block. The memories 245 and 285 may output the stored reconstructed picture through a predetermined output unit (not shown) or a display (not shown).

In the example of FIG. 2, the reordering unit, the inverse quantization unit, and the inverse transform unit have been described as separate units. However, as in the encoding apparatus of FIG. 1, the decoding apparatus may also be configured with these units combined into a single module.

In the examples of FIGS. 1 and 2, a single prediction unit has been described, but for better understanding, the prediction unit of layer 1 may also be regarded as including an inter-layer prediction unit that performs prediction using information of another layer (layer 0) and an inter / intra prediction unit that performs prediction without using information of another layer (layer 0).

As described above, in scalable video coding, inter-layer prediction for predicting information of a current layer using information of another layer may be performed.

Hereinafter, inter-layer difference prediction is examined. In this form of inter-layer prediction, the current block in the current layer is predicted by using a difference (difference image) between the reconstructed image of the current layer and the resampled image of another layer.

A reconstructed image of the base layer is referred to as R_BL, and an image obtained by upsampling R_BL according to the resolution of the enhancement layer is referred to as UR_BL. The reconstructed image of the enhancement layer is referred to as R_EL.

The reconstructed image may be an image before applying the in-loop filtering. Also, the reconstructed image may be an image after applying some of a deblocking filter, a sample adaptive offset filter, and / or an adaptive loop filter. Furthermore, the reconstructed image may be an image after applying all of the in-loop filters.

Here, if the difference image obtained by subtracting UR_BL from R_EL is denoted D, encoding / decoding may be performed independently in the domain of the D images. This method is referred to herein as inter-layer differential picture coding or inter-layer differential mode (IL-Diff mode).
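
As a minimal sketch of how D could be formed, the Python below subtracts an upsampled base-layer picture from an enhancement-layer picture sample by sample. The nearest-neighbour upsampler, the function names `upsample_nearest` and `difference_image`, and the sample values are assumptions made for illustration; the description does not specify an upsampling filter.

```python
def upsample_nearest(picture, scale):
    """Upsample a 2-D list of samples by an integer factor (nearest neighbour).

    Hypothetical stand-in for the upsampling that produces UR_BL from R_BL.
    """
    return [
        [row[x // scale] for x in range(len(row) * scale)]
        for row in picture
        for _ in range(scale)
    ]


def difference_image(r_el, ur_bl):
    """Return D = R_EL - UR_BL, computed sample by sample."""
    return [
        [el - bl for el, bl in zip(el_row, bl_row)]
        for el_row, bl_row in zip(r_el, ur_bl)
    ]


# Illustrative 8-bit samples: a 2x2 base layer and a 4x4 enhancement layer.
r_bl = [[100, 200],
        [50, 0]]
r_el = [[102, 99, 198, 255],
        [101, 100, 201, 200],
        [52, 49, 0, 3],
        [50, 51, 255, 0]]

ur_bl = upsample_nearest(r_bl, 2)
d = difference_image(r_el, ur_bl)
```

Most entries of `d` stay close to 0, while a few (such as the 255 in the last row) reach the extremes of the signed range.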

The inter-layer difference mode may be applied in a sequence unit, picture unit, slice unit, LCU (Largest CU) unit, coding unit (CU) unit, or prediction unit (PU) unit. In a processing unit to which the interlayer differential mode is to be applied, a flag indicating whether to use the interlayer differential mode may be transmitted from the encoding apparatus to the decoding apparatus.

For the processing unit to which the inter-layer difference mode is applied, coding (encoding / decoding) using other scalability may not be performed, but only a single layer coding (encoding / decoding) method may be used. In this case, bits for indicating whether to perform coding using other scalability can be saved.

The inter-layer difference mode may be performed by the predictors 110 and 235 of FIGS. 1 and 2.

(1) intra prediction for IL-Diff mode

Referring to FIG. 3, the current picture 300 of the enhancement layer includes a reconstructed region 305 and an unreconstructed region 310 around the current block 315. The reconstructed image R_EL can be obtained from the reconstructed region 305. When reconstruction of the current picture 300 is completed, the whole picture becomes the reconstructed image R_EL.

Meanwhile, the image UR_BL 325, obtained by upsampling the reconstructed image R_BL 320 of the base layer, includes a block P_BL 330 corresponding to the current block 315.

In the encoding process, the prediction unit (the prediction unit of the encoding apparatus) may derive the difference D between the reconstructed image of the base layer and the reconstructed image of the enhancement layer as shown in Equation 1.

D = R_EL - UR_BL    (Equation 1)

In Equation 1, because the region 310 has not been reconstructed yet, R_EL may be an image to which an in-loop filter such as a deblocking filter, SAO, or ALF has not been applied.

Since all regions of the reconstructed image 320 of the base layer are already reconstructed, R_BL may be a reconstructed image to which the in-loop filters are applied, a reconstructed image to which some of the in-loop filters are applied, or a reconstructed image to which no in-loop filter is applied.

The predictor can perform intra prediction on the current block 355 in the differential image D 340 with reference to the pixel values of the reconstructed region 345, excluding the unreconstructed region 350 of the differential image D 340.

In the decoding process, the prediction unit (the prediction unit of the decoding apparatus) may reconstruct the current block by using the block P_BL located in UR_BL at the same position as the current block, as shown in Equation 2.

R_EL = P_D + P_BL + RES    (Equation 2)

In Equation 2, P_D is a prediction block generated by performing intra prediction from the reconstructed region of the differential image D, and RES is a residual block.
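
The relationship between Equation 1 and Equation 2 can be checked with a small round trip. The sketch below assumes a lossless residual (no transform or quantization), and the helper names and block values are hypothetical. An encoder-side residual is formed in the differential domain, and the decoder-side sum P_D + P_BL + RES recovers the block.

```python
def add_blocks(*blocks):
    """Element-wise sum of equally sized 2-D blocks."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*blocks)]


def sub_blocks(a, b):
    """Element-wise difference a - b of two equally sized 2-D blocks."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


original = [[120, 130], [125, 128]]  # current enhancement-layer block (illustrative)
p_bl = [[118, 127], [126, 126]]      # co-located block in UR_BL
p_d = [[1, 2], [0, 1]]               # prediction block in the differential domain

# Encoder side (assumed lossless): RES = (original - P_BL) - P_D
res = sub_blocks(sub_blocks(original, p_bl), p_d)

# Decoder side, Equation 2: R_EL = P_D + P_BL + RES
r_el = add_blocks(p_d, p_bl, res)
```

With a lossy residual the reconstruction would differ from the original by the coding error of RES.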

(2) inter prediction for IL-Diff mode

When the inter-layer difference mode is applied, the prediction unit generates a differential image D_R for a reference picture of the current picture in order to perform inter prediction on the current block. For example, the prediction unit generates the differential image D_R of the reference picture by using the reconstructed picture of the enhancement layer of the reference picture and the reconstructed picture of the base layer of the reference picture.

The prediction unit may generate the prediction block P_D in the differential image domain of the current block based on the differential image D_R of the reference picture.

The prediction unit may reconstruct the current block by using the prediction block as shown in Equation 3.

R_EL = P_D + P_BL + RES    (Equation 3)

In Equation 3, R_EL is the current block reconstructed in the enhancement layer, P_BL is the block located in UR_BL at the same position as the current block, and RES is a residual block.

The differential picture D_R of the reference picture may be generated in advance and stored in a decoded picture buffer (DPB). The DPB may correspond to the memory described with reference to FIGS. 1 and 2.

Alternatively, D_R may be calculated each time it is needed, from R_EL of the reference picture, only for the block at the position that the motion information of the current block specifies as necessary for reconstructing the current block.
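
The block-wise alternative can be sketched as follows. The function name `diff_block_on_demand`, the integer-pel motion vector, and the assumption that the displaced block lies fully inside the picture are all illustrative simplifications, not details given in the description.

```python
def diff_block_on_demand(r_el_ref, ur_bl_ref, x, y, mv, width, height):
    """Compute only the part of D_R that motion compensation will read.

    (x, y) is the top-left corner of the current block and mv = (mvx, mvy)
    is an integer-pel motion vector (an assumption of this sketch).
    """
    mvx, mvy = mv
    x0, y0 = x + mvx, y + mvy
    return [
        [r_el_ref[j][i] - ur_bl_ref[j][i] for i in range(x0, x0 + width)]
        for j in range(y0, y0 + height)
    ]


# Illustrative 4x4 reference pictures.
r_el_ref = [[10, 11, 12, 13],
            [20, 21, 22, 23],
            [30, 31, 32, 33],
            [40, 41, 42, 43]]
ur_bl_ref = [[9, 9, 9, 9],
             [20, 20, 20, 20],
             [33, 33, 33, 33],
             [40, 40, 40, 40]]

block = diff_block_on_demand(r_el_ref, ur_bl_ref, x=0, y=0, mv=(1, 2), width=2, height=2)

# The same samples taken from a picture-level D_R generated in advance.
full = [[e - b for e, b in zip(er, br)] for er, br in zip(r_el_ref, ur_bl_ref)]
cropped = [row[1:3] for row in full[2:4]]
```

Both routes give the same samples; the trade-off is between memory for a stored D_R and repeated subtraction per block.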

In inter prediction for the inter-layer difference mode, when the differential image for the reference picture is generated, the reference picture reconstructed in the enhancement layer may be a reconstructed image to which no in-loop filter is applied, or a reconstructed image to which some or all of the in-loop filters are applied.

On the other hand, when inter-layer prediction is performed using the differential image, the ranges of pixel values of the prediction block and the residual block differ from those obtained with other prediction methods. For convenience of description, a prediction method other than the inter-layer difference mode is referred to as a normal mode, and the bit depth of the input image is assumed to be 8. The pixel value range and the bit depth of each image block can then be expressed as shown in Table 1 below.

Table 1

If the value of the bit depth of the input video is 8, the range of pixel values is 0 to 255. Accordingly, the range of pixel values of the difference image obtained through the subtraction operation between the reconstructed images is -255 to 255.

As shown in Table 1, in the normal mode, when the input image is an 8-bit signal, the signal of the prediction block to which intra or inter prediction is applied is also an 8-bit signal. The residual for a prediction block having pixel values in the range of 0 to 255 ranges from -255 to 255, so the residual signal is a 9-bit signal, and the signal to be transformed, that is, the pixel value of the residual block to be transformed, is also a 9-bit signal.

On the other hand, in the inter-layer difference mode, the pixel values of a block to which intra or inter prediction is applied, that is, a prediction block of a differential image, range from -255 to 255, and 9 bits are required to process one pixel value.

The pixel values of the residual block and the transform block for the differential image extend over a larger range, from -511 to 511, and processing them requires 10 bits, one more than the bits required to process the prediction block.

In summary, if the bit depth of the input image input to the encoding apparatus is a, the bit depth of the prediction block to which intra or inter prediction is applied and of the reconstructed image is a, but the bit depth of the signal on which residual processing and transformation are performed becomes a + 1.

Meanwhile, when the bit depth of the input image is a, the bit depth of the differential image formed as the difference of reconstructed images for prediction in the differential mode is a + 1, and the bit depth of the signal on which residual processing and transformation of the differential image are performed may be a + 2.
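
The bit counts above can be reproduced with a short calculation. The helper `bits_for_range` and the dictionary keys are hypothetical names; signed ranges are assumed to be held in two's complement, and the IL-Diff residual bound uses the value quoted in the text (511 for an 8-bit input).

```python
def bits_for_range(lo, hi):
    """Smallest number of bits that can represent every integer in [lo, hi].

    Non-negative ranges are treated as unsigned; ranges with negative values
    are treated as two's complement (an assumption of this sketch).
    """
    if lo >= 0:
        return max(1, hi.bit_length())
    bits = 1
    while lo < -(1 << (bits - 1)) or hi > (1 << (bits - 1)) - 1:
        bits += 1
    return bits


def bit_depths(a):
    """Bit depth of each signal for an input bit depth a."""
    max_px = (1 << a) - 1             # 255 when a = 8
    diff_res = (1 << (a + 1)) - 1     # 511 when a = 8, as quoted in the text
    return {
        "normal prediction": bits_for_range(0, max_px),
        "normal residual": bits_for_range(-max_px, max_px),
        "IL-Diff prediction": bits_for_range(-max_px, max_px),
        "IL-Diff residual": bits_for_range(-diff_res, diff_res),
    }


depths_8 = bit_depths(8)
```

For a = 8 this gives 8, 9, 9, and 10 bits, that is, a, a + 1, a + 1, and a + 2.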

Meanwhile, the differential image for the reference picture is pre-generated and stored in a memory such as a DPB during inter prediction for the inter-layer differential image mode.

In general, according to the bit depth of the input image, the pixel values of the reconstructed image are greater than or equal to 0 and less than or equal to ((1 << (bit depth + increased bit depth)) - 1). The reconstructed image may include a reconstructed image of the current picture and an upsampled image of the reconstructed image of the reference picture.

The increased bit depth refers to an increased number of bits than the bit depth of the input image when the extended range of pixel values of the input image is processed. This may be the case when the range of the pixel value may be extended for the purpose of improving the image quality. For example, when the bit depth of the input image is 8, the range of pixel values may be extended by increasing the bit depth of the input image by one or two specific bits.

For convenience of explanation, hereinafter, "bit depth of an input image + increased bit depth", which accounts for a bit depth that may change during image processing, is expressed as the bit depth n of a reconstructed image. If the bit depth is not increased during image processing, the bit depth of the reconstructed image is the same as the bit depth of the input image; if the bit depth is increased during image processing, the bit depth of the reconstructed image is the bit depth of the input image plus the increased bit depth.

If the bit depth of the reconstructed image is n, the bit depth of the difference image formed by the difference between the reconstructed image for the enhancement layer of the reference picture and the reconstructed image for the base layer of the reference picture is n + 1.

If the bit depth of the input image is 8 and the increased bit depth is 0, the range of pixel values of the reconstructed image is 0 to 255, and the range of pixel values of the differential image is -255 to 255, which is a 9-bit signal.

Since the bit depth of the difference image is 1 bit per pixel larger than that of the input image, storing the difference image requires 1 additional bit of memory per pixel compared with storing the input image or the reconstructed image.
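
To give a sense of scale, the extra storage can be estimated for one picture. The 1920x1080 single-plane picture and the tightly packed storage (no padding to byte or word boundaries) are assumptions chosen only for illustration; a real DPB layout may differ.

```python
def picture_bits(width, height, bits_per_sample):
    """Bits needed to store one sample plane, assuming tightly packed samples."""
    return width * height * bits_per_sample


width, height, n = 1920, 1080, 8  # hypothetical picture size and bit depth

reconstructed_bits = picture_bits(width, height, n)
difference_bits = picture_bits(width, height, n + 1)

# One extra bit per pixel, expressed in bytes for the whole plane.
extra_bytes = (difference_bits - reconstructed_bits) // 8
```

Under these assumptions the unclipped difference image costs 259,200 more bytes per plane than the reconstructed image.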

In addition, since the residual of the differential image is a signal 1 bit per pixel larger than the residual in the normal mode, a difference in bit depth arises between processing the residual of the differential image and processing the residual in the normal mode.

On the other hand, since the difference image is an image obtained by subtracting the reconstructed image of the up-sampled base layer from the reconstructed image of the enhancement layer, the distribution of pixel values may be different from that of the conventional image. In the case of an 8-bit image, the pixel values of the reconstructed image are relatively evenly distributed in the range of 0 to 255, while the pixel values of the differential image may be distributed around 0.

If the pixel value of the difference image is clipped in a predetermined range in consideration of the distribution form of the pixel value, the number of bits per pixel for storing the difference image can be reduced, and the size of the memory in the DPB can be saved.

According to an embodiment of the present invention, the range of pixel values of the difference image may be clipped such that the bit depth of the pixel value of the difference image is equal to or smaller than the bit depth of the pixel value of the reconstructed image.

At this time, by clipping a region in which the distribution of pixel values is sparse, the decrease in bit efficiency caused by the change in pixel values due to clipping can be reduced. In other words, when the clipped region is one where pixel values are sparsely distributed, the loss of image information due to the change in pixel values can be minimized.
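
One way to gauge how much a candidate clipping range alters a difference image is to count the changed samples and sum their absolute changes. The sketch below does this; the function names and the sample values are hypothetical and are not measured data.

```python
def clip(value, lo, hi):
    """Limit value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def clipping_loss(diff_pixels, lo, hi):
    """Return (number of samples changed by clipping, total absolute change)."""
    changed = 0
    total_error = 0
    for v in diff_pixels:
        c = clip(v, lo, hi)
        if c != v:
            changed += 1
            total_error += abs(v - c)
    return changed, total_error


# Illustrative difference samples, mostly concentrated around 0.
samples = [0, -1, 2, 3, -4, 1, 0, 130, -2, -200]
changed, total_error = clipping_loss(samples, -128, 127)
```

Here only the two outlying samples (130 and -200) are altered, which is the situation the text describes for a distribution concentrated around 0.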

In addition, by clipping pixel values of the difference image, the bit depth of the residual of the difference image may be adjusted to be equal to the bit depth of the residual of the prediction block for which prediction is performed according to the normal mode. As a result, the memory and the transform unit used in single-layer image processing may be used in inter-layer image processing without modification.

FIG. 4 is a diagram for describing clipping of a differential image, according to an exemplary embodiment. In the present embodiment, the pixel values of the difference image are clipped so that the range of the pixel values of the difference image becomes half of the original range. Through this, the bit depth of the pixel value of the differential image may be adjusted to be the same as the bit depth of the pixel value of the reconstructed image.

If the input image is an 8-bit signal and the bit depth is 8 and no increase in the bit depth occurs in the image processing process, the reconstructed image may have a pixel value of 0 to 255 as shown in (a). Each pixel of the reconstructed image may be represented by any one of a total of 256 pixel values.

As shown in (b), the pixel values of the difference image, composed of the difference values of the reconstructed images, may be distributed in the range of -255 to 255. Each pixel of the difference image may have any one of a total of 511 pixel values, and 9 bits are required to process pixel values of the difference image.

According to an embodiment of the present invention, as shown in (c), the pixel value in the range of -255 to -129 is clipped to -128 in the difference image, and similarly the pixel value in the range of 128 to 255 is clipped to 127. That is, in the difference image, the pixel value in the range of -255 to -129 is changed to -128, and the pixel value in the range of 128 to 255 is changed to 127.

Therefore, the pixel value in the range of -255 to -129 of the difference image may be stored as -128, and the pixel value in the range of 128 to 255 may be stored as 127. In the difference image, pixel values in the range of -128 to 127 may be stored without clipping.

That is, in order to change the pixel values in the range of -255 to 255 into pixel values in the range of -128 to 127, pixel values in the range of -255 to -129 are clipped to -128, and pixel values in the range of 128 to 255 are clipped to 127.

Due to the clipping, the pixel values of the difference image range from -128 to 127, and each pixel can have any one of 256 pixel values. Therefore, each pixel of the difference image is processed as an 8-bit signal, and the difference image may be stored in the memory as an 8-bit signal per pixel, like the reconstructed image.

As a result, even though the bit depth of the difference image is 1 bit per pixel larger than that of the reconstructed image, the bit depth is reduced through clipping, so no additional memory for the increased bit is needed.

(d) shows an example in which the range of pixel values of the difference image to be clipped differs from (c). As shown, pixel values in the range of -255 to -128 are clipped to -127, and pixel values in the range of 129 to 255 are clipped to 128 in the difference image. The pixel value of the clipped differential image has a range of -127 to 128. Pixel values ranging from -127 to 128 can be processed or stored as 8-bit signals.

The clipping method according to FIGS. 4C and 4D is summarized as follows. When the pixel values of the reconstructed image range from 0 to ((1 << n) - 1) and the pixel values of the difference image range from -((1 << n) - 1) to ((1 << n) - 1), the pixel values of the difference image are clipped to the range -(1 << (n - 1)) to ((1 << (n - 1)) - 1), or to the range -((1 << (n - 1)) - 1) to (1 << (n - 1)).

That is, the bit depth necessary for processing the pixel value of the differential image through clipping is adjusted to the bit depth of the pixel value of the reconstructed image.

FIG. 5 is a diagram for describing clipping of a difference image, according to another exemplary embodiment. In the present embodiment, the pixel values of the difference image are clipped so that the range of the pixel values of the difference image becomes one quarter of the original range. Through this, the bit depth of the pixel value of the difference image may be adjusted to be smaller than the bit depth of the pixel value of the reconstructed image.

As shown in (a), the pixel value of the difference image composed of the difference values of the reconstructed images is distributed in the range of -255 to 255.

According to one embodiment of the present invention, as shown in (b), in order to change pixel values in the range of -255 to 255 into pixel values in the range of -64 to 63, pixel values in the range of -255 to -65 and pixel values in the range of 64 to 255, which do not belong to the range of -64 to 63, are clipped. That is, in the difference image, pixel values in the range of -255 to -65 are changed to -64, and pixel values in the range of 64 to 255 are changed to 63.

The pixel value in the range of -255 to -65 of the difference image may be stored as -64, and the pixel value in the range of 64 to 255 may be stored as 63.

Due to clipping, the pixel value range of the difference image is -64 to 63, and each pixel can have any one of 128 pixel values, so 7 bits are required to process a pixel of the clipped difference image. The difference image may be stored in the memory as a signal having a bit depth smaller than that of the reconstructed image, that is, as a 7-bit signal per pixel.

(c) shows an example in which the range of pixel values of the difference image is clipped to -63 to 64. In the difference image, pixel values in the range of -255 to -64 are clipped to -63, and pixel values in the range of 65 to 255 are clipped to 64. The pixel value of the clipped differential image may be processed or stored as a 7-bit signal, as in (b).

The clipping method according to FIGS. 5B and 5C is summarized as follows. When the pixel values of the reconstructed image range from 0 to ((1 << n) - 1) and the pixel values of the differential image range from -((1 << n) - 1) to ((1 << n) - 1), the pixel values of the difference image are clipped to the range -(1 << (n - 2)) to ((1 << (n - 2)) - 1), or to the range -((1 << (n - 2)) - 1) to (1 << (n - 2)).

Through the above clipping, the bit depth required to process the pixel value of the differential image can be adjusted to a value smaller than the bit depth of the pixel value of the reconstructed image, and the memory size can be reduced further.

A more generalized clipping method according to FIGS. 4 and 5 is as follows.

When the pixel values of the difference image range from -((1 << n) - 1) to ((1 << n) - 1), they can be clipped to the range -(1 << (n - clipping bit)) to ((1 << (n - clipping bit)) - 1), or to the range -((1 << (n - clipping bit)) - 1) to (1 << (n - clipping bit)). Here, the clipping bit refers to the bit depth to be saved per pixel through clipping. The clipping bit is 1 in the embodiment of FIG. 4 and 2 in the embodiment of FIG. 5.
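
The generalized ranges can be written as a small function. This is a sketch under the definitions above; the names `clipping_range` and `clip_difference` and the flag `positive_biased` (which selects the second of the two ranges) are hypothetical.

```python
def clipping_range(n, clipping_bits, positive_biased=False):
    """Return (low, high) of the clipping range for reconstructed bit depth n.

    positive_biased=False gives -(1 << (n - c)) .. (1 << (n - c)) - 1.
    positive_biased=True  gives -((1 << (n - c)) - 1) .. (1 << (n - c)).
    """
    half = 1 << (n - clipping_bits)
    if positive_biased:
        return -(half - 1), half
    return -half, half - 1


def clip_difference(pixels, n, clipping_bits, positive_biased=False):
    """Clip every difference sample into the selected clipping range."""
    lo, hi = clipping_range(n, clipping_bits, positive_biased)
    return [max(lo, min(hi, p)) for p in pixels]


fig4_c = clipping_range(8, 1)                        # (-128, 127)
fig4_d = clipping_range(8, 1, positive_biased=True)  # (-127, 128)
fig5_b = clipping_range(8, 2)                        # (-64, 63)
fig5_c = clipping_range(8, 2, positive_biased=True)  # (-63, 64)

clipped = clip_difference([-255, -129, -128, 0, 127, 128, 255], 8, 1)
```

With n = 8, the four calls reproduce the ranges described for FIGS. 4 and 5.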

The above-described clipping of the differential image may be applied only when a differential image of the reference picture is obtained for inter prediction in the inter-layer difference mode, and may not be applied when the differential image of the reconstructed region of the current picture is obtained for intra prediction in the inter-layer difference mode. Unlike the difference image of the reference picture, the pixel values of the difference image for the reconstructed region of the current picture are used only temporarily for prediction of the current picture in the difference mode, so there is little need to store them.

As shown in (a), the reconstructed image may have a pixel value of 0 to 255.

The pixel values of the difference image composed of the difference values of the reconstructed images are distributed in the range of -255 to 255 as shown in (b).

In the difference image of (b), as shown in (c), a pixel value in the range of -255 to -129 can be clipped to -128, and a pixel value in the range of 128 to 255 can be clipped to 127. The range of pixel values of the clipped difference image is -128 to 127.

According to the present embodiment, as shown in (c), the range of pixel values is shifted to the pixel value range of the reconstructed image by applying a preset offset to the pixel values of the clipped differential image. Pixel values in the range of -128 to 127 of the difference image are shifted to the range of 0 to 255, and the pixel values of the difference image are stored in the shifted range of 0 to 255.

Pixel values of the difference image clipped in the range of -127 to 128 may also be shifted to the range of 0 to 255. In addition, as shown in FIG. 5, a pixel value of a difference image clipped in a range of -64 to 63 or -63 to 64 may be shifted to a range of 0 to 127.
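
The shift can be sketched as adding the negated lower bound of the clipping range, so that the smallest clipped value maps to 0. The function names are hypothetical, and choosing the offset as the negated lower bound is an assumption consistent with the ranges given above.

```python
def shift_to_unsigned(clipped_pixels, clip_low):
    """Add the offset -clip_low so the clipped range starts at 0."""
    offset = -clip_low
    return [p + offset for p in clipped_pixels]


def shift_back(stored_pixels, clip_low):
    """Undo the offset when the stored difference image is read back."""
    return [s + clip_low for s in stored_pixels]


stored = shift_to_unsigned([-128, -1, 0, 127], -128)   # range -128..127 -> 0..255
restored = shift_back(stored, -128)

stored_alt = shift_to_unsigned([-127, 0, 128], -127)   # range -127..128 -> 0..255
stored_7bit = shift_to_unsigned([-64, 0, 63], -64)     # range -64..63  -> 0..127
```

After the shift, the stored samples are non-negative and fit the same unsigned storage format as reconstructed samples of the corresponding bit depth.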

First, the prediction units of the encoding apparatus and the decoding apparatus obtain a difference between the reconstructed image of the enhancement layer of the reference picture and the image upsampled from the reconstructed image of the base layer of the reference picture, and generate a differential image of the reference picture (S701).

The predictor clips the pixel values of the difference image out of the clipping range into the clipping range so that the range of pixel values of the difference image becomes a preset clipping range (S702).

In this case, the pixel value of the difference image may be clipped such that the bit depth of the pixel value of the difference image is equal to or less than the bit depth of the pixel value of the reconstructed image.

If the bit depth of the pixel value of the reconstructed image is n and the range of the pixel value of the differential image is -((1 << n) - 1) to ((1 << n) - 1), the pixel value of the differential image can be clipped to the range -(1 << (n - number of clipping bits)) to ((1 << (n - number of clipping bits)) - 1).

Here, the number of clipping bits means the bit depth of the pixel value of the difference image to be reduced, and may be 1 or 2.

So that the pixel range of the difference image is equal to the clipping range, pixel values of the difference image in the range of -((1 << n) - 1) to -((1 << (n - number of clipping bits)) + 1) are clipped to -(1 << (n - number of clipping bits)), and pixel values of the difference image in the range of (1 << (n - number of clipping bits)) to ((1 << n) - 1) are clipped to ((1 << (n - number of clipping bits)) - 1).

Alternatively, the pixel value of the difference image may be clipped to the clipping range -((1 << (n - number of clipping bits)) - 1) to (1 << (n - number of clipping bits)). In this case, pixel values of the difference image in the range of -((1 << n) - 1) to -(1 << (n - number of clipping bits)) are clipped to -((1 << (n - number of clipping bits)) - 1), and pixel values of the difference image in the range of ((1 << (n - number of clipping bits)) + 1) to ((1 << n) - 1) are clipped to (1 << (n - number of clipping bits)).

The pixel value of the difference image may be shifted by adding a predetermined offset to the pixel value of the difference image which has been subjected to the clipping process.

When the pixel value of the difference image is clipped as described above, the bit depth of the residual for the difference image is equal to or less than the bit depth of the residual for the prediction block on which the prediction is performed according to the normal mode. As a result, the memory, the transform unit, and the like that are used when processing an image of a single layer may be used for inter-layer image processing without additional modification.

The prediction unit performs inter prediction on the current picture using the clipped difference image (S703).

In the case of inter prediction for the inter-layer difference mode, the prediction unit may generate a prediction block in the differential image domain of the current block based on the differential image of the reference picture. The prediction unit may reconstruct the current block by adding a residual block and a reference block existing at the same position as the current block in the reconstructed image of the base layer to the prediction block generated in the differential image domain.

The pixel value of the clipped differential image may be stored in a memory such as a DPB. As described above, the difference image of the reference picture may be generated before the inter prediction and stored in a memory such as a DPB, or it may be calculated for the block that the motion information of the current block specifies as the position necessary for reconstruction of the current block. That is, the difference image may be generated in picture units or, if necessary, in block units.
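
Steps S701 to S703 can be strung together in a short sketch. It assumes an already upsampled base-layer reference, the first of the two clipping ranges, an integer-pel motion vector, and no offset shift; the function names and sample values are hypothetical.

```python
def make_reference_difference(r_el_ref, ur_bl_ref, n, clipping_bits):
    """S701 and S702: form the reference difference image and clip it."""
    lo = -(1 << (n - clipping_bits))
    hi = (1 << (n - clipping_bits)) - 1
    return [
        [max(lo, min(hi, el - bl)) for el, bl in zip(el_row, bl_row)]
        for el_row, bl_row in zip(r_el_ref, ur_bl_ref)
    ]


def reconstruct_block(d_ref, p_bl, res, x, y, mv):
    """S703 plus reconstruction: R_EL = P_D + P_BL + RES.

    P_D is read from the clipped reference difference image at the position
    displaced by the integer-pel motion vector mv (an assumption here).
    """
    height, width = len(p_bl), len(p_bl[0])
    x0, y0 = x + mv[0], y + mv[1]
    return [
        [d_ref[y0 + j][x0 + i] + p_bl[j][i] + res[j][i] for i in range(width)]
        for j in range(height)
    ]


# Illustrative 2x2 reference pictures with two large differences in the top row.
r_el_ref = [[200, 10], [100, 101]]
ur_bl_ref = [[10, 200], [98, 100]]

d_ref = make_reference_difference(r_el_ref, ur_bl_ref, n=8, clipping_bits=1)

# A 1x2 current block at (0, 0) whose motion vector points one row down.
r_el_block = reconstruct_block(d_ref, p_bl=[[99, 102]], res=[[0, -1]], x=0, y=0, mv=(0, 1))
```

In this example the top row of `d_ref` is clipped from 190 and -190 to 127 and -128, while the samples actually used for prediction (2 and 1) are unaffected.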

In the exemplary system described above, the methods are described based on a flowchart as a series of steps or blocks, but the invention is not limited to the order of steps, and certain steps may occur in a different order from, or concurrently with, other steps described above. In addition, since the above-described embodiments may include examples of various aspects, a combination of the embodiments should also be understood as an embodiment of the present invention. Accordingly, it is intended that the present invention cover all other replacements, modifications and variations that fall within the scope of the following claims.

Claims

An inter-layer prediction method comprising:

generating a difference image of a reference picture by obtaining a difference between a reconstructed image of an enhancement layer of the reference picture referred to by a current picture and an image obtained by upsampling a reconstructed image of a base layer of the reference picture;

clipping pixel values of the difference image that are outside a preset clipping range into the clipping range, such that the range of pixel values of the difference image becomes the clipping range; and

performing inter prediction of the current picture using the clipped difference image.
The method of claim 1, wherein the bit depth of the pixel value of the clipped difference image is less than or equal to the bit depth of the pixel value of the reconstructed image of the enhancement layer.
The method of claim 2, wherein, when the bit depth of the pixel value of the reconstructed image is n and the range of pixel values of the difference image is -((1 << n) - 1) to ((1 << n) - 1), the clipping comprises clipping the pixel values such that the pixel values of the difference image range from -(1 << (n - number of clipping bits)) to ((1 << (n - number of clipping bits)) - 1), and wherein the number of clipping bits is the bit depth by which the pixel value of the difference image is to be reduced.
The method of claim 3, wherein the clipping comprises clipping pixel values of the difference image in the range of -((1 << n) - 1) to -((1 << (n - number of clipping bits)) + 1) to -(1 << (n - number of clipping bits)), and clipping pixel values of the difference image in the range of (1 << (n - number of clipping bits)) to ((1 << n) - 1) to ((1 << (n - number of clipping bits)) - 1).
The method of claim 3, further comprising adding a preset offset to the pixel value of the difference image such that the range of the pixel value of the clipped difference image is shifted to the range of 0 to ((1 << (n - number of clipping bits + 1)) - 1).
The method of claim 2, wherein, when the bit depth of the pixel value of the reconstructed image is n and the range of pixel values of the difference image is -((1 << n) - 1) to ((1 << n) - 1), the clipping comprises clipping the pixel values such that the pixel values of the difference image range from -((1 << (n - number of clipping bits)) - 1) to (1 << (n - number of clipping bits)).
The method of claim 6, wherein the clipping comprises clipping pixel values of the difference image in the range of -((1 << n) - 1) to -(1 << (n - number of clipping bits)) to -((1 << (n - number of clipping bits)) - 1), and clipping pixel values of the difference image in the range of ((1 << (n - number of clipping bits)) + 1) to ((1 << n) - 1) to (1 << (n - number of clipping bits)).
The method of claim 6, further comprising adding a preset offset to the pixel value of the difference image such that the range of the pixel value of the clipped difference image is shifted to the range of 0 to ((1 << (n - number of clipping bits + 1)) - 1).
The method of claim 1, further comprising storing the clipped difference image.