WO2013077650A1

WO2013077650A1 - Method and apparatus for decoding multi-view video

Info

Publication number: WO2013077650A1
Application number: PCT/KR2012/009938
Authority: WO
Inventors: 방건; 정원식; 허남호; 유선미; 심동규; 남정학; 임웅
Original assignee: 한국전자통신연구원
Priority date: 2011-11-23
Filing date: 2012-11-22
Publication date: 2013-05-30

Abstract

The present invention relates to a method and apparatus for decoding a multi-view video. The method for decoding a multi-view video includes: receiving and entropy decoding quantization information for a current depth image; and obtaining a quantized residual signal of the current depth image on the basis of the quantization information, wherein the quantization information includes flag information indicating whether spatial axis quantization is performed on the current depth image.

Description

Multiview video decoding method and apparatus

The present invention relates to a method for encoding / decoding an image, and more particularly, to a method and apparatus for encoding / decoding a multiview video including a color image and a depth image.

ITU-T's Video Coding Experts Group (VCEG) and ISO/IEC's Moving Picture Experts Group (MPEG) formed the Joint Collaborative Team on Video Coding (JCT-VC), and standardization of High Efficiency Video Coding (HEVC), the next-generation video compression standard after H.264/AVC, is in progress. Meanwhile, the MPEG 3DV group is standardizing the compression of multiview color images and depth images, based on the existing H.264/AVC and on the HEVC being standardized by JCT-VC, for efficient compression of multiview images and synthesis of virtual view images.

In order to enable the synthesis of the virtual view image using the depth image, the 3DV group is in the process of standardizing a technology for compressing and transmitting not only a multiview color image but also depth image information. Accordingly, research on high-efficiency compression techniques considering the characteristics of depth image is expected to be actively conducted.

The present invention provides a method and apparatus for multiview video encoding / decoding that can improve encoding / decoding efficiency of a multiview video image.

The present invention provides a method and apparatus for encoding / decoding a depth image capable of improving the accuracy of the depth image.

The present invention provides a quantization method and apparatus capable of improving the accuracy of a depth image.

The present invention provides a filtering method and apparatus capable of preserving edge regions of a depth image and improving image quality.

According to an aspect of the present invention, a multi-view video decoding method is provided. The method includes receiving and entropy decoding quantization information of a current depth image, and acquiring a quantized residual signal of the current depth image based on the quantization information, wherein the quantization information includes flag information indicating whether spatial axis quantization is performed on the current depth image.

When spatial axis quantization is performed on the current depth image, the quantization information may include difference value information on the quantized residual signal of the current depth image. The difference value may be the difference between the quantized residual signal of a current pixel in the current depth image and the quantized residual signal of a neighboring pixel positioned around the current pixel.

The acquiring of the quantized residual signal may include predicting a residual signal from the neighboring pixel and adding, to the predicted residual signal, the difference value between the predicted residual signal and the quantized residual signal of the current depth image.

The neighboring pixel may be the upper pixel located directly above the current pixel when the current pixel is located in the first column of the current depth image, and may be the left pixel located to the left of the current pixel when the current pixel is located in a region other than the first column of the current depth image.

The flag information indicating whether to perform the spatial axis quantization may be information encoded and transmitted based on a transform unit (TU).

According to another aspect of the present invention, a multi-view video decoding apparatus is provided. The apparatus includes an entropy decoder that receives and entropy decodes quantization information about a current depth image, and an inverse quantizer that obtains a quantized residual signal of the current depth image based on the quantization information, wherein the quantization information includes flag information indicating whether spatial axis quantization is performed on the current depth image.

The inverse quantizer may predict a residual signal from the neighboring pixel, and obtain the quantized residual signal of the current depth image by adding, to the predicted residual signal, the difference value between the predicted residual signal and the quantized residual signal of the current depth image.

According to another aspect of the present invention, a multi-view video decoding method is provided. The method includes receiving and entropy decoding a bitstream and performing filtering using an anisotropic median filter on a current depth image reconstructed based on the entropy decoded signal.

The performing of the filtering using the anisotropic median filter may include: determining whether a current pixel area in the current depth image is an edge area; if the current pixel area is an edge area, classifying the pixels in the current pixel area into a plurality of groups based on the filtering target pixel value in the current pixel area; and determining the filtering target pixel value to be one of the median values calculated from the respective groups, based on the difference between each of those median values and the filtering target pixel value.

In the determining of whether the current pixel area is an edge area, the determination may be made by comparing, with a preset threshold, the difference between the pixel values in the current pixel area and a median value calculated from neighboring pixels positioned around the current pixel area.

In the classifying into the plurality of groups, the pixels in the current pixel area having a value less than or equal to the filtering target pixel value may be classified into a first group, and the pixels in the current pixel area having a value greater than or equal to the filtering target pixel value may be classified into a second group.

In the determining of the filtering target pixel value, whichever of the first median value calculated from the first group and the second median value calculated from the second group differs less from the filtering target pixel value may be determined as the filtering target pixel value.
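The filtering steps described above can be sketched in Python as follows. This is only an illustrative sketch, not the claimed implementation: the function names `is_edge_area` and `anisotropic_median`, the flat-list window representation, the use of the maximum deviation from the window median as the edge test, the lower-median rule for even-sized groups, and the tie-break toward the first group are all assumptions made for the example.

```python
from statistics import median_low
from typing import Sequence


def is_edge_area(window: Sequence[int], threshold: int) -> bool:
    """Assumed edge test: largest deviation from the window median exceeds a threshold."""
    m = median_low(window)
    return max(abs(p - m) for p in window) > threshold


def anisotropic_median(window: Sequence[int], target: int) -> int:
    """Pick the group median closest to the filtering target pixel value.

    `window` holds the pixel values of the current pixel area and must
    contain `target`, so both groups are non-empty.
    """
    first_group = [p for p in window if p <= target]    # values <= target
    second_group = [p for p in window if p >= target]   # values >= target
    m1 = median_low(first_group)
    m2 = median_low(second_group)
    # Assumption: on a tie, the first-group median is chosen.
    return m1 if abs(target - m1) <= abs(m2 - target) else m2


# A 3x3 area straddling an edge between depth values near 10 and near 50.
window = [10, 10, 11, 10, 45, 50, 52, 50, 51]
if is_edge_area(window, threshold=20):
    filtered = anisotropic_median(window, target=45)
```

In this example the noisy target value 45 is pulled to the median of the group on its own side of the edge rather than to a value that blends both sides, which is the edge-preserving behavior the text describes.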

The method may further include storing the current depth image filtered using the anisotropic median filter in an image buffer.

According to another aspect of the present invention, a multi-view video decoding apparatus is provided. The apparatus includes an entropy decoding unit for receiving and entropy decoding a bitstream and a filter unit for performing filtering using an anisotropic median filter on the current depth image reconstructed based on the entropy decoded signal.

The image quality of the depth image may be improved by applying an anisotropic median filter to address the image degradation that may occur in the edge region of the reconstructed depth image. In addition, by applying the spatial axis quantization method to the depth image, it is possible to reduce errors that may occur in regions including an edge, preserve the edge region, and improve rate-distortion performance.

FIG. 1 is a block diagram schematically illustrating an apparatus for decoding a multiview video image according to an embodiment of the present invention.

FIG. 2 is a block diagram schematically illustrating an apparatus for decoding a multiview depth image according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating an encoding structure for inter-view prediction of a multiview image to which the present invention is applied.

FIG. 4 is a diagram illustrating an example of a reference structure for inter-view prediction of the view V₂ shown in FIG. 3.

FIG. 5 is a diagram illustrating an example of a reference structure for inter-view prediction of the view V₁ shown in FIG. 3.

FIG. 6 is a flowchart schematically illustrating a spatial axis quantization method according to an embodiment of the present invention.

FIG. 7 is a flowchart illustrating a method of inverse quantization in the spatial domain according to an embodiment of the present invention.

FIG. 8 is a diagram illustrating a process of obtaining a quantized residual signal by applying a pixel-based prediction method according to an embodiment of the present invention.

FIG. 9 is a flowchart illustrating a method of filtering by applying an anisotropic median filter according to an embodiment of the present invention.

As the invention allows for various changes and numerous embodiments, particular embodiments will be illustrated in the drawings and described in detail in the written description. However, this is not intended to limit the present invention to specific embodiments, it should be understood to include all modifications, equivalents, and substitutes included in the spirit and scope of the present invention. In describing the drawings, similar reference numerals are used for similar elements.

Terms such as first and second may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as the second component, and similarly, the second component may also be referred to as the first component. The term and / or includes a combination of a plurality of related items or any item of a plurality of related items.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. On the other hand, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that there is no other component in between.

The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting of the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, the terms "comprise" or "have" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification is present, and it is to be understood that they do not exclude in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The same reference numerals are used for the same components in the drawings, and duplicate descriptions of the same components are omitted.

High Efficiency Video Coding (HEVC)-based 3D video coding technology covers all processes for the acquisition, processing, transmission, and playback of 3D video, including the depth image of each view as well as the multiview video images. The depth image is an image representing 3D distance information of objects in the image, and each pixel value of the depth image gives the depth information of the corresponding pixel. Since the accuracy of the depth image determines the quality of the virtual intermediate-view image synthesized using the depth image, it is important to generate an accurate depth image.

Therefore, the 3D video decoder according to the present invention is designed to decode not only a multiview video image but also a depth image. The 3D video encoder is composed of two layers that encode the multiview video image and the depth image, respectively, and each layer can perform coding using an inter-view prediction method based on inter-view correlation as well as all the tools of HEVC. The 3D video decoder according to the present invention may perform the decoding process in a procedure opposite to that of the 3D video encoder.

Hereinafter, a 3D video decoder according to the present invention will be described. The 3D video decoder according to the present invention may be composed of two layers: a decoder of a multiview video image and a decoder of a multiview depth image. In this case, the multiview video image may be a color image. The 3D video decoder according to the present invention may therefore be used to decode two-view or three-view video images and two-view or three-view depth images.

Referring to FIG. 1, the apparatus 100 for decoding a multiview video image includes entropy decoders 110, 111, and 112; inverse quantizers 120, 121, and 122; inverse transformers 130, 131, and 132; prediction units 140, 141, and 142; motion compensators 150, 151, and 152; filter units 160, 161, and 162; and decoded picture buffers (DPBs) 170, 171, and 172.

The bitstreams V₀, V₁, and V₂ of the encoded image may be input to the apparatus 100 for decoding a multiview video image. The bitstreams V₀, V₁, and V₂ may each be an image obtained at a different view. For example, the bitstream V₀ may be a base view image, where the base view is a view to which an independently encoded image belongs. The bitstreams V₁ and V₂ may be extended view images, where an extended view is a view to which an image encoded using information of the base view belongs.

The bitstreams V₀, V₁, and V₂ input to the decoding apparatus 100 of a multiview video image may be decoded by a procedure opposite to that of a 3D video encoder, and the 3D video encoder may, for example, use HEVC technology to encode the multiview video image.

The entropy decoding units 110, 111, and 112 may entropy decode the input bitstreams V₀, V₁, and V₂. For example, when view-related information such as inter-view correlation, inter-view parallax information, prediction mode information, and motion vector information is included in the bitstreams V₀, V₁, and V₂, it may be entropy decoded together.

The inverse quantizers 120, 121, and 122 may perform inverse quantization based on the entropy decoded transform coefficients and the quantization parameters provided by the 3D video encoder.

The inverse transform units 130, 131, and 132 may inverse transform the inverse quantized transform coefficients to generate a residual block. The inverse transform may be performed based on a transform unit (TU) determined by the 3D video encoder, and may use information on the transform performed by the encoder.

The prediction units 140, 141, and 142 may generate a prediction block corresponding to the current block by performing intra prediction and inter prediction. The current block may be a block corresponding to a coding unit (CU) or a block corresponding to a prediction unit (PU).

In the intra mode, the prediction units 140, 141, and 142 may generate a prediction block by performing prediction using pixel values of already encoded blocks around the current block. In the inter mode, the prediction units 140, 141, and 142 may generate a prediction block by having the motion compensators 150, 151, and 152 perform motion compensation using the motion vector and the reference image stored in the image buffers 170, 171, and 172.

The prediction block may be added to the residual block to generate a reconstruction block. The reconstruction block may be provided to the filter units 160, 161, and 162. The filter units 160, 161, and 162 may receive, from the encoder, filter related information applied to the corresponding block and perform filtering on the corresponding block in the decoder. For example, when the encoder performs encoding according to HEVC, the filter may be an in-loop filter.

The image buffer units 170, 171, and 172 store the reconstruction block. The stored reconstruction blocks may be provided to the prediction units 140, 141, and 142 and the motion compensators 150, 151, and 152, which perform prediction, and may be used as reference images.

In the case of a base view image, such as the bitstream V₀, inter-view prediction is not performed and only inter prediction / intra prediction is performed. In the case of an extended view image, such as the bitstreams V₁ and V₂, inter-view prediction is performed, so image information of another view may be referred to. Therefore, in the case of an extended view, prediction may be performed by referring to a reference picture from the image buffer unit of another view. This reference structure for inter-view prediction will be described in more detail with reference to FIGS. 3 to 5.

The apparatus for decoding a multiview video image according to the embodiment of the present invention illustrated in FIG. 1 has been described as decoding one base view image and two extended view images, but decoding may also be performed on two or more extended view images.

Referring to FIG. 2, the apparatus 200 for decoding a multiview depth image includes entropy decoders 210, 211, and 212; inverse quantizers 220, 221, and 222; inverse transformers 230, 231, and 232; prediction units 240, 241, and 242; motion compensators 250, 251, and 252; filter units 260, 261, and 262; and decoded picture buffers (DPBs) 270, 271, and 272.

The bitstreams D₀, D₁, and D₂ of the encoded depth image may be input to the apparatus 200 for decoding a multiview depth image. The bitstreams D₀, D₁, and D₂ may each be a depth image obtained at a different view. For example, the bitstream D₀ may be a base view image, where the base view is a view to which an independently encoded image belongs. The bitstreams D₁ and D₂ may be extended view images, where an extended view is a view to which an image encoded using information of the base view belongs.

The bitstreams D₀, D₁, and D₂ input to the decoding apparatus 200 of the multiview depth image may be decoded by a procedure opposite to the procedure by which the 3D video encoder encodes the depth image. The multiview depth image may be encoded using HEVC technology.

The entropy decoding units 210, 211, and 212 may entropy decode the input bitstreams D₀, D₁, and D₂. For example, when view-related information such as inter-view correlation, inter-view parallax information, prediction mode information, and motion vector information is included in the bitstreams D₀, D₁, and D₂, it may be entropy decoded together.

Also, the entropy decoding units 210, 211, and 212 according to the present invention may entropy decode quantization related information included in the bitstreams D₀, D₁, and D₂. The quantization related information describes the quantization performed by the 3D video encoder. In the 3D video encoder, the residual signal generated after performing prediction on the current depth image may be transformed and then quantized in the frequency domain, or may be quantized in the spatial domain without transformation. The 3D video encoder can thus adaptively select, on a transform unit (TU) basis, the quantization method that yields the higher efficiency under rate-distortion optimization (RDO).

That is, the quantization related information includes flag information indicating whether spatial axis quantization is performed. In addition, when spatial axis quantization is performed, the quantization related information may include difference value information about a quantized residual signal.

The inverse quantization units 220, 221, and 222 may perform inverse quantization based on the entropy decoded quantization related information. That is, inverse quantization is performed based on the flag information indicating whether spatial axis quantization is performed. If the flag indicates that spatial axis quantization was performed, the residual signal may be generated by inverse quantization alone, without an inverse transform process. If, on the other hand, the flag indicates that spatial axis quantization was not performed, inverse quantization is performed based on the entropy decoded transform coefficients and the quantization parameter, and the inverse transform units 230, 231, and 232 may generate the residual signal by inverse transforming the inverse quantized transform coefficients. Details of the spatial axis quantization method will be described later.

The inverse transform units 230, 231, and 232 may inverse transform the transform coefficients inversely quantized by the inverse quantizers 220, 221, and 222 to generate a residual block. The inverse transform may be performed based on a transform unit (TU) determined by the 3D video encoder, and may use information on the transform performed by the encoder. The apparatus 200 for decoding a multiview depth image according to the present invention may skip the inverse transform depending on the flag information indicating whether spatial axis quantization is performed.
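The flag-dependent choice between the two decoding paths can be sketched as below. This is a hedged illustration only: the function name `reconstruct_residual`, the uniform scaling by a single `step` used in place of the actual inverse quantizers, the convention that a flag value of 1 means spatial axis quantization was performed, and the caller-supplied `inverse_transform` callable are all assumptions, not details given in the text.

```python
from typing import Callable, List

Block = List[List[int]]


def reconstruct_residual(
    spatial_quantization_enable_flag: int,
    levels: Block,
    step: int,
    inverse_transform: Callable[[Block], Block],
) -> Block:
    """Return the residual block for one transform unit.

    Assumption: uniform scaling by `step` stands in for the real
    inverse quantizer on both paths.
    """
    scaled = [[v * step for v in row] for row in levels]
    if spatial_quantization_enable_flag == 1:
        # Spatial path: the inverse transform is skipped entirely.
        return scaled
    # Frequency path: scaled values are transform coefficients.
    return inverse_transform(scaled)
```

A caller would pass whatever inverse transform the codec uses for the frequency path; on the spatial path that callable is never invoked.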

The prediction units 240, 241, and 242 may generate a prediction block corresponding to the current block by performing intra prediction and inter prediction. The current block may be a block corresponding to a coding unit (CU) or a block corresponding to a prediction unit (PU).

In the intra mode, the prediction units 240, 241, and 242 may generate a prediction block by performing prediction using pixel values of already encoded blocks around the current block. In the inter mode, the prediction units 240, 241, and 242 may generate a prediction block by having the motion compensators 250, 251, and 252 perform motion compensation using the motion vector and the reference image stored in the image buffers 270, 271, and 272.

The prediction block may be added to the residual block to generate a reconstruction block. The reconstruction block may be provided to the filter units 260, 261, and 262. The filter units 260, 261, and 262 may receive, from the encoder, filter related information applied to the corresponding block and perform filtering on the corresponding block in the decoder.

For example, when the encoder performs encoding according to HEVC, the filter may be an in-loop filter. In addition, when encoding is performed according to HEVC, a deblocking filter may be applied to remove blocking artifacts on a coding unit (CU) or prediction unit (PU) basis. If the encoder performs spatial axis quantization, blocking artifacts do not occur in the spatial domain. Therefore, in the present invention, a filter capable of removing edge noise while maintaining edge components, for example, a generally known bilateral filter, may be used.

In addition, the apparatus 200 for decoding a multiview depth image according to an embodiment of the present invention may use anisotropic median filters 265, 266, and 267 to improve the accuracy of the edge region in the reconstructed depth image that has passed through the filter units 260, 261, and 262. An anisotropic median filter can remove noise in a specific direction by replacing pixel values in the region from which noise is to be removed with median values in that region. Details of the filtering method using the anisotropic median filter according to an embodiment of the present invention will be described later.

The image buffer units 270, 271, and 272 store the reconstruction block. The reconstruction block may be a reconstruction block filtered by the filter units 260, 261, and 262 or a reconstruction block filtered by the anisotropic median filters 265, 266, and 267. The stored reconstruction blocks may be provided to the prediction units 240, 241, and 242 and the motion compensators 250, 251, and 252, which perform prediction, to be used as reference images.

In the case of a base view image, such as the bitstream D₀, inter-view prediction is not performed and only inter prediction / intra prediction is performed. In the case of an extended view image, such as the bitstreams D₁ and D₂, inter-view prediction is performed, so image information of another view may be referred to. Therefore, in the case of an extended view, prediction may be performed by referring to a reference picture from the image buffer unit of another view. This reference structure for inter-view prediction will be described in more detail with reference to FIGS. 3 to 5.

The apparatus for decoding a multiview depth image according to the embodiment of the present invention illustrated in FIG. 2 has been described as decoding one base view depth image and two extended view depth images, but two or more extended view depth images may also be decoded.

Referring to FIG. 3, the three views V₀, V₁, and V₂ may be different views. The view V₀ is encoded without prediction from another view and may be a base view or an I view. The views V₁ and V₂ may be extended views that are predictively encoded with reference to other views: the view V₂ may be a P view (predictive view) that is predictively encoded with reference to only one view, and the view V₁ may be a B view (interpolative view) that is predictively encoded with reference to two views.

Each picture is classified as an I picture (intra picture), a P picture (predictive picture), or a B picture (interpolative picture) according to its encoding type. An I picture encodes the image itself without inter-picture prediction, a P picture is inter-picture predictively encoded using a reference picture in the forward direction only, and a B picture is inter-picture predictively encoded using reference pictures in both the forward and backward directions.

The views other than the base view V₀ (that is, V₁ and V₂) may be encoded, as shown in FIG. 3, by cross-referencing images obtained at different views (V₀, V₁, V₂), and the encoded images may be transmitted to the decoders illustrated in FIGS. 1 and 2. The view V₀, which is the base view transmitted to the decoder, does not perform inter-view prediction but only inter prediction between images or intra prediction within an image. The views V₁ and V₂, which are extended views, are decoded by performing inter-view prediction using reference pictures stored in the image buffer unit according to the reference structure shown in FIG. 3. Here, the arrows indicate reference relationships between images.

Referring to FIG. 4, the view V₂ may perform inter-view prediction with reference to an image acquired at the view V₀. For example, the image B₆ of the view V₂ may perform inter-view prediction, based on reference picture list 0 for forward prediction and reference picture list 1 for backward prediction, by referring to the image of the view V₀ included in reference picture lists 0 and 1. For example, reference picture lists 0 and 1 may include the image B₆ of the view V₀, and the image B₆ of the view V₂ may refer to it for prediction.

In this case, when reference picture list 1 does not contain enough reference pictures, the reference pictures included in reference picture list 0 may be copied and used, following the generalized P and B (GPB) concept of HEVC.

Referring to FIG. 5, the view V₁ may perform inter-view prediction with reference to an image acquired at the view V₀ and an image acquired at the view V₂. For example, by placing both the image B₆ of the view V₀ and the image B₆ of the view V₂ in reference picture lists 0 and 1, the image B₆ of the view V₁ may refer to them to perform inter-view prediction.
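The inter-view reference structure of FIGS. 3 to 5 can be summarized in a small Python sketch. The dictionary `INTER_VIEW_REFS` and the helper names are hypothetical; only the dependencies themselves (V₂ refers to V₀, V₁ refers to V₀ and V₂) come from the description. Temporal reference pictures are omitted, and placing identical inter-view entries in both lists is a simplification of the examples given for the image B₆.

```python
from typing import Dict, List, Tuple

# Which views each view may refer to for inter-view prediction.
INTER_VIEW_REFS: Dict[str, List[str]] = {
    "V0": [],            # base view: no inter-view prediction
    "V2": ["V0"],        # P view: refers to one view
    "V1": ["V0", "V2"],  # B view: refers to two views
}


def decoding_order(refs: Dict[str, List[str]]) -> List[str]:
    """Order the views so that every view follows the views it refers to."""
    order: List[str] = []

    def visit(view: str) -> None:
        if view in order:
            return
        for ref in refs[view]:
            visit(ref)
        order.append(view)

    for view in refs:
        visit(view)
    return order


def inter_view_reference_lists(
    view: str, picture: str
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (list 0, list 1) holding only the inter-view entries.

    Each entry is (view, picture label) for the same time instant.
    """
    entries = [(ref, picture) for ref in INTER_VIEW_REFS[view]]
    return list(entries), list(entries)
```

Under these assumptions the base view is decoded first, then V₂, then V₁, since V₁ needs reconstructed pictures from both of the other views.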

On the other hand, since the decoded depth image is used for synthesizing the virtual view, improving the accuracy of the depth image may also improve the image quality of the synthesized virtual view image. Since the human visual system mainly recognizes three-dimensional depth through binocular parallax around sharp edges, distortion in the edge region may reduce the image quality of the three-dimensional video image and weaken the three-dimensional effect. Therefore, the subjective image quality of the virtual view image may be improved by minimizing the edge region distortion of the depth image.

When an image is encoded, transforming the residual signal and quantizing it in the frequency domain introduces error across the entire region, which reduces the quality of the depth image. Accordingly, the present invention provides a spatial axis quantization method that can reduce the error caused by frequency axis quantization and preserve the edge region in the depth image.

FIG. 6 is a flowchart schematically illustrating a spatial axis quantization method according to an embodiment of the present invention. The method of FIG. 6 may be performed by a 3D video encoder (hereinafter referred to as an 'encoder').

Referring to FIG. 6, the encoder obtains a residual signal by performing a prediction process on a current depth image (S600). The current depth image may be predicted based on a coding unit or a prediction unit, and the residual signal is a difference between the prediction unit on which the prediction is performed and the prediction target block in the current depth image.

The encoder determines whether to perform spatial axis quantization on the residual signal (S610). That is, the encoder determines whether to transform the residual signal and perform frequency axis quantization, or to perform spatial axis quantization without transforming the residual signal. The encoder can make this choice adaptively, selecting whichever method is more efficient when it performs rate-distortion optimization (RDO) on a transform unit basis for the current depth image.

Whether to perform spatial axis quantization is determined per transform unit in the current depth image, and the result may be encoded as a flag and signaled to the decoder. For example, one bit per transform unit (e.g., spatial_quantization_enable_flag) may be used to indicate whether spatial axis quantization is performed.

If it is determined in step S610 that spatial axis quantization is to be performed, the encoder generates a spatially quantized residual signal by performing spatial axis quantization on the residual signal (S620). The spatial axis quantization may be applied on a transform unit basis, in synchronization with the transform unit split flag. The number of quantization representation levels in the spatial domain and their representative values are determined according to the absolute error amount in the frequency domain for each quantization parameter. In this case, the representative values in the spatial domain may be set according to the variance of the error generated in the reconstructed depth image. That is, the spatial axis quantizer may be designed so that the amount of error generated by spatial axis quantization matches the amount of error generated by frequency axis quantization. Because the spatial axis quantizer for each quantization parameter is defined identically in the encoder and the decoder, the encoder does not need to transmit quantizer information to the decoder.
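As a rough illustration of a quantizer that is defined identically on both sides, the sketch below uses a lookup table keyed by quantization parameter. Everything concrete here is an assumption made for the example: the table name `REPRESENTATIVE_LEVELS`, the two quantization parameters shown, and especially the representative values, which are invented and are not derived from any frequency-domain error measurement as the text requires.

```python
from typing import Dict, List

# Hypothetical table known to both encoder and decoder, so it is never
# transmitted. The values are made up for illustration only.
REPRESENTATIVE_LEVELS: Dict[int, List[int]] = {
    22: [-12, -6, -2, 0, 2, 6, 12],
    32: [-24, -8, 0, 8, 24],
}


def quantize(residual: int, qp: int) -> int:
    """Encoder side: index of the representative value nearest the residual."""
    levels = REPRESENTATIVE_LEVELS[qp]
    return min(range(len(levels)), key=lambda i: abs(levels[i] - residual))


def dequantize(index: int, qp: int) -> int:
    """Decoder side: look the representative value up in the shared table."""
    return REPRESENTATIVE_LEVELS[qp][index]


# A residual of 7 at QP 22 is represented by the nearest level, 6.
index = quantize(7, 22)
reconstructed = dequantize(index, 22)
```

A coarser quantization parameter maps to fewer, more widely spaced levels in this sketch, which mirrors the idea that the spatial-domain error should track the error that frequency axis quantization would produce at the same parameter.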

The encoder generates a difference value for the residual signal in units of pixels based on the quantized residual signal (S630). The difference value for the quantized residual signal is a difference value between the quantized residual signal of the current pixel in the current depth image and the quantized residual signal of neighboring pixels positioned around the current pixel.

For example, if the current pixel is located in the first column of the current depth image, the pixel located directly above the current pixel may be determined as the neighboring pixel. If the current pixel is located in any column other than the first column of the current depth image, the pixel located to the left of the current pixel may be determined as the neighboring pixel.

Accordingly, the encoder may calculate a difference value for the residual signal quantized in units of pixels using the current pixel and the neighboring pixels with respect to the current depth image, and may encode the same and transmit the encoded value to the decoder.
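The pixel-wise difference calculation can be sketched as follows. The sketch assumes the quantized residual is held as a list of rows of integers and that the prediction for the very first pixel, which has no neighbor, is zero, mirroring the decoder-side convention; the function name is hypothetical.

```python
def residual_differences(quantized):
    """Return per-pixel differences of a quantized residual block.

    quantized: list of rows, each a list of integers.
    The neighbor is the pixel above for the first column and the
    pixel to the left for every other column.
    """
    height = len(quantized)
    width = len(quantized[0]) if height else 0
    diffs = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if x == 0:
                # First column: use the pixel above; the first pixel uses 0.
                pred = quantized[y - 1][0] if y > 0 else 0
            else:
                # Other columns: use the pixel to the left.
                pred = quantized[y][x - 1]
            diffs[y][x] = quantized[y][x] - pred
    return diffs


example = residual_differences([[3, 3, 5], [4, 4, 4]])
```

In flat areas most differences become zero, which is the redundancy the differential coding is meant to remove.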

FIG. 7 is a flowchart illustrating a method of inverse quantization in a spatial domain according to an embodiment of the present invention. The method of FIG. 7 may be performed by a decoder (hereinafter, referred to as a 'decoder') of the multi-view depth image illustrated in FIG. 2.

Referring to FIG. 7, the decoder entropy decodes the received bitstream to obtain quantization related information (S700). The quantization related information includes flag information indicating whether spatial axis quantization is performed on the current depth image. In the case where spatial axis quantization is performed in the 3D video encoder, difference information on the quantized residual signal is included together with the flag information.

The decoder determines whether spatial axis quantization is performed on the current depth image based on the quantization related information (S710). That is, the decoder can find out the quantization method performed by the encoder using flag information indicating whether to perform spatial axis quantization. For example, it may be determined whether to perform spatial axis quantization based on the value "0" or "1" of the flag spatial_quantization_enable_flag.

As a result of the determination in step S710, when it is determined that the encoder transformed the residual signal into the frequency domain and quantized it, the decoder performs inverse quantization on the entropy decoded transform coefficients and inverse transforms the dequantized transform coefficients to obtain a residual signal (S720).

If it is determined in step S710 that the encoder quantized the residual signal in the spatial domain without transformation, the decoder obtains the quantized residual signal by performing inverse quantization based on the quantization information, that is, the difference value information for the quantized residual signal (S730).

In this case, unlike the quantized coefficients in the transformed frequency domain, the quantized residual signal may contain redundancy. Therefore, the quantized residual signal q' according to the present invention may be determined by Equation 1 below. In addition, since the difference information for the quantized residual signal is calculated on a per-pixel basis, the quantized residual signal q' may be calculated for each pixel in the current depth image.

Equation 1

$$ q' = q + p $$

Here, q is the difference value for the quantized residual signal obtained by entropy decoding, and p is the residual signal predicted from the neighboring pixels.

As described above, the difference value q for the quantized residual signal is the difference between the quantized residual signal of the current pixel in the current depth image and the quantized residual signal of the neighboring pixel located around the current pixel. Therefore, according to Equation 1, the decoder predicts the residual signal p from the neighboring pixel and adds the difference value q transmitted from the encoder to the predicted residual signal p, thereby obtaining the quantized residual signal q' for the current pixel. Here, p may be the residual signal of the neighboring pixel.

Referring to FIG. 8, the quantized residual signal q' of each pixel in the current depth image 800 can be obtained by adding the residual signal p predicted from the neighboring pixel to the entropy decoded difference value q for the quantized residual signal.

For example, if the current pixel is located in the first column 810 of the current depth image 800, the neighboring pixel may be the top pixel located directly above the current pixel, and if the current pixel is located in the remaining region 820 outside the first column 810, the neighboring pixel may be the left pixel located to the left of the current pixel. In this case, the value of the predicted residual signal p for the pixel 801, for which the residual signal q' is obtained first, may be set to zero.
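The decoder-side reconstruction described above can be sketched as follows. The sketch assumes the decoded differences are held as a list of rows of integers and are processed in raster order, so that the top or left neighbor is already reconstructed when it is needed; the function name is hypothetical.

```python
def reconstruct_quantized_residual(diffs):
    """Rebuild the quantized residual q' from decoded differences q.

    diffs: list of rows, each a list of integers (the values q).
    Each pixel is rebuilt as q' = q + p, where p is the already
    reconstructed residual of the neighboring pixel.
    """
    height = len(diffs)
    width = len(diffs[0]) if height else 0
    recon = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if x == 0:
                # First column: predict from the pixel above; the first
                # pixel of the image has no neighbor, so p is set to zero.
                p = recon[y - 1][0] if y > 0 else 0
            else:
                # Remaining region: predict from the pixel to the left.
                p = recon[y][x - 1]
            recon[y][x] = diffs[y][x] + p
    return recon


restored = reconstruct_quantized_residual([[3, 0, 2], [1, 0, 0]])
```

Applied to differences produced with the same neighbor rule on the encoder side, this recovers the quantized residual exactly, since only additions of integers are involved.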

As described above, the spatial axis quantization method according to an embodiment of the present invention can improve rate-distortion performance when applied to an area including an edge, and can reduce the errors caused by quantization in the frequency domain. As a result, the image quality of the reconstructed depth image may be improved.

In addition, in order to improve the quality of the depth image, the present invention provides a method for removing the blurring that may occur in a depth image reconstructed by the decoder and the ringing artifacts that occur in edge regions of the image.

FIG. 9 is a flowchart illustrating a filtering method that applies an anisotropic median filter according to an embodiment of the present invention. The method of FIG. 9 may be performed by a decoder (hereinafter, referred to as a 'decoder') of the multi-view depth image illustrated in FIG. 2. In addition, the anisotropic median filter may be applied to the reconstructed depth image as an in-loop filter of the decoder.

As described above, the anisotropic median filter may remove noise in a specific direction, and filters the pixels in an area by replacing them with a median value of the pixels in that area. For example, the decoder may generate a reconstructed depth image by adding a residual signal obtained based on the above-described spatial axis quantization to a prediction value obtained through the prediction of the depth image. In this case, filtering may be performed by applying the anisotropic median filter to the reconstructed depth image.

Referring to FIG. 9, the decoder determines whether the current pixel area in the reconstructed depth image is an edge area (S900). The current pixel area refers to the area of the reconstructed depth image to which the anisotropic median filter is currently to be applied.

In this case, whether the current pixel area is an edge area may be determined by comparing, with a preset threshold, a value based on the differences between the pixel values in the current pixel area and the median value calculated from the peripheral pixels located around the current pixel area. For example, a determination equation such as Equation 2 below may be used.

Equation 2

Here, the median term is the median of the surrounding pixels located at the periphery of the current pixel region, and w_i is the reconstructed pixel value at position i in the current pixel region.

When the value S_Dev calculated by Equation 2 is greater than a preset threshold, it may be determined that the current pixel area is an edge area. In this case, the anisotropic median filter may be applied to the current pixel region.
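The edge test can be sketched as follows. This is only an illustration under a stated assumption: S_Dev is taken here to be the sum of absolute differences between each pixel value w_i in the current pixel area and the median of the peripheral pixels. The exact form of Equation 2 may differ, and the function name and threshold value are hypothetical.

```python
from statistics import median


def is_edge_region(region, periphery, threshold):
    """Return True if the region is judged to be an edge area.

    region:    pixel values w_i in the current pixel area.
    periphery: pixel values located around the current pixel area.
    Assumption: S_Dev is the sum of absolute deviations from the
    median of the peripheral pixels.
    """
    med = median(periphery)
    s_dev = sum(abs(w - med) for w in region)
    return s_dev > threshold


edge = is_edge_region([10, 10, 50, 50], [10, 11, 10, 12, 10], threshold=30)
flat = is_edge_region([10, 11, 10, 10], [10, 11, 10, 12, 10], threshold=30)
```

A region that straddles a depth discontinuity yields a large deviation from the surrounding median, while a flat region stays below the threshold and is left unfiltered.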

If it is determined that the current pixel area is an edge area, the decoder classifies the pixels in the current pixel area into a plurality of groups based on the value of the pixel to which the anisotropic median filter is to be applied (hereinafter referred to as the 'filtering target pixel value') (S910). In this case, the median values of the pixels included in the classified groups may be used as pixel values in the current pixel area.

For example, as shown in Equation 3 below, pixels in the current pixel area may be classified into two groups. Pixels in the current pixel region having a value less than or equal to the filtering target pixel value are classified into the first group (R_H), and pixels in the current pixel region having a value greater than or equal to the filtering target pixel value are classified into the second group (R_L).

Equation 3

$$ R_H = \{ w_i \mid w_i \le w_{cur} \}, \qquad R_L = \{ w_i \mid w_i \ge w_{cur} \} $$

Here, w_i is a reconstructed pixel value at position i in the current pixel area, and w_cur is the pixel value of the pixel to be filtered.

The decoder determines the filtering target pixel value based on the median values of the classified pixels in the current pixel area (S920). That is, the filtering target pixel value is determined based on the difference between the filtering target pixel value and each of the median values calculated from the classified groups, and is set to one of those median values.

For example, the process of determining the filtering target pixel value is shown in Equation 4 below.

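Equation 4

$$ w_{cur} = \begin{cases} \mathrm{med}(R_H), & \text{if } \left| \mathrm{med}(R_H) - w_{cur} \right| < \left| \mathrm{med}(R_L) - w_{cur} \right| \\ \mathrm{med}(R_L), & \text{otherwise} \end{cases} $$

Here, med is a function that outputs the median of the input pixel values, and w_cur is the pixel value of the pixel to be filtered.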

As shown in Equation 4, when the filtering target pixel value is closer to the first median value med(R_H) than to the second median value med(R_L), the filtering target pixel value is replaced with the first median value med(R_H); otherwise, it is replaced with the second median value med(R_L). In this way, the anisotropic median filter may be applied to the edge region of the current depth image.

The current depth image filtered by applying the above-described anisotropic median filter may be stored in an image buffer and then used as a reference image. In addition, since the anisotropic median filter is applied to each pixel using only its peripheral pixels, there is no need to signal additional information for the anisotropic median filter.
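The classification and selection steps (S910 and S920) for a single pixel can be sketched as follows. The sketch assumes the window of pixel values includes the pixel being filtered, so that neither group is empty, and it uses `statistics.median_low` so that the output is always one of the actual pixel values; both choices, and the function name, are assumptions rather than details given in the text.

```python
from statistics import median_low


def anisotropic_median_pixel(window, w_cur):
    """Return the filtered value for one pixel in an edge area.

    window: pixel values in the current pixel area (must contain w_cur).
    w_cur:  value of the pixel to be filtered.
    """
    # First group: values less than or equal to the target value.
    r_h = [w for w in window if w <= w_cur]
    # Second group: values greater than or equal to the target value.
    r_l = [w for w in window if w >= w_cur]
    med_h = median_low(r_h)
    med_l = median_low(r_l)
    # Replace the target with whichever group median is closer to it.
    if abs(med_h - w_cur) < abs(med_l - w_cur):
        return med_h
    return med_l


low_side = anisotropic_median_pixel([10, 10, 12, 50, 52, 52, 11, 51, 50], 12)
high_side = anisotropic_median_pixel([10, 10, 11, 48, 50, 50, 51, 52, 52], 48)
```

In both examples the pixel sits near a depth edge and is pulled toward the median of the side it already belongs to, rather than toward a blend of both sides.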

Hereinafter, a high level syntax to which the above-described technique of the present invention is applied is shown.

Table 1 below shows a sequence parameter set (SPS) for the base view color image.

Table 1

seq_parameter_set_rbsp () {	Descriptor
nal_ref_idc	u (2)
zero_bit	u (1)
seq_parameter_set_id	u (5)
pic_width_in_luma_samples	ue (v)
pic_height_in_luma_samples	ue (v)
pad_x	ue (v)
pad_y	ue (v)
max_cu_size	ue (v)
max_cu_depth	ue (v)
log2_min_transform_block_size_minus2	ue (v)
log2_diff_max_min_transform_unit_size	ue (v)
max_transform_hierarchy_depth_inter	ue (v)
max_transform_hierarchy_depth_intra	ue (v)
adaptive_loop_filter_enable_flag	u (1)
DQP_flag	u (1)
LDC_flag	u (1)
spatial_quantization_enable_flag	u (1)
merge_enable_flag	u (1)
for (Int i = 0; i < pcSPS->getMaxCUDepth(); i++)
{
amvp_mode /* AMVP mode for each depth (AM_NONE or AM_EXPL) */	u (1)
}
bit_depth_minus_8	ue (v)
}

spatial_quantization_enable_flag indicates whether spatial axis quantization according to the present invention described above is performed. For example, the encoder may set the value of spatial_quantization_enable_flag to "0" or "1" according to whether spatial axis quantization is performed and transmit the same to the decoder.
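As an illustration of how a decoder might read the fields of Table 1, the following sketch parses them in the listed order. It assumes that u(n) is an n-bit unsigned integer read most significant bit first, that ue(v) is an unsigned Exp-Golomb code as in H.264/AVC, and that the loop bound getMaxCUDepth() equals the parsed max_cu_depth. The class, the function names, and the helper routines used to build the example bitstream are hypothetical.

```python
class BitReader:
    def __init__(self, data):
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n):
        """Read an n-bit unsigned integer, most significant bit first."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def ue(self):
        """Read an unsigned Exp-Golomb code."""
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.u(zeros)


UE_FIELDS = ("pic_width_in_luma_samples", "pic_height_in_luma_samples",
             "pad_x", "pad_y", "max_cu_size", "max_cu_depth",
             "log2_min_transform_block_size_minus2",
             "log2_diff_max_min_transform_unit_size",
             "max_transform_hierarchy_depth_inter",
             "max_transform_hierarchy_depth_intra")
FLAG_FIELDS = ("adaptive_loop_filter_enable_flag", "DQP_flag", "LDC_flag",
               "spatial_quantization_enable_flag", "merge_enable_flag")


def parse_sps(data):
    """Parse the Table 1 fields into a dict, in table order."""
    r = BitReader(data)
    sps = {"nal_ref_idc": r.u(2), "zero_bit": r.u(1),
           "seq_parameter_set_id": r.u(5)}
    for name in UE_FIELDS:
        sps[name] = r.ue()
    for name in FLAG_FIELDS:
        sps[name] = r.u(1)
    # Assumption: one amvp_mode bit per CU depth, bounded by max_cu_depth.
    sps["amvp_mode"] = [r.u(1) for _ in range(sps["max_cu_depth"])]
    sps["bit_depth_minus_8"] = r.ue()
    return sps


def ue_bits(v):
    """Helper for the example: Exp-Golomb bit string for v."""
    code = bin(v + 1)[2:]
    return "0" * (len(code) - 1) + code


def bits_to_bytes(bits):
    """Helper for the example: zero-pad a bit string and pack it."""
    bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Example bitstream with spatial_quantization_enable_flag set to 1.
bits = "11" + "0" + "00101"
bits += "".join(ue_bits(v) for v in (1024, 768, 0, 0, 64, 4, 0, 3, 2, 2))
bits += "10011" + "1010" + ue_bits(0)
sps = parse_sps(bits_to_bytes(bits))
```

After parsing, `sps["spatial_quantization_enable_flag"]` tells the decoder whether spatial axis quantization is in use.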

Table 2 below shows a sub-sequence parameter set for the color image and the depth map of the enhanced view.

Table 2

sub_seq_parameter_set_rbsp () {	Descriptor
nal_ref_idc	u (2)
zero_bit	u (1)
seq_parameter_set_id	u (5)
pic_width_in_luma_samples	ue (v)
pic_height_in_luma_samples	ue (v)
pad_x	ue (v)
pad_y	ue (v)
max_cu_size	ue (v)
max_cu_depth	ue (v)
log2_min_transform_block_size_minus2	ue (v)
log2_diff_max_min_transform_unit_size	ue (v)
max_transform_hierarchy_depth_inter	ue (v)
max_transform_hierarchy_depth_intra	ue (v)
adaptive_loop_filter_enable_flag	u (1)
DQP_flag	u (1)
LDC_flag	u (1)
spatial_quantization_enable_flag	u (1)
merge_enable_flag	u (1)
for (Int i = 0; i < pcSPS->getMaxCUDepth(); i++)
{
amvp_mode /* AMVP mode for each depth (AM_NONE or AM_EXPL) */	u (1)
}
bit_depth_minus_8	ue (v)
color_video_flag	u (1)
if (color_video_flag)
{
color_view_id	ue (v)
color_inter_view_prediction_pictures_first_flag	u (1)
color_num_views_minus_one	ue (v)
for (i = 0; i <color_num_views_minus_one + 1; i ++)
{
color_view_order [i]	ue (v)
}
color_num_anchor_refs_list0	ue (v)
color_num_anchor_refs_list1	ue (v)
color_num_non_anchor_refs_list0	ue (v)
color_num_non_anchor_refs_list1	ue (v)
for (i = 0; i <color_num_anchor_refs_list0; i ++)
{
color_anchor_refs_list0 [i]	ue (v)
}
for (i = 0; i <color_num_anchor_refs_list1; i ++)
{
color_anchor_refs_list1 [i]	ue (v)
}
for (i = 0; i <color_num_non_anchor_refs_list0; i ++)
{
color_non_anchor_refs_list0 [i]	ue (v)
}
for (i = 0; i <color_num_non_anchor_refs_list1; i ++)
{
color_non_anchor_refs_list1 [i]	ue (v)
}
}
else
{
depth_view_id	ue (v)
depth_inter_view_prediction_pictures_first_flag	u (1)
depth_num_views_minus_one	ue (v)
for (i = 0; i <depth_num_views_minus_one + 1; i ++)
{
depth_view_order [i]	ue (v)
}
depth_num_anchor_refs_list0	ue (v)
depth_num_anchor_refs_list1	ue (v)
depth_num_non_anchor_refs_list0	ue (v)
depth_num_non_anchor_refs_list1	ue (v)
for (i = 0; i <depth_num_anchor_refs_list0; i ++)
{
depth_anchor_refs_list0 [i]	ue (v)
}
for (i = 0; i <depth_num_anchor_refs_list1; i ++)
{
depth_anchor_refs_list1 [i]	ue (v)
}
for (i = 0; i <depth_num_non_anchor_refs_list0; i ++)
{
depth_non_anchor_refs_list0 [i]	ue (v)
}
for (i = 0; i <depth_num_non_anchor_refs_list1; i ++)
{
depth_non_anchor_refs_list1 [i]	ue (v)
}
}
}

As described above, spatial_quantization_enable_flag indicates whether spatial axis quantization according to the present invention described above is performed. For example, the encoder may set the value of spatial_quantization_enable_flag to "0" or "1" according to whether spatial axis quantization is performed and transmit the same to the decoder.

color_video_flag indicates whether it is a color image or a depth image.

The color_inter_view_prediction_pictures_first_flag indicates whether inter-view prediction is performed for the color image. When inter-view prediction is performed, the reference image lists for the color image are generated using color_num_anchor_refs_list0, color_num_anchor_refs_list1, color_num_non_anchor_refs_list0, and color_num_non_anchor_refs_list1.

The depth_inter_view_prediction_pictures_first_flag indicates whether inter-view prediction is performed for the depth image. When inter-view prediction is performed, the reference image lists for the depth image are generated using depth_num_anchor_refs_list0, depth_num_anchor_refs_list1, depth_num_non_anchor_refs_list0, and depth_num_non_anchor_refs_list1.

Table 3 below shows information that may be included in the prefix network abstraction layer (NAL).

Table 3

nal_unit_prefix_mvc_extention () {	Descriptor
priority_id	u (2)
zero_bit	u (1)
prefix_id	u (5)
non_idr_flag	u (1)
view_id	u (10)
reserved_bits	u (3)
anchor_pic_flag	u (1)
inter_view_flag	u (1)
reserved_one_bit	u (1)
}

Table 4 below shows a picture parameter set (PPS).

Table 4

pic_parameter_set_rbsp () {	Descriptor
nal_ref_idc	u (2)
zero_bit	u (1)
seq_parameter_set_id	u (5)
}

The high level syntax may be added to the bitstream and transmitted from the encoder to the decoder. The decoder may decode the information included in the high level syntax from the transmitted bitstream at the same level at which the encoder encoded it. Using this information, the decoder can decode the bitstream by performing the inverse of the procedure performed by the encoder.

The above description is merely illustrative of the technical idea of the present invention, and those skilled in the art to which the present invention pertains may make various modifications and changes without departing from the essential characteristics of the present invention. Therefore, the embodiments disclosed in the present invention are not intended to limit the technical idea of the present invention but to describe the present invention, and the scope of the technical idea of the present invention is not limited thereto. The protection scope of the present invention should be interpreted by the claims, and all technical ideas within the equivalent scope should be interpreted as being included in the scope of the present invention.

Claims

A multi-view video decoding method comprising:

receiving and entropy decoding quantization information for a current depth image; and

obtaining a quantized residual signal of the current depth image based on the quantization information,

wherein the quantization information includes flag information indicating whether spatial axis quantization (spatial quantization) is performed on the current depth image.
The method of claim 1,

wherein, when spatial axis quantization is performed on the current depth image, the quantization information includes difference value information for the quantized residual signal of the current depth image, and

the difference value for the quantized residual signal is a difference between the quantized residual signal of a current pixel in the current depth image and the quantized residual signal of a neighboring pixel located around the current pixel.
The method of claim 2,

wherein the obtaining of the quantized residual signal comprises:

predicting a residual signal from the neighboring pixel; and

adding the difference value for the quantized residual signal of the current depth image to the predicted residual signal.
The method of claim 3,

wherein the neighboring pixel is

an upper pixel located directly above the current pixel when the current pixel is located in the first column of the current depth image, and

a left pixel located to the left of the current pixel when the current pixel is located in a region other than the first column of the current depth image.
The method of claim 1,

wherein the flag information indicating whether the spatial axis quantization is performed is encoded and transmitted on a transform unit (TU) basis.
A multi-view video decoding apparatus comprising:

an entropy decoder configured to entropy decode quantization information for a current depth image; and

a dequantization unit configured to obtain a quantized residual signal of the current depth image based on the quantization information,

wherein the quantization information includes flag information indicating whether spatial axis quantization is performed on the current depth image.
The apparatus of claim 6,

wherein, when spatial axis quantization is performed on the current depth image, the quantization information includes difference value information for the quantized residual signal of the current depth image, and

the difference value for the quantized residual signal is a difference between the quantized residual signal of a current pixel in the current depth image and the quantized residual signal of a neighboring pixel located around the current pixel.
The apparatus of claim 7,

wherein the dequantization unit

predicts a residual signal from the neighboring pixel, and adds the difference value for the quantized residual signal of the current depth image to the predicted residual signal to obtain the quantized residual signal of the current depth image.
The apparatus of claim 8,

wherein the neighboring pixel is

an upper pixel located directly above the current pixel when the current pixel is located in the first column of the current depth image, and

a left pixel located to the left of the current pixel when the current pixel is located in a region other than the first column of the current depth image.
The apparatus of claim 6,

wherein the flag information indicating whether the spatial axis quantization is performed is encoded and transmitted on a transform unit (TU) basis.
A multi-view video decoding method comprising:

receiving and entropy decoding a bitstream; and

performing filtering using an anisotropic median filter on a current depth image reconstructed based on the entropy decoded signal.
The method of claim 11,

wherein the performing of the filtering using the anisotropic median filter comprises:

determining whether a current pixel area in the current depth image is an edge area;

if the current pixel area is an edge area, classifying pixels in the current pixel area into a plurality of groups based on the value of a filtering target pixel in the current pixel area; and

determining the filtering target pixel value based on a difference between the filtering target pixel value and each of the median values calculated from the classified plurality of groups,

wherein the filtering target pixel value is determined as one of the median values calculated from the classified plurality of groups.
The method of claim 12,

wherein the determining of whether the current pixel area is an edge area comprises

comparing, with a predetermined threshold, a value based on a difference between the pixel values in the current pixel area and a median value calculated from neighboring pixels located around the current pixel area.
The method of claim 12,

wherein the classifying into the plurality of groups comprises:

classifying pixels in the current pixel area having a value less than or equal to the filtering target pixel value into a first group; and

classifying pixels in the current pixel area having a value greater than or equal to the filtering target pixel value into a second group.
The method of claim 14,

wherein, in the determining of the filtering target pixel value,

one of a first median value calculated from the first group and a second median value calculated from the second group is determined as the filtering target pixel value.
The method of claim 11,

further comprising storing the current depth image filtered using the anisotropic median filter in an image buffer.
A multi-view video decoding apparatus comprising:

an entropy decoder configured to receive a bitstream and entropy decode the bitstream; and

a filter unit configured to perform filtering using an anisotropic median filter on a current depth image reconstructed based on the entropy decoded signal.