WO2012026734A2

WO2012026734A2 - Encoding/decoding apparatus and method using motion vector sharing of a color image and a depth image

Info

Publication number: WO2012026734A2
Application number: PCT/KR2011/006205
Authority: WO
Inventors: 오병태; 박두식; 위호천
Original assignee: Samsung Electronics Co., Ltd. (삼성전자 주식회사)
Priority date: 2010-08-24
Filing date: 2011-08-23
Publication date: 2012-03-01
Also published as: WO2012026734A3; KR20120018906A

Abstract

Disclosed are an encoding/decoding apparatus and method using motion vector sharing of a color image and a depth image. When a depth image is compressed, a motion vector of a color image and mode information are shared in order to reduce the amount of bits generated for a depth image and improve the resolution of a combined image.

Description

Encoding / Decoding Apparatus and Method Using Motion Vector Sharing of Color Image and Depth Image

One embodiment of the present invention relates to an apparatus and method for encoding / decoding for a 3D image, and more particularly, to an apparatus and method for sharing a motion vector of a color image when encoding and decoding a depth image.

Conventionally, when a 3D image is encoded / decoded, the depth image is treated as an image independent of the color image and is encoded / decoded using the conventional H.264 / MPEG-4 AVC compression method. In this case, bits are generated separately for the depth image and for the color image, which is a problem when bandwidth is limited.

Two points underlie this problem.

First, depth images exhibit different properties than color images. For example, the depth image includes more low-frequency components than the color image, and the flat regions form distinct outlines and thus include frequency components in the intermediate band. Because of this property, it is difficult to expect high compression efficiency when applying block DCT (Discrete Cosine Transform) or quantization-based H.264 / MPEG-4 AVC.

Second, since the color image and the depth image represent the same view at the same time, one in color and the other in depth, the correlation between the two is high. Therefore, compressing / transmitting the color image and the depth image independently of each other is quite inefficient.

In view of these points, a method for efficiently compressing and transmitting a depth image and a color image is required.

An encoding apparatus according to an embodiment of the present invention includes a data extractor configured to extract a motion vector and mode information of a color image; and a depth image encoder configured to encode a depth image by setting the extracted data as a motion vector and mode information of the depth image.

The encoding apparatus according to an embodiment of the present invention may further include an MVS determination unit that determines whether to apply motion vector sharing (MVS).

A decoding apparatus according to an embodiment of the present invention includes an MVS determination unit configured to determine whether to apply motion vector sharing; a data setting unit configured to set a motion vector and mode information of a color image as a motion vector and mode information of a depth image when motion vector sharing is applied; and a depth image decoder configured to decode the depth image using the motion vector and the mode information of the depth image.

An encoding method according to an embodiment of the present invention includes extracting a motion vector and mode information of a color image; and encoding a depth image by setting the extracted data as a motion vector and mode information of the depth image.

The encoding method according to an embodiment of the present invention may further include determining whether to apply Motion Vector Sharing (MVS).

A decoding method according to an embodiment of the present invention includes determining whether to apply motion vector sharing; when motion vector sharing is applied, setting a motion vector and mode information of a color image as a motion vector and mode information of a depth image; and decoding the depth image using the motion vector and the mode information of the depth image.

According to an embodiment of the present invention, by sharing the mode information and the motion vector of the color image when encoding the depth image, the amount of bits generated when compressing and transmitting the depth image may be reduced.

According to an embodiment of the present invention, whether the motion vector is shared or not is expressed as a flag or a threshold and transmitted to the decoding device, so that the decoding device may easily determine whether to apply the motion vector sharing.

According to an embodiment of the present invention, the compression efficiency of the depth image may be improved by allocating a higher quantization parameter to a block to which motion vector sharing is applied than to a block to which motion vector sharing is not applied.

FIG. 1 is a diagram illustrating an encoding apparatus and a decoding apparatus according to an embodiment of the present invention.

FIG. 2 is a block diagram illustrating a detailed configuration of an encoding apparatus according to an embodiment of the present invention.

FIG. 3 is a block diagram illustrating a detailed configuration of a decoding apparatus according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating mode information on a depth image according to an embodiment of the present invention.

FIG. 5 illustrates an example of applying a quantization parameter differently for each block according to an embodiment of the present invention.

FIG. 6 is a flowchart illustrating an encoding method according to an embodiment of the present invention.

FIG. 7 is a flowchart illustrating a decoding method according to an embodiment of the present invention.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Referring to FIG. 1, the encoding apparatus 101 may encode a depth image constituting a 3D image. The encoded depth image may be transmitted to the decoding apparatus 102 in the form of a bitstream.

In this case, when encoding the depth image, the encoding apparatus 101 may reduce the amount of bits generated for the depth image by sharing the motion vector and mode information of the color image. In detail, the encoding apparatus 101 may determine whether to share the motion vector and mode information of the color image. In addition, when encoding the depth image, the encoding apparatus 101 may process the residual information of the depth image differently for each block.

Then, the decoding apparatus 102 may decode the depth image using information, transmitted from the encoding apparatus 101, that indicates whether the motion vector and mode information are shared.

Referring to FIG. 2, the encoding apparatus 201 may include an MVS determination unit 202, a data extractor 203, and a depth image encoder 204.

The MVS determination unit 202 may determine whether to apply motion vector sharing (MVS) to the depth image. That is, when encoding the depth image, the MVS determination unit 202 may determine whether to apply motion vector sharing, which means sharing mode information and a motion vector of the color image with the depth image.

For example, the MVS determination unit 202 may determine whether to apply motion vector sharing (MVS) using a rate-distortion cost (RD cost) based on the rate of the depth image and the distortion of the synthesized image at an intermediate view. The MVS determination unit 202 may determine whether to apply motion vector sharing for each block by evaluating the rate-distortion cost per block: it may calculate the rate-distortion cost of each block both with and without motion vector sharing, and select whichever case has the lower cost.

In this case, the MVS determination unit 202 may calculate the distortion of the synthesized image at the intermediate view using at least one of a global disparity and a warping parameter. Alternatively, the MVS determination unit 202 may predict the distortion of the synthesized image at the intermediate view using the offset of the color image.
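The per-block decision described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the Lagrangian form J = D + lambda * R, the weight `lam`, and the names `Candidate`, `rd_cost`, and `use_mvs` are assumptions introduced here, and the rate and distortion values are taken as already measured.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One way of coding a depth block (hypothetical container)."""
    rate_bits: float         # bits spent on the depth block
    synth_distortion: float  # distortion of the synthesized intermediate view


def rd_cost(c: Candidate, lam: float) -> float:
    # Assumed Lagrangian cost: J = D + lambda * R.
    return c.synth_distortion + lam * c.rate_bits


def use_mvs(with_mvs: Candidate, without_mvs: Candidate, lam: float) -> bool:
    """Return True when coding the block with MVS gives the lower RD cost."""
    return rd_cost(with_mvs, lam) < rd_cost(without_mvs, lam)


def decide_blocks(pairs, lam):
    """Decide MVS on or off for each (with_mvs, without_mvs) pair of a frame."""
    return [use_mvs(a, b, lam) for a, b in pairs]
```

With a larger `lam`, bits weigh more heavily, so blocks that save many bits through sharing tend to be coded with MVS even at a slightly higher synthesis distortion.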

Through this process, the MVS determination unit 202 may express, as a flag, information indicating whether motion vector sharing is applied. For example, if the flag is 0, motion vector sharing is applied, and if the flag is 1, motion vector sharing is not applied. Such a flag may be set for each block constituting the depth image. The decoding apparatus may determine, through the flag, whether motion vector sharing is applied to the block to be decoded.

Because the flag is set for each block, the flags themselves may add overhead. To avoid this, the MVS determination unit 202 may, for at least one of the residual information of the color image, the flatness of the color image, and the flatness of the depth image, calculate a rate-distortion cost based on the rate of the depth image and the distortion of the synthesized image at the intermediate view, and encode a threshold that yields the optimal rate-distortion cost. Here, the residual information of the color image, the flatness of the color image, and the flatness of the depth image are information that is also available to the decoding apparatus. The MVS determination unit 202 may calculate the rate-distortion cost for each of the residual information of the color image, the flatness of the color image, and the flatness of the depth image, and transmit only the threshold value that yields the optimal rate-distortion cost to the decoding apparatus.

The depth image encoder 204 may encode the depth image by setting the motion vector and the mode information of the color image as the motion vector and the mode information of the depth image. For example, the depth image encoder 204 may encode the residual information of the depth image with a quantization parameter (QP) higher than a preset quantization parameter, or may skip (SKIP) the encoding of the residual information. Here, the residual information of the depth image is the value that remains after motion compensation based on the motion vector.

In other words, since the residual information of a block to which motion vector sharing is applied is of low importance, it is efficient to increase the quantization parameter and compress / transmit the block with only a few bits. In this case, the depth image encoder 204 may apply the same quantization parameter to the entire image, or may apply a different quantization parameter per frame, per group of pictures (GOP), or per block.
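The residual handling described above can be sketched as a per-block plan. This is only an illustration under stated assumptions: the function name `residual_coding_plan`, the offset of 6, and the `SKIP` marker are hypothetical, and the cap of 51 reflects the usual H.264 QP range for 8-bit video.

```python
SKIP = "SKIP"  # hypothetical marker: residual of the block is not coded


def residual_coding_plan(mvs_flags, base_qp, qp_offset=6,
                         skip_residual=False, max_qp=51):
    """Return, per block, the QP to use for the residual or SKIP.

    mvs_flags[i] is True when motion vector sharing is applied to block i.
    Blocks without MVS keep the preset QP; blocks with MVS either get a
    higher QP (assumed offset) or have their residual skipped.
    """
    plan = []
    for applied in mvs_flags:
        if not applied:
            plan.append(base_qp)
        elif skip_residual:
            plan.append(SKIP)
        else:
            plan.append(min(base_qp + qp_offset, max_qp))
    return plan
```

Passing a different `qp_offset` per frame, GOP, or block would correspond to the per-unit variation mentioned above.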

The depth image encoder 204 may transmit the encoded depth image to the decoding apparatus through a bitstream.

Referring to FIG. 3, the decoding apparatus 301 may include an MVS determination unit 302, a data setting unit 303, and a depth image decoder 304.

The MVS determination unit 302 may determine whether to apply motion vector sharing. For example, the MVS determination unit 302 may determine whether to apply motion vector sharing based on the flag. As another example, the MVS determination unit 302 may determine whether to apply motion vector sharing based on one of the residual information of the color image, the flatness of the color image, and the flatness of the depth image, together with a threshold value indicating the optimal rate-distortion cost.

That is, the encoding apparatus may decide on motion vector sharing, in which the depth image shares the motion vector and mode information of the color image, and transmit notification information about motion vector sharing, such as a flag or a threshold, to the decoding apparatus. The decoding apparatus may then determine whether to apply motion vector sharing based on this notification information.

When the motion vector sharing is applied, the data setting unit 303 may set the motion vector and the mode information of the color image as the motion vector and the mode information of the depth image.

The depth image decoder 304 may decode the depth image using the motion vector and the mode information of the depth image. If the motion vector sharing is not applied, the depth image decoder 304 may decode the depth image by determining an optimal motion vector and mode information in the depth image itself.
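The decoder-side selection of motion information can be sketched as follows. The names `MotionInfo`, `mvs_applied_from_flag`, and `motion_info_for_depth_block` are hypothetical, and the depth block's own motion information is assumed here to be already available (for instance, parsed from the depth bitstream); only the flag convention (0 means sharing is applied) comes from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MotionInfo:
    mv: tuple   # motion vector as (dx, dy)
    mode: str   # mode information, e.g. a partition label


def mvs_applied_from_flag(flag: int) -> bool:
    # Convention from the description: flag 0 = MVS applied, 1 = not applied.
    return flag == 0


def motion_info_for_depth_block(mvs_applied: bool,
                                color_info: MotionInfo,
                                own_info: Optional[MotionInfo]) -> MotionInfo:
    """Pick the motion vector and mode used to decode one depth block."""
    if mvs_applied:
        # Data setting step: reuse the color image's motion vector and mode.
        return color_info
    if own_info is None:
        raise ValueError("a non-MVS block needs its own motion information")
    return own_info
```

For a block whose flag is 0, the depth decoder would then run motion compensation with the color block's vector and mode without reading any motion data for the depth block.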

As shown in FIG. 4, any one of intra prediction, motion estimation / motion compensation, or motion vector sharing (MVS) may be determined as mode information on the depth image when encoding the depth image.

Intra prediction and motion estimation / motion compensation mean that encoding is performed by finding an optimal motion vector and mode information in the depth image itself. In addition, motion vector sharing means setting the motion vector and the mode information found in the color image as the motion vector and the mode information of the depth image. Due to the sharing of motion vectors, the amount of bits generated in the depth image can be reduced.

The encoding apparatus may determine whether to apply motion vector sharing for each block. According to one embodiment of the invention, the rate distortion cost may be applied to determine whether to apply motion vector sharing. In the 3D image, unlike the conventional 2D image compression, when defining the distortion of the depth image, it is necessary to determine how accurately the composite image representing the intermediate view is generated rather than the depth image itself. This is because the depth image is a kind of additional information for generating a composite image of an intermediate view.

Therefore, the encoding apparatus may determine, through the rate-distortion cost, whether or not to apply motion vector sharing to each block constituting the depth image. The rate-distortion cost referred to in the present invention may be calculated based on the rate of the depth image and the distortion of the synthesized image at the intermediate view. In effect, the distortion of the depth image is expressed as the distortion of the synthesized image at the intermediate view.
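As an illustration only, one common way to write such a cost is the Lagrangian form below. This formula does not appear in the original text: the multiplier \lambda and the symbols J_b, D_{\text{synth},b}, and R_{\text{depth},b} (cost, synthesized-view distortion, and depth bit rate of block b) are notation assumed here.

```latex
J_b(m) = D_{\text{synth},b}(m) + \lambda \, R_{\text{depth},b}(m),
\qquad m \in \{\text{MVS}, \text{non-MVS}\},
\qquad m_b^{*} = \arg\min_{m} J_b(m)
```

Under this reading, the encoder evaluates J_b for both options of each block and keeps the option with the smaller value.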

In this case, the distortion of the composite image of the intermediate view may be actually calculated using at least one of global disparity or warping parameters.

Alternatively, the distortion of the synthesized image at the intermediate view may be predicted using the offset of the color image. For example, assuming that the depth image and the color image are acquired with parallel cameras, when the error of the depth image caused by compression or by motion vector sharing is ΔD, the prediction error (Dc) of the synthesized image at the intermediate view is given by Equation 1 below.

Through this process, the encoding apparatus may calculate the rate distortion cost when the motion vector sharing is applied to each block and when the motion vector sharing is not applied, thereby selecting a case in which the rate distortion cost is small.

Then, the encoding apparatus may signal to the decoding apparatus, through a flag, whether motion vector sharing is applied to each block.

Alternatively, in order to reduce overhead due to a flag allocated for each block, the encoding apparatus may predict a bit value of the flag and transmit it to the decoding apparatus. Specifically, the encoding apparatus may reduce overhead caused by the bit value of the flag by selectively determining whether motion vector sharing is applied based on some information that can be obtained by the decoding apparatus. Examples of information that can be used in the decoding apparatus may include residual information of a color image, flatness of a color image, or flatness of a depth image.

The encoding apparatus calculates a rate-distortion cost for each kind of information available to the decoding apparatus, determines the threshold value that yields the optimal rate-distortion cost, and transmits only that threshold value instead of a flag for each block, thereby reducing the overhead caused by the flag bits. The threshold value may be changed in units of blocks, frames, or groups of pictures (GOP).
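The threshold approach can be sketched as a search over candidate thresholds on one decoder-available feature. This is a hedged illustration: the rule "apply MVS when the feature is at or below the threshold", the exhaustive search, and the names `total_cost`, `best_threshold`, and `mvs_applied` are assumptions, and the per-block costs are taken as already computed.

```python
def total_cost(threshold, features, cost_mvs, cost_no_mvs):
    """Sum of block RD costs when MVS is applied wherever feature <= threshold."""
    return sum(
        c_mvs if f <= threshold else c_no
        for f, c_mvs, c_no in zip(features, cost_mvs, cost_no_mvs)
    )


def best_threshold(features, cost_mvs, cost_no_mvs):
    """Encoder side: pick the threshold with the lowest total RD cost.

    features[i] is a value both sides can compute for block i (for example
    a flatness measure); cost_mvs[i] and cost_no_mvs[i] are the block's RD
    costs with and without MVS.
    """
    if not features:
        raise ValueError("at least one block is required")
    # The first candidate lies below every feature: no block uses MVS.
    candidates = [min(features) - 1] + sorted(set(features))
    return min(candidates,
               key=lambda t: total_cost(t, features, cost_mvs, cost_no_mvs))


def mvs_applied(feature, threshold):
    """Decoder side: rebuild the per-block decision without any flag."""
    return feature <= threshold
```

Only the single threshold is transmitted; the decoder recomputes the feature for each block and compares it with that threshold.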

In blocks to which motion vector sharing is applied, the residual information is less important than in blocks to which motion vector sharing is not applied. Accordingly, as shown in FIG. 5, the encoding apparatus may assign a higher quantization parameter to a block to which motion vector sharing is applied, lowering the quality of that block. Alternatively, the encoding apparatus may skip the encoding of the residual information altogether.

In FIG. 5, QP denotes a quantization parameter. A block labeled with the base QP is a block to which motion vector sharing is not applied, and a block labeled with a higher QP or with SKIP is a block to which motion vector sharing is applied. As shown in FIG. 5, even among blocks to which motion vector sharing is applied, different quantization parameters may be applied to each block.

With a higher quantization parameter, the quantized values become smaller, so the depth image can be compressed / transmitted with fewer bits, which is advantageous when bandwidth is limited.
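The effect of a higher quantization parameter can be illustrated with the H.264 / MPEG-4 AVC relation in which the quantization step size roughly doubles for every increase of 6 in QP and is about 1.0 at QP 4. The sketch below uses that approximation with plain floating-point division; actual codecs use integer arithmetic, and the function names are hypothetical.

```python
def h264_qstep(qp: int) -> float:
    """Approximate H.264 quantization step: doubles every 6 QP, 1.0 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeff: float, qp: int) -> int:
    """Simplified scalar quantization of one transform coefficient."""
    return int(round(coeff / h264_qstep(qp)))
```

For example, a coefficient of 100 maps to level 6 at QP 28 and to level 3 at QP 34, so the block with the higher QP needs fewer bits for its residual.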

The quantization parameter assigned to blocks with motion vector sharing, which is higher than that of blocks without it, may be given the same value across the entire depth image. Alternatively, different values may be assigned per frame, per GOP, or per block.

Referring to FIG. 6, the encoding apparatus may determine whether to apply motion vector sharing (S601).

For example, the encoding apparatus may determine whether to apply motion vector sharing using a rate-distortion cost based on the rate of the depth image and the distortion of the synthesized image at the intermediate view. In this case, the encoding apparatus may calculate the distortion of the synthesized image at the intermediate view using at least one of a global disparity and a warping parameter. Alternatively, the encoding apparatus may predict the distortion of the synthesized image at the intermediate view using the offset of the color image.

Thereafter, the encoding apparatus may encode, as a flag, whether to apply motion vector sharing. Alternatively, for at least one of the residual information of the color image, the flatness of the color image, and the flatness of the depth image, the encoding apparatus may calculate a rate-distortion cost based on the rate of the depth image and the distortion of the synthesized image at the intermediate view, and encode a threshold value that indicates the optimal rate-distortion cost.

The encoding apparatus may extract the motion vector and the mode information of the color image (S602).

Thereafter, the encoding apparatus may encode the depth image by setting the motion vector and the mode information of the color image as the motion vector and the mode information of the depth image (S603). The encoding apparatus may encode the depth image by finding an optimal motion vector and mode information in the depth image itself for a block to which motion vector sharing is not applied.

Referring to FIG. 7, the decoding apparatus may determine whether to apply motion vector sharing (S701). For example, the decoding apparatus may determine whether to apply motion vector sharing based on a flag transmitted by the encoding apparatus. Alternatively, the decoding apparatus may determine whether to apply motion vector sharing based on one of the residual information of the color image, the flatness of the color image, and the flatness of the depth image, together with a threshold value indicating the optimal rate-distortion cost.

If motion vector sharing is applied, the decoding apparatus may set the motion vector and the mode information of the color image as the motion vector and the mode information of the depth image (S702).

Then, the decoding apparatus may decode the depth image using the motion vector and the mode information of the depth image (S703).

Methods according to an embodiment of the present invention can be implemented in the form of program instructions that can be executed by various computer means and recorded in a computer readable medium. The computer readable medium may include program instructions, data files, data structures, etc. alone or in combination. Program instructions recorded on the media may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well-known and available to those having skill in the computer software arts.

As described above, the present invention has been described by way of limited embodiments and drawings, but the present invention is not limited to the above embodiments, and those skilled in the art to which the present invention pertains may make various modifications and variations from these descriptions.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined not only by the claims below but also by the equivalents of the claims.

Claims

1. An encoding apparatus comprising:

a data extraction unit configured to extract a motion vector and mode information of a color image; and

a depth image encoder configured to encode a depth image by setting the extracted data as a motion vector and mode information of the depth image.

2. The encoding apparatus of claim 1, wherein the depth image encoder encodes residual information of the depth image with a quantization parameter higher than a preset quantization parameter, or skips (SKIP) encoding of the residual information.

3. The encoding apparatus of claim 2, wherein, when the residual information of the depth image is encoded with a quantization parameter higher than the preset quantization parameter, the depth image encoder applies the same quantization parameter to the entire image, or applies a different quantization parameter for each frame, group of pictures (GOP), or block.

4. The encoding apparatus of claim 1, further comprising an MVS determination unit configured to determine whether to apply motion vector sharing (MVS).

5. The encoding apparatus of claim 4, wherein the MVS determination unit determines whether to apply motion vector sharing using a rate-distortion cost based on the rate of the depth image and the distortion of a synthesized image at an intermediate view.

6. The encoding apparatus of claim 5, wherein the MVS determination unit calculates the distortion of the synthesized image at the intermediate view using at least one of a global disparity and a warping parameter.

7. The encoding apparatus of claim 5, wherein the MVS determination unit predicts the distortion of the synthesized image at the intermediate view using an offset of the color image.

8. The encoding apparatus of claim 4, wherein the MVS determination unit encodes, as a flag, whether to apply motion vector sharing.

9. The encoding apparatus of claim 4, wherein the MVS determination unit, for at least one of residual information of the color image, flatness of the color image, and flatness of the depth image, calculates a rate-distortion cost based on the rate of the depth image and the distortion of a synthesized image at an intermediate view, and encodes a threshold indicating the optimal rate-distortion cost.

10. A decoding apparatus comprising:

an MVS determination unit configured to determine whether to apply motion vector sharing;

a data setting unit configured to set a motion vector and mode information of a color image as a motion vector and mode information of a depth image when motion vector sharing is applied; and

a depth image decoder configured to decode the depth image using the motion vector and the mode information of the depth image.

11. The decoding apparatus of claim 10, wherein the MVS determination unit determines whether to apply motion vector sharing based on a flag.

12. The decoding apparatus of claim 10, wherein the MVS determination unit determines whether to apply motion vector sharing based on one of residual information of the color image, flatness of the color image, and flatness of the depth image, together with a threshold value indicating an optimal rate-distortion cost.

13. An encoding method comprising:

extracting a motion vector and mode information of a color image; and

encoding a depth image by setting the extracted data as a motion vector and mode information of the depth image.

14. The encoding method of claim 13, wherein the encoding of the depth image comprises encoding residual information of the depth image with a quantization parameter higher than a preset quantization parameter, or skipping (SKIP) encoding of the residual information.

15. The encoding method of claim 14, wherein the encoding of the depth image comprises, when the residual information of the depth image is encoded with a quantization parameter higher than the preset quantization parameter, applying the same quantization parameter to the entire image, or applying a different quantization parameter for each frame, group of pictures (GOP), or block.

16. The encoding method of claim 13, further comprising determining whether to apply motion vector sharing (MVS).

17. The encoding method of claim 16, wherein the determining of whether to apply motion vector sharing comprises determining whether to apply motion vector sharing using a rate-distortion cost based on the rate of the depth image and the distortion of a synthesized image at an intermediate view.

18. The encoding method of claim 17, wherein the determining of whether to apply motion vector sharing comprises calculating the distortion of the synthesized image at the intermediate view using at least one of a global disparity and a warping parameter.

19. The encoding method of claim 17, wherein the determining of whether to apply motion vector sharing comprises predicting the distortion of the synthesized image at the intermediate view using an offset of the color image.

20. The encoding method of claim 16, wherein the determining of whether to apply motion vector sharing comprises encoding, as a flag, whether to apply motion vector sharing.

21. The encoding method of claim 16, wherein the determining of whether to apply motion vector sharing comprises, for at least one of residual information of the color image, flatness of the color image, and flatness of the depth image, calculating a rate-distortion cost based on the rate of the depth image and the distortion of a synthesized image at an intermediate view, and encoding a threshold indicating the optimal rate-distortion cost.

22. A decoding method comprising:

determining whether to apply motion vector sharing;

when motion vector sharing is applied, setting a motion vector and mode information of a color image as a motion vector and mode information of a depth image; and

decoding the depth image using the motion vector and the mode information of the depth image.

23. The decoding method of claim 22, wherein the determining of whether to apply motion vector sharing comprises determining whether to apply motion vector sharing based on a flag.

24. The decoding method of claim 22, wherein the determining of whether to apply motion vector sharing comprises determining whether to apply motion vector sharing based on one of residual information of the color image, flatness of the color image, and flatness of the depth image, together with a threshold value indicating an optimal rate-distortion cost.

25. A computer-readable recording medium storing a program for performing the method of any one of claims 13 to 24.