JP2014112749A

JP2014112749A - Image coding device and image decoding device

Info

Publication number: JP2014112749A
Application number: JP2011060980A
Authority: JP
Inventors: Sumio Sato; 純生佐藤
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2011-03-18
Filing date: 2011-03-18
Publication date: 2014-06-19
Also published as: WO2012128211A1

Abstract

PROBLEM TO BE SOLVED: To provide a coding device and a decoding device for coding and decoding a distance image. SOLUTION: A coding device divides a distance image into a plurality of blocks and codes it by performing intra prediction on the basis of characteristics of adjacent blocks. The coding device comprises: selection means for selecting, from a plurality of prediction modes, a prediction mode to be applied to each block of the distance image; first determination means for determining whether an adjacent coded block includes a plurality of depth values; second determination means for determining whether a block determined by the first determination means to include a plurality of depth values has a prediction mode corresponding to a direction toward the block to be coded; prediction means for setting the prediction mode of a block determined by the second determination means to have such a prediction mode as the predicted value of the prediction mode of the block to be coded; and coding means for coding the block to be coded using the predicted value of the prediction mode and transmitting the coded block. A decoding device performs decoding on the basis of the coding scheme used in the coding device.

Description

The present invention relates to an image encoding device and an image decoding device.

Recording the three-dimensional shape of a subject accurately and efficiently is an important topic, and various methods have been proposed. One such method records two kinds of image data in association with each other: a texture image, which is an ordinary two-dimensional image representing the subject space by the colors of each subject and the background, and an image representing the subject space by the distance from the viewpoint to each subject and the background (hereinafter called a "distance image"). A distance image expresses, for each pixel, the distance value (depth value) from the viewpoint to the corresponding point in the subject space. A distance image can be acquired, for example, by a ranging device such as a depth camera installed near the camera that records the texture image. Alternatively, a distance image can be obtained by analyzing multiple texture images captured with a multi-view camera, and many such analysis methods have been proposed.

As a standard for distance images, the Moving Picture Experts Group (MPEG), a working group of the International Organization for Standardization / International Electrotechnical Commission (ISO/IEC), has defined MPEG-C Part 3, which expresses distance values in 256 levels (8-bit luminance values); a standard distance image is therefore an 8-bit grayscale image. Because the standard assigns higher luminance values to shorter distances from the viewpoint, nearer subjects appear whiter and farther subjects appear blacker in a standard distance image. A characteristic of distance images is that, compared with texture images, a single pixel value tends to occupy a wider area. For example, even if the texture image shows a person wearing clothes with a loud pattern, the distance values over the clothing are almost constant in the distance image.

If a texture image and a distance image representing the same subject space are available, the distance from the viewpoint of each pixel making up the subject image in the texture image is known from the distance image, so the subject can be reconstructed as a three-dimensional shape whose depth is expressed in up to 256 levels. Furthermore, by geometrically projecting the three-dimensional shape onto a two-dimensional plane, the original texture image can be converted into the texture image of the subject space that would be obtained by photographing the subject from another angle within a certain range of the original angle. In other words, because one pair of a texture image and a distance image allows the three-dimensional shape to be reconstructed as seen from any angle within a certain range, using multiple such pairs makes it possible to represent free-viewpoint images of a three-dimensional shape with a small amount of data.

Techniques are known that compress and encode video by efficiently removing the temporal or spatial redundancy inherent in the video, as in the video compression standard H.264 (for example, Non-Patent Document 1). When an encoding device using this technique encodes a texture video (a video whose frames are texture images) and a distance video (a video whose frames are distance images), the redundancy in each video can be removed, and the amount of data transmitted to the decoding device for each video can be reduced further.

The H.264 standard compresses information using a method called intra prediction coding. In intra prediction coding, one image to be encoded is divided into square blocks and, when the blocks are encoded in raster-scan order for example, the block to be encoded is predicted in advance from pixels in the already-encoded blocks surrounding it. Applying an orthogonal transform to the difference signal obtained by subtracting this prediction block from the block to be encoded concentrates the energy of the resulting frequency spectrum in the low-order region more than applying the orthogonal transform to the block directly, so information can be compressed efficiently.

Intra prediction coding of the luminance signal can be performed in units of 4 × 4 pixel sub-blocks or 16 × 16 pixel macroblocks. There are nine prediction modes for sub-blocks and four for macroblocks. For the color-difference signals, there are four prediction modes for 8 × 8 pixel blocks, the same as for luminance macroblocks.

FIGS. 19 and 20 schematically show the nine prediction modes for sub-blocks. The 4 × 4 pixel sub-block B1 to be encoded, shown in FIG. 19, is predicted from the surrounding pixels A to M. FIG. 20 shows the directions in which those pixels are used. For example, in mode 1 the copying direction is horizontal, from left to right, so the prediction block is formed by repeatedly copying pixels I, J, K, and L to the right. Mode 2, called DC mode, does not copy pixels in a specified direction; instead, it forms the prediction block from the average of the eight pixels A to D and I to L. In modes 3 to 8, the prediction block is formed by repeatedly copying pixels in the directions of the arrows shown in FIG. 20.
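The two modes spelled out above can be sketched in Python as follows. This is a minimal illustration, not code from the patent or from the H.264 reference software: the function names and sample pixel values are hypothetical, and the rounding used for the DC mean is an assumption.

```python
def predict_horizontal(left):
    """Mode 1: copy each left neighbour (I, J, K, L) across its row."""
    return [[pixel] * 4 for pixel in left]


def predict_dc(above, left):
    """Mode 2 (DC): fill the block with the mean of A-D and I-L."""
    total = sum(above) + sum(left)
    dc = (total + 4) // 8  # rounded integer mean of eight pixels (assumed rounding)
    return [[dc] * 4 for _ in range(4)]


# Hypothetical neighbours of a 4 x 4 sub-block that straddles a depth edge.
above = [100, 100, 100, 100]  # pixels A, B, C, D
left = [100, 100, 20, 20]     # pixels I, J, K, L

horizontal = predict_horizontal(left)
dc = predict_dc(above, left)
```

With these sample values the horizontal prediction reproduces the two depth levels 100 and 20 row by row, whereas the DC prediction fills the block with 80, a value that occurs in neither neighbour.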

FIGS. 21 and 22 likewise show the four prediction modes for macroblocks. The macroblock to be encoded is predicted from the surrounding pixels 00 to 0F and 10 to 1F. As shown in FIG. 22, there are only two prediction directions, vertical (mode 0) and horizontal (mode 1); the other two modes are the DC mode described above (mode 2) and Plane mode (mode 3). Plane mode obtains the prediction block by interpolating between the pixel groups so that they connect smoothly. The four prediction modes for the color-difference signals have the same content and differ only in the number of surrounding pixels.

When the prediction mode of a sub-block is encoded, the lower-numbered of the prediction modes of the blocks adjacent to the left of and above the block to be encoded is taken as the predicted value of that block's prediction mode. If the block's actual prediction mode equals this predicted value, encoding of the prediction mode number is omitted, which further improves the compression ratio.
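A minimal Python sketch of this H.264-style mode signalling is given below. The function name and return format are hypothetical. The 3-bit remainder for the non-matching case reflects how H.264 signals the remaining eight modes; that detail is not stated in this document and is included here as an assumption.

```python
def code_mode_h264_style(mode, left_mode, up_mode):
    """Return (flag, remainder) for a 4 x 4 mode given the neighbours' modes."""
    predicted = min(left_mode, up_mode)  # lower mode number is the prediction
    if mode == predicted:
        return (1, None)  # the mode number itself is not sent
    # With the predicted mode excluded, the other eight modes fit in 3 bits
    # (assumed detail, not taken from this document).
    remainder = mode if mode < predicted else mode - 1
    return (0, remainder)


same = code_mode_h264_style(1, left_mode=1, up_mode=4)
different = code_mode_h264_style(5, left_mode=1, up_mode=4)
```

Note that the mode number is saved only when the actual mode is identical to the prediction; a mode that is merely close in direction costs as much as any other.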

Non-Patent Document 1: "ITU-T Recommendation H.264", International Telecommunication Union - Telecommunication Standardization Sector, March 2009

Because a distance image represents the distance to the subject, a contiguous region of the same depth value is generally much larger than a contiguous region of the same pixel value in a texture image. Except at subject contours, depth values in a distance image rarely change abruptly from pixel to pixel; that is, adjacent blocks are very likely to have the same depth value. As a result, the correlation between blocks is high over a wide range, and in particular the same depth value is likely to continue. Furthermore, the contour of a subject is continuous unless it overlaps another subject, so blocks along a single contour line have highly correlated intra prediction directions. In addition, distance images tend to have a simpler composition than texture images, so the correlation between blocks can be expected to be very high not only for sub-blocks but also for larger units such as macroblocks.

However, when the H.264 standard is applied to distance video, information compression by the intra prediction described above becomes inefficient. For distance images with the characteristics described above, the intra prediction method includes modes, such as DC prediction and Plane prediction, that are not very effective, and this wastes compression efficiency. The reason is that the same depth value is likely to continue across blocks over a wide range in a distance image, whereas DC prediction and Plane prediction generate values intermediate between the actual depth values and so are unsuited to accurate prediction of distance images. Furthermore, although prediction directions are highly correlated between adjacent blocks, bits can be omitted only when the modes are identical, so this correlation is not fully exploited. Moreover, macroblocks have only four modes, and only two of them, those other than DC mode and Plane mode, are directional, which makes them ill-suited to simple images such as distance images.

The present invention has been made in view of these circumstances. Its object is to provide an image encoding device that can reduce the code amount of encoded distance-image data compared with the prior art, and a decoding device that decodes a distance image from the encoded data supplied by this image encoding device.

The present invention is an image encoding device that divides a distance image into blocks and encodes it by performing intra prediction based on the characteristics of adjacent blocks, the device comprising: selection means for selecting, from among prediction modes, the prediction mode to be applied to each block of the distance image; first determination means for determining whether an adjacent encoded block includes a plurality of depth values; second determination means for determining whether a block determined by the first determination means to include a plurality of depth values has a prediction mode corresponding to a direction toward the block to be encoded; prediction means for taking the prediction mode of a block determined by the second determination means to have such a mode as the predicted value of the prediction mode of the block to be encoded; and encoding means for encoding and transmitting the block to be encoded using the predicted value of the prediction mode.

In the present invention, the adjacent encoded blocks are the blocks adjacent above and to the left, and if the predicted value cannot be obtained from either of them, the blocks adjacent diagonally above-left and above-right are used.

In the present invention, the prediction modes consist only of prediction modes corresponding to eight directions.

In the present invention, when there are two blocks from which the predicted value can be obtained, the prediction mode corresponding to the direction midway between their prediction directions is used as the predicted value.

In the present invention, the selected mode is encoded by encoding the difference between its direction and the prediction direction of the predicted value.

In the present invention, the block to be encoded is 4 × 4 pixels, 8 × 8 pixels, or 16 × 16 pixels, or a combination of these.

The present invention is an image decoding device that decodes a distance image encoded by the image encoding device according to any one of claims 1 to 6, the device comprising: first determination means for determining, for each block of the distance image, whether each of a plurality of adjacent decoded blocks includes a plurality of depth values; second determination means for determining whether a block determined by the first determination means to include a plurality of depth values has a prediction mode corresponding to a direction toward the block; prediction means for taking the prediction mode of a block determined by the second determination means to have such a mode as the predicted value of the prediction mode of the block; and decoding means for decoding the prediction mode of a received encoded block using the predicted value.

The present invention causes a computer to function as the image encoding device according to any one of claims 1 to 6.

The present invention causes a computer to function as the image decoding device according to claim 7.

The present invention is encoded data of a distance image, characterized in that: for each block of the image, one mode is selected from prediction modes consisting only of a plurality of prediction directions; it is determined whether each of a plurality of adjacent encoded blocks includes a plurality of depth values; it is determined whether a block determined to include a plurality of depth values has a prediction mode corresponding to a direction toward the block; the prediction mode of a block determined to have such a mode is taken as the predicted value of the prediction mode of the block; and the prediction mode is encoded using the predicted value.

According to the present invention, it is possible to realize an encoding device that can reduce the code amount of encoded distance-image data compared with the prior art, and a decoding device that decodes a distance image from the encoded data supplied by this encoding device.

FIG. 1 is a block diagram showing the configuration of one embodiment of the present invention.
FIG. 2 is an explanatory diagram showing the pixel group around a block to be encoded.
FIG. 3 is an explanatory diagram showing the pixel group around a block to be encoded.
FIG. 4 is an explanatory diagram showing the pixel group around a block to be encoded.
FIG. 5 is an explanatory diagram showing the prediction modes.
FIG. 6 is an explanatory diagram showing an example of a copying pattern.
FIG. 7 is an explanatory diagram showing an example of a copying pattern.
FIG. 8 is an explanatory diagram showing an example of a copying pattern.
FIG. 9 is an explanatory diagram showing an example of a copying pattern.
FIG. 10 is an explanatory diagram showing an example of a copying pattern.
FIG. 11 is an explanatory diagram showing an example of a copying pattern.
FIG. 12 is an explanatory diagram showing an example of a copying pattern.
FIG. 13 is an explanatory diagram showing an example of a copying pattern.
FIG. 14 is an explanatory diagram showing one arrow extracted from the group of arrows shown in FIG. 12.
FIG. 15 is an explanatory diagram showing the state after pixels have been copied.
FIG. 16 is a flowchart showing the processing operation of the image encoding device shown in FIG. 1.
FIG. 17 is an explanatory diagram showing the processing operation of the image encoding device shown in FIG. 1.
FIG. 18 is an explanatory diagram showing an example of codewords.
FIG. 19 is an explanatory diagram showing the pixel group around a block to be encoded.
FIG. 20 is an explanatory diagram schematically showing the nine prediction modes for sub-blocks.
FIG. 21 is an explanatory diagram showing the pixel group around a block to be encoded.
FIG. 22 is an explanatory diagram showing the four prediction modes for macroblocks in the same manner.

An image encoding device and an image decoding device according to one embodiment of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of this embodiment. In the figure, reference numeral 1 denotes an image encoding device that receives a distance image, divides it into blocks of a predetermined number of pixels, encodes each block, and transmits the encoded block data over a transmission path. Reference numeral 2 denotes an image decoding device that receives the encoded block data transmitted from the image encoding device over the transmission path, decodes it to restore the distance image, and outputs the restored distance image.

Next, the intra prediction coding process is described with reference to FIGS. 2 to 5. As in the H.264 standard, the surrounding pixels used for intra prediction of a luminance sub-block are the 13 pixels A to M, as shown in FIG. 2. The surrounding pixels used for intra prediction of a color-difference block are the 25 pixels A to Y, as shown in FIG. 3. The surrounding pixels used for intra prediction of a luminance macroblock are the 49 pixels 00 to 2F, 10 to 1F, and 30, as shown in FIG. 4. As shown in FIG. 5, the prediction modes are modes 0 to 7, which predict in eight directions.

FIGS. 6 to 13 show examples of the pixel copying patterns. In each of FIGS. 6 to 13, the 16 × 16 pixel block at the lower right is the block to be encoded, and the others are adjacent encoded blocks. Each grid square within a block represents one pixel, and each arrowed line indicates where a pixel is copied to. For example, in FIG. 6 the block to be encoded is created by copying the pixels in the bottom row of the encoded block adjacent above it. Specifically, every pixel in the n-th column from the left of the block to be encoded is a copy of the n-th pixel from the left in the bottom row of the block adjacent above. The same applies to the other figures. To explain the arrows further, FIG. 14 shows one arrow extracted from the group of arrows in FIG. 12. In this case, as shown in FIG. 15, the pixels filled in black are copies of the ninth pixel from the left in the bottom row of the block adjacent above.
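The vertical copying pattern described for FIG. 6 can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name and sample depth values; the other copying patterns of FIGS. 7 to 13 are not reproduced because their exact geometry is given only in the drawings.

```python
def predict_by_vertical_copy(bottom_row_above):
    """Column n of the predicted block repeats pixel n of the row above."""
    size = len(bottom_row_above)
    return [list(bottom_row_above) for _ in range(size)]


# Hypothetical bottom row of the upper neighbour: a depth edge between
# the eighth and ninth pixels.
bottom_row = [200] * 8 + [50] * 8
block = predict_by_vertical_copy(bottom_row)
```

Every row of the predicted 16 × 16 block equals the bottom row of the block above, so a vertical contour in the neighbour continues straight down through the predicted block.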

For sub-blocks, the way surrounding pixels are copied in each prediction mode is the same as in the H.264 standard. Thus, for both sub-blocks and macroblocks, DC and Plane prediction, which contribute little to coding efficiency for distance images, are not used; providing prediction modes in a variety of directions instead enables accurate prediction. The prediction mode is selected by computing, for each mode, the distortion over all pixels (the sum of squared differences) and choosing the mode with the smallest distortion.
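The selection rule above can be sketched in Python as follows. This is a minimal illustration: the function names and the 2 × 2 sample blocks are hypothetical, and breaking ties in favor of the lower mode number is an assumption, since the text does not say how ties are resolved.

```python
def ssd(block, prediction):
    """Sum of squared differences over all pixels of two equally sized blocks."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(block, prediction)
        for a, b in zip(row_a, row_b)
    )


def select_mode(block, predictions):
    """Return the mode number whose predicted block has the smallest SSD.

    `predictions` maps mode number -> predicted block. Ties go to the lower
    mode number (an assumption, not stated in the text).
    """
    return min(sorted(predictions), key=lambda mode: ssd(block, predictions[mode]))


# Hypothetical 2 x 2 example: a horizontal depth edge.
block = [[10, 10], [90, 90]]
predictions = {
    0: [[10, 10], [10, 10]],  # vertical copy of an upper row [10, 10]
    1: [[10, 10], [90, 90]],  # horizontal copy of a left column [10, 90]
}
best = select_mode(block, predictions)
```

Here the horizontal copy matches the block exactly, so mode 1 is selected.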

Next, the method by which the image encoding device 1 shown in FIG. 1 encodes the prediction mode is described. When the prediction mode is encoded, it is predicted from adjacent blocks, as in intra prediction of sub-blocks in the H.264 standard. The prediction process, however, differs from that of the H.264 standard. The prediction process is described with reference to FIG. 16.

First, among the encoded blocks adjacent above and to the left of the block to be encoded, the prediction modes of blocks that contain a contour are referenced. This is because a contour often continues out of a block that contains it, and because the direction changes along the contour, so the change in direction from the adjacent block is often small. Whether a block contains a contour is decided by whether that adjacent block includes a plurality of depth values, since an adjacent block that contains a contour always includes a plurality of depth values. Consequently, an adjacent block consisting of a single depth value is not used to predict the prediction mode.
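The contour test described above reduces to checking whether a block holds more than one distinct depth value. A minimal Python sketch, with a hypothetical function name and sample blocks:

```python
def has_multiple_depth_values(block):
    """True if the block contains more than one distinct depth value."""
    return len({value for row in block for value in row}) > 1


flat_block = [[128] * 4 for _ in range(4)]              # single depth value
edge_block = [[128, 128, 40, 40] for _ in range(4)]     # two depth values

flat_result = has_multiple_depth_values(flat_block)
edge_result = has_multiple_depth_values(edge_block)
```

Only `edge_block` would be considered as a source for predicting the prediction mode; `flat_block` would be skipped.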

Accordingly, it is first determined whether, among the encoded blocks adjacent above and to the left of the block to be encoded, there is an encoded block that includes a plurality of depth values (step S1). If there is, it is determined whether the contour that the block may contain extends toward the block to be encoded (step S2). Specifically, for the block adjacent on the left, the contour is judged to extend toward the block to be encoded when the prediction mode of that block is mode 1, 3, 4, 5, or 7. For the block adjacent above, it is judged to do so when the prediction mode is mode 0, 2, 3, 4, 5, or 6. If steps S1 and S2 find no such block, it is determined whether there is an encoded block that includes a plurality of depth values among the encoded blocks diagonally above-left and above-right of the block to be encoded (step S3). If there is none here either, the predicted value of the prediction mode is set to "none", and the prediction mode number shown in FIG. 5 is encoded as is (step S4).

On the other hand, if there is an encoded block that includes a plurality of depth values among the blocks diagonally above-left and above-right, it is determined whether the contour that the block may contain extends toward the block to be encoded (step S5). For the block adjacent diagonally above-left, this is judged to be the case when its prediction mode is mode 3. For the block adjacent diagonally above-right, it is judged to be the case when its prediction mode is mode 2 or mode 6.
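Steps S1 to S5 can be sketched in Python as a lookup over the neighboring blocks. This is a minimal illustration under stated assumptions: the function name, the position labels, and the dictionary format are hypothetical, and the sketch assumes that a diagonal block that fails step S5 leads to "no prediction" just as a failed step S3 does.

```python
# Modes judged to point toward the block to be encoded, per neighbour position
# (mode numbers as listed in the text for steps S2 and S5).
TOWARD_TARGET = {
    "left": {1, 3, 4, 5, 7},
    "up": {0, 2, 3, 4, 5, 6},
    "up_left": {3},
    "up_right": {2, 6},
}


def reference_modes(neighbours):
    """Return the modes of qualifying neighbours, or [] if there are none.

    `neighbours` maps a position to (prediction_mode, has_multiple_depths);
    positions that do not exist are simply omitted.
    """
    # The up/left pair is tried first (S1, S2), then the diagonal pair (S3, S5).
    for group in (("up", "left"), ("up_left", "up_right")):
        found = []
        for position in group:
            if position in neighbours:
                mode, multiple = neighbours[position]
                if multiple and mode in TOWARD_TARGET[position]:
                    found.append(mode)
        if found:
            return found
    return []  # corresponds to step S4: predicted value is "none"


from_up = reference_modes({"up": (0, True), "left": (0, True)})
from_diagonal = reference_modes(
    {"up": (1, True), "left": (2, True), "up_left": (3, True)}
)
none_found = reference_modes({"up": (0, False)})
```

In the first call only the upper block qualifies, because mode 0 is not in the left block's list. In the second call neither the upper nor the left block qualifies, so the diagonal blocks are consulted.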

Next, if steps S2 and S5 find a qualifying block, it is determined whether both blocks of the pair (above and left, or diagonally above-left and above-right) qualified (step S6). If both qualified, the direction midway between the directions of the two prediction modes is used as the reference (step S7). If the midpoint does not fall on a single direction, the lower mode number is adopted. If only one block qualified, the direction of that block's prediction mode is used as the reference (step S8).
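Steps S6 to S8 can be sketched in Python as follows. The angular ordering of the eight modes is defined only in FIG. 5, which is not reproduced here, so `ANGULAR_ORDER` below is an assumption: it is inferred from the mode lists given for steps S2 and S5 together with the directions of the H.264 directional modes. The function name is hypothetical, and "the lower mode number" is read as the lower of the two directions on either side of the midpoint, which is also an assumption.

```python
# Assumed angular order of modes 0-7 (NOT taken from FIG. 5): from
# horizontal-up, through horizontal, diagonal down-right and vertical,
# to diagonal down-left.
ANGULAR_ORDER = [7, 1, 5, 3, 4, 0, 6, 2]


def reference_direction(modes):
    """Combine the qualifying neighbours' modes into one reference mode."""
    if not modes:
        return None          # step S4: no predicted value
    if len(modes) == 1:
        return modes[0]      # step S8: use the single qualifying mode
    # Step S7: take the direction midway between the two modes.
    i, j = sorted(ANGULAR_ORDER.index(mode) for mode in modes)
    middle = (i + j) // 2
    if (i + j) % 2 == 0:
        return ANGULAR_ORDER[middle]
    # No unique midpoint: adopt the lower mode number of the two candidates.
    return min(ANGULAR_ORDER[middle], ANGULAR_ORDER[middle + 1])


single = reference_direction([4])
midway = reference_direction([1, 0])
tie = reference_direction([3, 4])
```

Under the assumed ordering, a horizontal neighbour (mode 1) and a vertical neighbour (mode 0) combine into the diagonal mode 3, and two adjacent directions such as modes 3 and 4 resolve to the lower number, mode 3.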

For example, when mode 4 shown in FIG. 5 is the reference, that direction is numbered 0. Of its two immediate neighbors, the one with the smaller mode number is numbered 1 and the one with the larger mode number is numbered 2; thereafter, numbers are assigned alternately outward about the reference direction. When no further directions remain on one side, numbering continues consecutively outward on the opposite side (see FIG. 17). Then, for example, an Exponential-Golomb codeword is assigned to each number as shown in FIG. 18. When the prediction direction predicted from the adjacent blocks is close to the prediction direction of the encoding target block in the large majority of cases, this method yields short codewords, so more efficient information compression can be expected. Alternatively, instead of such a codeword assignment, a 4-bit fixed-length codeword b0b1b2b3 may be prepared, with b0 assigned a flag indicating whether the prediction direction predicted from the adjacent blocks is the same as the prediction direction of the encoding target block; if they differ, the prediction mode number shown in FIG. 5 is encoded as it is using the 3 bits b1b2b3. Alternatively, as with the prediction modes of macroblocks for the luminance signal and blocks for the chrominance signal in the H.264 standard, a 3-bit fixed-length codeword b0b1b2 may be prepared and, for every block, the prediction mode number shown in FIG. 5 encoded as it is using those 3 bits.
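The renumbering around the reference direction and the variable-length codeword assignment can be sketched as below. Several points are assumptions rather than details taken from the figures: the angular ordering of the modes (FIG. 5), the exact alternation order after the first two neighbors (FIG. 17), and the use of order-0 Exponential-Golomb codes as in H.264 (FIG. 18 may define a different table). The names `rank_modes`, `exp_golomb`, and `encode_mode` are hypothetical.

```python
def rank_modes(ref, angular_order):
    """Map each mode number to a rank, 0 for the reference direction."""
    n = len(angular_order)
    idx = angular_order.index(ref)
    ranks = {ref: 0}
    left, right = idx - 1, idx + 1
    # The neighbor with the smaller mode number is numbered first.
    if left >= 0 and right < n:
        take_left = angular_order[left] < angular_order[right]
    else:
        take_left = left >= 0
    next_rank = 1
    while left >= 0 or right < n:
        # Alternate sides; once one side is exhausted, keep going on the other.
        use_left = (take_left and left >= 0) or right >= n
        if use_left:
            ranks[angular_order[left]] = next_rank
            left -= 1
        else:
            ranks[angular_order[right]] = next_rank
            right += 1
        next_rank += 1
        take_left = not use_left
    return ranks


def exp_golomb(k):
    """Order-0 Exponential-Golomb codeword for a non-negative integer k."""
    bits = bin(k + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def encode_mode(actual, ref, angular_order):
    """Codeword for the actual mode, given the predicted (reference) mode."""
    return exp_golomb(rank_modes(ref, angular_order)[actual])


order = list(range(8))        # assumed angular ordering, not taken from FIG. 5
print(encode_mode(4, 4, order))  # prints "1"
print(encode_mode(3, 4, order))  # prints "010"
print(encode_mode(0, 4, order))  # prints "0001000"
```

A block whose mode matches the prediction costs one bit, while the direction farthest from the reference costs seven, which is why the scheme pays off only when predicted and actual directions are usually close.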

When only one of the upper and left adjacent blocks exists, as when the block to be encoded lies in the top row or the leftmost column, the nonexistent block naturally cannot be referenced. In such a case, the prediction process described above is performed using only the blocks that can be referenced. That is, steps S6 and S7 in FIG. 16 are omitted and step S8 takes their place, so the process ends in either step S4 or step S8.

Next, the processing operation of the image decoding device 2 shown in FIG. 1 will be described. The image decoding device 2 decodes block by block in the order in which the blocks were encoded. During encoding, the prediction value is calculated with reference to the pixels contained in already-encoded blocks; on the decoding side, the prediction value is calculated in the same way with reference to the pixels contained in already-decoded blocks. Because an encoded block on the encoding side is identical to the same block as decoded on the decoding side, the decoding side obtains the same prediction value as the encoding side. Therefore, when the encoding side has encoded the prediction mode using this prediction value as described above, the image decoding device 2 can restore the prediction mode using the prediction value.
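The encoder/decoder symmetry can be illustrated with the 4-bit fixed-length alternative mentioned earlier. This is a minimal sketch under stated assumptions: the polarity of b0 (1 meaning "same as predicted"), the zero filling of b1b2b3 when the flag is set, and the function names `encode_fixed` and `decode_fixed` are all hypothetical choices, not details given in the description.

```python
def encode_fixed(actual, predicted):
    """4-bit codeword b0b1b2b3 for a prediction mode number in 0..7."""
    if actual == predicted:
        return "1000"  # b0 = 1: same as predicted; b1b2b3 unused (assumed 0)
    return "0" + format(actual, "03b")  # b0 = 0, then the raw mode number


def decode_fixed(word, predicted):
    """Restore the mode from the codeword and the locally derived prediction."""
    if word[0] == "1":
        return predicted
    return int(word[1:], 2)


# Both sides derive the same prediction from already-processed neighbors,
# so every mode survives the round trip.
for predicted in range(8):
    for actual in range(8):
        assert decode_fixed(encode_fixed(actual, predicted), predicted) == actual
```

The decoder never receives the predicted mode itself; it recomputes it from decoded neighbors, which is why identical reconstruction on both sides is essential.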

The method described above can be applied to sub-macroblocks and macroblocks for the luminance signal and to blocks for the chrominance signal. Through the processing operations described above, efficient intra prediction encoding can be performed on distance images having the characteristics described earlier, enabling more efficient information compression.

A program for realizing the functions of the image encoding device 1 and the image decoding device 2 in FIG. 1 may be recorded on a computer-readable recording medium, and the image encoding process and image decoding process may be performed by having a computer system read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer system" also includes a WWW system provided with a homepage providing environment (or display environment). The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into computer systems. The "computer-readable recording medium" further includes media that hold a program for a certain period of time, such as the volatile memory (RAM) inside a computer system that serves as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by a transmission wave in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may realize only part of the functions described above. It may also be a so-called difference file (difference program), which realizes those functions in combination with a program already recorded in the computer system.

The present invention can be applied to uses in which encoding and decoding of distance images is indispensable.

DESCRIPTION OF SYMBOLS: 1 ... image encoding device, 2 ... image decoding device

Claims

1. An image encoding device that divides a distance image into a plurality of blocks and encodes it by performing intra prediction based on features of adjacent blocks, comprising:
selecting means for selecting, from among a plurality of prediction modes, a prediction mode to be applied to each block of the distance image;
first determination means for determining whether a plurality of adjacent encoded blocks include a plurality of depth values;
second determination means for determining whether a block determined by the first determination means to include a plurality of depth values has a prediction mode corresponding to the direction toward the encoding target block;
prediction means for setting, as the prediction value of the prediction mode of the encoding target block, the same prediction mode as that of the block determined by the second determination means to have such a prediction mode; and
encoding means for encoding and transmitting the encoding target block using the prediction value of the prediction mode.

2. The image encoding device according to claim 1, wherein the plurality of adjacent encoded blocks are the blocks adjacent above and to the left and, if the prediction value cannot be obtained from either of them, the blocks adjacent at the upper left and the upper right.

3. The image encoding device according to claim 1, wherein the prediction modes consist only of prediction modes corresponding to eight directions.

4. The image encoding device according to any one of claims 1 to 3, wherein, when there are two blocks from which the prediction value is obtained, the prediction mode corresponding to the direction midway between their prediction directions is set as the prediction value.

5. The image encoding device according to any one of the above claims, wherein, when the selected mode is encoded, it is encoded by encoding the difference between the direction of the prediction value and the prediction direction.

6. The image encoding device according to claim 1, wherein the encoding target block is any one of 4 × 4 pixels, 8 × 8 pixels, 16 × 16 pixels, or a combination thereof.

7. An image decoding device that decodes a distance image encoded by the image encoding device according to any one of claims 1 to 6, comprising:
first determination means for determining, for each block of the distance image, whether a plurality of adjacent decoded blocks include a plurality of depth values;
second determination means for determining whether a block determined by the first determination means to include a plurality of depth values has a prediction mode corresponding to the direction toward the block to be decoded;
prediction means for setting, as the prediction value of the prediction mode of the block to be decoded, the same prediction mode as that of the block determined by the second determination means to have such a prediction mode; and
decoding means for decoding the prediction mode of a received encoded block using the prediction value.

8. An image encoding program for causing a computer to function as the image encoding device according to any one of claims 1 to 6.

9. An image decoding program for causing a computer to function as the image decoding device according to claim 7.

10. Encoded data of a distance image, obtained by: selecting, for each block of the image, one mode from prediction modes consisting only of a plurality of prediction directions; determining whether a plurality of adjacent encoded blocks include a plurality of depth values; determining whether a block determined to include a plurality of depth values has a prediction mode corresponding to the direction toward the block; setting the same prediction mode as that of the block so determined as the prediction value of the prediction mode of the block; and encoding the prediction mode using the prediction value.