JP2020120322A

JP2020120322A - Distance image coding device and program of the same, and distance image decoding device and program of the same

Info

Publication number: JP2020120322A
Application number: JP2019011278A
Authority: JP
Inventors: Miwa Katayama (片山 美和); Masahiro Kawakita (河北 真宏)
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2019-01-25
Filing date: 2019-01-25
Publication date: 2020-08-06
Anticipated expiration: 2039-01-25
Also published as: JP7257152B2

Abstract

To provide a distance image encoding device capable of hierarchically encoding a distance image for each depth range.
SOLUTION: A distance image encoding device 1 comprises: base layer encoding means 12 that coarsely encodes a distance image to generate a base layer encoded stream; local decoding means 13 that generates a decoded image from the quantized coefficients; decoded difference image generation means 14 that generates the difference between the distance image and the decoded image as a decoded difference image; region division means 11 that divides the region of the distance image for each preset depth range; depth-specific difference image generation means 15 that generates, from the decoded difference image, an image for each depth range as a depth-specific difference image; enhancement layer encoding means 16 that generates an enhancement layer encoded stream by finely encoding the depth-specific difference image for each depth range; and stream combining means 17 that combines the base layer encoded stream with the enhancement layer encoded streams of the respective depth ranges.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a distance image encoding device that encodes a distance image and a program therefor, and to a distance image decoding device that decodes an encoded distance image and a program therefor.

For the transmission of stereoscopic images (multi-view images), methods that transmit a captured image together with a distance image (depth map) indicating depth information from the shooting position (viewpoint position) are currently being studied. International standards define coding schemes such as 3D-AVC and 3D-HEVC, in which multi-view images and distance images are encoded and an arbitrary viewpoint image is generated on the decoding side (see Non-Patent Documents 1 and 2).
Conventionally, distance images have basically been encoded by applying, as is, the motion-compensation-based coding schemes used for captured images. Techniques have also been disclosed that apply a captured-image coding scheme while exploiting differences in statistical properties between captured images and distance images (for example, a captured image contains more high-frequency components than a distance image). One such technique encodes captured images and distance images with different quantization parameters (see Non-Patent Document 3), and another suppresses compression of the distance image at region boundaries, which affect image quality (see Non-Patent Document 4).
In addition, coding schemes for captured images include methods that perform encoding while hierarchically varying the compression efficiency according to the specifications of the transmission path and the receiving device (decoding device) (see Patent Documents 1 and 2 and Non-Patent Document 5).

Patent Document 1: JP 2004-350072 A
Patent Document 2: JP 2007-266748 A

Non-Patent Document 1: Shimizu, "International Standardization Trends in 3D Video Coding Using Depth Maps", IPSJ SIG Technical Report, Vol.2013-AVM-82, No.11, pp.1-6 (2013).
Non-Patent Document 2: Senoo, Yamamoto, Oi, Kurita, "Standardization Activities for MPEG Multi-View Video Coding", Quarterly Report of the National Institute of Information and Communications Technology (情報通信研究機構季報), Vol.56, Nos.1/2, pp.79-90 (2010).
Non-Patent Document 3: Saldanha, Sanchez, Marcon, Agostini, "Block-level fast coding scheme for depth maps in three-dimensional high efficiency video coding", Journal of Electronic Imaging, Vol.27(1), 64,010502-1-4 (2018).
Non-Patent Document 4: Oh, Vetro, Ho, "Depth Coding Using a Boundary Reconstruction Filter for 3-D Video Systems", IEEE Transactions on Circuits and Systems for Video Technology, Vol.21, No.3, pp.350-359 (2011).
Non-Patent Document 5: Tsukuba, Nagayoshi, Hanamura, Tominaga, "A Study on a Spatially Scalable Coding Scheme with Independent Layers", IPSJ SIG Technical Report, Vol.2004-AVM-45, pp.29-34 (2004).

Conventional distance image coding schemes basically apply captured-image coding schemes as they are. Therefore, when compression of the encoded data under constraints such as transmission path bandwidth and receiving device performance is considered, the distance image must be encoded hierarchically in the same way as the captured image.
However, if a conventional layering technique is applied to the distance image as is, the distance image is simply layered spatially, regardless of depth.
It is also known that human depth sensitivity is high for regions close to the viewpoint and low for regions far from the viewpoint (see the reference below).
(Reference) Nagata, "Visual Depth Distance Information and Its Depth Sensitivity", The Institute of Image Information and Television Engineers, Television, Vol.31, No.8, pp.649-655 (1977)
Consequently, when the distance image is merely layered spatially, a reproduced image for which not all layers are decoded (for example, because the transmission path bandwidth is narrow or the receiving device performance is low) shows noticeable image quality degradation in regions close to the viewpoint.

The present invention has been made in view of this problem, and its object is to provide a distance image encoding device and a program therefor, and a distance image decoding device and a program therefor, that can hierarchically encode and decode a distance image for each depth range.

To solve the above problem, the distance image encoding device according to the present invention is a distance image encoding device that encodes a distance image indicating depth information of a subject, and comprises base layer encoding means, local decoding means, decoded difference image generation means, region division means, depth-specific difference image generation means, enhancement layer encoding means, and stream combining means.

In this configuration, the distance image encoding device uses the base layer encoding means to transform the distance image into frequency components, quantize them with a first quantization step, and variable-length encode the quantized coefficients, thereby generating a base layer encoded stream. Because the first quantization step is set larger than the second quantization step of the enhancement layer encoding means described later, the base layer encoding means encodes the distance image with its gradation levels coarsely quantized.
The distance image encoding device then uses the local decoding means to inversely quantize the quantized coefficients and inversely transform the frequency components, thereby reproducing the decoded image of the distance image that will be obtained on the decoding side.

Further, the distance image encoding device uses the decoded difference image generation means to generate the difference between the distance image and the decoded image as a decoded difference image.
The distance image encoding device also uses the region division means to divide the region of the distance image for each preset depth range.
The distance image encoding device then uses the depth-specific difference image generation means to generate, from the decoded difference image, an image for each region divided by the region division means as a depth-specific difference image.

The distance image encoding device then uses the enhancement layer encoding means to transform, for each depth range, the corresponding depth-specific difference image into frequency components, quantize them with a second quantization step smaller than the first quantization step, and variable-length encode the quantized coefficients, thereby generating an enhancement layer encoded stream. Because the second quantization step is smaller than the first quantization step of the base layer encoding means, the enhancement layer encoding means can encode the depth-specific difference image with its gradation levels finely quantized.

The distance image encoding device then uses the stream combining means to combine the base layer encoded stream with the enhancement layer encoded streams of the respective depth ranges, thereby generating the encoded stream of the distance image.
As a result, a stream encoded hierarchically for each depth range is generated, consisting of a base layer encoded stream in which the distance image is coarsely encoded and a plurality of enhancement layer encoded streams, one per depth range, in which the difference between the coarsely encoded distance image and the original distance image is finely encoded.
The distance image encoding device can be realized by running, on a computer, a program that causes the computer to function as each of the means described above.
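The layering idea summarized above can be illustrated with a deliberately simplified Python sketch. It is not the patented encoder: the orthogonal transform and variable-length coding are omitted and quantization is applied directly to pixel values, purely to show how a coarse base layer and fine per-depth-range residual layers relate. The function name `coarse_fine_layers`, the step sizes `q1` and `q2`, and the example data are all assumptions made for illustration.

```python
def coarse_fine_layers(depth, masks, q1=32, q2=4):
    """Split a depth image into a coarse base layer and fine residual layers.

    depth : 2-D list of depth values
    masks : list of 2-D 0/1 masks, one per depth range
    q1    : first (coarse) quantization step, hypothetical value
    q2    : second (fine) quantization step, must be smaller than q1
    """
    if q2 >= q1:
        raise ValueError("the second quantization step must be smaller than the first")
    # Base layer: coarse quantization of the whole image.
    base = [[round(d / q1) for d in row] for row in depth]
    # Local decoding: what the decoder would reconstruct from the base layer.
    decoded = [[b * q1 for b in row] for row in base]
    # Decoded difference image: original minus locally decoded image.
    residual = [[d - r for d, r in zip(d_row, r_row)]
                for d_row, r_row in zip(depth, decoded)]
    # One finely quantized residual layer per depth range (masked residual).
    enhancement = [
        [[round(r * m / q2) for r, m in zip(r_row, m_row)]
         for r_row, m_row in zip(residual, mask)]
        for mask in masks
    ]
    return base, enhancement


depth = [[11, 200]]
masks = [[[1, 0]], [[0, 1]]]  # near range covers pixel 0, far range covers pixel 1
base, enhancement = coarse_fine_layers(depth, masks)
```

Each enhancement layer carries residual information only for the pixels of its own depth range, which is what allows a decoder to refine some depth ranges and skip others.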

Also, to solve the above problem, the distance image decoding device according to the present invention is a distance image decoding device that decodes an encoded stream formed by concatenating a base layer encoded stream, in which a distance image indicating depth information of a subject is encoded with a first quantization step, and a plurality of enhancement layer encoded streams, one per depth range, in which the difference between the distance image encoded as the base layer encoded stream and the original distance image is encoded with a second quantization step smaller than the first quantization step. The distance image decoding device comprises stream separation means, base layer decoding means, enhancement layer decoding means, and image synthesis means.

In this configuration, the distance image decoding device uses the stream separation means to separate the encoded stream into the base layer encoded stream and the plurality of enhancement layer encoded streams.

The distance image decoding device then uses the base layer decoding means to decode the base layer encoded stream with the same large first quantization step as on the encoding side, generating a base layer decoded image. This yields a distance image with coarse gradation levels.
The distance image decoding device also uses the enhancement layer decoding means to decode the plurality of enhancement layer encoded streams with the same small second quantization step as on the encoding side, generating a plurality of enhancement layer decoded images. As a result, the difference between the distance image encoded as the base layer encoded stream and the original distance image is reproduced accurately down to fine gradation levels.

The distance image decoding device then uses the image synthesis means to synthesize the base layer decoded image and the plurality of enhancement layer decoded images, generating the distance image decoded from the encoded stream.
In this way, the distance image decoding device can decode the encoded stream hierarchically.
The distance image decoding device can be realized by running, on a computer, a program that causes the computer to function as each of the means described above.
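The synthesis step on the decoding side can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented decoder: entropy decoding and the inverse transform are omitted, the layers are given directly as quantized pixel-domain levels, and the name `synthesize`, the step sizes, and the example data are hypothetical.

```python
def synthesize(base_levels, enhancement_levels, q1=32, q2=4):
    """Add dequantized enhancement layers onto the dequantized base layer.

    base_levels        : 2-D list of base layer quantized levels
    enhancement_levels : list of 2-D lists, one per decoded depth range;
                         passing only a subset decodes only those ranges
    """
    image = [[b * q1 for b in row] for row in base_levels]
    for layer in enhancement_levels:
        for y, row in enumerate(layer):
            for x, level in enumerate(row):
                image[y][x] += level * q2
    return image


base = [[0, 6]]
near_layer = [[3, 0]]   # residual for the depth range nearest the viewpoint
far_layer = [[0, 2]]    # residual for a farther depth range

coarse = synthesize(base, [])                        # base layer only
near_only = synthesize(base, [near_layer])           # refine the near range only
full = synthesize(base, [near_layer, far_layer])     # all layers
```

Decoding only the near layer refines the pixels closest to the viewpoint while leaving farther pixels at base layer precision, which corresponds to prioritizing depth ranges when bandwidth or decoder performance is limited.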

The present invention provides the following advantageous effects.
According to the present invention, a distance image can be encoded and decoded hierarchically for each depth range. The present invention can therefore generate an encoded stream for which the number of layers can be limited according to the bandwidth of the transmission path, and which can be decoded with priorities assigned to the depth ranges according to the performance (such as CPU power) of the distance image decoding device.

FIG. 1 is a block diagram showing the configuration of a distance image encoding device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram for explaining a distance image, in which (a) is a perspective view showing the arrangement of a camera (trinocular camera) and subjects, (b) shows a captured image, and (c) shows a distance image.
FIG. 3 is an explanatory diagram for explaining the thresholds of the depth ranges.
FIG. 4 (a) to (c) are diagrams showing the mask data for each depth range.
FIG. 5 is a flowchart showing the operation of the distance image encoding device according to the embodiment of the present invention.
FIG. 6 is a block diagram showing the configuration of a distance image decoding device according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the operation of the distance image decoding device according to the embodiment of the present invention.
FIG. 8 is an explanatory diagram for explaining other thresholds of the depth ranges.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[Configuration of the distance image encoding device]
The configuration of the distance image encoding device 1 according to the embodiment of the present invention will be described with reference to FIG. 1.

The distance image encoding device 1 hierarchically encodes a distance image (depth map) indicating depth information of a subject according to depth.
A distance image is an image in which the distance from the viewpoint position to the subject, in the subject space where the subject was photographed, is represented for each pixel by a pixel value assigned within a predetermined range from a depth minimum value to a depth maximum value. If the captured image of the subject is a still image, the distance image is a single image. If the captured image is a moving image, the distance image is likewise a sequence of images corresponding to the frames of the captured image.

Here, an example of a distance image will be described with reference to FIG. 2.
As shown in FIG. 2(a), a distance image can be generated by assigning a pixel value to each pixel according to the parallax between two images (a stereo pair) among the images of the subjects in the subject space (here, O_1 to O_3) captured by a camera (trinocular camera) C.
The camera C comprises a left-eye camera C_L, a central camera C_C, and a right-eye camera C_R, arranged at equal intervals in the horizontal direction.
The central camera C_C is the camera that photographs the subjects to obtain the captured image.
The left-eye camera C_L and the right-eye camera C_R are cameras that capture the images used to obtain the parallax.

FIG. 2(b) shows the captured image P taken by the central camera C_C in FIG. 2(a). The distance image is obtained by matching the image captured by the left-eye camera C_L (not shown) against the image captured by the right-eye camera C_R (not shown), and assigning to each pixel position of the captured image P a pixel value corresponding to the parallax, that is, the horizontal distance between corresponding pixels.
For example, the depth information corresponding to the captured image P of FIG. 2(b) is the distance image D shown as a grayscale image in FIG. 2(c).
Here, the distance image D is displayed whiter the closer a point is to the viewpoint position and blacker the farther it is from the viewpoint position. For example, when the distance image has 256 gradations, the depth minimum value is "0" and the depth maximum value is "255".
Although the distance image is described here as being generated from stereo images captured by the camera C, it may instead be acquired by any common technique, such as a distance image sensor that measures distance from the round-trip time of a projected laser.

Returning to FIG. 1, the description of the configuration of the distance image encoding device 1 continues.
As shown in FIG. 1, the distance image encoding device 1 comprises threshold setting means 10, region division means 11, base layer encoding means 12, local decoding means 13, decoded difference image generation means 14, depth-specific difference image generation means 15, enhancement layer encoding means 16, and stream combining means 17.

The threshold setting means 10 sets the thresholds used to divide the depth of the distance image hierarchically. A threshold is a depth value indicating the boundary between depth layers, where each depth layer is one of the depth ranges into which the distance image is divided. The number of depth layers is taken to be 4 in this description, but any number of 2 or more may be used.
The threshold setting means 10 searches the distance image for the smallest depth value and sets, as the first threshold T_1, the depth value obtained by adding a predetermined value to that smallest depth value.
This makes it possible to identify the depth range that contains the subject closest to the viewpoint position (depth minimum value).

The threshold setting means 10 also sets the second and subsequent thresholds by dividing the range from threshold T_1 to the depth maximum value into equal parts, the number of parts being the predetermined number of depth layers (here, 4) minus 1.
That is, after setting threshold T_1, the threshold setting means 10 sets each threshold T_i (i is an integer of 2 or more and less than the number of depth layers) by the following equation (1), where D_max denotes the depth maximum value and n denotes the number of depth layers.
T_i = T_1 + (i - 1) × (D_max - T_1) / (n - 1)   ... (1)
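The threshold rule described here can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `set_thresholds`, the parameter name `near_margin` (standing in for the "predetermined value" added to the smallest depth value), and its default of 32 are assumptions, since the text does not specify them.

```python
def set_thresholds(depth_image, n_layers=4, near_margin=32, d_max=255):
    """Return the thresholds [T_1, ..., T_(n-1)] for a 2-D list of depth values.

    near_margin is a hypothetical stand-in for the predetermined value
    that is added to the smallest depth value to obtain T_1.
    """
    if n_layers < 2:
        raise ValueError("at least two depth layers are required")
    # Smallest depth value in the image, i.e. the nearest subject.
    d_nearest = min(min(row) for row in depth_image)
    t1 = d_nearest + near_margin
    # Width of each of the remaining (n - 1) equal depth ranges.
    d_m = (d_max - t1) / (n_layers - 1)
    # i = 1 gives T_1 itself; i = 2 .. n-1 give the later thresholds.
    return [t1 + (i - 1) * d_m for i in range(1, n_layers)]


thresholds = set_thresholds([[10, 200], [90, 255]], n_layers=4, near_margin=20)
```

With the example values, T_1 is 30 and the remaining range of 225 is split into three equal parts of 75, so the thresholds fall at 30, 105, and 180.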

As a result, as shown in FIG. 3, the distance image can be divided hierarchically, with the depth range from the depth minimum value to threshold T_1 as the first layer L_1, the depth range from threshold T_1 to threshold T_2 as the second layer L_2, the depth range from threshold T_2 to threshold T_3 as the third layer L_3, and the depth range from threshold T_3 to the depth maximum value as the fourth layer L_4. In this case, the distance between the boundaries of the first layer L_1 is d_1 (= T_1), and the distance between the boundaries of each of the second layer L_2 to the fourth layer L_4 is d_m (= (D_max - T_1)/3).
The threshold setting means 10 outputs the set thresholds (T_1 to T_3) to the region division means 11.
When the input distance image consists of consecutive frames corresponding to a moving image, the threshold setting means 10 may set the thresholds for each frame, or may set them using only the first frame.
If the distance to the nearest subject is known, threshold T_1 may instead be input to the threshold setting means 10 from outside.

The region division means 11 divides the region of the distance image for each depth range specified by the set thresholds.
For each depth range, the region division means 11 takes the set of pixels of the distance image whose depth values fall within that range as region information indicating the region corresponding to the depth range.
Here, the region division means 11 generates the region information for each depth range as mask data. Specifically, for each depth range, the region division means 11 generates mask data in which pixels whose distance image pixel value falls within the depth range have the value "1" and all other pixels have the value "0". A threshold separating two depth ranges is included in exactly one of the two ranges.

For example, suppose that the distance image for subjects O_1 to O_3 shown in FIG. 2 is the distance image D of FIG. 2(c), and that, as shown in FIG. 3, subjects O_1 to O_3 lie in the depth ranges of the first layer L_1 to the third layer L_3, respectively.
In this case, for the first layer L_1, the region division means 11 generates the mask data M_1 shown in FIG. 4(a) by setting to "1" each pixel of the distance image D whose depth is at least the depth minimum value and less than threshold T_1, and setting all other pixels to "0".
For the second layer L_2, the region division means 11 generates the mask data M_2 shown in FIG. 4(b) by setting to "1" each pixel of the distance image D whose depth is at least threshold T_1 and less than threshold T_2, and setting all other pixels to "0".
For the third layer L_3, the region division means 11 generates the mask data M_3 shown in FIG. 4(c) by setting to "1" each pixel of the distance image D whose depth is at least threshold T_2 and less than threshold T_3, and setting all other pixels to "0".
For the fourth layer L_4, the region division means 11 generates mask data (not shown) by setting to "1" each pixel of the distance image D whose depth is at least threshold T_3 and at most the depth maximum value, and setting all other pixels to "0".
In this way, the region division means 11 can divide the region of the distance image for each depth range specified by the thresholds.
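The mask generation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `make_masks` and the example data are hypothetical, and each threshold is assigned to the farther of the two ranges it separates, matching the "at least ... less than" intervals in the example, with the last range also including the depth maximum value.

```python
def make_masks(depth_image, thresholds, d_min=0, d_max=255):
    """Return one 0/1 mask (2-D list) per depth range, nearest range first."""
    bounds = [d_min] + list(thresholds) + [d_max]
    n_layers = len(bounds) - 1
    masks = []
    for k in range(n_layers):
        lo, hi = bounds[k], bounds[k + 1]
        is_last = (k == n_layers - 1)
        # Half-open interval [lo, hi); the last layer also includes d_max.
        masks.append([
            [1 if (lo <= d < hi) or (is_last and d == hi) else 0 for d in row]
            for row in depth_image
        ])
    return masks


depth = [[0, 30, 104],
         [105, 180, 255]]
masks = make_masks(depth, thresholds=[30, 105, 180])
```

Because the intervals do not overlap and together cover the whole depth range, every pixel is set to 1 in exactly one mask.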

The region division means 11 outputs the generated region information (mask data) to the depth-specific difference image generation means 15. Here, the region division means 11 outputs the region information of the respective depth ranges, in order of proximity to the viewpoint position (nearest first), to the depth-specific difference image generation means 15A, 15B, 15C, and 15D.

The base layer encoding means 12 generates the encoded stream of the base layer (base layer encoded stream) by transforming the distance image into frequency components, quantizing them, and variable-length encoding the quantized coefficients.
Here, the base layer is the layer in which the entire distance image is the encoding target. The enhancement layers described later are the layers (depth layers) in which the distance image is the encoding target for each depth range.
The base layer encoding means 12 quantizes the distance image with a quantization step (the first quantization step) larger than that of the enhancement layer encoding means 16, so that the gradation levels are coarsened.
The base layer encoding means 12 comprises orthogonal transform means 120, quantization means 121, and variable-length coding means 122.

The orthogonal transform means 120 orthogonally transforms the distance image, block by block for blocks of a predetermined size (for example, 8×8 pixels), into frequency components. The orthogonal transform is, for example, the discrete cosine transform (DCT). The orthogonal transform means 120 outputs the transform coefficients, which are the calculated frequency components, to the quantization means 121.
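As an illustration of the block transform, the following Python sketch computes a two-dimensional DCT of one 8×8 block using only the standard library. The orthonormal DCT-II scaling is an assumption (the text names the DCT only as an example and does not fix a normalization), and the helper names `dct_matrix`, `matmul`, `transpose`, and `dct2` are hypothetical.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix C; row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        rows.append([scale * math.cos((2 * x + 1) * k * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two 2-D lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block):
    """2-D DCT of a square block: C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


flat_block = [[100] * 8 for _ in range(8)]
coeffs = dct2(flat_block)  # only the DC coefficient coeffs[0][0] is non-zero
```

A block of constant depth has all of its energy in the DC coefficient, which is why smooth distance images tend to compress well after the transform.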

The quantization means 121 quantizes the transform coefficients, which are the frequency components calculated by the orthogonal transform means 120, with a preset quantization step (the first quantization step). The quantization step is the step width (quantization width) used when discretizing the transform coefficients, and is set to a value larger than the quantization step used by the enhancement layer encoding means 16. The quantization step may be a preset value, or may be a value specified from outside by a quantization parameter.
The quantization means 121 generates the quantized coefficients by dividing each transform coefficient by the size of the quantization step and rounding the result to an integer.
The quantization means 121 outputs the quantized transform coefficients (quantized coefficients) to the variable-length coding means 122 and the local decoding means 13.
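The divide-and-round operation can be sketched in Python as follows. The function name `quantize` and the step sizes 16 and 4 are illustrative assumptions, and Python's built-in `round` (which rounds exact halves to the nearest even integer) stands in for whatever rounding rule an actual encoder uses.

```python
def quantize(coeffs, q_step):
    """Divide each transform coefficient by q_step and round to an integer."""
    if q_step <= 0:
        raise ValueError("the quantization step must be positive")
    return [[int(round(c / q_step)) for c in row] for row in coeffs]


coeffs = [[800.0, -37.0],
          [5.0, 0.0]]
coarse_levels = quantize(coeffs, 16)  # large step, as for the base layer
fine_levels = quantize(coeffs, 4)     # small step, as for an enhancement layer
```

With the larger step the small coefficient 5.0 is quantized to zero, whereas the smaller step preserves it as a non-zero level; this is the coarse versus fine distinction between the two quantization steps.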

The variable-length coding means 122 variable-length encodes the quantized coefficients generated by the quantization means 121 to generate stream data (the base layer encoded stream). Any common method, such as Huffman coding or arithmetic coding, may be used for the variable-length coding in the variable-length coding means 122.
The variable-length coding means 122 outputs the generated base layer encoded stream to the stream combining means 17.

ローカル復号手段１３は、基本レイヤ符号化手段１２で生成された量子化係数を逆量子化し、周波数成分を逆変換することで距離画像を復号した復号画像を生成するものである。この復号画像は、復号側（動画像復号装置）で、基本レイヤの距離画像として復号される画像を、符号化側（距離画像符号化装置）でローカルに再現した画像である。
ローカル復号手段１３は、逆量子化手段１３０と、逆直交変換手段１３１と、を備える。 The local decoding means 13 inversely quantizes the quantized coefficients generated by the base layer encoding means 12 and inversely transforms the resulting frequency components to generate a decoded image of the distance image. This decoded image is a local reproduction, on the encoding side (distance image encoding device), of the image that the decoding side (moving image decoding device) will decode as the base-layer distance image.
The local decoding means 13 includes an inverse quantization means 130 and an inverse orthogonal transformation means 131.

逆量子化手段１３０は、基本レイヤ符号化手段１２の量子化手段１２１が量子化した量子化係数に対して、量子化手段１２１で行った処理の逆の処理である逆量子化を行うものである。すなわち、逆量子化手段１３０は、量子化係数に量子化ステップのサイズを乗算することで、周波数成分である変換係数を生成する。
逆量子化手段１３０は、逆量子化後の変換係数を、逆直交変換手段１３１に出力する。 The inverse quantization means 130 performs inverse quantization, which is the inverse of the processing performed by the quantization means 121, on the quantized coefficients produced by the quantization means 121 of the base layer encoding means 12. That is, the inverse quantization means 130 multiplies each quantized coefficient by the size of the quantization step to generate a transform coefficient, which is a frequency component.
The inverse quantization unit 130 outputs the inversely quantized transform coefficient to the inverse orthogonal transform unit 131.
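As a minimal sketch of the quantization and inverse quantization just described (divide by the step and round, then multiply by the step), the following compares a coarse first step with a fine second step. The step values 16 and 2, the sample coefficients, and the round-half-up rule are assumptions for illustration; the patent only says the coefficient is rounded to an integer.

```python
import math

def quantize(coeffs, step):
    """Divide each coefficient by the step and round to the nearest integer
    (round-half-up is an assumed rounding rule)."""
    return [math.floor(c / step + 0.5) for c in coeffs]

def dequantize(levels, step):
    """Inverse quantization: multiply each level by the step size."""
    return [q * step for q in levels]

coeffs = [803.0, -41.5, 12.2, -3.9]           # hypothetical transform coefficients

coarse_levels = quantize(coeffs, 16)          # base layer: larger step
coarse = dequantize(coarse_levels, 16)        # [800, -48, 16, 0]

fine_levels = quantize(coeffs, 2)             # enhancement layer: smaller step
fine = dequantize(fine_levels, 2)             # [804, -42, 12, -4]

coarse_err = max(abs(c - r) for c, r in zip(coeffs, coarse))
fine_err = max(abs(c - r) for c, r in zip(coeffs, fine))
```

With this rounding rule the reconstruction error never exceeds half the step, so the smaller step used for the enhancement layers bounds the error more tightly at the cost of larger quantized values to encode.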

逆直交変換手段１３１は、逆量子化手段１３０が逆量子化した変換係数に対して、基本レイヤ符号化手段１２の直交変換手段１２０で行った処理の逆の処理である逆直交変換（例えば、逆離散コサイン変換）を行うものである。この逆直交変換手段１３１で変換されたブロックごとの画像によって、復号側（動画像復号装置）で基本レイヤの距離画像として復号される画像（復号画像）を再現することができる。
逆直交変換手段１３１は、生成した復号画像を復号差分画像生成手段１４に出力する。 The inverse orthogonal transform unit 131 performs, on the transform coefficients dequantized by the inverse quantization unit 130, an inverse orthogonal transform (for example, an inverse discrete cosine transform), which is the inverse of the processing performed by the orthogonal transform unit 120 of the base layer coding unit 12. The per-block images produced by the inverse orthogonal transform unit 131 reproduce the image (decoded image) that the decoding side (moving image decoding device) will decode as the base-layer distance image.
The inverse orthogonal transformation unit 131 outputs the generated decoded image to the decoded difference image generation unit 14.

復号差分画像生成手段１４は、符号化対象である元の距離画像と、ローカル復号手段１３で復号された復号画像との差分を復号差分画像として生成するものである。
この復号差分画像生成手段１４は、距離画像から復号画像を減算することで、符号化対象である元の距離画像と、基本レイヤで符号化される距離画像との差分を生成する。
復号差分画像生成手段１４は、生成した復号差分画像を奥行別差分画像生成手段１５（１５Ａ，１５Ｂ，１５Ｃ，１５Ｄ）に出力する。 The decoded difference image generation means 14 generates a difference between the original distance image to be encoded and the decoded image decoded by the local decoding means 13 as a decoded difference image.
The decoded difference image generation unit 14 subtracts the decoded image from the distance image to generate a difference between the original distance image to be encoded and the distance image encoded in the base layer.
The decoded difference image generation means 14 outputs the generated decoded difference image to the depth-specific difference image generation means 15 (15A, 15B, 15C, 15D).

奥行別差分画像生成手段１５は、復号差分画像生成手段１４で生成された復号差分画像から、領域区分手段１１で区分された領域ごとの画像を奥行別差分画像として生成するものである。
ここでは、奥行別差分画像生成手段１５を、奥行範囲ごとに、拡張レイヤ数（奥行階層数）に応じた複数の奥行別差分画像生成手段１５Ａ，１５Ｂ，１５Ｃ，１５Ｄで構成している。
奥行別差分画像生成手段１５Ａは、復号差分画像のうちで、領域区分手段１１で区分された視点位置からの距離が最も近い奥行範囲の領域を、奥行別差分画像として生成するものである。
奥行別差分画像生成手段１５Ｂは、復号差分画像のうちで、領域区分手段１１で区分された視点位置からの距離が２番目に近い奥行範囲の領域を、奥行別差分画像として生成するものである。
同様に、奥行別差分画像生成手段１５Ｃ，１５Ｄは、それぞれ復号差分画像のうちで、領域区分手段１１で区分された視点位置からの距離が３番目，４番目に近い奥行範囲の領域を、奥行別差分画像として生成するものである。 The depth-by-depth difference image generation means 15 generates an image for each area divided by the area division means 11 as a depth-by-depth difference image from the decoded difference image generated by the decoded difference image generation means 14.
Here, the depth-based difference image generation means 15 is configured by a plurality of depth-based difference image generation means 15A, 15B, 15C, and 15D corresponding to the number of enhancement layers (depth layers) for each depth range.
The depth-specific difference image generation unit 15A generates, as a depth-specific difference image, the region of the decoded difference image that lies in the depth range nearest to the viewpoint position among the ranges divided by the region division unit 11.
The depth-specific difference image generation unit 15B generates, as a depth-specific difference image, the region of the decoded difference image that lies in the depth range second nearest to the viewpoint position among the ranges divided by the region division unit 11.
Similarly, the depth-specific difference image generation units 15C and 15D generate, as depth-specific difference images, the regions of the decoded difference image that lie in the depth ranges third and fourth nearest to the viewpoint position, respectively, among the ranges divided by the region division unit 11.

なお、奥行別差分画像生成手段１５Ａ〜１５Ｄは、復号差分画像から、所定の奥行範囲の領域の奥行別差分画像を生成する点で同じ処理を行う。
具体的には、奥行別差分画像生成手段１５Ａ〜１５Ｄは、それぞれ、復号差分画像生成手段１４で生成された復号差分画像に、領域区分手段１１から出力される領域情報であるマスクデータを乗算することで、奥行範囲ごとの奥行別差分画像を生成する。
奥行別差分画像生成手段１５は、生成した奥行範囲ごとの奥行別差分画像を拡張レイヤ符号化手段１６に出力する。 It should be noted that the depth-specific difference image generation means 15A to 15D perform the same process in that the depth-specific difference images of the region of the predetermined depth range are generated from the decoded difference image.
Specifically, the depth-by-depth difference image generation units 15A to 15D each multiply the decoded difference image generated by the decoded difference image generation unit 14 by the mask data which is the region information output from the region classification unit 11. Thus, a depth-specific difference image for each depth range is generated.
The depth-specific difference image generation unit 15 outputs the generated depth-specific difference image for each depth range to the enhancement layer encoding unit 16.
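The subtraction and mask multiplication described above can be sketched as follows. The mask construction from thresholds, the convention that smaller depth values are nearer to the viewpoint, and the names `make_masks` and `apply_mask` are all assumptions made for illustration; the patent only states that the region information is supplied as mask data.

```python
def make_masks(depth, thresholds):
    """One 0/1 mask per depth range; each range is [lo, hi) between
    consecutive thresholds (assumed convention)."""
    bounds = [float("-inf")] + sorted(thresholds) + [float("inf")]
    return [
        [[1 if lo <= v < hi else 0 for v in row] for row in depth]
        for lo, hi in zip(bounds, bounds[1:])
    ]

def apply_mask(diff, mask):
    """Pixelwise multiplication of the decoded difference image by a mask."""
    return [[d * m for d, m in zip(drow, mrow)]
            for drow, mrow in zip(diff, mask)]

depth = [[10, 70], [130, 250]]      # original distance image (toy 2x2 example)
decoded = [[8, 64], [128, 248]]     # locally decoded base layer (hypothetical)

# Decoded difference image: original minus locally decoded image.
diff = [[o - d for o, d in zip(orow, drow)]
        for orow, drow in zip(depth, decoded)]

masks = make_masks(depth, [64, 128, 192])        # four depth ranges
layers = [apply_mask(diff, m) for m in masks]    # one difference image per range
```

Because the masks partition the image, summing the four depth-specific difference images pixel by pixel gives back the full decoded difference image.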

拡張レイヤ符号化手段１６は、奥行別差分画像生成手段１５で生成された奥行別差分画像を周波数成分に変換して基本レイヤ符号化手段１２よりも小さい量子化ステップ（第２量子化ステップ）で量子化し、量子化係数を可変長符号化することで拡張レイヤの符号化ストリーム（拡張レイヤ符号化ストリーム）を生成するものである。
ここでは、拡張レイヤ符号化手段１６を、奥行範囲ごとに、拡張レイヤ数に応じた複数の拡張レイヤ符号化手段１６Ａ，１６Ｂ，１６Ｃ，１６Ｄで構成している。 The enhancement layer encoding unit 16 converts the depth-specific difference images generated by the depth-specific difference image generating unit 15 into frequency components, quantizes them with a quantization step (second quantization step) smaller than that of the base layer encoding unit 12, and variable-length codes the quantized coefficients to generate the encoded stream of the enhancement layer (enhancement layer encoded stream).
Here, the enhancement layer coding means 16 is configured by a plurality of enhancement layer coding means 16A, 16B, 16C, 16D according to the number of enhancement layers for each depth range.

拡張レイヤ符号化手段１６Ａは、奥行別差分画像生成手段１５（１５Ａ）で生成される視点位置からの距離が最も近い奥行範囲の奥行別差分画像を符号化するものである。
拡張レイヤ符号化手段１６Ｂは、奥行別差分画像生成手段１５（１５Ｂ）で生成される視点位置からの距離が２番目に近い奥行範囲の奥行別差分画像を符号化するものである。
同様に、拡張レイヤ符号化手段１６Ｃ，１６Ｄは、それぞれ奥行別差分画像生成手段１５（１５Ｃ，１５Ｄ）で生成される視点位置からの距離が３番目，４番目に近い奥行範囲の奥行別差分画像を符号化するものである。 The enhancement layer encoding unit 16A encodes the depth-specific difference image in the depth range having the shortest distance from the viewpoint position generated by the depth-specific difference image generating unit 15 (15A).
The enhancement layer encoding unit 16B encodes the depth-specific difference image in the depth range having the second closest distance from the viewpoint position generated by the depth-specific difference image generation unit 15 (15B).
Similarly, the enhancement layer encoding units 16C and 16D encode the depth-specific difference images, generated by the depth-specific difference image generation unit 15 (15C and 15D), of the depth ranges third and fourth nearest to the viewpoint position, respectively.

なお、拡張レイヤ符号化手段１６Ａ〜１６Ｄは、奥行別差分画像を符号化する点で同じ処理を行う。
拡張レイヤ符号化手段１６Ａ〜１６Ｄは、同じ構成であるため、ここでは、拡張レイヤ符号化手段１６Ａの構成を例に説明する。
拡張レイヤ符号化手段１６Ａは、直交変換手段１６０と、量子化手段１６１と、可変長符号化手段１６２と、を備える。 Note that the enhancement layer coding units 16A to 16D perform the same processing in that the difference images for each depth are coded.
Since the enhancement layer coding means 16A to 16D have the same configuration, the configuration of the enhancement layer coding means 16A will be described here as an example.
The enhancement layer coding unit 16A includes an orthogonal transformation unit 160, a quantization unit 161, and a variable length coding unit 162.

直交変換手段１６０は、奥行別差分画像を所定の大きさのブロックごとに直交変換し、周波数成分に変換するものである。この直交変換は、例えば、離散コサイン変換（ＤＣＴ）である。直交変換手段１６０は、算出した周波数成分である変換係数を量子化手段１６１に出力する。 The orthogonal transformation means 160 orthogonally transforms the depth difference image for each block of a predetermined size, and transforms it into frequency components. This orthogonal transform is, for example, a discrete cosine transform (DCT). The orthogonal transform means 160 outputs the transform coefficient, which is the calculated frequency component, to the quantizing means 161.

量子化手段１６１は、直交変換手段１６０が算出した周波数成分である変換係数を、予め設定した量子化ステップ（第２量子化ステップ）で量子化するものである。量子化手段１６１は、基本レイヤ符号化手段１２で用いる量子化ステップよりも小さい値の量子化ステップで諧調のレベルを細かくして量子化する。
量子化手段１６１は、量子化した変換係数（量子化係数）を、可変長符号化手段１６２に出力する。 The quantizing unit 161 quantizes the transform coefficient, which is the frequency component calculated by the orthogonal transforming unit 160, in a preset quantizing step (second quantizing step). The quantizing means 161 finely quantizes the gradation level with a quantizing step having a smaller value than the quantizing step used in the base layer encoding means 12.
The quantizing means 161 outputs the quantized transform coefficient (quantized coefficient) to the variable length coding means 162.

可変長符号化手段１６２は、量子化手段１６１で生成された量子化係数を、可変長符号化して、ストリームデータ（拡張レイヤ符号化ストリーム）を生成するものである。
可変長符号化手段１６２は、生成した拡張レイヤ符号化ストリームをストリーム結合手段１７に出力する。
このように、拡張レイヤ符号化手段１６Ａ〜１６Ｄは、それぞれ奥行範囲ごとに奥行別差分画像を符号化し、符号化した拡張レイヤ符号化ストリームをストリーム結合手段１７に出力する。 The variable length coding unit 162 performs variable length coding on the quantized coefficient generated by the quantization unit 161 to generate stream data (enhancement layer coded stream).
The variable length coding unit 162 outputs the generated enhancement layer coded stream to the stream combining unit 17.
In this way, the enhancement layer encoding units 16A to 16D encode the depth-specific difference images for each depth range, and output the encoded enhancement layer encoded stream to the stream combining unit 17.

ストリーム結合手段１７は、基本レイヤ符号化手段１２で生成された基本レイヤ符号化ストリームと、拡張レイヤ符号化手段１６で生成された奥行範囲ごとの拡張レイヤ符号化ストリームとを結合するものである。
このストリーム結合手段１７は、基本レイヤ符号化ストリームと奥行範囲ごとの拡張レイヤ符号化ストリームとを連結する。このとき、ストリーム結合手段１７は、基本レイヤ符号化ストリームの次に拡張レイヤ符号化ストリームを連結する。また、ストリーム結合手段１７は、複数の拡張レイヤ符号化ストリームについては、視点位置に近い奥行範囲の拡張レイヤ符号化ストリームほど前にして連結を行う。ここでは、ストリーム結合手段１７は、拡張レイヤ符号化手段１６Ａ，１６Ｂ，１６Ｃ，１６Ｄの順に、生成した拡張レイヤ符号化ストリームを基本レイヤ符号化ストリームに連結する。これによって、復号側では、基本レイヤ符号化ストリームの次に、優先度の高い拡張レイヤ符号化ストリームを復号することができる。 The stream combining means 17 combines the base layer coded stream generated by the base layer coding means 12 and the enhancement layer coded stream for each depth range generated by the enhancement layer coding means 16.
The stream combining unit 17 concatenates the base layer coded stream and the enhancement layer coded streams for the respective depth ranges. In doing so, the stream combining unit 17 places the enhancement layer coded streams after the base layer coded stream. Among the plurality of enhancement layer coded streams, those for depth ranges nearer to the viewpoint position are placed earlier. Here, the stream combining unit 17 appends the enhancement layer coded streams to the base layer coded stream in the order in which they are generated by the enhancement layer coding units 16A, 16B, 16C and 16D. As a result, the decoding side can decode the higher-priority enhancement layer coded streams immediately after the base layer coded stream.

さらに、ストリーム結合手段１７は、レイヤの構成（例えば、拡張レイヤ〔奥行階層〕の数）、距離画像の大きさ（水平画素数、垂直画素数）、基本レイヤ符号化ストリームのデータ長、各拡張レイヤ符号化ストリームのデータ長等のストリームの構成情報をヘッダ情報として生成する。
そして、ストリーム結合手段１７は、基本レイヤ符号化ストリームと奥行範囲ごとの拡張レイヤ符号化ストリームとを連結したストリームに、ヘッダ情報を付加して、符号化ストリームを生成する。
なお、入力される距離画像が動画像に対応した連続したフレームで構成される場合、ストリーム結合手段１７は、フレーム数だけ符号化ストリームを連続させて出力する。 Furthermore, the stream combining unit 17 generates, as header information, stream configuration information such as the layer configuration (for example, the number of enhancement layers [depth layers]), the size of the distance image (number of horizontal pixels and number of vertical pixels), the data length of the base layer encoded stream, and the data length of each enhancement layer encoded stream.
Then, the stream combining unit 17 adds header information to the stream obtained by concatenating the base layer coded stream and the enhancement layer coded stream for each depth range to generate a coded stream.
When the input range image is composed of continuous frames corresponding to the moving image, the stream combining unit 17 continuously outputs the encoded streams by the number of frames.
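The header generation and concatenation can be sketched as below. The byte layout (a 1-byte enhancement layer count, 2-byte width and height, then 4-byte big-endian lengths) is entirely hypothetical; the patent lists which fields the header carries but not how they are laid out. `combine_streams` is an illustrative name.

```python
import struct

def combine_streams(width, height, base, enhancements):
    """Prefix the concatenated layer streams with a header.

    Hypothetical layout: layer count (1 byte), width and height (2 bytes
    each), then one 4-byte length for the base stream and for each
    enhancement stream. Enhancement streams must already be ordered from
    the depth range nearest to the viewpoint to the farthest."""
    header = struct.pack(">BHH", len(enhancements), width, height)
    lengths = struct.pack(
        ">%dI" % (1 + len(enhancements)),
        len(base), *(len(e) for e in enhancements)
    )
    return header + lengths + base + b"".join(enhancements)

# Placeholder payloads stand in for real variable-length-coded data.
stream = combine_streams(640, 480, b"BASE", [b"A1", b"B22", b"C", b"D4444"])
```

Storing every length up front lets a receiver locate, or skip, any layer without parsing the payloads, which fits the idea of dropping low-priority layers when bandwidth is limited.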

以上説明したように距離画像符号化装置１を構成することで、距離画像符号化装置１は、距離画像全体を荒く量子化して符号化した基本レイヤ符号化ストリームと、距離画像を奥行範囲ごとに細かく量子化した符号化ストリームとによって、距離画像を階層的に符号化することができる。 By configuring the distance image encoding device 1 as described above, the device can encode a distance image hierarchically, as a base layer coded stream in which the entire distance image is coarsely quantized and coded, together with coded streams in which the distance image is finely quantized for each depth range.

これによって、符号化ストリームを復号する距離画像復号装置では、ＣＰＵパワー等の性能に応じて、基本レイヤと視点位置からの距離が最も近い拡張レイヤのみを復号する等、階層的に距離画像を復号することができる。
また、符号化ストリームは階層的に符号化されているため、伝送路上の帯域に応じて、優先度の低いレイヤの伝送を行わないようにすることができる。この場合でも、視点位置に近い奥行範囲のレイヤは、伝送対象となるため、復号装置において、精度よく奥行きを再現することができる。
なお、距離画像符号化装置１は、コンピュータを、前記した各手段として機能させるためのプログラム（距離画像符号化プログラム）で動作させることができる。 As a result, a distance image decoding device that decodes the encoded stream can decode the distance image hierarchically according to its performance, such as CPU power; for example, it can decode only the base layer and the enhancement layer nearest to the viewpoint position.
Further, since the coded stream is hierarchically coded, it is possible to prevent transmission of a layer having a low priority according to the band on the transmission path. Even in this case, since the layer in the depth range close to the viewpoint position is the transmission target, the depth can be accurately reproduced in the decoding device.
Note that the distance image encoding device 1 can be operated by a program (distance image encoding program) that causes a computer to function as each of the means described above.

〔距離画像符号化装置の動作〕
次に、図５を参照（構成については、適宜図１参照）して、本発明の実施形態に係る距離画像符号化装置１の動作について説明する。なお、ここでは、距離画像は、動画像に対応した連続したフレームで構成されているものとする。
ステップＳ１において、閾値設定手段１０は、距離画像の奥行きを階層的に区分するための閾値を設定する。この閾値によって、距離画像を、奥行最小値から奥行最大値までの範囲で、閾値を境界とする奥行範囲ごとに階層化することができる。
ステップＳ２において、領域区分手段１１は、ステップＳ１で設定された閾値で特定される奥行範囲ごとに、距離画像の領域を区分する。ここでは、領域区分手段１１は、奥行範囲ごとの領域情報をマスクデータとして生成する。 [Operation of range image encoding device]
Next, with reference to FIG. 5 (for the configuration, refer to FIG. 1 as needed), the operation of the distance image encoding device 1 according to the embodiment of the present invention will be described. Note that, here, the distance image is assumed to be composed of continuous frames corresponding to the moving image.
In step S1, the threshold setting means 10 sets a threshold for hierarchically dividing the depth of the distance image. With this threshold, the distance image can be hierarchized in the range from the minimum depth value to the maximum depth value for each depth range with the threshold as a boundary.
In step S2, the area dividing unit 11 divides the area of the distance image for each depth range specified by the threshold value set in step S1. Here, the area dividing unit 11 generates area information for each depth range as mask data.

ステップＳ３において、基本レイヤ符号化手段１２は、基本レイヤとして、距離画像全体を符号化する。このステップＳ３では、基本レイヤ符号化手段１２は、直交変換手段１２０によって、距離画像を所定の大きさのブロックごとに直交変換（ＤＣＴ変換）し、周波数成分である変換係数を生成する。そして、基本レイヤ符号化手段１２は、量子化手段１２１によって、拡張レイヤよりも大きい量子化ステップで変換係数を量子化し、量子化係数を生成する。そして、基本レイヤ符号化手段１２は、可変長符号化手段１２２によって、量子化係数を可変長符号化し、ストリームデータ（基本レイヤ符号化ストリーム）を生成する。 In step S3, the base layer encoding means 12 encodes the entire distance image as a base layer. In this step S3, the base layer coding means 12 performs orthogonal transform (DCT transform) on the range image for each block of a predetermined size by the orthogonal transform means 120 to generate transform coefficients which are frequency components. Then, the base layer coding means 12 quantizes the transform coefficient by the quantizing means 121 in a quantization step larger than that of the enhancement layer, and generates a quantized coefficient. Then, the base layer coding means 12 performs variable length coding on the quantized coefficient by the variable length coding means 122 to generate stream data (base layer coded stream).

ステップＳ４において、ローカル復号手段１３は、ステップＳ３で量子化によって生成された量子化係数から、基本レイヤ符号化手段１２で符号化した距離画像を復号した復号画像を生成する。このステップＳ４では、ローカル復号手段１３は、逆量子化手段１３０によって、量子化手段１２１で生成された量子化係数を逆量子化し、周波数成分である変換係数を生成する。そして、ローカル復号手段１３は、逆直交変換手段１３１によって、変換係数を逆直交変換し、復号画像を生成する。
ステップＳ５において、復号差分画像生成手段１４は、符号化対象である元の距離画像と、ステップＳ４で生成された復号画像との差分を復号差分画像として生成する。 In step S4, the local decoding means 13 generates a decoded image obtained by decoding the distance image encoded by the base layer encoding means 12 from the quantized coefficient generated by the quantization in step S3. In step S4, the local decoding unit 13 dequantizes the quantized coefficient generated by the quantizing unit 121 by the dequantizing unit 130 to generate a transform coefficient that is a frequency component. Then, the local decoding unit 13 performs inverse orthogonal transform on the transform coefficient by the inverse orthogonal transform unit 131 to generate a decoded image.
In step S5, the decoded difference image generation unit 14 generates a difference between the original distance image to be encoded and the decoded image generated in step S4 as a decoded difference image.

ステップＳ６において、奥行別差分画像生成手段１５は、ステップＳ５で生成された復号差分画像から、ステップＳ２で区分された奥行範囲ごとの領域の画像を奥行別差分画像として生成する。ここでは、奥行別差分画像生成手段１５は、ステップＳ５で生成された復号差分画像に、ステップＳ２で生成された奥行範囲ごとのマスクデータを乗算することで、奥行範囲ごとの奥行別差分画像を生成する。 In step S6, the depth-specific difference image generation unit 15 generates, from the decoded difference image generated in step S5, an image of the region for each depth range divided in step S2 as a depth-specific difference image. Here, the depth-specific difference image generation unit 15 generates the depth-specific difference image for each depth range by multiplying the decoded difference image generated in step S5 by the mask data for each depth range generated in step S2.

ステップＳ７において、拡張レイヤ符号化手段１６は、拡張レイヤとして、ステップＳ６で生成された奥行別差分画像を奥行範囲ごとに符号化する。このステップＳ７では、拡張レイヤ符号化手段１６は、ステップＳ３と同様の符号化処理によって、奥行別差分画像を符号化し、ストリームデータ（拡張レイヤ符号化ストリーム）を生成する。なお、拡張レイヤ符号化手段１６の符号化処理内における量子化は、ステップＳ３の基本レイヤ符号化手段１２における量子化よりも量子化ステップの値を小さくして行う。 In step S7, the enhancement layer encoding unit 16 encodes the depth-specific difference image generated in step S6 for each depth range as an enhancement layer. In step S7, the enhancement layer encoding unit 16 encodes the depth difference image by the same encoding process as in step S3 to generate stream data (enhancement layer encoded stream). The quantization in the encoding process of the enhancement layer encoding means 16 is performed with a smaller value of the quantization step than the quantization in the base layer encoding means 12 in step S3.

ステップＳ８において、ストリーム結合手段１７は、ステップＳ３で生成した基本レイヤ符号化ストリームと、ステップＳ７で生成された奥行範囲ごとの拡張レイヤ符号化ストリームとを結合して、符号化ストリームを生成する。
ここで、次フレームがさらに入力される場合（ステップＳ９でＹｅｓ）、距離画像符号化装置１は、ステップＳ２に戻って動作を続ける。
一方、入力が終了した場合（ステップＳ９でＮｏ）、距離画像符号化装置１は、動作を終了する。なお、距離画像が、静止画像の撮影画像に対応した画像の場合、ステップＳ９の動作は省略することができる。 In step S8, the stream combining unit 17 combines the base layer coded stream generated in step S3 and the enhancement layer coded stream for each depth range generated in step S7 to generate a coded stream.
Here, when the next frame is further input (Yes in step S9), the distance image encoding device 1 returns to step S2 and continues the operation.
On the other hand, when the input is completed (No in step S9), the distance image encoding device 1 ends the operation. If the distance image is an image corresponding to the captured image of the still image, the operation of step S9 can be omitted.
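The ordering of steps S2 to S7 for one frame can be sketched end to end. This is a deliberately simplified sketch: the orthogonal transform and variable length coding are omitted, so quantization is applied directly to pixel values, which is not what the patent describes. The thresholds, step sizes, the assumption that smaller depth values are nearer, and the name `encode_frame` are hypothetical.

```python
import math

def quantize(img, step):
    return [[math.floor(v / step + 0.5) for v in row] for row in img]

def dequantize(levels, step):
    return [[q * step for q in row] for row in levels]

def encode_frame(depth, thresholds, coarse_step, fine_step):
    # S2: one 0/1 mask per depth range (assumed [lo, hi) ranges).
    bounds = [float("-inf")] + sorted(thresholds) + [float("inf")]
    masks = [[[1 if lo <= v < hi else 0 for v in row] for row in depth]
             for lo, hi in zip(bounds, bounds[1:])]
    # S3: base layer, coarse quantization of the whole image.
    base_levels = quantize(depth, coarse_step)
    # S4: local decoding of the base layer.
    decoded = dequantize(base_levels, coarse_step)
    # S5: decoded difference image.
    diff = [[o - d for o, d in zip(orow, drow)]
            for orow, drow in zip(depth, decoded)]
    # S6 + S7: mask the difference per depth range, then quantize finely.
    enh_levels = []
    for mask in masks:
        layer = [[d * m for d, m in zip(drow, mrow)]
                 for drow, mrow in zip(diff, mask)]
        enh_levels.append(quantize(layer, fine_step))
    return base_levels, enh_levels

base, enh = encode_frame([[10, 70], [130, 250]], [64, 128, 192], 16, 2)
```

In this toy case the base layer alone reconstructs `[[16, 64], [128, 256]]`; adding each dequantized enhancement layer restores the corresponding pixel of the original.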

〔距離画像復号装置の構成〕
次に、図６を参照して、本発明の実施形態に係る距離画像復号装置２の構成について説明する。 [Configuration of range image decoding device]
Next, the configuration of the distance image decoding device 2 according to the embodiment of the present invention will be described with reference to FIG. 6.

距離画像復号装置２は、距離画像符号化装置１（図１）で符号化された距離画像の符号化ストリームを復号するものである。
図６に示すように、距離画像復号装置２は、ストリーム分離手段２０と、基本レイヤ復号手段２１と、拡張レイヤ復号手段２２と、画像合成手段２３と、を備える。 The distance image decoding device 2 decodes the encoded stream of the distance image encoded by the distance image encoding device 1 (FIG. 1).
As shown in FIG. 6, the distance image decoding device 2 includes a stream separating unit 20, a base layer decoding unit 21, an enhancement layer decoding unit 22, and an image synthesizing unit 23.

ストリーム分離手段２０は、符号化ストリームを、基本レイヤ符号化ストリームと、複数の拡張レイヤ符号化ストリームとに分離するものである。
このストリーム分離手段２０は、符号化ストリームのヘッダ情報を参照して、符号化ストリームから、基本レイヤ符号化ストリームと、複数の拡張レイヤ符号化ストリームとを分離して抽出する。
ストリーム分離手段２０は、分離した基本レイヤ符号化ストリームを基本レイヤ復号手段２１に出力する。また、ストリーム分離手段２０は、分離した拡張レイヤ符号化ストリームを拡張レイヤ復号手段２２に出力する。ここでは、ストリーム分離手段２０は、複数の拡張レイヤ符号化ストリームを、入力した順、すなわち、優先度の高い（視点位置からの距離が近い）順に、拡張レイヤ復号手段２２Ａ，２２Ｂ，２２Ｃ，２２Ｄに出力する。 The stream separating means 20 separates the encoded stream into a base layer encoded stream and a plurality of enhancement layer encoded streams.
The stream separating means 20 refers to the header information of the coded stream and separates and extracts the base layer coded stream and the plurality of enhancement layer coded streams from the coded stream.
The stream separating means 20 outputs the separated base layer encoded stream to the base layer decoding means 21. It also outputs the separated enhancement layer encoded streams to the enhancement layer decoding means 22. Here, the stream separating means 20 outputs the plurality of enhancement layer encoded streams to the enhancement layer decoding means 22A, 22B, 22C and 22D in the order in which they were input, that is, in descending order of priority (nearest to the viewpoint position first).
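Header-driven separation might look like the following sketch. It assumes a hypothetical header layout (1-byte enhancement layer count, 2-byte width and height, then a 4-byte big-endian length for the base stream and for each enhancement stream); the patent does not define the actual byte format, and `split_stream` is an illustrative name.

```python
import struct

def split_stream(stream):
    """Separate one frame's stream into (size, base, enhancement list)
    using the lengths recorded in the assumed header layout."""
    count, width, height = struct.unpack_from(">BHH", stream, 0)
    lengths = struct.unpack_from(">%dI" % (count + 1), stream, 5)
    offset = 5 + 4 * (count + 1)
    payloads = []
    for n in lengths:
        payloads.append(stream[offset:offset + n])
        offset += n
    # Enhancement streams keep their input order, i.e. priority order.
    return (width, height), payloads[0], payloads[1:]

# Hand-built sample: 2 enhancement layers, 4x4 image, placeholder payloads.
sample = struct.pack(">BHH3I", 2, 4, 4, 3, 2, 1) + b"abc" + b"de" + b"f"
size, base_stream, enh_streams = split_stream(sample)
```

The returned list preserves the order in the stream, so its first element would be handed to the decoder for the nearest depth range.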

基本レイヤ復号手段２１は、ストリーム分離手段２０で分離された基本レイヤ符号化ストリームを復号するものである。
基本レイヤ復号手段２１は、可変長復号手段２１０と、逆量子化手段２１１と、逆直交変換手段２１２と、を備える。 The base layer decoding means 21 decodes the base layer encoded stream separated by the stream separating means 20.
The base layer decoding unit 21 includes a variable length decoding unit 210, an inverse quantization unit 211, and an inverse orthogonal transformation unit 212.

可変長復号手段２１０は、基本レイヤ符号化ストリームに対して、可変長符号化手段１２２（図１）の逆変換となる可変長復号を行うものである。この可変長復号によって、基本レイヤ符号化ストリームから量子化係数が復号される。
可変長復号手段２１０は、復号した量子化係数を逆量子化手段２１１に出力する。 The variable length decoding unit 210 performs variable length decoding on the base layer coded stream, which is the inverse transform of the variable length coding unit 122 (FIG. 1). By this variable length decoding, the quantized coefficient is decoded from the base layer encoded stream.
The variable length decoding means 210 outputs the decoded quantized coefficient to the inverse quantization means 211.

逆量子化手段２１１は、可変長復号手段２１０で復号された量子化係数に対して、量子化手段１２１（図１）で行った処理の逆の処理である逆量子化を行うものである。すなわち、逆量子化手段２１１は、量子化係数に量子化手段１２１と同じ量子化ステップのサイズを乗算することで、周波数成分である変換係数を生成する。
逆量子化手段２１１は、逆量子化後の変換係数を、逆直交変換手段２１２に出力する。 The inverse quantizer 211 performs inverse quantization, which is the reverse of the process performed by the quantizer 121 (FIG. 1), on the quantized coefficients decoded by the variable length decoder 210. That is, the inverse quantizer 211 multiplies each quantized coefficient by the same quantization step size as used by the quantizer 121 to generate a transform coefficient, which is a frequency component.
The inverse quantization unit 211 outputs the inversely quantized transform coefficient to the inverse orthogonal transform unit 212.

逆直交変換手段２１２は、逆量子化手段２１１が逆量子化した変換係数に対して、直交変換手段１２０（図１）で行った処理の逆の処理である逆直交変換（例えば、逆離散コサイン変換）を行うものである。この逆直交変換手段２１２で変換されたブロックごとの画像によって、基本レイヤ符号化ストリームを復号した画像（基本レイヤ復号画像）が生成される。この基本レイヤ復号画像は、単体でも距離画像となる画像であるが、荒く量子化されて符号化／復号された画像であるため、奥行きの精度は低い。
逆直交変換手段２１２は、生成した基本レイヤ復号画像を画像合成手段２３に出力する。
また、逆直交変換手段２１２は、基本レイヤ復号画像を生成した後、拡張レイヤ復号手段２２（２２Ａ）に拡張レイヤ符号化ストリームの復号開始を指示する。 The inverse orthogonal transform unit 212 performs, on the transform coefficients inversely quantized by the inverse quantization unit 211, an inverse orthogonal transform (for example, an inverse discrete cosine transform), which is the reverse of the process performed by the orthogonal transform unit 120 (FIG. 1). The per-block images produced by the inverse orthogonal transform unit 212 form the image obtained by decoding the base layer encoded stream (base layer decoded image). This base layer decoded image is usable as a distance image on its own, but because it has been coarsely quantized and then encoded and decoded, its depth accuracy is low.
The inverse orthogonal transform unit 212 outputs the generated base layer decoded image to the image synthesizing unit 23.
After generating the base layer decoded image, the inverse orthogonal transformation unit 212 instructs the enhancement layer decoding unit 22 (22A) to start decoding the enhancement layer encoded stream.

拡張レイヤ復号手段２２は、ストリーム分離手段２０で分離された拡張レイヤ符号化ストリームを復号するものである。
ここでは、拡張レイヤ復号手段２２を、奥行範囲ごとに、拡張レイヤ数に応じた複数の拡張レイヤ復号手段２２Ａ，２２Ｂ，２２Ｃ，２２Ｄで構成している。なお、ここでは、拡張レイヤ復号手段２２を、距離画像符号化装置１（図１）の拡張レイヤ符号化手段１６と同じ数の手段で構成しているが、拡張レイヤ復号手段２２の数は、“１”以上、拡張レイヤのレイヤ数以下であればよい。ただし、距離画像を精度よく復号するには、拡張レイヤ復号手段２２を、距離画像符号化装置１（図１）の拡張レイヤ符号化手段１６と同じ数の手段で構成することが好ましい。 The enhancement layer decoding means 22 decodes the enhancement layer coded stream separated by the stream separating means 20.
Here, the enhancement layer decoding means 22 is composed of a plurality of enhancement layer decoding means 22A, 22B, 22C, 22D, one for each depth range, according to the number of enhancement layers. Although the enhancement layer decoding means 22 is configured here with the same number of means as the enhancement layer coding means 16 of the distance image coding device 1 (FIG. 1), the number of enhancement layer decoding means 22 need only be at least one and at most the number of enhancement layers. However, in order to decode the distance image with high accuracy, it is preferable that the enhancement layer decoding means 22 be configured with the same number of means as the enhancement layer encoding means 16 of the distance image encoding device 1 (FIG. 1).

拡張レイヤ復号手段２２Ａは、ストリーム分離手段２０で分離された優先度が最も高い拡張レイヤ符号化ストリームを復号した拡張レイヤ復号画像（奥行別差分画像）を生成するものである。
拡張レイヤ復号手段２２Ｂは、ストリーム分離手段２０で分離された優先度が２番目に高い拡張レイヤ符号化ストリームを復号した拡張レイヤ復号画像（奥行別差分画像）を生成するものである。
同様に、拡張レイヤ復号手段２２Ｃ，２２Ｄは、それぞれ、ストリーム分離手段２０で分離された優先度が３番目，４番目に高い拡張レイヤ符号化ストリームを復号した拡張レイヤ復号画像（奥行別差分画像）を生成するものである。 The enhancement layer decoding means 22A is for generating an enhancement layer decoded image (depth difference image) obtained by decoding the enhancement layer encoded stream having the highest priority separated by the stream separation means 20.
The enhancement layer decoding means 22B is for generating the enhancement layer decoded image (depth difference image) obtained by decoding the enhancement layer coded stream having the second highest priority separated by the stream separation means 20.
Similarly, the enhancement layer decoding means 22C and 22D generate enhancement layer decoded images (depth-specific difference images) by decoding the enhancement layer encoded streams, separated by the stream separating means 20, that have the third and fourth highest priorities, respectively.

なお、拡張レイヤ復号手段２２Ａ〜２２Ｄは、拡張レイヤ符号化ストリームから奥行別差分画像を復号する点で同じ処理を行う。
拡張レイヤ復号手段２２Ａ〜２２Ｄは、同じ構成であるため、ここでは、拡張レイヤ復号手段２２Ａの構成を例に説明する。
拡張レイヤ復号手段２２Ａは、可変長復号手段２２０と、逆量子化手段２２１と、逆直交変換手段２２２と、を備える。 Note that the enhancement layer decoding means 22A to 22D perform the same process in that the depth-specific difference image is decoded from the enhancement layer encoded stream.
Since the enhancement layer decoding means 22A to 22D have the same configuration, the configuration of the enhancement layer decoding means 22A will be described here as an example.
The enhancement layer decoding unit 22A includes a variable length decoding unit 220, an inverse quantization unit 221, and an inverse orthogonal transformation unit 222.

可変長復号手段２２０、逆量子化手段２２１および逆直交変換手段２２２は、復号対象が拡張レイヤ符号化ストリームである点を除いて、それぞれ、可変長復号手段２１０、逆量子化手段２１１および逆直交変換手段２１２と同じ処理を行うものであるため、詳細な説明は省略する。ただし、逆量子化手段２２１で使用する量子化ステップは、逆量子化手段２１１で使用する量子化ステップよりも小さい値で、量子化手段１６１（図１）で使用する量子化ステップと同じである。
逆量子化手段２２１は、復号した拡張レイヤ復号画像（奥行別差分画像）を画像合成手段２３に出力する。
なお、拡張レイヤ復号手段２２Ａは、基本レイヤ復号手段２１（逆直交変換手段２１２）から、復号開始を指示された段階で、拡張レイヤ符号化ストリームの復号を開始する。 The variable length decoding unit 220, the inverse quantization unit 221, and the inverse orthogonal transform unit 222 perform the same processing as the variable length decoding unit 210, the inverse quantization unit 211, and the inverse orthogonal transform unit 212, respectively, except that the decoding target is an enhancement layer coded stream, so a detailed description is omitted. However, the quantization step used by the inverse quantization unit 221 is smaller than that used by the inverse quantization unit 211 and is the same as the quantization step used by the quantization unit 161 (FIG. 1).
The inverse quantization unit 221 outputs the decoded enhancement layer decoded image (depth-specific difference image) to the image synthesis unit 23.
The enhancement layer decoding means 22A starts decoding the enhancement layer coded stream at the stage when the base layer decoding means 21 (inverse orthogonal transformation means 212) instructs the start of decoding.

また、拡張レイヤ復号手段２２Ａは、優先度が１番目の拡張レイヤ符号化ストリームを復号した後、優先度の低い拡張レイヤ復号手段２２Ｂに、優先度が２番目の拡張レイヤ符号化ストリームの復号開始を指示する。
このように、拡張レイヤ復号手段２２Ａ〜２２Ｄは、優先度の高い拡張レイヤ符号化ストリームから順番に復号し、復号した奥行別差分画像を画像合成手段２３に出力する。 Further, after decoding the enhancement layer coded stream with the highest priority, the enhancement layer decoding means 22A instructs the enhancement layer decoding means 22B, which has the next lower priority, to start decoding the enhancement layer coded stream with the second highest priority.
In this way, the enhancement-layer decoding units 22A to 22D sequentially decode the enhancement-layer encoded streams with higher priorities, and output the decoded depth-specific difference images to the image synthesizing unit 23.

画像合成手段２３は、基本レイヤ復号手段２１で復号された基本レイヤ復号画像と、拡張レイヤ復号手段２２で復号された複数の拡張レイヤ復号画像（奥行別差分画像）とを合成するものである。
この画像合成手段２３は、基本レイヤ復号画像に複数の拡張レイヤ復号画像を加算することで、奥行精度を高めた距離画像を生成する。 The image synthesizing unit 23 synthesizes the base layer decoded image decoded by the base layer decoding unit 21 and the plurality of enhancement layer decoded images (depth difference images) decoded by the enhancement layer decoding unit 22.
The image synthesizing unit 23 adds the plurality of enhancement layer decoded images to the base layer decoded image to generate a distance image with improved depth accuracy.
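The addition performed by the image synthesizing unit can be sketched as a pixelwise sum. The 2×2 values below are hypothetical, and `synthesize` is an illustrative name rather than anything defined in the patent.

```python
def synthesize(base, enhancement_layers):
    """Add every decoded depth-specific difference image to the base
    layer decoded image, pixel by pixel, without modifying the inputs."""
    out = [row[:] for row in base]
    for layer in enhancement_layers:
        for y, row in enumerate(layer):
            for x, v in enumerate(row):
                out[y][x] += v
    return out

base = [[16, 64], [128, 256]]        # coarse base layer decoded image
layers = [
    [[-6, 0], [0, 0]],               # correction for the nearest depth range
    [[0, 6], [0, 0]],                # correction for the second nearest range
]
refined = synthesize(base, layers)   # [[10, 70], [128, 256]]
```

Only the pixels belonging to the depth ranges that were actually decoded are refined; the remaining pixels keep their base layer values, which is what makes partial decoding usable.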

なお、画像合成手段２３は、動画像の撮影画像に対応する距離画像を復号する場合には、撮影画像の１フレームの周期内で距離画像を復号する必要がある。そこで、距離画像復号装置２は、図示を省略したタイマ等の計時手段によって１フレーム分の符号化ストリームが入力された時点からの時間を計測する。そして、画像合成手段２３は、フレーム周期の残り時間が、１レイヤ分の拡張フレームの復号時間に満たない場合、基本レイヤの距離画像と、その時点までに復号されている拡張レイヤの奥行別差分画像とを合成し、符号化ストリームに対する距離画像として出力する。なお、１レイヤ分の拡張フレームの復号時間は、例えば、固定的に予め定めた時間であってもよいし、基本フレームの復号時間としてもよい。 When decoding a distance image that corresponds to a captured moving image, the image synthesizing means 23 must finish decoding the distance image within one frame period of the captured image. The distance image decoding device 2 therefore measures the time elapsed since one frame's worth of the encoded stream was input, using a timing means such as a timer (not shown). When the time remaining in the frame period is less than the time needed to decode one enhancement layer, the image synthesizing means 23 combines the base layer distance image with the enhancement layer depth-specific difference images decoded up to that point, and outputs the result as the distance image for that encoded stream. The decoding time for one enhancement layer may be, for example, a fixed predetermined time, or may be taken to be the decoding time of the base layer.
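The frame-period check can be sketched as a loop that stops decoding enhancement layers once the remaining time falls below the per-layer estimate. The injected `clock` callable, the millisecond figures, and the `FakeClock` helper are assumptions that keep the example deterministic; a real decoder would read an actual timer.

```python
def decode_within_frame(frame_period, layer_time, decode_base,
                        layer_decoders, clock):
    """Decode the base layer, then as many enhancement layers (in priority
    order) as fit in the time left in the frame period."""
    start = clock()
    images = [decode_base()]
    for decode_layer in layer_decoders:
        remaining = frame_period - (clock() - start)
        if remaining < layer_time:
            break                      # not enough time for another layer
        images.append(decode_layer())
    return images

class FakeClock:
    """Simulated timer: each decode call advances time by a fixed cost."""
    def __init__(self):
        self.now = 0
    def __call__(self):
        return self.now
    def spend(self, ms, result):
        self.now += ms
        return result

clock = FakeClock()
decoded = decode_within_frame(
    frame_period=33,                   # ms, roughly one frame at 30 fps
    layer_time=10,                     # assumed per-layer decoding estimate
    decode_base=lambda: clock.spend(10, "base"),
    layer_decoders=[lambda name=n: clock.spend(10, name)
                    for n in ("A", "B", "C", "D")],
    clock=clock,
)
```

With these figures the base layer and the two nearest enhancement layers are decoded; with 3 ms left, the loop stops and whatever has been decoded would be passed on for synthesis.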

By configuring the distance image decoding device 2 as described above, the device can decode an encoded stream in which the distance image encoding device 1 has hierarchically encoded the distance image for each depth range.
The distance image decoding device 2 can also decode the distance image hierarchically according to its performance, such as CPU power; for example, it can preferentially decode only the base layer and the enhancement layer closest to the viewpoint position.

[Operation of the distance image decoding device]
Next, the operation of the distance image decoding device 2 according to the embodiment of the present invention will be described with reference to FIG. 7 (see FIG. 6 as appropriate for the configuration). Here, the distance image is assumed to consist of consecutive frames corresponding to a moving image.
In step S10, the stream separating means 20 separates the encoded stream, frame by frame, into a base layer encoded stream and a plurality of enhancement layer encoded streams. Here, the distance image decoding device 2 measures, using a timing means (not shown), the time elapsed since the encoded stream for one frame was input.

In step S11, the base layer decoding means 21 decodes the base layer encoded stream separated in step S10. In this step, the base layer decoding means 21 first generates quantized coefficients by variable-length decoding the base layer encoded stream with the variable length decoding means 210. It then inversely quantizes the quantized coefficients with the inverse quantization means 211 to generate transform coefficients, which are frequency components. Finally, it applies an inverse orthogonal transform (for example, an inverse discrete cosine transform) to the transform coefficients with the inverse orthogonal transform means 212 to generate the image that is the decoding result of the base layer encoded stream (the base layer decoded image).
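
The inverse quantization and inverse orthogonal transform in step S11 can be sketched as follows. This is only an illustration under several assumptions: the variable-length decoding stage is omitted, a one-dimensional orthonormal DCT stands in for the orthogonal transform, quantization is uniform with a single step size, and all function names and sample values are hypothetical.

```python
import math

def _scale(k, n):
    """Normalization factor that makes the DCT orthonormal."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def dct(samples):
    """Orthonormal 1-D DCT-II (encoder side, used here only to build test data)."""
    n = len(samples)
    return [_scale(k, n) * sum(s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, s in enumerate(samples))
            for k in range(n)]

def idct(coeffs):
    """Orthonormal 1-D inverse DCT (DCT-III), the inverse orthogonal transform."""
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k, c in enumerate(coeffs))
            for i in range(n)]

def quantize(coeffs, step):
    """Uniform quantization (encoder side)."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    """Inverse quantization: scale the quantized levels back by the step size."""
    return [q * step for q in levels]

def decode_block(levels, step):
    """Inverse quantization followed by the inverse transform."""
    return idct(dequantize(levels, step))

# Hypothetical row of depth values, coded with a coarse and a fine step.
samples = [10.0, 20.0, 30.0, 40.0]
coarse = decode_block(quantize(dct(samples), 8.0), 8.0)
fine = decode_block(quantize(dct(samples), 1.0), 1.0)
```

The same `decode_block` routine applies to an enhancement layer if it is called with the smaller second quantization step.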

In step S12, the image synthesizing means 23 determines whether the time remaining in the frame period is at least the decoding time of an enhancement frame for one layer.
If the remaining time is shorter than the decoding time of the enhancement frame (No in step S12), the distance image decoding device 2 proceeds to step S18.
If the remaining time is at least the decoding time of the enhancement frame (Yes in step S12), then in step S13 the enhancement layer decoding means 22 initializes the variable i that identifies the enhancement layer (here, sets it to "1").

In step S14, the enhancement layer decoding means 22 decodes the enhancement layer encoded stream with the i-th priority separated in step S10. In this step, the enhancement layer decoding means 22 decodes the enhancement layer encoded stream by the same decoding process as in step S11 to generate an enhancement layer decoded image.
The time remaining in the frame period is then checked again in step S15. If it is shorter than the decoding time of the enhancement frame (No in step S15), the distance image decoding device 2 proceeds to step S18.
If the remaining time is at least the decoding time of the enhancement frame (Yes in step S15), then in step S16 the enhancement layer decoding means 22 adds "1" to the variable i.

If the variable i has not yet reached the number of enhancement layers (here, "4") (No in step S17), the enhancement layer decoding means 22 returns to step S14 and decodes the enhancement layer encoded stream of the next layer.
If the variable i has reached the number of enhancement layers (Yes in step S17), the enhancement layer decoding means 22 ends its decoding process, and the distance image decoding device 2 proceeds to step S18.

In step S18, the image synthesizing means 23 generates the distance image by synthesizing the base layer decoded image decoded in step S11 with the enhancement layer decoded images decoded in step S14 within the frame period.
If the encoded stream of the next frame is input (Yes in step S19), the distance image decoding device 2 returns to step S10 and continues operating.
If the input has ended (No in step S19), the distance image decoding device 2 ends the operation.
When the distance image to be decoded corresponds to a captured still image, the operation of step S9 can be omitted. The operations of steps S12 and S15 can also be omitted.
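
The per-frame flow of steps S10 to S18 can be sketched as follows. This is a simplified illustration, not the device's actual implementation: images are flat lists of pixel values, the `decode` callable stands in for the layer decoding means, every enhancement stream passed in is tried in priority order, and the names `decode_frame`, `FakeClock`, and `fake_decode` are hypothetical. A fake clock is used so that the example is deterministic.

```python
import time

def decode_frame(base_stream, enhancement_streams, decode, frame_period,
                 layer_decode_time, clock=time.monotonic):
    """Decode the base layer, then as many enhancement layers as the frame period allows."""
    start = clock()                                   # S10: timing starts for this frame
    layers = [decode(base_stream)]                    # S11: the base layer is always decoded
    for stream in enhancement_streams:                # S13 to S17: highest priority first
        remaining = frame_period - (clock() - start)  # S12 / S15: remaining time check
        if remaining < layer_decode_time:
            break                                     # not enough time for another layer
        layers.append(decode(stream))                 # S14: decode the next layer
    # S18: synthesize by adding the decoded layers pixel by pixel
    return [sum(pixels) for pixels in zip(*layers)]

class FakeClock:
    """Deterministic stand-in for a timer; time only advances when told to."""
    def __init__(self):
        self.now = 0
    def __call__(self):
        return self.now

fake_clock = FakeClock()

def fake_decode(stream):
    fake_clock.now += 10   # pretend every layer takes 10 time units to decode
    return stream          # the "stream" here already holds the decoded pixels

frame = decode_frame([100, 200], [[1, 0], [0, 2], [5, 5], [7, 7]], fake_decode,
                     frame_period=33, layer_decode_time=10, clock=fake_clock)
print(frame)  # [101, 202]: the base layer plus the two layers that fit in the period
```

For a still image, the time check can simply be disabled by passing a frame period large enough for all layers.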

[Modifications]
The configurations and operations of the distance image encoding device 1 and the distance image decoding device 2 according to the embodiment of the present invention have been described above, but the present invention is not limited to this embodiment.

(Modification 1)
In the embodiment above, the threshold setting means 10 of the distance image encoding device 1 sets the threshold nearest the viewpoint position and then sets the remaining thresholds so that they divide the depth values beyond it equally.
However, as shown in FIG. 8, the thresholds may instead be set so that the depth ranges become wider with increasing depth. In this case, the threshold T_1 nearest the viewpoint position is set to a value at least smaller than the maximum depth value divided by the number of depth layers.
Specifically, after setting the threshold T_1, the threshold setting means 10 sets each threshold T_i (i is an integer of 2 or more and less than the number of depth layers) by the following equation (2). Here, d_1 denotes the distance of the first depth range counted from the viewpoint position; that is, d_1 = T_1. In addition, r denotes an integer larger than "1".

When the maximum depth value is D_max, the distances d_j of the depth ranges satisfy the relationship of the following equation (3). Here, n denotes the number of depth layers, and j denotes the index (from 1 to n) giving the order of the depth range from the viewpoint position. In addition, a = d_1 = T_1.

Therefore, r in equation (2) can be obtained as a solution of the following equation (4).
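
Equations (2) to (4) are not reproduced in this text. The following sketch therefore rests on an assumption: that the depth range widths form a geometric progression d_j = a * r^(j-1) whose sum over the n depth layers equals D_max, which is one reading consistent with the symbols a, r, n, and D_max defined above. Under that assumption, r is found numerically by bisection and is treated here as a real number greater than 1. The function names `solve_ratio` and `thresholds` are hypothetical.

```python
def solve_ratio(a, d_max, n, iterations=200):
    """Find r > 1 such that a + a*r + ... + a*r**(n-1) == d_max (assumed relationship)."""
    if n < 2 or not 0 < a < d_max / n:
        raise ValueError("need n >= 2 and 0 < T_1 < d_max / n")

    def total(r):
        return sum(a * r ** j for j in range(n))

    lo, hi = 1.0, 2.0
    while total(hi) < d_max:      # widen the bracket until it contains the solution
        hi *= 2.0
    for _ in range(iterations):   # bisection; total(r) increases monotonically with r
        mid = (lo + hi) / 2
        if total(mid) < d_max:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def thresholds(a, d_max, n):
    """Return T_1 .. T_(n-1), each one the previous threshold plus the next range width."""
    r = solve_ratio(a, d_max, n)
    result = [a]                  # T_1 = d_1 = a
    for i in range(1, n - 1):
        result.append(result[-1] + a * r ** i)
    return result

# Hypothetical example: T_1 = 1, maximum depth 15, four depth layers.
r = solve_ratio(1.0, 15.0, 4)
t = thresholds(1.0, 15.0, 4)
```

In this example the range widths 1, 2, 4, and 8 sum to 15, so the ratio comes out close to 2 and the thresholds close to 1, 3, and 7.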

(Modification 2)
In the embodiment above, the threshold setting means 10 of the distance image encoding device 1 sets the thresholds either from the distance image or from values specified externally.
However, these thresholds may instead be set in advance, for example in the internal memory of the distance image encoding device 1. In this case, the threshold setting means 10 may be omitted from the configuration of the distance image encoding device 1, and the area dividing means 11 simply divides the area of the distance image using the preset thresholds.

(Modification 3)
In the embodiment above, the depth-specific difference image generation means 15 of the distance image encoding device 1 is configured as a plurality of depth-specific difference image generation means 15A, 15B, 15C, and 15D corresponding to the number of enhancement layers and operating in parallel. Likewise, the enhancement layer encoding means 16 is configured as a plurality of enhancement layer encoding means 16A, 16B, 16C, and 16D corresponding to the number of enhancement layers and operating in parallel.
However, the depth-specific difference image generation means 15 and the enhancement layer encoding means 16 do not necessarily have to operate in parallel.
That is, the depth-specific difference image generation means 15 and the enhancement layer encoding means 16 may each be a single unit that processes the enhancement layers one after another.

(Modification 4)
In the embodiment above, the enhancement layer decoding means 22 of the distance image decoding device 2 is configured as a plurality of enhancement layer decoding means 22A, 22B, 22C, and 22D corresponding to the number of enhancement layers, operated in order starting from the enhancement layer with the highest priority.
However, if the distance image decoding device 2 has performance to spare, such as CPU power, the base layer decoding means 21 and the enhancement layer decoding means 22A, 22B, 22C, and 22D may be configured to operate in parallel.

(Modification 5)
In the embodiment above, the depth-specific difference image generation means 15 and the enhancement layer encoding means 16 of the distance image encoding device 1 assign depth ranges closer to the viewpoint position to enhancement layers of higher priority and encode them accordingly. The stream combining means 17 then concatenates the enhancement layer encoded streams after the base layer encoded stream in order of priority.
However, the priority of an enhancement layer need not be determined by proximity to the viewpoint position. For example, when an image is displayed on a light-ray reproduction type stereoscopic display, it is preferable to assign depth ranges displayed nearer the screen surface to enhancement layers of higher priority. Similarly, when an image is displayed by the integral stereoscopic method, it is preferable to assign depth ranges displayed nearer the lens array surface to enhancement layers of higher priority.
This makes it possible to preferentially encode and decode the depth ranges that should be displayed with high image quality.
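
One way to picture this priority assignment is the following sketch, which orders depth ranges by how close they lie to a chosen reference depth (the viewpoint, the screen surface, or the lens array surface). The representation of a depth range as a `(near, far)` pair, the function name `prioritize`, and the sample values are assumptions made for illustration only.

```python
def prioritize(depth_ranges, reference_depth):
    """Return the indices of the depth ranges, highest priority (closest) first."""
    def distance(rng):
        near, far = rng
        if near <= reference_depth <= far:
            return 0                      # the reference depth lies inside this range
        return min(abs(near - reference_depth), abs(far - reference_depth))
    # sorted() is stable, so ranges at equal distance keep their original order
    return sorted(range(len(depth_ranges)), key=lambda i: distance(depth_ranges[i]))

# Hypothetical depth ranges for four enhancement layers.
ranges = [(0, 50), (50, 100), (100, 150), (150, 255)]
by_viewpoint = prioritize(ranges, reference_depth=0)    # nearest to the viewpoint first
by_screen = prioritize(ranges, reference_depth=120)     # nearest to a screen plane at 120
```

With the reference depth at the viewpoint the order is simply front to back, while a reference depth of 120 places the third range first because the reference lies inside it.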

DESCRIPTION OF SYMBOLS
1 Distance image encoding device
10 Threshold setting means
11 Area dividing means
12 Base layer encoding means
120 Orthogonal transform means
121 Quantization means
122 Variable length encoding means
13 Local decoding means
130 Inverse quantization means
131 Inverse orthogonal transform means
14 Decoded difference image generation means
15, 15A, 15B, 15C, 15D Depth-specific difference image generation means
16, 16A, 16B, 16C, 16D Enhancement layer encoding means
160 Orthogonal transform means
161 Quantization means
162 Variable length encoding means
17 Stream combining means
2 Distance image decoding device
20 Stream separating means
21 Base layer decoding means
210 Variable length decoding means
211 Inverse quantization means
212 Inverse orthogonal transform means
22, 22A, 22B, 22C, 22D Enhancement layer decoding means
23 Image synthesizing means

Claims

A distance image encoding device that encodes a distance image indicating depth information of a subject, comprising:
base layer encoding means for generating a base layer encoded stream by transforming the distance image into frequency components, quantizing them with a first quantization step, and variable-length encoding the quantized coefficients;
local decoding means for generating a decoded image of the distance image by inversely quantizing the quantized coefficients and inversely transforming the resulting frequency components;
decoded difference image generation means for generating the difference between the distance image and the decoded image as a decoded difference image;
area dividing means for dividing the area of the distance image for each preset depth range;
depth-specific difference image generation means for generating, from the decoded difference image, an image for each area divided by the area dividing means as a depth-specific difference image;
enhancement layer encoding means for generating, for each depth range, an enhancement layer encoded stream by transforming the corresponding depth-specific difference image into frequency components, quantizing them with a second quantization step smaller than the first quantization step, and variable-length encoding the quantized coefficients; and
stream combining means for combining the base layer encoded stream and the enhancement layer encoded stream for each depth range to generate an encoded stream of the distance image.

The distance image encoding device according to claim 1, further comprising threshold setting means for setting, based on a predetermined criterion, thresholds that divide the depth ranges.

The distance image encoding device according to claim 1 or 2, wherein
the area dividing means generates mask data for dividing the area of the distance image for each depth range, and
the depth-specific difference image generation means generates the depth-specific difference image by multiplying the decoded difference image by the mask data.

The distance image encoding device according to any one of claims 1 to 3, wherein
the enhancement layer encoding means generates the enhancement layer encoded stream from the depth-specific difference image, treating a depth range closer to a predetermined depth as a layer of higher priority, and
the stream combining means concatenates the enhancement layer encoded streams for the respective depth ranges to the base layer encoded stream in descending order of priority.

A distance image encoding program for causing a computer to function as the distance image encoding device according to any one of claims 1 to 4.

A distance image decoding device that decodes an encoded stream in which a base layer encoded stream, obtained by encoding a distance image indicating depth information of a subject with a first quantization step, is concatenated with a plurality of enhancement layer encoded streams, obtained by encoding, for each depth range and with a second quantization step smaller than the first quantization step, the difference between the original distance image and a distance image decoded from the base layer encoded stream, the device comprising:
stream separating means for separating the encoded stream into the base layer encoded stream and the plurality of enhancement layer encoded streams;
base layer decoding means for decoding the base layer encoded stream using the first quantization step to generate a base layer decoded image;
enhancement layer decoding means for decoding the plurality of enhancement layer encoded streams using the second quantization step to generate a plurality of enhancement layer decoded images; and
image synthesizing means for synthesizing the base layer decoded image and the plurality of enhancement layer decoded images to generate the distance image decoded from the encoded stream.

The distance image decoding device according to claim 6, wherein
the encoded stream is a stream obtained by encoding a distance image corresponding to a captured image of a moving image,
the enhancement layer decoding means decodes the enhancement layer encoded streams concatenated in the encoded stream in the order of concatenation, and
the image synthesizing means synthesizes the base layer decoded image and the enhancement layer decoded images decoded within a frame period.

A distance image decoding program for causing a computer to function as the distance image decoding device according to claim 6.