JP7489253B2 - Depth map generating device and program thereof, and depth map generating system

Info

Publication number: JP7489253B2
Application number: JP2020127411A
Authority: JP
Inventors: 正規加納; 真宏河北
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2020-07-28
Filing date: 2020-07-28
Publication date: 2024-05-23
Anticipated expiration: 2040-07-28
Also published as: JP2022024688A

Description

本発明は、デプスマップを生成するデプスマップ生成装置及びそのプログラム、並びに、デプスマップ生成システムに関する。 The present invention relates to a depth map generating device and a program for generating a depth map, and a depth map generating system.

近年、空間中に存在する被写体の三次元形状（デプスマップ）を取得する技術が盛んに研究されている。この技術は、三次元映像制作、ＡＲ（Augmented Reality）、ＶＲ（Virtual Reality）、ロボティクスなど様々な分野への適用が期待されている。被写体の三次元形状を取得するアプローチとしては、能動的な手法と受動的な手法に大別される（非特許文献１）。 In recent years, there has been active research into technology for acquiring the three-dimensional shape (depth map) of a subject existing in space. This technology is expected to be applied to a variety of fields, including three-dimensional video production, augmented reality (AR), virtual reality (VR), and robotics. Approaches for acquiring the three-dimensional shape of a subject can be broadly divided into active and passive methods (Non-Patent Document 1).

能動的な手法は、計測装置が光源を有し、被写体からの反射光を利用して奥行き（デプス）を計測するものである。具体的な手法としては、パターン光投影、光飛行時間法（ＴｏＦ:Time of Flight）、照度差ステレオ法がある。これらの中で近年注目されているのが、ＴｏＦカメラを用いた手法である。ＴｏＦカメラは、光源から照射した光が被写体で反射して戻るまでの時間を計測することで、ＴｏＦカメラから被写体までの距離を求める。能動的な手法のメリットは、高度な計算処理を行うことなくリアルタイムで高精度な距離が得られることである。一方、能動的な手法のデメリットは、外乱光に弱い、被写体の反射率や距離によっては測定誤差が生じる、スケールの校正が必要な場合があることである。 In active methods, the measuring device has a light source and uses reflected light from the subject to measure depth. Specific methods include pattern light projection, time of flight (ToF), and photometric stereo. Of these, the method using a ToF camera has been attracting attention in recent years. A ToF camera determines the distance from the subject to the ToF camera by measuring the time it takes for light emitted from a light source to be reflected by the subject and return. The advantage of active methods is that they can obtain highly accurate distances in real time without the need for advanced calculation processing. On the other hand, the disadvantages of active methods are that they are vulnerable to external light, measurement errors can occur depending on the reflectance and distance of the subject, and scale calibration may be required.
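The time-of-flight principle described above reduces to a single relation: light covers the camera-to-subject distance twice, so the distance is half the round-trip path. The following is a minimal sketch of that relation only; the function name is hypothetical, and a real ToF camera derives the timing indirectly rather than from a directly measured time value.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(round_trip_time_s: float) -> float:
    """Distance to the subject from a measured round-trip time (illustrative)."""
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # Light travels to the subject and back, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0
```

A timing error of one nanosecond corresponds to a distance error of roughly 15 cm, which gives a sense of the timing precision such a measurement needs.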

受動的な手法は、複数台のカラーカメラ（以降、「ＲＧＢカメラ」）、又は１台のＲＧＢカメラを移動させて、その視差から奥行き距離を計測するものである。具体的な手法としては、ステレオ法（多眼ステレオ）、モーションステレオがある。これらの原理はステレオ法であり、２台以上のカメラの視差からデプスを計算する。受動的な手法のメリットは、被写体に特殊な光を照射する必要がない、外乱光の影響を受けない、一般的なカラーカメラとコンピュータだけで実現できることである。一方、受動的な手法のデメリットは、得られるデプスに曖昧さが残る（テクスチャレス、オクルージョン領域）、計算コストが高くなることである。 Passive methods measure depth from the parallax obtained either with multiple color cameras (hereafter, "RGB cameras") or by moving a single RGB camera. Specific methods include the stereo method (multi-view stereo) and motion stereo. Both rest on the stereo principle, in which depth is calculated from the parallax between two or more camera views. The advantages of passive methods are that they do not require special light to be projected onto the subject, they are not affected by ambient light, and they can be implemented with only ordinary color cameras and a computer. On the other hand, the disadvantages of passive methods are that the obtained depth remains ambiguous in some regions (textureless and occluded regions) and that the computational cost is high.
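For the simplest stereo case, the relation between parallax and depth can be written in one line. The sketch below assumes two identical, rectified cameras with parallel optical axes, which is an assumption for illustration and not a configuration stated in this document; the function and parameter names are hypothetical.

```python
def depth_from_disparity(focal_length_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Depth of a point seen by two rectified, parallel cameras (illustrative)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    # Similar triangles: depth / baseline = focal length / disparity.
    return focal_length_px * baseline_m / disparity_px
```

Because depth is inversely proportional to disparity, a matching error of one pixel matters far more for distant points than for near ones, which is one source of the ambiguity mentioned above.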

その他、ＲＧＢカメラとデプスカメラを同一光軸上に配置し、レンズアレイを用いて、複数視点分のＲＧＢ画像及びデプス画像を取得できるＲＧＢ－Ｄカメラが知られている（特許文献１）。この手法では、カメラレンズから入射した光線をミラー（例えば、ハーフミラーやダイクロイックミラー）で分光し、ＲＧＢカメラとデプスカメラで受光する。 Another known camera is an RGB-D camera that can acquire RGB images and depth images from multiple viewpoints by arranging an RGB camera and a depth camera on the same optical axis and using a lens array (Patent Document 1). In this method, the light entering through the camera lens is split by a mirror (e.g., a half mirror or a dichroic mirror) and received by the RGB camera and the depth camera.

特開２００９－３００２６８号公報 Patent Document 1: JP 2009-300268 A

ディジタル画像処理（改訂新版）、ＣＧ－ＡＲＴＳ協会、２０１５年 Non-Patent Document 1: Digital Image Processing (Revised New Edition), CG-ARTS Association, 2015

前記したように、三次元形状の取得は、その応用できる分野が広いため、様々な手法が提案されているが、未だ確立されていない。汎用的な目的を考えると、１視点のカラー画像（以降、ＲＧＢ画像）とデプスマップのみでなく、様々な視点のＲＧＢ画像とデプスマップがあると使い勝手がよい。つまり、複数視点のＲＧＢ画像及びデプスマップのセットがあると、汎用性が向上する。 As mentioned above, because three-dimensional shape acquisition is applicable to a wide range of fields, various methods have been proposed, but no method has yet become established. For general-purpose use, it is more convenient to have RGB images and depth maps from various viewpoints rather than only a color image (hereafter, "RGB image") and a depth map from a single viewpoint. In other words, having a set of RGB images and depth maps from multiple viewpoints improves versatility.

また、デプスマップの精度も重要である。ＲＧＢ－Ｄカメラで得られるデプス画像は、画素値（輝度値）で表されているため、この画素値を実スケールのデプスマップに変換する必要がある。しかし、実スケールへの変換関数が、デプスマップの精度に大きな影響を与える。さらに、デプスマップの精度は、撮影環境や被写体の種類によっても影響される。なお、実スケールとは、実空間上の距離（奥行き）のことである。 The accuracy of the depth map is also important. Depth images obtained with RGB-D cameras are represented by pixel values (brightness values), so these pixel values must be converted into a real-scale depth map. However, the conversion function to real scale has a significant impact on the accuracy of the depth map. Furthermore, the accuracy of the depth map is also affected by the shooting environment and the type of subject. Note that real scale refers to the distance (depth) in real space.

本発明は、前記した問題を解決し、複数視点の撮影画像及び高精度なデプスマップを容易に取得できるデプスマップ生成装置及びそのプログラム、並びに、デプスマップ生成システムを提供することを課題とする。 The present invention aims to solve the above problems and provide a depth map generation device and program thereof, as well as a depth map generation system, that can easily obtain images captured from multiple viewpoints and highly accurate depth maps.

前記課題を解決するため、本発明に係るデプスマップ生成装置は、同一光軸の撮影カメラ及びデプスカメラと光学素子アレイとで構成された撮影装置が各視点で被写体を撮影した撮影画像及びデプス画像を用いて、各視点の撮影画像に対応したデプスマップを生成するデプスマップ生成装置であって、コストボリューム生成手段と、奥行き変換手段と、コストウェイト算出手段と、ビジビリティウェイト算出手段と、ウェイト適用手段と、最終デプスマップ生成手段と、を備える構成とした。 To solve the above problem, the depth map generating device according to the present invention generates a depth map corresponding to the captured image of each viewpoint, using captured images and depth images of a subject captured at each viewpoint by an imaging device composed of an imaging camera and a depth camera sharing the same optical axis and an optical element array. The device includes a cost volume generation means, a depth conversion means, a cost weight calculation means, a visibility weight calculation means, a weight application means, and a final depth map generating means.

かかる構成によれば、コストボリューム生成手段は、奥行き方向で所定間隔の奥行きレイヤ及び撮影画像の画素位置毎に、奥行きレイヤに投影された撮影画像間の類似度を表すコストを算出し、コストを奥行きレイヤ及び画素位置で三次元配列したコストボリュームを生成する。
奥行き変換手段は、デプス画像の各画素の画素値を奥行きに変換する奥行き変換関数により、デプス画像を中間デプスマップに変換する。
コストウェイト算出手段は、中間デプスマップの重みを正規分布関数で表したコストウェイトを算出する。 According to this configuration, the cost volume generation means calculates, for each depth layer set at predetermined intervals in the depth direction and for each pixel position of the captured images, a cost representing the similarity between the captured images projected onto that depth layer, and generates a cost volume in which the costs are arranged three-dimensionally by depth layer and pixel position.
The depth conversion means converts the depth image into an intermediate depth map by a depth conversion function that converts the pixel value of each pixel of the depth image into a depth.
The cost weight calculation means calculates a cost weight that represents the weight of the intermediate depth map using a normal distribution function.

また、ビジビリティウェイト算出手段は、中間デプスマップから、オクルージョン発生時にコストを低下させるビジビリティウェイトを算出する。
ウェイト適用手段は、コストボリュームにコストウェイト及びビジビリティウェイトを適用する。
最終デプスマップ生成手段は、ウェイト適用後のコストボリュームで同一画素位置のコスト列において、コストが最小となる奥行きレイヤのデプスを示す最終デプスマップを生成する。 Also, the visibility weight calculation means calculates, from the intermediate depth map, a visibility weight that reduces the cost when occlusion occurs.
The weight application means applies the cost weight and the visibility weight to the cost volume.
The final depth map generating means generates a final depth map indicating the depth of the depth layer having the smallest cost in a cost sequence at the same pixel position in the cost volume after the weights are applied.
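The selection rule used by the final depth map generating means, namely taking the depth layer with the smallest cost at each pixel position, can be sketched with NumPy as follows. The array layout (layers first, then rows and columns) and the function name are assumptions made for illustration.

```python
import numpy as np


def final_depth_map(cost_volume, layer_depths):
    """Pick, per pixel, the depth of the layer whose cost is smallest.

    cost_volume:  array of shape (num_layers, height, width)
    layer_depths: array of shape (num_layers,), depth of each layer
    """
    cost_volume = np.asarray(cost_volume, dtype=float)
    layer_depths = np.asarray(layer_depths, dtype=float)
    if cost_volume.shape[0] != layer_depths.shape[0]:
        raise ValueError("one depth value is needed per layer")
    # Index of the minimum-cost layer along the depth axis, per pixel.
    best_layer = np.argmin(cost_volume, axis=0)
    # Convert layer indices to depth values.
    return layer_depths[best_layer]
```

The resulting map can only take the discrete depth values of the layers, so the layer spacing bounds the depth resolution of this step.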

すなわち、デプスマップ生成装置は、デプス画像から生成したデプスマップに基づいて、撮影画像から生成したコストボリュームを２つのウェイトで制約するリファインメント処理を行う。このリファインメント処理によって、デプスマップ生成装置は、各視点の撮影画像に対応した高精度なデプスマップを生成できる。 In other words, the depth map generating device performs a refinement process that constrains the cost volume generated from the captured image with two weights based on the depth map generated from the depth image. This refinement process enables the depth map generating device to generate a highly accurate depth map that corresponds to the captured image from each viewpoint.
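As a rough illustration of how a weight based on a normal distribution can constrain the cost volume, the sketch below lowers the cost of layers that lie close to the intermediate depth map. The exact weight formula, the multiplicative application, and the parameters `sigma` and `strength` are all assumptions introduced here; this passage does not specify them. The visibility weight is omitted.

```python
import numpy as np


def apply_cost_weight(cost_volume, layer_depths, intermediate_depth,
                      sigma, strength=0.9):
    """Down-weight costs near the intermediate depth (hypothetical form).

    cost_volume:        (num_layers, height, width)
    layer_depths:       (num_layers,)
    intermediate_depth: (height, width), depth from the depth image
    """
    cost_volume = np.asarray(cost_volume, dtype=float)
    z = np.asarray(layer_depths, dtype=float)[:, None, None]
    d = np.asarray(intermediate_depth, dtype=float)[None, :, :]
    # Gaussian of the gap between each layer and the intermediate depth.
    gaussian = np.exp(-((z - d) ** 2) / (2.0 * sigma ** 2))
    # Assumed weight: close to (1 - strength) near the intermediate depth,
    # close to 1 far from it, so nearby layers become cheaper.
    weight = 1.0 - strength * gaussian
    return cost_volume * weight
```

With a small `sigma` the depth image dominates the result; with a large `sigma` the photometric cost from the captured images dominates.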

なお、本発明は、コンピュータを、前記したデプスマップ生成装置として機能させるためのプログラムで実現することができる。 The present invention can be realized by a program that causes a computer to function as the depth map generating device described above.

また、本発明は、同一光軸の撮影カメラ及びデプスカメラと光学素子アレイとで構成された撮影装置と、前記したデプスマップ生成装置と、を備えることを特徴とするデプスマップ生成システムで実現することもできる。 The present invention can also be realized as a depth map generation system that includes an imaging device composed of an imaging camera and a depth camera sharing the same optical axis and an optical element array, and the depth map generation device described above.

本発明によれば、複数視点の撮影画像及び高精度なデプスマップを容易に取得できる。 The present invention makes it easy to obtain images captured from multiple viewpoints and highly accurate depth maps.

実施形態に係る三次元形状取得システムの全体構成図である。 FIG. 1 is an overall configuration diagram of a three-dimensional shape acquisition system according to an embodiment.
実施形態に係る三次元形状取得装置の構成を示すブロック図である。 FIG. 2 is a block diagram showing the configuration of a three-dimensional shape acquisition device according to the embodiment.
ＲＧＢ－Ｄカメラによる校正パターンの撮影を説明する説明図であり、（ａ）は校正データＡを示し、（ｂ）は校正データＢを示す。 FIG. 3 is an explanatory diagram of the capture of a calibration pattern by the RGB-D camera, in which (a) shows calibration data A and (b) shows calibration data B.
校正パターンを撮影した画像の分割を説明する説明図であり、（ａ）はＲＧＢ画像を示し、（ｂ）はデプス画像を示す。 FIG. 4 is an explanatory diagram of the division of an image of the calibration pattern, in which (a) shows an RGB image and (b) shows a depth image.
スケール変換関数の算出を説明する説明図であり、（ａ）は仮想カメラから校正パターンまでの距離を示し、（ｂ）はスケール変換関数の一例を示す。 FIG. 5 is an explanatory diagram of the calculation of the scale conversion function, in which (a) shows the distance from a virtual camera to the calibration pattern and (b) shows an example of the scale conversion function.
被写体を撮影した画像の分割を説明する説明図であり、（ａ）はＲＧＢ画像を示し、（ｂ）はデプス画像を示す。 FIG. 6 is an explanatory diagram of the division of an image of the subject, in which (a) shows an RGB image and (b) shows a depth image.
奥行きレイヤの一例を説明する説明図である。 FIG. 7 is an explanatory diagram of an example of depth layers.
コストボリュームを説明する説明図である。 FIG. 8 is an explanatory diagram of the cost volume.
正規分布関数を説明する説明図である。 FIG. 9 is an explanatory diagram of the normal distribution function.
（ａ）はコストウェイト関数の一例を説明する説明図であり、（ｂ）はビジビリティ関数の一例を説明する説明図である。 FIG. 10(a) is an explanatory diagram of an example of the cost weight function, and FIG. 10(b) is an explanatory diagram of an example of the visibility function.
実施形態において、カメラ校正処理を示すフローチャートである。 FIG. 11 is a flowchart showing the camera calibration process in the embodiment.
実施形態において、リファインメント処理を示すフローチャートである。 FIG. 12 is a flowchart showing the refinement process in the embodiment.

以下、本発明の実施形態について図面を参照して説明する。但し、以下に説明する実施形態は、本発明の技術思想を具体化するためのものであって、特定的な記載がない限り、本発明を以下のものに限定しない。 The following describes an embodiment of the present invention with reference to the drawings. However, the embodiment described below is intended to embody the technical concept of the present invention, and unless otherwise specified, the present invention is not limited to the following.

［三次元形状取得システムの概要］
図１を参照し、実施形態に係る三次元形状取得システム（デプスマップ生成システム）１の概要について説明する。
三次元形状取得システム１は、被写体９について、複数視点のＲＧＢ画像（撮影画像）及びデプスマップと、仮想カメラＣのカメラパラメータとを取得するものである。図１に示すように、三次元形状取得システム１は、ＲＧＢ－Ｄカメラ（撮影装置）２と、三次元形状取得装置（デプスマップ生成装置）３とを備える。 [Overview of 3D shape acquisition system]
An overview of a three-dimensional shape acquisition system (depth map generation system) 1 according to an embodiment will be described with reference to FIG. 1.
The three-dimensional shape acquisition system 1 acquires, for a subject 9, RGB images (captured images) and depth maps from multiple viewpoints, together with the camera parameters of the virtual cameras C. As shown in FIG. 1, the three-dimensional shape acquisition system 1 includes an RGB-D camera (imaging device) 2 and a three-dimensional shape acquisition device (depth map generating device) 3.

複数視点で撮影するために多数のＲＧＢカメラ及びデプスカメラを配置した場合、システムが大規模となり、コストが高くなる。そこで、三次元形状取得システム１では、後記する１台のＲＧＢ－Ｄカメラ（撮影装置）２により、多数のＲＧＢカメラ及びデプスカメラを配置したのと同等の構成を実現し、システム構成を簡略化できる。 When multiple RGB cameras and depth cameras are arranged to capture images from multiple viewpoints, the system becomes large-scale and expensive. Therefore, the 3D shape acquisition system 1 uses a single RGB-D camera (image capture device) 2 (described below) to achieve a configuration equivalent to that of multiple RGB cameras and depth cameras, simplifying the system configuration.

三次元映像制作などの分野では、仮想カメラＣのカメラパラメータが必要となる。さらに、デプス画像は画素値（輝度値）で表されているため、この画素値を実スケールのデプスマップに変換するスケール変換関数も必要となる。そこで、三次元形状取得システム１では、三次元形状取得装置３によって、校正パターンを用いたカメラ校正処理を行って、仮想カメラＣのカメラパラメータとスケール変換関数を算出する。 In fields such as three-dimensional video production, the camera parameters of the virtual camera C are required. Furthermore, because a depth image is represented by pixel values (brightness values), a scale conversion function is also required to convert these pixel values into a real-scale depth map. Therefore, in the three-dimensional shape acquisition system 1, the three-dimensional shape acquisition device 3 performs a camera calibration process using a calibration pattern to calculate the camera parameters and scale conversion function of the virtual camera C.

デプスマップの精度も重要である。前記したように、スケール変換関数が、デプスマップの精度に大きな影響を与えてしまう。さらに、デプスマップの精度は、撮影環境や被写体の種類によって大きく低下する。そこで、三次元形状取得システム１では、後記する三次元形状取得装置３によって、複数視点のＲＧＢ画像及びデプス画像を用いて、デプスマップの精度を改善する（リファインメント処理）。このとき、三次元形状取得装置３では、１台のＲＧＢ－Ｄカメラ２で撮影した１枚のＲＧＢ画像を視点毎に分割してマッチングするため、複数台のＲＧＢカメラで撮影した画像をマッチングする場合に比べ、色の差に起因するエラーを抑制できる。 The accuracy of the depth map is also important. As mentioned above, the scale conversion function has a large effect on the accuracy of the depth map. Furthermore, the accuracy of the depth map is greatly reduced depending on the shooting environment and the type of subject. Therefore, in the three-dimensional shape acquisition system 1, the three-dimensional shape acquisition device 3 described below uses RGB images and depth images from multiple viewpoints to improve the accuracy of the depth map (refinement processing). At this time, the three-dimensional shape acquisition device 3 divides one RGB image captured by one RGB-D camera 2 for each viewpoint and matches them, so errors caused by color differences can be suppressed compared to matching images captured by multiple RGB cameras.

最初に、ＲＧＢ－Ｄカメラ２の構成について説明する。次に、三次元形状取得装置３によるカメラ校正処理について説明する。このカメラ校正処理は、各仮想カメラＣのカメラパラメータ、及び、スケール変換関数を算出する処理である。最後に、三次元形状取得装置３による、デプスマップの精度を改善するリファインメント処理について説明する。 First, the configuration of the RGB-D camera 2 will be described. Next, the camera calibration process performed by the three-dimensional shape acquisition device 3 will be described. This camera calibration process is a process for calculating the camera parameters and scale conversion function of each virtual camera C. Finally, the refinement process performed by the three-dimensional shape acquisition device 3 to improve the accuracy of the depth map will be described.

［ＲＧＢ－Ｄカメラの構成］
図１に示すように、ＲＧＢ－Ｄカメラ２は、カメラ本体２０と、レンズ系２１とを備える撮像装置である。本実施形態では、カメラ本体２０は、図示を省略したＲＧＢカメラ及びデプスカメラを同一光軸上に配置したものである。また、カメラ本体２０は、被写体９からの光線を分光素子（不図示）で分光し、分光した光線をＲＧＢカメラ及びデプスカメラでそれぞれ受光する。例えば、ＲＧＢカメラとしては、一般的なカラーカメラがあげられる。また、分光素子としては、ハーフミラー又はダイクロイックミラーがあげられる。 [RGB-D Camera Configuration]
As shown in FIG. 1, the RGB-D camera 2 is an imaging device including a camera body 20 and a lens system 21. In this embodiment, the camera body 20 has an RGB camera and a depth camera (neither shown) arranged on the same optical axis. The camera body 20 splits the light from the subject 9 with a beam-splitting element (not shown), and the RGB camera and the depth camera each receive part of the split light. For example, the RGB camera may be an ordinary color camera, and the beam-splitting element may be a half mirror or a dichroic mirror.

本実施形態では、デプスカメラとして、ＴｏＦカメラを用いる。このＴｏＦカメラは、距離計測時、被写体９に赤外線を照射するための赤外線ＬＥＤアレイ２５を備える。ＴｏＦカメラが撮影した赤外線画像のフレーム間差分を求めることにより、デプス画像を取得できる。 In this embodiment, a ToF camera is used as the depth camera. This ToF camera is equipped with an infrared LED array 25 for irradiating the subject 9 with infrared rays when measuring distance. A depth image can be obtained by calculating the inter-frame difference of the infrared images captured by the ToF camera.

レンズ系２１は、フレネルレンズ２２と、レンズアレイ（光学素子アレイ）２３とを備える。レンズアレイ２３は、Ｎ_Ｘ×Ｎ_Ｙ個の要素レンズ２４を２次元状に配列したものである。ＲＧＢ－Ｄカメラ２は、このレンズアレイ２３を介することで、Ｎ_Ｘ×Ｎ_Ｙ視点分のＲＧＢ画像及びデプス画像を取得できる。すなわち、ＲＧＢ－Ｄカメラ２は、Ｎ_Ｘ×Ｎ_Ｙ個の仮想カメラＣを配置したのと同等の構成を実現している。本実施形態では、２×２個の要素レンズ２４に対応した４視点（４台の仮想カメラＣ）であることとする。 The lens system 21 includes a Fresnel lens 22 and a lens array (optical element array) 23. The lens array 23 is a two-dimensional array of N_X × N_Y element lenses 24. Through this lens array 23, the RGB-D camera 2 can acquire RGB images and depth images for N_X × N_Y viewpoints. That is, the RGB-D camera 2 realizes a configuration equivalent to an arrangement of N_X × N_Y virtual cameras C. In this embodiment, there are four viewpoints (four virtual cameras C) corresponding to 2 × 2 element lenses 24.

なお、カメラ本体２０とレンズ系２１との位置関係を調整すると、仮想カメラＣの画角を調整できる。また、図１では、４台の仮想カメラＣのうち、２台の仮想カメラＣのみを図示した。 The angle of view of the virtual camera C can be adjusted by adjusting the positional relationship between the camera body 20 and the lens system 21. Also, in FIG. 1, only two of the four virtual cameras C are illustrated.

［三次元形状取得装置の構成］
図２を参照し、三次元形状取得装置３の構成について説明する。
三次元形状取得装置３は、ＲＧＢ－Ｄカメラ２が各視点で被写体９を撮影したＲＧＢ画像及びデプス画像を用いて、各視点のＲＧＢ画像に対応したデプスマップを生成するものである。図２に示すように、三次元形状取得装置３は、カメラ校正処理を行うカメラ校正手段４と、リファインメント処理を行うリファインメント手段５とを備える。 [Configuration of the three-dimensional shape acquisition device]
The configuration of the three-dimensional shape acquisition device 3 will be described with reference to FIG. 2.
The three-dimensional shape acquisition device 3 generates a depth map corresponding to the RGB images of each viewpoint by using RGB images and depth images of the subject 9 captured at each viewpoint by the RGB-D camera 2. As shown in Fig. 2, the three-dimensional shape acquisition device 3 includes a camera calibration means 4 that performs a camera calibration process and a refinement means 5 that performs a refinement process.

＜カメラ校正手段＞
カメラ校正手段４は、２種類のパラメータを推定する。一つ目は、仮想カメラＣのカメラパラメータである。仮想カメラＣのカメラパラメータは、レンズの焦点距離、レンズ歪み、仮想カメラＣの位置や姿勢など表す。二つ目は、各仮想カメラＣのスケール変換関数である。さらに、カメラ校正手段４は、必要に応じて、ＲＧＢ画像及びデプス画像の画角補正を行う。なお、カメラ校正手段４は、撮影の都度、カメラ校正処理を行う必要がなく、ＲＧＢ－Ｄカメラ２の焦点距離やＲＧＢ－Ｄカメラ２とフレネルレンズ２２とレンズアレイ２３との位置・姿勢の関係が変化したときにカメラ校正処理を行えばよい。 <Camera Calibration Means>
The camera calibration means 4 estimates two types of parameters. The first is the camera parameters of the virtual camera C. The camera parameters of the virtual camera C represent the focal length of the lens, the lens distortion, the position and the orientation of the virtual camera C, and the like. The second is a scale conversion function of each virtual camera C. Furthermore, the camera calibration means 4 corrects the angle of view of the RGB image and the depth image as necessary. Note that the camera calibration means 4 does not need to perform camera calibration processing every time shooting is performed, and it is sufficient to perform camera calibration processing when the focal length of the RGB-D camera 2 or the relationship between the position and orientation of the RGB-D camera 2, the Fresnel lens 22, and the lens array 23 changes.

図３（ａ）に示すように、カメラ校正手段４には、ＲＧＢ－Ｄカメラ２で校正パターン９０を撮影したＲＧＢ画像及びデプス画像が入力される。校正パターン９０は、平面状で特徴点の配置が既知のパターンである（例えば、チェスボードパターン）。このとき、ＲＧＢ－Ｄカメラ２は、校正パターン９０の姿勢を２回以上変更して撮影する（破線で図示）。なお、ＲＧＢ－Ｄカメラ２は、内部パラメータのスキューを０以外とする場合、校正パターン９０の姿勢を３回以上変更して撮影する。図３（ａ）に示すように、レンズ系２１を配置して撮影したＲＧＢ画像及びデプス画像を校正データＡと呼ぶ。前記した画角補正を行う場合、図３（ｂ）に示すように、レンズ系２１を外して校正パターン９０を撮影する。このように、レンズ系２１を外して撮影したＲＧＢ画像及びデプス画像を校正データＢと呼ぶ。 As shown in FIG. 3(a), the camera calibration means 4 receives an RGB image and a depth image captured by the RGB-D camera 2 of a calibration pattern 90. The calibration pattern 90 is a planar pattern with a known arrangement of feature points (for example, a chessboard pattern). At this time, the RGB-D camera 2 captures the calibration pattern 90 by changing its posture two or more times (shown by the dashed line). Note that, when the skew of the internal parameters is other than 0, the RGB-D camera 2 captures the calibration pattern 90 by changing its posture three or more times. As shown in FIG. 3(a), the RGB image and the depth image captured by disposing the lens system 21 are called calibration data A. When performing the above-mentioned angle of view correction, the calibration pattern 90 is captured by removing the lens system 21 as shown in FIG. 3(b). The RGB image and the depth image captured by removing the lens system 21 in this way are called calibration data B.

図２に示すように、カメラ校正手段４は、画角補正手段４０と、画像分割手段４１と、初期カメラパラメータ算出手段４２と、カメラパラメータ最適化手段４３と、スケール変換関数算出手段（奥行き変換関数算出手段）４４とを備える。 As shown in FIG. 2, the camera calibration means 4 includes an angle of view correction means 40, an image division means 41, an initial camera parameter calculation means 42, a camera parameter optimization means 43, and a scale conversion function calculation means (depth conversion function calculation means) 44.

画角補正手段４０は、ＲＧＢ－Ｄカメラ２から入力されたデプス画像の画角がＲＧＢ画像の画角に一致するように、デプス画像を射影変換するものである。ＲＧＢ－Ｄカメラ２の取り付け精度に起因して、ＲＧＢカメラで撮影したＲＧＢ画像とデプスカメラで撮影したデプス画像との画角が微妙にずれることがある。このため、画角補正手段４０は、校正データＢを用いて、この微妙な画角のずれを補正する。具体的には、画角補正手段４０は、ＲＧＢ画像及びデプス画像の間で４点以上の対応点（校正パターン９０の特徴点）を基準として、ホモグラフィ行列を算出する（参考文献１）。そして、画角補正手段４０は、このホモグラフィ行列によりデプス画像を射影変換することで、デプス画像の画角をＲＧＢ画像の画角に一致させる。
なお、画角補正手段４０は、ＲＧＢカメラ及びデプスカメラの画角が一致している場合、前記した画角補正処理を行う必要がない。 The angle of view correction means 40 performs projective transformation on the depth image so that the angle of view of the depth image input from the RGB-D camera 2 coincides with the angle of view of the RGB image. Due to the installation accuracy of the RGB-D camera 2, the angle of view of the RGB image captured by the RGB camera and the depth image captured by the depth camera may be slightly shifted. For this reason, the angle of view correction means 40 corrects this slight shift in the angle of view using the calibration data B. Specifically, the angle of view correction means 40 calculates a homography matrix based on four or more corresponding points (feature points of the calibration pattern 90) between the RGB image and the depth image (Reference 1). Then, the angle of view correction means 40 performs projective transformation on the depth image using this homography matrix to make the angle of view of the depth image coincide with the angle of view of the RGB image.
It should be noted that, when the angles of view of the RGB camera and the depth camera are the same, the angle-of-view correction means 40 does not need to perform the angle-of-view correction process described above.
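The homography step above can be sketched with the direct linear transform (DLT), which solves for the 3 × 3 matrix from four or more point correspondences. This NumPy version is an illustrative stand-in, not the device's implementation: OpenCV, which the source cites, offers `cv2.findHomography` for the estimate and `cv2.warpPerspective` for warping the depth image. The function names below are hypothetical, and the sketch omits the coordinate normalization and outlier rejection a production routine would add.

```python
import numpy as np


def estimate_homography(src_pts, dst_pts):
    """Estimate H such that dst ~ H @ src, from >= 4 point pairs (DLT)."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    if src.shape != dst.shape or src.shape[0] < 4:
        raise ValueError("need at least four matching point pairs")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def apply_homography(h, pts):
    """Map 2-D points through a homography."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ np.asarray(h).T
    return homog[:, :2] / homog[:, 2:3]
```

Here `src_pts` would be feature points of the calibration pattern detected in the depth image and `dst_pts` the same points detected in the RGB image, so that the estimated matrix maps depth-image coordinates onto RGB-image coordinates.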

参考文献１：“ＯｐｅｎＣＶ”,［online］、［令和２年６月２４日検索］、インターネット〈URL：https://opencv.org/〉 Reference 1: "OpenCV", [online], [searched June 24, 2020], Internet <URL: https://opencv.org/>

また、画角補正手段４０は、校正データＢを用いて、レンズ歪みを除去できる。例えば、画角補正手段４０は、Ｚｈａｎｇの手法により、ＲＧＢ－Ｄカメラ２のレンズ歪み係数を算出し、ＲＧＢ画像及びデプス画像からレンズ歪みを除去する（参考文献２）。 The angle-of-view correction means 40 can also remove lens distortion using the calibration data B. For example, the angle-of-view correction means 40 calculates the lens distortion coefficient of the RGB-D camera 2 using Zhang's method, and removes the lens distortion from the RGB image and the depth image (Reference 2).

参考文献２：Z. Zhang, “A flexible new technique for camera calibration”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 11, pp. 1330-1334 (2000) Reference 2: Z. Zhang, “A flexible new technique for camera calibration”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 11, pp. 1330-1334 (2000)

画像分割手段４１は、画角補正手段４０から入力されたＲＧＢ画像及びデプス画像を視点（要素レンズ２４）毎に分割するものである。つまり、画像分割手段４１は、ＲＧＢ画像及びデプス画像を仮想カメラＣ毎に分割することで、仮想カメラＣで仮想的に撮影したＲＧＢ画像及びデプス画像を生成する。本実施形態では、画像分割手段４１は、図４（ａ）及び（ｂ）に示すように、ＲＧＢ画像Ｐ_Ｃ及びデプス画像Ｐ_Ｄを４分割する。 The image division means 41 divides the RGB image and the depth image input from the angle of view correction means 40 for each viewpoint (element lens 24). That is, the image division means 41 divides the RGB image and the depth image for each virtual camera C to generate the RGB image and the depth image virtually captured by that virtual camera C. In this embodiment, the image division means 41 divides the RGB image P_C and the depth image P_D into four, as shown in FIGS. 4(a) and 4(b).

なお、ＲＧＢ画像Ｐ_Ｃ及びデプス画像Ｐ_Ｄを分割する領域αは、手動で設定する。このとき、分割後のＲＧＢ画像Ｐ_Ｃ及びデプス画像Ｐ_Ｄでは、レンズアレイ２３の外側や要素レンズ２４同士の隙間が不要なので、これら不要領域を分割せずともよい。以後の説明を簡易にするため、分割後のＲＧＢ画像Ｐ_Ｃ及びデプス画像Ｐ_Ｄは、同一の画像サイズであることとする。 The regions α into which the RGB image P_C and the depth image P_D are divided are set manually. Because the area outside the lens array 23 and the gaps between the element lenses 24 are not needed in the divided RGB image P_C and depth image P_D, these unnecessary areas may be excluded from the division. To simplify the following explanation, the divided RGB images P_C and depth images P_D are assumed to have the same image size.
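The division into per-viewpoint images can be sketched as a grid split. The sketch assumes an equal grid of N_X × N_Y tiles, which is a simplification: the source sets the regions α manually and may exclude the gaps between element lenses. The function name is hypothetical.

```python
import numpy as np


def split_into_views(image, n_x, n_y):
    """Split an image into n_y rows by n_x columns of equal-sized tiles.

    Returns the tiles in row-major order; any remainder pixels at the
    right and bottom edges are dropped so that all tiles share one size.
    """
    image = np.asarray(image)
    tile_h = image.shape[0] // n_y
    tile_w = image.shape[1] // n_x
    views = []
    for row in range(n_y):
        for col in range(n_x):
            views.append(image[row * tile_h:(row + 1) * tile_h,
                               col * tile_w:(col + 1) * tile_w])
    return views
```

Applying the same split to the RGB image and to the angle-corrected depth image yields matching tile pairs, one pair per virtual camera C; for the 2 × 2 lens array of this embodiment, `n_x = n_y = 2`.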

初期カメラパラメータ算出手段４２は、画像分割手段４１から入力された各視点のＲＧＢ画像Ｐ_Ｃにカメラ校正処理を施すことで、各視点に対応した仮想カメラＣの初期カメラパラメータを算出するものである。例えば、初期カメラパラメータ算出手段４２は、各視点のＲＧＢ画像Ｐ_ＣにＺｈａｎｇの手法を適用し、各仮想カメラＣのカメラパラメータ及び各校正パターン９０の位置・姿勢が含まれる初期カメラパラメータを算出する。 The initial camera parameter calculation means 42 calculates initial camera parameters of the virtual camera C corresponding to each viewpoint by performing camera calibration processing on the RGB image P_C of each viewpoint input from the image division means 41. For example, the initial camera parameter calculation means 42 applies Zhang's method to the RGB image P_C of each viewpoint to calculate initial camera parameters that include the camera parameters of each virtual camera C and the position and orientation of each calibration pattern 90.

カメラパラメータ最適化手段４３は、初期カメラパラメータ算出手段４２から入力された初期カメラパラメータを初期値としたカメラ校正処理により、各仮想カメラＣの間でカメラパラメータを最適化するものである。前記した初期カメラパラメータ算出手段４２では、各仮想カメラＣのカメラパラメータを個別に算出していたが、全ての仮想カメラＣの間でカメラパラメータを最適化することで、カメラパラメータの精度が向上する。 The camera parameter optimization means 43 optimizes the camera parameters between each virtual camera C through a camera calibration process using the initial camera parameters input from the initial camera parameter calculation means 42 as initial values. The initial camera parameter calculation means 42 described above calculates the camera parameters of each virtual camera C individually, but optimizing the camera parameters between all virtual cameras C improves the accuracy of the camera parameters.

ここで、校正パターン９０の位置・姿勢を共通のパラメータとする。最適化するカメラパラメータは、各仮想カメラＣのカメラパラメータと、共通化した校正パターン９０の位置・姿勢が含まれる。具体的には、カメラパラメータ最適化手段４３は、各仮想カメラＣのカメラパラメータ及び校正パターン９０の位置・姿勢の平均値を初期値として、初期カメラパラメータに含まれる仮想カメラＣの位置・姿勢を使用する。そして、カメラパラメータ最適化手段４３は、これら初期値をバンドル調整することでカメラパラメータを最適化する。 Here, the position and orientation of the calibration pattern 90 are the common parameters. The camera parameters to be optimized include the camera parameters of each virtual camera C and the position and orientation of the common calibration pattern 90. Specifically, the camera parameter optimization means 43 uses the average values of the camera parameters of each virtual camera C and the position and orientation of the calibration pattern 90 as initial values, and uses the position and orientation of the virtual camera C included in the initial camera parameters. The camera parameter optimization means 43 then optimizes the camera parameters by bundle adjusting these initial values.

スケール変換関数算出手段４４は、カメラパラメータ最適化手段４３より入力されたカメラパラメータが示す仮想カメラＣの位置から校正パターン９０までの距離をデプス画像Ｐ_Ｄの各画素の画素値に対応させることで、スケール変換関数を算出するものである。すなわち、スケール変換関数算出手段４４は、デプス画像Ｐ_Ｄを実スケールのデプスマップに変換するためのスケール変換関数を算出する。前記したように、カメラパラメータにおいて、仮想カメラＣの位置・姿勢と校正パターン９０の位置・姿勢とが既知のため、仮想カメラＣから校正パターン９０までの距離ｒが実スケールで算出できる。 The scale conversion function calculation means 44 calculates a scale conversion function by associating the distance from the position of the virtual camera C, as indicated by the camera parameters input from the camera parameter optimization means 43, to the calibration pattern 90 with the pixel value of each pixel of the depth image P_D. That is, the scale conversion function calculation means 44 calculates a scale conversion function for converting the depth image P_D into a real-scale depth map. As described above, since the position and orientation of the virtual camera C and the position and orientation of the calibration pattern 90 are known from the camera parameters, the distance r from the virtual camera C to the calibration pattern 90 can be calculated in real scale.

具体的には、スケール変換関数算出手段４４は、図５（ａ）に示すように、仮想カメラＣから校正パターン９０までの距離ｒと、デプス画像Ｐ_Ｄの各画素の輝度値ｑ（画素値）とを対応づける。このとき、デプス画像Ｐ_Ｄに含まれる校正パターン９０では、黒色模様の部分で反射率が低下するため、正確な対応付けが困難である。このため、スケール変換関数算出手段４４は、デプス画像Ｐ_Ｄに含まれる校正パターン９０の白色部分のみで対応付けを行うことが好ましい。ここで、スケール変換関数算出手段４４は、校正パターン９０を撮影した全てのデプス画像Ｐ_Ｄで対応付けを行うことで、図５（ｂ）に示すようにグラフが得られる。そして、スケール変換関数算出手段４４は、このグラフを関数（例えば、５次関数）で近似することで、スケール変換関数ｈ（ｑ）を算出できる。なお、スケール変換関数算出手段４４は、このグラフをスケール変換関数で近似せず、ルックアップテーブルとしてもよい。 Specifically, as shown in FIG. 5(a), the scale conversion function calculation means 44 associates the distance r from the virtual camera C to the calibration pattern 90 with the brightness value q (pixel value) of each pixel of the depth image P_D. In the calibration pattern 90 as it appears in the depth image P_D, the reflectance is lower in the black parts of the pattern, which makes accurate association difficult there. For this reason, the scale conversion function calculation means 44 preferably performs the association only on the white parts of the calibration pattern 90 in the depth image P_D. Performing the association over all depth images P_D in which the calibration pattern 90 was captured yields a graph such as that shown in FIG. 5(b). The scale conversion function calculation means 44 can then calculate the scale conversion function h(q) by approximating this graph with a function (for example, a fifth-order polynomial). Alternatively, instead of approximating the graph with a function, the scale conversion function calculation means 44 may store it as a look-up table.
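The fitting of h(q) described above can be sketched with a least-squares polynomial fit. The sketch assumes 8-bit depth-image pixel values (hence the default `max_pixel_value=255.0`) and rescales q to the range [0, 1] before fitting to keep the fifth-order fit well conditioned; both choices, and the function name, are assumptions for illustration.

```python
import numpy as np


def fit_scale_conversion(pixel_values, distances, degree=5,
                         max_pixel_value=255.0):
    """Fit h(q): depth-image pixel value q -> real-scale distance r."""
    q = np.asarray(pixel_values, dtype=float) / max_pixel_value
    r = np.asarray(distances, dtype=float)
    if q.size <= degree:
        raise ValueError("need more samples than the polynomial degree")
    # Least-squares polynomial coefficients, highest power first.
    coeffs = np.polyfit(q, r, degree)

    def h(pixel_value):
        scaled = np.asarray(pixel_value, dtype=float) / max_pixel_value
        return np.polyval(coeffs, scaled)

    return h
```

The (q, r) samples would come from the white parts of the calibration pattern across all calibration images, and the returned `h` can be applied to a whole depth image at once to obtain a real-scale depth map. A polynomial extrapolates poorly, so pixel values outside the calibrated range deserve caution.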

その後、カメラ校正手段４は、算出したスケール変換関数をスケール変換手段５４に出力し、仮想カメラＣのカメラパラメータをコストボリューム生成手段５１及びウェイト適用手段５９に出力する。 Then, the camera calibration means 4 outputs the calculated scale conversion function to the scale conversion means 54, and outputs the camera parameters of the virtual camera C to the cost volume generation means 51 and the weight application means 59.

＜リファインメント手段＞
リファインメント手段５は、ＲＧＢ－Ｄカメラ２で被写体９を撮影したＲＧＢ画像Ｐ_Ｃ及びデプス画像Ｐ_Ｄが入力される。そして、リファインメント手段５は、デプス画像Ｐ_Ｄから生成したデプスマップに基づいて、ＲＧＢ画像Ｐ_Ｃから生成したコストボリュームを２つのウェイトで制約することで、デプスマップの精度を向上させる。なお、リファインメント手段５は、撮影の都度、リファインメント処理を行う。 <Refinement Means>
The refinement means 5 receives an RGB image P_C and a depth image P_D of the subject 9 captured by the RGB-D camera 2. The refinement means 5 improves the accuracy of the depth map by constraining the cost volume generated from the RGB image P_C with two weights based on the depth map generated from the depth image P_D. The refinement means 5 performs the refinement process each time an image is captured.

図２に示すように、リファインメント手段５は、画像分割手段５０と、コストボリューム生成手段５１と、初期デプスマップ生成手段５２と、平滑化手段５３と、スケール変換手段（奥行き変換手段）５４と、レイヤ化処理手段５５と、スケール補正手段（中間デプスマップ補正手段）５６と、コストウェイト算出手段５７と、ビジビリティウェイト算出手段５８と、ウェイト適用手段５９と、最終デプスマップ生成手段６０とを備える。 As shown in FIG. 2, the refinement means 5 includes an image division means 50, a cost volume generation means 51, an initial depth map generation means 52, a smoothing means 53, a scale conversion means (depth conversion means) 54, a layering processing means 55, a scale correction means (intermediate depth map correction means) 56, a cost weight calculation means 57, a visibility weight calculation means 58, a weight application means 59, and a final depth map generation means 60.

The image division means 50 divides the RGB image P_C and the depth image P_D input from the RGB-D camera 2 into per-viewpoint images. As shown in FIGS. 6(a) and 6(b), the image division means 50 divides the RGB image P_C and the depth image P_D of the subject 9 in the same way as the image division means 41.

Note that in FIG. 6, the subject 9 appears inverted in the RGB image P_C and the depth image P_D because the images are formed through the lens system 21. In this case, the RGB image P_C and the depth image P_D may be flipped so that the subject 9 appears upright.

The cost volume generation means 51 calculates a cost for each depth layer (described later) and each pixel position of the RGB image P_C, and generates a cost volume in which the costs are arranged three-dimensionally by depth layer and pixel position. In this embodiment, the cost volume generation means 51 uses the plane sweep method, one of the methods for estimating a cost volume (Reference 3).

Reference 3: David Gallup, et al., "Real-time plane-sweeping stereo with multiple sweeping directions", IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8 (2007)

First, as shown in FIG. 7, the cost volume generation means 51 sets a plurality of depth layers N_D at predetermined intervals in the depth direction in the space where the subject 9 is placed. In the example of FIG. 7, five depth layers N_D are set (D = 1, ..., 5). In FIG. 7, the x-axis indicates the horizontal direction, the y-axis the vertical direction, and the z-axis the depth direction. Next, the cost volume generation means 51 selects one of the virtual cameras C as the reference camera and forms a camera pair from this reference camera and one other virtual camera C. The cost volume generation means 51 then projects the RGB image P_C of each virtual camera C in the camera pair onto a depth layer N_D by projective transformation. It further calculates the cost by taking the difference between the pixel values of corresponding pixels of the two RGB images P_C projected onto the depth layer N_D (for example, SAD: Sum of Absolute Differences). This cost represents the similarity between the two RGB images P_C projected onto that depth layer N_D; the smaller the value, the more likely it is that the depth of the subject 9 lies on that depth layer N_D.
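The projective transformation for one depth layer can be sketched as a plane-induced homography. The sketch below is an illustration under stated assumptions, not the patented implementation: it assumes pinhole intrinsics, fronto-parallel layers z = depth in the reference camera frame, and the pose convention X_other = R·X_ref + t. The function name and all numeric values are hypothetical.

```python
import numpy as np

def plane_homography(K_ref, K_other, R, t, depth):
    """Homography taking reference-camera pixels to other-camera pixels
    for the plane z = depth in the reference camera frame.
    Assumed convention: X_other = R @ X_ref + t."""
    n = np.array([0.0, 0.0, 1.0])  # plane normal (fronto-parallel layer)
    # On the plane, n.X / depth == 1, so R X + t == (R + t n^T / depth) X.
    return K_other @ (R + np.outer(t, n) / depth) @ np.linalg.inv(K_ref)

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([-0.1, 0.0, 0.0])  # hypothetical baseline between two virtual cameras

H = plane_homography(K, K, R, t, depth=2.0)
p_ref = np.array([420.0, 190.0, 1.0])   # pixel of the 3D point (0.4, -0.2, 2.0)
p_other = H @ p_ref
p_other = p_other / p_other[2]          # normalise homogeneous coordinates
```

With one such homography per layer and camera, each image can be resampled onto every depth layer before the per-pixel difference is taken.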

The cost volume generation means 51 generates a cost volume by performing the above processing for all depth layers N_D. As shown in FIG. 8, if the RGB image P_C is U×V pixels, the cost volume 91 is a three-dimensional array of U×V×N_D costs. Within the cost volume 91, the costs arranged in the depth direction at the same pixel position are called a cost column 92. In other words, the cost column 92 is a three-dimensional array of 1×1×N_D costs. The cost volume generation means 51 then applies a guided filter to the cost volume 91, using the RGB image P_C of the reference camera as the guide (Reference 4). This smooths the cost volume 91 while preserving edges, and thus reduces noise in the cost volume 91.
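As a rough illustration of how the per-layer costs form a cost volume, the following sketch computes a SAD cost from two stacks of images that are assumed to have already been projected onto each depth layer (the warp itself is omitted). The function name and the random test data are hypothetical, and NumPy's row-major layout gives the shape (V, U, N_D) rather than U×V×N_D. The guided-filter step is not included.

```python
import numpy as np

def sad_cost_volume(ref_warped, other_warped):
    """Build a (V, U, N_D) cost volume from two stacks of pre-warped RGB images.

    Both inputs have shape (N_D, V, U, 3): one image per depth layer,
    already projected onto that layer.
    """
    ref = np.asarray(ref_warped, dtype=np.float32)
    oth = np.asarray(other_warped, dtype=np.float32)
    # Sum of absolute differences over the colour channels, per pixel and layer.
    sad = np.abs(ref - oth).sum(axis=-1)   # (N_D, V, U)
    return np.moveaxis(sad, 0, -1)         # (V, U, N_D)

n_layers, height, width = 5, 4, 6
rng = np.random.default_rng(0)
ref_stack = rng.random((n_layers, height, width, 3))
other_stack = rng.random((n_layers, height, width, 3))
other_stack[2] = ref_stack[2]              # make layer index 2 a perfect match

cost_volume = sad_cost_volume(ref_stack, other_stack)
cost_column = cost_volume[0, 0, :]         # the N_D costs at one pixel position
```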

Reference 4: Kaiming He, Jian Sun, and Xiaoou Tang, "Guided image filtering", European Conference on Computer Vision, Springer, pp. 1-10 (2010)

Let S be the set of virtual cameras C surrounding the reference camera. Then |S| camera pairs can be formed, where |S| is the number of elements in the set, and the same number of cost volumes 91 is produced. For example, with four virtual cameras C, one reference camera yields three camera pairs and therefore three cost volumes 91. If virtual camera C_1 is the reference camera, the camera pairs are (C_1, C_2), (C_1, C_3), and (C_1, C_4).
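The pairing rule is simple enough to state in a few lines. In this sketch the camera labels and the function name `camera_pairs` are hypothetical; every camera other than the reference is treated as belonging to the surrounding set S.

```python
def camera_pairs(reference, cameras):
    """Pair the reference camera with every other virtual camera."""
    return [(reference, cam) for cam in cameras if cam != reference]

pairs = camera_pairs("C1", ["C1", "C2", "C3", "C4"])
```

One cost volume would then be generated per element of `pairs`.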

The initial depth map generation means 52 generates an initial depth map that indicates, for each cost column 92 at the same pixel position in the cost volume 91 input from the cost volume generation means 51, the depth of the depth layer N_D with the smallest cost.

Because multiple cost volumes 91 exist for one reference camera, the initial depth map generation means 52 takes the sum of the cost volumes 91 as the final cost volume 91 of the reference camera. The initial depth map generation means 52 then takes the depth layer N_D with the smallest cost in each cost column 92 as the correct depth and generates the initial depth map D^C of the reference camera.
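The summation and per-column minimum can be sketched as follows. The array layout (V, U, N_D), the function name, and the example layer depths are assumptions made for illustration only.

```python
import numpy as np

def initial_depth_map(cost_volumes, layer_depths):
    """Sum per-pair cost volumes and pick the lowest-cost layer per pixel."""
    total = np.sum(np.stack(cost_volumes, axis=0), axis=0)  # (V, U, N_D)
    layer_index = np.argmin(total, axis=-1)                 # (V, U)
    return total, np.asarray(layer_depths)[layer_index]

layer_depths = np.array([1.0, 1.5, 2.0])  # hypothetical depths of three layers
base = np.ones((2, 2, 3))
vol_a = base.copy()
vol_a[0, 0, 1] = 0.0                      # pair A favours layer 1 at pixel (0, 0)
vol_b = base.copy()
vol_b[1, 1, 2] = 0.0                      # pair B favours layer 2 at pixel (1, 1)
vol_c = base.copy()

total, depth_map = initial_depth_map([vol_a, vol_b, vol_c], layer_depths)
```

The same argmin over the cost column is what the final depth map generation uses after the weights have been applied.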

The initial depth map generation means 52 then outputs the initial depth map D^C to the scale correction means 56 and outputs the final cost volume 91 to the weight application means 59.

The smoothing means 53 smooths the depth image P_D input from the image division means 50. Because the depth image P_D contains noise such as shot noise from the depth camera, the smoothing means 53 smooths the depth image P_D by filtering. One example of such filtering is the guided filter, a type of smoothing filter that smooths a target image with the help of a guide image. Here, the RGB image P_C is used as the guide image.

While filtering removes noise, excessive smoothing may reduce the accuracy of the depth image P_D. The smoothing means 53 therefore needs to perform filtering only when necessary.

The scale conversion means 54 converts the depth image P_D into an intermediate depth map using a scale conversion function that converts the pixel value of each pixel of the depth image P_D into a real-scale depth. In this embodiment, the scale conversion means 54 uses the scale conversion function input from the scale conversion function calculation means 44 to convert the depth image P_D input from the smoothing means 53 into a real-scale depth map. If the manufacturer of the RGB-D camera 2 provides a scale conversion function, the scale conversion means 54 may use that function instead.

The layering processing means 55 performs layering processing that replaces each depth in the intermediate depth map input from the scale conversion means 54 with the depth of the nearest depth layer N_D. Specifically, because the camera parameters are known, the layering processing means 55 can convert the real-scale intermediate depth map into a three-dimensional point cloud. If the intermediate depth map represents the distance from the optical center rather than the distance along the optical axis (generally the z direction) in the camera coordinate system, the layering processing means 55 takes this into account when generating the point cloud. The layering processing means 55 then assigns the depth of each point to the nearest depth layer N_D, so that the intermediate depth map is expressed in terms of the depth layers N_D. Hereinafter, the intermediate depth map after layering processing is denoted D^D.
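A minimal sketch of the layering step is shown below, assuming a pinhole camera with intrinsics fx, fy, cx, cy. The function names, intrinsics, and layer depths are hypothetical. `radial_to_z` covers the case where the depth map stores distance from the optical center, and `snap_to_layers` replaces each depth with the nearest layer.

```python
import numpy as np

def radial_to_z(radial, fx, fy, cx, cy):
    """Convert distance from the optical centre into distance along the optical axis."""
    v, u = np.indices(radial.shape)
    x = (u - cx) / fx                       # normalised image coordinates
    y = (v - cy) / fy
    return radial / np.sqrt(x**2 + y**2 + 1.0)

def snap_to_layers(z_depth, layer_depths):
    """Return the nearest layer index and that layer's depth for every pixel."""
    layer_depths = np.asarray(layer_depths, dtype=float)
    idx = np.abs(z_depth[..., None] - layer_depths).argmin(axis=-1)
    return idx, layer_depths[idx]

layers = np.linspace(1.0, 3.0, 5)           # 1.0, 1.5, 2.0, 2.5, 3.0
z = np.array([[1.1, 1.8],
              [2.26, 2.9]])
layer_idx, layered = snap_to_layers(z, layers)

z_from_radial = radial_to_z(np.full((3, 3), 2.0), fx=100.0, fy=100.0, cx=1.0, cy=1.0)
```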

The scale correction means 56 takes the pixels for which the depth difference between the initial depth map D^C and the intermediate depth map D^D is at or below a threshold, calculates the average depth difference for each depth layer N_D as a correction value, and corrects the depths of the intermediate depth map D^D with that correction value. In other words, when the accuracy of the scale conversion function is low, the scale correction means 56 corrects the intermediate depth map D^D generated from the depth image P_D so that it matches the initial depth map D^C generated from the RGB image P_C.

Specifically, the scale correction means 56 calculates the per-pixel depth difference D^Sub = D^C - D^D between the initial depth map D^C and the intermediate depth map D^D. Next, considering only the pixels that satisfy |D^Sub| ≤ threshold, the scale correction means 56 calculates the average of the depth difference D^Sub for each depth d (d = 1, 2, ..., N_D) of the initial depth map D^C and uses it as the correction value. The threshold is set manually. The scale correction means 56 then applies the correction depth value D^Cor to the uncorrected intermediate depth map D^D_Old as D^D_New = D^D_Old + D^Cor to obtain the corrected intermediate depth map D^D_New (hereinafter, the intermediate depth map D^D).
The scale correction means 56 may skip this processing when the accuracy of the scale conversion function is high.
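The correction can be sketched as below, working in layer-index units (0 to N_D - 1 rather than 1 to N_D). The text does not spell out how the per-depth correction value is mapped back to individual pixels; this sketch assumes each pixel receives the correction of the layer it has in D^C, which is an interpretation, and the function name is hypothetical.

```python
import numpy as np

def correct_intermediate(d_c, d_d, n_layers, threshold):
    """Shift D^D by the per-layer mean of D^C - D^D over reliable pixels."""
    d_c = np.asarray(d_c)
    d_d = np.asarray(d_d, dtype=float)
    d_sub = d_c - d_d                        # D^Sub
    valid = np.abs(d_sub) <= threshold       # pixels where the two maps roughly agree
    d_cor = np.zeros(d_d.shape)              # D^Cor, zero where no estimate exists
    for d in range(n_layers):
        in_layer = d_c == d                  # ASSUMPTION: group pixels by their D^C layer
        selected = in_layer & valid
        if selected.any():
            d_cor[in_layer] = d_sub[selected].mean()
    return d_d + d_cor                       # D^D_New = D^D_Old + D^Cor

d_c = np.array([[2, 2, 2],
                [4, 4, 0]])
d_d = np.array([[1.0, 1.0, 1.0],
                [4.0, 4.0, 4.0]])
corrected = correct_intermediate(d_c, d_d, n_layers=5, threshold=1)
```

In this toy input the pixel at (1, 2) exceeds the threshold, so it contributes to no average and is left unchanged.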

The cost weight calculation means 57 calculates a cost weight W^C that expresses the weight of the intermediate depth map D^D, input from the scale correction means 56, as a normal distribution function. As described above, the cost volume 91 is generated from the RGB image P_C alone and does not take the depth map into account. Applying the cost weight W^C calculated from the intermediate depth map D^D to the cost volume 91 therefore yields a cost volume 91 that reflects both the RGB image P_C and the depth map.

On the assumption that the intermediate depth map D^D is likely to hold the correct depth value, the cost weight W^C is expressed by a normal distribution in which the weight at that depth is the minimum. As shown in FIG. 9, the maximum value of the normal distribution is set to 1, and the normal distribution function g(d) of depth layer d is defined by the following formula (1).

Here, μ is the mean, σ^2 is the variance, and σ is the standard deviation. Using this normal distribution function g(d), the cost weight function f_C(d) is defined by the following formula (2), where a_c is a parameter that determines the cost weight W^C. As shown in FIG. 10(a), in the normal distribution function g(d) of formula (2), the mean μ represents the mean of the depth value D^D(u, v) at pixel (u, v) of the intermediate depth map D^D, and the variance σ^2 is set in advance according to the design policy of the cost weight function f_C(d) (for example, σ^2 = N_D/3).

The cost weight W^C is a three-dimensional array of the same size as the cost volume 91. Each element of the cost weight W^C holds the value of the cost weight function f_C(d), as shown in the following formula (3). The cost weight calculation means 57 thus calculates the cost weight W^C using formula (3).
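Formulas (1) to (3) are not reproduced in this text, so the following is only a sketch under stated assumptions. g(d) is taken to be a Gaussian scaled so that its peak is 1, μ is taken to be D^D(u, v) at each pixel, and the form f_C(d) = 1 - a_c·g(d) is an assumption chosen so that the weight is smallest at the measured depth; the actual formula (2) may differ. The function names and the default a_c = 0.5 are hypothetical.

```python
import numpy as np

def gaussian_peak_one(d, mu, var):
    """Normal-distribution-shaped curve whose maximum value is 1 (at d == mu)."""
    return np.exp(-((d - mu) ** 2) / (2.0 * var))

def cost_weight(d_d, n_layers, a_c=0.5, var=None):
    """Build a (V, U, N_D) weight array from a layered depth map of layer indices."""
    var = n_layers / 3.0 if var is None else var        # example value from the text
    layers = np.arange(n_layers, dtype=float)            # (N_D,)
    mu = np.asarray(d_d, dtype=float)[..., None]         # (V, U, 1)
    # ASSUMED form of f_C(d): smallest weight at the measured depth.
    return 1.0 - a_c * gaussian_peak_one(layers, mu, var)

d_d = np.array([[0, 3],
                [5, 9]])
w_c = cost_weight(d_d, n_layers=10)
```

The resulting array has the same shape as the cost volume, so it can be combined with it element by element.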

The visibility weight calculation means 58 calculates, from the intermediate depth map D^D input from the cost weight calculation means 57, a visibility weight W^V that lowers the cost where occlusion occurs.

Occlusion is not taken into account when the cost volume 91 is generated, so the costs in occluded regions act as noise, and errors also arise in the layering processing described above. These errors occur in the same way even when the cost volumes 91 of multiple camera pairs are summed. Occlusion here means that there is a region visible from one virtual camera C but not from the other virtual camera C.

The intermediate depth map D^D, on the other hand, is generated from a single depth camera and is therefore unaffected by occlusion. To mitigate the effect of occlusion (that is, to lower the cost in occluded regions), the visibility weight calculation means 58 calculates the visibility weight W^V from the intermediate depth map D^D.

As shown in FIG. 10(b), the visibility weight function f_V(d) is defined by the following formula (4), where a_V is a parameter that determines the visibility weight W^V. In the normal distribution function g(d) of formula (4), the mean μ is the value D^D(u, v) + shift, obtained by adding a constant shift to the mean of the depth value D^D(u, v) (where shift ≥ 0). The variance σ^2 is set in advance according to the design policy of the visibility weight function f_V(d) (for example, σ^2 = N_D/10). Increasing the constant shift makes the method more tolerant of errors in the intermediate depth map D^D, but reduces the effect of the visibility weight W^V.

The visibility weight W^V is a three-dimensional array of the same size as the cost volume 91. Each element of the visibility weight W^V holds the value of the visibility weight function f_V(d), as shown in the following formula (5). The visibility weight calculation means 58 thus calculates the visibility weight W^V of formula (5).

The weight application means 59 applies the cost weight W^C and the visibility weight W^V to the cost volume 91 input from the initial depth map generation means 52. Here, the final cost volume E^S is the cost volume 91 obtained by integrating all camera pairs that have C as the reference camera. That is, as shown in the following formula (6), the weight application means 59 calculates the final cost volume E^S from the reference camera's cost weight W^C(x, y, z), the cost volumes E_j, and the visibility weight W^V.

The cost volume E_j is the cost volume 91 between the reference camera C and a virtual camera C_j (j ∈ S) in the surrounding camera set S. The term warp denotes the projective transformation from the virtual camera C_j to the reference camera C, with each depth layer N_D taken as the plane.

The final depth map generation means 60 generates a final depth map that indicates, for each cost column 92 at the same pixel position in the cost volume 91 input from the weight application means 59, the depth of the depth layer N_D with the smallest cost. In other words, the final depth map generation means 60 takes the depth layer N_D with the smallest cost in each cost column 92 as the correct depth and generates the final depth map.
Because the final depth map generation means 60 generates the final depth map in the same way as the initial depth map generation means 52, further explanation is omitted.

The refinement means 5 then outputs, as a set, the RGB image P_C and final depth map of each viewpoint together with the camera parameters of the virtual cameras C input from the camera calibration means 4.

[Camera calibration process]
The camera calibration process will be described with reference to FIG. 11.
As shown in FIG. 11, in step S1, the angle-of-view correction means 40 applies a projective transformation to the depth image P_D input from the RGB-D camera 2 so that the angle of view of the depth image P_D matches that of the RGB image P_C. Because step S1 is not essential, it is drawn with a dashed line.

In step S2, the image division means 41 divides the RGB image P_C and the depth image P_D into per-viewpoint images.
In step S3, the initial camera parameter calculation means 42 applies camera calibration processing to the RGB image P_C of each viewpoint to calculate the initial camera parameters of the virtual camera C corresponding to each viewpoint.
In step S4, the camera parameter optimization means 43 optimizes the camera parameters across the virtual cameras C by camera calibration processing that uses the initial camera parameters as initial values.
In step S5, the scale conversion function calculation means 44 calculates the scale conversion function by associating the distance from the position of the virtual camera C indicated by the camera parameters to the calibration pattern with the pixel value of each pixel of the depth image P_D.

[Refinement process]
The refinement process will be described with reference to FIG. 12.
As shown in FIG. 12, in step S10, the image division means 50 divides the RGB image P_C and the depth image P_D for each virtual camera C.
In step S11, the cost volume generation means 51 calculates a cost for each depth layer and each pixel of the RGB image P_C, and generates the cost volume 91, a three-dimensional array of costs.

In step S12, the initial depth map generation means 52 generates an initial depth map that indicates, for each cost column 92 at the same pixel position in the cost volume 91, the depth of the depth layer with the smallest cost.
Steps S11 and S12 can be executed in parallel with steps S13 to S18 described below.

In step S13, the smoothing means 53 smooths the depth image P_D.
In step S14, the scale conversion means 54 converts the depth image P_D into an intermediate depth map using a scale conversion function that converts the pixel value of each pixel of the depth image P_D into a real-scale depth.
In step S15, the layering processing means 55 performs layering processing that replaces each depth in the intermediate depth map with the depth of the nearest depth layer.

In step S16, the scale correction means 56 takes the pixels for which the depth difference between the initial depth map D^C and the intermediate depth map D^D is at or below a threshold, calculates the average depth difference for each depth layer N_D as a correction value, and corrects the depths of the intermediate depth map D^D with that correction value. Because step S16 is not essential, it is drawn with a dashed line.
In step S17, the cost weight calculation means 57 calculates the cost weight W^C, which expresses the weight of the intermediate depth map D^D as a normal distribution function.
In step S18, the visibility weight calculation means 58 calculates, from the intermediate depth map D^D, the visibility weight W^V that lowers the cost where occlusion occurs.

In step S19, the weight application means 59 applies the cost weight W^C and the visibility weight W^V to the cost volume 91.
In step S20, the final depth map generation means 60 generates a final depth map that indicates, for each cost column 92 at the same pixel position in the cost volume 91, the depth of the depth layer N_D with the smallest cost.

[Operation and effects]
As described above, the three-dimensional shape acquisition system 1 can easily acquire multi-viewpoint RGB images P_C, highly accurate depth maps, and the camera parameters of the virtual cameras C. That is, the three-dimensional shape acquisition system 1 achieves a simple system configuration and can provide RGB images P_C for multiple viewpoints, highly accurate depth maps, and the camera parameters of the virtual cameras C. These data can be used in a variety of applications. For example, generating a three-dimensional image requires dense multi-viewpoint RGB images. Because the data provided by the three-dimensional shape acquisition system 1 include the camera parameters of the virtual cameras C and highly accurate depth maps, a three-dimensional image can be generated with simple processing.

Although an embodiment of the present invention has been described in detail above, the present invention is not limited to it and includes design changes and the like that do not depart from the gist of the invention.

In the embodiment described above, the depth camera is a ToF camera, but the invention is not limited to this. For example, the depth camera may be a stereo camera.

The present invention can also be realized by a program that causes hardware resources of a computer, such as a CPU, memory, and hard disk, to operate as the three-dimensional shape acquisition device described above. Such a program may be distributed via a communication line, or written to a recording medium such as a CD-ROM or flash memory and distributed.

1 Three-dimensional shape acquisition system (depth map generation system)
2 RGB-D camera (imaging device)
20 Camera body
21 Lens system
22 Fresnel lens
23 Lens array
24 Element lens
25 Infrared LED array
3 Three-dimensional shape acquisition device (depth map generation device)
4 Camera calibration means
40 Angle-of-view correction means
41 Image division means
42 Initial camera parameter calculation means
43 Camera parameter optimization means
44 Scale conversion function calculation means (depth conversion function calculation means)
5 Refinement means
50 Image division means
51 Cost volume generation means
52 Initial depth map generation means
53 Smoothing means
54 Scale conversion means (depth conversion means)
55 Layering processing means
56 Scale correction means (intermediate depth map correction means)
57 Cost weight calculation means
58 Visibility weight calculation means
59 Weight application means
60 Final depth map generation means
9 Subject
90 Calibration pattern
91 Cost volume
92 Cost column
C Virtual camera
D^C Initial depth map
D^D Intermediate depth map
N_D Depth layer

Claims

A depth map generating device that generates a depth map corresponding to the captured image at each viewpoint, using captured images and depth images captured by an imaging device including an imaging camera and a depth camera on the same optical axis and an optical element array, the device comprising:
a cost volume generating means for calculating a cost representing a similarity between the captured images projected onto the depth layer for each pixel position of the depth layer and the captured images at a predetermined interval in the depth direction, and generating a cost volume in which the costs are three-dimensionally arranged in the depth layer and the pixel positions;
a depth conversion means for converting the depth image into an intermediate depth map using a depth conversion function that converts a pixel value of each pixel of the depth image into a depth;
a cost weight calculation means for calculating a cost weight obtained by expressing the weight of the intermediate depth map as a normal distribution function;
a visibility weight calculation means for calculating a visibility weight that reduces the cost when an occlusion occurs from the intermediate depth map;
weight application means for applying the cost weight and the visibility weight to the cost volume;
a final depth map generating means for generating a final depth map indicating a depth of the depth layer in which the cost is minimum in a cost sequence at the same pixel position in the cost volume after applying a weight;
A depth map generating device comprising:

A smoothing means for smoothing the depth image is further provided,
2. The depth map generating device according to claim 1, wherein the depth conversion means converts the depth image smoothed by the smoothing means into the intermediate depth map using the depth conversion function.

an initial depth map generating means for generating an initial depth map indicating a depth of the depth layer in which the cost is minimum in a cost sequence at the same pixel position in the cost volume generated by the cost volume generating means;
an intermediate depth map correction means for calculating an average of depth differences between the depth layers as a correction value for pixels in which a depth difference between the initial depth map and the intermediate depth map is equal to or smaller than a threshold, and correcting the depth of the intermediate depth map with the correction value;
The depth map generating device according to claim 1 or 2, further comprising:

A layering processing means for performing a layering process of replacing a depth of the intermediate depth map with a depth of the closest depth layer,
4. The depth map generating device according to claim 3, wherein the intermediate depth map correcting means corrects the depth of the intermediate depth map, which has been subjected to layering processing by the layering processing means, with the correction value.

an initial camera parameter calculation means for calculating initial camera parameters of a virtual camera corresponding to each viewpoint by performing a camera calibration process on a captured image of a calibration pattern captured by the image capture device from each viewpoint;
a camera parameter optimization means for optimizing camera parameters between the virtual cameras through the camera calibration process using the initial camera parameters as initial values;
a depth conversion function calculation means for calculating the depth conversion function by making the distance from the position of the virtual camera indicated by the optimized camera parameters to the calibration pattern correspond to the pixel value of each pixel of the depth image;
The depth map generating device according to claim 1, further comprising:

and a field-of-view correction unit that performs projective transformation on the depth image so that the field of view of the depth image, which is obtained by photographing the calibration pattern from each viewpoint by the photographing device, coincides with the field of view of the photographed image,
The depth map generating device according to claim 5, characterized in that the depth conversion function calculation means calculates the depth conversion function by making the depth from the position of the virtual camera to the calibration pattern correspond to the pixel values of each pixel of the depth image projected by the angle of view correction means.

A program for causing a computer to function as a depth map generating device according to any one of claims 1 to 6.

A photographing device including a photographing camera and a depth camera on the same optical axis and an optical element array;
A depth map generating device according to any one of claims 1 to 6,
A depth map generating system comprising: