JP2022024688A

JP2022024688A - Depth map generation device and program thereof, and depth map generation system

Info

Publication number: JP2022024688A
Application number: JP2020127411A
Authority: JP
Inventors: 正規加納; Masanori Kano; 真宏河北; Masahiro Kawakita
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2020-07-28
Filing date: 2020-07-28
Publication date: 2022-02-09
Anticipated expiration: 2040-07-28
Also published as: JP7489253B2

Abstract

To provide a depth map generation device capable of easily acquiring captured images from a plurality of viewpoints and a highly accurate depth map. SOLUTION: A three-dimensional shape acquisition device 3 includes cost volume generation means 51 for generating a cost volume, scale conversion means 54 for converting a depth image into an intermediate depth map using a scale conversion function, cost weight calculation means 57 for calculating a cost weight, visibility weight calculation means 58 for calculating a visibility weight, weight application means 59 for applying the cost weight and the visibility weight to the cost volume, and final depth map generation means 60 for generating a final depth map indicating the depth of the depth layer that minimizes the cost in the cost column at the same pixel position in the cost volume. SELECTED DRAWING: Figure 2

Description

The present invention relates to a depth map generation device that generates a depth map, a program therefor, and a depth map generation system.

In recent years, techniques for acquiring the three-dimensional shape (depth map) of a subject in space have been actively studied. These techniques are expected to be applied in various fields such as 3D video production, AR (Augmented Reality), VR (Virtual Reality), and robotics. Approaches to acquiring the three-dimensional shape of a subject are broadly classified into active methods and passive methods (Non-Patent Document 1).

In an active method, the measuring device has a light source and measures depth using the light reflected from the subject. Specific methods include structured-light projection, the time-of-flight (ToF) method, and photometric stereo. Among these, methods using a ToF camera have attracted attention in recent years. A ToF camera obtains the distance from the camera to the subject by measuring the time taken for light emitted from the light source to be reflected by the subject and return. The advantage of active methods is that highly accurate distances are obtained in real time without sophisticated computation. Their disadvantages are that they are vulnerable to ambient light, that measurement errors arise depending on the reflectance and distance of the subject, and that scale calibration may be required.

A passive method measures depth from the parallax obtained with a plurality of color cameras (hereinafter, "RGB cameras") or with a single RGB camera that is moved. Specific methods include the stereo method (multi-view stereo) and motion stereo. Both rest on the stereo principle, in which depth is calculated from the parallax between two or more cameras. The advantages of passive methods are that no special light needs to be projected onto the subject, that they are not affected by ambient light, and that they can be realized with only ordinary color cameras and a computer. Their disadvantages are that the obtained depth remains ambiguous in textureless and occluded regions, and that the computational cost is high.

In addition, an RGB-D camera is known in which an RGB camera and a depth camera are arranged on the same optical axis and a lens array is used to acquire RGB images and depth images for a plurality of viewpoints (Patent Document 1). In this method, light rays entering through the camera lens are split by a mirror (for example, a half mirror or a dichroic mirror) and received by the RGB camera and the depth camera.

Patent Document 1: Japanese Unexamined Patent Publication No. 2009-300268

Non-Patent Document 1: Digital Image Processing (Revised New Edition), CG-ARTS Association, 2015

As described above, because three-dimensional shape acquisition has a wide range of applications, various methods have been proposed, but none has yet become established. For general-purpose use, it is convenient to have RGB images and depth maps from various viewpoints rather than a color image (hereinafter, RGB image) and depth map from a single viewpoint only. In other words, a set of RGB images and depth maps from a plurality of viewpoints improves versatility.

The accuracy of the depth map is also important. Because a depth image obtained by an RGB-D camera is expressed in pixel values (luminance values), these pixel values must be converted into a real-scale depth map. However, the function used for conversion to real scale strongly affects the accuracy of the depth map. The accuracy of the depth map is further affected by the shooting environment and the type of subject. Here, real scale means distance (depth) in real space.

An object of the present invention is to solve the above problems and to provide a depth map generation device, a program therefor, and a depth map generation system with which captured images from a plurality of viewpoints and a highly accurate depth map can be easily acquired.

To solve the above problem, the depth map generation device according to the present invention generates a depth map corresponding to the captured image of each viewpoint, using captured images and depth images in which a subject is captured from each viewpoint by an imaging device composed of an imaging camera and a depth camera on the same optical axis and an optical element array. The device comprises cost volume generation means, depth conversion means, cost weight calculation means, visibility weight calculation means, weight application means, and final depth map generation means.

With this configuration, the cost volume generation means calculates, for each depth layer set at predetermined intervals in the depth direction and for each pixel position of the captured images, a cost representing the similarity between the captured images projected onto that depth layer, and generates a cost volume in which the costs are arranged three-dimensionally by depth layer and pixel position.
The depth conversion means converts the depth image into an intermediate depth map using a depth conversion function that converts the pixel value of each pixel of the depth image into a depth.
The cost weight calculation means calculates a cost weight in which the weight of the intermediate depth map is expressed by a normal distribution function.

The visibility weight calculation means calculates, from the intermediate depth map, a visibility weight that lowers the cost when occlusion occurs.
The weight application means applies the cost weight and the visibility weight to the cost volume.
The final depth map generation means generates a final depth map indicating, for the cost column at each pixel position in the weighted cost volume, the depth of the depth layer at which the cost is minimal.

That is, the depth map generation device performs a refinement process in which the cost volume generated from the captured images is constrained by two weights based on the depth map generated from the depth image. Through this refinement process, the depth map generation device can generate a highly accurate depth map corresponding to the captured image of each viewpoint.
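The refinement idea summarized above can be illustrated with a minimal sketch. The text so far states only that the cost weight is expressed by a normal distribution function and that the weights are applied to the cost volume; the inverted Gaussian form, the multiplicative application, the parameter `sigma`, and all function and variable names below are assumptions made for illustration, and the visibility weight is omitted.

```python
import numpy as np

def refine_depth(cost_volume, layer_depths, intermediate_depth, sigma):
    """Pick, per pixel, the depth layer with the lowest weighted cost.

    cost_volume:        (L, H, W) matching costs, lower means more similar
    layer_depths:       (L,) real-scale depth of each depth layer
    intermediate_depth: (H, W) depth map converted from the depth image
    sigma:              assumed spread of the Gaussian-shaped weight
    """
    # Difference between every layer depth and the intermediate depth per pixel.
    diff = layer_depths[:, None, None] - intermediate_depth[None, :, :]
    # Assumed cost weight: an inverted normal-distribution curve, so layers
    # close to the intermediate depth have their cost reduced the most.
    cost_weight = 1.0 - np.exp(-(diff ** 2) / (2.0 * sigma ** 2))
    weighted = cost_volume * cost_weight
    # Final depth map: depth of the minimum-cost layer in each cost column.
    best_layer = np.argmin(weighted, axis=0)
    return layer_depths[best_layer]

layers = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
costs = np.ones((5, 1, 2))              # uninformative photometric cost
prior = np.array([[2.0, 4.1]])          # intermediate depth map
final = refine_depth(costs, layers, prior, sigma=0.5)
```

With an uninformative photometric cost, as in this usage example, the result simply snaps to the layer nearest the intermediate depth; with real costs, the weight only biases the search toward the ToF measurement.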

The present invention can also be realized as a program that causes a computer to function as the depth map generation device described above.

The present invention can also be realized as a depth map generation system comprising an imaging device, composed of an imaging camera and a depth camera on the same optical axis and an optical element array, and the depth map generation device described above.

According to the present invention, captured images from a plurality of viewpoints and a highly accurate depth map can be easily acquired.

FIG. 1 is an overall configuration diagram of the three-dimensional shape acquisition system according to the embodiment.
FIG. 2 is a block diagram showing the configuration of the three-dimensional shape acquisition device according to the embodiment.
FIG. 3 is an explanatory diagram of capturing the calibration pattern with the RGB-D camera; (a) shows calibration data A, and (b) shows calibration data B.
FIG. 4 is an explanatory diagram of dividing an image of the calibration pattern; (a) shows an RGB image, and (b) shows a depth image.
FIG. 5 is an explanatory diagram of calculating the scale conversion function; (a) shows the distance from a virtual camera to the calibration pattern, and (b) shows an example of the scale conversion function.
FIG. 6 is an explanatory diagram of dividing an image of the subject; (a) shows an RGB image, and (b) shows a depth image.
FIG. 7 is an explanatory diagram of an example of depth layers.
FIG. 8 is an explanatory diagram of the cost volume.
FIG. 9 is an explanatory diagram of the normal distribution function.
FIG. 10(a) is an explanatory diagram of an example of the cost weight function, and FIG. 10(b) is an explanatory diagram of an example of the visibility function.
FIG. 11 is a flowchart showing the camera calibration process in the embodiment.
FIG. 12 is a flowchart showing the refinement process in the embodiment.

Embodiments of the present invention are described below with reference to the drawings. The embodiments described below are intended to embody the technical idea of the present invention, and unless specifically stated otherwise, they do not limit the present invention.

[Overview of the three-dimensional shape acquisition system]
An outline of the three-dimensional shape acquisition system (depth map generation system) 1 according to the embodiment is described with reference to FIG. 1.
The three-dimensional shape acquisition system 1 acquires, for a subject 9, RGB images (captured images) and depth maps from a plurality of viewpoints, together with the camera parameters of the virtual cameras C. As shown in FIG. 1, the three-dimensional shape acquisition system 1 includes an RGB-D camera (imaging device) 2 and a three-dimensional shape acquisition device (depth map generation device) 3.

If a large number of RGB cameras and depth cameras are arranged in order to capture from multiple viewpoints, the system becomes large and costly. The three-dimensional shape acquisition system 1 therefore uses a single RGB-D camera (imaging device) 2, described later, to realize a configuration equivalent to arranging many RGB cameras and depth cameras, which simplifies the system configuration.

In fields such as 3D video production, the camera parameters of the virtual cameras C are required. In addition, because the depth image is expressed in pixel values (luminance values), a scale conversion function that converts these pixel values into a real-scale depth map is also required. In the three-dimensional shape acquisition system 1, the three-dimensional shape acquisition device 3 therefore performs a camera calibration process using a calibration pattern to calculate the camera parameters of the virtual cameras C and the scale conversion function.

The accuracy of the depth map is also important. As noted above, the scale conversion function strongly affects the accuracy of the depth map. The accuracy of the depth map also degrades considerably depending on the shooting environment and the type of subject. In the three-dimensional shape acquisition system 1, the three-dimensional shape acquisition device 3, described later, therefore improves the accuracy of the depth map using RGB images and depth images from a plurality of viewpoints (refinement process). Because the three-dimensional shape acquisition device 3 divides a single RGB image captured by the one RGB-D camera 2 into viewpoints and matches the resulting images, errors caused by color differences are suppressed compared with matching images captured by a plurality of RGB cameras.

First, the configuration of the RGB-D camera 2 is described. Next, the camera calibration process performed by the three-dimensional shape acquisition device 3 is described. This process calculates the camera parameters of each virtual camera C and the scale conversion function. Finally, the refinement process by which the three-dimensional shape acquisition device 3 improves the accuracy of the depth map is described.

[Configuration of the RGB-D camera]
As shown in FIG. 1, the RGB-D camera 2 is an imaging device including a camera body 20 and a lens system 21. In the present embodiment, the camera body 20 has an RGB camera and a depth camera (neither shown) arranged on the same optical axis. The camera body 20 splits the light rays from the subject 9 with a beam-splitting element (not shown), and the split rays are received by the RGB camera and the depth camera, respectively. An ordinary color camera can serve as the RGB camera, and a half mirror or a dichroic mirror can serve as the beam-splitting element.

In the present embodiment, a ToF camera is used as the depth camera. This ToF camera includes an infrared LED array 25 for irradiating the subject 9 with infrared light during distance measurement. A depth image is obtained by taking the inter-frame difference of the infrared images captured by the ToF camera.

The lens system 21 includes a Fresnel lens 22 and a lens array (optical element array) 23. The lens array 23 is a two-dimensional arrangement of N_X × N_Y element lenses 24. Through this lens array 23, the RGB-D camera 2 can acquire RGB images and depth images for N_X × N_Y viewpoints. That is, the RGB-D camera 2 realizes a configuration equivalent to arranging N_X × N_Y virtual cameras C. In the present embodiment, there are four viewpoints (four virtual cameras C) corresponding to 2 × 2 element lenses 24.

The angle of view of the virtual cameras C can be adjusted by adjusting the positional relationship between the camera body 20 and the lens system 21. FIG. 1 shows only two of the four virtual cameras C.

[Configuration of the three-dimensional shape acquisition device]
The configuration of the three-dimensional shape acquisition device 3 is described with reference to FIG. 2.
The three-dimensional shape acquisition device 3 generates a depth map corresponding to the RGB image of each viewpoint, using the RGB images and depth images in which the RGB-D camera 2 captured the subject 9 from each viewpoint. As shown in FIG. 2, the three-dimensional shape acquisition device 3 includes camera calibration means 4 for performing the camera calibration process and refinement means 5 for performing the refinement process.

<Camera calibration means>
The camera calibration means 4 estimates two kinds of parameters. The first is the camera parameters of the virtual cameras C, which represent the focal length of the lens, the lens distortion, the position and orientation of each virtual camera C, and so on. The second is the scale conversion function of each virtual camera C. In addition, the camera calibration means 4 corrects the angle of view of the RGB image and the depth image as necessary. The camera calibration means 4 need not perform the camera calibration process for every capture; it suffices to perform it when the focal length of the RGB-D camera 2, or the positional and orientational relationship among the RGB-D camera 2, the Fresnel lens 22, and the lens array 23, changes.

As shown in FIG. 3(a), RGB images and depth images in which the RGB-D camera 2 captured a calibration pattern 90 are input to the camera calibration means 4. The calibration pattern 90 is a planar pattern with a known arrangement of feature points (for example, a chessboard pattern). The RGB-D camera 2 captures the calibration pattern 90 with its pose changed at least twice (shown by broken lines). If the skew of the intrinsic parameters is not assumed to be zero, the calibration pattern 90 is captured with its pose changed at least three times. The RGB images and depth images captured with the lens system 21 in place, as in FIG. 3(a), are called calibration data A. When the angle-of-view correction described above is performed, the calibration pattern 90 is captured with the lens system 21 removed, as shown in FIG. 3(b). The RGB images and depth images captured with the lens system 21 removed are called calibration data B.

As shown in FIG. 2, the camera calibration means 4 includes angle-of-view correction means 40, image division means 41, initial camera parameter calculation means 42, camera parameter optimization means 43, and scale conversion function calculation means (depth conversion function calculation means) 44.

The angle-of-view correction means 40 applies a projective transformation to the depth image input from the RGB-D camera 2 so that its angle of view matches that of the RGB image. Owing to the mounting accuracy of the RGB-D camera 2, the angles of view of the RGB image captured by the RGB camera and the depth image captured by the depth camera may differ slightly. The angle-of-view correction means 40 therefore corrects this slight deviation using calibration data B. Specifically, the angle-of-view correction means 40 calculates a homography matrix from four or more corresponding points (feature points of the calibration pattern 90) between the RGB image and the depth image (Reference 1). The angle-of-view correction means 40 then applies a projective transformation to the depth image using this homography matrix, so that the angle of view of the depth image matches that of the RGB image.
When the angles of view of the RGB camera and the depth camera already match, the angle-of-view correction means 40 need not perform this correction.
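As an illustration of estimating a homography from four or more corresponding points, the following is a minimal sketch using the direct linear transform (DLT) with NumPy. It is not the implementation of the angle-of-view correction means 40; the function names and the sample coordinates are hypothetical. In practice, OpenCV (Reference 1) offers `cv2.findHomography` for the estimation and `cv2.warpPerspective` for warping the depth image.

```python
import numpy as np

def estimate_homography(src_pts, dst_pts):
    """Estimate H (3x3) such that dst ~ H @ src, from >= 4 point pairs (DLT)."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def apply_homography(h, pts):
    """Map 2D points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ h.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical feature points in the depth image and their RGB-image positions.
true_h = np.array([[1.1, 0.02, 5.0], [-0.01, 0.95, -3.0], [1e-4, 2e-4, 1.0]])
depth_pts = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [50, 30]], dtype=float)
rgb_pts = apply_homography(true_h, depth_pts)
est_h = estimate_homography(depth_pts, rgb_pts)
```

With noisy feature detections, more than four points and coordinate normalization would normally be used; both are omitted here to keep the sketch short.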

Reference 1: "OpenCV", [online], [retrieved June 24, 2020], Internet <URL: https://opencv.org/>

The angle-of-view correction means 40 can also remove lens distortion using calibration data B. For example, the angle-of-view correction means 40 calculates the lens distortion coefficients of the RGB-D camera 2 by Zhang's method and removes the lens distortion from the RGB image and the depth image (Reference 2).

Reference 2: Z. Zhang, "A flexible new technique for camera calibration", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 11, pp. 1330-1334 (2000)

The image division means 41 divides the RGB image and the depth image input from the angle-of-view correction means 40 by viewpoint (element lens 24). That is, by dividing the RGB image and the depth image for each virtual camera C, the image division means 41 generates the RGB image and depth image virtually captured by each virtual camera C. In the present embodiment, the image division means 41 divides the RGB image P_C and the depth image P_D into four, as shown in FIGS. 4(a) and 4(b).

The regions α into which the RGB image P_C and the depth image P_D are divided are set manually. The area outside the lens array 23 and the gaps between the element lenses 24 are not needed in the divided RGB images P_C and depth images P_D, so these unnecessary areas may be left out of the division. To simplify the following description, the divided RGB images P_C and depth images P_D are assumed to have the same image size.
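The division into per-viewpoint images can be sketched as simple array cropping. In this minimal example, the region list stands in for the manually set regions α; the `(top, left, height, width)` layout, the function name, and the toy image size are assumptions made for illustration.

```python
import numpy as np

def split_views(image, regions):
    """Crop one sub-image per viewpoint.

    image:   (H, W) or (H, W, channels) array captured through the lens array
    regions: list of (top, left, height, width), one per element lens,
             all with the same height and width
    """
    return [image[t:t + h, l:l + w] for (t, l, h, w) in regions]

# Toy 8x10 "image" and four manually chosen 3x4 regions that skip the
# borders and the gap between element lenses (2 x 2 viewpoints).
image = np.arange(8 * 10).reshape(8, 10)
regions = [(1, 1, 3, 4), (1, 5, 3, 4), (4, 1, 3, 4), (4, 5, 3, 4)]
views = split_views(image, regions)
```

Applying the same region list to both the RGB image and the depth image keeps the divided images pixel-aligned, provided their angles of view have already been matched.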

The initial camera parameter calculation means 42 applies a camera calibration process to the RGB image P_C of each viewpoint input from the image division means 41, thereby calculating the initial camera parameters of the virtual camera C corresponding to each viewpoint. For example, the initial camera parameter calculation means 42 applies Zhang's method to the RGB image P_C of each viewpoint and calculates initial camera parameters that include the camera parameters of each virtual camera C and the position and orientation of each calibration pattern 90.

The camera parameter optimization means 43 optimizes the camera parameters across the virtual cameras C through a camera calibration process that takes the initial camera parameters input from the initial camera parameter calculation means 42 as initial values. The initial camera parameter calculation means 42 calculates the camera parameters of each virtual camera C individually; optimizing the camera parameters across all virtual cameras C improves their accuracy.

Here, the position and orientation of the calibration pattern 90 are treated as common parameters. The camera parameters to be optimized comprise the camera parameters of each virtual camera C and the shared position and orientation of the calibration pattern 90. Specifically, the camera parameter optimization means 43 takes as initial values the camera parameters of each virtual camera C and the average of the positions and orientations of the calibration pattern 90, and uses the position and orientation of each virtual camera C contained in the initial camera parameters. The camera parameter optimization means 43 then optimizes the camera parameters by bundle adjustment starting from these initial values.

The scale conversion function calculation means 44 calculates the scale conversion function by associating the distance from the position of the virtual camera C, as given by the camera parameters input from the camera parameter optimization means 43, to the calibration pattern 90 with the pixel value of each pixel of the depth image P_D. That is, the scale conversion function calculation means 44 calculates the scale conversion function for converting the depth image P_D into a real-scale depth map. As described above, because the camera parameters give the position and orientation of the virtual camera C and of the calibration pattern 90, the distance r from the virtual camera C to the calibration pattern 90 can be calculated in real scale.

Specifically, as shown in FIG. 5(a), the scale conversion function calculation means 44 associates the distance r from the virtual camera C to the calibration pattern 90 with the luminance value q (pixel value) of each pixel of the depth image P_D. In the calibration pattern 90 contained in the depth image P_D, the reflectance is lower in the black parts of the pattern, which makes accurate association difficult. The scale conversion function calculation means 44 therefore preferably performs the association using only the white parts of the calibration pattern 90 contained in the depth image P_D. By performing this association for all depth images P_D in which the calibration pattern 90 was captured, the scale conversion function calculation means 44 obtains a graph such as that shown in FIG. 5(b). The scale conversion function calculation means 44 can then calculate the scale conversion function h(q) by approximating this graph with a function (for example, a fifth-degree polynomial). Instead of approximating the graph with a function, the scale conversion function calculation means 44 may use it as a lookup table.
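Fitting the scale conversion function h(q) to the collected (q, r) pairs can be sketched with an ordinary least-squares polynomial fit. This is a minimal illustration, not the patent's implementation: the function name, the assumption that luminance values are normalized to [0, 1], and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_scale_conversion(q, r, degree=5):
    """Fit h(q) ~ r as a polynomial of the given degree (least squares).

    q: luminance values sampled from the white parts of the pattern,
       assumed here to be normalized to [0, 1] for numerical stability
    r: real-scale distances from the virtual camera to the pattern
    """
    q = np.asarray(q, dtype=float)
    r = np.asarray(r, dtype=float)
    coeffs = np.polyfit(q, r, degree)
    return np.poly1d(coeffs)

# Synthetic (q, r) samples standing in for the measured graph.
q_samples = np.linspace(0.0, 1.0, 50)
r_samples = 2.0 - 1.2 * q_samples + 0.3 * q_samples ** 2
h = fit_scale_conversion(q_samples, r_samples)

# Converting a (tiny) depth image into an intermediate real-scale depth map.
depth_image = np.array([[0.25, 0.5], [0.75, 1.0]])
depth_map = h(depth_image)
```

For the lookup-table alternative mentioned in the text, one option would be `np.interp(depth_image, q_sorted, r_sorted)` over the measured pairs sorted by q, which avoids choosing a polynomial degree.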

After that, the camera calibration means 4 outputs the calculated scale conversion function to the scale conversion means 54, and outputs the camera parameters of the virtual camera C to the cost volume generation means 51 and the weight application means 59.

<Refinement means>
The refinement means 5 receives the RGB image P_C and the depth image P_D obtained by photographing the subject 9 with the RGB-D camera 2. Based on the depth map generated from the depth image P_D, the refinement means 5 constrains the cost volume generated from the RGB image P_C with two weights, thereby improving the accuracy of the depth map. The refinement means 5 performs the refinement process each time an image is captured.

As shown in FIG. 2, the refinement means 5 includes an image segmentation means 50, a cost volume generation means 51, an initial depth map generation means 52, a smoothing means 53, a scale conversion means (depth conversion means) 54, a layering processing means 55, a scale correction means (intermediate depth map correction means) 56, a cost weight calculation means 57, a visibility weight calculation means 58, a weight application means 59, and a final depth map generation means 60.

The image segmentation means 50 divides the RGB image P_C and the depth image P_D input from the RGB-D camera 2 by viewpoint. As shown in FIGS. 6(a) and 6(b), the image segmentation means 50 divides the RGB image P_C and the depth image P_D in which the subject 9 was captured, in the same way as the image segmentation means 41.

In FIG. 6, because the light passes through the lens system 21, the subject 9 appears as an inverted image in the RGB image P_C and the depth image P_D. In this case, the RGB image P_C and the depth image P_D may be flipped so that the subject 9 appears upright.

The cost volume generation means 51 calculates a cost for each depth layer (described later) and each pixel position of the RGB image P_C, and generates a cost volume in which the costs are arranged three-dimensionally by depth layer and pixel position. In the present embodiment, the cost volume generation means 51 uses the plane sweep method, which is one of the methods for estimating a cost volume (Reference 3).

Reference 3: David Gallup, et al., "Real-time plane-sweeping stereo with multiple sweeping directions", IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8 (2007)

First, as shown in FIG. 7, the cost volume generation means 51 sets a plurality of depth layers N_D at predetermined intervals in the depth direction in the space where the subject 9 is placed. In the example of FIG. 7, five depth layers N_D are set (D = 1, ..., 5). In FIG. 7, the x-axis is the horizontal direction, the y-axis is the vertical direction, and the z-axis is the depth direction. Next, the cost volume generation means 51 takes any one of the virtual cameras C as the reference camera and forms a camera pair from this reference camera and one other virtual camera C. It then projects the RGB image P_C of each virtual camera C in the camera pair onto a depth layer N_D by projective transformation. Finally, it calculates the cost by taking the difference between the pixel values of corresponding pixels in the two RGB images P_C projected onto the depth layer N_D (for example, SAD: Sum of Absolute Differences). This cost represents the similarity between the two RGB images P_C projected onto that depth layer N_D: the smaller the value, the more likely it is that the depth of the subject 9 lies on that depth layer N_D.

By performing the above processing for every depth layer N_D, the cost volume generation means 51 generates a cost volume. As shown in FIG. 8, if the size of the RGB image P_C is U × V pixels, the cost volume 91 is a three-dimensional array of U × V × N_D costs. Within the cost volume 91, the costs arranged in the depth direction at the same pixel position are called a cost column 92; that is, a cost column 92 is a 1 × 1 × N_D array of costs. The cost volume generation means 51 then applies a guided filter to the cost volume 91, using the RGB image P_C of the reference camera as the guide (Reference 4). This smooths the cost volume 91 while preserving edges, which reduces noise in the cost volume 91.
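The cost calculation and the stacking of per-layer costs into a volume can be illustrated with a small sketch. It assumes the two images of a camera pair have already been projected onto each depth layer, and it omits both the projective transformation and the guided filter. The per-pixel sum of absolute RGB differences, the `volume[d][v][u]` layout, and the function names are assumptions made for illustration.

```python
def sad_cost_layer(img_a, img_b):
    """Per-pixel cost for one depth layer: sum of absolute RGB differences.

    img_a and img_b are the two camera-pair images already projected onto the
    same depth layer, given as rows of (r, g, b) tuples of equal size.
    """
    return [
        [sum(abs(ca - cb) for ca, cb in zip(pa, pb)) for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


def build_cost_volume(projected_pairs):
    """Stack per-layer costs into a volume indexed as volume[d][v][u]."""
    return [sad_cost_layer(img_a, img_b) for img_a, img_b in projected_pairs]


# Two hypothetical depth layers for a 1 x 2 pixel image.
layer1 = ([[(10, 10, 10), (0, 0, 0)]], [[(10, 12, 10), (5, 5, 5)]])
layer2 = ([[(10, 10, 10), (0, 0, 0)]], [[(40, 40, 40), (0, 0, 0)]])
cost_volume = build_cost_volume([layer1, layer2])
print(cost_volume)  # [[[2, 15]], [[90, 0]]]
```

In this toy input the first pixel matches best on the first layer and the second pixel on the second layer, which is the behavior the cost is meant to capture.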

Reference 4: Kaiming He, Jian Sun, and Xiaoou Tang, "Guided image filtering", European Conference on Computer Vision, Springer, pp. 1-10 (2010)

Let S be the set of virtual cameras C surrounding the reference camera. Then |S| camera pairs can be formed, where |S| is the number of elements in the set, and the same number of cost volumes 91 is produced. For example, with four virtual cameras C, one reference camera yields three camera pairs and therefore three cost volumes 91. If the virtual camera C_1 is the reference camera, the camera pairs are (C_1, C_2), (C_1, C_3), and (C_1, C_4).
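The pairing rule in the four-camera example can be written in a few lines. The function name `camera_pairs` and the string labels for the cameras are hypothetical.

```python
def camera_pairs(reference, cameras):
    """Pair the reference camera with every other virtual camera in the list."""
    return [(reference, cam) for cam in cameras if cam != reference]


cameras = ["C1", "C2", "C3", "C4"]
pairs = camera_pairs("C1", cameras)
print(pairs)  # [('C1', 'C2'), ('C1', 'C3'), ('C1', 'C4')]
```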

The initial depth map generation means 52 generates an initial depth map that indicates, for the cost column 92 at each pixel position in the cost volume 91 input from the cost volume generation means 51, the depth of the depth layer N_D with the minimum cost.

Because a plurality of cost volumes 91 exist for one reference camera, the initial depth map generation means 52 takes the sum of these cost volumes 91 as the final cost volume 91 of the reference camera. It then takes the depth layer N_D with the minimum cost in each cost column 92 as the correct depth, and generates the initial depth map D^C of the reference camera.
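Summing the per-pair cost volumes and then selecting the minimum-cost layer in every cost column can be sketched as below. The `volume[d][v][u]` layout, the 1-based layer index in the result, the tie-breaking toward the nearer layer, and the function names are assumptions of this sketch.

```python
def sum_cost_volumes(volumes):
    """Element-wise sum of cost volumes indexed as volume[d][v][u]."""
    first = volumes[0]
    return [
        [
            [sum(vol[d][v][u] for vol in volumes) for u in range(len(first[d][v]))]
            for v in range(len(first[d]))
        ]
        for d in range(len(first))
    ]


def argmin_depth_map(volume):
    """For each pixel, return the 1-based index of the layer with the minimum cost."""
    n_layers = len(volume)
    height, width = len(volume[0]), len(volume[0][0])
    return [
        [min(range(n_layers), key=lambda d: volume[d][v][u]) + 1 for u in range(width)]
        for v in range(height)
    ]


# Two hypothetical cost volumes (3 layers, 1 x 2 pixels) for one reference camera.
vol_a = [[[5, 1]], [[2, 4]], [[9, 9]]]
vol_b = [[[1, 1]], [[1, 5]], [[0, 9]]]
total = sum_cost_volumes([vol_a, vol_b])
initial_depth_map = argmin_depth_map(total)
print(initial_depth_map)  # [[2, 1]]
```

The same column-wise minimum applies unchanged to a weighted cost volume, which is why the final depth map can reuse this step.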

After that, the initial depth map generation means 52 outputs the initial depth map D^C to the scale correction means 56, and outputs the final cost volume 91 to the weight application means 59.

The smoothing means 53 smooths the depth image P_D input from the image segmentation means 50. Because the depth image P_D contains noise such as shot noise from the depth camera, the smoothing means 53 smooths it by filtering. One example of such filtering is a guided filter, a type of smoothing filter that smooths a target image using a guide image. Here, the RGB image P_C is used as the guide image.

While filtering can remove noise, excessive smoothing may reduce the accuracy of the depth image P_D. The smoothing means 53 therefore only needs to perform the filtering when necessary.

The scale conversion means 54 converts the depth image P_D into an intermediate depth map using a scale conversion function that converts the pixel value of each pixel of the depth image P_D into a real-scale depth. In the present embodiment, the scale conversion means 54 uses the scale conversion function input from the scale conversion function calculation means 44 to convert the depth image P_D input from the smoothing means 53 into a real-scale depth map. If the manufacturer of the RGB-D camera 2 provides a scale conversion function, the scale conversion means 54 may use that function instead.
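When the scale conversion function is a polynomial such as the quintic mentioned earlier, applying it to a depth image amounts to evaluating h(q) at every pixel. The sketch below uses Horner's method. The coefficient values are made up for illustration (the higher-order terms are zero so the result is easy to check by hand), and the function names are hypothetical.

```python
def h(q, coeffs):
    """Evaluate the polynomial h(q) by Horner's method (highest-order coefficient first)."""
    acc = 0.0
    for c in coeffs:
        acc = acc * q + c
    return acc


def to_intermediate_depth_map(depth_image, coeffs):
    """Convert every pixel value of the depth image into a real-scale depth."""
    return [[h(q, coeffs) for q in row] for row in depth_image]


# Hypothetical quintic coefficients; here h(q) reduces to 2.0 - 0.005 * q.
coeffs = [0.0, 0.0, 0.0, 0.0, -0.005, 2.0]
depth_image = [[100, 200], [0, 400]]
intermediate = to_intermediate_depth_map(depth_image, coeffs)
print(intermediate)
```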

The layering processing means 55 performs a layering process that replaces each depth in the intermediate depth map input from the scale conversion means 54 with the depth of the nearest depth layer N_D. Specifically, because the camera parameters are known, the layering processing means 55 can convert the real-scale intermediate depth map into a three-dimensional point cloud. If the intermediate depth map represents the distance from the optical center rather than the distance along the optical axis (generally the z direction) of the camera coordinate system, the layering processing means 55 takes this into account when generating the point cloud. It then assigns each point to the nearest depth layer N_D, so that the intermediate depth map is expressed in terms of depth layers N_D. Hereinafter, the layered intermediate depth map is denoted D^D.
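The assignment of each depth to its nearest depth layer can be sketched as follows. The sketch assumes the depths are already distances along the optical axis, so the point cloud conversion is skipped, and that the layer positions are given as a list of depths. The names `nearest_layer` and `layerize` and all numeric values are hypothetical.

```python
def nearest_layer(depth, layer_depths):
    """Return the 1-based index of the depth layer closest to the given depth."""
    return min(range(len(layer_depths)), key=lambda i: abs(layer_depths[i] - depth)) + 1


def layerize(depth_map, layer_depths):
    """Replace every depth in the map with the index of its nearest depth layer."""
    return [[nearest_layer(d, layer_depths) for d in row] for row in depth_map]


# Five hypothetical depth layers at regular intervals (in metres).
layer_depths = [1.0, 1.5, 2.0, 2.5, 3.0]
intermediate = [[1.1, 2.3], [2.9, 1.74]]
layered = layerize(intermediate, layer_depths)
print(layered)  # [[1, 4], [5, 2]]
```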

The scale correction means 56 obtains, for pixels whose depth difference between the initial depth map D^C and the intermediate depth map D^D is equal to or less than a threshold, the average depth difference in each depth layer N_D as a correction value, and corrects the depths of the intermediate depth map D^D with the correction value. In other words, when the accuracy of the scale conversion function is low, the scale correction means 56 corrects the intermediate depth map D^D generated from the depth image P_D so that it matches the initial depth map D^C generated from the RGB image P_C.

Specifically, the scale correction means 56 calculates the per-pixel depth difference D^Sub = D^C - D^D between the initial depth map D^C and the intermediate depth map D^D. Next, considering only the pixels that satisfy |D^Sub| ≤ threshold, it calculates the average of the depth difference D^Sub for each depth d (d = 1, 2, ..., N_D) of the initial depth map D^C and uses it as the correction value. The threshold is set manually. The scale correction means 56 then applies the correction depth value D^Cor to the uncorrected intermediate depth map D^D_Old as D^D_New = D^D_Old + D^Cor to obtain the corrected intermediate depth map D^D_New (hereinafter, the intermediate depth map D^D).
The scale correction means 56 need not perform this processing when the accuracy of the scale conversion function is high.
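The correction procedure can be sketched as below. The text does not spell out how the per-depth correction value is mapped back to individual pixels, so this sketch makes two assumptions: each pixel uses the correction value of its own depth in D^C, and pixels whose depth has no correction value are left unchanged. The function name `scale_correct` and the sample maps are hypothetical.

```python
from collections import defaultdict


def scale_correct(d_c, d_d, threshold):
    """Correct the intermediate depth map d_d toward the initial depth map d_c.

    Both maps hold a depth-layer index per pixel (rows of numbers).
    """
    diffs = defaultdict(list)
    for row_c, row_d in zip(d_c, d_d):
        for c, d in zip(row_c, row_d):
            sub = c - d                  # D_Sub = D_C - D_D
            if abs(sub) <= threshold:
                diffs[c].append(sub)     # grouped by the depth d of D_C
    correction = {layer: sum(v) / len(v) for layer, v in diffs.items()}
    # Assumption: each pixel uses the correction of its own D_C depth, and
    # pixels whose depth has no correction value are left unchanged.
    return [
        [d + correction.get(c, 0.0) for c, d in zip(row_c, row_d)]
        for row_c, row_d in zip(d_c, d_d)
    ]


d_c = [[3, 3, 5], [5, 3, 8]]
d_d = [[2, 2, 5], [4, 9, 8]]
corrected = scale_correct(d_c, d_d, threshold=2)
print(corrected)  # [[3.0, 3.0, 5.5], [4.5, 10.0, 8.0]]
```

The pixel with D^C = 3 and D^D = 9 exceeds the threshold, so it does not contribute to the average, although under the assumption above it still receives the correction for depth 3.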

The cost weight calculation means 57 calculates a cost weight W^C that expresses the weight of the intermediate depth map D^D, input from the scale correction means 56, as a normal distribution function. As described above, the cost volume 91 is generated only from the RGB images P_C and does not take a depth map into account. Applying the cost weight W^C calculated from the intermediate depth map D^D to the cost volume 91 yields a cost volume 91 that reflects both the RGB images P_C and the depth map.

The cost weight W^C is expressed as a normal distribution in which, on the assumption that the intermediate depth map D^D is likely to hold the correct depth value, the weight at that depth takes the minimum value. As shown in FIG. 9, the maximum value of the normal distribution is set to 1, and the normal distribution function g(d) for depth layer d is defined by the following equation (1).
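The equation itself is not reproduced in this text. A form consistent with the description (a normal distribution function of the depth layer d whose maximum value is 1) would be the following. It is a reconstruction under that assumption, not a copy of the patent's drawing.

```latex
g(d) = \exp\!\left( -\frac{(d - \mu)^{2}}{2\sigma^{2}} \right)
```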

Here, μ is the mean, σ^2 is the variance, and σ is the standard deviation. Using this normal distribution function g(d), the cost weight function f_C(d) is defined by the following equation (2), where a_c is a parameter that determines the cost weight W^C. As shown in FIG. 10(a), in the normal distribution function g(d) of equation (2), the mean μ is the depth value D^D(u, v) of the pixel (u, v) of the intermediate depth map D^D, and the variance σ^2 is set in advance according to the design policy of the cost weight function f_C(d) (for example, σ^2 = N_D/3).

The cost weight W^C is a three-dimensional array of the same size as the cost volume 91. As shown in the following equation (3), each element of the cost weight W^C holds the value of the cost weight function f_C(d). The cost weight calculation means 57 thus calculates the cost weight W^C using equation (3).
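Filling a weight array of the same size as the cost volume can be sketched as follows. Equations (2) and (3) are not reproduced in this text, so the form f_C(d) = 1 - a_c * g(d) used here is an assumption chosen only because it gives the minimum weight at the depth D^D(u, v), as the description requires. The function names, the `w[d-1][v][u]` layout, and the parameter values are likewise hypothetical.

```python
import math


def g(d, mu, sigma2):
    """Normal distribution function scaled so that its maximum value is 1."""
    return math.exp(-((d - mu) ** 2) / (2.0 * sigma2))


def cost_weight(depth_map, n_layers, a_c, sigma2):
    """Return W_C indexed as w[d-1][v][u], the same layout as the cost volume.

    Assumed form (not taken from the patent): f_C(d) = 1 - a_c * g(d),
    with mu set to the intermediate depth D_D(u, v) of each pixel.
    """
    return [
        [[1.0 - a_c * g(d, mu, sigma2) for mu in row] for row in depth_map]
        for d in range(1, n_layers + 1)
    ]


n_layers = 5
depth_map = [[2, 4]]  # hypothetical layered intermediate depth map D_D
w_c = cost_weight(depth_map, n_layers, a_c=0.9, sigma2=n_layers / 3)
print([round(w_c[d][0][0], 3) for d in range(n_layers)])
```

Under this assumed form, a_c controls how strongly the depth camera's estimate pulls the minimum of each cost column toward D^D(u, v).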

The visibility weight calculation means 58 calculates, from the intermediate depth map D^D input from the cost weight calculation means 57, a visibility weight W^V that lowers the cost where occlusion occurs.

Occlusion is not taken into account when the cost volume 91 is generated, so the costs in occluded regions become noise, and errors also arise in the layering process described above. The same errors occur even when the cost volumes 91 of a plurality of camera pairs are summed. Occlusion here means that a region is visible from one virtual camera C but not from the other virtual camera C.

The intermediate depth map D^D, on the other hand, is generated from a single depth camera and is therefore not affected by occlusion. The visibility weight calculation means 58 therefore calculates the visibility weight W^V from the intermediate depth map D^D in order to mitigate the effect of occlusion (that is, to lower the cost in the regions where occlusion occurs).

As shown in FIG. 10(b), the visibility weight function f_V(d) is defined by the following equation (4), where a_V is a parameter that determines the visibility weight W^V. In the normal distribution function g(d) of equation (4), the mean μ is the value D^D(u, v) + shift obtained by adding a constant shift to the depth value D^D(u, v) (where shift ≥ 0). The variance σ^2 is set in advance according to the design policy of the visibility weight function f_V(d) (for example, σ^2 = N_D/10). Increasing the constant shift makes the weight more tolerant of errors in the intermediate depth map D^D, but reduces the effect of the visibility weight W^V.

The visibility weight W^V is a three-dimensional array of the same size as the cost volume 91. As shown in the following equation (5), each element of the visibility weight W^V holds the value of the visibility weight function f_V(d). The visibility weight calculation means 58 thus calculates the visibility weight W^V of equation (5).

The weight application means 59 applies the cost weight W^C and the visibility weight W^V to the cost volume 91 input from the initial depth map generation means 52. Here, the final cost volume E^S is the cost volume 91 of the reference camera C integrated over all camera pairs. That is, as shown in the following equation (6), the weight application means 59 calculates the final cost volume E^S from the cost weight W^C(x, y, z) of the reference camera, the cost volumes E_j, and the visibility weight W^V.

The cost volume E_j is the cost volume 91 between the reference camera C and a virtual camera C_j (j ∈ S) included in the surrounding camera set S. The term warp denotes the projective transformation from the virtual camera C_j to the reference camera C with each depth layer N_D taken as the plane.

The final depth map generation means 60 generates a final depth map that indicates, for the cost column 92 at each pixel position in the cost volume 91 input from the weight application means 59, the depth of the depth layer N_D with the minimum cost. That is, it takes the depth layer N_D with the minimum cost in each cost column 92 as the correct depth and generates the final depth map.
Because the final depth map generation means 60 generates the final depth map by the same method as the initial depth map generation means 52, further description is omitted.

After that, the refinement means 5 outputs, as a set, the RGB image P_C and the final depth map of each viewpoint together with the camera parameters of the virtual cameras C input from the camera calibration means 4.

[Camera calibration process]
The camera calibration process will be described with reference to FIG. 11.
As shown in FIG. 11, in step S1, the angle-of-view correction means 40 applies a projective transformation to the depth image P_D input from the RGB-D camera 2 so that its angle of view matches that of the RGB image P_C. Because step S1 is not essential, it is drawn with a broken line.

In step S2, the image segmentation means 41 divides the RGB image P_C and the depth image P_D by viewpoint.
In step S3, the initial camera parameter calculation means 42 calculates the initial camera parameters of the virtual camera C corresponding to each viewpoint by applying a camera calibration process to the RGB image P_C of each viewpoint.
In step S4, the camera parameter optimization means 43 optimizes the camera parameters across the virtual cameras C by a camera calibration process that uses the initial camera parameters as initial values.
In step S5, the scale conversion function calculation means 44 calculates the scale conversion function by associating the distance from the position of the virtual camera C indicated by the camera parameters to the calibration pattern with the pixel value of each pixel of the depth image P_D.

[Refinement process]
The refinement process will be described with reference to FIG. 12.
As shown in FIG. 12, in step S10, the image segmentation means 50 divides the RGB image P_C and the depth image P_D by virtual camera C.
In step S11, the cost volume generation means 51 calculates a cost for each depth layer and each pixel of the RGB image P_C, and generates the cost volume 91, which is a three-dimensional array of costs.

In step S12, the initial depth map generation means 52 generates an initial depth map that indicates, for the cost column 92 at each pixel position in the cost volume 91, the depth of the depth layer with the minimum cost.
The processing of steps S11 and S12 and the processing of steps S13 to S18 described below can be executed in parallel.

In step S13, the smoothing means 53 smooths the depth image P_D.
In step S14, the scale conversion means 54 converts the depth image P_D into an intermediate depth map using a scale conversion function that converts the pixel value of each pixel of the depth image P_D into a real-scale depth.
In step S15, the layering processing means 55 performs a layering process that replaces each depth in the intermediate depth map with the depth of the nearest depth layer.

In step S16, the scale correction means 56 obtains, for pixels whose depth difference between the initial depth map D^C and the intermediate depth map D^D is equal to or less than a threshold, the average depth difference in each depth layer N_D as a correction value, and corrects the depths of the intermediate depth map D^D with the correction value. Because step S16 is not essential, it is drawn with a broken line.
In step S17, the cost weight calculation means 57 calculates the cost weight W^C, which expresses the weight of the intermediate depth map D^D as a normal distribution function.
In step S18, the visibility weight calculation means 58 calculates, from the intermediate depth map D^D, the visibility weight W^V that lowers the cost where occlusion occurs.

In step S19, the weight application means 59 applies the cost weight W^C and the visibility weight W^V to the cost volume 91.
In step S20, the final depth map generation means 60 generates a final depth map that indicates, for the cost column 92 at each pixel position in the cost volume 91, the depth of the depth layer N_D with the minimum cost.

[Action / Effect]
As described above, the three-dimensional shape acquisition system 1 can easily acquire RGB images P_C from a plurality of viewpoints, highly accurate depth maps, and the camera parameters of the virtual cameras C. That is, with a simple system configuration, the three-dimensional shape acquisition system 1 can provide RGB images P_C and highly accurate depth maps for a plurality of viewpoints together with the camera parameters of the virtual cameras C. These data can be used in various applications. For example, generating a three-dimensional image requires dense multi-viewpoint RGB images. Because the data provided by the three-dimensional shape acquisition system 1 include the camera parameters of the virtual cameras C and highly accurate depth maps, a three-dimensional image can be generated with simple processing.

Although an embodiment of the present invention has been described in detail above, the present invention is not limited to it and includes design changes and the like that do not depart from the gist of the present invention.

In the embodiment described above, the depth camera was described as a ToF camera, but it is not limited to this. For example, the depth camera may be a stereo camera.

The present invention can also be realized by a program that causes hardware resources of a computer, such as a CPU, memory, and hard disk, to operate as the three-dimensional shape acquisition device described above. Such a program may be distributed via a communication line, or may be written to a recording medium such as a CD-ROM or flash memory and distributed.

1 Three-dimensional shape acquisition system (depth map generation system)
2 RGB-D camera (photographing device)
20 Camera body
21 Lens system
22 Fresnel lens
23 Lens array
24 Element lens
25 Infrared LED array
3 Three-dimensional shape acquisition device (depth map generation device)
4 Camera calibration means
40 Angle-of-view correction means
41 Image segmentation means
42 Initial camera parameter calculation means
43 Camera parameter optimization means
44 Scale conversion function calculation means (depth conversion function calculation means)
5 Refinement means
50 Image segmentation means
51 Cost volume generation means
52 Initial depth map generation means
53 Smoothing means
54 Scale conversion means (depth conversion means)
55 Layering processing means
56 Scale correction means (intermediate depth map correction means)
57 Cost weight calculation means
58 Visibility weight calculation means
59 Weight application means
60 Final depth map generation means
9 Subject
90 Calibration pattern
91 Cost volume
92 Cost column
C Virtual camera
D^C Initial depth map
D^D Intermediate depth map
N_D Depth layer

Claims

A depth map generation device that generates a depth map corresponding to the captured image of each viewpoint, using captured images and depth images of a subject captured from each viewpoint by an imaging device composed of a photographing camera and a depth camera sharing the same optical axis, and an optical element array, the depth map generation device comprising:
cost volume generation means for calculating, for each depth layer set at predetermined intervals in the depth direction and for each pixel position of the captured image, a cost representing the similarity between the captured images projected onto that depth layer, and for generating a cost volume in which the costs are arranged three-dimensionally by depth layer and pixel position;
depth conversion means for converting the depth image into an intermediate depth map using a depth conversion function that converts the pixel value of each pixel of the depth image into a depth;
cost weight calculation means for calculating a cost weight in which the weight of the intermediate depth map is expressed by a normal distribution function;
visibility weight calculation means for calculating, from the intermediate depth map, a visibility weight that reduces the cost when occlusion occurs;
weight application means for applying the cost weight and the visibility weight to the cost volume; and
final depth map generation means for generating a final depth map indicating the depth of the depth layer that minimizes the cost in the cost column at the same pixel position in the cost volume after the weights have been applied.
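
The weight application and final depth selection recited above can be illustrated with a minimal sketch. It is not the claimed implementation: the exact weight formula is not given in the claims, so the sketch assumes a multiplicative weight of the form 1 - strength * exp(-(z - prior)^2 / (2 * sigma^2)), where `strength`, `sigma`, and all function names are hypothetical. The visibility weight is omitted for brevity.

```python
import math


def gaussian_cost_weight(layer_depth, prior_depth, sigma, strength=0.8):
    """Assumed weight: smallest when the layer depth matches the prior depth.

    The functional form and the `strength` parameter are illustrative
    assumptions; the claims only say the weight follows a normal distribution.
    """
    g = math.exp(-0.5 * ((layer_depth - prior_depth) / sigma) ** 2)
    return 1.0 - strength * g


def final_depth_map(cost_volume, layer_depths, prior, sigma, strength=0.8):
    """Weight each cost column by the prior, then pick the minimum-cost layer.

    cost_volume[d][y][x] is the cost of depth layer d at pixel (x, y);
    prior[y][x] plays the role of the intermediate depth map.
    """
    height, width = len(prior), len(prior[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weighted = [
                cost_volume[d][y][x]
                * gaussian_cost_weight(layer_depths[d], prior[y][x], sigma, strength)
                for d in range(len(layer_depths))
            ]
            # Index of the depth layer with the lowest weighted cost.
            best = min(range(len(weighted)), key=weighted.__getitem__)
            result[y][x] = layer_depths[best]
    return result
```

With this form, a pixel whose photometric costs are ambiguous follows the intermediate depth map, while a pixel with a strongly preferred layer can still override it.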

The depth map generation device according to claim 1, further comprising smoothing means for smoothing the depth image,
wherein the depth conversion means converts the depth image smoothed by the smoothing means into the intermediate depth map using the depth conversion function.

The depth map generation device according to claim 1 or 2, further comprising:
initial depth map generation means for generating an initial depth map indicating the depth of the depth layer that minimizes the cost in the cost column at the same pixel position in the cost volume generated by the cost volume generation means; and
intermediate depth map correction means for obtaining, as a correction value, the average of the depth differences between the depth layers for pixels whose depth difference between the initial depth map and the intermediate depth map is equal to or less than a threshold value, and for correcting the depth of the intermediate depth map by the correction value.

The depth map generation device according to claim 3, further comprising layering processing means for performing layering processing that replaces each depth of the intermediate depth map with the depth of the nearest depth layer,
wherein the intermediate depth map correction means corrects, by the correction value, the depth of the intermediate depth map that has undergone the layering processing by the layering processing means.
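
As a rough illustration of the layering and correction steps, the sketch below snaps each intermediate depth to its nearest depth layer and then shifts the whole map by the mean difference over pixels that already agree with the initial depth map to within a threshold. The function names and the sign convention (initial minus intermediate, added to the intermediate map) are assumptions made for this example, not details taken from the claims.

```python
def snap_to_layers(depth_map, layer_depths):
    """Replace each depth with the depth of the nearest depth layer."""
    return [
        [min(layer_depths, key=lambda z: abs(z - value)) for value in row]
        for row in depth_map
    ]


def correct_intermediate(initial, intermediate, threshold):
    """Shift the intermediate map by the mean difference over agreeing pixels.

    Only pixels with |initial - intermediate| <= threshold contribute to the
    correction value. Returns (corrected_map, correction_value).
    """
    diffs = [
        i - m
        for row_i, row_m in zip(initial, intermediate)
        for i, m in zip(row_i, row_m)
        if abs(i - m) <= threshold
    ]
    if not diffs:
        # No pixel agrees closely enough; leave the map unchanged.
        return [row[:] for row in intermediate], 0.0
    correction = sum(diffs) / len(diffs)
    corrected = [[m + correction for m in row] for row in intermediate]
    return corrected, correction
```

For example, snapping `[[1.2, 2.9, 1.1]]` to layers at 1.0, 2.0, 3.0, and 4.0 gives `[[1.0, 3.0, 1.0]]`. Against an initial map `[[2.0, 4.0, 4.0]]` with a threshold of 1.0, only the first two pixels contribute, so the correction value is 1.0.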

The depth map generation device according to any one of claims 1 to 4, further comprising:
initial camera parameter calculation means for calculating initial camera parameters of a virtual camera corresponding to each viewpoint by performing a camera calibration process on the captured images of a calibration pattern obtained by the imaging device from each viewpoint;
camera parameter optimization means for optimizing the camera parameters among the virtual cameras by the camera calibration process, using the initial camera parameters as initial values; and
depth conversion function calculation means for calculating the depth conversion function by associating the distance from the position of the virtual camera indicated by the optimized camera parameters to the calibration pattern with the pixel value of each pixel of the depth image.
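
One way to picture the depth conversion function is as a curve fitted to pairs of depth-image pixel values and camera-to-pattern distances. The claims do not state the form of the function, so the sketch below assumes a linear model fitted by ordinary least squares; `fit_depth_conversion` and the sample values are hypothetical.

```python
def fit_depth_conversion(pixel_values, distances):
    """Fit distance = slope * pixel_value + intercept by least squares.

    The linear model is an assumption for illustration only. Returns a
    function that converts a depth-image pixel value into a depth.
    """
    n = len(pixel_values)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two matching samples")
    mean_x = sum(pixel_values) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in pixel_values)
    if sxx == 0:
        raise ValueError("pixel values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pixel_values, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def convert(pixel_value):
        return slope * pixel_value + intercept

    return convert


# Hypothetical calibration samples: pixel value versus distance to the pattern.
convert = fit_depth_conversion([100, 200, 300], [1.0, 2.0, 3.0])
```

Applying `convert` to every pixel of a depth image would then yield an intermediate depth map in the same scale as the virtual cameras.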

The depth map generation device according to claim 5, further comprising angle-of-view correction means for applying a projective transformation to the depth image so that the angle of view of the depth image, obtained by the imaging device capturing the calibration pattern at each viewpoint, matches the angle of view of the captured image,
wherein the depth conversion function calculation means calculates the depth conversion function by associating the depth from the position of the virtual camera to the calibration pattern with the pixel value of each pixel of the depth image projectively transformed by the angle-of-view correction means.

A program for causing a computer to function as the depth map generation device according to any one of claims 1 to 6.

A depth map generation system comprising:
an imaging device composed of a photographing camera and a depth camera sharing the same optical axis, and an optical element array; and
the depth map generation device according to any one of claims 1 to 6.