JP2023000111A

JP2023000111A - Three-dimensional model restoration device, method, and program

Info

Publication number: JP2023000111A
Application number: JP2021100741A
Authority: JP
Inventors: Tatsuya Kobayashi
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2021-06-17
Filing date: 2021-06-17
Publication date: 2023-01-04
Anticipated expiration: 2041-06-17
Also published as: JP7518040B2

Abstract

To provide a three-dimensional model restoration device, method, and program that, when generating a parallax image of an object by stereo matching, estimate the three-dimensional shape of the object in advance, restrict the pixels subjected to stereo matching to the vicinity of the two-dimensional projection area of the three-dimensional object shape, and restrict the range of parallax searched at each pixel to the vicinity of the surface of the three-dimensional object shape.

SOLUTION: In a three-dimensional model restoration device 1 that restores a three-dimensional model of an object on the basis of a plurality of camera images obtained by photographing the object from different viewpoints, a three-dimensional object shape estimation unit 10 estimates the three-dimensional shape of the object on the basis of the plurality of camera images. A stereo matching unit 20 generates, for each camera, a depth image by stereo matching that uses the estimation result of the three-dimensional shape of the object. An object model generation unit 30 restores the three-dimensional model of the object using the depth image generated for each camera.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a device, method, and program for restoring a three-dimensional model of an object captured in camera images, and in particular to a three-dimensional model restoration device, method, and program that restore a three-dimensional model of an object at high speed and with low noise using multi-viewpoint images obtained by photographing the object from various directions with a plurality of cameras whose camera parameters have been calibrated.

Methods for three-dimensional restoration from two-dimensional images of an object can be classified into active methods, which perform restoration by projecting a laser or light onto the object, and passive methods, which perform restoration from ordinary camera images alone.

As an active method, Patent Document 1 discloses a method that projects a dot pattern onto an object, photographs it with a stereo camera, and performs three-dimensional restoration by obtaining the parallax through stereo matching. Patent Document 2 discloses a method that performs three-dimensional restoration of an object using a depth sensor.

As a passive method, it is generally known to extract the silhouette of an object from multi-viewpoint images of the object and to obtain its three-dimensional shape by the SfS (Shape-from-Silhouette) method. Patent Document 3 discloses a method that improves SfS so that a highly accurate three-dimensional shape can be obtained even when silhouette extraction is inaccurate, performing three-dimensional restoration by evaluating a plurality of silhouette images globally.

As another passive method, it is also known to generate parallax images by stereo matching between multi-viewpoint images (stereo images) of an object and to obtain the three-dimensional shape by combining the parallax images. Compared with the SfS method, stereo matching generally tends to be less accurate in regions with little texture, but it can restore concave structures, and its restoration accuracy can be expected to improve as image resolution increases. Stability can also be improved by taking the continuity of the shape into account.

Patent Document 4 discloses a three-dimensional restoration method using the PatchMatch Stereo method (Non-Patent Document 1), which uses random search to perform, at high speed, stereo matching that takes the normal direction of the object surface into account. Patent Document 5 proposes a technique that shortens the matching time by using silhouette information of the subject, extracted by separate means, to limit the matching search range to the inside of the silhouette region.

Patent Document 6 proposes a technique that uses a stereo camera together with a depth sensor and shortens the matching time by limiting the range of parallax searched at each pixel subjected to stereo matching to the vicinity of the depth value acquired by the depth sensor.

Patent Document 1: JP 2020-71034 A
Patent Document 2: JP 2014-67372 A
Patent Document 3: JP 2013-25458 A
Patent Document 4: JP 2018-181047 A
Patent Document 5: JP 2000-331160 A
Patent Document 6: JP 2009-47495 A

Non-Patent Document 1: M. Bleyer, C. Rhemann, C. Rother, "PatchMatch Stereo - Stereo Matching with Slanted Support Windows", The British Machine Vision Conference (BMVC), 2011

However, with any of the above methods it has been difficult to restore a three-dimensional model of an object at high speed and with high accuracy while retaining flexibility in the type and arrangement of the cameras. For example, Patent Documents 1 and 2 presuppose the use of special sensors and cannot be applied to devices that use ordinary RGB cameras.

With the depth sensor used in Patent Document 2, the estimation accuracy can vary with the surface material of the object (errors generally grow in black regions) and with the region (errors generally grow near edges). The SfS-based method disclosed in Patent Document 3 presupposes a camera arrangement in which the cameras surround the entire circumference of the object at roughly equal intervals, so it lacks flexibility in camera arrangement and requires a large number of cameras.

The stereo-matching-based method disclosed in Patent Document 4 can suffer from limited processing speed and from noise caused by matching errors. In the stereo-matching-based method using silhouette information disclosed in Patent Document 5, the accuracy of stereo matching depends on the accuracy of silhouette extraction, so a defect in the extracted silhouette can produce a defect in the three-dimensional model as well.

In the stereo-matching-based method combined with a depth sensor disclosed in Patent Document 6, the accuracy of stereo matching depends on the accuracy of the depth sensor, so the estimation accuracy can vary with the surface material and region of the object.

An object of the present invention is to solve the above technical problems and to provide a three-dimensional model restoration device, method, and program that require no additional camera other than the stereo cameras. To this end, when a parallax image of an object is generated by stereo matching, the three-dimensional shape of the object is estimated in advance, the pixels subjected to stereo matching are restricted to the vicinity of the two-dimensional projection area of the three-dimensional object shape, and the range of parallax searched at each pixel is restricted to the vicinity of the surface of the three-dimensional object shape.

To achieve the above object, the present invention is a three-dimensional model restoration device that restores a three-dimensional model of an object on the basis of a plurality of camera images obtained by photographing the object from different viewpoints, and is characterized by the following configurations.

(1) The device comprises means for estimating the three-dimensional shape of the object on the basis of the plurality of camera images, and means for generating, for each camera, a depth image by stereo matching that uses the estimation result of the three-dimensional shape of the object; the three-dimensional model of the object is restored using the depth image generated for each camera.

(2) The means for generating the depth image comprises means for selecting, for each camera and on the basis of that camera's image and the estimation result of the three-dimensional shape of the object, a reference image from among the other camera images for the case where that camera's image is taken as the base image; for each camera, stereo matching is performed between that camera's image, as the base image, and the selected reference image to generate the depth image.

(3) The means for generating the depth image identifies the object region in the base image of the stereo matching on the basis of the three-dimensional shape of the object, and limits the search range of the stereo matching to the object region.

(4) The means for generating the depth image comprises means for converting, for each camera, the estimation result of the three-dimensional shape of the object into an object parallax image using the camera parameters of that camera; for each pixel of interest in the base image, the search range for the corresponding pixel in the reference image is limited on the basis of the object parallax image.
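As an illustration of configuration (4), the following is a minimal sketch of how a per-pixel search range might be derived from an object parallax value. The function name, the `margin` parameter, and the clipping to `[0, d_max]` are assumptions made for this example; the source does not specify how wide the neighborhood around the surface is.

```python
import math
from typing import Optional, Tuple


def parallax_search_range(
    object_parallax: Optional[float], margin: float, d_max: int
) -> Optional[Tuple[int, int]]:
    """Return the inclusive parallax range to search for one base-image pixel.

    object_parallax: parallax of the estimated shape surface at this pixel,
                     or None if the shape does not project onto the pixel.
    margin:          hypothetical half-width of the search band (assumption).
    d_max:           largest parallax the matcher would otherwise search.
    """
    if object_parallax is None:
        return None  # pixel is outside the projected shape: nothing to search
    lo = max(0, int(math.floor(object_parallax - margin)))
    hi = min(d_max, int(math.ceil(object_parallax + margin)))
    return (lo, hi) if lo <= hi else None
```

With `margin=3` and `d_max=64`, a pixel whose surface parallax is 10.4 would be searched over 8 candidate values instead of 65, which is where the reduction in processing load comes from.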

(1) Because a depth image is generated for each camera by stereo matching that uses the estimation result of the three-dimensional shape of the object, a low-noise three-dimensional model can be restored at high speed from multi-viewpoint images obtained by photographing the object from various directions with calibrated cameras in a flexible camera arrangement.

(2) Because the reference image is selected on the basis of the three-dimensional shape of the object, images with similar appearance can be selected as a stereo pair at low computational cost, and the accuracy of three-dimensional restoration can be improved.

(3) Because the pixel search range in stereo matching is selected on the basis of the three-dimensional shape of the object, the loss of accuracy caused by mistakenly excluding a needed region from the matching range can be suppressed.

(4) Because the pixel search range in stereo matching can be limited to the vicinity of the object surface, both higher quality and a lower processing load can be achieved.

FIG. 1 is a functional block diagram of a three-dimensional model restoration device according to a first embodiment of the present invention.
FIG. 2 is a diagram schematically showing a first method of estimating the three-dimensional shape of an object.
FIG. 3 is a diagram schematically showing a second method of estimating the three-dimensional shape of an object.
FIG. 4 is a diagram schematically showing a third method of estimating the three-dimensional shape of an object.
FIG. 5 is a diagram showing a method of estimating the three-dimensional shapes of a plurality of objects with two cameras.
FIG. 6 is a diagram showing a method of estimating the three-dimensional shapes of a plurality of objects with three cameras.
FIG. 7 is a diagram showing an example of multi-viewpoint stereo matching.
FIG. 8 is a diagram schematically showing a method of selecting a reference image for each camera on the basis of the dissimilarity of the joint point coordinates of the three-dimensional object shape.
FIG. 9 is a diagram showing an example in which window matching is performed for all pixels of the base image within a parallax range corresponding to the size of the voxel space to be restored.
FIG. 10 is a diagram showing an example in which the matching load is reduced by omitting matching for windows in which no object appears.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of a three-dimensional model restoration device 1 according to the first embodiment of the present invention. Its main components are a three-dimensional object shape estimation unit 10, a stereo matching unit 20, and an object model generation unit 30.

Such a three-dimensional model restoration device 1 can be configured by installing applications (programs) that implement the respective functions on at least one general-purpose computer or server. Alternatively, it can be configured as a dedicated or single-purpose machine in which part of the application is implemented in hardware or in software.

The three-dimensional object shape estimation unit 10 takes a plurality of camera images with different viewpoints (multi-viewpoint images) as input, roughly estimates the three-dimensional shape of the object contained in the camera images, and outputs the estimation result to the stereo matching unit 20. In this embodiment, a shape such as a solid model, a mesh (surface) model, or a wireframe model is estimated as a three-dimensional model in the three-dimensional space containing the object.

The three-dimensional object shape estimation unit 10 can estimate a model of any three-dimensional shape representation, but this embodiment is described taking as an example the case where the three-dimensional shape is a mesh model (a three-dimensional model that represents the object surface with vertex, edge, and face information).

The present invention can be applied to model restoration of any object (the whole body of a person, partial regions such as a person's face or upper body, animals such as dogs and cats, artificial objects such as cars and furniture, and so on), but this embodiment is described taking the restoration of a person's shape as an example.

The three-dimensional object shape estimation unit 10 comprises a position and orientation estimation unit 101 and a three-dimensional shape estimation unit 102, and roughly estimates the three-dimensional shape of the object appearing in the camera images. The position and orientation estimation unit 101 estimates, for each camera, the position and orientation of the object on the basis of that camera's image. The three-dimensional shape estimation unit 102 estimates a single three-dimensional shape of the object on the basis of the positions and orientations estimated for the individual cameras. For example, the following three methods can be used to estimate the three-dimensional shape.

(1) First method
As shown in FIG. 2, a rectangular region containing the object is first detected in each camera image, and the two-dimensional coordinates of the center of the rectangular region are obtained. Next, the two-dimensional coordinates from the camera images are back-projected to three-dimensional coordinates by triangulation to obtain the three-dimensional position of the object center. Since triangulation can be performed from a pair of two-dimensional coordinates, when three or more camera images exist, a three-dimensional position may be calculated for each combination of pairs and the average of these positions used as the representative three-dimensional position. Finally, an approximate three-dimensional object model, designed in advance so as to enclose the object shape, is placed with its center at the three-dimensional position of the object center, and this is taken as the three-dimensional object shape.
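The triangulation and pairwise averaging in the first method can be sketched as follows. This is a minimal illustration under stated assumptions: each bounding-box center is assumed to have already been converted, using the calibrated camera parameters, into a viewing ray (camera center plus direction) in world coordinates, and the midpoint of the shortest segment between two rays is used as the triangulated point. The source does not say which triangulation algorithm is used, so the midpoint method and all names here are illustrative.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Ray = Tuple[Vec3, Vec3]  # (camera center, viewing direction) in world coordinates


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(ray1: Ray, ray2: Ray) -> Vec3:
    """Midpoint of the shortest segment joining two viewing rays."""
    (o1, d1), (o2, d2) = ray1, ray2
    w = (o1[0] - o2[0], o1[1] - o2[1], o1[2] - o2[2])
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are parallel; cannot triangulate")
    t1 = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t2 = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = tuple(o1[i] + t1 * d1[i] for i in range(3))
    p2 = tuple(o2[i] + t2 * d2[i] for i in range(3))
    return tuple((p1[i] + p2[i]) / 2.0 for i in range(3))


def object_center(rays: List[Ray]) -> Vec3:
    """Triangulate every camera pair and average the resulting 3D positions."""
    if len(rays) < 2:
        raise ValueError("at least two cameras are required")
    estimates = [triangulate_midpoint(r1, r2) for r1, r2 in combinations(rays, 2)]
    n = len(estimates)
    return tuple(sum(p[i] for p in estimates) / n for i in range(3))
```

With noisy detections the rays no longer intersect exactly, and averaging over all pairs smooths out the per-pair error.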

The rectangular region of the object can be detected with any algorithm commonly used in the field of computer vision, such as R-CNN, YOLO, or SSD. A prism model, a cylinder model, or the like can be used as the approximate three-dimensional object model.

(2) Second method
As shown in FIG. 3, the positions of predefined joint points of the object are first estimated from each camera image (pose estimation), and each joint point is back-projected to three-dimensional coordinates by triangulation. Next, an approximate three-dimensional model of each partial region of the object (designed in advance so as to enclose that partial region) is placed with its center at the three-dimensional coordinates of the corresponding joint point, and the result is taken as the three-dimensional object shape.

The pose of the object can be estimated with any algorithm commonly used in the field of computer vision, such as OpenPose. A prism model, a cylinder model, or the like can be used as the approximate three-dimensional model of each partial region of the object.

(3) Third method
As shown in FIG. 4, a parametric model of the object is used together with a classifier trained to estimate the parameters of the parametric model from a single camera's image, so that the three-dimensional pose and the three-dimensional object shape are estimated simultaneously for each camera image. The three-dimensional poses and three-dimensional object shapes estimated for the individual camera images are then integrated into a single three-dimensional pose and three-dimensional object shape, for example by averaging, and this is output. Alternatively, a classifier trained to estimate the parameters of the parametric model from a plurality of camera images may be used to estimate the three-dimensional pose and three-dimensional object shape of the object simultaneously, and the result may be output.

Here, a parametric model of an object is a mesh model that can express a three-dimensional shape model of the object with the three-dimensional pose of the object as a parameter. For a person model, SMPL (Skinned Multi-Person Linear model) is commonly used. SMPL can control a three-dimensional human shape model represented by 6890 vertices with a 72-dimensional pose parameter θ and a 10-dimensional body shape parameter β. For example, a method such as SPIN (SMPL oPtimization IN the loop) can estimate θ and β from an input image, and the three-dimensional human shape is obtained from the estimated θ and β.

The above description assumes that only one object appears in each camera image, but as shown in FIG. 5 the methods apply equally when a plurality of objects appear. However, when only two cameras exist, the pairs of points to be triangulated (the object centers in the first method, the joint points in the second method) cannot be determined unless the same object is identified (associated) between the images.

The pairs of points to be triangulated can be determined by associating the same object between the camera images using a measure such as the similarity of image features. In a system configuration with three or more cameras, triangulation may be performed after associating the same object in the same way as in the two-camera case. Alternatively, as shown in FIG. 6, triangulation using the information from three cameras, generally called "trinocular vision", may be used to determine the three-dimensional positions of the object center and of each joint point by geometric processing alone, without identifying the same object between images.

The stereo matching unit 20 uses N camera images with different viewpoints and the estimation results of the three-dimensional object shape (M in total when M objects exist in the captured scene) to generate, by stereo matching between the camera images, N parallax images corresponding to the respective camera images, and then generates depth images from the parallax images. The N generated depth images are output to the object model generation unit 30.
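The conversion from a parallax image to a depth image is not spelled out in the source. For a rectified stereo pair, the standard relation is Z = f·B / d, where f is the focal length in pixels, B the baseline length, and d the parallax in pixels. The following minimal sketch assumes that relation; the function name and the convention of marking invalid pixels with `None` are choices made for this example.

```python
from typing import List, Optional


def parallax_to_depth(
    parallax: List[List[Optional[float]]], focal_px: float, baseline: float
) -> List[List[Optional[float]]]:
    """Convert a parallax image to a depth image with Z = f * B / d.

    Pixels with no parallax estimate (None) or a non-positive parallax
    are left without a depth value.
    """
    return [
        [focal_px * baseline / d if d is not None and d > 0 else None for d in row]
        for row in parallax
    ]
```

The depth is returned in the same unit as `baseline`.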

In stereo matching, the image for which a parallax image is to be obtained (the base image) and a reference image used for matching are generally selected as a stereo pair, and the parallax image is generated by a two-step process consisting of image rectification (stereo rectification) using the camera parameters followed by window matching.

One reference image or several (multi-viewpoint stereo matching) may be used for each base image. When there are two reference images, as shown in FIG. 7, a more accurate parallax image can be generated by performing window matching between the base image and both reference images.

For each camera image, the reference images are desirably selected from among the other camera images (candidate images) in descending order of the overlap between their imaged area and that of the base image. As a simple method, the angle difference Δθ between the shooting direction rb of the base image and the shooting direction rc of each candidate image is calculated by equation (1), and candidate images with a smaller angle difference Δθ are preferentially selected as reference images.
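Equation (1) does not appear in this text. A common way to compute the angle between two direction vectors, assumed here and not confirmed by the source, is Δθ = arccos((rb · rc) / (|rb| |rc|)). The sketch below ranks candidate images by that angle; the function names and the dictionary layout are illustrative.

```python
import math
from typing import Dict, List, Sequence


def angle_difference(rb: Sequence[float], rc: Sequence[float]) -> float:
    """Angle in radians between two shooting directions (assumed form of Δθ)."""
    dot = sum(x * y for x, y in zip(rb, rc))
    norm_b = math.sqrt(sum(x * x for x in rb))
    norm_c = math.sqrt(sum(x * x for x in rc))
    cos_theta = max(-1.0, min(1.0, dot / (norm_b * norm_c)))  # guard rounding
    return math.acos(cos_theta)


def rank_candidates(rb: Sequence[float], candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Candidate image names sorted by increasing angle difference to rb."""
    return sorted(candidates, key=lambda name: angle_difference(rb, candidates[name]))
```

The first entries of the ranked list would then be taken as reference images.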

Alternatively, feature point matching may be performed between the base image and each candidate image, and candidate images that yield more corresponding points may be preferentially selected as reference images.

In this embodiment, the stereo matching unit 20 comprises a reference image selection unit 201, which selects, for each camera, a reference image from among the other camera images on the basis of that camera's image and the estimation result of the three-dimensional object shape.

For each camera, the reference image selection unit 201 projects the vertices V (= [[x1, y1, z1], [x2, y2, z2], ...]) of the estimated three-dimensional object shape onto that camera's image using the camera parameters of the camera, obtaining two-dimensional projection points P (= [[u1, v1], [u2, v2], ...]). Then, for each camera, it calculates the dissimilarity d^p(Pb, Pc) (for example, the Procrustes distance) between the two-dimensional projection points Pb of the three-dimensional object shape in that camera's image (base image) and the two-dimensional projection points Pc of the three-dimensional object shape in the image of another camera (candidate image), and selects at least one candidate image with a smaller dissimilarity as a reference image.
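The selection by Procrustes distance can be sketched as follows. This is one possible reading: the source names the Procrustes distance only as an example and does not define it, so the sketch uses the full Procrustes distance for planar point sets, which removes translation, scale, and rotation before comparing. Treating each 2D point as a complex number is a standard shortcut for the planar case. All names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point2 = Tuple[float, float]


def _normalize(points: Sequence[Point2]) -> List[complex]:
    """Center the point set and scale it to unit norm."""
    z = [complex(u, v) for u, v in points]
    mean = sum(z) / len(z)
    z = [p - mean for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in z))
    if norm == 0:
        raise ValueError("all points coincide")
    return [p / norm for p in z]


def procrustes_distance(pb: Sequence[Point2], pc: Sequence[Point2]) -> float:
    """Full Procrustes distance between corresponding 2D point sets (0 = same shape)."""
    if len(pb) != len(pc):
        raise ValueError("point sets must have the same number of points")
    a, b = _normalize(pb), _normalize(pc)
    inner = sum(x * y.conjugate() for x, y in zip(a, b))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))


def select_reference(pb: Sequence[Point2], candidates: Dict[str, Sequence[Point2]]) -> str:
    """Name of the candidate image whose projection is least dissimilar to pb."""
    return min(candidates, key=lambda name: procrustes_distance(pb, candidates[name]))
```

Because both point sets come from projecting the same vertices, the point correspondence is known in advance and no image-feature matching is needed.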

As described above, in this embodiment the stereo matching unit 20 comprises the reference image selection unit 201 and selects, for each camera, the reference image for the case where that camera's image is the base image while taking the estimation result of the three-dimensional object shape into account. Images that actually look similar, that is, images with a larger overlapping region, can therefore be selected as a stereo pair, and as a result the accuracy of three-dimensional restoration can be improved.

In the present invention, the method of selecting a reference image for each camera in consideration of the three-dimensional object shape is not limited to the above. For example, as shown in FIG. 8, by projecting each vertex V of the three-dimensional object shape estimated by the three-dimensional object shape estimation unit 10 onto each camera's image using that camera's parameters, the joint points that serve as indices for pose estimation can be obtained as a group of two-dimensional coordinates P' (= [[u'1, v'1], [u'2, v'2], ...]).

Then, for each camera, the dissimilarity d^p(P'b, P'c) between the pose P'b of the object in that camera's image and the pose P'c of the object in each of the other candidate images may be calculated, and candidate images with a smaller dissimilarity may be preferentially selected as reference images for that camera. In the example of FIG. 8, d^p(P₁, P₂) < d^p(P₁, P₃), so the candidate image cam2 is selected as the reference image. This makes it possible to select images with similar appearance as a stereo pair at a lower computational cost.
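The two-dimensional projection used to obtain P and P' can be sketched with a pinhole camera model. The parameterization below (rotation matrix `R`, translation `t`, focal lengths `fx`, `fy`, principal point `cx`, `cy`, and no lens distortion) is an assumption made for this example; the source refers only to "camera parameters".

```python
from typing import List, Sequence, Tuple


def project_points(
    vertices: Sequence[Sequence[float]],
    R: Sequence[Sequence[float]],
    t: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[float, float]]:
    """Project world-space 3D points to pixel coordinates (pinhole, no distortion)."""
    projected = []
    for X in vertices:
        # World to camera coordinates: Xc = R * X + t
        xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        if xc[2] <= 0:
            raise ValueError("point is behind the camera")
        # Perspective division followed by the intrinsic mapping
        projected.append((fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy))
    return projected
```

Applying this once per camera to the same joint points gives the coordinate groups whose dissimilarity is compared when choosing a reference image.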

In this case, a predetermined threshold (first threshold) may be set for the dissimilarity d^p, and all candidate images whose dissimilarity d^p is below the first threshold may be selected as reference images for multi-viewpoint stereo matching. In general, noise can increase when stereo pairs with little overlap are included. With this form of multi-viewpoint stereo matching, the number of reference images is adjusted automatically for a set of input images whose overlapping regions differ in size, so three-dimensional restoration with little noise becomes possible.

On the other hand, if the baseline length (distance between cameras) is extremely short relative to the depth (shooting distance), the accuracy of parallax estimation may deteriorate. In general, there is a positive correlation between the baseline length and the dissimilarity d^p: the shorter the baseline, the smaller the dissimilarity d^p tends to be. In this embodiment, therefore, a second threshold lower than the first threshold may be set as a threshold for excluding cameras whose baseline is so short that the accuracy of parallax estimation deteriorates, and candidate images with a smaller dissimilarity may be preferentially selected as reference images from among the candidate images whose dissimilarity d^p exceeds the second threshold.

Alternatively, all candidate images whose dissimilarity d^p exceeds the second threshold and is below the first threshold may be selected as reference images for multi-viewpoint stereo matching. This avoids the loss of parallax estimation accuracy caused by an insufficient baseline length while still allowing images that actually look similar to be selected as stereo pairs, and as a result the accuracy of three-dimensional restoration can be improved.
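The two-threshold selection can be sketched as a simple band filter. The function name and the dictionary of per-candidate dissimilarities are illustrative, and the threshold values in the usage note are arbitrary; the source does not give concrete values.

```python
from typing import Dict, List


def select_reference_images(
    dissimilarity: Dict[str, float], first_threshold: float, second_threshold: float = 0.0
) -> List[str]:
    """Candidates with second_threshold < d^p < first_threshold, most similar first.

    first_threshold rejects candidates with too little overlap;
    second_threshold rejects candidates whose baseline is likely too short.
    """
    if second_threshold >= first_threshold:
        raise ValueError("the second threshold must be lower than the first")
    chosen = [
        name
        for name, d in dissimilarity.items()
        if second_threshold < d < first_threshold
    ]
    return sorted(chosen, key=lambda name: dissimilarity[name])
```

For example, with dissimilarities `{"cam2": 0.02, "cam3": 0.15, "cam4": 0.30, "cam5": 0.60}`, a first threshold of 0.5, and a second threshold of 0.05, the result is `["cam3", "cam4"]`: cam2 is dropped as too close to the base camera and cam5 as too different.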

Once reference images have been selected by the above processes, the stereo matching unit 20 performs window matching, for each camera, between the base image (that camera's own image) and each reference image. In window matching, as shown in FIG. 9, every pixel of the base image is matched against the reference image by window region within a parallax range determined by the size of the voxel space to be restored, and the parallax is calculated from the displacement of the match. SSD and NCC are commonly used as matching functions.
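As a rough illustration of window matching with SSD, the sketch below searches one pixel's parallax along a horizontal scanline. It assumes rectified grayscale images stored as nested lists and assumes that a base pixel at column x corresponds to column x - d in the reference image; the function names and these conventions are illustrative assumptions, not taken from the source.

```python
def ssd(base, ref, y, x_base, x_ref, half):
    """Sum of squared differences between two (2*half+1)-square windows."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = base[y + dy][x_base + dx] - ref[y + dy][x_ref + dx]
            total += diff * diff
    return total


def match_pixel(base, ref, y, x, half, d_min, d_max):
    """Return the parallax in [d_min, d_max] with the lowest SSD cost.

    The caller must keep the base window inside the image. Parallax
    values whose reference window would leave the image are skipped.
    Returns None when no parallax in the range can be evaluated.
    """
    width = len(ref[0])
    best_d, best_cost = None, None
    for d in range(d_min, d_max + 1):
        x_ref = x - d  # assumed sign convention for rectified images
        if x_ref - half < 0 or x_ref + half >= width:
            continue
        cost = ssd(base, ref, y, x, x_ref, half)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Synthetic example: the reference rows are the base rows shifted left by 2.
base_row = [0, 0, 0, 0, 0, 9, 1, 7, 3, 0, 0, 0]
ref_row = base_row[2:] + [0, 0]
base_img = [list(base_row) for _ in range(3)]
ref_img = [list(ref_row) for _ in range(3)]
print(match_pixel(base_img, ref_img, y=1, x=6, half=1, d_min=0, d_max=4))
```

NCC could be substituted for `ssd` by normalizing each window before comparison; the search loop would stay the same apart from maximizing rather than minimizing the score.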

In this embodiment, to reduce the processing load of stereo matching, the stereo matching unit 20 includes a search range limiting unit 202. As shown in FIG. 10, it uses the three-dimensional object shape information to omit matching for windows in which the object does not appear.

The search range limiting unit 202 identifies the object region in the base image by two-dimensionally projecting the estimation result of the three-dimensional object shape, which is common to all cameras, onto each camera image (base image) using that camera's parameters. When the center coordinates of a window lie outside the object region, the matching process is skipped, which reduces the processing load.

Reducing the matching range is itself disclosed in Patent Document 5 and elsewhere, but this embodiment is characterized by its use of the two-dimensional projection of the three-dimensional object shape as the reference information that determines the matching search range.

It is generally difficult to estimate the object region (silhouette) robustly against noise without prior knowledge of the general shape of the object to be restored. In this embodiment, however, the object region is identified using the two-dimensional projection of the three-dimensional object shape, which suppresses the accuracy loss caused by mistakenly excluding a required region from the matching range.

However, the object region does not necessarily fully contain the object in the reference image, so the possibility that the above processing degrades restoration accuracy cannot be ruled out. In this embodiment, therefore, the object region is converted into a binary image (1 inside the region, 0 outside), and a dilation filter is applied to the binary image to expand the region of the two-dimensional projection. Whether to skip the matching process is then decided based on the dilated region, which suppresses loss of parts of the object.
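The mask expansion and skip decision can be sketched in a few lines. The square structuring element, the `radius` parameter, and the function names below are assumptions for illustration; the source does not specify the filter shape or size.

```python
def dilate(mask, radius=1):
    """Binary dilation with a (2*radius+1)-square structuring element.

    mask is a nested list of 0/1 values (1 = inside the projected
    object region). Every pixel within `radius` of a 1-pixel becomes 1.
    """
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    out[ny][nx] = 1
    return out


def should_skip(mask, y, x):
    """Skip matching when the window center lies outside the object region."""
    return mask[y][x] == 0


projected = [[0] * 5 for _ in range(5)]
projected[2][2] = 1  # a single projected object pixel, for illustration
expanded = dilate(projected, radius=1)
print(should_skip(projected, 1, 1), should_skip(expanded, 1, 1))
```

In the example, the pixel at (1, 1) would be skipped with the raw projection but is matched after dilation, which is the behavior the paragraph above relies on to avoid losing object boundaries.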

The processing load of stereo matching increases in proportion to the depth extent of the voxel space (the parallax range). In addition, the wider the parallax range, the higher the risk of false matches, which degrades the restoration accuracy of the three-dimensional model. In this embodiment, therefore, the search range limiting unit 202 limits the parallax range searched in window matching based on the estimated three-dimensional object shape information, reducing the processing load and improving accuracy at the same time.

In this embodiment, for each camera, the estimation result of the three-dimensional object shape is converted, using that camera's parameters, into a parallax image of the base image (the object parallax image). When each pixel of the base image is window-matched against the reference image, the parallax search range is limited to within a fixed threshold of that pixel's parallax value in the object parallax image.

Here, the "object parallax image" means a parallax image estimated simply from prior knowledge of the object shape, without the stereo matching process that is generally considered computationally expensive. It is distinct from the parallax image that the stereo matching unit 20 outputs as the result of window matching.

The object parallax image is therefore strictly an auxiliary input that lets the stereo matching unit 20 perform window matching quickly and with low noise. The output parallax image is generally expected to be more exact and more accurate (lower noise) than the object parallax image given as input.

For example, if the parallax of a pixel in the object parallax image is d_o, window matching is performed with the search range narrowed to the pixel range of ±αd_o (α is a predetermined coefficient) to the left and right of the pixel shifted by d_o along the epipolar line from that pixel. This narrows the parallax search range, improving quality and reducing processing load.
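The per-pixel range d_o ± αd_o can be computed as in the sketch below. Rounding outward to integer parallax values and clamping the lower bound at 0 are assumptions added for the sketch, as is the function name; the source only specifies the ±αd_o window.

```python
import math


def disparity_search_range(d_o, alpha):
    """Return (low, high) integer parallax bounds around d_o.

    The window is d_o - alpha*d_o .. d_o + alpha*d_o, rounded outward.
    Clamping the lower bound at 0 is an assumption of this sketch.
    """
    margin = alpha * d_o
    low = max(0, math.floor(d_o - margin))
    high = math.ceil(d_o + margin)
    return low, high


# Illustrative values: object parallax 20 pixels, coefficient 0.25.
print(disparity_search_range(20, 0.25))
```

Compared with searching the full parallax range of the voxel space, the loop over candidate parallaxes then runs over only `high - low + 1` values per pixel.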

Reducing the parallax search range is itself disclosed in Patent Document 6 and elsewhere, but this embodiment differs in using, as the reference information that limits the search range, the object parallax image generated from the estimation result of the three-dimensional object shape. It is generally difficult to estimate a depth map robustly against noise without prior knowledge of the general shape of the object to be restored. This embodiment, however, can suppress the accuracy loss caused by mistakenly excluding a required depth value from the parallax search range.

However, the region occupied by the three-dimensional object shape in the object parallax image does not necessarily fully contain the object in the reference image, so the above processing may degrade restoration accuracy. In this embodiment, therefore, when each pixel of the base image is window-matched, the parallax search range is limited to within a fixed threshold based on the parallax values in the neighborhood of that pixel in the object parallax image.

Specifically, the minimum and maximum of the valid parallax values in the neighboring region are calculated (the parallax of pixels outside the three-dimensional object shape is set to an invalid value), and the parallax search range is limited to (minimum - threshold) through (maximum + threshold), which suppresses loss of parts of the object.

More specifically, suppose for example that the neighboring region is defined as the 5×5 pixel range centered on the pixel of interest in the base image (this is distinct from the window size) and that the threshold for the parallax search range is 5. If the maximum of the values (parallaxes) of the corresponding 25 pixels of the object parallax image is 10 and the minimum is 5, the search minimum is 0 (5-5) and the search maximum is 15 (10+5), so window matching is performed on the pixels whose parallax is in the range 0 to 15 on the epipolar line of the reference image.
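The worked example above (a 5×5 neighborhood, threshold 5, parallaxes between 5 and 10) can be reproduced with the sketch below. Representing invalid parallax as `None` and the function name are assumptions of the sketch.

```python
def neighborhood_search_range(object_disparity, y, x, half=2, threshold=5):
    """Return (min - threshold, max + threshold) over valid neighbors.

    object_disparity is a nested list holding the object parallax image;
    pixels outside the projected object shape hold None (assumed
    encoding of the invalid value). Returns None when the neighborhood
    contains no valid parallax.
    """
    height, width = len(object_disparity), len(object_disparity[0])
    valid = []
    for ny in range(max(0, y - half), min(height, y + half + 1)):
        for nx in range(max(0, x - half), min(width, x + half + 1)):
            value = object_disparity[ny][nx]
            if value is not None:
                valid.append(value)
    if not valid:
        return None
    return min(valid) - threshold, max(valid) + threshold


# 5x5 object parallax patch with values from 5 to 10 and one invalid pixel.
patch = [
    [5, 6, 7, 8, 9],
    [6, 7, 8, 9, 10],
    [5, 6, None, 8, 9],
    [6, 7, 8, 9, 10],
    [5, 6, 7, 8, 9],
]
print(neighborhood_search_range(patch, y=2, x=2))  # (0, 15)
```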

Finally, the conversion unit 203 converts the parallax image into a depth image by converting the parallax d into the depth Z using equation (2), where B is the distance between the focal points in the rectified images and f is the focal length.
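Equation (2) itself is not reproduced in this text. For rectified image pairs, the standard relation between parallax and depth is Z = B·f / d, and the sketch below assumes that this is the form intended. The function name and the handling of non-positive or missing parallax are likewise assumptions.

```python
def disparity_to_depth(d, baseline, focal_length):
    """Convert a parallax d (pixels) into a depth Z using Z = B * f / d.

    baseline (B) is the distance between the two focal points and
    focal_length (f) is in pixels, so Z has the same unit as B.
    Returns None for a missing or non-positive parallax (assumed policy).
    """
    if d is None or d <= 0:
        return None
    return baseline * focal_length / d


def disparity_image_to_depth_image(disparity_image, baseline, focal_length):
    """Apply the conversion to every pixel of a nested-list parallax image."""
    return [
        [disparity_to_depth(d, baseline, focal_length) for d in row]
        for row in disparity_image
    ]


# Illustrative values: B = 0.5 m, f = 1000 px.
print(disparity_image_to_depth_image([[10, 20], [None, 50]], 0.5, 1000))
```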

The stereo matching unit 20 performs each of the above processes on the N camera images and finally outputs N depth images to the object model generation unit 30.

The object model generation unit 30 generates a three-dimensional shape model of the object from the N depth images output by the stereo matching unit 20, composites the camera images onto it as texture, and provides the result as the output of the three-dimensional model restoration device 1. There are no restrictions on the method of generating a three-dimensional shape model from multiple depth images; any method can be adopted.

For example, TSDF (Truncated Signed Distance Functions) can be used to generate a single set of voxel data from the N depth images. In TSDF, each depth image is back-projected into the voxel space, and the results are merged into a single set of voxel data by weighted averaging in the voxel space.
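The weighted averaging step of TSDF fusion can be illustrated for a single voxel as below. This is a simplified sketch under stated assumptions: the voxel's depth along each camera ray is taken as given (the back-projection with camera parameters is omitted), each observation has weight 1, and voxels far behind the observed surface are treated as unobserved. All function names are hypothetical.

```python
def truncated_sdf(surface_depth, voxel_depth, truncation):
    """Signed distance from voxel to surface along the ray, scaled to [-1, 1].

    Positive values lie in front of the observed surface. Voxels more
    than `truncation` behind the surface return None (not observed).
    """
    sdf = surface_depth - voxel_depth
    if sdf < -truncation:
        return None
    return min(1.0, sdf / truncation)


def fuse(tsdf, weight, new_tsdf, new_weight=1.0):
    """Running weighted average of TSDF observations for one voxel."""
    total = weight + new_weight
    return (tsdf * weight + new_tsdf * new_weight) / total, total


# One voxel at depth 2.0 observed by two cameras (illustrative values).
tsdf, weight = 0.0, 0.0
for surface_depth in (2.0, 2.25):
    observation = truncated_sdf(surface_depth, voxel_depth=2.0, truncation=0.5)
    if observation is not None:
        tsdf, weight = fuse(tsdf, weight, observation)
print(tsdf, weight)
```

Repeating this update for every voxel and every depth image yields the single voxel volume from which a surface can then be extracted at the zero crossing.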

The marching cubes method can be used to generate a three-dimensional shape model (polygon data) from the voxel data. The marching cubes method takes a cube whose vertices are eight adjacent voxels as one unit and repeatedly converts it into one of 15 predefined polygon patterns according to the values of the voxels at the eight vertices, thereby converting the voxel data into a three-dimensional shape model.

According to the above embodiment, the three-dimensional model of an object captured by multiple cameras can be restored accurately, so diverse services and entertainment can be provided to many people across geographic and economic divides. As a result, the embodiment can contribute to Goal 9 ("Build resilient infrastructure, promote inclusive and sustainable industrialization") and Goal 11 ("Make cities inclusive, safe, resilient and sustainable") of the UN-led Sustainable Development Goals (SDGs).

10: three-dimensional object shape estimation unit, 20: stereo matching unit, 201: reference image selection unit, 202: search range limiting unit, 203: conversion unit, 30: object model generation unit

Claims

1. A three-dimensional model restoration device for restoring a three-dimensional model of an object based on a plurality of camera images of the object taken from different viewpoints, comprising:
means for estimating a three-dimensional shape of the object based on the plurality of camera images; and
means for generating, for each camera, a depth image by stereo matching using the estimation result of the three-dimensional shape of the object,
wherein the three-dimensional model of the object is restored using the depth image generated for each camera.

2. The three-dimensional model restoration device according to claim 1, wherein the means for generating the depth image has means for selecting, for each camera and based on the camera images and the estimation result of the three-dimensional shape of the object, a reference image from the other camera images when that camera's image is used as the base image,
and generates the depth image by performing, for each camera, stereo matching between the base image and the selected reference image.

3. The three-dimensional model restoration device according to claim 2, wherein the means for generating the depth image comprises:
means for two-dimensionally projecting each vertex of the estimated three-dimensional shape of the object onto each camera image using the camera parameters of that camera; and
means for calculating, with the camera image of one camera as the base image and the camera images of the other cameras as candidate images, the dissimilarity between the two-dimensional projection points of the vertices in the base image and the two-dimensional projection points of the vertices in each candidate image,
and wherein at least one candidate image having a smaller dissimilarity is selected as a reference image.

4. The three-dimensional model restoration device according to claim 2, wherein the means for generating the depth image comprises:
means for two-dimensionally projecting each joint point of the estimated three-dimensional shape of the object onto each camera image using the camera parameters of that camera; and
means for calculating, with the camera image of one camera as the base image and the camera images of the other cameras as candidate images, the dissimilarity between the two-dimensional projection points of the joint points in the base image and the two-dimensional projection points of the joint points in each candidate image,
and wherein at least one candidate image having a smaller dissimilarity is selected as a reference image.

5. The three-dimensional model restoration device according to claim 3, wherein the means for generating the depth image selects, as reference images, all candidate images whose dissimilarity is below a first threshold.

6. The three-dimensional model restoration device according to claim 5, wherein the means for generating the depth image selects, as reference images, all candidate images whose dissimilarity exceeds a second threshold smaller than the first threshold and falls below the first threshold.

7. The three-dimensional model restoration device according to claim 4, wherein the means for generating the depth image selects, as the reference image, at least one candidate image having a relatively smaller dissimilarity from among the candidate images whose dissimilarity exceeds a second threshold.

8. The three-dimensional model restoration device according to claim 3, wherein the means for generating the depth image uses a Procrustes distance as the dissimilarity.

9. The three-dimensional model restoration device according to any one of claims 1 to 8, wherein the means for generating the depth image identifies an object region in the base image for stereo matching based on the three-dimensional shape, and limits the stereo matching search range to the object region.

10. The three-dimensional model restoration device according to claim 9, wherein the means for generating the depth image identifies the object region by two-dimensionally projecting the three-dimensional shape onto the base image using the camera parameters of each camera.

11. The three-dimensional model restoration device according to any one of claims 1 to 8, wherein the means for generating the depth image comprises means for converting the estimation result of the three-dimensional shape into an object parallax image using the camera parameters of each camera,
and, for each pixel of interest of the base image in stereo matching, limits the search range of the corresponding pixel in the reference image based on the object parallax image.

12. The three-dimensional model restoration device according to claim 11, wherein the means for generating the depth image, for each pixel of interest of the base image in stereo matching, limits the search range of the corresponding pixel in the reference image according to the parallax of the pixel of the object parallax image that corresponds to the pixel of interest.

13. The three-dimensional model restoration device according to claim 11, wherein the means for generating the depth image, for each pixel of interest of the base image in stereo matching, limits the search range of the corresponding pixel in the reference image according to the range of parallax of the pixels of the object parallax image within a predetermined pixel range that includes the pixel of interest.

14. A three-dimensional model restoration method in which a computer restores a three-dimensional model of an object based on a plurality of camera images of the object taken from different viewpoints, the method comprising:
estimating a three-dimensional shape of the object based on the plurality of camera images;
generating, for each camera, a depth image by stereo matching using the estimation result of the three-dimensional shape of the object; and
restoring the three-dimensional model of the object using the depth image generated for each camera.

15. A three-dimensional model restoration program for restoring a three-dimensional model of an object based on a plurality of camera images of the object taken from different viewpoints, the program causing a computer to execute:
a procedure for estimating a three-dimensional shape of the object based on the plurality of camera images;
a procedure for generating, for each camera, a depth image by stereo matching using the estimation result of the three-dimensional shape of the object; and
a procedure for restoring the three-dimensional model of the object using the depth image generated for each camera.