JP2021015559A

JP2021015559A - Three-dimensional shape model generation device, three-dimensional shape model generation method, and program

Info

Publication number: JP2021015559A
Application number: JP2019131259A
Authority: JP
Inventors: Takashi Watanabe (渡邉 隆史); Shuji Sakai (酒井 修二)
Original assignee: Toppan Printing Co Ltd
Current assignee: Toppan Inc
Priority date: 2019-07-16
Filing date: 2019-07-16
Publication date: 2021-02-12
Anticipated expiration: 2039-07-16
Also published as: JP7334516B2

Abstract

Problem to be solved: To provide a three-dimensional shape model generation device, a three-dimensional shape model generation method, and a program capable of accurately estimating the actual dimensions of a three-dimensional shape created by a three-dimensional reconstruction method using multi-viewpoint images.
Solution: The device includes a focus region detection unit that detects an in-focus region in each of a plurality of multi-viewpoint images obtained by capturing an object from mutually different viewpoints; a distance calculation unit that, for pixels in the in-focus region of each of the multi-viewpoint images, calculates a virtual distance, which is a virtual distance between a projected image obtained by projecting a three-dimensional shape model of the object onto a two-dimensional plane and the viewpoint position of the projected image; a scale estimation unit that estimates a scale value indicating the ratio, to the virtual distance, of the actual real distance between the projected image and the viewpoint position of the projected image; and a scale conversion unit that converts the virtual distance into the real distance using the scale value.
Selected drawing: FIG. 1

Description

The present invention relates to a three-dimensional shape model generation device, a three-dimensional shape model generation method, and a program.

Conventionally, there are three-dimensional reconstruction methods that generate a three-dimensional shape model of an object from a plurality of images captured from mutually different viewpoints (multi-viewpoint images). Because the object appears differently in each multi-viewpoint image, the three-dimensional shape of the object can be created (reconstructed) by calculating the depth value of each pixel in the images using the principle of a stereo camera. Such a method can create a three-dimensional shape but cannot determine the actual size (scale) of the object, because the actual size of an object cannot be determined merely from the fact that it appears in an image.

One method of estimating the scale of an object from an image uses a marker (for example, Patent Document 1). Patent Document 1 discloses a technique for estimating the actual size of an object using an image in which a marker of known actual dimensions is captured together with the object.

Another method estimates the scale of an object using the depth of field of an image (for example, Patent Document 2). The depth of field is the range of actual distances from the camera to the object over which the object is perceived to be in focus. Patent Document 2 discloses a technique that uses the depth-from-defocus method to acquire a plurality of images with different in-focus positions and calculates the actual distance from the camera to the object by calculating correlation values of focus among the acquired images.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2018-57532
Patent Document 2: Japanese Patent No. 5932476

However, when a marker and an object are imaged together in a plurality of images from mutually different viewpoints, part of the object may be hidden behind the marker. It is then difficult to accurately create the three-dimensional shape of the hidden part.
In addition, the depth of field has a finite width. A distance estimated from the depth of field therefore contains an error, and the distance cannot be estimated accurately.

The present invention has been made in view of these circumstances and provides a three-dimensional shape model generation device, a three-dimensional shape model generation method, and a program capable of accurately estimating the actual dimensions of a three-dimensional shape created by a three-dimensional reconstruction method using multi-viewpoint images.

The three-dimensional shape model generation device of the present invention includes: a three-dimensional shape generation unit that generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by capturing the object from mutually different viewpoints; a focus region detection unit that detects an in-focus region in each of the multi-viewpoint images; a distance calculation unit that, for pixels in the in-focus region of each of the multi-viewpoint images, calculates a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; a scale estimation unit that estimates a scale value indicating the ratio, to the virtual distance, of the actual real distance between the projected image and the viewpoint position of the projected image; and a scale conversion unit that converts the virtual distance into the real distance using the scale value.

In the three-dimensional shape model generation device of the present invention, the focus region detection unit detects the in-focus region in each of the multi-viewpoint images according to the region on which focus was set using the focus function of the camera that captured the multi-viewpoint images.

In the three-dimensional shape model generation device of the present invention, the focus region detection unit detects the in-focus region in each of the multi-viewpoint images by image processing.

In the three-dimensional shape model generation device of the present invention, the focus region detection unit detects the in-focus region in each of the multi-viewpoint images using a trained model generated by machine learning on a training data set in which inputs and outputs are associated with each other. The input of the training data set is an input image for training, and the output of the training data set is information indicating the in-focus region of that input image.

In the three-dimensional shape model generation device of the present invention, the input of the training data set is each image of a set of multi-viewpoint images, and the output of the training data set is information indicating the in-focus region of a corrected projected image, determined from the per-pixel depth values of that corrected projected image. The corrected projected image is obtained by projecting onto a two-dimensional plane a corrected three-dimensional shape model, that is, a three-dimensional shape model of a marker of known actual dimensions that has been scale-corrected based on those dimensions.

In the three-dimensional shape model generation device of the present invention, the distance calculation unit determines whether to use the virtual distance at a corresponding pixel of the projected image in calculating the virtual distance for the projected image, based on at least one of two determination results: whether a pixel determined to be in focus by the focus region detection unit is an edge, and whether the corresponding pixel is an edge. The corresponding pixel is the pixel, in the projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane, that corresponds to the pixel determined to be in focus.

In the three-dimensional shape model generation device of the present invention, the scale estimation unit derives the real distance based on the depth of field calculated from the camera parameters of the camera that captured the multi-viewpoint images.
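As an illustration of how a depth-of-field range might be obtained from camera parameters, the following is a minimal sketch that assumes the standard thin-lens approximation. The text at this point does not say which formula is used, so the function name `depth_of_field`, its parameters, and the default circle of confusion of 0.03 mm are all assumptions made for this example.

```python
import math


def depth_of_field(focal_length_mm, f_number, focus_distance_mm, coc_mm=0.03):
    """Return (near, far) limits of acceptable sharpness in millimetres.

    Thin-lens approximation (an assumption, not taken from the source):
      H    = f^2 / (N * c) + f            hyperfocal distance
      near = s (H - f) / (H + s - 2 f)
      far  = s (H - f) / (H - s)          (infinite when s >= H)
    """
    f, n, s, c = focal_length_mm, f_number, focus_distance_mm, coc_mm
    hyperfocal = f * f / (n * c) + f
    near = s * (hyperfocal - f) / (hyperfocal + s - 2 * f)
    far = s * (hyperfocal - f) / (hyperfocal - s) if s < hyperfocal else math.inf
    return near, far


# Hypothetical values: 50 mm lens at f/2.8 focused at 2 m.
near, far = depth_of_field(50.0, 2.8, 2000.0)
print(f"in-focus range: {near:.0f} mm to {far:.0f} mm")
```

Because the range between `near` and `far` has a finite width, any single real distance taken from it carries an error, which is the limitation the background section attributes to depth-of-field methods.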

The three-dimensional shape model generation device of the present invention further includes a marker scale estimation unit that derives the real distance based on a corrected three-dimensional shape model, obtained by scale-correcting a three-dimensional shape model of a marker of known actual dimensions based on those dimensions. The scale estimation unit estimates the scale value using the real distance derived by the marker scale estimation unit.

In the three-dimensional shape model generation device of the present invention, the marker scale estimation unit derives the real distance based on the depth values of the pixels determined to be in focus among the pixels of a corrected projected image, which is obtained by projecting the corrected three-dimensional shape model onto a two-dimensional plane.

In the three-dimensional shape model generation device of the present invention, the scale estimation unit derives the real distance based on the distance obtained from the focus function of the camera that captured the multi-viewpoint images.

In the three-dimensional shape model generation device of the present invention, the distance calculation unit calculates the virtual distance for each pixel of the projected image by associating, with each corresponding pixel of the projected image, a distance according to the depth value of a pixel included in the detected in-focus region.

In the three-dimensional shape model generation device of the present invention, the focus region detection unit detects a degree of focus for each pixel. The distance calculation unit weights the distance according to the depth value of each pixel included in the detected in-focus region by that degree of focus, and associates the weighted distance with each corresponding pixel of the projected image, thereby calculating the virtual distance for each pixel of the projected image.

In the three-dimensional shape model generation device of the present invention, the distance calculation unit weights the distance according to the depth value of each pixel included in the detected in-focus region by the magnitude of that distance, and associates the weighted distance with each corresponding pixel of the projected image, thereby calculating the virtual distance for each pixel of the projected image.

In the three-dimensional shape model generation device of the present invention, the distance calculation unit calculates a distance according to the depth value of each pixel in the in-focus region of each of the multi-viewpoint images and associates the calculated distance with a pixel of the projected image. When a plurality of distances correspond to one pixel of the projected image, it compares those distances and, according to the degree of variation among them, determines whether to use the virtual distance at that pixel in calculating the virtual distance for the projected image.

The three-dimensional shape model generation method of the present invention includes: a three-dimensional shape generation step in which a three-dimensional shape generation unit generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by capturing the object from mutually different viewpoints; a focus region detection step in which a focus region detection unit detects an in-focus region in each of the multi-viewpoint images; a distance calculation step in which a distance calculation unit, for pixels in the in-focus region of each of the multi-viewpoint images, calculates a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; a scale estimation step in which a scale estimation unit estimates a scale value indicating the ratio, to the virtual distance, of the actual real distance between the projected image and the viewpoint position of the projected image; and a scale conversion step in which a scale conversion unit converts the virtual distance into the real distance using the scale value.

The program of the present invention causes a computer to operate as: three-dimensional shape generation means for generating a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by capturing the object from mutually different viewpoints; focus region detection means for detecting an in-focus region in each of the multi-viewpoint images; distance calculation means for calculating, for pixels in the in-focus region of each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; scale estimation means for estimating a scale value indicating the ratio, to the virtual distance, of the actual real distance between the projected image and the viewpoint position of the projected image; and scale conversion means for converting the virtual distance into the real distance using the scale value.

According to the present invention, the actual dimensions of a three-dimensional shape created by a three-dimensional reconstruction method using multi-viewpoint images can be estimated accurately.

A block diagram showing an example of the configuration of the three-dimensional shape model generation device 1 according to the first embodiment.
A diagram showing an example of the structure of the information stored in the scale information storage unit 109 according to the first embodiment.
A diagram showing an example of a plurality of multi-viewpoint images TG according to the first embodiment.
A diagram showing an example of the three-dimensional shape model M according to the first embodiment.
A diagram showing an example of a multi-viewpoint image TG according to the first embodiment.
A diagram showing an example of the blur map BM of a multi-viewpoint image TG according to the first embodiment.
A diagram showing an example of the depth-of-field function according to the first embodiment.
An example of the distribution of virtual distances over the pixels of the projected image of the three-dimensional shape model M according to the first embodiment.
A flowchart showing the flow of processing performed by the three-dimensional shape model generation device 1 according to the first embodiment.
A block diagram showing an example of the configuration of the three-dimensional shape model generation device 1A according to the second embodiment.

Hereinafter, the three-dimensional shape model generation device of the embodiments will be described with reference to the drawings.

<First Embodiment>
First, the first embodiment will be described.
FIG. 1 is a block diagram showing an example of the configuration of the three-dimensional shape model generation device 1 according to the first embodiment. The three-dimensional shape model generation device 1 includes, for example, an image data acquisition unit 101, a three-dimensional shape generation unit 102, a focus region detection unit 103, a distance calculation unit 104, a scale estimation unit 105, a scale conversion unit 106, an image data storage unit 107, a three-dimensional shape storage unit 108, and a scale information storage unit 109.

The image data acquisition unit 101 acquires image information of the multi-viewpoint images TG (see FIG. 3A) from the image data storage unit 107. The multi-viewpoint images TG are images of an object T captured from mutually different viewpoints. The object T is any object that can be imaged and has an arbitrary three-dimensional shape. The image information of a multi-viewpoint image TG includes, for each pixel, information indicating a color such as an RGB value, or a grayscale value.

The image data acquisition unit 101 acquires the camera parameters of the multi-viewpoint images TG from the scale information storage unit 109. The camera parameters of a multi-viewpoint image TG are attribute information of that image, namely the information expressed in the so-called Exif (Exchangeable image file format). For example, the camera parameters indicate the viewpoint position (the position of the camera at the time of imaging), the imaging direction, and the angle of view at the time the multi-viewpoint image TG was captured. The camera parameters may also include information about the imaging device (camera) that captured the multi-viewpoint image TG. This information indicates the specifications of the components of the imaging device and their state at the time of imaging, for example the focal length of the lens, the shutter speed, the exposure state, the image resolution (number of pixels), and the distortion coefficients of the lens.

The image data acquisition unit 101 acquires the image information and camera parameters of the plurality of multi-viewpoint images TG and outputs the acquired information to the three-dimensional shape generation unit 102.

The three-dimensional shape generation unit 102 creates a three-dimensional shape model M of the object T. It first generates a depth map for each of the plurality of multi-viewpoint images TG by the principle of stereo matching, using the image information and camera parameters of those images. A depth map is information (a map) indicating the depth of each pixel of an image.

The three-dimensional shape generation unit 102 integrates the depth maps of the multi-viewpoint images TG to generate a three-dimensional point cloud. The three-dimensional point cloud is a set of three-dimensional points corresponding to the three-dimensional shape of the object T. The three-dimensional shape generation unit 102 then generates a mesh model from the three-dimensional point cloud. A mesh model is a three-dimensional shape model that represents the three-dimensional shape of an object as a collection of polygons. The three-dimensional shape generation unit 102 generates the mesh model from the three-dimensional point cloud using, for example, a mesh reconstruction method (Poisson Surface Reconstruction), and uses the generated mesh model as the three-dimensional shape model.
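The integration of depth maps into a point cloud can be sketched as follows. This is only an illustration under stated assumptions: a pinhole camera with intrinsics `fx, fy, cx, cy`, a camera-to-world pose `(R, t)` per view, and depth values of zero marking invalid pixels. None of these conventions, nor the function names, come from the source, and the meshing step is not shown.

```python
import numpy as np


def depth_map_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into camera-frame points of shape (N, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    valid = depth > 0                                # assumption: 0 means "no depth"
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def to_world(points_cam, R, t):
    """Apply an assumed camera-to-world pose: X_world = R @ X_cam + t."""
    return points_cam @ R.T + t


def merge_depth_maps(views):
    """Concatenate the back-projected points of every view into one point cloud.

    `views` is an iterable of (depth, (fx, fy, cx, cy), R, t) tuples.
    """
    clouds = [to_world(depth_map_to_points(d, *k), R, t) for d, k, R, t in views]
    return np.concatenate(clouds, axis=0)


# Two hypothetical 2x3 views sharing the same identity pose.
depth = np.full((2, 3), 2.0)
view = (depth, (1.0, 1.0, 1.0, 0.5), np.eye(3), np.zeros(3))
cloud = merge_depth_maps([view, view])
print(cloud.shape)
```

The resulting coordinates are in the reconstruction's own arbitrary units, which is why a separate scale estimate is needed to obtain real dimensions.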

The three-dimensional shape generation unit 102 stores information about the generated three-dimensional point cloud and mesh model in the three-dimensional shape storage unit 108. The information about the three-dimensional point cloud includes the coordinates (three-dimensional coordinates) of each point and may also include the color of each point (for example, an RGB value). The information about the mesh model includes the shape, coordinates (three-dimensional coordinates), color, texture, and the like of the polygons constituting the mesh model.

The focus region detection unit 103 detects the in-focus region in each of the plurality of multi-viewpoint images TG. It does so using, for example, a machine learning method. In this case, the focus region detection unit 103 inputs a multi-viewpoint image TG into a trained model. The trained model is, for example, a model generated by training a convolutional neural network (CNN) on a training data set. A training data set is information in which inputs and outputs (the answers for the inputs) are paired.

Here, the input of the training data set is an image of an arbitrary object prepared for training, in which in-focus and out-of-focus parts are mixed. The output of the training data set is information indicating which parts of the training image are in focus and which are not, for example information that associates with each pixel whether or not that pixel is in focus. The output is determined, for example, by a worker who creates the training data set: the worker judges, pixel by pixel, whether the pixel is in focus and sets the result as the output of the training data set.

By learning such a training data set, the trained model becomes a model that, for an input image it has not been trained on, estimates (outputs) the degree to which each pixel of that image is in focus.

The focus region detection unit 103 stores information indicating the detection result (information indicating the in-focus region in each of the plurality of multi-viewpoint images TG) in the image data storage unit 107. This information associates, for example, each pixel of each multi-viewpoint image TG with the degree to which it is in focus (hereinafter also called the blur amount). The blur amount is expressed, for example, as a real value from 0 to 1: a value near 0 indicates that the pixel is in focus, and a value near 1 indicates that it is out of focus. In other words, the better the focus, the smaller the blur amount.

The above describes, as an example, the case in which the focus region detection unit 103 detects the in-focus region using a machine learning method, but the detection is not limited to this. The focus region detection unit 103 may detect the in-focus region by any method. For example, it may treat the region on which the camera's focus function set focus as the in-focus region. This region is, for example, the area inside the frame that indicates the focused region and is displayed superimposed on the imaging range on the rear display that shows the imaging range during imaging.

The focus region detection unit 103 may also detect the in-focus region by image processing. In this case, it detects the degree of color change in the image by frequency analysis. For the RGB values of each pixel, it applies a Fourier transform to a local region in the neighborhood of that pixel and extracts the high-frequency component of that local region. It then determines whether the local region is in focus according to whether the level of the extracted high-frequency component is at or above a predetermined threshold: if the level is at or above the threshold, the local region is judged to be in focus, and if it is below the threshold, the local region is judged to be out of focus.
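The frequency-analysis variant can be sketched as follows. This is a simplified illustration, not the method as specified: it works on a single grayscale channel, uses non-overlapping blocks rather than a neighborhood around every pixel, and the cutoff frequency, block size, threshold, and function names are all assumed values chosen for the example.

```python
import numpy as np


def high_freq_level(patch, cutoff=0.25):
    """High-frequency part of a patch's variance.

    Sums the power spectrum over bins whose radial frequency is at least
    `cutoff` (cycles per pixel) and normalizes by N^2, so by Parseval's
    theorem the result is the share of the variance carried by those bins.
    """
    patch = np.asarray(patch, dtype=np.float64)
    patch = patch - patch.mean()                  # drop the DC component
    power = np.abs(np.fft.fft2(patch)) ** 2
    fy = np.fft.fftfreq(patch.shape[0])[:, None]
    fx = np.fft.fftfreq(patch.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return float(power[radius >= cutoff].sum() / patch.size ** 2)


def focus_mask(gray, block=16, cutoff=0.25, threshold=0.01):
    """Mark each block as in focus when its high-frequency level reaches the threshold."""
    rows, cols = gray.shape[0] // block, gray.shape[1] // block
    mask = np.zeros((rows, cols), dtype=bool)
    for i in range(rows):
        for j in range(cols):
            tile = gray[i * block:(i + 1) * block, j * block:(j + 1) * block]
            mask[i, j] = high_freq_level(tile, cutoff) >= threshold
    return mask


# Synthetic image: one sharply textured block, the rest flat.
image = np.zeros((32, 32))
yy, xx = np.indices((16, 16))
image[:16, :16] = (yy + xx) % 2                   # checkerboard texture
print(focus_mask(image))
```

A fixed threshold on the spectral level depends on image contrast, so in practice the threshold would need to be tuned for the images at hand.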

The distance calculation unit 104 calculates a virtual distance. Here, the virtual distance is a virtual distance from a predetermined position to the part of the object T corresponding to an arbitrary point of the three-dimensional point cloud constituting the three-dimensional shape model. The predetermined position is the virtual viewpoint position of a projected image generated by reprojecting the three-dimensional shape model onto a two-dimensional plane.

The virtual distance is the real distance multiplied by a constant. This is because the distance calculation unit 104 calculates the virtual distance from the depth value of each point in the three-dimensional point cloud. The depth values of the point cloud are calculated from the multi-viewpoint images TG based on the relative positional relationship of the object T captured in each image. They therefore do not reflect real dimensions but are relative values referenced to some arbitrary quantity. Consequently, a distance calculated from these depth values is a virtual distance that is proportional to the real distance.

The distance calculation unit 104 calculates the virtual distance using the pixels contained in the in-focus region of each of the multi-viewpoint images TG. The in-focus region is the region detected by the focus region detection unit 103.

For example, the distance calculation unit 104 treats as in-focus pixels those pixels of the multi-viewpoint images TG whose blur amount, as assigned by the focus region detection unit 103, is below a predetermined threshold (for example, 0.1). The distance calculation unit 104 acquires the depth value of each in-focus pixel. The depth value of a pixel is, for example, the per-pixel depth value itself that is calculated while the three-dimensional shape generation unit 102 generates the three-dimensional shape model M.
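As a minimal sketch of this selection step, the following assumes the blur map and the depth map are given as equally sized nested lists; the function name and data layout are hypothetical and chosen only for illustration.

```python
def in_focus_depths(blur_map, depth_map, blur_threshold=0.1):
    """Return {(row, col): depth} for pixels whose blur amount is below the threshold.

    `blur_map` and `depth_map` are assumed to be same-shaped lists of rows.
    """
    return {
        (r, c): depth_map[r][c]
        for r, row in enumerate(blur_map)
        for c, blur in enumerate(row)
        if blur < blur_threshold  # strictly below, matching "less than the threshold"
    }
```

Pixels whose blur amount equals the threshold are excluded, since the text requires the blur amount to be below it.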

The distance calculation unit 104 associates each depth value of the in-focus pixels with a pixel of the projected image. Each pixel of the projected image is thus associated with the depth values of in-focus pixels from one or more of the multi-viewpoint images TG. The distance calculation unit 104 calculates the virtual distance at each pixel of the projected image by integrating the pixels of the multi-viewpoint images TG that correspond to that pixel.
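The text does not specify how the per-view depth values are integrated. The sketch below assumes the correspondence to projected-image pixels is already known and uses the median as the integration rule; both the input format and the choice of the median are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def integrate_observations(observations):
    """Combine depth observations into one virtual distance per projected pixel.

    `observations` is an iterable of (projected_pixel, depth) pairs, where the
    depths come from in-focus pixels of one or more multi-viewpoint images.
    The median is a hypothetical integration rule, not taken from the text.
    """
    grouped = defaultdict(list)
    for pixel, depth in observations:
        grouped[pixel].append(depth)
    return {pixel: median(depths) for pixel, depths in grouped.items()}
```

The median is used here only because it tolerates a single bad depth value; a mean or weighted mean would fit the description equally well.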

In general, a camera has a depth of field (see FIG. 4) determined by its camera parameters, which fixes in advance the range that is in focus. The depth of field depends on the real distance from the viewpoint position to the in-focus portion of the subject captured in the image. The real distance can therefore be obtained by using the depth of field. Meanwhile, as described above, the virtual distance is the virtual distance from the predetermined position to the corresponding portion of the object T within the in-focus region. In other words, the virtual distance can be regarded as the depth of field multiplied by a constant, so the ratio of the real distance to the virtual distance (the scale value SC) can be obtained via the depth of field.

However, the depth of field has a width, so the value of the virtual distance varies from pixel to pixel (see FIG. 5). This variation is an error relative to the true value of the virtual distance and can degrade its accuracy. In addition, the focus region detection unit 103 uses a trained model to estimate whether each pixel is in focus. Depending on the content of the training dataset used to train the model, the estimation accuracy may be insufficient. If the accuracy is poor, pixels outside the depth of field (that is, out-of-focus pixels) may be wrongly estimated to be in focus. If the pixels used to obtain the virtual distance include pixels that are actually out of focus but were wrongly regarded as in focus, the virtual distance at those pixels is an error relative to the true value of the virtual distance.

As a countermeasure, the present embodiment searches for the true value of the virtual distance by applying statistical processing to the virtual distances of the pixels of the projected image.

For example, the distance calculation unit 104 searches for the true value of the virtual distance by applying RANSAC (RANdom SAmple Consensus) to the virtual distances of the pixels of the projected image. RANSAC is a method that, for a data set containing outliers (that is, erroneous data), repeatedly applies the least squares method to randomly drawn data samples and thereby obtains an estimate that is not affected by the outliers. The distance calculation unit 104 searches for the true value of the virtual distance by using the range of the depth of field as the inlier range (error tolerance) in RANSAC.

Note, however, that the depth of field is calculated with the subject distance u (see FIG. 4) as a parameter, whereas the virtual distance is a distance that has not yet been converted to a real distance. Therefore, when using the depth of field as the inlier range in RANSAC, the distance calculation unit 104 sets the subject distance u to a tentative value (for example, 200 mm). The distance calculation unit 104 takes the distance obtained by applying RANSAC to the virtual distances of the pixels of the projected image as the true value of the virtual distance. The distance calculation unit 104 stores the virtual distance of each pixel of the projected image and the true value of the virtual distance in the three-dimensional shape storage unit 108.
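A one-dimensional RANSAC of the kind described above might look like the following sketch. The `tolerance` argument stands in for the depth-of-field-based inlier range; how that range is expressed in virtual-distance units is left open here, and the function name, iteration count, and fixed seed are assumptions made for the example.

```python
import random


def ransac_virtual_distance(distances, tolerance, iterations=200, seed=0):
    """Estimate the true virtual distance from values that contain outliers.

    Each iteration draws one sample as the candidate model, collects the
    values within `tolerance` of it as inliers, and keeps the largest inlier
    set. The least-squares fit of a constant to the inliers is their mean.
    """
    if not distances:
        raise ValueError("no virtual distances given")
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        candidate = rng.choice(distances)
        inliers = [d for d in distances if abs(d - candidate) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return sum(best_inliers) / len(best_inliers), best_inliers
```

With values clustered near 2.0 and two stray values, the estimate stays at the cluster mean and the stray values are left out of the inlier set.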

The above describes, as an example, the case where the distance calculation unit 104 uses RANSAC to calculate the true value of the virtual distance, but the embodiment is not limited to this. It suffices for the distance calculation unit 104 to use some statistical method to calculate the most probable virtual distance from a set of virtual distances that contains variation. For example, the distance calculation unit 104 may derive a representative value from the set of virtual distances and use it as the true value of the virtual distance. The representative value may be any value derived from the set by a statistical method, for example the simple arithmetic mean, a weighted mean, the median, the maximum, or the minimum. Alternatively, the distance calculation unit 104 may calculate the true value using virtual distances selected from the set. In this case, for example, when the virtual distances of the pixels corresponding to the same point of the three-dimensional point cloud vary widely, the true value of the virtual distance need not be calculated.

The scale estimation unit 105 estimates the scale value SC. The scale value SC is the ratio of the real distance to the virtual distance. As the virtual distance for this estimate, the scale estimation unit 105 uses the true value of the virtual distance calculated by the distance calculation unit 104. The scale estimation unit 105 derives the real distance for this estimate from the camera parameters. If the Exif data of the multi-viewpoint images TG records the subject distance u itself, the scale estimation unit 105 uses that value as the real distance. Alternatively, if the Exif data records the depth of field, focal length, lens F-number, and permissible circle of confusion diameter, the scale estimation unit 105 calculates the subject distance u from the relational expression in FIG. 4 and uses the calculated value as the real distance. Alternatively, if the camera shows information on the subject distance u on its rear display at the time of capture, the scale estimation unit 105 may use a value based on that display as the real distance. The scale estimation unit 105 stores the real distance used for the estimate and the estimated scale value SC in the scale information storage unit 109.
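The relational expression of FIG. 4 is not reproduced in this text. The sketch below therefore assumes the commonly used approximation DoF_f = N·c·u² / (f² + N·c·u) and DoF_r = N·c·u² / (f² − N·c·u); if FIG. 4 uses a different form, the inversion would change accordingly. All function names are hypothetical.

```python
import math


def depth_of_field(u, f, n, c):
    """Total depth of field DoF_f + DoF_r under the assumed approximation.

    u: subject distance, f: focal length, n: lens F-number,
    c: permissible circle of confusion diameter (u, f, c in the same unit).
    """
    nc_u = n * c * u
    if f * f <= nc_u:
        raise ValueError("rear depth of field is unbounded at this distance")
    front = nc_u * u / (f * f + nc_u)
    rear = nc_u * u / (f * f - nc_u)
    return front + rear


def subject_distance_from_dof(dof, f, n, c):
    """Invert depth_of_field() to recover the subject distance u."""
    nc = n * c
    return f * f * math.sqrt(dof / (nc * (2.0 * f * f + dof * nc)))


def scale_value(real_distance, virtual_distance):
    """Scale value SC: ratio of the real distance to the virtual distance."""
    if virtual_distance <= 0:
        raise ValueError("virtual distance must be positive")
    return real_distance / virtual_distance
```

For instance, with a real distance of 1000 mm and a true virtual distance of 2.0, `scale_value` returns 500 mm per virtual unit.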

To simplify processing, it is preferable to unify (fix) the camera parameters across the multi-viewpoint images TG used to generate the three-dimensional shape model M. If the camera parameters differ between the multi-viewpoint images TG, the scale value SC and related values are calculated for each distinct set of camera parameters.

The scale conversion unit 106 performs scale conversion by converting the virtual distance into the real distance. Specifically, the scale conversion unit 106 multiplies the virtual distance calculated by the distance calculation unit 104 by the scale value SC estimated by the scale estimation unit 105. The scale conversion unit 106 also multiplies the three-dimensional coordinates of each pixel of the projected image by the scale value SC, which maps the three-dimensional shape into a coordinate system that matches its real dimensions. The real dimensions of the three-dimensional shape can thus be obtained.
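The conversion itself is a uniform scaling. A minimal sketch, assuming the model is available as a list of (x, y, z) tuples in virtual units (the function name and data layout are hypothetical):

```python
def convert_to_real_scale(points, scale):
    """Multiply every 3D coordinate by the scale value SC."""
    return [(x * scale, y * scale, z * scale) for x, y, z in points]
```

Because every coordinate is multiplied by the same factor, distances between points scale by SC as well, which is why the real dimensions of the shape follow directly.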

The image data storage unit 107 stores information about the multi-viewpoint images TG. This information includes the depth value calculated for each pixel of the multi-viewpoint images TG and information indicating the degree of focus (the blur amount).
The three-dimensional shape storage unit 108 stores information about the three-dimensional shape model. This information includes the virtual distance at each pixel of the projected image and the true value of the virtual distance in the three-dimensional shape model.
The scale information storage unit 109 stores information about scale conversion.

FIG. 2 is a diagram showing an example of the structure of the information (scale information) stored in the scale information storage unit 109 according to the first embodiment. For example, the scale information storage unit 109 is created for each multi-viewpoint image TG.
As shown in FIG. 2, the information about scale conversion has items such as camera parameters and scale conversion parameters. The camera parameters include attribute information about the camera and the capture conditions, for example Exif information. They include information indicating the image ID, camera model, focal length, focus, lens F-number, permissible circle of confusion diameter, depth of field, and the like. The scale conversion parameters include the scale value SC estimated by the scale estimation unit 105 and the real distance used to estimate the scale value SC.

FIG. 3A is a diagram showing an example of the multi-viewpoint images TG according to the first embodiment. The object T here is bread placed on a wooden board. As shown in FIG. 3A, the multi-viewpoint images TG consist of a plurality of images of the object T captured from mutually different viewpoints.

FIG. 3B is a diagram showing an example of the three-dimensional shape model M according to the first embodiment. As shown in FIG. 3B, a three-dimensional shape can be reconstructed from the multi-viewpoint images TG. The three-dimensional shape model M reproduces the shape, but its real dimensions are unknown. The real size is the three-dimensional shape model M enlarged or reduced by some factor, but that factor is unknown.

FIG. 3C is a diagram showing an example of a multi-viewpoint image TG according to the first embodiment. As shown in FIG. 3C, part of the imaged area is out of focus, for example the left end of the wooden board in the image. Another part of the imaged area is in focus, for example the portion of the bread from its center to its right side.

FIG. 3D is a diagram showing an example of a blur map BM of a multi-viewpoint image TG according to the first embodiment. A blur map is an image (map) in which a blur amount is associated with each pixel of the image. In this example, pixels closer to white have a larger blur amount, that is, they are more out of focus, and pixels closer to black have a smaller blur amount, that is, they are more in focus.

FIG. 4 is a diagram showing an example of the depth-of-field function according to the first embodiment. In FIG. 4, DoF_f is the front depth of field, DoF_r is the rear depth of field, N is the lens F-number, c is the permissible circle of confusion diameter, f is the focal length, and u is the subject distance.
As shown in FIG. 4, the depth of field is the sum of the front depth of field DoF_f and the rear depth of field DoF_r around the subject distance u, and is a value with a certain width. The front depth of field DoF_f is the in-focus range extending from the subject distance u toward the viewpoint position. The rear depth of field DoF_r is the in-focus range extending from the subject distance u away from the viewpoint position.
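FIG. 4 itself is not reproduced in this text. As a hedged point of reference, the widely used approximation of the depth of field, written with the symbols defined above, is shown below; it is valid while f² > N·c·u, and the exact expression in FIG. 4 may differ.

```latex
\mathrm{DoF}_f = \frac{N c\, u^{2}}{f^{2} + N c\, u}, \qquad
\mathrm{DoF}_r = \frac{N c\, u^{2}}{f^{2} - N c\, u}, \qquad
\mathrm{DoF} = \mathrm{DoF}_f + \mathrm{DoF}_r
```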

FIG. 5 shows an example of the distribution of the virtual distances of the pixels of the projected image according to the first embodiment. As shown in FIG. 5, the virtual distances of the pixels of the projected image vary. By applying statistical processing such as RANSAC to this error-containing set of virtual distances, a probable virtual distance is calculated. This yields a highly accurate virtual distance, which makes it possible to obtain the real dimensions of the three-dimensional shape model M accurately.

FIG. 6 is a flowchart showing the flow of processing performed by the three-dimensional shape model generation device 1 according to the first embodiment.
Step S101:
The image data acquisition unit 101 acquires the multi-viewpoint images TG of the object T.
Step S102:
The three-dimensional shape generation unit 102 generates the three-dimensional shape model M from the multi-viewpoint images TG.
Step S103:
The focus region detection unit 103 generates a blur map for each of the multi-viewpoint images TG.
Step S104:
The distance calculation unit 104 calculates, for the in-focus region of each multi-viewpoint image TG, the virtual distance of each pixel contained in that region.
Step S105:
The distance calculation unit 104 calculates the virtual distance of each pixel of the projected image of the three-dimensional shape model M by associating with it the per-pixel virtual distances of the multi-viewpoint images TG calculated in step S104.
Step S106:
The distance calculation unit 104 calculates the most probable virtual distance in the projected image (the true value of the virtual distance) by applying statistical processing to the variation in the virtual distances of the pixels of the projected image.
Step S107:
The scale estimation unit 105 estimates the scale value SC, which is the ratio of the real distance to the virtual distance. For example, the scale estimation unit 105 calculates the scale value SC from the real distance derived from the camera parameters and the virtual distance calculated in step S106.
Step S108:
The scale conversion unit 106 calculates the real distance by multiplying the virtual distance calculated in step S106 by the scale value SC estimated in step S107.

In the embodiment described above, as shown in the flowchart of FIG. 6, the blur maps of the multi-viewpoint images TG are generated (step S103) after the three-dimensional shape model is generated (step S102), but the order is not limited to this. For example, the three-dimensional shape generation unit 102 may assign a smaller weight to images with strong blur than to images with weak blur, so that the three-dimensional shape model is generated using only the in-focus regions of the multi-viewpoint images TG. In this case, every pixel of the projected image consists only of in-focus pixels of the multi-viewpoint images TG. The processing in step S105 of "associating the per-pixel virtual distances of the multi-viewpoint images TG calculated in step S104 with the pixels of the projected image of the three-dimensional shape model M" can therefore be omitted. That is, in step S105 the depth value of each pixel of the projected image can be used directly as the virtual distance.

As described above, the three-dimensional shape model generation device 1 according to the first embodiment includes the three-dimensional shape generation unit 102, the focus region detection unit 103, the distance calculation unit 104, the scale estimation unit 105, and the scale conversion unit 106. The three-dimensional shape generation unit 102 generates the three-dimensional shape model M of the object T from a plurality of multi-viewpoint images TG of the object T captured from mutually different viewpoints. The focus region detection unit 103 detects the in-focus region in each of the multi-viewpoint images TG. For the pixels in the in-focus region of each multi-viewpoint image TG, the distance calculation unit 104 calculates the virtual distance, which is the virtual distance between the viewpoint position and the projected image obtained by projecting the three-dimensional shape model M onto a two-dimensional plane. The scale estimation unit 105 estimates the scale value SC. The scale conversion unit 106 uses the scale value SC to convert the virtual distance into the real distance. The three-dimensional shape model generation device 1 of the first embodiment can thereby improve the accuracy of the virtual distance by exploiting the fact that the in-focus region lies within the depth of field. As a result, the real dimensions of the three-dimensional shape model M reconstructed by a three-dimensional reconstruction method using the multi-viewpoint images TG can be estimated accurately.

In the three-dimensional shape model generation device 1 of the first embodiment, the focus region detection unit 103 may detect the in-focus region in each of the multi-viewpoint images TG according to the region on which the focus function of the camera that captured the images was focused. The in-focus region can thus be detected easily.

In the three-dimensional shape model generation device 1 of the first embodiment, the focus region detection unit 103 may detect the in-focus region in each of the multi-viewpoint images TG by image processing. The in-focus region can thus be detected even when the region on which the camera was focused is unknown.

In the three-dimensional shape model generation device 1 of the first embodiment, the focus region detection unit 103 may detect the in-focus region in each of the multi-viewpoint images TG using a trained model generated by machine learning on a training dataset in which inputs and outputs are associated. In this case, the input of the training dataset is a training input image, and the output is information indicating the in-focus region of that training input image. The in-focus region can thus be detected without complicated image processing.

In the three-dimensional shape model generation device 1 of the first embodiment, the distance calculation unit 104 may take, as the virtual distance at a corresponding pixel of the projected image, the depth value at that corresponding pixel, where the corresponding pixel is the pixel of the projected image (obtained by projecting the three-dimensional shape model M onto a two-dimensional plane) that corresponds to a pixel judged to be in focus by the focus region detection unit 103. The virtual distance can thus be obtained easily from the depth values calculated in the course of generating the three-dimensional shape model M.

In the three-dimensional shape model generation device 1 of the first embodiment, the scale estimation unit 105 may derive the real distance based on the depth of field calculated from the camera parameters of the camera that captured the multi-viewpoint images TG. The real distance can thus be derived easily.

In the three-dimensional shape model generation device 1 of the first embodiment, the scale estimation unit 105 may derive the real distance based on the distance obtained from the focus function of the camera that captured the multi-viewpoint images TG. The real distance can thus be derived easily.

In the three-dimensional shape model generation device 1 of the first embodiment, the distance calculation unit 104 may calculate the virtual distance of each pixel of the projected image by associating the distance given by the depth value of each pixel in the detected in-focus region with the corresponding point of the projected image. The real distance can thus be derived easily.

In the three-dimensional shape model generation device 1 of the first embodiment, the distance calculation unit 104 may calculate the virtual distance of each pixel of the projected image by associating the distance given by the depth value of each pixel in the detected in-focus region with the corresponding point of the projected image, and may then calculate the virtual distance of the projected image by integrating the per-pixel virtual distances with a statistical method. Even when the per-pixel virtual distances of the projected image contain errors, integrating them statistically reduces the error and yields a more probable virtual distance.

(Modification 1 of the first embodiment)
Next, Modification 1 of the first embodiment will be described. This modification differs from the embodiment described above in that the virtual distance is calculated according to whether each pixel judged to be in focus by the focus region detection unit 103 is an edge.

In general, the depth values of pixels corresponding to the edges of the object T tend to contain large errors. Calculating the virtual distance according to whether a pixel corresponds to an edge of the object T can therefore improve the accuracy of the calculated virtual distance.

For example, the distance calculation unit 104 either does not use the depth values of edge pixels when calculating the virtual distance, or sets the weight by which the depth values of edge pixels are multiplied to a smaller value than that of other pixels. The distance calculation unit 104 determines, for example, whether each pixel in the in-focus region of a multi-viewpoint image TG is an edge. It makes this determination using, for example, Canny edge detection. Of the pixels contained in the in-focus region of each of the multi-viewpoint images TG, the distance calculation unit 104 associates only the depth values of the pixels judged not to be edges with the pixels of the projected image, and thereby determines the virtual distance of each pixel of the projected image. The distance calculation unit 104 then calculates the most probable virtual distance in the projected image (the true value of the virtual distance) by applying statistical processing to the virtual distances of the pixels of the projected image.
Alternatively, the distance calculation unit 104 may determine whether each corresponding pixel in the projected image is an edge and calculate the most probable virtual distance in the projected image (the true value of the virtual distance) using only the corresponding pixels judged not to be edges.

As described above, in the three-dimensional shape model generation device 1 according to the modification of the first embodiment, the distance calculation unit 104 calculates the virtual distance based on the result of determining, by image processing, whether or not each pixel determined to be in focus by the focus area detection unit 103 is an edge. Alternatively, the distance calculation unit 104 may determine whether or not the corresponding pixel in the projected image is an edge, and calculate the most probable virtual distance in the projected image (the true value of the virtual distance) using only the pixels whose corresponding pixels are determined not to be edges. That is, the distance calculation unit 104 calculates the most probable virtual distance in the projected image (the true value of the virtual distance) using at least one of the result of determining whether or not the pixels in the in-focus area of the multi-viewpoint images TG are edges and the result of determining whether or not the corresponding pixels in the projected image are edges. This makes it possible to reduce the influence that depth values likely to contain errors have on the calculation of the virtual distance.
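The edge-exclusion variant described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the edge flags are taken as already computed (for example by a Canny detector), the median stands in for the unspecified "statistical processing", and the function name `most_probable_virtual_distance` is hypothetical.

```python
from statistics import median


def most_probable_virtual_distance(depths, is_edge, in_focus):
    """Return the median depth over pixels that are in focus and not edges.

    depths, is_edge and in_focus are parallel per-pixel sequences.
    The median is an assumed choice of statistic; the source only
    says that statistical processing is applied.
    """
    kept = [d for d, e, f in zip(depths, is_edge, in_focus) if f and not e]
    if not kept:
        raise ValueError("no in-focus, non-edge pixels available")
    return median(kept)


# Illustrative values only (not taken from the source).
depths = [2.0, 2.1, 9.5, 1.9, 2.2, 0.3]
is_edge = [False, False, True, False, False, True]
in_focus = [True, True, True, True, False, True]
estimate = most_probable_virtual_distance(depths, is_edge, in_focus)
```

Excluding the edge pixels outright corresponds to the first option in the text; the alternative of giving edge pixels a smaller weight would replace the filter with a weighted statistic.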

(Modification 2 of the first embodiment)
Next, Modification 2 of the first embodiment will be described. This modification differs from the embodiment described above in that the depth values of the pixels determined to be in focus by the focus area detection unit 103 are weighted.

The distance calculation unit 104 weights the distance corresponding to the depth value of each pixel included in the in-focus area according to the degree of focus. The distance calculation unit 104 sets the weights so that the multiplier is larger when the pixel is more sharply in focus (the degree of focus is higher) and smaller when the pixel is less sharply in focus (the degree of focus is lower). The distance calculation unit 104 calculates the virtual distance for each pixel of the projected image by associating the weighted distances with the corresponding points of the projected image.

The distance calculation unit 104 may also weight the distance corresponding to the depth value of each pixel included in the in-focus area according to the magnitude of that distance. The distance calculation unit 104 sets the weights so that the multiplier is larger when the distance is small (close to the viewpoint position of the camera) and smaller when the distance is large (far from the viewpoint position of the camera). The distance calculation unit 104 calculates the virtual distance for each pixel of the projected image by associating the weighted distances with the corresponding points of the projected image.

As described above, in the three-dimensional shape model generation device 1 according to Modification 2 of the first embodiment, the focus area detection unit 103 detects the degree of focus for each pixel, and the distance calculation unit 104 weights the distance corresponding to the depth value of each pixel included in the in-focus area according to the degree of focus. The degree of focus can thereby be reflected in the calculation of the virtual distance, so that the virtual distance can be calculated more accurately.

Further, in the three-dimensional shape model generation device 1 according to Modification 2 of the first embodiment, the distance calculation unit 104 weights the distance corresponding to the depth value of each pixel included in the in-focus area according to the magnitude of that distance. In general, the error contained in the distance u to the subject tends to increase with distance from the viewpoint position of the camera. Weighting according to the distance from the viewpoint position therefore makes it possible to calculate the virtual distance more accurately.
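The two weighting schemes of this modification can be combined in a short sketch. The specific weight used here (degree of focus divided by distance) and the normalized weighted mean are assumptions chosen for illustration; the source only requires that the weight grow with the degree of focus and shrink with distance. The name `weighted_virtual_distance` is hypothetical.

```python
def weighted_virtual_distance(distances, focus_degrees):
    """Weighted mean of per-pixel distances.

    Each weight is focus_degree / distance, so it is larger for
    sharper pixels and for pixels closer to the camera viewpoint.
    This particular weight formula is an assumption.
    """
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    weights = [f / d for d, f in zip(distances, focus_degrees)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * d for w, d in zip(weights, distances)) / total


# Illustrative values only: two equally sharp pixels at different distances.
result = weighted_virtual_distance([2.0, 4.0], [1.0, 1.0])
```

With equal degrees of focus, the nearer pixel dominates, so the result falls below the plain mean of the two distances.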

(Modification 3 of the first embodiment)
Next, Modification 3 of the first embodiment will be described. This modification differs from the embodiment described above in that the virtual distance of the projected image is calculated taking into account the variation in the depth values of the pixels of the multi-viewpoint images TG that correspond to a single pixel of the projected image.

For example, for each pixel of the projected image, the distance calculation unit 104 extracts the pixels of the multi-viewpoint images TG that correspond to that pixel. When a plurality of pixels of the multi-viewpoint images TG correspond to one pixel of the projected image, the distance calculation unit 104 calculates the degree of variation in the depth values of those pixels. Any statistical measure, such as the variance, may be used to calculate the degree of variation. When the degree of variation in the depth values of the pixels of the multi-viewpoint images TG is larger than a predetermined threshold, the distance calculation unit 104 does not use the virtual distance of that pixel of the projected image in calculating the virtual distance of the projected image (the true value of the virtual distance). When the degree of variation in the depth values is smaller than the predetermined threshold, the distance calculation unit 104 uses the virtual distance of that pixel of the projected image in calculating the virtual distance of the projected image (the true value of the virtual distance).

As described above, in the three-dimensional shape model generation device 1 according to Modification 3 of the first embodiment, the distance calculation unit 104 determines whether or not to use the virtual distance of a pixel of the projected image in calculating the virtual distance of the projected image, according to the degree of variation in the depth values of the pixels of the multi-viewpoint images TG that correspond to that pixel. Pixels whose depth values vary widely can thus be excluded from the calculation, so that the virtual distance of the projected image can be calculated more accurately.
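A minimal sketch of this variance-based selection follows. The population variance is used because the text names variance as one admissible measure; treating a value exactly equal to the threshold as usable, and taking the per-pixel median as the retained virtual distance, are assumptions. The name `select_consistent_pixels` is hypothetical.

```python
from statistics import median, pvariance


def select_consistent_pixels(candidates, threshold):
    """Keep projected-image pixels whose depth values agree across views.

    candidates maps a projected-image pixel to the list of depth values
    gathered from the corresponding multi-viewpoint image pixels.
    Pixels whose variance exceeds the threshold are dropped.
    """
    kept = {}
    for pixel, depths in candidates.items():
        if not depths:
            continue  # nothing observed for this pixel
        if len(depths) > 1 and pvariance(depths) > threshold:
            continue  # views disagree too much; exclude this pixel
        kept[pixel] = median(depths)
    return kept


# Illustrative values only (not taken from the source).
candidates = {
    (0, 0): [2.0, 2.1, 1.9],  # consistent across views
    (0, 1): [2.0, 5.0],       # inconsistent, expected to be dropped
    (1, 0): [2.2],            # single observation, kept as-is
}
kept = select_consistent_pixels(candidates, threshold=0.05)
```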

(Second Embodiment)
Next, the second embodiment will be described. In the following description, only the parts that differ from the embodiment described above are explained; the same parts are given the same reference signs and their description is omitted.

The present embodiment differs from the embodiment described above in that the real distance used when estimating the scale value SC is obtained by calibration. Obtaining the real distance by calibration yields a more accurate real distance, which in turn allows the actual dimensions of the three-dimensional shape model M to be estimated even more accurately.

FIG. 7 is a block diagram showing an example of the configuration of a three-dimensional shape model generation device 1A according to the second embodiment. The three-dimensional shape model generation device 1A includes a marker scale estimation unit 110.
The marker scale estimation unit 110 estimates the scale value SC in the projected image of a marker three-dimensional shape model MM. The marker three-dimensional shape model MM is a three-dimensional shape model of the object T to which a marker is attached. The marker is a mark whose actual dimensions are known.

The marker scale estimation unit 110 estimates the scale value SC in the projected image of the marker three-dimensional shape model MM (see FIG. 8) by processing similar to that performed by the scale estimation unit 105 in the first embodiment. In the marker multi-viewpoint images MTG, however, the actual dimensions can be determined using the marker captured in the images as a reference. Based on these actual dimensions, the real distance is obtained instead of the virtual distance.

Specifically, the image data acquisition unit 101 acquires the marker multi-viewpoint images MTG, which are multi-viewpoint images of the object T to which the marker is attached.
The three-dimensional shape generation unit 102 generates the marker three-dimensional shape model MM using the marker multi-viewpoint images MTG.
The focus area detection unit 103 generates a blur map for the marker multi-viewpoint images MTG.
The distance calculation unit 104 calculates, for the in-focus area of the marker multi-viewpoint images MTG, the real distance of each pixel included in that area.
The distance calculation unit 104 calculates the real distance of each pixel of the projected image of the marker three-dimensional shape model MM by associating the per-pixel real distances calculated above with the pixels of the projected image.
The distance calculation unit 104 calculates the most probable real distance in the projected image (the true value of the real distance) by performing statistical processing on the variation in the real distances of the pixels of the projected image.
The marker scale estimation unit 110 estimates the scale value SC using the real distance (the true value of the real distance) calculated by the distance calculation unit 104 and the virtual distance (the true value of the virtual distance) calculated by the distance calculation unit 104 in the first embodiment.

As described above, the three-dimensional shape model generation device 1A of the second embodiment includes the marker scale estimation unit 110. The marker scale estimation unit 110 derives the real distance based on a corrected three-dimensional shape model obtained by scale-correcting the three-dimensional shape model M of a marker whose actual dimensions are known, based on those actual dimensions. The scale estimation unit 105 estimates the scale value SC using the real distance derived by the marker scale estimation unit 110. This makes it possible to estimate the scale value SC using a more accurate real distance.
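The relationship between the real distance, the virtual distance, and the scale value SC can be illustrated with a small sketch. It assumes, following the claims, that SC is the ratio of the real distance to the virtual distance and that a length measured on the model is converted by multiplying by SC. The function names and the numeric values are hypothetical.

```python
def estimate_scale(real_distance, virtual_distance):
    """Scale value SC: ratio of the real distance to the virtual distance."""
    if virtual_distance <= 0:
        raise ValueError("virtual distance must be positive")
    return real_distance / virtual_distance


def to_real(virtual_length, scale):
    """Convert a length measured in model units into real units."""
    return virtual_length * scale


# Illustrative values only: a real distance of 500 (e.g. millimetres,
# obtained via the marker) against a virtual distance of 2.5 model units.
sc = estimate_scale(real_distance=500.0, virtual_distance=2.5)
real_width = to_real(0.8, sc)  # a model width of 0.8 units in real units
```

Because a model reconstructed from multi-viewpoint images is determined only up to a uniform scale, a single ratio of this kind is enough to convert any length on the model.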

(Modification 1 of the second embodiment)
Next, Modification 1 of the second embodiment will be described. In this modification, the content of the training data set used to train the trained model differs from that of the embodiment described above.

In this modification, the input of the training data set is a corrected projected image. The corrected projected image is an image obtained by projecting a corrected marker three-dimensional shape model MM onto a two-dimensional plane. The corrected marker three-dimensional shape model MM is a model obtained by enlarging or reducing the marker three-dimensional shape model MM, which is generated from the marker multi-viewpoint images MTG (multi-viewpoint images of the object T to which the marker is attached), according to the actual size of the marker, so that the model is corrected to the actual dimensions of the object T to which the marker is attached.

The output of the training data set is information indicating the in-focus area of the corrected projected image, determined based on the depth value of each pixel of the corrected projected image.

As described above, in the three-dimensional shape model generation device 1A according to Modification 1 of the second embodiment, the input of the training data set is a corrected projected image obtained by projecting onto a two-dimensional plane a corrected three-dimensional shape model, that is, a three-dimensional shape model of a marker whose actual dimensions are known, scale-corrected based on the actual dimensions of the marker. The output of the training data set is information indicating the in-focus area of the projected image, determined based on the depth value of each pixel of the corrected projected image. The three-dimensional shape model generation device 1A according to Modification 1 of the second embodiment can thereby train the model on whether or not each pixel is in focus, according to the corrected projected image (that is, a projected image carrying actual dimensional information) and the depth value of each of its pixels (that is, the actual distance to the subject). The trained model can thus serve as a model that infers whether or not a pixel is in focus according to the depth of field. Because the trained model infers focus according to the depth of field, the virtual distance can be kept within the range of the depth of field and the error contained in the virtual distance can be reduced.
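One way to picture how such training labels might be produced is sketched below. It assumes that the scale correction is a uniform factor given by the ratio of the marker's known size to its size in the uncorrected model, and that a pixel is labelled in focus when its corrected depth lies between a near and a far depth-of-field limit. The limits, the function names, and all numeric values are hypothetical; the source does not specify how the labels are computed from the depth values.

```python
def correct_depths(model_depths, marker_size_real, marker_size_model):
    """Rescale per-pixel model depths to real units using the marker."""
    if marker_size_model <= 0:
        raise ValueError("marker size in the model must be positive")
    factor = marker_size_real / marker_size_model
    return [d * factor for d in model_depths]


def focus_labels(real_depths, near_limit, far_limit):
    """Label a pixel in focus when its depth lies within [near, far]."""
    return [near_limit <= d <= far_limit for d in real_depths]


# Illustrative values only: a 50 mm marker measures 0.25 model units.
real_depths = correct_depths(
    [1.0, 2.0, 4.0], marker_size_real=50.0, marker_size_model=0.25
)
labels = focus_labels(real_depths, near_limit=350.0, far_limit=450.0)
```

Each corrected projected image paired with labels of this kind would form one input/output example of the training data set.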

All or part of the three-dimensional shape model generation device 1 in the embodiments described above may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into and executed by a computer system. The "computer system" referred to here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. The "computer-readable recording medium" may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period, such as the volatile memory inside a computer system serving as the server or client in that case. The program may realize only some of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA.

Embodiments of the present invention have been described in detail above with reference to the drawings, but the specific configuration is not limited to these embodiments and includes designs and the like within a scope that does not depart from the gist of the present invention.

1… Three-dimensional shape model generation device
101… Image data acquisition unit
102… Three-dimensional shape generation unit
103… Focus area detection unit
104… Distance calculation unit
105… Scale estimation unit
106… Scale conversion unit
107… Image data storage unit
108… Three-dimensional shape storage unit
109… Scale information storage unit
110… Marker scale estimation unit

Claims

A three-dimensional shape model generation device comprising:
a three-dimensional shape generation unit that generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by capturing the object from mutually different viewpoints;
a focus area detection unit that detects an in-focus area in each of the multi-viewpoint images;
a distance calculation unit that calculates, for the pixels in the in-focus area of each of the multi-viewpoint images, a virtual distance, which is the virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image;
a scale estimation unit that estimates a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and
a scale conversion unit that converts the virtual distance into the real distance using the scale value.

The three-dimensional shape model generation device according to claim 1, wherein
the focus area detection unit detects the in-focus area in each of the multi-viewpoint images according to the area brought into focus using the focus function of the camera that captured the multi-viewpoint images.

The three-dimensional shape model generation device according to claim 1, wherein
the focus area detection unit detects the in-focus area in each of the multi-viewpoint images by image processing.

The three-dimensional shape model generation device according to claim 1, wherein
the focus area detection unit detects the in-focus area in each of the multi-viewpoint images using a trained model generated by machine learning on a training data set in which inputs and outputs are associated with each other,
the input of the training data set is an input image for training, and
the output of the training data set is information indicating the in-focus area in the input image.

The three-dimensional shape model generation device according to claim 4, wherein
the input of the training data set is each of the multi-viewpoint images, and
the output of the training data set is information indicating the in-focus area in a corrected projected image, determined based on the depth value of each pixel of the corrected projected image, the corrected projected image being obtained by projecting onto a two-dimensional plane a corrected three-dimensional shape model in which a three-dimensional shape model of a marker whose actual dimensions are known is scale-corrected based on the actual dimensions of the marker.

The three-dimensional shape model generation device according to any one of claims 1 to 5, wherein
the distance calculation unit determines whether or not to use the virtual distance at a corresponding pixel of the projected image in calculating the virtual distance of the projected image, based on at least one of the result of determining whether or not a pixel determined to be in focus by the focus area detection unit is an edge, and the result of determining whether or not the pixel of the projected image, obtained by projecting the three-dimensional shape model onto the two-dimensional plane, that corresponds to the pixel determined to be in focus by the focus area detection unit is an edge.

The three-dimensional shape model generation device according to any one of claims 1 to 6, wherein
the scale estimation unit derives the real distance based on the depth of field calculated from the camera parameters of the camera that captured the multi-viewpoint images.

The three-dimensional shape model generation device according to any one of claims 1 to 6, further comprising
a marker scale estimation unit that derives the real distance based on a corrected three-dimensional shape model obtained by scale-correcting a three-dimensional shape model of a marker whose actual dimensions are known, based on those actual dimensions, wherein
the scale estimation unit estimates the scale value using the real distance derived by the marker scale estimation unit.

The three-dimensional shape model generation device according to claim 8, wherein
the marker scale estimation unit derives the real distance based on the depth values of the pixels determined to be in focus among the pixels of a corrected projected image obtained by projecting the corrected three-dimensional shape model onto a two-dimensional plane.

The three-dimensional shape model generation device according to any one of claims 1 to 6, wherein
the scale estimation unit derives the real distance based on the distance obtained from the focus function of the camera that captured the multi-viewpoint images.

The three-dimensional shape model generation device according to any one of claims 1 to 10, wherein
the distance calculation unit calculates the virtual distance of each pixel of the projected image by associating the distance corresponding to the depth value of each pixel included in the detected in-focus area with the corresponding pixel of the projected image.

The three-dimensional shape model generation device according to claim 11, wherein
the focus area detection unit detects the degree of focus for each pixel, and
the distance calculation unit weights the distance corresponding to the depth value of each pixel included in the detected in-focus area according to the degree of focus, and calculates the virtual distance for each pixel of the projected image by associating the weighted distances with the pixels of the projected image.

The three-dimensional shape model generation device according to claim 11 or 12, wherein
the distance calculation unit weights the distance corresponding to the depth value of each pixel included in the detected in-focus area according to the magnitude of that distance, and calculates the virtual distance for each pixel of the projected image by associating the weighted distances with the pixels of the projected image.

The three-dimensional shape model generation device according to any one of claims 1 to 13, wherein
the distance calculation unit calculates the distance corresponding to the depth value of each pixel in the in-focus area of each of the multi-viewpoint images, associates the calculated distances with the pixels of the projected image, and, when a plurality of distances are associated with a pixel of the projected image, compares the plurality of distances and determines, according to the degree of variation among them, whether or not to use the virtual distance at that pixel of the projected image in calculating the virtual distance of the projected image.

A three-dimensional shape model generation method comprising:
a three-dimensional shape generation step in which a three-dimensional shape generation unit generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by capturing the object from mutually different viewpoints;
a focus area detection step in which a focus area detection unit detects an in-focus area in each of the multi-viewpoint images;
a distance calculation step in which a distance calculation unit calculates, for the pixels in the in-focus area of each of the multi-viewpoint images, a virtual distance, which is the virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image;
a scale estimation step in which a scale estimation unit estimates a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and
a scale conversion step in which a scale conversion unit converts the virtual distance into the real distance using the scale value.

A program for causing a computer to function as:
three-dimensional shape generation means for generating a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by capturing the object from mutually different viewpoints;
focus area detection means for detecting an in-focus area in each of the multi-viewpoint images;
distance calculation means for calculating, for the pixels in the in-focus area of each of the multi-viewpoint images, a virtual distance, which is the virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image;
scale estimation means for estimating a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and
scale conversion means for converting the virtual distance into the real distance using the scale value.