JP7334516B2

JP7334516B2 - Three-dimensional shape model generation device, three-dimensional shape model generation method, and program

Info

Publication number: JP7334516B2
Application number: JP2019131259A
Authority: JP
Inventors: 隆史渡邉; 修二酒井
Original assignee: Toppan Inc
Current assignee: Toppan Inc
Priority date: 2019-07-16
Filing date: 2019-07-16
Publication date: 2023-08-29
Anticipated expiration: 2039-07-16
Also published as: JP2021015559A

Description

The present invention relates to a three-dimensional shape model generation device, a three-dimensional shape model generation method, and a program.

Conventionally, there is a three-dimensional reconstruction method that generates a three-dimensional shape model of an object from a plurality of images (multi-viewpoint images) obtained by imaging the object from mutually different viewpoints. In this method, because the object appears differently in each of the multi-viewpoint images, the three-dimensional shape of the object can be created (reconstructed) by calculating the depth value of each pixel in the images using the principle of a stereo camera. The three-dimensional reconstruction method can create a three-dimensional shape, but it cannot determine the actual size (scale) of the object, because the actual size of an object cannot be determined merely from the object appearing in an image.
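The scale ambiguity described above can be illustrated with a minimal sketch. It assumes an idealized pinhole camera with no rotation, looking along +Z; the function names `project` and `scale_scene` and the sample coordinates are hypothetical and are not taken from the patent. Scaling the scene and the camera position by the same factor leaves every image coordinate unchanged, so the images alone cannot reveal the true size.

```python
def project(point, camera_center, focal_length=1.0):
    """Pinhole projection for a camera at camera_center looking along +Z (no rotation)."""
    x, y, z = (p - c for p, c in zip(point, camera_center))
    return (focal_length * x / z, focal_length * y / z)


def scale_scene(points, camera_center, k):
    """Scale all 3D points and the camera center by the same factor k."""
    scaled_points = [tuple(k * v for v in p) for p in points]
    scaled_center = tuple(k * v for v in camera_center)
    return scaled_points, scaled_center


# Hypothetical scene: three points seen by one camera.
points = [(0.0, 0.0, 5.0), (1.0, -0.5, 4.0), (-2.0, 1.0, 8.0)]
camera = (0.5, 0.0, 0.0)

original = [project(p, camera) for p in points]
big_points, big_camera = scale_scene(points, camera, 10.0)
scaled = [project(p, big_camera) for p in big_points]

# Largest difference between the two sets of image coordinates.
max_diff = max(abs(a - b) for o, s in zip(original, scaled) for a, b in zip(o, s))
print(max_diff)
```

The printed difference is zero up to floating-point rounding, which is why an additional cue such as a marker or focus information is needed to fix the scale.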

One method of estimating the scale of an object from images uses a marker (for example, Patent Document 1). Patent Document 1 discloses a technique for estimating the actual size of an object using an image in which a marker of known actual dimensions is captured together with the object.

There is also a method of estimating the scale of an object using the depth of field of an image (for example, Patent Document 2). The depth of field is the range of actual distances from the camera to the object over which the object is perceived to be in focus. Patent Document 2 discloses a technique that uses the depth-from-defocus method to acquire a plurality of images with different in-focus positions and calculates the actual distance from the camera to the object by computing correlation values of focus among the acquired images.

Patent Document 1: JP 2018-57532 A
Patent Document 2: Japanese Patent No. 5932476

However, when a plurality of images showing the marker and the object together are captured from mutually different viewpoints, the object may be hidden behind the marker. If part of the object is hidden behind the marker, it becomes difficult to create the three-dimensional shape of that part accurately.
In addition, the depth of field spans a range of distances. A distance estimated from the depth of field therefore contains error, so there has been a problem that the distance cannot be estimated accurately.

The present invention has been made in view of such circumstances, and provides a three-dimensional shape model generation device, a three-dimensional shape model generation method, and a program capable of accurately estimating the actual dimensions of a three-dimensional shape created by a three-dimensional reconstruction method using multi-viewpoint images.

The three-dimensional shape model generation device of the present invention comprises: a three-dimensional shape generation unit that generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; a focus area detection unit that detects an in-focus region in each of the multi-viewpoint images; a distance calculation unit that calculates, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; a scale estimation unit that estimates a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and a scale conversion unit that converts the virtual distance into the real distance using the scale value, wherein the focus area detection unit detects the in-focus region in each of the multi-viewpoint images using a trained model generated by performing machine learning with a training dataset in which inputs and outputs are associated, the input of the training dataset is an input image for training, and the output of the training dataset is information indicating the in-focus region in the input image.
The three-dimensional shape model generation device of the present invention comprises: a three-dimensional shape generation unit that generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; a focus area detection unit that detects an in-focus region in each of the multi-viewpoint images; a distance calculation unit that calculates, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; a scale estimation unit that estimates a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and a scale conversion unit that converts the virtual distance into the real distance using the scale value, wherein the distance calculation unit determines whether to use the virtual distance at a corresponding pixel of the projected image in calculating the virtual distance of the projected image, based on at least one of two determination results: whether a pixel determined to be in focus by the focus area detection unit is an edge, and whether the corresponding pixel, that is, the pixel of the projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane that corresponds to the pixel determined to be in focus by the focus area detection unit, is an edge.
The three-dimensional shape model generation device of the present invention comprises: a three-dimensional shape generation unit that generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; a focus area detection unit that detects an in-focus region in each of the multi-viewpoint images; a distance calculation unit that calculates, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; a scale estimation unit that estimates a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and a scale conversion unit that converts the virtual distance into the real distance using the scale value, wherein the distance calculation unit calculates the virtual distance for each pixel of the projected image by associating, with each pixel of the projected image, a distance corresponding to the depth value of a pixel included in the detected in-focus region, the focus area detection unit detects a degree of focus for each pixel, and the distance calculation unit weights the distance corresponding to the depth value of a pixel included in the detected in-focus region according to the degree of focus and calculates the virtual distance for each pixel of the projected image by associating the weighted distance with each pixel of the projected image.
The three-dimensional shape model generation device of the present invention comprises: a three-dimensional shape generation unit that generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; a focus area detection unit that detects an in-focus region in each of the multi-viewpoint images; a distance calculation unit that calculates, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; a scale estimation unit that estimates a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and a scale conversion unit that converts the virtual distance into the real distance using the scale value, wherein the distance calculation unit calculates the virtual distance for each pixel of the projected image by associating, with each pixel of the projected image, a distance corresponding to the depth value of a pixel included in the detected in-focus region, and further weights the distance corresponding to the depth value of a pixel included in the detected in-focus region according to the magnitude of that distance and calculates the virtual distance for each pixel of the projected image by associating the weighted distance with each pixel of the projected image.
The three-dimensional shape model generation device of the present invention comprises: a three-dimensional shape generation unit that generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; a focus area detection unit that detects an in-focus region in each of the multi-viewpoint images; a distance calculation unit that calculates, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; a scale estimation unit that estimates a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and a scale conversion unit that converts the virtual distance into the real distance using the scale value, wherein the distance calculation unit calculates a distance corresponding to the depth value of each pixel of the in-focus region in each of the multi-viewpoint images, associates the calculated distance with a pixel of the projected image, and, when a plurality of distances correspond to a pixel of the projected image, compares the plurality of distances and determines, according to the degree of variation among them, whether to use the virtual distance at that pixel of the projected image in calculating the virtual distance of the projected image.
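The relationship among the virtual distance, the scale value, and the real distance can be sketched as follows. This is only an illustration under stated assumptions: the text defines the scale value as the ratio of the real distance to the virtual distance, but it does not specify how weighted distances are combined, so the weighted mean used here, the function names, and the sample numbers are all hypothetical.

```python
def weighted_virtual_distance(distances, focus_degrees):
    """Combine per-pixel virtual distances using focus degrees as weights.

    Assumption: a weighted mean; the source text only says the distances
    are weighted according to the degree of focus.
    """
    total = sum(focus_degrees)
    if total <= 0:
        raise ValueError("focus degrees must sum to a positive value")
    return sum(d * w for d, w in zip(distances, focus_degrees)) / total


def estimate_scale(real_distance, virtual_distance):
    """Scale value: ratio of the real distance to the virtual distance."""
    return real_distance / virtual_distance


def to_real(virtual_distance, scale):
    """Convert a virtual distance into a real distance using the scale value."""
    return virtual_distance * scale


# Hypothetical in-focus pixels: virtual distances (model units) and focus degrees.
virtual = weighted_virtual_distance([2.0, 2.2, 1.8], [0.5, 0.25, 0.25])

# Hypothetical real distance to the in-focus region, in meters.
scale = estimate_scale(1.5, virtual)

# Any other virtual distance in the model can now be converted.
real = to_real(4.0, scale)
print(virtual, scale, real)
```

Because a single scale value applies to the whole model, estimating it from the in-focus pixels is enough to convert every other distance in the model.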

In the three-dimensional shape model generation device of the present invention, the input of the training dataset is each of the multi-viewpoint images, and the output of the training dataset is information indicating the in-focus region in a corrected projected image, determined based on the depth value of each pixel in the corrected projected image. The corrected projected image is obtained by projecting onto a two-dimensional plane a corrected three-dimensional shape model, in which the three-dimensional shape model of a marker whose actual dimensions are known has been scale-corrected based on the actual dimensions of the marker.

In the three-dimensional shape model generation device of the present invention, the scale estimation unit derives the real distance based on the depth of field calculated from the camera parameters of the camera that captured the multi-viewpoint images.
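As a rough illustration of how a depth of field can be computed from camera parameters, the sketch below uses the standard thin-lens approximations for the hyperfocal distance and the near and far limits. The text here does not state which formula the device uses, so this choice is an assumption, as are the function name `depth_of_field` and the sample values for focal length, f-number, circle of confusion, and focus distance.

```python
import math


def depth_of_field(focal_length_mm, f_number, coc_mm, focus_distance_mm):
    """Return (near_limit_mm, far_limit_mm) under the thin-lens approximation.

    Assumption: standard photographic formulas, not taken from the patent.
    """
    f = focal_length_mm
    s = focus_distance_mm
    # Hyperfocal distance.
    hyperfocal = f * f / (f_number * coc_mm) + f
    near = s * (hyperfocal - f) / (hyperfocal + s - 2 * f)
    # Beyond the hyperfocal distance everything out to infinity is acceptably sharp.
    far = s * (hyperfocal - f) / (hyperfocal - s) if s < hyperfocal else math.inf
    return near, far


# Hypothetical camera: 50 mm lens at f/2.8, 0.03 mm circle of confusion, focused at 2 m.
near_mm, far_mm = depth_of_field(50.0, 2.8, 0.03, 2000.0)
print(near_mm, far_mm)
```

The in-focus range has a finite width, so a real distance derived from it is only known to within that range.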

The three-dimensional shape model generation device of the present invention further comprises a marker scale estimation unit that derives the real distance based on a corrected three-dimensional shape model, which is obtained by scale-correcting the three-dimensional shape model of a marker whose actual dimensions are known based on those actual dimensions, and the scale estimation unit estimates the scale value using the real distance derived by the marker scale estimation unit.
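Scale correction with a marker of known size can be sketched as below. The sketch assumes that the marker's known dimension is the length of one edge between two model vertices; the function names `marker_scale` and `apply_scale` and all coordinates are hypothetical illustrations, not details from the patent.

```python
import math


def marker_scale(model_a, model_b, real_length):
    """Scale factor that maps the model-space marker edge to its known real length."""
    return real_length / math.dist(model_a, model_b)


def apply_scale(vertices, scale):
    """Scale-correct a model by multiplying every vertex coordinate by the factor."""
    return [tuple(scale * v for v in vertex) for vertex in vertices]


# Hypothetical marker edge in model units: length 5.0 (a 3-4-5 triangle).
edge_start, edge_end = (0.0, 0.0, 0.0), (3.0, 4.0, 0.0)

# Known real edge length of the marker: 0.10 m.
factor = marker_scale(edge_start, edge_end, 0.10)

# Apply the correction to some model vertices.
corrected = apply_scale([(10.0, 0.0, 0.0), (0.0, 5.0, 2.5)], factor)
print(factor, corrected)
```

Distances measured in the corrected model are then in real units, so they can serve as the real distance used for estimating the scale value.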

In the three-dimensional shape model generation device of the present invention, the marker scale estimation unit derives the real distance based on the depth values of those pixels that are determined to be in focus among the pixels of a corrected projected image obtained by projecting the corrected three-dimensional shape model onto a two-dimensional plane.

In the three-dimensional shape model generation device of the present invention, the scale estimation unit derives the real distance based on a distance obtained from the focus function of the camera that captured the multi-viewpoint images.

The three-dimensional shape model generation method of the present invention is a three-dimensional shape model generation method performed by a computer, including: a three-dimensional shape generation step in which a three-dimensional shape generation unit generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; a focus area detection step in which a focus area detection unit detects an in-focus region in each of the multi-viewpoint images; a distance calculation step in which a distance calculation unit calculates, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; a scale estimation step in which a scale estimation unit estimates a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and a scale conversion step in which a scale conversion unit converts the virtual distance into the real distance using the scale value, wherein, in the focus area detection step, the in-focus region in each of the multi-viewpoint images is detected using a trained model generated by performing machine learning with a training dataset in which inputs and outputs are associated, the input of the training dataset is an input image for training, and the output of the training dataset is information indicating the in-focus region in the input image.
The three-dimensional shape model generation method of the present invention is a three-dimensional shape model generation method performed by a computer, including: a three-dimensional shape generation step in which a three-dimensional shape generation unit generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; a focus area detection step in which a focus area detection unit detects an in-focus region in each of the multi-viewpoint images; a distance calculation step in which a distance calculation unit calculates, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; a scale estimation step in which a scale estimation unit estimates a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and a scale conversion step in which a scale conversion unit converts the virtual distance into the real distance using the scale value, wherein, in the distance calculation step, whether to use the virtual distance at a corresponding pixel of the projected image in calculating the virtual distance of the projected image is determined based on at least one of two determination results: whether a pixel determined to be in focus in the focus area detection step is an edge, and whether the corresponding pixel, that is, the pixel of the projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane that corresponds to the pixel determined to be in focus in the focus area detection step, is an edge.
The three-dimensional shape model generation method of the present invention is a three-dimensional shape model generation method performed by a computer, including: a three-dimensional shape generation step in which a three-dimensional shape generation unit generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; a focus area detection step in which a focus area detection unit detects an in-focus region in each of the multi-viewpoint images; a distance calculation step in which a distance calculation unit calculates, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; a scale estimation step in which a scale estimation unit estimates a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and a scale conversion step in which a scale conversion unit converts the virtual distance into the real distance using the scale value, wherein, in the distance calculation step, the virtual distance for each pixel of the projected image is calculated by associating, with each pixel of the projected image, a distance corresponding to the depth value of a pixel included in the detected in-focus region; in the focus area detection step, a degree of focus is detected for each pixel; and, in the distance calculation step, the distance corresponding to the depth value of a pixel included in the detected in-focus region is weighted according to the degree of focus, and the virtual distance for each pixel of the projected image is calculated by associating the weighted distance with each pixel of the projected image.
The three-dimensional shape model generation method of the present invention is a three-dimensional shape model generation method performed by a computer, including: a three-dimensional shape generation step in which a three-dimensional shape generation unit generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; a focus area detection step in which a focus area detection unit detects an in-focus region in each of the multi-viewpoint images; a distance calculation step in which a distance calculation unit calculates, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; a scale estimation step in which a scale estimation unit estimates a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and a scale conversion step in which a scale conversion unit converts the virtual distance into the real distance using the scale value, wherein, in the distance calculation step, the virtual distance for each pixel of the projected image is calculated by associating, with each pixel of the projected image, a distance corresponding to the depth value of a pixel included in the detected in-focus region, and the distance corresponding to the depth value of a pixel included in the detected in-focus region is weighted according to the magnitude of that distance, and the virtual distance for each pixel of the projected image is calculated by associating the weighted distance with each pixel of the projected image.
The three-dimensional shape model generation method of the present invention is a three-dimensional shape model generation method performed by a computer, including: a three-dimensional shape generation step in which a three-dimensional shape generation unit generates a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; a focus area detection step in which a focus area detection unit detects an in-focus region in each of the multi-viewpoint images; a distance calculation step in which a distance calculation unit calculates, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; a scale estimation step in which a scale estimation unit estimates a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and a scale conversion step in which a scale conversion unit converts the virtual distance into the real distance using the scale value, wherein, in the distance calculation step, a distance corresponding to the depth value of each pixel of the in-focus region in each of the multi-viewpoint images is calculated, the calculated distance is associated with a pixel of the projected image, and, when a plurality of distances correspond to a pixel of the projected image, the plurality of distances are compared and, according to the degree of variation among them, it is determined whether to use the virtual distance at that pixel of the projected image in calculating the virtual distance of the projected image.
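The variation-based selection in the last aspect above can be sketched as follows. The text says only that the decision depends on the degree of variation among the distances, so the concrete measure used here (population standard deviation divided by the mean), the threshold of 0.05, and the name `select_consistent_pixels` are assumptions made for illustration.

```python
from statistics import mean, pstdev


def select_consistent_pixels(candidates, max_relative_spread=0.05):
    """Keep projected-image pixels whose candidate distances agree with each other.

    candidates maps a pixel (row, col) to the list of distances associated
    with it from different multi-viewpoint images. Pixels with a single
    candidate are kept as they are (an assumption of this sketch).
    """
    selected = {}
    for pixel, distances in candidates.items():
        center = mean(distances)
        spread = pstdev(distances) / center if len(distances) > 1 else 0.0
        if spread <= max_relative_spread:
            selected[pixel] = center
    return selected


# Hypothetical candidate distances per projected pixel (model units).
candidates = {
    (0, 0): [2.0, 2.02, 1.98],  # consistent across views: kept
    (0, 1): [2.0, 3.5],         # views disagree strongly: rejected
    (1, 1): [1.5],              # single observation: kept
}

selected = select_consistent_pixels(candidates)
print(selected)
```

Rejecting pixels whose distances disagree keeps unreliable correspondences, for example around occlusions, from distorting the virtual distance used for scale estimation.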

The program of the present invention causes a computer to operate as: three-dimensional shape generation means for generating a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; focus area detection means for detecting an in-focus region in each of the multi-viewpoint images; distance calculation means for calculating, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; scale estimation means for estimating a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and scale conversion means for converting the virtual distance into the real distance using the scale value, wherein the distance calculation means calculates the virtual distance for each pixel of the projected image by associating, with each pixel of the projected image, a distance corresponding to the depth value of a pixel included in the detected in-focus region, and further weights the distance corresponding to the depth value of a pixel included in the detected in-focus region according to the magnitude of that distance and calculates the virtual distance for each pixel of the projected image by associating the weighted distance with each pixel of the projected image.
The program of the present invention causes a computer to operate as: three-dimensional shape generation means for generating a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; focus area detection means for detecting an in-focus region in each of the multi-viewpoint images; distance calculation means for calculating, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; scale estimation means for estimating a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and scale conversion means for converting the virtual distance into the real distance using the scale value, wherein the distance calculation means determines whether to use the virtual distance at a corresponding pixel of the projected image in calculating the virtual distance of the projected image, based on at least one of two determination results: whether a pixel determined to be in focus by the focus area detection means is an edge, and whether the corresponding pixel, that is, the pixel of the projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane that corresponds to the pixel determined to be in focus by the focus area detection means, is an edge.
The program of the present invention causes a computer to operate as: three-dimensional shape generation means for generating a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; focus area detection means for detecting an in-focus region in each of the multi-viewpoint images; distance calculation means for calculating, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; scale estimation means for estimating a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and scale conversion means for converting the virtual distance into the real distance using the scale value, wherein the distance calculation means calculates the virtual distance for each pixel of the projected image by associating, with each pixel of the projected image, a distance corresponding to the depth value of a pixel included in the detected in-focus region, the focus area detection means detects a degree of focus for each pixel, and the distance calculation means weights the distance corresponding to the depth value of a pixel included in the detected in-focus region according to the degree of focus and calculates the virtual distance for each pixel of the projected image by associating the weighted distance with each pixel of the projected image.
The program of the present invention causes a computer to operate as: three-dimensional shape generation means for generating a three-dimensional shape model of an object from a plurality of multi-viewpoint images obtained by imaging the object from mutually different viewpoints; focus area detection means for detecting an in-focus region in each of the multi-viewpoint images; distance calculation means for calculating, for the pixels of the in-focus region in each of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projected image; scale estimation means for estimating a scale value indicating the ratio, to the virtual distance, of the real distance, which is the actual distance between the projected image and the viewpoint position of the projected image; and scale conversion means for converting the virtual distance into the real distance using the scale value, wherein the distance calculation means calculates a distance corresponding to the depth value of each pixel of the in-focus region in each of the multi-viewpoint images, associates the calculated distance with a pixel of the projected image, and, when a plurality of distances correspond to a pixel of the projected image, compares the plurality of distances and determines, according to the degree of variation among them, whether to use the virtual distance at that pixel of the projected image in calculating the virtual distance of the projected image.
The program of the present invention comprises a computer, a three-dimensional shape generation means for generating a three-dimensional shape model of the object from a plurality of multi-view images of the object taken from different viewpoints, and each image of the multi-view images. a focused area detection means for detecting an in-focus area in the multi-viewpoint image, a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane for pixels of the in-focus area in each image of the multi-viewpoint image; A distance calculation means for calculating a virtual distance, which is a virtual distance, between a viewpoint position in a projected image and a ratio of the actual distance between the projected image and the viewpoint position in the projected image to the virtual distance. A program for operating as scale estimation means for estimating a scale value indicating a scale value and scale conversion means for converting the virtual distance into the real distance using the scale value, wherein the distance calculation means performs the detection The virtual distance for each pixel of the projected image is calculated by associating the distance according to the depth value of the pixels included in the focused area with each pixel of the projected image, and the in-focus area is calculated. The detection means detects the degree of focus for each pixel, and the distance calculation means calculates the distance according to the depth value of the pixels included in the detected in-focus area. The program calculates the virtual distance for each pixel of the projection image by weighting according to the degree and making the weighted distance correspond to each pixel of the projection image.
The program of the present invention comprises a computer, a three-dimensional shape generation means for generating a three-dimensional shape model of the object from a plurality of multi-view images of the object taken from different viewpoints, and each image of the multi-view images. a focused area detection means for detecting an in-focus area in the multi-viewpoint image, a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane for pixels of the in-focus area in each image of the multi-viewpoint image; A distance calculation means for calculating a virtual distance, which is a virtual distance, between a viewpoint position in a projected image and a ratio of the actual distance between the projected image and the viewpoint position in the projected image to the virtual distance. A program for operating as scale estimation means for estimating a scale value indicating a scale value and scale conversion means for converting the virtual distance into the real distance using the scale value, wherein the distance calculation means performs the detection The virtual distance for each pixel of the projection image is calculated by associating the distance corresponding to the depth value of the pixels included in the focused area with each pixel of the projection image, and the detected By weighting the distance corresponding to the depth value of the pixels included in the in-focus region according to the size of the distance, and making the weighted distance correspond to each pixel of the projection image, A program for calculating the virtual distance for each pixel of a projection image.
The program of the present invention comprises a computer, a three-dimensional shape generation means for generating a three-dimensional shape model of the object from a plurality of multi-view images of the object taken from different viewpoints, and each image of the multi-view images. a focused area detection means for detecting an in-focus area in the multi-viewpoint image, a projected image obtained by projecting the three-dimensional shape model onto a two-dimensional plane for pixels of the in-focus area in each image of the multi-viewpoint image; A distance calculation means for calculating a virtual distance, which is a virtual distance, between a viewpoint position in a projected image and a ratio of the actual distance between the projected image and the viewpoint position in the projected image to the virtual distance. A program for operating as scale estimation means for estimating a scale value shown and scale conversion means for converting the virtual distance into the real distance using the scale value, wherein the distance calculation means includes the multi A distance corresponding to a depth value of each pixel in an in-focus region in each image of the viewpoint image is calculated, the calculated distance is associated with the pixel of the projection image, and there are a plurality of distances corresponding to the pixel of the projection image. case, the plurality of distances are compared, and whether or not the virtual distance in pixels of the projection image is used to calculate the virtual distance in the projection image is determined according to the degree of variation in the plurality of distances. It is a program that

According to the present invention, the actual dimensions of a three-dimensional shape created by a three-dimensional reconstruction method using multi-viewpoint images can be estimated accurately.

A block diagram showing an example of the configuration of the three-dimensional shape model generation device 1 according to the first embodiment.
A diagram showing an example of the structure of the information stored in the scale information storage unit 109 according to the first embodiment.
A diagram showing an example of a plurality of multi-viewpoint images TG according to the first embodiment.
A diagram showing an example of the three-dimensional shape model M according to the first embodiment.
A diagram showing an example of a multi-viewpoint image TG according to the first embodiment.
A diagram showing an example of a blur map BM of a multi-viewpoint image TG according to the first embodiment.
A diagram showing an example of the depth-of-field function according to the first embodiment.
A diagram showing an example of the distribution of virtual distances over the pixels of the projection image of the three-dimensional shape model M according to the first embodiment.
A flowchart showing the flow of processing performed by the three-dimensional shape model generation device 1 according to the first embodiment.
A block diagram showing an example of the configuration of the three-dimensional shape model generation device 1A according to the second embodiment.

Hereinafter, the three-dimensional shape model generation device of the embodiments will be described with reference to the drawings.

<First embodiment>
First, the first embodiment will be described.
FIG. 1 is a block diagram showing an example of the configuration of the three-dimensional shape model generation device 1 according to the first embodiment. The three-dimensional shape model generation device 1 includes, for example, an image data acquisition unit 101, a three-dimensional shape generation unit 102, a focus area detection unit 103, a distance calculation unit 104, a scale estimation unit 105, a scale conversion unit 106, an image data storage unit 107, a three-dimensional shape storage unit 108, and a scale information storage unit 109.

The image data acquisition unit 101 acquires image information of the multi-viewpoint images TG (see FIG. 3A) from the image data storage unit 107. The multi-viewpoint images TG are images of the object T captured from mutually different viewpoints. The object T is any object that can be imaged and has an arbitrary three-dimensional shape. The image information of the multi-viewpoint images TG includes, for each pixel, information indicating a color such as RGB values, or a grayscale value.

The image data acquisition unit 101 acquires the camera parameters of the multi-viewpoint images TG from the scale information storage unit 109. The camera parameters are attribute information of the multi-viewpoint images TG, recorded in so-called Exif (Exchangeable image file format). For example, the camera parameters indicate the viewpoint position (the position of the camera at the time of imaging), the imaging direction, the angle of view, and the like at the time each multi-viewpoint image TG was captured. The camera parameters may also include information about the imaging device (camera) that captured the multi-viewpoint images TG. The information about the imaging device indicates the specifications of its components and its state at the time of imaging, for example the focal length of the lens, the shutter speed, the exposure state, the image resolution (number of pixels), and the distortion coefficient of the lens.

The image data acquisition unit 101 acquires the image information and camera parameters of the plurality of multi-viewpoint images TG and outputs the acquired information to the three-dimensional shape generation unit 102.

The three-dimensional shape generation unit 102 creates a three-dimensional shape model M of the object T. The three-dimensional shape generation unit 102 first generates a depth map for each of the plurality of multi-viewpoint images TG based on the principle of stereo matching, using the image information and camera parameters of the multi-viewpoint images TG. A depth map is information (a map) indicating the depth of each pixel of an image.

The three-dimensional shape generation unit 102 integrates the depth maps of the multi-viewpoint images TG to generate a three-dimensional point cloud. The three-dimensional point cloud is a set of three-dimensional points corresponding to the three-dimensional shape of the object T. The three-dimensional shape generation unit 102 generates a mesh model from the three-dimensional point cloud. A mesh model is a three-dimensional shape model that represents the three-dimensional shape of an object as a collection of polygons. The three-dimensional shape generation unit 102 generates the mesh model from the three-dimensional point cloud using, for example, a mesh reconstruction technique (Poisson Surface Reconstruction), and uses the generated mesh model as the three-dimensional shape model.
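As an illustration of the first half of this step, the following minimal sketch back-projects a single depth map into 3D points in the camera frame. It assumes a pinhole camera with a focal length in pixels and a principal point `(cx, cy)`; these names, and the function `depth_map_to_points`, are hypothetical and do not come from the description. Merging the points of several views (which requires each camera's pose) and the Poisson surface reconstruction itself are not shown.

```python
def depth_map_to_points(depth_map, focal_px, cx, cy):
    """Back-project a depth map into camera-frame 3D points (pinhole model, assumed)."""
    points = []
    for row, depths in enumerate(depth_map):
        for col, depth in enumerate(depths):
            if depth is None or depth <= 0:
                continue  # no valid depth estimate at this pixel
            # Similar triangles: offset from the principal point scales with depth.
            x = (col - cx) * depth / focal_px
            y = (row - cy) * depth / focal_px
            points.append((x, y, depth))
    return points


if __name__ == "__main__":
    demo = [[2.0, None], [0.0, 4.0]]
    print(depth_map_to_points(demo, focal_px=2.0, cx=0.5, cy=0.5))
```

The depth values here are in the same relative units as the reconstruction, so the resulting points are not yet at real-world scale.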

The three-dimensional shape generation unit 102 stores the generated three-dimensional point cloud and information about the mesh model in the three-dimensional shape storage unit 108. The information about the three-dimensional point cloud includes information indicating the coordinates (three-dimensional coordinates) of each point. It may also include information indicating the color of each point (for example, RGB values). The information about the mesh model includes information indicating the shape, coordinates (three-dimensional coordinates), color, texture, and the like of the polygons that constitute the mesh model.

The focus area detection unit 103 detects the in-focus area in each of the plurality of multi-viewpoint images TG, for example using a machine learning technique. In this case, the focus area detection unit 103 inputs a multi-viewpoint image TG to a trained model. The trained model is, for example, a model generated by training a convolutional neural network (CNN) on a training data set. A training data set is information in which inputs are paired with outputs (the answers for those inputs).

Here, the input of the training data set is an image of an arbitrary object prepared for training, containing a mixture of in-focus and out-of-focus portions. The output of the training data set is information indicating which portions of the training image are in focus and which are not, for example information associating each pixel with whether or not it is in focus. The output of the training data set is determined, for example, by the worker who creates the data set. That is, the worker judges whether each pixel is in focus and sets the result as the output of the training data set.

By learning such a training data set, the trained model becomes a model that estimates (outputs), for an input image it has not been trained on, the degree to which each pixel of that image is in focus.

The focus area detection unit 103 stores information indicating the detection result (information indicating the in-focus area in each of the plurality of multi-viewpoint images TG) in the image data storage unit 107. The information indicating the detection result is, for example, information that associates each pixel of each multi-viewpoint image TG with the degree to which it is in focus (hereinafter also referred to as the blur amount). The blur amount is expressed, for example, as a real number from 0 to 1; a value close to 0 indicates that the pixel is in focus, and a value close to 1 indicates that it is out of focus. In other words, the better the focus, the smaller the blur amount.

Although the above describes the case in which the focus area detection unit 103 detects the in-focus area using a machine learning technique, the present invention is not limited to this. The focus area detection unit 103 may detect the in-focus area using any technique. For example, the focus area detection unit 103 may treat the area on which the camera's focus function focused as the in-focus area. This area is, for example, the area inside the frame that indicates the focused area and is displayed, superimposed on the imaging range, on the rear display that shows the imaging range at the time of imaging.

For example, the focus area detection unit 103 may detect the in-focus area using image processing. In this case, the focus area detection unit 103 detects the degree of color change in the image by frequency analysis. For the RGB values of each pixel, the focus area detection unit 103 applies a Fourier transform to a local area in the vicinity of that pixel and extracts the high-frequency components of the local area. The focus area detection unit 103 then determines whether the local area is in focus according to whether the level of the extracted high-frequency components is equal to or greater than a predetermined threshold: the local area is determined to be in focus when the level is equal to or greater than the threshold, and out of focus when the level is below the threshold.
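The frequency-analysis variant can be sketched roughly as follows. This is only an illustration under stated assumptions: the patch is a small square of grayscale values rather than RGB, the "level" of the high-frequency components is taken to be their share of the non-DC spectral energy, and the cutoff frequency and threshold are arbitrary. The function names `high_frequency_ratio` and `is_in_focus` are hypothetical.

```python
import cmath
import math


def high_frequency_ratio(patch, cutoff=0.25):
    """Share of non-DC spectral energy at or above `cutoff` cycles per pixel."""
    n = len(patch)  # patch is an n x n list of grayscale values
    total = 0.0
    high = 0.0
    for ky in range(n):
        for kx in range(n):
            if kx == 0 and ky == 0:
                continue  # skip the DC term (mean brightness)
            # Direct 2D discrete Fourier transform coefficient for (kx, ky).
            coeff = sum(
                patch[y][x] * cmath.exp(-2j * math.pi * (kx * x + ky * y) / n)
                for y in range(n)
                for x in range(n)
            )
            energy = abs(coeff) ** 2
            # Convert bin indices to signed frequencies in cycles per pixel.
            fx = (kx if kx <= n // 2 else kx - n) / n
            fy = (ky if ky <= n // 2 else ky - n) / n
            total += energy
            if math.hypot(fx, fy) >= cutoff:
                high += energy
    return high / total if total > 0 else 0.0


def is_in_focus(patch, threshold=0.5, cutoff=0.25):
    """Treat the patch as in focus when the high-frequency share reaches the threshold."""
    return high_frequency_ratio(patch, cutoff) >= threshold
```

The direct transform is quadratic in the number of pixels, which is acceptable for small local patches; a fast Fourier transform would be the usual choice for larger ones.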

The distance calculation unit 104 calculates a virtual distance. Here, the virtual distance is a virtual distance from a predetermined position to the portion of the object T that corresponds to an arbitrary point of the three-dimensional point cloud constituting the three-dimensional shape model. The predetermined position is the virtual viewpoint position of a projection image generated by reprojecting the three-dimensional shape model onto a two-dimensional plane.

The virtual distance is a constant multiple of the actual distance (real distance). This is because the distance calculation unit 104 calculates the virtual distance from the depth value of each point of the three-dimensional point cloud. The depth values of the three-dimensional point cloud are calculated from the multi-viewpoint images TG based on the relative positional relationship of the object T captured in each image. They therefore do not reflect actual dimensions, but are relative values with respect to some reference. Consequently, a distance calculated from the depth values of the three-dimensional point cloud is a virtual distance that is proportional to the real distance.

The distance calculation unit 104 calculates the virtual distance using the pixels included in the in-focus area of each of the plurality of multi-viewpoint images TG. The in-focus area is the area detected by the focus area detection unit 103.

For example, the distance calculation unit 104 treats as in-focus pixels those pixels of the multi-viewpoint images TG whose blur amount, as detected by the focus area detection unit 103, is less than a predetermined threshold (for example, 0.1). The distance calculation unit 104 acquires the depth value of each in-focus pixel. The depth value of a pixel is, for example, the per-pixel depth value calculated in the course of generating the three-dimensional shape model M by the three-dimensional shape generation unit 102.
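A minimal sketch of this selection step, assuming the blur map and the depth map are given as nested lists of the same shape; the function name `in_focus_depths` and the data layout are illustrative only.

```python
def in_focus_depths(blur_map, depth_map, threshold=0.1):
    """Return {(row, col): depth} for pixels whose blur amount is below the threshold."""
    selected = {}
    for row, (blur_row, depth_row) in enumerate(zip(blur_map, depth_map)):
        for col, (blur, depth) in enumerate(zip(blur_row, depth_row)):
            if blur < threshold:  # strictly less than, as in the description
                selected[(row, col)] = depth
    return selected


if __name__ == "__main__":
    blur = [[0.05, 0.5], [0.1, 0.0]]
    depth = [[1.2, 3.0], [1.3, 1.25]]
    print(in_focus_depths(blur, depth))
```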

The distance calculation unit 104 associates the depth value of each in-focus pixel with a pixel of the projection image. Each pixel of the projection image is thus associated with the depth values of in-focus pixels from one or more multi-viewpoint images TG. The distance calculation unit 104 calculates the virtual distance at each pixel of the projection image by integrating the pixels of the plurality of multi-viewpoint images TG that correspond to that pixel.

In general, a camera has a depth of field (see FIG. 4) determined by its camera parameters, so the range that is in focus is fixed in advance. The depth of field depends on the actual distance from the viewpoint position to the in-focus portion of the subject captured in the image. The real distance can therefore be obtained by using the depth of field. Meanwhile, as described above, the virtual distance is the virtual distance from the predetermined position to the corresponding portion of the object T in the in-focus area. In other words, the virtual distance can be regarded as a constant multiple of the depth of field, and the ratio of the real distance to the virtual distance (the scale value SC) can be obtained via the depth of field.

However, the depth of field has a certain width. The value of the virtual distance therefore varies from pixel to pixel (see FIG. 5). This variation is an error with respect to the true value of the virtual distance and can degrade its accuracy. In addition, the focus area detection unit 103 uses the trained model to estimate whether each pixel is in focus. Depending on the content of the training data set, the estimation accuracy may therefore be insufficient. If the estimation accuracy is poor, pixels outside the depth of field (that is, in the out-of-focus range) may be erroneously estimated to be in focus. If the pixels for which the virtual distance is calculated include pixels that are actually out of focus but were erroneously estimated to be in focus, the virtual distances at those pixels are errors with respect to the true value of the virtual distance.

As a countermeasure, in the present embodiment the true value of the virtual distance is searched for by applying statistical processing to the virtual distances of the pixels of the projection image.

For example, the distance calculation unit 104 searches for the true value of the virtual distance by applying RANSAC (RANdom SAmple Consensus) to the virtual distances of the pixels of the projection image. RANSAC is a technique that, for a data set containing outliers (that is, errors), repeatedly applies the least-squares method to randomly extracted data samples in order to obtain an estimate that is not affected by the outliers. The distance calculation unit 104 searches for the true value of the virtual distance by using the range of the depth of field as the inlier range (the error tolerance) in RANSAC.

However, the depth of field is a value calculated with the distance u to the subject (see FIG. 4) as a parameter, whereas the virtual distance is a virtual distance that has not yet been converted into a real distance. For this reason, when using the depth of field as the inlier range in RANSAC, the distance calculation unit 104 sets the distance u to the subject to a provisional value (for example, 200 mm). The distance calculation unit 104 takes the distance obtained by applying RANSAC to the virtual distances of the pixels of the projection image as the true value of the virtual distance. The distance calculation unit 104 stores the virtual distance of each pixel of the projection image and the true value of the virtual distance in the three-dimensional shape storage unit 108.
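The RANSAC search over scalar virtual distances might look like the sketch below. It assumes the model being fitted is a single constant, so one sample is the minimal set and the least-squares refit over the inliers is their mean. The `tolerance` argument stands in for the inlier range derived from the depth of field under the provisional subject distance; how that range is computed is not shown, and the function name `ransac_distance` is hypothetical.

```python
import random


def ransac_distance(distances, tolerance, iterations=200, seed=0):
    """Estimate a single representative distance while ignoring outliers."""
    if not distances:
        raise ValueError("at least one distance is required")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample for a constant model: one randomly chosen value.
        candidate = rng.choice(distances)
        inliers = [d for d in distances if abs(d - candidate) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # The least-squares fit of a constant to the inliers is their mean.
    return sum(best_inliers) / len(best_inliers), best_inliers


if __name__ == "__main__":
    samples = [1.00, 1.01, 0.99, 1.02, 0.98, 5.0, 0.2]
    estimate, used = ransac_distance(samples, tolerance=0.05)
    print(estimate, used)
```

A fixed tolerance is the simplest choice; a tolerance that grows with the candidate distance would follow the depth of field more closely, at the cost of an extra parameter.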

Although the above describes the case in which the distance calculation unit 104 calculates the true value of the virtual distance using RANSAC, the present invention is not limited to this. It suffices for the distance calculation unit 104 to calculate the most probable virtual distance from a set of virtual distances containing variation, using at least a statistical technique. For example, the distance calculation unit 104 may derive a representative value from the set of virtual distances and use it as the true value of the virtual distance. The representative value may be any value derived from the set of virtual distances by a statistical technique, for example the arithmetic mean, a weighted mean, the median, the maximum, or the minimum. Alternatively, the distance calculation unit 104 may calculate the true value of the virtual distance using virtual distances selected from the set. In this case, for example, when the virtual distances at a plurality of pixels corresponding to the same point of the three-dimensional point cloud vary widely, the true value of the virtual distance may be left uncalculated.
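One way to combine a representative value with the variation check is sketched below. The choice of the median as the representative value, the relative standard deviation as the measure of variation, and the 5% limit are all assumptions made for illustration; `representative_distance` is a hypothetical name.

```python
import statistics


def representative_distance(distances, max_relative_spread=0.05):
    """Median of the candidate distances, or None when they disagree too much."""
    if not distances:
        return None
    center = statistics.median(distances)
    if len(distances) > 1 and center != 0:
        # Population standard deviation relative to the median.
        spread = statistics.pstdev(distances) / abs(center)
        if spread > max_relative_spread:
            return None  # too much variation: leave this point out
    return center


if __name__ == "__main__":
    print(representative_distance([1.0, 1.02, 0.98]))
    print(representative_distance([1.0, 2.0, 3.0]))
```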

The scale estimation unit 105 estimates the scale value SC. The scale value SC is the ratio of the real distance to the virtual distance. As the virtual distance for estimating the scale value SC, the scale estimation unit 105 uses the true value of the virtual distance calculated by the distance calculation unit 104. The scale estimation unit 105 derives the real distance for estimating the scale value SC from the camera parameters. When the Exif of the multi-viewpoint images TG records the distance u to the subject itself, the scale estimation unit 105 uses that information as the real distance. Alternatively, when the Exif indicates the depth of field, the focal length, the lens F-number, and the diameter of the permissible circle of confusion, the scale estimation unit 105 calculates the distance u to the subject based on the relational expression in FIG. 4 and uses the calculated value as the real distance. Alternatively, when the camera has a display function that shows information about the distance u to the subject on its rear display at the time of imaging, the scale estimation unit 105 may use the value corresponding to that display as the real distance. The scale estimation unit 105 stores the real distance used for estimating the scale value SC and the estimated scale value SC in the scale information storage unit 109.

To simplify processing, it is desirable to unify (fix) the camera parameters across the multi-viewpoint images TG used to generate the three-dimensional shape model M. When the camera parameters are set differently for each multi-viewpoint image TG, the scale value SC and related values are calculated for each set of camera parameters.

The scale conversion unit 106 performs scale conversion by converting the virtual distance into a real distance. The scale conversion unit 106 performs the conversion by multiplying the virtual distance calculated by the distance calculation unit 104 by the scale value SC estimated by the scale estimation unit 105. The scale conversion unit 106 also multiplies the three-dimensional coordinates of each pixel of the projection image by the scale value SC, thereby mapping the three-dimensional shape to a coordinate system that reflects the actual dimensions. This makes it possible to obtain the actual dimensions of the three-dimensional shape.
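As a worked illustration of the scale estimation and scale conversion, the sketch below computes SC as the real distance divided by the virtual distance and applies it to model coordinates. The function names `estimate_scale` and `to_real_scale`, the units (millimetres), and the sample numbers are assumptions made for the example only.

```python
def estimate_scale(real_distance, virtual_distance):
    """Scale value SC: real distance per unit of virtual distance."""
    if real_distance <= 0 or virtual_distance <= 0:
        raise ValueError("distances must be positive")
    return real_distance / virtual_distance


def to_real_scale(points, scale):
    """Multiply every (x, y, z) model coordinate by the scale value."""
    return [(x * scale, y * scale, z * scale) for x, y, z in points]


# Assumed example: the subject is 1000 mm away in reality and
# 2.5 model units away in the reconstructed coordinate system.
sc = estimate_scale(real_distance=1000.0, virtual_distance=2.5)

# Two model points one unit apart end up sc millimetres apart.
scaled = to_real_scale([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], sc)
```

Because the conversion is a uniform multiplication, every length measured on the scaled model is the model length multiplied by SC.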

The image data storage unit 107 stores information about the multi-viewpoint images TG. This information includes the depth value calculated for each pixel of the multi-viewpoint image TG and the blur amount, which indicates the degree to which each pixel is in focus.
The three-dimensional shape storage unit 108 stores information about the three-dimensional shape model. This information includes the virtual distance at each pixel of the projection image and the true value of the virtual distance in the three-dimensional shape model.
The scale information storage unit 109 stores information about scale conversion.

FIG. 2 is a diagram showing an example of the structure of the information (scale information) stored in the scale information storage unit 109 according to the first embodiment. For example, the scale information storage unit 109 is created for each multi-viewpoint image TG.
As shown in FIG. 2, the information about scale conversion has items such as camera parameters and scale conversion parameters. The camera parameters include attribute information about the camera and the imaging conditions, for example Exif information. The camera parameters include an image ID, the camera model, the focal length, the focus, the lens F-number, the permissible circle of confusion diameter, the depth of field, and the like. The scale conversion parameters include the scale value SC estimated by the scale estimation unit 105 and the real distance used for estimating the scale value SC.

FIG. 3A is a diagram showing an example of a plurality of multi-viewpoint images TG according to the first embodiment. The object T here is a loaf of bread placed on a wooden board. As shown in FIG. 3A, the multi-viewpoint images TG consist of a plurality of images of the object T captured from mutually different viewpoints.

FIG. 3B is a diagram showing an example of the three-dimensional shape model M according to the first embodiment. As shown in FIG. 3B, a three-dimensional shape can be restored from the multi-viewpoint images TG. The three-dimensional shape model M reproduces the shape, but its actual dimensions are unknown. The actual size is the three-dimensional shape model M enlarged or reduced by some factor, and that factor is not known.

FIG. 3C is a diagram showing an example of a multi-viewpoint image TG according to the first embodiment. As shown in FIG. 3C, part of the imaged area, for example the left end of the wooden board, is out of focus. Another part of the imaged area, for example the portion of the bread from its center to the right, is in focus.

FIG. 3D is a diagram showing an example of the blur map BM of the multi-viewpoint image TG according to the first embodiment. A blur map is an image (map) in which a blur amount is associated with each pixel of the image. In this example, the closer a pixel is to white, the larger the blur amount, that is, the more out of focus it is. The closer a pixel is to black, the smaller the blur amount, that is, the better it is in focus.
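One simple way to derive an in-focus region from such a blur map is to threshold it. The sketch below assumes blur amounts normalized to the range 0 (sharp, black) to 1 (blurred, white); the function name `in_focus_mask` and the threshold value are hypothetical, and this passage does not state how the region is actually derived from the map.

```python
def in_focus_mask(blur_map, threshold):
    """Mark pixels whose blur amount is at or below the threshold.

    blur_map is a list of rows; each entry is a blur amount in [0, 1],
    where smaller means sharper. The result has the same layout, with
    True for pixels treated as in focus.
    """
    return [[blur <= threshold for blur in row] for row in blur_map]


# A 2x2 blur map: the left column is sharp, the right column is blurred.
mask = in_focus_mask([[0.1, 0.9], [0.4, 0.5]], threshold=0.4)
```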

FIG. 4 is a diagram showing an example of the depth-of-field function according to the first embodiment. In FIG. 4, DoF_f is the front depth of field, DoF_r is the rear depth of field, N is the lens F-number, c is the permissible circle of confusion diameter, f is the focal length, and u is the distance to the subject.
As shown in FIG. 4, the depth of field is the sum of the front depth of field DoF_f and the rear depth of field DoF_r around the distance u to the subject, and is a value with a certain width. The front depth of field DoF_f is the range that is in focus in the direction from the distance u toward the viewpoint position. The rear depth of field DoF_r is the range that is in focus in the direction from the distance u away from the viewpoint position.
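The expression in FIG. 4 is not reproduced in this text. As a hedged illustration, the sketch below uses the commonly cited thin-lens approximations DoF_f = N·c·u² / (f² + N·c·u) and DoF_r = N·c·u² / (f² − N·c·u), which are an assumption here and may differ from the figure. Under that assumption the total depth of field can be inverted in closed form to recover u. The function names and the sample lens values are hypothetical; all lengths are in millimetres.

```python
import math


def depth_of_field(u, f, n, c):
    """Return (front, rear) depth of field for subject distance u.

    Uses the approximations
        front = N*c*u^2 / (f^2 + N*c*u)
        rear  = N*c*u^2 / (f^2 - N*c*u)
    which only hold while f^2 > N*c*u (subject inside the hyperfocal range).
    """
    nc = n * c
    if f * f <= nc * u:
        raise ValueError("rear depth of field is unbounded at this distance")
    front = nc * u * u / (f * f + nc * u)
    rear = nc * u * u / (f * f - nc * u)
    return front, rear


def subject_distance(total_dof, f, n, c):
    """Invert front + rear = total_dof for u under the same approximations.

    front + rear = 2*N*c*f^2*u^2 / (f^4 - (N*c*u)^2), which rearranges to
    u = f^2 * sqrt(D / (N*c * (2*f^2 + D*N*c))) with D = total_dof.
    """
    nc = n * c
    return f * f * math.sqrt(total_dof / (nc * (2.0 * f * f + total_dof * nc)))


# Assumed example: 50 mm lens at F2.8, 0.03 mm circle of confusion,
# subject 2000 mm away.
front, rear = depth_of_field(2000.0, 50.0, 2.8, 0.03)
recovered_u = subject_distance(front + rear, 50.0, 2.8, 0.03)
```

The round trip from u to the total depth of field and back is a quick consistency check on the algebra, not a statement about the accuracy of Exif metadata.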

FIG. 5 shows an example of the distribution of the virtual distances of the pixels of the projection image according to the first embodiment. As shown in FIG. 5, the virtual distances of the pixels of the projection image vary. A probable virtual distance is calculated by applying statistical processing such as RANSAC to this set of virtual distances, which contains errors. This yields a highly accurate virtual distance, so that the actual dimensions of the three-dimensional shape model M can be obtained accurately.

FIG. 6 is a flowchart showing the flow of processing performed by the three-dimensional shape model generation device 1 according to the first embodiment.
Step S101:
The image data acquisition unit 101 acquires the multi-viewpoint images TG of the object T.
Step S102:
The three-dimensional shape generation unit 102 generates the three-dimensional shape model M using the multi-viewpoint images TG.
Step S103:
The focus area detection unit 103 generates a blur map for the multi-viewpoint images TG.
Step S104:
For the in-focus area of each multi-viewpoint image TG, the distance calculation unit 104 calculates a virtual distance for each pixel included in that area.
Step S105:
The distance calculation unit 104 calculates the virtual distance of each pixel of the projection image of the three-dimensional shape model M by associating each such pixel with the per-pixel virtual distances of the multi-viewpoint images TG calculated in step S104.
Step S106:
The distance calculation unit 104 calculates the most probable virtual distance in the projection image (the true value of the virtual distance) by applying statistical processing to the variation in the virtual distances of the pixels of the projection image.
Step S107:
The scale estimation unit 105 estimates the scale value SC, which is the ratio of the real distance to the virtual distance. For example, the scale estimation unit 105 calculates the scale value SC using the real distance derived from the camera parameters and the virtual distance calculated in step S106.
Step S108:
The scale conversion unit 106 calculates the real distance by multiplying the virtual distance calculated in step S106 by the scale value SC estimated in step S107.

In the embodiment described above, as shown in the flowchart of FIG. 6, the blur map for the multi-viewpoint images TG is generated (step S103) after the three-dimensional shape model is generated (step S102), but the order is not limited to this. For example, the three-dimensional shape generation unit 102 may generate the three-dimensional shape model using only the in-focus areas of the multi-viewpoint images TG, by setting the weight of strongly blurred pixels to be smaller than that of weakly blurred pixels. In this case, each pixel of the projection image is composed only of in-focus pixels of the multi-viewpoint images TG. The processing in step S105 of "associating each pixel of the projection image of the three-dimensional shape model M with the per-pixel virtual distances of the multi-viewpoint images TG calculated in step S104" can therefore be omitted. That is, in step S105, the depth value of each pixel of the projection image can be used directly as the virtual distance.

As described above, the three-dimensional shape model generation device 1 according to the first embodiment includes the three-dimensional shape generation unit 102, the focus area detection unit 103, the distance calculation unit 104, the scale estimation unit 105, and the scale conversion unit 106. The three-dimensional shape generation unit 102 generates the three-dimensional shape model M of the object T from a plurality of multi-viewpoint images TG obtained by imaging the object T from mutually different viewpoints. The focus area detection unit 103 detects the in-focus area in each of the multi-viewpoint images TG. For the pixels in the in-focus area of each multi-viewpoint image TG, the distance calculation unit 104 calculates a virtual distance, which is the virtual distance between the viewpoint position and a projection image obtained by projecting the three-dimensional shape model M onto a two-dimensional plane. The scale estimation unit 105 estimates the scale value SC. The scale conversion unit 106 converts the virtual distance into an actual distance using the scale value SC. The three-dimensional shape model generation device 1 of the first embodiment can thereby improve the accuracy of the virtual distance by exploiting the fact that an in-focus area lies within the depth of field. As a result, the actual dimensions of the three-dimensional shape model M restored by a three-dimensional reconstruction method using the multi-viewpoint images TG can be estimated accurately.

In the three-dimensional shape model generation device 1 of the first embodiment, the focus area detection unit 103 may detect the in-focus area in each of the multi-viewpoint images TG according to the area on which focus was set using the focus function of the camera that captured the image. This makes it easy to detect the in-focus area.

In the three-dimensional shape model generation device 1 of the first embodiment, the focus area detection unit 103 may detect the in-focus area in each of the multi-viewpoint images TG by image processing. This makes it possible to detect the in-focus area even when the area on which the camera focused is unknown.

In the three-dimensional shape model generation device 1 of the first embodiment, the focus area detection unit 103 may detect the in-focus area in each of the multi-viewpoint images TG using a trained model generated by machine learning on a training data set in which inputs and outputs are associated with each other. In this case, the input of the training data set is a training input image, and the output is information indicating the in-focus area in that training input image. This makes it possible to detect the in-focus area without performing complicated image processing.

In the three-dimensional shape model generation device 1 of the first embodiment, the distance calculation unit 104 may take, as the virtual distance at a corresponding pixel of the projection image obtained by projecting the three-dimensional shape model M onto a two-dimensional plane, the depth value at that corresponding pixel, where the corresponding pixel is the pixel that corresponds to a pixel determined to be in focus by the focus area detection unit 103. The virtual distance can thereby be obtained easily from the depth values calculated in the course of generating the three-dimensional shape model M.

In the three-dimensional shape model generation device 1 of the first embodiment, the scale estimation unit 105 may derive the real distance based on the depth of field calculated from the camera parameters of the camera that captured the multi-viewpoint images TG. The real distance can thereby be derived easily.

In the three-dimensional shape model generation device 1 of the first embodiment, the scale estimation unit 105 may derive the real distance based on the distance obtained from the focus function of the camera that captured the multi-viewpoint images TG. The real distance can thereby be derived easily.

In the three-dimensional shape model generation device 1 of the first embodiment, the distance calculation unit 104 may calculate the virtual distance for each pixel of the projection image by associating the distance corresponding to the depth value of each pixel in the detected in-focus area with the corresponding point of the projection image. The real distance can thereby be derived easily.

In the three-dimensional shape model generation device 1 of the first embodiment, the distance calculation unit 104 may calculate the virtual distance for each pixel of the projection image by associating the distance corresponding to the depth value of each pixel in the detected in-focus area with the corresponding point of the projection image, and may then calculate the virtual distance of the projection image by integrating the per-pixel virtual distances using a statistical method. Even when the per-pixel virtual distances of the projection image contain errors, integrating them with a statistical method reduces the error and yields a more probable virtual distance.

(Modification 1 of the first embodiment)
Next, Modification 1 of the first embodiment will be described. This modification differs from the embodiment described above in that the virtual distance is calculated according to whether or not a pixel determined to be in focus by the focus area detection unit 103 is an edge.

In general, the depth values of pixels corresponding to edges of the object T tend to contain large errors. Calculating the virtual distance according to whether or not a pixel corresponds to an edge of the object T therefore improves the accuracy of the calculated virtual distance.

For example, the distance calculation unit 104 either does not use the depth values of edge pixels in calculating the virtual distance, or sets the weight by which the depth value of an edge pixel is multiplied to a value smaller than that of other pixels. The distance calculation unit 104 determines, for example, whether or not each pixel in the in-focus area of a multi-viewpoint image TG is an edge. It detects edges using, for example, the Canny edge detection method. Among the pixels included in the in-focus area of each of the multi-viewpoint images TG, the distance calculation unit 104 associates only the depth values of pixels determined not to be edges with the pixels of the projection image, and thereby determines the virtual distance of each pixel of the projection image. The distance calculation unit 104 then calculates the most probable virtual distance in the projection image (the true value of the virtual distance) by applying statistical processing to the virtual distances of the pixels of the projection image.
Alternatively, the distance calculation unit 104 may determine whether or not each corresponding pixel in the projection image is an edge, and calculate the most probable virtual distance in the projection image (the true value of the virtual distance) using only the pixels determined not to be edges.
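The edge exclusion described above can be sketched as follows, assuming the edge decision has already been made by some detector (for example a Canny-type detector) and is supplied as one boolean per pixel. The function name `non_edge_distance` is hypothetical, and the use of the median as the statistical step is an assumption chosen for the example.

```python
import statistics


def non_edge_distance(depths, is_edge):
    """Median of the depth values whose pixels are not flagged as edges.

    depths and is_edge are parallel sequences: depths[i] is the depth
    value of pixel i and is_edge[i] is True when that pixel was judged
    to lie on an edge.
    """
    kept = [d for d, edge in zip(depths, is_edge) if not edge]
    if not kept:
        raise ValueError("every pixel was flagged as an edge")
    return statistics.median(kept)


# The third pixel lies on an edge and carries an unreliable depth value.
value = non_edge_distance([2.0, 2.1, 5.0, 1.9], [False, False, True, False])
```

The down-weighting alternative mentioned in the text would keep the edge pixels but give them a smaller weight in a weighted average instead of dropping them.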

As described above, in the three-dimensional shape model generation device 1 according to Modification 1 of the first embodiment, the distance calculation unit 104 calculates the virtual distance based on the result of determining, by image processing, whether or not a pixel determined to be in focus by the focus area detection unit 103 is an edge. Alternatively, the distance calculation unit 104 may determine whether or not each corresponding pixel in the projection image is an edge, and calculate the most probable virtual distance in the projection image (the true value of the virtual distance) using only the pixels determined not to be edges. That is, the distance calculation unit 104 calculates the most probable virtual distance in the projection image using at least one of two determination results: whether a pixel in the in-focus area of a multi-viewpoint image TG is an edge, and whether the corresponding pixel in the projection image is an edge. This reduces the influence on the virtual distance calculation of depth values that are likely to contain errors.

(Modification 2 of the first embodiment)
Next, Modification 2 of the first embodiment will be described. This modification differs from the embodiment described above in that the depth values of pixels determined to be in focus by the focus area detection unit 103 are weighted.

The distance calculation unit 104 weights the distance corresponding to the depth value of each pixel in the in-focus area according to the degree of focus. It sets the weights so that the weight is larger when the pixel is more in focus and smaller when the pixel is less in focus. The distance calculation unit 104 then calculates the virtual distance for each pixel of the projection image by associating the weighted distance with the corresponding point of the projection image.

The distance calculation unit 104 may also weight the distance corresponding to the depth value of each pixel in the in-focus area according to the magnitude of that distance. It sets the weights so that the weight is larger when the distance is small (close to the viewpoint position of the camera) and smaller when the distance is large (far from the viewpoint position of the camera). The distance calculation unit 104 then calculates the virtual distance for each pixel of the projection image by associating the weighted distance with the corresponding point of the projection image.
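The two weighting schemes can be sketched together as a weighted average. This is only one possible reading of the weighting: the source does not give the weight functions, so the choices below (focus weight `1 - blur` for blur amounts normalized to [0, 1], distance weight `1 / distance`) and the function name `weighted_virtual_distance` are assumptions for illustration.

```python
def weighted_virtual_distance(depths, blur, use_distance_weight=False):
    """Weighted average of depth-derived distances.

    depths[i] is the distance for pixel i (must be positive) and blur[i]
    is its blur amount in [0, 1], where 0 is perfectly sharp. Sharper
    pixels get larger weights. With use_distance_weight=True, nearer
    pixels are additionally favoured by a factor of 1 / distance.
    """
    weights = []
    for d, b in zip(depths, blur):
        if d <= 0:
            raise ValueError("distances must be positive")
        w = 1.0 - b  # more in focus -> larger weight
        if use_distance_weight:
            w *= 1.0 / d  # closer to the camera -> larger weight
        weights.append(w)
    total = sum(weights)
    if total <= 0:
        raise ValueError("all weights are zero")
    return sum(w * d for w, d in zip(weights, depths)) / total


by_focus = weighted_virtual_distance([2.0, 4.0], [0.0, 0.5])
by_focus_and_distance = weighted_virtual_distance(
    [2.0, 4.0], [0.0, 0.5], use_distance_weight=True
)
```

In both calls the sharper, nearer pixel at distance 2.0 pulls the result below the unweighted mean of 3.0.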

As described above, in the three-dimensional shape model generation device 1 according to Modification 2 of the first embodiment, the focus area detection unit 103 detects the degree of focus for each pixel, and the distance calculation unit 104 weights the distance corresponding to the depth value of each pixel in the in-focus area according to that degree of focus. The degree of focus is thereby reflected in the calculation of the virtual distance, so that the virtual distance can be calculated more accurately.

In the three-dimensional shape model generation device 1 according to Modification 2 of the first embodiment, the distance calculation unit 104 also weights the distance corresponding to the depth value of each pixel in the in-focus area according to the magnitude of that distance. In general, the error contained in the distance u to the subject tends to grow as the subject moves away from the viewpoint position of the camera. Weighting according to the distance from the viewpoint position therefore allows the virtual distance to be calculated more accurately.

(Modification 3 of the first embodiment)
Next, Modification 3 of the first embodiment will be described. This modification differs from the embodiment described above in that the virtual distance of the projection image is calculated taking into account the variation in the depth values of the pixels of the plural multi-viewpoint images TG that correspond to one pixel of the projection image.

For example, for each pixel of the projection image, the distance calculation unit 104 extracts the pixels of the multi-viewpoint images TG that correspond to that pixel. When a plurality of pixels of the multi-viewpoint images TG correspond to one pixel of the projection image, the distance calculation unit 104 calculates the degree of variation in the depth values of those pixels. Any statistical measure, such as the variance, may be used for the degree of variation. When the degree of variation in the depth values is greater than a predetermined threshold, the distance calculation unit 104 does not use the virtual distance of that pixel of the projection image in calculating the virtual distance of the projection image (the true value of the virtual distance). When the degree of variation is smaller than the predetermined threshold, the distance calculation unit 104 uses the virtual distance of that pixel in the calculation.
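The variation check can be sketched as a filter over the projection-image pixels. The sketch assumes the population variance as the measure of variation and the mean as the per-pixel virtual distance; the function name `consistent_pixels`, the dictionary layout, and the handling of the case where the variance equals the threshold (kept here) are assumptions, since the source leaves them open.

```python
import statistics


def consistent_pixels(depths_per_pixel, max_variance):
    """Keep projection-image pixels whose source depth values agree.

    depths_per_pixel maps a projection-image pixel, e.g. (row, col), to
    the list of depth values of the multi-viewpoint-image pixels that
    correspond to it. A pixel is dropped when the variance of its depth
    values exceeds max_variance. The returned dict maps each kept pixel
    to the mean of its depth values.
    """
    kept = {}
    for pixel, depths in depths_per_pixel.items():
        if not depths:
            continue  # nothing observed for this pixel
        if len(depths) >= 2 and statistics.pvariance(depths) > max_variance:
            continue  # observations disagree too much
        kept[pixel] = statistics.mean(depths)
    return kept


observations = {
    (0, 0): [2.0, 2.0, 2.0],  # consistent across views
    (0, 1): [1.0, 3.0],       # variance 1.0, rejected below
}
reliable = consistent_pixels(observations, max_variance=0.5)
```

The surviving per-pixel values are what would then be passed to the statistical step that produces the true value of the virtual distance.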

As described above, in the three-dimensional shape model generation device 1 according to Modification 3 of the first embodiment, the distance calculation unit 104 determines whether or not to use the virtual distance of a pixel of the projection image in calculating the virtual distance of the projection image, according to the degree of variation in the depth values of the pixels of the plural multi-viewpoint images TG that correspond to that pixel. Pixels whose depth values vary widely can thereby be excluded from the calculation, so that the virtual distance of the projection image can be calculated more accurately.

(Second embodiment)
Next, a second embodiment will be described. In the following description, only the parts that differ from the embodiment described above are explained; the same parts are given the same reference numerals and their description is omitted.

This embodiment differs from the embodiment described above in that the real distance used for estimating the scale value SC is obtained by calibration. Obtaining the real distance by calibration yields a more accurate real distance, so that the actual dimensions of the three-dimensional shape model M can be estimated with even higher accuracy.

FIG. 7 is a block diagram showing an example of the configuration of a three-dimensional shape model generation device 1A according to the second embodiment. The three-dimensional shape model generation device 1A includes a marker scale estimation unit 110.
The marker scale estimation unit 110 estimates the scale value SC in the projection image of a marker three-dimensional shape model MM. The marker three-dimensional shape model MM is a three-dimensional shape model of the object T with a marker attached. A marker is a mark whose actual dimensions are known.

The marker scale estimation unit 110 estimates the scale value SC in the projection image of the marker three-dimensional shape model MM (see FIG. 8) by the same processing as that performed by the scale estimation unit 105 in the first embodiment. In the marker multi-viewpoint images MTG, however, the actual dimensions can be determined by using the imaged marker as a clue. Based on these actual dimensions, the real distance is obtained instead of the virtual distance.

Specifically, the image data acquisition unit 101 acquires marker multi-viewpoint images MTG, which are multi-viewpoint images of the object T to which the marker is attached.
The three-dimensional shape generation unit 102 generates the marker three-dimensional shape model MM using the marker multi-viewpoint images MTG.
The focus area detection unit 103 generates a blur map for the marker multi-viewpoint images MTG.
The distance calculation unit 104 calculates, for the in-focus area of the marker multi-viewpoint images MTG, the real distance of each pixel in that area.
The distance calculation unit 104 calculates the real distance of each pixel of the projection image of the marker three-dimensional shape model MM by associating each pixel of that projection image with the per-pixel real distance calculated above.
The distance calculation unit 104 calculates the most probable real distance in the projection image (the true value of the real distance) by statistically processing the variation in the real distances of the pixels of the projection image.
The marker scale estimation unit 110 estimates the scale value SC using the real distance (true value of the real distance) calculated by the distance calculation unit 104 and the virtual distance (true value of the virtual distance) calculated by the distance calculation unit 104 in the first embodiment.
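The last two steps above, reducing per-pixel distances to a most probable value and forming the scale value SC as the ratio of the real distance to the virtual distance, might be sketched as follows. The description only states that statistical processing is applied; the choice of the median and the function names here are assumptions made for illustration.

```python
from statistics import median


def most_probable_distance(pixel_distances):
    """Reduce per-pixel distances to one representative value.

    Assumption: the median stands in for the unspecified statistical
    processing that yields the "true value" of the distance.
    """
    return median(pixel_distances)


def estimate_scale(real_distances, virtual_distances):
    """Scale value SC: ratio of the real distance to the virtual distance."""
    real = most_probable_distance(real_distances)
    virtual = most_probable_distance(virtual_distances)
    if virtual <= 0:
        raise ValueError("virtual distance must be positive")
    return real / virtual


def to_real(virtual_distance, scale):
    """Convert a virtual distance into a real distance using SC."""
    return virtual_distance * scale
```

With real distances of about 0.5 (for example, in metres) and virtual distances of about 2.0 model units, SC is 0.25, so a virtual distance of 4.0 corresponds to a real distance of 1.0.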

As described above, the three-dimensional shape model generation device 1A of the second embodiment includes the marker scale estimation unit 110. The marker scale estimation unit 110 derives the real distance based on a corrected three-dimensional shape model obtained by scale-correcting the three-dimensional shape model M of a marker whose actual dimensions are known, based on those actual dimensions. The scale estimation unit 105 estimates the scale value SC using the real distance derived by the marker scale estimation unit 110. This makes it possible to estimate the scale value SC using a more accurate real distance.

(Modification 1 of the second embodiment)
Next, Modification 1 of the second embodiment will be described. This modification differs from the above-described embodiment in the contents of the learning data set used to train the learned model.

In this modification, the input of the learning data set is a corrected projection image. The corrected projection image is an image obtained by projecting the corrected marker three-dimensional shape model MM onto a two-dimensional plane. The corrected marker three-dimensional shape model MM is the marker three-dimensional shape model MM, generated from the marker multi-viewpoint images MTG (multi-viewpoint images of the object T to which the marker is attached), after it has been enlarged or reduced according to the actual dimensions of the marker so that it matches the actual dimensions of the object T to which the marker is attached.
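The enlargement or reduction of the marker three-dimensional shape model MM can be illustrated with a short sketch. It assumes, purely for illustration, that the marker's size is measured in the model as the distance between two marker points and that the model is a plain list of vertex coordinates; the function names are hypothetical.

```python
import math


def marker_scale_factor(marker_point_a, marker_point_b, actual_length):
    """Factor that maps model units to real units.

    marker_point_a, marker_point_b: two points of the marker in the
        (unscaled) model, e.g. both ends of one marker edge.
    actual_length: the known real length between those two points.
    """
    model_length = math.dist(marker_point_a, marker_point_b)
    if model_length == 0:
        raise ValueError("marker points must be distinct")
    return actual_length / model_length


def scale_model(vertices, factor):
    """Uniformly enlarge or reduce the model about the origin."""
    return [tuple(coord * factor for coord in vertex) for vertex in vertices]
```

A uniform scaling about the origin is used here because it preserves the shape of the model and changes only its size, which is all that the correction to actual dimensions requires.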

The output of the learning data set is information indicating the in-focus area in the corrected projection image, determined based on the depth value of each pixel in the corrected projection image.

As described above, in the three-dimensional shape model generation device 1A according to Modification 1 of the second embodiment, the input of the learning data set is a corrected projection image, obtained by projecting onto a two-dimensional plane a corrected three-dimensional shape model, that is, the three-dimensional shape model of a marker whose actual dimensions are known after scale correction based on those dimensions. The output of the learning data set is information indicating the in-focus area in the projection image, determined based on the depth value of each pixel in the corrected projection image. The device can therefore learn whether a pixel is in focus from the corrected projection image (that is, a projection image carrying actual-dimension information) and the depth value of each of its pixels (that is, the actual distance to the subject). The learned model thus becomes a model that infers whether a pixel is in focus according to the depth of field. Because the learned model infers focus according to the depth of field, the virtual distance can be kept within the range of the depth of field, reducing the error it contains.
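One way such in-focus labels could be derived from per-pixel depth values is sketched below. This is an illustration under stated assumptions, not the method of the embodiment: it uses the standard thin-lens depth-of-field approximation (hyperfocal distance and near/far limits), and the parameter names, units (millimetres), and circle-of-confusion value are hypothetical.

```python
import math


def depth_of_field(focal_length, f_number, coc, focus_distance):
    """Near and far limits of acceptable sharpness (same unit as inputs).

    Standard thin-lens approximation:
      H    = f^2 / (N * c) + f
      near = s * (H - f) / (H + s - 2f)
      far  = s * (H - f) / (H - s), or infinity when s >= H
    """
    hyperfocal = focal_length ** 2 / (f_number * coc) + focal_length
    near = (focus_distance * (hyperfocal - focal_length)
            / (hyperfocal + focus_distance - 2 * focal_length))
    if focus_distance >= hyperfocal:
        far = math.inf
    else:
        far = (focus_distance * (hyperfocal - focal_length)
               / (hyperfocal - focus_distance))
    return near, far


def focus_labels(depth_map, near, far):
    """Label each pixel True when its real depth lies within [near, far]."""
    return [[near <= depth <= far for depth in row] for row in depth_map]
```

Because the corrected projection image carries depths in real units, they can be compared directly with limits computed from camera parameters; with an uncorrected model the depths would be in arbitrary model units and such a comparison would not be meaningful.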

All or part of the three-dimensional shape model generation device 1 in the above-described embodiments may be realized by a computer. In that case, a program for realizing this function may be recorded on a computer-readable recording medium, and the program recorded on the medium may be read into and executed by a computer system. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" may further include media that hold the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or a communication line such as a telephone line, and media that hold the program for a fixed time, such as volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, may realize them in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA.

Although embodiments of the present invention have been described in detail above with reference to the drawings, the specific configuration is not limited to these embodiments and includes designs and the like that do not depart from the gist of the invention.

Description of symbols
1: Three-dimensional shape model generation device
101: Image data acquisition unit
102: Three-dimensional shape generation unit
103: Focus area detection unit
104: Distance calculation unit
105: Scale estimation unit
106: Scale conversion unit
107: Image data storage unit
108: Three-dimensional shape storage unit
109: Scale information storage unit
110: Marker scale estimation unit

Claims

a three-dimensional shape generation unit that generates a three-dimensional shape model of the object from a plurality of multi-viewpoint images obtained by imaging the object from different viewpoints;
a focus area detection unit that detects an in-focus area in each image of the multi-viewpoint image;
a distance calculation unit that calculates, for pixels in the in-focus area of each image of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and a viewpoint position in the projection image;
a scale estimation unit that estimates a scale value indicating the ratio of the real distance between the projection image and the viewpoint position in the projection image to the virtual distance;
a scale conversion unit that converts the virtual distance into the real distance using the scale value;
with
The focus area detection unit detects the in-focus area in each image of the multi-viewpoint images using a learned model generated by machine learning with a learning data set in which inputs and outputs are associated,
The input of the learning data set is an input image for learning,
The output of the learning data set is information indicating the in-focus area in the input image,
A three-dimensional shape model generation device.

The input of the learning data set is each image of multi-viewpoint images,
The output of the learning data set is information indicating the in-focus area in a corrected projection image, determined based on the depth value of each pixel in the corrected projection image, the corrected projection image being obtained by projecting onto a two-dimensional plane a corrected three-dimensional shape model obtained by scale-correcting the three-dimensional shape model of a marker whose actual dimensions are known, based on the actual dimensions of the marker,
The three-dimensional shape model generation device according to claim 1.

a three-dimensional shape generation unit that generates a three-dimensional shape model of the object from a plurality of multi-viewpoint images obtained by imaging the object from different viewpoints;
a focus area detection unit that detects an in-focus area in each image of the multi-viewpoint image;
a distance calculation unit that calculates, for pixels in the in-focus area of each image of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and a viewpoint position in the projection image;
a scale estimation unit that estimates a scale value indicating the ratio of the real distance between the projection image and the viewpoint position in the projection image to the virtual distance;
a scale conversion unit that converts the virtual distance into the real distance using the scale value;
with
The distance calculation unit determines whether to use the virtual distance at the corresponding pixel of the projection image for calculating the virtual distance of the projection image, based on at least one of: a result of determining whether a pixel determined to be in focus by the focus area detection unit is an edge; and a result of determining whether the pixel of the projection image, obtained by projecting the three-dimensional shape model onto the two-dimensional plane, that corresponds to the pixel determined to be in focus by the focus area detection unit is an edge,
A three-dimensional shape model generation device.

The scale estimation unit derives the real distance based on a depth of field calculated from camera parameters of the camera that captured the multi-viewpoint images,
The three-dimensional shape model generation device according to any one of claims 1 to 3.

a marker scale estimation unit that derives the real distance based on a corrected three-dimensional shape model obtained by scale-correcting the three-dimensional shape model of a marker whose actual dimensions are known, based on those actual dimensions;
The scale estimation unit estimates the scale value using the real distance derived by the marker scale estimation unit,
The three-dimensional shape model generation device according to any one of claims 1 to 3.

Based on the corrected three-dimensional shape model, the marker scale estimation unit derives the real distance based on the depth values of the pixels determined to be in focus among the pixels of a corrected projection image obtained by projecting the corrected three-dimensional shape model onto a two-dimensional plane,
The three-dimensional shape model generation device according to claim 5.

The scale estimation unit derives the real distance based on a distance obtained from the focus function of the camera that captured the multi-viewpoint images,
The three-dimensional shape model generation device according to any one of claims 1 to 3.

a three-dimensional shape generation unit that generates a three-dimensional shape model of the object from a plurality of multi-viewpoint images obtained by imaging the object from different viewpoints;
a focus area detection unit that detects an in-focus area in each image of the multi-viewpoint image;
a distance calculation unit that calculates, for pixels in the in-focus area of each image of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and a viewpoint position in the projection image;
a scale estimation unit that estimates a scale value indicating the ratio of the real distance between the projection image and the viewpoint position in the projection image to the virtual distance;
a scale conversion unit that converts the virtual distance into the real distance using the scale value;
with
The distance calculation unit calculates the virtual distance of each pixel of the projection image by associating a distance corresponding to the depth value of each pixel included in the detected in-focus area with a pixel of the projection image,
The focus area detection unit detects the degree of focus for each pixel,
The distance calculation unit weights the distance corresponding to the depth value of each pixel included in the detected in-focus area according to the degree of focus, and calculates the virtual distance of each pixel of the projection image by associating the weighted distance with a pixel of the projection image,
A three-dimensional shape model generation device.

a three-dimensional shape generation unit that generates a three-dimensional shape model of the object from a plurality of multi-viewpoint images obtained by imaging the object from different viewpoints;
a focus area detection unit that detects an in-focus area in each image of the multi-viewpoint image;
a distance calculation unit that calculates, for pixels in the in-focus area of each image of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and a viewpoint position in the projection image;
a scale estimation unit that estimates a scale value indicating the ratio of the real distance between the projection image and the viewpoint position in the projection image to the virtual distance;
a scale conversion unit that converts the virtual distance into the real distance using the scale value;
with
The distance calculation unit calculates the virtual distance of each pixel of the projection image by associating a distance corresponding to the depth value of each pixel included in the detected in-focus area with a pixel of the projection image, weights the distance corresponding to the depth value of each pixel included in the detected in-focus area according to the magnitude of that distance, and calculates the virtual distance of each pixel of the projection image by associating the weighted distance with a pixel of the projection image,
A three-dimensional shape model generation device.

a three-dimensional shape generation unit that generates a three-dimensional shape model of the object from a plurality of multi-viewpoint images obtained by imaging the object from different viewpoints;
a focus area detection unit that detects an in-focus area in each image of the multi-viewpoint image;
a distance calculation unit that calculates, for pixels in the in-focus area of each image of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and a viewpoint position in the projection image;
a scale estimation unit that estimates a scale value indicating the ratio of the real distance between the projection image and the viewpoint position in the projection image to the virtual distance;
a scale conversion unit that converts the virtual distance into the real distance using the scale value;
with
The distance calculation unit calculates a distance corresponding to the depth value of each pixel in the in-focus area of each image of the multi-viewpoint images, associates the calculated distance with a pixel of the projection image, and, when a plurality of distances are associated with a pixel of the projection image, compares the plurality of distances and determines, according to the degree of variation among them, whether to use the virtual distance at that pixel for calculating the virtual distance of the projection image,
A three-dimensional shape model generation device.

A three-dimensional shape model generation method performed by a computer,
A three-dimensional shape generation step in which a three-dimensional shape generation unit generates a three-dimensional shape model of the object from a plurality of multi-viewpoint images of the object taken from different viewpoints;
A focus area detection step in which a focus area detection unit detects an in-focus area in each image of the multi-viewpoint image;
A distance calculation step in which a distance calculation unit calculates, for pixels in the in-focus area of each image of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and a viewpoint position in the projection image;
a scale estimation step in which a scale estimation unit estimates a scale value indicating the ratio of the real distance between the projection image and the viewpoint position in the projection image to the virtual distance;
a scale conversion step in which a scale conversion unit converts the virtual distance into the real distance using the scale value;
including
In the focus area detection step, the in-focus area in each image of the multi-viewpoint images is detected using a learned model generated by machine learning with a learning data set in which inputs and outputs are associated,
The input of the learning data set is an input image for learning,
The output of the learning data set is information indicating the in-focus area in the input image,
A three-dimensional shape model generation method.

A three-dimensional shape model generation method performed by a computer,
A three-dimensional shape generation step in which a three-dimensional shape generation unit generates a three-dimensional shape model of the object from a plurality of multi-viewpoint images of the object taken from different viewpoints;
A focus area detection step in which a focus area detection unit detects an in-focus area in each image of the multi-viewpoint image;
A distance calculation step in which a distance calculation unit calculates, for pixels in the in-focus area of each image of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and a viewpoint position in the projection image;
a scale estimation step in which a scale estimation unit estimates a scale value indicating the ratio of the real distance between the projection image and the viewpoint position in the projection image to the virtual distance;
a scale conversion step in which a scale conversion unit converts the virtual distance into the real distance using the scale value;
including
In the distance calculation step, whether to use the virtual distance at the corresponding pixel of the projection image for calculating the virtual distance of the projection image is determined based on at least one of: a result of determining whether a pixel determined to be in focus in the focus area detection step is an edge; and a result of determining whether the pixel of the projection image, obtained by projecting the three-dimensional shape model onto the two-dimensional plane, that corresponds to the pixel determined to be in focus in the focus area detection step is an edge,
A three-dimensional shape model generation method.

A three-dimensional shape model generation method performed by a computer,
A three-dimensional shape generation step in which a three-dimensional shape generation unit generates a three-dimensional shape model of the object from a plurality of multi-viewpoint images of the object taken from different viewpoints;
A focus area detection step in which a focus area detection unit detects an in-focus area in each image of the multi-viewpoint image;
A distance calculation step in which a distance calculation unit calculates, for pixels in the in-focus area of each image of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and a viewpoint position in the projection image;
a scale estimation step in which a scale estimation unit estimates a scale value indicating the ratio of the real distance between the projection image and the viewpoint position in the projection image to the virtual distance;
a scale conversion step in which a scale conversion unit converts the virtual distance into the real distance using the scale value;
including
In the distance calculation step, the virtual distance of each pixel of the projection image is calculated by associating a distance corresponding to the depth value of each pixel included in the detected in-focus area with a pixel of the projection image,
In the focus area detection step, the degree of focus is detected for each pixel,
In the distance calculation step, the distance corresponding to the depth value of each pixel included in the detected in-focus area is weighted according to the degree of focus, and the virtual distance of each pixel of the projection image is calculated by associating the weighted distance with a pixel of the projection image,
A three-dimensional shape model generation method.

A three-dimensional shape model generation method performed by a computer,
A three-dimensional shape generation step in which a three-dimensional shape generation unit generates a three-dimensional shape model of the object from a plurality of multi-viewpoint images of the object taken from different viewpoints;
A focus area detection step in which a focus area detection unit detects an in-focus area in each image of the multi-viewpoint image;
A distance calculation step in which a distance calculation unit calculates, for pixels in the in-focus area of each image of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and a viewpoint position in the projection image;
a scale estimation step in which a scale estimation unit estimates a scale value indicating the ratio of the real distance between the projection image and the viewpoint position in the projection image to the virtual distance;
a scale conversion step in which a scale conversion unit converts the virtual distance into the real distance using the scale value;
including
In the distance calculation step, the virtual distance of each pixel of the projection image is calculated by associating a distance corresponding to the depth value of each pixel included in the detected in-focus area with a pixel of the projection image, the distance corresponding to the depth value of each pixel included in the detected in-focus area is weighted according to the magnitude of that distance, and the virtual distance of each pixel of the projection image is calculated by associating the weighted distance with a pixel of the projection image,
A three-dimensional shape model generation method.

A three-dimensional shape model generation method performed by a computer,
A three-dimensional shape generation step in which a three-dimensional shape generation unit generates a three-dimensional shape model of the object from a plurality of multi-viewpoint images of the object taken from different viewpoints;
A focus area detection step in which a focus area detection unit detects an in-focus area in each image of the multi-viewpoint image;
A distance calculation step in which a distance calculation unit calculates, for pixels in the in-focus area of each image of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and a viewpoint position in the projection image;
a scale estimation step in which a scale estimation unit estimates a scale value indicating the ratio of the real distance between the projection image and the viewpoint position in the projection image to the virtual distance;
a scale conversion step in which a scale conversion unit converts the virtual distance into the real distance using the scale value;
including
In the distance calculation step, a distance corresponding to the depth value of each pixel in the in-focus area of each image of the multi-viewpoint images is calculated, the calculated distance is associated with a pixel of the projection image, and, when a plurality of distances are associated with a pixel of the projection image, the plurality of distances are compared and whether to use the virtual distance at that pixel for calculating the virtual distance of the projection image is determined according to the degree of variation among them,
A three-dimensional shape model generation method.

the computer,
3D shape generation means for generating a 3D shape model of an object from a plurality of multi-viewpoint images of the object taken from different viewpoints;
Focus area detection means for detecting an in-focus area in each image of the multi-viewpoint image;
Distance calculation means for calculating, for pixels in the in-focus area of each image of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and a viewpoint position in the projection image;
scale estimation means for estimating a scale value indicating the ratio of the real distance between the projection image and the viewpoint position in the projection image to the virtual distance;
scale conversion means for converting the virtual distance into the real distance using the scale value;
A program for operating as
The distance calculation means calculates the virtual distance of each pixel of the projection image by associating a distance corresponding to the depth value of each pixel included in the detected in-focus area with a pixel of the projection image, weights the distance corresponding to the depth value of each pixel included in the detected in-focus area according to the magnitude of that distance, and calculates the virtual distance of each pixel of the projection image by associating the weighted distance with a pixel of the projection image,
program.

the computer,
3D shape generation means for generating a 3D shape model of an object from a plurality of multi-viewpoint images of the object taken from different viewpoints;
Focus area detection means for detecting an in-focus area in each image of the multi-viewpoint image;
Distance calculation means for calculating, for pixels in the in-focus area of each image of the multi-viewpoint images, a virtual distance, which is a virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and a viewpoint position in the projection image;
scale estimation means for estimating a scale value indicating the ratio of the real distance between the projection image and the viewpoint position in the projection image to the virtual distance;
scale conversion means for converting the virtual distance into the real distance using the scale value;
A program for operating as
The distance calculation means determines whether to use the virtual distance at the corresponding pixel of the projection image for calculating the virtual distance of the projection image, based on at least one of: a result of determining whether a pixel determined to be in focus by the focus area detection means is an edge; and a result of determining whether the pixel of the projection image, obtained by projecting the three-dimensional shape model onto the two-dimensional plane, that corresponds to the pixel determined to be in focus by the focus area detection means is an edge,
program.

A program for causing a computer to operate as:
3D shape generation means for generating a three-dimensional shape model of an object from multi-viewpoint images, which are a plurality of images of the object captured from mutually different viewpoints;
focus area detection means for detecting an in-focus area in each of the multi-viewpoint images;
distance calculation means for calculating, for pixels in the in-focus area in each of the multi-viewpoint images, a virtual distance, which is the virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projection image;
scale estimation means for estimating a scale value indicating the ratio, to the virtual distance, of a real distance, which is the actual distance between the projection image and the viewpoint position of the projection image; and
scale conversion means for converting the virtual distance into the real distance using the scale value,
wherein the distance calculation means calculates the virtual distance for each pixel of the projection image by associating the distance corresponding to the depth value of each pixel included in the detected in-focus area with each pixel of the projection image,
the focus area detection means detects a degree of focus for each pixel, and
the distance calculation means weights the distance corresponding to the depth value of each pixel included in the detected in-focus area according to the degree of focus, and associates the weighted distance with each pixel of the projection image, thereby calculating the virtual distance for each pixel of the projection image.
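
As a hedged illustration of focus-degree weighting, the sketch below uses the absolute value of a 4-neighbour Laplacian as the per-pixel degree of focus. That measure is an assumption chosen for simplicity; the claim does not name one. The function names `focus_degree` and `focus_weighted_distance` are likewise hypothetical.

```python
import numpy as np

def focus_degree(image):
    """Per-pixel focus measure: |4-neighbour Laplacian| (ASSUMED measure)."""
    img = np.pad(np.asarray(image, dtype=float), 1, mode="edge")
    lap = (img[:-2, 1:-1] + img[2:, 1:-1]      # up + down
           + img[1:-1, :-2] + img[1:-1, 2:]    # left + right
           - 4.0 * img[1:-1, 1:-1])            # centre
    return np.abs(lap)

def focus_weighted_distance(pixel_ids, distances, focus, num_pixels):
    """Average the distances mapped to each projection pixel, weighted by focus degree."""
    pixel_ids = np.asarray(pixel_ids)
    distances = np.asarray(distances, dtype=float)
    focus = np.asarray(focus, dtype=float)
    weighted_sum = np.bincount(pixel_ids, weights=focus * distances, minlength=num_pixels)
    weight_total = np.bincount(pixel_ids, weights=focus, minlength=num_pixels)
    result = np.full(num_pixels, np.nan)   # NaN where no weighted sample exists
    valid = weight_total > 0
    result[valid] = weighted_sum[valid] / weight_total[valid]
    return result
```

Sharper samples therefore dominate the virtual distance of a pixel, while samples with zero focus degree contribute nothing.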

A program for causing a computer to operate as:
3D shape generation means for generating a three-dimensional shape model of an object from multi-viewpoint images, which are a plurality of images of the object captured from mutually different viewpoints;
focus area detection means for detecting an in-focus area in each of the multi-viewpoint images;
distance calculation means for calculating, for pixels in the in-focus area in each of the multi-viewpoint images, a virtual distance, which is the virtual distance between a projection image obtained by projecting the three-dimensional shape model onto a two-dimensional plane and the viewpoint position of the projection image;
scale estimation means for estimating a scale value indicating the ratio, to the virtual distance, of a real distance, which is the actual distance between the projection image and the viewpoint position of the projection image; and
scale conversion means for converting the virtual distance into the real distance using the scale value,
wherein the distance calculation means calculates the distance corresponding to the depth value of each pixel in the in-focus area in each of the multi-viewpoint images, associates the calculated distance with a pixel of the projection image, and, when a plurality of distances are associated with a pixel of the projection image, compares the plurality of distances and determines, according to the degree of variation among the plurality of distances, whether to use the virtual distance at that pixel of the projection image for calculating the virtual distance in the projection image.
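
The variation check in this claim can be sketched as below. The claim leaves the variation measure and the decision rule open; this example assumes the standard deviation of the distances mapped to a pixel and a fixed threshold `max_std`. The function name and parameters are hypothetical.

```python
import numpy as np

def consistent_pixel_distances(pixel_ids, distances, num_pixels, max_std):
    """Keep a projection pixel only if the distances mapped to it agree.

    Returns (mean_distance, keep). mean_distance is NaN for rejected or
    empty pixels. ASSUMPTION: variation is measured by the standard deviation.
    """
    pixel_ids = np.asarray(pixel_ids)
    distances = np.asarray(distances, dtype=float)

    count = np.bincount(pixel_ids, minlength=num_pixels)
    total = np.bincount(pixel_ids, weights=distances, minlength=num_pixels)
    total_sq = np.bincount(pixel_ids, weights=distances ** 2, minlength=num_pixels)

    has_data = count > 0
    mean = np.full(num_pixels, np.nan)
    mean[has_data] = total[has_data] / count[has_data]

    # Population variance E[d^2] - E[d]^2, clipped at zero against round-off.
    var = np.zeros(num_pixels)
    var[has_data] = np.maximum(total_sq[has_data] / count[has_data] - mean[has_data] ** 2, 0.0)

    keep = has_data & (np.sqrt(var) <= max_std)
    mean[~keep] = np.nan
    return mean, keep
```

A pixel that receives a single distance has zero variation and is kept under this rule; a stricter variant could require a minimum number of agreeing views.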