JP2022042153A

JP2022042153A - Three-dimensional shooting model generation system and three-dimensional shooting model generation method

Info

Publication number: JP2022042153A
Application number: JP2020147419A
Authority: JP
Inventors: 創小谷; Hajime Kotani; 達彦小林; Tatsuhiko Kobayashi
Original assignee: CRESCENT Inc
Current assignee: CRESCENT Inc
Priority date: 2020-09-02
Filing date: 2020-09-02
Publication date: 2022-03-14
Also published as: WO2022050159A1; TW202211680A

Abstract

To provide a three-dimensional shooting model generation system that can be realized with a simple configuration and that anyone can easily enjoy and handle. SOLUTION: A three-dimensional shooting model generation system includes cameras 201 to 204 which are a plurality of photographing devices 20 for photographing a subject OB from a plurality of directions, and a controller 110 and an image processing server 120 which are arithmetic processing devices 10 configured to generate a three-dimensional photographing model of the subject OB on the basis of image data obtained by photographing the subject OB by the plurality of photographing devices 201 to 204. The depth of field DOF of an image pickup lens system 221 used in the cameras 201 to 204 is set to ±0.5 to ±1.5 m centered on a focal position CIF on an optical axis Ax. SELECTED DRAWING: Figure 2

Description

The present invention relates to a live-action stereoscopic photography technique (volumetric capture) for generating a three-dimensional model of a subject based on image data of the subject taken from a plurality of directions.

In recent years, live-action stereoscopic photography (volumetric capture), which can generate a three-dimensional model of a subject by photographing the movement of a target person or object from all directions over 360 degrees without using markers, has attracted attention. With this technique, a subject is extracted frame by frame from moving images of a person or the like captured synchronously by a plurality of cameras and converted into a three-dimensional model, so that the subject model can be placed at any position in a virtual space and its movement freely reproduced on a display. An image computed from images taken from a plurality of viewpoints and reproduced as seen from an arbitrary viewpoint is called a "free viewpoint image" or a "virtual viewpoint image". For example, Patent Document 1 discloses a system that generates such virtual viewpoint images.

Japanese Unexamined Patent Publication No. 2017-211827

In volumetric capture, as described above, the background must be removed from each captured frame image and the target subject extracted. Conventional methods for distinguishing the subject from the background include projecting mesh light onto the subject and using a green screen as the background. However, these methods require a projector for the mesh light, a green screen, or similar equipment at the shooting location, so shooting is possible only in a suitably equipped studio or the like. Another conceivable method is to extract the subject by applying object extraction processing to the captured image data. However, extracting objects (subject images) frame by frame from high-definition moving image data of several tens of frames per second taken from a plurality of viewpoints requires a large platform with high-speed, large-capacity processing capability.

The present invention has been made in view of these conventional problems, and its object is to provide a stereoscopic photography model generation system that can be realized with a simple configuration and that anyone can easily enjoy and handle, and a stereoscopic photography model generation method executed by that system.

To solve the above problems, the present invention is a stereoscopic photography model generation system including a plurality of photographing devices for photographing a subject from a plurality of directions, and an arithmetic processing device configured to generate a stereoscopic photography model of the subject based on image data obtained by the plurality of photographing devices photographing the subject, wherein the depth of field of the imaging lens system used in the photographing devices is set to ±0.5 to ±1.5 m centered on the in-focus position on the optical axis.

In the stereoscopic photography model generation system, the ratio of the edge detection accuracy when the subject is outside the depth of field to the maximum value of the edge detection accuracy of the contour when the subject is within the depth of field is preferably 0.7 or less.

Further, in the stereoscopic photography model generation system, the edge detection accuracy is preferably calculated based on the first derivative of the luminance change at the boundary of the contour. Alternatively, the edge detection accuracy may be calculated based on the second derivative of the luminance change at the boundary of the contour.

Further, in the stereoscopic photography model generation system, the arithmetic processing device preferably has a free viewpoint image generation unit that generates a free viewpoint image based on the stereoscopic photography model.

The present invention is also a stereoscopic photography model generation method performed in a system that includes a plurality of photographing devices and an arithmetic processing device that generates a stereoscopic photography model of a subject based on image data obtained by the plurality of photographing devices photographing the subject, the depth of field of the imaging lens system used in the photographing devices being set to ±0.5 to ±1.5 m centered on the in-focus position on the optical axis. The method is a three-dimensional image generation method including a step of photographing the subject from a plurality of directions with the plurality of photographing devices, and, as processing executed by the arithmetic processing device, a step of detecting the contour of the subject from the image data captured by the plurality of photographing devices and a step of generating a stereoscopic photography model of the subject based on the subject image data bounded by the detected contour.

In the three-dimensional image generation method, the ratio of the edge detection accuracy when the subject is outside the depth of field to the maximum value of the edge detection accuracy of the contour when the subject is within the depth of field is preferably 0.7 or less.

Further, in the three-dimensional image generation method, the edge detection accuracy is preferably calculated based on the first derivative of the luminance change at the boundary of the contour. Alternatively, the edge detection accuracy may be calculated based on the second derivative of the luminance change at the boundary of the contour.

Further, in the three-dimensional image generation method, the processing executed by the arithmetic processing device preferably further includes a step of generating a free viewpoint image based on the stereoscopic photography model.

Further, in the three-dimensional image generation method, the image data obtained by the plurality of photographing devices photographing the subject is preferably moving image data, and the arithmetic processing device preferably synchronizes the plurality of moving image data to generate a dynamic stereoscopic photography model of the subject.

According to the present invention, it is possible to provide a stereoscopic photography model generation system that can be realized with a simple configuration and that anyone can easily enjoy and handle, and a stereoscopic photography model generation method executed by that system.

A block diagram of a stereoscopic photography model generation system according to one embodiment of the present invention.
A diagram for explaining the positional relationship between a camera and the depth of field.
A diagram for explaining a method of measuring edge detection accuracy (sharpness) using a test pattern.
A functional block diagram of a camera adapter.
A functional block diagram of the control box and its peripheral parts.
A functional block diagram of the image processing server.
A block diagram of a stereoscopic photography model generation system according to another embodiment of the present invention.

A stereoscopic photography model generation system according to a preferred embodiment of the present invention includes an arithmetic processing device that performs processing such as generating a three-dimensional model of a subject, and a plurality of photographing devices for photographing the subject from a plurality of directions. In this specification, "subject" and "object", as photographed by a camera, are used with the same meaning. The subject is not limited to a person and may be a thing (a movable object). The "stereoscopic photography model" of the object generated by this system is also referred to as a "three-dimensional model" or a "three-dimensional moving image model". In the embodiments described below, an "image" is assumed to be a moving image and the "three-dimensional model" includes a time parameter for expressing motion; note, however, that in practicing the present invention an "image" also includes still images such as photographs.

A specific embodiment is described below for an example in which the object (subject) OB is a person and the movement of that person on the spot over a certain period of time is converted into a three-dimensional model.

FIG. 1 is a block diagram of a stereoscopic photography model generation system according to one embodiment of the present invention. The arithmetic processing device 10 includes a controller 110 on the user side and an image processing server 120, which is a cloud server system on the Internet N. The control box 111, which is the main body of the controller 110, can be a dedicated computer device or a general-purpose PC (Personal Computer). To assist the user's operation, the controller 110 may also include a terminal device 113, such as a smartphone or tablet PC, wirelessly connected to the control box 111.

The photographing device 20 includes a plurality of cameras 201 to 204 capable of capturing digital moving images. The bodies of the cameras 201 to 204 are equipped with camera adapters 211 to 214, respectively. In the embodiment of FIG. 1, the photographing device 20 consists of four cameras, but the number of cameras is not limited to this and may be increased or decreased as appropriate, for example within a range of about 3 to 24.

The cameras 201 to 204 are mounted on tripods (not shown) placed at predetermined positions at the four corners and/or along the sides of a mat 230, for example 5 m square, laid on the floor. The imaging lens system 221 of each camera 201 to 204 is aimed in advance at the center of the mat 230, where the subject OB stands. The mat 230 is preferably green so that it can be distinguished from the subject and the subject extraction processing is made easier. The shape of the mat is not limited to a quadrangle and may be polygonal or circular depending on the number of cameras.

As shown in FIG. 2, the imaging lens system 221 used in the cameras 201 to 204 of this embodiment has a depth of field (DOF) set to ±0.5 to ±1.5 m centered on the center-in-focus position (CIF), the position on the optical axis Ax at which the lens is exactly in focus. The optical characteristics of the cameras 201 to 204 are assumed to be common, with no variation in the depth-of-field characteristics themselves among the cameras 201 to 204.

The depth of field is the range (field) of distances on the subject side over which the captured image of the subject appears to be in focus. Scenes farther or nearer than the depth of field appear blurred in the captured image. A "deep" depth of field means that the in-focus range is relatively wide, and a "shallow" depth of field means that the in-focus range is relatively narrow. As described in detail later, one feature of the stereoscopic photography model generation system of this embodiment is that it uses a lens with a shallower depth of field than conventional systems.

The depth-of-field characteristics of a camera are also affected by the focal length and aperture (F-number) of the imaging lens. As shown in FIG. 2, for the imaging lens system 221 of this embodiment, a lens is selected whose depth of field is ±0.5 to ±1.5 m centered on the exactly in-focus position CIF, in accordance with the shooting environment and conditions such as the size of the shooting location and the range over which the object (person) OB moves. In the example of FIG. 2, the distance from the camera 201 to the center position CIF of the depth of field was 3.5 m, and the F-number was 1.
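The dependence of the depth of field on focal length and F-number can be illustrated with the standard thin-lens approximation based on a permissible circle of confusion. The following is a minimal sketch, not the measurement method of this embodiment: the 25 mm focal length and the 0.03 mm circle of confusion are assumptions chosen only for illustration, while the 3.5 m focus distance and F-number of 1 come from the example of FIG. 2.

```python
def depth_of_field(focal_mm, f_number, focus_m, coc_mm=0.03):
    """Return (near_m, far_m) limits of acceptable focus, thin-lens model.

    focal_mm and coc_mm are illustrative assumptions, not values
    stated in the specification.
    """
    f = focal_mm
    s = focus_m * 1000.0  # focus distance in millimetres
    hyperfocal = f * f / (f_number * coc_mm) + f
    near = s * (hyperfocal - f) / (hyperfocal + s - 2.0 * f)
    if s >= hyperfocal:
        far = float("inf")  # everything beyond the near limit is acceptably sharp
    else:
        far = s * (hyperfocal - f) / (hyperfocal - s)
    return near / 1000.0, far / 1000.0


# Focus distance and F-number follow the example of FIG. 2.
near, far = depth_of_field(focal_mm=25.0, f_number=1.0, focus_m=3.5)
print(f"near limit {near:.2f} m, far limit {far:.2f} m")
```

Under these assumed values the in-focus range is roughly 3.0 m to 4.2 m, which is of the same order as the ±0.5 to ±1.5 m range described above. In this model the far side of the range is slightly longer than the near side, so the range is only approximately centered on the focus position.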

The permissible circle of confusion diameter is one criterion for determining whether a position on the optical axis lies within the depth of field, in other words, for measuring the boundaries of the depth of field on the optical axis. However, as in this embodiment, the depth of field, that is, the boundary of the field, may instead be determined from the edge detection accuracy (also called "sharpness") of the subject contour. More specifically, the boundary can be determined by whether the ratio of the edge detection accuracy when the subject is outside the depth of field to the maximum edge detection accuracy when the subject is within the depth of field is 0.7 or less (that is, a 30% reduction from the maximum).

The edge detection accuracy (sharpness) of a contour can be measured accurately, for example as shown in FIG. 3, by using a test pattern TP in which black and white are clearly separated and evaluating the luminance change at the boundary where black and white invert. First, the maximum sharpness SEmax is measured with the test pattern TP at the center position CIF of the depth of field DOF. Next, the sharpness SE is measured while the position of the test pattern TP on the optical axis Ax is gradually moved, starting from the CIF, toward the side nearer the camera 201 (minus direction) and toward the side farther from it (plus direction). The positions in front of and behind the CIF at which the sharpness ratio SE/SEmax becomes 0.7 can then be determined to be the boundaries of the depth of field DOF.
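The boundary search described here can be sketched as follows. This is a minimal illustration under stated assumptions: the sample positions and sharpness values are made up, the function name `dof_boundaries` is hypothetical, and linear interpolation between neighbouring samples is an added assumption for locating the crossing between measurement points.

```python
def dof_boundaries(positions, sharpness, ratio=0.7):
    """Locate where SE / SEmax falls to `ratio` on each side of the peak.

    positions: offsets from the CIF in metres, in ascending order
               (negative = nearer the camera).
    sharpness: measured SE at each position.
    Returns (near, far); an entry is None if the ratio is never reached.
    """
    se_max = max(sharpness)
    peak = sharpness.index(se_max)
    target = ratio * se_max

    def crossing(indices):
        # Walk away from the peak; interpolate between the last sample
        # above the target and the first sample at or below it.
        prev = peak
        for i in indices:
            if sharpness[i] <= target:
                x0, x1 = positions[prev], positions[i]
                y0, y1 = sharpness[prev], sharpness[i]
                return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
            prev = i
        return None

    near = crossing(range(peak - 1, -1, -1))
    far = crossing(range(peak + 1, len(positions)))
    return near, far


# Made-up measurements of a test pattern moved along the optical axis.
positions = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
sharpness = [40.0, 60.0, 90.0, 100.0, 90.0, 60.0, 40.0]
near, far = dof_boundaries(positions, sharpness)
print(f"DOF boundaries: {near:+.2f} m to {far:+.2f} m")
```

With these sample values the boundaries fall at about 0.83 m on either side of the CIF, inside the ±0.5 to ±1.5 m range of this embodiment.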

The edge detection accuracy can be calculated, for example, based on the first derivative of the luminance change at the boundary of the contour (gradient method). In the gradient method, the sharpness can be calculated from the vertical-axis magnitude of the maximum or minimum value of the first derivative of the luminance. The edge detection accuracy may also be calculated based on the second derivative of the luminance change at the boundary of the contour (Laplacian method). In the Laplacian method, the sharpness can be calculated from the magnitude of the amplitude on the horizontal axis that forms the inflection point of the luminance. The method of calculating the edge detection accuracy is not limited to the gradient method and the Laplacian method described above.
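Both measures can be illustrated on a one-dimensional luminance profile sampled across a black-to-white boundary. The sketch below uses finite differences as stand-ins for the derivatives. The function names are hypothetical, and the Laplacian measure used here (the swing between the positive and negative lobes of the second difference) is an assumption made for illustration and may differ from the exact quantity intended in the text.

```python
def gradient_sharpness(profile):
    """Peak magnitude of the first difference of a luminance profile."""
    return max(abs(b - a) for a, b in zip(profile, profile[1:]))


def laplacian_sharpness(profile):
    """Swing of the second difference around the edge (illustrative measure)."""
    second = [
        profile[i - 1] - 2 * profile[i] + profile[i + 1]
        for i in range(1, len(profile) - 1)
    ]
    return max(second) - min(second)


# Luminance across a black/white boundary: in focus versus defocused.
sharp_edge = [0, 0, 0, 255, 255, 255]
blurred_edge = [0, 0, 64, 128, 191, 255, 255]

ratio = gradient_sharpness(blurred_edge) / gradient_sharpness(sharp_edge)
print(f"gradient sharpness ratio: {ratio:.2f}")
print(laplacian_sharpness(sharp_edge), laplacian_sharpness(blurred_edge))
```

For these sample profiles the blurred edge scores well below 0.7 of the sharp edge under either measure, which corresponds to a test pattern placed outside the depth of field.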

Next, the camera adapters 211 to 214 are connected to a switching hub 300 via communication cables 240 conforming to a standard such as Ethernet (registered trademark) or USB (Universal Serial Bus). The controller 110 (control box 111) acquires, via the switching hub 300, the data of the images captured by the cameras 201 to 204 and camera parameter information.

The system of the embodiment of FIG. 1 has a so-called star network structure in which each of the cameras 201 to 204 and camera adapters 211 to 214 is connected to a single switching hub 300. However, to cope with the larger data volumes that accompany higher definition, such as 4K or 8K, and higher frame rates of the captured images, the plurality of cameras 201 to 204 may instead be connected in a daisy chain in which adjacent camera adapters are cascaded.
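The scale of the data volume involved can be estimated with simple arithmetic. The sketch below computes the uncompressed data rate; the 24 bits per pixel and 30 frames per second are assumed values for illustration and are not specified in this embodiment.

```python
def raw_data_rate_gbps(width, height, bits_per_pixel, fps, cameras=1):
    """Uncompressed video data rate in gigabits per second."""
    return width * height * bits_per_pixel * fps * cameras / 1e9


# Assumed: 4K UHD (3840 x 2160), 24-bit colour, 30 frames per second.
per_camera = raw_data_rate_gbps(3840, 2160, 24, 30)
four_cameras = raw_data_rate_gbps(3840, 2160, 24, 30, cameras=4)
print(f"{per_camera:.2f} Gbit/s per camera, {four_cameras:.2f} Gbit/s for four")
```

Under these assumptions a single uncompressed 4K stream is already close to 6 Gbit/s, which indicates why the network structure and the compression step matter as resolution and frame rate increase.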

FIG. 4 is a functional block diagram of the inside of the camera adapter 211. The other camera adapters 212 to 214 have the same configuration, so their description is omitted. As shown in the figure, the camera adapter 211 is broadly divided into a camera control unit 215, a correction processing unit 216, and a frame generation/transmission unit 217. The camera control unit 215 has a lens adjustment unit 2151, a shooting operation control unit 2152, and a shooting parameter setting unit 2153.

The lens adjustment unit 2151 electrically adjusts parameters of the imaging lens system 221 of the camera 201 such as focus, zoom, and exposure. The focus and other settings of the imaging lens system 221 may be adjusted automatically, by a command from outside (for example, from the controller 110), or manually by the user while viewing a monitor screen.

The shooting operation control unit 2152 controls the starting and stopping of shooting in synchronization with a genlock signal or the like from outside (for example, from the controller 110).

The shooting parameter setting unit 2153 sets shooting parameters such as image resolution (number of pixels), color depth, frame rate, and white balance, and submits them to the shooting operation control unit 2152. The shooting parameters may be specified from outside (for example, from the controller 110) or set automatically by the shooting parameter setting unit 2153.

The correction processing unit 216 applies color correction, correction of blur caused by camera vibration, correction of lens-specific distortion, and the like to the raw image data held in the cache memory 218.

The frame generation/transmission unit 217 converts the format of the image data in the frame memory 219 to generate frame data. Meta information (attached information), such as a time code indicating the time of capture, identifier information of the capturing camera, and camera parameters (focus value, zoom value, lens characteristics, and so on), is added to the frame data, which is then stored as a file. The communication adapter 2171 of the frame generation/transmission unit 217 transmits the created frame image data to the controller 110.

Next, the configuration of the controller 110 is described. FIG. 5 is a functional block diagram of the control box 111, which is the main body of the controller 110, and its peripheral parts. The control box 111 has a data input unit 1111, a data synchronization unit 1112, a calibration unit 1113, a data compression unit 1114, a storage 1115, a network adapter unit 1116, a format conversion unit 1117, a rendering processing unit 1118, and a user interface unit 1119.

The data input unit 1111 is connected to the camera adapters 211 to 214 via the communication cables 240 and the switching hub 300, and receives in real time the frame image data, that is, the data of the images captured by the cameras 201 to 204.

The data synchronization unit 1112 temporarily stores the acquired image data in an internal high-speed DRAM (Dynamic Random Access Memory) and buffers it until the data from all of the cameras 201 to 204 has arrived. A time code (time information) and camera identifier information are attached to the image data, and the data synchronization unit 1112 performs time synchronization control of the image data based on the time codes. The data synchronization unit 1112 may also synchronize the image data using, for example, a feature point matching method.
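The time-code-based buffering can be sketched as follows. This is a minimal illustration, not the actual implementation of the data synchronization unit 1112: the `Frame` and `FrameSynchronizer` names, the string time-code format, and the in-memory dictionary buffer are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    camera_id: int   # camera identifier information
    timecode: str    # time code attached by the camera adapter
    pixels: bytes    # image payload (placeholder)


class FrameSynchronizer:
    """Buffer frames until every camera has delivered the same time code."""

    def __init__(self, camera_ids):
        self.camera_ids = frozenset(camera_ids)
        self._pending = defaultdict(dict)  # timecode -> {camera_id: Frame}

    def push(self, frame):
        """Store one frame; return the complete set for its time code, else None."""
        group = self._pending[frame.timecode]
        group[frame.camera_id] = frame
        if self.camera_ids.issubset(group):
            del self._pending[frame.timecode]
            return [group[cid] for cid in sorted(self.camera_ids)]
        return None


sync = FrameSynchronizer([201, 202, 203, 204])
result = None
for cam in (203, 201, 204, 202):  # frames may arrive in any order
    result = sync.push(Frame(cam, "00:00:01:00", b""))
print([f.camera_id for f in result])
```

A set of frames is released only when the last camera's frame for a given time code arrives, so downstream processing always sees one frame per camera for the same instant.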

Because of slight differences in the distance between each camera 201 to 204 and the subject OB, differences in lens-specific characteristics, and the like, the dimensions of the subject OB captured in each image, relative to absolute coordinates, may differ from image to image. To calibrate such dimensional errors, the calibration unit 1113 normalizes the size of each image. The calibration unit 1113 may at the same time apply corrections that match the brightness and contrast of the image data.

The synchronized and normalized image data is compressed into a predetermined format such as MPEG by the data compression unit 1114 and transmitted to the image processing server 120 via the network adapter unit 1116. As described later, the image processing server 120 is a cloud platform that performs high-speed arithmetic processing to generate a three-dimensional moving image model of the subject (person) OB based on the frame image data from the cameras 201 to 204. The image data may also be stored in the internal storage 1115, which is, for example, a large-capacity SSD (Solid State Drive).

The three-dimensional moving image model data of the subject OB returned from the image processing server 120 to the network adapter unit 1116 undergoes processing such as decompression in the format conversion unit 1117, where the time information and coordinate information are separated and the data is converted into per-frame polygon mesh model data. The converted polygon mesh model data is then stored in the storage 1115 in association with time information such as the frame rate.

The rendering processing unit 1118 converts the three-dimensional model of the subject OB stored in the storage 1115 into a two-dimensional free viewpoint image that can be played back on the display 112 or on an external monitoring device (for example, the smartphone 113). Via the user interface unit 1119, and by operating the smartphone 113 if necessary, the user can specify the viewpoint position to the rendering processing unit 1118. The viewpoint may be at any position around the full 360° of the subject OB, and the height of the viewpoint can also be changed freely within a limited range. The user can also instruct that the viewpoint position be moved and the viewing direction changed while the three-dimensional model moving image is being played back. Furthermore, the user can display on the display 112 or the like an arbitrary viewpoint image in which a specific part of the three-dimensional model image is zoomed in or out.
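The conversion from a three-dimensional model to a two-dimensional image for a chosen viewpoint can be illustrated with a pinhole projection. The sketch below is not the rendering method of this embodiment: the function name, the coordinate convention (subject at the origin, y axis vertical), and the simple look-at camera are assumptions, and only vertex positions are projected, without surfaces or texture.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    n = math.sqrt(_dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def project_from_viewpoint(points, azimuth_deg, radius, height=0.0, focal=1.0):
    """Project 3D points onto the image plane of a camera orbiting the origin.

    The camera sits at the given azimuth, radius (> 0) and height, and looks
    at the origin. Returns (u, v) per point, or None if it is behind the camera.
    """
    theta = math.radians(azimuth_deg)
    cam = (radius * math.sin(theta), height, radius * math.cos(theta))
    forward = _normalize(_sub((0.0, 0.0, 0.0), cam))
    right = _normalize(_cross(forward, (0.0, 1.0, 0.0)))
    up = _cross(right, forward)
    image = []
    for p in points:
        d = _sub(p, cam)
        depth = _dot(d, forward)
        if depth <= 0.0:
            image.append(None)
            continue
        image.append((focal * _dot(d, right) / depth,
                      focal * _dot(d, up) / depth))
    return image


vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
for azimuth in (0, 90, 180):
    print(azimuth, project_from_viewpoint(vertices, azimuth, radius=4.0))
```

Changing `azimuth_deg` moves the viewpoint around the subject, `height` raises or lowers it, and a larger `focal` corresponds to zooming in.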

Next, the configuration of the image processing server 120 is described with reference to the functional block diagram of FIG. 6. The image processing server 120 mainly has an input/output unit 1201, an image processing unit 1202, and a routing processing unit 1203.

The input/output unit 1201 includes a data receiving unit 1204 and a data transmitting unit 1205. The image data of the subject OB captured from a plurality of directions by the cameras 201 to 204 is received by the data receiving unit 1204 of the input/output unit 1201 via the control box 111 and the Internet N.

The image processing unit 1202 includes a preprocessing unit 1206, an object extraction unit 1207, and a three-dimensional model generation unit 1208. The preprocessing unit 1206 separates meta information, such as the camera identifier information, camera parameters, and the address information of the control box 111, from the received image data, applies predetermined decompression processing, and sends the frame data to the object extraction unit 1207.

At this stage, the frame image contains both the subject OB and the surrounding background. The object extraction unit 1207 extracts the image region in which the subject OB appears by detecting the contour edges contained in the image data of each frame. The edge detection method is preferably either a gradient method, based on the first derivative of the luminance change at the contour boundary, or a Laplacian method, based on the second derivative of that luminance change.
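The two edge detection approaches can be compared on a single row of luminance values. The following is a minimal one-dimensional sketch using finite differences; the function names and the sample row are illustrative assumptions, and a real implementation would operate on two-dimensional images.

```python
def first_derivative(profile):
    """Central-difference gradient of a 1-D luminance profile (interior samples only)."""
    return [(profile[i + 1] - profile[i - 1]) / 2.0
            for i in range(1, len(profile) - 1)]

def second_derivative(profile):
    """Discrete Laplacian of a 1-D luminance profile (interior samples only)."""
    return [profile[i + 1] - 2.0 * profile[i] + profile[i - 1]
            for i in range(1, len(profile) - 1)]

def gradient_edge(profile):
    """Gradient method: the edge sits where the absolute first derivative peaks."""
    grad = first_derivative(profile)
    peak = max(range(len(grad)), key=lambda i: abs(grad[i]))
    return peak + 1  # shift back to an index into `profile`

def laplacian_edges(profile):
    """Laplacian method: return indices i with a zero crossing between samples i and i + 1."""
    lap = second_derivative(profile)
    return [i + 1 for i in range(len(lap) - 1) if lap[i] * lap[i + 1] < 0]

# Luminance across a contour: dark background, a short ramp, bright subject.
row = [10, 10, 10, 40, 160, 190, 190, 190]
```

On this row both methods locate the contour at the ramp: the gradient peaks there, and the second derivative changes sign there.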

In this embodiment, as described above, the depth of field of the lenses of the cameras 201 to 204 is set relatively shallow, at ±0.5 to ±1.5 m. As a result, only the region of the subject OB is in focus in the captured image, while the surrounding background is blurred. Therefore, without placing a green screen behind the subject or projecting mesh light onto the subject OB, the image region delimited by the contour of the subject OB (the subject image data) can be extracted relatively easily and accurately simply by detecting the edges of that contour.
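The effect described here, where blurred background edges drop out and only the sharp subject contour remains, can be sketched on one row of pixels. The box blur standing in for optical defocus, the threshold value, and the function names are assumptions for illustration only.

```python
def box_blur(profile, radius):
    """Blur a 1-D luminance profile with a moving average (ends are clamped)."""
    n = len(profile)
    out = []
    for i in range(n):
        window = [profile[min(n - 1, max(0, j))]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def strong_edges(profile, threshold):
    """Indices i where the luminance jump between samples i and i + 1 exceeds the threshold."""
    return [i for i in range(len(profile) - 1)
            if abs(profile[i + 1] - profile[i]) > threshold]

def subject_span(profile, threshold):
    """Return (first, last) sample of the region bounded by the outermost strong edges."""
    edges = strong_edges(profile, threshold)
    if len(edges) < 2:
        return None
    return (edges[0] + 1, edges[-1])

# A background object (samples 2..5) is blurred as if out of focus;
# the subject (samples 10..19) is pasted in sharp, as if in focus.
background = box_blur([20] * 2 + [120] * 4 + [20] * 18, radius=3)
row = background[:10] + [200] * 10 + background[20:]
span = subject_span(row, threshold=50)
```

With a sharp background the same threshold would also fire on the background object, which is why the shallow depth of field simplifies the extraction.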

The subject image data separated from the background image is sent to the three-dimensional model generation unit 1208. Based on the subject image data captured from a plurality of directions, the three-dimensional model generation unit 1208 generates a three-dimensional moving image model of the subject OB, for example by using the stereo camera principle.
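The stereo camera principle mentioned here recovers depth from the shift (disparity) of a point between two views. The sketch below assumes the simplest textbook case of two parallel, rectified cameras, which is not necessarily how the cameras 201 to 204 are arranged; the function and parameter names are illustrative.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by two parallel, rectified cameras: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_m / disparity_px

def triangulate(focal_px, baseline_m, x_left_px, x_right_px, y_px):
    """Recover (X, Y, Z) in the left camera frame from matched image coordinates.

    Image coordinates are measured from the principal point of each camera.
    """
    z = depth_from_disparity(focal_px, baseline_m, x_left_px - x_right_px)
    return (x_left_px * z / focal_px, y_px * z / focal_px, z)
```

Repeating this for matched contour points across camera pairs yields the point set from which a surface model could be built.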

The routing processing unit 1203 sets, as the transfer destination of the generated three-dimensional moving image model data, the address of the control box 111 that sent the source image data. The data transmitting unit 1205 then transmits the generated three-dimensional moving image model data to the control box 111 at the set address on the Internet N.
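The return routing can be pictured as looking up the sender's address in the metadata that arrived with the images. This is a minimal sketch: the `control_box_address` key, the message layout, and the caller-supplied `transmit` function are hypothetical.

```python
def route_model(meta, model_bytes):
    """Address the generated model to the control box that sent the source images."""
    destination = meta["control_box_address"]  # hypothetical metadata key
    return {"to": destination, "payload": model_bytes}

def send_all(jobs, transmit):
    """Route every finished job and hand it to a caller-supplied `transmit` function."""
    for meta, model_bytes in jobs:
        message = route_model(meta, model_bytes)
        transmit(message["to"], message["payload"])
```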

One preferred embodiment of the present invention has been described above, but various modifications are possible without departing from its spirit. For example, the image data processing functions executed by the control box 111 described above may instead be provided by the image processing server 120 on the Internet N. Alternatively, a stand-alone control box 111 may execute the entire series of arithmetic processes for generating the three-dimensional model without using a cloud server.

Furthermore, as shown in FIG. 7, the photographing device 20 may be composed of portable electronic devices with small cameras, such as smartphones 251 to 254. Dedicated lens units 261 to 264, which give the cameras a shallow depth of field, are attached to the camera openings of the smartphones 251 to 254. Application software that lets the user receive the stereoscopic photography model generation service is installed on the photographing smartphones 251 to 254 and on the smartphone 255 used for operation control. Using the app on the smartphone 255, the user can send the image data of the subject OB captured by the smartphones 251 to 254 to the arithmetic processing device 10 over the Internet N via the switching hub 300 and the line modem 310.

The stereoscopic photography model of the subject OB generated by the arithmetic processing device 10, and/or its free-viewpoint image, can be transferred or downloaded to the smartphone 255 via the Internet N. The user can then operate the smartphone 255 to play back the stereoscopic image of the subject OB on its screen with little effort.

According to the stereoscopic photography model generation system of the embodiment described above, and the stereoscopic photography model generation method executed by that system, using an image pickup lens with a relatively shallow depth of field in the photographing devices makes it possible to capture moving images in which only the subject is in focus and the surrounding background is blurred. With such shallow depth-of-field moving images, the subject image can be separated and extracted from the background image easily and accurately by simple edge detection, without installing a green screen at the shooting location or a projector for projecting mesh light onto the subject. The image data processing load up to the point of subject extraction is therefore greatly reduced, which makes it possible to provide a system with a simple configuration that even ordinary users can handle easily.

10 Arithmetic processing device
20 Photographing device
110 Controller
111 Control box
112 Display
113 Terminal device (smartphone)
120 Image processing server
201 to 204 Cameras
211 to 214 Camera adapters
220 Image sensor (CCD)
221 Image pickup lens system
230 Mat
240 Communication cable
251 to 255 Smartphones
261 to 264 Lens units
300 Switching hub
1110 Synchronization signal generation unit
1112 Data synchronization unit
1115 Storage
1116 Network adapter unit
1118 Rendering processing unit
1119 User interface unit
1201 Input/output unit
1202 Image processing unit
1203 Routing processing unit
1207 Object extraction unit
1208 Three-dimensional model generation unit
N Internet
Ax Optical axis
CIF In-focus position
DOF Depth of field
OB Subject, object
SE Edge detection accuracy, sharpness
TP Test pattern

The present invention relates to a live-action volumetric capture technique for generating a three-dimensional model of a subject based on image data of the subject captured from a plurality of directions.

In recent years, attention has turned to volumetric capture, a live-action technique that can generate a three-dimensional model of a subject by photographing the movement of a target person or object from all directions over 360 degrees without using markers. With this technique, the subject is extracted frame by frame from moving images of a person or the like captured synchronously by a plurality of cameras and converted into a three-dimensional model, so that the subject model can be placed at any position in a virtual space and its movement freely reproduced on a display. An image computed from images captured at a plurality of viewpoints and reproduced as seen from an arbitrary viewpoint is called a "free-viewpoint image" or a "virtual viewpoint image". Patent Document 1, for example, discloses a system that generates such virtual viewpoint images.

In volumetric capture, as described above, the background must be removed from each captured frame image to extract the target subject. Conventional methods for distinguishing the subject from the background include projecting mesh light onto the subject and using a green screen as the background. These methods, however, require a projector for the mesh light or a green screen to be installed at the shooting location, so shooting is possible only in a properly equipped studio or similar facility. Another conceivable approach is to extract the subject by applying object extraction processing to the captured image data. However, extracting the object (subject image) frame by frame from high-definition moving image data captured at several tens of frames per second from multiple viewpoints requires a large platform with high-speed, high-capacity processing power.

To solve the problems described above, the present invention is a stereoscopic photography model generation system that includes a plurality of photographing devices for photographing a subject from a plurality of directions, and an arithmetic processing device configured to detect the contour of the subject based on image data obtained by the plurality of photographing devices photographing the subject and to generate a stereoscopic photography model of the subject, wherein the depth of field of the image pickup lens system used in the photographing devices is set to ±0.5 to ±1.5 m centered on the in-focus position on the optical axis.

The present invention is also a stereoscopic photography model generation method performed in a system that includes a plurality of photographing devices and an arithmetic processing device that generates a stereoscopic photography model of a subject based on image data obtained by the plurality of photographing devices photographing the subject, the depth of field of the image pickup lens system used in the photographing devices being set to ±0.5 to ±1.5 m centered on the in-focus position on the optical axis. The method includes a step of photographing the subject from a plurality of directions with the plurality of photographing devices, and, as processing executed by the arithmetic processing device, a step of detecting the contour of the subject from the image data captured by the plurality of photographing devices and a step of generating a stereoscopic photography model of the subject based on the subject image data delimited by the detected contour.

In the stereoscopic photography model generation method, the ratio of the edge detection accuracy obtained when the subject is outside the depth of field to the maximum edge detection accuracy of the contour obtained when the subject is within the depth of field is preferably 0.7 or less.
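The preferred ratio can be checked numerically once a sharpness measure is chosen. The sketch below assumes, for illustration, that edge detection accuracy is measured as the peak first difference of a luminance profile across the contour; the function names and sample profiles are hypothetical.

```python
def edge_sharpness(profile):
    """Peak absolute luminance difference between neighbouring samples."""
    return max(abs(b - a) for a, b in zip(profile, profile[1:]))

def sharpness_ratio(in_focus_profiles, out_of_focus_profile):
    """Sharpness outside the depth of field relative to the best in-focus sharpness."""
    best = max(edge_sharpness(p) for p in in_focus_profiles)
    return edge_sharpness(out_of_focus_profile) / best

def meets_preferred_ratio(ratio, limit=0.7):
    """True when the ratio is at or below the preferred limit of 0.7."""
    return ratio <= limit

# Two contour profiles measured inside the depth of field, one measured outside it.
in_focus = [[10, 10, 190, 190], [10, 40, 160, 190]]
out_of_focus = [10, 55, 100, 145, 190]
ratio = sharpness_ratio(in_focus, out_of_focus)
```

A low ratio means out-of-focus edges are much weaker than in-focus ones, which is what lets a simple threshold separate subject from background.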

In the stereoscopic photography model generation method, the edge detection accuracy is preferably calculated from the first derivative of the luminance change at the contour boundary. Alternatively, the edge detection accuracy may be calculated from the second derivative of the luminance change at the contour boundary.

In the stereoscopic photography model generation method, the processing executed by the arithmetic processing device preferably further includes a step of generating a free-viewpoint image based on the stereoscopic photography model.

In the stereoscopic photography model generation method, the image data obtained by the plurality of photographing devices photographing the subject is preferably moving image data, and the arithmetic processing device preferably synchronizes the plurality of moving image data to generate a dynamic stereoscopic photography model of the subject.
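One simple way to picture the synchronization of several moving image streams is to group frames by their timecode and keep only the instants that every camera captured. This is a sketch under that assumption; the tuple layout and function name are illustrative and not taken from the specification.

```python
from collections import defaultdict

def synchronize(frames, camera_ids):
    """Group frames by timecode and keep only timecodes seen by every camera.

    `frames` is an iterable of (timecode, camera_id, image) tuples.
    Returns a list of (timecode, {camera_id: image}) sorted by timecode.
    """
    by_time = defaultdict(dict)
    for timecode, camera_id, image in frames:
        by_time[timecode][camera_id] = image
    wanted = set(camera_ids)
    return [(t, views) for t, views in sorted(by_time.items())
            if set(views) == wanted]
```

Each complete set of views for one timecode would then feed one frame of the dynamic model.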

As shown in FIG. 2, the image pickup lens system 221 used in the cameras 201 to 204 of this embodiment has its depth of field (DOF) set to ±0.5 to ±1.5 m centered on the central position on the optical axis Ax at which the lens is exactly in focus (CIF; Center in Focus). The cameras 201 to 204 are assumed to share the same optical characteristics, with no variation in the depth-of-field characteristics among them.
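How a lens setting maps to a depth of field in this range can be estimated with the standard thin-lens approximation. The sketch below is not from the specification: the focal length, f-number, focus distance, and circle of confusion are example values, and reading "±0.5 to ±1.5 m" as a bound on both the front and rear extents is an interpretation made for illustration.

```python
def depth_of_field_limits(focal_mm, f_number, focus_mm, coc_mm=0.03):
    """Near and far limits of acceptable sharpness for a thin lens.

    Uses the standard approximations
        near = s f^2 / (f^2 + N c (s - f))
        far  = s f^2 / (f^2 - N c (s - f))
    where s is the focus distance, N the f-number and c the circle of confusion.
    """
    f2 = focal_mm ** 2
    k = f_number * coc_mm * (focus_mm - focal_mm)
    near = focus_mm * f2 / (f2 + k)
    far = float("inf") if k >= f2 else focus_mm * f2 / (f2 - k)
    return near, far

def within_target_range(near_mm, far_mm, focus_mm, lo_m=0.5, hi_m=1.5):
    """Check that the front and rear extents both lie between lo_m and hi_m metres."""
    front = (focus_mm - near_mm) / 1000.0
    back = (far_mm - focus_mm) / 1000.0
    return lo_m <= front <= hi_m and lo_m <= back <= hi_m

# Example: a 35 mm lens at f/4 focused at 3 m (illustrative values only).
near, far = depth_of_field_limits(35, 4, 3000)
```

Note that the thin-lens depth of field is not symmetric about the focus distance, so the symmetric "±" notation is best read as an approximate band.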

The frame generation/transmission unit 217 converts the format of the image data in the frame memory 219 to generate frame data. Meta information (auxiliary information), such as a timecode indicating the capture time, the identifier of the capturing camera, and camera parameters (focus value, zoom value, lens characteristics, and so on), is attached to the frame data, which is then saved as a file. The frame generation/transmission unit 217 transmits the resulting frame image data to the controller 110.

Claims

A stereoscopic photography model generation system comprising:
a plurality of photographing devices for photographing a subject from a plurality of directions; and
an arithmetic processing device configured to generate a stereoscopic photography model of the subject based on image data obtained by the plurality of photographing devices photographing the subject,
wherein the depth of field of an image pickup lens system used in the photographing devices is set to ±0.5 to ±1.5 m centered on the in-focus position on the optical axis.

The stereoscopic photography model generation system according to claim 1, wherein the ratio of the edge detection accuracy when the subject is outside the depth of field to the maximum edge detection accuracy of the contour when the subject is within the depth of field is 0.7 or less.

The stereoscopic photography model generation system according to claim 2, wherein the edge detection accuracy is calculated based on the first derivative of the luminance change at the boundary of the contour.

The stereoscopic photography model generation system according to claim 2, wherein the edge detection accuracy is calculated based on the second derivative of the luminance change at the boundary of the contour.

The stereoscopic photography model generation system according to any one of claims 1 to 4, wherein the arithmetic processing device has a free-viewpoint image generation unit that generates a free-viewpoint image based on the stereoscopic photography model.

A stereoscopic photography model generation method performed in a system that includes a plurality of photographing devices and an arithmetic processing device that generates a stereoscopic photography model of a subject based on image data obtained by the plurality of photographing devices photographing the subject, the depth of field of an image pickup lens system used in the photographing devices being set to ±0.5 to ±1.5 m centered on the in-focus position on the optical axis, the method comprising:
a step of photographing the subject from a plurality of directions with the plurality of photographing devices; and,
as processing executed by the arithmetic processing device,
a step of detecting the contour of the subject from the image data captured by the plurality of photographing devices, and
a step of generating a stereoscopic photography model of the subject based on the subject image data delimited by the detected contour.

The three-dimensional image generation method according to claim 6, wherein the ratio of the edge detection accuracy when the subject is outside the depth of field to the maximum edge detection accuracy of the contour when the subject is within the depth of field is 0.7 or less.

The stereoscopic photography model generation method according to claim 7, wherein the edge detection accuracy is calculated based on the first derivative of the luminance change at the boundary of the contour.

The stereoscopic photography model generation method according to claim 7, wherein the edge detection accuracy is calculated based on the second derivative of the luminance change at the boundary of the contour.

The stereoscopic photography model generation method according to any one of claims 6 to 9, wherein the processing executed by the arithmetic processing device further includes a step of generating a free-viewpoint image based on the stereoscopic photography model.

The stereoscopic photography model generation method according to any one of claims 6 to 10, wherein the image data obtained by the plurality of photographing devices photographing the subject is moving image data, and the arithmetic processing device synchronizes the plurality of moving image data to generate a dynamic stereoscopic photography model of the subject.