JP2019032660A

JP2019032660A - Imaging system and imaging method

Info

Publication number: JP2019032660A
Application number: JP2017152604A
Authority: JP
Inventors: 格北原; Itaru Kitahara; 英彦宍戸; Hidehiko Shishido
Original assignee: University of Tsukuba NUC
Current assignee: University of Tsukuba NUC
Priority date: 2017-08-07
Filing date: 2017-08-07
Publication date: 2019-02-28
Anticipated expiration: 2037-08-07
Also published as: JP7033294B2

Abstract

PROBLEM TO BE SOLVED: To provide an imaging system and an imaging method capable of calibrating camera parameters with high accuracy even when cameras are arranged in a large imaging target space and the camera arrangement is therefore comparatively sparse. SOLUTION: An imaging system comprises: a first imaging unit that captures an imaging object in a three-dimensional space with a plurality of cameras that capture two-dimensional images; a second imaging unit that captures the object while oriented in substantially the same direction as a camera positioned in the vicinity of the plurality of cameras; an acquisition unit that acquires an image group captured from multiple viewpoints on the basis of complementary images captured by the second imaging unit and the images captured by the first imaging unit; an estimation unit that estimates projective transformation matrices of the images captured from the multiple viewpoints by applying weak calibration to the image group acquired by the acquisition unit; and a calibration unit that estimates the camera parameters of the first imaging unit with high accuracy from the projective transformation matrices estimated by the estimation unit. SELECTED DRAWING: Figure 1

Description

The present invention relates to an imaging system and an imaging method that realize video from multiple viewpoints inexpensively and easily.

Imaging methods that use video captured from multiple viewpoints to estimate an imaged object in space, for example for three-dimensional target tracking and three-dimensional object shape reconstruction, are being actively researched and developed. To obtain the relationship between a captured image and the three-dimensional space, accurate camera parameter estimation (camera calibration) is required for the cameras that capture the two-dimensional video. Here, "camera" denotes a widely and generally used device that captures two-dimensional video. Weak calibration is one basic method of camera calibration.

Japanese Unexamined Patent Application Publication No. 2015-126401 (JP2015-126401A)

However, when weak calibration is performed with cameras arranged sparsely relative to the extent of the three-dimensional space to be imaged, sufficient corresponding points and correlations cannot be obtained between the captured videos. It is known that the estimation accuracy of the projection relationship then decreases, and that a large discrepancy arises in the geometric relationship between the three-dimensional information reconstructed from the captured video and the imaging object actually present in the three-dimensional space.

An object of the present invention is to provide an imaging system and an imaging method that can calibrate camera parameters with high accuracy even when cameras are arranged in a large imaging target space and the camera arrangement is therefore relatively sparse.

The imaging system includes: a first imaging unit that images an imaging object in a three-dimensional space with a plurality of cameras that capture two-dimensional video; a second imaging unit that images the object while facing substantially the same direction as a camera located in the vicinity of the plurality of cameras; an acquisition unit that acquires an image group captured from multiple viewpoints on the basis of the complementary video captured by the second imaging unit and the video captured by the first imaging unit; an estimation unit that estimates projective transformation matrices of the images captured from the multiple viewpoints by applying weak calibration to the image group acquired by the acquisition unit; and a calibration unit that performs high-accuracy estimation of the camera parameters of the first imaging unit from the projective transformation matrices estimated by the estimation unit.

With the imaging system and imaging method according to the present invention, the external parameters (position, orientation) and internal parameters (focal length, image center, lens distortion, pixel aspect ratio) of sparsely arranged cameras can be estimated with high accuracy, that is, the cameras can be calibrated, and high-quality three-dimensional information on an imaging object in a three-dimensional space can be estimated inexpensively and easily.

FIG. 1 is a diagram showing the imaging scheme of the proposed method.
FIGS. 2 to 8 are complementary images acquired with the imaging scheme of the proposed method.
FIGS. 9 and 10 are diagrams showing the world coordinate system in the imaging environment.
FIG. 11 is a diagram showing a comparison between the estimated values and the true values of the world coordinate system obtained by the proposed method.
FIG. 12 is a diagram showing the error calculated as the Euclidean distance between X_o and its true value.
FIG. 13 is a diagram showing the error calculated as the Euclidean distance between Y_o and its true value.
FIGS. 14 and 15 are diagrams showing the two-dimensional skeleton information detected by applying Convolutional Pose Machines to captured images, and the three-dimensional skeleton estimated from it.
FIG. 16 is a diagram showing the three-dimensional skeleton positions (for each set of complementary images) estimated with the proposed method.
FIG. 17 is a diagram showing a comparison of estimation accuracy using the position coordinates below the neck of the three-dimensional skeleton.

Hereinafter, embodiments of the multi-view video imaging system and imaging method according to the present invention will be described with reference to the drawings.

1. Introduction
To obtain the projection relationship between the three-dimensional space to be imaged and the two-dimensional video captured by a camera, which is needed to reconstruct three-dimensional information on an imaged object in that space, the external and internal parameters of the camera (collectively, the camera parameters) are required. The basic calibration process for determining the camera parameters is a method called strong calibration, in which landmarks with known three-dimensional positions are placed in the space and the projective transformation matrix is estimated from the correspondence between those positions and their observed positions.

However, when strong calibration is applied to a general space, the labor and cost of installing landmarks become a problem. Weak calibration, by contrast, is a method that requires no landmarks. It estimates the internal parameters and the external parameters, the latter as relative position and orientation between cameras, from correspondence information between multi-view images. When the cameras are arranged sparsely, however, sufficient corresponding points cannot be obtained between the captured two-dimensional videos, and the estimation accuracy of the projection relationship decreases. In large-scale spaces such as gymnasiums and stadiums, it is often difficult to arrange cameras densely, so a technique for calibrating sparsely arranged cameras with high accuracy is desired.

In this work, video captured while moving with a mobile camera is integrated with the images of the sparsely arranged cameras. This builds a dense multi-view image group inexpensively and easily and improves the estimation accuracy of weak calibration, which in turn improves the estimation accuracy of the camera parameters.

2. Related research
Methods that use landmarks (checkerboards) are typical for estimating camera parameters. Research aimed at improving the estimation accuracy of camera parameters includes methods that compute epipolar geometry from dynamic silhouettes and methods that use color codes, both of which reduce camera parameter estimation error. Research has also been reported for environments and applications in which camera calibration is difficult, such as underwater imaging and medical endoscopy.

All of the above examples target relatively small three-dimensional spaces. For large-scale spaces, on the other hand, no practical approach has been adopted other than placing landmarks so as to cover the entire space, which requires considerable labor and cost. To solve this problem, a camera calibration method using a rainbow has been proposed. However, a rainbow is a natural phenomenon, and it is extremely difficult to determine when and where one will be available, so applying it as a landmark for large-scale spaces is extremely difficult.

A technique called weak calibration, which requires no landmarks and calibrates using corresponding point information between two-dimensional videos captured from multiple viewpoints, has been actively studied. An example that achieves robust calibration by adding corresponding points between the captured two-dimensional videos has also been reported. When the cameras are arranged sparsely, however, it is difficult to obtain sufficient corresponding points, the camera parameter estimation accuracy decreases, and the quality of the reconstructed three-dimensional information deteriorates as a result.

An object of the present invention is to realize a camera parameter estimation method that contributes to reconstructing high-quality three-dimensional information on an imaging object in a large-scale space using only sparsely arranged cameras that capture two-dimensional video. In the initial stage of imaging, video captured by another camera between the sparsely arranged cameras (complementary video) is used to determine the parameters of the sparsely installed cameras accurately and easily. High-quality three-dimensional information on the imaging object in the three-dimensional space is then reconstructed from the video of the sparsely arranged cameras alone.

3. Multi-view camera calibration method
3.1 Imaging method and acquisition of projective transformation matrices using weak calibration
As shown in FIG. 1, multi-view images are captured by sparsely arranged fixed cameras. At the same time, video is captured while a mobile camera is moved between the fixed cameras, facing substantially the same direction as the adjacent fixed camera. The complementary images obtained by splitting the video into frames are combined with the sparse multi-view images to obtain a dense multi-view image group that includes the fixed cameras. Weak calibration is applied to this image group to estimate the projective transformation matrices of all the multi-view images.

By extracting, from the estimated projective transformation matrices, those that correspond to the sparse multi-view images, high-accuracy camera calibration of the sparsely arranged fixed cameras is achieved without installing landmarks. Because sufficient corresponding points must be detected to raise the estimation accuracy, it is more desirable that the imaging space contain enough image features to yield sufficient corresponding points.

3.2 Calculation of three-dimensional coordinates
Let the three-dimensional coordinates of an arbitrary point in the weak calibration coordinate system be M_sfm = [X_s, Y_s, Z_s, 1]^T, and suppose the point is observed at m = [u, v, 1]^T in the camera coordinate system. The projection relationship between the weak calibration coordinate system and the camera coordinate system is then expressed as in Equation (1), using the projective transformation matrix P of the camera in the weak calibration coordinate system obtained by the method of Section 3.1.
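Equation (1) itself is not reproduced in this text. Assuming it takes the standard homogeneous pinhole form, with P a 3×4 matrix and λ an arbitrary nonzero scale factor (λ is a symbol introduced here for illustration, not taken from the original), the relation described above could be written as:

```latex
\lambda\, m = P\, M_{sfm},
\qquad
\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= P \begin{bmatrix} X_s \\ Y_s \\ Z_s \\ 1 \end{bmatrix},
\qquad P \in \mathbb{R}^{3 \times 4}
```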

The projection relationship is estimated in the same way for the images from the multiple viewpoints, and the three-dimensional coordinates of a point are calculated from its observed coordinates on the images by a stereo method using those projective transformation matrices.
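The text does not say which stereo method is used. As a minimal sketch, assuming linear (DLT) triangulation and two hypothetical cameras with made-up projection matrices, the calculation of a three-dimensional point from its observations could look like this:

```python
import numpy as np

def triangulate_dlt(projections, observations):
    """Linear (DLT) triangulation of one point from two or more views.

    projections  : list of 3x4 projective transformation matrices P_i
    observations : list of (u, v) image coordinates of the same point
    returns      : [X_s, Y_s, Z_s] in the weak calibration coordinate system
    """
    rows = []
    for P, (u, v) in zip(projections, observations):
        P = np.asarray(P, dtype=float)
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    M = vt[-1]
    return M[:3] / M[3]

def project(P, point3d):
    """Project a 3D point with a 3x4 matrix and return (u, v)."""
    m = np.asarray(P, dtype=float) @ np.append(point3d, 1.0)
    return m[:2] / m[2]

# Hypothetical example: two cameras one unit apart along the x axis.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
X_est = triangulate_dlt([P1, P2], [project(P1, X_true), project(P2, X_true)])
```

With noise-free observations the estimate matches the original point; with real detections, more than two views and a nonlinear refinement step would normally be added.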

3.3 Conversion from the weak calibration coordinate system to the world coordinate system of the imaging space
The weak calibration coordinate system is set on the basis of the distribution of the observed corresponding points, so its origin and axis directions change with every capture. To achieve consistent measurement across different capture data, a world coordinate system of the imaging space is defined, and the weak calibration coordinate system is converted into it.
Let an arbitrary point in the world coordinate system be M_world = [X_w, Y_w, Z_w]^T. The conversion from the weak calibration coordinate system to the world coordinate system is then expressed, as shown in Equation (2), as a rigid transformation using a rotation matrix R and a translation vector t.

Here, the three-dimensional rigid transformation matrix D is given by Equation (3) and is expressed as Equation (4). Using Equation (4), the conversion from the weak calibration coordinate system to the world coordinate system is realized.

FIGS. 9 and 10 show an example of an embodiment of the present invention. In the scene captured as multi-view video, a region where two straight lines (edges) intersect at a right angle and where an object of known size exists is taken as the origin of the world coordinate system. The vector t is given as the translation from the point S_o in the weak calibration coordinate system, which corresponds to the origin of the world coordinate system, to the origin o_sfm.

The scale is obtained from an object whose size is known in the world coordinate system, as the ratio of that size to the corresponding size in the weak calibration coordinate system. Using the points S_x, S_y, and S_z in the weak calibration coordinate system that correspond to points on the x, y, and z axes of the world coordinate system, the orthonormal basis vectors of the weak calibration coordinate system expressed by Equation (5) are calculated, and the rotation matrix R is obtained from the components of each vector e_i.

By converting from the weak calibration coordinate system to the world coordinate system, the three-dimensional position of a subject in the world coordinate system can be calculated.
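Equations (2) to (5) are not reproduced in this text, so the following is only a sketch of one way the conversion of Section 3.3 could be implemented. It assumes that the basis is built by Gram-Schmidt orthogonalization of S_x - S_o and S_y - S_o, that the third axis is their cross product (S_z is not used in this simplification), that t = -S_o, and that the transform is applied as M_world = scale * R (M_sfm + t). The function names and the order of composition are assumptions, not the patent's definitions.

```python
import numpy as np

def world_frame_from_landmarks(S_o, S_x, S_y):
    """Rotation R and translation t from three points given in the weak
    calibration frame: the world origin and one point on each of the
    world x and y axes."""
    S_o, S_x, S_y = (np.asarray(p, dtype=float) for p in (S_o, S_x, S_y))
    e1 = S_x - S_o
    e1 = e1 / np.linalg.norm(e1)
    # Remove the component along e1 so that e2 is orthogonal to it.
    e2 = (S_y - S_o) - np.dot(S_y - S_o, e1) * e1
    e2 = e2 / np.linalg.norm(e2)
    e3 = np.cross(e1, e2)
    R = np.vstack([e1, e2, e3])   # rows are the basis vectors e_i
    t = -S_o                      # moves S_o onto the origin
    return R, t

def similarity_matrix(R, t, scale=1.0):
    """4x4 homogeneous matrix D such that D @ [M_sfm, 1] = [M_world, 1]."""
    D = np.eye(4)
    D[:3, :3] = scale * R
    D[:3, 3] = scale * (R @ t)
    return D

def to_world(M_sfm, D):
    """Apply D to a 3D point given in the weak calibration frame."""
    return (D @ np.append(np.asarray(M_sfm, dtype=float), 1.0))[:3]

# Hypothetical landmarks: world axes rotated 90 degrees about z, origin shifted.
S_o = [1.0, 2.0, 3.0]
S_x = [1.0, 4.0, 3.0]
S_y = [-1.0, 2.0, 3.0]
R, t = world_frame_from_landmarks(S_o, S_x, S_y)
D = similarity_matrix(R, t, scale=0.5)
```

In this example `to_world(S_o, D)` maps to the world origin and `to_world(S_x, D)` lands on the world x axis, which is a quick sanity check on the chosen conventions.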

4. Accuracy evaluation of the multi-view camera calibration method and experimental results
In this example, a badminton practice session is filmed in a gymnasium. As shown in FIG. 1, two fixed cameras are installed so that their optical axes are orthogonal to the X axis and the Y axis of the world coordinate system. FIGS. 9 and 10 show examples of images captured by each camera. The origin is set at a corner of the court, and the X and Y axes are set along the court lines. Under the competition rules, the badminton court lines are defined so that the distance between (1) and (2) in FIGS. 9 and 10 is 6.1 m and the distance between (1) and (3) is 13.4 m. These values are used to estimate the scale parameter.
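The text states only that the scale is a ratio of a known size to the corresponding size in the weak calibration coordinate system. As a hedged sketch, the two court distances can be combined into a single scale by least squares. The reconstructed coordinates below are invented for illustration, and the least-squares combination is an assumption rather than the patent's stated procedure.

```python
import numpy as np

def estimate_scale(segments):
    """Least-squares scale from segments of known length.

    segments : list of (point_a, point_b, known_length_m), where the points
               are given in the weak calibration coordinate system
    returns  : scale s minimising sum((s * d_i - L_i) ** 2)
    """
    d = np.array([np.linalg.norm(np.asarray(a, float) - np.asarray(b, float))
                  for a, b, _ in segments])
    L = np.array([length for _, _, length in segments], dtype=float)
    return float(d @ L / (d @ d))

# Hypothetical reconstructed court corners (1), (2), (3).
p1, p2, p3 = [0.0, 0.0, 0.0], [3.05, 0.0, 0.0], [0.0, 6.7, 0.0]
scale = estimate_scale([(p1, p2, 6.1), (p1, p3, 13.4)])
```

With a single segment the function reduces to the plain ratio described in the text.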

A Sony (registered trademark) FDR AX-1 was used as the camera for capturing the multi-view video. It captures video at a resolution of 3840 pixels horizontally by 2160 pixels vertically at 30 frames per second. A camera of the same performance is also used to film the same space while moving between the two fixed cameras. The complementary images are obtained by splitting that video into frames. In this experiment, because of the structure of the gymnasium, the camera was moved along the path shown in FIG. 1.

To evaluate the accuracy of the camera calibration, the interval at which frames are extracted from the moving-camera video is adjusted to prepare sets of 300, 150, 75, 40, 20, 10, and 5 complementary images. As shown in FIGS. 2 to 8, weak calibration is applied to the captured complementary images. Using the estimated camera parameters, the points (1) o_world (the origin), (2) X_o, and (3) Y_o of the world coordinate system shown in FIGS. 9 and 10 are calculated, and the estimation accuracy of the three-dimensional positions is verified.

4.1 Estimation error of the world coordinate system as a function of the number of complementary images
As shown in FIG. 11, the estimated values of the world coordinate system (origin (1) o_world, (2) X_o, (3) Y_o) are compared with the values defined by the competition rules (the three-dimensional positions of (1), (2), and (3) obtained using 6.1 m between (1) and (2) and 13.4 m between (1) and (3)). The results confirm that estimation from a small number of complementary images yields large errors in the world coordinate system.

FIGS. 12 and 13 show the errors of (2) X_o and (3) Y_o, calculated as the Euclidean distance from their true values. With 300, 150, or 75 complementary images, the three-dimensional position can be estimated with an error of 10 cm or less. When the number of complementary images falls below 20, the error increases sharply. The smallest average error was 4.3 cm, obtained with 300 complementary images, and the largest was 229.5 cm, obtained with 20 complementary images.
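The error metric described here is a plain Euclidean distance. A minimal sketch follows, assuming the true positions are X_o = (6.1, 0, 0) m and Y_o = (0, 13.4, 0) m in a world frame whose axes follow the court lines. The estimated coordinates are invented for illustration and are not experimental results.

```python
import numpy as np

def position_error_cm(estimate_m, truth_m):
    """Euclidean distance between an estimated and a true 3D position,
    converted from metres to centimetres."""
    diff = np.asarray(estimate_m, dtype=float) - np.asarray(truth_m, dtype=float)
    return 100.0 * float(np.linalg.norm(diff))

# Assumed true positions from the court dimensions (metres).
X_o_true = [6.1, 0.0, 0.0]
Y_o_true = [0.0, 13.4, 0.0]

# Hypothetical estimates, for illustration only.
err_x = position_error_cm([6.13, 0.04, 0.0], X_o_true)
err_y = position_error_cm([0.0, 13.4, 0.0], Y_o_true)
```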

The results in FIGS. 12 and 13 show a large difference in average error between 40 and 20 complementary images. The convergence angle was about 6 degrees with 40 complementary images and about 12 degrees with 20. From this result, for the method to function effectively, it is considered desirable to extract the complementary images so that the convergence angle is about 6 degrees. In this experimental environment, for example, the distance between the cameras in FIG. 1 is about 40 m (about 20 m lengthwise and about 20 m crosswise). In that case, if video captured while walking at 1 m per second is sampled at one frame per second, images are obtained at intervals corresponding to a convergence angle of about 6 degrees.
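The arithmetic in this example can be checked with a short sketch. It assumes the 30 frames per second video described for the camera and a constant walking speed. The function name and parameters are illustrative only.

```python
def plan_frame_extraction(path_length_m, walking_speed_mps, video_fps, images_per_second):
    """Return (number of complementary images, frame step, frame indices)
    for sampling a walking video at a fixed rate."""
    duration_s = path_length_m / walking_speed_mps
    n_images = round(duration_s * images_per_second)
    step = round(video_fps / images_per_second)
    total_frames = round(duration_s * video_fps)
    indices = list(range(0, total_frames, step))
    return n_images, step, indices

# 40 m path, 1 m/s walking speed, 30 fps video, one image kept per second.
n_images, step, indices = plan_frame_extraction(40.0, 1.0, 30, 1)
```

Under these assumptions every 30th frame is kept, giving 40 complementary images, which matches the 40-image case that the text associates with a convergence angle of about 6 degrees.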

4.2 Three-dimensional skeleton position and posture estimation using the three-dimensional video imaging method according to the present invention
As one application example of the present invention, the three-dimensional posture of a badminton player was estimated, and the change in posture estimation accuracy for the player was measured when the method of the present invention was used. To estimate the posture of the subject in the captured images, a human skeleton position estimation method based on a Convolutional Neural Network (CNN, deep learning) was used. The results of applying Convolutional Pose Machines to the captured multi-view camera images are shown on the left side of FIGS. 14 and 15. The results of estimating the three-dimensional skeleton position from the skeleton information detected in images captured from two viewpoints are shown on the right side of FIGS. 14 and 15. The projective transformation matrices used for the stereo processing were estimated using the present invention.

Using the position below the neck of the estimated three-dimensional skeleton, a comparison experiment on three-dimensional position estimation error was performed. The three-dimensional skeleton position estimated with 300 complementary images (see FIG. 16) was compared with the other results. As shown in FIG. 17, the smallest error was an average error of 2.7 cm, obtained with 150 complementary images, and the error was confirmed to increase sharply at 20 images or fewer.

Claims

A first imaging unit that images an imaging object in a three-dimensional space with a plurality of cameras that capture two-dimensional images;
A second imaging unit that captures an object in a state of facing substantially the same direction as a camera located in the vicinity of the plurality of cameras;
An acquisition unit that acquires a group of images captured from multiple viewpoints based on the complementary image captured by the second imaging unit and the image captured by the first imaging unit;
An estimation unit that estimates a projective transformation matrix of an image captured from the multi-viewpoint by applying weak calibration to the image group acquired by the acquisition unit;
A calibration unit that performs high-precision estimation of camera parameters of the first imaging unit from the projective transformation matrix estimated by the estimation unit,
Imaging system.

The second imaging unit is a camera that moves and images between cameras used in the first imaging unit.
The imaging system according to claim 1.

The second imaging unit is a camera capable of capturing a moving image;
The imaging system according to claim 1.

The second imaging unit performs imaging only before the first imaging unit images the imaging object, and its use ends once the high-accuracy parameter estimation of the cameras constituting the first imaging unit is completed,
The imaging system according to any one of claims 1 to 3.

A first imaging method of imaging an imaging object in a three-dimensional space with a plurality of cameras that capture two-dimensional images;
A second imaging method for imaging an object in a state of facing substantially the same direction as a camera located in the vicinity of the plurality of cameras;
An acquisition method for acquiring an image group captured from multiple viewpoints based on the complementary image captured by the second imaging method and the image captured by the first imaging method;
An estimation method for estimating a projective transformation matrix of an image captured from the multi-viewpoint by applying weak calibration to the image group acquired by the acquisition method;
A calibration method for performing high-precision estimation of camera parameters of the first imaging method from the projective transformation matrix estimated by the estimation method,
Imaging method.

The second imaging method is a method using a camera that moves and images between cameras used in the first imaging method.
The imaging method according to claim 5.

The second imaging method is a method using a camera capable of capturing a moving image,
The imaging method according to claim 5.

The second imaging method performs imaging only before the imaging object is imaged by the first imaging method, and its use ends once the high-accuracy parameter estimation of the cameras used in the first imaging method is completed,
The imaging method according to any one of claims 5 to 7.