JP2024021218A

JP2024021218A - Camera calibration device, camera calibration method and program

Info

Publication number: JP2024021218A
Application number: JP2022123909A
Authority: JP
Inventors: 学中野; Manabu Nakano
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2022-08-03
Filing date: 2022-08-03
Publication date: 2024-02-16

Abstract

PROBLEM TO BE SOLVED: To stably calibrate a plurality of cameras regardless of lens characteristics.

SOLUTION: A camera calibration device 10 includes a 3-dimensional posture estimation unit 11, an association processing unit 12, and an optimization unit 13. The 3-dimensional posture estimation unit 11 estimates the 3-dimensional posture of a person from a 2-dimensional posture, which gives the coordinates of the person's joint positions on each frame image of video generated by a plurality of cameras installed at different locations and photographing a common person. The association processing unit 12 searches for similarity transformation parameters that bring the 3-dimensional postures estimated for each camera into correspondence, associates the same person across frame images generated by different cameras, and determines the external parameters of each camera from the similarity transformation parameters. The optimization unit 13 optimizes the 3-dimensional posture and the camera parameters so as to minimize the re-projection error between the 2-dimensional posture and the 2-dimensional coordinates obtained by re-projecting the associated 3-dimensional posture onto the frame image of each camera.

SELECTED DRAWING: Figure 2

Description

The present disclosure relates to a camera calibration device, a camera calibration method, and a program.

The problem of recovering camera parameters and the three-dimensional information of an object from an image sequence containing multiple images of the same object (subject) captured by multiple cameras is called Structure-from-Motion (SfM) or the multi-view geometry problem. Camera parameters are of two types: "internal parameters" and "external parameters." Internal parameters are lens-specific parameters such as focal length, lens distortion, and optical center coordinates; external parameters are the three-dimensional rotation matrix and translation vector between cameras. Internal parameters can be measured in advance if the lens is fixed. A camera for which only the internal parameters, or both the internal and external parameters, are known is called a calibrated camera. A camera for which they are unknown is called an uncalibrated camera.

Zhang's method, which uses a planar chessboard pattern, is widely known as a way to calculate the internal parameters of a single camera. A widely known way to calculate the external parameters of a single calibrated camera is to measure the three-dimensional coordinates of objects in space with a laser range finder and solve the Perspective-n-Point problem from the projection relationship under which each three-dimensional point is observed as a two-dimensional point in the image.

A technique called SfM is widely known as a way to calibrate the internal and external parameters of multiple cameras simultaneously. In SfM, two cameras first photograph a common object, and pairs of two-dimensional points onto which the same three-dimensional point on the object surface is projected in each image (hereinafter, corresponding points) are generated. Next, using multiple corresponding points (typically tens to thousands), the epipolar constraint, an equation expressing the geometric relationship between the two cameras and the corresponding points, is solved. Corresponding points are then generated and the epipolar constraint solved for pairs of cameras selected from all the cameras, yielding provisional values of the camera parameters of all cameras and of the three-dimensional points.

Finally, optimization is performed to minimize the reprojection error between the observed corresponding points and the points obtained by projecting the three-dimensional points onto the images with the estimated camera parameters (hereinafter, bundle adjustment). For SfM to run stably, each corresponding point should preferably be observed by multiple cameras. However, when cameras are arranged so as to surround an object, corresponding points are hidden from opposing cameras by the object itself and can be observed only by adjacent cameras. Running SfM in such an environment has the problem that some camera parameters cannot be estimated, or that accuracy is low even if all camera parameters are obtained.

To obtain corresponding points that are visible from any camera, methods that use the joint positions of a human body in the image as corresponding points have been proposed. For example, joints such as the neck, elbows, and knees can be observed whether the human body is viewed from the front or from the back. Patent Document 1 describes a method that performs SfM by treating a pedestrian's ankle joints as corresponding points on the ground, and estimates the camera parameters and the three-dimensional coordinates of the human body joints (hereinafter, three-dimensional posture). Patent Document 2 describes a method that performs SfM by treating the whole-body joint positions of multiple people as corresponding points, and that calibrates cameras consisting of a telephoto lens and a wide-angle lens and estimates the three-dimensional posture.

Moreover, using human body joint information does not necessarily require multiple cameras. Non-Patent Document 1 describes a method that treats a pedestrian's spinal joints as a line perpendicular to the ground, calibrates a single camera, and estimates the person's height. Non-Patent Document 2 describes a method that uses a neural network to estimate the three-dimensional posture of a subject as seen from a single camera.

Patent Document 1: International Publication No. 2020/194486
Patent Document 2: Japanese Patent No. 6816058

Non-Patent Document 1: Nakano, Gaku. "Camera Calibration Using Parallel Line Segments." 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021.
Non-Patent Document 2: Yang, Wei, et al. "3D Human Pose Estimation in the Wild by Adversarial Learning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

However, the methods described above have the following problems.
First, in the method of Patent Document 1, the ankle joints are easily occluded by the upper body or by other pedestrians and are not always observed by the cameras. Next, the method of Patent Document 2 requires the simultaneous use of a telephoto lens and a wide-angle lens, so it cannot calibrate multiple cameras with identical lens characteristics, such as an urban surveillance camera system. The method of Non-Patent Document 1 calculates the parameters of a single camera independently, so it cannot calibrate multiple cameras simultaneously. Finally, the method of Non-Patent Document 2 estimates only the three-dimensional coordinates of human body joints; the camera parameters are determined implicitly inside the neural network, depending on the training data, so it cannot calibrate a camera.

As described above, with conventional methods that use human body joint positions as corresponding points, it is difficult to stably calibrate multiple cameras regardless of lens characteristics.

In view of the above problems, an example object of the present invention is to provide a camera calibration device, a camera calibration method, and a program that can stably calibrate multiple cameras regardless of lens characteristics.

A camera calibration device according to one aspect of the present disclosure includes a three-dimensional posture estimation unit, an association processing unit, and an optimization unit. The three-dimensional posture estimation unit acquires a two-dimensional posture representing the coordinates of a person's joint positions on each frame image of video generated by multiple cameras that are installed at different locations and photograph a common place containing one or more common persons. The three-dimensional posture estimation unit also estimates, from the two-dimensional posture, a three-dimensional posture representing the person's three-dimensional joint positions. The association processing unit searches for similarity transformation parameters that bring the three-dimensional postures estimated for each camera into correspondence, associates the same person across frame images generated by different cameras, and determines the external parameters of each camera from the similarity transformation parameters. The optimization unit optimizes at least one of the three-dimensional posture, the internal parameters of the cameras, and the external parameters so as to minimize the reprojection error between the two-dimensional posture and the two-dimensional coordinates obtained by reprojecting the associated three-dimensional posture onto the frame image of each camera.

In a camera calibration method according to one aspect of the present disclosure, a computer executes the following processes.
A process of acquiring a two-dimensional posture representing the coordinates of a person's joint positions on each frame image of video generated by multiple cameras that are installed at different locations and photograph a common place containing one or more common persons, and estimating from that two-dimensional posture a three-dimensional posture representing the person's three-dimensional joint positions.
A process of searching for similarity transformation parameters that bring the three-dimensional postures estimated for each camera into correspondence, associating the same person across frame images generated by different cameras, and determining the external parameters of each camera from the similarity transformation parameters.
A process of optimizing at least one of the three-dimensional posture, the internal parameters of the cameras, and the external parameters so as to minimize the reprojection error between the two-dimensional posture and the two-dimensional coordinates obtained by reprojecting the associated three-dimensional posture onto the frame image of each camera.

A program according to one aspect of the present disclosure causes a computer to execute the following processes.
A process of acquiring a two-dimensional posture representing the coordinates of a person's joint positions on each frame image of video generated by multiple cameras that are installed at different locations and photograph a common place containing one or more common persons, and estimating from that two-dimensional posture a three-dimensional posture representing the person's three-dimensional joint positions.
A process of searching for similarity transformation parameters that bring the three-dimensional postures estimated for each camera into correspondence, associating the same person across frame images generated by different cameras, and determining the external parameters of each camera from the similarity transformation parameters.
A process of optimizing at least one of the three-dimensional posture, the internal parameters of the cameras, and the external parameters so as to minimize the reprojection error between the two-dimensional posture and the two-dimensional coordinates obtained by reprojecting the associated three-dimensional posture onto the frame image of each camera.

The present disclosure makes it possible to provide a camera calibration device, a camera calibration method, and a program that can stably calibrate multiple cameras regardless of lens characteristics.

FIG. 1 is a diagram showing an example of the arrangement of multiple cameras in a photographing system to which processing by the camera calibration device according to the embodiment is applied.
FIG. 2 is a block diagram showing an example of the configuration of the camera calibration device according to the embodiment.
FIG. 3 is a flow diagram showing an example of the processing operation of the camera calibration device according to the embodiment.
FIG. 4 is a diagram showing an example of the hardware configuration of the camera calibration device.

Embodiments are described below with reference to the drawings. In the embodiments, identical or equivalent elements are given the same reference numerals, and redundant descriptions are omitted.

<Photographing system>
The embodiment relates to a camera calibration device, a camera calibration method, and a camera calibration program for calibrating multiple cameras. FIG. 1 is a diagram showing an example of the arrangement of multiple cameras in a photographing system to which processing by the camera calibration device according to the embodiment is applied. As shown in FIG. 1, N cameras C1, C2, ..., Cn (N is an integer of 2 or more) are installed at different locations and photograph a common place containing one or more common persons (hereinafter, the photographing area). In the example of FIG. 1, three persons T1, T2, and T3 are present in the photographing area, and the camera video captured by each of the cameras C1, C2, ..., Cn contains images of the persons T1, T2, and T3.

<Camera calibration device>
FIG. 2 is a block diagram showing an example of the configuration of the camera calibration device 10 according to the embodiment. The camera calibration device 10 shown in FIG. 2 estimates the camera parameters of each of the cameras C1, C2, ..., Cn and the three-dimensional postures of the subjects (persons T1, T2, T3) in a photographing system composed of N cameras as shown in FIG. 1. In this embodiment, the N cameras are assumed to be time-synchronized. That is, the i-th frame of one camera and the i-th frame of another camera have the same timestamp.

The camera calibration device 10 includes a three-dimensional posture estimation unit 11, an association processing unit 12, and an optimization unit 13. The three-dimensional posture estimation unit 11 acquires two-dimensional postures, which represent the two-dimensional coordinates on the frame image of the joint positions of each of the persons T1, T2, and T3 in each frame image of the camera video generated by the cameras C1, C2, ..., Cn, and estimates the three-dimensional posture of each person. Here, a three-dimensional posture is the set of three-dimensional coordinates representing a person's three-dimensional joint positions.

Specifically, the three-dimensional posture estimation unit 11 acquires camera video of a common place (the photographing area) captured by the cameras C1, C2, ..., Cn installed at mutually different positions. The three-dimensional posture estimation unit 11 first acquires the two-dimensional coordinates of each person's joint positions in each frame image of the acquired camera video, and then estimates the three-dimensional posture.

Note that the three-dimensional posture estimated for each camera may consist of different joint positions depending on the camera's field of view. For example, if the whole body of the person T1 appears in the camera C1, three-dimensional coordinates are estimated for the joint positions of the whole body. If only the upper body of the person T2 appears in the camera C2, three-dimensional coordinates are estimated for the joint positions of the upper body. Likewise, the two-dimensional posture acquired from each camera video may consist of different joint positions depending on the camera's field of view.

Using the three-dimensional postures of each camera video estimated by the three-dimensional posture estimation unit 11, the association processing unit 12 first searches for a three-dimensional similarity transformation under which their three-dimensional coordinates coincide, and associates the three-dimensional postures of the same person in different camera videos. Next, the association processing unit 12 determines the external parameters of each camera from the matched similarity transformation parameters. The external parameters are the three-dimensional rotation matrix and translation vector between cameras. Here, to determine the scale of the external parameters of each camera uniquely, any one camera may be set as the origin of the world coordinate system.

The optimization unit 13 performs bundle adjustment using the three-dimensional postures associated by the association processing unit 12 and the two-dimensional postures acquired by the three-dimensional posture estimation unit 11. In bundle adjustment, the three-dimensional postures associated by the association processing unit 12 are first reprojected onto the original frame image (two-dimensional image) of each camera from which they were extracted, using the camera parameters of that camera, to calculate two-dimensional coordinates. The two-dimensional coordinates of the projected positions are then compared with the two-dimensional posture acquired by the three-dimensional posture estimation unit 11, and the difference between them (the reprojection error) is calculated. The optimization unit 13 optimizes at least one of the three-dimensional posture, the internal parameters, and the external parameters of the cameras so as to minimize this reprojection error.
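The reprojection error described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes an ideal pinhole camera without lens distortion, a single focal length `f`, and a principal point `(cx, cy)`; the function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def project(point: Vec3, R: Mat3, t: Vec3,
            f: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a 3D point (world coordinates) onto one camera's image.

    Assumption: ideal pinhole model, x_cam = R * X + t, no lens distortion.
    """
    xc = [sum(R[r][k] * point[k] for k in range(3)) + t[r] for r in range(3)]
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)


def reprojection_error(joints_3d: Sequence[Vec3],
                       joints_2d: Sequence[Tuple[float, float]],
                       R: Mat3, t: Vec3,
                       f: float, cx: float, cy: float) -> float:
    """Sum of squared pixel distances between reprojected and observed joints."""
    total = 0.0
    for X, (u_obs, v_obs) in zip(joints_3d, joints_2d):
        u, v = project(X, R, t, f, cx, cy)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total
```

Bundle adjustment would minimize the sum of this quantity over all cameras, frames, and persons with a nonlinear least-squares solver; the solver itself is outside the scope of this sketch.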

<Processing operation of the camera calibration device>
An example of the processing operation of the camera calibration device 10 configured as above is described below with reference to FIG. 3. FIG. 3 is a flow diagram showing an example of the processing operation of the camera calibration device according to the embodiment. FIGS. 1 and 2 are referred to as appropriate in the following description. In this embodiment, the camera calibration method is carried out by operating the camera calibration device. The following description of the processing operation of the camera calibration device therefore also serves as the description of the camera calibration method in this embodiment.

First, in a situation such as that shown in FIG. 1, a common photographing area is photographed by N cameras C1, C2, ..., Cn (N is an integer of 2 or more). That is, the cameras C1, C2, ..., Cn photograph a place where their fields of view overlap. Such a photographing system is expected to be installed where people come and go, for example at an intersection or inside a station. The photographing system generates camera video from each camera by photographing the photographing area with the multiple cameras.

The internal parameters of these cameras may be calibrated in advance or may be uncalibrated. The cameras are time-synchronized, but the multiple subjects appearing in each frame image of the camera videos have not been associated with one another. That is, it is not known whether the p-th person in the i-th frame of one camera and the q-th person in the i-th frame of another camera are the same person.

In the situation shown in FIG. 1, three persons T1, T2, and T3 are present in the photographing area, and the camera video captured by each of the cameras C1, C2, ..., Cn contains images of the persons T1, T2, and T3. When the N camera videos from the cameras C1 to Cn are input to the camera calibration device 10, the processing flow of FIG. 3 starts.

As shown in FIG. 3, when the N camera videos are input, the three-dimensional posture estimation unit 11 acquires the two-dimensional posture of each person for every frame of each camera video and estimates the three-dimensional posture (step S11). Here, the three-dimensional postures in the individual camera videos need not be defined in a common world coordinate system; they may be defined in an independent coordinate system for each camera. Various existing methods can be used to obtain the two-dimensional and three-dimensional postures. For example, the geometry-based method described in Non-Patent Document 1 or the machine-learning-based method described in Non-Patent Document 2 may be used.

Next, when the three-dimensional postures of each camera video are input, the association processing unit 12 searches for a three-dimensional similarity transformation under which those three-dimensional postures coincide, and associates the three-dimensional postures of the same person in different camera videos. The association processing unit 12 also determines the matched similarity transformation parameters as the external parameters of each camera (step S12).

For example, the association processing unit 12 associates three-dimensional postures by searching for whether the three-dimensional posture of the p-th person in the i-th frame of one camera was observed in the i-th frame of another camera.
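One way to picture this per-frame association is the sketch below. It is only an illustration under strong assumptions that are not stated in the source: the postures of both cameras have already been mapped into a common coordinate system by a similarity transformation, every posture lists the same joints in the same order, and a hypothetical threshold `max_dist` decides whether two postures are close enough to be the same person.

```python
from math import dist
from typing import Dict, Sequence

Pose = Sequence[Sequence[float]]  # one person: a list of (x, y, z) joints


def pose_distance(a: Pose, b: Pose) -> float:
    """Mean Euclidean distance between same-index joints of two postures."""
    return sum(dist(p, q) for p, q in zip(a, b)) / len(a)


def associate_persons(poses_a: Sequence[Pose], poses_b: Sequence[Pose],
                      max_dist: float = 0.3) -> Dict[int, int]:
    """Map person index p (camera A) to person index q (camera B) in one frame.

    Greedy matching: the closest pairs are accepted first, each person is used
    at most once, and pairs farther apart than max_dist stay unmatched.
    """
    pairs = sorted(
        (pose_distance(a, b), p, q)
        for p, a in enumerate(poses_a)
        for q, b in enumerate(poses_b)
    )
    match: Dict[int, int] = {}
    used_q = set()
    for d, p, q in pairs:
        if d > max_dist:
            break  # all remaining pairs are even farther apart
        if p not in match and q not in used_q:
            match[p] = q
            used_q.add(q)
    return match
```

A person who appears in only one of the two cameras simply receives no entry in the returned mapping.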

As the search method, for example, the widely known ICP (Iterative Closest Point) method extended to similarity transformations may be used. Specifically, the search may proceed as follows. First, treating the three-dimensional postures of all frames of one camera as a single three-dimensional point cloud gives N three-dimensional point clouds. Next, two of the N point clouds are selected, and ICP is applied to estimate the similarity transformation under which the two point clouds coincide. Next, one of those two point clouds and one of the remaining N-2 point clouds are selected, and ICP is applied again. Repeating this operation for all point clouds eventually merges them into a single point cloud. The similarity transformation parameters then coincide with the external parameters of each camera.
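The source does not say how the similarity transformation is computed inside each ICP iteration. One common choice, shown here only as an assumed building block, is Umeyama's closed-form least-squares solution for point pairs whose correspondence is already fixed; an ICP loop would alternate this step with a nearest-neighbor search that updates the correspondences. The function name is hypothetical and NumPy is assumed to be available.

```python
import numpy as np


def estimate_similarity(src: np.ndarray, dst: np.ndarray):
    """Least-squares (s, R, t) such that dst is approximately s * R @ src + t.

    src, dst: (n, 3) arrays of corresponding 3D points (n >= 3, not collinear).
    Uses Umeyama's closed-form solution based on the SVD.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)        # 3x3 cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                      # keep R a proper rotation
    R = U @ S @ Vt
    var_s = (src_c ** 2).sum() / len(src)   # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_s    # isotropic scale
    t = mu_d - s * R @ mu_s
    return s, R, t
```

With one camera fixed as the world origin, the rotation `R` and translation `t` recovered for another camera's point cloud play the role of that camera's external parameters, and `s` absorbs the scale difference between the independently estimated postures.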

As another search method, point-correspondence matching based on widely known three-dimensional point features may be used. For example, sets of corresponding points are computed using ISS (Intrinsic Shape Signatures) or 3D-SIFT, a three-dimensional extension of the image feature SIFT (Scale-Invariant Feature Transform), and the similarity transformation parameters are then estimated while incorrect correspondences are removed by RANSAC (Random Sample Consensus).

In the examples above, ICP or RANSAC is applied with the three-dimensional postures of all frames treated as one point cloud; however, because the cameras are time-synchronized, the three-dimensional postures of a single frame may instead be treated as one point cloud. This reduces the number of three-dimensional coordinates in each point cloud, so ICP or RANSAC runs faster. Once the three-dimensional postures have been associated across cameras, the two-dimensional postures, which are the observations of those three-dimensional postures on the images, are associated at the same time.

Finally, the optimization unit 13 performs bundle adjustment using the associated three-dimensional and two-dimensional postures so as to minimize the reprojection error (step S13). Specifically, taking the camera parameters and the associated three-dimensional postures as variables, the optimization unit 13 optimizes each camera's parameters and the three-dimensional postures so that the difference between the two-dimensional postures and the two-dimensional coordinates obtained by reprojecting the associated three-dimensional postures is minimized. This allows the three-dimensional postures and the camera parameters to be estimated more accurately. The optimization unit 13 then outputs the camera parameters and the three-dimensional postures and ends the operation.
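Written as a formula, the objective of step S13 might look like the following. The notation is introduced here for illustration only and does not appear in the source: \theta_c collects the internal and external parameters of camera c, \mathbf{X}_{i,p,j} is the three-dimensional position of joint j of person p in frame i, \mathbf{x}_{c,i,p,j} is its observed two-dimensional position in camera c, \mathcal{V}_c is the set of joints actually observed by camera c, and \pi is the projection function of the camera model.

```latex
\min_{\{\theta_c\},\ \{\mathbf{X}_{i,p,j}\}}
\ \sum_{c=1}^{N} \ \sum_{(i,p,j) \in \mathcal{V}_c}
\left\| \mathbf{x}_{c,i,p,j} - \pi\!\left(\theta_c,\ \mathbf{X}_{i,p,j}\right) \right\|^{2}
```

Fixing some of the variables (for example, internal parameters that were calibrated in advance) corresponds to optimizing only a subset of the three-dimensional postures, internal parameters, and external parameters.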

The output of the association processing unit 12 can be used as the initial values of the external parameters. If the internal parameters are unknown, initial values can be given as follows. For each camera, the three-dimensional posture, the two-dimensional posture, and the external parameters are known, so a least-squares problem with the internal parameters as unknowns can be solved. As a simpler alternative, the lens center may be assumed to lie at half the image size (the image center), or the focal length may be taken from the camera's specifications.
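As a rough sketch of such an initialization, the code below combines the two ideas in a simplified form: the principal point is fixed at the image center, and only a single focal length is solved for in closed form by least squares. The simplification, the function name, and the parameters are assumptions made for illustration; the source does not specify the parameterization of the internal parameters.

```python
from typing import Sequence, Tuple


def initial_focal_length(points_cam: Sequence[Tuple[float, float, float]],
                         points_2d: Sequence[Tuple[float, float]],
                         image_width: int, image_height: int) -> float:
    """Least-squares focal length with the principal point at the image center.

    points_cam: 3D joints expressed in this camera's coordinate system
                (external parameters already applied), with Z > 0.
    points_2d:  the corresponding observed pixel coordinates.
    Model: u - cx = f * X / Z and v - cy = f * Y / Z, solved for f.
    """
    cx, cy = image_width / 2.0, image_height / 2.0
    num = den = 0.0
    for (X, Y, Z), (u, v) in zip(points_cam, points_2d):
        a_u, a_v = X / Z, Y / Z              # normalized image coordinates
        num += a_u * (u - cx) + a_v * (v - cy)
        den += a_u * a_u + a_v * a_v
    if den == 0.0:
        raise ValueError("all points lie on the optical axis; f is unobservable")
    return num / den
```

The value obtained this way is only a starting point; the bundle adjustment of step S13 is what refines it.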

As described above, the embodiment makes it possible to stably calibrate multiple cameras regardless of lens characteristics, for the following reasons.
The first reason is that the three-dimensional posture estimation unit 11 calculates the three-dimensional posture independently for each camera video and the association processing unit 12 then associates them, which removes the need to combine a telephoto lens and a wide-angle lens.
The second reason is that the association processing unit 12 is not limited to the ankles: all detected joint points can be used for association.
The technique according to the embodiment is suitable for three-dimensional shape reconstruction from images (Structure-from-Motion).

Various modifications that can be understood by those skilled in the art can be applied to the examples described above. The embodiment can also be implemented in the forms shown in the following modifications.

In the above description, the N cameras are assumed to be time-synchronized, but they need not be. That is, it may be unknown whether the i-th frame of one camera and the i-th frame of another camera have the same timestamp. In this case, the association processing unit 12 can no longer limit the search range by treating the three-dimensional pose of a single frame as one point cloud, but association is still possible if the three-dimensional poses of all frames are regarded as a single three-dimensional point cloud. If the same motion of a person is captured by the cameras, the time offset between the camera videos can also be adjusted.
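One possible way to adjust such a time offset is sketched below. This is an illustrative technique chosen for the sketch, not the procedure of the embodiment: each camera's three-dimensional pose sequence is reduced to a per-frame joint angle, which does not change under rotation, translation, or scaling, and the integer frame shift that best aligns the two angle sequences is searched exhaustively. The names joint_angle and best_frame_offset and the parameters max_shift and min_overlap are hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Angle in radians at joint b formed by the 3D joints a-b-c.

    The angle is invariant to similarity transformations, so it can be
    compared across cameras before their poses are associated.
    """
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))


def best_frame_offset(signal_a, signal_b, max_shift, min_overlap=10):
    """Return the shift s for which signal_a[i] best matches signal_b[i + s].

    The cost is the mean squared difference over the overlapping frames;
    shifts with fewer than min_overlap overlapping frames are skipped.
    """
    best_shift, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [(signal_a[i], signal_b[i + s])
                 for i in range(len(signal_a))
                 if 0 <= i + s < len(signal_b)]
        if len(pairs) < min_overlap:
            continue
        cost = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift
```

For example, the signals could be the elbow angle of the same person computed per frame in each camera. The search only finds whole-frame offsets and assumes equal frame rates; periodic motions such as walking can produce several equally good shifts, so max_shift should stay below one motion period.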

The program in this embodiment may be any program that causes a computer to execute steps S11 to S13 shown in FIG. 3. By installing this program on a computer and executing it, the camera parameter estimation device and the camera parameter estimation method according to this embodiment can be realized.

FIG. 4 is a diagram showing an example of the hardware configuration of the camera calibration device 10. As shown in FIG. 4, the camera calibration device 10 includes a processor 101 and a memory 102. The processor 101 is, for example, an MPU (Micro Processing Unit) or a CPU (Central Processing Unit). The processor 101 may include a plurality of processors. The memory 102 is configured as a combination of volatile memory and nonvolatile memory.

The memory 102 may include storage located apart from the processor 101. In this case, the processor 101 may access the memory 102 via an I/O interface (not shown). The three-dimensional pose estimation unit 11, the association processing unit 12, and the optimization unit 13 of the camera calibration device 10 of the first embodiment may be realized by the processor 101 reading and executing a program stored in the memory 102.

The program can be stored using various types of non-transitory computer readable media and supplied to the camera parameter estimation device. Examples of non-transitory computer readable media include magnetic recording media (for example, flexible disks, magnetic tape, and hard disk drives) and magneto-optical recording media (for example, magneto-optical disks). Further examples include CD-ROM (Read Only Memory), CD-R, and CD-R/W, as well as semiconductor memory. Semiconductor memory includes, for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (Random Access Memory). The program may also be supplied to the camera parameter estimation device by various types of transitory computer readable media. Examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer readable medium can supply the program to the camera calibration device 10 via a wired communication path, such as an electric wire or an optical fiber, or via a wireless communication path.

Although the present invention has been described above with reference to the embodiments, the present invention is not limited to the above. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the invention.

10 Camera calibration device
11 Three-dimensional pose estimation unit
12 Association processing unit
13 Optimization unit
101 Processor
102 Memory

Claims

A camera calibration device comprising:
a three-dimensional pose estimation unit that acquires a two-dimensional pose representing the coordinates of the joint positions of a person on each frame image of video generated by a plurality of cameras that are installed at different locations and photograph a common place containing one or more common persons, and estimates, from the two-dimensional pose, a three-dimensional pose representing the three-dimensional joint positions of the person;
an association processing unit that searches for a similarity transformation parameter that associates the three-dimensional poses estimated for each camera, associates the same person in frame images generated by different cameras, and determines external parameters of each camera from the similarity transformation parameter; and
an optimization unit that optimizes at least one of the three-dimensional pose, the internal parameters of the camera, and the external parameters.

The camera calibration device according to claim 1, wherein
the plurality of cameras are time-synchronized, and
the association processing unit limits the search range of the similarity transformation parameter to three-dimensional poses for which the time stamps recorded in the frame images of the video generated by each camera are identical.

The camera calibration device according to claim 1, wherein,
when the plurality of cameras are not time-synchronized, the association processing unit adjusts the time offset between the videos based on the same motion of a common person photographed by the plurality of cameras.

The camera calibration device according to claim 1, wherein
the association processing unit sets any one of the plurality of cameras as the origin of the world coordinate system and determines the external parameters of each camera.

The camera calibration device according to claim 1, wherein
the three-dimensional poses estimated from the images generated by the plurality of cameras are composed of different joint positions depending on the viewing angle of each camera.

A camera calibration method in which a computer executes:
a process of acquiring a two-dimensional pose representing the coordinates of the joint positions of a person on each frame image of video generated by a plurality of cameras that are installed at different locations and photograph a common place containing one or more common persons, and estimating, from the two-dimensional pose, a three-dimensional pose representing the three-dimensional joint positions of the person;
a process of searching for a similarity transformation parameter that associates the three-dimensional poses estimated for each camera, associating the same person in frame images generated by different cameras, and determining external parameters of each camera from the similarity transformation parameter; and
a process of optimizing at least one of the three-dimensional pose, the internal parameters of the camera, and the external parameters.

A program that causes a computer to execute:
a process of acquiring a two-dimensional pose representing the coordinates of the joint positions of a person on each frame image of video generated by a plurality of cameras that are installed at different locations and photograph a common place containing one or more common persons, and estimating, from the two-dimensional pose, a three-dimensional pose representing the three-dimensional joint positions of the person;
a process of searching for a similarity transformation parameter that associates the three-dimensional poses estimated for each camera, associating the same person in frame images generated by different cameras, and determining external parameters of each camera from the similarity transformation parameter; and
a process of optimizing at least one of the three-dimensional pose, the internal parameters of the camera, and the external parameters.