JP6933110B2

JP6933110B2 - Camera external parameter estimation method, estimation device and estimation program

Info

Publication number: JP6933110B2
Application number: JP2017229132A
Authority: JP
Inventors: 一樹長村; 康洲鎌; 敏幸吉武
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-11-29
Filing date: 2017-11-29
Publication date: 2021-09-08
Anticipated expiration: 2037-11-29
Also published as: JP2019102877A

Description

The present invention relates to an estimation method, an estimation device, and an estimation program for external parameters of a camera.

A technique is known for generating video from an arbitrary viewpoint desired by a viewer (hereinafter also referred to as free-viewpoint video) by using images captured by a plurality of cameras with different viewpoints. When free-viewpoint video is generated from images captured by a plurality of cameras, calibration that estimates the positional relationship and the like between the cameras is executed in order to obtain that relationship (see, for example, Patent Document 1).

For example, a multi-viewpoint video representation device has been proposed that uses a technique called SFM (Structure from Motion) to estimate external parameters, including a rotation matrix and a translation vector, that indicate the positional relationship between cameras (see, for example, Patent Document 2). In the SFM-based technique, the external parameters of the cameras are estimated from the correspondence between feature points extracted from a plurality of images captured by a plurality of cameras.

A method of generating free-viewpoint video of a sports scene has also been proposed (see, for example, Non-Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2004-235934
Patent Document 2: Japanese Unexamined Patent Publication No. 2014-27528

Non-Patent Document 1: Itaru Kitahara, Yuichi Ohta, "Free-viewpoint video generation of sports scenes by fusing multi-viewpoint video", Image Recognition and Understanding Symposium (MIRU2000), (2000)

When free-viewpoint video is generated in a sports scene, the external parameters of the cameras are measured in advance, for example by using information obtained by simultaneously photographing, with a plurality of cameras, a calibration board placed on the court in order to estimate the positional relationship and the like between the cameras. The calibration board is removed after the external parameters have been measured and before the match starts. With a method that photographs a calibration board in advance, it is therefore difficult to remeasure the external parameters of the cameras during the match. Consequently, when the floor vibrates because of the movement of players during the match and the posture of a camera changes, the external parameters of that camera deviate from the values measured in advance, that is, from the values used to generate the free-viewpoint video. In this case, the quality of the free-viewpoint video is lower than when the external parameters during the match remain at the values measured in advance.

In sports such as basketball and volleyball, players on the same team wear the same uniform. In this case, a method that estimates the external parameters of the cameras from the correspondence, between images, of feature points extracted from the plurality of images used to generate the free-viewpoint video may erroneously associate feature points on uniforms and the like between the images. When the correspondence of feature points between images is wrong, the estimation accuracy of the external parameters of the cameras decreases.

In one aspect, an object of the present invention is to accurately estimate the external parameters of a camera even when the posture or position of the camera changes during the shooting period.

In one embodiment, an estimation device that estimates the external parameters of a plurality of cameras photographing a subject from mutually different positions extracts feature points of the photographed space from the image acquired from each of the cameras, and extracts the skeleton of a person from the image acquired from each of the cameras. For each pair of two images among the plurality of images acquired from the cameras, the device determines first corresponding points between the two images by associating the feature points extracted from one image with the feature points extracted from the other, and determines second corresponding points between the two images by associating the skeleton extracted from one image with the skeleton extracted from the other. The device then estimates the external parameters of the cameras based on the first corresponding points and the second corresponding points between the two images.

In one aspect, the present invention makes it possible to accurately estimate the external parameters of a camera even when the posture or position of the camera changes during the shooting period.

FIG. 1 is a diagram showing one embodiment of a method, a device, and a program for estimating external parameters of a camera.
FIG. 2 is a diagram showing an example of the operation of the estimation device shown in FIG. 1.
FIG. 3 is a diagram showing another embodiment of a method, a device, and a program for estimating external parameters of a camera.
FIG. 4 is a diagram showing an example of a skeleton extracted by the skeleton extraction unit shown in FIG. 3.
FIG. 5 is a diagram showing an example of association between images by the estimation device shown in FIG. 3.
FIG. 6 is a diagram showing an example of the operation of the estimation device shown in FIG. 3.
FIG. 7 is a diagram showing an example of the first calculation process shown in FIG. 6.
FIG. 8 is a diagram showing another embodiment of a method, a device, and a program for estimating external parameters of a camera.
FIG. 9 is a diagram showing an example of the operation of the estimation device shown in FIG. 8.
FIG. 10 is a diagram showing an example of the first calculation process shown in FIG. 9.

Hereinafter, embodiments will be described with reference to the drawings.

FIG. 1 shows one embodiment of a method, a device, and a program for estimating the external parameters of a camera. The estimation device 10 shown in FIG. 1 is included, for example, in a free-viewpoint video generation system SYS that generates video from an arbitrary viewpoint desired by a viewer (free-viewpoint video). In addition to the estimation device 10, the free-viewpoint video generation system SYS has a plurality of cameras CAM (CAMa, CAMb, CAMc, ..., CAMn) that photograph a common subject from different viewpoints, and a free-viewpoint video generation unit 20. The estimation device 10 executes calibration that estimates the external parameters of the cameras CAM, including a rotation matrix and a translation vector that indicate the positional relationship and the like between two cameras CAM. For example, the estimation device 10 estimates the external parameters of the plurality of cameras CAM that photograph the subject from mutually different positions, and transfers the estimated external parameters to the free-viewpoint video generation unit 20. The free-viewpoint video generation unit 20 generates free-viewpoint video by using the images captured by the respective cameras CAM, the external parameters of the cameras CAM estimated by the estimation device 10, and the like. That is, the external parameters estimated by the estimation device 10 are used to generate the free-viewpoint video.

The estimation device 10 is realized by, for example, an information processing device such as a computer, and has a processor 100 such as a CPU (Central Processing Unit) and a memory 1000. The processor 100 and the memory 1000 are connected to a bus BUS.

For example, the processor 100 executes an estimation program (a program for estimating the external parameters of the cameras CAM) stored in the memory 1000 and controls the operation of the estimation device 10. The estimation program may be stored in a storage device of the estimation device 10 other than the memory 1000, or in a storage device outside the estimation device 10. The estimation program may also be stored in a computer-readable recording medium REC such as a CD-ROM (Compact Disc Read Only Memory), a DVD (Digital Versatile Disc), or a USB (Universal Serial Bus) memory. In this case, the estimation program stored in the recording medium REC is transferred from the recording medium REC to the memory 1000 or the like via an input/output interface (not shown) provided in the estimation device 10. The estimation program may also be transferred from the recording medium REC to a hard disk (not shown) and then from the hard disk to the memory 1000.

By executing the estimation program stored in the memory 1000, for example, the processor 100 realizes the functions of a feature point extraction unit 200, a skeleton extraction unit 300, a feature point association unit 400, a skeleton association unit 500, and an estimation unit 600. That is, the estimation device 10 has the feature point extraction unit 200, the skeleton extraction unit 300, the feature point association unit 400, the skeleton association unit 500, and the estimation unit 600. These units may instead be realized by hardware alone.

The feature point extraction unit 200 sequentially acquires images from each of the plurality of cameras CAM, which photograph the subject at a predetermined frame rate. The feature point extraction unit 200 then extracts, from each of the images acquired from the cameras CAM, feature points of the space photographed by the camera CAM. These feature points are, for example, natural feature points such as the corners of objects. Hereinafter, the feature points of the space photographed by the cameras CAM are also referred to as natural feature points.

The skeleton extraction unit 300 acquires the same images that are transferred to the feature point extraction unit 200. That is, the image captured by each camera CAM is transferred to both the feature point extraction unit 200 and the skeleton extraction unit 300. The skeleton extraction unit 300 then extracts, from each of the images acquired from the cameras CAM, the skeleton of a person in the image. For example, the skeleton extraction unit 300 extracts the positions of the person's joints in the image (hereinafter also referred to as skeleton points) as the skeleton of the person.

The feature point association unit 400 receives, from the feature point extraction unit 200, feature information indicating the natural feature points extracted from each image. For each pair of two images among the plurality of images acquired from the cameras CAM, the feature point association unit 400 determines first corresponding points between the two images by associating the natural feature points extracted from one image with the natural feature points extracted from the other. For example, between the natural feature points extracted from the image captured by the camera CAMa and those extracted from the image captured by the camera CAMb, the feature point association unit 400 pairs the natural feature points that represent the same feature of the same object. The first corresponding points (pairs of natural feature points) between the two images captured by the cameras CAMa and CAMb are thereby specified. Likewise, between the natural feature points extracted from the image captured by the camera CAMb and those extracted from the image captured by the camera CAMc, the feature point association unit 400 pairs the natural feature points that represent the same feature of the same object, which specifies the first corresponding points between the two images captured by the cameras CAMb and CAMc. The feature point association unit 400 then transfers information indicating the first corresponding points (pairs of natural feature points) between images to the estimation unit 600.
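The pairing of natural feature points between two images can be sketched as follows. The source does not state how the pairing is decided, so this sketch assumes that each natural feature point carries a numeric descriptor vector and uses nearest-neighbour matching with a ratio test as a stand-in; the function name `match_features` and the `ratio` threshold are hypothetical.

```python
import math


def match_features(desc_a, desc_b, ratio=0.8):
    """Return (i, j) index pairs linking descriptors of image A to image B.

    Assumption: desc_a and desc_b are lists of equal-length numeric tuples.
    """
    pairs = []
    for i, da in enumerate(desc_a):
        # Distance from descriptor i to every descriptor of the other image.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if not dists:
            continue
        best_dist, best_j = dists[0]
        # Keep the match only if it is clearly better than the runner-up,
        # which rejects ambiguous candidates such as repeated textures.
        if len(dists) == 1 or best_dist < ratio * dists[1][0]:
            pairs.append((i, best_j))
    return pairs
```

Each returned pair plays the role of one first corresponding point. Look-alike regions (for example, identical uniforms) tend to fail the ratio test or to match wrongly, which is the weakness the skeleton-based second corresponding points are meant to offset.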

The skeleton association unit 500 receives, from the skeleton extraction unit 300, skeleton information indicating the skeleton (for example, skeleton points) of each person extracted from each image. For each pair of two images among the plurality of images acquired from the cameras CAM, the skeleton association unit 500 determines second corresponding points between the two images by associating the skeleton extracted from one image with the skeleton extracted from the other. For example, between the skeleton points extracted from the image captured by the camera CAMa and those extracted from the image captured by the camera CAMb, the skeleton association unit 500 pairs the skeleton points that indicate the position of the same joint of the same person. The second corresponding points (pairs of skeleton points) between the two images captured by the cameras CAMa and CAMb are thereby specified. Likewise, between the skeleton points extracted from the image captured by the camera CAMb and those extracted from the image captured by the camera CAMc, the skeleton association unit 500 pairs the skeleton points that indicate the position of the same joint of the same person, which specifies the second corresponding points between the two images captured by the cameras CAMb and CAMc. In this way, the estimation device 10 specifies second corresponding points (pairs of skeleton points) in addition to first corresponding points (pairs of natural feature points) as corresponding points between two images. The skeleton association unit 500 transfers information indicating the second corresponding points (pairs of skeleton points) between images to the estimation unit 600.

Whether two detections are the same person is determined, for example, from the lengths and angles between mutually adjacent skeleton points of each person. The lengths and angles between adjacent skeleton points correspond to the lengths and angles between the person's joints. Therefore, even in sports scenes such as basketball and volleyball, in which a plurality of people (players) wear the same uniform, the same person can be identified between two images by finding the people whose inter-joint lengths and angles are similar in the two images. This improves the accuracy of identifying the same person compared with a case where skeleton points are not used for the identification. That is, the estimation device 10 can reduce erroneous association of feature points and the like compared with a case where skeleton points are not used to identify the same person.
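A minimal sketch of this same-person test is shown below. The joint layout `BONES`, the cost function, and the greedy assignment are assumptions made for illustration; the source only says that lengths and angles between adjacent skeleton points are compared.

```python
import math

# Hypothetical skeleton layout: each person is a list of (x, y) joint positions,
# and BONES lists index pairs of adjacent joints (e.g. head-neck, neck-hip, hip-knee).
BONES = [(0, 1), (1, 2), (2, 3)]


def bone_descriptor(joints):
    """Per bone: length normalized by the total bone length, and image-plane angle."""
    vecs = [(joints[b][0] - joints[a][0], joints[b][1] - joints[a][1]) for a, b in BONES]
    lengths = [math.hypot(dx, dy) for dx, dy in vecs]
    total = sum(lengths) or 1.0
    return [(length / total, math.atan2(dy, dx)) for length, (dx, dy) in zip(lengths, vecs)]


def dissimilarity(desc_a, desc_b):
    """Sum of length differences and wrapped angle differences over all bones."""
    cost = 0.0
    for (len_a, ang_a), (len_b, ang_b) in zip(desc_a, desc_b):
        d_ang = math.atan2(math.sin(ang_a - ang_b), math.cos(ang_a - ang_b))
        cost += abs(len_a - len_b) + abs(d_ang) / math.pi
    return cost


def match_people(people_a, people_b):
    """Greedily pair each person in image A with the most similar unused person in B."""
    descs_a = [bone_descriptor(p) for p in people_a]
    descs_b = [bone_descriptor(p) for p in people_b]
    candidates = sorted(
        (dissimilarity(da, db), i, j)
        for i, da in enumerate(descs_a)
        for j, db in enumerate(descs_b)
    )
    used_a, used_b, result = set(), set(), {}
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            result[i] = j
            used_a.add(i)
            used_b.add(j)
    return result


def second_corresponding_points(people_a, people_b):
    """Pair the same joint of each matched person across the two images."""
    pairs = []
    for i, j in match_people(people_a, people_b).items():
        pairs.extend(zip(people_a[i], people_b[j]))
    return pairs
```

Normalizing by the total bone length makes the descriptor insensitive to how large a person appears in each image. Angles measured in the image plane still change with viewpoint, so a comparison of this kind is most plausible between neighbouring cameras with similar viewing directions.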

The estimation unit 600 estimates the external parameters of the cameras CAM based on the first corresponding points and the second corresponding points between two images. For example, the estimation unit 600 selects eight corresponding points from the first corresponding points and the second corresponding points between the two images specified by the feature point association unit 400 and the skeleton association unit 500. The estimation unit 600 then executes the eight-point algorithm on the eight selected corresponding points to calculate the fundamental matrix, which contains information on both the internal parameters and the external parameters of the cameras CAM. The internal parameters of a camera CAM are its focal length, the center coordinates of the image, and the like, and are calculated in advance. The estimation unit 600 can therefore calculate, from the internal parameters and the fundamental matrix, the essential matrix, which contains information on the external parameters only. The estimation unit 600 then decomposes the essential matrix to obtain the external parameters and transfers them to the free-viewpoint video generation unit 20.
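The relationships behind this calculation can be written out as follows. This is a sketch in standard two-view geometry notation, not notation taken from the source. The symbols are assumptions introduced here: $\mathbf{x}_1$ and $\mathbf{x}_2$ are the homogeneous pixel coordinates of one corresponding point in the two images, $K_1$ and $K_2$ are the internal parameter matrices of the two cameras, $F$ is the matrix that mixes internal and external parameters (conventionally the fundamental matrix, 基礎行列), $E$ is the matrix that carries external parameters only (conventionally the essential matrix, 基本行列), and $R$ and $\mathbf{t}$ are the rotation matrix and translation vector.

```latex
\begin{align}
\mathbf{x}_2^{\top} F \,\mathbf{x}_1 &= 0
  && \text{(one linear equation in the entries of } F \text{ per corresponding point)} \\
E &= K_2^{\top} F \, K_1
  && \text{(internal parameters removed from } F\text{)} \\
E &= [\mathbf{t}]_{\times} R
  && \text{(decomposed into the external parameters)}
\end{align}
```

Eight corresponding points give eight linear equations, which fix the nine entries of $F$ up to an overall scale. Decomposing $E$, for example by singular value decomposition, recovers $\mathbf{t}$ only up to scale and yields four candidate $(R, \mathbf{t})$ combinations; a common way to select one is to keep the combination that places the reconstructed points in front of both cameras.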

In this way, the estimation unit 600 estimates the external parameters of the cameras CAM, for example for each frame, based on the correspondence of natural feature points and the correspondence of human skeletons extracted from the plurality of images captured by the plurality of cameras CAM. Because the estimation device 10 determines corresponding points between images by using skeletons in addition to natural feature points, it can reduce cases in which corresponding points are determined from erroneously associated feature points and the like, and can thereby suppress a decrease in the estimation accuracy of the external parameters. That is, even when the posture or position of a camera CAM changes during the shooting period, the estimation device 10 can accurately estimate the external parameters of the camera CAM from the images used to generate the free-viewpoint video.

The configuration of the estimation device 10 is not limited to the example shown in FIG. 1. For example, the estimation device 10 may include an interface that receives the images captured by the cameras CAM and transfers them to the feature point extraction unit 200 and the skeleton extraction unit 300 via the bus BUS. As another example, the processor 100 may realize the function of the free-viewpoint video generation unit 20 by executing a program that generates free-viewpoint video.

FIG. 2 shows an example of the operation of the estimation device 10 shown in FIG. 1. The operation shown in FIG. 2 is an example of a method of estimating the external parameters of the cameras CAM. A program that causes the estimation device 10, such as a computer, to execute the operation shown in FIG. 2 is an example of a program for estimating the external parameters of the cameras CAM. The operation shown in FIG. 2 is executed for each frame of the video captured by the cameras CAM. Alternatively, it may be executed once every several frames.

In step S10, the feature point extraction unit 200 extracts natural feature points from each of the images acquired from the plurality of cameras CAM.

Next, in step S20, the skeleton extraction unit 300 extracts the skeleton points of people from each of the images acquired from the plurality of cameras CAM. Thus, for example, when the free-viewpoint video generation system SYS generates free-viewpoint video of a sports scene, the skeleton extraction unit 300 extracts the skeleton points of the players during the match.

Next, in step S30, the feature point association unit 400 determines the first corresponding points between images by associating, among the natural feature points extracted in step S10, the natural feature points extracted from images captured by mutually adjacent cameras CAM. The first corresponding points (pairs of natural feature points) are thereby specified as corresponding points between the images captured by mutually adjacent cameras CAM.

Next, in step S40, the skeleton association unit 500 determines the second corresponding points between images by associating, among the skeleton points extracted in step S20, the skeleton points extracted from images captured by mutually adjacent cameras CAM. The second corresponding points are thereby specified, separately from the first corresponding points (pairs of natural feature points), as corresponding points between the images captured by mutually adjacent cameras CAM. Because the estimation device 10 specifies corresponding points between images by using skeleton points in addition to natural feature points, it can reduce erroneous association of people (players), even in a sports scene in which a plurality of players wear the same uniform, compared with a case where skeleton points are not extracted.

Next, in step S50, the estimation unit 600 estimates the external parameters of the cameras CAM based on the first corresponding points between images determined in step S30 and the second corresponding points between images determined in step S40. Because the estimation device 10 can associate feature points and the like between images more accurately than when skeleton points are not used for the association, it can accurately estimate the external parameters of the cameras CAM for each frame.

The operation of the estimation device 10 is not limited to the example shown in FIG. 2. For example, the process of step S20 may be executed before the process of step S10. Alternatively, the process of step S20 may be executed after the process of step S30.
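The flow of steps S10 to S50 for one frame can be sketched as below. Every function passed in is a hypothetical placeholder for the corresponding unit (200, 300, 400, 500, 600); the sketch only fixes the order of the steps and the data handed between them, and assumes that each matcher returns a list of corresponding points.

```python
def estimate_for_frame(images, extract_features, extract_skeleton_points,
                       match_features, match_skeleton_points, solve_extrinsics):
    """Run steps S10 to S50 once.

    images: dict mapping a camera name to its image for this frame, ordered so
    that consecutive entries are adjacent cameras (an assumption of this sketch).
    """
    features = {cam: extract_features(img) for cam, img in images.items()}          # S10
    skeletons = {cam: extract_skeleton_points(img) for cam, img in images.items()}  # S20
    cams = list(images)
    extrinsics = {}
    for cam_a, cam_b in zip(cams, cams[1:]):
        first = match_features(features[cam_a], features[cam_b])             # S30
        second = match_skeleton_points(skeletons[cam_a], skeletons[cam_b])   # S40
        # S50: both kinds of corresponding points feed a single estimate.
        extrinsics[(cam_a, cam_b)] = solve_extrinsics(first + second)
    return extrinsics
```

Because S10 and S20 are independent of each other, swapping their order, or running S20 after S30, does not change the result, which matches the note above on reordering.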

As described above, in the embodiment shown in FIGS. 1 and 2, the estimation device 10 extracts the skeleton of a person, in addition to natural feature points, from each of the images acquired from the plurality of cameras CAM. The estimation device 10 determines first corresponding points between two images by associating the natural feature points extracted from the two images, and determines second corresponding points between the two images by associating the skeletons extracted from the two images. The first corresponding points and the second corresponding points are thereby specified as corresponding points between the two images. Because the estimation device 10 determines corresponding points between images by using skeletons in addition to natural feature points, it can reduce erroneous association of feature points and the like, and can avoid estimating the external parameters of the cameras CAM from wrong corresponding points. This suppresses a decrease in the estimation accuracy of the external parameters of the cameras CAM.

For example, even when the posture or position of a camera CAM changes during the shooting period, the estimation device 10 can estimate the external parameters of the camera CAM from images acquired after the change. The estimation device 10 can therefore accurately estimate the external parameters of the camera CAM even when its posture or position changes during the shooting period.

FIG. 3 shows another embodiment of a method, a device, and a program for estimating the external parameters of a camera. Elements that are the same as or similar to those described with reference to FIGS. 1 and 2 are given the same or similar reference signs, and their detailed description is omitted. The estimation device 12 shown in FIG. 3 is included in the free-viewpoint video generation system SYS together with, for example, the plurality of cameras CAM and the free-viewpoint video generation unit 20. Like the estimation device 10 described with reference to FIG. 1, the estimation device 12 executes calibration that estimates the external parameters of the cameras CAM. For example, the estimation device 12 estimates the external parameters of the plurality of cameras CAM that photograph the subject from mutually different positions, and transfers the estimated external parameters to the free-viewpoint video generation unit 20. The free-viewpoint video generation unit 20 generates free-viewpoint video by using the images captured by the respective cameras CAM, the external parameters of the cameras CAM estimated by the estimation device 12, and the like. That is, the external parameters estimated by the estimation device 12 are used to generate the free-viewpoint video. In the example shown in FIG. 3, the plurality of cameras CAM are arranged so as to surround the subject: the camera CAMa is placed between the camera CAMb and the camera CAMn, and the camera CAMb is placed between the camera CAMa and the camera CAMc.

The estimation device 12 is the same as or similar to the estimation device 10 shown in FIG. 1, except that it has a processor 102 instead of the processor 100 shown in FIG. 1. For example, the estimation device 12 is realized by an information processing device such as a computer, and has the processor 102 and a memory 1000. The processor 102 and the memory 1000 are connected to a bus BUS. The memory 1000 also stores learning data LDP used by the skeleton extraction units 302 (302a, 302b, 302c, ..., 302n) described later. Because the learning data LDP is transferred from the memory 1000 to each skeleton extraction unit 302 via the bus BUS, FIG. 3 shows part of the data path between each skeleton extraction unit 302 and the bus BUS as a broken line. The learning data LDP may be stored in a storage device of the estimation device 12 other than the memory 1000, or in a storage device outside the estimation device 12.

The processor 102 executes, for example, an estimation program (a program for estimating the external parameters of the cameras CAM) stored in the memory 1000, and controls the operation of the estimation device 12. The estimation program may be stored in a storage device of the estimation device 12 other than the memory 1000, or in a storage device outside the estimation device 12. The estimation program may also be stored in a computer-readable recording medium REC such as a CD-ROM, a DVD, or a USB memory. In this case, the estimation program stored in the recording medium REC is transferred from the recording medium REC to the memory 1000 or the like via an input/output interface (not shown) provided in the estimation device 12. The estimation program may also be transferred from the recording medium REC to a hard disk (not shown) and then from the hard disk to the memory 1000.

The estimation device 12 has a plurality of feature point extraction units 202 (202a, 202b, 202c, ..., 202n) and a plurality of skeleton extraction units 302 (302a, 302b, 302c, ..., 302n). The estimation device 12 further has a plurality of feature point association units 402 (402a, 402b, 402c, ..., 402n), a plurality of skeleton association units 502 (502a, 502b, 502c, ..., 502n), and an estimation unit 602.

For example, by executing the estimation program, the processor 102 realizes the functions of the feature point extraction units 202, the skeleton extraction units 302, the feature point association units 402, the skeleton association units 502, and the estimation unit 602. Alternatively, the feature point extraction units 202, the skeleton extraction units 302, the feature point association units 402, the skeleton association units 502, and the estimation unit 602 may be realized by hardware alone.

Each of the feature point extraction units 202 is provided for a corresponding one of the cameras CAM and sequentially acquires images from that camera CAM. Each feature point extraction unit 202 then extracts, from the images acquired from its corresponding camera CAM, feature points (natural feature points) of the space photographed by that camera CAM. The feature point extraction units 202 are an example of a feature point extraction unit that extracts feature points of the photographed space from the images acquired from each of the cameras CAM.

Each of the skeleton extraction units 302 is provided for a corresponding one of the cameras CAM and sequentially acquires images from that camera CAM. In other words, each skeleton extraction unit 302 is provided for a corresponding one of the feature point extraction units 202 and acquires the same images that are transferred to that feature point extraction unit 202. The images taken by each camera CAM are thus transferred to both the corresponding feature point extraction unit 202 and the corresponding skeleton extraction unit 302. Each skeleton extraction unit 302 then extracts the skeleton of each person in the image from the images acquired from its corresponding camera CAM.

For example, each skeleton extraction unit 302 receives the learning data LDP, such as image data of persons, from the memory 1000 via the bus BUS. Each skeleton extraction unit 302 then extracts the positions of the joints of a person in the image (skeleton points) as the skeleton of that person by deep learning using the learning data LDP. Each skeleton extraction unit 302 may instead extract the skeleton points of a person in the image by machine learning other than deep learning, or the like. The skeleton extraction units 302 are an example of a skeleton extraction unit that extracts the skeleton of a person from the images acquired from each of the cameras CAM.

Each of the feature point association units 402 is provided for a pair of mutually adjacent cameras CAM. In other words, each feature point association unit 402 is provided for a pair of feature point extraction units 202. For example, each feature point association unit 402 receives feature information indicating natural feature points from each of the feature point extraction units 202 in its corresponding pair. Then, for each pair of two images among the images acquired from the cameras CAM, the feature point association units 402 associate the natural feature points extracted from one of the two images with the natural feature points extracted from the other, and thereby determine first corresponding points between the two images.

For example, the feature point association unit 402a is provided for the pair of feature point extraction units 202a and 202b. In this case, the feature point association unit 402a receives, from the feature point extraction unit 202a, feature information indicating the natural feature points extracted from the image taken by the camera CAMa, and receives, from the feature point extraction unit 202b, feature information indicating the natural feature points extracted from the image taken by the camera CAMb. The feature point association unit 402a then pairs up natural feature points extracted by the feature point extraction unit 202a and natural feature points extracted by the feature point extraction unit 202b that represent the same feature of the same object. This identifies the first corresponding points (pairs of natural feature points) between the two images taken by the mutually adjacent cameras CAMa and CAMb.

Similarly, the feature point association unit 402b is provided for the pair of feature point extraction units 202b and 202c. In this case, the feature point association unit 402b receives, from the feature point extraction unit 202b, feature information indicating the natural feature points extracted from the image taken by the camera CAMb, and receives, from the feature point extraction unit 202c, feature information indicating the natural feature points extracted from the image taken by the camera CAMc. The feature point association unit 402b then pairs up natural feature points extracted by the feature point extraction unit 202b and natural feature points extracted by the feature point extraction unit 202c that represent the same feature of the same object. This identifies the first corresponding points (pairs of natural feature points) between the two images taken by the mutually adjacent cameras CAMb and CAMc.

Likewise, the feature point association unit 402n is provided for the pair of feature point extraction units 202n and 202a. In this case, the feature point association unit 402n receives, from the feature point extraction unit 202n, feature information indicating the natural feature points extracted from the image taken by the camera CAMn, and receives, from the feature point extraction unit 202a, feature information indicating the natural feature points extracted from the image taken by the camera CAMa. The feature point association unit 402n then pairs up natural feature points extracted by the feature point extraction unit 202n and natural feature points extracted by the feature point extraction unit 202a that represent the same feature of the same object. This identifies the first corresponding points (pairs of natural feature points) between the two images taken by the mutually adjacent cameras CAMn and CAMa.

The feature point association units 402 are an example of a feature point association unit that determines, for each pair of two images among the images acquired from the cameras CAM, the first corresponding points between the two images.
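The embodiment does not state how natural feature points showing the same object feature are paired. Purely as an illustration, the sketch below assumes that each natural feature point carries a numeric descriptor and that two points are paired when they are mutual nearest neighbours under Euclidean distance. The function name `match_features`, the descriptor representation, and the sample values are all assumptions of this sketch, not part of the described device.

```python
import math

def match_features(desc_a, desc_b):
    """Pair descriptors that are mutual nearest neighbours.

    desc_a, desc_b: lists of equal-length numeric tuples (hypothetical
    descriptors of the natural feature points in two images).
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    if not desc_a or not desc_b:
        return []

    def nearest(query, candidates):
        # Index of the candidate closest to the query descriptor.
        return min(range(len(candidates)),
                   key=lambda j: math.dist(query, candidates[j]))

    pairs = []
    for i, d in enumerate(desc_a):
        j = nearest(d, desc_b)
        # Keep the pair only if the match also holds in the reverse direction.
        if nearest(desc_b[j], desc_a) == i:
            pairs.append((i, j))
    return pairs

# Toy descriptors: the third point in image A has no counterpart in image B.
desc_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
desc_b = [(1.1, 0.9), (0.1, -0.1)]
pairs = match_features(desc_a, desc_b)
```

The mutual check is one simple way to discard one-sided matches; a practical system would likely add a distance threshold or ratio test as well.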

Each of the skeleton association units 502 is provided for a pair of mutually adjacent cameras CAM. In other words, each skeleton association unit 502 is provided for a pair of skeleton extraction units 302. For example, each skeleton association unit 502 receives skeleton information indicating the skeleton (for example, the skeleton points) of a person in an image from each of the skeleton extraction units 302 in its corresponding pair. Then, for each pair of two images among the images acquired from the cameras CAM, the skeleton association units 502 associate the skeleton extracted from one of the two images with the skeleton extracted from the other, and thereby determine second corresponding points between the two images.

For example, the skeleton association unit 502a is provided for the pair of skeleton extraction units 302a and 302b. In this case, the skeleton association unit 502a receives, from the skeleton extraction unit 302a, skeleton information indicating the skeleton points extracted from the image taken by the camera CAMa, and receives, from the skeleton extraction unit 302b, skeleton information indicating the skeleton points extracted from the image taken by the camera CAMb. The skeleton association unit 502a then pairs up skeleton points extracted by the skeleton extraction unit 302a and skeleton points extracted by the skeleton extraction unit 302b that indicate the position of the same joint of the same person. This identifies the second corresponding points (pairs of skeleton points) between the two images taken by the mutually adjacent cameras CAMa and CAMb.

Similarly, the skeleton association unit 502b is provided for the pair of skeleton extraction units 302b and 302c. In this case, the skeleton association unit 502b receives, from the skeleton extraction unit 302b, skeleton information indicating the skeleton points extracted from the image taken by the camera CAMb, and receives, from the skeleton extraction unit 302c, skeleton information indicating the skeleton points extracted from the image taken by the camera CAMc. The skeleton association unit 502b then pairs up skeleton points extracted by the skeleton extraction unit 302b and skeleton points extracted by the skeleton extraction unit 302c that indicate the position of the same joint of the same person. This identifies the second corresponding points (pairs of skeleton points) between the two images taken by the mutually adjacent cameras CAMb and CAMc.

Likewise, the skeleton association unit 502n is provided for the pair of skeleton extraction units 302n and 302a. In this case, the skeleton association unit 502n receives, from the skeleton extraction unit 302n, skeleton information indicating the skeleton points extracted from the image taken by the camera CAMn, and receives, from the skeleton extraction unit 302a, skeleton information indicating the skeleton points extracted from the image taken by the camera CAMa. The skeleton association unit 502n then pairs up skeleton points extracted by the skeleton extraction unit 302n and skeleton points extracted by the skeleton extraction unit 302a that indicate the position of the same joint of the same person. This identifies the second corresponding points (pairs of skeleton points) between the two images taken by the mutually adjacent cameras CAMn and CAMa.

The skeleton association units 502 are an example of a skeleton association unit that determines, for each pair of two images among the images acquired from the cameras CAM, the second corresponding points between the two images. Like the estimation device 10, the estimation device 12 identifies second corresponding points (pairs of skeleton points) in addition to first corresponding points (pairs of natural feature points) as corresponding points between two images. Therefore, like the estimation device 10, the estimation device 12 can suppress erroneous association of feature points and the like compared with the case where skeleton points are not used to identify the same person.

The estimation unit 602 has external parameter estimation units 620 (620a, 620b, 620c, ..., 620n), each provided for a pair of mutually adjacent cameras CAM. Accordingly, each external parameter estimation unit 620 corresponds to one of the feature point association units 402 and to one of the skeleton association units 502.
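The pairing of adjacent cameras around the subject (CAMa with CAMb, CAMb with CAMc, and finally CAMn with CAMa) forms a ring, with one feature point association unit 402, one skeleton association unit 502, and one external parameter estimation unit 620 per pair. A minimal sketch of that indexing follows; the function name `adjacent_pairs` and the four-camera list are assumptions made for illustration.

```python
def adjacent_pairs(camera_ids):
    """Return (camera, next_camera) pairs around a closed ring of cameras.

    The last camera is paired with the first, mirroring the
    CAMn/CAMa pair handled by units 402n, 502n and 620n.
    """
    n = len(camera_ids)
    return [(camera_ids[i], camera_ids[(i + 1) % n]) for i in range(n)]

# Hypothetical four-camera setup; the embodiment allows any number of cameras.
pairs = adjacent_pairs(["CAMa", "CAMb", "CAMc", "CAMn"])
for index, (first, second) in enumerate(pairs):
    # One association/estimation unit set per pair, in ring order.
    print(f"pair {index}: {first} <-> {second}")
```

With this layout the number of camera pairs equals the number of cameras, which matches the a-to-n suffixes shared by the units 402, 502, and 620.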

Each of the external parameter estimation units 620 receives information indicating the first corresponding points (pairs of natural feature points) between the images from the corresponding feature point association unit 402, and receives information indicating the second corresponding points (pairs of skeleton points) between the images from the corresponding skeleton association unit 502. Each external parameter estimation unit 620 then estimates the external parameters of the corresponding cameras CAM based on the first corresponding points and the second corresponding points between the two images.

For example, the external parameter estimation unit 620a receives information indicating the first corresponding points between the images taken by the cameras CAMa and CAMb from the feature point association unit 402a, and receives information indicating the second corresponding points between those images from the skeleton association unit 502a. The external parameter estimation unit 620a then estimates the external parameters of the cameras CAM, which include a rotation matrix and a translation vector indicating the positional relationship and the like between the cameras CAMa and CAMb, based on the first corresponding points and the second corresponding points between the images taken by the cameras CAMa and CAMb.

Similarly, the external parameter estimation unit 620b receives information indicating the first corresponding points between the images taken by the cameras CAMb and CAMc from the feature point association unit 402b, and receives information indicating the second corresponding points between those images from the skeleton association unit 502b. The external parameter estimation unit 620b then estimates the external parameters of the cameras CAM, which include a rotation matrix and a translation vector indicating the positional relationship and the like between the cameras CAMb and CAMc, based on the first corresponding points and the second corresponding points between the images taken by the cameras CAMb and CAMc.

Likewise, the external parameter estimation unit 620n receives information indicating the first corresponding points between the images taken by the cameras CAMn and CAMa from the feature point association unit 402n, and receives information indicating the second corresponding points between those images from the skeleton association unit 502n. The external parameter estimation unit 620n then estimates the external parameters of the cameras CAM, which include a rotation matrix and a translation vector indicating the positional relationship and the like between the cameras CAMn and CAMa, based on the first corresponding points and the second corresponding points between the images taken by the cameras CAMn and CAMa.
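This passage does not specify how the rotation matrix and translation vector are computed from the corresponding points. One standard formulation, given here only as an assumed illustration, is the epipolar constraint on the essential matrix. It assumes that the internal parameters of both cameras are known, so that each corresponding point can be written in normalized homogeneous image coordinates, and that a point X in the coordinate system of the camera CAMa maps to RX + t in that of the camera CAMb. The symbols below are introduced for this sketch.

```latex
\tilde{\mathbf{x}}_b^{\top} \, E \, \tilde{\mathbf{x}}_a = 0,
\qquad
E = [\mathbf{t}]_{\times} R
```

Under this formulation every first corresponding point and every second corresponding point contributes one such equation, so the skeleton point pairs simply add constraints alongside the natural feature point pairs. The translation recovered from E is determined only up to scale.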

The estimation unit 602 then, for example, optimizes the external parameters of the cameras CAM estimated by the respective external parameter estimation units 620 by bundle adjustment, and transfers them to the free-viewpoint image generation unit 20. In this way, the estimation unit 602 estimates the external parameters of the cameras CAM based on the first corresponding points and the second corresponding points between two images.
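For orientation, bundle adjustment is commonly posed as a minimization of the total reprojection error over all cameras and reconstructed 3D points. The form below is the textbook objective, not a formula taken from the embodiment. Here x_ik is the observed image position of point k in camera i, V(i) is the set of points visible in camera i, K_i is the internal parameter matrix, and π is the perspective division; all of these symbols are assumptions of this sketch.

```latex
\min_{\{R_i,\,\mathbf{t}_i\},\,\{\mathbf{X}_k\}}
\sum_{i} \sum_{k \in \mathcal{V}(i)}
\left\| \mathbf{x}_{ik} - \pi\!\left( K_i \left( R_i \mathbf{X}_k + \mathbf{t}_i \right) \right) \right\|^{2}
```

In such a formulation the pairwise estimates from the external parameter estimation units 620 would serve as the initial values for R_i and t_i.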

The configuration of the estimation device 12 is not limited to the example shown in FIG. 3. For example, the estimation device 12 may include an interface that receives the images taken by the cameras CAM and transfers the received images to the feature point extraction units 202 and the skeleton extraction units 302 via the bus BUS. Also, for example, the processor 102 may realize the function of the free-viewpoint image generation unit 20 by executing a program that generates free-viewpoint images.

FIG. 4 shows an example of a skeleton extracted by the skeleton extraction unit 302 shown in FIG. 3. The skeleton extraction unit 302 extracts skeleton points BP (the black circles in FIG. 4), which indicate the positions of the joints and the like of a person PN in an image IMG, by deep learning using the learning data LDP. A method of estimating the positions of the joints of a person PN in an image is disclosed in, for example, Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields", in arXiv 1611.08050, 2016 [retrieved September 4, 2017], Internet <URL: 1511929379642_0.pdf>.

FIG. 5 shows an example of the association between images IMGa and IMGb performed by the estimation device 12 shown in FIG. 3. FIG. 5 illustrates a case in which a basketball game is shot by a plurality of cameras CAM to generate a free-viewpoint image and the estimation device 12 performs the association between the images IMGa and IMGb. The image IMGa is an image taken by the camera CAMa, and the image IMGb is an image taken by the camera CAMb.

The feature point association unit 402a forms pairs PC (PC1, PC2) from natural feature points extracted from the image IMGa and natural feature points extracted from the image IMGb that represent the same feature of the same object. For example, the pair PC1 of natural feature points (a first corresponding point) between the images IMGa and IMGb indicates a corner of the free-throw lane drawn in white lines on the court, and the pair PC2 of natural feature points (a first corresponding point) indicates a corner of the backboard.

The skeleton association unit 502a forms pairs PP (PP1, PP2, PP3, PP4) from skeleton points BP extracted from the image IMGa and skeleton points BP extracted from the image IMGb that indicate the position of the same joint of the same person PN. For example, the pair PP1 of skeleton points BP (a second corresponding point) between the images IMGa and IMGb indicates a knee of the person PN1 (PN1a, PN1b), and the pair PP2 of skeleton points BP (a second corresponding point) indicates an elbow of the person PN2 (PN2a, PN2b). The pair PP3 of skeleton points BP (a second corresponding point) between the images IMGa and IMGb indicates a shoulder of the person PN3 (PN3a, PN3b), and the pair PP4 of skeleton points BP (a second corresponding point) indicates a knee of the person PN4 (PN4a, PN4b). The person PN5a in the image IMGa does not appear in the image IMGb and is therefore not associated.

Whether a person PN in the image IMGa and a person PN in the image IMGb are the same person is determined based on person information that includes the joint angles θ and the inter-joint lengths L of each person PN. For example, the skeleton association unit 502a calculates the joint angles θ and the inter-joint lengths L of each person PN in each of the images IMGa and IMGb based on the skeleton points BP extracted from those images. The skeleton association unit 502a then determines that a person PN in the image IMGb whose elements (angles θ, lengths L) have the same or almost the same values as the respective joint angles θ and inter-joint lengths L of the person PN of interest in the image IMGa is the same person as the person PN of interest.

The bracketed portion of FIG. 5 shows an example of calculating the knee angle θ0 (θ0a, θ0b), the knee-to-hip length L1 (L1a, L1b), and the knee-to-ankle length L2 (L2a, L2b) of the person PN1 (PN1a, PN1b). The skeleton point BP0a corresponds to the knee of the person PN1a in the image IMGa. The skeleton point BP1a corresponds to the hip joint of the person PN1a and is a skeleton point BP adjacent to the skeleton point BP0a. The skeleton point BP2a corresponds to the ankle of the person PN1a and is a skeleton point BP adjacent to the skeleton point BP0a. Similarly, the skeleton point BP0b corresponds to the knee of the person PN1b in the image IMGb. The skeleton point BP1b corresponds to the hip joint of the person PN1b and is a skeleton point BP adjacent to the skeleton point BP0b. The skeleton point BP2b corresponds to the ankle of the person PN1b and is a skeleton point BP adjacent to the skeleton point BP0b.

For example, the skeleton association unit 502a calculates the angle θ0a between the line connecting the skeleton points BP0a and BP1a of the person PN1a and the line connecting the skeleton points BP0a and BP2a as the knee angle θ0a of the person PN1a. The skeleton association unit 502a also calculates the length L1a between the skeleton points BP0a and BP1a as the knee-to-hip length L1a of the person PN1a, and calculates the length L2a between the skeleton points BP0a and BP2a as the knee-to-ankle length L2a of the person PN1a. Similarly, the skeleton association unit 502a uses the skeleton points BP0b, BP1b, and BP2b to calculate the knee angle θ0b, the knee-to-hip length L1b, and the knee-to-ankle length L2b of the person PN1b.
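The knee-angle and length calculation described above can be sketched as follows. The sketch assumes that skeleton points are 2D image coordinates; the function name `joint_features` and the sample coordinates are hypothetical.

```python
import math

def joint_features(joint, neighbor_1, neighbor_2):
    """Angle at `joint` (degrees) and the two segment lengths.

    joint      : (x, y) of the joint of interest, e.g. the knee BP0a
    neighbor_1 : (x, y) of one adjacent skeleton point, e.g. the hip BP1a
    neighbor_2 : (x, y) of the other adjacent point, e.g. the ankle BP2a
    """
    v1 = (neighbor_1[0] - joint[0], neighbor_1[1] - joint[1])
    v2 = (neighbor_2[0] - joint[0], neighbor_2[1] - joint[1])
    length_1 = math.hypot(*v1)  # e.g. knee-to-hip length L1a
    length_2 = math.hypot(*v2)  # e.g. knee-to-ankle length L2a
    if length_1 == 0.0 or length_2 == 0.0:
        raise ValueError("adjacent skeleton points must differ from the joint")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (length_1 * length_2)
    # Clamp against rounding error before taking the arc cosine.
    theta = math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
    return theta, length_1, length_2

# Hypothetical pixel coordinates for BP0a (knee), BP1a (hip), BP2a (ankle).
theta0a, l1a, l2a = joint_features((0.0, 0.0), (0.0, -3.0), (4.0, 0.0))
```

Applying the same function to BP0b, BP1b, and BP2b would give θ0b, L1b, and L2b for the person PN1b.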

The angles θ of joints other than the knee and the other inter-joint lengths L of the persons PN1a and PN1b are calculated in the same way as the knee angle θ0, the knee-to-hip length L1, and the knee-to-ankle length L2. As a result, a plurality of angles θ and a plurality of lengths L are calculated for each person PN. The skeleton association unit 502a calculates, for example, the sum of the absolute differences between mutually corresponding elements (angles θ, lengths L) of the person information of the person PN1a, whose elements are the plurality of angles θ and the plurality of lengths L, and the person information of each person PN in the image IMGb, and uses this sum as an association evaluation value. The skeleton association unit 502a then determines that, among the persons PN in the image IMGb, the person PN with the smallest association evaluation value with respect to the person PN1a in the image IMGa (the person PN1b in the example shown in FIG. 5) is the same person as the person PN1a.
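The association evaluation value described above (a sum of absolute differences, with the minimum selecting the same person) can be sketched as follows. The function names, the three-element person information, and the numeric values are hypothetical; the text does not describe any weighting or normalization between angles and lengths, so none is applied here.

```python
def association_score(info_a, info_b):
    """Sum of absolute differences between corresponding elements.

    info_a, info_b: equal-length sequences of angles and lengths
    listed in the same joint order.
    """
    return sum(abs(x - y) for x, y in zip(info_a, info_b))

def find_same_person(target_info, candidates):
    """Return the key of the candidate with the smallest score."""
    return min(candidates,
               key=lambda name: association_score(target_info, candidates[name]))

# Hypothetical person information: [knee angle, knee-hip length, knee-ankle length].
pn1a = [90.0, 3.0, 4.0]
candidates_b = {
    "PN1b": [88.0, 3.1, 3.9],
    "PN3b": [120.0, 2.5, 4.4],
}
best = find_same_person(pn1a, candidates_b)
```

In this toy data the score for PN1b is about 2.2 and the score for PN3b is about 30.9, so PN1b is selected.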

そして、骨格対応付け部５０２ａは、画像ＩＭＧａ、ＩＭＧｂ間で同一人物と判定した人物ＰＮ１ａ、ＰＮ１ｂの骨格点ＢＰのうち、互いに対応する部位を示す骨格点ＢＰのペアＰＰを、画像ＩＭＧａ、ＩＭＧｂ間の骨格点ＢＰのペアＰＰに決定する。すなわち、骨格対応付け部５０２ａは、人物情報を用いて、２つの画像ＩＭＧａ、ＩＭＧｂの一方から抽出した骨格点ＢＰと２つの画像ＩＭＧａ、ＩＭＧｂの他方から抽出した骨格点ＢＰとを対応付けて２つの画像ＩＭＧａ、ＩＭＧｂ間の第２対応点を決定する。 Then, among the skeleton points BP of the persons PN1a and PN1b determined to be the same person between the images IMGa and IMGb, the skeleton association unit 502a determines each pair PP of skeleton points BP indicating mutually corresponding body parts as a pair PP of skeleton points BP between the images IMGa and IMGb. That is, using the person information, the skeleton association unit 502a associates the skeleton points BP extracted from one of the two images IMGa and IMGb with the skeleton points BP extracted from the other, and thereby determines the second corresponding points between the two images IMGa and IMGb.
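
Once two detections are judged to be the same person, forming the pairs PP amounts to joining the two sets of skeleton points on the body part they represent. The sketch below assumes each person's skeleton points are stored in a dictionary keyed by a joint label; the labels, coordinates and the function name are hypothetical.

```python
def pair_skeleton_points(points_a, points_b):
    """Return (joint, xy_in_IMGa, xy_in_IMGb) for every joint found in both images."""
    return [(joint, points_a[joint], points_b[joint])
            for joint in points_a if joint in points_b]


# Hypothetical skeleton points of the same person in the images IMGa and IMGb.
pn1a_points = {"knee_r": (120, 340), "hip_r": (118, 260), "ankle_r": (125, 420)}
pn1b_points = {"knee_r": (610, 355), "hip_r": (605, 270)}  # ankle not detected

second_corresponding_points = pair_skeleton_points(pn1a_points, pn1b_points)
```

Joints detected in only one of the two images are simply left out, which is one possible way to handle partial detections.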

このように、骨格対応付け部５０２ａは、２つの画像ＩＭＧａ、ＩＭＧｂ間で人物ＰＮの関節の角度θおよび関節間の長さＬが類似する人物ＰＮを特定することにより、画像ＩＭＧａ、ＩＭＧｂ間で同一人物を特定することができる。このため、一人分の空間より広い範囲を撮影対象とする場合、例えば、バスケットボール等のスポーツシーンにおいても、推定装置１２は、同一人物の特定に骨格点ＢＰを用いない場合に比べて、同一人物の特定精度を向上できる。例えば、同一人物の特定に骨格点ＢＰを用いない場合、人物ＰＮ１、ＰＮ３が同じユニホームを着用しているため、人物ＰＮ１と人物ＰＮ３とで類似した特徴が抽出され、人物ＰＮ１ａと人物ＰＮ３ｂとが同一人物と判定されるおそれがある。これに対し、推定装置１２は、２つの画像ＩＭＧａ、ＩＭＧｂ間で人物ＰＮの関節の角度θおよび関節間の長さＬが類似する人物ＰＮ１ａ、ＰＮ１ｂを特定するため、人物ＰＮ１ａと人物ＰＮ３ｂが同一人物であると誤って判定することを抑制できる。すなわち、推定装置１２は、同一人物の特定に骨格点ＢＰを用いない場合に比べて、特徴点等を誤って対応付けることを抑制できる。 In this way, the skeleton association unit 502a can identify the same person between the images IMGa and IMGb by identifying persons PN whose joint angles θ and inter-joint lengths L are similar between the two images IMGa and IMGb. Therefore, even when the area to be photographed is wider than the space occupied by one person, for example in a sports scene such as basketball, the estimation device 12 can identify the same person more accurately than when the skeleton points BP are not used for that identification. For example, when the skeleton points BP are not used to identify the same person, the persons PN1 and PN3 wear the same uniform, so similar features are extracted from the person PN1 and the person PN3, and the person PN1a and the person PN3b may be determined to be the same person. In contrast, the estimation device 12 identifies the persons PN1a and PN1b whose joint angles θ and inter-joint lengths L are similar between the two images IMGa and IMGb, and can therefore suppress erroneously determining that the person PN1a and the person PN3b are the same person. That is, the estimation device 12 can suppress erroneous association of feature points and the like as compared with the case where the skeleton points BP are not used to identify the same person.

なお、推定装置１２による画像ＩＭＧａ、ＩＭＧｂ間の対応付けの方法は、図５に示す例に限定されない。例えば、骨格対応付け部５０２ａは、画像ＩＭＧａ中の人物ＰＮ１ａに対する画像ＩＭＧｂ中の複数の人物ＰＮの最小の対応付け評価値が予め決められた閾値以上の場合、人物ＰＮ１ａと同一の人物ＰＮは画像ＩＭＧｂ中に存在しないと判定してもよい。また、人物情報は、人物ＰＮの関節の角度θおよび関節間の長さＬの少なくとも一方を含んでいればよい。また、人物情報を用いた同一人物か否かの判定は、人物情報間の互いに対応する要素（角度θ、長さＬ）の差分の絶対値和を用いる方法に限定されない。 The method by which the estimation device 12 associates the images IMGa and IMGb is not limited to the example shown in FIG. 5. For example, when the minimum association evaluation value of the plurality of persons PN in the image IMGb with respect to the person PN1a in the image IMGa is equal to or greater than a predetermined threshold value, the skeleton association unit 502a may determine that no person PN identical to the person PN1a exists in the image IMGb. Further, the person information only needs to include at least one of the joint angles θ of the person PN and the lengths L between the joints. Further, the determination of whether or not two persons are the same person using the person information is not limited to the method using the sum of the absolute values of the differences between mutually corresponding elements (angle θ, length L) of the person information.
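
The threshold variant mentioned above can be sketched as a small wrapper around the minimum search. The function name `match_or_none`, the threshold values and the scores are hypothetical; the only behavior taken from the text is that a minimum evaluation value equal to or greater than the threshold means no identical person exists in the image IMGb.

```python
def match_or_none(scores, threshold):
    """Return the best-matching person, or None if even the best score is too large.

    `scores` maps each candidate person in IMGb to its association
    evaluation value with respect to one person in IMGa.
    """
    if not scores:
        return None
    best = min(scores, key=scores.get)
    # "Equal to or greater than the threshold" is treated as no match.
    if scores[best] >= threshold:
        return None
    return best


# Hypothetical evaluation values for the person PN1a against persons in IMGb.
scores_for_pn1a = {"PN1b": 2.05, "PN3b": 12.1}
accepted = match_or_none(scores_for_pn1a, threshold=5.0)
rejected = match_or_none(scores_for_pn1a, threshold=2.0)
```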

図６は、図３に示した推定装置１２の動作の一例を示す。図６に示す動作は、カメラＣＡＭの外部パラメータの推定方法の一例である。また、図６に示す動作をコンピュータ等の推定装置１２に実行させるためのプログラムは、カメラＣＡＭの外部パラメータの推定プログラムの一例である。図６に示す動作は、カメラＣＡＭで撮影される映像のフレーム毎に実行される。なお、図６に示す動作は、数フレームおきに実行されてもよい。 FIG. 6 shows an example of the operation of the estimation device 12 shown in FIG. 3. The operation shown in FIG. 6 is an example of a method of estimating the external parameters of the camera CAM. Further, the program for causing the estimation device 12, such as a computer, to execute the operation shown in FIG. 6 is an example of an estimation program for the external parameters of the camera CAM. The operation shown in FIG. 6 is executed for each frame of the video captured by the camera CAM. The operation shown in FIG. 6 may instead be executed every few frames.

ステップＳ１００では、各特徴点抽出部２０２および各骨格抽出部３０２は、複数のカメラＣＡＭのうちの対応するカメラＣＡＭで撮影された画像ＩＭＧを取得する。例えば、特徴点抽出部２０２ａおよび骨格抽出部３０２ａは、カメラＣＡＭａで撮影された画像ＩＭＧａを取得する。 In step S100, each feature point extraction unit 202 and each skeleton extraction unit 302 acquire an image IMG captured by the corresponding camera CAM among the plurality of camera CAMs. For example, the feature point extraction unit 202a and the skeleton extraction unit 302a acquire the image IMGa taken by the camera CAMa.

次に、ステップＳ２００では、各特徴点抽出部２０２は、複数のカメラＣＡＭのうちの対応するカメラＣＡＭで撮影された画像ＩＭＧから自然特徴点を抽出する。例えば、特徴点抽出部２０２ａは、画像ＩＭＧａから自然特徴点を抽出する。 Next, in step S200, each feature point extraction unit 202 extracts natural feature points from the image IMG captured by the corresponding camera CAM among the plurality of camera CAMs. For example, the feature point extraction unit 202a extracts natural feature points from the image IMGa.

次に、ステップＳ３００では、各骨格抽出部３０２は、複数のカメラＣＡＭのうちの対応するカメラＣＡＭで撮影された画像ＩＭＧから人物ＰＮの骨格点ＢＰを抽出する。例えば、骨格抽出部３０２ａは、学習データＬＤＰを用いた深層学習により、画像ＩＭＧａから人物ＰＮの骨格点ＢＰを抽出する。 Next, in step S300, each skeleton extraction unit 302 extracts the skeleton point BP of the person PN from the image IMG captured by the corresponding camera CAM among the plurality of camera CAMs. For example, the skeleton extraction unit 302a extracts the skeleton point BP of the person PN from the image IMGa by deep learning using the learning data LDP.

次に、ステップＳ４００では、推定装置１２は、カメラＣＡＭの外部パラメータを算出する第１算出処理を、互いに隣接するカメラＣＡＭのペア毎に実行する。第１算出処理の詳細は、図７で説明する。第１算出処理により、カメラＣＡＭの外部パラメータが、互いに隣接するカメラＣＡＭのペア毎に算出される。なお、図７に示す第１算出処理では、人物ＰＮを撮影していないカメラＣＡＭの外部パラメータは、算出されない。 Next, in step S400, the estimation device 12 executes the first calculation process for calculating the external parameters of the camera CAM for each pair of camera CAMs adjacent to each other. The details of the first calculation process will be described with reference to FIG. 7. By the first calculation process, the external parameters of the camera CAM are calculated for each pair of camera CAMs adjacent to each other. In the first calculation process shown in FIG. 7, the external parameters of a camera CAM that has not captured a person PN are not calculated.

次に、ステップＳ５００では、推定装置１２は、互いに隣接するカメラＣＡＭの全てのペアで第１算出処理が終了したか否かを判定する。互いに隣接するカメラＣＡＭの全てのペアで第１算出処理が終了した場合、推定装置１２の動作は、ステップＳ６００に移る。一方、互いに隣接するカメラＣＡＭの全てのペアのうちのいずれかのペアで第１算出処理が終了していない場合、推定装置１２の動作は、ステップＳ５００に戻る。すなわち、推定装置１２は、互いに隣接するカメラＣＡＭの全てのペアで第１算出処理が終了するまで、ステップＳ６００の処理の実行を待機する。 Next, in step S500, the estimation device 12 determines whether or not the first calculation process has been completed for all pairs of camera CAMs adjacent to each other. When the first calculation process is completed for all the pairs of camera CAMs adjacent to each other, the operation of the estimation device 12 shifts to step S600. On the other hand, if the first calculation process is not completed in any pair of all the pairs of camera CAMs adjacent to each other, the operation of the estimation device 12 returns to step S500. That is, the estimation device 12 waits for the execution of the process of step S600 until the first calculation process is completed for all the pairs of camera CAMs adjacent to each other.

ステップＳ６００では、推定部６０２は、ステップＳ４００の第１算出処理により算出された各カメラＣＡＭの外部パラメータを最適化するバンドル調整を実行する。 In step S600, the estimation unit 602 executes bundle adjustment that optimizes the external parameters of each camera CAM calculated by the first calculation process of step S400.

次に、ステップＳ７００では、推定部６０２は、カメラＣＡＭの最終的な外部パラメータを決定する。例えば、推定部６０２は、ステップＳ６００のバンドル調整により最適化された各カメラＣＡＭの外部パラメータを、最終的な外部パラメータに決定する。そして、推定部６０２は、カメラＣＡＭの最終的な外部パラメータを自由視点映像生成部２０に転送する。このように、推定装置１２は、カメラＣＡＭで撮影される映像のフレーム毎に、カメラＣＡＭの外部パラメータを推定し、推定したカメラＣＡＭの外部パラメータを自由視点映像生成部２０に転送する。 Next, in step S700, the estimation unit 602 determines the final external parameters of the camera CAM. For example, the estimation unit 602 determines the external parameter of each camera CAM optimized by the bundle adjustment in step S600 as the final external parameter. Then, the estimation unit 602 transfers the final external parameter of the camera CAM to the free viewpoint image generation unit 20. In this way, the estimation device 12 estimates the external parameters of the camera CAM for each frame of the image captured by the camera CAM, and transfers the estimated external parameters of the camera CAM to the free viewpoint image generation unit 20.

ここで、スポーツシーンにおいて、自由視点映像を生成する場合、カメラＣＡＭ間の位置関係等を推定するためにコート内に設置されたキャリブレーションボードを事前に撮影してカメラＣＡＭの外部パラメータを計測する従来方法が知られている。この種の従来方法では、カメラＣＡＭの外部パラメータが計測された後、試合の開始前までにキャリブレーションボードが撤去されるため、試合中にカメラＣＡＭの外部パラメータを再計測することは困難である。したがって、従来方法では、試合中の選手の動き等により床が振動してカメラＣＡＭの姿勢等が変化した場合、カメラＣＡＭの外部パラメータは、事前に計測した値、すなわち、自由視点映像の生成に用いる外部パラメータの値から変化する。この場合、自由視点映像の品質は、試合中のカメラＣＡＭの外部パラメータの値が事前に計測した値から変化しない場合に比べて低下する。 Here, for generating a free-viewpoint image in a sports scene, a conventional method is known in which a calibration board installed in the court is photographed in advance and the external parameters of the camera CAM are measured in order to estimate the positional relationship and the like between the camera CAMs. In this type of conventional method, the calibration board is removed after the external parameters of the camera CAM are measured and before the match starts, so it is difficult to remeasure the external parameters of the camera CAM during the match. Therefore, in the conventional method, when the floor vibrates due to the movement of the players or the like during the match and the posture or the like of the camera CAM changes, the external parameters of the camera CAM deviate from the values measured in advance, that is, from the values of the external parameters used to generate the free-viewpoint image. In this case, the quality of the free-viewpoint image is lower than in the case where the values of the external parameters of the camera CAM during the match do not change from the values measured in advance.

これに対し、推定装置１２は、上述したように、カメラＣＡＭで撮影される映像のフレーム毎に、カメラＣＡＭの外部パラメータを推定する。あるいは、推定装置１２は、数フレームおきに、カメラＣＡＭの外部パラメータを推定する。このため、例えば、スポーツシーンにおいて、試合中の選手の動き等により床が振動してカメラＣＡＭの姿勢等が変化した場合でも、推定装置１２は、姿勢等の変化に応じたカメラＣＡＭの外部パラメータを推定できる。すなわち、自由視点映像の生成に用いるカメラＣＡＭの外部パラメータを従来方法に比べて精度よく推定することができる。この結果、自由視点映像の品質を従来方法に比べて向上させることができる。 In contrast, as described above, the estimation device 12 estimates the external parameters of the camera CAM for each frame of the video captured by the camera CAM. Alternatively, the estimation device 12 estimates the external parameters of the camera CAM every few frames. Therefore, for example, in a sports scene, even if the floor vibrates due to the movement of the players or the like during a match and the posture or the like of the camera CAM changes, the estimation device 12 can estimate external parameters of the camera CAM that reflect the change in posture or the like. That is, the external parameters of the camera CAM used to generate the free-viewpoint image can be estimated more accurately than with the conventional method. As a result, the quality of the free-viewpoint image can be improved as compared with the conventional method.

なお、推定装置１２の動作は、図６に示す例に限定されない。例えば、ステップＳ３００の処理は、ステップＳ２００の処理の前に実行されてもよい。 The operation of the estimation device 12 is not limited to the example shown in FIG. 6. For example, the process of step S300 may be executed before the process of step S200.

図７は、図６に示した第１算出処理（ステップＳ４００）の一例を示す。なお、図７に示す第１算出処理は、互いに隣接するカメラＣＡＭの１ペアに対する第１算出処理である。例えば、図７に示す第１算出処理は、互いに隣接するカメラＣＡＭのペア毎に並列に実行される。なお、図７に示す第１算出処理は、互いに隣接するカメラＣＡＭのペア毎に順次実行されてもよい。図７では、カメラＣＡＭａ、ＣＡＭｂのペアに対する第１算出処理を説明する。 FIG. 7 shows an example of the first calculation process (step S400) shown in FIG. 6. The first calculation process shown in FIG. 7 is the first calculation process for one pair of camera CAMs adjacent to each other. For example, the first calculation process shown in FIG. 7 is executed in parallel for each pair of camera CAMs adjacent to each other. The first calculation process shown in FIG. 7 may instead be executed sequentially for each pair of camera CAMs adjacent to each other. FIG. 7 describes the first calculation process for the pair of cameras CAMa and CAMb.

ステップＳ４１０では、骨格対応付け部５０２ａは、カメラＣＡＭａ、ＣＡＭｂのペアから取得した画像ＩＭＧａ、ＩＭＧｂの両方から人物ＰＮの骨格点ＢＰが抽出されたか否かを判定する。ペアの画像ＩＭＧａ、ＩＭＧｂの両方から人物ＰＮの骨格点ＢＰが抽出された場合、推定装置１２の動作は、ステップＳ４２０に移る。一方、ペアの画像ＩＭＧａ、ＩＭＧｂの少なくとも一方から人物ＰＮの骨格点ＢＰが抽出されない場合、カメラＣＡＭａ、ＣＡＭｂのペアに対する第１算出処理は、終了する。すなわち、推定装置１２は、ペアの画像ＩＭＧａ、ＩＭＧｂの少なくとも一方から人物ＰＮの骨格点ＢＰが抽出されない場合、カメラＣＡＭａ、ＣＡＭｂ間の位置関係を示す外部パラメータを抽出しない。 In step S410, the skeleton association unit 502a determines whether or not the skeleton point BP of the person PN is extracted from both the images IMGa and IMGb acquired from the pair of cameras CAMa and CAMb. When the skeleton point BP of the person PN is extracted from both the paired images IMGa and IMGb, the operation of the estimation device 12 shifts to step S420. On the other hand, when the skeleton point BP of the person PN is not extracted from at least one of the paired images IMGa and IMGb, the first calculation process for the pair of cameras CAMa and CAMb ends. That is, when the skeleton point BP of the person PN is not extracted from at least one of the paired images IMGa and IMGb, the estimation device 12 does not extract the external parameter indicating the positional relationship between the cameras CAMa and CAMb.

ここで、スポーツシーンにおける自由視点映像の生成という観点では、選手（人物ＰＮ）が写っている領域の自由視点映像を精度よく生成できればよい。このため、スポーツシーンでは、選手（人物ＰＮ）の写っている領域が偏った場合、選手（人物ＰＮ）が存在する領域の特徴点に基づいたキャリブレーション（カメラＣＡＭの外部パラメータの推定）ができればよい。すなわち、人物ＰＮが存在しない領域の特徴点に基づいたキャリブレーション（カメラＣＡＭの外部パラメータの推定）は、省かれてもよい。この場合、カメラＣＡＭの外部パラメータの推定精度を複数のカメラＣＡＭの撮影範囲全体で満遍なく向上する場合に比べて、カメラＣＡＭの外部パラメータを推定するための演算量を低減することができる。 Here, from the viewpoint of generating a free-viewpoint image in a sports scene, it is sufficient that the free-viewpoint image of the area in which the players (persons PN) appear can be generated accurately. Therefore, in a sports scene, when the players (persons PN) are concentrated in part of the captured area, it is sufficient to perform calibration (estimation of the external parameters of the camera CAM) based on the feature points of the area in which the players (persons PN) exist. That is, calibration (estimation of the external parameters of the camera CAM) based on the feature points of an area in which no person PN exists may be omitted. In this case, the amount of calculation for estimating the external parameters of the camera CAM can be reduced as compared with the case where the estimation accuracy of the external parameters of the camera CAM is improved evenly over the entire shooting range of the plurality of camera CAMs.

ステップＳ４２０では、特徴点対応付け部４０２ａは、図６に示したステップＳ２００で抽出された自然特徴点のうち、ペアの画像ＩＭＧａ、ＩＭＧｂから抽出された自然特徴点を対応付けて画像ＩＭＧａ、ＩＭＧｂ間の第１対応点を決定する。例えば、特徴点対応付け部４０２ａは、図５で説明したように、画像ＩＭＧａから抽出された自然特徴点と画像ＩＭＧｂから抽出された自然特徴点との間で、同じ物体の特徴を示す自然特徴点のペアＰＣを、画像ＩＭＧａ、ＩＭＧｂ間の第１対応点として決定する。これにより、画像ＩＭＧａ、ＩＭＧｂ間の対応点として、第１対応点（自然特徴点のペアＰＣ）が特定される。 In step S420, the feature point associating unit 402a associates the natural feature points extracted from the paired images IMGa and IMGb among the natural feature points extracted in step S200 shown in FIG. 6 with the images IMGa and IMGb. Determine the first correspondence point between. For example, as described with reference to FIG. 5, the feature point associating unit 402a shows the features of the same object between the natural feature points extracted from the image IMGa and the natural feature points extracted from the image IMGb. The pair PC of points is determined as the first corresponding point between the images IMGa and IMGb. As a result, the first corresponding point (pair PC of natural feature points) is specified as the corresponding point between the images IMGa and IMGb.

次に、ステップＳ４３０では、骨格対応付け部５０２ａは、図６に示したステップＳ３００で抽出された骨格点ＢＰのうち、ペアの画像ＩＭＧａ、ＩＭＧｂから抽出された骨格点ＢＰを対応付けて画像ＩＭＧａ、ＩＭＧｂ間の第２対応点を決定する。例えば、骨格対応付け部５０２ａは、図５で説明したように、画像ＩＭＧａから抽出された骨格点ＢＰと画像ＩＭＧｂから抽出された骨格点ＢＰとの間で、同一の人物ＰＮの同じ関節の位置を示す骨格点ＢＰのペアＰＰを特定する。そして、骨格対応付け部５０２ａは、特定した骨格点ＢＰのペアＰＰを、画像ＩＭＧａ、ＩＭＧｂ間の第２対応点に決定する。これにより、画像ＩＭＧａ、ＩＭＧｂ間の対応点として、第２対応点が、第１対応点（自然特徴点のペア）とは別に特定される。このように、推定装置１２は、自然特徴点と骨格点ＢＰとを用いて画像ＩＭＧａ、ＩＭＧｂ間の対応点を特定する。このため、推定装置１２は、複数の人物ＰＮ（選手）が同じユニホームを着用するスポーツシーンにおいても、骨格点ＢＰを抽出しない場合に比べて、人物ＰＮを誤って対応付けることを抑制できる。 Next, in step S430, the skeleton association unit 502a associates, among the skeleton points BP extracted in step S300 shown in FIG. 6, the skeleton points BP extracted from the paired images IMGa and IMGb, and determines the second corresponding points between the images IMGa and IMGb. For example, as described with reference to FIG. 5, the skeleton association unit 502a identifies, between the skeleton points BP extracted from the image IMGa and the skeleton points BP extracted from the image IMGb, the pairs PP of skeleton points BP indicating the position of the same joint of the same person PN. Then, the skeleton association unit 502a determines the identified pairs PP of skeleton points BP as the second corresponding points between the images IMGa and IMGb. As a result, the second corresponding points are identified as corresponding points between the images IMGa and IMGb separately from the first corresponding points (pairs of natural feature points). In this way, the estimation device 12 identifies the corresponding points between the images IMGa and IMGb using both the natural feature points and the skeleton points BP. Therefore, even in a sports scene in which a plurality of persons PN (players) wear the same uniform, the estimation device 12 can suppress erroneous association of persons PN as compared with the case where the skeleton points BP are not extracted.

次に、ステップＳ４４０では、外部パラメータ推定部６２０ａは、８点アルゴリズムを用いて基礎行列を算出する。基礎行列は、カメラＣＡＭの内部パラメータ（カメラＣＡＭの焦点距離、画像ＩＭＧの中心座標等）の情報と、カメラＣＡＭの外部パラメータ（２つのカメラＣＡＭ間の位置関係等を示す回転行列および並進ベクトル）の情報とを含む。 Next, in step S440, the external parameter estimation unit 620a calculates the fundamental matrix (基礎行列) using the eight-point algorithm. The fundamental matrix contains information on the internal parameters of the camera CAM (the focal length of the camera CAM, the center coordinates of the image IMG, etc.) and information on the external parameters of the camera CAM (the rotation matrix and translation vector indicating the positional relationship and the like between the two camera CAMs).

例えば、外部パラメータ推定部６２０ａは、ステップＳ４２０で決定した画像ＩＭＧａ、ＩＭＧｂ間の第１対応点およびステップＳ４３０で決定した画像ＩＭＧａ、ＩＭＧｂ間の第２対応点から、８個の対応点を選択する。そして、外部パラメータ推定部６２０ａは、選択した８個の対応点を用いて８点アルゴリズムを実行することにより、カメラＣＡＭの内部パラメータおよび外部パラメータの情報を含む基礎行列を算出する。なお、外部パラメータ推定部６２０ａは、８点の対応点の組み合わせを変更して８点アルゴリズムを複数回実行して、基礎行列を算出してもよい。 For example, the external parameter estimation unit 620a selects eight corresponding points from the first corresponding points between the images IMGa and IMGb determined in step S420 and the second corresponding points between the images IMGa and IMGb determined in step S430. Then, the external parameter estimation unit 620a executes the eight-point algorithm using the eight selected corresponding points, and thereby calculates the fundamental matrix (基礎行列), which contains information on the internal and external parameters of the camera CAM. The external parameter estimation unit 620a may also calculate the fundamental matrix by executing the eight-point algorithm a plurality of times while changing the combination of eight corresponding points.
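
For reference, the linear system usually associated with the eight-point algorithm can be written as follows. This is the standard textbook formulation and is not spelled out in the text; the symbols $(u_i, v_i)$ for a corresponding point in the image IMGa, $(u'_i, v'_i)$ for its partner in the image IMGb, $F$ for the fundamental matrix (基礎行列) of step S440, and $\mathbf{f}$ for the entries of $F$ stacked row by row are all assumptions introduced here.

```latex
\begin{align}
\begin{pmatrix} u'_i & v'_i & 1 \end{pmatrix}
F
\begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix} &= 0,
\qquad i = 1, \dots, 8, \\
\begin{pmatrix}
u'_i u_i & u'_i v_i & u'_i & v'_i u_i & v'_i v_i & v'_i & u_i & v_i & 1
\end{pmatrix}
\mathbf{f} &= 0,
\qquad
\mathbf{f} = \begin{pmatrix} F_{11} & F_{12} & F_{13} & F_{21} & \cdots & F_{33} \end{pmatrix}^{\top}.
\end{align}
```

Each corresponding point contributes one row, so eight points determine the nine entries of $F$ up to scale. Implementations commonly normalize the coordinates first and enforce $\operatorname{rank}(F) = 2$ afterwards; whether the estimation device does so is not stated.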

次に、ステップＳ４５０では、外部パラメータ推定部６２０ａは、特異値分解（ＳＶＤ：Singular Value Decomposition）により基本行列を分解してカメラＣＡＭの外部パラメータ（回転行列、並進ベクトル）を算出する。基本行列は、カメラＣＡＭの内部パラメータと外部パラメータのうちの外部パラメータのみの情報を含み、２つのカメラＣＡＭ間の相対的な位置および姿勢を示す。 Next, in step S450, the external parameter estimation unit 620a decomposes the essential matrix (基本行列) by singular value decomposition (SVD) to calculate the external parameters (rotation matrix, translation vector) of the camera CAM. Of the internal and external parameters of the camera CAM, the essential matrix contains information on the external parameters only, and indicates the relative position and orientation between the two camera CAMs.

例えば、外部パラメータ推定部６２０ａは、既知のカメラＣＡＭの内部パラメータとステップＳ４４０で算出した基礎行列とに基づいて基本行列を算出する。そして、外部パラメータ推定部６２０ａは、特異値分解（ＳＶＤ）により基本行列を分解して、カメラＣＡＭａ、ＣＡＭｂ間の相対的な位置および姿勢を示す回転行列および並進ベクトルを算出する。これにより、カメラＣＡＭの外部ベクトルが算出される。ステップＳ４５０の処理の終了により、カメラＣＡＭａ、ＣＡＭｂのペアに対する第１算出処理は、終了する。 For example, the external parameter estimation unit 620a calculates the essential matrix (基本行列) based on the known internal parameters of the camera CAM and the fundamental matrix (基礎行列) calculated in step S440. Then, the external parameter estimation unit 620a decomposes the essential matrix by singular value decomposition (SVD), and calculates a rotation matrix and a translation vector indicating the relative position and orientation between the cameras CAMa and CAMb. As a result, the external parameters (rotation matrix and translation vector) of the camera CAM are calculated. When the process of step S450 ends, the first calculation process for the pair of cameras CAMa and CAMb ends.
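
The relationship between the two matrices and the SVD-based decomposition can be summarized with the standard formulas below. They are the usual textbook relations rather than equations given in the text; $K_a$ and $K_b$ (the internal parameter matrices of the cameras CAMa and CAMb), $F$ (the fundamental matrix, 基礎行列), $E$ (the essential matrix, 基本行列), $R$, $\mathbf{t}$, and the convention $\mathbf{x}_b^{\top} F \mathbf{x}_a = 0$ are assumptions introduced here.

```latex
\begin{align}
E &= K_b^{\top} F K_a, \\
E &= U \,\mathrm{diag}(\sigma, \sigma, 0)\, V^{\top}, \\
R &\in \{\, U W V^{\top},\; U W^{\top} V^{\top} \,\},
\qquad
W = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \\
\mathbf{t} &= \pm\, \mathbf{u}_3 .
\end{align}
```

Here $\mathbf{u}_3$ is the third column of $U$, and the signs of $U$ and $V$ are chosen so that $\det R = +1$. The decomposition yields four candidate pairs $(R, \mathbf{t})$, with $\mathbf{t}$ known only up to scale; a common way to choose among them is to keep the pair that places triangulated points in front of both cameras. How the external parameter estimation unit 620a resolves this ambiguity is not described.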

このように、外部パラメータ推定部６２０ａは、ステップＳ４２０で決定した画像ＩＭＧａ、ＩＭＧｂ間の第１対応点およびステップＳ４３０で決定した画像ＩＭＧａ、ＩＭＧｂ間の第２対応点に基づいて、カメラＣＡＭの外部パラメータを算出する。外部パラメータ推定部６２０ａは、特徴点等の対応付けに骨格点ＢＰを用いない場合に比べて、画像ＩＭＧａ、ＩＭＧｂ間の対応点が正確に特定されるため、カメラＣＡＭの外部パラメータをフレーム毎または数フレームおきに精度よく算出できる。 In this way, the external parameter estimation unit 620a calculates the external parameters of the camera CAM based on the first corresponding points between the images IMGa and IMGb determined in step S420 and the second corresponding points between the images IMGa and IMGb determined in step S430. Because the corresponding points between the images IMGa and IMGb are identified more accurately than when the skeleton points BP are not used for associating feature points and the like, the external parameter estimation unit 620a can accurately calculate the external parameters of the camera CAM for each frame or every few frames.

なお、第１算出処理は、図７に示す例に限定されない。例えば、外部パラメータ推定部６２０ａは、ステップＳ４４０において、９個以上の対応点を用いて基礎行列を算出してもよい。 The first calculation process is not limited to the example shown in FIG. 7. For example, the external parameter estimation unit 620a may calculate the fundamental matrix (基礎行列) using nine or more corresponding points in step S440.

以上、図３から図７に示す実施形態においても、図１および図２に示した実施形態と同様の効果を得ることができる。例えば、推定装置１２は、自然特徴点の他に、人物ＰＮの骨格点ＢＰを、複数のカメラＣＡＭから取得した画像ＩＭＧの各々から抽出する。そして、推定装置１２は、２つの画像ＩＭＧから抽出された自然特徴点を対応付けて２つの画像ＩＭＧ間の第１対応点を決定する。さらに、推定装置１２は、画像ＩＭＧから抽出した骨格点ＢＰに基づいて、人物ＰＮの関節の角度θおよび関節間の長さＬの少なくとも一方を含む人物情報を生成する。そして、推定装置１２は、人物情報を用いて、２つの画像ＩＭＧの一方から抽出した骨格点ＢＰと２つの画像ＩＭＧの他方から抽出した骨格点ＢＰとを対応付けて２つの画像ＩＭＧ間の第２対応点を決定する。これにより、２つの画像ＩＭＧ間の対応点として、第１対応点および第２対応点が特定される。推定装置１２は、自然特徴点と骨格点ＢＰとを用いて画像ＩＭＧ間の対応点を決定するため、特徴点等を誤って対応付けることを抑制でき、誤った対応点を用いてカメラＣＡＭの外部パラメータを推定することを抑制できる。これにより、カメラＣＡＭの外部パラメータの推定精度が低下することを抑制することができる。 As described above, the embodiments shown in FIGS. 3 to 7 can also obtain the same effects as the embodiments shown in FIGS. 1 and 2. For example, the estimation device 12 extracts, in addition to the natural feature points, the skeleton points BP of the persons PN from each of the image IMGs acquired from the plurality of camera CAMs. Then, the estimation device 12 associates the natural feature points extracted from two image IMGs to determine the first corresponding points between the two image IMGs. Further, the estimation device 12 generates person information including at least one of the joint angles θ of the person PN and the lengths L between the joints, based on the skeleton points BP extracted from the image IMG. Then, using the person information, the estimation device 12 associates the skeleton points BP extracted from one of the two image IMGs with the skeleton points BP extracted from the other, and determines the second corresponding points between the two image IMGs. As a result, the first corresponding points and the second corresponding points are identified as the corresponding points between the two image IMGs. Since the estimation device 12 determines the corresponding points between the image IMGs using both the natural feature points and the skeleton points BP, it can suppress erroneous association of feature points and the like, and can suppress estimating the external parameters of the camera CAM from erroneous corresponding points. As a result, a decrease in the estimation accuracy of the external parameters of the camera CAM can be suppressed.

また、例えば、推定装置１２は、スポーツシーンにおいて、カメラＣＡＭの姿勢または位置が試合中に変化した場合でも、カメラＣＡＭの姿勢または位置が変化した後に取得した画像ＩＭＧから、カメラＣＡＭの外部パラメータを推定できる。したがって、推定装置１２は、カメラＣＡＭの姿勢または位置が変化した場合でも、カメラＣＡＭの姿勢または位置の変化に応じて、カメラＣＡＭの外部パラメータを精度よく推定できる。すなわち、推定装置１２は、カメラＣＡＭの姿勢または位置が撮影期間中に変化した場合でも、カメラＣＡＭの外部パラメータを精度よく推定できる。 Further, for example, in a sports scene, even if the posture or position of the camera CAM changes during the match, the estimation device 12 can estimate the external parameters of the camera CAM from the image IMG acquired after the posture or position of the camera CAM has changed. Therefore, even if the posture or position of the camera CAM changes, the estimation device 12 can accurately estimate the external parameters of the camera CAM according to the change in the posture or position of the camera CAM. That is, the estimation device 12 can accurately estimate the external parameters of the camera CAM even when the posture or position of the camera CAM changes during the shooting period.

図８は、カメラの外部パラメータの推定方法、推定装置および推定プログラムの別の実施形態を示す。図１から図７で説明した要素と同一または同様の要素については、同一または同様の符号を付し、これ等については、詳細な説明を省略する。図８に示す推定装置１４は、例えば、複数のカメラＣＡＭと自由視点映像生成部２０とともに、自由視点映像生成システムＳＹＳに含まれる。推定装置１４は、図３で説明した推定装置１２と同様に、カメラＣＡＭの外部パラメータを推定するキャリブレーションを実行する。例えば、推定装置１４は、被写体を互いに異なる位置から撮影する複数のカメラＣＡＭの外部パラメータを推定し、推定した外部パラメータを自由視点映像生成部２０に転送する。自由視点映像生成部２０は、複数のカメラＣＡＭでそれぞれ撮影された画像ＩＭＧと推定装置１４により推定されたカメラＣＡＭの外部パラメータ等を用いて、自由視点映像を生成する。すなわち、推定装置１４により推定されたカメラＣＡＭの外部パラメータは、自由視点映像を生成するために用いられる。図８に示す例では、複数のカメラＣＡＭは、図３に示した複数のカメラＣＡＭと同様に、被写体を囲むように配置される。なお、図８では、図を見やすくするために、図３に示したカメラＣＡＭｎ等の記載を省略している。 FIG. 8 shows another embodiment of a method of estimating external parameters of a camera, an estimation device and an estimation program. Elements that are the same as or similar to the elements described with reference to FIGS. 1 to 7 are designated by the same or similar reference numerals, and detailed description thereof will be omitted. The estimation device 14 shown in FIG. 8 is included in the free viewpoint image generation system SYS together with, for example, a plurality of camera CAMs and a free viewpoint image generation unit 20. The estimation device 14 performs calibration for estimating the external parameters of the camera CAM in the same manner as the estimation device 12 described with reference to FIG. 3. For example, the estimation device 14 estimates the external parameters of a plurality of camera CAMs that shoot the subject from different positions, and transfers the estimated external parameters to the free viewpoint image generation unit 20. The free viewpoint image generation unit 20 generates a free-viewpoint image by using the image IMGs captured by the respective camera CAMs, the external parameters of the camera CAM estimated by the estimation device 14, and the like. That is, the external parameters of the camera CAM estimated by the estimation device 14 are used to generate the free-viewpoint image. In the example shown in FIG. 8, the plurality of camera CAMs are arranged so as to surround the subject, similarly to the plurality of camera CAMs shown in FIG. 3. In FIG. 8, the camera CAMn and the like shown in FIG. 3 are omitted in order to make the figure easier to see.

推定装置１４は、図３に示したプロセッサ１０２の代わりにプロセッサ１０４を有することを除いて、図３に示した推定装置１２と同一または同様である。例えば、推定装置１４は、コンピュータ等の情報処理装置により実現され、プロセッサ１０４およびメモリ１０００を有する。プロセッサ１０４およびメモリ１０００は、バスＢＵＳに接続される。なお、メモリ１０００には、学習データＬＤＰの他に、後述する追加特徴抽出部３２０（３２０ａ、３２０ｂ、・・・）に使用される学習データＬＤＡが格納される。 The estimation device 14 is the same as or similar to the estimation device 12 shown in FIG. 3, except that it has a processor 104 instead of the processor 102 shown in FIG. 3. For example, the estimation device 14 is realized by an information processing device such as a computer, and has a processor 104 and a memory 1000. The processor 104 and the memory 1000 are connected to the bus BUS. In addition to the learning data LDP, the memory 1000 stores the learning data LDA used by the additional feature extraction units 320 (320a, 320b, ...), which will be described later.

図８では、図３と同様に、各骨格抽出部３０２とバスＢＵＳとの間のデータ経路の一部を破線で示す。また、図８では、学習データＬＤＡは、メモリ１０００からバスＢＵＳを介して各追加特徴抽出部３２０に転送されるため、各追加特徴抽出部３２０とバスＢＵＳとの間のデータ経路の一部を破線で示す。なお、学習データＬＤＰ、ＬＤＡは、推定装置１４の記憶装置のうちのメモリ１０００以外の記憶装置に格納されてもよく、推定装置１４の外部の記憶装置に格納されてもよい。 In FIG. 8, as in FIG. 3, a part of the data path between each skeleton extraction unit 302 and the bus BUS is shown by a broken line. Further, in FIG. 8, since the learning data LDA is transferred from the memory 1000 to each additional feature extraction unit 320 via the bus BUS, a part of the data path between each additional feature extraction unit 320 and the bus BUS is also shown by a broken line. The learning data LDP and LDA may be stored in a storage device other than the memory 1000 among the storage devices of the estimation device 14, or may be stored in a storage device outside the estimation device 14.

プロセッサ１０４は、例えば、メモリ１０００に格納される推定プログラム（カメラＣＡＭの外部パラメータの推定プログラム）を実行し、推定装置１４の動作を制御する。なお、推定プログラムは、推定装置１４の記憶装置のうちのメモリ１０００以外の記憶装置に格納されてもよく、推定装置１４の外部の記憶装置に格納されてもよい。また、推定プログラムは、ＣＤ−ＲＯＭ、ＤＶＤ、ＵＳＢメモリ等のコンピュータにより読み取り可能な記録媒体ＲＥＣに格納されてもよい。この場合、記録媒体ＲＥＣに格納された推定プログラムは、推定装置１４に設けられる図示しない入出力インタフェースを介して記録媒体ＲＥＣからメモリ１０００等に転送される。なお、推定プログラムは、記録媒体ＲＥＣから図示しないハードディスクに転送された後、ハードディスクからメモリ１０００に転送されてもよい。 The processor 104 executes, for example, an estimation program (an estimation program for external parameters of the camera CAM) stored in the memory 1000, and controls the operation of the estimation device 14. The estimation program may be stored in a storage device other than the memory 1000 among the storage devices of the estimation device 14, or may be stored in a storage device outside the estimation device 14. Further, the estimation program may be stored in a recording medium REC that can be read by a computer such as a CD-ROM, a DVD, or a USB memory. In this case, the estimation program stored in the recording medium REC is transferred from the recording medium REC to the memory 1000 or the like via an input / output interface (not shown) provided in the estimation device 14. The estimation program may be transferred from the recording medium REC to a hard disk (not shown) and then transferred from the hard disk to the memory 1000.

プロセッサ１０４は、図３に示したプロセッサ１０２と同様に、推定プログラムを実行することにより、複数の特徴点抽出部２０２等の機能を実現する。なお、プロセッサ１０４は、図３に示した複数の骨格対応付け部５０２の代わりに複数の骨格対応付け部５０４（５０４ａ、５０４ｂ、・・・）の機能を実現する。さらに、プロセッサ１０４は、図３に示したプロセッサ１０２が実現する機能に追加して、複数の追加特徴抽出部３２０（３２０ａ、３２０ｂ、・・・）および複数の対応付け補助部５２０（５２０ａ、５２０ｂ、・・・）の機能を実現する。なお、複数の特徴点抽出部２０２、複数の骨格抽出部３０２、複数の追加特徴抽出部３２０、複数の特徴点対応付け部４０２、複数の骨格対応付け部５０４、複数の対応付け補助部５２０および推定部６０２は、ハードウェアのみで実現されてもよい。 Similar to the processor 102 shown in FIG. 3, the processor 104 realizes functions such as the plurality of feature point extraction units 202 by executing the estimation program. The processor 104 realizes the functions of a plurality of skeleton association units 504 (504a, 504b, ...) instead of the plurality of skeleton association units 502 shown in FIG. 3. Further, in addition to the functions realized by the processor 102 shown in FIG. 3, the processor 104 realizes the functions of a plurality of additional feature extraction units 320 (320a, 320b, ...) and a plurality of association auxiliary units 520 (520a, 520b, ...). The plurality of feature point extraction units 202, the plurality of skeleton extraction units 302, the plurality of additional feature extraction units 320, the plurality of feature point association units 402, the plurality of skeleton association units 504, the plurality of association auxiliary units 520, and the estimation unit 602 may be realized by hardware only.

Thus, in the estimation device 14, the plurality of skeleton association units 504 are provided in place of the plurality of skeleton association units 502 shown in FIG. 3, and the plurality of additional feature extraction units 320 and the plurality of association auxiliary units 520 are added to the estimation device 12 shown in FIG. 3. The other configurations of the estimation device 14 are the same as or similar to those of the estimation device 12 shown in FIG. 3. For example, the estimation device 14 has the plurality of feature point extraction units 202, the plurality of skeleton extraction units 302, the plurality of additional feature extraction units 320, the plurality of feature point association units 402, the plurality of skeleton association units 504, the plurality of association auxiliary units 520, and the estimation unit 602.

Each of the plurality of feature point extraction units 202 is the same as or similar to each of the plurality of feature point extraction units 202 shown in FIG. 3. Each of the plurality of skeleton extraction units 302 is the same as or similar to each of the plurality of skeleton extraction units 302 shown in FIG. 3. Each of the plurality of feature point association units 402 is the same as or similar to each of the plurality of feature point association units 402 shown in FIG. 3.

Each of the plurality of additional feature extraction units 320 is provided for a corresponding one of the plurality of cameras CAM and sequentially acquires images IMG from the corresponding camera CAM. That is, each additional feature extraction unit 320 is provided for a corresponding one of the plurality of feature point extraction units 202 and acquires the image IMG that is transferred to the corresponding feature point extraction unit 202. In this way, the image IMG captured by each camera CAM is transferred to the corresponding feature point extraction unit 202, skeleton extraction unit 302, and additional feature extraction unit 320. Each additional feature extraction unit 320 then extracts, from the image IMG acquired from the corresponding camera CAM, additional feature points indicating features of the person PN other than the skeleton.

The additional feature points include, for example, at least one of a facial feature point indicating the face of the person PN and a clothing feature point indicating clothing worn by the person PN. The clothing feature point indicates, for example, at least one of identification information that is provided on the clothing and identifies the person PN, and shoes worn by the person PN. For example, the identification information is either a uniform number or the name of the person PN. When a ball game is photographed by the plurality of cameras CAM, the additional feature points may include a ball feature point indicating the ball.

Each additional feature extraction unit 320 receives, for example, learning data LDA including image data of the face of the person PN, image data of clothing worn by the person PN, and the like from the memory 1000 via the bus BUS. Based on the learning data LDA, each additional feature extraction unit 320 then extracts additional feature points including at least one of a facial feature point indicating the face of the person PN in the image IMG and a clothing feature point indicating clothing worn by the person PN.

When a ball game is photographed by the plurality of cameras CAM, each additional feature extraction unit 320 may extract a ball feature point indicating the ball as an additional feature point. For example, each additional feature extraction unit 320 may extract the facial feature points, clothing feature points, ball feature points, and the like of the person PN in the image IMG as additional feature points, based on learning data LDA including image data of the face of the person PN, image data of clothing, image data of the ball, and the like.

Here, a method of recognizing the face of the person PN in an image is disclosed in, for example, Peiyun Hu, Deva Ramanan, “Finding Tiny Faces”, in arXiv pre-print 1612.04402, 2016 [retrieved on September 4, 2017], Internet <URL: 1511929379642_1.pdf>.

Each of the plurality of association auxiliary units 520 is provided for a pair of mutually adjacent cameras CAM. That is, each association auxiliary unit 520 is provided for a pair of additional feature extraction units 320. For example, each association auxiliary unit 520 receives, from each of the corresponding pair of additional feature extraction units 320, additional feature information indicating the additional feature points of the person PN in the image IMG (for example, the face of the person PN or clothing worn by the person PN). Each association auxiliary unit 520 then associates the additional feature points indicated by the additional feature information received from one of the pair of additional feature extraction units 320 with the additional feature points indicated by the additional feature information received from the other of the pair, and thereby determines additional corresponding points between the two images IMG. For example, the association auxiliary unit 520a pairs similar features (similar faces, similar clothing, and the like) between the additional feature points extracted by the additional feature extraction unit 320a and the additional feature points extracted by the additional feature extraction unit 320b. As a result, the additional corresponding points (pairs of additional feature points) between the two images IMG captured by the mutually adjacent cameras CAMa and CAMb are identified. Each of the plurality of association auxiliary units 520 transfers information indicating the additional corresponding points to the corresponding skeleton association unit 504.
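The pairing performed by an association auxiliary unit 520 can be illustrated with a minimal sketch. It assumes, purely for illustration, that each additional feature point carries a recognized uniform number and that two feature points are "similar" when their numbers are equal and unambiguous in both images. The `AdditionalFeature` type and `pair_additional_features` function are hypothetical names, not part of the described device, and a real system might compare face or clothing descriptors instead.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AdditionalFeature:
    """Hypothetical record for one additional feature point in one image."""
    person_id: str                    # per-image label such as "PN3a"
    uniform_number: Optional[int]     # None when no number was recognized
    position: Tuple[float, float]     # pixel coordinates in the image


def pair_additional_features(
    feats_a: List[AdditionalFeature],
    feats_b: List[AdditionalFeature],
) -> List[Tuple[AdditionalFeature, AdditionalFeature]]:
    """Pair features whose uniform numbers match and are unique in both images."""
    count_a = Counter(f.uniform_number for f in feats_a if f.uniform_number is not None)
    count_b = Counter(f.uniform_number for f in feats_b if f.uniform_number is not None)
    index_b = {f.uniform_number: f for f in feats_b if f.uniform_number is not None}

    pairs = []
    for fa in feats_a:
        n = fa.uniform_number
        # Skip unreadable numbers and numbers that are ambiguous in either image.
        if n is not None and count_a[n] == 1 and count_b[n] == 1:
            pairs.append((fa, index_b[n]))
    return pairs
```

Each returned pair plays the role of one additional corresponding point between the two images; features without a unique counterpart are simply left unpaired.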

Each of the plurality of skeleton association units 504 is the same as or similar to each of the plurality of skeleton association units 502 shown in FIG. 3, except that it uses the additional corresponding points determined by the corresponding association auxiliary unit 520 to determine the second corresponding point between the two images IMG. That is, each skeleton association unit 504 uses the additional corresponding points between the two images IMG to associate the skeleton points BP extracted from one of the two images IMG with the skeleton points BP extracted from the other, and thereby determines the second corresponding point between the two images IMG. For example, for persons PN that are associated with each other by an additional corresponding point between the two images IMG, each skeleton association unit 504 determines whether they are the same person based on the association evaluation value between the associated persons PN (the sum of the absolute values of the differences in the joint angle θ and the like). For persons PN whose additional feature points have no counterpart in the other image, each skeleton association unit 504 may determine whether they are the same person based on the association evaluation value between those persons PN (the sum of the absolute values of the differences in the joint angle θ and the like).
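The following sketch illustrates the decision rule described above under stated assumptions. The evaluation value is the sum of absolute differences of joint angles, as in the text. The threshold, the greedy matching of persons that have no additional-feature counterpart, and the names `evaluation_value` and `associate_persons` are hypothetical and are not taken from the source.

```python
from typing import Dict, List


def evaluation_value(angles_a: List[float], angles_b: List[float]) -> float:
    """Sum of absolute differences of joint angles (same joint order assumed)."""
    return sum(abs(a - b) for a, b in zip(angles_a, angles_b))


def associate_persons(
    persons_a: Dict[str, List[float]],
    persons_b: Dict[str, List[float]],
    additional_pairs: Dict[str, str],
    threshold: float,
) -> Dict[str, str]:
    """Map person IDs in image A to person IDs in image B.

    additional_pairs holds person-level links derived from additional
    corresponding points (assumed input), e.g. {"PN3a": "PN3b"}.
    """
    matches: Dict[str, str] = {}

    # Persons linked by an additional corresponding point are only checked
    # against their linked counterpart.
    for id_a, id_b in additional_pairs.items():
        if evaluation_value(persons_a[id_a], persons_b[id_b]) <= threshold:
            matches[id_a] = id_b

    # Persons without any link are matched among themselves (greedy, assumed).
    linked_b = set(additional_pairs.values())
    free_a = [i for i in persons_a if i not in additional_pairs]
    free_b = [i for i in persons_b if i not in linked_b]
    candidates = sorted(
        (evaluation_value(persons_a[a], persons_b[b]), a, b)
        for a in free_a
        for b in free_b
    )
    taken_b = set()
    for value, a, b in candidates:
        if value <= threshold and a not in matches and b not in taken_b:
            matches[a] = b
            taken_b.add(b)
    return matches
```

Because linked persons are removed from the pool of free candidates, a person whose pose happens to resemble someone else's cannot take over a counterpart that has already been tied to another person by an additional corresponding point.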

For example, when the uniform number of the person PN5a in the image IMGa shown in FIG. 5 differs from the uniform number of the person PN3b in the image IMGb, the additional feature points of the person PN5a and those of the person PN3b are not paired. The additional feature points of the person PN3b in the image IMGb are instead associated with the additional feature points of the person PN3a in the image IMGa, whose uniform number matches. Even when the joint angle θ and the inter-joint length L are similar between the persons PN5a and PN3b, the person PN5a and the person PN3b are not determined to be the same person, because the additional feature points of the person PN3b are associated with those of the person PN3a rather than the person PN5a. In this case, the skeleton association unit 504a determines that the person PN3a and the person PN3b are the same person, because their additional feature points are associated with each other and the joint angle θ and the inter-joint length L are similar between the persons PN3a and PN3b. By using the additional corresponding points between the images IMGa and IMGb, the skeleton association unit 504a can avoid erroneously determining that the person PN5a and the person PN3b are the same person.

Further, for example, when ball feature points are extracted from the images IMGa and IMGb shown in FIG. 5, the association auxiliary unit 520a associates the ball held by the person PN1a in the image IMGa with the ball held by the person PN1b in the image IMGb. The pair of ball feature points between the images IMGa and IMGb is thereby identified as an additional corresponding point between the images IMGa and IMGb. As a result, even when there is another person PN whose body shape and posture resemble those of the person PN1, the skeleton association unit 504a can associate the person PN1a in the image IMGa with the person PN1b in the image IMGb based on the pair of ball feature points (the additional corresponding point).

In this way, the estimation device 14 uses the additional feature points to determine the second corresponding point between the two images IMG.

The estimation unit 602 is the same as or similar to the estimation unit 602 shown in FIG. 3. That is, each of the plurality of external parameter estimation units 620 included in the estimation unit 602 is the same as or similar to each of the plurality of external parameter estimation units 620 shown in FIG. 3.

The configuration of the estimation device 14 is not limited to the example shown in FIG. 8. For example, the estimation device 14 may include an interface that receives the images IMG captured by the cameras CAM and transfers them via the bus BUS to the feature point extraction units 202, the skeleton extraction units 302, and the additional feature extraction units 320. The additional feature extraction units 320 may execute the processing for extracting a ball feature point indicating the ball regardless of whether a ball game is being photographed by the plurality of cameras CAM. Further, for example, the processor 104 may realize the function of the free viewpoint image generation unit 20 by executing a program that generates a free viewpoint image.

FIG. 9 shows an example of the operation of the estimation device 14 shown in FIG. 8. The operation shown in FIG. 9 is an example of a method of estimating the external parameters of the cameras CAM. A program for causing the estimation device 14, such as a computer, to execute the operation shown in FIG. 9 is an example of a program for estimating the external parameters of the cameras CAM. The operation shown in FIG. 9 is executed for each frame of the video captured by the cameras CAM. The operation shown in FIG. 9 may instead be executed once every several frames. Steps that are the same as or similar to those described with reference to FIG. 6 are given the same or similar reference signs, and their detailed description is omitted.

The operation shown in FIG. 9 is the same as or similar to the operation shown in FIG. 6, except that the first calculation process of step S402 is executed instead of the first calculation process of step S400 shown in FIG. 6. The first calculation process of step S402 is the same as or similar to the first calculation process of step S400 shown in FIG. 6, except that the additional feature points are used to determine the second corresponding point between the two images IMG. The details of the first calculation process of step S402 are described with reference to FIG. 10.

FIG. 10 shows an example of the first calculation process (step S402) shown in FIG. 9. The first calculation process shown in FIG. 10 is the first calculation process for one pair of mutually adjacent cameras CAM. For example, the first calculation process shown in FIG. 10 is executed in parallel for each pair of mutually adjacent cameras CAM. The first calculation process shown in FIG. 10 may instead be executed sequentially for each pair of mutually adjacent cameras CAM. Steps that are the same as or similar to those described with reference to FIG. 7 are given the same or similar reference signs, and their detailed description is omitted.
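As a rough illustration of running the first calculation process per camera pair, either in parallel or sequentially, the sketch below uses a thread pool from the Python standard library. It assumes that cameras are listed in their physical order so that neighbors in the list are adjacent cameras; whether the last camera is also paired with the first is not stated in the text and is not modeled. `adjacent_pairs`, `run_first_calculation`, and the `first_calculation` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple


def adjacent_pairs(camera_ids: Sequence[str]) -> List[Tuple[str, str]]:
    """Pairs of neighboring cameras, assuming the list follows camera order."""
    return list(zip(camera_ids, camera_ids[1:]))


def run_first_calculation(
    camera_ids: Sequence[str],
    first_calculation: Callable[[str, str], object],
    parallel: bool = True,
) -> Dict[Tuple[str, str], object]:
    """Run the per-pair process for every adjacent camera pair."""
    pairs = adjacent_pairs(camera_ids)
    if not parallel:
        # Sequential variant: one pair after another.
        return {pair: first_calculation(*pair) for pair in pairs}
    # Parallel variant: pairs are independent, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda pair: first_calculation(*pair), pairs))
    return dict(zip(pairs, results))
```

Both variants return the same mapping from camera pair to result, so the choice between them only affects latency, not the estimated parameters.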

In the first calculation process shown in FIG. 10, the process of step S412 is added to the first calculation process shown in FIG. 7. The first calculation process shown in FIG. 10 also includes the process of step S432 instead of the process of step S430 shown in FIG. 7. The other processes of the first calculation process shown in FIG. 10 are the same as or similar to those of the first calculation process shown in FIG. 7. As with FIG. 7, FIG. 10 is described using the first calculation process for the pair of cameras CAMa and CAMb.

In step S410, when the skeleton association unit 504a determines that the skeleton points BP of a person PN have been extracted from both of the images IMGa and IMGb acquired from the pair of cameras CAMa and CAMb, the estimation device 14 executes the process of step S412. When the skeleton points BP of a person PN are not extracted from at least one of the paired images IMGa and IMGb, the first calculation process for the pair of cameras CAMa and CAMb ends, as described with reference to FIG. 7.

In step S412, the additional feature extraction unit 320a extracts additional feature points (for example, feature points indicating the face of the person PN or clothing worn by the person PN) from the image IMGa captured by the camera CAMa. For example, based on the learning data LDA, the additional feature extraction unit 320a extracts additional feature points including at least one of a facial feature point indicating the face of the person PN in the image IMGa and a clothing feature point indicating clothing worn by the person PN. After the process of step S412 is executed, the operation of the estimation device 14 proceeds to step S420. After the process of step S420 is executed, the operation of the estimation device 14 proceeds to step S432.

In step S432, the skeleton association unit 504a uses the additional feature points extracted in step S412 to associate the skeleton points BP extracted from each of the paired images IMGa and IMGb, and thereby determines the second corresponding point between the images IMGa and IMGb. For example, as described with reference to FIG. 8, the skeleton association unit 504a uses the additional corresponding points between the images IMGa and IMGb to associate the skeleton points BP extracted from the image IMGa with the skeleton points BP extracted from the image IMGb, and thereby determines the second corresponding point between the images IMGa and IMGb. As a result, the second corresponding point is identified, separately from the first corresponding point, as a corresponding point between the images IMGa and IMGb. Because the skeleton association unit 504a determines the association of the skeleton points BP using pairs of additional feature points between the images IMGa and IMGb, it can associate the skeleton points BP more accurately than when the additional feature points are not used, even when there are persons PN with similar body shapes and postures.

After the process of step S432 is executed, the operation of the estimation device 14 proceeds to step S440. After the process of step S440 is executed, the operation of the estimation device 14 proceeds to step S450. When the process of step S450 ends, the first calculation process for the pair of cameras CAMa and CAMb ends.

Because the estimation device 14 can identify the second corresponding point between the images IMGa and IMGb more accurately than when the additional feature points are not used to determine the second corresponding point, it can calculate the external parameters of the cameras CAM accurately for each frame or once every several frames. The first calculation process is not limited to the example shown in FIG. 10. For example, the process of step S420 may be executed before the process of step S412.

As described above, the embodiment shown in FIGS. 8 to 10 can also provide the same effects as the embodiment shown in FIGS. 1 to 7. For example, because the estimation device 14 determines the corresponding points between the images IMG using the skeleton points BP of the person PN in addition to the natural feature points, it can suppress erroneous association of feature points and the like, and can therefore suppress estimation of the external parameters of the cameras CAM from erroneous corresponding points.

Furthermore, in addition to the natural feature points and the skeleton points BP, the estimation device 14 extracts additional feature points indicating features of the person PN other than the skeleton points BP from each of the images IMG acquired from the plurality of cameras CAM. The estimation device 14 then associates the skeleton points BP between two images IMG, using person information that includes at least one of the joint angle θ and the inter-joint length L of the person PN together with the additional feature points of the person PN, and thereby determines the second corresponding point between the two images IMG. As a result, the estimation device 14 can identify the second corresponding point between the two images IMG more accurately than when the additional feature points are not used to determine the second corresponding point. That is, the estimation device 14 can suppress estimation of the external parameters of the cameras CAM from an erroneous second corresponding point or the like. Consequently, a decrease in the estimation accuracy of the external parameters of the cameras CAM can be suppressed.
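The person information mentioned above can be illustrated with a small sketch that derives a joint angle θ and an inter-joint length L from skeleton points. It assumes, as a simplification not confirmed by this section, that skeleton points BP are 2D image coordinates and that θ is the angle at a joint between the segments to its two neighboring joints. The function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def inter_joint_length(p: Point, q: Point) -> float:
    """Length L between two skeleton points, in pixels."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def joint_angle(parent: Point, joint: Point, child: Point) -> float:
    """Angle theta (radians) at `joint` between the segments to `parent` and `child`."""
    v1 = (parent[0] - joint[0], parent[1] - joint[1])
    v2 = (child[0] - joint[0], child[1] - joint[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("coincident skeleton points: angle is undefined")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))
```

For example, `joint_angle(shoulder, elbow, wrist)` would give the elbow angle and `inter_joint_length(elbow, wrist)` the forearm length in the image, values that could then feed an association evaluation value.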

Further, for example, in a sports scene, even when the posture or position of a camera CAM changes during a game, the estimation device 14 can estimate the external parameters of the camera CAM from the images IMG acquired after the change. Therefore, even when the posture or position of a camera CAM changes, the estimation device 14 can accurately estimate the external parameters of the camera CAM in accordance with that change. That is, the estimation device 14 can accurately estimate the external parameters of the cameras CAM even when the posture or position of a camera CAM changes during the shooting period.

The inventions described in the above embodiments are organized and disclosed below as appendices.
(Appendix 1)
An estimation method for estimating external parameters of a plurality of cameras that photograph a subject from mutually different positions, the estimation method comprising:
extracting feature points of the photographed space from the image acquired from each of the plurality of cameras;
extracting a skeleton of a person from the image acquired from each of the plurality of cameras;
determining, for each pair of two images among the plurality of images acquired from the plurality of cameras, a first corresponding point between the two images by associating the feature points extracted from one of the two images with the feature points extracted from the other of the two images;
determining, for each pair of the two images, a second corresponding point between the two images by associating the skeleton extracted from one of the two images with the skeleton extracted from the other of the two images; and
estimating the external parameters of the cameras based on the first corresponding point and the second corresponding point between the two images.
(Appendix 2)
The estimation method according to Appendix 1, wherein
person information including at least one of a joint angle and an inter-joint length of the person is generated based on the skeleton extracted from the image, and
the second corresponding point between the two images is determined by using the person information to associate the skeleton extracted from one of the two images with the skeleton extracted from the other of the two images.
(Appendix 3)
The estimation method according to Appendix 1 or Appendix 2, wherein
additional feature points indicating features of the person other than the skeleton are extracted from the image acquired from each of the plurality of cameras, and
the second corresponding point between the two images is determined by using the additional feature points to associate the skeleton extracted from one of the two images with the skeleton extracted from the other of the two images.
(Appendix 4)
In the estimation method described in Appendix 3,
The estimation method, wherein the additional feature point includes at least one of a facial feature point indicating a person's face and a clothing feature point indicating the clothing worn by the person.
(Appendix 5)
The estimation method according to Appendix 4, wherein
the clothing feature point indicates at least one of identification information that is provided on the clothing and identifies the person, and shoes worn by the person.
(Appendix 6)
In the estimation method described in Appendix 5,
An estimation method characterized in that the identification information is either a uniform number or a person's name.
(Appendix 7)
The estimation method according to any one of Appendix 3 to Appendix 6, wherein,
when a ball game is photographed by the plurality of cameras, the additional feature points include a ball feature point indicating a ball.
(Appendix 8)
In the estimation method described in any one of Appendix 1 to Appendix 7,
The estimation method is characterized in that the external parameters are used to generate a free-viewpoint image of an arbitrary viewpoint.
(Appendix 9)
An estimation device that estimates the external parameters of a plurality of cameras that photograph a subject from different positions, the estimation device comprising:
A feature point extraction unit that extracts feature points of the captured space from the images acquired from each of the plurality of cameras;
A skeleton extraction unit that extracts the skeleton of a person from the images acquired from each of the plurality of cameras;
A feature point mapping unit that, for each pair of two images among the plurality of images acquired from the plurality of cameras, associates the feature points extracted from one of the two images with the feature points extracted from the other of the two images to determine a first corresponding point between the two images;
A skeleton mapping unit that, for each pair of the two images, associates the skeleton extracted from one of the two images with the skeleton extracted from the other of the two images to determine a second corresponding point between the two images; and
An estimation unit that estimates the external parameters of the cameras based on the first corresponding point and the second corresponding point between the two images.
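One way the feature point mapping unit could determine first corresponding points is sketched below. The text here does not say how feature points are compared, so this example assumes each feature point carries a numeric descriptor vector and substitutes a common technique: nearest-neighbor matching with a ratio test. The function name `match_descriptors` and the `ratio` threshold of 0.8 are illustrative assumptions.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match feature descriptors of image A to those of image B.

    desc_a: (Na, D) array, desc_b: (Nb, D) array with Nb >= 2.
    Returns a list of (i, j) pairs meaning desc_a[i] matches desc_b[j].
    A match is kept only if the nearest descriptor is clearly closer than
    the second nearest (ratio test), which rejects ambiguous matches.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances, shape (Na, Nb).
    dist = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc_a))
    keep = dist[rows, best] < ratio * dist[rows, second]
    return [(int(i), int(best[i])) for i in rows[keep]]
```

In scenes with repetitive texture, many candidates fail such a test, which is the situation in which the skeleton-based second corresponding points supply additional correspondences.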
(Appendix 10)
An estimation program for the external parameters of a plurality of cameras that photograph a subject from different positions, the program causing a computer to execute a process comprising:
Extracting feature points of the captured space from the images acquired from each of the plurality of cameras;
Extracting the skeleton of a person from the images acquired from each of the plurality of cameras;
For each pair of two images among the plurality of images acquired from the plurality of cameras, associating the feature points extracted from one of the two images with the feature points extracted from the other of the two images to determine a first corresponding point between the two images;
For each pair of the two images, associating the skeleton extracted from one of the two images with the skeleton extracted from the other of the two images to determine a second corresponding point between the two images; and
Estimating the external parameters of the cameras based on the first corresponding point and the second corresponding point between the two images.
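The final estimation step can be sketched as follows, under stated assumptions. The first and second corresponding points are simply pooled, an essential matrix is fitted with the linear eight-point method, and the rotation and translation (up to scale) are chosen by requiring points to lie in front of both cameras. All function names are hypothetical, the points are assumed to be in normalized image coordinates (camera intrinsics already removed), and outlier rejection and nonlinear refinement are omitted, so this is not the disclosed implementation.

```python
import numpy as np

def essential_from_points(x1, x2):
    """Linear eight-point estimate of E such that x2^T E x1 = 0 (N >= 8)."""
    u1, v1, u2, v2 = x1[:, 0], x1[:, 1], x2[:, 0], x2[:, 1]
    A = np.stack([u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2,
                  u1, v1, np.ones_like(u1)], axis=1)
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)          # null vector of A, row-major
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt  # enforce essential-matrix form

def _depths(R, t, p1, p2):
    """Least-squares depths (d1, d2) satisfying d2 * p2 = d1 * R @ p1 + t."""
    A = np.stack([R @ p1, -p2], axis=1)
    d, *_ = np.linalg.lstsq(A, -t, rcond=None)
    return d

def pose_from_essential(E, x1, x2):
    """Choose (R, t) among the four candidates by counting points in front of both cameras."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    h1 = np.column_stack([x1, np.ones(len(x1))])
    h2 = np.column_stack([x2, np.ones(len(x2))])
    best, best_count = None, -1
    for R in (U @ W @ Vt, U @ W.T @ Vt):
        for t in (U[:, 2], -U[:, 2]):
            count = sum(np.all(_depths(R, t, p, q) > 0) for p, q in zip(h1, h2))
            if count > best_count:
                best, best_count = (R, t), count
    return best

def estimate_external_parameters(first_pts, second_pts):
    """first_pts, second_pts: lists of ((u1, v1), (u2, v2)) corresponding points.

    Returns (R, t) mapping camera-1 coordinates to camera-2 coordinates,
    X2 = R @ X1 + t, with t of unit length (scale is not observable).
    """
    pairs = np.asarray(list(first_pts) + list(second_pts), dtype=float)
    x1, x2 = pairs[:, 0, :], pairs[:, 1, :]
    E = essential_from_points(x1, x2)
    return pose_from_essential(E, x1, x2)
```

Pooling the two sets means the joint positions of the extracted skeletons contribute equations in exactly the same way as scene feature points, which is why they help when too few scene feature points can be matched.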
(Appendix 11)
A recording medium on which is recorded an estimation program for the external parameters of a plurality of cameras that photograph a subject from different positions, the program causing a computer to execute a process comprising:
Extracting feature points of the captured space from the images acquired from each of the plurality of cameras;
Extracting the skeleton of a person from the images acquired from each of the plurality of cameras;
For each pair of two images among the plurality of images acquired from the plurality of cameras, associating the feature points extracted from one of the two images with the feature points extracted from the other of the two images to determine a first corresponding point between the two images;
For each pair of the two images, associating the skeleton extracted from one of the two images with the skeleton extracted from the other of the two images to determine a second corresponding point between the two images; and
Estimating the external parameters of the cameras based on the first corresponding point and the second corresponding point between the two images.

The above detailed description will make the features and advantages of the embodiments clear. The claims are intended to cover the features and advantages of the embodiments described above without departing from their spirit and scope. In addition, a person having ordinary knowledge in the art should be able to readily conceive of any improvements and modifications. Accordingly, there is no intention to limit the scope of the inventive embodiments to those described above, and suitable improvements and equivalents falling within the scope disclosed in the embodiments may be relied upon.

10, 12, 14 ... estimation device; 20 ... free-viewpoint video generation unit; 100, 102, 104 ... processor; 200, 202 ... feature point extraction unit; 300, 302 ... skeleton extraction unit; 320 ... additional feature extraction unit; 400, 402 ... feature point mapping unit; 500, 502, 504 ... skeleton mapping unit; 520 ... mapping auxiliary unit; 600, 602 ... estimation unit; 620 ... external parameter estimation unit; 1000 ... memory; BUS ... bus; CAM ... camera; REC ... recording medium; SYS ... free-viewpoint video generation system

Claims

An estimation method for estimating the external parameters of a plurality of cameras that photograph a subject from different positions, the method comprising:
Extracting feature points of the captured space from the images acquired from each of the plurality of cameras;
Extracting the skeleton of a person from the images acquired from each of the plurality of cameras;
For each pair of two images among the plurality of images acquired from the plurality of cameras, associating the feature points extracted from one of the two images with the feature points extracted from the other of the two images to determine a first corresponding point between the two images;
For each pair of the two images, associating the skeleton extracted from one of the two images with the skeleton extracted from the other of the two images to determine a second corresponding point between the two images; and
Estimating the external parameters of the cameras based on the first corresponding point and the second corresponding point between the two images.

In the estimation method according to claim 1,
Based on the skeleton extracted from the image, person information including at least one of the joint angle and the length between the joints of the person is generated.
Using the person information, the skeleton extracted from one of the two images is associated with the skeleton extracted from the other of the two images to determine the second corresponding point between the two images.
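The use of person information can be sketched as follows, with heavy simplification. The joint names, the choice of limbs, the normalization by torso length, and the mutual-nearest-neighbor rule are all assumptions made for this example; the claim itself only requires joint angles and/or inter-joint lengths. Note also that 2D lengths and angles change with viewing direction, so a practical system would need a comparison that tolerates that; here the descriptor is only invariant to image scale and translation.

```python
import math

# Hypothetical joint names; a real pose estimator defines its own set.
LIMBS = [("shoulder", "elbow"), ("elbow", "wrist"), ("hip", "knee"), ("knee", "ankle")]
ANGLES = [("shoulder", "elbow", "wrist"), ("hip", "knee", "ankle")]

def _angle(a, b, c):
    """Unsigned angle at joint b (radians) between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return abs(math.atan2(cross, dot))

def person_info(joints):
    """Limb lengths relative to the shoulder-hip length, followed by joint angles."""
    torso = math.dist(joints["shoulder"], joints["hip"])
    lengths = [math.dist(joints[a], joints[b]) / torso for a, b in LIMBS]
    angles = [_angle(joints[a], joints[b], joints[c]) for a, b, c in ANGLES]
    return lengths + angles

def match_skeletons(skeletons_a, skeletons_b):
    """Return (i, j) pairs whose person information is mutually nearest."""
    if not skeletons_a or not skeletons_b:
        return []
    info_a = [person_info(s) for s in skeletons_a]
    info_b = [person_info(s) for s in skeletons_b]

    def nearest(src, dst):
        return [min(range(len(dst)), key=lambda j: math.dist(s, dst[j])) for s in src]

    a_to_b = nearest(info_a, info_b)
    b_to_a = nearest(info_b, info_a)
    # Keep a pair only if each skeleton is the other's nearest neighbor.
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]
```

Once two skeletons are paired, each joint visible in both images gives one second corresponding point.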

In the estimation method according to claim 1 or 2,
Additional feature points indicating characteristics of the person other than the skeleton are extracted from the images acquired from each of the plurality of cameras.
Using the additional feature points, the skeleton extracted from one of the two images is associated with the skeleton extracted from the other of the two images to determine the second corresponding point between the two images.

In the estimation method according to claim 3,
The estimation method, wherein the additional feature point includes at least one of a facial feature point indicating a person's face and a clothing feature point indicating the clothing worn by the person.

In the estimation method according to claim 4,
The estimation method, wherein the clothing feature point indicates at least one of identification information that is provided on the clothing and identifies the person, and shoes worn by the person.

In the estimation method according to claim 5,
An estimation method characterized in that the identification information is either a uniform number or a person's name.

In the estimation method according to any one of claims 3 to 6,
The estimation method, wherein, when a ball game is photographed by the plurality of cameras, the additional feature points include a ball feature point indicating the ball.

In the estimation method according to any one of claims 1 to 7,
The estimation method is characterized in that the external parameters are used to generate a free-viewpoint image of an arbitrary viewpoint.

An estimation device that estimates the external parameters of a plurality of cameras that photograph a subject from different positions, the estimation device comprising:
A feature point extraction unit that extracts feature points of the captured space from the images acquired from each of the plurality of cameras;
A skeleton extraction unit that extracts the skeleton of a person from the images acquired from each of the plurality of cameras;
A feature point mapping unit that, for each pair of two images among the plurality of images acquired from the plurality of cameras, associates the feature points extracted from one of the two images with the feature points extracted from the other of the two images to determine a first corresponding point between the two images;
A skeleton mapping unit that, for each pair of the two images, associates the skeleton extracted from one of the two images with the skeleton extracted from the other of the two images to determine a second corresponding point between the two images; and
An estimation unit that estimates the external parameters of the cameras based on the first corresponding point and the second corresponding point between the two images.

An estimation program for the external parameters of a plurality of cameras that photograph a subject from different positions, the program causing a computer to execute a process comprising:
Extracting feature points of the captured space from the images acquired from each of the plurality of cameras;
Extracting the skeleton of a person from the images acquired from each of the plurality of cameras;
For each pair of two images among the plurality of images acquired from the plurality of cameras, associating the feature points extracted from one of the two images with the feature points extracted from the other of the two images to determine a first corresponding point between the two images;
For each pair of the two images, associating the skeleton extracted from one of the two images with the skeleton extracted from the other of the two images to determine a second corresponding point between the two images; and
Estimating the external parameters of the cameras based on the first corresponding point and the second corresponding point between the two images.