JP5751996B2 - Subject three-dimensional region estimation method and program

Info

Publication number: JP5751996B2
Application number: JP2011196928A
Authority: JP
Inventors: 浩嗣三功; 整内藤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2011-09-09
Filing date: 2011-09-09
Publication date: 2015-07-22
Anticipated expiration: 2031-09-09
Also published as: JP2013058132A

Description

The present invention relates to a method and a program for estimating the three-dimensional position of a subject, which is the most important process in the synthesis of free-viewpoint video.

A method has been proposed that targets large spaces and synthesizes arbitrary-viewpoint images using only image information, without assuming strong camera calibration. However, most conventional techniques for accurately estimating the three-dimensional positions of players in sports video shot in large outdoor spaces, such as soccer, use information from only a single camera.

Non-Patent Document 1 gives an overview of the particle filter, a method for estimating high-dimensional state vectors in general state-space models that is spreading rapidly into various application fields because it is easy to implement.

Tomoyuki Higuchi: "Particle Filter", Journal of the IEICE, Vol. 88, No. 12, pp. 989-994 (2005)

However, conventional techniques that use information from only a single camera suffer from reduced estimation accuracy when occlusion regions arise, in which subjects overlap one another.

In view of the above problems, an object of the present invention is to provide a subject three-dimensional region estimation method and program that improve the accuracy of subject three-dimensional position estimation by handling, in an integrated manner, not just a single camera video but a plurality of synchronously captured camera videos.

To achieve the above object, a method according to the present invention estimates the three-dimensional world coordinates of a subject in a subsequent frame from the three-dimensional world coordinates of the subject in an initial frame and from camera images, containing the subject, of a plurality of frames captured by a plurality of cameras. The method includes a step of projecting the three-dimensional world coordinates of the subject in the initial frame onto XY coordinates on a specific plane, a step of estimating the XY coordinates of the subject on the specific plane in the subsequent frame based on time-axis information, and a step of evaluating the XY coordinates of the subject on the specific plane. The projecting step uses, as the specific plane, the overhead plane obtained when the field surface on which the subject exists is viewed from directly above, projects the three-dimensional world coordinates of the subject onto XY coordinates on that overhead plane, and approximates the subject as a plurality of particles on the overhead plane. The evaluating step calculates, from each camera image in which the subject is approximated as the particles, a likelihood indicating whether the XY coordinates of the subject on the overhead plane are plausible for a subject, and evaluates for each subject, from the sum of the likelihoods calculated from all camera images, whether the XY coordinates of that subject on the overhead plane are plausible. The three-dimensional world coordinates of the subject in the subsequent frame are then estimated by mapping them with the XY coordinates evaluated as plausible.

It is also preferable that the step of estimating based on the time-axis information formulates, for the particles forming the subject, the direction of movement on the overhead plane by a velocity based on the difference in XY coordinates between frames.

It is also preferable that the step of evaluating the XY coordinates of the subject includes a sub-step of estimating, for each camera, a planar projection matrix with respect to the overhead plane; a sub-step of calculating the likelihood of each particle forming the subject from each camera image projected by the planar projection matrix; and a sub-step of estimating, from the likelihood of each particle forming the subject, the XY coordinates of the particle group forming the subject in the subsequent frame.

It is also preferable that the sub-step of estimating the planar projection matrix of the camera with respect to the overhead plane is given, for feature points whose XY coordinates on the overhead plane are known, four or more pixels observed in the camera image, and estimates the planar projection matrix with respect to the overhead plane by the least-squares method.

It is also preferable that the sub-step of calculating the likelihood of each particle projects the XY coordinates of the particle onto each camera image using the planar projection matrix with respect to the overhead plane, sets in each camera image a rectangular region whose base is centered on the pixel onto which the XY coordinates of the particle are projected, calculates the subject-likeness of each pixel inside the rectangular region, and calculates the likelihood of the particle from the average of the subject-likeness values.

It is also preferable that the subject-likeness of each pixel is calculated from a likelihood based on a dynamic background model built up to just before the current frame, information based on the uniform color of the subject, and information based on the background pattern color.

It is also preferable that the sub-step of estimating the XY coordinates of the particle group forming the subject in the subsequent frame includes a process of eliminating particles whose likelihood is at or below a certain value and placing new particles in the vicinity of particles whose likelihood is greater than that value.

To achieve the above object, a program according to the present invention causes a computer, which estimates the three-dimensional world coordinates of a subject in a subsequent frame from the three-dimensional world coordinates of the subject in an initial frame and from camera images, containing the subject, of a plurality of frames captured by a plurality of cameras, to function as means for projecting the three-dimensional world coordinates of the subject in the initial frame onto XY coordinates on a specific plane, means for estimating the XY coordinates of the subject on the specific plane in the subsequent frame based on time-axis information, and means for evaluating the XY coordinates of the subject on the specific plane. The projecting means uses, as the specific plane, the overhead plane obtained when the field surface on which the subject exists is viewed from directly above, projects the three-dimensional world coordinates of the subject onto XY coordinates on that overhead plane, and approximates the subject as a plurality of particles on the overhead plane. The evaluating means calculates, from each camera image in which the subject is approximated as the particles, a likelihood indicating whether the XY coordinates of the subject on the overhead plane are plausible for a subject, and evaluates for each subject, from the sum of the likelihoods calculated from all camera images, whether the XY coordinates of that subject on the overhead plane are plausible. The three-dimensional world coordinates of the subject in the subsequent frame are then estimated by mapping them with the XY coordinates evaluated as plausible.

According to the present invention, the estimation accuracy of the three-dimensional region of a subject can be greatly improved, and the quality of synthesized free-viewpoint video can be improved.

FIG. 1 shows a flowchart according to the present invention.
FIG. 2 shows a rectangular region in a camera image.
FIG. 3 shows camera images and a subject XY coordinate map image in the initial frame.
FIG. 4 shows camera images and a subject XY coordinate map image in frame 30.

The best mode for carrying out the present invention is described in detail below with reference to the drawings. The proposed method is characterized by handling time-axis information and information across multiple cameras in an integrated manner. Specifically, in a certain frame, the three-dimensional position of each player is mapped as XY coordinates obtained by looking down from directly above. In subsequent frames, the map is generated automatically by motion prediction based on the inter-frame information of the map and by an evaluation of player-likeness that takes correspondence between cameras into account (candidate points are projected to each camera viewpoint and evaluated as to whether they are a player). The description below follows the flowchart. In this embodiment, it is assumed that the three-dimensional position of the non-fixed zoom camera does not change.

Start: Multi-view video for a plurality of frames is input. That is, camera images of a plurality of frames captured by two or more cameras are input.

Step 1: Estimate the planar projection matrix of each camera with respect to the field plane. For example, the planar projection matrix Hc can be estimated by supplying multiple pairs of pixel coordinates in the camera image for feature points whose three-dimensional coordinates are known in advance, such as lines on the field (reference: Japanese Patent Application No. 2010-032136). Alternatively, the planar projection matrix Hc can be obtained in advance by performing strong camera calibration, which estimates the projective relationship between the Euclidean coordinate system in space and the camera coordinate system.
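As an illustration of Step 1, the following is a minimal sketch of estimating Hc by least squares from four or more correspondences between field coordinates and pixel coordinates. It assumes the common normalization in which the bottom-right element of Hc is fixed to 1 and solves the resulting normal equations by Gaussian elimination. The function names `estimate_homography` and `solve_linear` are hypothetical and do not come from the patent or from the referenced application.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def estimate_homography(field_pts, pixel_pts):
    """Least-squares estimate of Hc from >= 4 (field, pixel) point pairs.

    Assumption: Hc[2][2] is fixed to 1, leaving 8 unknowns.
    """
    if len(field_pts) < 4 or len(field_pts) != len(pixel_pts):
        raise ValueError("need at least four matching point pairs")
    rows, rhs = [], []
    for (u, v), (up, vp) in zip(field_pts, pixel_pts):
        # Each correspondence contributes two linear equations.
        rows.append([u, v, 1.0, 0.0, 0.0, 0.0, -up * u, -up * v])
        rhs.append(up)
        rows.append([0.0, 0.0, 0.0, u, v, 1.0, -vp * u, -vp * v])
        rhs.append(vp)
    # Normal equations (A^T A) h = A^T b.
    n = 8
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
    h = solve_linear(ata, atb)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]
```

With more than four correspondences the system is overdetermined, and the least-squares solution averages out pixel-picking errors.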

The planar projection matrix Hc is a 3 × 3 matrix that projects a point (u, v) in the two-dimensional coordinates of the target field to camera pixel coordinates (u', v'). That is,
s (u', v', 1)^T = Hc (u, v, 1)^T
where s is a scalar for normalization.
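The projection by Hc can be sketched as follows. The code multiplies the homogeneous field point by Hc and divides by the normalization scalar s. The function name `project` and the nested-list representation of Hc are assumptions made for illustration.

```python
def project(hc, u, v):
    """Project field point (u, v) to pixel (u', v') with a 3x3 matrix hc."""
    x = hc[0][0] * u + hc[0][1] * v + hc[0][2]
    y = hc[1][0] * u + hc[1][1] * v + hc[1][2]
    s = hc[2][0] * u + hc[2][1] * v + hc[2][2]  # normalization scalar
    if abs(s) < 1e-12:
        raise ValueError("point projects to infinity")
    return (x / s, y / s)
```

For example, with hc = [[2, 0, 1], [0, 3, 2], [0, 0, 2]], the field point (1, 1) gives the homogeneous vector (3, 5, 2), so the pixel is (1.5, 2.5).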

Step 2: Approximate each subject with a plurality of particles. The three-dimensional world coordinates are projected onto XY coordinates on the overhead plane obtained when the field surface on which each subject exists is viewed from directly above, and each subject is approximated as a plurality of particles on the overhead plane. Each subject is assumed to consist of N particles, and the state of the subject at time t is formulated as a group of many discrete particles. The subject at time t is represented by N vectors, one per particle. If there are M subjects in the camera image, all subjects in the camera image are approximated by N × M vectors. As initial values of these vectors, at least two frames (t = 0, 1) are input in advance. The following steps estimate the positions of the particles forming each subject in subsequent frames and thereby estimate the position of the subject in XY coordinates.

Step 3: Formulate the state transition model of the particles. For the particles forming each subject, the direction of movement on the overhead plane is formulated by a velocity (U, V) based on the difference in XY coordinates between frames.

The movement vector (U_t, V_t) representing the velocity of a particle at frame t is formulated as
(U_t, V_t) = (X_t − X_{t−1}, Y_t − Y_{t−1})

With this equation, when the particle positions for two frames (t = 0, 1) are input, the velocity (U_1, V_1) is obtained, and the particle position (X_2, Y_2) at t = 2 is obtained using this velocity. Particle positions for t = 3 and later are obtained in the same way.
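The state transition can be sketched as below. The text gives the velocity formula but not the update rule, so this sketch assumes a constant-velocity prediction, X_{t+1} = X_t + U_t and Y_{t+1} = Y_t + V_t. The optional Gaussian perturbation is a common particle filter choice and is likewise an assumption. The name `predict_particle` is hypothetical.

```python
import random


def predict_particle(prev_xy, curr_xy, noise_sigma=0.0, rng=None):
    """Predict a particle's next XY position on the overhead plane.

    Velocity (U_t, V_t) is the frame-to-frame coordinate difference.
    Assumption: the next position is the current position plus that
    velocity, optionally perturbed by Gaussian noise.
    """
    u = curr_xy[0] - prev_xy[0]
    v = curr_xy[1] - prev_xy[1]
    x_next = curr_xy[0] + u
    y_next = curr_xy[1] + v
    if noise_sigma > 0.0:
        rng = rng or random.Random()
        x_next += rng.gauss(0.0, noise_sigma)
        y_next += rng.gauss(0.0, noise_sigma)
    return (x_next, y_next)
```

For instance, a particle at (1.0, 2.0) in frame 0 and (1.5, 2.0) in frame 1 has velocity (0.5, 0.0), so the noise-free prediction for frame 2 is (2.0, 2.0).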

Step 4: Calculate the likelihood of the particles. Let t be the frame in which the position of the subject in XY coordinates is to be estimated. Using the equation of Step 3, the XY coordinates (X_t^(n), Y_t^(n)) of the particles at frame t are obtained from the XY coordinates (X_0^(n), Y_0^(n)) of the particles in the initial frame (t = 0). The obtained (X_t^(n), Y_t^(n)) are projected onto the pixel (x_{t,c}^(n), y_{t,c}^(n)) in each camera image by the planar projection matrix Hc:
s (x_{t,c}^(n), y_{t,c}^(n), 1)^T = Hc (X_t^(n), Y_t^(n), 1)^T
Here n = 1, ..., N, and c denotes the camera. For example, when there are C cameras, c = 1, ..., C.

Next, in the camera image, a rectangular region is set whose base has the pixel (x_{t,c}^(n), y_{t,c}^(n)) as its center point. For example, with rectangle width w and height v, the rectangular region for a pixel (x, y) is defined by the four points (x − w/2, y), (x + w/2, y), (x − w/2, y + v), and (x + w/2, y + v). FIG. 2 shows the rectangular region set for the pixel (x, y).
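The rectangle construction can be sketched as follows. The function returns the four corner points in the order listed in the text. The name `rectangle_corners` is hypothetical, and the direction in which y + v extends (up or down in the image) depends on the image coordinate convention, which the text does not specify.

```python
def rectangle_corners(x, y, w, v):
    """Corners of the rectangle whose base is centered on pixel (x, y).

    w is the rectangle width and v its height, as in the text.
    """
    half = w / 2
    return [
        (x - half, y),       # one end of the base
        (x + half, y),       # other end of the base
        (x - half, y + v),   # opposite side, same x as the first corner
        (x + half, y + v),   # opposite side, same x as the second corner
    ]
```

Anchoring the rectangle at the center of its base matches the geometry of a standing player, whose feet touch the field at the projected particle position.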

Next, for each pixel in this rectangular region, a likelihood indicating subject-likeness is calculated, and the average of the calculated likelihoods is taken as the likelihood Φ(xy) of the particle at the pixel (x, y). Φ(xy) can be expressed as
Φ(xy) = (1 / number of cameras) × (1 / number of pixels in the rectangle) ΣΣ φc(x', y')
Here, φc(x', y') is the likelihood indicating subject-likeness at the pixel (x', y') inside the rectangular region in the camera image of camera c. The first Σ is the sum over all cameras, and the second Σ is the sum over all pixels in the rectangle.
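The averaging in Φ can be sketched as below. The input is assumed to be one list of per-pixel subject-likeness values φc(x', y') per camera, already sampled from that camera's rectangle. The name `particle_likelihood` is hypothetical. The sketch averages within each camera and then across cameras, which equals the formula above when every rectangle contains the same number of pixels.

```python
def particle_likelihood(per_camera_scores):
    """Average subject-likeness over rectangle pixels and over cameras.

    per_camera_scores[c] holds the values phi_c(x', y') for the pixels
    inside the rectangle in camera c's image.
    """
    if not per_camera_scores:
        raise ValueError("at least one camera is required")
    total = 0.0
    for scores in per_camera_scores:
        if not scores:
            raise ValueError("each rectangle must contain pixels")
        total += sum(scores) / len(scores)  # mean over rectangle pixels
    return total / len(per_camera_scores)   # mean over cameras
```

Because the score combines all cameras, a particle that is occluded in one view can still receive a high likelihood from the other views.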

The likelihood indicating subject-likeness at each pixel is calculated from a dynamic background model built up to time t − 1. The dynamic background model is formulated by assuming, for each pixel, a Gaussian distribution based on the mean and variance over a plurality of frames.
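A per-pixel Gaussian background model of this kind might be sketched as follows. The text does not say how the likelihood is derived from the Gaussian, so the mapping used here, 1 − exp(−d² / (2σ²)) where d is the deviation from the background mean, is an assumption, as are the class name `PixelBackground`, the single-channel intensity input, and the variance floor.

```python
import math


class PixelBackground:
    """Running mean and variance of one pixel's intensity (Welford update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, value):
        """Fold the pixel value of one more frame into the model."""
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def variance(self, floor=1.0):
        """Sample variance, floored to avoid division by a tiny number."""
        if self.n < 2:
            return floor
        return max(self.m2 / (self.n - 1), floor)

    def subject_likeness(self, value):
        """Assumed score: 0 at the background mean, near 1 far from it."""
        d = value - self.mean
        return 1.0 - math.exp(-0.5 * d * d / self.variance())
```

In use, `update` would be called with each frame up to time t − 1, and `subject_likeness` would then score the pixel value observed at time t.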

Furthermore, when the uniform color of the person who is the subject is known in advance, the color of the subject in the rectangular region can be compared with the uniform color, and the difference information can be added to the likelihood indicating subject-likeness. Likewise, when the background color is known in advance, the color of the background in the rectangular region can be compared with that background color, and the difference information can be added to the likelihood indicating subject-likeness.

Step 5: Relocate particles to the vicinity of particles whose likelihood is at or above a certain value. In the above step, the particle likelihood is calculated for each pixel (x_t^(n), y_t^(n)). Particles whose likelihood is at or below a certain value are eliminated, and the same number of particles as were eliminated are placed around particles whose likelihood is greater than that value. For example, if the likelihoods for n = 1 to 100 are at or below the value, the particles (x_t^(1), y_t^(1)) through (x_t^(100), y_t^(100)) are eliminated, and new particles (x_t^(1), y_t^(1)) through (x_t^(100), y_t^(100)) are placed in the vicinity of (x_t^(101), y_t^(101)) through (x_t^(N), y_t^(N)).
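The elimination and relocation of Step 5 can be sketched as below. Particles at or below the threshold are dropped, and the same number of new particles is drawn near randomly chosen survivors. The Gaussian scatter with standard deviation `spread`, the uniform choice among survivors, and the name `resample` are assumptions. The text only states that new particles are placed in the vicinity of high-likelihood particles.

```python
import random


def resample(particles, likelihoods, threshold, spread=0.5, rng=None):
    """Drop low-likelihood particles and respawn them near survivors."""
    rng = rng or random.Random()
    survivors = [p for p, l in zip(particles, likelihoods) if l > threshold]
    if not survivors:
        raise ValueError("no particle exceeds the likelihood threshold")
    num_removed = len(particles) - len(survivors)
    respawned = []
    for _ in range(num_removed):
        x, y = rng.choice(survivors)  # pick a high-likelihood particle
        respawned.append((x + rng.gauss(0.0, spread),
                          y + rng.gauss(0.0, spread)))
    return survivors + respawned
```

The particle count stays constant across frames, so the computational cost per frame does not grow over time.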

In this way, the plurality of particles approximating the subject are estimated at frame t, and from these particles the XY coordinates of the subject at frame t on the overhead plane are estimated.
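The text does not state how a single subject position is derived from the particle group. One simple possibility, shown here purely as an assumption, is the centroid of the particles, optionally weighted by their likelihoods. The name `subject_position` is hypothetical.

```python
def subject_position(particles, likelihoods=None):
    """Estimate a subject's XY position as the (weighted) particle centroid."""
    if not particles:
        raise ValueError("at least one particle is required")
    weights = likelihoods if likelihoods is not None else [1.0] * len(particles)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    x = sum(w * p[0] for w, p in zip(weights, particles)) / total
    y = sum(w * p[1] for w, p in zip(weights, particles)) / total
    return (x, y)
```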

Next, an example of the present invention is described. FIG. 3 shows camera images and the subject XY coordinate map image in the initial frame. FIG. 3(a) shows the image of camera 1, FIG. 3(b) shows the image of camera 2, and FIG. 3(c) shows the subject XY coordinate map image. Each point in FIG. 3(c) represents a subject (a player in this example). The XY coordinates of the plurality of particles constituting each subject are input as initial values.

FIG. 4 shows camera images and the subject XY coordinate map image in frame 30. FIG. 4(a) shows the image of camera 1, FIG. 4(b) shows the image of camera 2, and FIG. 4(c) shows the subject XY coordinate map image, each at t = 30. Because this example runs at 30 frames per second, the camera images in FIG. 4 are taken one second after those in FIG. 3. FIG. 4(c) is the image of the subject particles estimated at frame 30 by the method of the present invention.

As described above, the present invention maps, in a certain frame, the three-dimensional position of each player as XY coordinates obtained by looking down from directly above, and in subsequent frames estimates the subject region by motion prediction based on the inter-frame information of the map and by an evaluation of player-likeness that takes correspondence between cameras into account.

All of the embodiments described above illustrate the present invention and do not limit it, and the present invention can be implemented in various other modified and altered forms. The scope of the present invention is therefore defined only by the claims and their equivalents.

Claims

A method for estimating the three-dimensional world coordinates of a subject in a subsequent frame from the three-dimensional world coordinates of the subject in an initial frame and from camera images, containing the subject, of a plurality of frames captured by a plurality of cameras, the method comprising:
a step of projecting the three-dimensional world coordinates of the subject in the initial frame onto XY coordinates on a specific plane;
a step of estimating the XY coordinates of the subject on the specific plane in the subsequent frame based on time-axis information; and
a step of evaluating the XY coordinates of the subject on the specific plane,
wherein the step of projecting onto the XY coordinates on the specific plane uses, as the specific plane, the overhead plane obtained when the field surface on which the subject exists is viewed from directly above, projects the three-dimensional world coordinates of the subject onto XY coordinates on the overhead plane, and approximates the subject as a plurality of particles on the overhead plane, and
wherein the step of evaluating the XY coordinates of the subject calculates, from each camera image in which the subject is approximated as the particles, a likelihood indicating whether the XY coordinates of the subject on the overhead plane are plausible for a subject, evaluates for each subject, from the sum of the likelihoods calculated from all camera images, whether the XY coordinates of that subject on the overhead plane are plausible, and estimates the three-dimensional world coordinates of the subject in the subsequent frame by mapping them with the XY coordinates evaluated as plausible.

The method according to claim 1, wherein the step of estimating based on the time-axis information formulates, for the particles forming the subject, the direction of movement on the overhead plane by a velocity based on the difference in XY coordinates between frames.

The method according to claim 1 or 2, wherein the step of evaluating the XY coordinates of the subject includes:
a sub-step of estimating, for each camera, a planar projection matrix with respect to the overhead plane;
a sub-step of calculating the likelihood of each particle forming the subject from each camera image projected by the planar projection matrix; and
a sub-step of estimating, from the likelihood of each particle forming the subject, the XY coordinates of the particle group forming the subject in the subsequent frame.

The method according to claim 3, wherein the sub-step of estimating the planar projection matrix of the camera with respect to the overhead plane is given, for feature points whose XY coordinates on the overhead plane are known, four or more pixels observed in the camera image, and estimates the planar projection matrix with respect to the overhead plane by the least-squares method.

The method according to claim 3 or 4, wherein the sub-step of calculating the likelihood of each particle:
projects the XY coordinates of the particle onto each camera image using the planar projection matrix with respect to the overhead plane;
sets, in each camera image, a rectangular region whose base is centered on the pixel onto which the XY coordinates of the particle are projected;
calculates the subject-likeness of each pixel inside the rectangular region; and
calculates the likelihood of the particle from the average of the subject-likeness values.

The method according to claim 5, wherein the subject-likeness of each pixel is calculated from:
a likelihood based on a dynamic background model built up to just before the current frame;
information based on the uniform color of the subject; and
information based on the background pattern color.

The method according to any one of claims 3 to 6, wherein the sub-step of estimating the XY coordinates of the particle group forming the subject in the subsequent frame includes a process of eliminating particles whose likelihood is at or below a certain value and placing new particles in the vicinity of particles whose likelihood is greater than that value.

A program that causes a computer, which estimates the three-dimensional world coordinates of a subject in a subsequent frame from the three-dimensional world coordinates of the subject in an initial frame and from camera images, containing the subject, of a plurality of frames captured by a plurality of cameras, to function as:
means for projecting the three-dimensional world coordinates of the subject in the initial frame onto XY coordinates on a specific plane;
means for estimating the XY coordinates of the subject on the specific plane in the subsequent frame based on time-axis information; and
means for evaluating the XY coordinates of the subject on the specific plane,
wherein the means for projecting onto the XY coordinates on the specific plane uses, as the specific plane, the overhead plane obtained when the field surface on which the subject exists is viewed from directly above, projects the three-dimensional world coordinates of the subject onto XY coordinates on the overhead plane, and approximates the subject as a plurality of particles on the overhead plane, and
wherein the means for evaluating the XY coordinates of the subject calculates, from each camera image in which the subject is approximated as the particles, a likelihood indicating whether the XY coordinates of the subject on the overhead plane are plausible for a subject, evaluates for each subject, from the sum of the likelihoods calculated from all camera images, whether the XY coordinates of that subject on the overhead plane are plausible, and estimates the three-dimensional world coordinates of the subject in the subsequent frame by mapping them with the XY coordinates evaluated as plausible.