JP6525453B2

JP6525453B2 - Object position estimation system and program thereof

Info

Publication number: JP6525453B2
Application number: JP2014238775A
Authority: JP
Inventors: 高橋　正樹; 山内　結子
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2014-11-26
Filing date: 2014-11-26
Publication date: 2019-06-05
Anticipated expiration: 2034-11-26
Also published as: JP2016099941A

Description

The present invention relates to an object position estimation system that processes video from a plurality of cameras in parallel so that a specific object in the video can be tracked, and to a program for the system.

An object position estimation system that can track a specific object in video can be used, for example, to analyze sports video for services such as automatic refereeing, sports broadcasting, sports data generation and distribution, and coaching. It can also be used for various other services, such as security systems based on surveillance camera video analysis.

In recent years, advances in pattern recognition and computing speed have improved the performance of techniques for detecting and tracking specific objects in video. This progress is particularly notable in sports scene analysis, where various applications that use cameras as sensors have been proposed. The Hawk-Eye system for tennis, which is also used at Wimbledon, tracks the tennis ball in three dimensions using video from multiple fixed cameras as sensors and makes the IN/OUT calls relevant to judging. At the 2014 FIFA World Cup, goal-line technology analyzed video from several fixed cameras to automate goal decisions. Real-time video analysis in sports continues to advance, as in the TRACAB system, which installs many stereo cameras in a soccer stadium and tracks every player on the field in real time.

As one example of such real-time video analysis, a technique has been disclosed that effectively separates the foreground and background of an image on the basis of image features, prior to detecting the soccer ball region, in order to reduce the processing load of extracting the subject region and to prevent extraction of regions other than the subject region (see, for example, Patent Document 1).

Separately, a technique has been disclosed that applies a projective transformation so that a ball object placed in real-world space can be recognized and tracked by a camera connected to a tablet terminal, and reflects the tracking result in the behavior of an application on the tablet terminal (see, for example, Patent Document 2).

As a person tracking technique using multiple cameras, a technique has also been disclosed that reduces the processing load by applying the recognition results of other cameras to the recognition result of a given camera, and displays a mark on the marked person (see, for example, Patent Document 3).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2012-9980
Patent Document 2: Published Japanese Translation of PCT Application No. 2013-532337
Patent Document 3: Japanese Unexamined Patent Application Publication No. 2006-229465

Techniques such as the three-dimensional tennis ball tracking and the TRACAB system described above target objects such as a tennis ball whose motion is predictable, a soccer ball in video that covers only the goal area, or soccer players, who move slowly and are relatively easy to track. A soccer ball at an arbitrary position on the field, however, follows a trajectory that is hard to predict, moves at high speed, and is frequently occluded, so no technique for tracking it stably has yet been established.

In particular, the technique of Patent Document 1 separates the foreground and background of an image on the basis of image features in order to extract the subject region in the video. Although it uses image features to detect the soccer ball region, it does not apply a stable recognition process such as machine learning, so it is difficult for it to stably track a soccer ball that moves at high speed and is frequently occluded.

The technique of Patent Document 2 also uses projective transformation for recognizing real-world space, but it presupposes that the transformed image can be freely processed to make the target object easy to detect. It is therefore difficult to apply to real-time ball tracking in video that captures a wide field, such as a soccer stadium, from arbitrary positions.

The technique of Patent Document 3, a multi-camera person tracking technique, can reduce the processing load by applying the recognition results of other cameras to the recognition result of a given camera. However, it does not track the subject in a common space, for example by projective transformation. When tracking objects in real time in video that captures a wide field such as a soccer stadium from arbitrary positions, it is therefore difficult for it to stably track an object that moves at high speed and is frequently occluded.

Consequently, the existing techniques described above cannot estimate and track the position of a specific object such as a soccer ball. Such an object is hard to track because its trajectory cannot be predicted owing to bounces and rebounds off players, it is easily occluded behind players, and it moves at high speed when shot.

The present invention has been made in view of the above problems, and its object is to provide an object position estimation system that processes video from a plurality of cameras in parallel so that a specific object in the video can be tracked, and a program for the system.

To solve the above problems, the object position estimation system of the present invention processes video from a plurality of cameras in parallel so that a specific object in the video can be tracked. For each camera video, it automatically detects the region of the object to be tracked and applies stable object position estimation using machine learning. It then integrates the per-camera estimation results and estimates the position of the object in real space again, which yields a more stable position estimate. The system configured in this way projectively transforms the estimation results for each camera video into a common space to estimate the final object position, and by using the per-camera estimates in a complementary manner it can track the target object with high accuracy.

For example, the object position estimation system of the present invention detects and tracks the ball region separately in the video from each of several fixed cameras installed in a stadium, and integrates the tracking results to achieve stable ball tracking. Ball detection uses machine learning, which makes accurate detection possible with reduced influence from noise. Even when the ball is hidden behind a player, a situation in which tracking would normally fail, processing with multiple cameras and integrating the tracking status of each camera allows the ball to be tracked stably. In integrating the tracking results in particular, each camera video is mapped by projective transformation onto a common image space that views the field from directly above, and a robust ball estimation process that takes regions crowded with players into account is realized. The present invention is particularly effective for estimating the position of a soccer ball, but it is also applicable to a wide range of other ball games such as tennis, volleyball, basketball, and table tennis. The tracking target is not limited to a ball; the positions of various other objects can also be estimated.

That is, the object position estimation system of the present invention is an object position estimation system that processes video from a plurality of cameras in parallel so that a specific object in the video can be tracked, and comprises: object candidate region extraction means that receives video from each camera as consecutive frames, generates an inter-frame difference image, and determines object candidate regions in the current frame by low-level features based on the regions delimited in the difference image; object candidate region recognition means that, among the object candidate regions obtained from each camera, identifies regions assumed to contain the object to be tracked by high-level features based on machine learning; object trajectory generation means that, for the object candidate regions identified by the high-level features for each camera, generates trajectories of ball candidate regions associated across frames; object trajectory selection means that, among the trajectories of object candidate regions obtained from each camera, selects the trajectory with the highest value under an index predetermined for each camera as the trajectory of the object to be tracked; projective transformation means that converts the current frame image obtained from each camera and the image coordinates of the trajectory selected for each camera into a predetermined projective transformation image and integrates them; block determination means that divides the projective transformation image into blocks, calculates and compares, within each block, a feature value expressed by an object other than the object to be tracked that is predetermined as being highly correlated with the object to be tracked, excludes blocks whose feature value is at or below a predetermined value, and selects object candidate coordinates located in the blocks that were not excluded; and object position estimation means that, among the selected object candidate coordinates, determines ball candidate coordinates that have persisted for at least a predetermined time as the final ball position coordinates.

In the object position estimation system of the present invention, the object candidate region recognition means identifies, by high-level features based on machine learning, ball candidate regions recognized by a dual feature consisting of a color histogram feature and a local binary pattern feature.

In the object position estimation system of the present invention, the object trajectory selection means uses, as the index predetermined for each camera, one or more of the speed, the amount of movement, and the tracking time of the object candidate region on the projective transformation image.

In the object position estimation system of the present invention, the block determination means treats persons as the objects highly correlated with the object to be tracked, and uses the occupancy rate of persons as the feature value expressed by objects other than the object to be tracked.

In the object position estimation system of the present invention, the object position estimation means determines, among the selected object candidate coordinates and on the basis of the time elapsed since the tracking end frame of each trajectory, ball candidate coordinates that have persisted for at least a predetermined time as the final ball position coordinates.

Furthermore, the present invention constitutes the following programs. To realize the functions of the object candidate region extraction means, the object candidate region recognition means, the object trajectory generation means, and the object trajectory selection means in the object position estimation system according to the present invention, a program causes a computer that individually selects, from the video of the camera being processed in the parallel processing, the trajectory of the object candidate region for the specific object to execute: a step of receiving video from the camera being processed as consecutive frames, generating an inter-frame difference image, and determining object candidate regions in the current frame by low-level features based on the regions delimited in the difference image; a step of identifying, among the object candidate regions obtained from the camera being processed, regions assumed to contain the object to be tracked by high-level features based on machine learning; a step of generating, for the object candidate regions identified by the high-level features for the camera being processed, trajectories of ball candidate regions associated across frames; and a step of selecting, among the trajectories of object candidate regions obtained from the camera being processed, the trajectory with the highest value under an index predetermined for each camera as the trajectory of the object to be tracked.
Then, to realize the functions of the projective transformation means, the block determination means, and the object position estimation means in the object position estimation system according to the present invention, a program causes a computer that integrates the object candidate regions for the specific object extracted from the videos of the plurality of cameras, so that the specific object can be tracked, to execute: a step of converting the current frame image obtained from each camera through the steps executed by the above program, and the image coordinates of the trajectory of the object candidate region selected for each camera, into a predetermined projective transformation image and integrating them; a step of dividing the projective transformation image into blocks, calculating and comparing, within each block, a feature value expressed by an object other than the object to be tracked that is predetermined as being highly correlated with the object to be tracked, excluding blocks whose feature value is at or below a predetermined value, and selecting object candidate coordinates located in the blocks that were not excluded; and a step of determining, among the selected object candidate coordinates, ball candidate coordinates that have persisted for at least a predetermined time as the final ball position coordinates.

According to the present invention, the position of an object in real space can be estimated with high accuracy.

The drawings, each relating to the object position estimation system of one embodiment of the present invention, are as follows, in order:
A block diagram showing a configuration example of the system.
A flowchart showing an example of the operation of the system.
An explanatory diagram of the object candidate region extraction process.
(A) and (B): explanatory diagrams of the object candidate recognition process using low-level features.
(A) and (B): diagrams illustrating, respectively, positive and negative examples for the machine learning used in the object candidate recognition process using high-level features.
A diagram illustrating the color histogram used in the object candidate recognition process using high-level features.
An explanatory diagram of the results of the object trajectory generation process and the trajectory selection process.
(A) and (B): explanatory diagrams of the projective transformation process.
A diagram illustrating a projective transformation image.
An explanatory diagram of the object block determination process.
An explanatory diagram of the object position estimation process.
A diagram comparing the identification accuracy and processing time of the high-level features.
A diagram comparing estimation errors.
(A) and (B): explanatory diagrams of the matching probability of the estimated object position as a function of region radius.

(Configuration of the object position estimation system)
An object position estimation system 1 according to one embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of the object position estimation system 1 according to one embodiment of the present invention.

The object position estimation system 1 illustrated in FIG. 1 comprises a plurality of target object detection/tracking devices 2, each of which receives the video of one camera and processes it independently and in parallel to estimate an individual object position, and a target object position integration device 3, which integrates the processing results of the target object detection/tracking devices 2 to estimate the object position. Each target object detection/tracking device 2 includes an object candidate region extraction unit 21, an object candidate region recognition unit 22, an object trajectory generation unit 23, and an object trajectory selection unit 24. The target object position integration device 3 includes a projective transformation unit 31, a block determination unit 32, and an object position estimation unit 33.

The target object detection/tracking devices 2 all have the same configuration, and each receives the video of its own camera in real time and processes it in parallel with the others. In this example, video from four cameras (first to fourth cameras) is input to the respective target object detection/tracking devices 2; the frame images obtained from each camera are referred to as 2D (two-dimensional) images. Each target object detection/tracking device 2 estimates the object position within its 2D images and outputs the result to the target object position integration device 3. The target object position integration device 3 is accordingly configured to synchronize the processing results received from the target object detection/tracking devices 2 with the video of each camera and to estimate the final object position.

The target object detection/tracking devices 2 and the target object position integration device 3 can each be implemented on a personal computer. For example, the functions of the components of the target object detection/tracking device 2 and the target object position integration device 3 of this embodiment can be realized by a computer, and the program that causes the computer to realize each component according to the present invention is stored in a memory (not shown) provided inside or outside the computer. Under the control of a central processing unit (CPU) or the like in the computer, the program describing the processing that realizes the function of each component is read from the memory and executed as needed, so that the functions of the components of the target object detection/tracking device 2 and the target object position integration device 3 are realized by the computer. The function of each component may also be realized partly in hardware.

The following describes an example in which video from four cameras (first to fourth cameras) installed in a soccer stadium is analyzed and the soccer ball (hereinafter also simply "the ball") is the object to be tracked. The more cameras the better, but the object position estimation system 1 according to the present invention can be configured with any plural number of cameras.

First, the functions of the object candidate region extraction unit 21, the object candidate region recognition unit 22, the object trajectory generation unit 23, and the object trajectory selection unit 24 of the target object detection/tracking device 2 are described.

The object candidate region extraction unit 21 receives video (2D images) from its camera as consecutive frames, generates an inter-frame difference image, and determines the object candidate regions (in this example, ball candidate regions) in the current frame (hereinafter "the current 2D image") by low-level features based on the regions delimited in the difference image. This is the object candidate region extraction process.
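
The extraction step above can be illustrated with a minimal sketch. The text only states that low-level features of the regions in an inter-frame difference image are used; the particular features chosen here (region area and bounding-box aspect ratio), the threshold values, and all function names are assumptions made for illustration, not details taken from the patent.

```python
from collections import deque

def frame_difference(prev, curr, threshold=30):
    """Binary mask: 1 where the grayscale change between frames exceeds threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def connected_regions(mask):
    """Group foreground pixels into 4-connected regions of (row, col) pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, region = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions

def ball_candidates(prev, curr, min_area=2, max_area=9, max_aspect=2.0):
    """Keep regions that are small and roughly square (assumed low-level features).

    Returns bounding boxes as (top, left, bottom, right).
    """
    boxes = []
    for region in connected_regions(frame_difference(prev, curr)):
        ys = [y for y, _ in region]
        xs = [x for _, x in region]
        height = max(ys) - min(ys) + 1
        width = max(xs) - min(xs) + 1
        aspect = max(height, width) / min(height, width)
        if min_area <= len(region) <= max_area and aspect <= max_aspect:
            boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes
```

In this sketch a compact moving blob passes the filter, while an elongated moving region (for example a limb or a line of motion) is rejected before the more expensive recognition step.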

The object candidate region recognition unit 22 takes the ball candidate regions in the current 2D image determined by the object candidate region extraction unit 21 and identifies, by high-level features based on machine learning, the regions assumed to contain the object to be tracked. This is the object candidate region recognition process. Specifically, the object candidate region recognition unit 22 identifies these regions by extracting the ball candidate regions recognized by a dual feature consisting of a color histogram feature and a local binary pattern (LBP) feature.
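
As a rough illustration of the dual feature, the sketch below builds a feature vector by concatenating a per-channel color histogram with a histogram of basic 8-neighbor LBP codes. The bin count, the grayscale conversion, the normalization, and the function names are assumptions; the patent text here does not specify them, and the machine-learning classifier that would consume this vector is deliberately left out.

```python
def lbp_code(gray, y, x):
    """Basic 8-neighbor LBP: one bit per neighbor that is >= the center pixel."""
    center = gray[y][x]
    offsets = ((-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1))
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if gray[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(gray):
    """Normalized 256-bin histogram of LBP codes over interior pixels."""
    hist = [0] * 256
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            hist[lbp_code(gray, y, x)] += 1
    total = sum(hist) or 1
    return [v / total for v in hist]

def color_histogram(rgb, bins=4):
    """Normalized per-channel histogram (R bins, then G bins, then B bins)."""
    hist = [0] * (3 * bins)
    step = 256 // bins
    count = 0
    for row in rgb:
        for r, g, b in row:
            hist[r // step] += 1
            hist[bins + g // step] += 1
            hist[2 * bins + b // step] += 1
            count += 1
    count = count or 1
    return [v / count for v in hist]

def dual_feature(rgb):
    """Concatenate color and LBP histograms for one candidate patch."""
    gray = [[(r + g + b) // 3 for r, g, b in row] for row in rgb]
    return color_histogram(rgb) + lbp_histogram(gray)
```

A classifier trained on positive (ball) and negative (non-ball) patches would take `dual_feature(patch)` as input; which classifier to use is outside the scope of this sketch.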

The object trajectory generation unit 23 takes the ball candidate regions identified by the object candidate region recognition unit 22 and generates trajectories of ball candidate regions associated across frames, that is, between the current 2D image and one or more consecutive past 2D images (hereinafter "past 2D images"). This is the object trajectory generation process.
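
One simple way to associate candidates across frames is greedy nearest-neighbor matching, sketched below. The association rule, the `max_jump` distance limit, and the trajectory representation are assumptions for illustration; the text above only says that candidate regions are associated between frames.

```python
import math

def extend_trajectories(trajectories, detections, frame_idx, max_jump=20.0):
    """Attach each detection to the nearest live trajectory, or start a new one.

    trajectories: list of trajectories, each a list of (frame_idx, (x, y)).
    detections:   list of (x, y) candidate centers in the current frame.
    """
    unused = list(detections)
    for traj in trajectories:
        last_frame, (lx, ly) = traj[-1]
        # Only trajectories seen in the immediately preceding frame are extended.
        if last_frame != frame_idx - 1 or not unused:
            continue
        best = min(unused, key=lambda p: math.hypot(p[0] - lx, p[1] - ly))
        if math.hypot(best[0] - lx, best[1] - ly) <= max_jump:
            traj.append((frame_idx, best))
            unused.remove(best)
    # Detections that matched nothing become new single-point trajectories.
    for point in unused:
        trajectories.append([(frame_idx, point)])
    return trajectories
```

Because several candidates can survive recognition in each frame, this step generally yields several competing trajectories per camera, which is why a selection step follows.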

The object trajectory selection unit 24 selects, from the possibly multiple trajectories of ball candidate regions generated by the object trajectory generation unit 23, the trajectory with the highest value under a predetermined index described later. It determines the image coordinates of the ball candidate region in each frame of the selected trajectory (hereinafter also "ball candidate coordinates") as the trajectory of the object to be tracked (in this example, the ball) and outputs them, together with the current 2D image, to the target object position integration device 3. This is the object trajectory selection process.
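
The selection rule can be sketched as a scoring function over trajectories. The document names speed, amount of movement, and tracking time on the projective transformation image as possible indices; how they are combined is not stated here, so the weighted sum, the weights, and the function names below are assumptions. The sketch takes the coordinates as given and does not perform the mapping to the top-view image itself.

```python
import math

def trajectory_score(traj, w_speed=1.0, w_move=1.0, w_time=1.0):
    """Score a trajectory given as a list of (frame_idx, (x, y)).

    Combines average speed, net displacement, and tracked duration
    with assumed weights.
    """
    duration = traj[-1][0] - traj[0][0]
    (x0, y0), (x1, y1) = traj[0][1], traj[-1][1]
    displacement = math.hypot(x1 - x0, y1 - y0)
    speed = displacement / duration if duration else 0.0
    return w_speed * speed + w_move * displacement + w_time * duration

def select_trajectory(trajectories):
    """Return the highest-scoring trajectory, or None if there are none."""
    return max(trajectories, key=trajectory_score) if trajectories else None
```

A fast, long-lived trajectory therefore wins over a short, nearly static one, which matches the intuition that a ball in play moves more than most false detections.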

Next, the functions of the projective transformation unit 31, the block determination unit 32, and the object position estimation unit 33 of the target object position integration device 3 are described.

The projective transformation unit 31 receives, from each target object detection/tracking device 2, the 2D image (that is, the current 2D image) and the ball candidate coordinates of the selected trajectory, converts each into a predetermined projective transformation image that reflects the ball candidate coordinates, and integrates them. This is the projective transformation process.
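
Mapping per-camera coordinates into a common image can be sketched with a 3x3 homography per camera. The matrices themselves would come from calibration against the field, which this section does not describe; the function names and the dictionary layout used for integration are assumptions for illustration.

```python
def project_point(H, x, y):
    """Apply a 3x3 homography H (nested lists) to the image point (x, y)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)

def integrate_candidates(homographies, candidates_per_camera):
    """Map every camera's candidate coordinates into the shared top-view space.

    homographies:          {camera_id: 3x3 matrix}
    candidates_per_camera: {camera_id: [(x, y), ...]}
    Returns a flat list of (camera_id, (u, v)).
    """
    merged = []
    for cam, points in candidates_per_camera.items():
        for x, y in points:
            merged.append((cam, project_point(homographies[cam], x, y)))
    return merged
```

Once every camera's candidates live in the same coordinate system, candidates from different views can be compared and filtered together.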

The block determination unit 32 has a function of executing a process (block determination process) that selects ball candidate coordinates from the projective-transformed image obtained by the projective transformation unit 31. The unit divides the projective-transformed image into a predetermined number of blocks, calculates and compares, for each block, a feature quantity represented by an object other than the ball that is predetermined as being highly correlated with the ball, excludes blocks whose feature quantity is equal to or less than a predetermined value, and selects the ball candidate coordinates located in the remaining blocks. For example, the occupancy rate of the person area can be used as the feature quantity represented by an object other than the ball in each block. That is, in the block determination process, the occupancy rate of the person area is calculated for each block of the projective-transformed image, and ball candidate coordinates in blocks with a high person-area occupancy rate are preferentially selected.

The object position estimation unit 33 has a function of executing a process (ball position estimation process) that, for the ball candidate coordinates selected by the block determination unit 32, uses the "elapsed time from the tracking end frame" of each trajectory to determine the ball candidate coordinates that have continued for a predetermined time or longer (a predetermined number of frames or more) as the final ball position coordinates. The final ball position coordinates thus determined can be displayed on a predetermined image display unit (not shown) in a form that indicates their position in the projective-transformed image, and by repeating this display, the trajectory of the finally estimated ball position can be shown on the projective-transformed image. Because only ball candidate coordinates that have continued for a predetermined time or longer (a predetermined number of frames or more) are determined as the final ball position coordinates, even when a given frame contains several ball candidate coordinates selected by the block determination unit 32, or none at all, displaying the results continuously on the projective-transformed images obtained over successive frames yields, for the most part, a single recognizable trajectory of the estimated ball position.

(Operation example of the object position estimation system)
Hereinafter, an example of the operation of the object position estimation system 1 according to an embodiment of the present invention will be described in more detail with reference to FIGS. 2 to 11. Although each drawing showing a 2D image of camera video is rendered in grayscale, processing is assumed to use RGB color information unless otherwise noted. FIG. 2 is a flowchart showing an example of the operation of the object position estimation system according to an embodiment of the present invention.

First, each target object detection/tracking device 2 receives frame-by-frame continuous video (2D images) from its corresponding camera (step S1).

Next, in each target object detection/tracking device 2, the object candidate area extraction unit 21 generates an inter-frame difference image from the frame-by-frame continuous video (2D images) input from the corresponding camera, and determines the ball candidate areas in the current 2D image using low-level features of the regions segmented in the difference image (step S2).

More specifically, the object candidate area extraction unit 21 generates an inter-frame difference image and coarsely extracts ball candidate areas on the basis of low-level image features. An inter-frame difference image is an image whose pixel values are the per-pixel differences between adjacent video frames; its values increase only where motion has occurred in the video. Because the ball is always moving while it is being tracked, the ball can be detected efficiently by processing only the pixels that have a high value (absolute value) in the inter-frame difference image. The inter-frame difference image is binarized with a specific threshold and subjected to labeling, and low-level features are calculated for each label (connected region). The low-level features are the color, the shape (circularity), and the number of pixels within the label; labels whose feature values fall outside preset ranges are excluded from the ball candidates, and the ball candidate areas are thereby determined.
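The coarse extraction step described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes grayscale frames stored as nested lists, uses 4-connected labeling, and substitutes a bounding-box aspect ratio as a crude stand-in for the circularity feature. The function names and the parameters `threshold`, `min_pixels`, `max_pixels`, and `max_aspect` are hypothetical.

```python
from collections import deque

def frame_difference(prev, curr, threshold):
    """Binarize the absolute per-pixel difference of two grayscale frames."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def label_regions(mask):
    """Return 4-connected regions of a binary mask as lists of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(pixels)
    return regions

def bbox_aspect(region):
    """Aspect ratio (>= 1) of the bounding box; an assumed proxy for shape."""
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return max(height, width) / min(height, width)

def select_candidates(regions, min_pixels, max_pixels, max_aspect):
    """Drop labels whose low-level features fall outside the preset ranges."""
    return [reg for reg in regions
            if min_pixels <= len(reg) <= max_pixels and bbox_aspect(reg) <= max_aspect]
```

A color check per label would be added in the same way as the pixel-count and shape checks; it is omitted here because the sketch works on grayscale input.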

This is described in more detail with reference to FIGS. 3 and 4. FIG. 3 is an explanatory diagram of the object candidate area extraction processing performed by the object candidate area extraction unit 21 in the object position estimation system 1 according to an embodiment of the present invention. FIGS. 4(A) and 4(B) are explanatory diagrams of the object candidate recognition processing based on low-level features performed by the object candidate area extraction unit 21 in the object position estimation system according to an embodiment of the present invention. For example, as shown in FIG. 3, an inter-frame difference image can be generated by taking the difference between the n-th frame and the (n-1)-th frame, which contain a moving object Ob. In the inter-frame difference image, labels L1 and L2 are assigned to the regions of pixels having high values (absolute values). The labels may also be assigned after the inter-frame difference image has been filtered by a morphological operation.

In this manner, for the n-th frame (current 2D image Fr) illustrated in FIG. 4(A), labels L1 to L19 can be assigned to the extracted regions as shown in FIG. 4(B). Low-level features are then calculated for each label (connected region), and labels whose feature values fall outside the preset ranges are excluded from the ball candidates, which determines the ball candidate areas. FIG. 4(B) illustrates that the regions can be classified into the regions (labels) L1, L4 to L8, and L12 to L17, which are excluded from the ball candidate areas, and the regions (labels) L2, L3, L9 to L11, L18, and L19, which are extracted as ball candidate areas. Label L1 arises because regions extracted from the inter-frame difference image that are clustered more densely than a predetermined amount are treated as a single region; such a region can be excluded from the ball candidate areas.

Next, in each target object detection/tracking device 2, the object candidate area recognition unit 22 takes the ball candidate areas in the current 2D image determined by the object candidate area extraction unit 21 and identifies the areas assumed to contain the object to be tracked, using high-level features based on machine learning (step S3). In particular, the object candidate area recognition unit 22 identifies these areas by extracting the ball candidate areas recognized with a dual feature consisting of a color histogram feature and a local binary pattern (LBP) feature.

More specifically, the object candidate area recognition unit 22 narrows down the ball candidates with higher accuracy. For this purpose, it is preferable to use a supervised learning framework for ball recognition. First, a large number of ball images (12 × 12 pixels) and non-ball images (12 × 12 pixels) are collected in advance from soccer video. FIGS. 5(A) and 5(B) illustrate, respectively, positive examples (ball images) and negative examples (non-ball images) for the machine learning used in the object candidate recognition processing based on high-level features in the object position estimation system 1 according to an embodiment of the present invention. Color histogram features and LBP features are extracted from the collected positive and negative example images.

FIG. 6 illustrates a color histogram used in the object candidate recognition processing based on high-level features in the object position estimation system 1 according to an embodiment of the present invention. As shown in FIG. 6, the color histogram feature divides the RGB color space into 3 × 3 × 3 = 27 bins (27 dimensions) and represents a positive or negative example image by the frequency with which its pixel colors fall into each bin.
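A 27-bin color histogram of this kind might be computed as in the sketch below. It assumes 8-bit RGB values, equal-width bins per channel, and normalization by the pixel count; the text specifies only the 3 × 3 × 3 partition, so the bin boundaries, the bin ordering, and the normalization are assumptions, and the function name is hypothetical.

```python
def color_histogram_27(pixels):
    """Return a 27-dimensional normalized color histogram.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    Each channel is quantized into 3 equal-width levels (assumed),
    giving 3 * 3 * 3 = 27 bins.
    """
    hist = [0] * 27
    count = 0
    for r, g, b in pixels:
        # Quantize each channel to a level 0, 1, or 2.
        level_r = r * 3 // 256
        level_g = g * 3 // 256
        level_b = b * 3 // 256
        hist[level_r * 9 + level_g * 3 + level_b] += 1
        count += 1
    if count == 0:
        return [0.0] * 27
    return [value / count for value in hist]
```

For a 12 × 12 patch, `pixels` would contain the 144 RGB triples of the patch.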

The LBP feature is a pattern-based feature used in texture analysis that has attracted attention in recent years for its high discrimination accuracy and low computational cost (see, for example, 寺島, 喜田, “勾配情報を用いたLocal Binary Patternの改良”, DEIM Forum 2014, F5-4). The LBP feature is calculated by raster scanning and can be obtained from equation (1), where R is the radius around the pixel of interest and P is the number of pixels in the neighborhood.
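Equation (1) is not reproduced in this text. For reference, the widely used definition of the LBP operator, which matches the symbols R, P, g_c, and g_p used in this description, is the following; that the patent's equation (1) has exactly this form is an assumption.

$$
\mathrm{LBP}_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^{p},
\qquad
s(x) =
\begin{cases}
1 & (x \ge 0) \\
0 & (x < 0)
\end{cases}
\tag{1}
$$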

Here, g_c denotes the pixel value of the pixel of interest, and g_p denotes the pixel value of a reference point. When R = 1, the neighborhood is 3 × 3 and the maximum value of P is 8. The pattern LBP_{P,R} is calculated by comparing the pixel of interest g_c with the neighborhood pixel values g_p, and the number of possible patterns is 2^P. The LBP feature is a histogram feature that describes the frequency of these patterns. As an example, a 256-dimensional LBP with R = 1 and P = 8 can be used. Combining the color histogram feature and the LBP feature into a dual feature gives a total of 283 (= 27 + 256) dimensions. With a feature of 283 dimensions, the recognition load of the machine learning stage can be kept comparatively low, and 2D images input in real time can be processed within real time.
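The 256-bin LBP histogram for R = 1 and P = 8 can be sketched as follows. The neighbor ordering, the use of `>=` in the comparison, and the skipping of border pixels are assumptions, since the text does not fix them; the names `NEIGHBORS`, `lbp_code`, and `lbp_histogram` are hypothetical.

```python
# Offsets (dy, dx) of the 8 neighbors at radius R = 1, in an assumed fixed order.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
             (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, y, x):
    """8-bit LBP code of the pixel at (y, x); bit p is set when g_p >= g_c."""
    center = img[y][x]
    code = 0
    for p, (dy, dx) in enumerate(NEIGHBORS):
        if img[y + dy][x + dx] >= center:
            code |= 1 << p
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes, raster-scanned over interior pixels."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist
```

Concatenating this 256-element list with a 27-element color histogram gives the 283-dimensional feature vector described above.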

For this machine learning, a binary support vector machine (SVM) is constructed within a supervised learning framework, using the high-level feature given by the 283-dimensional dual feature. By using a classifier based on this support vector machine, whether each ball candidate area is valid can be judged with higher accuracy than with the low-level feature extraction described above, and the number of ball candidate areas assumed to contain the ball can be narrowed down while the 2D images input in real time are processed within real time.

That is, because classification by a support vector machine takes processing time, the object position estimation system 1 according to the present invention uses a two-stage extraction: the number of regions is first reduced by ball area extraction based on low-level features, and recognition based on high-level features is then performed. This configuration achieves both high accuracy and processing within real time.

Next, in each target object detection/tracking device 2, the object trajectory generation unit 23 generates trajectories of ball candidate areas by associating the ball candidate areas identified by the object candidate area recognition unit 22 between the current 2D image and one or more consecutive past 2D images (that is, between adjacent frames) (step S4).

More specifically, for the ball candidate areas extracted using the low-level and high-level features, the object trajectory generation unit 23 generates movement trajectories in which the position coordinates are linked along the time axis over at least a specified number of points (for example, three). Suppose, for example, that m ball candidate areas are extracted in frame i of the current 2D image and that n ball candidate areas have been tracked up to frame i-1 of the past 2D images. In this case, which of the trajectories tracked up to frame i-1 each ball candidate area newly extracted in frame i should be assigned to is decided on the basis of the distance to each trajectory and the similarity of the low-level features.

Each trajectory is assumed to hold, in association with it, the low-level features of the ball candidate area tracked up to frame i-1. If a ball candidate area extracted in frame i has a similarity to the low-level features held for a trajectory that is equal to or higher than a predetermined value, and its distance to that trajectory is within a predetermined range, tracking is regarded as successful and the ball candidate area is added to the trajectory. If, on the other hand, no trajectory satisfies both the similarity condition and the distance condition for a ball candidate area extracted in frame i, that ball candidate area is regarded as a new object and a new trajectory is started from frame i of the current 2D image. Each trajectory generated by the object trajectory generation unit 23 holds the image coordinates for the frames in which the ball candidate area was successfully detected. A trajectory for which tracking has failed for a specified number of frames or more is removed from the tracking targets.
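The assignment rule described above can be sketched as a simple greedy association. This is an illustrative sketch under stated assumptions: the similarity measure (1 / (1 + Euclidean distance between feature vectors)), the greedy nearest-first matching, and the names `Track`, `update_tracks`, `max_dist`, `min_sim`, and `max_missed` are all hypothetical and are not taken from the patent.

```python
import math

class Track:
    """A trajectory: detected points per frame plus the last low-level feature."""
    def __init__(self, point, feature, frame):
        self.points = [(frame, point)]
        self.feature = feature
        self.missed = 0  # consecutive frames without an assigned candidate

def similarity(f1, f2):
    """Assumed similarity in (0, 1]: 1 / (1 + Euclidean feature distance)."""
    return 1.0 / (1.0 + math.dist(f1, f2))

def update_tracks(tracks, detections, frame, max_dist, min_sim, max_missed):
    """Assign detections [(point, feature), ...] of one frame to tracks."""
    unmatched = list(detections)
    for track in tracks:
        last_point = track.points[-1][1]
        best = None
        for det in unmatched:
            d = math.dist(last_point, det[0])
            if d <= max_dist and similarity(track.feature, det[1]) >= min_sim:
                if best is None or d < best[0]:
                    best = (d, det)
        if best is not None:
            point, feature = best[1]
            track.points.append((frame, point))  # tracking succeeded
            track.feature = feature
            track.missed = 0
            unmatched.remove(best[1])
        else:
            track.missed += 1                    # tracking failed this frame
    # Drop tracks that failed for the specified number of frames or more.
    survivors = [t for t in tracks if t.missed < max_missed]
    # Every unassigned candidate starts a new trajectory at this frame.
    survivors.extend(Track(p, f, frame) for p, f in unmatched)
    return survivors
```

Each track stores coordinates only for frames where a candidate was assigned, which mirrors the statement that trajectories hold image coordinates for successfully detected frames.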

Next, in each target object detection/tracking device 2, the object trajectory selection unit 24 evaluates each of the possibly multiple trajectories of ball candidate areas generated by the object trajectory generation unit 23, each consisting of one or more frames counted back from the frame of the current 2D image, selects the trajectory with the highest value of a predetermined index, and determines the ball candidate coordinates in each frame of the selected trajectory, together with the current 2D image, as the trajectory of the object to be tracked (that is, the ball) (step S5).

More specifically, the object trajectory selection unit 24 selects the trajectory with the highest value of a predetermined index and determines the final ball position in the current 2D image of the corresponding camera. Each trajectory generated by the object trajectory generation unit 23 holds the image coordinates for the frames in which the ball candidate area was successfully detected. These image coordinates can be converted into real-space coordinates by the projective transformation described later. From the history of position coordinates of each trajectory, the object trajectory selection unit 24 obtains, as the index, one or more of the velocity, the amount of movement, and the tracking time of the ball candidate area in real space (that is, on the projective-transformed image).

For example, for each trajectory obtained with reference to the frame of the current 2D image, the amount of movement is calculated frame by frame from the image coordinates in the frames that make up the trajectory, and the per-frame amounts are averaged over the number of frames corresponding to the length of the trajectory (the tracking time) to give the evaluation value of the index (which corresponds to the average motion vector length). The trajectory with the longest average motion vector length is selected as the trajectory closest to the motion of the ball. Alternatively, the velocity and the amount of movement may be calculated frame by frame from the image coordinates in the frames that make up each trajectory, and the per-frame velocity averaged over the number of frames corresponding to the length of the trajectory (the tracking time) to give the evaluation value. As a further alternative, the evaluation value may be a weighted sum of the average velocity, the average amount of movement (each normalized by the number of frames of the trajectory), and the tracking time. Qualitatively, the trajectory with the highest average velocity, the largest average amount of movement, and the longest tracking time is selected as the trajectory closest to the motion of the ball. The ball candidate coordinates in each frame of the selected trajectory, together with the current 2D image, are then determined as the final values of each target object detection/tracking device 2. A threshold is set for the index: if every trajectory has an index equal to or below the threshold, that is, if it is judged that no trajectory is close to the motion of the ball, tracking of the ball candidate area in that camera's video is regarded as having failed, and NULL is output instead of the ball candidate coordinates in each frame of a selected trajectory.
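The first of the indices above (average motion vector length) and the NULL output can be sketched as follows. This is a minimal illustration assuming each trajectory is a list of (x, y) coordinates in consecutive frames; the average is taken over the number of frame-to-frame steps, which is one possible reading of "averaged over the number of frames". The function names and the `threshold` parameter are hypothetical, and Python's `None` stands in for NULL.

```python
import math

def average_motion_length(points):
    """Mean frame-to-frame displacement of a trajectory [(x, y), ...]."""
    if len(points) < 2:
        return 0.0
    total = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return total / (len(points) - 1)

def select_trajectory(trajectories, threshold):
    """Return the trajectory with the highest index, or None (NULL).

    None is returned when there are no trajectories or when every
    trajectory has an index equal to or below the threshold.
    """
    best = max(trajectories, key=average_motion_length, default=None)
    if best is None or average_motion_length(best) <= threshold:
        return None
    return best
```

A weighted sum of average velocity, average movement, and tracking time could be substituted for `average_motion_length` as the `key` function without changing the selection logic.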

Furthermore, regardless of which trajectory the object trajectory selection unit 24 selects, the object trajectory generation unit 23 continues to track the trajectories of all the ball candidate areas that were generated. Therefore, even when the trajectory of ball candidate coordinates once selected as the trajectory closest to the motion of the ball is interrupted (that is, when there are no longer any ball candidate coordinates to assign to that trajectory), the object trajectory selection unit 24 regards the image coordinates of the nearest other ball candidate area in the frame at the point of interruption as the ball candidate coordinates of that trajectory, and continues to track and output them for up to a specified number of frames. In doing so, the object trajectory selection unit 24 attaches the "elapsed time from the tracking end frame" at the point of interruption as auxiliary information and outputs it together with the trajectory of the object to be tracked (that is, the ball). Every trajectory is configured to carry the "elapsed time from the tracking end frame" as auxiliary information; when the object has been tracked without interruption from the start of tracking up to the current 2D image, the "elapsed time from the tracking end frame" is "0".

In general, any trajectory may lose its ball candidate coordinates over consecutive frames. By outputting the image coordinates of the nearest other ball candidate area even when the ball candidate coordinates are lost, the trajectory of the ball candidate area finally selected by the object trajectory selection unit 24 can continue to be tracked and displayed for up to the specified number of frames, which improves tracking capability. In addition, attaching the "elapsed time from the tracking end frame" at the point of interruption as auxiliary information improves the accuracy of the final ball position coordinate selection described later.

FIG. 7 shows an example of a trajectory selected by the object trajectory selection unit 24. FIG. 7 is an explanatory diagram of the results of the object trajectory generation processing and trajectory selection processing in the object position estimation system 1 according to an embodiment of the present invention. As shown in FIG. 7, the ball candidate coordinates Bn of the selected trajectory in the frame Fr of the current 2D image can be represented, together with the ball candidate coordinates for the frames preceding the current 2D image Fr, as a trajectory TLn.

Next, the target object position integration device 3 receives, from each target object detection/tracking device 2, the respective current 2D image and the ball candidate coordinates in each frame of the selected trajectory (or NULL) (step S6).

Next, in the target object position integration device 3, the projective transformation unit 31 transforms the 2D image (that is, the current 2D image) and the ball candidate coordinates of the selected trajectory input from each target object detection/tracking device 2 into a predetermined projective-transformed image that reflects the ball candidate coordinates, and integrates the results (step S7).

More specifically, the projective transformation unit 31 applies a projective transformation to the 2D image obtained from each target object detection/tracking device 2 and to the ball candidate coordinates of the selected trajectory, converting them into real-space coordinates. A projective transformation is a transformation on an image, expressed by equations (2) and (3), that maps one plane onto another (see, for example, 高橋, 沼徳, 青木, 近藤, “東映画像の幾何補正に関する実験的検討”, 計測自動制御学会東北支部第２３５回研究集会, 資料番号２３５―５, ２００７).

Here, (x_c, y_c) are the coordinates before the transformation and (x_p, y_p) are the coordinates after the transformation. Because the projective transformation in this example is a mapping from plane to plane, the result is again a set of two-dimensional coordinates; since height can be omitted in two-dimensional field coordinates, the height component in real space is regarded as zero (z = 0). h_1, ..., h_8 are the projective transformation parameters, and the projective transformation matrix H is expressed by equation (4).
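Equations (2) to (4) are not reproduced in this text. For reference, the standard eight-parameter form of a plane-to-plane projective transformation, written with the symbols defined above, is given below. The assignment of the indices 1 to 8 to the individual matrix entries is an assumption; the patent's own equations may order them differently.

$$
x_p = \frac{h_1 x_c + h_2 y_c + h_3}{h_7 x_c + h_8 y_c + 1} \tag{2}
$$

$$
y_p = \frac{h_4 x_c + h_5 y_c + h_6}{h_7 x_c + h_8 y_c + 1} \tag{3}
$$

$$
H =
\begin{pmatrix}
h_1 & h_2 & h_3 \\
h_4 & h_5 & h_6 \\
h_7 & h_8 & 1
\end{pmatrix} \tag{4}
$$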

The eight parameters in equation (4) can be determined once four or more point correspondences between the images are available. For example, four or more characteristic points, such as the four corners of the field, are specified in the fixed-camera video, and a projective transformation matrix that converts the video into a view of the field from directly above is prepared in advance. Once this projective transformation matrix has been calculated, the frame image Fr(n) of a given camera (and its ball candidate coordinates) shown in FIG. 8(A) can be converted into the image IFr in the two-dimensional field coordinate system (and the ball candidate coordinates after projective transformation) shown in FIG. 8(B). FIGS. 8(A) and 8(B) show an example in which an image of the right side of a soccer field is projectively transformed into this top-view coordinate system. Furthermore, by combining the projectively transformed image of the right side of the soccer field with the projectively transformed image of the left side, an image such as the one illustrated in FIG. 9 can be obtained. That is, the frame images obtained from the target object detection/tracking devices 2 and the ball candidate coordinates of the selected trajectories are projectively transformed into the same space (for example, field coordinates corresponding to a top-view image coordinate system). All subsequent processing is carried out in this projectively transformed top-view image coordinate system.
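Determining the eight parameters from exactly four point correspondences amounts to solving an 8 × 8 linear system. The sketch below shows one way to do this with plain Gauss-Jordan elimination; it assumes the common parameterization x_p = (h1·x_c + h2·y_c + h3) / (h7·x_c + h8·y_c + 1), y_p = (h4·x_c + h5·y_c + h6) / (h7·x_c + h8·y_c + 1), and the function names are hypothetical. With more than four points, a least-squares solution would be used instead.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def estimate_homography(src, dst):
    """Return [h1, ..., h8] mapping four src points (x_c, y_c) to dst (x_p, y_p)."""
    a, b = [], []
    for (xc, yc), (xp, yp) in zip(src, dst):
        # Each correspondence contributes two linear equations in h1..h8.
        a.append([xc, yc, 1, 0, 0, 0, -xc * xp, -yc * xp])
        b.append(xp)
        a.append([0, 0, 0, xc, yc, 1, -xc * yp, -yc * yp])
        b.append(yp)
    return solve_linear(a, b)

def project(h, point):
    """Apply the projective transformation to an image point (x_c, y_c)."""
    xc, yc = point
    w = h[6] * xc + h[7] * yc + 1.0
    return ((h[0] * xc + h[1] * yc + h[2]) / w,
            (h[3] * xc + h[4] * yc + h[5]) / w)
```

In use, `src` would hold the image positions of four field landmarks (for example, corners) and `dst` their positions in the top-view field coordinate system; `project` then maps ball candidate coordinates into that system.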

Next, in the target object position integration device 3, the block determination unit 32 selects ball candidate coordinates from the projective-transformed image obtained by the projective transformation unit 31. The unit divides the projective-transformed image into a predetermined number of blocks, calculates and compares, for each block, a feature quantity represented by an object other than the ball that is predetermined as being highly correlated with the ball, excludes blocks whose feature quantity is equal to or less than a predetermined value, and selects the ball candidate coordinates located in the remaining blocks. For example, the occupancy rate of the person area can be used as the feature quantity represented by an object other than the ball in each block. That is, in the block determination process of the block determination unit 32, the occupancy rate of the person area is calculated for each block of the projective-transformed image, and ball candidate coordinates in blocks with a high person-area occupancy rate are preferentially selected.

More specifically, the block determination unit 32 divides the image IFr, projectively transformed as in FIG. 9, into 12 blocks as shown in FIG. 10, and calculates the occupancy rate of the person area h in each block. The occupancy rate is calculated by Equation (5).

O_i = m_i / p_i   (5)

Here, O_i is the person-area occupancy rate in block Bk, m_i is the total number of person-area pixels, and p_i is the total number of pixels in block Bk. In general, the ball tends to be located in blocks where players are clustered. Therefore, when a ball is detected in a block with a low person-area occupancy rate in a given camera video, rejecting that detection makes stable ball position estimation possible. The calculation of the feature quantity (person area) represented by objects other than the ball in each block does not need to be exact; any image analysis processing, such as the color histogram analysis described earlier, can be used as long as it allows blocks to be distinguished on the basis of their differences. For example, in FIG. 10, the blocks Bk other than Bk(0,1), Bk(0,2), Bk(1,1), Bk(1,2), Bk(2,1), and Bk(2,2) are excluded from the estimated ball position.
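
The block determination described above can be sketched as follows, assuming that a binary person mask of the top-view image is already available. The 3 x 4 grid layout, the threshold value, and the function names are assumptions for illustration; the text only states that the image is divided into 12 blocks and that blocks at or below a predetermined value are excluded.

```python
import numpy as np

def block_occupancy(person_mask, n_rows=3, n_cols=4):
    """Return the occupancy O = m / p of each block, following Equation (5).
    The 3 x 4 layout is an assumed arrangement of the 12 blocks."""
    mask = np.asarray(person_mask, dtype=bool)
    h, w = mask.shape
    row_edges = np.linspace(0, h, n_rows + 1).astype(int)
    col_edges = np.linspace(0, w, n_cols + 1).astype(int)
    occ = np.zeros((n_rows, n_cols))
    for r in range(n_rows):
        for c in range(n_cols):
            block = mask[row_edges[r]:row_edges[r + 1],
                         col_edges[c]:col_edges[c + 1]]
            occ[r, c] = block.sum() / block.size  # m_i / p_i
    return occ, row_edges, col_edges

def select_candidates(candidates, occ, row_edges, col_edges, threshold):
    """Keep (x, y) candidates lying in blocks whose occupancy exceeds threshold;
    blocks at or below the threshold are excluded."""
    kept = []
    for x, y in candidates:
        r = np.searchsorted(row_edges, y, side="right") - 1
        c = np.searchsorted(col_edges, x, side="right") - 1
        inside = 0 <= r < occ.shape[0] and 0 <= c < occ.shape[1]
        if inside and occ[r, c] > threshold:
            kept.append((x, y))
    return kept
```

Candidates that fall outside the image are dropped along with those in low-occupancy blocks, which is a design choice of this sketch rather than something the text specifies.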

Next, the target object position integration device 3 uses the object position estimation unit 33 to determine, for the ball candidate coordinates selected by the block determination unit 32, the ball candidate coordinates that have continued for a predetermined time or longer (a predetermined number of frames or more) as the final ball position coordinates, based on the "elapsed time from the tracking end frame" of each trajectory (step S9). Each trajectory is configured to always carry the "elapsed time from the tracking end frame" as auxiliary information; when the object has been tracked without interruption from the start of tracking up to the current 2D image, the "elapsed time from the tracking end frame" is 0.

More specifically, to integrate the ball candidate coordinates of each camera in the top-view coordinate system and determine the final ball position coordinates, the object position estimation unit 33 calculates the final ball position coordinates P_F from the ball candidate coordinates of each camera video, projectively transformed into real-space coordinates (two-dimensional field coordinates), as a weighted linear sum based on the elapsed time from the tracking end frame, in accordance with Equation (6).

Here, with the frame number c as the index, t(c) is the elapsed time from the tracking end frame, P_F(c) is the ball candidate coordinate obtained from each target object detection/tracking device 2 and selected via the block determination unit 32, and T is a fixed time (for example, 3 seconds) defined as the maximum value of the elapsed time t(c) from the tracking end frame. FIG. 11 shows an example of ball position estimation on an image in real-space coordinates (two-dimensional field coordinates). The circle in the figure is the finally estimated ball position coordinate point, and the black dot is the coordinate point of the actual ball position. The positions of the ball candidate coordinates obtained by processing each of the four camera videos ideally coincide, but, as shown in FIG. 11, they may also fall at several different locations; each is marked with the number of its camera. In FIG. 11, camera number 2 has the longest elapsed time t(c) from the tracking end frame. According to Equation (6), "T − t(c)" is the weight of the trajectory obtained by that camera, and ball candidate coordinates with a longer elapsed time t(c) from the tracking end frame are displayed longer as the final ball position coordinates P_F.
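
Equation (6) itself is not reproduced in this passage. The following is only a sketch of one plausible reading of the description, namely a weighted linear sum with weights T − t(c). The normalization by the sum of the weights, the clipping of negative weights to zero, and the function name `fuse_candidates` are assumptions, not details confirmed by the text.

```python
import numpy as np

def fuse_candidates(positions, elapsed, T=3.0):
    """Combine per-camera ball candidate coordinates with weights T - t(c).
    Normalizing by the weight sum and clipping at zero are assumptions."""
    positions = np.asarray(positions, dtype=float)          # shape (n_cameras, 2)
    weights = np.clip(T - np.asarray(elapsed, dtype=float), 0.0, None)
    total = weights.sum()
    if total == 0.0:
        return None  # no candidate carries a positive weight
    return (weights[:, None] * positions).sum(axis=0) / total
```

Under these assumptions, two candidates with equal elapsed times are simply averaged, and a candidate whose elapsed time has reached T no longer contributes.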

That is, the determined final ball position coordinates are displayed on a predetermined image display unit (not shown) in a form that indicates the position coordinates in the projective transformation image; by repeating this display continuously, the trajectory of the finally estimated ball position can be shown on the projective transformation image. Then, based on the "elapsed time from the tracking end frame", ball candidate coordinates that have continued for a predetermined time or longer (a predetermined number of frames or more, for example three frames or more) are determined as the final ball position coordinates. As a result, even when the block determination unit 32 selects several ball candidate coordinates in a given frame, or none at all, displaying them continuously on the projective transformation images of successive frames yields what is recognized as essentially a single trajectory of the estimated ball position.

When trajectories of several ball candidate coordinates remain even after the processing of the block determination unit 32 and they are to be narrowed down to one, the system may be configured to determine, as the final ball position coordinates P_F, the ball candidate coordinates with the longest "elapsed time from the tracking end frame" within the block with the highest priority (for example, the block with the highest person occupancy rate).

A Kalman filter can also be used as the ball position state estimation algorithm in place of Equation (6), which is based on the "elapsed time from the tracking end frame". A Kalman filter is used to estimate quantities that change from moment to moment (for example, the position and velocity of an object) from discrete observations containing errors (see, for example, Nishiyama, "Kalman Filter", IEICE "Knowledge Base", Group 1 (Signals and Systems), Part 5 (Signal Theory), Chapter 6, 2011). However, as the experimental results described below show, the state estimation formula defined as Equation (6) gave better results for real-time ball position estimation than the Kalman filter.
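
For reference, a Kalman filter for a two-dimensional position can be sketched as below. The text does not describe the motion model or noise settings used in the comparison, so the constant-velocity model, the noise variances, and the class name `ConstantVelocityKalman` are all assumptions chosen for illustration.

```python
import numpy as np

class ConstantVelocityKalman:
    """Minimal Kalman filter with state [x, y, vx, vy] (assumed model)."""

    def __init__(self, dt=1.0, process_var=1.0, meas_var=4.0):
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # state transition
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # only position is observed
        self.Q = process_var * np.eye(4)                 # process noise (assumed)
        self.R = meas_var * np.eye(2)                    # measurement noise (assumed)
        self.x = np.zeros(4)
        self.P = 1e3 * np.eye(4)                         # large initial uncertainty

    def step(self, z=None):
        """Predict one frame ahead; if an observation z = (x, y) is given, update."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        if z is not None:
            innovation = np.asarray(z, dtype=float) - self.H @ self.x
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + K @ innovation
            self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()
```

Calling `step(None)` on a frame with no detection returns the position predicted from the current velocity estimate.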

The following describes experimental results comparing the object position estimation system 1 of the present embodiment with conventional techniques.

(Experimental conditions)
To confirm the effectiveness of the object position estimation system 1 of the present embodiment, a verification experiment was carried out using soccer match video. Video from four cameras capturing the field with fixed framing at the 2014 Emperor's Cup All-Japan Soccer Championship was used for verification. For each camera, the first-half video was used for training and the second-half video for evaluation. The ball position on the two-dimensional field image was visually checked every 10 frames and used as the ground-truth ball position data.

(Ball detection accuracy)
First, the performance of the high-level feature extraction processing of the object candidate area recognition unit 22, which follows the low-level feature extraction of the object candidate area extraction unit 21 according to the present invention, was evaluated. About 9,000 ball images (positive examples) and non-ball images (negative examples) were used, half for training and the remaining half for evaluation. FIG. 12 shows the classification accuracy for the binary decision and the processing time per image required for classification. Moments, the color histogram alone, LBP alone, ORB, SURF, and SIFT are shown for comparison. The "moments" in this experiment are a technique for quantitatively expressing the features of images and shapes; here, the zeroth-order spatial moment, the central moment, the normalized central moment, and the seven Hu moment invariants, which are invariant to translation and rotation, were combined into a 10-dimensional feature quantity. The "color histogram" in this experiment is a 27-dimensional feature quantity obtained by treating the RGB color space as 3 × 3 × 3. "LBP" in this experiment is a 256-dimensional feature quantity with R = 1 and P = 8 based on Equation (1). "ORB", "SURF", and "SIFT" are well-known feature quantities based on local gradients in an image; classification was performed using histograms with the feature dimensions as bins.

The "present invention" entry in FIG. 12 is a 283-dimensional feature quantity formed, as the high-level feature extraction of the object candidate area recognition unit 22, from the dual feature quantity of the color histogram feature quantity (27 dimensions) and the LBP feature quantity (256 dimensions). As FIG. 12 shows, LBP and the color histogram each classify more accurately on their own than the other techniques, and the dual feature quantity combining the two, as in the object position estimation system 1 of the present embodiment, gave the highest accuracy. The processing speed of the object position estimation system 1 of the present embodiment is slightly lower than with LBP alone or the color histogram alone, but the load was confirmed to remain within the range in which real-time processing is feasible.
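
To make the 27 + 256 = 283 dimensions concrete, the following sketch builds such a dual feature vector for one image patch. It assumes a uniform 3 x 3 x 3 quantization of RGB and the basic LBP code over the eight immediate neighbors (R = 1, P = 8); Equation (1) is not shown in this passage, so the bit ordering and comparison rule here are assumptions, and the machine-learning classifier itself is omitted. All function names are hypothetical.

```python
import numpy as np

def color_histogram_27(rgb):
    """3 x 3 x 3 RGB histogram (27 bins), normalized to sum to 1."""
    pixels = np.asarray(rgb, dtype=np.uint8).reshape(-1, 3)
    bins = np.minimum(pixels.astype(int) * 3 // 256, 2)    # 0, 1, or 2 per channel
    index = bins[:, 0] * 9 + bins[:, 1] * 3 + bins[:, 2]
    hist = np.bincount(index, minlength=27).astype(float)
    return hist / hist.sum()

def lbp_histogram_256(gray):
    """Histogram of basic LBP codes (R = 1, P = 8) over interior pixels."""
    g = np.asarray(gray, dtype=float)
    center = g[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=int)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]           # assumed bit order
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes += (neighbor >= center).astype(int) << bit   # set bit if neighbor >= center
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def dual_feature(rgb):
    """Concatenate both histograms into one 283-dimensional vector."""
    gray = np.asarray(rgb, dtype=float).mean(axis=2)       # simple grayscale (assumed)
    return np.concatenate([color_histogram_27(rgb), lbp_histogram_256(gray)])
```

A vector produced by `dual_feature` could then be passed to any binary classifier trained on ball and non-ball patches.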

(Ball position estimation accuracy)
Next, the accuracy of ball position estimation by the object position estimation system 1 of the present embodiment was verified. The error between the ground-truth position and the estimated position of the ball in two-dimensional field coordinates was calculated from the second-half video of the match. FIG. 13 shows the results compared with the Kalman filter, a representative object tracking technique.

When the estimation error was converted into meters in real space, the mean error of the object position estimation system 1 of the present embodiment was 5.42 m. With the Kalman filter, the mean error was 10.55 m. The error of the object position estimation system 1 of the present embodiment is thus lower than that of the Kalman filter, confirming that the ball position can be estimated with high accuracy. The probability that the estimation error falls within a circle of a predetermined radius is also an important element of accuracy. For example, FIG. 14(A) shows a case in which the ball candidate coordinates of camera number 3 were estimated as the final ball position coordinates and coincide with the actual ball position. In FIG. 14(B), of the ball candidate coordinates of camera numbers 1 and 4, those of camera number 4 were estimated as the final ball position coordinates, with a slight deviation from the actual ball position. The probability that the estimation error falls within a circle of a predetermined radius is also shown in FIG. 13; it was confirmed that an error within a radius of 10 m was achieved for about 90% of the match time.
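
The two measures reported above, the mean error and the proportion of samples whose error falls within a given radius, can be computed as in the following sketch. It assumes that the estimated and ground-truth positions are already expressed in meters in field coordinates; the function name `position_error_stats` is hypothetical.

```python
import numpy as np

def position_error_stats(estimated, ground_truth, radius=10.0):
    """Return (mean Euclidean error, fraction of samples with error <= radius)."""
    est = np.asarray(estimated, dtype=float)
    gt = np.asarray(ground_truth, dtype=float)
    errors = np.linalg.norm(est - gt, axis=1)   # per-sample error in meters
    return errors.mean(), float((errors <= radius).mean())
```

In the setting described here, the samples would be the frames for which a ground-truth position was annotated.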

In the object position estimation system 1 of the present embodiment, the following points in particular are considered to have contributed to the improved accuracy: high-accuracy ball detection using machine learning is performed on each camera video; objects whose motion is close to that of a ball are effectively selected from the trajectories; and the processing results for multiple camera videos are integrated in a complementary manner.

The present invention has been described above with reference to a specific embodiment, but it is not limited to that embodiment and can be modified in various ways without departing from its technical concept. For example, the object position estimation system 1 of the present embodiment was described with the plurality of target object detection/tracking devices 2 and the target object position integration device 3 configured as separate devices, but the function of the target object position integration device 3 may be incorporated into one of the target object detection/tracking devices 2. Depending on the application, the functions of the plurality of target object detection/tracking devices 2 and the target object position integration device 3 may also be incorporated into a single device. Accordingly, the plurality of target object detection/tracking devices 2 and the target object position integration device 3 can each be realized by a personal computer and configured to run as separate programs, or they can be configured as a single program.

In the above example, the block determination unit 32 was described using the occupancy rate of multiple persons as the other object that can be predetermined as being highly correlated with the object to be tracked. However, in a sport with a limited number of competitors, such as table tennis, the determination may instead be based on the proportion of each block occupied by the competitors themselves.

In the above example, a two-dimensional field image was used as the projective transformation image, but any projective transformation image can be formed as long as its coordinate system can be defined as a mapping from three-dimensional space to two-dimensional space.

According to the present invention, object positions in real space can be estimated with high accuracy, so the invention can be used in sports video analysis, in particular for services such as automatic refereeing, sports program broadcasting, sports data generation and distribution, and coaching. If the tracking target is extended to other specific objects, it can also be applied to various services such as security systems based on surveillance camera video analysis. It is therefore useful in applications that require highly accurate estimation of object positions in real space.

DESCRIPTION OF SYMBOLS
1 Object position estimation system
2 Target object detection/tracking device
3 Target object position integration device
21 Object candidate area extraction unit
22 Object candidate area recognition unit
23 Object trajectory generation unit
24 Object trajectory selection unit
31 Projective transformation unit
32 Block determination unit
33 Object position estimation unit

Claims

An object position estimation system that processes video from a plurality of cameras in parallel to enable tracking of a specific object in the video, comprising:
object candidate area extraction means for receiving consecutive frame images from each camera, generating an inter-frame difference image, and determining object candidate areas in the current frame by low-level features based on the areas delimited by the difference image;
object candidate area recognition means for identifying, by high-level features based on machine learning, the areas assumed to contain the object to be tracked among the object candidate areas obtained from each camera;
object trajectory generation means for generating trajectories of ball candidate areas associated between frames for the object candidate areas identified by the high-level features obtained from each camera;
object trajectory selection means for selecting, for each camera, the trajectory with the highest value under a predetermined index among the trajectories of the object candidate areas obtained from that camera as the trajectory of the object to be tracked;
projective transformation means for converting the current-frame image obtained from each camera and the image coordinates of the trajectory of the object candidate area selected for each camera into a predetermined projective transformation image and integrating them;
block determination means for dividing the projective transformation image into blocks, calculating and comparing, within each block, a feature quantity represented by objects other than the object to be tracked that are predetermined as being highly correlated with the object to be tracked, excluding blocks whose feature quantity is equal to or less than a predetermined value, and selecting object candidate coordinates located in the blocks not excluded; and
object position estimation means for determining, for the selected object candidate coordinates, ball candidate coordinates that have continued for a predetermined time or longer as the final ball position coordinates.

The object position estimation system according to claim 1, wherein the object candidate area recognition means identifies, by high-level features based on machine learning, ball candidate areas recognized by a dual feature quantity consisting of a color histogram feature quantity and a local binary pattern feature quantity.

The object position estimation system according to claim 1 or 2, wherein the object trajectory selection means uses, as the index predetermined for each camera, one or more of the velocity, the movement amount, and the tracking time of the object candidate area on the projective transformation image.

The object position estimation system according to any one of claims 1 to 3, wherein the block determination means treats persons as the objects highly correlated with the object to be tracked, and uses the occupancy rate of the persons as the feature quantity represented by objects other than the object to be tracked.

The object position estimation system according to any one of claims 1 to 4, wherein the object position estimation means determines, for the selected object candidate coordinates, ball candidate coordinates that have continued for a predetermined time or longer as the final ball position coordinates, based on the elapsed time from the tracking end frame of each trajectory.

A program that, in order to realize the parallel processing of the object candidate area extraction means, the object candidate area recognition means, the object trajectory generation means, and the object trajectory selection means in the object position estimation system according to claim 1, causes a computer that individually selects the trajectory of an object candidate area relating to the specific object from the video of a camera to be processed to execute the steps of:
receiving consecutive frame images from the camera to be processed, generating an inter-frame difference image, and determining object candidate areas in the current frame by low-level features based on the areas delimited by the difference image;
identifying, by high-level features based on machine learning, the areas assumed to contain the object to be tracked among the object candidate areas obtained from the camera to be processed;
generating trajectories of ball candidate areas associated between frames for the object candidate areas identified by the high-level features obtained from the camera to be processed; and
selecting, among the trajectories of the object candidate areas obtained from the camera to be processed, the trajectory with the highest value under a predetermined index for each camera as the trajectory of the object to be tracked;
and that, in order to realize the projective transformation means, the block determination means, and the object position estimation means in the object position estimation system according to claim 1, further causes a computer that integrates the object candidate areas relating to the specific object extracted individually from the videos of the plurality of cameras and enables tracking of the specific object to execute the steps of:
converting the current-frame image obtained from each camera through the above steps, and the image coordinates of the trajectory of the object candidate area selected for each camera, into a predetermined projective transformation image and integrating them;
dividing the projective transformation image into blocks, calculating and comparing, within each block, a feature quantity represented by objects other than the object to be tracked that are predetermined as being highly correlated with the object to be tracked, excluding blocks whose feature quantity is equal to or less than a predetermined value, and selecting object candidate coordinates located in the blocks not excluded; and
determining, for the selected object candidate coordinates, ball candidate coordinates that have continued for a predetermined time or longer as the final ball position coordinates.