JP7198661B2

JP7198661B2 - Object tracking device and its program

Info

Publication number: JP7198661B2
Application number: JP2018245234A
Authority: JP
Inventors: 正樹高橋; 秀樹三ツ峰
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2018-12-27
Filing date: 2018-12-27
Publication date: 2023-01-04
Anticipated expiration: 2038-12-27
Also published as: JP2020107071A

Description

The present invention relates to an object tracking device, and a program for it, that track objects using infrared images and visible images.

In recent years, advances in video analysis technology have led to a variety of camera-based applications. Progress has been particularly notable in the analysis of sports video. For example, the Hawk-Eye system for tennis, which is used at Wimbledon, tracks the tennis ball in three dimensions using video from multiple fixed cameras and makes IN/OUT calls. At the 2014 FIFA World Cup, goal-line technology analyzed video from several fixed cameras to automate goal decisions. The TRACAB system is also known; it installs many stereo cameras in a soccer stadium and tracks every player on the field in real time.

These video analysis techniques often assume video captured by a camera with a temporal resolution of 30 frames per second (fps). When an object moves too fast to follow with the naked eye, such as the tip of a fencing sword or a badminton shuttlecock, it appears in the video with extreme motion blur (reference sign α in FIG. 16). This makes it extremely difficult to measure the object position accurately from the video alone. Motion blur can be reduced by using a high-speed camera that exceeds 30 fps or by using a faster shutter speed. However, high-speed cameras are expensive, and a faster shutter speed lowers the brightness of the video.

Under these constraints, a conventional technique has been proposed that uses an infrared camera to track fast-moving objects robustly (Patent Document 1). This technique attaches retroreflective tape to the tracked target, emits infrared light from the infrared camera, and measures the target's position by detecting the reflected light in the infrared image. Because detection is performed on the infrared image, the noise that causes false detections in a visible image is reduced, and the object can be tracked with high accuracy.

Patent Document 1: JP 2018-78431 A

In the conventional technique described above, when the tracked target moves at high speed, or when the reflective tape does not directly face the infrared camera, the light reflected from the tape becomes weak and the target is difficult to detect in the infrared image. Hereinafter, a failure to detect the tracked target may be referred to as "lost". In addition, when multiple targets are tracked and come close to each other, or when all targets are lost and then redetected, the targets may be swapped. In that case it becomes extremely difficult to draw accurate trajectories, and the trajectories may be swapped.

Accordingly, an object of the present invention is to provide an object tracking device, and a program for it, that can suppress the swapping of trajectories.

In view of the above problem, an object tracking device according to the present invention tracks objects using an infrared image, in which infrared light markers attached to each of the moving objects are captured with infrared light, and a visible image, in which the persons moving the respective objects are captured with visible light. The device comprises infrared light detection means, joint position detection means, feature vector calculation means, attribute information generation means, and trajectory generation means.

In this object tracking device, the infrared light detection means detects the position of the infrared light marker in the infrared image as the position of the object.
The joint position detection means detects the joint positions of each person from the visible image.
The feature vector calculation means calculates a feature vector from the position of the object to each joint position. This feature vector represents the relationship between the position of the tracked object and the posture of the person.

The attribute information generation means uses a classifier that has learned in advance the relationship between object positions and joint positions, selects the person corresponding to the object on the basis of the feature vector, and generates attribute information indicating the correspondence between the object and the person.
The trajectory generation means generates the trajectory of the object on the basis of the object's position and the attribute information.
Because the object tracking device uses attribute information indicating the correspondence between object and person when tracking an object, it can suppress the swapping of trajectories.

The present invention can also be realized as a program that causes hardware resources of a computer, such as a CPU, memory, and hard disk, to operate cooperatively as the object tracking device described above.

According to the present invention, attribute information indicating the correspondence between object and person is used when tracking an object, so the swapping of trajectories can be suppressed. The invention can therefore generate accurate object trajectories and improve tracking robustness.

FIG. 1 is a schematic configuration diagram of the object tracking system according to the embodiment.
FIG. 2 is an explanatory diagram of the sword tip in the embodiment.
FIG. 3 is a diagram showing an example of an infrared image in the embodiment.
FIG. 4 is a block diagram showing the configuration of the object tracking device according to the embodiment.
FIG. 5 is an explanatory diagram of the joint points of a person in the embodiment.
FIG. 6 is a diagram showing an example of a visible image in the embodiment.
FIG. 7 is an explanatory diagram of joint point detection in the embodiment.
FIG. 8 is an explanatory diagram of the selection of the players' joint points in the embodiment.
FIG. 9 is an explanatory diagram of feature vector calculation in the embodiment.
FIG. 10 is an explanatory diagram of classifier training in the embodiment.
FIG. 11 is an explanatory diagram of the determination made by the classifier in the embodiment.
FIG. 12 is an explanatory diagram of trajectory drawing in the embodiment.
FIG. 13 is a flowchart showing the operation of the object tracking device according to the embodiment.
FIG. 14 is an example of an image visualizing the likelihood distribution in a working example.
FIG. 15 is another example of an image visualizing the likelihood distribution in the working example.
FIG. 16 is an explanatory diagram of motion blur in fencing video in the conventional technique.

(Embodiment)
[Overview of the object tracking system]
Embodiments of the present invention are described in detail below with reference to the drawings as appropriate.
An overview of the object tracking system 1 according to an embodiment of the present invention is described with reference to FIG. 1.
In the embodiments below, the tracked target is the tip (object) of the sword held by a fencing player (person). During a fencing bout, the sword tips of both players often move at high speed.

The object tracking system 1 uses a visible/infrared coaxial camera 20, which can capture visible light and infrared light on the same optical axis, and combines the visible image V and the infrared image I to track the positions of two sword tips moving at high speed and to draw their trajectories T (T^1, T^2). As shown in FIG. 1, the object tracking system 1 comprises an infrared light projector 10, the visible/infrared coaxial camera 20, and an object tracking device 30.

The infrared light projector 10 is an ordinary projector that emits infrared light.
As shown in FIG. 2, the infrared light emitted by the infrared light projector 10 is reflected by reflective tape (infrared light marker) 91 attached to the sword tips 90 of both players and is captured by the visible/infrared coaxial camera 20 described later.

The reflective tape 91 reflects the infrared light from the infrared light projector 10. At least one piece of reflective tape 91 is attached to each sword tip 90; there is no particular restriction on its size or the number of pieces. In the example of FIG. 2, one rectangular piece of reflective tape 91 is attached to the side of the sword tip 90. A further piece of reflective tape 91 may be added on the opposite side, or a band of reflective tape 91 may be wrapped all the way around the side (not shown).

The visible/infrared coaxial camera 20 captures visible light and infrared light on the same optical axis and generates a visible image V and an infrared image I with the same number of pixels. In this embodiment, the visible/infrared coaxial camera 20 generates a visible image V (FIG. 5) of a fencing bout and an infrared image I of the reflective tape 91 on the sword tips 90. As shown in FIG. 3, only the two pieces of reflective tape 91 appear in the infrared image I; the players and other scene content do not (they are drawn with broken lines). Because the image coordinates of the sword tip 90 in the visible image V correspond to those of the reflective tape 91 in the infrared image I, the trajectory T can be drawn without any viewpoint conversion in three-dimensional space.

The object tracking device 30 tracks the sword tips 90 of both players using the infrared image I and the visible image V input from the visible/infrared coaxial camera 20. The object tracking device 30 then draws the tracked trajectories T^1 and T^2 of the two sword tips 90 in different colors and composites them onto the visible image V to generate a trajectory composite image F.
In FIG. 1, the trajectory T^1 of the sword tip 90 held by the left player is drawn with a broken line, and the trajectory T^2 of the sword tip 90 held by the right player is drawn with a dash-dot line.

[Configuration of the object tracking device]
The configuration of the object tracking device 30 is described with reference to FIG. 4.
As shown in FIG. 4, the object tracking device 30 comprises infrared light detection means 31, person posture acquisition means 33, object identification means 35, and object tracking means 37.

In this embodiment, infrared images I and visible images V of temporally consecutive frames 1, ..., t-1, t, ... are input to the object tracking device 30, which processes them in sequence. Hereinafter, the current frame is denoted t, the infrared image I of the current frame t is denoted I_t, and the visible image V of the current frame is denoted V_t.

The infrared light detection means 31 detects the position of the sword tip 90 (reflective tape 91) from the infrared image I_t. An example of how the infrared light detection means 31 detects the sword tip position is described below.

<Sword tip position detection method>
First, the infrared light detection means 31 uses equation (1) below to generate a binary infrared difference image from the infrared image I_t of the current frame and the infrared image I_{t-1} of the previous frame, thereby extracting only the moving-object region M_t. That is, the infrared light detection means 31 extracts, as candidate blobs, the moving-object region M^{xy}_t in which the difference between the luminance value I^{xy}_t of pixel (x, y) in the infrared image I_t and the luminance value I^{xy}_{t-1} of pixel (x, y) in the infrared image I_{t-1} exceeds a preset threshold R_bri.

Here, x and y are the horizontal and vertical image coordinates. The threshold R_bri is set in advance to an arbitrary value. In equation (1), '0' is the minimum luminance value and '255' is the maximum luminance value.
The infrared light detection means 31 generates the binary infrared difference image M^{xy}_t to suppress stationary noise blobs, but it may instead extract high-luminance regions of the infrared image I_t as candidate blobs.
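Equation (1) itself is not reproduced in this text. One form consistent with the description is M^{xy}_t = 255 if I^{xy}_t - I^{xy}_{t-1} > R_bri, and 0 otherwise. The following minimal Python sketch illustrates that thresholded frame difference. It assumes a signed difference (an absolute difference would be an equally plausible reading), and the function name `binary_difference` and the list-of-rows image layout are hypothetical choices for illustration.

```python
def binary_difference(curr, prev, r_bri):
    """Return a binary difference image M_t as a list of rows.

    curr, prev: grayscale frames I_t and I_{t-1} (lists of rows, values 0..255).
    r_bri:      preset luminance threshold R_bri.
    Assumption: the difference is signed (I_t - I_{t-1}), so only pixels that
    became brighter are marked as moving.
    """
    return [
        [255 if (c - p) > r_bri else 0 for c, p in zip(row_c, row_p)]
        for row_c, row_p in zip(curr, prev)
    ]
```

A stationary bright pixel (for example a fixed light source) has a difference of zero and is therefore not extracted, which matches the stated purpose of suppressing stationary noise blobs.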

Next, the infrared light detection means 31 applies morphological processing to the extracted candidate blobs to remove small noise blobs. Morphological processing removes small noise blobs through inter-image operations between the original image and a set of images obtained by shifting it by whole pixels in several directions.
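The text does not name the specific morphological operator. A common choice for removing small blobs is an opening (erosion followed by dilation), and the sketch below implements it exactly as the paragraph describes: by combining the image with pixel-shifted copies of itself. The 4-neighbourhood shift set and all function names are assumptions made for illustration.

```python
# Assumed structuring element: the pixel itself plus its 4 direct neighbours.
SHIFTS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def _shifted(img, dy, dx, fill=0):
    """Copy of img whose pixel (y, x) holds img[y + dy][x + dx] (fill outside)."""
    h, w = len(img), len(img[0])
    return [[img[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else fill
             for x in range(w)] for y in range(h)]

def _combine(img, op):
    """Apply op (min or max) pixel-wise across all shifted copies of img."""
    layers = [_shifted(img, dy, dx) for dy, dx in SHIFTS]
    h, w = len(img), len(img[0])
    return [[op(layer[y][x] for layer in layers) for x in range(w)] for y in range(h)]

def erode(img):
    return _combine(img, min)   # a pixel survives only if all neighbours are set

def dilate(img):
    return _combine(img, max)   # a pixel is set if any neighbour is set

def opening(img):
    """Erosion followed by dilation: removes blobs smaller than the element."""
    return dilate(erode(img))
```

With this element, an isolated single pixel disappears, while a 3x3 block is reduced to its centre by the erosion and then regrown into a cross by the dilation.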

Next, the infrared light detection means 31 labels the candidate blobs that remain after morphological processing. Labeling assigns a label (number) to each candidate blob.
Next, the infrared light detection means 31 obtains the position, area, and shape features of each labeled candidate blob. The position of a candidate blob is its center or centroid. The shape features are, for example, the circularity or the aspect ratio of the bounding rectangle.

Next, the infrared light detection means 31 discards candidate blobs whose area is outside the range from a preset minimum area to a preset maximum area. It then discards candidate blobs whose shape features are outside a preset range. Furthermore, if the number of candidate blobs exceeds the upper limit on the number of objects, the infrared light detection means 31 keeps the positions of the two candidate blobs with the largest areas as the positions S^1 and S^2 of the sword tips 90 and discards the other candidate blobs. Here, S^m (S^1, S^2) denotes a sword tip 90 position to which the left/right attribute information described later has not yet been added (m ∈ {1, 2}).
The infrared light detection means 31 then outputs the positions of the two pieces of reflective tape 91 detected from the infrared image I_t, as the positions S^1 and S^2 of the sword tips 90, to the object identification means 35 (feature vector calculation means 351).
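The labeling, measurement, and filtering steps above can be sketched as follows. This is an illustrative outline only: it assumes 4-connected labeling, uses the centroid as the blob position and the bounding-rectangle aspect ratio as the only shape feature, and the names `label_blobs` and `select_tips` and their parameters are hypothetical.

```python
from collections import deque

def label_blobs(mask):
    """4-connected labelling of a 0/255 mask; returns one dict per blob."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y0 in range(h):
        for x0 in range(w):
            if mask[y0][x0] == 0 or seen[y0][x0]:
                continue
            queue, pixels = deque([(y0, x0)]), []
            seen[y0][x0] = True
            while queue:  # breadth-first flood fill of one blob
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            blobs.append({
                "label": len(blobs) + 1,
                "area": len(pixels),
                "position": (sum(xs) / len(xs), sum(ys) / len(ys)),  # centroid (x, y)
                "aspect": (max(xs) - min(xs) + 1) / (max(ys) - min(ys) + 1),
            })
    return blobs

def select_tips(blobs, min_area, max_area, min_aspect, max_aspect, max_objects=2):
    """Drop blobs outside the area/shape ranges, keep the largest max_objects."""
    kept = [b for b in blobs
            if min_area <= b["area"] <= max_area
            and min_aspect <= b["aspect"] <= max_aspect]
    kept.sort(key=lambda b: b["area"], reverse=True)
    return [b["position"] for b in kept[:max_objects]]
```

The returned positions correspond to S^1 and S^2; at this stage they carry no left/right attribute.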

The person posture acquisition means 33 acquires the posture of persons from the visible image V_t and comprises person posture detection means (joint position detection means) 331 and person selection means 333.

The person posture detection means 331 detects each joint point (joint position) of a person from the visible image V_t as that person's posture. The person posture detection means 331 may detect joint points by any method: the joint points may be detected automatically from the visible image V_t or specified manually on it.

In this embodiment, the person posture detection means 331 is described as using "OpenPose", a common posture measurement method (Reference 1).
Reference 1: Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields," In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition 2017 (CVPR2017), pp. 7291-7299

This posture measurement method uses deep learning to measure human posture and detects 18 joint points for each person in the visible image V_t. Hereinafter, as shown in FIG. 5, the joint points of each person are denoted B^n_i. The superscript n is the identification number of a person in the visible image V_t (n ∈ N), where N is the total number of persons in the visible image V_t. The subscript i is the identification number of the joint point (i = 0 to 17).
In FIG. 5, the identification number n is omitted, and broken lines connect adjacent joint points B^n_i.

The joint points B_i comprise the upper-body joint points B_0 to B_7 and B_14 to B_17 and the lower-body joint points B_8 to B_13, which include the hip joints B_8 and B_11. Points that are not joints, such as the head points B_14 to B_17, are also included; because they have distinctive image features, like the eyes and nose, they are treated as joint points as well.

For example, suppose the visible image V_t of FIG. 6 is input to the person posture detection means 331. This visible image V_t is from video of a fencing bout and contains eight persons H (N = 8): the two players H^L and H^R, one referee H^J, and five spectators H^G. The posture measurement method of Reference 1 cannot restrict detection to specified persons H in the visible image V_t, so the joint points B^n_i of all persons H are detected. That is, as shown in FIG. 7, the person posture detection means 331 detects the joint points B^n_i of every person H in the visible image V_t. For legibility, FIG. 7 labels only some of the joint points B^n_i.
The person posture detection means 331 then outputs the visible image V_t and the joint points B^n_i of all persons H detected from it to the person selection means 333.

As described above, the joint points B^n_i of all persons H in the visible image V_t are detected. In fencing video, the players H^L and H^R usually appear larger than the spectators H^G and the referee H^J. The person selection means 333 therefore calculates a torso length l^n from the joint points B^n_i of each person H, as in equation (2) below, and selects a preset number of persons H in descending order of torso length l^n.

When the joint points B^n_i are detected by the posture measurement method of Reference 1, the neck joint is joint point B_1 and the hip joints at the base of each leg are joint points B_8 and B_11, as shown in FIG. 5. Accordingly, as in equation (2), the torso length l^n is defined as the average of the length of the vector from the neck joint B_1 to one hip joint B_8 and the length of the vector from the neck joint B_1 to the other hip joint B_11.

Because a fencing bout is fought between two players H^L and H^R, it suffices to select the two persons with the longest torso lengths l^n, as shown in FIG. 8. The joints used (neck joint B_1 and hip joints B_8 and B_11) and the number of persons to select are set in the person selection means 333 in advance.

In a fencing bout the left and right players H^L and H^R do not change sides, so the joint points B^n_i and the torso length l^n can be described by an attribute indicating the left player H^L or the right player H^R. In this embodiment, the identification number n is replaced with L for the left player H^L and with R for the right player H^R.
The person selection means 333 then outputs the joint points B^L_i and torso length l^L of the left player H^L and the joint points B^R_i and torso length l^R of the right player H^R to the object identification means 35 (feature vector calculation means 351).
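Equation (2) is not reproduced in this text, but the description fixes it: l^n = (|B^n_8 - B^n_1| + |B^n_11 - B^n_1|) / 2. The sketch below computes that torso length and picks the two persons with the longest torsos. How left and right are decided is not stated in this passage; ordering by the x coordinate of the neck joint is an assumption, as are the function names. Undetected joints, which a pose estimator may report, are not handled here.

```python
import math

NECK, HIP_A, HIP_B = 1, 8, 11  # joint indices B_1, B_8, B_11 named in the text

def torso_length(joints):
    """joints: list of 18 (x, y) points for one person, indexed as B_0..B_17.

    Returns the mean of the neck-to-hip distances for both hips.
    """
    neck = joints[NECK]
    return (math.dist(neck, joints[HIP_A]) + math.dist(neck, joints[HIP_B])) / 2.0

def select_players(people, count=2):
    """Pick the `count` persons with the longest torsos, ordered left to right."""
    ranked = sorted(people, key=torso_length, reverse=True)[:count]
    # Assumption: the left player is the one whose neck joint has the smaller x.
    ranked.sort(key=lambda joints: joints[NECK][0])
    return ranked
```

For a two-player bout, `select_players(people)` would return the joint lists for H^L and H^R in that order, and `torso_length` gives l^L and l^R for the normalization used later.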

The object identification means 35 identifies whether each of the two sword tips 90 detected from the fencing video belongs to the left player H^L or the right player H^R, and comprises feature vector calculation means 351 and attribute information generation means 353.

The feature vector calculation means 351 calculates feature vectors from the positions S^1 and S^2 of the sword tips 90 to the joint points B^L_i and B^R_i of the players H^L and H^R. As shown in FIG. 9, the feature vector calculation means 351 calculates the feature vector from the position S^1 of the first sword tip 90 to each joint point B^R_i of the right player H^R. In FIG. 9, the feature vectors are drawn as dash-double-dot arrows. The feature vector calculation means 351 also calculates the feature vector from the position S^1 of the sword tip 90 to each joint point B^L_i of the left player H^L. Because the feature vectors run from the position S^1 of the sword tip 90 toward both the left and right players H^L and H^R, they form a robust feature that takes the relative positions of the two players into account.

Although not illustrated, the feature vector calculation means 351 likewise calculates, for the position S^2 of the second sword tip 90, the feature vector to each joint point B^R_i of the right player H^R and the feature vector to each joint point B^L_i of the left player H^L.

The feature vector is given by equation (3) below and is a 36-dimensional feature because there are 18 joint points B^L_i and 18 joint points B^R_i. The size of the players H^L and H^R in the visible image V_t changes with the zoom of the visible/infrared coaxial camera 20. The feature vector calculation means 351 therefore normalizes (divides) by the torso lengths l^L and l^R of the players H^L and H^R, as in equation (3), to obtain a feature vector that is invariant to the size of the players.
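Equation (3) is not reproduced in this text, so its exact layout is unknown. The sketch below assumes one reading that yields 36 dimensions: each of the 18 joints of each player contributes one scalar, the distance from the sword tip to that joint divided by that player's torso length. A layout based on x and y offsets would be another possibility. The function name and argument order are hypothetical.

```python
import math

def feature_vector(tip, joints_left, torso_left, joints_right, torso_right):
    """Torso-normalised distances from one sword tip to every joint of both players.

    tip:          (x, y) position S^m of a sword tip.
    joints_*:     18 (x, y) joint points B^L_i / B^R_i.
    torso_*:      torso lengths l^L / l^R used for normalisation.
    Assumption: one scalar per joint, giving 18 + 18 = 36 values.
    """
    left = [math.dist(tip, joint) / torso_left for joint in joints_left]
    right = [math.dist(tip, joint) / torso_right for joint in joints_right]
    return left + right
```

Because both the distances and the torso length scale with the zoom, the ratio stays the same when the whole scene is magnified, which is the size invariance the paragraph describes.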

The feature vector calculation means 351 then outputs the calculated feature vectors and the positions S^1 and S^2 of the sword tips 90 to the attribute information generation means 353.

The attribute information generation means 353 uses a classifier trained in advance to select the player H^L or H^R corresponding to each sword tip 90 position S^1, S^2, and generates attribute information indicating the correspondence between the sword tips 90 and the players H^L and H^R. In other words, the attribute information associates the two players H^L and H^R with the positions S^1 and S^2 of the tips 90 of the swords they are wielding.

The attribute information generation means 353 operates in two modes. The first is a learning mode, in which the attribute information generation means 353 trains the classifier. The second is a selection mode, in which it uses the trained classifier to select the players H^L and H^R corresponding to the sword tip 90 positions S^1 and S^2. In this embodiment, the user of the object tracking device 30 switches between the two modes manually.

<Learning mode>
First, the learning mode of the attribute information generation means 353 is described.
The classifier learns the feature vectors of FIG. 9, that is, the relationship between the sword tip 90 positions S^1 and S^2 and the joint points B^L_i and B^R_i of the players H^L and H^R. In this embodiment, the attribute information generation means 353 trains a regression-model classifier using a support vector machine (SVM).

At this time, as shown in FIG. 10, the attribute information generating means 353 trains one classifier for each of the left and right players H^L and H^R. In FIG. 10, SVM regression (L) is the classifier for the left player H^L, and SVM regression (R) is the classifier for the right player H^R. To prepare the training data for SVM regression (L), the positions S^L and S^R of the sword tips 90 are set manually on the visible image V; the position S^L of the sword tip 90 belonging to the left player H^L is given a score of 1.0 (positive example), and the position S^R of the sword tip 90 belonging to the right player H^R is given a score of -1.0 (negative example). Here, S^L and S^R denote sword tip positions to which left/right attribute information has been added. Likewise, in the training data for SVM regression (R), the position S^L of the sword tip 90 belonging to the left player H^L is given a score of -1.0 (negative example), and the position S^R of the sword tip 90 belonging to the right player H^R is given a score of 1.0 (positive example). In general, preparing 100 or more sets of training data is enough to train a highly accurate classifier.
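
The labeling scheme above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name, the frame layout, and the assumption that each regressor is fed the feature vector measured against its own player's joints are all hypothetical. The resulting (feature vector, score) pairs could then be passed to any SVM regression library.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], float]   # (feature vector, target score)
Key = Tuple[str, str]                # (tip label "L"/"R", player "L"/"R")


def build_training_sets(
    frames: Sequence[Dict[Key, Sequence[float]]],
) -> Tuple[List[Sample], List[Sample]]:
    """Build the training sets for SVM regression (L) and SVM regression (R).

    Assumption: frames[k][(tip, player)] is the feature vector from the
    manually labeled tip S^tip to the joints of player H^player.
    """
    train_l: List[Sample] = []
    train_r: List[Sample] = []
    for frame in frames:
        # Regressor L: the left player's tip is the positive example.
        train_l.append((list(frame[("L", "L")]), 1.0))
        train_l.append((list(frame[("R", "L")]), -1.0))
        # Regressor R: the right player's tip is the positive example.
        train_r.append((list(frame[("L", "R")]), -1.0))
        train_r.append((list(frame[("R", "R")]), 1.0))
    return train_l, train_r
```

Each labeled frame contributes one positive and one negative example to each regressor, so about 100 labeled frames yield about 200 samples per classifier under this layout.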

<Selection mode>
Next, the selection mode of the attribute information generating means 353 will be described.
A regression-model classifier does not make a binary decision such as success/failure, positive/negative, or true/false; it outputs its result as a numerical value (likelihood). That is, the regression-model classifiers output, for each of the two sword tips 90, the likelihood that it belongs to the left player H^L and the likelihood that it belongs to the right player H^R. Accordingly, as shown in FIG. 11, the attribute information generating means 353 inputs the feature vectors into SVM regression (L), which corresponds to the left player H^L, and calculates the likelihood of the left player H^L and the likelihood of the right player H^R. It likewise inputs the feature vectors into SVM regression (R), which corresponds to the right player H^R, and calculates the likelihood of the left player H^L and the likelihood of the right player H^R. In this way, the attribute information generating means 353 uses the classifiers for the left and right players H^L and H^R to calculate four likelihoods in total.

Next, the attribute information generating means 353 selects the highest of the four likelihoods. That is, of the four combinations of the two sword tips 90 and the left and right players H^L and H^R, it selects the combination with the highest likelihood. The combination of the remaining sword tip 90 and the remaining player then follows automatically.
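
The selection rule can be sketched in a few lines. The dictionary layout and the function name `assign_tips` are assumptions made for illustration; the likelihood values would come from SVM regression (L) and SVM regression (R).

```python
from typing import Dict, Tuple

Pair = Tuple[int, str]  # (tip index 1 or 2, player label "L" or "R")


def assign_tips(likelihood: Dict[Pair, float]) -> Dict[int, str]:
    """Pick the most likely (tip, player) pair; the rest follows by elimination."""
    best_tip, best_player = max(likelihood, key=likelihood.get)
    other_tip = 2 if best_tip == 1 else 1
    other_player = "R" if best_player == "L" else "L"
    return {best_tip: best_player, other_tip: other_player}


# Both tips score higher for the right player here, so thresholding each
# score independently would give both tips to the same player. Selecting
# the single best pair first avoids that contradiction.
scores = {(1, "L"): 0.4, (1, "R"): 0.7, (2, "L"): 0.1, (2, "R"): 0.9}
assignment = assign_tips(scores)
```

In this example the pair (tip 2, right player) has the highest score, so tip 1 is assigned to the left player by elimination.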

The advantage of calculating four likelihoods is explained below in comparison with binary determination.
If binary determination (a classification model) were applied to the classifiers for the left and right players H^L and H^R, contradictory results could arise in which the same sword tip 90 is assigned to both players. For example, for the same sword tip 90, SVM regression (L) might determine that it belongs to the left player H^L while SVM regression (R) determines that it belongs to the right player H^R, leaving no way to tell which is correct. In contrast, because the attribute information generating means 353 calculates a numerical likelihood rather than a binary decision, it can select the combination of sword tip 90 and player with the highest likelihood, and does not produce contradictory results as binary determination can.

Next, the attribute information generating means 353 generates attribute information indicating the correspondence between each of the two sword tips 90 and the left and right players H^L and H^R, and adds the generated attribute information to the positions S^1 and S^2 of the sword tips 90. The attribute information generating means 353 then outputs the positions S^L and S^R of the sword tips 90, to which the attribute information has been added, to the object tracking means 37 (trajectory drawing means 373).

Returning to FIG. 4, the description of the configuration of the object tracking device 30 continues.
The object tracking means 37 tracks the objects and includes visible image storage means 371 and trajectory drawing means (trajectory generating means) 373.

The visible image storage means 371 is a storage device, such as a memory, HDD (Hard Disk Drive), or SSD (Solid State Drive), that stores the visible image V_t input from the visible/infrared coaxial camera 20. The stored visible image V_t is referenced by the trajectory drawing means 373 described below.

The trajectory drawing means 373 draws the trajectories of the sword tips 90 on the visible image V_t stored in the visible image storage means 371 while referring to the attribute information. Because the trajectory drawing means 373 refers to the attribute information added to the positions S^L and S^R of the sword tips 90, it can suppress swapping of the trajectories T and draw the correct trajectories.

For example, the trajectory drawing means 373 presets a different color for the sword tip 90 of each player, such as red for the left player H^L and green for the right player H^R. Then, as shown in FIG. 12, it generates a trajectory composite image F_t by combining the visible image V_t with the trajectories T. In this trajectory composite image F_t, the trajectory T^1 of the sword tip 90 held by the left player H^L and the trajectory T^2 of the sword tip 90 held by the right player H^R are composited as CG.
The trajectory drawing means 373 then outputs the trajectory composite image F_t to an external device (for example, a display).

[Operation of the object tracking device]
The operation of the object tracking device 30 will be described with reference to FIG. 13.
As shown in FIG. 13, in step S1, the infrared light detection means 31 detects the positions S^1 and S^2 of the sword tips 90 from the infrared image I_t.
For example, the infrared light detection means 31 generates a binary infrared difference image and applies morphological processing to the extracted candidate blobs. Next, the infrared light detection unit 311 labels the candidate blobs that remain after the morphological processing and obtains the position, area, and shape features of each. The infrared light detection means 31 then filters the candidate blobs by area and shape features and detects the positions of the two candidate blobs with the largest areas as the positions S^1 and S^2 of the sword tips 90.
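
The final filtering stage of step S1 might look like the sketch below. It assumes the blobs have already been extracted and labeled; the `Blob` fields, the use of circularity as the shape feature, and all threshold values are hypothetical and are not taken from the embodiment.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Blob(NamedTuple):
    x: float            # centroid x in the infrared image
    y: float            # centroid y in the infrared image
    area: float         # blob area in pixels
    circularity: float  # assumed shape feature in [0, 1]


def detect_tips(
    blobs: Sequence[Blob],
    min_area: float = 4.0,          # hypothetical thresholds
    max_area: float = 400.0,
    min_circularity: float = 0.5,
) -> List[Tuple[float, float]]:
    """Filter candidate blobs, then return the positions of the two largest."""
    candidates = [
        b for b in blobs
        if min_area <= b.area <= max_area and b.circularity >= min_circularity
    ]
    candidates.sort(key=lambda b: b.area, reverse=True)
    return [(b.x, b.y) for b in candidates[:2]]
```

If fewer than two blobs survive the filter, the function returns fewer than two positions, and the caller would need to handle that frame as a missed detection.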

In step S2, the human posture detection means 331 detects the joint points B^n_i of every person H in the visible image V_t. For example, it detects the joint points B^n_i of each person H using OpenPose, a widely used pose estimation method.
In step S3, the person selection means 333 selects, from all the persons H detected in step S2, the two players H^L and H^R with the longest torso lengths l^n.
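
Step S3 can be illustrated as follows, taking the torso length as the distance from the neck joint to the hip joint as stated in the claims. The joint dictionary keys `"neck"` and `"hip"` and the function names are assumptions for this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple

Joints = Dict[str, Tuple[float, float]]  # joint name -> (x, y) in pixels


def torso_length(joints: Joints) -> float:
    """Distance from the neck joint to the hip joint."""
    (nx, ny), (hx, hy) = joints["neck"], joints["hip"]
    return math.hypot(nx - hx, ny - hy)


def select_players(people: Sequence[Joints], count: int = 2) -> List[Joints]:
    """Keep the `count` persons with the longest torso in the image."""
    ranked = sorted(people, key=torso_length, reverse=True)
    return ranked[:count]
```

Persons far from the camera, such as referees or spectators in the background, appear smaller and are therefore dropped by this ranking.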

In step S4, the feature vector calculation means 351 calculates the feature vectors from the positions S^1 and S^2 of the sword tips 90 detected in step S1 to the joint points B^L_i and B^R_i of the players H^L and H^R selected in step S3. At this time, it normalizes the feature vectors by the torso lengths l^n of the players H^L and H^R.
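
A minimal sketch of step S4 is shown below. It assumes the feature vector is the concatenation of the 2D offsets from one sword tip to each joint of one player, divided by that player's torso length; the exact layout of the vector in FIG. 9 is not reproduced here, so the ordering is an assumption.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def feature_vector(tip: Point, joints: Sequence[Point], torso_len: float) -> List[float]:
    """Offsets from the tip to each joint, normalized by torso length."""
    if torso_len <= 0:
        raise ValueError("torso length must be positive")
    tx, ty = tip
    vec: List[float] = []
    for jx, jy in joints:
        vec.append((jx - tx) / torso_len)  # normalized x offset
        vec.append((jy - ty) / torso_len)  # normalized y offset
    return vec
```

Dividing by the torso length makes the feature roughly independent of how large the player appears in the image, so the same classifier can be used when the camera zoom or the player's distance changes.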

In step S5, the attribute information generating means 353 uses the pre-trained classifiers to select the players H^L and H^R corresponding to the positions S^1 and S^2 of the sword tips 90, and generates attribute information indicating the correspondence between the sword tips 90 and the players H^L and H^R.
For example, the attribute information generating means 353 uses the two regression-model classifiers for the left and right players H^L and H^R to calculate four likelihoods. It then selects, of the four combinations of the two sword tips 90 and the left and right players H^L and H^R, the combination with the highest likelihood. Finally, it generates attribute information indicating the correspondence between the two sword tips 90 and the left and right players H^L and H^R, and adds the generated attribute information to the positions S^1 and S^2 of the sword tips 90.

In step S6, the trajectory drawing means 373 draws the trajectories T of the sword tips 90 on the visible image V_t while referring to the attribute information generated in step S5. For example, as shown in FIG. 12, the trajectory drawing means 373 generates a trajectory composite image F_t by combining the visible image V_t with the trajectories T.

[Operation and effects]
As described above, when tracking the sword tips 90, the object tracking device 30 uses attribute information indicating the correspondence between the sword tips 90 and the players H^L and H^R, and can therefore suppress swapping of their trajectories. The object tracking device 30 can thus generate accurate trajectories of the sword tips 90 and improve tracking robustness.

(Modifications)
Although an embodiment of the present invention has been described in detail above, the present invention is not limited to that embodiment and includes design changes and the like that do not depart from the gist of the invention.

The embodiment described above uses a visible/near-infrared coaxial camera, but the present invention is not limited to this. For example, a visible/near-infrared simultaneous imaging camera or a visible/near-infrared multi-wavelength camera can also be used.

[Visible/near-infrared simultaneous imaging camera]
A visible/near-infrared simultaneous imaging camera uses a four-wavelength spectral prism that separates IR (near-infrared light) in addition to RGB, and captures images with one sensor per wavelength, four sensors in total. This camera can output the visible image from the RGB sensors and the infrared image from the IR sensor separately. The object tracking device can therefore acquire a visible image and an infrared image from it and output a trajectory composite image, as in the embodiment described above.

[Visible/near-infrared multi-wavelength camera]
A visible/near-infrared multi-wavelength camera uses a multi-wavelength spectral prism that separates three wavelengths in the near-infrared region in addition to the three RGB colors. Ordinarily, a camera is fitted with an IR cut filter or a visible light cut filter so as to extract the wavelengths given by its visible or near-infrared spectral characteristics, and acquires an image of visible light only or near-infrared light only. The visible/near-infrared multi-wavelength camera, however, is fitted with neither an IR cut filter nor a visible light cut filter; by extracting all the wavelengths given by its basic spectral characteristics, it generates a visible/infrared composite image that combines visible and near-infrared light.

Therefore, when a visible/near-infrared multi-wavelength camera is used, the object tracking system need only include a visible/infrared separation device, which separates the visible/infrared composite image generated by the camera into a visible image and an infrared image.

[Method of generating attribute information]
In the embodiment described above, the likelihoods are obtained from the visible image of the current frame only, but the present invention is not limited to this; likelihoods obtained from the visible images of past frames may also be used. For example, the attribute information generating means averages the likelihoods over a fixed period going back from the current time, and selects the combination of sword tip and player with the highest average likelihood.
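
The temporal averaging described in this modification could be sketched as a sliding window over per-frame likelihoods. The class name, the window length, and the dictionary layout are assumptions for illustration only.

```python
from collections import deque
from typing import Deque, Dict, Tuple

Pair = Tuple[int, str]  # (tip index, player label "L" or "R")


class LikelihoodAverager:
    """Average each (tip, player) likelihood over the most recent frames."""

    def __init__(self, window: int = 30) -> None:  # window length is hypothetical
        self.history: Deque[Dict[Pair, float]] = deque(maxlen=window)

    def update(self, likelihood: Dict[Pair, float]) -> Dict[Pair, float]:
        self.history.append(dict(likelihood))
        n = len(self.history)
        return {
            pair: sum(frame[pair] for frame in self.history) / n
            for pair in likelihood
        }


averager = LikelihoodAverager(window=3)
frame_1 = {(1, "L"): 0.8, (1, "R"): -0.6, (2, "L"): -0.7, (2, "R"): 0.6}
frame_2 = {(1, "L"): 0.1, (1, "R"): 0.3, (2, "L"): 0.2, (2, "R"): 0.0}  # ambiguous frame
averager.update(frame_1)
averaged = averager.update(frame_2)
best_pair = max(averaged, key=averaged.get)
```

Taken alone, the ambiguous second frame would favor pairing tip 1 with the right player; after averaging with the earlier frame, the pairing of tip 1 with the left player is retained.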

[Other modifications]
The embodiment described above takes fencing as an example, but the present invention is not limited to this. The invention can also be applied to sports in which the players do not change sides, such as tennis, badminton, and volleyball. For example, in badminton, the object tracking device can identify the direction of the racket held by each player and associate the racket trajectories with the left and right players, so that the trajectories of the two players' rackets are drawn in different colors.

Furthermore, if the trajectories need not be drawn in different colors, the invention can also be applied to sports in which the players do change positions. For example, the object tracking device can track a badminton shuttlecock and draw its trajectory. The invention can also be applied to kendo and naginata. In addition, it can draw the trajectory of a conductor's baton in an orchestra, or of a sword or similar prop in a drama or film.

In the embodiment described above, the classifier is trained with an SVM, but the present invention is not limited to this. For example, the classifier can be trained with a neural network such as a recurrent neural network (RNN), or with conditional random fields (CRF). The invention can also use classification-model classifiers, not only regression-model classifiers.

In the embodiment described above, 18 joint points are detected, but it is not necessary to detect all of them. The joint points most strongly correlated with a person's posture are considered to be those of the upper body, in particular the head and arms, so it is sufficient to detect these.
In addition, although the method of Reference 1 is described as being used to detect the joint points, the present invention is not limited to this. If a method that detects only the players' joint points is used, the object tracking device need not include the person selection means.

In the embodiment described above, the trajectory drawing means draws the trajectories, but the present invention is not limited to this. For example, the object tracking device may generate trajectory data representing the trajectory of each object and output the generated trajectory data externally.
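
One possible shape for such trajectory data is sketched below. The source does not specify a data format, so the JSON layout, the field names, and the function names are all hypothetical. Positions are keyed by the player label from the attribute information.

```python
import json
from typing import Dict, List, Tuple

Trajectories = Dict[str, List[Dict[str, float]]]


def append_positions(
    trajectories: Trajectories,
    frame_index: int,
    labeled_positions: Dict[str, Tuple[float, float]],
) -> Trajectories:
    """Append one frame of attribute-labeled tip positions (S^L, S^R)."""
    for player, (x, y) in labeled_positions.items():
        trajectories.setdefault(player, []).append(
            {"frame": frame_index, "x": x, "y": y}
        )
    return trajectories


def to_json(trajectories: Trajectories) -> str:
    """Serialize the trajectories for output to an external device."""
    return json.dumps(trajectories, sort_keys=True)


trajectories: Trajectories = {}
append_positions(trajectories, 0, {"L": (10, 20), "R": (200, 30)})
append_positions(trajectories, 1, {"L": (12, 22), "R": (198, 29)})
payload = to_json(trajectories)
```

Because each point is stored under its player label, a downstream renderer can draw the two trajectories in different colors without repeating the identification step.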

In the embodiment described above, the object tracking device is described as independent hardware, but the present invention is not limited to this. For example, the invention can also be realized as a program that causes the hardware resources of a computer, such as its CPU, memory, and hard disk, to operate cooperatively as the object tracking device described above. Such a program may be distributed over a communication line, or written to a recording medium such as a CD-ROM or flash memory and distributed.

To verify that the object tracking device improves identification accuracy, an experiment was conducted in which video of fencing matches was input to the object tracking device of FIG. 1.
The conventional method performed tracking with a particle filter using only the sword tip positions, so tracking errors such as swapped trajectories occurred frequently. In this example, video sequences in which the conventional method had produced tracking errors were used, and the accuracy (%) of identifying the left and right players was calculated for each sequence. The processing speed of the object tracking device (fps: frames per second) was also measured.

The results are shown in Table 1 below. A high accuracy of 97.6% was obtained on average over the five video sequences. Although this example used video sequences in which the conventional method had produced tracking errors, taking human posture into account was found to reduce tracking errors in all of them. The processing speed averaged about 2.8 fps, which was also found to be sufficient for practical use. Real-time processing is considered achievable by, for example, using a GPU (Graphics Processing Unit) or performing the identification at intervals of seconds.

To examine the experimental results, FIG. 14 shows an image in which the likelihood distribution is visualized. As described above, the classifiers (SVM regression) can calculate the likelihoods of the left and right players separately. In this example, the likelihoods of the left and right players were therefore calculated at every pixel of the image and visualized as a heat map according to their values. In FIG. 14, the left player is shown in red and the right player in green, with brightness corresponding to the likelihood value. The detected sword tip positions are marked with circles, and the likelihood at each position is shown as a number.

FIG. 15 shows the likelihood distribution visualized in the same way as FIG. 14 for four other images. As FIGS. 14 and 15 show, the extent of the likelihood distribution narrows as the players move closer together, yet the object tracking device still identified the left and right players correctly. It was thus found that, by learning the relationship between the sword tip position and a person's joint positions, the object tracking device can identify the players with high accuracy.

1 Object tracking system
10 Infrared light projector
20 Visible/infrared coaxial camera
30 Object tracking device
31 Infrared light detection means
33 Human posture acquisition means
331 Human posture detection means (joint position detection means)
333 Person selection means
35 Object identification means
351 Feature vector calculation means
353 Attribute information generating means
37 Object tracking means
371 Visible image storage means
373 Trajectory drawing means (trajectory generating means)

Claims

1. An object tracking device that tracks objects using an infrared image, in which infrared light markers attached to respective moving objects are captured with infrared light, and a visible image, in which persons moving the respective objects are captured with visible light, the object tracking device comprising:
infrared light detection means for detecting the position of the infrared light marker from the infrared image as the position of the object;
joint position detection means for detecting each joint position of the person from the visible image;
feature vector calculation means for calculating a feature vector from the position of the object to each joint position;
attribute information generating means for selecting, based on the feature vector, the person corresponding to the object using a classifier that has previously learned the relationship between the position of the object and the joint positions, and for generating attribute information indicating the correspondence between the object and the person; and
trajectory generating means for generating a trajectory of the object based on the position of the object and the attribute information.

2. The object tracking device according to claim 1, wherein the attribute information generating means learns the regression-model classifier for each person, calculates, from the feature vector, the likelihood of each combination of the object and the person using the classifier learned for each person, and generates the attribute information by selecting the combination of the object and the person with the highest calculated likelihood.

3. The object tracking device according to claim 1, wherein the joint position detection means detects the joint positions of all persons included in the visible image, the object tracking device further comprising person selection means for calculating, for each person, a torso length from the neck joint to the hip joint based on the joint positions detected by the joint position detection means, selecting a predetermined number of the persons in descending order of the calculated torso length, and outputting the joint positions of the selected persons to the feature vector calculation means.

4. The object tracking device according to claim 3, wherein said feature vector calculating means normalizes said feature vector by said torso length.

5. The object tracking device according to claim 1, wherein the joint position detecting means detects at least the joint positions of the head and arms of the person.

6. A program for causing a computer to function as the object tracking device according to any one of claims 1 to 5.