JP7272024B2

JP7272024B2 - Object tracking device, monitoring system and object tracking method

Info

Publication number: JP7272024B2
Application number: JP2019049169A
Authority: JP
Inventors: 宏奥田; 信二高橋
Original assignee: Omron Corp
Current assignee: Omron Corp
Priority date: 2019-03-15
Filing date: 2019-03-15
Publication date: 2023-05-12
Anticipated expiration: 2039-03-15
Also published as: JP2020149642A

Description

The present invention relates to technology for tracking an object in a moving image.

Object tracking, which follows an object detected in one frame of a moving image (a sequence of time-series images), is an important technique in computer vision.

One known object tracking algorithm uses a correlation filter (Non-Patent Document 1). This method uses shape-based image features such as HOG features, and it performs online learning on a region that includes the surroundings of the tracked object. Because it relies on such features, tracking can fail when the object's shape or color changes abruptly. In addition, the online learning uses images that include the area around the tracked object. Against a complex background the method may therefore fail to produce an appropriate response and drift onto the background.

Background subtraction is also known to be used for object tracking (for example, Patent Document 1). In background subtraction, the difference between frames is computed, and regions with large difference values are detected as objects. With this method, the tracked object is lost once it stops moving. For example, if the tracked object is a person, the person is lost as soon as they sit down in a chair, so the method is unsuitable for monitoring an office. Template matching has a related weakness: the object is lost when it deforms and its difference from the template reaches a predetermined threshold. For a person, tracking fails because the person's own movements cause large deformations relative to the template.

In building automation (BA) and factory automation (FA), there is demand for applications that use image sensors to automatically measure the number, positions, and movement paths (flow lines) of people, and that use these measurements to optimally control equipment such as lighting and air conditioning. To capture image information over as wide an area as possible, such applications often use an ultra-wide-angle camera fitted with a fisheye lens. Such a camera is variously called a fisheye camera, omnidirectional camera, or spherical camera; all these terms mean the same thing, and this specification uses "fisheye camera". For the same reason, the camera is mounted high up, for example on a ceiling, so that it has a top-view viewpoint. With a camera in this arrangement, a person is seen from the front when near the periphery of the image and from above when near the center of the image.

In an image captured by a fisheye camera, distortion deforms the appearance of the subject depending on its position in the image plane. Moreover, with a top-view viewpoint, the appearance also changes with the position of the tracked object. In environments with limited processing power, such as embedded devices, the frame rate is likely to be low. As a result, the object's displacement and the change in its features between frames are unusually large. Conventional tracking methods may therefore fail to track accurately.

Patent Document 1: JP-A-2002-157599

Non-Patent Document 1: Henriques, Joao F., et al. "High-speed tracking with kernelized correlation filters." IEEE Transactions on Pattern Analysis and Machine Intelligence 37.3 (2015): 583-596.

The present invention was made in view of the above circumstances. Its object is to provide an object tracking technique that is more accurate than conventional techniques.

To achieve this object, the present invention adopts the following configurations.

A first aspect of the present invention is an object tracking device. It comprises acquisition means for acquiring the position of an object in a first frame image, and tracking means for obtaining the position of the object in a second frame image, which is a frame image after the first frame image. The tracking means includes first sub-tracking means, second sub-tracking means, and position specifying means:

- The first sub-tracking means obtains first coordinates of the object in the second frame image using a first tracking algorithm, based on features extracted from the second frame image.
- The second sub-tracking means obtains second coordinates, which are the centroid of a moving region, based on the inter-frame difference image between the first and second frame images.
- The position specifying means obtains the position of the object in the second frame image based on the first and second coordinates.

The object to be tracked (the "object") may be any object, such as a human body, a face, an animal, or a vehicle. The first tracking algorithm may be any object tracking algorithm, but a tracking algorithm based on local optimization is preferred. Such an algorithm learns an image of a partial region containing the tracked object and tracks the object using what it has learned. This learning is preferably performed by a sequential-learning (online) tracking algorithm.

The position of the object identified by the first tracking algorithm (the first coordinates) may be inaccurate because errors accumulate during learning. Therefore, as described above, the centroid of the moving region (the second coordinates) is obtained from the inter-frame difference image, and this centroid is also taken into account when determining the object's position. This allows the center position of the object to be obtained more accurately and improves tracking accuracy.
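The following is a minimal sketch of how the second coordinates could be computed. It takes the absolute difference of two grayscale frames and returns the difference-weighted centroid of the pixels that changed. Frames are assumed to be nested lists of 8-bit intensities. The function names and the optional `threshold` are illustrative assumptions; the patent does not specify a weighting or a threshold.

```python
def frame_difference(prev, curr):
    """Absolute per-pixel difference between two grayscale frames (lists of rows)."""
    return [[abs(c - p) for p, c in zip(prow, crow)] for prow, crow in zip(prev, curr)]


def difference_centroid(diff, threshold=0):
    """Centroid (x, y) of pixels whose difference exceeds `threshold`, weighted by
    the difference value. Returns None when nothing moved."""
    total = sx = sy = 0.0
    for y, row in enumerate(diff):
        for x, d in enumerate(row):
            if d > threshold:
                total += d
                sx += d * x
                sy += d * y
    if total == 0:
        return None
    return (sx / total, sy / total)


# A 2x2 bright patch appears at columns 2-3, rows 1-2 of an otherwise dark frame.
prev = [[0] * 5 for _ in range(5)]
curr = [row[:] for row in prev]
for y in (1, 2):
    for x in (2, 3):
        curr[y][x] = 100

print(difference_centroid(frame_difference(prev, curr)))  # (2.5, 1.5)
print(difference_centroid(frame_difference(curr, curr)))  # None: no motion, no difference
```

The second call shows the limitation described for background subtraction: an object that stops moving produces no difference at all. This is one reason the difference centroid is used only to correct the first coordinates, not to replace them.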

The second sub-tracking means may operate in three steps:

1. It uses a color histogram of a region containing the object, generated from at least the first frame image, to build a likelihood map (heat map). This map represents the likelihood that the object is present at each position in the second frame image.
2. It multiplies the inter-frame difference image between the first and second frame images by the likelihood map to generate an adjusted difference image.
3. It obtains, as the second coordinates, the centroid of the pixel region in the adjusted difference image, weighted according to the magnitude of the difference.

Considering the foreground likelihood from the color histogram, and not only the inter-frame difference, further improves tracking accuracy. A centroid determined from the inter-frame difference alone is affected by moving objects other than the tracked object. Taking the foreground likelihood into account eliminates or reduces this effect.
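The adjustment step can be sketched as histogram back-projection followed by an element-wise product. This is a toy illustration built on choices the patent does not state:

- RGB pixels are `(r, g, b)` tuples.
- Each channel is quantized into 4 bins.
- The likelihood is scaled so that the most frequent target color maps to 1.0.
- The helper names are hypothetical.

For brevity, the same toy frame stands in for both the first frame (where the histogram is built) and the second frame (where it is back-projected).

```python
from collections import Counter


def quantize(rgb, bins_per_channel=4):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse color bin."""
    step = 256 // bins_per_channel
    return tuple(c // step for c in rgb)


def color_histogram(image, box, bins_per_channel=4):
    """Normalized color histogram of the pixels inside box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    counts = Counter(quantize(image[y][x], bins_per_channel)
                     for y in range(y0, y1) for x in range(x0, x1))
    n = sum(counts.values())
    return {color: c / n for color, c in counts.items()}


def likelihood_map(image, hist, bins_per_channel=4):
    """Histogram back-projection, scaled so the most frequent target color maps to 1.0."""
    peak = max(hist.values())
    return [[hist.get(quantize(px, bins_per_channel), 0.0) / peak for px in row]
            for row in image]


def adjusted_difference(diff, likelihood):
    """Element-wise product of the difference image and the likelihood map."""
    return [[d * l for d, l in zip(drow, lrow)] for drow, lrow in zip(diff, likelihood)]


def centroid(img):
    """Value-weighted centroid (x, y) of an image; None if all values are zero."""
    total = sum(v for row in img for v in row)
    if total == 0:
        return None
    sx = sum(v * x for row in img for x, v in enumerate(row))
    sy = sum(v * y for y, row in enumerate(img) for v in row)
    return (sx / total, sy / total)


# Toy scene: a red target (top left) and an unrelated blue moving object (right).
RED, BLUE, BG = (200, 0, 0), (0, 0, 200), (0, 0, 0)
frame = [[RED, RED, BG, BLUE],
         [BG,  BG,  BG, BLUE],
         [BG,  BG,  BG, BG]]
diff = [[50, 50, 0, 80],
        [0,  0,  0, 80],
        [0,  0,  0, 0]]
hist = color_histogram(frame, (0, 0, 2, 1))  # target box: the two red pixels
adjusted = adjusted_difference(diff, likelihood_map(frame, hist))
print(centroid(diff), centroid(adjusted))
```

Without the likelihood map, the blue distractor pulls the centroid to x ≈ 2.04. With the map applied, the centroid stays on the red target at (0.5, 0.0).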

The second sub-tracking means may also set two regions in the adjusted difference image: a central region centered on the position where the object is estimated to be, and a peripheral region surrounding it. It then computes the centroid by giving pixels in the central region and pixels in the peripheral region different weights. The weights may be fixed values, or values that depend on the areas of the central and peripheral regions.
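Below is a sketch of the region-weighted centroid. It assumes square regions defined by Chebyshev distance from the estimated position, and fixed weights of 1.0 (central) and 0.5 (peripheral). The region shapes, radii, weights, and function names are illustrative assumptions, and pixels outside both regions are simply ignored here.

```python
def region_of(x, y, center, r_in, r_out):
    """Classify pixel (x, y) by Chebyshev distance from center = (cx, cy):
    'center' within r_in, 'periphery' within r_out, otherwise None (ignored)."""
    d = max(abs(x - center[0]), abs(y - center[1]))
    if d <= r_in:
        return "center"
    if d <= r_out:
        return "periphery"
    return None


def region_weighted_centroid(adj, center, r_in, r_out, w_center=1.0, w_periph=0.5):
    """Centroid of an adjusted difference image in which central-region pixels
    count with weight w_center and peripheral-region pixels with w_periph."""
    total = sx = sy = 0.0
    for y, row in enumerate(adj):
        for x, d in enumerate(row):
            region = region_of(x, y, center, r_in, r_out)
            if region is None or d <= 0:
                continue
            w = w_center if region == "center" else w_periph
            total += w * d
            sx += w * d * x
            sy += w * d * y
    return None if total == 0 else (sx / total, sy / total)


adj = [[0] * 5 for _ in range(5)]
adj[2][2] = 100  # difference at the estimated center
adj[2][4] = 100  # equal difference two pixels to the right, in the periphery
cx, cy = region_weighted_centroid(adj, center=(2, 2), r_in=1, r_out=2)
print(cx, cy)  # about 2.667, 2.0
```

With equal weights the centroid would be (3.0, 2.0); down-weighting the periphery pulls it toward the estimated center. Area-dependent weights, which the text also allows, would only change how `w_center` and `w_periph` are chosen.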

The second sub-tracking means may also determine a weighting factor based on the sum of differences in the central region and the sum of differences in the peripheral region. It then obtains the position of the object as the weighted average of the first and second coordinates using that factor. For example, the weight for the second coordinates may be set larger as the sum of differences in the central region grows. Alternatively, the weighting factor may be set larger as the ratio of the central-region sum to the peripheral-region sum increases, or as the total of the two sums increases. The weight for the second coordinates can be regarded as the strength of the correction, or application rate, when the first coordinates are corrected using the second coordinates.
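One possible weighting rule consistent with the text works in two steps. First, take the share of the total difference that falls in the central region, S_c / (S_c + S_p), as the application rate of the second coordinates. Then blend the two coordinates linearly. This rule grows with the central-to-peripheral ratio. The formula, the `max_rate` cap, and the function names are assumptions for illustration; the embodiment uses a function for this purpose (FIG. 11) whose form is not given here.

```python
def application_rate(sum_center, sum_periph, max_rate=1.0):
    """Hypothetical application rate of the second coordinates: the share of the
    difference mass that lies in the central region, scaled by max_rate."""
    total = sum_center + sum_periph
    if total <= 0:
        return 0.0
    return max_rate * sum_center / total


def blend(p1, p2, alpha):
    """Weighted average (1 - alpha) * p1 + alpha * p2 of two (x, y) points."""
    return tuple((1 - alpha) * a + alpha * b for a, b in zip(p1, p2))


first = (10.0, 10.0)   # first coordinates (correlation-filter tracker)
second = (14.0, 18.0)  # second coordinates (difference centroid)
alpha = application_rate(sum_center=300.0, sum_periph=100.0)
print(alpha, blend(first, second, alpha))  # 0.75 (13.0, 16.0)
```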

The central region can be determined from the position of the object in the first frame image and the moving speed of the object in the first frame image.
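The center region could be placed by extrapolating the previous position with a velocity estimated from the last two positions. This sketch assumes constant velocity, which the patent does not state. The helper names, the unit frame interval `dt`, and the half-size parameters are hypothetical.

```python
def estimate_velocity(pos_before, pos_prev, dt=1.0):
    """Velocity (pixels per frame) from the two most recent positions."""
    return ((pos_prev[0] - pos_before[0]) / dt, (pos_prev[1] - pos_before[1]) / dt)


def predict_center(pos_prev, velocity, dt=1.0):
    """Constant-velocity extrapolation of the object position to the next frame."""
    return (pos_prev[0] + velocity[0] * dt, pos_prev[1] + velocity[1] * dt)


def central_box(center, half_w, half_h):
    """Axis-aligned central region (x0, y0, x1, y1) around the predicted center."""
    cx, cy = center
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


v = estimate_velocity((10, 20), (13, 24))  # (3.0, 4.0)
c = predict_center((13, 24), v)            # (16.0, 28.0)
print(central_box(c, half_w=4, half_h=4))  # (12.0, 24.0, 20.0, 32.0)
```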

When the position of the object determined by the first sub-tracking means is sufficiently reliable, the correction effect (application rate) of the second coordinates may be set smaller than otherwise, or the correction with the second coordinates may be skipped. That is, the tracking means may still take a weighted average of the first and second coordinates, but give the second coordinates a smaller weight when the likelihood (reliability) of the first coordinates is equal to or greater than a second threshold.

Likewise, when the object positions determined by the first and second sub-tracking means are sufficiently close, the correction effect (application rate) of the second coordinates may be set smaller than otherwise, or the correction may be skipped. That is, the position specifying means may take the weighted average of the first and second coordinates with a smaller weight on the second coordinates whenever the difference between the two coordinates is less than a third threshold.
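Both rules can be folded into a single adjustment of the application rate. This sketch rests on several assumptions:

- The confidence is a score in [0, 1], for example the peak value of the response map.
- The distance between the coordinates is Euclidean (`math.dist`, Python 3.8+).
- `conf_threshold` and `dist_threshold` stand in for the "second threshold" and "third threshold", whose values the text leaves unspecified.
- `damping=0.0` corresponds to skipping the correction entirely.

All names and values are hypothetical.

```python
import math


def adjust_rate(alpha, confidence, p1, p2,
                conf_threshold=0.6, dist_threshold=3.0, damping=0.0):
    """Reduce the application rate `alpha` of the second coordinates when the first
    tracker is already confident (confidence >= conf_threshold, the 'second threshold')
    or the two estimates already agree (distance < dist_threshold, the 'third threshold').
    damping=0.0 skips the correction entirely in those cases."""
    if confidence >= conf_threshold or math.dist(p1, p2) < dist_threshold:
        return alpha * damping
    return alpha


print(adjust_rate(0.8, 0.9, (0, 0), (10, 0)))               # 0.0 (confident)
print(adjust_rate(0.8, 0.2, (0, 0), (1, 1)))                # 0.0 (estimates agree)
print(adjust_rate(0.8, 0.2, (0, 0), (3, 4)))                # 0.8 (correction applied)
print(adjust_rate(0.8, 0.9, (0, 0), (10, 0), damping=0.5))  # 0.4
```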

The first tracking algorithm used by the first sub-tracking means is not particularly limited. For example, it may find the position of the object in the second frame image using a correlation filter built from features obtained near the object in the first frame image. The first tracking algorithm may also be a sequential-learning algorithm that learns each time the position of the object is determined.

The image processed in the present invention may be a fisheye image obtained by a fisheye camera. A "fisheye camera" is a camera equipped with a fisheye lens, and it can capture images at a much wider angle than a normal camera. Omnidirectional cameras, spherical cameras, and fisheye cameras are all types of ultra-wide-angle camera, and the terms are used interchangeably here. The fisheye camera need only be installed so that it looks down on the detection target area from above. It is typically installed with its optical axis pointing vertically downward, but the optical axis may be tilted from the vertical.

Fisheye images are highly distorted, so an object's features change greatly between frames, especially at low frame rates, and the tracker frequently drifts onto the background. Furthermore, when the camera's optical axis points vertically downward, the viewpoint from which the object is seen changes with the object's position in the image. Especially at low frame rates, the object therefore deforms greatly and tracking frequently fails. The present invention nevertheless enables accurate tracking even for such fisheye images, and even with the camera's optical axis pointing vertically downward. That said, the images processed by the present invention are not limited to fisheye images. They may also be ordinary images, such as images with little distortion or images with a high frame rate.

A second aspect of the present invention provides an object tracking method comprising an acquisition step and a tracking step. The acquisition step acquires the position of an object in a first frame image. The tracking step obtains the position of the object in a second frame image, which is a frame image after the first frame image. The tracking step itself consists of three sub-steps:

1. Obtaining first coordinates of the object in the second frame image using a first tracking algorithm, based on features extracted from the second frame image.
2. Obtaining second coordinates, which are the centroid of a moving region, based on the inter-frame difference image between the first and second frame images.
3. Obtaining the position of the object in the second frame image based on the first and second coordinates.

The present invention may be regarded as an object tracking device having at least part of the above means, or as an image processing device or a monitoring system. It may also be regarded as an object tracking method, image processing method, or monitoring method that includes at least part of the above processing. Finally, it may be regarded as a program that implements such a method, or as a non-transitory recording medium storing that program. The above means and processes can be combined with one another wherever possible to constitute the present invention.

According to the present invention, objects can be tracked more accurately than with conventional techniques.

FIG. 1 is a diagram showing an application example of the person tracking device according to the present invention.
FIG. 2 is a diagram showing the configuration of a monitoring system that includes the person tracking device.
FIG. 3 is a detailed functional block diagram of the tracking unit.
FIG. 4 is a flowchart of the overall processing performed by the person tracking device.
FIG. 5 is a flowchart of the learning process.
FIG. 6 is a flowchart of the tracking process.
FIG. 7 is a flowchart of the target center correction process within the tracking process.
FIG. 8 is a flowchart of the masking of other tracking targets within the target center correction process.
FIG. 9 is a diagram explaining the learning process and the tracking process (including the correction process) in this embodiment.
FIG. 10 is a diagram explaining the adjusted difference image, the centroid of the difference region, and the sums of differences in the central and peripheral regions.
FIG. 11 is a diagram explaining the function used to obtain the application rate.
FIG. 12 is a diagram explaining the masking of other tracking targets.

<Application example>
An application example of the object tracking device according to the present invention is described with reference to FIG. 1. The person tracking device 1 analyzes fisheye images obtained by a fisheye camera 10 installed above a tracking target area 11 (for example, on a ceiling 12). From these images it detects and tracks the people 13 present in the tracking target area 11. The person tracking device 1 detects, recognizes, and tracks people 13 passing through the tracking target area 11 in settings such as offices and factories. In the example of FIG. 1, bounding boxes mark the regions of four human bodies detected in the fisheye image. The detection results of the person tracking device 1 are output to an external device. They are used, for example, to count people, to control equipment such as lighting and air conditioning, to monitor suspicious persons, and to analyze movement paths.

In this application example, the object tracking algorithm is a tracking algorithm based on local optimization. The algorithm learns an image of a partial region containing the tracked object, and it tracks by locating a region with features similar to those of the object. Because the object's surroundings are also learned, the algorithm may fail to predict the object's position when the background changes in complex ways. It may then mistake the background for the object, a malfunction called drift. To prevent this, this application example corrects the object position obtained by the local-optimization algorithm using a second object position, the difference centroid, obtained from the inter-frame difference image. This correction suppresses drift onto the background.

More specifically, the difference centroid and the strength of its correction are both computed from several inputs:

- the difference image between the previous and current frame images
- the color histogram
- the moving velocity (direction and speed) of the tracked object
- the distribution of differences between the central and peripheral regions

This improves basic tracking performance, keeps the tracker from sticking to the background, and yields more accurate tracking.

<Monitoring system>
An embodiment of the present invention is described with reference to FIG. 2. FIG. 2 is a block diagram showing the configuration of a monitoring system to which the person tracking device according to the embodiment is applied. The monitoring system 2 includes a fisheye camera 10 and a person tracking device 1.

The fisheye camera 10 is an imaging device with an optical system that includes a fisheye lens and an image sensor such as a CCD or CMOS sensor. As shown in FIG. 1, the fisheye camera 10 may, for example, be installed on the ceiling 12 of the tracking target area 11 with its optical axis pointing vertically downward, capturing an omnidirectional (360-degree) image of the area. The fisheye camera 10 is connected to the person tracking device 1 by wire (USB cable, LAN cable, etc.) or wirelessly (Wi-Fi, etc.), and the image data it captures is fed into the person tracking device 1. The image data may be monochrome or color, and its resolution, frame rate, and format are arbitrary. This embodiment assumes color (RGB) images captured at 10 fps (10 frames per second).

The person tracking device 1 of this embodiment has the following units: an image input unit 20, a human body detection unit 21, a learning unit 22, a storage unit 23, a tracking unit 24, and an output unit 28.

The image input unit 20 captures image data from the fisheye camera 10. It passes the captured image data to the human body detection unit 21 and the tracking unit 24. The image data may also be stored in the storage unit 23.

The human body detection unit 21 detects human bodies in the fisheye image using a human body detection algorithm. The bodies it detects become targets of the tracking process performed by the tracking unit 24. The human body detection unit 21 may detect only persons who newly appear in the image, and it may exclude the areas around tracked persons from detection. Alternatively, a tracking-by-detection scheme may be used. In that scheme, the human body detection unit 21 detects persons in the entire image at fixed time or frame intervals, and the tracking unit 24 then performs tracking.

The learning unit 22 learns the features of the human body to be tracked and stores the results in the storage unit 23. It learns from the image of a human body detected by the human body detection unit 21 or identified by the tracking unit 24. Specifically, it obtains a correlation filter for evaluation based on shape features and a color histogram for evaluation based on color features. The learning unit 22 learns every frame. It updates the stored model by blending the result from the current frame into the past learning results with a predetermined coefficient.
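Correlation-filter trackers commonly blend the current-frame result into the stored model by linear interpolation. The sketch below assumes this form for the "predetermined coefficient", with a hypothetical learning rate `eta`. It treats the model (filter coefficients or histogram bins) as a flat list of numbers.

```python
def update_model(old, new, eta=0.1):
    """Blend the current-frame learning result `new` into the stored model `old`:
    (1 - eta) * old + eta * new. The first frame simply initializes the model."""
    if old is None:
        return list(new)
    return [(1 - eta) * o + eta * n for o, n in zip(old, new)]


model = None
for frame_result in ([0.0, 1.0], [1.0, 1.0]):
    model = update_model(model, frame_result, eta=0.25)
print(model)  # [0.25, 1.0]
```

A small `eta` makes the model adapt slowly and resist brief occlusions. A large `eta` follows appearance changes faster but also absorbs background more readily, which is the drift problem the correction step addresses.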

The storage unit 23 stores the learning results produced by the learning unit 22. It also stores hyperparameters for the learning and tracking processes. These include the features to be used, the parameters of each feature, the learning coefficient, and the weighting factor (function) used when correcting the position with the difference centroid.

The tracking unit 24 identifies the position of the person to be tracked in the current frame image. For the first frame, the target region is the area around the position detected by the human body detection unit 21. Within that region, the tracking unit 24 finds the position of an object whose features resemble those of the detected person. From then on, the target region is the area around the position the tracking unit 24 identified in the previous frame image, and the unit locates the tracked person within it in the current frame image.

FIG. 3 is a more detailed functional block diagram of the tracking unit 24. The tracking unit 24 includes a first sub-tracking unit 25, a second sub-tracking unit 26, and a position correction unit 27.

- The first sub-tracking unit 25 includes a feature extraction unit 251, a response map generation unit 252, and a provisional center determination unit 253.
- The second sub-tracking unit 26 includes a difference extraction unit 261, a heat map generation unit 262, a difference image adjustment unit 263, an other-target region mask unit 264, a center prediction unit 265, a centroid determination unit 266, and a weight determination unit 267.

The feature extraction unit 251 extracts two kinds of features of the object from the target region. For shape it extracts HOG features, and for color it extracts a color histogram.

The response map generation unit 252 generates a response map (likelihood map) from the extracted features and the correlation filter stored in the storage unit 23. For each position in the target region, the map represents the likelihood that the tracked object is present there.

The provisional center determination unit 253 determines a provisional position of the tracked object in the current frame image based on the response map. Specifically, it finds the position in the target region where the response map reaches its maximum. It provisionally sets this position as the center position (first coordinates) of the tracked object in the current frame image. The global maximum need not be used, however. When the response map has multiple peaks, other factors may be taken into account and a local peak other than the global maximum may be chosen as the provisional center.
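The following is a naive illustration of the response map and the provisional center. It correlates a feature map with a learned filter at every valid offset and takes the argmax. Real correlation-filter trackers such as KCF work on multi-channel features (for example HOG) and evaluate the correlation in the Fourier domain. This single-channel, spatial-domain version and its function names are simplifying assumptions.

```python
def response_map(features, filt):
    """Naive 'valid' 2-D cross-correlation of a single-channel feature map with a filter."""
    fh, fw = len(filt), len(filt[0])
    H, W = len(features), len(features[0])
    return [[sum(features[y + i][x + j] * filt[i][j]
                 for i in range(fh) for j in range(fw))
             for x in range(W - fw + 1)]
            for y in range(H - fh + 1)]


def provisional_center(resp, fh, fw):
    """Global-maximum position of the response, shifted to the filter center:
    the provisional (first) coordinates as (x, y) in feature-map pixels."""
    y, x = max(((y, x) for y in range(len(resp)) for x in range(len(resp[0]))),
               key=lambda p: resp[p[0]][p[1]])
    return (x + fw // 2, y + fh // 2)


filt = [[0, 1, 0],
        [1, 2, 1],
        [0, 1, 0]]
features = [[0] * 5 for _ in range(5)]
features[2][3] = 1  # single strong feature at (x=3, y=2)
resp = response_map(features, filt)
print(provisional_center(resp, 3, 3))  # (3, 2)
```

Choosing a local peak other than the global maximum, as the text permits, would replace the `max` call with a peak search plus whatever additional criteria are used.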

The second sub-tracking unit 26 determines the center position of the tracked object in the current frame image from the difference between the previous and current frame images. Each functional unit of the second sub-tracking unit 26 is described in detail below with reference to flowcharts.

The position correction unit 27 identifies the center position of the object in the current frame image. It does so by correcting the position found by the first sub-tracking unit 25 with the position found by the second sub-tracking unit 26. The position correction unit 27 is an example of the position specifying means, which specifies the position of the object in the current frame image. Specifically, the final center position of the object is the weighted sum (weighted average) of the two positions. The processing is described in detail below with reference to flowcharts.

The output unit 28 outputs information such as fisheye images, detection results, and tracking results to an external device. For example, the output unit 28 may show the information on a display, transfer it to a computer, or send information or control signals to a lighting device, an air conditioner, or FA (factory automation) equipment.

The person tracking device 1 can be implemented by, for example, a computer that includes a CPU (processor), memory, and storage. In that case, the configurations shown in FIGS. 2 and 3 are realized by loading a program stored in the storage into the memory and having the CPU execute it. The computer may be a general-purpose computer such as a personal computer, server computer, tablet terminal, or smartphone, or an embedded computer such as an onboard computer. Alternatively, all or part of the configuration shown in FIG. 2 may be implemented with an ASIC, an FPGA, or the like, or realized through cloud computing or distributed computing.

＜Overall processing＞
FIG. 4 is an overall flowchart of the person tracking process performed by the monitoring system 2. The overall flow of the person tracking process is described below with reference to FIG. 4.

First, in step S101, the user sets the learning and tracking hyperparameters of the person tracking device 1. Examples of hyperparameters include the features to be used, the parameters of each feature, the learning coefficients, and the weighting coefficient (function) used when correcting with the difference center of gravity. The entered hyperparameters are stored in the storage unit 23.

Next, in step S102, the person tracking device 1 acquires the target region. The target region consists of the region where the person to be tracked is present together with its surroundings, so the tracked person is likely to be found within it. It can also be described as the region processed by the tracking unit 24. In this embodiment, the human body detection unit 21 detects the initial position of the tracked person. However, the initial position may instead be supplied by other means, for example by user input.

Thereafter, steps S104 to S107 are repeated. The process ends when the termination condition checked in step S103 is satisfied. The termination condition can be, for example, loss of the tracked person (the person leaving the frame) or the end of the video.

In step S104, the image input unit 20 receives one frame of fisheye image from the fisheye camera 10. The subsequent processing could be performed on a planar (developed) image in which the fisheye distortion has been corrected. However, the monitoring system 2 of this embodiment uses the fisheye image as is, still distorted, for detection and tracking.

In step S105, it is determined whether the current frame is the first image. Here, the first image is the frame image for which the initial position of the tracked person is given. Typically, it is the frame image in which the human body detection unit 21 detected the tracked person.

If the current frame comes after the first image, the process proceeds to step S106, where the tracking unit 24 executes the tracking process. The tracking process is described in detail later.

In step S107, the learning unit 22 executes the learning process based on the region in which the target person is present in the current frame image. The learning process is described in detail later.

In this way, the tracking process S106 locates the tracked person in every frame, which realizes tracking. The tracking method of this embodiment is a sequential-learning tracking algorithm that learns the features of the tracked person in every frame.

＜Learning process＞
FIG. 5 is a flowchart showing the details of the learning process in step S107. FIG. 9 illustrates both the learning process and the tracking process that uses its results. The learning process is described below with reference to FIGS. 5 and 9.

The learning unit 22 first cuts out the target region 904 from the current frame image (S201). Here, the T-th frame is assumed to be the current frame image. As shown in FIG. 9, the target region 904 includes a foreground region 902 of the person and a background region 903. The foreground region 902 is where the tracked person is present, and the background region 903 is where the tracked person is not. The size of the background region 903 is determined according to the size of the foreground region 902. For example, it is chosen so that the foreground region 902 occupies a predetermined ratio (e.g., 1/3) of the overall size of the target region 904. At the end of the tracking process, the target region is updated so that its center lies at the position of the tracked person (step S306 in FIG. 6). The center 901 of the target region 904 therefore coincides with the center position of the tracked person.

The learning unit 22 acquires a lightness feature image 906 and a HOG feature image 907 as feature images of the target region 904 (S202). The HOG feature is a histogram of luminance gradient orientations in local regions and can be regarded as a feature representing the shape and contour of an object. HOG is used here, but other features that represent shape and contour, such as LBP, SIFT, or SURF features, may be used instead. A luminance image may also be used in place of the lightness image. If the tracking process (step S302 in FIG. 6) has already computed the lightness and HOG feature images, they need not be computed again.

The learning unit 22 stores the lightness feature image 906 in the storage unit 23 so that it can be used to compute the inter-frame difference image in the next frame (S203).

The learning unit 22 also acquires the color histogram 908 within the target region 904 (S204). Specifically, it acquires separate color histograms for the foreground region 902 and the background region 903. A color histogram is a feature that represents color, and Color Names (CN) features can be used as an alternative color feature. A luminance histogram may also be used, which represents luminance instead of color. The learning unit 22 stores the obtained color histogram 908 in the storage unit 23.

The learning unit 22 obtains a correlation filter 909 whose response peaks at the center of the target (S205). Specifically, after the HOG feature image 907 is extracted, the correlation filter 909 is chosen so that, when applied to the correlation of that feature image with itself, it produces the response closest to an ideal response with a single peak at the center. When the correlation filter is computed in Fourier space, the feature image may be multiplied by a window function. The HOG feature image 907 is stored in the storage unit 23 because the tracking process for the next frame uses it when applying the correlation filter.
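The document does not give the filter's closed form. As an illustration only, the sketch below substitutes the widely used MOSSE-style solution for a single 1-D feature channel. In the frequency domain, H* = G·F* / (F·F* + λ), where F is the DFT of the training feature, G is the DFT of an ideal Gaussian response peaking at the center, and λ is a small regularizer. The values of λ and sigma are assumptions. Practical trackers use 2-D, multi-channel HOG features and an FFT. The naive O(n²) DFT here only keeps the example dependency-free.

```python
import cmath
import math
from typing import List

def dft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform; the inverse includes the 1/n factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def gaussian_target(n: int, sigma: float = 1.0) -> List[float]:
    """Ideal response: a single Gaussian peak at index n // 2 (the target center)."""
    mid = n // 2
    return [math.exp(-((i - mid) ** 2) / (2 * sigma ** 2)) for i in range(n)]

def train_filter(f: List[float], sigma: float = 1.0, lam: float = 1e-3) -> List[complex]:
    """Conjugate filter H* = G F* / (F F* + lam), computed element-wise per frequency."""
    F = dft(f)
    G = dft(gaussian_target(len(f), sigma))
    return [g * fk.conjugate() / (fk * fk.conjugate() + lam) for g, fk in zip(G, F)]

def response(h_conj: List[complex], z: List[float]) -> List[float]:
    """Response map: inverse DFT of H* multiplied element-wise by the DFT of new features z."""
    Z = dft(z)
    return [v.real for v in dft([h * zk for h, zk in zip(h_conj, Z)], inverse=True)]
```

Correlation in the Fourier domain is circular. Shifting the input cyclically by k samples therefore moves the response peak by k samples, and the tracking step reads the object's displacement from that shift.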

If this is the first learning pass (S206: YES), the color histogram and correlation filter generated in steps S204 and S205 are stored in the storage unit 23 as they are. If it is the second or a later pass (S206: NO), the process proceeds to step S207.

In step S207, the learning unit 22 combines the previously obtained correlation filter (the one stored in the storage unit 23) with the correlation filter obtained in step S205 of the current pass to produce a new correlation filter, and stores the result in the storage unit 23. Similarly, in step S208, the learning unit 22 combines the previously obtained color histogram (stored in the storage unit 23) with the color histogram obtained in step S204 of the current pass to produce a new color histogram, and stores the result in the storage unit 23. The weights (learning coefficients) used for the combination may be chosen as appropriate.
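Steps S207 and S208 leave the combination rule open. The sketch below assumes a common choice: linear interpolation with a learning coefficient `eta`, so that model = (1 - eta) * old + eta * new. The rule is applied element-wise to filter coefficients and bin-wise to histograms, with missing bins counted as 0. The function names are hypothetical.

```python
from typing import Dict, Hashable, List

def blend_lists(old: List[complex], new: List[complex], eta: float) -> List[complex]:
    """Linear-interpolation update of filter coefficients with learning coefficient eta."""
    return [(1 - eta) * o + eta * n for o, n in zip(old, new)]

def blend_histograms(old: Dict[Hashable, float], new: Dict[Hashable, float],
                     eta: float) -> Dict[Hashable, float]:
    """Same update applied to histogram bins; a bin absent from one histogram counts as 0."""
    keys = set(old) | set(new)
    return {k: (1 - eta) * old.get(k, 0.0) + eta * new.get(k, 0.0) for k in keys}
```

A small `eta` makes the model change slowly and resist transient disturbances. A large `eta` lets it adapt quickly to changes in appearance, at the cost of drifting more easily.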

＜Tracking process＞
FIG. 6 is a flowchart showing the details of the tracking process in step S106. FIG. 9 illustrates both the learning process and the tracking process that uses its results. The tracking process is described below with reference to FIGS. 6 and 9.

［Tracking by the correlation filter model］
The tracking unit 24 cuts out the target region 905 from the current frame image (S301). Here, the (T+1)-th frame is assumed to be the current frame image. At the end of the previous tracking process, the target region was updated so that its center lies at the position of the tracked person (step S306 in FIG. 6). The center 901 of the target region 904 in the T-th frame therefore coincides with the center position of the tracked person. As shown in FIG. 9, the target region 905 cut out here corresponds to the target region 904, which is centered on the position of the tracked person identified in the T-th frame.

The feature extraction unit 251 extracts a lightness feature image 910 and a HOG feature image 911 as feature images of the target region 905 (S302). The lightness feature image 910 has the same resolution as the frame image. The HOG feature image has a lower resolution because its features are computed per cell (for example, per 3×3 pixels). The response map generation unit 252 applies the correlation filter 909 to the correlation between the HOG feature image 911 of the target region 905 and the HOG feature image 907 stored in the storage unit 23, which yields a response map 921 (a likelihood map) (S303). The provisional center determination unit 253 provisionally takes the position of the maximum value in the response map 921 as the center position p (922) of the object in the current frame image (S304). The center position p does not have to be the position of the maximum. For example, when the response map 921 has multiple peaks, the position of a peak other than the maximum may be taken as p, with other factors taken into account.

［Correction process］
In step S305, the tracking unit 24 obtains a center position c of the object from the inter-frame difference image and uses c to correct the center position p obtained in step S304. FIG. 7 is a flowchart showing the details of step S305, and the step is described in detail below with reference to FIG. 7.

From the color of each pixel in the target region 905 and the color histogram 908 stored in the storage unit 23, the heat map generation unit 262 generates a foreground likelihood map 912 (S401). This map gives, for each pixel in the target region 905, the likelihood that the pixel belongs to the tracked person (the foreground). In this specification, a foreground likelihood map based on color histograms is called a heat map.
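The document does not specify how foreground likelihood is derived from the color histograms. The sketch below assumes one common formulation. Each RGB pixel is quantized into a histogram bin b, and the likelihood is H_fg(b) / (H_fg(b) + H_bg(b)), computed from the normalized foreground and background histograms. Colors that appear in neither histogram get the uninformative value 0.5. The 8-bins-per-channel quantization and the helper names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]

def quantize(color: Color, bins_per_channel: int = 8) -> Color:
    """Map an 8-bit RGB color to a coarse histogram bin."""
    step = 256 // bins_per_channel
    return (color[0] // step, color[1] // step, color[2] // step)

def color_histogram(pixels: List[Color], bins_per_channel: int = 8) -> Dict[Color, float]:
    """Normalized color histogram (bin -> probability) of a region's pixels."""
    counts = Counter(quantize(p, bins_per_channel) for p in pixels)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()}

def heat_map(image: List[List[Color]], fg_hist: Dict[Color, float],
             bg_hist: Dict[Color, float], bins_per_channel: int = 8) -> List[List[float]]:
    """Per-pixel foreground likelihood H_fg / (H_fg + H_bg); 0.5 for unseen colors."""
    result = []
    for row in image:
        out_row = []
        for color in row:
            b = quantize(color, bins_per_channel)
            pf, pb = fg_hist.get(b, 0.0), bg_hist.get(b, 0.0)
            out_row.append(0.5 if pf + pb == 0 else pf / (pf + pb))
        result.append(out_row)
    return result
```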

The tracking unit 24 reduces the heat map 912, the lightness feature image 906 of the previous frame, and the lightness feature image 910 of the current frame to a common lower resolution. For example, it may use the resolution of the response map 921 obtained from the correlation filter computation. Lowering the resolution produces images whose pixels average the foreground likelihood or the difference over a region of some extent.
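Reducing the three images to a common resolution can be sketched as block averaging by an integer factor, for example the HOG cell size. This is an assumption, because the document does not name the resampling method. Trailing rows or columns that do not fill a whole block are dropped.

```python
from typing import List

def block_average(img: List[List[float]], factor: int) -> List[List[float]]:
    """Downsample by averaging non-overlapping factor x factor blocks."""
    h, w = len(img) // factor, len(img[0]) // factor
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in range(factor):
                for dc in range(factor):
                    total += img[r * factor + dr][c * factor + dc]
            out[r][c] = total / (factor * factor)
    return out
```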

For each pixel of the reduced-resolution lightness feature images 906 and 910, the difference image adjustment unit 263 computes the absolute value of their difference, which produces an absolute difference image 913 (S404). It then multiplies the absolute difference image 913 pixel by pixel by the heat map 912, which represents foreground likelihood, to generate an adjusted difference image 914 (S404). The adjusted difference image can be viewed as a simple inter-frame difference image that has been reduced in resolution and weighted by the color-based foreground likelihood. Reducing the resolution captures differences over regions of some extent, and weighting by the color-based foreground likelihood excludes moving objects that are not the tracking target.
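The per-pixel operations described above reduce to an absolute difference followed by a multiplication with the heat map. The sketch below assumes that all three inputs are 2-D lists already at the same reduced resolution. The function name is hypothetical.

```python
from typing import List

Grid = List[List[float]]

def adjusted_difference(prev: Grid, curr: Grid, heat: Grid) -> Grid:
    """|curr - prev| per pixel, weighted by the color-based foreground likelihood."""
    return [[abs(c - p) * h for p, c, h in zip(prow, crow, hrow)]
            for prow, crow, hrow in zip(prev, curr, heat)]
```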

The difference image adjustment unit 263 then binarizes the adjusted difference image 914 (S405). Any existing dynamic (adaptive) binarization algorithm can be used, or a fixed threshold may be applied. FIG. 10A shows an example of the binarized adjusted difference image 914. The pixels 1001 shown in white are pixels with a difference, meaning their difference exceeds the threshold. The region formed by all such pixels 1001 corresponds to the moving region.
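The document leaves the binarization method open. As one example of a dynamic algorithm, the sketch below applies Otsu's method to a 256-bin histogram of the adjusted difference values. Otsu's method is a standard technique substituted here for illustration and is not necessarily the one used in the embodiment. A fixed threshold could be passed to `binarize` instead. For a flat image the sketch returns an infinite threshold, so that no pixel is marked as changed.

```python
from typing import List

def otsu_threshold(values: List[float], bins: int = 256) -> float:
    """Otsu's method: pick the threshold that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return float("inf")  # flat image: treat as "no difference anywhere"
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / (hi - lo) * bins), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_var, best_t = -1.0, 0
    for t in range(bins):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    # Upper edge of the best bin: bins 0..best_t form the "no difference" class.
    return lo + (best_t + 1) * (hi - lo) / bins

def binarize(img: List[List[float]], threshold: float) -> List[List[int]]:
    """1 where the (adjusted) difference reaches the threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in img]
```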

In the adjusted difference image 914 obtained in this way, the other-target region masking unit 264 sets mask regions so that differences caused by other tracked objects are ignored (removed) (S406). For ease of understanding, the description below assumes that no other tracked object is present in the target region 905 and does not go into the mask processing. The mask processing is described in detail later.

The center prediction unit 265 predicts the center position of the tracked object in the current frame image from its center position in the previous frame image and its movement velocity up to the previous frame image (S407). FIG. 10B shows the center position 1002 in the previous frame image and the movement velocity 1003 of the tracked object up to the previous frame image. From this information, the center prediction unit 265 predicts the center position 1004 in the current frame image. The prediction may use a Bayesian filter such as a Kalman filter, or the movement may be estimated with optical flow. A central region 1005 centered on the predicted center position 1004 is then set. The part of the target region outside the central region 1005 is set as the peripheral region.
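A constant-velocity prediction is the simplest instance of step S407, and a Kalman filter or optical flow could replace it, as noted above. The sketch also builds a square central region around the predicted center. The document does not give the size of that region, so `half_size` is a hypothetical parameter. Coordinates are (row, column).

```python
from typing import List, Tuple

Point = Tuple[float, float]

def predict_center(prev_center: Point, velocity: Point, dt: float = 1.0) -> Point:
    """Constant-velocity prediction of the object center for the next frame."""
    return (prev_center[0] + velocity[0] * dt, prev_center[1] + velocity[1] * dt)

def central_region_mask(height: int, width: int, center: Point,
                        half_size: float) -> List[List[bool]]:
    """True inside a square central region around `center`; False marks the peripheral region."""
    cy, cx = center
    return [[abs(r - cy) <= half_size and abs(c - cx) <= half_size for c in range(width)]
            for r in range(height)]
```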

Next, the center-of-gravity determination unit 266 obtains the position of the center of gravity c of the region formed by pixels with a difference (S408; reference numeral 915 in FIG. 9). A weight may be set for each pixel when computing c, in which case c is given by equation (1):

$$c = \frac{\sum_i d(i)\, w(i)\, p(i)}{\sum_i d(i)\, w(i)} \qquad (1)$$

Here, p(i) is the coordinate (position vector) of pixel i, d(i) indicates whether the pixel has a difference (1 if it does, 0 if it does not), w(i) is the weight, and the sums run over the entire adjusted difference image. The denominator is the weighted count of difference pixels. It normalizes the sum so that c is a position in image coordinates.

The weight w(i) may, for example, take different values for pixels in the central region and pixels in the peripheral region, each inversely proportional to the area of its region. The weights may also follow other criteria, such as larger weights for pixels closer to the predicted center position 1004. Alternatively, c may be computed without weighting, as a simple center of gravity. The coordinates of the center of gravity may be obtained with sub-pixel precision. Furthermore, d(i) above is binary (1 if the difference is at or above the threshold, 0 otherwise), but it may instead take three or more levels, or a continuous value, according to the magnitude of the difference.
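Equation (1) with the area-based weights described above can be sketched as follows. `region_weights` assigns each pixel the reciprocal of the area of the region it belongs to (central or peripheral). This is one of the options the text mentions, and the helper names are hypothetical. `d` may be binary or continuous, and the function returns `None` when no pixel carries any weighted difference.

```python
from typing import List, Optional, Tuple

def region_weights(central_mask: List[List[bool]]) -> List[List[float]]:
    """Weight = 1 / area of the region (central or peripheral) containing the pixel."""
    n_central = sum(1 for row in central_mask for v in row if v)
    n_total = sum(len(row) for row in central_mask)
    n_peripheral = n_total - n_central
    w_c = 1.0 / n_central if n_central else 0.0
    w_p = 1.0 / n_peripheral if n_peripheral else 0.0
    return [[w_c if v else w_p for v in row] for row in central_mask]

def weighted_centroid(diff: List[List[float]],
                      weights: List[List[float]]) -> Optional[Tuple[float, float]]:
    """Equation (1): sum(d*w*p) / sum(d*w), returned as (row, col)."""
    sw = sy = sx = 0.0
    for r, (drow, wrow) in enumerate(zip(diff, weights)):
        for c, (d, w) in enumerate(zip(drow, wrow)):
            sw += d * w
            sy += d * w * r
            sx += d * w * c
    if sw == 0:
        return None
    return (sy / sw, sx / sw)
```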

The center of gravity c obtained in this way can be regarded as the center position of the tracked object in the current frame image as estimated from the inter-frame difference image. Hereinafter, c may also be called the difference center of gravity.

The weight determination unit 267 also obtains the sum $s_c$ of the differences in the central region 1005 and the sum $s_b$ of the differences in the peripheral region (S408; reference numeral 915 in FIG. 9). In FIG. 10C, the white pixels are pixels with a difference in the central region, and the hatched pixels are pixels with a difference in the peripheral region. Here too, $s_c$ and $s_b$ may be computed as weighted sums, with the weights set in the same way as above. Specifically, the sums of differences are given by equation (2):

$$s_c = \sum_{i \in C} d(i)\, w(i), \qquad s_b = \sum_{i \in B} d(i)\, w(i) \qquad (2)$$

where C is the central region and B is the peripheral region.

The sums $s_c$ and $s_b$ from equation (2) may also be normalized by area. That is, $s_c$ may be divided by the area of the central region and $s_b$ by the area of the peripheral region.
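Equation (2), with the optional area normalization, can be sketched as follows. Unit weights are the default, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

def difference_sums(diff: List[List[float]], central_mask: List[List[bool]],
                    weights: Optional[List[List[float]]] = None,
                    normalize: bool = True) -> Tuple[float, float]:
    """Return (s_c, s_b): weighted difference sums over the central and peripheral regions."""
    sc = sb = 0.0
    nc = nb = 0
    for r, (drow, crow) in enumerate(zip(diff, central_mask)):
        for c, (d, in_central) in enumerate(zip(drow, crow)):
            w = 1.0 if weights is None else weights[r][c]
            if in_central:
                sc += d * w
                nc += 1
            else:
                sb += d * w
                nb += 1
    if normalize:  # divide each sum by the area (pixel count) of its region
        sc = sc / nc if nc else 0.0
        sb = sb / nb if nb else 0.0
    return sc, sb
```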

Based on $s_c$ and $s_b$, the weight determination unit 267 calculates the application rate (weight coefficient) e of the center of gravity relative to the provisionally determined center position p (S409). Here, e is calculated by equation (3):

$$e = f_{0,\alpha}(s_c + s_b) \times f_{\beta,1}\!\left(\frac{s_c}{s_b}\right) \qquad (3)$$

As shown in FIG. 11, the function $f_{a,b}(x)$ is the ramp

$$f_{a,b}(x) = \begin{cases} 0 & x < a \\ \dfrac{x - a}{b - a} & a \le x \le b \\ 1 & x > b \end{cases}$$

The formula for e given here is only an example. Any other formula may be used, provided that e approaches 1 as the probability that the center of gravity c is the center position of the tracked object increases. In equation (3), α is a sufficiently small value (for example, 0.025). The first factor equals 1 once $s_c + s_b$ reaches α, and below α it decreases linearly toward 0. β is a moderately large value (for example, 0.5). The second factor is 0 when $s_c/s_b$ is at most β; above β it increases with $s_c/s_b$ and saturates at 1. Determining e in this way lowers the influence of the difference center of gravity c when c is unlikely to be the center of the tracked object. This happens, for example, when a moving background puts more difference in the peripheral region than in the central region, or when little difference is obtained. Inappropriate corrections are therefore avoided.
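Equation (3) translates directly into code. Two details are assumptions not covered by the text. When $s_b = 0$, the ratio $s_c/s_b$ is treated as infinite. The defaults α = 0.025 and β = 0.5 are the example values given above.

```python
import math

def ramp(x: float, a: float, b: float) -> float:
    """f_{a,b}(x): 0 below a, 1 above b, linear in between."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def application_rate(sc: float, sb: float, alpha: float = 0.025, beta: float = 0.5) -> float:
    """Equation (3): e = f_{0,alpha}(sc + sb) * f_{beta,1}(sc / sb)."""
    ratio = sc / sb if sb > 0 else math.inf  # assumption: no peripheral difference -> ratio = inf
    return ramp(sc + sb, 0.0, alpha) * ramp(ratio, beta, 1.0)
```

With no difference at all ($s_c = s_b = 0$), the first factor is 0, so e = 0 regardless of how the ratio is handled.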

The position correction unit 27 corrects the center position p obtained by the first sub-tracking unit 25 with the position of the center of gravity c obtained by the second sub-tracking unit 26 (center-of-gravity determination unit 266), weighted by the application rate e (S410). The corrected center position P (923) is calculated by equation (4):

$$P = (1 - e)\, p + e\, c \qquad (4)$$

In the above, the application rate e is determined by equation (3) from the sums $s_c$ and $s_b$ of the differences in the adjusted difference image. However, e may also be determined with other factors taken into account.

For example, the reliability (likelihood) of the center position obtained by the first sub-tracking unit 25 can be considered. When that reliability is higher than a threshold, e may be set smaller than the value given by equation (3), or set to zero. The threshold is set to a value at which the position tracked by the first sub-tracking unit 25 can be regarded as sufficiently reliable. This prevents the correction based on the center of gravity of the inter-frame difference image from degrading the accuracy of the tracked position. For example, when the tracked object is stationary, the position obtained by the first sub-tracking unit 25 is already sufficiently accurate. If a moving object is nearby, however, the center of gravity of the inter-frame difference image could produce an inappropriate correction and reduce tracking accuracy. Lowering e in this case avoids that situation.

Likewise, when the distance between the center position p obtained by the first sub-tracking unit 25 and the difference center of gravity c is at or below a threshold, e may be set smaller than the value given by equation (3), or set to zero. This threshold may be determined experimentally but should be sufficiently small. The correlation filter is built from the accumulated images of the tracked object in past frames. Even corrections that shift the tracked position only slightly from the actual position can therefore accumulate if they are inappropriate, and eventually lead to false detections. When p and c are sufficiently close, favoring the correlation filter result avoids degrading the filter's response and keeps tracking accurate.
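The two overrides above can be combined with equation (4) as in the sketch below. The document does not define the reliability score of the first sub-tracking unit; the response-map peak value would be one candidate. `reliability` and both thresholds are therefore hypothetical inputs. This version takes the "set e to zero" option rather than merely reducing e.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def final_center(p: Point, c: Point, e: float, reliability: float,
                 reliability_threshold: float, distance_threshold: float) -> Point:
    """Apply the optional overrides to e, then blend p and c with equation (4)."""
    if reliability > reliability_threshold:
        e = 0.0  # first sub-tracker is trusted: keep p unchanged
    elif math.dist(p, c) <= distance_threshold:
        e = 0.0  # p and c already agree: avoid small corrections accumulating in the filter
    # Equation (4): P = (1 - e) p + e c
    return ((1 - e) * p[0] + e * c[0], (1 - e) * p[1] + e * c[1])
```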

This completes the correction process of step S305, and the center position P of the tracked object in the current frame image is now determined.

Returning to the flowchart of FIG. 6: when the correction process of step S305 is complete, the tracking unit 24 updates the center of the target region to the corrected center position P (S306) and also updates the size of the target region. After each tracking pass, the center of the target region is therefore the center position of the tracked person, and the size of the target region reflects the tracking result. The updated size may be estimated with an image-pyramid method such as DSST (Discriminative Scale Space Tracking). It may also be determined from at least one of the following: the size of the target region in the previous frame, the lens distortion characteristics, the camera viewpoint, the camera placement, and the position of the target region in the image. After the tracking process, the center of the target region is the center position of the tracked person, and the foreground region within the target region is the region occupied by the tracked person (its bounding box).

［Mask processing］
This section describes the masking of other tracked objects' regions during the correction process (S406 in FIG. 7), which was skipped above. As described above, the mask processing in step S406 sets mask regions so that the adjusted difference image 914 ignores (removes) differences caused by other tracked objects. FIG. 8 is a flowchart showing the details of the mask processing S406. Within the tracking process S106, this processing is repeated for each tracked object other than the one currently of interest. In FIG. 12A, person 1201 is the tracked person currently of interest, and persons 1202 are the other tracked persons. Steps S501 to S503 below are therefore repeated for each person 1202.

Specifically, the other target region masking unit 264 first acquires the target rectangle 1211 of the other tracked object 1202 in the previous frame image (S501). It then predicts the target rectangle 1211 of the other tracked object 1202 in the current frame image, based on that object's position and moving speed 1203 up to the previous frame image (S502). Next, the other target region masking unit 264 masks any part of the target area of the current frame image that falls within either the rectangle 1210 or the rectangle 1211, so that this part is excluded from the processing from step S407 onward. The union of the rectangle 1210 and the rectangle 1211 corresponds to the area in which an object other than the object of interest is expected to exist.
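
The masking described above can be sketched in Python with NumPy as follows. This is a minimal sketch under several assumptions. Rectangles are integer (x, y, w, h) tuples in the coordinates of the adjusted difference image. The prediction uses a simple constant-velocity model, whereas the source says only that position and moving speed are used. The function names are hypothetical.

```python
import numpy as np

def predict_rect(rect, velocity):
    """Shift an (x, y, w, h) rectangle by a per-frame velocity (vx, vy).

    Assumes a constant-velocity motion model (hypothetical choice).
    """
    x, y, w, h = rect
    vx, vy = velocity
    return (x + vx, y + vy, w, h)

def mask_other_objects(adjusted_diff, others):
    """Zero every pixel inside another object's previous or predicted rectangle.

    adjusted_diff: 2-D array, the adjusted difference image.
    others: list of (prev_rect, velocity) pairs, one per other tracked object.
    The input array is left unchanged; a masked copy is returned.
    """
    masked = adjusted_diff.copy()
    height, width = masked.shape
    for prev_rect, velocity in others:
        # Mask the union of the previous and the predicted rectangle.
        for x, y, w, h in (prev_rect, predict_rect(prev_rect, velocity)):
            # Clip the rectangle to the image bounds before masking.
            x0, y0 = max(0, int(x)), max(0, int(y))
            x1, y1 = min(width, int(x + w)), min(height, int(y + h))
            if x0 < x1 and y0 < y1:
                masked[y0:y1, x0:x1] = 0
    return masked
```

For example, `mask_other_objects(np.ones((10, 10)), [((0, 0, 3, 3), (2, 0))])` zeroes a 3-row by 5-column block: the previous rectangle plus its prediction shifted two pixels to the right.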

When the persons to be tracked are crowded together or close to each other, as in FIG. 12A, an adjusted difference image 914 such as the one shown in FIG. 12C is obtained. The region in which the other tracked object 1202 is expected to exist is masked and excluded from processing. As a result, the masked adjusted difference image shown in FIG. 12D is what the centroid calculation of step S408, the difference-sum calculation, and similar steps operate on. Excluding the regions of other tracked objects in this way means that only the differences caused by the movement of the tracked object of interest are considered. This improves the accuracy with which the second sub-tracking unit 26 calculates the centroid c and the application rate e.
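
A difference-weighted centroid over the masked image could look like the sketch below. It assumes one possible reading of "center of gravity based on the degree of difference": each pixel coordinate is weighted by its difference value. The function name is illustrative. The sketch returns `None` when no difference survives masking, which corresponds to the case discussed in this section where no centroid-based correction is made.

```python
import numpy as np

def difference_centroid(masked_diff):
    """Return ((cx, cy), total) for a 2-D masked adjusted difference image.

    cx, cy: difference-weighted mean of the pixel x and y coordinates.
    total: sum of all remaining difference values.
    The centroid is None when no difference remains (e.g. everything was masked).
    """
    total = float(masked_diff.sum())
    if total <= 0.0:
        return None, 0.0
    ys, xs = np.indices(masked_diff.shape)  # row (y) and column (x) index grids
    cx = float((xs * masked_diff).sum()) / total
    cy = float((ys * masked_diff).sum()) / total
    return (cx, cy), total
```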

<Advantageous effects of the present embodiment>
In this embodiment, a person tracking device that uses fisheye images without planar development can suppress drift into the background and track people with high accuracy. Drift is a tracking failure caused by erroneously learning features of something other than the tracked object during sequential learning. The object tracking algorithm of the first sub-tracking unit 25 learns local features that include the neighborhood of the area where the object exists. Therefore, when the background varies in a complex way, the tracker may recognize a position different from the actual center of the object as its center. Such errors accumulate through sequential learning, and the tracker may eventually mistake the background for the tracked object. In this embodiment, the position of the object obtained by the first sub-tracking unit 25 is corrected using a position derived from the inter-frame difference image, so the position of the object can be determined more accurately. In particular, the embodiment can accurately track an object that changes drastically against a static, complex background, a situation in which the correlation filter is likely to track poorly. It does so while impairing the tracking performance of the correlation filter of the first sub-tracking unit 25 as little as possible.

Further, the conventional background subtraction method uses the current frame image together with the frame images before and after it. The tracking result for the current frame image therefore becomes available only after the next frame image has been input. In this embodiment, by contrast, only the current frame image and the immediately preceding frame image are used, so the effect is obtained as soon as tracking starts. In addition, the conventional background subtraction method processes a wide area and therefore has a high computational cost, which makes it unsuitable for embedded devices with limited computational resources. The method of this embodiment operates only on the inter-frame difference image of the region of interest. Its computational cost is therefore low, and it is also well suited to embedded devices.
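
The two-frame difference restricted to the region of interest can be illustrated as follows. This is a minimal sketch assuming 8-bit grayscale frames stored as NumPy arrays and a hypothetical (x, y, w, h) region of interest. It shows that only the previous and current frames are needed, and that the work is proportional to the region size rather than to the full frame.

```python
import numpy as np

def roi_frame_difference(prev_frame, curr_frame, roi):
    """Absolute difference of two consecutive uint8 grayscale frames inside a ROI.

    roi: hypothetical (x, y, w, h) target region; pixels outside it are never read.
    """
    x, y, w, h = roi
    # Widen to int32 so that subtracting uint8 values cannot wrap around.
    a = prev_frame[y:y + h, x:x + w].astype(np.int32)
    b = curr_frame[y:y + h, x:x + w].astype(np.int32)
    return np.abs(b - a).astype(np.uint8)
```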

In addition, the areas in which other tracked persons are expected to exist are masked and excluded from processing. When tracked persons are crowded together or close to each other, this avoids situations in which the other tracked persons prevent the difference centroid from being obtained correctly. Even if the masking process excludes all of the difference information, the only consequence is that no correction based on the difference centroid is performed; the tracking position obtained with the correlation filter remains available. The masking process therefore has the advantage of being unlikely to degrade the tracking result.
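
This fallback can be expressed as a weighted average that simply returns the correlation-filter position when no difference centroid is available. The sketch assumes the combined position is P = (1 - e) * p + e * c, with e the weight given to the difference centroid c. The exact weighting used by the embodiment is not stated in this section, and the function name is hypothetical.

```python
def fuse_position(p, c, e):
    """Combine the two sub-tracker outputs as P = (1 - e) * p + e * c.

    p: (x, y) from the first sub-tracker (e.g. the correlation filter).
    c: (x, y) difference centroid, or None if masking removed all differences.
    e: weight of the difference-based estimate, clamped to [0, 1].
    """
    if c is None:
        # No difference information left: keep the correlation-filter result.
        return p
    e = min(max(e, 0.0), 1.0)
    return ((1.0 - e) * p[0] + e * c[0], (1.0 - e) * p[1] + e * c[1])
```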

In addition, little processing load is needed to obtain the inter-frame difference image and the information required to correct the center position. The method can therefore be applied to embedded devices with limited computational resources.

<Others>
The above-described embodiment merely illustrates an example configuration of the present invention. The present invention is not limited to the specific forms described above, and various modifications are possible within the scope of its technical idea.

Also, in the above embodiment, the first sub-tracking unit 25 performs tracking using a correlation filter, but tracking may be performed by other algorithms. For example, tracking may use a deep learning model such as a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), or an LSTM (Long Short-Term Memory), or a pattern recognition model such as an SVM (Support Vector Machine). Likewise, the second sub-tracking unit 26 may employ a technique other than the one described above, as long as it is a difference-based moving-object detection technique.

Further, the above embodiment processes the fisheye image without planar development. However, the processing target may instead be an image obtained by planar development of the fisheye image, or an image captured by an ordinary camera.

<Appendix>
(1) An object tracking device (1) comprising:
acquisition means (21) for acquiring the position of an object in a first frame image; and
tracking means (24) for obtaining the position of the object from a second frame image, which is a frame image after the first frame image,
wherein the tracking means (24) comprises:
first sub-tracking means (25) for obtaining first coordinates (p) of the object in the second frame image by a first tracking algorithm based on a feature quantity extracted from the second frame image;
second sub-tracking means (26) for obtaining second coordinates (c), which are the center of gravity of a moving region, based on an inter-frame difference image (913) between the first frame image and the second frame image; and
position specifying means (27) for obtaining the position (P) of the object in the second frame image based on the first coordinates (p) and the second coordinates (c).

(2) An object tracking method comprising:
an acquisition step (S102) of acquiring the position of an object in a first frame image; and
a tracking step (S106) of obtaining the position of the object from a second frame image, which is a frame image after the first frame image,
wherein the tracking step includes:
a step (S304) of obtaining first coordinates (p) of the object in the second frame image by a first tracking algorithm based on a feature quantity extracted from the second frame image;
a step (S408) of obtaining second coordinates (c), which are the center of gravity of a moving region, based on an inter-frame difference image (913) between the first frame image and the second frame image; and
a step (S410) of obtaining the position (P) of the object in the second frame image based on the first coordinates and the second coordinates.

1: Person tracking device
2: Monitoring system
10: Fisheye camera
11: Tracking target area
12: Ceiling
13: Person

Claims

1. An object tracking device comprising:
acquisition means for acquiring the position of an object in a first frame image; and
tracking means for obtaining the position of the object from a second frame image, which is a frame image after the first frame image,
wherein the tracking means comprises:
first sub-tracking means for obtaining first coordinates of the object in the second frame image by a first tracking algorithm based on a feature quantity extracted from the second frame image;
second sub-tracking means for obtaining second coordinates, which are the center of gravity of a moving region, based on an inter-frame difference image between the first frame image and the second frame image; and
position specifying means for determining the position of the object in the second frame image based on the first coordinates and the second coordinates,
and wherein the second sub-tracking means:
generates a likelihood map representing the probability that the object exists in the second frame image, using a color histogram of a region containing the object, the color histogram being generated using at least the first frame image;
generates an adjusted difference image by multiplying the inter-frame difference image between the first frame image and the second frame image by the likelihood map; and
obtains, as the second coordinates, the center of gravity of the pixel region in the adjusted difference image based on the degree of difference.
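
The likelihood map and adjusted difference image of claim 1 can be illustrated with color-histogram back-projection. This is a minimal sketch under several assumptions that the claim does not specify:
- a joint RGB histogram with 8 bins per channel;
- a likelihood scaled so that the most likely color maps to 1;
- hypothetical function names.

```python
import numpy as np

def _bin_index(rgb, bins):
    """Map each uint8 RGB pixel of an (H, W, 3) array to a joint bin in [0, bins**3)."""
    q = (rgb.astype(np.int64) * bins) // 256  # per-channel bin in [0, bins)
    return (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]

def color_histogram(patch, bins=8):
    """Normalized joint RGB histogram of the region containing the object."""
    counts = np.bincount(_bin_index(patch, bins).ravel(), minlength=bins ** 3)
    return counts / counts.sum()

def likelihood_map(image, hist, bins=8):
    """Back-project the histogram, then scale so the most likely color maps to 1."""
    lik = hist[_bin_index(image, bins)]
    peak = lik.max()
    return lik / peak if peak > 0 else lik

def adjusted_difference(diff, lik):
    """Pixel-wise product of the inter-frame difference and the likelihood map."""
    return diff.astype(np.float64) * lik
```

Pixels whose color is unlikely to belong to the object contribute little to the adjusted difference, even when they change between frames.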

2. The object tracking device according to claim 1, wherein the second sub-tracking means:
sets, in the adjusted difference image, a central region centered on a position where the object is presumed to exist and a peripheral region around the central region; and
obtains the center of gravity by giving different weights to pixels in the central region and pixels in the peripheral region.
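
One way to realize the different weights of claim 2 is sketched below. The sketch makes three assumptions the claim does not specify: the central region is a disc of a given radius, the default weights are arbitrary, and the function name is hypothetical.

```python
import numpy as np

def weighted_centroid(adj_diff, center, radius, w_center=1.0, w_periph=0.5):
    """Centroid of adj_diff with region-dependent pixel weights.

    Pixels within `radius` of `center` (x, y) are scaled by w_center, all others
    (the peripheral region) by w_periph. Returns None if nothing remains.
    """
    ys, xs = np.indices(adj_diff.shape)
    inside = (xs - center[0]) ** 2 + (ys - center[1]) ** 2 <= radius ** 2
    wdiff = np.where(inside, w_center, w_periph) * adj_diff
    total = wdiff.sum()
    if total <= 0:
        return None
    return (float((xs * wdiff).sum() / total), float((ys * wdiff).sum() / total))
```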

3. The object tracking device according to claim 1 or 2, wherein
the second sub-tracking means sets, in the adjusted difference image, a central region centered on a position where the object is presumed to exist and a peripheral region around the central region, and determines a weighting factor based on the sum of differences in the central region and the sum of differences in the peripheral region, and
the position specifying means obtains, as the position of the object, coordinates determined as a weighted average of the first coordinates and the second coordinates using the weighting factor.
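
Claim 3 leaves the formula for the weighting factor open. The sketch below uses one plausible choice, e = S_center / (S_center + S_periph), so that motion concentrated near the presumed object position earns more trust. This ratio, the disc-shaped central region, and the function name are all assumptions for illustration only.

```python
import numpy as np

def weighting_factor(adj_diff, center, radius):
    """Hypothetical weighting factor e = S_center / (S_center + S_periph).

    S_center and S_periph are the sums of the adjusted difference inside and
    outside a disc of `radius` around `center` (x, y). Returns 0.0 when the
    image contains no difference at all.
    """
    ys, xs = np.indices(adj_diff.shape)
    inside = (xs - center[0]) ** 2 + (ys - center[1]) ** 2 <= radius ** 2
    s_center = float(adj_diff[inside].sum())
    s_periph = float(adj_diff[~inside].sum())
    total = s_center + s_periph
    return s_center / total if total > 0 else 0.0
```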

4. The object tracking device according to claim 3, wherein the second sub-tracking means obtains the center of gravity and the sum of differences while excluding any area of the adjusted difference image in which an object other than the object of interest is expected to exist.

5. The object tracking device according to any one of claims 2 to 4, wherein the position where the object is presumed to exist is determined based on the position of the object in the first frame image and the moving speed of the object in the first frame image.

6. The object tracking device according to any one of claims 1 to 5, wherein, when the likelihood of the first coordinates obtained by the first sub-tracking means is equal to or greater than a second threshold, the position specifying means determines, as the position of the object, coordinates obtained as a weighted average of the first coordinates and the second coordinates in which the second coordinates are given a smaller weight than otherwise.

7. The object tracking device according to any one of claims 1 to 6, wherein, when the difference between the first coordinates and the second coordinates is less than a third threshold, the position specifying means determines, as the position of the object, coordinates obtained as a weighted average of the first coordinates and the second coordinates in which the second coordinates are given a smaller weight than otherwise.
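
The weight reductions of claims 6 and 7 might be combined as in the sketch below. The multiplicative `reduction` factor, the use of Euclidean distance for "the difference between the first coordinates and the second coordinates", and the function name are hypothetical. The claims only require a smaller weight than otherwise.

```python
import math

def adjust_weight(e, likelihood, p, c, second_threshold, third_threshold, reduction=0.5):
    """Lower the weight e of the difference-based coordinates c.

    Claim 6: if the first tracker's likelihood >= second_threshold, trust p more.
    Claim 7: if |p - c| < third_threshold, weight c less.
    Both reductions apply when both conditions hold.
    """
    if likelihood >= second_threshold:
        e *= reduction
    if math.dist(p, c) < third_threshold:
        e *= reduction
    return e
```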

8. The object tracking device according to any one of claims 1 to 7, wherein the first tracking algorithm determines the position of the object in the second frame image using a correlation filter based on a feature quantity obtained from the vicinity of the object in the first frame image.
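
The correlation-filter tracker of claim 8 can be illustrated with a MOSSE-style filter trained in closed form in the Fourier domain. This is a swapped-in textbook formulation, not necessarily the filter of the embodiment. The simplifications are:
- raw pixel intensities are used as the feature;
- the filter is trained on a single patch;
- no cosine window is applied;
- names and parameters (`sigma`, `lam`) are hypothetical.

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the patch center."""
    h, w = shape
    ys, xs = np.indices(shape)
    return np.exp(-((xs - w // 2) ** 2 + (ys - h // 2) ** 2) / (2.0 * sigma ** 2))

def train_filter(patch, sigma=2.0, lam=1e-3):
    """Closed-form filter (conjugate form) H* = G F* / (F F* + lam)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(gaussian_response(patch.shape, sigma))
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def locate(H_conj, patch):
    """Return the (dx, dy) displacement of the response peak from the patch center."""
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * H_conj))
    py, px = np.unravel_index(np.argmax(response), response.shape)
    h, w = patch.shape
    return int(px - w // 2), int(py - h // 2)
```

For example, after training on a patch, a version of it circularly shifted three pixels right and two pixels up is located at `(3, -2)`.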

9. The object tracking device according to any one of claims 1 to 8, wherein the first frame image and the second frame image are fisheye images obtained by a fisheye camera.

10. A monitoring system comprising:
the object tracking device according to any one of claims 1 to 9; and
a fisheye camera.

11. An object tracking method comprising:
an acquisition step of acquiring the position of an object in a first frame image; and
a tracking step of obtaining the position of the object from a second frame image, which is a frame image after the first frame image,
wherein the tracking step includes:
determining first coordinates of the object in the second frame image by a first tracking algorithm based on a feature quantity extracted from the second frame image;
obtaining second coordinates, which are the center of gravity of a moving region, based on an inter-frame difference image between the first frame image and the second frame image; and
determining the position of the object in the second frame image based on the first coordinates and the second coordinates,
and wherein obtaining the second coordinates includes:
generating a likelihood map representing the probability that the object exists in the second frame image, using a color histogram of a region containing the object, the color histogram being generated using at least the first frame image;
generating an adjusted difference image by multiplying the inter-frame difference image between the first frame image and the second frame image by the likelihood map; and
obtaining, as the second coordinates, the center of gravity of the pixel region in the adjusted difference image based on the degree of difference.

12. A program for causing a computer to execute each step of the object tracking method according to claim 11.