JP2012088881A

JP2012088881A - Person motion detection device and program thereof

Info

Publication number: JP2012088881A
Application number: JP2010234240A
Authority: JP
Inventors: Masaki Takahashi; 正樹高橋; Masato Fujii; 真人藤井; Masahide Naemura; 昌秀苗村
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2010-10-19
Filing date: 2010-10-19
Publication date: 2012-05-10
Anticipated expiration: 2030-10-19
Also published as: JP5604256B2

Abstract

PROBLEM TO BE SOLVED: To provide a person motion detection device that detects a motion of a person from video photographed by a camera.
SOLUTION: A person motion detection device 1 includes: feature point track information generation means 10 of generating a track of feature points as feature point track information for each frame image of video; feature quantity extraction means 20 of generating a track feature quantity by accumulating a direction and a magnitude of a movement vector at a feature point for each range width obtained by dividing a possible range thereof by a predetermined number; learning data storage means 40 of clustering a plurality of track feature quantities into a predetermined number of clusters, and finding and storing, as learning data, a distribution obtained by accumulating a plurality of track feature quantities constituting a known motion by the clusters; and operation identification means 30 of generating a distribution obtained by accumulating clusters that the track feature quantities belong to from a plurality of track feature quantities within a predetermined time section, and comparing the distribution with the learning data to identify the motion of the person.

Description

The present invention relates to a person motion detection device that detects the motion of a person from video captured by a camera, and to a program therefor.

In recent years, research on automatically recognizing human motion has been actively pursued. For example, a method has been proposed in which a contact-type measuring instrument (sensor) is attached to the body and the person's motion is recognized from the velocity and acceleration information it measures (see Patent Document 1).
However, attaching a measuring instrument to the body is undesirable in view of the installation cost and the burden (load) on the human body. Consequently, much recent research recognizes a person's motion by analyzing video of that person. For example, methods have been proposed that recognize a person's motion from the person's trajectory in the video (see Patent Documents 2 and 3).

A method has also been proposed that, when obtaining a person's trajectory from video, recognizes the person's motion using three-dimensional (horizontal, vertical, time) features obtained by tracking the horizontal and vertical coordinates of each feature point across frames in the time direction (see Non-Patent Document 1).
The method of Non-Patent Document 1 limits the tracking time to a predetermined duration so that the three-dimensional features form a fixed-dimension (fixed-length) trajectory feature. It then treats each trajectory feature as a single word and applies the “Bag-of-words (BOW)” method to classify the person's motion into motions learned in advance.

As another method that detects motion from such trajectory features using the “Bag-of-words” method, a technique has been proposed that uses the angle of the movement vector of a feature point between frames (see Non-Patent Document 2).
The method of Non-Patent Document 2 accumulates the movement vector angles with a predetermined bin width θ (number of bins 2π/θ) over the intervals [0, θ), [θ, 2θ), ..., [2π−θ, 2π), where [a, b) denotes the range from a (inclusive) to b (exclusive). The resulting histogram serves as a fixed-dimension trajectory feature, which enables person motion detection with the “Bag-of-words” method.

Patent Document 1: JP H10-113343 A
Patent Document 2: JP 2003-87771 A
Patent Document 3: JP 2002-8042 A

Non-Patent Document 1: Matikainen, P., Hebert, M. and Sukthankar, R. 2009. Trajectons: Action recognition through the motion analysis of tracked features. Workshop on Video-Oriented Object and Event Classification (ICCV). (Sep. 2009).
Non-Patent Document 2: V Mezaris, A Dimou, I Kompatsiaris, "Local invariant feature tracks for high-level video feature extraction", Proc. 11th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2010), April 2010.

However, the methods of Patent Documents 2 and 3 require the person region in the video to be cut out accurately in every frame. To make the person region easy to cut out, they therefore need constraints such as limiting the background to a flat one (for example, a predetermined color) or limiting the target of motion extraction to a single person. In other words, the methods of Patent Documents 2 and 3 cannot accurately detect a person's motion in complex video in which an unspecified number of people appear.

The methods of Non-Patent Documents 1 and 2 track feature points in the video in the time direction and use the “Bag-of-words” method, which allows multiple motions to be clustered. They can therefore detect motion with a degree of robustness even when several people are present in the video.
However, the methods of Non-Patent Documents 1 and 2 have the following problems.
The method of Non-Patent Document 2 does not take into account the speed at which a feature point moves (the length of the movement vector), even though speed is as important a factor for judging motion as the angle of the movement vector. As a result, it erroneously detects a motion whenever the movement vector angles approximate the learned result, even if the speed of the movement is unnatural.

The method of Non-Patent Document 1, on the other hand, uses as its trajectory feature a feature obtained by tracking feature points in the time direction, so it may appear to account for speed in the time direction. However, to use the “Bag-of-words” method with a fixed-dimension (fixed-length) trajectory feature, it must limit the tracking time to a predetermined duration, which cuts the trajectory feature off partway through a motion. The method of Non-Patent Document 1 therefore cannot detect some motions accurately, depending on their duration.

The present invention has been made in view of the above problems. Its object is to provide a person motion detection device, and a program therefor, that can accurately detect a person's motion using a fixed-dimension (fixed-length) trajectory feature regardless of the duration of the motion, even when the feature includes both angle and speed information.

The present invention was devised to solve the above problems. First, the person motion detection device according to claim 1 detects the motion of a person from video of the person and comprises feature point trajectory information generation means, temporal feature generation means, learning data storage means, and motion identification means.

In this configuration, the feature point trajectory information generation means detects feature points in each frame image of the video and matches the feature values of the feature points between frame images, thereby generating, as feature point trajectory information, trajectories that track the positions of the feature points in the time direction. The feature points can be detected with a general feature point detection method such as the Harris operator, SIFT, or SURF. Tracking the feature points in this way extracts the motion in the video as a set of feature point trajectories.

The temporal feature generation means then uses the feature point positions contained in the feature point trajectory information to generate a temporal feature: it accumulates the direction and magnitude of the movement vector of each feature point between frame images into range widths (bins) obtained by dividing the possible range of the direction and of the magnitude into a predetermined number of parts. This temporal feature serves as the trajectory feature, that is, the feature of the feature point's trajectory.
The direction of the movement vector represents the direction in which the feature point moves, and its magnitude represents the speed at which it moves; both characterize the person's motion. Because the directions and magnitudes are accumulated into a predetermined number of range widths, a fixed-length feature is extracted regardless of the length of the trajectory, that is, regardless of the duration of the motion.
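As an illustration only, and not part of the specification, the accumulation of movement vector directions and magnitudes into a fixed number of range widths can be sketched as follows. The function name, the bin counts, and the speed upper bound `max_speed` are assumptions chosen for the example.

```python
import math

def temporal_feature(points, n_dir_bins=8, n_speed_bins=4, max_speed=20.0):
    """Build a fixed-length feature from one feature point trajectory.

    points: list of (x, y) positions, one per frame.
    The bin counts and max_speed are illustrative assumptions.
    """
    dir_hist = [0] * n_dir_bins
    speed_hist = [0] * n_speed_bins
    # Movement vector between each pair of consecutive frames.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        angle = math.atan2(dy, dx) % (2 * math.pi)  # direction in [0, 2*pi)
        speed = math.hypot(dx, dy)                  # magnitude (pixels per frame)
        d = min(int(angle / (2 * math.pi) * n_dir_bins), n_dir_bins - 1)
        s = min(int(speed / max_speed * n_speed_bins), n_speed_bins - 1)
        dir_hist[d] += 1
        speed_hist[s] += 1
    # The length depends only on the bin counts, not on the trajectory length.
    return dir_hist + speed_hist
```

A trajectory of three points and one of six points both yield a 12-element feature under these assumed bin counts, which is what allows trajectories of different durations to be compared.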

The person motion detection device also clusters a plurality of trajectory features into a predetermined number of clusters in advance. For each known motion, it accumulates the trajectory features that make up that motion by cluster and stores the resulting distribution, associated with the motion, in the learning data storage means as learning data. This learning data models the trajectory features that make up a person's motion with a predetermined number of clusters.

The motion identification means then generates, for each predetermined time interval, a distribution that accumulates the clusters to which the trajectory features belong, using the trajectory features whose trajectory end points lie within that interval. It identifies the person's motion according to whether this distribution is similar to the per-motion cluster distributions stored in the learning data storage means. The end of a trajectory within the time interval is used as the reference because one motion can be regarded as complete at that point.
In this way, the trajectory features that make up a person's motion are characterized by a cluster distribution, and the motion identification means detects the person's motion by comparing that distribution with the distributions in the learning data.
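The identification step can be sketched as follows, purely as an illustration. The sketch assumes that cluster centers have already been obtained by some clustering of training features, that each trajectory is given as an (end frame, feature) pair, and that similarity is judged by the nearest learned distribution. All names are hypothetical.

```python
def nearest_cluster(feature, centers):
    """Index of the cluster center closest to feature (squared Euclidean distance)."""
    dists = [sum((f - c) ** 2 for f, c in zip(feature, center)) for center in centers]
    return dists.index(min(dists))

def cluster_distribution(trajectories, centers, t_start, t_end):
    """Count cluster memberships of trajectories that end in [t_start, t_end).

    trajectories: list of (end_frame, feature) pairs (assumed representation).
    """
    hist = [0] * len(centers)
    for end_frame, feature in trajectories:
        if t_start <= end_frame < t_end:
            hist[nearest_cluster(feature, centers)] += 1
    return hist

def identify(hist, learned):
    """Return the label whose learned distribution is nearest to hist.

    learned: dict mapping a motion label to its stored cluster distribution.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(learned, key=lambda label: dist(hist, learned[label]))
```

Trajectories that end outside the interval are ignored, reflecting the idea that a motion is treated as complete when its trajectory ends.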

The person motion detection device according to claim 2 is the device according to claim 1, further comprising spatial feature generation means.

In this configuration, the spatial feature generation means generates, as a spatial feature, the luminance gradient of the frame image at the feature point positions contained in the feature point trajectory information, and appends it to the trajectory feature. The spatial feature may be the luminance gradient at the feature point in the frame image at the start point, end point, or an intermediate point of the trajectory, or the luminance gradient of the frame images averaged over each feature point trajectory.
The person motion detection device thus generates the trajectory feature by adding a spatial feature to the temporal feature. The motion identification means can then identify a motion by taking into account appearance as well as movement.
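A minimal sketch of such a spatial feature follows. It assumes that the luminance gradient is approximated by central differences on a grayscale image stored as nested lists, which the text does not specify. The function names are hypothetical.

```python
def luminance_gradient(image, x, y):
    """Central-difference luminance gradient (gx, gy) at pixel (x, y).

    image: grayscale frame as a list of rows; indices are clamped at the border.
    """
    h, w = len(image), len(image[0])
    gx = (image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]) / 2
    gy = (image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]) / 2
    return gx, gy

def mean_gradient(frames, points):
    """Average the gradient along one trajectory (one (x, y) point per frame)."""
    grads = [luminance_gradient(f, x, y) for f, (x, y) in zip(frames, points)]
    n = len(grads)
    return sum(g[0] for g in grads) / n, sum(g[1] for g in grads) / n
```

The resulting pair could be appended to the temporal feature, for example as `feature + list(mean_gradient(frames, points))`.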

The person motion detection device according to claim 3 is the device according to claim 1 or 2, wherein the temporal feature generation means comprises direction feature generation means and speed feature generation means.

In this configuration, the direction feature generation means generates a direction feature, which is one component of the temporal feature, by accumulating the directions of the movement vectors into range widths of several different sizes, each obtained by dividing the possible range of the direction by one of a plurality of predetermined numbers.

Likewise, the speed feature generation means generates a speed feature, another component of the temporal feature, by accumulating the magnitudes of the movement vectors into range widths of several different sizes, each obtained by dividing the possible range of the magnitude by one of a plurality of predetermined numbers.
As a result, the direction feature contains multiple distributions of the person's movement direction, from coarsely classified to finely classified, and the speed feature contains multiple distributions of the person's movement speed, from coarsely classified to finely classified.
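The coarse-to-fine idea can be illustrated for the direction feature as follows. The division numbers (2, 4, 8) are assumptions for the example, and the same pattern would apply to the speed feature.

```python
import math

def multi_resolution_direction(vectors, bin_counts=(2, 4, 8)):
    """Concatenate direction histograms computed at several bin counts.

    vectors: list of (dx, dy) movement vectors of one trajectory.
    bin_counts: illustrative division numbers, coarse to fine.
    """
    feature = []
    for n in bin_counts:
        hist = [0] * n
        for dx, dy in vectors:
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[min(int(angle / (2 * math.pi) * n), n - 1)] += 1
        feature.extend(hist)
    return feature
```

Two movements that fall into different fine bins can still share a coarse bin, so the concatenated feature stays similar for roughly matching movements.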

The person motion detection device according to claim 4 is the device according to claim 3, wherein the temporal feature generation means further comprises smoothing means.

In this configuration, the smoothing means generates a plurality of trajectories by smoothing the feature point trajectories in the feature point trajectory information. The direction feature generation means and the speed feature generation means generate a direction feature and a speed feature, respectively, for each of the trajectories smoothed by the smoothing means.
As a result, the direction feature and the speed feature contain multiple features, from a faithfully reproduced trajectory to a roughly reproduced one.
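The description of the drawings mentions a Haar filter applied in two stages. The sketch below is a guess at what such smoothing might look like. It assumes each stage averages consecutive pairs of points (the Haar low-pass component) without downsampling, which the text does not confirm.

```python
def haar_smooth(points):
    """One smoothing stage: average each pair of consecutive points."""
    return [((x0 + x1) / 2, (y0 + y1) / 2)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def smoothed_trajectories(points, stages=2):
    """Return the original trajectory followed by progressively smoothed versions."""
    result = [list(points)]
    for _ in range(stages):
        result.append(haar_smooth(result[-1]))
    return result
```

Direction and speed features would then be computed for every trajectory in the returned list and combined, so that both the exact and the roughly reproduced movement contribute.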

The person motion detection device according to claim 5 is the device according to any one of claims 1 to 4, wherein the motion identification means comprises weighted distribution generation means and classification means.

In this configuration, the weighted distribution generation means regards each trajectory feature whose trajectory end point lies within the predetermined time interval as a word and the set of words within that time span as a document. It calculates the importance of the trajectory features generated by the feature extraction means with the tf-idf method and generates the cluster distribution by weighting the frequency of the cluster to which each trajectory feature belongs.
The classification means determines similarity from the distance between the cluster distribution generated by the weighted distribution generation means and the per-motion cluster distributions stored as learning data in the learning data storage means, and classifies the person's motion accordingly. The Euclidean distance, for example, is used as this distance.

Because the trajectory feature has a fixed length, the person motion detection device can calculate its importance with the tf-idf method used in the “Bag-of-words” method. This lowers the importance of trajectory features on the background region, which occur frequently in the video, and raises the importance of trajectory features of a person, which occur at specific times.
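The weighting can be illustrated with the common textbook form of tf-idf. The exact formula used by the device is not given here, so the definition below (term frequency times the logarithm of the inverse document frequency) is an assumption, as are the names.

```python
import math

def tfidf_weighted(hist, document_hists):
    """Weight one interval's cluster histogram by tf-idf.

    hist: cluster counts for the interval of interest (the "document").
    document_hists: cluster counts of all intervals, used for the idf term.
    """
    n_docs = len(document_hists)
    total = sum(hist)
    weighted = []
    for k, count in enumerate(hist):
        # Number of intervals in which cluster k occurs at all.
        df = sum(1 for doc in document_hists if doc[k] > 0)
        tf = count / total if total else 0.0
        idf = math.log(n_docs / df) if df else 0.0
        weighted.append(tf * idf)
    return weighted
```

A cluster that appears in every interval, as background trajectories tend to, receives an idf of zero, while a cluster that appears only in a few intervals keeps a positive weight.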

The person motion detection program according to claim 6 causes a computer to function as feature point trajectory information generation means, temporal feature generation means, and motion identification means in order to detect the motion of a person from video of the person.

In this configuration, the feature point trajectory information generation means detects feature points in each frame image of the video and matches the feature values of the feature points between frame images, thereby generating, as feature point trajectory information, trajectories that track the positions of the feature points in the time direction. The temporal feature generation means uses the feature point positions contained in that information to generate a temporal feature by accumulating the direction and magnitude of the movement vector of each feature point between frame images into range widths obtained by dividing the possible range of the direction and of the magnitude into a predetermined number of parts. This temporal feature serves as the trajectory feature, that is, the feature of the feature point's trajectory.

The motion identification means then generates a distribution that accumulates the clusters to which the trajectory features belong, using the trajectory features whose trajectory end points lie within a predetermined time interval, and identifies the person's motion according to whether this distribution is similar to the per-motion cluster distributions stored in the learning data storage means. The learning data storage means stores in advance, as learning data, distributions obtained by clustering a plurality of trajectory features into a predetermined number of clusters and accumulating, by cluster, the trajectory features that make up each known motion, with each distribution associated with its known motion.

The present invention provides the following advantageous effects.
According to the inventions of claims 1 and 6, a feature of a person's motion can be represented from video as a fixed-length trajectory feature regardless of the duration of the motion, so the feature of the motion can be extracted accurately without being cut off partway through the motion. Using such accurate features, the present invention can detect a person's motion from video with high accuracy.

Also according to the inventions of claims 1 and 6, because the trajectory feature can be represented with a fixed length, person motion detection by the “Bag-of-words” method, which regards a trajectory feature as a word, becomes possible. By clustering for each motion whose trajectory end point lies within the predetermined time interval, the present invention can detect each person's motion individually at the time the motion completes, even when several people are present in the video.

According to the invention of claim 2, a spatial feature can be taken into account in addition to the temporal feature when detecting a person's motion. Because appearance as well as movement then serves as a factor in the decision, the invention can distinguish, for example, whether a given movement was caused by noise or by the movement of a person's hand.

According to the invention of claim 3, the direction feature and the speed feature are each classified into multiple distributions, from coarse to fine. A movement can therefore be identified even when it is only a rough match and does not exactly reproduce the motion in the learning data.

According to the invention of claim 4, smoothing the trajectories absorbs the differences in movement that arise from individual variation when different people perform the same motion, so such movements can be judged to be the same and a person's movement can be detected robustly.

According to the invention of claim 5, the fixed-length trajectory feature makes it possible to use the tf-idf method, which raises the importance of the trajectories of a person's motion in the video and lowers the importance of trajectories in the background region. The present invention can thereby detect a person's motion robustly.

FIG. 1 is a block diagram showing the overall configuration of the person motion detection device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of the feature point trajectory information generated by the feature point trajectory information generation means of the person motion detection device according to the embodiment.
FIG. 3 is a schematic diagram of the smoothing of feature point trajectories performed by the smoothing means of the person motion detection device according to the embodiment, in which (a) shows a Haar filter applied in two stages and (b) shows how a feature point trajectory is smoothed.
FIG. 4 is a diagram showing the direction feature (direction feature histogram) generated by the direction feature generation means of the person motion detection device according to the embodiment.
FIG. 5 is a diagram showing the speed feature (speed feature histogram) generated by the speed feature generation means of the person motion detection device according to the embodiment.
FIG. 6 is an explanatory diagram of the codebook generation method used by the codebook generation means of the person motion detection device according to the embodiment.
FIG. 7 is an explanatory diagram of the histogram generation method used by the histogram generation means of the person motion detection device according to the embodiment.
FIG. 8 is a flowchart showing the operation of the learning phase (codebook generation) of the person motion detection device according to the embodiment.
FIG. 9 is a flowchart showing the operation of the learning phase (histogram generation) of the person motion detection device according to the embodiment.
FIG. 10 is a flowchart showing the operation of the motion detection phase of the person motion detection device according to the embodiment.

Embodiments of the present invention are described below with reference to the drawings.
[Configuration of the person motion detection device]
First, the configuration of the person motion detection device according to an embodiment of the present invention is described with reference to FIG. 1. The person motion detection device 1 detects, from video captured by a camera (not shown), the motion of a person appearing in that video. Here, the person motion detection device 1 comprises feature point trajectory information generation means 10, feature extraction means 20, motion identification means 30, and learning data storage means 40.

The feature point trajectory information generation means 10 detects, for each frame (frame image) of the input video, points that characterize the frame image (feature points), and tracks them in the time direction to generate feature point trajectory information in which the position information (coordinates) of each feature point is linked in the time direction.
Here, the feature point trajectory information generation means 10 includes foreground region extraction means 11, feature point detection means 12, and feature point tracking means 13.

The foreground region extraction means 11 extracts, for each frame of the input video, a region containing motion as a foreground region. It outputs information that separates the extracted foreground region from the remaining background region (for example, a binary image) to the feature point detection means 12.

The foreground region extraction means 11 can extract the foreground region by ordinary background subtraction. For example, if the video is captured by a fixed camera, an image containing no person is captured in advance as a background image, and the difference between this background image and each input frame (frame image) is taken; regions where a difference appears are extracted as the foreground region.

Alternatively, the foreground region extraction means 11 may, for example, calculate the mean and variance of the pixel value (or luminance value) of each pixel of the frame image over a predetermined number of frames, and treat pixels whose variation in pixel value exceeds a predetermined threshold as pixels of the foreground region.
By extracting regions containing motion as the foreground region in this way, the foreground region extraction means 11 can extract mainly the regions in which a person has moved.
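The two foreground extraction variants described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names, the grayscale NumPy array representation, and the threshold values are all assumptions made for the example.

```python
import numpy as np

def foreground_by_background_difference(frame, background, diff_threshold=30):
    """Return a binary mask that is True where the frame differs from a
    pre-captured background image (fixed-camera case).
    diff_threshold is a hypothetical value, not taken from the source."""
    # Cast to a signed type so that uint8 subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    return diff > diff_threshold

def foreground_by_temporal_variance(frames, var_threshold=100.0):
    """Return a binary mask that is True where the per-pixel variance over a
    stack of frames exceeds a threshold. var_threshold is hypothetical."""
    stack = np.asarray(frames, dtype=np.float64)  # shape (num_frames, H, W)
    return stack.var(axis=0) > var_threshold
```

Either mask can serve as the binary image that separates the foreground region from the background region.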

The feature point detection means 12 detects, for each frame of the input video, points that characterize the frame image (feature points). For example, the feature point detection means 12 detects feature points from changes in pixel value or luminance value relative to adjacent pixels. It outputs the position (coordinates) of each feature point detected in each frame image to the feature point tracking means 13. Here, the feature point detection means 12 does not output a detected feature point to the feature point tracking means 13 if that feature point is not contained in the foreground region extracted by the foreground region extraction means 11. This prevents the feature point tracking means 13 from spending tracking computation on background feature points that are unrelated to the movement of the person.

An ordinary method can be used for feature point detection in the feature point detection means 12. For example, the feature point detection means 12 detects feature points by applying corner detection, typified by the Harris operator, to the input frame image.
The Harris operator detects feature points based on the correlation of the image signal; its correlation output value becomes large at feature points such as edges and corners in the image.

The Harris operator first smooths the input image (frame image) with a Gaussian operator. Then, within a square window W of predetermined size on the image, it calculates the matrix A of equation (1) for each coordinate (x, y), using the gradients I_u(x, y) and I_v(x, y) of the luminance value I(x, y). Here, the gradients I_u(x, y) and I_v(x, y) are the partial derivatives of the luminance value I(x, y) with respect to x and y, respectively.

Then, as in equation (2), the Harris operator obtains the smaller of the eigenvalues λ_1 and λ_2 of the matrix A calculated by equation (1) as the feature quantity H_xy.

Because computing the eigenvalues exactly is computationally expensive, equation (3) may be used in place of equation (2); equation (3) computes the feature quantity from the determinant (det A) and the trace (tr A) of the matrix A. Here, κ is a predetermined constant, for example a constant in the range of 0.04 to 0.15 recommended by Harris et al. in the reference paper (reference paper: Harris, C., Stephens, M.: A Combined Corner and Edge Detector. Proceedings of the 4th Alvey Vision Conference. Manchester, U.K. (1988) 147-151).
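Equations (1) to (3) are not reproduced in this text. Assuming the standard Harris formulation and using only the symbols defined above, they would take roughly the following form; the exact notation in the original publication may differ.

```latex
A = \sum_{(x,y) \in W}
\begin{pmatrix}
I_u(x,y)^2 & I_u(x,y)\, I_v(x,y) \\
I_u(x,y)\, I_v(x,y) & I_v(x,y)^2
\end{pmatrix}
\tag{1}

H_{xy} = \min(\lambda_1, \lambda_2)
\tag{2}

H_{xy} = \det A - \kappa \, (\operatorname{tr} A)^2
\tag{3}
```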

The larger the feature quantity H_xy calculated in this way, the more strongly the pixel exhibits a feature such as an edge or a corner. The feature point detection means 12 therefore determines the pixel at coordinates (x, y) to be a feature point when the feature quantity H_xy is larger than a predetermined threshold.
In this way, the feature point detection means 12 detects feature points in each frame image and outputs only the feature points inside the foreground region extracted by the foreground region extraction means 11 to the feature point tracking means 13.
Besides the Harris operator, the feature point detection means 12 may use other ordinary feature detection methods such as SIFT (Scale Invariant Feature Transform) or SURF (Speeded Up Robust Features).
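As a rough illustration of the detection step, the sketch below computes the determinant-and-trace form of the Harris feature quantity and keeps only points above a threshold that also lie in the foreground mask. It assumes the standard Harris response det A − κ (tr A)², omits the initial Gaussian smoothing for brevity, and uses a plain box window for W; all function and parameter names are hypothetical.

```python
import numpy as np

def box_sum(img, radius):
    """Sum img over a (2*radius+1) x (2*radius+1) square window W around each pixel
    (zero padding at the image border)."""
    padded = np.pad(img, radius, mode="constant")
    size = 2 * radius + 1
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def harris_response(image, radius=1, kappa=0.04):
    """Feature quantity H_xy = det(A) - kappa * tr(A)^2 for every pixel.
    Gaussian pre-smoothing is omitted in this sketch."""
    img = image.astype(np.float64)
    # np.gradient returns the derivative along axis 0 (y) first, then axis 1 (x).
    i_v, i_u = np.gradient(img)
    a11 = box_sum(i_u * i_u, radius)
    a12 = box_sum(i_u * i_v, radius)
    a22 = box_sum(i_v * i_v, radius)
    det = a11 * a22 - a12 * a12
    trace = a11 + a22
    return det - kappa * trace ** 2

def detect_feature_points(image, foreground_mask, threshold, radius=1, kappa=0.04):
    """Return (x, y) coordinates whose response exceeds the threshold and
    that fall inside the foreground region."""
    h = harris_response(image, radius, kappa)
    ys, xs = np.nonzero((h > threshold) & foreground_mask)
    return list(zip(xs.tolist(), ys.tolist()))
```

With this response, a corner yields a positive value, a straight edge a negative value, and a flat region zero, so a positive threshold selects corner-like points.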

The feature point tracking means 13 tracks, frame by frame, the feature points detected by the feature point detection means 12. It tracks feature points in the time direction by matching feature points with similar feature quantities from frame to frame.

That is, for each frame image, the feature point tracking means 13 tracks a feature point as the same feature point when its feature quantity in a given frame image matches (agrees with or is similar to) the feature quantity of a feature point in the previous frame image, and ends tracking of that feature point when the feature quantities do not match. As a result, a feature point is tracked for as long as it continues to be matched in the time direction.

An ordinary method can be used for feature point tracking in the feature point tracking means 13. For example, the Lucas-Kanade method can be used.
The Lucas-Kanade method is a spatially local optimization method that assumes the optical flow is constant within a local region of the same object. The optical flow is a velocity vector that represents in which direction and by how much a feature point moves between consecutive images.

Here, let I(x, y, t) be the luminance value at coordinates (x, y) in a square window W of the frame image at time t, and let I(x, y, t + δt) be the luminance value at coordinates (x, y) in the square window W at time (t + δt). The optical flow (u, v) is then expressed by equation (4).
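Equation (4) is not reproduced in this text. The sketch below therefore shows the standard least-squares form of the Lucas-Kanade method, which may differ in notation from the original: under the brightness constancy assumption, the flow (u, v) in window W minimizes the sum of (I_x u + I_y v + I_t)² over the window. The function name, the window radius, and the use of NumPy's least-squares solver are assumptions for illustration.

```python
import numpy as np

def lucas_kanade_flow(prev_frame, next_frame, x, y, radius=2):
    """Estimate the optical flow (u, v) at pixel (x, y) from two consecutive
    frames, assuming the flow is constant inside the square window W.
    The window must lie fully inside the image."""
    prev = prev_frame.astype(np.float64)
    nxt = next_frame.astype(np.float64)
    # Spatial gradients of the first frame; axis 0 is y, axis 1 is x.
    i_y, i_x = np.gradient(prev)
    # Temporal gradient between the two frames.
    i_t = nxt - prev
    win = (slice(y - radius, y + radius + 1), slice(x - radius, x + radius + 1))
    # One linear equation I_x * u + I_y * v = -I_t per pixel in the window.
    a = np.stack([i_x[win].ravel(), i_y[win].ravel()], axis=1)
    b = -i_t[win].ravel()
    flow, *_ = np.linalg.lstsq(a, b, rcond=None)
    return flow  # array([u, v])
```

Because the method linearizes the image around each pixel, the estimate is only approximate and is most reliable for small displacements between frames.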

In this way, the feature point tracking means 13 matches feature points between frame images and tracks feature points with similar optical flows (u, v) as the trajectory of the same feature point. Whether two optical flows are similar can be determined from the distance between them (for example, the Euclidean distance).
Here, for each feature point trajectory, the feature point tracking means 13 generates feature point trajectory information by linking the coordinate positions of the feature point in the frame images in association with time information (for example, frame numbers). The feature point trajectory information generated by the feature point tracking means 13 is output to the feature quantity extraction means 20.

For example, as shown in FIG. 2, when a person performs a certain motion (here, bringing a mobile phone to the ear) across the frame images (a), (b), and (c) of the video input at times t_i, ..., t_j, ..., t_k, the feature point trajectory information generation means 10 successively detects a plurality of feature points in the frame images at times t_i, ..., t_j, ..., t_k. Then, at time t_k, when the trajectory of a feature point ends, it generates the trajectory of the feature point by linking the feature points detected in the frame images (a), (b), and (c), as shown in (d).

In FIG. 2, p_i, p_j, and p_k denote the positions of the feature point at times t_i, t_j, and t_k, respectively. To make the trajectory of a feature point easier to follow, FIG. 2 is drawn with a reduced number of feature points.
In this way, the feature point trajectory information generation means 10 generates feature point trajectory information by linking the coordinate positions of the tracked trajectory p_i, ..., p_j, ..., p_k.
Returning to FIG. 1, the description of the configuration of the human motion detection device 1 is continued.

The feature quantity extraction means 20 generates a feature quantity for each feature point trajectory (trajectory feature quantity) based on the feature point trajectory information generated by the feature point trajectory information generation means 10. For each feature point trajectory, the feature quantity extraction means 20 generates a multidimensional feature quantity in the time direction (temporal feature quantity) and a multidimensional feature quantity in the spatial direction within the frame image (spatial feature quantity) as a trajectory feature quantity of fixed length (fixed dimension). Here, the feature quantity extraction means 20 includes temporal feature quantity generation means 21 and spatial feature quantity generation means 22.

The temporal feature quantity generation means 21 generates a multidimensional feature quantity in the time direction (temporal feature quantity) from the trajectory of a feature point (the movement vector of the feature point in each frame image), based on the feature point positions contained in the feature point trajectory information generated by the feature point trajectory information generation means 10. That is, the temporal feature quantity generation means 21 generates the feature quantity in the time direction from the movement direction of the feature point (the direction [angle] of the movement vector) and its movement speed (the magnitude [length] of the movement vector), which characterize a person's motion in the time direction. This temporal feature quantity is a trajectory feature quantity that indicates the time-direction characteristics of the feature point trajectory.
Here, the temporal feature quantity generation means 21 includes smoothing means 211, direction feature quantity generation means 212, and speed feature quantity generation means 213.

The smoothing means 211 applies smoothing at a plurality of levels (smoothing levels) to the trajectory of a feature point. By smoothing the complex trajectory of a feature point at a plurality of smoothing levels, the smoothing means 211 generates a plurality of trajectories. This smoothing can be realized with an ordinary low-pass filter, typified by the Haar filter.

Representing the trajectory of a feature point at a plurality of smoothing levels in this way expresses the trajectory of a person's motion as one that approximates the general motion of a person, independent of individual idiosyncrasies. The unsmoothed trajectory, on the other hand, represents the trajectory of the person's motion exactly. The smoothing means 211 therefore generates trajectories at a plurality of levels, including the unsmoothed trajectory, and outputs them to the direction feature quantity generation means 212 and the speed feature quantity generation means 213.

Here, the processing by which the smoothing means 211 smooths a trajectory at a plurality of smoothing levels with the Haar filter will be described concretely with reference to the equations and FIG. 3.
The Haar filter is a filter expressed in discrete time (z domain) by the transfer function of equation (5).

Here, the x and y coordinates of feature point k after its trajectory has been smoothed in q stages (q: an integer of 0 or more) with the Haar filter of equation (5) are given by equation (6).

Further, if the trajectory of feature point k exists from frame number t_1 to frame number t_2, the x coordinate p^x_{k,q} of feature point k can be expressed by equation (7), and the Haar filter of equation (5) can be expressed by equation (8). The y coordinate p^y_{k,q} is handled in the same way as the x coordinate, so its equations are omitted.

Here, how the trajectory of a feature point is smoothed by the Haar filter will be described schematically with reference to FIG. 3. FIG. 3(a) shows an example in which the Haar filter of equation (5) is applied in two stages. That is, the smoothing means 211 applies the Haar filter to the trajectory of feature point k at smoothing level 0 (Level 0: q = 0) to generate the trajectory at smoothing level 1 (Level 1: q = 1), and then applies the Haar filter to the smoothing level 1 trajectory to generate the trajectory at smoothing level 2 (Level 2: q = 2).
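The staged smoothing can be sketched as follows. Equation (5) is not reproduced in this text, so the sketch assumes the common two-tap averaging form of the Haar low-pass filter, H(z) = (1 + z^-1) / 2, applied without downsampling; the actual filter in the embodiment may differ. The function names are hypothetical.

```python
import numpy as np

def haar_smooth(points):
    """Apply one pass of the two-tap averaging filter (1 + z^-1) / 2 to a
    trajectory given as an array of shape (T, 2) holding (x, y) per frame.
    The output has T - 1 points because the first sample has no predecessor."""
    p = np.asarray(points, dtype=np.float64)
    return 0.5 * (p[1:] + p[:-1])

def smoothing_levels(points, num_levels=3):
    """Return [Level 0 (unsmoothed), Level 1, Level 2, ...], where each level
    is obtained by filtering the previous one."""
    levels = [np.asarray(points, dtype=np.float64)]
    for _ in range(num_levels - 1):
        levels.append(haar_smooth(levels[-1]))
    return levels
```

With `num_levels=3` this yields the three trajectories (Level 0, Level 1, Level 2) used in the example of FIG. 3.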

As a result, as shown in FIG. 3(b), the smoothing means 211 generates, for feature point k, the Level 0 trajectory P_{k,0} (solid line in the figure), the Level 1 trajectory P_{k,1} (broken line in the figure), and the Level 2 trajectory P_{k,2} (dash-dot line in the figure), and outputs the coordinate positions of these trajectories to the direction feature quantity generation means 212 and the speed feature quantity generation means 213 as feature point trajectory information at different smoothing levels.
Returning to FIG. 1, the description of the configuration of the human motion detection device 1 is continued.

The direction feature quantity generation means 212 generates a feature quantity of fixed dimension (fixed length) for the direction in which a feature point moves (direction feature quantity), based on the feature point positions contained in the feature point trajectory information smoothed in multiple stages by the smoothing means 211. This direction feature quantity is one component of the temporal feature quantity.
For the trajectory at each smoothing level generated by the smoothing means 211, the direction feature quantity generation means 212 generates the direction feature quantity by accumulating (forming a histogram of) the angle at which the feature point moves on the frame image (the direction of the movement vector) for each fixed angular width.
That is, with θ as the bin width (angular width) of the histogram, the direction feature quantity generation means 212 accumulates the angles at which the feature point moves for each of [0, θ), [θ, 2θ), ..., [2π − θ, 2π). Here, [a, b) denotes a range of at least a and less than b.
In doing so, the direction feature quantity generation means 212 generates a plurality of histograms with different bin widths (angular widths).

Specifically, the direction feature quantity generation means 212 sets bin widths that divide the angle range 0 to 2π into 4, 8, and 16 parts, and generates a histogram of the trajectory at each smoothing level for each bin width. For example, when generating the histogram with bin width π/2, which divides the angle range 0 to 2π into 4 parts, the angles are accumulated for each of [0, π/2), [π/2, π), [π, 3π/2), and [3π/2, 2π).

For example, FIG. 4 shows histograms of the angles at which a feature point moves, computed with three different bin widths for each of the feature point trajectories smoothed at the three smoothing levels described with FIG. 3. As shown in FIG. 4, the direction feature quantity generation means 212 generates histograms with bin widths π/2 (4 bins), π/4 (8 bins), and π/8 (16 bins) for each smoothing level (here, 3 levels), and thereby generates a fixed feature quantity of 84 dimensions (number of bins (4 + 8 + 16) × number of smoothing levels (3)) (direction feature quantity: direction feature quantity histogram).
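The 84-dimensional direction feature quantity can be sketched as follows, assuming each smoothing level is given as an array of (x, y) positions. The function name and the use of `np.histogram` are illustrative choices; how zero-length movement vectors are handled is not stated in the source, and in this sketch they fall into the first bin because `arctan2(0, 0)` is 0.

```python
import numpy as np

def direction_feature(levels, bin_counts=(4, 8, 16)):
    """Concatenate direction histograms over all smoothing levels.
    levels: list of trajectories, each of shape (T, 2) with (x, y) per frame.
    Returns a vector of len(levels) * sum(bin_counts) counts
    (3 * 28 = 84 for three levels)."""
    feature = []
    for traj in levels:
        traj = np.asarray(traj, dtype=np.float64)
        moves = np.diff(traj, axis=0)  # movement vector per frame step
        # Direction of each movement vector, mapped into [0, 2*pi).
        angles = np.arctan2(moves[:, 1], moves[:, 0]) % (2 * np.pi)
        for bins in bin_counts:
            hist, _ = np.histogram(angles, bins=bins, range=(0.0, 2 * np.pi))
            feature.append(hist)
    return np.concatenate(feature)
```

The length of the result depends only on the number of levels and bins, not on the number of frames in the trajectory, which is what makes the feature quantity fixed-dimensional.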

The speed feature quantity generation means 213 generates a feature quantity of fixed dimension (fixed length) for the speed at which a feature point moves (speed feature quantity), based on the feature point positions contained in the feature point trajectory information smoothed in multiple stages by the smoothing means 211. Because the trajectory of a feature point is tracked frame image by frame image, the length of the movement vector of the feature point on the frame image can be used as its speed. Here, speed feature quantities are generated separately from the horizontal length and the vertical length of the movement vector. This speed feature quantity is one component of the temporal feature quantity.

For the trajectory at each smoothing level generated by the smoothing means 211, the speed feature quantity generation means 213 generates the speed feature quantity by accumulating (forming a histogram of) the speed at which the feature point moves on the frame image (the magnitude of the movement vector [horizontal length, vertical length]) for each fixed speed width.

Like the direction feature quantity generation means 212, the speed feature quantity generation means 213 generates a plurality of histograms with different bin widths.
Specifically, when generating the feature quantity for the horizontal speed, for example, the speed feature quantity generation means 213 takes, based on the feature point trajectory information, the slowest horizontal speed, that is, the speed (length) at which the horizontal length of the movement vector is shortest, as the minimum value v_s of the histogram. Likewise, it takes the fastest horizontal speed, that is, the speed (length) at which the horizontal length of the movement vector is longest, as the maximum value v_f of the histogram.

The speed feature quantity generation means 213 then sets bin widths that divide the speed range v_s to v_f into 4, 8, and 16 parts, and generates a histogram of the trajectory at each smoothing level for each bin width. For example, when generating the histogram with bin width (v_f − v_s)/4, which divides the speed range v_s to v_f into 4 parts, the speeds are accumulated for each of [v_s, v_s + (v_f − v_s)/4), [v_s + (v_f − v_s)/4, v_s + (v_f − v_s)/2), [v_s + (v_f − v_s)/2, v_s + 3 × (v_f − v_s)/4), and [v_s + 3 × (v_f − v_s)/4, v_f]. Here, [a, b) denotes a range of at least a and less than b, and [a, b] denotes a range of at least a and at most b.
The speed feature quantity generation means 213 also generates histograms for the vertical speed in the same way as for the horizontal speed.

For example, FIG. 5 shows histograms of the speed at which a feature point moves, computed with three different bin widths for each of the feature point trajectories smoothed at the three smoothing levels described with FIG. 3.
As shown in FIG. 5, the speed feature quantity generation means 213 generates a fixed 84-dimensional feature quantity for each of the horizontal and vertical speeds, in the same way as the direction feature quantity generation means 212. That is, the speed feature quantity generation means 213 generates a fixed-dimension feature quantity of 168 dimensions (84 × 2) as the horizontal and vertical speed feature quantity (speed feature quantity: speed feature quantity histogram).
In this way, the speed feature quantity generation means 213 can generate a speed feature quantity of fixed dimension (fixed length) regardless of the time length of the feature point trajectory.
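The 168-dimensional speed feature quantity can be sketched in the same style. The source does not state over which set of movement vectors v_s and v_f are taken; this sketch assumes they are the minimum and maximum speed within each smoothing-level trajectory. The function name and the zero-filled histogram for a trajectory with no movement vectors are likewise assumptions.

```python
import numpy as np

def speed_feature(levels, bin_counts=(4, 8, 16)):
    """Concatenate horizontal and vertical speed histograms over all levels.
    levels: list of trajectories, each of shape (T, 2) with (x, y) per frame.
    Returns 2 * len(levels) * sum(bin_counts) counts (168 for three levels)."""
    feature = []
    for axis in (0, 1):  # 0: horizontal component, 1: vertical component
        for traj in levels:
            traj = np.asarray(traj, dtype=np.float64)
            # Per-frame length of the movement vector along this axis.
            speeds = np.abs(np.diff(traj[:, axis]))
            for bins in bin_counts:
                if speeds.size == 0:
                    # No movement vectors at this level: emit an empty histogram.
                    feature.append(np.zeros(bins, dtype=np.int64))
                    continue
                # Bins span [v_s, v_f]; np.histogram closes the last bin on the right.
                hist, _ = np.histogram(
                    speeds, bins=bins, range=(speeds.min(), speeds.max())
                )
                feature.append(hist)
    return np.concatenate(feature)
```

`np.histogram` treats every bin as half-open except the last, which is closed, matching the [a, b) and [a, b] ranges described above.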

The spatial feature quantity generation means 22 generates a multidimensional feature quantity in the spatial direction (spatial feature quantity) from the trajectory of a feature point, based on the feature point positions contained in the feature point trajectory information generated by the feature point trajectory information generation means 10. That is, the spatial feature quantity generation means 22 generates the feature quantity of a feature point on the frame image as an appearance feature. This spatial feature quantity is a trajectory feature quantity that indicates the spatial-direction characteristics of the feature point trajectory.

The spatial feature quantity generation means 22 generates the feature quantity of a feature point on the frame image, and can generate a fixed-length feature quantity using an ordinary feature representation. For example, SURF (Speeded Up Robust Features) features or SIFT (Scale-Invariant Feature Transform) features can be used as this feature quantity.

When SURF features are used as this feature quantity, the spatial feature quantity generation means 22 obtains the most dominant luminance gradient direction (dominant rotation) at the feature point using Haar wavelets. Then, taking this dominant direction as the reference, the spatial feature quantity generation means 22 calculates four values in each of 16 predetermined blocks in the neighborhood of the feature point: the sums of the luminance gradient direction (horizontal Σdx, vertical Σdy) and the sums of the luminance gradient magnitude (horizontal Σ|dx|, vertical Σ|dy|).
That is, the spatial feature quantity generation means 22 calculates a 64-dimensional (16 × 4) feature quantity as the SURF feature for each feature point.
When SIFT features are used as the feature quantity, the spatial feature quantity generation means 22 calculates a 128-dimensional feature quantity for each feature point.

Here, the spatial feature quantity generation means 22 extracts SURF features (or SIFT features) from the corresponding frame image at every feature point on a trajectory and averages them over the trajectory to generate the spatial feature quantity of that feature point. Because SURF features (SIFT features) are appearance features, it is not strictly necessary to compute them for every point along the time direction of the trajectory. For example, the spatial feature quantity generation means 22 may generate the feature quantity only at a representative point of the trajectory, such as its start point, end point, or midpoint.

The spatial feature quantity generation means 22 generates the trajectory feature quantity by appending the spatial feature quantity to the temporal feature quantity (direction feature quantity and speed feature quantity) generated by the temporal feature quantity generation means 21, and outputs it to the motion identification means 30.

In this way, the feature quantity extraction means 20 generates (extracts), for each feature point trajectory, a fixed-dimension (fixed-length) trajectory feature quantity even when the duration of the person's motion varies. It consists of the fixed-dimension temporal feature quantity generated by the temporal feature quantity generation means 21 (an 84-dimensional direction feature quantity and a 168-dimensional speed feature quantity in this embodiment) and the fixed-dimension spatial feature quantity generated by the spatial feature quantity generation means 22 (64 dimensions in this embodiment when SURF features are used).
Here, for each feature point trajectory, the feature quantity extraction means 20 outputs the fixed-dimension trajectory feature quantity (temporal and spatial feature quantities) to the motion identification means 30 together with the end time of the trajectory, that is, the time at which the person's motion was completed (for example, the final frame number of the trajectory).
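As a rough illustration of the fixed-length output described above, the following Python sketch concatenates the three parts and attaches the end frame. The function name and the dictionary layout are hypothetical; the total of 316 dimensions is simply 84 + 168 + 64 and is not a figure stated in the text.

```python
# Dimensions given for this embodiment (SURF case).
DIRECTION_DIMS = 84
SPEED_DIMS = 168
SPATIAL_DIMS = 64


def build_trajectory_feature(direction, speed, spatial, end_frame):
    """Concatenate the parts into one fixed-length vector (hypothetical layout)."""
    expected = (DIRECTION_DIMS, SPEED_DIMS, SPATIAL_DIMS)
    if (len(direction), len(speed), len(spatial)) != expected:
        raise ValueError("each part must have its fixed dimension")
    vector = list(direction) + list(speed) + list(spatial)
    # The end frame travels with the vector so later stages can group by time.
    return {"vector": vector, "end_frame": end_frame}
```

Whatever the trajectory length, the resulting vector has the same size, which is what allows it to be treated as a single word later on.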

The motion identification means 30 refers to the learning data stored in the learning data storage means 40, described later, and identifies the person's motion from the multidimensional (fixed-dimension) trajectory feature quantities, extracted by the feature quantity extraction means 20, whose trajectory end points fall within a predetermined time interval. The motion identification means 30 can use general techniques, such as a method based on If-Then rules that sequentially test whether the input approximates trajectory feature quantities obtained in advance for each motion, or a machine learning method based on a support vector machine (SVM). Here, the motion identification means 30 treats each multidimensional trajectory feature quantity as one word (hereinafter also called a trajectory word) and identifies the motion using the bag-of-words technique.

The motion identification means 30 includes learning means 31 and motion determination means 32. An operation mode is set through input means (not shown), and the motion identification means 30 operates in one of two phases: a "learning phase", in which learning data is learned, and a "motion detection phase", in which a person's motion is detected from video. The learning means 31 operates in the learning phase, and the motion determination means 32 operates in the motion detection phase.

The learning means 31 learns the distribution of trajectory feature quantities for each motion from trajectory feature quantities that the feature quantity extraction means 20 extracted from video of a person performing motions in advance. Here, the learning means 31 includes codebook generation means 311 and histogram generation means 312.

The codebook generation means 311 receives trajectory feature quantities (trajectory words) extracted by the feature quantity extraction means 20 from video of various motions, and generates a codebook by clustering the trajectory words into a predetermined number (k) of clusters.
The codebook is a word dictionary in which the trajectory words are classified, on the basis of their (multidimensional) feature quantities, into a predetermined number k (for example, 1000) of clusters.
The clustering in the codebook generation means 311 can be performed with, for example, the K-means method.
The codebook generation means 311 writes the codebook, consisting of the trajectory words classified into k clusters, to the learning data storage means 40.
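The codebook construction can be sketched with a plain K-means loop in Python. This is a minimal sketch, not the patented implementation: the random initialisation, the fixed iteration count, and all function names are assumptions, and the codebook is represented here only by its cluster centroids.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_cluster(word, centroids):
    """Index of the centroid closest to the word (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(word, centroids[c]))


def build_codebook(words, k, iterations=20, seed=0):
    """Cluster trajectory words into k clusters and return the k centroids."""
    rng = random.Random(seed)
    centroids = [list(w) for w in rng.sample(words, k)]  # assumed initialisation
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for w in words:
            members[nearest_cluster(w, centroids)].append(w)
        for c, group in enumerate(members):
            if group:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    return centroids
```

In the embodiment k would be on the order of 1000 and each word would be a full trajectory feature vector; the two-dimensional points used to exercise this sketch are only for illustration.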

The video that the codebook generation means 311 uses to generate the codebook is not particularly limited. For example, when the person motion detection device 1 detects a person's motion with a fixed camera, it is video captured over several days by a camera installed at a predetermined position.
In addition, for the trajectory words whose trajectories end within a sequence of predetermined length (predetermined time interval; for example, 1 second, equivalent to 25 frames), the codebook generation means 311 treats the sequence as one document and writes the trajectory words contained in that document, together with their clusters, to the learning data storage means 40 document by document. These documents are used when the weighted histogram generation means 321 of the motion determination means 32, described later, calculates the importance of trajectory words.
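The grouping of trajectory words into documents can be sketched as follows. The sketch assumes that sequences are consecutive, non-overlapping windows of 25 frames and that each word is already represented by its cluster index; the text does not state whether windows may overlap, so this is only one possible reading.

```python
from collections import defaultdict

FRAMES_PER_SEQUENCE = 25  # 1 second in the example given in the text


def group_into_documents(words, frames_per_sequence=FRAMES_PER_SEQUENCE):
    """Map (cluster_id, end_frame) pairs to documents keyed by sequence index."""
    documents = defaultdict(list)
    for cluster_id, end_frame in words:
        # A word belongs to the sequence in which its trajectory ends.
        documents[end_frame // frames_per_sequence].append(cluster_id)
    return dict(documents)
```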

The histogram generation means 312 receives a plurality of trajectory feature quantities (trajectory words) extracted by the feature quantity extraction means 20 from video of a predetermined motion, and generates the distribution (histogram) of the appearance frequency of trajectory words in that motion.
For a motion known in advance, the histogram generation means 312 assigns each trajectory word to the nearest (in Euclidean distance) of the k clusters of the codebook generated by the codebook generation means 311, and generates a histogram with k bins.

The histogram generation means 312 normalizes the histogram. That is, it normalizes the frequency of each cluster so that the frequencies accumulated per cluster sum to 1.0. As a result, one motion can be represented on the same basis regardless of the number of trajectories, which makes motion detection easier and more robust.
The histogram generation means 312 then writes the histogram created for the known motion to the learning data storage means 40 in association with that motion.
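The assignment and normalization steps above can be sketched in Python as follows. The function names are hypothetical, and the codebook is assumed to be available as a list of cluster centroids.

```python
def nearest_cluster(word, centroids):
    """Index of the centroid with the smallest Euclidean distance to the word."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((x - y) ** 2 for x, y in zip(word, centroids[c])),
    )


def normalized_histogram(words, centroids):
    """Count words per cluster, then scale so the k bins sum to 1.0."""
    counts = [0] * len(centroids)
    for w in words:
        counts[nearest_cluster(w, centroids)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)  # no trajectories: all-zero histogram (assumption)
    return [c / total for c in counts]
```

Because of the normalization, a motion observed with 40 trajectories and the same motion observed with 400 trajectories produce comparable histograms.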

With reference to FIGS. 6 and 7 (and FIG. 1 as appropriate), the learning performed by the learning means 31 in the learning phase is now described schematically. Trajectory words are actually multidimensional feature quantities, but FIGS. 6 and 7 depict them schematically as trajectory shapes.
First, as shown in FIG. 6(a), the learning means 31 receives the multidimensional trajectory feature quantities (trajectory words W_1, W_2, ..., W_n) extracted by the feature quantity extraction means 20 and writes them to the learning data storage means 40. Then, as shown in FIG. 6(b), the learning means 31 uses the codebook generation means 311 to classify the trajectory words W_1, W_2, ..., W_n into k clusters (C_1, C_2, ..., C_k) on the basis of their feature quantities, for example by the K-means method. In this way, the codebook generation means 311 generates a codebook CB, which is a dictionary of trajectory words classified into k clusters.

Next, as shown in FIG. 7(a), the learning means 31 receives the multidimensional trajectory feature quantities (trajectory words w_1, w_2, ..., w_n) extracted by the feature quantity extraction means 20 for a known motion. Using the histogram generation means 312, it determines which cluster (C_1, C_2, ..., C_k) of the codebook CB shown in FIG. 6(b) each of the trajectory words w_1, w_2, ..., w_n belongs to, counts the number of words (frequency) in each cluster, and generates a histogram H as shown in FIG. 7(b). The frequencies of the clusters in histogram H are normalized so that they sum to 1.0.
In this way, the learning means 31 generates learning data by having the histogram generation means 312 generate a histogram H for each known motion.
Returning to FIG. 1, the description of the configuration of the person motion detection device 1 continues.

The motion determination means 32 refers to the learning data stored in the learning data storage means 40 and determines the person's motion from the multidimensional (fixed-dimension) trajectory feature quantities (trajectory words) extracted by the feature quantity extraction means 20.
The motion determination means 32 determines the motion, with reference to the learning data stored in the learning data storage means 40, from the trajectory words whose trajectory end points fall within a sequence of predetermined length (predetermined time interval; for example, 1 second, equivalent to 25 frames). The trajectory words whose trajectories end within the predetermined time interval thus represent the features of a series of completed motions. The length of the sequence can be set arbitrarily.
Here, the motion determination means 32 includes weighted histogram generation means 321 and classification means 322.

The weighted histogram generation means (weighted distribution generation means) 321 receives the trajectory feature quantities (trajectory words) within one sequence extracted by the feature quantity extraction means 20 and generates the distribution (histogram) of the appearance frequency of trajectory words in that sequence. The weighted histogram generation means 321 weights the appearance frequencies of the histogram according to the importance of the trajectory words.

That is, the weighted histogram generation means 321 assigns each trajectory word in one sequence to the nearest (in Euclidean distance) of the k clusters of the codebook stored in the learning data storage means 40, and generates a histogram with k bins.
The weighted histogram generation means 321 also treats the trajectory words in one sequence as one document, calculates the importance of the trajectory words across all documents using the tf-idf method, and weights the histogram (cluster distribution) by multiplying the appearance frequency of the cluster to which each trajectory word belongs by that importance. Here, "all documents" refers to the documents that the learning means 31 collected in advance, during the learning phase, from multiple videos of various motions.

The weighted histogram generation means 321 normalizes the frequency of each cluster so that the frequencies accumulated per cluster sum to 1.0. This allows comparison with the learning data stored in the learning data storage means 40 on the same basis.
The distribution (histogram) of trajectory word appearance frequencies generated in this way is output to the classification means 322.

The method by which the weighted histogram generation means 321 calculates importance with the tf-idf method is now described concretely using equations.
For the trajectory words whose trajectory end points fall within a sequence of predetermined length (for example, 1 second), the weighted histogram generation means 321 calculates the importance of each cluster to which those trajectory words belong.
That is, the weighted histogram generation means 321 calculates the importance w_xd, within document d, of the cluster x to which a trajectory word belongs as the product of the tf_xd value and the idf_x value, as given by equation (9).
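The image for equation (9) is not reproduced in this text. From the sentence above, it can be reconstructed as follows; this is a reconstruction from the description, not a copy of the original figure.

```latex
w_{xd} = tf_{xd} \cdot idf_{x} \tag{9}
```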

The idf_x value in equation (9) is the logarithm of the reciprocal of the frequency of documents containing cluster x among all documents, and is given by equation (10).
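The image for equation (10) is likewise missing. Based on the description (the logarithm of the reciprocal of the document frequency n_x / N), it presumably has the standard form below. The base of the logarithm is not stated in the text.

```latex
idf_{x} = \log \frac{N}{n_{x}} \tag{10}
```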

Here, N is the total number of documents and n_x is the number of documents, among all documents, that contain cluster x. The idf_x value thus decreases as the frequency of documents containing cluster x increases.
The tf_xd value in equation (9) is the frequency of cluster x in a given document d, and is given by equation (11).
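The image for equation (11) is also missing. Using the symbols OC_xd, OC_id, and W defined in the next paragraph, it presumably reads as follows; again this is a reconstruction from the surrounding definitions.

```latex
tf_{xd} = \frac{OC_{xd}}{\sum_{i \in W} OC_{id}} \tag{11}
```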

Here, OC_xd is the number of occurrences of cluster x in a given document d, and W is the set of trajectory words in document d. OC_id is the number of occurrences of trajectory word i (cluster) in that set.
Because the weighted histogram generation means 321 calculates the importance of the cluster to which each trajectory word belongs with the tf-idf method when generating the histogram, it can lower the importance of trajectory words on the background region, which occur frequently, and raise the importance of trajectory words that occur frequently in a specific sequence. Although the feature point trajectory information generation means 10 extracts trajectories from foreground feature points, feature points in the background may also be tracked because of lighting, noise, and similar causes. In that case, lowering the importance of trajectory words on the background region lets the weighted histogram generation means 321 generate a histogram that represents the person's motion more appropriately.
Returning to FIG. 1, the description of the configuration of the person motion detection device 1 continues.

The classification means 322 judges similarity from the distance between the distribution (histogram) of trajectory word appearance frequencies for a sequence, generated by the weighted histogram generation means 321, and the per-motion distributions (histograms) of the learning data stored in the learning data storage means 40, and classifies the person's motion in that sequence into one of the predetermined motions.

That is, the classification means 322 takes the learning data histogram (cluster distribution) whose distance, for example Euclidean distance, to the histogram (cluster distribution) of the input sequence is smallest to be the similar motion, and classifies the motion corresponding to that learning data histogram as the person's motion in the sequence.
This classification result is output as the person motion detection result of the person motion detection device 1.
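The nearest-histogram classification can be sketched in a few lines of Python. The motion labels used to exercise it ("walk", "wave") are hypothetical examples, and the learning data is assumed to be a dictionary that maps each motion label to its normalized histogram.

```python
import math


def classify(histogram, learned):
    """Return the motion whose learned histogram is closest in Euclidean distance."""
    return min(learned, key=lambda motion: math.dist(histogram, learned[motion]))
```

For instance, with learned histograms `{"walk": [0.8, 0.2, 0.0], "wave": [0.1, 0.1, 0.8]}`, an input of `[0.7, 0.3, 0.0]` is closer to the first entry. Other distance measures could be substituted, since the text gives Euclidean distance only as an example.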

The learning data storage means 40 stores learning data, obtained by prior learning, that associates distributions (histograms) of the appearance frequency of trajectory feature quantities clustered into a predetermined number of clusters with the motions of a person. The learning data storage means 40 can be implemented with a general storage medium such as a hard disk or semiconductor memory.

The learning data storage means 40 stores, as learning data, a codebook in which trajectory words (multidimensional trajectory feature quantities each regarded as one word) are clustered into a predetermined number of clusters, and histograms that associate the per-cluster distribution of trajectory words occurring in a motion with that motion.
In addition, for the trajectory words whose trajectory end points fall within a sequence of predetermined length (predetermined time interval), the learning data storage means 40 stores, document by document, the trajectory words contained in each document and their clusters, with each sequence treated as one document.

With this configuration, the person motion detection device 1 handles feature quantities of variable length in the time direction as fixed-length (fixed-dimension) trajectory feature quantities, and can therefore detect a person's motion using the bag-of-words technique.
The person motion detection device 1 can be operated by a program (person motion detection program) that causes a general computer to function as each of the means described above.

[Operation of the person motion detection device]
Next, the operation of the person motion detection device according to the embodiment of the present invention is described with reference to FIGS. 8 to 10. The operation of the person motion detection device 1 is described separately for the learning phase and the motion detection phase.

[Learning phase (first stage)]
First, the operation of the person motion detection device 1 in the learning phase (first stage) is described with reference to FIG. 8 (and FIG. 1 as appropriate for the configuration). In the learning phase (first stage) of FIG. 8, trajectory feature quantities (trajectory words) are extracted from multiple videos of various motions and clustered into a predetermined number (k) of clusters to generate the codebook used for classifying trajectories.

First, the person motion detection device 1 uses the feature point trajectory information generation means 10 to generate, from the input video, feature point trajectory information that describes the trajectories of feature points.
That is, the foreground region extraction means 11 extracts regions with motion as the foreground region, by background subtraction, for each frame image of the input video (step S1).

The feature point detection means 12 detects points that characterize the frame image (feature points) in each frame of the input video using a feature point detection technique such as the Harris operator (step S2). At this time, the feature point detection means 12 discards feature points that lie outside the region determined to be the foreground region in step S1.
The feature point tracking means 13 then tracks, frame by frame (in the time direction), feature points with similar feature quantities (for example, luminance gradient) among those detected in step S2, and generates the feature point trajectory information (step S3).
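The discarding of non-foreground feature points in step S2 can be illustrated with the small Python sketch below. It assumes the foreground region from step S1 is available as a boolean mask indexed as `mask[y][x]` and that feature points are integer pixel coordinates; both representations are assumptions made for illustration.

```python
def keep_foreground_points(points, mask):
    """Keep only the (x, y) feature points that lie on foreground pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    return [
        (x, y)
        for x, y in points
        if 0 <= y < height and 0 <= x < width and mask[y][x]
    ]
```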

Next, the temporal feature quantity generation means 21 of the feature quantity extraction means 20 generates multidimensional feature quantities in the time direction (temporal feature quantities) from the feature point trajectory information generated in step S3.
That is, the smoothing means 211 applies multi-stage smoothing to the feature point trajectories (coordinates) described in the feature point trajectory information generated in step S3 (step S4). For example, the smoothing means 211 applies a Haar filter in two stages and generates feature point trajectory information at three smoothing levels.
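One way to picture the three smoothing levels is the Python sketch below. It assumes that each Haar stage is the two-tap low-pass step, that is, the average of each pair of consecutive points, applied without downsampling. The text in this section does not spell out the filter details, so this is only an illustrative reading.

```python
def haar_smooth(points):
    """Average each pair of consecutive (x, y) points (assumed Haar low-pass step)."""
    return [
        ((x0 + x1) / 2, (y0 + y1) / 2)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


def smoothing_levels(points, stages=2):
    """Return the original trajectory plus one smoothed copy per stage."""
    levels = [list(points)]
    for _ in range(stages):
        levels.append(haar_smooth(levels[-1]))
    return levels
```

With two stages the function returns three trajectories: the unsmoothed one and two progressively smoother ones.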

The direction feature quantity generation means 212 then generates the direction feature quantity by accumulating (as a histogram), for each fixed angular width, the angle at which a feature point moves on the frame image (the angle of the movement vector; 0 to 2π), based on the feature point trajectory information smoothed in multiple stages in step S4 (step S5).
At this time, the direction feature quantity generation means 212 generates histograms by accumulating the movement vectors by angle using different angular widths (for example, π/2, π/4, and π/8) as bin widths.
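The multi-resolution direction histogram can be sketched as follows in Python. Bin widths of π/2, π/4, and π/8 over the range 0 to 2π correspond to 4, 8, and 16 bins, or 28 bins per trajectory. Applied to three smoothing levels this gives 84 values, which is consistent with the 84-dimensional direction feature quantity of this embodiment, although that derivation is an inference rather than something this section states. Skipping zero-length movement vectors and leaving the counts unnormalized are assumptions of the sketch.

```python
import math


def direction_histogram(points, bin_counts=(4, 8, 16)):
    """Accumulate movement-vector angles at several bin widths, concatenated."""
    hist = [0] * sum(bin_counts)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # no movement, so no direction (assumption)
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map to 0..2*pi
        offset = 0
        for n in bin_counts:
            width = 2 * math.pi / n
            # Clamp guards against rounding at the upper edge of the range.
            hist[offset + min(int(angle / width), n - 1)] += 1
            offset += n
    return hist
```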

Further, the speed feature quantity generation means 213 generates the speed feature quantity by accumulating (as a histogram), for each fixed speed width, the movement speed of a feature point on the frame image, based on the feature point trajectory information smoothed in multiple stages in step S4 (step S6).
At this time, the speed feature quantity generation means 213 uses, as the speed of a feature point, the horizontal and vertical lengths of the feature point's per-frame movement vector on the frame image. It also generates histograms by accumulating the movement vectors by speed using different speed widths as bin widths.

Further, the spatial feature quantity generation means 22 generates a multidimensional feature quantity in the spatial direction (spatial feature quantity; for example, a SURF or SIFT feature) for the feature point trajectories (coordinates) described in the feature point trajectory information generated in step S3 (step S7).
At this time, the spatial feature quantity generation means 22 extracts the feature (SURF or SIFT) from the corresponding frame image at every feature point on a trajectory and averages the results per trajectory.
The feature quantities in steps S5 to S7 need not be generated in this order and may be generated in parallel.
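The per-trajectory averaging in step S7 can be sketched as below. The descriptors themselves (64-dimensional in the SURF case) are assumed to have been extracted already by some external routine; only the averaging is shown, and the function name is hypothetical.

```python
def average_descriptor(descriptors):
    """Element-wise mean of the descriptors sampled along one trajectory."""
    if not descriptors:
        raise ValueError("a trajectory needs at least one descriptor")
    n = len(descriptors)
    return [sum(column) / n for column in zip(*descriptors)]
```

The result has the same dimension as a single descriptor regardless of how many points the trajectory contains, which keeps the spatial part of the feature fixed-length.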

In this way, the feature quantity extraction means 20 generates, for each feature point, a trajectory feature quantity consisting of a temporal feature quantity of fixed length in the time direction (direction and speed feature quantities) and a spatial feature quantity of fixed length in the spatial direction. As a result, the person motion detection device 1 can represent a trajectory with a fixed-length multidimensional trajectory feature quantity even when the trajectory length varies in the time direction, and can therefore treat each multidimensional trajectory feature quantity as one word (trajectory word).

The learning means 31 then learns the distribution of trajectory feature quantities for each motion from the trajectory feature quantities in video of a person performing motions in advance.
That is, the codebook generation means 311 takes the trajectory feature quantities (trajectory words) that the feature quantity extraction means 20 extracted from multiple videos of various motions and clusters them into a predetermined number (k) of clusters to generate a codebook that serves as a word dictionary (step S8). The codebook generation means 311 then writes the generated codebook to the learning data storage means 40 (step S9). If the tf-idf method is to be used in the motion detection phase described later, the codebook generation means 311 also writes to the learning data storage means 40 the trajectory words and their clusters for each document, where one document is a sequence of predetermined length in the input video.

Through the above operation, the person motion detection device 1 can collect various trajectories as trajectory words with fixed-length multidimensional feature quantities and generate a word dictionary (codebook) in which they are clustered into k clusters.

[Learning phase (second stage)]
Next, the operation of the person motion detection device 1 in the learning phase (second stage) is described with reference to FIG. 9 (and FIG. 1 as appropriate for the configuration). In the learning phase (second stage) of FIG. 9, trajectory feature quantities (trajectory words) are extracted from video of a predetermined motion and, with reference to the codebook generated in the learning phase (first stage), accumulated into a histogram by cluster, so that the feature quantity of that motion is generated as a histogram.
Steps S11 to S17 are the same as steps S1 to S7 described with reference to FIG. 8, so their description is omitted here.

After step S17, the histogram generation means 312 of the learning means 31 uses the trajectory feature quantities (trajectory words) that the feature quantity extraction means 20 extracted from video of a predetermined motion to generate the distribution (histogram) of the appearance frequency of trajectory words in that motion (step S18). The histogram generation means 312 then writes the generated histogram to the learning data storage means 40 in association with the individual motion (step S19). The histogram generation means 312 generates a histogram for each individual motion and normalizes it so that the frequencies sum to 1.0.
Through the above operation, the person motion detection device 1 can generate the distribution (histogram) of trajectory word appearance frequencies in a given motion as the feature quantity for that motion.

[Motion detection phase]
Next, the operation of the human motion detection device 1 in the motion detection phase will be described with reference to FIG. 10 (see FIG. 1 as appropriate for the configuration).
The operations in steps S21 to S27 are the same as those in steps S1 to S7 described with reference to FIG. 8, and their description is omitted here.

After step S27, the human motion detection device 1 uses the weighted histogram generation means 321 of the motion determination means 32 to process the plurality of trajectory words whose trajectory end points lie in a sequence of a predetermined time length. Each trajectory word is assigned to the nearest cluster, in terms of Euclidean distance, among the k clusters of the codebook stored in the learning data storage means 40, and a histogram with k bins is generated (step S28).
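The nearest-cluster assignment of step S28 can be sketched as follows. This is a rough illustration, assuming the codebook is held as a list of k cluster centres with the same dimension as the trajectory words; the names `nearest_cluster` and `count_histogram` and the two-dimensional sample values are hypothetical.

```python
import math


def nearest_cluster(word, codebook):
    """Return the index of the cluster centre closest to `word`
    in Euclidean distance."""
    return min(range(len(codebook)), key=lambda i: math.dist(word, codebook[i]))


def count_histogram(words, codebook):
    """Build a histogram with one bin per codebook cluster."""
    hist = [0] * len(codebook)
    for w in words:
        hist[nearest_cluster(w, codebook)] += 1
    return hist


# Hypothetical codebook with k = 3 cluster centres.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Hypothetical trajectory words whose end points fall in one sequence.
words = [(0.1, 0.0), (0.9, 0.1), (0.8, -0.1), (0.1, 1.2)]
hist = count_histogram(words, codebook)
```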

At this time, the weighted histogram generation means 321 regards the trajectory words in one sequence as one document and calculates the importance of each trajectory word over all documents (here, all documents stored in the learning data storage means 40) using the tf-idf method. It then weights the histogram by multiplying the appearance frequency of the cluster to which the trajectory word belongs by that importance. By lowering the importance of trajectory words on background regions in this way, the weighted histogram generation means 321 can generate a histogram that represents the motion of the person more appropriately. The weighted histogram generation means 321 normalizes this histogram in advance so that the frequencies sum to "1.0".
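One plausible reading of this weighting is sketched below. The passage does not give the exact tf-idf formula, so the sketch assumes the common form idf = log(N / df) computed per cluster and takes the idf value as the importance that multiplies the cluster frequency. The names `idf_weights` and `weighted_histogram` and the sample documents are hypothetical.

```python
import math


def idf_weights(documents, k):
    """Assumed importance: idf[c] = log(N / df[c]), where df[c] is the
    number of documents containing cluster c. Unseen clusters get 0.0."""
    n = len(documents)
    weights = []
    for c in range(k):
        df = sum(1 for doc in documents if c in doc)
        weights.append(math.log(n / df) if df else 0.0)
    return weights


def weighted_histogram(cluster_ids, idf):
    """Multiply each cluster's frequency by its importance, then
    normalize so that the bins sum to 1.0."""
    hist = [0.0] * len(idf)
    for c in cluster_ids:
        hist[c] += idf[c]  # adds idf once per occurrence: count * idf
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# Hypothetical stored documents (lists of cluster indices per sequence).
# Cluster 0 appears in every document, as a background word might.
documents = [[0, 1, 1], [0, 2], [0, 1, 3]]
idf = idf_weights(documents, k=4)
hist = weighted_histogram([0, 0, 1, 2], idf)
```

Under this assumption a cluster that occurs in every stored document receives importance 0, which matches the stated aim of suppressing trajectory words on the background.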

Then, the human motion detection device 1 uses the classification means 322 to compare the histogram (weighted histogram) generated in step S28 with the histogram of each motion in the learning data stored in the learning data storage means 40, and classifies the motion of the person in the sequence as one of the predetermined motions (step S29).
The motion classified in this way is output to the outside as the human motion detection result of the human motion detection device 1.
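The comparison in step S29 can be sketched as a nearest-histogram search. The passage does not name the distance measure, so the L1 distance below is an assumption, and `classify`, `l1_distance`, and the sample histograms are hypothetical.

```python
def l1_distance(p, q):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(p, q))


def classify(hist, learning_data):
    """Return the known motion whose stored histogram is closest to
    `hist` (L1 distance is an assumed choice of measure)."""
    return min(learning_data, key=lambda m: l1_distance(hist, learning_data[m]))


# Hypothetical learning data: one normalized histogram per known motion.
learning_data = {
    "Pointing": [0.6, 0.3, 0.1],
    "ObjectPut": [0.1, 0.2, 0.7],
}
result = classify([0.5, 0.4, 0.1], learning_data)
```

Because both the stored and the observed histograms are normalized to sum to 1.0, the distance does not depend on how many trajectory words a sequence happens to contain.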

As described above, the human motion detection device 1 can represent the trajectory of a person, which has a variable length in the time direction, as a trajectory feature amount of fixed length (fixed dimension). Because a series of motion trajectories can thus be expressed faithfully as feature amounts, the motion of a person can be detected accurately from video.
Furthermore, because the human motion detection device 1 uses a fixed-length (fixed-dimension) trajectory feature amount as the feature amount of a feature point trajectory, it can regard each trajectory feature amount as a word (trajectory word) and detect human motion using the "Bag-of-words" technique. This allows the human motion detection device 1 to lower the importance of frequently occurring feature amounts on the background and to detect human motion more robustly.
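The following sketch illustrates how a trajectory of any length can be mapped to a vector of fixed dimension by accumulating the direction and magnitude of each frame-to-frame movement vector into a fixed number of bins. The bin counts (`n_dir`, `n_speed`), the clipping value `max_speed`, and the function name `trajectory_feature` are assumptions made for illustration, not values taken from the specification.

```python
import math


def trajectory_feature(points, n_dir=8, n_speed=4, max_speed=10.0):
    """Accumulate movement-vector directions and magnitudes into
    histograms; the result always has n_dir + n_speed elements."""
    dir_hist = [0] * n_dir
    speed_hist = [0] * n_speed
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        angle = math.atan2(dy, dx) % (2 * math.pi)  # direction in [0, 2*pi)
        d = min(int(angle / (2 * math.pi) * n_dir), n_dir - 1)
        speed = math.hypot(dx, dy)  # magnitude per frame
        s = min(int(speed / max_speed * n_speed), n_speed - 1)
        dir_hist[d] += 1
        speed_hist[s] += 1
    return dir_hist + speed_hist


# Two hypothetical trajectories of different lengths.
short = [(0, 0), (1, 0), (2, 0)]
long_ = [(0, 0), (0, 3), (0, 6), (0, 9), (0, 12)]
f_short = trajectory_feature(short)
f_long = trajectory_feature(long_)
```

Both results have 12 elements regardless of the number of tracked positions, which is what allows each trajectory to be treated as a single word.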

Since the human motion detection device 1 according to the present invention can thus detect human motion robustly, it can be applied widely, for example to detecting abnormal behavior of a person in video surveillance, to detecting specific motions, or to man-machine interfaces triggered by gestures.

The configuration and operation of the human motion detection device 1 according to the embodiment of the present invention have been described above, but the present invention is not limited to this embodiment.
For example, the feature amount extraction means 20 here generates both the temporal feature amount and the spatial feature amount as the trajectory feature amount, but only the temporal feature amount may be used. In this case, the spatial feature amount generation means 22 may be omitted from the configuration of FIG. 1. The trajectory feature amount then consists of the direction feature amount and the speed feature amount, which are the temporal feature amounts.

In addition, although the learning means 31 is provided here, not every human motion detection device 1 needs to include it. That is, once one human motion detection device 1 has performed learning and stored the learning data in the learning data storage means 40, the motion detection phase can be executed as long as at least the learning data storage means 40 is provided. In this case, the learning means 31 may be omitted from a human motion detection device 1 that does not perform learning.

[Evaluation results of the human motion detection device]
Finally, human motion detection results are described for the case where the human motion detection device 1 according to the embodiment of the present invention uses time-direction feature amounts, which could not be taken into account conventionally. Here, for the human motions "pointing with a finger (Pointing)" and "putting down an object (ObjectPut)", the recall [Recall] (%) with which each motion could be detected from video was measured.

[Table 1] shows the measured recall for three cases: motion detection using only the conventional SURF feature amount (SURF); motion detection with an angle feature amount (direction feature amount) added to the SURF feature amount (SURF+angle); and motion detection with speed, the time-direction feature amount of the present invention, further added (SURF+angle+speed). As shown in [Table 1], adding speed, the time-direction feature amount of the present invention, increased the recall.
As described above, by handling the time-direction feature amount, which is inherently variable-length, as a fixed-length feature amount, the present invention can detect the motion of a person more robustly than conventional motion detection methods.
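For reference, recall as a percentage is conventionally the share of actual occurrences of a motion that were detected. The sketch below uses that standard definition; the function name `recall_percent` and the counts are made-up illustrations and are not the values reported in [Table 1].

```python
def recall_percent(true_positives, false_negatives):
    """Recall (%) = detected occurrences / all actual occurrences * 100."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no ground-truth occurrences of the motion")
    return 100.0 * true_positives / total


# Hypothetical counts: 8 of 10 actual occurrences of a motion detected.
r = recall_percent(true_positives=8, false_negatives=2)
```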

DESCRIPTION OF SYMBOLS
1 Human motion detection device
10 Feature point trajectory information generation means
11 Foreground region extraction means
12 Feature point detection means
13 Feature point tracking means
20 Feature amount extraction means
21 Time feature amount generation means
211 Smoothing means
212 Direction feature amount generation means
213 Speed feature amount generation means
22 Spatial feature amount generation means
30 Motion identification means
31 Learning means
311 Codebook generation means
312 Histogram generation means
32 Motion determination means
321 Weighted histogram generation means (weighted distribution generation means)
322 Classification means
40 Learning data storage means

Claims

A human motion detection device that detects a motion of a person from video in which the person is captured, the device comprising:
feature point trajectory information generation means for detecting feature points in each frame image of the video and matching the feature points between frame images, thereby generating, as feature point trajectory information, trajectories that track the positions of the feature points in the time direction;
time feature amount generation means for generating a time feature amount, based on the positions of the feature points included in the feature point trajectory information generated by the feature point trajectory information generation means, by accumulating the direction and the magnitude of the movement vector of each feature point for each frame image into range widths obtained by dividing the possible range of the direction and of the magnitude into a predetermined number, and for using the time feature amount as a trajectory feature amount that is a feature amount of the trajectory of the feature point;
learning data storage means for storing in advance, as learning data associated with each known motion, a distribution obtained by clustering a plurality of trajectory feature amounts into a predetermined number of clusters and accumulating, for each cluster, the plurality of trajectory feature amounts constituting that known motion; and
motion identification means for generating, for each predetermined time interval, a distribution obtained by accumulating the clusters to which trajectory feature amounts belong, from a plurality of trajectory feature amounts whose trajectory end points lie in the time interval, and for identifying the motion of the person according to whether the generated distribution is similar to the cluster distribution for each motion stored in the learning data storage means.

The human motion detection device according to claim 1, further comprising spatial feature amount generation means for generating, as a spatial feature amount, the luminance gradient of the frame image at the position of each feature point included in the feature point trajectory information generated by the feature point trajectory information generation means, and for adding the spatial feature amount to the trajectory feature amount.

The human motion detection device according to claim 1, wherein the time feature amount generation means comprises:
direction feature amount generation means for generating a direction feature amount, which is a feature amount constituting the time feature amount, by accumulating the directions of the movement vectors for each of the different range widths obtained by dividing the possible range of the direction of the movement vector by each of a plurality of predetermined numbers; and
speed feature amount generation means for generating a speed feature amount, which is a feature amount constituting the time feature amount, by accumulating the magnitudes of the movement vectors for each of the different range widths obtained by dividing the possible range of the magnitude of the movement vector by each of a plurality of predetermined numbers.

The human motion detection device according to claim 3, wherein the time feature amount generation means further comprises smoothing means for generating a plurality of trajectories by smoothing the trajectory of each feature point in the feature point trajectory information, and
the direction feature amount generation means and the speed feature amount generation means generate the direction feature amount and the speed feature amount, respectively, for the plurality of trajectories smoothed by the smoothing means.

The human motion detection device according to any one of claims 1 to 4, wherein the motion identification means comprises:
weighted distribution generation means for treating each trajectory feature amount whose trajectory end point lies in the time interval as a word and the plurality of words in the time interval as a document, calculating by the tf-idf method the importance of the trajectory feature amounts generated by the feature amount extraction means, and generating a cluster distribution in which the frequency of the cluster to which each trajectory feature amount belongs is weighted by that importance; and
classification means for classifying the motion of the person by determining similarity based on the distance between the cluster distribution generated by the weighted distribution generation means and the cluster distribution for each motion stored as learning data in the learning data storage means.

A human motion detection program that, in order to detect a motion of a person from video in which the person is captured, causes a computer to function as:
feature point trajectory information generation means for detecting feature points in each frame image of the video and matching the feature points between frame images, thereby generating, as feature point trajectory information, trajectories that track the positions of the feature points in the time direction;
time feature amount generation means for generating a time feature amount, based on the positions of the feature points included in the feature point trajectory information generated by the feature point trajectory information generation means, by accumulating the direction and the magnitude of the movement vector of each feature point for each frame image into range widths obtained by dividing the possible range of the direction and of the magnitude into a predetermined number, and for using the time feature amount as a trajectory feature amount that is a feature amount of the trajectory of the feature point; and
motion identification means for referring to learning data storage means in which a distribution, obtained by clustering a plurality of trajectory feature amounts into a predetermined number of clusters and accumulating, for each cluster, the plurality of trajectory feature amounts constituting a known motion, is stored in advance as learning data in association with each known motion, for generating, for each predetermined time interval, a distribution obtained by accumulating the clusters to which trajectory feature amounts belong, from a plurality of trajectory feature amounts whose trajectory end points lie in the time interval, and for identifying the motion of the person according to whether the generated distribution is similar to the cluster distribution for each motion stored in the learning data storage means.