JPH0546583A

JPH0546583A - Confirmation device for moving body action

Info

Publication number: JPH0546583A
Application number: JP3205033A
Authority: JP
Inventors: Junji Yamato; 淳司大和; Atsushi Otani; 淳大谷; Kenichiro Ishii; 健一郎石井
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1991-08-15
Filing date: 1991-08-15
Publication date: 1993-02-26

Abstract

PURPOSE: To select the behavior with the highest likelihood by acquiring time-series models of behavior, as probabilistic state transition models corresponding to the respective recognition categories, through training on learning data, and by calculating the probability that each model generates the behavior to be recognized. CONSTITUTION: In learning mode, the parameters of the recognition state transition models are estimated from the learning data and stored, per recognition category, in a recognition state transition model storage memory 30. In recognition mode, the likelihood of each category's model stored in the memory 30 by learning is calculated, and maximum likelihood estimation is performed, taking the category corresponding to the model with the maximum likelihood as the recognition result. Processing up to quantization is the same in learning and recognition. Consequently, the behavior of a moving object such as a person in a moving picture can be recognized, and processing is stable because model fitting is not required.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a moving object behavior recognition device that recognizes patterns of motion and behavior of a moving object, such as a human, from a moving image.

[0002]

[Prior Art] Pattern recognition for moving images has been studied extensively in recent years. Work that aims at recognizing motions and behaviors falls broadly into the following two approaches.

[0003] (1) The first may be called the model-based approach. It represents each part of the human body, such as the torso and arms, as a geometric model such as an ellipsoid or a generalized cylinder, and describes the human posture by parameters such as the joint angles between the parts.

[0004] (2) The second may be called the feature-based heuristic approach. It has been applied to real scene images for tasks such as counting the flow of people. In this case a method such as counting the regions whose area exceeds a certain size in a thresholded image is used.

[0005]

[Problems to Be Solved by the Invention] The conventional model-based approach of (1) requires the model to be fitted to the image, so pattern recognition becomes unstable when noisy real images are targeted.

[0006] The conventional heuristic approach of (2) requires a human to construct an ad hoc heuristic method, covering both the processing content and its parameters, for each target and each scene. In addition, the processing stays at low-level recognition such as counting, and higher-level processing such as behavior recognition is difficult.

[0007] The present invention has been made to solve the above problems. Its object is to provide a recognition technique, and a technique for constructing a recognition system, that achieve high-level behavior recognition without relying on unstable model fitting.

[0008] The above and other objects and novel features of the present invention will become apparent from the description in this specification and the accompanying drawings.

[0009]

[Means for Solving the Problems] To achieve the above object, the present invention provides a moving object behavior recognition device that recognizes the behavior of a moving object, such as a human, in a scene. Its principal feature is that it comprises means for representing each motion in the behavior of the moving object as a vector of feature quantities, such as mesh features extracted from the image or the directional distribution of the optical flow; means for acquiring time-series models of behavior, as probabilistic state transition models corresponding to the respective recognition categories, by training on learning data; and means for calculating, for each of those models, the probability that it generates the behavior to be recognized; and that it selects the behavior with the highest likelihood.

[0010]

[Operation] With the above means, each motion in the behavior of a moving object, such as a human, in a scene is represented as a vector of feature quantities, such as mesh features extracted from the image or the directional distribution of the optical flow; time-series models of behavior are acquired, as probabilistic state transition models corresponding to the respective recognition categories, by training on learning data; and the probability that each model generates the behavior to be recognized is calculated, so that the behavior with the highest likelihood can be selected. The behavior of a moving object such as a person in a moving image can therefore be recognized by learning from examples.

[0011] That is, the present invention differs from the prior art in two respects in particular: stable processing is possible because no model fitting is required, and a human need not determine the parameters of the recognition system for each individual situation because the system learns from examples.

[0012]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. In all the drawings used to explain the embodiments, parts having the same function are given the same reference numerals, and repeated description of them is omitted.

[0013] [Embodiment 1] FIG. 1 is a block diagram showing the schematic configuration of Embodiment 1 of the moving object behavior recognition device of the present invention, and FIG. 2 is a block diagram showing the functional configuration of the moving object behavior recognition device of Embodiment 1. In FIGS. 1 and 2, 11 is an image input device, 12 is a computer, 13 is an external memory device, 21 is an image input unit, 22 is an image memory, 23 is a feature extraction unit, 24 is a feature storage memory, 25 is a quantization unit, 26 is a symbol storage memory, 27 is a likelihood calculation unit, 28 is a recognition result memory, 29 is a model parameter estimation unit, and 30 is a recognition state transition model storage memory. The external memory device 13, for example, is used as the recognition state transition model storage memory 30.

[0014] The basic operation of Embodiment 1 has two stages, learning and recognition. In learning, the parameters of the recognition state transition models are estimated from the learning data and stored, per recognition category, in the recognition state transition model storage memory 30. In recognition, the likelihood of each category's model stored in the memory 30 by learning is calculated, and maximum likelihood estimation is performed, taking the category corresponding to the model with the maximum likelihood as the recognition result. Processing up to quantization is the same in learning and recognition. The description below begins with this common part and follows the flow shown in FIG. 2.

[0015] First, a moving image containing a person in action is captured by the image input unit 21 of the image input device 11, such as a TV camera, and stored in the image memory 22.

[0016] Next, the feature extraction unit 23 obtains a plurality of feature quantities from the moving image.

[0017] Examples of the feature quantities used here are given below. First, the mesh feature shown in FIG. 3 can be used. The image in the image memory 22 is divided into N × M sub-blocks, each n × m pixels in size, and the image is binarized in each sub-block. The occupancy ratio of black pixels in each sub-block is then obtained, and these ratios form an N × M-dimensional feature vector. That is, let a_ij be the occupancy ratio of black pixels in mesh (i, j); the vector formed by arranging these,

[0018]

[Equation 1]: f_m = (a_00, a_01, ..., a_ij, ..., a_MN) is used as the feature vector.
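The mesh feature can be illustrated with a short NumPy sketch. This is a minimal sketch under stated assumptions, not the patent's implementation: the function name `mesh_features`, the single global `threshold`, and the convention that pixels darker than the threshold count as black are all hypothetical, since the text does not specify how binarization is done.

```python
import numpy as np

def mesh_features(gray, N, M, threshold=128):
    """Return the N*M vector of black-pixel occupancy ratios a_ij.

    gray: 2-D array of grey levels, N x M sub-blocks of n x m pixels each.
    threshold: hypothetical binarization level; darker pixels count as black.
    """
    H, W = gray.shape
    n, m = H // N, W // M                      # pixels per sub-block
    black = (gray < threshold).astype(float)   # binarize: 1.0 = black pixel
    # Reshape so that axes 1 and 3 index the pixels inside each sub-block.
    blocks = black[:N * n, :M * m].reshape(N, n, M, m)
    return blocks.mean(axis=(1, 3)).ravel()    # a_00, a_01, ... row by row

image = np.full((4, 6), 255)
image[0:2, 0:3] = 0                            # top-left sub-block is all black
f_m = mesh_features(image, N=2, M=2)
```

Here `f_m` has one component per sub-block, so a 2 × 2 mesh yields a 4-dimensional vector whose first component is the occupancy of the top-left block.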

[0019] Alternatively, features based on the optical flow, as shown in FIG. 4, can be used; three examples follow. First, using the optical flow obtained from images of a plurality of time frames, the image is divided into N × M sub-blocks, each n × m pixels in size, and the flow direction within each sub-block is used as the feature vector. That is, let θ_ij be the mean direction (the angle formed with the x axis) of the flow vectors in mesh (i, j); the vector formed by arranging these,

[0020]

[Equation 2]

[0021] is used as the feature vector. Second, the flow magnitude is used as the feature vector in the same way. That is, let r_ij be the mean magnitude of the flow vectors in mesh (i, j); the vector formed by arranging these,

[0022]

[Equation 3]: f_r = (r_00, r_01, ..., r_ij, ..., r_MN) is used as the feature vector.
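As an illustration of the first two optical-flow features, the sketch below computes θ_ij and r_ij per sub-block from a flow field that is assumed to be already available as two arrays `u` and `v` (x and y components); how the flow itself is estimated is outside this sketch. The text leaves open whether "mean direction" and "mean magnitude" refer to the mean flow vector or to per-pixel averages; this sketch assumes the former, and the function name is hypothetical.

```python
import numpy as np

def flow_block_features(u, v, N, M):
    """Per-block flow direction theta_ij and magnitude r_ij.

    u, v: 2-D arrays with the x and y components of the optical flow.
    Assumption: both features are taken from the mean flow vector of a block.
    """
    H, W = u.shape
    n, m = H // N, W // M

    def block_mean(a):
        return a[:N * n, :M * m].reshape(N, n, M, m).mean(axis=(1, 3))

    mu, mv = block_mean(u), block_mean(v)      # mean flow vector per block
    theta = np.arctan2(mv, mu)                 # angle with the x axis, radians
    r = np.hypot(mu, mv)                       # length of the mean vector
    return theta.ravel(), r.ravel()

u = np.zeros((4, 4))
v = np.zeros((4, 4))
u[:2, :2] = 1.0      # top-left block moves along +x
v[2:, 2:] = 2.0      # bottom-right block moves along +y
f_theta, f_r = flow_block_features(u, v, N=2, M=2)
```

Blocks with no motion get direction 0 from `arctan2(0, 0)`, so in practice the direction feature is only meaningful where the magnitude is non-zero.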

[0023] Third, the power spectrum of the two-dimensional Fourier transform may be divided into a suitable mesh, and a feature vector whose components are the mean power and phase of each mesh cell may be used in the same way.
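A possible reading of the spectral feature is sketched below for the power component only; the phase component is omitted. The mesh layout over the shifted spectrum and the function name `spectrum_mesh_power` are assumptions made for illustration, since the text does not say what is transformed or how the spectrum is partitioned.

```python
import numpy as np

def spectrum_mesh_power(gray, N, M):
    """Mean spectral power in each cell of an N x M mesh over the 2-D spectrum."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray))   # zero frequency moved to the centre
    power = np.abs(spectrum) ** 2
    H, W = power.shape
    n, m = H // N, W // M
    cells = power[:N * n, :M * m].reshape(N, n, M, m)
    return cells.mean(axis=(1, 3)).ravel()

gray = np.ones((8, 8))        # constant image: only the zero-frequency term is non-zero
f_p = spectrum_mesh_power(gray, N=2, M=2)
```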

[0024] After the feature vectors are obtained, the quantization unit 25 converts the vector sequence into a symbol sequence, which is recorded in the symbol storage memory 26. This is done by vector quantization: each feature vector is converted into the symbol corresponding to the nearest representative vector in a list of representative points prepared in advance for quantization. This set of representative points is called a codebook. Methods for creating a codebook include the k-means method (see [1] X. D. Huang, Y. Ariki, M. A. Jack, Hidden Markov Models for Speech Recognition, Edinburgh University Press, p. 117) and the LBG method (see [2] Y. Linde, A. Buzo, R. M. Gray, "An Algorithm for Vector Quantizer Design," IEEE Trans. Commun., vol. COM-28, pp. 84-95, 1980). In this embodiment the codebook must also be created when the recognition models are trained, and either method is applicable. Distance measures that can be used include the Euclidean distance and the Mahalanobis distance, which takes the variance of each dimension into account. As described later, recognition based on a continuous model, which needs no quantization, is also possible.
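The quantization step can be sketched as a nearest-neighbour lookup. This is a minimal sketch assuming the Euclidean distance and a codebook that already exists; the function name `quantize` and the toy vectors are hypothetical.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook vector.

    features: (T, D) array, one feature vector per frame.
    codebook: (L, D) array of representative vectors.
    Returns a length-T integer array of symbols in 0..L-1.
    """
    diff = features[:, None, :] - codebook[None, :, :]   # shape (T, L, D)
    dist2 = (diff ** 2).sum(axis=2)                      # squared Euclidean distance
    return dist2.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
features = np.array([[0.1, 0.0], [0.9, 0.2], [0.2, 0.8], [0.1, 0.9]])
symbols = quantize(features, codebook)    # one symbol per frame
```

Using the squared distance avoids a square root and selects the same nearest vector. A Mahalanobis variant would divide each squared difference by the variance of that dimension before summing.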

[0025] By the processing up to this point, the image sequence has been converted into a symbol sequence. The operations so far are identical in recognition and in learning. Of the subsequent processing, recognition is described first.

[0026] In recognition, these feature vector sequences are recorded in the feature storage memory 24. Then, for each of the models stored in the recognition state transition model storage memory 30, one of which is prepared per category to be recognized, the likelihood calculation unit 27 calculates the probability that the model generates this feature vector sequence. For the following description, the model parameters are defined as follows.

[0027]

[Equation 4]
T: length of the observed symbol sequence O = O_1, O_2, ..., O_T
N: number of states in the model
L: number of symbols in the model
S = {s}: set of states; s_t is the t-th state (not observable)

[0028]

[Equation 5]
υ = {υ_1, υ_2, ..., υ_L}: set of observable symbols
A = {a_ij | a_ij = Pr(s_{t+1} = j | s_t = i)}: state transition probabilities; a_ij is the probability of a transition from state i to state j
B = {b_j(O_t) | b_j(O_t) = Pr(O_t | s_t = j)}: symbol output probabilities; b_j(k) is the probability of outputting symbol υ_k in state j
π = {π_i | π_i = Pr(s_1 = i)}: initial state probabilities

The probability that a given model generates an observed symbol sequence can be obtained by the forward algorithm (see [1] X. D. Huang, Y. Ariki, M. A. Jack, Hidden Markov Models for Speech Recognition, Edinburgh University Press, p. 148) as follows.

[0029] The probability Pr(O | λ) that a model λ = {A, B, π} outputs the symbol sequence O = O_1, O_2, ..., O_T is

[0030]

[Equation 6]

[0031] where α_T(i) is the value at t = T of α_t(i), defined by α_t(i) ≡ Pr(O_1, O_2, ..., O_t, s_t = i | λ) (2). Concretely, it is obtained as follows.

[0032]

[Equation 7]

[0033] This is the recurrence from which α_t(i) is obtained.
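Equations 6 and 7 are not reproduced in this text, so the following sketch uses the standard textbook form of the forward algorithm, which is consistent with definition (2): α_1(i) = π_i b_i(O_1), α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(O_{t+1}), and Pr(O | λ) = Σ_i α_T(i). The function name and the toy two-state model are hypothetical, and the form is an assumption about what the missing equations contain.

```python
import numpy as np

def forward_likelihood(A, B, pi, O):
    """Pr(O | lambda) for a discrete model lambda = (A, B, pi).

    A[i, j] = Pr(s_{t+1} = j | s_t = i), B[j, k] = b_j(k), pi[i] = Pr(s_1 = i).
    O is a sequence of symbol indices in 0..L-1.
    """
    alpha = pi * B[:, O[0]]                 # alpha_1(i) = pi_i * b_i(O_1)
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o]       # sum over i of alpha_t(i) a_ij, times b_j(O_{t+1})
    return alpha.sum()                      # sum over i of alpha_T(i)

A = np.array([[0.7, 0.3], [0.0, 1.0]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([1.0, 0.0])
p = forward_likelihood(A, B, pi, [0, 1])
```

For long sequences the unscaled α values underflow, so a practical implementation would rescale α at each step or work with logarithms.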

[0034] The model for which the likelihood obtained in this way is largest is selected as the recognition result and stored in the recognition result memory 28. This completes the processing flow for recognition.
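The selection of the maximum-likelihood model can be sketched as follows. The sketch evaluates every category's model on the same symbol sequence and returns the category with the largest log-likelihood; the per-step scaling is a common numerical safeguard rather than something the text prescribes, and the category names and model values are made up for illustration.

```python
import numpy as np

def log_likelihood(model, O):
    """log Pr(O | model) by the forward algorithm with per-step scaling."""
    A, B, pi = model
    alpha = pi * B[:, O[0]]
    log_p = 0.0
    for o in O[1:]:
        c = alpha.sum()
        if c == 0.0:                        # sequence impossible under this model
            return -np.inf
        log_p += np.log(c)
        alpha = ((alpha / c) @ A) * B[:, o]
    c = alpha.sum()
    return log_p + np.log(c) if c > 0.0 else -np.inf

def recognize(models, O):
    """Return (best category, log-likelihood of every category)."""
    scores = {name: log_likelihood(m, O) for name, m in models.items()}
    return max(scores, key=scores.get), scores

A = np.array([[0.7, 0.3], [0.0, 1.0]])      # left-to-right transitions
pi = np.array([1.0, 0.0])
models = {                                   # hypothetical category names
    "raise_right_hand": (A, np.array([[0.9, 0.1], [0.2, 0.8]]), pi),
    "raise_left_hand": (A, np.array([[0.1, 0.9], [0.8, 0.2]]), pi),
}
best, scores = recognize(models, [0, 0, 1, 1])
```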

[0035] Next, the processing flow for learning is described. For the symbol sequences obtained from the plural learning data given for each category, the model parameter estimation unit 29 estimates the parameters of a state transition model that generates those symbol sequences, and stores them in the recognition state transition model storage memory 30. Given a symbol sequence

[0036]

[Equation 8]: O = O_1, O_2, ..., O_T

these parameters are obtained with the Baum-Welch algorithm (see [1] X. D. Huang, Y. Ariki, M. A. Jack, Hidden Markov Models for Speech Recognition, Edinburgh University Press, p. 152, and [3] Seiichi Nakagawa, Speech Recognition by Probabilistic Models, The Institute of Electronics, Information and Communication Engineers, p. 55). The Baum-Welch algorithm is explained here. It is a procedure that, starting from given model parameters, repeatedly finds model parameters with a higher likelihood. Convergence can be checked by computing the likelihood with the forward algorithm described above at every iteration.

[0037]

[Equation 9]

[0038]

[Equation 10]

[0039]

[Equation 11]

[0040]

[Equation 12]

[0041] where

[0042]

[Equation 13]

[0043]

[Equation 14]

[0044] By the above procedure, the parameters of the recognition state transition model corresponding to the learning data can be obtained. The model obtained in this way for each category is used in recognition.
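Because Equations 9 to 14 are not reproduced in this text, the sketch below falls back on the standard textbook Baum-Welch re-estimation for a discrete model and a single training sequence; it should be read as an assumption about the missing formulas, not a transcription of them. It computes forward and backward variables, the state posteriors γ_t(i) and transition posteriors ξ_t(i, j), and from them the new π, A and B. The function name and the toy data are hypothetical, and the unscaled arithmetic is only suitable for short sequences.

```python
import numpy as np

def baum_welch_step(A, B, pi, O):
    """One re-estimation step on one symbol sequence O.

    Returns (A_new, B_new, pi_new, likelihood under the input parameters).
    """
    O = np.asarray(O)
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))                          # beta_T(i) = 1
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):                           # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):                  # backward pass
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()

    gamma = alpha * beta / likelihood               # gamma[t, i] = Pr(s_t = i | O)
    # xi[t, i, j] = Pr(s_t = i, s_{t+1} = j | O)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, O[1:]].T * beta[1:])[:, None, :]) / likelihood

    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):                     # expected count of symbol k per state
        B_new[:, k] = gamma[O == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new, pi_new, likelihood

A = np.array([[0.6, 0.4], [0.3, 0.7]])
B = np.array([[0.7, 0.3], [0.4, 0.6]])
pi = np.array([0.5, 0.5])
O = [0, 0, 1, 1, 0, 1, 1, 1]
history = []
for _ in range(10):
    A, B, pi, lik = baum_welch_step(A, B, pi, O)
    history.append(lik)                             # likelihood before each update
```

The recorded likelihoods should not decrease from one iteration to the next, which is the convergence check the text describes. Training on several sequences per category would sum the expected counts over sequences before normalizing.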

[0045] As an example of experimental results for the processing flow described in Embodiment 1, the case shown in FIG. 5 is described. In this example the behaviors to be recognized are four motions: raising and then lowering the right hand, raising and then lowering the left hand, and likewise for the right foot and the left foot. In FIG. 5, each horizontal row of five images shows one motion; from the top, they are the motions of raising the right hand, the left hand, the right foot, and the left foot. Three trials were performed for each category; one was used as learning data and two as data for the recognition experiment. Mesh features were used as the feature vector. For quantization, the vectors of the 20 images shown in FIG. 5 were used as the representative points. That is, each feature vector f_m(j) was quantized to the symbol O_j of the nearest, in Euclidean distance, of the 20 feature vectors f_q(i) (i = 1, ..., 20) obtained from these images: O_j = argmin_i ‖f_m(j) − f_q(i)‖. These 20 representative points constitute the codebook. In this experiment, to simplify codebook creation, suitable images from the middle of each motion were selected instead of using a codebook creation algorithm such as the k-means method mentioned above.

[0046] The recognition-experiment data were applied to the four state transition models generated by learning, and in each case the model with the maximum likelihood among the four was selected as the recognition result. FIG. 6 shows the symbols (here, numerals) corresponding to the respective images. The experimental results are as follows. FIG. 7 shows an example of a symbol sequence corresponding to one motion (raising the left foot). FIG. 8 shows the result when this symbol sequence is used as the recognition data, namely the likelihoods of the four models for the recognition data (motion = raising the left foot). The likelihood of the model belonging to the same category is high, showing that recognition is correct. The same test was repeated using each of the three trials in turn as the learning data, and the recognition rate was examined for each combination; the average recognition rate was 88%. For the three subjects used in the experiment, the rates were 88%, 88%, and 96%, an average of 90%.

[0047] [Embodiment 2] FIG. 9 is a block diagram showing the functional configuration of Embodiment 2 of the present invention. Embodiment 2 is an example that does not require conversion into symbols by quantization. As shown in FIG. 9, recognition and learning are performed in the continuous model likelihood calculation unit 61 and the continuous model parameter estimation unit 63, respectively, directly from the feature memory without passing through a quantization unit. In this case each state of the state transition model is formulated as outputting a feature vector probabilistically: the probability density function with which a feature vector is output from each state is expressed as a mixture of normal distributions. Accordingly, in place of the symbol output probabilities of the preceding example, the model parameters are the means and variances of the mixed normal distributions and their mixture weight coefficients; the basic operation is otherwise the same as in Embodiment 1, which involves quantization. 62 is a recognition continuous state transition model storage memory.
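For the continuous model, the output probability of a state becomes a density over feature vectors. The sketch below evaluates the logarithm of a normal-mixture density for one state, assuming diagonal covariances (one variance per dimension), which is one reading of "means and variances" and not confirmed by the text; the function name and the numbers are hypothetical.

```python
import numpy as np

def gmm_log_density(x, weights, means, variances):
    """log b_j(x) for one state: a mixture of diagonal-covariance normals.

    weights: (K,) mixture weights summing to 1.
    means, variances: (K, D) per-component mean and per-dimension variance.
    """
    x = np.asarray(x, dtype=float)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)
    log_exp = -0.5 * (((x - means) ** 2) / variances).sum(axis=1)
    log_comp = np.log(weights) + log_norm + log_exp       # one value per component
    top = log_comp.max()
    return top + np.log(np.exp(log_comp - top).sum())     # log-sum-exp over components

weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [2.0, 2.0]])
variances = np.ones((2, 2))
log_b = gmm_log_density([0.0, 0.0], weights, means, variances)
```

In the forward recursion this density would take the place of the symbol output probability b_j(O_t), with the rest of the computation unchanged.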

[0048] Although the present invention has been described specifically on the basis of embodiments, it is of course not limited to those embodiments, and various modifications can be made without departing from its gist.

[0049]

[Effects of the Invention] As described above, according to the present invention a behavior recognition system for moving images can be constructed by learning from examples, so that, compared with conventional methods, it can adapt autonomously to the target and the environment and perform high-level recognition. Moreover, because it involves no model fitting on the image, robust processing can be achieved even for real images. The invention is therefore widely applicable, for example to monitoring suspicious behavior in banks and shops and to extracting desired motion segments from video of sports and the like.

[Brief description of drawings]

FIG. 1 is a block diagram showing the schematic configuration of Embodiment 1 of the moving object behavior recognition device of the present invention.

FIG. 2 is a block diagram showing the functional configuration of the moving object behavior recognition device of Embodiment 1.

FIG. 3 is a diagram for explaining the mesh features of Embodiment 1.

FIG. 4 is a diagram for explaining the feature quantities based on the optical flow in Embodiment 1.

FIG. 5 is a diagram showing representative images of the motions used in the experiment of Embodiment 1.

FIG. 6 is a diagram showing the symbols corresponding to the representative images of Embodiment 1.

FIG. 7 is a diagram showing an example of a symbol sequence corresponding to one motion in Embodiment 1.

FIG. 8 is a diagram showing the experimental results of Embodiment 1.

FIG. 9 is a block diagram showing the functional configuration of Embodiment 2 of the present invention.

[Explanation of symbols]

11 ... image input device, 12 ... computer, 13 ... external memory device, 21 ... image input unit, 22 ... image memory, 23 ... feature extraction unit, 24 ... feature storage memory, 25 ... quantization unit, 26 ... symbol sequence storage memory, 27 ... likelihood calculation unit, 28 ... recognition result memory, 29 ... model parameter estimation unit, 30 ... recognition state transition model storage memory, 41 ... example of a mesh feature extraction image, 51 ... example of optical flow, 61 ... continuous model likelihood calculation unit, 62 ... recognition continuous state transition model storage memory, 63 ... continuous model parameter estimation unit.

Claims

[Claims]

1. A moving object behavior recognition device for recognizing the behavior of a moving object, such as a human, in a scene, comprising: means for representing each motion in the behavior of the moving object as a vector of feature quantities, such as mesh features extracted from an image or the directional distribution of the optical flow; means for acquiring time-series models of behavior, as probabilistic state transition models corresponding to the respective recognition categories, by training on learning data; and means for calculating, for each of those models, the probability that it generates the behavior to be recognized.