JP2005092346A - Characteristic extraction method from three-dimensional data and device

Info

Publication number: JP2005092346A
Application number: JP2003321962A
Authority: JP
Inventors: Nobuyuki Otsu; 展之大津; Takumi Kobayashi; 匠小林
Original assignee: National Institute of Advanced Industrial Science and Technology AIST
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2003-09-12
Filing date: 2003-09-12
Publication date: 2005-04-07
Anticipated expiration: 2023-09-12
Also published as: JP4061377B2

Abstract

PROBLEM TO BE SOLVED: To provide a method and device for recognizing motion in a moving image. SOLUTION: Feature data are extracted from the three-dimensional data of a moving image by a three-dimensional (cubic) higher-order local autocorrelation feature extraction method, the extracted feature data are transformed by a statistical technique such as multivariate analysis to produce new feature data, and the new feature data are compared with registered data to make a decision. A feature value extracted by this method is position-invariant: it does not depend on where or when the target appears in the three-dimensional data. When several targets are present in the data, the overall feature value is the sum of the individual feature values, which makes it easy to handle in later recognition. The computational cost of feature extraction is small, allowing real-time processing. COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to techniques for recognizing motion in moving images and for recognizing three-dimensional shapes.

Various techniques have been proposed for detecting and recognizing specific figures in image data and for collating images against registered images. For two-dimensional images, the inventors have previously invented a learning-adaptive image recognition and measurement method based on higher-order autocorrelation features, a highly general-purpose feature derived from several desirable conditions. It is disclosed in the publication below.
Japanese Patent No. 2982814

A moving image is three-dimensional (volumetric) numerical data in which two-dimensional (still) images are arranged along the time axis. Demand for recognizing motion in moving images is growing, yet compared with the recognition of two-dimensional still images such as characters, there is almost no basic, general-purpose, systematic feature extraction method, apart from a few heuristic (ad hoc) proposals. Optical flow is regarded as the most basic conventional method for extracting features from moving images, but the preconditions it relies on are demanding in practice, and because it is based on differentiation it is vulnerable to real-world noise. As a result it has not led to practical applications.

The present invention is a basic, general-purpose feature extraction method that can be used widely in computer vision (artificial vision), for which demand in moving image recognition will continue to grow. Its principal feature is a three-dimensional higher-order local autocorrelation feature extraction method, which extends the higher-order local autocorrelation feature extraction method to three dimensions. Combining this method with a statistical information integration technique such as multivariate analysis yields an adaptive-learning, general-purpose moving image recognition method. The same method can also be applied as is to three-dimensional shape recognition.

Feature values extracted by the three-dimensional higher-order local autocorrelation feature extraction method do not depend on the location or time of the target within the three-dimensional data (position invariance), so the target does not need to be segmented from the image. When several targets are present in the data, the overall feature value is the sum of the individual feature values (additivity). Both properties make subsequent recognition easier. Furthermore, because the method is based on integration (accumulation) rather than differentiation, it is robust to noise. The computational cost of feature extraction is also small, so real-time processing is possible.

The inventors tested this method on actual moving image recognition and obtained very good motion recognition results. The method therefore has the advantage of being applicable to a wide variety of problems: image recognition, measurement, artificial vision, robot vision, the computer-human interface field in general, and security-related applications such as video surveillance, monitoring systems, and guard systems.

Recognition and measurement of moving image data should not depend on where the target lies within the three-dimensional data, so the extracted feature data should be position-invariant. When several targets are present in the three-dimensional data, subsequent recognition is easier if the overall feature value is the sum of the individual feature values. It is also desirable that feature extraction require little computation so that real-time processing is possible.

As a basic, general-purpose feature extraction method that satisfies these requirements, the invention uses the three-dimensional higher-order local autocorrelation feature extraction method, a three-dimensional extension of the two-dimensional higher-order local autocorrelation feature extraction method. Combining this feature extraction method with a statistical information integration technique such as multivariate analysis yields an adaptive-learning, general-purpose moving image (motion) recognition method.

FIG. 1 is a flowchart showing the motion recognition processing according to the present invention. The processing is carried out by writing a program, installing it, and running it on any well-known computer system, such as a personal computer equipped with an interface circuit for capturing moving images from a digital video camera or other source. The moving image data may be input in real time, for example from a video camera, or may be saved to a file first and then read in as three-dimensional data. A description of the system's hardware configuration is therefore omitted.

In S10, three-dimensional data such as moving image (or three-dimensional image) data are read. In S11, difference data are generated from the input moving image data in order to detect "motion" information and to remove stationary content such as the background. The difference may be an inter-frame difference, which extracts the change in luminance (or color, for a color image) of the pixel at the same position in adjacent frames; an inter-frame difference of edges, where the edges are obtained by extracting the luminance changes within each frame; or both.
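The inter-frame difference of S11 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `frame_differences`, the nested-list frame layout, and the use of the absolute difference are all assumptions made for the example.

```python
def frame_differences(frames):
    """Absolute luminance difference between each pair of adjacent frames.

    `frames` is a list of equally sized 2D lists (rows of integer luminance
    values). The result has one frame fewer than the input.
    """
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        diffs.append([[abs(c - p) for p, c in zip(prev_row, curr_row)]
                      for prev_row, curr_row in zip(prev, curr)])
    return diffs


# Hypothetical 2x2 frames: the background value 10 is static, one pixel changes per step.
frames = [
    [[10, 10], [10, 10]],
    [[10, 50], [10, 10]],
    [[10, 10], [10, 90]],
]
diffs = frame_differences(frames)
```

Static background pixels cancel to 0 in `diffs`, which is the stated purpose of the differencing step.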

In S12, binarization with automatic threshold selection is performed to remove luminance values, color information, and noise unrelated to "motion" from the difference values and to reduce them to binary information: motion (value 1) or no motion (value 0). Differencing luminance or edges yields the pixels that moved, but the difference value is a luminance difference that varies with illumination conditions, color, and so on, so it carries more than motion information alone. Binarization sets a threshold: a difference value below the threshold is treated as noise, set to 0, and judged as no motion, while a difference value above the threshold is set to 1 and judged as motion. This yields "motion" information that is robust to noise and to differences in color and brightness. A color image can be handled in the same way as a grayscale image by taking the difference (distance) between color vectors (R, G, B) as the difference value.
The following binarization methods can be used: a fixed threshold; the discriminant least-squares automatic threshold method (Otsu's method: a histogram of the pixel values in the image is built, and the threshold that best separates the two groups statistically is selected automatically); and threshold 0 with noise processing (in the grayscale image every nonzero difference is first binarized as motion = 1, and erosion is then applied to the binary image to remove noise). FIG. 4 is an explanatory diagram showing an image produced by the binarized difference processing. The left side of FIG. 4 is the input grayscale image, and the right side is the image obtained by taking the inter-frame difference and binarizing it. After this preprocessing, the input moving image data become a sequence of frames whose pixel values are the logical values "moved" (1) and "did not move" (0).
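As a rough sketch of the automatic threshold selection in S12, the code below picks the threshold that maximizes the between-class variance of the histogram, which is the usual formulation of Otsu's method. The function names, the 256-level assumption, and the convention that values strictly above the threshold become 1 are choices made for this example and are not taken from the source.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes between-class variance.

    `values` are integer difference values in the range 0..levels-1.
    Values <= t form the "no motion" group, values > t the "motion" group.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0    # number of values <= t
    sum0 = 0  # sum of the values <= t
    for t in range(levels - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one group is empty, so there is no separation to score
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2  # proportional to between-class variance
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(diff_frame, threshold):
    """Map a 2D list of difference values to 1 (motion) or 0 (no motion)."""
    return [[1 if v > threshold else 0 for v in row] for row in diff_frame]


# Hypothetical difference values: small noise plus a few large "motion" values.
sample = [0, 1, 2, 1, 0, 2, 200, 210, 205]
t = otsu_threshold(sample)
motion = binarize([[0, 200], [2, 210]], t)
```

When several thresholds produce the same split, this sketch keeps the smallest one; any threshold between the two groups would give the same binary image.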

Returning to FIG. 1, in S13 the three-dimensional higher-order local autocorrelation feature extraction process (generation of 251-dimensional feature data) is performed; details are given later. The higher-order autocorrelation function is an extension of the autocorrelation function to higher orders. When the three-dimensional data are written f(r), where r = (x, y, z), the N-th order autocorrelation function is given by Equation 1.

$$x_N(a_1, a_2, \ldots, a_N) = \int f(r)\, f(r + a_1) \cdots f(r + a_N)\, dr \qquad \text{(Equation 1)}$$

The 0th-order function is the sum of the three-dimensional data, $x_0(0) = \int f(r)\, dr$.

Here (a₁, a₂, …, a_N) are the displacements as seen from the reference point (pixel of interest). Depending on the choice of displacements and order, there are infinitely many higher-order autocorrelation functions; the higher-order local autocorrelation function restricts them to a local region. In the three-dimensional higher-order local autocorrelation feature, the range over which correlation is taken is the set of pixels within a given distance of the pixel of interest. For example, the displacements are restricted to the 3×3×3-pixel local region centered on the reference point, that is, to the 26-neighborhood of the reference point. For each set of displacements, the integral of Equation 1 yields one feature value. Feature values are therefore generated in the same number as there are combinations of displacements (= mask patterns).

The number of feature values, that is, the dimension of the feature vector, equals the number of mask pattern types. For a binary image, the pixel value 1 remains 1 however many times it is multiplied by itself, so terms of second or higher power duplicate a first-power term that differs only in exponent, and are deleted. In addition, patterns that coincide under translation are duplicates in the integration (scan) of Equation 1, so one representative pattern is kept and the others are deleted. Because the right-hand side of Equation 1 always includes the reference point f(r), the center of the local region, the representative chosen is the pattern that has the center point as its reference point and fits entirely within the 3×3×3-pixel local region. Mask patterns containing the center point number 352 in total: 1 with one selected pixel, 26 with two selected pixels, and 26 × 25 / 2 = 325 with three selected pixels. After removing patterns that are duplicates under the integration (translation: scan) of Equation 1, 251 mask pattern types remain. That is, the three-dimensional higher-order local autocorrelation feature vector for one set of three-dimensional data is 251-dimensional.
When the image is a multi-valued grayscale image with pixel value a, the correlation values satisfy a (0th order) ≠ a × a (1st order) ≠ a × a × a (2nd order), so patterns with the same selected pixels but different exponents cannot be deleted as duplicates. Compared with the binary case, the multi-valued case therefore has 2 more patterns with one selected pixel and 26 more with two selected pixels, for a total of 279 mask pattern types.
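The pattern counts above can be checked by direct enumeration. The sketch below is an illustrative reconstruction, not code from the patent: it lists every displacement set of order 0 to 2 inside the 3×3×3 cube, treats two patterns as the same when one is a translate of the other that still contains the center and fits in the cube, and counts the survivors. The names `canonical` and `enumerate_masks` are hypothetical. By this count the 251 binary patterns split into 1 (order 0), 13 (order 1), and 237 (order 2).

```python
from itertools import combinations_with_replacement, product

OFFSETS = list(product((-1, 0, 1), repeat=3))  # the 27 points of the 3x3x3 cube
ORIGIN = (0, 0, 0)


def canonical(points):
    """Pick one representative among the translates of `points`.

    Only translates that move one of the pattern's own points onto the
    origin and still fit inside the cube are considered; the smallest
    (in sorted-tuple order) is returned.
    """
    best = None
    for p in set(points):
        shifted = tuple(sorted(tuple(c - d for c, d in zip(q, p)) for q in points))
        if all(abs(c) <= 1 for q in shifted for c in q):
            if best is None or shifted < best:
                best = shifted
    return best


def enumerate_masks(binary=True, max_order=2):
    """Enumerate distinct mask patterns up to `max_order`.

    With binary=True repeated points collapse (1 * 1 == 1); with
    binary=False repeated points are kept, as for multi-valued pixels.
    """
    masks = set()
    for order in range(max_order + 1):
        for displacements in combinations_with_replacement(OFFSETS, order):
            points = (ORIGIN,) + displacements
            if binary:
                points = tuple(sorted(set(points)))
            masks.add(canonical(points))
    return sorted(masks)


binary_masks = enumerate_masks(binary=True)
gray_masks = enumerate_masks(binary=False)
print(len(binary_masks))  # 251
print(len(gray_masks))    # 279
```

Each returned mask is a tuple of (dz, dy, dx)-style offsets that always includes the origin, so the list can be used directly as the set of correlation patterns.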

The three-dimensional higher-order local autocorrelation features computed in this way are autocorrelation values obtained by integration, with the displacements restricted to a local region. They therefore have the desirable properties of position invariance, that is, independence from the position of the target within the data, and additivity with respect to targets. They are also robust to noise.
In S14, new feature data effective for the task are generated by the calculation below, using a coefficient matrix A that suits the task and has been obtained in advance by a multivariate analysis technique.

y = A'x

Here x is the 251-dimensional feature vector (column vector), y is the n-dimensional new feature vector (column vector; generally n << 251), and A' is the transpose of the coefficient matrix A. In discriminant analysis, for example, the dimension n of the new feature vector y is the smaller of "the number of classes to be discriminated minus 1" and "the dimension of the original feature vector". With 4 classes, y is three-dimensional and the feature data are greatly compressed. In principal component analysis, any number of dimensions up to the original 251 can be taken, in order of effectiveness.
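The projection y = A'x and the dimension rule for discriminant analysis can be illustrated with a few lines of plain Python. The helper names `project` and `discriminant_dim` and the small 4×2 matrix are hypothetical and chosen only to keep the example readable; a real A would have 251 rows.

```python
def project(A, x):
    """Compute y = A'x, where A is given as a list of rows (len(x) rows, n columns)."""
    n = len(A[0])
    return [sum(A[i][k] * x[i] for i in range(len(x))) for k in range(n)]


def discriminant_dim(num_classes, feature_dim):
    """Dimension of the discriminant space: min(classes - 1, original dimension)."""
    return min(num_classes - 1, feature_dim)


# Hypothetical 4-dimensional feature vector and 4x2 coefficient matrix.
A = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.5, -0.5],
]
x = [2.0, 3.0, 1.0, 4.0]
y = project(A, x)

print(discriminant_dim(4, 251))  # 3
```

With 4 classes and 251-dimensional features, `discriminant_dim` returns 3, matching the three-dimensional new feature space described in the text.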

The method of determining the coefficient matrix A is as follows. A is obtained by a well-known multivariate analysis technique such as regression analysis, principal component analysis, or discriminant analysis, using training data (251-dimensional feature vectors) whose motion, for example "walking right", is known. Multivariate analysis techniques are described in the patent document cited above and in Yanai et al. (eds.), "Multivariate Analysis Example Handbook" (多変量解析実例ハンドブック, Asakura Shoten, June 25, 2002), so a detailed description is omitted. Discriminant analysis is described here as an example. In discriminant analysis, the coefficient matrix A is obtained as the solution (the eigenvectors) of the following eigenvalue problem, so that the K classes are optimally separated in the space of the vectors y.

$$X_B A = X_W A \Lambda \qquad (A' X_W A = I)$$

Here $\Lambda$ is the diagonal matrix of eigenvalues and $I$ is the identity matrix. $X_W$ and $X_B$ are the within-class and between-class covariance matrices of the feature vector x, respectively, and are defined by the following equations.

Here $\omega_j$ is the occurrence probability of class j, $\bar{x}_j$ is the mean vector of class j, $X_j$ is the covariance matrix of class j, and $\bar{x}_T$ is the overall mean vector.
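The defining equations themselves are not reproduced in this text. The forms below are the standard definitions of the within-class and between-class covariance matrices in discriminant analysis, written with the symbols listed above and with the prime denoting transposition as elsewhere in this description. They are a reconstruction offered as an assumption, not a quotation from the source.

```latex
X_W = \sum_{j=1}^{K} \omega_j X_j ,
\qquad
X_B = \sum_{j=1}^{K} \omega_j \, (\bar{x}_j - \bar{x}_T)(\bar{x}_j - \bar{x}_T)' ,
\qquad
\bar{x}_T = \sum_{j=1}^{K} \omega_j \, \bar{x}_j
```

Under these definitions $X_B$ has rank at most K − 1, which is consistent with the statement that the discriminant space for K classes has at most K − 1 dimensions.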

In S15, the new feature vector is compared with the registered data in the new feature space Y. For example, the distance between the input new feature vector and the mean vector of each class in the new feature space ($\bar{y}_j = A'\bar{x}_j$) is calculated. In S16, a decision is made on the basis of the comparison and the recognition result is output. The simplest decision rule assigns the motion corresponding to the nearest of the registered class mean vectors (class centroids). Alternatively, the k-NN (k-nearest-neighbor) rule may be used, which assigns the motion of the class that is most frequent among the k registered new feature vectors closest to the input. A kernel-based discrimination method may be adopted as a more advanced nonlinear method.
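The two decision rules named for S15 and S16 can be sketched as below. The function names, the Euclidean distance, and the sample vectors and labels are assumptions for illustration; the source does not specify a distance measure or a tie-breaking rule.

```python
import math
from collections import Counter


def nearest_class_mean(y, class_means):
    """Return the label whose class mean vector is closest to y."""
    return min(class_means, key=lambda label: math.dist(y, class_means[label]))


def knn_classify(y, registered, k=3):
    """Return the most frequent label among the k registered vectors closest to y.

    `registered` is a list of (vector, label) pairs.
    """
    nearest = sorted(registered, key=lambda item: math.dist(y, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical class means and registered samples in a 3-dimensional new feature space.
class_means = {
    "walk_right": (1.0, 0.0, 0.0),
    "walk_left": (-1.0, 0.0, 0.0),
    "run_right": (0.0, 1.0, 0.0),
    "run_left": (0.0, -1.0, 0.0),
}
registered = [
    ((1.0, 0.0, 0.0), "walk_right"),
    ((0.8, 0.1, 0.0), "walk_right"),
    ((0.95, 0.2, 0.0), "run_right"),
    ((-1.0, 0.0, 0.0), "walk_left"),
]
y = (0.9, 0.1, 0.0)
by_mean = nearest_class_mean(y, class_means)
by_knn = knn_classify(y, registered, k=3)
```

The class-mean rule needs only one stored vector per class, while k-NN keeps every registered vector but can follow class shapes that a single centroid does not capture.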

FIG. 2 is a flowchart showing the three-dimensional higher-order local autocorrelation feature extraction process of S13. In S20, the 251 correlation pattern counters are cleared. In S21, one unprocessed pixel is selected (the pixels of interest are scanned in order). In S22, one unprocessed mask pattern is selected.

FIG. 5 is a perspective view showing the autocorrelation processing range in the three-dimensional pixel space. FIG. 6 is an explanatory diagram showing the autocorrelation processing coordinates in the three-dimensional pixel space; it shows the xy planes of three frames, frame t−1, frame t, and frame t+1, side by side. In the present invention, correlation is taken among the pixels inside the cube of 3×3×3 (= 27) pixels centered on the pixel of interest. A mask pattern is information specifying the combination of pixels to be correlated: the data of pixels selected by the mask pattern are used to calculate the correlation value, and pixels not selected are ignored.

As described above, the pixel of interest (center pixel) is always selected in a mask pattern. Considering correlation values of order 0 to 2 for a binary image, 251 patterns remain in the 3×3×3-pixel cube after duplicates are eliminated. FIG. 7 is an explanatory diagram showing examples of autocorrelation mask patterns. FIG. 7(1) is the simplest mask, of order 0, containing only the pixel of interest (hatched); (2) is an example in which two pixels are selected (order 1); and (3) and (4) are examples in which three pixels are selected (order 2).

Returning to FIG. 2, in S23 the correlation value is calculated using Equation 1. The product f(r)f(r+a₁)…f(r+a_N) in Equation 1 corresponds to multiplying together the values of the binarized difference three-dimensional data at the coordinates selected by the mask pattern (= the correlation value, 0 or 1). The integration in Equation 1 corresponds to moving (scanning) the pixel of interest through the three-dimensional data and accumulating the correlation values in a counter (counting the 1s).

In S24, it is determined whether the correlation value is 1. If so, the process proceeds to S25; if not, to S26. In S25, the correlation pattern counter corresponding to the mask pattern is incremented by 1. In S26, it is determined whether all patterns have been processed. If so, the process proceeds to S27; if not, it returns to S22.

In S27, it is determined whether all pixels have been processed. If so, the process proceeds to S28; if not, it returns to S21. In S28, the set of pattern counter values is output as the 251-dimensional feature data. When the pixel values are multi-valued (grayscale), the correlation values are also multi-valued, so registers are used instead of counters and the feature values are generated by adding the correlation values to the registers.
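The counting loop of S20 to S28 can be sketched as follows for binary data. This is a simplified illustration under stated assumptions: the function names are hypothetical, masks are passed in as tuples of (dt, dy, dx) offsets that each include (0, 0, 0), only three hand-written masks are used instead of all 251, and only interior pixels are scanned so that the 3×3×3 cube stays inside the data (the source does not say how borders are handled).

```python
def extract_features(volume, masks):
    """Count, for each mask, the pixels of interest at which every selected pixel is 1.

    `volume[t][y][x]` holds 0 or 1. Each mask is a tuple of (dt, dy, dx)
    offsets and must contain the center offset (0, 0, 0).
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    counters = [0] * len(masks)                       # S20: clear the counters
    for t in range(1, T - 1):                         # S21: scan the pixels of interest
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                if volume[t][y][x] == 0:
                    continue  # the center is in every mask, so every product is 0
                for m, mask in enumerate(masks):      # S22: take each mask in turn
                    if all(volume[t + dt][y + dy][x + dx] for dt, dy, dx in mask):  # S23, S24
                        counters[m] += 1              # S25: count the pattern
    return counters                                   # S28: the feature vector


def make_volume(points, size=5):
    """Build a size^3 binary volume with 1s at the given (t, y, x) points."""
    volume = [[[0] * size for _ in range(size)] for _ in range(size)]
    for t, y, x in points:
        volume[t][y][x] = 1
    return volume


# Three example masks: center only; center + right neighbor in the same frame;
# center + the pixel one step to the right in the next frame.
masks = [
    ((0, 0, 0),),
    ((0, 0, 0), (0, 0, 1)),
    ((0, 0, 0), (1, 0, 1)),
]
# A dot moving one pixel to the right per frame, at two different vertical positions.
moving_dot = make_volume([(1, 2, 1), (2, 2, 2), (3, 2, 3)])
shifted_dot = make_volume([(1, 1, 1), (2, 1, 2), (3, 1, 3)])
features = extract_features(moving_dot, masks)
features_shifted = extract_features(shifted_dot, masks)
```

The two volumes contain the same motion at different positions and give the same feature vector, a small instance of the position invariance described earlier. For multi-valued data the `all(...)` test would be replaced by the product of the selected pixel values, accumulated into a register.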

FIG. 3 is a flowchart showing real-time moving image processing according to the present invention. The processing shown in FIGS. 1 and 2 is an example in which the moving image data are captured in advance and then processed, whereas FIG. 3 is an example in which image data are captured in real time, for example from a video camera, and processed.

In S30, the process waits until frame data are input. In S31, the frame image data are input. In S32, difference data are generated and binarized as in S11 and S12 of FIG. 1. In S33, correlation pattern counting is performed for the pixel data of the one new frame.

FIG. 8 is an explanatory diagram showing the real-time moving image processing according to the present invention. Moving image data are a sequence of frames. A time window of fixed width is therefore set along the time axis, and the set of frames inside the window is treated as one set of three-dimensional data. Each time a new frame is input, the window is advanced and the oldest frame is dropped, which yields finite three-dimensional data. The window should preferably be longer than one cycle of the motion to be recognized.

In FIG. 8, at the moment a new frame is input at time t, the feature data for the immediately preceding time window (t−1, t−n−1) have already been calculated. However, because frame (t−1) lies at the edge of that window, correlation values have been calculated only up to those corresponding to frame (t−2).

The newly input frame t is therefore used to generate the feature data corresponding to frame (t−1), and these are added to the current feature data. The feature data corresponding to the oldest frame (t−n−1) are subtracted from the current feature data. This allows the feature data to be updated with less computation. To make this possible, the feature data generated for each frame are stored.
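The add-and-subtract update described above can be sketched with a queue of per-frame feature vectors. The class name `SlidingFeature`, its interface, and the tiny 2-dimensional vectors are hypothetical; in practice each per-frame vector would hold the 251 counts contributed by the pixels of interest in one frame.

```python
from collections import deque


class SlidingFeature:
    """Keep the sum of the per-frame feature vectors inside a fixed-length time window."""

    def __init__(self, dim, window):
        self.total = [0] * dim    # current feature data for the whole window
        self.per_frame = deque()  # stored per-frame feature vectors, oldest first
        self.window = window      # number of frames kept in the window

    def push(self, frame_feature):
        """Add the newest per-frame vector and drop the oldest once the window is full."""
        self.per_frame.append(frame_feature)
        self.total = [a + b for a, b in zip(self.total, frame_feature)]
        if len(self.per_frame) > self.window:
            oldest = self.per_frame.popleft()
            self.total = [a - b for a, b in zip(self.total, oldest)]
        return self.total


# Hypothetical per-frame counts for a 2-pattern feature and a window of 2 frames.
tracker = SlidingFeature(dim=2, window=2)
totals = [list(tracker.push(f)) for f in ([1, 0], [2, 1], [0, 3])]
```

Each update costs one vector addition and one subtraction regardless of the window length, instead of rescanning every frame in the window.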

Returning to FIG. 3, in S34 the set of count values, which is the feature data, is stored in association with its frame. In S35, the set of count values is added to the current feature data. In S36, the set of count values corresponding to the oldest frame is subtracted from the feature data.

In S37, new feature data are generated from the feature data using the coefficients obtained by a multivariate analysis technique, as in S14. In S38, a match with the registered new feature data is determined as in S15 and S16. In S39, the determination result is output. In S40, it is determined whether to end the processing. If so, the processing ends; if not, it returns to S30. Real-time processing is possible with this method.

Next, the results of experiments conducted by the inventors are described. In the basic experiment there were 5 subjects and 4 discrimination classes (types of motion): "walking right", "walking left", "running right", and "running left". The moving images were 352×240 pixels, grayscale, and each person occupied about 30×80 pixels. The data set comprised about 2,000 three-dimensional data items.

Discrimination was performed with a random two-thirds of the data as test data and the remainder as training data. The time width of each three-dimensional data item was 30 frames; one walking cycle corresponds to 25 frames. The parameters varied were the height, width, and depth of the local volume of the mask pattern. Height and width correspond to the spatial distance over which correlation is taken within a frame, and depth corresponds to the time interval over which correlation is taken along the time axis. The results showed that a high discrimination rate is obtained almost independently of the mask size.

Next, the experiment was repeated with different time widths of the three-dimensional data. The discrimination rate dropped slightly for time widths too short to contain one cycle of the action, but a high discrimination rate was still obtained for every time width. These experiments demonstrate that the three-dimensional higher-order local autocorrelation feature is a feature extraction method that does not depend on these parameters. FIG. 9 is a graph showing the distribution of the experimental data in the new feature space in the action recognition experiment. Because there are 4 classes in this experiment, the new feature space obtained by discriminant analysis is three-dimensional, and each axis of the graph represents one dimension (discriminant axis). The inventors also conducted a recognition experiment in which the actions of several people appear in the frame. Previously the only approach was to detect and track each person and then recognize each one separately, but the additivity of the three-dimensional higher-order local autocorrelation feature allows them to be recognized simultaneously. For the general case of several people, additivity gives the following formulation.

$$x = \alpha_1 f_1 + \alpha_2 f_2 + \alpha_3 f_3 + \alpha_4 f_4 + \varepsilon \qquad \text{(Equation 2)}$$

Here, x is the cubic higher-order local autocorrelation feature vector of a moving image showing multiple people; f₁, …, f₄ are the mean cubic higher-order local autocorrelation feature vectors of the classes "walking right", "walking left", "running right", and "running left"; α₁, …, α₄ are the numbers (integer values) of instances of each of these actions contained in the screen; and ε is an error vector. In the discriminant space plot of FIG. 9, the data are indeed concentrated near linear sums of the action vectors, which confirms the validity of the additivity expressed by Equation 2. Given the relationship between the dimension of the cubic higher-order local autocorrelation feature vector (251) and the number of discrimination classes (four), Equation 2 can be solved analytically and uniquely using a pseudo-inverse matrix. Furthermore, since the linear discriminant space (three-dimensional) is a linear mapping of the original feature space, Equation 2 also holds in that space, so its solution can also be searched for in the linear discriminant space. The solution was searched for both by exhaustive search over the range 0 ≤ α₁, …, α₄ ≤ 10 and by a genetic algorithm. The recognition rate obtained with the pseudo-inverse matrix was poor, which is thought to be because the classes do not form well-separated clusters in the 251-dimensional feature vector space. The classes lie at nearly left-right symmetric positions, so their mean vectors are presumed not to be independent, and this is the likely cause. In the linear discriminant space, however, the data were discriminated perfectly, which shows that sufficient discrimination is possible even with multiple targets.
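The pseudo-inverse solution of Equation 2 can be illustrated with a minimal sketch. It assumes NumPy; the function name `estimate_counts_pinv` and the final rounding to the nearest integer are illustrative assumptions and are not taken from the specification.

```python
import numpy as np


def estimate_counts_pinv(x, class_means):
    """Least-squares estimate of the coefficients in x ≈ F @ alpha.

    x           : feature vector of the whole scene, shape (d,)
    class_means : list of k class mean vectors f_1..f_k, each of shape (d,)

    Returns (rounded, raw): the coefficients rounded to the nearest
    integer (an assumption made here for illustration) and the raw
    real-valued least-squares solution.
    """
    F = np.column_stack(class_means)      # shape (d, k), one column per class
    raw = np.linalg.pinv(F) @ x           # Moore-Penrose pseudo-inverse solution
    rounded = np.rint(raw).astype(int)    # hypothetical post-processing step
    return rounded, raw
```

When the class mean vectors are nearly linearly dependent, the matrix F becomes ill-conditioned and the raw solution is sensitive to ε, which is consistent with the poor pseudo-inverse results reported above. `np.linalg.cond(F)` is one way to check this.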

FIG. 10 is a graph showing the transitions between actions in the new feature space in the action recognition experiment. In a real environment, human behavior is not constant but changes in various ways. Actions were therefore recognized in moving images that contain such transitions. Data containing action transitions were created by splicing together the action data used in the basic experiment so that the human actions are continuous. In FIG. 10, the transitions between action classes in the discriminant space appear clearly as nearly straight lines.

Next, identification of individuals from their behavior, that is, Gait Recognition, is described. Gait Recognition has attracted attention in recent years as a form of biometric authentication. Its advantages are that characteristics of behavior such as gait are difficult to conceal and that personal authentication is possible from a distance. FIG. 11 is a graph showing the distribution of experimental data in the new feature space in the experiment on simultaneous identification of multiple people by behavior. There were five subjects, and the five identification classes were "person 1", "person 2", "person 3", "person 4", and "person 5". The moving images were 352 × 240 pixels in grayscale, and the size of a person was set to 30 × 80 pixels. First, over the entire set of left/right running and walking data, two-thirds of the data were randomly selected as test data and the remainder as training data, and identification was performed. Second, for each of the left/right running and walking data sets separately, two-thirds of the data within that set were randomly selected as test data and the remainder as training data, and identification was performed. Each recognition rate is the average over 100 repetitions. The time width was set to 30. As a result, a high recognition rate was obtained for every action. In other words, the cubic higher-order local autocorrelation features reflect differences in individual behavior, which shows that Gait Recognition is also possible.
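The evaluation protocol described above (random split with two-thirds test data, repeated 100 times and averaged) could be sketched as follows. NumPy is assumed. The nearest-class-mean classifier and all function names are hypothetical stand-ins chosen for illustration; the specification does not state which decision rule was used in this experiment.

```python
import numpy as np


def nearest_mean_predict(train_x, train_y, test_x):
    """Assign each test vector to the class with the closest training mean."""
    classes = np.unique(train_y)
    means = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    # Pairwise Euclidean distances, shape (n_test, n_classes)
    dists = np.linalg.norm(test_x[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


def mean_recognition_rate(x, y, test_fraction=2 / 3, repeats=100, seed=0):
    """Average recognition rate over repeated random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(n * test_fraction))
    rates = []
    for _ in range(repeats):
        perm = rng.permutation(n)
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        pred = nearest_mean_predict(x[train_idx], y[train_idx], x[test_idx])
        rates.append(np.mean(pred == y[test_idx]))
    return float(np.mean(rates))
```

Here `x` would hold one feature vector per sample (for example, vectors in the new feature space) and `y` the person labels.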

Next, an extended experiment closer to a real environment was conducted, again with a time width of 30. The task is to identify people when multiple people appear in the screen, which is needed for authentication in places with heavy foot traffic. Conventionally, the only approach was to track each person and then identify each one separately, but by using the additivity of the cubic higher-order local autocorrelation features, everyone in the screen can be identified simultaneously. The general case of multiple people is formulated from additivity as follows.

x = α₁f₁ + α₂f₂ + α₃f₃ + α₄f₄ + α₅f₅ + ε … (Equation 3)

Here, x is the cubic higher-order local autocorrelation feature vector of a moving image showing multiple people; f₁, …, f₅ are the mean cubic higher-order local autocorrelation feature vectors of the classes "person 1", "person 2", "person 3", "person 4", and "person 5"; α₁, …, α₅ indicate whether the behavior of each of these persons is contained in the screen (0 or 1); and ε is an error vector. In the discriminant space plot of FIG. 11, the data are indeed concentrated near linear sums of the person vectors, which confirms the validity of the additivity. Given the relationship between the dimension of the cubic higher-order local autocorrelation feature vector (251) and the number of discrimination classes (five), Equation 3 can be solved analytically and uniquely using a pseudo-inverse matrix. Furthermore, since the linear discriminant space (three-dimensional) is a linear transformation of the original feature space, Equation 3 also holds in that space, so its solution can also be searched for in the linear discriminant space. The solution was searched for both by exhaustive search over the range 0 ≤ α₁, …, α₅ ≤ 1 and by a genetic algorithm. The recognition rate obtained with the pseudo-inverse matrix was poor, which is thought to be because the classes do not form well-separated clusters in the 251-dimensional feature vector space. In the linear discriminant space, however, the data were discriminated perfectly, which shows that sufficient discrimination is possible even with multiple targets.
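The exhaustive search in the linear discriminant space can be sketched as below, assuming NumPy. The function name `search_coefficients` and the use of the Euclidean norm as the residual are assumptions made for illustration. The inputs are assumed to be already projected into the discriminant space by the (here unspecified) coefficient matrix.

```python
import itertools

import numpy as np


def search_coefficients(y, projected_means, max_value):
    """Exhaustively search integer coefficients in [0, max_value]^k.

    y               : scene feature vector projected into the discriminant space
    projected_means : list of k class mean vectors in the same space
    max_value       : upper bound of each coefficient (1 for Equation 3)

    Returns the coefficient tuple with the smallest residual norm
    ||y - M @ alpha|| together with that residual.
    """
    M = np.column_stack(projected_means)          # shape (dim, k)
    best_alpha, best_err = None, np.inf
    for alpha in itertools.product(range(max_value + 1), repeat=M.shape[1]):
        err = np.linalg.norm(y - M @ np.array(alpha))
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err
```

With five classes and coefficients restricted to 0 or 1 there are only 2⁵ = 32 candidates, and the range 0 to 10 with four classes gives 11⁴ = 14641, so full enumeration is inexpensive in both cases. A three-dimensional space yields only three equations for four or five unknowns, so a real-valued solution would not be unique there; restricting the coefficients to a small discrete set is what makes the search well posed.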

The embodiments and experimental results have been described above. By extracting features from three-dimensional data that are substantially effective for recognition, the method of the present invention is considered to enable recognition that captures the essence of the data better than previous methods. In addition, correlation, the basic concept of this method, is known to be robust to noise, so the method can be expected to yield noise-robust feature values.

To confirm the effectiveness of the present invention, it was applied to action recognition and Gait Recognition in moving image recognition, and high recognition rates were obtained in experiments. Furthermore, it was shown that simultaneous recognition of multiple targets, which was impossible with previous methods, is possible owing to the additivity of the cubic higher-order local autocorrelation features, and this was demonstrated experimentally.

Time-series data were handled by setting a time width for the moving image, which is considered to correspond to short-term memory in humans. However, if the time width is too short the recognition rate deteriorates, and if it is too long the response to action transitions becomes sluggish, so setting the time width appropriately is also important.
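The role of the time width can be illustrated with a sliding-window sum of per-frame feature contributions, in the spirit of the add-and-subtract update recited in the claims. This is a minimal sketch assuming NumPy; the class name `SlidingFeatureSum` is hypothetical, and the per-frame contributions are assumed to be supplied by a feature extractor that is not shown.

```python
from collections import deque

import numpy as np


class SlidingFeatureSum:
    """Keep the sum of the most recent `time_width` per-frame contributions."""

    def __init__(self, time_width, dim):
        self.time_width = time_width
        self.window = deque()          # stored contributions, oldest first
        self.total = np.zeros(dim)     # current feature vector

    def push(self, frame_feature):
        """Add the newest contribution and drop the one that left the window."""
        frame_feature = np.asarray(frame_feature, dtype=float)
        self.window.append(frame_feature)
        self.total += frame_feature
        if len(self.window) > self.time_width:
            self.total -= self.window.popleft()
        return self.total.copy()
```

A larger `time_width` averages over more frames, which stabilizes the feature vector but delays its response when the action changes; that is the trade-off described above.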

Because the cubic higher-order local autocorrelation features are sensitive to changes in scale, the choice of scale is important. To make the features scale-invariant, one possibility is to generate multiple sets of three-dimensional data at different scales from a single set of three-dimensional data, perform recognition of actions and the like on each set, and make an overall decision using the results.

Although autocorrelations up to the second order were used in the embodiment, higher-order autocorrelations of the third order or above may also be used. The range over which the correlation is taken may be narrower or wider than 3 × 3 × 3, and it need not be a cube. The 3 × 3 × 3 pixels may also be sampled while skipping one or two pixels. Although the embodiment uses all 251 mask patterns, only the patterns that contribute to the feature representation may be selected and used, which further improves the processing speed.

An embodiment for recognizing motion has been described above. However, data on three-dimensional (stationary) objects obtained by methods such as stereo vision or a range finder, and data obtained by MRI or CT scanners, are also three-dimensional (solid) numerical data. The present invention can therefore be applied to them as it is to recognize three-dimensional objects, and can be used for applications such as object identification and lesion detection.
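The figure of 251 mask patterns can be reproduced by a short enumeration. The sketch below assumes binary data (so that repeating a displacement adds no new pattern), a 3 × 3 × 3 neighborhood, and orders 0 to 2, with patterns that coincide under translation counted once. It uses only the Python standard library; the function names are hypothetical, and the procedure is an illustration rather than the one used in the embodiment.

```python
from itertools import combinations, product


def canonical(points):
    """Pick one representative among translates of a point set.

    Each translate moves one of the points to the origin. Only translates
    that still fit inside the 3x3x3 window (all coordinates in -1..1) are
    considered, and the lexicographically smallest one is returned.
    """
    best = None
    for p in points:
        shifted = tuple(sorted(
            tuple(q[i] - p[i] for i in range(3)) for q in points))
        if all(abs(c) <= 1 for q in shifted for c in q):
            if best is None or shifted < best:
                best = shifted
    return best


def enumerate_masks(max_order=2):
    """Return the set of distinct mask patterns up to `max_order`."""
    origin = (0, 0, 0)
    neighbours = [d for d in product((-1, 0, 1), repeat=3) if d != origin]
    masks = set()
    for order in range(max_order + 1):
        # An order-N pattern is the reference point plus N distinct displacements.
        for displacements in combinations(neighbours, order):
            masks.add(canonical((origin,) + displacements))
    return masks
```

Under these assumptions the enumeration yields 1 pattern of order 0, 13 of order 1, and 237 of order 2, for a total of 251. Selecting a subset of patterns, as suggested above, amounts to filtering the returned set.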

FIG. 1 is a flowchart showing the object and motion recognition processing according to the present invention.
FIG. 2 is a flowchart showing the cubic higher-order local autocorrelation feature extraction processing of S13.
FIG. 3 is a flowchart showing the real-time moving image processing according to the present invention.
FIG. 4 is an explanatory diagram showing an image resulting from binarized difference processing.
FIG. 5 is a perspective view showing the autocorrelation processing range in the three-dimensional pixel space.
FIG. 6 is an explanatory diagram showing the autocorrelation processing coordinates in the three-dimensional pixel space.
FIG. 7 is an explanatory diagram showing examples of autocorrelation mask patterns.
FIG. 8 is an explanatory diagram showing the real-time moving image processing according to the present invention.
FIG. 9 is a graph showing the distribution of experimental data in the new feature space in the action recognition experiment.
FIG. 10 is a graph showing the transitions between actions in the new feature space in the action recognition experiment.
FIG. 11 is a graph showing the distribution of experimental data in the new feature space in the experiment on simultaneous identification of multiple people by behavior.

Explanation of symbols

x: 251-dimensional feature data; A: coefficient matrix; y: new feature data

Claims

A method for extracting features from three-dimensional data, comprising the steps of: extracting feature data from three-dimensional data by cubic higher-order local autocorrelation; generating new feature data from the feature data using coefficient data obtained by a multivariate analysis method; and recognizing an action or an object by comparing the new feature data with registered data.

The method for extracting features from three-dimensional data according to claim 1, wherein the three-dimensional data is moving image data cut out by a predetermined time window, and wherein, in the cubic higher-order local autocorrelation, the range to be correlated is a pixel range at a predetermined distance from the target pixel, and the correlation is obtained using correlation patterns from the 0th order to the 2nd order from which duplicate patterns that coincide under translation have been removed.

The method for extracting features from three-dimensional data according to claim 1, wherein the step of extracting the feature data includes the steps of: inputting frame data of a moving image; generating the feature data that newly arise when the frame data are added to the three-dimensional data; storing the newly generated feature data in association with the frame data and adding them to the current feature data; and reading out the feature data associated with frame data for which a predetermined time has elapsed and subtracting them from the current feature data.

An apparatus for extracting features from three-dimensional data, comprising: feature data extraction means for extracting feature data from three-dimensional data by cubic higher-order local autocorrelation; new feature data generation means for generating new feature data from the feature data using coefficient data obtained by a multivariate analysis method; and means for recognizing an action or an object by comparing the new feature data with registered data.