JP6069095B2

JP6069095B2 - Image identification device, image identification method, and image identification program

Info

Publication number: JP6069095B2
Application number: JP2013104351A
Authority: JP
Inventors: 数藤　恭子; 恭子数藤; 行信谷口
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Current assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Priority date: 2013-05-16
Filing date: 2013-05-16
Publication date: 2017-01-25
Anticipated expiration: 2033-05-16
Also published as: JP2014225149A

Description

The present invention relates to an image identification device, an image identification method, and an image identification program for identifying images.

Conventionally, video of the ordinary state in which nothing abnormal occurs is learned in advance, and anything that does not belong to any of the learned patterns is treated as an abnormal scene. In doing so, learning and identification may be performed based on the motion pattern of each moving object in the image (see, for example, Non-Patent Document 1), or the trajectories of moving objects may be extracted and learning and identification performed based on those trajectories (see, for example, Non-Patent Document 2). These methods are effective when the number of moving objects (people, cars) in the image is small (one or a few), or when the paths along which the moving objects travel are limited, as when cars run along a road.

In this specification, an image means a still image or a single frame of a moving image. A video has the same meaning as a moving image and is a set of a series of images.

Non-Patent Document 1: Sanae Shimizu, Hidekazu Hirayu, Hiroji Asai, Yoshinori Niwa, "Anomaly detection of work motion by scene division focusing on human motion," IEICE Technical Report, 2007-CVIM-160, 2007.
Non-Patent Document 2: Hiromi Okamoto, Shuichi Nishio, Noboru Babaguchi, Fujiki Morii, Norihiro Hagita, "Detection of abnormal behavior using a situation-dependent model," IEICE Technical Report, DE2008-19, PRMU2008-37, 2008.

However, in situations where a relatively large number of moving objects are present and their paths are not restricted, as when people walk along a sidewalk or across the grounds of a facility, it is difficult with the method of Non-Patent Document 1 to observe each moving object at a resolution sufficient to identify the state of the scene from the motion of the individual objects. It is likewise difficult to learn trajectory patterns with the method of Non-Patent Document 2.

Thus, with the conventional techniques, when attempting to detect, from a video containing a relatively large number of moving objects, the image at a time that contains an abnormal scene, it is difficult to extract feature amounts based on object motion and make a determination. This is because it is necessary to use feature amounts that reflect not only how each moving object moves but also the relationships among the directions of motion of the moving objects.

The present invention has been made in view of these circumstances, and its object is to provide an image identification device, an image identification method, and an image identification program that can detect the occurrence of an abnormal situation in an image by identifying input images.

The present invention is characterized by comprising: change region extraction means for extracting an image change region from an input image sequence; vector extraction means for extracting motion vectors based on the image change region; flow texture feature amount generating means for generating, in a predetermined spatio-temporal region, a flow texture feature amount indicating a relationship between the directions of the motion vectors; and identification means for identifying an image based on the flow texture feature amount and outputting information on the identification result.

The present invention is characterized in that the flow texture feature amount generating means expresses the flow texture feature amount by a matrix whose rows and columns are values indicating quantized motion vector directions, or by a matrix whose rows and columns are motion vector directions and vector correlation values.

The present invention is characterized in that the flow texture feature amount generating means expresses the flow texture feature amount by any one of the following: a combination of a matrix representation of the frequency, relative to the motion vector direction of a moving object, of the motion vector directions of other moving objects located within a fixed distance centered on that moving object, and matrix representations of the frequency of the motion vector directions of other moving objects located within the fixed distance, the center position of which differs for each of a plurality of moving objects; a combination of a matrix representation of the vector correlation values between the motion vector direction of a moving object and the motion vector directions of other moving objects located within a fixed distance centered on that moving object, and matrix representations of the motion vector directions and vector correlation values of other moving objects located within the fixed distance, the center position of which differs for each of a plurality of moving objects; or a numerical value measuring the uniformity of the distribution of the motion vectors when the motion vectors are regarded as a texture.

The present invention is an image identification method performed by an image identification device that identifies images from input images, characterized by comprising: a change region extraction step of extracting an image change region from an input image sequence; a vector extraction step of extracting motion vectors based on the image change region; a flow texture feature amount generation step of generating, in a predetermined spatio-temporal region, a flow texture feature amount indicating a relationship between the directions of the motion vectors; and an identification step of identifying an image based on the flow texture feature amount and outputting information on the identification result.

The present invention is characterized in that, in the flow texture feature amount generation step, the flow texture feature amount is expressed by a matrix whose rows and columns are values indicating quantized motion vector directions, or by a matrix whose rows and columns are motion vector directions and vector correlation values.

The present invention is characterized in that, in the flow texture feature amount generation step, the flow texture feature amount is expressed by any one of the following: a combination of a matrix representation of the frequency, relative to the motion vector direction of a moving object, of the motion vector directions of other moving objects located within a fixed distance centered on that moving object, and matrix representations of the frequency of the motion vector directions of other moving objects located within the fixed distance, the center position of which differs for each of a plurality of moving objects; a combination of a matrix representation of the vector correlation values between the motion vector direction of a moving object and the motion vector directions of other moving objects located within a fixed distance centered on that moving object, and matrix representations of the motion vector directions and vector correlation values of other moving objects located within the fixed distance, the center position of which differs for each of a plurality of moving objects; or a numerical value measuring the uniformity of the distribution of the motion vectors when the motion vectors are regarded as a texture.

The present invention is an image identification program for causing a computer to function as the image identification device.

According to the present invention, motion vectors arising from the motion of moving objects in an image are regarded as a spatio-temporal texture and expressed as low-dimensional feature amounts. This reflects not only the individual motion of each moving object but also the flow produced by multiple moving objects, making it possible to detect the occurrence of an abnormal situation in the image.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is a flowchart showing the processing operation of the image identification device shown in FIG. 1.
FIG. 3 is an explanatory diagram showing the processing operation of the image identification device shown in FIG. 1.

An image identification device according to an embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the embodiment. In the figure, reference numeral 1 denotes a video input unit that receives the input video. The video input unit 1 obtains video either by receiving the output of an imaging device such as a camera or by reading video data from a video file. Reference numeral 2 denotes a flow information detection unit that detects hierarchical flows, that is, the optical flow and the motion vector of each moving object, from the video received by the video input unit 1. Reference numeral 21 denotes an optical flow detection unit that detects the optical flow from the input video. Reference numeral 22 denotes a motion vector detection unit that detects motion vectors from the input video.

Reference numeral 3 denotes a flow texture feature generation unit that samples the flows by a predetermined method, calculates the relationships among their directions in a predetermined spatio-temporal region, and generates flow texture features. Reference numeral 4 denotes a learning/identification unit that performs learning and identification using the flow texture features. Reference numeral 41 denotes a dictionary storage unit that stores dictionary data generated and updated according to the results of learning in the learning/identification unit 4. Reference numeral 5 denotes a determination output unit that outputs the identification result as a numerical value.

Next, the operation of the image identification device shown in FIG. 1 is described with reference to FIG. 2. FIG. 2 is a flowchart showing the operation of the image identification device shown in FIG. 1. First, the video input unit 1 receives the input video (step S1). Here it is assumed that surveillance video is supplied as the input video and that the device is used to detect abnormal scenes in that video. The video supplied through the video input unit 1 must first include a long video for learning. The longer the training video that contains no abnormal scenes, the more accurate the dictionary that can be created. The training material need not be a single continuous video file; it may be a time series of images or a large number of short video files. Any subject may be used, but the present invention is most effective in cases that are difficult for conventional methods, namely those in which groups of moving objects show a certain amount of motion and form a steady flow, such as a road carrying many cars, a street where many people come and go, or the inside of a station. The following description therefore takes surveillance video of a busy street as an example. Neither the size of the image nor the size of the moving objects it contains is restricted.

Next, the flow information detection unit 2 detects hierarchical flows, that is, the optical flow and the motion vector of each moving object, from the video received by the video input unit 1 (step S2). The optical flow detection unit 21 detects the optical flow in each frame of the video. Many detection algorithms exist; for example, the Lucas-Kanade method, the Horn-Schunck method, or a block matching method can be used. The motion vector detection unit 22 extracts the foreground by background estimation and, from the change of the foreground region over time, calculates the region of each object, such as a person or a car, and the average motion vector within that region.
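The text names block matching as one option for detecting flow. A minimal pure-Python sketch of that idea follows, assuming grayscale frames given as nested lists; the function name `block_match` and the parameters `size` and `search` are hypothetical, and the sum of absolute differences (SAD) is one common matching cost rather than a choice stated in the source.

```python
def block_match(prev, curr, top, left, size=2, search=2):
    """Estimate the displacement (dy, dx) of one block between two frames.

    prev, curr: 2-D lists of gray levels with identical shape.
    (top, left): upper-left corner of the block in prev.
    The displacement minimizing the SAD within +/-search pixels is returned;
    dy is positive downward and dx is positive to the right.
    """
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            # Skip candidate positions that fall outside the frame.
            if y0 < 0 or x0 < 0 or y0 + size > h or x0 + size > w:
                continue
            sad = sum(
                abs(prev[top + r][left + c] - curr[y0 + r][x0 + c])
                for r in range(size)
                for c in range(size)
            )
            if best is None or sad < best[0]:
                best = (sad, dy, dx)
    return best[1], best[2]


# Toy frames (made-up data): a bright 2x2 patch moves down 1 and right 2.
prev = [[0] * 6 for _ in range(6)]
curr = [[0] * 6 for _ in range(6)]
for r in range(2):
    for c in range(2):
        prev[1 + r][1 + c] = 9
        curr[2 + r][3 + c] = 9
motion = block_match(prev, curr, top=1, left=1)
```

Running the matcher on every block of a frame would give a coarse flow field; a practical system would more likely rely on an optimized library implementation.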

Next, the flow texture feature generation unit 3 samples the flows by a predetermined method, calculates the relationships among their directions in a predetermined spatio-temporal region, and generates flow texture features (step S3). The flow texture feature generation unit 3 regards the optical flow and the global motion vectors obtained by the flow information detection unit 2 as a spatio-temporal texture of the video, generates texture information, and outputs it as the input feature vector for the learning/identification unit 4. A method using the GLCM (Gray Level Co-occurrence Matrix), a technique used in texture analysis, is described here.

The GLCM describes, as a matrix, the relationship between a reference pixel p and a pixel q located at a fixed distance from p, and is defined as in equation (1).

This expresses the relationship between the pixel values of two pixels p and q located at a fixed distance from each other. i and j may be the pixel values themselves, or i or j may be the difference between pixel values.
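For reference, a commonly used form of the GLCM can be written as follows. The symbols I (the image), r (the fixed distance), and # (the number of elements in a set) are notation assumed here, and the form used in equation (1) may differ in detail:

$$
G(i, j) = \#\{\, (p, q) : I(p) = i,\ I(q) = j,\ \lVert p - q \rVert = r \,\}
$$

Under this reading, each entry G(i, j) counts how many pixel pairs at distance r carry the value i at p and the value j at q.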

Suppose, then, that the optical flow at each pixel position has been obtained from the image. Using this in place of the pixel value, and taking i and j in equation (1) to be quantized optical flow classes, a matrix of the same kind can be obtained. This is the flow texture feature. As for quantizing the flow, if, for example, eight directions d^k (k = 1, ..., 8) at 45-degree intervals are used and the matrix is defined as in equation (2), it can be written as an 8 × 8 matrix. Here f(p) denotes the direction of the flow vector at pixel p and takes one of the values d^k (k = 1, ..., 8).
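The quantization into eight 45-degree direction classes can be sketched as below. The text does not say which direction corresponds to d^1 or in which sense the classes are numbered, so this sketch assumes that class 1 is centered on the +x axis and that classes increase counter-clockwise; the function name `direction_class` is hypothetical.

```python
import math


def direction_class(vx, vy, n_dirs=8):
    """Map a flow vector (vx, vy) to a direction class k in 1..n_dirs.

    Assumption of this sketch: class 1 is the sector centered on the +x axis,
    and class numbers increase counter-clockwise in equal angular sectors
    (45 degrees each when n_dirs is 8). A zero vector falls into class 1,
    so points with no motion should be filtered out beforehand.
    """
    angle = math.atan2(vy, vx) % (2 * math.pi)   # angle in [0, 2*pi)
    sector = 2 * math.pi / n_dirs
    # Shift by half a sector so that each class is centered on its direction.
    return int((angle + sector / 2) // sector) % n_dirs + 1
```

With this convention, `direction_class(1, 0)` gives class 1 and `direction_class(0, 1)` gives class 3.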

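For example, in FIG. 3, consider only the point p and the region within a fixed distance of it (shown by the circle). If the flow direction class at p is d^2 and the flow directions at the points q1, q2, and q3 within that distance are d^3, d^2, and d^1, respectively, then

$$
G = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix} \tag{3}
$$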

In the matrix on the right-hand side of equation (3), the elements (d^2, d^1), (d^2, d^2), and (d^2, d^3) are set to 1, and all other elements are zero.

Moreover, the description can be extended from the relationship between pixels to the relationship between objects. The same description is obtained by applying equation (2) to the average flow of one object region and the average flow of another object region within a fixed distance of it. This can be used to describe the texture features of global motion vectors. An object region can be extracted, for example, as a region that occupies at least a certain area and has a motion gradient in the same direction over a short fixed period of time.
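For the object-level extension, each object region first has to be reduced to a position and an average flow. A minimal sketch, assuming a region is given as a list of (x, y, vx, vy) samples and using the centroid as the object position (the centroid and the name `object_summary` are choices of this sketch, not taken from the source):

```python
def object_summary(pixels):
    """Reduce one object region to its centroid and average flow.

    pixels: non-empty list of (x, y, vx, vy) samples belonging to the region.
    Returns ((cx, cy), (mean_vx, mean_vy)).
    """
    n = len(pixels)
    cx = sum(p[0] for p in pixels) / n
    cy = sum(p[1] for p in pixels) / n
    mvx = sum(p[2] for p in pixels) / n
    mvy = sum(p[3] for p in pixels) / n
    return (cx, cy), (mvx, mvy)


# Made-up region with two samples.
center, mean_flow = object_summary([(0, 0, 1, 0), (2, 0, 1, 2)])
```

The resulting centroids and average flows could then be quantized and counted exactly as pixels are, with objects taking the place of p and q.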

By making both the rows and the columns of the matrix G quantized flow directions as in equation (2), one can describe how the flows in each direction are distributed, but the magnitude of the flow cannot be reflected. Therefore, as in equation (4), by making either the rows or the columns the vector correlation value corr(p, q), the matrix can reflect the degree to which the flow vectors at the reference pixel p and at a surrounding pixel q differ. One way to obtain the vector correlation value is to compute the inner product v(p)·v(q) (equation (5)). As for quantizing the vector correlation value, the inner product, for example, takes values from 0 to 1, so if these are divided into steps of 0.1 as c^j (j = 1, ..., 10) and the matrix is defined as in equation (4), it can be written as a 10 × 10 matrix. Here f(p) denotes the vector correlation value at pixel p and takes one of the values c^j (j = 1, ..., 10).
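The correlation-based variant can be sketched as follows. Several points are assumptions of this sketch rather than statements of the source: the inner product is normalized by the vector lengths, negative values are clipped to 0 so that the result lies between 0 and 1, and rows are taken as the direction class at p with columns as the correlation bin, so the shape of the matrix here simply follows from the `n_dirs` and `n_bins` arguments.

```python
import math


def corr_bin(vp, vq, n_bins=10):
    """Quantize the normalized inner product of two flow vectors into 1..n_bins.

    Assumption: negative correlations are clipped to 0, so the value lies
    in [0, 1] and is split into n_bins equal steps (0.1 each for 10 bins).
    """
    norm = math.hypot(*vp) * math.hypot(*vq)
    if norm == 0.0:
        return 1  # no motion at p or q: treated as zero correlation
    corr = max(0.0, (vp[0] * vq[0] + vp[1] * vq[1]) / norm)
    return min(int(corr * n_bins), n_bins - 1) + 1


def direction_corr_matrix(pairs, n_dirs=8, n_bins=10):
    """pairs: iterable of (direction class at p, flow at p, flow at q)."""
    G = [[0] * n_bins for _ in range(n_dirs)]
    for cls, vp, vq in pairs:
        G[cls - 1][corr_bin(vp, vq, n_bins) - 1] += 1
    return G


# Made-up pairs: one neighbor moving with p, one moving at right angles to it.
G_corr = direction_corr_matrix([(1, (1, 0), (1, 0)), (1, (1, 0), (0, 1))])
```

Parallel flows land in the highest bin and perpendicular or opposing flows in the lowest, so the matrix records how strongly the neighborhood agrees with the motion at p.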

Furthermore, using the matrix G obtained as above, numerical values such as the uniformity of the motion vector distribution may be used as feature amounts. In this case, the feature amounts used in GLCM-based texture analysis can be applied as they are. Examples include equations (6) to (9).

Mean (equation (6))

Standard deviation (equation (7))

Entropy (equation (8))

Second moment (equation (9))
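As an illustration of such summary values, the sketch below computes textbook-style versions of the four quantities from a co-occurrence matrix. The exact forms of equations (6) to (9) are not assumed here: the normalization of G to probabilities, the use of the 1-based row index for the mean and standard deviation, and the natural logarithm in the entropy are all choices of this sketch.

```python
import math


def glcm_features(G):
    """Compute mean, standard deviation, entropy, and second moment of G.

    G is first normalized so that its entries sum to 1. The mean and standard
    deviation are taken over the 1-based row index, weighted by those entries.
    """
    total = sum(sum(row) for row in G)
    if total == 0:
        raise ValueError("co-occurrence matrix is empty")
    P = [[v / total for v in row] for row in G]
    mean = sum((i + 1) * p for i, row in enumerate(P) for p in row)
    var = sum(((i + 1) - mean) ** 2 * p for i, row in enumerate(P) for p in row)
    entropy = -sum(p * math.log(p) for row in P for p in row if p > 0)
    second_moment = sum(p * p for row in P for p in row)
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "entropy": entropy,
        "second_moment": second_moment,
    }


# Matrix with three equal counts in the second row and zeros elsewhere.
G_example = [[0] * 8 for _ in range(8)]
G_example[1][:3] = [1, 1, 1]
features = glcm_features(G_example)
```

A low entropy and a high second moment indicate that the counts are concentrated in a few direction pairs, that is, a uniform flow; scattered, disordered motion pushes both values the other way.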

Next, the learning/identification unit 4 performs learning or identification using the texture information output from the flow texture feature generation unit 3 as a feature vector (step S4). The learning/identification unit 4 includes the dictionary storage unit 41; during learning, it saves the dictionary data resulting from learning in the dictionary storage unit 41. During identification, it reads the dictionary data from the dictionary storage unit 41 and performs identification, determining the similarity or abnormality based on the degree of fit to the dictionary. Whether the device runs in learning mode or identification mode, and whether the dictionary data is updated in learning mode, is specified by the user from outside at run time. General machine learning algorithms can be used for learning and identification.

For example, one method expresses the distribution of feature vectors as a probability distribution model. The parameters of the model are estimated during learning, and during identification the similarity or abnormality is output as a probability based on how well the input fits the model. Another method clusters the distribution of feature vectors in the feature space and determines the similarity or abnormality based on, for example, whether the input belongs to one of the classes or to none of them. A further method uses an SVM classifier: model parameters that separate the two classes, abnormal and not abnormal, are learned during learning and are then used for identification.
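The first of these options, a probability distribution model, can be sketched with an independent Gaussian per feature dimension. This particular model, the function names, the `eps` regularizer, and the toy data are all assumptions made for illustration; the source leaves the choice of algorithm open.

```python
def fit_normal_model(samples, eps=1e-6):
    """Estimate a per-dimension mean and variance from normal-scene features.

    samples: list of equal-length feature vectors from abnormality-free video.
    eps keeps the variance away from zero for constant dimensions.
    The returned (mean, var) pair plays the role of the dictionary data.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(d)]
    var = [sum((s[k] - mean[k]) ** 2 for s in samples) / n + eps for k in range(d)]
    return mean, var


def anomaly_score(model, x):
    """Sum of squared z-scores: larger values mean a poorer fit to the model."""
    mean, var = model
    return sum((xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))


# Made-up two-dimensional features from normal scenes.
normal = [[1.0, 0.0], [1.2, 0.1], [0.8, -0.1], [1.0, 0.0]]
model = fit_normal_model(normal)
score_typical = anomaly_score(model, [1.0, 0.0])
score_unusual = anomaly_score(model, [5.0, 3.0])
```

Comparing the score with a threshold chosen on held-out normal data would give a binary decision, while the raw score can serve as a scalar degree of abnormality.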

Next, the determination output unit 5 outputs the identification result obtained by the learning/identification unit 4 during identification (step S5). Depending on which of the machine learning methods described above is used, the output is, for example, a binary value indicating whether the scene is abnormal or a scalar value indicating the similarity.

As described above, motion vectors are converted into feature amounts in a way that reflects both how individual moving objects move and changes in the flow of motion produced by multiple moving objects, and the similarity or abnormality of a moving image is determined using low-dimensional feature amounts suited to learning and identification. By converting the flow of motion of moving objects such as people and cars into feature amounts, the device learns not only how individual moving objects move but also their paths and the flow of motion formed by multiple moving objects, and determines the similarity or abnormality even when many moving objects are present in the image with varied motions.

To achieve this, the optical flow of the image and the motion vectors of the moving objects are regarded as a texture in space-time and converted into feature amounts. Specifically, the optical flow of the input image is obtained, the average motion vector of each moving object is obtained, and their directions are quantized to form texture features. The texture features obtained from training samples are used as input for learning to create a dictionary, and that dictionary is used to identify the texture features obtained from test samples. The feature information of the sampled feature points is thereby converted into feature vectors.

As a result, motion vectors arising from the motion of moving objects in an image are regarded as a spatio-temporal texture and expressed as low-dimensional feature amounts, which makes it possible to determine the similarity or abnormality of a video while reflecting not only the individual motion of each moving object but also the flow produced by multiple moving objects.

The image identification device of the embodiment described above may be implemented by a computer. In that case, a program for realizing its functions may be recorded on a computer-readable recording medium, and the program recorded on this medium may be read into a computer system and executed. The term "computer system" here includes an OS and hardware such as peripheral devices. A "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. A "computer-readable recording medium" may also include something that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or a communication line such as a telephone line, and something that holds the program for a certain period, such as volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized using hardware such as a PLD (Programmable Logic Device) or an FPGA (Field Programmable Gate Array).

An embodiment of the present invention has been described above with reference to the drawings, but the embodiment is merely an example of the present invention, and it is clear that the present invention is not limited to it. Components may therefore be added, omitted, replaced, or otherwise changed without departing from the technical idea and scope of the present invention.

The present invention can be applied to uses in which it is essential to detect abnormal situations in images by identifying input images.

1: video input unit, 2: flow information detection unit, 21: optical flow detection unit, 22: motion vector detection unit, 3: flow texture feature generation unit, 4: learning/identification unit, 41: dictionary storage unit, 5: determination output unit

Claims

An image identification device comprising:
change region extraction means for extracting an image change region from an input image sequence;
vector extraction means for extracting motion vectors based on the image change region;
flow texture feature amount generating means for generating, in a predetermined spatio-temporal region, a flow texture feature amount indicating a relationship between the directions of the motion vectors; and
identification means for identifying an image based on the flow texture feature amount and outputting information on the identification result.

The image identification device according to claim 1, wherein the flow texture feature amount generating means expresses the flow texture feature amount by a matrix whose rows and columns are values indicating quantized motion vector directions, or by a matrix whose rows and columns are motion vector directions and vector correlation values.

The image identification device according to claim 1, wherein the flow texture feature amount generating means expresses the flow texture feature amount by any one of:
a combination of a matrix representation of the frequency, relative to the motion vector direction of a moving object, of the motion vector directions of other moving objects located within a fixed distance centered on that moving object, and matrix representations of the frequency of the motion vector directions of other moving objects located within the fixed distance, the center position of which differs for each of a plurality of moving objects;
a combination of a matrix representation of the vector correlation values between the motion vector direction of a moving object and the motion vector directions of other moving objects located within a fixed distance centered on that moving object, and matrix representations of the motion vector directions and vector correlation values of other moving objects located within the fixed distance, the center position of which differs for each of a plurality of moving objects; or
a numerical value measuring the uniformity of the distribution of the motion vectors when the motion vectors are regarded as a texture.

An image identification method performed by an image identification device that identifies images from input images, the method comprising:
a change region extraction step of extracting an image change region from an input image sequence;
a vector extraction step of extracting motion vectors based on the image change region;
a flow texture feature amount generation step of generating, in a predetermined spatio-temporal region, a flow texture feature amount indicating a relationship between the directions of the motion vectors; and
an identification step of identifying an image based on the flow texture feature amount and outputting information on the identification result.

The image identification method according to claim 4, wherein, in the flow texture feature amount generation step, the flow texture feature amount is expressed by a matrix whose rows and columns are values indicating quantized motion vector directions, or by a matrix whose rows and columns are motion vector directions and vector correlation values.

The image identification method according to claim 4, wherein, in the flow texture feature amount generation step, the flow texture feature amount is expressed by any one of:
a combination of a matrix representation of the frequency, relative to the motion vector direction of a moving object, of the motion vector directions of other moving objects located within a fixed distance centered on that moving object, and matrix representations of the frequency of the motion vector directions of other moving objects located within the fixed distance, the center position of which differs for each of a plurality of moving objects;
a combination of a matrix representation of the vector correlation values between the motion vector direction of a moving object and the motion vector directions of other moving objects located within a fixed distance centered on that moving object, and matrix representations of the motion vector directions and vector correlation values of other moving objects located within the fixed distance, the center position of which differs for each of a plurality of moving objects; or
a numerical value measuring the uniformity of the distribution of the motion vectors when the motion vectors are regarded as a texture.

An image identification program for causing a computer to function as the image identification device according to claim 1.