JP4449483B2

JP4449483B2 - Image analysis apparatus, image analysis method, and computer program

Info

Publication number: JP4449483B2
Application number: JP2004039053A
Authority: JP
Inventors: アレハンドロハイメス; 和昌村井
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2004-02-16
Filing date: 2004-02-16
Publication date: 2010-04-14
Anticipated expiration: 2024-02-16
Also published as: JP2005228274A

Description

The present invention relates to an image analysis apparatus, an image analysis method, and a computer program that identify the motion of a subject based on image data. More specifically, it relates to an image analysis apparatus, an image analysis method, and a computer program that extract a feature portion, such as a human face region, from an image captured by, for example, a camera, and identify the motion of the subject by collating the extracted feature with rules composed of simple data such as position data of the feature portion.

Indexed video data can be created by, for example, setting indexes in recorded footage of a meeting at points in time such as a change of topic.

Possible methods for setting such indexes automatically include speech analysis based on audio data, and extraction of human faces or detection of motion based on image data. However, determining the time of a topic change by speech analysis requires enormous dictionary data and complicated analysis, and high-accuracy audio input is essential.

From this viewpoint, much recent development has focused on techniques that assign indexes efficiently based on image data rather than audio data, for example by identifying actions that occur in a meeting from image data. Assigning indexes to meeting video data makes later browsing and searching of the video more efficient. Non-Patent Document 1, for example, describes a camera arrangement for acquiring non-overlapping captured images in a meeting room, and a processing configuration for acquiring image data that is useful for image analysis and less affected by changes in brightness.

Prior art disclosing image-based motion identification includes, for example, the technique described in Patent Document 1. Patent Document 1 describes a method of analyzing the three-dimensional motion of a subject by photographing the subject from different directions with a plurality of cameras and analyzing the resulting images. Patent Document 2 describes a configuration that detects head movement and gaze direction from captured images of a person's head in order to detect the person's nodding with high accuracy.
Non-patent document: I. Mikic, K. Huang, and M. Trivedi, "Activity Monitoring and Summarization for an Intelligent Meeting Room," in Proc. IEEE Workshop on Human Motion, Austin, Texas, Dec. 2000.
Patent documents: JP 10-334270 A; JP 2000-163196 A

However, the image data analysis processes of the prior art described above all require analysis of captured image data by extremely complicated algorithms. They cannot be called efficient, and they suffer from problems such as the need to build a dedicated system, high cost, and a heavy processing load. For example, the process described in Patent Document 1 requires a procedure that calculates the direction and strength of motion for each pixel in a plurality of two-dimensional images captured from a plurality of angles, identifies regions in which the calculated strength of motion is at or above a predetermined value, associates the identified regions across the two-dimensional images, and calculates the three-dimensional position and three-dimensional motion of each identified region, followed by a procedure that estimates which part of the photographed subject the motion of each identified region corresponds to.

The technique described in Patent Document 2 extracts edge information from the face region of a captured image, estimates the position of the eyes in the face based on the edge information, generates a grayscale image of the estimated eye region, and analyzes the position of the pupil to calculate the gaze direction. It determines that a nod has occurred when the amount of vertical movement of the eye position is at or above a threshold. This approach requires many processing steps, such as generating an edge image from the acquired image, calculating the eye position, and generating the grayscale image.

An object of the present invention is to solve these problems and to provide an image analysis apparatus, an image analysis method, and a computer program that can analyze the motion of a subject efficiently through simple analysis of image data.

More specifically, the invention provides an image analysis apparatus, an image analysis method, and a computer program that extract a specific region, such as a human face region, from an image captured by a camera, and identify the motion of the subject by collating the region with rules composed of simple data such as position data of the specific region.

A first aspect of the present invention is an image analysis apparatus comprising: a feature extraction unit that extracts image features from input image data; a storage unit that stores a plurality of pieces of image mode definition information relating to image features; a definition information matching unit that selects, from the storage unit, the image mode definition information that matches the image features of each piece of input image data and sets the selected information as the definition information corresponding to that input image data; and an image identification processing unit that identifies the image mode of a subject including the image features, based on the definition information associated with the input image data by the definition information matching unit.

With this configuration, image features are extracted from the input image data and collated with a plurality of pieces of image mode definition information relating to image features in order to analyze the subject information in the image data. There is no need to perform matching on the entire image, so the posture and motion of the subject can be determined by simplified, efficient processing.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the image mode definition information is configured as description information of simple rules that define the image features.

With this configuration, new image mode definition information is easy to add, and image analysis can be performed with definition information set for a variety of subject postures and motions.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the input image data is a time-ordered sequence of images constituting a moving image.

With this configuration, analyzing the time-ordered sequence of images constituting a moving image makes it possible to analyze motions of the subject along the time axis, such as standing up or raising a hand.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the image analysis apparatus further comprises a motion detection unit that detects motion of the subject from the image frames of a time-ordered image sequence constituting a moving image. The feature extraction unit extracts, as image feature regions, the regions in which the motion detection unit has detected motion between frames, and the rule matching unit associates image frames containing image feature regions in which motion was detected with image mode definition information.

With this configuration, only regions in which motion is detected are extracted as feature regions, and the appearance of each feature region is collated with image mode definition information that defines various feature region modes in order to analyze the subject information in the image. The load of processing data other than the moving parts is therefore reduced, and the motion of the subject can be determined efficiently.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the image mode definition information is description information of simple rules that define image features, and is definition data that sets conditions on at least one of the following: position information of a specific region, aspect ratio information of a specific region, size information of a specific region, centroid position information of a specific region, and distance information between specific regions.

With this configuration, the subject information of an image frame is analyzed by collation with simple rules that specify position information, aspect ratio information, size information, and the like as image mode definition information, so the motion of the subject can be determined efficiently.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the image mode definition information is information composed of conditional expressions that define the mode of the image features.

With this configuration, the subject information of an image is analyzed by collating the image features with conditional expressions that specify position information, size information, and the like, so the motion of the subject can be determined efficiently.
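The idea of rules written as conditional expressions over region position, aspect ratio, size, and centroid can be sketched as follows. This is a minimal illustration, not the rule set of the patent: the `Region` class, the rule names (`hand_raised`, `face_high`, `face_low`), and all numeric thresholds are assumptions chosen only to show the matching step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Region:
    """Bounding box of one extracted feature region, in pixels (y grows downward)."""
    x: float
    y: float
    width: float
    height: float

    @property
    def aspect_ratio(self) -> float:
        return self.width / self.height

    @property
    def centroid(self) -> Tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


Rule = Callable[[List[Region]], bool]

# Hypothetical rules: each is one conditional expression over the regions.
# Region 0 is assumed to be the face, region 1 (if present) a hand.
RULES: Dict[str, Rule] = {
    "hand_raised": lambda rs: len(rs) >= 2 and rs[1].centroid[1] < rs[0].y,
    "face_high": lambda rs: len(rs) >= 1 and rs[0].centroid[1] < 100
    and 0.6 <= rs[0].aspect_ratio <= 1.2,
    "face_low": lambda rs: len(rs) >= 1 and rs[0].centroid[1] >= 100
    and 0.6 <= rs[0].aspect_ratio <= 1.2,
}


def match_rule(regions: List[Region], rules: Dict[str, Rule] = RULES) -> Optional[str]:
    """Return the name of the first rule whose condition the regions satisfy."""
    for name, condition in rules.items():
        if condition(regions):
            return name
    return None
```

Because each rule is only a comparison on a few numbers, adding a new posture amounts to adding one more entry to the table, which is the ease of extension the text describes.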

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the image identification processing unit determines the motion of the subject based on time-series sequence data of the image mode definition information that the rule matching unit has associated with a plurality of time-ordered image frames constituting a moving image.

With this configuration, motion is determined from the transition in the appearance of the specific regions across a plurality of consecutive image frames, so the motion of the subject can be analyzed accurately in moving images as well.
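Determining a motion from the time series of per-frame rule labels can be sketched as a pattern match over the label sequence. The action table below (`stand_up`, `sit_down`) and the label names are hypothetical; the patent text here does not specify how the sequence is matched, so merging repeated labels and skipping unmatched frames are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical action definitions: an action is an ordered pattern of rule labels.
ACTIONS: Dict[str, Tuple[str, ...]] = {
    "stand_up": ("face_low", "face_high"),
    "sit_down": ("face_high", "face_low"),
}


def collapse(labels: List[Optional[str]]) -> List[str]:
    """Drop frames with no matched rule (None) and merge consecutive repeats."""
    out: List[str] = []
    for label in labels:
        if label is not None and (not out or out[-1] != label):
            out.append(label)
    return out


def detect_actions(labels: List[Optional[str]],
                   actions: Dict[str, Tuple[str, ...]] = ACTIONS) -> List[str]:
    """Return the actions whose label pattern occurs in the sequence, in order."""
    seq = collapse(labels)
    found: List[str] = []
    for i in range(len(seq)):
        for name, pattern in actions.items():
            if tuple(seq[i:i + len(pattern)]) == pattern:
                found.append(name)
    return found
```

For example, the label sequence low, low, (none), high, high, low yields a stand-up followed by a sit-down.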

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the feature extraction unit extracts features from the input image data based on color discrimination processing of the image data.

With this configuration, features are extracted from the input image by color discrimination processing of the image data, so feature extraction with few errors is possible without using a system such as a dedicated face analysis device.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, portions estimated to be a person's skin regions are extracted by color discrimination of the input image data, and, based on information from a motion detection unit that detects motion of the subject from the image frames, regions in which motion between frames has been detected are extracted as feature regions. The rule matching unit selects the image mode definition information that matches the image mode of the skin regions serving as feature regions in the input image data, and sets the selected information as the image mode definition information corresponding to the image frame. The image identification processing unit identifies the posture or motion of the person of whom the feature regions are a part, based on the image mode definition information that the rule matching unit has associated with the image frame.

With this configuration, skin regions such as a person's face and hands are selectively extracted from the input image as specific regions, and the appearance of the skin regions in the image is collated with image mode definition information that defines various feature region modes in order to analyze the subject of the image. Motions such as raising a hand, standing up, and sitting down can therefore be determined simply and accurately.
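Extracting moving skin regions can be sketched as a per-pixel color test combined with a motion mask. The text does not give the color criterion, so the RGB thresholds in `is_skin` are an assumption (a widely used rule-of-thumb for skin under ordinary lighting), and `skin_motion_mask` is a hypothetical helper name.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def is_skin(r: int, g: int, b: int) -> bool:
    """Assumed RGB heuristic for skin color; not taken from the patent."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_motion_mask(frame: List[List[Pixel]],
                     motion_mask: List[List[int]]) -> List[List[int]]:
    """Mark pixels that are both skin-colored and inside the motion mask (1 = moving)."""
    height, width = len(frame), len(frame[0])
    return [
        [1 if motion_mask[y][x] and is_skin(*frame[y][x]) else 0 for x in range(width)]
        for y in range(height)
    ]
```

Connected groups of marked pixels would then be turned into bounding boxes and passed to rule matching; that grouping step is omitted here.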

Furthermore, a second aspect of the present invention is an image analysis apparatus comprising: a feature extraction unit that extracts features from input image data showing a subject; a storage unit that stores definition information defining modes of image features; and a collation identification unit that collates the input image data with the definition information and identifies the subject based on the definition information.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the feature extraction unit extracts, as the feature, a feature element that indicates a feature of at least a part of the subject.

With this configuration, the subject is identified based on features of at least a part of the subject, and the posture and motion of the subject are determined reliably.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the mode of the image features includes a feature element that indicates a feature of at least a part of the subject.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the feature element indicates a region on the image.

With this configuration, the subject is identified based on regions on the image, such as a person's face or hands, and the posture and motion of the subject are determined reliably.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the definition information is configured as description information of simple rules that define the image features.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the image analysis apparatus further comprises a motion detection unit that detects motion of the subject from the image frames of a time-ordered image sequence constituting a moving image. The feature extraction unit extracts, as feature regions, the regions in which the motion detection unit has detected motion between frames, and the collation identification unit associates image frames containing feature regions in which motion was detected with image mode definition information.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the image mode definition information is description information of simple rules that define the features, and is definition data that sets conditions on at least one of the following: position information of a specific region, aspect ratio information of a specific region, size information of a specific region, centroid position information of a specific region, and distance information between specific regions.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the image mode definition information is information composed of conditional expressions that define the mode of the features.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the collation identification unit determines the motion of the subject based on time-series sequence data of the image mode definition information associated with a plurality of time-ordered image frames constituting a moving image.

Furthermore, a third aspect of the present invention is an image analysis apparatus comprising: a feature extraction unit that extracts, from input image data showing a subject, feature elements that indicate features of the subject; a storage unit that stores definition information defining conditions on the mode of the feature elements that indicate features of the subject on the image; and a collation identification unit that collates the input image data with the definition information and identifies the subject based on the conditions of the definition information.

Furthermore, in one embodiment of the image analysis apparatus of the present invention, the definition information indicates at least one of the following conditions: the size of a feature element, the relationship between feature elements, and the position of a feature element. When the conditions are satisfied, the collation identification unit identifies the input image data as showing the subject described by the content of that definition.

Further aspects of the present invention are an image analysis method and a computer program that execute processing corresponding to the image analysis apparatus described above.

With this configuration, image features are extracted from the input image data and collated with a plurality of pieces of image mode definition information relating to image features in order to analyze the subject information in the image data. A method and a computer program are thus provided that need no matching on the entire image and can determine the posture and motion of the subject by simplified, efficient processing.

The computer program of the present invention is a computer program that can be provided to a computer system capable of executing various program codes by means of a storage medium or communication medium that supplies it in computer-readable form, for example a recording medium such as a CD, FD, or MO, or a communication medium such as a network. Providing the program in computer-readable form allows processing corresponding to the program to be realized on the computer system.

Other objects, features, and advantages of the present invention will become apparent from the more detailed description based on the embodiments described below and the accompanying drawings. In this specification, a system is a logical collection of a plurality of devices, and the constituent devices are not necessarily housed in the same casing.

According to the configuration of the present invention, specific image features such as a person's face or hands are extracted from the input image data, and the mode of those image features is collated with image mode definition information consisting of rules, written for example as conditional expressions, that define various image feature modes. The rule that fits each image is selected and associated with the image data, and the posture or motion of the subject of which the feature regions are a part, for example a person with a face and hands, is determined based on the associated rule or sequence of rules. Accurate subject identification, such as determining a person's posture or motion, is therefore possible through an efficient collation process that only checks whether the image data satisfies description data composed of conditional expressions and the like.

In the configuration of the present invention, the image mode definition information is set as definition data, for example conditional expressions, that specifies conditions on image features such as the position, aspect ratio, size, and centroid position of a specific region and the distance between specific regions. Rules can be associated with images by determining whether the image mode satisfies this definition data, which allows faster processing than conventional methods that match images against each other. In addition, because the image mode definition information is composed of description data such as conditional expressions, new rules are easy to create, and rules for various motions can be added and applied to motion determination efficiently.

The image analysis apparatus, image analysis method, and computer program of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of the image analysis apparatus of the present invention. As shown in FIG. 1, the image analysis apparatus receives images captured by, for example, cameras 101 to 10n, and performs motion detection based on the captured images. The image data to be processed is not limited to images input directly from a camera; it may be image data stored in advance in a storage unit.

The image analysis apparatus of the present invention has a sampling unit 121 that receives the image data to be processed, for example images captured by cameras 101 to 10n, a motion detection unit 122, a feature extraction unit 123, a definition information (rule) matching unit 124, an image identification processing unit 125, a definition information (rule) storage unit 126, and an action definition information storage unit 127. The definition information (rule) matching unit 124 and the image identification processing unit 125 may be combined into a single collation identification unit.

The sampling unit 121 receives the image data to be analyzed, either as images captured by cameras 101 to 10n or from a data storage unit (not shown). The image data to be processed is, for example, moving image data consisting of consecutive frame images. Cameras 101 to 10n are, for example, cameras fixed in a meeting room, each photographing the meeting participants from a different direction.

The sampling unit selects, from the input images, the images to be analyzed by the downstream motion detection unit 103 and feature extraction unit 104. No image analysis is performed at this stage; the data is simply extracted at equal intervals, for example by selecting one frame out of every 20.

Specifically, as shown in FIG. 1, if the image stream data input from camera 101 is denoted [S1], n1 frames are extracted from image stream [S1] every δ seconds. Likewise, n2 frames are extracted every δ seconds from the image stream data [S2] input from camera 102. The same sampling process is executed for all streams S1 to Sn.
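The two sampling schemes mentioned above (one frame out of every 20, or n frames per δ-second window) can be sketched as follows. How the n frames are chosen inside each window is not stated in the text; taking them at an even stride is an assumption, and the function names and the `fps` parameter are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_every(frames: Sequence[T], step: int = 20) -> List[T]:
    """Keep one frame out of every `step` frames, without inspecting the images."""
    return list(frames[::step])


def sample_per_interval(frames: Sequence[T], fps: float,
                        delta_sec: float, n: int) -> List[T]:
    """Keep up to n evenly spaced frames from each window of delta_sec seconds."""
    window = int(round(fps * delta_sec))
    if window <= 0 or n <= 0:
        raise ValueError("window length and n must be positive")
    picked: List[T] = []
    for start in range(0, len(frames), window):
        chunk = frames[start:start + window]
        stride = max(1, len(chunk) // n)  # assumed: even spacing within the window
        picked.extend(chunk[::stride][:n])
    return picked
```

Each camera stream would be sampled independently with its own n, matching the per-stream n1, n2, ... in the description.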

The sample frame image data extracted by the sampling unit 102 is input to the motion detection unit 103 and the feature extraction unit 104.

The processing of the motion detection unit 103 is described next. Based on the sample frame image data extracted by the sampling unit 102, the motion detection unit 103 executes a motion extraction process that identifies the subject regions in motion, for example by extracting differences between frames.

The motion detection unit 103 performs motion detection separately for each stream $S_i$. First, at time [t], the frame average [$S_{i}\mathrm{avg}_{t}$] of each stream is calculated from the sample frames acquired up to time [t] according to the following equation (Equation 1).

$$S_{i}\mathrm{avg}_{t} = \frac{f_{1} + f_{2} + \cdots + f_{t-1}}{t-1} \qquad \text{(Equation 1)}$$

In the above equation, $f_i + f_j$ denotes the pixel-wise sum $f_i(x, y) + f_j(x, y)$ of the corresponding pixels of the sample frames.
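The frame average of Equation 1 is a pixel-wise mean over the earlier sample frames. A minimal sketch in plain Python, assuming each frame is a grayscale image stored as a list of rows (the representation and the name `frame_average` are illustrative choices):

```python
from typing import List

Image = List[List[float]]


def frame_average(frames: List[Image]) -> Image:
    """Pixel-wise mean of frames f_1 .. f_(t-1), as in Equation 1."""
    if not frames:
        raise ValueError("at least one earlier frame is required")
    height, width = len(frames[0]), len(frames[0][0])
    total = [[0.0] * width for _ in range(height)]
    for frame in frames:
        for y in range(height):
            for x in range(width):
                total[y][x] += frame[y][x]  # f_i(x, y) + f_j(x, y) + ...
    count = len(frames)  # this is t - 1
    return [[value / count for value in row] for row in total]
```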

次に、時間［ｔ］におけるサンプルフレームデータ［ｆ_ｔ］と、上記式（式１）に基づいて算出したフレーム平均［Ｓ_ｉａｖｇ_ｔ］の差分［ｄｓ_ｉｔ］を下式（式２）によって算出する。
ｄｓ_ｉｔ＝Ｓ_ｉａｖｇ_ｔ−ｆ_ｔ・・・（式２）
Next, the difference [ds_it] between the sample frame data [f_t] at time [t] and the frame average [S_i avg_t] calculated by Equation 1 is calculated by Equation 2 below.
$ds_{it} = S_{i}\,\mathrm{avg}_{t} - f_{t}$ (Equation 2)

上記式は、時間ｔにおける現在フレーム［ｆ_ｔ］と、上記式（式１）に基づいて算出したフレーム平均［Ｓ_ｉａｖｇ_ｔ］の画像の対応画素の差分データを算出する式である。各画素についての差分データを上記式（式２）に基づいて算出する。 The above expression is an expression for calculating difference data between corresponding pixels of the image of the current frame [f _t ] at time t and the frame average [S _i avg _t ] calculated based on the above expression (Expression 1). Difference data for each pixel is calculated based on the above formula (Formula 2).
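Equations 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: frames are taken to be equally sized grayscale images stored as nested lists, and the function names `frame_average` and `frame_difference` are hypothetical.

```python
def frame_average(previous_frames):
    """Equation 1: pixel-wise mean of the sample frames f_1 .. f_{t-1}."""
    count = len(previous_frames)
    rows, cols = len(previous_frames[0]), len(previous_frames[0][0])
    return [
        [sum(frame[y][x] for frame in previous_frames) / count for x in range(cols)]
        for y in range(rows)
    ]


def frame_difference(average, current):
    """Equation 2: ds_it = S_i avg_t - f_t, evaluated for each pixel."""
    return [
        [avg_value - cur_value for avg_value, cur_value in zip(avg_row, cur_row)]
        for avg_row, cur_row in zip(average, current)
    ]


history = [[[0, 0], [0, 0]], [[2, 2], [2, 2]]]   # f_1 and f_2
current = [[4, 4], [1, 1]]                       # f_t with t = 3
average = frame_average(history)
difference = frame_difference(average, current)
```

A production system would more likely keep a running sum with an array library instead of re-summing all earlier frames at every step; the nested-list form is used here only to stay dependency-free.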

次に、差分データに基づくエッジ抽出を実行し、変化領域マスクを生成する。エッジ抽出は、上記式（式２）に基づいて算出された差分データ［ｄｓ_ｉｔ］に対して、例えばＧａｕｓｓｉａｎｓｍｏｏｔｈｉｎｇＦｉｌｔｅｒを適用したフィルタリング処理によって実行され、この結果データに対する閾値判定により、動きのあった領域を判別するための変化領域マスクを生成する。 Next, edge extraction based on the difference data is executed to generate a change area mask. Edge extraction is performed by filtering the difference data [ds_it] calculated by Equation 2, for example with a Gaussian smoothing filter. A threshold is then applied to the filtered result to generate a change area mask that identifies the regions where motion occurred.

この変化領域マスクにおいてビット［１］の設定領域が、フレーム１〜ｔ−１の平均値と、現時点（ｔ）のフレームとの間に明らかな変化の発生した画素領域である。ビット［０］の設定領域は変化のない、すなわち動きのない領域として判別される。 In this change area mask, the set area of bit [1] is a pixel area in which a clear change has occurred between the average value of frames 1 to t-1 and the current frame (t). The set area of bit [0] is determined as an area that does not change, that is, an area that does not move.

このようにして得られた変化領域マスクとしてのバイナリイメージに対して、ホール除去などのためのＤｉｌａｔｉｏｎ処理を施して、バイナリマスクを生成する。バイナリマスクにおいては、画素値＝黒が変化領域を示す。 The binary image as the change area mask obtained in this way is subjected to Dilation processing for hole removal or the like to generate a binary mask. In the binary mask, the pixel value = black indicates the change area.
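The smoothing, thresholding, and dilation steps can be sketched as below. The 3x3 kernel, the threshold value, and the function names are assumptions made for illustration; the source does not specify filter sizes or thresholds.

```python
def smooth3x3(image):
    """Smooth with a 3x3 Gaussian-like kernel (1 2 1 / 2 4 2 / 1 2 1) / 16."""
    kernel = ((1, 2, 1), (2, 4, 2), (1, 2, 1))
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates so border pixels reuse their nearest neighbour.
                    yy = min(max(y + dy, 0), rows - 1)
                    xx = min(max(x + dx, 0), cols - 1)
                    total += kernel[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = total / 16.0
    return out


def threshold_mask(image, threshold):
    """Bit 1 where the magnitude of the smoothed difference exceeds the threshold."""
    return [[1 if abs(value) > threshold else 0 for value in row] for row in image]


def dilate(mask):
    """3x3 binary dilation, which closes small holes in the change mask."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < rows and 0 <= xx < cols and mask[yy][xx]:
                        out[y][x] = 1
    return out


def change_mask(difference, threshold):
    """Difference image -> smoothed -> thresholded -> dilated binary mask."""
    return dilate(threshold_mask(smooth3x3(difference), threshold))


spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 16.0
mask = change_mask(spike, threshold=1.5)
```

In this toy input a single strong difference at the center spreads into a plus-shaped region after smoothing and thresholding, and dilation then fills everything except the four corners.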

上述の動き検出部１０３の処理を模式的に説明した図が図２である。例えば図２（ａ）に示すのが、フレーム１〜ｔ−１の画像ｆ_１〜ｆ_ｔ−１であり、現時点（ｔ）の画像を図２（ｂ）に示す画像ｆ_ｔとしたとき、まず、図２（ａ）に示す画像ｆ_１〜ｆ_ｔ−１に基づいて、上記式（式１）に基づいてフレーム平均［Ｓ_ｉａｖｇ_ｔ］の画像を算出する。 FIG. 2 schematically illustrates the processing of the motion detection unit 103 described above. For example, FIG. 2(a) shows the images f_1 to f_{t-1} of frames 1 to t-1, and FIG. 2(b) shows the image f_t at the current time (t). First, the frame average image [S_i avg_t] is calculated from the images f_1 to f_{t-1} of FIG. 2(a) according to Equation 1.

次に、フレーム平均画像と、［Ｓ_ｉａｖｇ_ｔ］と、図２（ｂ）に示す画像ｆ_ｔとに基づいて、上記式（式２）に基づいて差分データ［ｄｓ_ｉｔ］を算出し、算出された差分データ［ｄｓ_ｉｔ］に対して、例えばＧａｕｓｓｉａｎｓｍｏｏｔｈｉｎｇＦｉｌｔｅｒを適用したフィルタリング処理によってエッジ抽出を行い、閾値判定により、動きのあった領域を判別するための変化領域マスクを生成し、さらに、ホール除去などのためのＤｉｌａｔｉｏｎ処理を施すことにより、図２（ｃ）に示すような、黒い部分が変化領域を示すバイナリマスクが生成される。 Next, the difference data [ds_it] is calculated by Equation 2 from the frame average image [S_i avg_t] and the image f_t shown in FIG. 2(b). Edge extraction is then performed on the difference data [ds_it] by filtering, for example with a Gaussian smoothing filter, and a change area mask for identifying the regions where motion occurred is generated by thresholding. Finally, dilation is applied for hole removal and similar purposes, producing a binary mask, as shown in FIG. 2(c), in which the black portions indicate the changed regions.

バイナリマスクは、図に示すようにｎ×ｍにブロック分割され、各ブロックに対するスコアが設定される。スコアは、各ブロックに含まれる変化画素の積算値であり、この動き検出結果は、特徴抽出部１２３、定義情報（ルール）照合部１２４に入力され、特徴抽出部１２３での特定領域検出処理、および定義情報（ルール）照合部１２４でのルールとのマッチング処理に適用される。 As shown in the figure, the binary mask is divided into n × m blocks, and a score is set for each block. The score is the total number of changed pixels contained in the block. This motion detection result is input to the feature extraction unit 123 and the definition information (rule) matching unit 124, where it is used for the specific-region detection in the feature extraction unit 123 and for the rule matching in the definition information (rule) matching unit 124.
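The per-block scoring can be sketched as follows. The function name `block_scores` and the way pixels are assigned to blocks when the image size is not a multiple of the block count are assumptions for illustration.

```python
def block_scores(mask, n_rows, n_cols):
    """Split a binary mask into n_rows x n_cols blocks and count changed pixels per block."""
    rows, cols = len(mask), len(mask[0])
    scores = [[0] * n_cols for _ in range(n_rows)]
    for y in range(rows):
        for x in range(cols):
            if mask[y][x]:
                # Integer scaling maps each pixel coordinate to its block index.
                scores[y * n_rows // rows][x * n_cols // cols] += 1
    return scores


example_mask = [
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
scores = block_scores(example_mask, 2, 2)
```

Here the upper-left block receives a score of 2 and the lower-right block a score of 1, so downstream units can test for motion at block granularity instead of per pixel.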

次に、特徴抽出部１２３の処理について説明する。特徴抽出部１２３は、画像フレームから画像特徴、例えば特徴を持つ特定領域の抽出を実行する。例えば目的とする被写体の姿勢または動作識別が人の姿勢または動作の識別、例えば人が立ち上がった姿勢または動作、着席姿勢または動作、挙手の姿勢または動作などの識別を目的とした場合には、人の顔や手のスキン（皮膚）領域を画像特徴を持つ領域として抽出する。なお、何を画像特徴として抽出するかは、目的に応じて設定することが可能である。ここでは、一例として、人物の姿勢や動作を検出することを目的とした例として、人の顔や手のスキン領域を画像特徴に対応する特定領域として抽出する処理例について説明する。 Next, the processing of the feature extraction unit 123 will be described. The feature extraction unit 123 extracts image features from an image frame, for example a specific region that has a given feature. When the goal is to identify a person's posture or motion, such as standing up, sitting down, or raising a hand, the skin regions of the person's face and hands are extracted as regions having the image feature. What is extracted as an image feature can be chosen according to the purpose. Here, as one example aimed at detecting a person's posture or motion, a process that extracts the skin regions of the face and hands as the specific regions corresponding to the image feature is described.

本実施例においては、特徴抽出部１２３での特徴抽出には、カラーフィルタリングを適用する。なお、例えば人の顔の抽出処理は、このようなカラーフィルタリング処理に限らず、特別な顔検出装置、例えば文献［K. Murai and S. Nakamura "Real Time Face Detection for Multimodal Speech Recognition", in proceedings of ICME 2002, Vol.2, pp.373-376, 2002］に記載の顔検出専用のシステムなどを適用することも可能である。しかし、多くの顔検出システムは、正面向きでない顔の検出の精度が低下するなどの問題がある。 In this embodiment, color filtering is applied to feature extraction by the feature extraction unit 123. For example, human face extraction processing is not limited to such color filtering processing, but a special face detection device such as a document [K. Murai and S. Nakamura "Real Time Face Detection for Multimodal Speech Recognition", in proceedings. of ICME 2002, Vol.2, pp.373-376, 2002] can be applied. However, many face detection systems have problems such as a reduction in the accuracy of detection of faces that are not front-facing.

一方、単純なカラー検出を基本とするカラーベースのスキン検出は、顔の方向や、光環境の変化があった場合にも比較的正確な検出が可能であり、検出エラーが少なくなるという利点がある。本実施例で実行するスキン検出アルゴリズムは、例えば下記の文献
［A. Jaimes. Conceptual Structures and Computational Methods for Indexing and Organization of Visual Information, Ph.D. Thesis, Department of Electrical Engineering, Columbia University, February 2003］
に記述されたアルゴリズムをベースとしている。
On the other hand, color-based skin detection, which relies on simple color detection, has the advantage that it remains relatively accurate even when the face direction or the lighting environment changes, so detection errors are reduced. The skin detection algorithm executed in this embodiment is based on, for example, the algorithm described in the following document:
[A. Jaimes. Conceptual Structures and Computational Methods for Indexing and Organization of Visual Information, Ph.D. Thesis, Department of Electrical Engineering, Columbia University, February 2003]

特徴抽出部１２３での画像特徴抽出、すなわちスキン領域抽出アルゴリズムについて説明する。まず、サンプリング部から入力する各フレーム［ｆ_ｉ］の画素値データをＨＳＶカラー空間座標へ展開する。ＨＳＶカラー空間座標は、色相（Ｈ）、彩度（Ｓ）、輝度（Ｖ）の３次元座標である。このＨＳＶ空間において、特定の領域がスキン（皮膚）のカラー領域に対応する。 An image feature extraction by the feature extraction unit 123, that is, a skin region extraction algorithm will be described. First, the pixel value data of each frame [f _i ] input from the sampling unit is developed into HSV color space coordinates. The HSV color space coordinates are three-dimensional coordinates of hue (H), saturation (S), and luminance (V). In the HSV space, a specific area corresponds to a skin color area.

各フレーム［ｆ_ｉ］の画素値データ中、ＨＳＶカラー空間座標におけるスキン（皮膚）のカラー領域に対応する画素をスキン画像領域として判定し、ＨＳＶカラー空間座標におけるスキン（皮膚）のカラー領域以外に属する画素データは、スキン領域以外であると判定する。 Among the pixel value data of each frame [f _i ], a pixel corresponding to the color area of the skin (skin) in the HSV color space coordinates is determined as a skin image area, and other than the color area of the skin (skin) in the HSV color space coordinates. The pixel data to which it belongs is determined to be outside the skin area.
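The HSV-based color filtering can be sketched as below. The numeric skin range is an illustrative assumption, not a value from the source or from the cited thesis, and the names `is_skin` and `skin_mask` are hypothetical.

```python
import colorsys

# Hypothetical skin range in HSV space (all components scaled to 0..1).
SKIN_HUE_MAX = 50 / 360.0
SKIN_SATURATION_RANGE = (0.15, 0.70)
SKIN_VALUE_MIN = 0.35


def is_skin(red, green, blue):
    """Return True if an 8-bit RGB pixel falls inside the assumed skin region of HSV space."""
    hue, saturation, value = colorsys.rgb_to_hsv(red / 255.0, green / 255.0, blue / 255.0)
    return (
        hue <= SKIN_HUE_MAX
        and SKIN_SATURATION_RANGE[0] <= saturation <= SKIN_SATURATION_RANGE[1]
        and value >= SKIN_VALUE_MIN
    )


def skin_mask(frame):
    """Map a frame of (R, G, B) tuples to a binary mask: 1 for skin-colored pixels."""
    return [[1 if is_skin(*pixel) else 0 for pixel in row] for row in frame]


frame = [
    [(224, 172, 105), (0, 0, 255)],
    [(255, 255, 255), (224, 172, 105)],
]
mask_of_skin = skin_mask(frame)
```

A fixed box in HSV space is the simplest possible classifier; it will also accept skin-colored walls or tables, which is why the motion check described next is needed.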

上述のカラーフィルタリングによるスキン領域の抽出は、サンプリング部１２１から入力する各フレーム［ｆ_ｉ］各々に対して実行される。ただし、このカラーフィルタリング処理によっても、人のスキン（皮膚）と類似する画素値を持つ例えば壁やテーブルなどスキン（皮膚）領域以外の領域がスキン（皮膚）領域と判断される場合がある。 The above-described extraction of the skin region by color filtering is executed for each frame [f _i ] input from the sampling unit 121. However, even with this color filtering process, a region other than a skin (skin) region such as a wall or a table having a pixel value similar to a human skin (skin) may be determined as a skin (skin) region.

そこで、特徴抽出部１２３では、さらに、以下の処理を実行する。まず、上述のカラーフィルタリングによって抽出されたスキン領域をグルーピングする。スキン領域として判定された隣接する画素の集合をグループとして設定し、その境界を設定した境界領域としてのバウンディングボックスＳ_ｂｂを検出する。なお、一定の大きさより小さい領域は排除する。 Therefore, the feature extraction unit 123 further executes the following processing. First, the skin regions extracted by the color filtering described above are grouped: each set of adjacent pixels determined to be skin is treated as one group, and a bounding box S_bb is detected as the region enclosing that group. Regions smaller than a certain size are discarded.

次に、バウンディングボックスＳ_ｂｂと同一の中心を設定したバウンディングボックスＳ_ｂｂの近接領域を含むアクティブバウンディングボックスＡ_ｂｂを設定し、このアクティブバウンディングボックスＡ_ｂｂ内においてフレーム間で動きが検出されているか否かを判定する。 Next, an active bounding box A_bb is set that has the same center as the bounding box S_bb and also includes the region adjacent to S_bb, and it is determined whether motion between frames is detected within this active bounding box A_bb.

動きの有無は、現フレームと過去に数フレーム遡ったフレーム間の差分情報に基づいて判定可能である。また、動き検出部１０３の処理結果に基づいて得られる、先に図２を参照して説明したバイナリマスクのブロック単位の情報、すなわち動きに応じたスコア情報を用いてもよい。動きのない領域はスキン領域から排除する。この処理によって動きのある領域が特定領域、すなわちスキン領域であると判定する。 The presence / absence of motion can be determined based on difference information between the current frame and a frame that is several frames back in the past. Further, information in units of blocks of the binary mask described above with reference to FIG. 2, which is obtained based on the processing result of the motion detection unit 103, that is, score information corresponding to motion may be used. The non-moving area is excluded from the skin area. By this processing, it is determined that the region with movement is a specific region, that is, a skin region.

図３は、特徴抽出部１２３において実行する画像特徴抽出処理としての特定領域（スキン領域）抽出処理シーケンスをフローチャートとして示した図である。 FIG. 3 is a flowchart showing a specific area (skin area) extraction process sequence as an image feature extraction process executed by the feature extraction unit 123.

まず、ステップＳ１０１において、サンプリング部１２１から入力する各フレームの画像データをＨＳＶカラー空間に展開する。次にステップＳ１０２において、スキン領域として判定された部分領域をグループ化してバウンディングボックスＳ_ｂｂを設定する。 First, in step S101, the image data of each frame input from the sampling unit 121 is expanded into the HSV color space. Next, in step S102, the partial regions determined to be skin regions are grouped and a bounding box S_bb is set.

ステップＳ１０３において、予め設定した閾値より小さな領域を排除する。次に、ステップＳ１０４において、バウンディングボックスＳ_ｂｂの近隣領域を含むアクティブバウンディングボックスＡ_ｂｂを設定する。 In step S103, regions smaller than a preset threshold are excluded. Next, in step S104, an active bounding box A_bb that includes the neighboring region of the bounding box S_bb is set.

次にステップＳ１０５において、アクティブバウンディングボックスＡ_ｂｂにおいて動きが検出されたか否かを判定し、動きが検出された領域を、最終的にスキン領域、すなわち特定領域として抽出する。 Next, in step S105, it is determined whether motion is detected in the active bounding box A_bb, and the regions in which motion is detected are finally extracted as skin regions, that is, as the specific regions.

以上の処理によって、特徴抽出部１２３は、サンプリング部１２１から入力する各フレームの画像から、カラー判別および動き判別に基づく画像特徴の抽出、すなわち特定領域（スキン領域）を抽出する。この抽出結果は、定義情報（ルール）照合部１２４に入力される。 Through the above processing, the feature extraction unit 123 extracts image features based on color discrimination and motion discrimination, that is, extracts a specific region (skin region) from each frame image input from the sampling unit 121. This extraction result is input to the definition information (rule) matching unit 124.
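Steps S102 to S105 can be sketched as follows, given a binary skin mask and a binary motion mask of the same size. The 4-connectivity, the `margin` and `min_area` parameters, and all function names are assumptions for illustration.

```python
from collections import deque


def bounding_boxes(mask, min_area=4):
    """Group 4-connected skin pixels; return (top, left, bottom, right) boxes S_bb."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x] or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right, area = y, x, y, x, 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:  # S103: drop regions below the size threshold
                boxes.append((top, left, bottom, right))
    return boxes


def active_box(box, margin, rows, cols):
    """S104: grow S_bb by `margin` on every side (clipped at the image border)."""
    top, left, bottom, right = box
    return (max(top - margin, 0), max(left - margin, 0),
            min(bottom + margin, rows - 1), min(right + margin, cols - 1))


def moving_skin_regions(skin, motion, margin=1, min_area=4):
    """S105: keep only the boxes whose active box A_bb contains motion."""
    rows, cols = len(skin), len(skin[0])
    kept = []
    for box in bounding_boxes(skin, min_area):
        top, left, bottom, right = active_box(box, margin, rows, cols)
        if any(motion[y][x] for y in range(top, bottom + 1) for x in range(left, right + 1)):
            kept.append(box)
    return kept


skin = [[0] * 8 for _ in range(6)]
for y, x in ((1, 1), (1, 2), (2, 1), (2, 2), (3, 5), (3, 6), (4, 5), (4, 6), (0, 7)):
    skin[y][x] = 1
motion = [[0] * 8 for _ in range(6)]
motion[0][0] = 1
regions = moving_skin_regions(skin, motion)
```

In this example the isolated pixel is removed by the size threshold, the static 2x2 patch is removed by the motion test, and only the patch next to the moving pixel survives.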

次に、定義情報（ルール）照合部１２４において実行する処理の詳細について説明する。本発明の画像解析装置では、画像特徴としての特定領域についての位置情報などのレイアウト情報を記述した条件式などの画像態様定義情報をドメイン知識として設定し、この画像態様定義情報に基づいて、画像データから被写体の姿勢または動作などの識別、すなわちアクション判定を実行する。動作識別対象となる被写体は特定領域を構成要素とする被写体、例えばスキン領域を構成要素とする人物などである。 Next, details of processing executed in the definition information (rule) matching unit 124 will be described. In the image analysis apparatus of the present invention, image mode definition information such as a conditional expression describing layout information such as position information about a specific region as an image feature is set as domain knowledge, and an image is generated based on the image mode definition information. Identification of the posture or motion of the subject from the data, that is, action determination is executed. A subject to be an action identification target is a subject having a specific area as a constituent element, for example, a person having a skin area as a constituent element.

ドメイン知識としての画像態様定義情報を設定する上で、我々が観察した事項は、例えば以下に示す事項である。
＊人物と、非人物領域との判別：人物によって占有されない非人物領域は、アクティビティが低い領域となる。このような領域のアクティビティは主にノイズや光環境の変化に基づく。
＊視覚的構成：撮像対象となる例えばミーティングルームの物理的構成は、撮像画像すべてに渡り不変であり、例えば天井などは人物に占有される領域とはならず、アクティビティの低い状態が継続する。
＊人物の構成：ミーティングにおけるアクションは人物によって発生する。また、人物は特有の物理的構成を持つ。
In setting the image mode definition information as domain knowledge, we made the following observations, among others.
* Discrimination between person and non-person regions: a non-person region, which is not occupied by a person, is a low-activity region. Activity in such regions is mainly caused by noise and changes in the lighting environment.
* Visual configuration: the physical configuration of the imaged space, for example a meeting room, is unchanged across all captured images. The ceiling, for example, is never occupied by a person and remains in a low-activity state.
* Person structure: actions in a meeting are produced by people, and a person has a characteristic physical structure.

このような観察事項に基づいて、画像特徴を示す特定領域についての位置情報などを含むルールを生成する。すなわち、画像フレームにおける特定領域の表示態様情報としてのレイアウト情報を記述したルール、例えば条件式を生成する。このルールを画像態様定義情報として定義情報（ルール）蓄積部１２６に格納する。定義情報（ルール）照合部１２４では、定義情報（ルール）蓄積部１２６に格納された画像態様定義情報をテンプレートとして、各サンプル画像フレーム［ｆ_ｉ］がどのルール（テンプレート）に対応するかの対応付けを行なう。 Based on these observations, rules are generated that include position information and the like for the specific regions indicating image features. That is, a rule, for example a conditional expression, is generated that describes layout information as the display mode of the specific regions in an image frame. This rule is stored in the definition information (rule) storage unit 126 as image mode definition information. Using the image mode definition information stored in the definition information (rule) storage unit 126 as templates, the definition information (rule) matching unit 124 associates each sample image frame [f_i] with the rule (template) to which it corresponds.

なお、定義情報（ルール）照合部１２４において、定義情報（ルール）蓄積部１２６に格納された画像態様定義情報との照合を実行する画像フレームは、サンプリング部１２１において抽出されたサンプル画像フレーム［ｆ_ｉ］のすべてではなく、動き検出部１２２と、特徴抽出部１２３との処理によってスキン領域と認められた特定領域を含む選別された画像フレームのみを対象とすることができる。 Note that the image frames that the definition information (rule) matching unit 124 collates against the image mode definition information stored in the definition information (rule) storage unit 126 need not be all of the sample image frames [f_i] extracted by the sampling unit 121. The collation can be limited to the selected image frames that contain a specific region recognized as a skin region through the processing of the motion detection unit 122 and the feature extraction unit 123.

定義情報（ルール）照合部１２４では、スキン領域と認められた特定領域を含む画像フレームと、定義情報（ルール）蓄積部１２６に格納された画像態様定義情報との照合を実行して、画像フレームの各々について、それぞれ画像態様定義情報を対応付ける。 The definition information (rule) matching unit 124 performs matching between an image frame including a specific region recognized as a skin region and the image mode definition information stored in the definition information (rule) storage unit 126 to obtain an image frame. Each of these is associated with image mode definition information.

定義情報（ルール）蓄積部１２６に格納された画像態様定義情報の例について、図４を参照して説明する。 An example of the image mode definition information stored in the definition information (rule) accumulation unit 126 will be described with reference to FIG.

定義情報（ルール）蓄積部１２６には、画像特徴の様々な態様を定義したテンプレートに相当する多数の画像態様定義情報が格納される。それぞれが、例えば人物の所定のアクション、例えば手を上げた状態、起立した状態など、様々な動作に対応する画像態様を示す条件データとして設定される。 The definition information (rule) storage unit 126 stores a large number of image mode definition information corresponding to templates that define various modes of image features. Each of them is set as condition data indicating image modes corresponding to various actions such as a predetermined action of a person, for example, a state where the hand is raised, a state where the person stands up.

図４に示す例では、定義情報（ルール）蓄積部１２６に［Ｒ００００１］〜［Ｒｎｎｎｎｎ］の画像態様定義情報（テンプレート）が格納された例を示しており、その１つの画像態様定義情報の具体例を図４（ａ）に示している。画像態様定義情報は、例えば図４（ａ）に示すような特定領域の位置、サイズなどの特定領域情報をｉｆ，ｔｈｅｎの条件式として設定した構成を持つ。 The example shown in FIG. 4 shows an example in which the image mode definition information (template) of [R00001] to [Rnnnn] is stored in the definition information (rule) storage unit 126. An example is shown in FIG. The image mode definition information has a configuration in which specific area information such as the position and size of the specific area as shown in FIG. 4A is set as a conditional expression of if and then.

図４（ａ）に示す条件は、図４（ｂ）に示す特定領域（ａｒｅａ１，ａｒｅａ２）の態様に対応している。 The conditions shown in FIG. 4 (a) correspond to the specific areas (area1, area2) shown in FIG. 4 (b).

図４（ａ）に示す条件式は、
ｉｆ（７＜ａｒｅａ１ｓｉｚｅ＜９）
ａｎｄ（１＜ａｒｅａ２ｓｉｚｅ＜２）
ａｎｄ（１＜ｄｉｓｔａｎｃｅ（ａｒｅａ２，ａｒｅａ１）＜２）
ａｎｄ（１＜ｙ−ｄｉｓｔａｎｃｅ（ａｒｅａ２，ａｒｅａ１）＜２）
ｔｈｅｎ
Ｔｅｍｐｌａｔｅ［Ｒ００ｘｘｘ］（挙手アクション）
の構成であり、
The conditional expression shown in FIG. 4(a) has the following form:
if (7 < area1 size < 9)
and (1 < area2 size < 2)
and (1 < distance(area2, area1) < 2)
and (1 < y-distance(area2, area1) < 2)
then
Template [R00xxx] (hand-raising action)

例えば、図４（ｃ）に示す画像フレームに対して、定義情報（ルール）蓄積部１２６に格納された画像態様定義情報を対応付ける場合、図４（ｃ）に示す画像フレームに対して定義情報（ルール）蓄積部１２６に格納された画像態様定義情報［Ｒ０００００］から順次画像フレームとの照合処理を実行し、画像態様定義情報の条件に合致するものをその画像フレームの対応ルールとして選択する。ここでは、画像態様定義情報［Ｒ００ｘｘｘ］が選択された例を示している。 For example, to associate the image frame shown in FIG. 4(c) with image mode definition information stored in the definition information (rule) storage unit 126, the frame is collated against the stored image mode definition information in order, starting from [R00000], and the definition whose conditions the frame satisfies is selected as the rule corresponding to that frame. Here, an example is shown in which image mode definition information [R00xxx] is selected.

例えば、図４（ｃ）に示す画像フレームにおいて、この画像フレームの画像特徴を示す特定領域、すなわちスキン領域は、図４（ｃ）に示す顔部分領域２０１と、手部分領域２０２であり、これらがそれぞれ図４（ａ）のルールの領域１（ａｒｅａ１）と領域２（ａｒｅａ２）に対応する。 For example, in the image frame shown in FIG. 4C, the specific area indicating the image feature of the image frame, that is, the skin area, is the face partial area 201 and the hand partial area 202 shown in FIG. Corresponds to the area 1 (area1) and area 2 (area2) of the rule in FIG.

図４（ｂ）に示す画像態様定義情報は、領域１（ａｒｅａ１）のサイズが７〜９、領域２（ａｒｅａ２）のサイズが１〜２、領域１（ａｒｅａ１）と領域２（ａｒｅａ２）との距離が１〜２、領域１（ａｒｅａ１）と領域２（ａｒｅａ２）とのｙ方向の距離が１〜２という条件を設定しており、図４（ｃ）に示す画像フレームは、この画像態様定義情報の条件を満足するので、図４（ｃ）に示す画像フレームは、この画像態様定義情報、すなわち画像態様定義情報［Ｒ００２０１］対応のフレームであると判定される。 In the image mode definition information shown in FIG. 4B, the size of the area 1 (area1) is 7 to 9, the size of the area 2 (area2) is 1 to 2, the area 1 (area1) and the area 2 (area2) The condition that the distance is 1 to 2 and the distance in the y direction between the area 1 (area1) and the area 2 (area2) is 1 to 2 is set. The image frame shown in FIG. Since the information condition is satisfied, the image frame shown in FIG. 4C is determined to be a frame corresponding to the image mode definition information, that is, the image mode definition information [R00201].
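The if-then condition of FIG. 4(a) and the sequential collation against stored rules can be sketched as follows. The source does not state the units of size and distance or how distance is measured, so this sketch assumes a common normalized unit and centroid-to-centroid distance; the `Region` type and function names are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Region:
    """A specific (skin) region summarized by its centroid and size (assumed units)."""
    x: float
    y: float
    size: float


def raised_hand_rule(area1, area2):
    """Condition of template [R00xxx]: area1 is the face, area2 is the hand."""
    distance = hypot(area2.x - area1.x, area2.y - area1.y)
    y_distance = abs(area2.y - area1.y)
    return (
        7 < area1.size < 9
        and 1 < area2.size < 2
        and 1 < distance < 2
        and 1 < y_distance < 2
    )


# Rules are tried in stored order; only one illustrative rule is registered here.
RULES = [("R00xxx", raised_hand_rule)]


def match_rule(area1, area2, rules=RULES):
    """Return the identifier of the first rule the frame satisfies, or None."""
    for rule_id, predicate in rules:
        if predicate(area1, area2):
            return rule_id
    return None


face = Region(x=5.0, y=5.0, size=8.0)
raised_hand = Region(x=5.5, y=3.5, size=1.5)
lowered_hand = Region(x=5.5, y=5.2, size=1.5)
```

Keeping each rule as a named predicate in an ordered list mirrors the description of collating a frame against the stored definitions one after another until one matches.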

前述したように、定義情報（ルール）照合部１２４では、動きが認められ、スキン領域として判定された画像フレームと、定義情報（ルール）蓄積部１２６に格納された画像態様定義情報との照合を実行して、画像フレームの各々において識別された特定領域、すなわちここではスキン（皮膚）領域の表示態様と一致する画像態様定義情報を、画像フレーム対応のルールとして決定する。 As described above, the definition information (rule) collation unit 124 collates the image frame in which movement is recognized and determined as a skin area with the image mode definition information stored in the definition information (rule) storage unit 126. The image mode definition information that matches the display mode of the specific area identified in each of the image frames, that is, the skin (skin) area in this case, is determined as a rule corresponding to the image frame.

なお、画像態様定義情報は、図４（ａ）に示すようなｉｆ−ｔｈｅｎ形式の条件式に限らず、例えば特定領域の大きさ（ｓｉｚｅ）、位置などの情報を記述したのみのデータ、あるいは表形式などのテーブルデータなどとして保持し、定義情報（ルール）照合部１２４では、これらのデータに基づく対応付け処理を実行してもよい。 The image mode definition information is not limited to the conditional expression in the if-then format as shown in FIG. 4A, for example, data that only describes information such as the size (size) and position of a specific area, or The definition information (rule) matching unit 124 may store the data as table data in a table format or the like, and may execute association processing based on these data.

画像態様定義情報として定義される情報は、画像特徴を示す情報であり、例えば画像特徴を示す特定領域に関する以下の情報である。
特定領域の位置情報（例えば座標データ）
特定領域のアスペクト比情報（縦横比）
特定領域のサイズ情報（面積）
特定領域の重心位置情報
特定領域間の距離情報
これらの情報の少なくともいずれかの条件を定めた定義データとして画像態様定義情報が設定される。
The information defined as image mode definition information is information indicating an image feature, for example the following information about a specific region indicating the image feature:
Position information of the specific region (e.g., coordinate data)
Aspect ratio information of the specific region (height-to-width ratio)
Size information of the specific region (area)
Center-of-gravity position information of the specific region
Distance information between specific regions
The image mode definition information is set as definition data that specifies a condition on at least one of these items.
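The region attributes listed above can be computed from the pixels of a region as sketched below. The dictionary layout, the choice of the top-left corner as "position", and the function names are assumptions for illustration.

```python
from math import hypot


def describe_region(pixels):
    """Summarize a region given as a list of (x, y) pixel coordinates."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    left, right, top, bottom = min(xs), max(xs), min(ys), max(ys)
    width, height = right - left + 1, bottom - top + 1
    return {
        "position": (left, top),            # top-left corner of the bounding box
        "aspect_ratio": height / width,     # height-to-width ratio
        "size": len(pixels),                # area in pixels
        "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
    }


def region_distance(first, second):
    """Euclidean distance between the centroids of two described regions."""
    (x1, y1), (x2, y2) = first["centroid"], second["centroid"]
    return hypot(x2 - x1, y2 - y1)


tall = describe_region([(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)])
dot = describe_region([(4, 1)])
```

Values like these are what a rule's conditions would be evaluated against, whether the rule is stored as an if-then expression or as plain table data.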

定義情報（ルール）照合部１２４で実行する具体的な処理例について説明する。例えば、カメラ１０１の撮像データとして取得される画像ストリームデータを［Ｓ１］とし、サンプリング部において抽出されたサンプル画像フレームを［ｆ_ｉ］としたとき、動き検出部１２２と、特徴抽出部１２３との処理によって、動きが認められ、スキン領域として判定された領域を含む画像フレームが、［Ｐ１］、［Ｐ２］、［Ｐ３］の３枚抽出されたとする。 A specific processing example executed by the definition information (rule) matching unit 124 will be described. For example, when the image stream data acquired as the imaging data of the camera 101 is [S1] and the sample image frame extracted by the sampling unit is [f _i ], the motion detection unit 122 and the feature extraction unit 123 It is assumed that three image frames [P1], [P2], and [P3] are extracted by the processing, including a region in which movement is recognized and determined as a skin region.

定義情報（ルール）照合部１２４では、これらの３枚の画像フレームについて、定義情報（ルール）蓄積部１２６に格納された画像態様定義情報との照合を実行して、画像フレーム［Ｐ１］〜［Ｐ３］の特定領域の表示状態を満足する画像態様定義情報を対応付ける。例えば、
画像フレーム［Ｐ１］→画像態様定義情報［Ｒ００００３］
画像フレーム［Ｐ２］→画像態様定義情報［Ｒ０００２１］
画像フレーム［Ｐ３］→画像態様定義情報［Ｒ００１０２］
などの対応付けが実行される。
The definition information (rule) matching unit 124 collates these three image frames against the image mode definition information stored in the definition information (rule) storage unit 126 and associates each of the image frames [P1] to [P3] with the image mode definition information that its specific regions satisfy. For example, associations such as the following are made:
Image frame [P1] → Image mode definition information [R00003]
Image frame [P2] → Image mode definition information [R00021]
Image frame [P3] → Image mode definition information [R00102]

なお、画像フレーム［Ｐ１］，［Ｐ２］，［Ｐ３］は時間の経過に従って並べられた画像フレーム、すなわち、動きのあるスキン領域の検出された画像フレームのシーケンスである。なおルールの対応付け対象となる画像フレームの数は１以上任意の数が可能である。 Note that the image frames [P1], [P2], and [P3] are image frames arranged in accordance with the passage of time, that is, a sequence of image frames in which a moving skin area is detected. Note that the number of image frames to be associated with the rule can be any number from 1 to an arbitrary number.

画像識別処理部１２５では、定義情報（ルール）照合部１２４で対応付けられた画像フレームの画像態様定義情報シーケンスに基づく動作判定、すなわちアクションの識別処理を実行する。上記の例では、画像態様定義情報シーケンスは、［Ｒ００００３］→［Ｒ０００２１］→［Ｒ００１０２］となる。 The image identification processing unit 125 executes an operation determination based on the image mode definition information sequence of the image frame associated with the definition information (rule) matching unit 124, that is, an action identification process. In the above example, the image mode definition information sequence is [R00003] → [R00021] → [R00102].

例えば２つの画像フレーム［Ｐ１］，［Ｐ２］に基づく動作識別の実行例について図５を参照して説明する。 For example, an execution example of motion identification based on two image frames [P1] and [P2] will be described with reference to FIG.

図５（ａ）は、人物の挙手アクションとして識別される例を示している。例えば、定義情報（ルール）照合部１２４において、
画像フレーム［Ｐ１］→画像態様定義情報［Ｒ００１］
画像フレーム［Ｐ２］→画像態様定義情報［Ｒ００２］
に対応するものと判定され、これらのルールシーケンス情報［Ｒ００１］→［Ｒ００２］が、画像識別処理部１２５に入力される。
FIG. 5(a) shows an example that is identified as a person's hand-raising action. For example, the definition information (rule) matching unit 124 determines the correspondences
Image frame [P1] → Image mode definition information [R001]
Image frame [P2] → Image mode definition information [R002]
and the rule sequence information [R001] → [R002] is input to the image identification processing unit 125.

画像識別処理部１２５は、様々なルールシーケンスに対応するアクション情報を設定したテーブルを格納したアクション定義情報蓄積部１２７のテーブルデータに基づいて、
画像態様定義情報［Ｒ００１］→画像態様定義情報［Ｒ００２］
のシーケンスに対応するアクション情報を抽出する。
Based on the table data of the action definition information storage unit 127, which stores a table of action information corresponding to various rule sequences, the image identification processing unit 125 extracts the action information corresponding to the sequence
Image mode definition information [R001] → Image mode definition information [R002].

アクション定義情報蓄積部１２７のテーブルデータには、画像態様定義情報［Ｒ００１］→画像態様定義情報［Ｒ００２］の対応アクションは、挙手アクションであると定義されており、画像識別処理部１２５は、このルールシーケンスが、挙手アクションであると結論付ける。 In the table data of the action definition information storage unit 127, the corresponding action of the image mode definition information [R001] → the image mode definition information [R002] is defined as a hand raising action, and the image identification processing unit 125 Conclude that the rule sequence is a raised hand action.

図５（ｂ）は、人物の起立アクションとして識別される例を示している。例えば、定義情報（ルール）照合部１２４において、
画像フレーム［Ｐ１］→画像態様定義情報［Ｒ００１］
画像フレーム［Ｐ２］→画像態様定義情報［Ｒ００３］
に対応するものと判定され、これらのルールシーケンス情報［Ｒ００１］→［Ｒ００３］が画像識別処理部１２５に入力される。
FIG. 5(b) shows an example that is identified as a person's standing action. For example, the definition information (rule) matching unit 124 determines the correspondences
Image frame [P1] → Image mode definition information [R001]
Image frame [P2] → Image mode definition information [R003]
and the rule sequence information [R001] → [R003] is input to the image identification processing unit 125.

画像識別処理部１２５はアクション定義情報蓄積部１２７から、
画像態様定義情報［Ｒ００１］→画像態様定義情報［Ｒ００３］
のシーケンスに対応するアクション情報を抽出する。
The image identification processing unit 125 extracts, from the action definition information storage unit 127, the action information corresponding to the sequence
Image mode definition information [R001] → Image mode definition information [R003].

アクション定義情報蓄積部１２７のテーブルデータには、画像態様定義情報［Ｒ００１］→画像態様定義情報［Ｒ００３］の対応アクションは、起立アクションであると定義されており、画像識別処理部１２５は、このルールシーケンスが、起立アクションであると結論付ける。 In the table data of the action definition information storage unit 127, the corresponding action of the image mode definition information [R001] → the image mode definition information [R003] is defined as an upright action, and the image identification processing unit 125 Conclude that the rule sequence is a standing action.

アクション定義情報蓄積部１２７のテーブルデータは、様々なアクションに対応する画像態様定義情報シーケンスを定義している。図６を参照して、アクション定義情報蓄積部１２７のテーブルデータについて説明する。 The table data of the action definition information storage unit 127 defines image mode definition information sequences corresponding to various actions. With reference to FIG. 6, the table data of the action definition information storage unit 127 will be described.

図６には、（ａ）挙手アクション、（ｂ）起立アクション、（ｃ）着席アクションの３つのアクションに対応する画像態様定義情報シーケンスの例を示している。 FIG. 6 shows an example of an image mode definition information sequence corresponding to three actions of (a) raising hand action, (b) standing action, and (c) sitting action.

（ａ）挙手アクションに対応する画像態様定義情報シーケンスとしては、例えば図６（ａ１）〜（ａ６）があり、これらのシーケンスが挙手アクションに対応するルールシーケンスとして、アクション定義情報蓄積部１２７のテーブルデータとして設定される。 (a) Examples of image mode definition information sequences corresponding to the hand-raising action are shown in FIG. 6(a1) to (a6). These sequences are set in the table data of the action definition information storage unit 127 as rule sequences corresponding to the hand-raising action.

なお、画像態様定義情報は、先に図４（ａ）を参照して説明したような条件を示すデータとして記述され、これらの記述データに対応する識別情報のシーケンスが、アクション定義情報蓄積部１２７のテーブルデータとして設定される。図６では、理解を容易にするため、図４（ａ）を参照して説明した条件データの表現態様を示して説明する。 The image mode definition information is described as data indicating conditions as described above with reference to FIG. 4A, and the sequence of identification information corresponding to these description data is the action definition information storage unit 127. Is set as table data. In FIG. 6, in order to facilitate understanding, the expression mode of the condition data described with reference to FIG.

例えば図６（ａ１）のシーケンスは、
画像フレーム［Ｐ１］：顔（ｆａｃｅ）領域が中央上部に位置し、手（ｈａｎｄ）領域が左右下部それぞれに位置した状態を示す画像態様定義情報［Ｒ００１］
画像フレーム［Ｐ２］：顔（ｆａｃｅ）領域が中央上部に位置し、手（ｈａｎｄ）領域が左上部と右下部それぞれに位置した状態を示す画像態様定義情報［Ｒ００２］
画像フレーム［Ｐ３］：顔（ｆａｃｅ）領域が中央上部に位置し、手（ｈａｎｄ）領域が左右下部それぞれに位置した状態を示す画像態様定義情報［Ｒ００１］
のシーケンスを示している。
For example, FIG. 6(a1) shows the following sequence:
Image frame [P1]: image mode definition information [R001], indicating a state in which the face region is at the upper center and the hand regions are at the lower left and lower right.
Image frame [P2]: image mode definition information [R002], indicating a state in which the face region is at the upper center and the hand regions are at the upper left and lower right.
Image frame [P3]: image mode definition information [R001], indicating a state in which the face region is at the upper center and the hand regions are at the lower left and lower right.

すなわち、アクション定義情報蓄積部１２７のテーブルデータには、
画像態様定義情報［Ｒ００１］→［Ｒ００２］→［Ｒ００１］のシーケンスに対応するアクションは、挙手アクションであると定義されており、画像識別処理部１２５は、画像態様定義情報［Ｒ００１］→［Ｒ００２］→［Ｒ００１］のシーケンスを定義情報（ルール）照合部１２４から入力した場合には、挙手アクションであると判定する。
That is, the table data of the action definition information storage unit 127 defines the action corresponding to the image mode definition information sequence [R001] → [R002] → [R001] as a hand-raising action. When the image identification processing unit 125 receives the sequence [R001] → [R002] → [R001] from the definition information (rule) matching unit 124, it determines that the action is a hand-raising action.

同様に、図６（ａ１）〜（ａ６）の全ての画像態様定義情報シーケンスは全て、アクション定義情報蓄積部１２７のテーブルデータに挙手アクションであると定義され、画像識別処理部１２５は、ルール照合１２４から入力する画像態様定義情報シーケンスが、これらのシーケンスである場合は、挙手アクションが発生したとの判定を行なう。 Similarly, all the image mode definition information sequences in FIGS. 6A1 to 6A6 are all defined as a hand raising action in the table data of the action definition information storage unit 127, and the image identification processing unit 125 performs rule matching. If the image mode definition information sequence input from 124 is one of these sequences, it is determined that a hand raising action has occurred.

図６（ｂ）は、起立アクションに対応する画像態様定義情報シーケンスを示している。
例えば図６（ｂ１）のシーケンスは、
画像フレーム［Ｐ１］：顔（ｆａｃｅ）領域が中央下部に位置した状態を示す画像態様定義情報［Ｒ００４］
画像フレーム［Ｐ２］：顔（ｆａｃｅ）領域が中央上部に位置した状態を示す画像態様定義情報［Ｒ００５］
画像フレーム［Ｐ３］：顔（ｆａｃｅ）領域が中央下部に位置した状態を示す画像態様定義情報［Ｒ００４］
のシーケンスを示している。
FIG. 6(b) shows image mode definition information sequences corresponding to the standing action.
For example, FIG. 6(b1) shows the following sequence:
Image frame [P1]: image mode definition information [R004], indicating a state in which the face region is at the lower center.
Image frame [P2]: image mode definition information [R005], indicating a state in which the face region is at the upper center.
Image frame [P3]: image mode definition information [R004], indicating a state in which the face region is at the lower center.

アクション定義情報蓄積部１２７のテーブルデータには、
画像態様定義情報［Ｒ００４］→［Ｒ００５］→［Ｒ００４］のシーケンスに対応するアクションは、起立アクションであると定義されており、画像識別処理部１２５は、画像態様定義情報［Ｒ００４］→［Ｒ００５］→［Ｒ００４］のシーケンス情報を定義情報（ルール）照合部１２４から入力した場合には、起立アクションが発生したと判定する。
The table data of the action definition information storage unit 127 defines the action corresponding to the image mode definition information sequence [R004] → [R005] → [R004] as a standing action. When the image identification processing unit 125 receives the sequence information [R004] → [R005] → [R004] from the definition information (rule) matching unit 124, it determines that a standing action has occurred.

FIG. 6(c) shows image mode definition information sequences corresponding to the sitting-down action. For example, the sequence of FIG. 6(c1) is as follows:
Image frame [P1]: image mode definition information [R005], indicating a state in which the face area is located at the upper center
Image frame [P2]: image mode definition information [R004], indicating a state in which the face area is located at the lower center
Image frame [P3]: image mode definition information [R005], indicating a state in which the face area is located at the upper center

The table data of the action definition information storage unit 127 defines the action corresponding to the image mode definition information sequence [R005] → [R004] → [R005] as a sitting-down action. When the image identification processing unit 125 receives the sequence information [R005] → [R004] → [R005] from the definition information (rule) matching unit 124, it determines that a sitting-down action has occurred.

In this way, for each of the image frames [P1] to [Pn] in which a moving skin region has been detected, the definition information (rule) matching unit 124 performs matching against the image mode definition information stored in the definition information (rule) storage unit 126, and associates with the image frames [P1] to [Pn] the image mode definition information [Rxxx] to [Ryyy] that matches the display state of the specific areas in each frame. The image frames [P1] to [Pn] are arranged in time series, so the corresponding image mode definition information [Rxxx] to [Ryyy] also forms sequence data set as time-series data.

Next, the image identification processing unit 125 determines the action that has occurred, based on the table obtained from the action definition information storage unit 127 that associates image mode definition information sequences with actions.

That is, the image identification processing unit 125 takes the input from the definition information (rule) matching unit 124, namely the rule sequence of image mode definition information [Rxxx] to [Ryyy] corresponding to the time-ordered image frames [P1] to [Pn], searches the table obtained from the action definition information storage unit 127 for the data entry corresponding to that rule sequence, and determines the action of the extracted entry to be the action identified from the image frames [P1] to [Pn].
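The table lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary layout, the action labels, and the function name `identify_action` are assumptions, and only the three example sequences quoted in the text for FIG. 6(a1), (b1), and (c1) are included.

```python
from typing import Optional, Sequence

# Hypothetical in-memory form of the table held in the action definition
# information storage unit 127. Keys are rule-ID sequences quoted in the
# description; the action labels are illustrative names.
ACTION_TABLE: dict[tuple[str, ...], str] = {
    ("R001", "R002", "R001"): "raise_hand",  # FIG. 6(a1)
    ("R004", "R005", "R004"): "stand_up",    # FIG. 6(b1)
    ("R005", "R004", "R005"): "sit_down",    # FIG. 6(c1)
}


def identify_action(rule_sequence: Sequence[str]) -> Optional[str]:
    """Return the action whose table entry equals the rule sequence, or None."""
    return ACTION_TABLE.get(tuple(rule_sequence))
```

Returning `None` for an unlisted sequence is a choice made for this sketch; the description does not say how a sequence with no matching entry is handled.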

The processing procedures of the definition information (rule) matching unit 124 and the image identification processing unit 125 are described with reference to FIGS. 7 and 8.

First, the processing procedure of the definition information (rule) matching unit 124 is described with reference to FIG. 7.

In step S201, the definition information (rule) matching unit 124 first selects frames one at a time from the time-ordered image frames [P1] to [Pn] as the rule association processing target frame [Pi].

In step S202, the image mode definition information [Rx] that matches the image frame [Pi] being processed is selected, and the rule for that frame is determined.

In step S203, it is determined whether image mode definition information has been associated with all of the time-ordered image frames [P1] to [Pn]. If any image frame remains unprocessed, the processing from step S201 onward is repeated. When the association is complete for all image frames [P1] to [Pn], the process proceeds to step S204, and the rule sequence information [Rx] to [Ry] corresponding to the image frames [P1] to [Pn] is output to the image identification processing unit 125.
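Steps S201 to S204 amount to a loop over the time-ordered frames. The sketch below is one possible reading under stated assumptions: each frame is reduced to the normalized (x, y) center of its face region with y increasing downward, and the conditions given for [R004] and [R005] use illustrative thresholds that do not come from the specification.

```python
from typing import Callable, Optional

# Assumption: a frame is represented only by the normalized (x, y) center of
# its face region, with (0, 0) at the top-left corner of the image.
Frame = tuple[float, float]
Rule = Callable[[Frame], bool]

# Hypothetical conditions for two rules named in the text; thresholds are
# illustrative only.
RULES: dict[str, Rule] = {
    "R004": lambda f: 0.33 <= f[0] <= 0.67 and f[1] > 0.5,   # face at lower center
    "R005": lambda f: 0.33 <= f[0] <= 0.67 and f[1] <= 0.5,  # face at upper center
}


def match_rules(frames: list[Frame]) -> list[Optional[str]]:
    """Associate each time-ordered frame with the first rule it satisfies."""
    sequence: list[Optional[str]] = []
    for frame in frames:  # S201: take the next frame [Pi] in time order
        # S202: select the rule whose condition the frame satisfies
        matched = next((rid for rid, rule in RULES.items() if rule(frame)), None)
        sequence.append(matched)
    # S203: the loop ends once every frame has been processed
    # S204: output the rule sequence to the identification stage
    return sequence
```

For example, `match_rules([(0.5, 0.8), (0.5, 0.2), (0.5, 0.8)])` yields the sequence `["R004", "R005", "R004"]`, which the text associates with the standing-up action.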

Next, the processing procedure of the image identification processing unit 125 is described with reference to FIG. 8. In step S301, the image identification processing unit 125 receives, from the definition information (rule) matching unit 124, the rule sequence information [Rx] to [Ry] corresponding to the time-ordered image frames [P1] to [Pn].

Next, in step S302, based on the table stored in the action definition information storage unit 127, the entry whose sequence matches the rule sequence information [Rx] to [Ry] corresponding to the image frames [P1] to [Pn] is extracted. In step S303, the action information set in the extracted entry is determined to be the action that occurred.

Through the above processing, the action that occurred is identified based on image data captured by, for example, a camera. Based on the action identified in this way, processing such as assigning indexes to video data is performed.

As an actual test, action identification was performed using data from a one-hour meeting captured by four cameras from different directions. Each camera provided approximately 3,000 frames, for a total of 12,000 frames. The camera data was MPEG-1 data consisting of frames of 352 × 240 pixels. First, one sample frame per second was extracted from each camera's data stream. The specific area detection described above (that is, skin region detection) and motion detection were performed on these sample frames, image frames having a moving skin region were associated with image mode definition information, and actions were determined based on the resulting sequences. As a result, hand movements and general human actions were determined efficiently and accurately.
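The one-frame-per-second sampling used in this test can be sketched as below. The function name and the frame rate passed to it are assumptions; the text states only that one sample frame was extracted every second.

```python
def sample_frame_indices(total_frames: int, frames_per_second: int) -> list[int]:
    """Return the index of one sample frame for each second of video."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    # Take the first frame of every one-second window.
    return list(range(0, total_frames, frames_per_second))
```

With an assumed rate of 30 frames per second, a three-second clip of 90 frames gives the indices 0, 30, and 60.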

In the image analysis apparatus and method of the present invention, rules defining specific areas are set, for example as shown in FIG. 4(a), and an action sequence is determined by selecting the corresponding rule for each selected image. As shown in FIG. 4(a), a rule defining specific areas is set as definition data that specifies conditions on at least one of the following items of specific area information:
Position information of a specific area (for example, coordinate data)
Aspect ratio information of a specific area
Size information (area) of a specific area
Centroid position information of a specific area
Distance information between specific areas
Such rules are very easy to create, and a wide variety of new image mode definition information can be generated. Image mode definition information can therefore easily be created not only for the actions described in the embodiment, such as raising a hand, standing up, and sitting down, but also for various other actions such as shaking hands and nodding.
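To make the five kinds of specific area information concrete, the sketch below computes them for rectangular regions and writes one rule as a conditional expression. The `Region` class, the use of the bounding-box center as the centroid, the height-over-width definition of aspect ratio, and the rule `hand_above_face` with its 200-pixel threshold are all assumptions for illustration; none of them is taken from FIG. 4.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Region:
    """Axis-aligned bounding box of a specific area, in pixels (y grows downward)."""
    x: float  # left edge (position information)
    y: float  # top edge (position information)
    w: float  # width
    h: float  # height

    @property
    def aspect_ratio(self) -> float:
        return self.h / self.w  # assumed convention: height / width

    @property
    def area(self) -> float:
        return self.w * self.h  # size information

    @property
    def centroid(self) -> tuple[float, float]:
        # Approximation: center of the bounding box stands in for the centroid.
        return (self.x + self.w / 2, self.y + self.h / 2)


def distance(a: Region, b: Region) -> float:
    """Distance between the centroids of two regions."""
    (ax, ay), (bx, by) = a.centroid, b.centroid
    return hypot(ax - bx, ay - by)


def hand_above_face(face: Region, hand: Region) -> bool:
    """Hypothetical rule: the hand is higher than the face and within 200 px of it."""
    return hand.centroid[1] < face.centroid[1] and distance(face, hand) < 200
```

A rule of this kind is only a few comparisons on numbers, which is why new rules for other actions can be written without collecting image templates.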

Accordingly, compared with conventional methods such as template matching that uses image data itself as the template, the image mode definition information applied as a template is very simple to create, and generating such information allows a variety of actions to be determined efficiently. In addition, image mode definition information requires far less data than image data used as a template, which makes it possible to process data more efficiently, reduce memory requirements, and make the apparatus smaller.
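A rough calculation illustrates the difference in data volume. The figures below rest on assumptions that are not in the specification: an image template the size of one full 352 × 240 frame stored at one byte per pixel, and a rule stored as four 32-bit floating-point bounds.

```python
import struct

# Assumed image template: one full frame, 8-bit grayscale (1 byte per pixel).
FRAME_WIDTH, FRAME_HEIGHT = 352, 240
template_bytes = FRAME_WIDTH * FRAME_HEIGHT

# Assumed rule: four 32-bit floats (for example x_min, x_max, y_min, y_max).
rule_bytes = struct.calcsize("<4f")

# How many such rules fit in the space of one template.
ratio = template_bytes // rule_bytes
```

Under these assumptions the template occupies 84,480 bytes and the rule 16 bytes, a ratio of 5,280 to 1. Real templates and rules will differ in size, so the numbers indicate only the order of magnitude.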

Finally, an example hardware configuration of the image analysis apparatus described above is explained with reference to FIG. 9.

A CPU (Central Processing Unit) 501 is a control unit that executes processing according to the OS (Operating System) and the various computer programs describing the execution sequences of the processes explained in the above embodiment: sampling, specific area detection, motion detection, rule matching, and action identification.

A ROM (Read Only Memory) 502 stores programs, calculation parameters, and other data used by the CPU 501. A RAM (Random Access Memory) 503 stores programs used in execution by the CPU 501 and parameters that change as appropriate during that execution. These components are interconnected by a host bus 504, which consists of a CPU bus or the like.

The host bus 504 is connected via a bridge 505 to an external bus 506 such as a PCI (Peripheral Component Interconnect/Interface) bus.

A keyboard 508 and a pointing device 509 are input devices operated by the user. A display 510 is a liquid crystal display, a CRT (Cathode Ray Tube), or the like, and displays various information as text and images.

An HDD (Hard Disk Drive) 511 contains a hard disk, drives it, and records or reproduces programs executed by the CPU 501 and other information. The hard disk serves as storage for the data held in the definition information (rule) storage unit 126 and the action definition information storage unit 127 shown in FIG. 1, and also stores various computer programs such as data processing programs.

A drive 512 reads data or programs recorded on a mounted removable recording medium 521, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, and supplies them to the RAM 503, which is connected via an interface 507, the external bus 506, the bridge 505, and the host bus 504.

A connection port 514 is a port for connecting an external device 522 and has connectors such as USB and IEEE 1394. The connection port 514 is connected to the CPU 501 and other components via the interface 507, the external bus 506, the bridge 505, the host bus 504, and so on. A communication unit 515 is connected to a network; it transmits data supplied from the CPU 501, the HDD 511, or the like, and receives data.

The hardware configuration shown in FIG. 9 is one example of an apparatus built on a PC. The image analysis apparatus of the present invention is not limited to the configuration shown in FIG. 9; any configuration capable of executing the processing described in the above embodiment may be used.

The present invention has been described in detail above with reference to a specific embodiment. However, it is obvious that those skilled in the art can modify or substitute the embodiment without departing from the gist of the present invention. In other words, the present invention has been disclosed by way of example and should not be interpreted restrictively. To determine the gist of the present invention, the claims section set out at the beginning of the specification should be taken into consideration.

The series of processes described in this specification can be executed by hardware, software, or a combination of both. When the processing is executed by software, a program recording the processing sequence can be installed in the memory of a computer built into dedicated hardware and executed there, or the program can be installed and executed on a general-purpose computer capable of executing various kinds of processing.

For example, the program can be recorded in advance on a hard disk or a ROM (Read Only Memory) serving as a recording medium. Alternatively, the program can be stored (recorded) temporarily or permanently on a removable recording medium such as a flexible disk, CD-ROM (Compact Disc Read Only Memory), MO (Magneto-Optical) disk, DVD (Digital Versatile Disc), magnetic disk, or semiconductor memory. Such removable recording media can be provided as so-called packaged software.

Besides being installed on a computer from a removable recording medium as described above, the program can be transferred wirelessly from a download site to the computer, or transferred by wire over a network such as a LAN (Local Area Network) or the Internet. The computer can receive the program transferred in this way and install it on a recording medium such as a built-in hard disk.

The various processes described in this specification may be executed not only in time series in the order described, but also in parallel or individually, depending on the processing capability of the apparatus executing them or as required. In this specification, a system is a logical collection of multiple devices, and the constituent devices are not necessarily housed in the same enclosure.

As described above, according to the configuration of the present invention, specific image features such as a person's face or hands are extracted from input image data, and the feature state of the input image data is matched against image mode definition information, which consists of rules (written, for example, as conditional expressions) defining various image feature states. A rule that fits each image is selected and associated with the image data, and the posture or action of a subject whose constituent elements are the regions corresponding to the image features, for example a person with a face and hands, is determined based on the associated rule or rule sequence. Because the matching only has to determine whether the image data satisfies descriptive data composed of conditional expressions and the like, it is efficient, and it enables accurate subject identification such as determining a person's posture or action.

FIG. 1 is a block diagram showing the configuration of the image analysis apparatus according to the present invention.
FIG. 2 is a diagram explaining the processing executed by the motion detection unit of the image analysis apparatus according to the present invention.
FIG. 3 is a flowchart explaining the processing executed by the feature extraction unit of the image analysis apparatus according to the present invention.
FIG. 4 is a diagram explaining the processing executed by the rule matching unit of the image analysis apparatus according to the present invention, together with specific examples of the image mode definition information applied in that processing.
FIG. 5 is a diagram explaining the processing executed by the image identification processing unit of the image analysis apparatus according to the present invention.
FIG. 6 is a diagram explaining the action determination processing based on the information stored in the action definition information storage unit of the image analysis apparatus according to the present invention.
FIG. 7 is a flowchart explaining the processing executed by the rule matching unit of the image analysis apparatus according to the present invention.
FIG. 8 is a flowchart explaining the processing executed by the image identification processing unit of the image analysis apparatus according to the present invention.
FIG. 9 is a diagram showing an example hardware configuration of the image analysis apparatus according to the present invention.

Explanation of symbols

101 to 10n Cameras
121 Sampling unit
122 Motion detection unit
123 Feature extraction unit
124 Rule matching unit
125 Image identification processing unit
126 Rule storage unit
127 Action definition information storage unit
201 Face partial area
202 Hand partial area
501 CPU (Central Processing Unit)
502 ROM (Read Only Memory)
503 RAM (Random Access Memory)
504 Host bus
505 Bridge
506 External bus
507 Interface
508 Keyboard
509 Pointing device
510 Display
511 HDD (Hard Disk Drive)
512 Drive
514 Connection port
515 Communication unit
521 Removable recording medium
522 External device

Claims

An image analysis apparatus comprising:
a feature extraction unit that receives, in time series, a plurality of image frames constituting an image sequence, sets a skin area bounding box by grouping partial areas estimated to be human skin areas by color discrimination of each image frame, sets an active bounding box containing the skin area bounding box and its neighboring area, and extracts the skin area as a feature region when, based on information from a motion detection unit that detects the motion of a subject from the image frames, a region in which motion between frames has been detected lies within the active bounding box;
a storage unit that stores a plurality of items of image mode definition information characterized with respect to at least one of position information, aspect ratio information, size information, centroid position information, and distance information between the feature regions;
a definition information matching unit that selects from the storage unit the image mode definition information that matches the feature region of each image frame extracted by the feature extraction unit, and sets the selected information as the definition information of that image frame; and
an operation mode identification processing unit that identifies the operation mode of a subject including the feature region, based on the definition information associated with each image frame by the definition information matching unit.

The image analysis apparatus according to claim 1, wherein the image mode definition information is information composed of conditional expressions defined with respect to at least one of position information, aspect ratio information, size information, centroid position information, and distance information between the feature regions.

A computer program for image analysis processing that causes a computer to function as:
a feature extraction unit that receives, in time series, a plurality of image frames constituting an image sequence, sets a skin area bounding box by grouping partial areas estimated to be human skin areas by color discrimination of each image frame, sets an active bounding box containing the skin area bounding box and its neighboring area, and extracts the skin area as a feature region when, based on information from a motion detection unit that detects the motion of a subject from the image frames, a region in which motion between frames has been detected lies within the active bounding box;
a storage unit that stores a plurality of items of image mode definition information characterized with respect to at least one of position information, aspect ratio information, size information, centroid position information, and distance information between the feature regions;
a definition information matching unit that selects from the storage unit the image mode definition information that matches the feature region of each image frame extracted by the feature extraction unit, and sets the selected information as the definition information of that image frame; and
an operation mode identification processing unit that identifies the operation mode of a subject including the feature region, based on the definition information associated with each image frame by the definition information matching unit.