JP2010123019A - Device and method for recognizing motion

Info

Publication number: JP2010123019A
Application number: JP2008297598A
Authority: JP
Inventors: Toshihiko Morita; 俊彦森田; Shinji Kanda; 真司神田; Naoyuki Sawazaki; 直之沢崎; Akihiro Imai; 章博今井; Takeshi Imai; 岳今井; Masayuki Inaba; 雅幸稲葉; Kei Okada; 慧岡田
Original assignee: Fujitsu Ltd; University of Tokyo NUC
Current assignee: Fujitsu Ltd; University of Tokyo NUC
Priority date: 2008-11-21
Filing date: 2008-11-21
Publication date: 2010-06-03
Anticipated expiration: 2028-11-21
Also published as: JP5001930B2

Abstract

PROBLEM TO BE SOLVED: To recognize the motion of a person from video even when it is difficult to discriminate the face area from other areas using color information.
SOLUTION: An action estimation device 205 divides each image obtained by high-resolution cameras 211 and 213 into a plurality of small areas, and associates with each other the small areas of the two images that are recognized as including the same portion of a person. The distance between corresponding points of the two associated small areas is then calculated as parallax. Next, the parallax of a face area, obtained by imaging the person's face, is compared with the parallax of a flesh color area, obtained by imaging a flesh-colored portion other than the face, and a flesh color area located around the face area and having parallax close to that of the face area is detected as an arm area. The motion of the person's arm is recognized on the basis of the position of the arm area relative to the face area and the difference between the parallax of the face area and the parallax of the arm area.

Description

The present invention relates to a motion recognition apparatus and method for recognizing the motion of a person.

In recent years, robots have been proposed that support daily life by managing a person's actions and schedule. One example is a robot that recognizes a person's actions, points out inappropriate ones, and prompts appropriate ones.

As a technique for recognizing human actions from video, a method is known in which motion elements are extracted based on a person's movement patterns and posture, and actions such as "eating" and "drinking" are recognized from the frequency with which those motion elements appear. In this case, for example, skin color (face, hands, feet) and clothing color are extracted from the color information in the image, and posture and movement information is extracted from them.

A face image determination method for determining whether a given area in an image is a face area is also known.
Furthermore, a method for extracting high-resolution video of a moving region from time-series low-resolution images, and a special lens with high resolution at its center, are also known.
JP 2005-215927 A; JP 2005-284348 A; JP 2007-000205 A; Y. Kuniyoshi, N. Kita, S. Rougeaux and T. Suehiro, "Active Stereo Vision System with Foveated Wide Angle Lenses," Proc. of Asian Conf. on Computer Vision (1995), pp. 359-363.

The conventional action recognition techniques described above have the following problem.
In a method that extracts posture and movement information from color information, it can be difficult to distinguish the face area from other areas by color when conditions such as clothing color or lighting are unfavorable. In that case, the person's arm area and the like cannot be identified in the image, and posture and movement information cannot be extracted accurately.

An object of the present invention is to make it possible to recognize a person's motion in video-based action recognition even when it is difficult to distinguish the face area from other areas using color information.

The disclosed motion recognition apparatus includes first and second imaging means, parallax calculation means, detection means, and recognition means.
The first and second imaging means image a person. The parallax calculation means divides the image obtained by each of the first and second imaging means into a plurality of small areas, and associates with each other the small areas of the respective images that are recognized as having captured the same part of the person. It then calculates, as the parallax, the distance between corresponding points of the two associated small areas.

The detection means compares the parallax of a face area, recognized as an image of the person's face, with the parallax of a skin color area, recognized as an image of a skin-colored part other than the person's face. It then detects, as an arm area, a skin color area whose distance from the face area is smaller than a first threshold and whose parallax differs from that of the face area by less than a second threshold.

The recognition means recognizes the motion of the person's arm based on the position of the arm area relative to the face area and on the difference between the parallax of the face area and the parallax of the arm area.
The first and second imaging means output images of the person to the parallax calculation means, and the parallax calculation means outputs the distance between corresponding points of the associated small areas, as the parallax, to the detection means and the recognition means. The detection means outputs information on the face area and the arm area to the recognition means, and the recognition means outputs information on the recognized motion.

By using the images from the first and second imaging means together, the parallax of each small area can be obtained. If the difference in parallax between the face area and a skin color area is small, the face and the skin-colored part can be considered close to each other in the depth direction of the image. Therefore, by comparing the parallax of the face area with the parallax of each skin color area, skin color areas belonging to another person or the like at a different depth can be excluded, and the skin color area corresponding to the arm of the same person can be detected.

According to the disclosed motion recognition apparatus, in video-based action recognition, the arm area can be identified and the motion of the person's arm can be recognized even when it is difficult to distinguish the face area from other areas using color information.

Hereinafter, the best mode embodiment will be described in detail with reference to the drawings.
FIG. 1 shows a configuration example of the motion recognition apparatus of the embodiment. The motion recognition apparatus of FIG. 1 includes a wide-field imaging unit 101, a high-resolution imaging unit 102, a motion/face detection unit 103, a gazing point control unit 104, a posture estimation unit 105, and an action recognition unit 106.

The wide-field imaging unit 101 captures wide-area video that includes targets such as people and objects, and the motion/face detection unit 103 detects the movement and face of a person from the captured video. The gazing point control unit 104 controls the high-resolution imaging unit 102 so that its gazing point is aligned with the target of interest, and the high-resolution imaging unit 102 captures video around the gazing point at high resolution. The posture estimation unit 105 estimates the posture of the person from the video captured by the high-resolution imaging unit 102, and the action recognition unit 106 identifies the person's action from the estimated posture.

The video captured by the wide-field imaging unit 101 and the high-resolution imaging unit 102 consists of time-series images. The wide-field imaging unit 101 and the high-resolution imaging unit 102 may be configured, for example, in any of the following ways.
(1) Two types of cameras are provided: a wide-field camera and a high-resolution camera.
(2) A camera with a special lens having high resolution at its center is used, so that wide-field and high-resolution video can be captured simultaneously.
(3) A single camera with a zoom function captures both the wide-field video and the high-resolution video. In this case, however, the wide-field video and the high-resolution video cannot be referenced at the same time.
(4) Even when a camera capable of capturing both wide-field and high-resolution video is used, if that video cannot be processed in real time, a wide-area video that has been downscaled to a lower resolution is used together with a partial video that has not been downscaled.
(5) A wide-field camera is combined with high-resolution video of the point of interest obtained by super-resolution. As the super-resolution technique, for example, a method of extracting high-resolution video of a moving region from time-series low-resolution images can be used.

Using the wide-field video from the wide-field imaging unit 101 makes it possible to grasp events occurring over a wide area, and using the high-resolution video from the high-resolution imaging unit 102 makes it possible to accurately identify a person's action and the object it involves.

The motion/face detection unit 103 detects the rough movement of an object by extracting its color information and the like from the image, and extracts the person's face area and the like from the image by methods such as skin color extraction and face image determination.

The gazing point control unit 104 acquires, from the motion/face detection unit 103, information indicating the rough movement of the object to be watched. Using the color information of the object, optical flow (vector information indicating the object's movement), and the like, it ultimately captures the object in the field of view of the high-resolution imaging unit 102.

For example, when the person being watched leaves the field of view of the high-resolution imaging unit 102, the wide-field imaging unit 101 acquires motion information (for example, optical flow) for objects around the gazing point. By moving the gazing point of the high-resolution imaging unit 102 to follow that motion, a high-resolution image of the person and relative position information with respect to surrounding objects can be obtained continuously.

The posture estimation unit 105 estimates, as the posture of the person, for example the person's orientation and the positions of the hands and arms. When a stereo camera is used as the high-resolution imaging unit 102, the posture is estimated by three-dimensional measurement using stereo vision. When shooting at high resolution, the field of view of each of the two cameras becomes narrow and the overlap between the stereo camera images available for three-dimensional measurement is limited, so convergence control of each camera is performed as necessary.

Because a high-resolution camera has a narrow field of view, ordinary parallax-based position calculation in three-dimensional measurement is strongly affected by the accuracy of convergence control, and calibration is also difficult. On the other hand, what is needed to recognize a person's action is the person's posture, the position of the arms, and the positional relationship relative to surrounding objects, not the absolute position of the person or the objects.

Therefore, instead of calculating absolute three-dimensional positions, the posture estimation unit 105 measures relative distances near the gazing point: it focuses on the relative parallax of the stereo feature points in each camera and obtains the relative positional relationship near the gazing point.

As a result, as long as each camera has the target in its field of view, the posture of the person can be estimated without being affected by the accuracy of convergence control or by calibration. The calibration that would have been required to calculate absolute positions also becomes unnecessary.

The action recognition unit 106 recognizes a person's action based on, for example, patterns in the time-series information of the person's posture. Some actions require the target of the action to be identified. For those, a detailed action is recognized by using the surrounding situation, detailed video of the target object, and the like in addition to the time-series posture information.

For example, to recognize the action of "taking medicine", it suffices that, when a "drinking" motion pattern is recognized, the object the hand touched immediately beforehand is photographed at high resolution and recognized as a medicine container. To recognize the action of "opening a door", it suffices that, together with the motion pattern of the arm or hand, the opening of the door is recognized in the wide-area video from an optical flow pattern or the like.

With such a motion recognition apparatus, wide-field video and partial high-resolution video are used at the same time, and flexible action recognition becomes possible based on the history of the person's posture, the surrounding situation, and the like. Measuring the relative positional relationship between the person and surrounding objects removes the need for highly accurate convergence control and calibration. Furthermore, adopting convergence control expands the range over which three-dimensional measurement is possible, yielding recognition processing that is robust to movement.

FIG. 2 shows a more specific configuration example of the motion recognition apparatus of FIG. 1. The motion recognition apparatus of FIG. 2 includes a camera head 201, a pan/tilt control motor 202, a camera control device 203, an image processing device 204, an action estimation device 205, and a host device 206. The camera head 201 includes high-resolution cameras 211 and 213, convergence control motors 212 and 214, and a wide-field camera 215.

The wide-field camera 215 corresponds to the wide-field imaging unit 101 in FIG. 1, and the high-resolution cameras 211 and 213 correspond to the high-resolution imaging unit 102. The image processing device 204 and the action estimation device 205 correspond to the motion/face detection unit 103, the posture estimation unit 105, and the action recognition unit 106, and the camera control device 203 and the host device 206 correspond to the gazing point control unit 104.

The pan/tilt control motor 202 is driven by the camera control device 203 and causes the camera head 201 to pan and tilt. The shooting direction changes as the entire camera head 201 pans and tilts. If pan/tilt operation is unnecessary, the pan/tilt control motor 202 may be omitted.

The high-resolution cameras 211 and 213 together constitute, for example, one stereo camera. The convergence control motors 212 and 214 are driven by the camera control device 203 and cause the high-resolution cameras 211 and 213 to perform convergence movements.

The image processing device 204 extracts image information, such as the color of the object and optical flow, from the images contained in the video captured by the wide-field camera 215 and the high-resolution cameras 211 and 213, and outputs it to the action estimation device 205.

The action estimation device 205 outputs the image information from the image processing device 204 to the host device 206. It also uses that image information to detect the rough movement of the object and to extract the person's face area, arm area, and the like. It then estimates the person's posture from the extracted face area, arm area, and the like, recognizes an action from the estimated posture, and outputs the recognition result to the host device 206.

Based on the image information from the action estimation device 205, the host device 206 instructs the camera control device 203 to perform pan/tilt operation and convergence control, and the camera control device 203 drives the convergence control motors 212 and 214 and the pan/tilt control motor 202 in accordance with those instructions. The host device 206 also instructs the action estimation device 205 to start action recognition and receives the recognition result from it.

An information processing device (computer), for example, is used as each of the camera control device 203, the image processing device 204, the action estimation device 205, and the host device 206. All of these devices can also be implemented on a single information processing device.

In this case, the video data acquired by the wide-field camera 215 and the high-resolution cameras 211 and 213, and image information such as the color of the object and optical flow, are stored in memory as the data to be processed.

FIG. 3 shows an arrangement example of the high-resolution cameras 211 and 213. In FIG. 3, the left camera 321 and the right camera 322 correspond to the high-resolution cameras 211 and 213, respectively. Convergence control governs the rotation of each of the left camera 321 and the right camera 322 about its vertical axis. When the target person 301 is photographed at high resolution, convergence control is performed individually for the left camera 321 and the right camera 322 to secure the overlap between their images that three-dimensional measurement requires.

FIG. 4 shows the concept of relative parallax in the arrangement of FIG. 3. When the left camera 321 and the right camera 322 photograph the person 301, the left camera 321 acquires an image 401 and the right camera 322 acquires an image 402. The images 401 and 402 include images of the face 311, the right arm 312, and the left arm 313 of the person 301.

Taking the horizontal direction of the images 401 and 402 as the x-axis and the vertical direction as the y-axis, when a point in three-dimensional space is captured by both cameras, the relative parallax of that point is defined, for example, by the following equation.

Relative parallax = (x-coordinate in the left camera image) − (x-coordinate in the right camera image)   (1)

Because convergence control involves only rotation about the vertical axis, no parallax arises in the y-coordinate.

As shown in FIG. 4, if the x-coordinates of the images 401 and 402 are provisionally defined as 0, 1, 2, and 3, the relative parallaxes of the points 411, 412, 413, and 414 are 0, 1, 2, and 3, respectively. The relative parallaxes of the points 421, 422, 423, 424, 431, 432, 433, 441, and 442 are −1, 0, 1, 2, −1, 0, 1, −1, and 0, respectively. It follows that the closer a point is to the left camera 321 and the right camera 322, the larger its relative parallax, and the farther a point is from them, the smaller its relative parallax.

In addition, points whose positions in the depth direction of the images 401 and 402 are close, such as the points 412, 423, and 433, differ little or not at all in relative parallax, whereas points whose depth positions are far apart, such as the points 412 and 431, differ greatly in relative parallax. Therefore, by comparing the relative parallaxes of the face 311, the right arm 312, and the left arm 313, the relative positional relationship of these parts in the depth direction can be determined.
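As a minimal sketch of equation (1) and of the depth comparison it enables, the Python snippet below computes relative parallax from a pair of x-coordinates and reports which of two points is nearer the cameras. The function names and the sample coordinates are illustrative assumptions, not values read from FIG. 4.

```python
def relative_parallax(x_left: float, x_right: float) -> float:
    """Equation (1): x in the left camera image minus x in the right camera image."""
    return x_left - x_right


def nearer_point(parallax_a: float, parallax_b: float) -> str:
    """A larger relative parallax means the point is closer to the cameras."""
    if parallax_a > parallax_b:
        return "a"
    if parallax_a < parallax_b:
        return "b"
    return "same depth"


# Hypothetical (left x, right x) pairs for a face point and a hand point.
face = relative_parallax(2, 1)   # relative parallax 1
hand = relative_parallax(3, 1)   # relative parallax 2
print(nearer_point(hand, face))  # the hand has the larger parallax, so it is nearer
```

Only differences between parallaxes are used here, which mirrors the text's point that relative depth ordering does not need absolute positions.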

FIG. 5 is a flowchart showing an example of the gazing point control processing performed by the camera control device 203 and the host device 206. While an object such as the person 301 is being photographed by the left camera 321 and the right camera 322 (step 501), the host device 206 checks, based on the color information of the object, whether the left camera 321 has captured the object (step 502).

If the left camera 321 has not captured the object (step 502, NO), the host device 206 instructs the image processing device 204, via the action estimation device 205, to calculate optical flow. The image processing device 204 calculates the optical flow of the object using the wide-area video from the wide-field camera 215 and outputs it to the host device 206 via the action estimation device 205 (step 503).

The host device 206 performs pan/tilt control of the camera head 201 via the camera control device 203, moving the line of sight in the direction indicated by the optical flow while searching for the object with the left camera 321 (step 504). The processing from step 502 onward is then repeated.

On the other hand, if the left camera 321 has captured the object (step 502, YES), the host device 206 checks, based on the color information of the object, whether the right camera 322 has captured the object (step 505).

If the right camera 322 has not captured the object (step 505, NO), the host device 206 performs convergence control of the right camera 322 via the camera control device 203, moving the line of sight of the right camera 322 horizontally while searching for the object with the right camera 322 (step 506). The processing from step 505 onward is then repeated.

On the other hand, if the right camera 322 has captured the object (step 505, YES), the processing from step 501 onward is repeated.
By continuing this gazing point control processing, the object can be kept in view of both the left and right cameras at all times. In the gazing point control processing of FIG. 5, the video from the left camera 321 is used for pan/tilt control and the video from the right camera 322 is used for convergence control, but the roles of the left camera 321 and the right camera 322 may be swapped.
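The branching of FIG. 5 can be sketched as a small decision function. This is an illustrative simplification under stated assumptions: the capture checks are passed in as booleans rather than computed from color information, the action names are hypothetical labels, and the left camera is re-checked on every pass, whereas the flowchart loops back to step 505 after step 506.

```python
def gaze_control_action(left_has_target: bool, right_has_target: bool) -> str:
    """Choose the next action for one pass through a FIG. 5 style loop."""
    if not left_has_target:
        # Steps 503-504: optical flow from the wide-field video, then pan/tilt search.
        return "pan_tilt_search"
    if not right_has_target:
        # Step 506: rotate the right camera horizontally (convergence control).
        return "converge_right_camera"
    # Both cameras hold the target, so imaging simply continues (step 501).
    return "keep_tracking"


def run_until_captured(observations):
    """Apply the decision to a sequence of (left, right) capture states."""
    actions = []
    for left_ok, right_ok in observations:
        action = gaze_control_action(left_ok, right_ok)
        actions.append(action)
        if action == "keep_tracking":
            break
    return actions


# Hypothetical sequence: target lost, then found by the left camera, then by both.
print(run_until_captured([(False, False), (True, False), (True, True)]))
```

Ordering the checks this way reflects the text: pan/tilt moves the whole head using the left camera's view, and only then is the right camera converged onto the same target.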

FIG. 6 is a flowchart showing an example of the motion recognition processing performed by the motion recognition apparatus of FIG. 2. The motion recognition apparatus performs the gazing point control processing shown in FIG. 5 to bring the target person into the fields of view of the left camera 321 and the right camera 322 (step 601). Next, the action estimation device 205 performs the processing of steps 601 to 606.

The action estimation device 205 first searches, by stereo matching, for corresponding points between the image from the left camera 321 and the image from the right camera 322 (step 602). Here, for example, each image from the left camera 321 and the right camera 322 is divided into a plurality of small areas, and the correlation between small areas of the two images is computed. Small areas with high correlation are recognized as images of the same part and are associated with each other. Then, for each pair of corresponding points (corresponding pixels) in the two associated small areas, the relative parallax is calculated by equation (1) and stored in memory. As the shape of a small area, for example, a rectangle consisting of a fixed number of pixels is used.
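A minimal sketch of this small-area matching is shown below. The text only says that correlation between small areas is used; the sketch substitutes the sum of absolute differences (SAD, where a lower cost means higher similarity) as a stand-in measure, and it searches along the x direction only, since no parallax arises in y. The function names, the tiny grayscale images, and the search range are all illustrative assumptions.

```python
def sad(left, right, y, xl, xr, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(left[y + dy][xl + dx] - right[y + dy][xr + dx])
    return total


def block_parallax(left, right, y, xl, size, max_shift):
    """Find the best-matching block in the right image and return the
    relative parallax per equation (1): left x minus right x."""
    width = len(right[0])
    lo = max(0, xl - max_shift)
    hi = min(width - size, xl + max_shift)
    best_xr, best_cost = None, None
    for xr in range(lo, hi + 1):
        cost = sad(left, right, y, xl, xr, size)
        if best_cost is None or cost < best_cost:
            best_xr, best_cost = xr, cost
    return xl - best_xr


# Hypothetical 2-row images: the pattern 9,5,7,3 sits 2 pixels further left
# in the right image than in the left image.
left_img = [[0, 0, 0, 0, 9, 5, 7, 3, 0, 0, 0, 0]] * 2
right_img = [[0, 0, 9, 5, 7, 3, 0, 0, 0, 0, 0, 0]] * 2
print(block_parallax(left_img, right_img, y=0, xl=4, size=2, max_shift=3))
```

A real implementation would typically use a normalized correlation measure and reject weak matches; those refinements are omitted here for brevity.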

Next, the face area and arm area (or hand area) of the person are identified using the image from either the left camera 321 or the right camera 322 (step 603). In the following, the term "arm area" also covers the hand area.

Here, for example, the areas of the person's skin-colored parts (skin color areas) are extracted by skin color extraction, and the person's face area is extracted by face image determination. The average relative parallax of the points in the extracted face area (the average parallax) is then compared with the average parallax of each skin color area, and a skin color area that lies around the face area and has an average parallax close to that of the face area is detected as an arm area.

For example, the centroid coordinates of the face area and of each skin color area are calculated, and the distance between the centroid of the face area and the centroid of the skin color area is computed; if that distance is smaller than a threshold, the skin color area is determined to be around the face area. Likewise, the difference between the average parallax of the face area and the average parallax of the skin color area is computed; if that difference is smaller than a threshold, the parallax of the skin color area is determined to be close to that of the face area. The information on the face area and arm areas extracted in this way is stored in memory.
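The two threshold tests above can be sketched as follows. The `Region` type, the function name, and the numeric thresholds are illustrative assumptions; the text does not give concrete values, and the Euclidean distance between centroids is one plausible reading of "distance".

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    cx: float             # centroid x-coordinate in the image
    cy: float             # centroid y-coordinate in the image
    mean_parallax: float  # average relative parallax of the region


def detect_arm_regions(face, skin_regions, max_distance, max_parallax_diff):
    """Keep skin color regions that are near the face in the image and
    close to it in average parallax (that is, at a similar depth)."""
    arms = []
    for region in skin_regions:
        distance = math.hypot(region.cx - face.cx, region.cy - face.cy)
        parallax_diff = abs(region.mean_parallax - face.mean_parallax)
        if distance < max_distance and parallax_diff < max_parallax_diff:
            arms.append(region)
    return arms


# Hypothetical scene: one hand of the target person, one hand of a person
# standing further back, and one distant skin-colored patch.
face = Region(100, 50, 10)
candidates = [Region(130, 90, 11), Region(140, 80, 3), Region(400, 50, 10)]
print(detect_arm_regions(face, candidates, max_distance=80, max_parallax_diff=3))
```

The second candidate is as close to the face in the image as the first, and is rejected only by the parallax test, which is the case that color information alone cannot resolve.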

Next, the width, the height, and the range of parallax variation of the face region are calculated (step 604), and the position of each arm region relative to the face region is estimated (step 605). For example, the positional relationship is estimated by the following procedure.
1. Calculate the centroid coordinates (fx, fy) of the face region, the width fw and height fh of the face region, the average parallax fmd of the face region, and the range of parallax variation fdd. The range of parallax variation is the difference between the maximum and minimum parallax within the face region. Errors may be taken into account at this point, for example by excluding outliers.
2. Select one unprocessed arm region.
3. Calculate the centroid coordinates (ax, ay) of the arm region and the average parallax amd of the arm region.
4. Determine the left-right relationship (α is a positive constant).
If fx − ax < −α × fw, the arm region is determined to be to the right of the face region.
If fx − ax > α × fw, the arm region is determined to be to the left of the face region.
If −α × fw ≦ fx − ax ≦ α × fw, the arm region is determined to be at the center of the face region.
5. Determine the up-down relationship (β is a positive constant).
If fy − ay < −β × fh, the arm region is determined to be above the face region.
If fy − ay > β × fh, the arm region is determined to be below the face region.
If −β × fh ≦ fy − ay ≦ β × fh, the arm region is determined to be at the center of the face region.
6. Determine the depth relationship (γ is a positive constant).
If fmd − amd < −γ × fdd, the arm region is determined to be in front of the face region.
If fmd − amd > γ × fdd, the arm region is determined to be behind the face region.
If −γ × fdd ≦ fmd − amd ≦ γ × fdd, the arm region is determined to be at the center of the face region.
7. Steps 3 to 6 may be repeated for another arm region (or hand region).
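Steps 4 to 6 above apply the same three-way comparison to each axis, which the following sketch makes explicit. The dictionary keys mirror the symbols in the procedure (fx, fy, fw, fh, fmd, fdd, ax, ay, amd); the function names and the English labels are illustrative assumptions, and suitable values of α, β, and γ are not given in the description.

```python
def classify_axis(diff, threshold, negative_label, positive_label):
    """Three-way comparison of a face-minus-arm difference against +/- threshold."""
    if diff < -threshold:
        return negative_label
    if diff > threshold:
        return positive_label
    return "center"

def relative_position(face, arm, alpha, beta, gamma):
    """Return the (left-right, up-down, depth) position of the arm relative to the face."""
    horizontal = classify_axis(face["fx"] - arm["ax"], alpha * face["fw"],
                               "right", "left")
    vertical = classify_axis(face["fy"] - arm["ay"], beta * face["fh"],
                             "above", "below")
    depth = classify_axis(face["fmd"] - arm["amd"], gamma * face["fdd"],
                          "front", "behind")
    return (horizontal, vertical, depth)
```

For instance, with a face at fx = 100, fy = 100, fw = 40, fh = 50, fmd = 10, fdd = 2, an arm at ax = 150, ay = 60, amd = 12, and α = β = γ = 0.5, the function returns `("right", "below", "front")`. Scaling each threshold by the face's own width, height, and parallax range keeps the decision roughly independent of how large the person appears in the image.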

With this positional relationship estimation, the relative position of the face and the arm can be estimated by computing only a few simple parameters. In addition, by setting appropriate values of α, β, and γ based on simulation or the like, the thresholds −α × fw, α × fw, −β × fh, β × fh, −γ × fdd, and γ × fdd can be tuned to improve the estimation accuracy.

Note that, instead of the average parallax of the face region, the skin color regions, and the arm regions, a parallax obtained by another statistical calculation, such as a weighted average, may be used.

Once the position of the arm region relative to the face region has been estimated, the person's action is estimated from the time-series information on the arm position and from the situation around the person, such as nearby objects (step 606). For example, the position of the arm region relative to the face region is recorded in the memory in time series, and the person's action is recognized by collating the recorded time-series information with pre-registered motion patterns.

For example, a possible arm motion pattern for the action of taking medicine is one in which the hand moves in the order medicine → mouth → cup → mouth. In this case, a recognition table such as that shown in FIG. 7 is registered in the storage device of the action estimation device 205 as the motion pattern, and an object table such as that shown in FIG. 8 is registered as the positions of objects.

The stage of the action to be recognized in FIG. 7 represents the temporal order of the motions included in the action, and the arm position represents a position defined by the relative positional relationship described for step 605 of FIG. 6. For example, “center/center/center” indicates that the arm is at the position of the face (the position of the mouth) in all three directions: left-right, up-down, and depth.

In addition, “medicine” and “cup” refer to the positions of objects defined by the object table of FIG. 8. In this example, the medicine is at the “right/below/front” position, that is, to the right of the face, below the face, and in front of the face. The cup is at the “center/below/front” position, that is, aligned with the face in the left-right direction, below the face, and in front of the face. The positions of these objects are assumed to have been obtained in advance from the video of the wide-field camera 215, prior knowledge, past video history, or other sensor information. The table of FIG. 8 can also be updated at any time as the objects move.

The required time in FIG. 7 represents how long the arm position at each stage lasts. For example, the time during which the arm is at the position of the medicine is set to 2 seconds, and the time to move from the medicine to the cup is set to 3 seconds.

The motion pattern information of FIG. 7 can also be generated by recording, in time series, the relative positions estimated by the motion recognition process of FIG. 6 from video of the action of taking medicine shot as a model case, and associating the recorded positions with the stages of the action.

FIG. 9 is a flowchart showing an example of the action recognition process using the recognition table and the object table. The action estimation device 205 first initializes the stage of the action to be recognized to 0 (step 901) and acquires the arm position for the current stage from the recognition table (step 902). If the acquired position refers to an object, the position of that object is acquired from the object table (step 903).

Next, the posture of the person is estimated by the processing of steps 601 to 605 in FIG. 6 (step 905), and it is checked whether the estimated arm position matches the acquired arm position (step 906).

If the estimated arm position matches the acquired arm position (step 906, YES), the current stage is incremented by 1 (step 907), and it is checked whether the current stage has reached the end stage (step 908). In the case of FIG. 7, stage “5” corresponds to the end stage. If the current stage has not reached the end stage (step 908, NO), the current time is recorded (step 909), and the processing from step 902 onward is repeated.

If, on the other hand, the estimated arm position does not match the acquired arm position (step 906, NO), the required time for the current stage is acquired from the recognition table, and it is checked whether at least the required time has elapsed since the current stage was entered (step 910).

If the required time has not yet elapsed (step 910, NO), the processing from step 903 onward is repeated. If the required time has elapsed (step 910, YES), the processing from step 901 onward is repeated.

When the current stage reaches the end stage (step 908, YES), the process ends. At this point, the action of the motion pattern registered in the recognition table is output to the host device 206 as the recognition result. In the case of FIG. 7, the action of taking medicine is output as the recognition result.
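The loop of FIG. 9 can be read as a small state machine, sketched below under several assumptions. The contents of FIGS. 7 and 8 are not reproduced in this text, so the tables here are illustrative: the object positions and the medicine → mouth → cup → mouth order follow the description, while the time limits and the stage count are placeholders. The sketch also restarts the stage timer on a reset, which the flowchart description does not specify. All names (`OBJECT_TABLE`, `RECOGNITION_TABLE`, `resolve`, `recognize`) are hypothetical.

```python
# Object positions relative to the face, as in the example of the description.
OBJECT_TABLE = {
    "medicine": ("right", "below", "front"),
    "cup": ("center", "below", "front"),
}
MOUTH = ("center", "center", "center")

# One entry per stage: (expected arm position or object name, time limit in seconds).
# The time limits are placeholders, not the values of FIG. 7.
RECOGNITION_TABLE = [
    ("medicine", 2.0),
    (MOUTH, 3.0),
    ("cup", 3.0),
    (MOUTH, 3.0),
]

def resolve(position):
    """Look up object names in the object table; pass explicit positions through."""
    return OBJECT_TABLE[position] if isinstance(position, str) else position

def recognize(observations, table=RECOGNITION_TABLE):
    """Return True once the observations complete every stage of the pattern.

    observations: iterable of (timestamp_seconds, estimated_position) pairs.
    """
    stage = 0
    stage_start = None
    for t, estimated in observations:
        if stage_start is None:
            stage_start = t
        expected, limit = table[stage]
        if estimated == resolve(expected):
            stage += 1                    # arm reached the expected position
            if stage == len(table):
                return True               # end stage reached
            stage_start = t               # record when the new stage began
        elif t - stage_start >= limit:
            stage = 0                     # too slow: start over from stage 0
            stage_start = t
    return False
```

Storing object names in the recognition table and resolving them at match time means the same motion pattern keeps working when the object table is updated with new object positions.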

The motion recognition devices of FIGS. 1 and 2 can be applied, for example, to a life support robot that recognizes a person's actions and supports daily life. This makes it possible to provide services such as recognizing that the target person has taken medicine and helping to prevent double doses and missed doses.

Note that the action of taking medicine is only one example of a recognition target. Motion patterns for a variety of actions can be defined by changing the contents of the recognition table and the object table. One or more motion patterns are registered in the motion recognition device in advance according to the application.

Furthermore, the range of recognition targets can be extended by detecting the movement of objects around the person in addition to the arm motion pattern. For example, to recognize the action of opening or closing a door or a piece of furniture, the optical flow of the door or furniture is calculated from the wide-area video and collated with pre-registered optical flow patterns to detect the opening or closing.

The camera control device 203, the image processing device 204, the action estimation device 205, and the host device 206 in FIG. 2 can be implemented using, for example, an information processing device such as that shown in FIG. 10. The information processing device of FIG. 10 includes a Central Processing Unit (CPU) 1001, a memory 1002, an input device 1003, an output device 1004, an external storage device 1005, a medium drive device 1006, and a network connection device 1007, which are connected to one another by a bus 1008.

The memory 1002 includes, for example, a Read Only Memory (ROM) and a Random Access Memory (RAM), and stores the programs and data used for the motion recognition process. For example, the CPU 1001 performs the motion recognition process by executing a program using the memory 1002.

The input device 1003 is, for example, a keyboard or a pointing device, and is used to input instructions and information from an operator. The output device 1004 is, for example, a display, a printer, or a speaker, and is used to output inquiries to the operator and processing results.

The external storage device 1005 is, for example, a magnetic disk device, an optical disk device, a magneto-optical disk device, or a tape device. The information processing device stores programs and data in the external storage device 1005 and loads them into the memory 1002 for use as needed. The external storage device 1005 is also used as a database that stores the recognition table and the object table.

The medium drive device 1006 drives a portable recording medium 1009 and accesses its recorded contents. The portable recording medium 1009 is any computer-readable recording medium, such as a memory card, a flexible disk, an optical disk, or a magneto-optical disk. The operator stores programs and data on the portable recording medium 1009 and loads them into the memory 1002 for use as needed.

The network connection device 1007 is connected to a communication network and performs the data conversion required for communication. The information processing device receives programs and data from an external device via the network connection device 1007 as needed and loads them into the memory 1002 for use.

FIG. 11 shows a method of providing programs and data to the information processing device of FIG. 10. Programs and data stored on the portable recording medium 1009 or in the database 1111 of the external device 1101 are loaded into the memory 1002 of the information processing device 1102. The external device 1101 generates a carrier signal that carries the programs and data and transmits it to the information processing device 1102 via a transmission medium on the communication network. The CPU 1001 executes the programs using the data and performs the processing described above.

Although the disclosed embodiments and their advantages have been described in detail, those skilled in the art will be able to make various changes, additions, and omissions without departing from the scope of the invention as explicitly set forth in the claims.

FIG. 1 is a configuration diagram of the first motion recognition device.
FIG. 2 is a configuration diagram of the second motion recognition device.
FIG. 3 is a diagram showing the arrangement of the left camera and the right camera.
FIG. 4 is a diagram showing relative parallax.
FIG. 5 is a flowchart of the gaze point control process.
FIG. 6 is a flowchart of the motion recognition process.
FIG. 7 is a diagram showing the recognition table.
FIG. 8 is a diagram showing the object table.
FIG. 9 is a flowchart of the action recognition process.
FIG. 10 is a configuration diagram of the information processing device.
FIG. 11 is a diagram showing a method of providing programs and data.

Explanation of symbols

101 Wide-field imaging unit
102 High-resolution imaging unit
103 Motion/face detection unit
104 Gaze point control unit
105 Posture estimation unit
106 Action recognition unit
201 Camera head
202 Pan/tilt control motor
203 Camera control device
204 Image processing device
205 Action estimation device
206 Host device
211, 213 High-resolution cameras
212, 214 Convergence control motors
215 Wide-field camera
301 Person
311 Face
312 Right arm
313 Left arm
321 Left camera
322 Right camera
401, 402 Images
411, 412, 413, 414, 421, 422, 423, 424, 431, 432, 433, 441, 442 Points
1001 CPU
1002 Memory
1003 Input device
1004 Output device
1005 External storage device
1006 Medium drive device
1007 Network connection device
1008 Bus
1009 Portable recording medium
1101 External device
1102 Information processing device
1111 Database

Claims

A motion recognition device comprising:
first imaging means for imaging a person;
second imaging means for imaging the person;
parallax calculation means for dividing the image obtained by each of the first and second imaging means into a plurality of small regions, associating with each other the small regions of the respective images that are recognized as having captured the same part of the person, and calculating the distance between corresponding points of the two associated small regions as parallax;
detecting means for comparing the parallax of a face region recognized as an image of the person's face with the parallax of a skin color region recognized as an image of a skin-colored part other than the person's face, and detecting, as an arm region, a skin color region whose distance from the face region is smaller than a first threshold and whose parallax difference from the face region is smaller than a second threshold; and
recognizing means for recognizing the motion of the person's arm based on the relative position of the arm region with respect to the face region and the difference between the parallax of the face region and the parallax of the arm region.

The motion recognition device according to claim 1, wherein the recognizing means obtains the in-image position of the arm region relative to the face region by comparing, in the image obtained by the first or second imaging means, the difference between the coordinate value of the face region and the coordinate value of the arm region with a third threshold, obtains the depth position of the arm relative to the face by comparing the difference between the parallax of the face region and the parallax of the arm region with a fourth threshold, and obtains the posture of the person from the obtained in-image position and depth position.

The motion recognition device according to claim 2, further comprising storage means for storing a time-series pattern of the relative in-image position of the arm region with respect to the face region and the relative depth position of the arm with respect to the face, wherein the recognizing means records the obtained in-image position and depth position in time series and recognizes the motion of the person by collating the time-series pattern of the recorded in-image position and depth position with the time-series pattern stored in the storage means.

A program for causing a computer to execute processing comprising:
controlling first and second imaging means to image a person;
dividing the image obtained by each of the first and second imaging means into a plurality of small regions;
associating with each other the small regions of the respective images that are recognized as having captured the same part of the person;
calculating the distance between corresponding points of the two associated small regions as parallax;
comparing the parallax of a face region recognized as an image of the person's face with the parallax of a skin color region recognized as an image of a skin-colored part other than the person's face;
detecting, as an arm region, a skin color region whose distance from the face region is smaller than a first threshold and whose parallax difference from the face region is smaller than a second threshold; and
recognizing the motion of the person's arm based on the relative position of the arm region with respect to the face region and the difference between the parallax of the face region and the parallax of the arm region.

A motion recognition method comprising:
imaging a person with first and second imaging means; and
by a computer,
dividing the image obtained by each of the first and second imaging means into a plurality of small regions;
associating with each other the small regions of the respective images that are recognized as having captured the same part of the person;
calculating the distance between corresponding points of the two associated small regions as parallax;
comparing the parallax of a face region recognized as an image of the person's face with the parallax of a skin color region recognized as an image of a skin-colored part other than the person's face;
detecting, as an arm region, a skin color region whose distance from the face region is smaller than a first threshold and whose parallax difference from the face region is smaller than a second threshold; and
recognizing the motion of the person's arm based on the relative position of the arm region with respect to the face region and the difference between the parallax of the face region and the parallax of the arm region.