JP2006268607A

JP2006268607A - Communication robot and motion identification system using it

Info

Publication number: JP2006268607A
Application number: JP2005087651A
Authority: JP
Inventors: Kazuhiko Shinosawa; 一彦篠沢; Kiyoshi Kogure; 潔小暮
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2005-03-25
Filing date: 2005-03-25
Publication date: 2006-10-05
Anticipated expiration: 2025-03-25
Also published as: JP4613284B2

Abstract

PROBLEM TO BE SOLVED: To provide a communication robot accurately specifying a human motion while reducing a calculation time, and a motion identification system using it.

SOLUTION: The motion identification system 10 comprises the communication robot 12. The communication robot 12 detects a tag ID of an article owned by a person or present in the vicinity of the person. The communication robot 12 determines motion candidates of the person from the detected tag ID. The communication robot 12 takes an image of the person by an eye camera. The communication robot calculates the similarity of each of the motion candidates to the taken image, and specifies a motion shown by the taken image as a motion with the highest similarity.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a communication robot and a motion identification system using it, and more particularly to a communication robot, and a motion identification system using it, that identify human motions, for example.

The present applicant has proposed communication robots that interact with humans, as typified by Patent Document 1.
Patent Document 1: Japanese Patent Application Laid-Open No. 2002-355783

When a communication robot of the background art is used to identify the motions of people around it, videos of every human motion (behavior) are, for example, recorded in advance, the captured video is compared with the pre-recorded videos, and one motion is identified. This requires an enormous amount of computation. Moreover, when a motion is judged from video alone, a motion that differs from the actual one but looks similar on video may be identified by mistake.

Therefore, a main object of the present invention is to provide a novel communication robot and a motion identification system using it.

Another object of the present invention is to provide a communication robot, and a motion identification system using it, that can accurately identify human behavior while reducing computation.

The invention of claim 1 is a communication robot comprising: photographing means for photographing a human; identification information detecting means for detecting identification information about an object existing in the vicinity of the human; motion predicting means for predicting a motion of the human based on the identification information detected by the identification information detecting means; and motion specifying means for specifying the actual motion of the human from the actual motion shown by the image captured by the photographing means and the predicted motion predicted by the motion predicting means.

In the invention of claim 1, the communication robot comprises photographing means, identification information detecting means, motion predicting means, and motion specifying means. The photographing means photographs, for example, a human. The identification information detecting means detects identification information about objects (articles) in the robot's surroundings within the environment where the robot exists, or more precisely, about articles in the vicinity of the human. In the embodiment, it detects the tag ID transmitted from a wireless tag attached to an article. The motion predicting means predicts the human's motion based on the identification information detected by the identification information detecting means. For example, if the human is holding a pencil, or a pencil is near the human, the motion can be predicted to be writing characters or the like, reading what has been written, and so on. The motion specifying means then specifies the actual motion from the actual human motion shown by the image captured by the photographing means and the predicted motion predicted by the motion predicting means.

According to the invention of claim 1, motions are predicted based on the objects existing in the vicinity of the human, and the actual motion is specified from among the predicted motions, so the specifying process is far lighter than when one motion must be specified from among all possible motions. In addition, because one motion is specified from the combination of the object and the actual motion, the motion can be specified accurately.

The invention of claim 2 depends on claim 1 and further comprises similarity calculating means for calculating the similarity between the actual human motion shown by the captured image and each of the predicted motions, wherein the motion specifying means specifies the predicted motion with the highest similarity calculated by the similarity calculating means as the actual motion.

In the invention of claim 2, the communication robot further comprises similarity calculating means. The similarity calculating means calculates the similarity between the actual human motion shown by the captured image and each of the predicted motions. The motion specifying means specifies the predicted motion with the highest similarity as the actual motion.

In the invention of claim 2, similarity is calculated only against the predicted motions narrowed down by the identification information, so the processing load is lighter than when similarity is calculated against all motions. In addition, because the actual motion is specified from the narrowed-down predicted motions, identification accuracy can be higher than when similarity is calculated for all motions.
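The selection step of claim 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the feature vectors, the use of cosine similarity, and the names `cosine_similarity` and `identify_motion` are all hypothetical, since the text here does not say how similarity is computed.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify_motion(observed, templates, candidates):
    """Score only the narrowed-down candidates and return the best one with all scores."""
    scores = {name: cosine_similarity(observed, templates[name]) for name in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical templates; real features would be derived from the camera image.
templates = {
    "hold": [1.0, 0.0, 0.0],
    "write": [0.0, 1.0, 0.0],
    "cut": [0.0, 0.0, 1.0],
    "wash": [0.5, 0.5, 0.5],
}
observed = [0.1, 0.9, 0.2]

# Only the two candidates predicted from the detected article are scored.
best, scores = identify_motion(observed, templates, ["hold", "write"])
```

In this sketch the templates for "cut" and "wash" are never compared with the observation, which is the source of both the reduced computation and the smaller chance of confusing similar-looking motions.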

The invention of claim 3 is a motion identification system comprising a communication robot and a server provided so as to be able to communicate with the communication robot, wherein the communication robot comprises: photographing means for photographing a human; identification information detecting means for detecting identification information about an object existing in the vicinity of the human; identification information transmitting means for transmitting the detected identification information to the server; predicted motion information receiving means for receiving predicted motion information of the human from the server; and motion specifying means for specifying the actual motion of the human from the actual motion shown by the image captured by the photographing means and the predicted motion information received by the predicted motion information receiving means, and wherein the server comprises: identification information receiving means for receiving the identification information transmitted by the identification information transmitting means; and predicted motion information transmitting means for transmitting, to the communication robot, predicted motion information of the human obtained based on the identification information received by the identification information receiving means.

The invention of claim 3 is a motion identification system comprising a communication robot and a server provided so as to be able to communicate with the communication robot. In this motion identification system, when the communication robot detects identification information of an object existing in the vicinity of a human, it transmits that identification information to the server. The server predicts motions based on the identification information and transmits predicted motion information to the communication robot. That is, the invention of claim 3 differs from the invention of claim 1 in that a server is provided as the means, or role, for predicting motions.

According to the invention of claim 3, as with the invention of claim 1, the computation needed to specify a motion can be greatly reduced. In addition, the computational burden on the robot can be lightened.
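The division of roles in claim 3 can be sketched as below. It is only an illustration: the class names, the tag ID strings, and the `score_motion` callback are hypothetical, and an in-process method call stands in for the communication link between robot and server, which the text here does not specify.

```python
class PredictionServer:
    """Stand-in for the server: maps received tag IDs to predicted motions."""

    def __init__(self, tag_to_motions):
        self._tag_to_motions = tag_to_motions

    def predict(self, tag_ids):
        """Return the predicted motions for the given tag IDs, without duplicates."""
        motions = []
        for tag_id in tag_ids:
            for motion in self._tag_to_motions.get(tag_id, []):
                if motion not in motions:
                    motions.append(motion)
        return motions


class Robot:
    """Stand-in for the robot: detects tags, asks the server, then specifies the motion."""

    def __init__(self, server):
        self._server = server

    def identify(self, detected_tag_ids, score_motion):
        # "Transmit" the tag IDs and "receive" the predicted motion information.
        predicted = self._server.predict(detected_tag_ids)
        if not predicted:
            return None
        # Motion specification stays on the robot in claim 3.
        return max(predicted, key=score_motion)


# Hypothetical data: one tagged article and made-up image-based scores.
server = PredictionServer({"0001": ["hold", "write"]})
robot = Robot(server)
image_scores = {"hold": 0.2, "write": 0.8}
result = robot.identify(["0001"], lambda motion: image_scores[motion])
unknown = robot.identify(["9999"], lambda motion: 0.0)
```

Moving `Robot.identify`'s scoring step into the server class as well would correspond to the variant of claim 4, where the robot only transmits the image and the identification information.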

The invention of claim 4 is a motion identification system comprising a communication robot and a server provided so as to be able to communicate with the communication robot, wherein the communication robot comprises: photographing means for photographing a human; identification information detecting means for detecting identification information about an object existing in the vicinity of the human; and transmitting means for transmitting, to the server, the image captured by the photographing means and the identification information detected by the identification information detecting means, and wherein the server comprises: receiving means for receiving the captured image and the identification information transmitted by the transmitting means; motion predicting means for predicting a motion of the human based on the identification information received by the receiving means; and motion specifying means for specifying the actual motion of the human from the actual motion shown by the captured image received by the receiving means and the predicted motion predicted by the motion predicting means.

The invention of claim 4 is the same as the invention of claim 3, except that the communication robot transmits the identification information and the captured image of the human to the server, and the server predicts motions based on the identification information and specifies the human motion shown by the captured image from each of the predicted motions and the captured image.

In the invention of claim 4 as well, as with the invention of claim 1, the computation needed to specify a motion can be greatly reduced. In addition, the computational burden on the robot can be lightened.

According to the present invention, human motions are estimated from the objects existing in the vicinity of the human, and one of the estimated motions is specified, so the amount of computation can be greatly reduced. In addition, because the motion is specified not from video alone but in combination with the object, the motion can be specified accurately.

The above object, other objects, features, and advantages of the present invention will become more apparent from the following detailed description of the embodiments given with reference to the drawings.

<First embodiment>
Referring to FIG. 1, a motion identification system (hereinafter simply "system") 10 according to the first embodiment includes a communication robot (hereinafter simply "robot") 12 and a plurality of tags 14. The robot 12 can perform body motions or actions such as gestures (hereinafter sometimes called "communication actions") with a human. Communication actions may also include conversation between the robot 12 and a human. The tags 14 are, for example, passive wireless tags (RF tags, IC tags, etc.) and are attached (affixed) to objects (omitted from FIG. 1) that exist at least in the environment where the robot 12 exists.

The robot 12 has a human-like body and uses that body to generate the complex body motions needed for communication. Specifically, referring to FIG. 2, the robot 12 includes a carriage 32, and wheels 34 that allow the robot 12 to move autonomously are provided on the underside of the carriage 32. The wheels 34 are driven by wheel motors (indicated by reference numeral "36" in FIG. 3, which shows the internal configuration of the robot 12) and can move the carriage 32, that is, the robot 12, in any direction: forward, backward, left, or right.

Although not shown in FIG. 2, a collision sensor (indicated by reference numeral "38" in FIG. 3) is attached to the front of the carriage 32 and detects contact of a person or other obstacle with the carriage 32. When contact with an obstacle is detected while the robot 12 is moving, the driving of the wheels 34 is stopped immediately, bringing the robot 12 to a sudden stop.

In this first embodiment, the robot 12 is about 100 cm tall so that it does not intimidate people, particularly children. However, this height can be changed as desired.

A sensor mounting panel 40 in the shape of a polygonal column is provided on the carriage 32, and an ultrasonic distance sensor 42 is attached to each face of the sensor mounting panel 40. The ultrasonic distance sensors 42 measure the distance between the mounting panel 40, that is, the robot 12, and its surroundings, mainly people.

The body of the robot 12 is also mounted upright on the carriage 32, with its lower part surrounded by the mounting panel 40 described above. The body consists of a lower body 44 and an upper body 46, which are connected by a connecting portion 48. Although not shown, the connecting portion 48 contains a lifting mechanism, which can be used to change the height of the upper body 46, that is, the height of the robot 12. As described later, the lifting mechanism is driven by a waist motor (indicated by reference numeral "50" in FIG. 3). The 100 cm height of the robot 12 mentioned above is the value when the upper body 46 is at its lowest position. The robot 12 can therefore be made taller than 100 cm.

One omnidirectional camera 52 and one microphone 16 are provided near the center of the upper body 46. The omnidirectional camera 52 photographs the surroundings of the robot 12 and is distinct from the eye cameras 54 described later. The microphone 16 picks up ambient sound, particularly human voices.

Upper arms 58R and 58L are attached to the two shoulders of the upper body 46 by shoulder joints 56R and 56L, respectively. The shoulder joints 56R and 56L each have three degrees of freedom. That is, the right shoulder joint 56R can control the angle of the upper arm 58R about each of the X, Y, and Z axes. The Y axis is parallel to the longitudinal direction (or axis) of the upper arm 58R, and the X and Z axes are orthogonal to the Y axis from different directions. The left shoulder joint 56L can control the angle of the upper arm 58L about each of the A, B, and C axes. The B axis is parallel to the longitudinal direction (or axis) of the upper arm 58L, and the A and C axes are orthogonal to the B axis from different directions.

Forearms 62R and 62L are attached to the distal ends of the upper arms 58R and 58L via elbow joints 60R and 60L, respectively. The elbow joints 60R and 60L can control the angles of the forearms 62R and 62L about the W axis and the D axis, respectively.

For the X, Y, Z, and W axes and the A, B, C, and D axes, which control the displacement of the upper arms 58R and 58L and the forearms 62R and 62L (all shown in FIG. 2), "0 degrees" is the home position. In this home position, the upper arms 58R and 58L and the forearms 62R and 62L point downward.

Although not shown in FIG. 2, touch sensors (indicated collectively by reference numeral 64 in FIG. 3) are provided on the shoulder portions of the upper body 46, including the shoulder joints 56R and 56L, and on the arm portions, including the upper arms 58R and 58L and the forearms 62R and 62L. These touch sensors 64 detect whether a person has touched these parts of the robot 12.

Spheres 66R and 66L, which correspond to hands, are fixed to the distal ends of the forearms 62R and 62L, respectively. If finger functions are needed, unlike in the robot 12 of this first embodiment, "hands" in the shape of human hands can be used instead of the spheres 66R and 66L. The sphere 66R is provided with a right-hand tag reader 102, and the sphere 66L is provided with a left-hand tag reader 104. However, a tag reader may be provided on only one of the spheres or hands.

A head 70 is attached above the center of the upper body 46 via a neck joint 68. The neck joint 68 has three degrees of freedom and can be angle-controlled about each of the S, T, and U axes. The S axis points straight up from the neck, and the T and U axes are orthogonal to the S axis in different directions. A speaker 72 is provided on the head 70 at the position corresponding to a human mouth. The robot 12 uses the speaker 72 to communicate with the people around it by sound or voice. However, the speaker 72 may be provided on another part of the robot 12, for example the body.

The head 70 is also provided with eyeball portions 74R and 74L at the positions corresponding to eyes. The eyeball portions 74R and 74L include eye cameras 54R and 54L, respectively. The right eyeball portion 74R and the left eyeball portion 74L may be referred to collectively as the eyeball portions 74, and the right eye camera 54R and the left eye camera 54L as the eye cameras 54. The eye cameras 54 photograph the face and other parts of a person who has approached the robot 12, as well as objects and the like, and capture the resulting video signal.

Both the omnidirectional camera 52 and the eye cameras 54 described above may be cameras that use a solid-state image sensor such as a CCD or CMOS sensor.

For example, each eye camera 54 is fixed inside an eyeball portion 74, and the eyeball portion 74 is mounted at a predetermined position in the head 70 via an eyeball support (not shown). The eyeball support has two degrees of freedom and can be angle-controlled about each of the α and β axes. The α and β axes are defined relative to the head 70: the α axis points toward the top of the head 70, and the β axis is orthogonal both to the α axis and to the direction in which the front (face) of the head 70 faces. In this first embodiment, when the head 70 is at its home position, the α axis is set parallel to the S axis and the β axis parallel to the U axis. In such a head 70, rotating the eyeball support about the α and β axes displaces the front end of the eyeball portion 74, and hence of the eye camera 54, which moves the camera axis, that is, the line of sight.

For the α and β axes, which control the displacement of the eye cameras 54, "0 degrees" is the home position. In this home position, as shown in FIG. 2, the camera axis of each eye camera 54 points in the direction in which the front (face) of the head 70 faces, and the gaze is directed straight ahead.

FIG. 3 is a block diagram showing the internal configuration of the robot 12. As shown in FIG. 3, the robot 12 includes a microcomputer or CPU 76 for overall control. A memory 80, a motor control board 82, a sensor input/output board 84, and an audio input/output board 86 are connected to the CPU 76 through a bus 78.

Although not shown, the memory 80 includes a ROM, an HDD, a RAM, and the like, and the control programs and data for the robot 12 are stored in advance in the ROM or HDD. The CPU 76 executes processing according to these programs. Specifically, a plurality of programs for controlling the body motions of the robot 12 (called behavior modules) are stored. The body motions represented by the behavior modules include, for example, "handshake", "hug", and "banzai". When the behavior module for "handshake" is executed, the robot 12 extends its right hand forward, for example. When the behavior module for "hug" is executed, the robot 12 extends both hands forward, for example. When the behavior module for "banzai" is executed, the robot 12 raises and lowers both hands several times (for example, twice). The RAM is used as temporary storage and can also serve as working memory.
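One way to picture the behavior modules is as named routines in a lookup table, as in the sketch below. The function names, the string commands, and the `run_behavior` helper are hypothetical illustrations; the text only says that each module is a stored program that produces the corresponding body motion (a handshake, holding out both hands for a hug, or raising and lowering both hands in a "banzai" cheer).

```python
def handshake():
    """Handshake: extend the right hand forward."""
    return ["extend right hand forward"]

def hug():
    """Hug: extend both hands forward."""
    return ["extend both hands forward"]

def banzai(repeats=2):
    """Banzai: raise and lower both hands several times (twice in the example)."""
    steps = []
    for _ in range(repeats):
        steps += ["raise both hands", "lower both hands"]
    return steps

# Table from behavior name to the routine that produces the motion commands.
BEHAVIOR_MODULES = {"handshake": handshake, "hug": hug, "banzai": banzai}

def run_behavior(name):
    """Look up a behavior module by name and return its motion commands."""
    return BEHAVIOR_MODULES[name]()
```

Calling `run_behavior("banzai")` in this sketch yields two raise-and-lower cycles; on a real robot the strings would be replaced by control data sent to the motor control board.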

The motor control board 82 is composed of, for example, a DSP (Digital Signal Processor) and controls the motors that drive body parts such as the right arm, left arm, head, and eyes. That is, the motor control board 82 receives control data from the CPU 76 and adjusts the rotation angles of a total of four motors 88 (shown collectively in FIG. 3 as the "right arm motor"): three motors that control the angles of the X, Y, and Z axes of the right shoulder joint 56R and one motor that controls the angle of the W axis of the right elbow joint 60R. The motor control board 82 likewise adjusts the rotation angles of a total of four motors 90 (shown collectively in FIG. 3 as the "left arm motor"): three motors that control the angles of the A, B, and C axes of the left shoulder joint 56L and one motor that controls the angle of the D axis of the left elbow joint 60L. The motor control board 82 also adjusts the rotation angles of three motors 92 (shown collectively in FIG. 3 as the "head motor") that control the angles of the S, T, and U axes of the neck joint 68. The motor control board 82 also controls the waist motor 50 and the two motors 36 (shown collectively in FIG. 3 as the "wheel motor") that drive the wheels 34. Further, the motor control board 82 adjusts the rotation angles of two motors 94 (shown collectively in FIG. 3 as the "right eyeball motor") that control the angles of the α and β axes of the right eyeball portion 74R, and of two motors 96 (shown collectively in FIG. 3 as the "left eyeball motor") that control the angles of the α and β axes of the left eyeball portion 74L.
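The motor groups handled by the motor control board 82 can be summarized as a small table, sketched here as a Python dictionary. The dictionary layout, the `count_motors` helper, and the labels "lift", "wheel 1", and "wheel 2" are hypothetical; the waist motor 50 is counted as a single motor, which the text does not state explicitly.

```python
# Motor group (reference numeral) -> axes or units it drives.
MOTOR_GROUPS = {
    "right arm motor 88": ["X", "Y", "Z", "W"],      # 3 shoulder axes + 1 elbow axis
    "left arm motor 90": ["A", "B", "C", "D"],       # 3 shoulder axes + 1 elbow axis
    "head motor 92": ["S", "T", "U"],                # neck joint 68
    "waist motor 50": ["lift"],                      # lifting mechanism (assumed single motor)
    "wheel motor 36": ["wheel 1", "wheel 2"],        # two motors driving the wheels 34
    "right eyeball motor 94": ["alpha", "beta"],
    "left eyeball motor 96": ["alpha", "beta"],
}

def count_motors(groups):
    """Total number of individual motors across all groups."""
    return sum(len(axes) for axes in groups.values())

total_motors = count_motors(MOTOR_GROUPS)
```

Under these assumptions the board drives 18 individual motors in seven groups.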

Except for the wheel motors 36, the motors of this first embodiment described above are stepping motors or pulse motors in order to simplify control; however, like the wheel motors 36, they may be DC motors.

The sensor input/output board 84 is likewise composed of a DSP; it takes in signals from the sensors and cameras and passes them to the CPU 76. That is, data on the reflection time from each of the ultrasonic distance sensors 42 is input to the CPU 76 through the sensor input/output board 84. The video signal from the omnidirectional camera 52 is input to the CPU 76 after the sensor input/output board 84 has applied predetermined processing as needed. The video signals from the eye cameras 54 are passed to the CPU 76 in the same way. Signals from the touch sensors 64 are also passed to the CPU 76 via the sensor input/output board 84.
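As a worked illustration of how reflection-time data can be turned into a distance, the sketch below applies the usual time-of-flight relation d = v·t/2. It assumes that the reflection time is the round-trip time of the ultrasonic pulse and uses an approximate speed of sound in air of 343 m/s (about 20 °C); neither assumption comes from the text, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value for dry air at about 20 degrees C

def distance_from_echo(reflection_time_s, speed=SPEED_OF_SOUND_M_PER_S):
    """Convert a round-trip echo time in seconds to a one-way distance in metres."""
    if reflection_time_s < 0:
        raise ValueError("reflection time must be non-negative")
    # The pulse travels to the target and back, so halve the total path length.
    return speed * reflection_time_s / 2.0

# A 10 ms round trip corresponds to a target roughly 1.7 m away.
example_distance = distance_from_echo(0.010)
```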

Synthesized speech data is supplied from the CPU 76 to the speaker 72 via the audio input/output board 86, and the speaker 72 outputs sound or voice according to that data. Audio input from the microphone 24 is taken into the CPU 76 via the audio input/output board 86.

A communication LAN board 98 is also connected to the CPU 76 through the bus 78. The communication LAN board 98 is likewise composed of a DSP; it passes transmission data supplied by the CPU 76 to a wireless communication device 100, which transmits the data. The communication LAN board 98 also receives data via the wireless communication device 100 and passes the received data to the CPU 76.

Further, the tag reader 102, the tag reader 104 and the database 106 are connected to the CPU 76 via the bus 78. The database 106 need not be provided inside the robot 12; it may instead be provided outside the robot 12 so that the robot 12 can communicate with it. As described above, the tag reader 102 and the tag reader 104 are provided in the hands of the robot 12 (spheres 66R and 66L). They receive the tag information (tag ID) transmitted by a tag 14 attached to an object or the like present around the robot 12 and supply it to the CPU 76.

As shown in FIGS. 4 and 5, the database 106 stores data such as article data 106a, motion candidate data 106b and motion template data 106c. As shown in FIG. 4(A), the article data 106a lists, for each piece of tag identification information (tag ID), the name of the article (object) to which the corresponding tag 14 is attached. That is, the article data 106a is table data for identifying an article from a tag ID. As shown in FIG. 4(B), the motion candidate data 106b lists motion candidates for each article name. For example, when the article is a "pencil", motions such as "hold", "write", "look" and "place" are listed as motion candidates. When the article is a "knife", motions such as "hold", "place", "cut", "sharpen" and "wash" are listed. The reasoning is that a person with a pencil nearby is likely to hold (grip) the pencil, write characters with it, look at what has been written with it, or put it down. Likewise, a person with a knife nearby is likely to hold (grip) the knife, put it down, cut food (meat, fish, vegetables, fruit and so on) with it, sharpen it, or wash it. In other words, the motion candidate data 106b is table data for determining the candidate human motions (behaviors) that can be predicted (inferred) from an article (object).
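The two lookup tables can be sketched as plain dictionaries. This is only an illustration, not the format used in the database 106: the tag ID values and the names `ARTICLE_DATA`, `MOTION_CANDIDATES` and `candidates_for_tag` are assumptions, while the pencil and knife entries follow the examples in the text.

```python
# Hypothetical in-memory versions of the article data (FIG. 4(A)) and the
# motion candidate data (FIG. 4(B)). The tag ID strings are made up.
ARTICLE_DATA = {
    "0001": "pencil",
    "0002": "knife",
}

MOTION_CANDIDATES = {
    "pencil": ["hold", "write", "look", "place"],
    "knife": ["hold", "place", "cut", "sharpen", "wash"],
}


def candidates_for_tag(tag_id):
    """Return the motion candidates for the article carrying tag_id."""
    article = ARTICLE_DATA.get(tag_id)
    if article is None:
        return []  # unknown tag: no article, hence no candidates
    return MOTION_CANDIDATES.get(article, [])
```

Looking up the tag first and the article second mirrors the two-stage use of the tables described above: tag ID to article, then article to motion candidates.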

As shown in FIG. 5, the motion template data 106c lists, for each motion, the file name of a video (image) file obtained by photographing a person performing that motion. Although not illustrated, the actual image files are also stored in the database 106. Alternatively, the image file itself may be stored against each motion.

For example, as shown in FIG. 6, the system 10 is applied to a room 200. For simplicity, FIG. 6 shows only the robot 12 and some of the tags 14. As shown in FIG. 6, a person 202 is present in the room 200 and the robot 12 is nearby. A desk 204 is placed in the room 200, and a notebook 206 lies on the desk 204. In this example, the person 202 holds a pencil 208 and is writing characters or the like in the notebook 206. A trash can 210 stands in a corner of the room 200.

As described above, a tag 14 is attached to each article (object) present in the room 200. FIG. 6 shows a tag 14 attached only to the desk 204, but in practice tags 14 are also attached to the notebook 206, the pencil 208 and the trash can 210. Further, when the room 200 itself (or whether the location is indoors or outdoors) is to be identified, that is, when the place is to be identified, a tag 14 may be attached to the entrance, a wall or the like of the room 200.

The system 10 identifies (specifies) the motion (behavior) of the person 202. Specifically, the CPU 76 shown in FIG. 3 executes the motion identification process shown in FIGS. 7 and 8. As shown in FIG. 7, when the CPU 76 starts the motion identification process, it moves the tag reader 102 and the tag reader 104 in step S1 to detect tag IDs. More specifically, the robot 12 detects the person 202 in the image (video) captured by the eye camera 54, approaches the person 202, and detects articles (strictly speaking, tag IDs) present in the vicinity of the robot 12. Since the first embodiment identifies the motion of the person 202, what is detected, strictly speaking, is the articles present in the vicinity of the person 202 (for example, within about 1 to 2 m, that is, within arm's reach).

Because the motion of the person 202 is identified (specified) from the articles present near the person 202 and from a captured image (video) of the person 202, the robot 12 is moved close to the person 202. Although a detailed description is omitted, the robot 12 detects (estimates) the person 202 in the image captured by the eye camera 54 by pattern matching and advances in that direction. Alternatively, when a skin-color region is detected in the captured image, the robot 12 advances (or retreats) or rotates in the direction in which that skin-color region becomes larger. In this way, the robot 12 can approach the person 202.

Whether the robot 12 has come close to the person 202 may instead be judged from the detection result (distance) of the ultrasonic distance sensors 42, or from both the image captured by the eye camera 54 and the detection result of the ultrasonic distance sensors 42.

When the robot 12 has approached the person 202, it detects the tag IDs received by the tag reader 102 and the tag reader 104 provided in its two hands, that is, in the spheres 66R and 66L. In the first embodiment, the robot moves both arms, that is, the shoulder joints 56R, 56L and the elbow joints 60R, 60L, so that the tag reader 102 and the tag reader 104 can be moved freely, and a tag ID is detected by the tag reader 102 or the tag reader 104.

In the following step S3, it is determined whether a tag ID has been detected. If "NO" in step S3, that is, if no tag ID has been detected, the process returns to step S1, where the tag reader 102, the tag reader 104 or both are moved to detect a tag ID. If "YES" in step S3, that is, if a tag ID has been detected, then in step S5 the distance between the eye camera 54 and the tag reader 102 or 104 that detected the tag ID (if both detected it, either one may be used) is calculated and stored (temporarily) in the memory 80. The position of the eye camera 54 and the lengths of the arms (upper arms 58R, 58L, forearms 62R, 62L and so on) are fixed, and the tag readers 102 and 104 are mounted on the spheres 66R and 66L, so this distance is calculated from the angles of the shoulder joints 56R, 56L and the elbow joints 60R, 60L.
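The distance calculation in step S5 amounts to forward kinematics from the joint angles. A minimal sketch follows, assuming a simplified planar arm with one shoulder angle and one elbow angle; the real shoulder joints have more degrees of freedom, and the function name, the coordinate convention and all dimensions are hypothetical.

```python
import math


def reader_to_camera_distance(shoulder_offset, upper_arm, forearm,
                              shoulder_angle, elbow_angle):
    """Distance from the eye camera (origin) to the hand-mounted tag reader.

    Simplified planar model (an assumption, not the robot's real geometry):
    shoulder_offset is the (x, y) position of the shoulder relative to the
    camera in metres, shoulder_angle is measured from the x axis, and
    elbow_angle is measured relative to the upper arm (both in radians).
    """
    sx, sy = shoulder_offset
    # Elbow position: one upper-arm length from the shoulder.
    ex = sx + upper_arm * math.cos(shoulder_angle)
    ey = sy + upper_arm * math.sin(shoulder_angle)
    # Hand (tag reader) position: one forearm length from the elbow.
    hx = ex + forearm * math.cos(shoulder_angle + elbow_angle)
    hy = ey + forearm * math.sin(shoulder_angle + elbow_angle)
    return math.hypot(hx, hy)
```

Because the link lengths and the camera position are constants, only the two joint angles change between calls, which is why the text says the distance follows from the joint angles alone.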

The distance between the eye camera 54 and the tag readers 102, 104 is measured because, as described above, the tags 14 are of the passive type (a passive tag is read only at short range, so the position of the reader approximates the position of the article). The measurement gives the approximate distance between the eye camera 54 and the article, from which the distance between the person 202 and the article is ultimately estimated. Combining this result with the approach behavior described above, in which the robot advances so that the skin-color region becomes larger, makes it possible to detect the tag IDs of articles that the person 202 carries (or wears) or that are present near the person 202. Accordingly, although not illustrated, even when a tag ID is detected, it is excluded (rejected) from the detection results if, during the movement in step S1, no skin-color region was detected, or a skin-color region was detected but was so small that the robot 12 is judged to be far from the person 202. This is done so that the motion of the person 202 is identified accurately.
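The rejection rule can be sketched as a simple filter. The text gives no numerical criterion for a "relatively small" skin-color region, so the threshold `min_skin_ratio`, the representation of the region as a fraction of the image area, and the function name are all assumptions made for illustration.

```python
def filter_detections(detections, min_skin_ratio=0.05):
    """Keep only tag IDs detected while a person was judged to be close.

    detections: list of (tag_id, skin_ratio) pairs, where skin_ratio is the
    fraction of the captured image covered by the skin-color region, or None
    if no skin-color region was found. The 0.05 threshold is hypothetical.
    """
    kept = []
    for tag_id, skin_ratio in detections:
        if skin_ratio is not None and skin_ratio >= min_skin_ratio:
            kept.append(tag_id)
    return kept
```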

Next, in step S7, both arms (shoulder joints 56R, 56L and elbow joints 60R, 60L) are moved so that the tag reader 102 and the tag reader 104 are out of the field of view of the eye camera 54. It is preferable to move not only the tag readers 102, 104 but also the arms themselves out of view. The reason is that, as described later, the person 202 is photographed by the eye camera 54 and the motion of the person 202 is identified from that image, so the eye camera 54 should capture the person 202 with as little obstruction as possible.

In the following step S9, the detected tag ID, the detection time and the detection place are stored. Although omitted from FIG. 3, the detection time is obtained from a clock circuit (timer) provided inside the robot 12. As for the detection place, a tag assigned in advance to each place is installed there, so that detecting its tag ID identifies the place in the same way as an article. In the following step S11, it is determined whether a skin color has been detected in the captured image. If "NO" in step S11, that is, if no skin color is detected in the captured image, the process proceeds directly to step S15. If "YES" in step S11, that is, if a skin-color region is detected in the captured image, then in step S13 the eye camera 54 is turned so that the skin-color region comes to the center of the captured image. This is done so that the person 202 (and the person's motion) is photographed and the motion can be identified accurately. Not only the eye camera 54 but the whole robot 12 may be turned. In this way, the person 202 and the person's motion can be photographed.
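Step S13 needs to know how far the skin-color region is from the image center. One way to obtain that offset is from the centroid of a binary skin mask, as in the sketch below. The mask representation and the function name are assumptions; the text does not say how the region position is computed or how the offset is converted into a camera rotation.

```python
def centering_offset(mask):
    """Offset (dx, dy) in pixels from the image center to the skin centroid.

    mask: list of rows of 0/1 values, 1 marking a skin-colored pixel.
    Returns None when the mask contains no skin pixels (the "NO" branch of
    step S11, in which the camera is not turned).
    """
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    height, width = len(mask), len(mask[0])
    # Positive dx: region is right of center; positive dy: below center.
    return (cx - (width - 1) / 2, cy - (height - 1) / 2)
```

A controller would then turn the camera (or the whole robot) so as to drive both components of the offset toward zero.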

In the first embodiment, no processing is performed when no skin-color region is detected in the captured image; however, the robot 12 may instead be turned in order to find the person 202 (a skin-color region).

In the following step S15, it is determined whether all tag IDs have been read at the current place (current position). That is, it is determined whether all the articles (tag IDs) present around the person 202 have been detected by moving the robot 12, turning it, or moving its arms. If "NO" in step S15, that is, if not all tag IDs have been read at the current place, the process returns to step S1. If "YES" in step S15, that is, if all tag IDs have been read at the current place, then in step S17 shown in FIG. 8 the motion candidate patterns are determined based on the one or more detected tag IDs.

Specifically, the CPU 76 refers to the article data 106a and identifies the one or more articles corresponding to the detected tag IDs. The CPU 76 then refers to the motion candidate data 106b and determines the motion candidate patterns corresponding to the identified articles. When two or more articles are identified, only the motion candidate patterns common to all of the articles are selected. For example, when a "knife" and a "cutting board" are detected as articles, "hold", "place", "cut" and "wash" are selected as motion candidates. The motion candidate pattern for "sharpen", which is not shared by the two articles, is excluded.
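The selection in step S17 is a set intersection over the candidate lists of the detected articles. A minimal sketch, assuming the candidates are held as lists keyed by article name; the "cutting board" entry is hypothetical and was chosen to reproduce the result stated in the text.

```python
MOTION_CANDIDATES = {
    "knife": ["hold", "place", "cut", "sharpen", "wash"],
    # Hypothetical entry, consistent with the knife/cutting-board example.
    "cutting board": ["hold", "place", "cut", "wash"],
}


def common_candidates(articles, table=MOTION_CANDIDATES):
    """Return the motion candidates shared by every detected article."""
    lists = [table[name] for name in articles]
    if not lists:
        return []
    first, rest = lists[0], lists[1:]
    # Keep the order of the first article's list for stable output.
    return [motion for motion in first
            if all(motion in other for other in rest)]


print(common_candidates(["knife", "cutting board"]))
# prints ['hold', 'place', 'cut', 'wash']
```

With a single detected article the function returns that article's full candidate list, which matches the one-article case described above.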

Next, in step S19, the similarity between each motion candidate pattern and the captured image is calculated. Here, the CPU 76 calculates a similarity distance between each motion candidate pattern and the captured image. For example, the distance between the trajectory of a marker attached to the person 202 and the trajectory of the corresponding marker in each motion candidate pattern is calculated by dynamic programming.

Dynamic programming is well known and is not an essential part of the present invention, so a detailed description is omitted. It is described, for example, in: Katsuhiko Takahashi, Susumu Seki, Ryuichi Oka, "Spotting recognition of gesture moving images" (in Japanese), IEICE Technical Report PRU92-157, pp. 9-16, March 1993.
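As an illustration of a dynamic-programming trajectory distance, the sketch below uses plain dynamic time warping (DTW) over 2-D marker positions. This is a stand-in chosen for brevity, not necessarily the method of the cited report, and the function name and trajectory format are assumptions.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 2-D point trajectories.

    a, b: non-empty lists of (x, y) marker positions. A smaller value means
    the trajectories are more similar, even if they differ in speed.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances
                                    cost[i][j - 1],      # b advances
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

The warping step is what lets a slowly performed motion match a template recorded at a different speed, which a frame-by-frame comparison would penalize.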

In the following step S21, the motion is specified. Specifically, the motion of the motion candidate pattern with the smallest similarity distance calculated in step S19 (that is, the highest similarity) is specified as the motion of the person 202. In the next step S23, the specified motion, the time and the place are stored. For example, an action history table may be kept in the memory 80 or the database 106, and the specified motion, time and place may be recorded in it. The time and place stored in step S23 are the detection time and detection place stored in step S9. Then, in step S25, the motion identification result is output and the motion identification process ends. In step S25, for example, the identified motion can be output as speech from the speaker 72 of the robot 12. Alternatively, if a computer is connected so as to communicate with the robot 12 and the identification result is sent to that computer, the identified motion can be displayed as text on an image display device such as a monitor (not shown) connected to the computer, or output as speech from a speaker (not shown) connected to the computer.
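Steps S21 and S23 reduce to picking the candidate with the smallest distance and appending a record to a history table. A minimal sketch, in which the function names, the record layout and the example values are hypothetical:

```python
def identify_motion(distances):
    """Return the candidate motion with the smallest similarity distance.

    distances: dict mapping each motion candidate to its distance from the
    captured image (smaller means more similar).
    """
    if not distances:
        raise ValueError("no motion candidates to choose from")
    return min(distances, key=distances.get)


def record_motion(history, motion, detected_at, place):
    """Append one entry to the action history (step S23) and return it."""
    history.append({"motion": motion, "time": detected_at, "place": place})
    return history
```

For example, `identify_motion({"hold": 3.2, "write": 0.8, "look": 2.1})` selects `"write"`, which would then be stored together with the detection time and place saved in step S9.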

According to the first embodiment, articles present near the person are detected, motion candidates are determined from those articles, and the person's motion is identified from among those candidates. The amount of processing is therefore far smaller than when the person's motion is identified from among all possible motions.

Further, according to the first embodiment, the motion is identified based not only on the video but also on the articles, so it can be identified accurately. That is, because the motion candidates are narrowed down, the identification accuracy can be improved.

In the first embodiment, the specified motion is recorded together with the time and place at which it was detected (photographed), so the record can also be used as an activity log for a given person.
<Second embodiment>
The system 10 of the second embodiment, shown in FIG. 9, is the same as the embodiment described above except that the person's motion candidates are determined by a server provided so as to communicate with the robot 12; duplicate description is therefore omitted.

Referring to FIG. 9, in the system 10 of the second embodiment, the robot 12 is communicably connected to a server 22 via a network 20. A general-purpose server can be used as the server 22, and a general-purpose personal computer or workstation can be used in its place. The network 20 may be wired or wireless. In the system 10, a database 24 is connected to the server 22. The article data 106a and the motion candidate data 106b described in the first embodiment are stored in this database 24. In response to a query from the robot 12, the server 22 determines the motion candidates and sends them to the robot 12.

The motion identification process of the robot 12 (CPU 76) is almost the same as the process shown in the flowcharts of FIGS. 7 and 8, so only the differences are described. The processing shown in FIG. 7 is unchanged and is therefore not illustrated again.

As shown in FIG. 10, in step S17', the CPU 76 transmits the one or more detected tag IDs to the server 22. That is, it queries the server for motion candidates. Although not illustrated, when the server 22 receives one or more tag IDs from the robot 12, it determines the motion candidate patterns based on the tag IDs and sends them to the robot 12. Specifically, the server 22 refers to the article data 106a and identifies the one or more articles corresponding to the tag IDs received from the robot 12. The server 22 then refers to the motion candidate data 106b and determines the motion candidate patterns corresponding to the identified articles. When two or more articles are identified, only the motion candidate patterns common to all of the articles are selected. For example, when a "knife" and a "cutting board" are detected as articles, "hold", "place", "cut" and "wash" are selected as motion candidates. The motion candidate pattern for "sharpen", which is not shared by the two articles, is excluded.

Returning to FIG. 10, in step S18 it is determined whether the motion candidate patterns have been received. If "NO" in step S18, that is, if the motion candidate patterns have not been received, the process returns to step S18 and waits for them. If "YES" in step S18, that is, if the motion candidate patterns have been received, the similarity between each motion candidate pattern and the captured image is calculated in step S19. The subsequent processing is the same as that described with reference to FIG. 8.
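The query in step S17' and the reply awaited in step S18 can be sketched as a request/response pair. The text does not specify a message format or transport, so the JSON layout, the field names `tag_ids` and `candidates`, the tag ID values and the "cutting board" entry are all assumptions; the network itself is replaced here by a direct function call.

```python
import json

# Hypothetical server-side tables (article data and motion candidate data).
ARTICLE_DATA = {"0002": "knife", "0003": "cutting board"}
MOTION_CANDIDATES = {
    "knife": ["hold", "place", "cut", "sharpen", "wash"],
    "cutting board": ["hold", "place", "cut", "wash"],
}


def handle_query(request_text):
    """Server side: turn a tag-ID query into a list of motion candidates."""
    tag_ids = json.loads(request_text)["tag_ids"]
    articles = [ARTICLE_DATA[t] for t in tag_ids if t in ARTICLE_DATA]
    lists = [MOTION_CANDIDATES[a] for a in articles]
    if lists:
        # Keep only the candidates common to every identified article.
        common = [m for m in lists[0] if all(m in o for o in lists[1:])]
    else:
        common = []
    return json.dumps({"candidates": common})


# Robot side: build the query (step S17') and read the reply (step S18).
request = json.dumps({"tag_ids": ["0002", "0003"]})
reply = json.loads(handle_query(request))
print(reply["candidates"])  # prints ['hold', 'place', 'cut', 'wash']
```

Keeping the tables on the server means the robot only needs the motion templates locally, which is the division of labor the second embodiment describes.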

According to the second embodiment, human motion can be identified reliably with little processing, and because the motion candidates are determined on the server side, the processing load on the robot can be reduced.
<Third embodiment>
The system 10 of the third embodiment is the same as the system 10 of the second embodiment except that the motion of the person 202 is identified on the server 22 side; duplicate description is therefore omitted. Although not illustrated, in the system 10 of the third embodiment, the article data 106a, the motion candidate data 106b and the motion template data 106c are stored in the database 24 connected to the server 22. The database 106 inside the robot 12 can therefore be omitted.

Specifically, in the system 10 of the third embodiment, the robot 12 detects the tag IDs of articles carried by the person 202 or present near the person 202, and also acquires a captured image of the person 202 at that time. It then transmits the detected tag IDs and the captured image to the server 22. As described for the second embodiment, the server 22 determines the motion candidate patterns based on the tag IDs, calculates the similarity between each of those patterns and the captured image, and specifies the motion of the person 202 shown in the captured image.

Accordingly, in the motion identification process shown in FIGS. 7 and 10, steps S1 through S17' are executed on the robot 12 side and steps S18 through S25 are executed on the server 22 side. In step S17', however, both the detected tag IDs and the captured image are transmitted to the server 22.

Although a detailed description is omitted, the time and place stored in step S23 are, as explained for the first embodiment, the detection time and detection place stored in step S9. In the third embodiment, therefore, the detection time and detection place stored in step S9 are also transmitted to the server 22 in step S17'.

In the third embodiment as well, human motion can be identified reliably with little processing, and because the motion identification process is executed on the server side, the processing load on the robot can be reduced.

All of the embodiments described above use passive tags, but active tags (such as infrared tags) can be used in another embodiment. In that case, for example, an infrared tag is attached to each article, and two cameras (infrared cameras, that is, infrared sensors) are provided on the head of the robot 12 (above the eye camera 54). Triangulation on the detection results (captured images) of the two cameras then gives the distance between the robot 12 and the article, or the three-dimensional position of the article as seen from the robot 12. Similarly, triangulation on the detection results (captured images) of the eye camera 54 gives the distance between the robot 12 and the person 202, or the three-dimensional position of the person 202 as seen from the robot 12. From these, the distance between the person 202 and the article can be measured, so articles present near the person 202 can be detected. The processing after an article has been detected is the same as in the embodiments described above.
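Triangulation with two cameras can be illustrated with the standard rectified-stereo relation, depth = focal length × baseline / disparity. The sketch below assumes two parallel, calibrated cameras, which the text does not state, and the function names and numerical values are hypothetical.

```python
import math


def stereo_depth(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth in metres of a point seen by two parallel, rectified cameras.

    focal_px: focal length in pixels; baseline_m: distance between the two
    cameras; x_left_px, x_right_px: horizontal image coordinate of the same
    infrared tag in the left and right images.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front")
    return focal_px * baseline_m / disparity


def separation(position_a, position_b):
    """Euclidean distance between two 3-D positions, e.g. person and article."""
    return math.dist(position_a, position_b)
```

Once the article and the person 202 both have three-dimensional positions in the robot's frame, `separation` gives the person-to-article distance used to decide whether the article is near the person.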

FIG. 1 is an illustrative view showing an example of a motion identification system using the communication robot of the present invention.
FIG. 2 is an illustrative view for explaining the appearance of the robot shown in the FIG. 1 embodiment.
FIG. 3 is an illustrative view showing the electrical configuration of the robot shown in FIGS. 1 and 2.
FIG. 4 is an illustrative view showing examples of the article data and motion candidate data stored in the database built into the robot.
FIG. 5 is an illustrative view showing an example of the motion template data stored in the database built into the robot.
FIG. 6 is an illustrative view showing an application example of the system shown in FIG. 1.
FIG. 7 is a flowchart showing part of the motion identification process of the CPU shown in FIG. 3.
FIG. 8 is a flowchart showing another part of the motion identification process of the CPU shown in FIG. 3, following the flowchart of FIG. 7.
FIG. 9 is an illustrative view showing another example of a motion identification system using the communication robot of the present invention.
FIG. 10 is a flowchart showing part of the motion identification process of another embodiment.

Explanation of symbols

10 … Motion identification system using a communication robot
12 … Communication robot
14 … Tag
16 … Microphone
20 … Network
22 … Server
24, 106 … Database
38 … Collision sensor
42 … Ultrasonic distance sensor
52 … Omnidirectional camera
54 … Eye camera
64 … Touch sensor
76 … CPU
80 … Memory
82 … Motor control board
84 … Sensor input/output board
86 … Audio input/output board
88-96 … Motor
98 … Communication LAN board
100 … Wireless communication device
102, 104 … Tag reader

Claims

A communication robot comprising:
photographing means for photographing a person;
identification information detecting means for detecting identification information about an object present in the vicinity of the person;
motion predicting means for predicting motions of the person based on the identification information detected by the identification information detecting means; and
motion specifying means for specifying the actual motion of the person from the actual motion indicated by the captured image taken by the photographing means and the predicted motions predicted by the motion predicting means.

The communication robot according to claim 1, further comprising similarity calculating means for calculating the similarity between the actual motion of the person indicated by the captured image and each of the predicted motions,
wherein the motion specifying means specifies, as the actual motion, the predicted motion with the highest similarity calculated by the similarity calculating means.

A motion identification system comprising a communication robot and a server provided so as to communicate with the communication robot, wherein
the communication robot comprises:
photographing means for photographing a person;
identification information detecting means for detecting identification information about an object present in the vicinity of the person;
identification information transmitting means for transmitting the detected identification information to the server;
predicted motion information receiving means for receiving predicted motion information of the person from the server; and
motion specifying means for specifying the actual motion of the person from the actual motion indicated by the captured image taken by the photographing means and the predicted motion information received by the predicted motion information receiving means, and
the server comprises:
identification information receiving means for receiving the identification information transmitted by the identification information transmitting means; and
predicted motion information transmitting means for transmitting, to the communication robot, the predicted motion information of the person acquired based on the identification information received by the identification information receiving means.

A motion identification system comprising a communication robot and a server provided so as to communicate with the communication robot, wherein
the communication robot comprises:
photographing means for photographing a person;
identification information detecting means for detecting identification information about an object present in the vicinity of the person; and
transmitting means for transmitting, to the server, the captured image taken by the photographing means and the identification information detected by the identification information detecting means, and
the server comprises:
receiving means for receiving the captured image and the identification information transmitted by the transmitting means;
motion predicting means for predicting motions of the person based on the identification information received by the receiving means; and
motion specifying means for specifying the actual motion of the person from the actual motion indicated by the captured image received by the receiving means and the predicted motions predicted by the motion predicting means.