JP2004299006A

JP2004299006A - Robot device and face recognition state presenting method by robot

Info

Publication number: JP2004299006A
Application number: JP2003096135A
Authority: JP
Inventors: Ken Yamagishi; 建山岸; Kunihito Sawai; 邦仁沢井; Yasunori Kawanami; 康範川浪
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2003-03-31
Filing date: 2003-03-31
Publication date: 2004-10-28

Abstract

PROBLEM TO BE SOLVED: To easily indicate whether a face targeted for face recognition is within the visual field of a robot device, thereby presenting the face recognition state. SOLUTION: A cover portion 410 of an eye portion is given a mirror-finished surface. When a face portion 450 and a body portion 451 of a user are reflected on the left side of the mirror-finished surface of the cover portion 410 as seen from the front, the face is not recognized ((a) in Fig.). When the face portion 450 and the body portion 451 of the user are reflected on the right side of the mirror-finished surface of the cover portion 410 as seen from the front, the face is likewise not recognized ((b) in Fig.). However, when the body portion 451 is reflected in the lower center of the mirror-finished surface while the head overlaps a hole portion 412 and therefore cannot be seen by the user, the face is recognized ((c) in Fig.). The state of (c) in the Fig. therefore indicates that the face portion 450 is within the visual field of the camera lens of the robot. COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、小型カメラを視覚センサとして有し、外部環境に存在する人間の顔を認識するロボット装置及びロボット装置による顔認識状況の提示方法に関する。
【０００２】
【従来の技術】
電気的若しくは磁気的な作用を用いて人間の動作に似せた運動を行う機械装置のことを「ロボット」という。ロボットの語源は、スラブ語の”ＲＯＢＯＴＡ（奴隷機械）”に由来すると言われている。わが国では、ロボットが普及し始めたのは１９６０年代末からであるが、その多くは、工場における生産作業の自動化・無人化などを目的としたマニピュレータや搬送ロボットなどの産業用ロボット（ｉｎｄｕｓｔｒｉａｌｒｏｂｏｔ）であった。
【０００３】
最近では、イヌやネコ、クマのように４足歩行の動物の身体メカニズムやその動作を模したペット型ロボット、あるいは、ヒトやサルなどの２足直立歩行を行う動物の身体メカニズムや動作を模した「人間形」若しくは「人間型」のロボット（ｈｕｍａｎｏｉｄｒｏｂｏｔ）など、脚式移動ロボットの構造やその安定歩行制御に関する研究開発が進展し、実用化への期待も高まってきている。これら脚式移動ロボットは、クローラ式ロボットに比し不安定で姿勢制御や歩行制御が難しくなるが、階段の昇降や障害物の乗り越えなど、柔軟な歩行・走行動作を実現できるという点で優れている。
【０００４】
アーム式ロボットのように、ある特定の場所に植設して用いるような据置きタイプのロボットは、部品の組立・選別作業など固定的・局所的な作業空間でのみ活動する。これに対し、移動式のロボットは、作業空間は非限定的であり、所定の経路上または無経路上を自在に移動して、所定の若しくは任意の人的作業を代行したり、ヒトやイヌあるいはその他の生命体に置き換わる種々のサービスを提供することができる。
【０００５】
脚式移動ロボットの用途の１つとして、産業活動・生産活動等における各種の難作業の代行が挙げられる。例えば、原子力発電プラントや火力発電プラント、石油化学プラントにおけるメンテナンス作業、製造工場における部品の搬送・組立作業、高層ビルにおける清掃、火災現場その他における救助といったような危険作業・難作業の代行などである。
【０００６】
また、脚式移動ロボットの他の用途として、上述の作業支援というよりも、生活密着型、すなわち人間との「共生」あるいは「エンターティンメント」という用途が挙げられる。この種のロボットは、ヒトあるいはイヌ（ペット）、クマなどの比較的知性の高い脚式歩行動物の動作メカニズムや四肢を利用した豊かな感情表現を忠実に再現する。また、あらかじめ入力された動作パターンを単に忠実に実行するだけではなく、ユーザ（あるいは他のロボット）から受ける言葉や態度（「褒める」とか「叱る」、「叩く」など）に対して動的に対応した、生き生きとした応答表現を実現することも要求される。
【０００７】
従来の玩具機械は、ユーザ操作と応答動作との関係が固定的であり、玩具の動作をユーザの好みに合わせて変更することはできない。この結果、ユーザは同じ動作しか繰り返さない玩具をやがては飽きてしまうことになる。
【０００８】
これに対し、自律動作を行うインテリジェントなロボットは、一般に、外界の情報を認識してそれに対して自身の行動を反映させる機能を持っている。すなわち、ロボットは、外部環境からの音声や画像、触覚などの入力情報に基づいて感情モデルや本能モデルを変化させて動作を決定することにより、自律的な思考及び動作制御を実現する。すなわち、ロボットが感情モデルや本能モデルを用意することにより、より高度な知的レベルで人間とのリアリスティックなコミュニケーションを実現することも可能となる。
【０００９】
ロボットが環境変化に応じた自律動作を行うために、従来は、ある１つの観測結果に対してその情報を受けて行動を取るような単純な行動記述の組み合わせで行動を記述していた。これら入力に対する行動のマッピングにより、ランダム性、内部状態（感情・本能）、学習、成長などの機能を導入することで一意ではない複雑な行動の発現を可能にすることができる。この行動の発現を行うところを、行動生成部という。
【００１０】
例えば、２台のＣＣＤカメラからなるステレオカメラを両目として配した首部を胴体部に対して可動可能とし、さらも２足歩行を可能とする左右の脚部を有した自律型ロボット装置にあって、首部の動作を生成するような行動生成部を首部動作生成部という。
【００１１】
前記首部動作生成部を用いてロボットが使用者の顔など、人間の目に近い部分の認識をする必要がある場合、先ずロボット装置自体が前記２台のＣＣＤカメラからなるステレオカメラを両目として用い、人間の顔を撮影して得た画像を基に顔認識をする。
【００１２】
ところで、カメラにあって、撮影者自身による自己撮影を簡易に行うのに好適なカメラが特開平１４−０７２２９１号公報に開示されている。これは、カメラの前面に収納位置から露呈位置へと移動自在な反射部材を備え、その反射部材に撮影者が顔を写しだして自己撮影を簡易に行う技術である。
【００１３】
【特許文献１】
特開平１４−０７２２９１号
【００１４】
【発明が解決しようとする課題】
前記ロボット装置にあっては、前記特許文献１に開示された反射部材を首部に設けるのは、ヒューマノイド型であることを考慮すると、外観上問題がある。また、顔認識のための画像をＣＣＤカメラにて撮影するには精度の高い調整が必要であり、レンズの視野角との関係から前記反射部材の設置位置の設定が難しい。
【００１５】
しかし、前記ロボットが顔を撮影して得た画像を基に顔認識をする場合、顔を検出して顔の方向にＣＣＤカメラを向け、カメラの視野に顔を入れる必要がある。この際にロボットのカメラを動かすモータの動きが遅かったり、回転角範囲に限界があったりして顔を検出することに時間がかかってしまった場合、人間はもどかしさを感じたり、飽きてしまったりすることが考えられる。
【００１６】
このような状態を防ぐために、人間自身がロボットの視野に自分が入っているかどうかを容易に認識することができる手段はやはり必要である。
【００１７】
本発明は、前記実情に鑑みてなされたものであり、ロボット装置の視野に顔認識の対象となる顔が入っているか否かを容易に提示することができるロボット装置の提供を目的とする。
【００１８】
また、本発明は、ロボット装置の視野に顔認識の対象となる顔が入っているか否かを容易に提示することができるロボット装置による顔認識状況の提示方法の提供を目的とする。
【００１９】
【課題を解決するための手段】
本発明に係るロボット装置は、前記課題を解決するために、小型カメラを用いて外部環境に存在する人間の顔を認識するロボット装置において、前記小型カメラのレンズの視野角を塞ぐことのない孔部を囲むカバー部を有し、前記カバー部に鏡面仕上げを施して前記顔認識の提示に用いる。
【００２０】
また、本発明に係るロボット装置による顔認識状況の提示方法は、小型カメラを用いて外部環境に存在する人間の顔を認識するロボット装置による顔認識状況の提示方法であって、前記小型カメラのレンズの視野角を塞ぐことのない孔部を囲み、かつ鏡面仕上げが施されたカバー部に、前記顔認識の対象となる人間の顔を反射する工程を備え、前記反射する工程によって前記カバー部に映しだされる顔、又は孔部に入って映しだされない顔を提示し、顔認識状況を提示する。
【００２１】
また、本発明は、小型カメラを用いて画像を撮影する情報処理装置において、前記小型カメラのレンズの視野角を塞ぐことのない孔部を囲むカバー部を有し、前記カバー部に鏡面仕上げを施して前記画像を反射し、撮像の状況の提示に用いることを特徴とする情報処理装置であってもよい。
【００２２】
【発明の実施の形態】
以下、本発明の一構成例として示す２足歩行タイプのロボット装置について、図面を参照して詳細に説明する。この人間型のロボット装置は、住環境その他の日常生活上の様々な場面における人的活動を支援する実用ロボットであり、内部状態（怒り、悲しみ、喜び、楽しみ等）に応じて行動できるほか、人間が行う基本的な動作を表出できるエンターテインメントロボットである。つまり、音声や画像などの外的刺激の認識結果に基づいて自律的に行動制御を行うことができる。
【００２３】
図１に示すように、ロボット装置１は、体幹部ユニット２の所定の位置に頭部ユニット３が連結されると共に、左右２つの腕部ユニット４Ｒ／４Ｌと、左右２つの脚部ユニット５Ｒ／５Ｌが連結されて構成されている（但し、Ｒ及びＬの各々は、右及び左の各々を示す接尾辞である。以下において同じ。）。
【００２４】
このロボット装置１は、図１及び図２に示すように、頭部ユニット３にＣＣＤ（ｃｈａｒｇｅｃｏｕｐｌｅｄｄｅｖｉｃｅ）／ＣＭＯＳ（ｃｏｍｐｌｅｍｅｎｔａｒｙｍｅｔａｌ−ｏｘｉｄｅｓｅｍｉｃｏｎｄｕｃｔｏｒ）撮像素子を用いた小型カメラを視覚センサとして２個、人間の目に相当する位置に配置している。以下では、便宜上、目部４００Ｌ、４００Ｒと記す。これら目部４００Ｌ、４００Ｒは、人間の左右の目と同様接近しており、その視差は顔認識の画像撮影時にはあまり問題とならないものとする。
【００２５】
ロボット装置１は、目部４００Ｌ、４００Ｒを構成する、小型カメラを用いて外部環境に存在する人間の顔を撮像し、撮影画像を基に、顔認識や色認識などの画像認識処理や特徴抽出を行う。
【００２６】
目部４００Ｌ、４００Ｒは、図３及び図４に示すように、小型カメラ２００Ｌ及び２００Ｒのレンズ４２０Ｌ及び４２０Ｒの視野角を塞ぐことのない孔部４１２を囲むカバー部４１０を有している。カバー部４１０は、例えばプラスチックにより形成され、カメラのレンズ４２０Ｌ及び４２０Ｒの前面カバーを構成するものである。特に、カバー部４１０は、図５に示すように、カメラのレンズ４２０の視野角（破線４２６で図示）とほぼ同等の角度（一点鎖線４２５で示す）をなす角錘面或いは円錐面に近い曲面状にカットされた孔部４１２の周辺のカバー形状を球形に近い凸面の曲面とする。また、このカバー部４１０は、前記凸面部分を、アルミニウム、銀又はクロムを蒸着して、反射率の高い鏡面としている。
【００２７】
このようにカバー部４１０を反射率の高い鏡面とすることにより、外部環境に存在する使用者は自らの姿を図６の（ａ）、（ｂ）、（ｃ）に示すように映しだして見ることができる。図６の（ａ）では、使用者の顔部４５０及び胴体部４５１は、カバー部４１０の鏡面上の向かって左側に映っている。図６の（ｂ）では、使用者の顔部４５０及び胴体部４５１は、カバー部４１０の鏡面上の向かって右側に映っている。しかし、図６の（ｃ）では、胴体部４５１は鏡面の中央下部に映っているが頭部は孔部４１２にかかって使用者の目から視認できない状態となっている。
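[0028]
When the robot apparatus 1 reflects the user's face 450 and body 451 on the left side of the mirror surface of the cover portion 410 as seen by the user, it is not recognizing the face ((a) of FIG. 6). When it reflects the user's face 450 and body 451 on the right side of the mirror surface of the cover portion 410, it is likewise not recognizing the face ((b) of FIG. 6). However, when the body 451 is reflected in the lower center of the mirror surface while the head falls on the hole 412 and cannot be seen by the user, the robot apparatus is recognizing the face ((c) of FIG. 6). In other words, the state of (c) of FIG. 6 indicates that the face 450 is within the viewing angle of the lens of the robot's camera. Of course, in some cases it is sufficient for the face 450 to fall within the hole 412 only to the extent that face recognition becomes possible.
[0029]
In this way, the user can check whether the face is reflected on the mirror surface of the cover portion 410 or falls within the hole and is not reflected, and is thereby presented with the face recognition situation of the robot apparatus.
[0030]
That is, by executing the method of presenting the face recognition situation, the robot apparatus performs a step of reflecting the human face targeted for face recognition on the mirror-finished cover portion surrounding the hole that does not block the viewing angle of the lens of the small camera. Through this reflecting step it presents either the state in which the face is reflected on the cover portion or the state in which the face falls within the hole and is not reflected, and thereby presents the face recognition situation.
[0031]
Accordingly, the user can easily confirm the direction and viewing angle of the camera, and can therefore have the robot apparatus carry out face recognition processing promptly.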
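[0032]
FIG. 7 schematically shows a bipedal walking robot apparatus provided with the eye portions 400L and 400R described above. As shown in FIG. 7, the head unit 250 of the robot apparatus 1 is provided with two CCD cameras 200R and 200L, and a stereo image processing device 210 is provided downstream of the CCD cameras 200R and 200L. A right-eye image 201R and a left-eye image 201L captured by the two CCD cameras (hereinafter referred to as the right eye 200R and the left eye 200L) are input to the stereo image processing device 210. The stereo image processing device 210 calculates disparity information (disparity data, that is, distance information) for the images 201R and 201L, and calculates a color image (YUV: luminance Y, color difference UV) 202 and a disparity image (YDR: luminance Y, disparity D, reliability R) 203 alternately for left and right in each frame. Here, disparity denotes the difference between the points onto which a given point in space is projected in the left eye and the right eye, and it varies with the distance from the camera.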
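The relationship between disparity and distance mentioned above can be illustrated with a short sketch. The patent does not give a formula; the code below assumes the standard rectified pinhole stereo model, Z = f × B / d, and the function name, focal length, and baseline are hypothetical values chosen only for illustration.

```python
def depth_from_disparity(disparity_px: float, focal_length_px: float, baseline_m: float) -> float:
    """Distance along the optical axis for a rectified stereo pair: Z = f * B / d.

    Assumption: ideal pinhole cameras with parallel optical axes (not stated in the patent).
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera parameters: 500 px focal length, 6 cm between the two eyes.
near = depth_from_disparity(30.0, 500.0, 0.06)  # larger disparity, closer point
far = depth_from_disparity(10.0, 500.0, 0.06)   # smaller disparity, farther point
print(near, far)
```

The sketch only shows why disparity "varies with the distance from the camera": halving the disparity doubles the estimated distance.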
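[0033]
The color image 202 and the disparity image 203 are input to a CPU (control unit) 220 built into the trunk 260 of the robot apparatus 1. Each joint of the robot apparatus 1 is provided with an actuator 230, which is supplied with a control signal 231 carrying a command from the CPU 220 and drives its motor according to the command value. A potentiometer is attached to each joint (actuator), and the rotation angle of the motor at that time is sent to the CPU 220. The sensors 240, including the potentiometers attached to the actuators, touch sensors attached to the soles of the feet, and a gyro sensor attached to the trunk, measure the current state of the robot apparatus, such as the current joint angles, ground contact information, and posture information, and output it to the CPU 220 as sensor data 241. The CPU 220 receives the color image 202 and the disparity image 203 from the stereo image processing device 210 together with the sensor data 241, such as the joint angles of all the actuators. These data are processed by the software described below, which allows the robot apparatus to perform various operations autonomously.
[0034]
FIG. 8 is a schematic diagram showing the configuration of the software that operates the robot apparatus in this embodiment. The software in this embodiment is organized in units of objects and performs various recognition processes: it recognizes the position of the robot apparatus, its amount of movement, surrounding obstacles, landmarks, a landmark map, the area in which it can act, and so on, and outputs an action sequence for the actions the robot apparatus should ultimately take. Two coordinate systems are used to express the position of the robot apparatus: a world-referenced camera coordinate system whose origin is a specific object such as a landmark (hereinafter also called absolute coordinates), and a robot-centered coordinate system whose origin is the robot apparatus itself (hereinafter also called relative coordinates).
[0035]
The objects communicate with one another asynchronously, and the system as a whole operates through this communication. Each object passes data and invokes programs by an inter-object communication method that uses message communication and shared memory. As shown in FIG. 8, the software 300 of the robot apparatus in this embodiment consists of the following objects, and processing is carried out object by object: a movement amount calculation means (Kinematics Odometry) KINE 310, which calculates the amount of movement of the robot apparatus; a plane extraction unit (Plane Extractor) PLEX 320, which extracts planes in the environment; an obstacle grid calculation unit (Occupancy Grid) OG 330, which recognizes obstacles in the environment; a landmark position detection unit (Landmark Sensor) CLS 340, which, in an environment containing artificial landmarks, identifies the self-position (position and posture) of the robot apparatus and the position information of the landmarks described later from its own sensor information and from its own motion information supplied by the movement amount calculation means; an absolute coordinate calculation unit (Localization) LZ 350, which converts robot-centered coordinates into absolute coordinates; and an action determination unit (Situated Behavior Layer) SBL 360, which determines the action the robot apparatus should take.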
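As an illustration of the conversion from robot-centered (relative) coordinates to absolute coordinates performed by the absolute coordinate calculation unit, here is a minimal planar sketch. The patent does not describe the computation; the 2D rigid transform, the pose format (x, y, heading), and the function name are assumptions made only for this example.

```python
import math


def to_absolute(point_rel, robot_pose):
    """Convert a point from robot-centered coordinates to absolute coordinates.

    point_rel:  (x, y) of the point as seen from the robot.
    robot_pose: (x, y, theta) of the robot in absolute coordinates, theta in radians.
    Assumption: planar motion only; the real system also tracks posture in 3D.
    """
    rx, ry, theta = robot_pose
    px, py = point_rel
    ax = rx + px * math.cos(theta) - py * math.sin(theta)
    ay = ry + px * math.sin(theta) + py * math.cos(theta)
    return ax, ay


# A landmark 1 m straight ahead of a robot standing at (1, 2) and turned 90 degrees left.
landmark_abs = to_absolute((1.0, 0.0), (1.0, 2.0, math.pi / 2))
print(landmark_abs)
```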
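[0036]
FIG. 9 schematically shows the joint degree-of-freedom configuration of the robot apparatus 1. The neck joint supporting the head unit 3 has three degrees of freedom: a neck joint yaw axis 101, a neck joint pitch axis 102, and a neck joint roll axis 103.
[0037]
Each of the arm units 4R/L constituting the upper limbs consists of a shoulder joint pitch axis 107, a shoulder joint roll axis 108, an upper arm yaw axis 109, an elbow joint pitch axis 110, a forearm yaw axis 111, a wrist joint pitch axis 112, a wrist joint roll axis 113, and a hand 114. The hand 114 is actually a multi-joint, multi-degree-of-freedom structure including a plurality of fingers. However, because the motion of the hand 114 contributes little to and has little effect on the posture control and walking control of the robot apparatus 1, it is assumed in this specification to have zero degrees of freedom. Each arm therefore has seven degrees of freedom.
[0038]
The trunk unit 2 has three degrees of freedom: a trunk pitch axis 104, a trunk roll axis 105, and a trunk yaw axis 106.
[0039]
Each of the leg units 5R/L constituting the lower limbs consists of a hip joint yaw axis 115, a hip joint pitch axis 116, a hip joint roll axis 117, a knee joint pitch axis 118, an ankle joint pitch axis 119, an ankle joint roll axis 120, and a foot 121. In this specification, the intersection of the hip joint pitch axis 116 and the hip joint roll axis 117 defines the hip joint position of the robot apparatus 1. The foot 121 of the human body is actually a structure including a multi-joint, multi-degree-of-freedom sole, but the sole of the robot apparatus 1 is assumed to have zero degrees of freedom. Each leg therefore has six degrees of freedom.
[0040]
In summary, the robot apparatus 1 as a whole has a total of 3 + 7 × 2 + 3 + 6 × 2 = 32 degrees of freedom. However, the robot apparatus 1 for entertainment is not necessarily limited to 32 degrees of freedom. It goes without saying that the number of degrees of freedom, that is, the number of joints, can be increased or decreased as appropriate according to design and manufacturing constraints and required specifications.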
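The degree-of-freedom total stated above can be checked with a few lines. The per-unit counts come from the preceding paragraphs; the dictionary names are only illustrative.

```python
# Degrees of freedom per unit, as enumerated in the description.
dof_per_unit = {"neck": 3, "arm": 7, "trunk": 3, "leg": 6}
# How many of each unit the robot has (two arms, two legs).
unit_count = {"neck": 1, "arm": 2, "trunk": 1, "leg": 2}

total_dof = sum(dof_per_unit[name] * unit_count[name] for name in dof_per_unit)
print(total_dof)  # 32
```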
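[0041]
Each degree of freedom of the robot apparatus 1 described above is actually implemented with an actuator. The actuators are preferably small and lightweight, both to eliminate extra bulges in appearance so that the shape approximates the natural human form and to allow posture control of the inherently unstable bipedal structure.
[0042]
FIG. 10 schematically shows the control system configuration of the robot apparatus 1. As shown in the figure, the robot apparatus 1 consists of the trunk unit 2, the head unit 3, the arm units 4R/L, and the leg units 5R/L, which represent the human body and limbs, and a control unit 10 that performs adaptive control to realize coordinated operation among the units.
[0043]
The operation of the robot apparatus 1 as a whole is controlled in an integrated manner by the control unit 10. The control unit 10 consists of a main control unit 11, made up of main circuit components (not shown) such as a CPU (Central Processing Unit), DRAM, and flash ROM, and a peripheral circuit 12, which includes a power supply circuit and interfaces (none shown) for exchanging data and commands with the components of the robot apparatus 1.
[0044]
The installation location of the control unit 10 is not particularly limited. In FIG. 10 it is mounted in the trunk unit 2, but it may instead be mounted in the head unit 3. Alternatively, the control unit 10 may be placed outside the robot apparatus 1 and communicate with the body of the robot apparatus 1 by wire or wirelessly.
[0045]
Each joint degree of freedom in the robot apparatus 1 shown in FIG. 10 is realized by a corresponding actuator. That is, the head unit 3 is provided with a neck joint yaw axis actuator A_2, a neck joint pitch axis actuator A_3, and a neck joint roll axis actuator A_4, which realize the neck joint yaw axis 101, the neck joint pitch axis 102, and the neck joint roll axis 103, respectively.
[0046]
The head unit 3 is also provided with a CCD (Charge Coupled Device) camera for imaging the external situation, as well as a distance sensor for measuring the distance to an object located ahead, a microphone for collecting external sound, a speaker for outputting sound, a touch sensor for detecting pressure applied by physical actions from the user such as "stroking" or "hitting", and the like.
[0047]
The trunk unit 2 is provided with a trunk pitch axis actuator A_5, a trunk roll axis actuator A_6, and a trunk yaw axis actuator A_7, which realize the trunk pitch axis 104, the trunk roll axis 105, and the trunk yaw axis 106, respectively. The trunk unit 2 also houses a battery that serves as the power source for starting the robot apparatus 1. This battery is a rechargeable battery.
[0048]
The arm units 4R/L are subdivided into upper arm units 41R/L, elbow joint units 42R/L, and forearm units 43R/L. They are provided with a shoulder joint pitch axis actuator A_8, a shoulder joint roll axis actuator A_9, an upper arm yaw axis actuator A_10, an elbow joint pitch axis actuator A_11, an elbow joint roll axis actuator A_12, a wrist joint pitch axis actuator A_13, and a wrist joint roll axis actuator A_14, which realize the shoulder joint pitch axis 107, the shoulder joint roll axis 108, the upper arm yaw axis 109, the elbow joint pitch axis 110, the forearm yaw axis 111, the wrist joint pitch axis 112, and the wrist joint roll axis 113, respectively.
[0049]
The leg units 5R/L are subdivided into thigh units 51R/L, knee units 52R/L, and shin units 53R/L. They are provided with a hip joint yaw axis actuator A_16, a hip joint pitch axis actuator A_17, a hip joint roll axis actuator A_18, a knee joint pitch axis actuator A_19, an ankle joint pitch axis actuator A_20, and an ankle joint roll axis actuator A_21, which realize the hip joint yaw axis 115, the hip joint pitch axis 116, the hip joint roll axis 117, the knee joint pitch axis 118, the ankle joint pitch axis 119, and the ankle joint roll axis 120, respectively. The actuators A_2, A_3, ... used for the joints are more preferably small AC servo actuators of the type in which the gears are directly coupled and the servo control system is integrated on a single chip and mounted within the motor unit.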
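[0050]
Each mechanism unit, namely the trunk unit 2, the head unit 3, the arm units 4R/L, and the leg units 5R/L, is provided with a sub-control unit 20, 21, 22R/L, or 23R/L of the actuator drive control section. In addition, ground contact confirmation sensors 30R/L, which detect whether the soles of the leg units 5R/L have landed, are fitted, and a posture sensor 31 for measuring posture is installed in the trunk unit 2.
[0051]
The ground contact confirmation sensors 30R/L are, for example, proximity sensors or microswitches installed on the soles. The posture sensor 31 is, for example, a combination of an acceleration sensor and a gyro sensor.
[0052]
From the outputs of the ground contact confirmation sensors 30R/L, it is possible to determine whether each of the left and right legs is currently in a stance phase or a swing phase during motion such as walking or running. From the output of the posture sensor 31, the inclination and posture of the trunk can be detected.
[0053]
The main control unit 11 can dynamically correct control targets in response to the outputs of the sensors 30R/L and 31. More specifically, it performs adaptive control of each of the sub-control units 20, 21, 22R/L, and 23R/L, thereby realizing a whole-body motion pattern in which the upper limbs, trunk, and lower limbs of the robot apparatus 1 are driven in coordination.
[0054]
For whole-body motion of the robot apparatus 1, the main control unit 11 sets the foot motion, ZMP (Zero Moment Point) trajectory, trunk motion, upper limb motion, waist height, and so on, and transfers commands instructing motions in accordance with these settings to the sub-control units 20, 21, 22R/L, and 23R/L. Each of the sub-control units 20, 21, ... then interprets the commands received from the main control unit 11 and outputs drive control signals to the actuators A_2, A_3, .... Here, the "ZMP" is the point on the floor surface at which the moment due to the floor reaction force during walking is zero, and the "ZMP trajectory" means, for example, the trajectory along which the ZMP moves during a walking period of the robot apparatus 1. The concept of the ZMP and its application as a criterion for judging the stability of walking robots are described in Miomir Vukobratovic, "LEGGED LOCOMOTION ROBOTS" (Ichiro Kato et al., "Walking Robots and Artificial Legs", Nikkan Kogyo Shimbun).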
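[0055]
As described above, in the robot apparatus 1, each of the sub-control units 20, 21, ... interprets the commands received from the main control unit 11 and outputs drive control signals to the actuators A_2, A_3, ... to control the driving of each unit. This allows the robot apparatus 1 to transition stably to a target posture and walk in a stable posture.
[0056]
In addition to the posture control described above, the control unit 10 of the robot apparatus 1 performs integrated processing of information from various sensors such as the acceleration sensor, touch sensors, and ground contact confirmation sensors, image information from the CCD cameras, audio information from the microphone, and so on. In the control unit 10, although not shown, the various sensors such as the acceleration sensor, gyro sensor, touch sensors, distance sensor, microphone, and speaker, the actuators, the CCD cameras, and the battery are each connected to the main control unit 11 via corresponding hubs.
[0057]
The main control unit 11 sequentially takes in the sensor data, image data, and audio data supplied from these sensors and stores them at predetermined locations in the DRAM via respective internal interfaces. The main control unit 11 also sequentially takes in remaining battery level data supplied from the battery and stores it at a predetermined location in the DRAM. The sensor data, image data, audio data, and remaining battery level data stored in the DRAM are used when the main control unit 11 controls the operation of the robot apparatus 1.
[0058]
When the robot apparatus 1 is first powered on, the main control unit 11 reads the control program and stores it in the DRAM. The main control unit 11 also judges its own situation and that of its surroundings, and whether there are any instructions or actions from the user, based on the sensor data, image data, audio data, and remaining battery level data that it sequentially stores in the DRAM as described above.
[0059]
Furthermore, the main control unit 11 decides on an action according to its own situation based on this judgment and the control program stored in the DRAM, and drives the necessary actuators based on the decision, thereby causing the robot apparatus 1 to perform actions such as so-called "gestures" and "hand movements".
[0060]
In this way, the robot apparatus 1 can judge its own situation and that of its surroundings based on the control program and act autonomously in response to instructions and actions from the user.
[0061]
The robot apparatus 1 can also act autonomously according to its internal state. An example software configuration of the control program in the robot apparatus 1 is therefore described with reference to FIGS. 11 to 16. As mentioned above, this control program is stored in advance in the flash ROM 12 and is read out when the robot apparatus 1 is first powered on.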
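[0062]
In FIG. 11, the device driver layer 40 is located in the lowest layer of the control program and consists of a device driver set 41 made up of a plurality of device drivers. Each device driver is an object permitted to directly access hardware used in ordinary computers, such as a CCD camera or a timer, and performs processing in response to an interrupt from the corresponding hardware.
[0063]
The robotic server object 42 is located in the lowest layer of the device driver layer 40 and consists of: a virtual robot 43, a group of software that provides an interface for accessing hardware such as the various sensors and actuators 28_1 to 28_n described above; a power manager 44, a group of software that manages power switching and the like; a device driver manager 45, a group of software that manages various other device drivers; and a designed robot 46, a group of software that manages the mechanism of the robot apparatus 1.
[0064]
The manager object 47 consists of an object manager 48 and a service manager 49. The object manager 48 is a group of software that manages the starting and stopping of the software groups included in the robotic server object 42, the middleware layer 50, and the application layer 51. The service manager 49 is a group of software that manages the connections between objects based on connection information between the objects described in a connection file stored on a memory card.
[0065]
The middleware layer 50 is located above the robotic server object 42 and consists of a group of software that provides the basic functions of the robot apparatus 1, such as image processing and audio processing. The application layer 51 is located above the middleware layer 50 and consists of a group of software for deciding the actions of the robot apparatus 1 based on the results processed by the software groups making up the middleware layer 50.
[0066]
The specific software configurations of the middleware layer 50 and the application layer 51 are shown in FIG. 12 and FIG. 13, respectively.
[0067]
As shown in FIG. 12, the middleware layer 50 consists of a recognition system 70 and an output system 79. The recognition system 70 has signal processing modules 60 to 68 for noise detection, temperature detection, brightness detection, musical scale recognition, distance detection, posture detection, the touch sensor, motion detection, and color recognition, together with an input semantics converter module 69 and the like. The output system 79 has an output semantics converter module 78 together with signal processing modules 71 to 77 for posture management, tracking, motion playback, walking, fall recovery, LED lighting, and sound playback, and the like.
[0068]
Each of the signal processing modules 60 to 68 of the recognition system 70 takes in the corresponding data from among the sensor data, image data, and audio data read from the DRAM by the virtual robot 43 of the robotic server object 42, performs predetermined processing based on that data, and passes the result to the input semantics converter module 69. Here, the virtual robot 43 is configured, for example, as a part that exchanges or converts signals according to a predetermined communication protocol.
[0069]
Based on the processing results given by the signal processing modules 60 to 68, the input semantics converter module 69 recognizes its own situation and that of its surroundings, such as "noisy", "hot", "bright", "detected a ball", "detected a fall", "was stroked", "was hit", "heard the scale do-mi-so", "detected a moving object", or "detected an obstacle", as well as commands and actions from the user, and outputs the recognition results to the application layer 51.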
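[0070]
As shown in FIG. 13, the application layer 51 consists of five modules: a behavior model library 80, a behavior switching module 81, a learning module 82, an emotion model 83, and an instinct model 84.
[0071]
As shown in FIG. 14, the behavior model library 80 contains independent behavior models, each corresponding to one of several preselected condition items such as "when the remaining battery level is low", "when recovering from a fall", "when avoiding an obstacle", "when expressing an emotion", and "when a ball is detected".
[0072]
When a recognition result is given by the input semantics converter module 69, or when a fixed time has elapsed since the last recognition result was given, each of these behavior models decides on the next action, referring as necessary to the corresponding emotion parameter value held in the emotion model 83 and the corresponding desire parameter value held in the instinct model 84 as described later, and outputs the decision to the behavior switching module 81.
[0073]
In this embodiment, each behavior model decides on the next action using an algorithm called a finite probabilistic automaton, shown in FIG. 15. This algorithm determines probabilistically to which of the nodes NODE_0 to NODE_n a given node (state) NODE_0 to NODE_n transitions, based on the transition probabilities P_1 to P_n set for the arcs ARC_1 to ARC_n1 connecting the nodes NODE_0 to NODE_n.
[0074]
Specifically, each behavior model has a state transition table 90 such as that shown in FIG. 16 for each of the nodes NODE_0 to NODE_n forming that behavior model.
[0075]
In the state transition table 90, the input events (recognition results) that serve as transition conditions at the node NODE_0 to NODE_n are listed in order of priority in the "input event name" column, and further conditions on each transition condition are described in the corresponding rows of the "data name" and "data range" columns.
[0076]
Thus, at the node NODE_100 represented by the state transition table 90 in FIG. 16, the conditions for transitioning to another node are that, when the recognition result "ball detected (BALL)" is given, the "size (SIZE)" of the ball given with the recognition result is in the range "0 to 1000", or that, when the recognition result "obstacle detected (OBSTACLE)" is given, the "distance (DISTANCE)" to the obstacle given with the recognition result is in the range "0 to 100".
[0077]
At this node NODE_100, even when no recognition result is input, a transition to another node is possible when, among the emotion and desire parameter values held in the emotion model 83 and the instinct model 84, which the behavior model refers to periodically, any of the parameter values for "joy (Joy)", "surprise (Surprise)", or "sadness (Sadness)" held in the emotion model 83 is in the range "50 to 100".
[0078]
In the state transition table 90, the names of the nodes to which a transition can be made from the node NODE_0 to NODE_n are listed in the "transition destination node" row of the "transition probability to other nodes" section. The probability of transitioning to each of the other nodes NODE_0 to NODE_n when all the conditions described in the "input event name", "data name", and "data range" columns are met is described in the corresponding place in the "transition probability to other nodes" section, and the action to be output on transitioning to that node is described in the "output action" row of the same section. The probabilities in each row of the "transition probability to other nodes" section sum to 100 [%].
[0079]
Thus, at the node NODE_100 represented by the state transition table 90 in FIG. 16, when, for example, the recognition result given is "ball detected (BALL)" with the "SIZE" of the ball in the range "0 to 1000", a transition to "node NODE_120 (node 120)" can be made with a probability of "30 [%]", and the action "ACTION 1" is output at that time.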
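The state transition table and the probabilistic choice it drives can be sketched as follows. Only the BALL row (SIZE 0 to 1000, 30 % to NODE_120 with ACTION1) comes from the description; the remaining 70 % entry, the `Rule` class, and the `step` function are hypothetical and stand in for table contents the text does not reproduce.

```python
import random
from dataclasses import dataclass


@dataclass
class Rule:
    event: str          # "input event name" column
    data_name: str      # "data name" column
    data_range: tuple   # "data range" column, inclusive (low, high)
    transitions: list   # [(destination node, probability in %, output action)], sums to 100


NODE_100 = [
    Rule("BALL", "SIZE", (0, 1000),
         [("NODE_120", 30, "ACTION1"),
          ("NODE_100", 70, None)]),  # hypothetical: stay in place with no action
]


def step(rules, event, data, rng):
    """Return (destination, action) drawn from the first matching row, or None."""
    for rule in rules:  # rows are listed in priority order
        low, high = rule.data_range
        if rule.event == event and rule.data_name in data and low <= data[rule.data_name] <= high:
            outcomes = [(dest, action) for dest, _, action in rule.transitions]
            weights = [percent for _, percent, _ in rule.transitions]
            return rng.choices(outcomes, weights=weights, k=1)[0]
    return None  # no transition condition is met


rng = random.Random(0)
results = [step(NODE_100, "BALL", {"SIZE": 500}, rng) for _ in range(10000)]
share = sum(r == ("NODE_120", "ACTION1") for r in results) / len(results)
print(round(share, 2))
```

Over many draws the share of transitions to NODE_120 should settle near 0.30, which is all the table row promises.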
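[0080]
Each behavior model is made up of many such nodes NODE_0 to NODE_n, each described by a state transition table 90, linked together. When a recognition result is given by the input semantics converter module 69, for example, the behavior model decides on the next action probabilistically using the state transition table of the corresponding node NODE_0 to NODE_n and outputs the decision to the behavior switching module 81.
[0081]
The behavior switching module 81 shown in FIG. 14 selects, from among the actions output by the behavior models of the behavior model library 80, the action output by the behavior model with the highest predetermined priority, and sends a command to execute that action (hereinafter called an action command) to the output semantics converter module 78 of the middleware layer 50. In this embodiment, behavior models shown lower in FIG. 15 are given higher priority.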
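The priority-based selection performed by the behavior switching module can be sketched in a few lines. The model names and their priority order below are hypothetical; the description only states that a predetermined priority exists.

```python
def select_action(proposals, priority):
    """Pick the action proposed by the highest-priority behavior model.

    proposals: {model name: proposed action, or None if the model proposes nothing}
    priority:  model names ordered from highest to lowest priority (hypothetical order)
    """
    for model in priority:
        action = proposals.get(model)
        if action is not None:
            return model, action
    return None  # no behavior model proposed an action


priority = ["fall_recovery", "low_battery", "obstacle_avoidance", "ball_detected"]
proposals = {"ball_detected": "chase_ball", "obstacle_avoidance": "turn_left", "low_battery": None}
print(select_action(proposals, priority))  # ('obstacle_avoidance', 'turn_left')
```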
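[0082]
Based on action completion information given by the output semantics converter module 78 after an action is completed, the behavior switching module 81 also notifies the learning module 82, the emotion model 83, and the instinct model 84 that the action has been completed.
[0083]
Meanwhile, the learning module 82 receives, from among the recognition results given by the input semantics converter module 69, the recognition results of teaching received as actions from the user, such as "was hit" or "was stroked".
[0084]
Based on this recognition result and the notification from the behavior switching module 81, the learning module 82 changes the corresponding transition probability of the corresponding behavior model in the behavior model library 80 so as to lower the probability of the action occurring when the robot has been "hit (scolded)" and to raise it when the robot has been "stroked (praised)".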
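One way the learning module's adjustment could work is sketched below. The description does not say how much a probability changes or how the row is kept at 100 %; the fixed step and the renormalization here are assumptions for illustration only.

```python
def reinforce(probabilities, index, praised, step=5.0):
    """Raise (praised) or lower (scolded) one transition probability.

    probabilities: percentages for one row of a state transition table (sum to 100)
    index:         position of the transition that produced the action
    step:          hypothetical adjustment size in percentage points
    The row is renormalized so that it still sums to 100.
    """
    updated = list(probabilities)
    updated[index] = max(0.0, updated[index] + (step if praised else -step))
    total = sum(updated)
    if total == 0:
        raise ValueError("at least one transition must keep a non-zero probability")
    return [100.0 * p / total for p in updated]


after_praise = reinforce([30.0, 70.0], 0, praised=True)
after_scold = reinforce([30.0, 70.0], 0, praised=False)
print(after_praise, after_scold)
```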
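[0085]
The emotion model 83, for its part, holds, for each of a total of six emotions, "joy (Joy)", "sadness (Sadness)", "anger (Anger)", "surprise (Surprise)", "disgust (Disgust)", and "fear (Fear)", a parameter representing the strength of that emotion. The emotion model 83 periodically updates the parameter values of these emotions based on specific recognition results such as "was hit" and "was stroked" given by the input semantics converter module 69, the elapsed time, notifications from the behavior switching module 81, and so on.
[0086]
Specifically, let ΔE[t] be the amount of change in an emotion at that time, calculated by a predetermined formula based on the recognition result given by the input semantics converter module 69, the action of the robot apparatus 1 at that time, the time elapsed since the last update, and so on; let E[t] be the current parameter value of the emotion; and let ke be a coefficient representing the sensitivity of the emotion. The emotion model 83 calculates the parameter value E[t+1] of the emotion in the next cycle by formula (11) below and updates the parameter value of the emotion by replacing the current value E[t] with it. The emotion model 83 updates the parameter values of all the emotions in the same way.
E[t+1] = E[t] + ke × ΔE[t]   (11)
The degree to which each recognition result and each notification from the output semantics converter module 78 affects the amount of change ΔE[t] in the parameter value of each emotion is determined in advance. For example, a recognition result such as "was hit" has a large effect on ΔE[t] for the emotion "anger", and a recognition result such as "was stroked" has a large effect on ΔE[t] for the emotion "joy".
[0087]
Here, a notification from the output semantics converter module 78 is so-called action feedback information (action completion information), that is, information on the result of an action having been expressed, and the emotion model 83 also changes emotions in response to such information. For example, an action such as "shouting" lowers the anger level. Notifications from the output semantics converter module 78 are also input to the learning module 82 described above, and the learning module 82 changes the corresponding transition probabilities of the behavior models based on them.
[0088]
Feedback of the action result may instead be provided by the output of the behavior switching module 81 (an action with emotion added).
[0089]
The instinct model 84, meanwhile, holds, for each of four mutually independent desires, "desire for exercise (exercise)", "desire for affection (affection)", "appetite (appetite)", and "curiosity (curiosity)", a parameter representing the strength of that desire. The instinct model 84 periodically updates these parameter values based on recognition results given by the input semantics converter module 69, the elapsed time, notifications from the behavior switching module 81, and so on.
[0090]
Specifically, for "desire for exercise", "desire for affection", and "curiosity", let ΔI[k] be the amount of change in the desire at that time, calculated by a predetermined formula based on the recognition results, the elapsed time, notifications from the output semantics converter module 78, and so on; let I[k] be the current parameter value of the desire; and let ki be a coefficient representing the sensitivity of the desire. At a predetermined cycle, the instinct model 84 calculates the parameter value I[k+1] of the desire in the next cycle using formula (12) below and updates the parameter value of the desire by replacing the current value I[k] with the result. The instinct model 84 updates the parameter value of each desire other than "appetite" in the same way.
I[k+1] = I[k] + ki × ΔI[k]   (12)
The degree to which recognition results, notifications from the output semantics converter module 78, and so on affect the amount of change ΔI[k] in the parameter value of each desire is determined in advance. For example, a notification from the output semantics converter module 78 has a large effect on ΔI[k] for "fatigue".
[0091]
In this example, the parameter values of the emotions and desires (instincts) are each restricted to vary within the range 0 to 100, and the values of the coefficients ke and ki are set individually for each emotion and each desire.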
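Formulas (11) and (12) share the same form, so one helper can illustrate both, including the restriction to the range 0 to 100. The function name and the sample values for the change amount and sensitivity coefficient are hypothetical; the description says only that they are set in advance for each emotion and desire.

```python
def update_parameter(current, delta, sensitivity, low=0.0, high=100.0):
    """Next-cycle value: current + sensitivity * delta, limited to [low, high].

    Corresponds to E[t+1] = E[t] + ke * dE[t] and I[k+1] = I[k] + ki * dI[k].
    """
    return min(high, max(low, current + sensitivity * delta))


# Hypothetical numbers: being hit produces a change amount of 30 for "anger", with ke = 0.5.
anger = update_parameter(current=40.0, delta=30.0, sensitivity=0.5)
print(anger)  # 55.0

# The result never leaves the 0 to 100 range.
saturated = update_parameter(current=90.0, delta=30.0, sensitivity=0.5)
print(saturated)  # 100.0
```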
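[0092]
Meanwhile, as shown in FIG. 12, the output semantics converter module 78 of the middleware layer 50 passes abstract action commands such as "move forward", "be pleased", "cry", or "tracking (chase the ball)", given by the behavior switching module 81 of the application layer 51 as described above, to the corresponding signal processing modules 71 to 77 of the output system 79.
[0093]
When given an action command, these signal processing modules 71 to 77 generate, based on the command, the servo command values to be given to the corresponding actuators to perform the action, the audio data of the sound to be output from the speaker, and/or the drive data to be given to the LEDs. They send these data in turn to the corresponding actuators, speaker, or LEDs via the virtual robot 43 of the robotic server object 42 and the signal processing circuit.
[0094]
In this way, based on the control program described above, the robot apparatus 1 can act autonomously according to its own (internal) situation and that of its surroundings (external), and in response to instructions and actions from the user.
[0095]
Such a control program is provided via a recording medium on which it is recorded in a form readable by the robot apparatus. Possible recording media for the control program include magnetically read recording media (for example, magnetic tape, flexible disks, and magnetic cards) and optically read recording media (for example, CD-ROM, MO, CD-R, and DVD). Recording media also include storage media such as semiconductor memory (so-called memory cards, whether rectangular, square, or any other shape, and IC cards). The control program may also be provided via the Internet or the like.
[0096]
These control programs are played back via a dedicated read driver device, a personal computer, or the like, and are transmitted to and read by the robot apparatus 1 over a wired or wireless connection. When the robot apparatus 1 is equipped with a drive device for a miniaturized storage medium such as a semiconductor memory or an IC card, it can also read the control program directly from the storage medium.
[0097]
Face recognition processing is described below. The robot apparatus has a face detector and detects face regions in image frames. It receives the image data captured by the small camera and reduces it to scale images at nine levels. It searches all of these images for rectangular regions corresponding to faces. It then eliminates overlapping candidate regions and detects face information (FaceInfo), such as the position, size, and feature values of the regions finally judged to be faces. The face information detected here is sent to a short-term memory unit.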
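Two steps of this detection flow, building the nine scale levels and removing overlapping candidates, can be sketched as below. The description gives neither the scale factor nor the overlap rule; the factor of 0.8, the assumption that level 0 is the original image size, and the intersection-over-union threshold are all hypothetical choices for illustration.

```python
def pyramid_sizes(width, height, levels=9, factor=0.8):
    """Image sizes for the scale levels (level 0 is assumed to be the original size)."""
    return [(max(1, round(width * factor ** i)), max(1, round(height * factor ** i)))
            for i in range(levels)]


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def suppress_overlaps(candidates, threshold=0.5):
    """Keep the best-scoring box among overlapping candidates.

    candidates: [(score, (x1, y1, x2, y2)), ...] in original-image coordinates
    """
    kept = []
    for score, box in sorted(candidates, key=lambda c: c[0], reverse=True):
        if all(iou(box, kept_box) < threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


sizes = pyramid_sizes(320, 240)
faces = suppress_overlaps([(0.9, (10, 10, 50, 50)),
                           (0.8, (12, 12, 52, 52)),
                           (0.7, (100, 100, 140, 140))])
print(len(sizes), faces)
```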
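[0098]
The short-term memory unit is an object that holds information about the external environment of the robot apparatus 1 for a relatively short time. It receives sound (voice) detection results from a sound detection unit, color detection results from a color detection unit, face detection results from a face detection unit, and joint angle sensor outputs from a joint angle detection unit. It integrates these multiple pieces of detection information so that they remain temporally and spatially consistent, treats them as meaningful integrated information, and holds them for a relatively short time, for example 15 seconds. This integrated information is passed to an action command unit and the neck motion generation unit.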
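The time-limited retention described above can be illustrated with a small sketch. The class and method names are hypothetical, and the integration of sound, color, face, and joint-angle information is reduced here to storing one record per object ID; only the 15-second example retention time comes from the description.

```python
class ShortTermMemory:
    """Holds per-object information and forgets entries older than the retention time."""

    def __init__(self, retention_s=15.0):
        self.retention_s = retention_s
        self._items = {}  # object ID -> (timestamp in seconds, info)

    def store(self, object_id, info, now):
        self._items[object_id] = (now, info)

    def recall(self, object_id, now):
        self._forget(now)
        entry = self._items.get(object_id)
        return entry[1] if entry else None

    def _forget(self, now):
        self._items = {key: value for key, value in self._items.items()
                       if now - value[0] <= self.retention_s}


memory = ShortTermMemory()
memory.store("face-1", {"direction": (1.0, 0.2, 0.3)}, now=0.0)
print(memory.recall("face-1", now=10.0))  # still remembered
print(memory.recall("face-1", now=20.0))  # None: older than 15 seconds
```

Passing the current time explicitly keeps the sketch deterministic; a real implementation would presumably read a clock.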
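[0099]
The action command unit is a higher-level module such as the situated behavior layer. It issues commands to the neck motion generation unit instructing neck motions. By specifying the ID (object ID) of a piece of integrated information stored in the short-term memory unit, that is, simply by specifying what to track, it can cause the robot apparatus 1 to perform a tracking operation.
[0100]
The neck motion generation unit is a module that calculates neck joint angles in response to a command from the action command unit to move the neck. When it receives a "track" command (in this embodiment, the object ID), it calculates and outputs the neck joint angles that point toward the direction in which the object exists, based on the information received from the short-term memory unit. It receives the object ID, selects the most suitable object information, and turns toward the direction in which the selected object information is obtained.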
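A minimal sketch of the neck joint angle calculation follows. The description does not specify the coordinate frame or the kinematics; this example assumes a target position in a head-base frame with x forward, y to the left, and z up, ignores the roll axis and joint limits, and uses hypothetical names throughout.

```python
import math


def neck_angles(target):
    """Yaw and pitch (radians) that point the head toward target = (x, y, z).

    Assumed frame: x forward, y left, z up, origin at the neck joint.
    Positive yaw turns left; positive pitch looks up.
    """
    x, y, z = target
    yaw = math.atan2(y, x)
    pitch = math.atan2(z, math.hypot(x, y))
    return yaw, pitch


# A face 1 m ahead and 1 m to the left, at neck height.
yaw, pitch = neck_angles((1.0, 1.0, 0.0))
print(math.degrees(yaw), math.degrees(pitch))
```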
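[0101]
Recognizing a face requires detecting the face, pointing the CCD camera in its direction, and bringing the face into the camera's field of view, and there has been the problem that face detection takes time when the motor that moves the robot's camera is slow or its range of rotation is limited. According to the present invention, whether the face targeted for face recognition is within the field of view of the robot apparatus can be presented easily, so the robot apparatus can perform face recognition processing quickly.
[0102]
The robot apparatus may also execute the face recognition situation presentation method of the present invention while conversing with the user who is the target of face recognition, presenting whether the face targeted for face recognition is within its field of view.
[0103]
The present invention is not limited in application to the embodiment described above. The following modifications are possible. For example, the mirror surface may be installed at a different location so as to reflect a range equivalent to the field of view of the robot apparatus. A margin may also be provided between the range shown by the mirror and the viewing angle of the camera so that confirmation is more reliable. The convex mirror portion of the cover portion may also have a shape other than spherical, for example conical or polygonal-pyramidal.
[0104]
Furthermore, the present invention may be applied to an information processing apparatus that captures images using a small camera, such as a mobile phone, a portable information terminal (PDA), or a portable personal computer, the information processing apparatus being characterized by having a cover portion surrounding a hole that does not block the viewing angle of the lens of the small camera, the cover portion being mirror-finished to reflect the image and used to present the imaging situation.
[0105]
[Effects of the Invention]
According to the present invention, a robot apparatus that recognizes a human face present in the external environment using a small camera has a cover portion surrounding a hole that does not block the viewing angle of the lens of the small camera, and the cover portion is mirror-finished and used to present face recognition, so that whether the face targeted for face recognition is within the field of view of the robot apparatus can be presented easily.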
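[Brief Description of the Drawings]
[FIG. 1] A perspective view showing the external configuration of the robot apparatus, a humanoid bipedal walking robot apparatus.
[FIG. 2] A diagram showing the arrangement of the eye portions provided on the head.
[FIG. 3] An enlarged view of an eye portion.
[FIG. 4] A cross-sectional view of the head.
[FIG. 5] A cross-sectional view of an eye portion, also showing the viewing angle.
[FIG. 6] A diagram for explaining the method of presenting the face recognition situation.
[FIG. 7] A block diagram schematically showing the robot apparatus.
[FIG. 8] A schematic diagram showing the configuration of the software that operates the robot apparatus.
[FIG. 9] A diagram schematically showing the degree-of-freedom configuration model of the robot apparatus in the embodiment of the present invention.
[FIG. 10] A block diagram showing the circuit configuration of the robot apparatus.
[FIG. 11] A block diagram showing the software configuration of the robot apparatus.
[FIG. 12] A block diagram showing the configuration of the middleware layer in the software configuration of the robot apparatus.
[FIG. 13] A block diagram showing the configuration of the application layer in the software configuration of the robot apparatus.
[FIG. 14] A block diagram showing the configuration of the behavior model library of the application layer.
[FIG. 15] A diagram explaining the finite probabilistic automaton that provides the information for deciding the actions of the robot apparatus.
[FIG. 16] A diagram showing the state transition table prepared for each node of the finite probabilistic automaton.
[Explanation of Reference Numerals]
1 robot apparatus, 3 head, 400L, 400R eye portions, 410 cover portion (mirror surface), 412 hole
[0001]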
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a robot apparatus having a small camera as a visual sensor and recognizing a human face existing in an external environment, and a method of presenting a face recognition situation by the robot apparatus.
[0002]
[Prior art]
A mechanical device that uses electrical or magnetic action to perform movements resembling those of a human is called a "robot". The word robot is said to derive from the Slavic word "ROBOTA" (slave machine). In Japan, robots began to spread in the late 1960s, but most of them were industrial robots, such as manipulators and transfer robots, intended to automate factory production work and make it unmanned.
[0003]
Recently, research and development has been progressing on the structure and stable walking control of legged mobile robots, such as pet-type robots that model the body mechanisms and movements of four-legged animals such as dogs, cats, and bears, and "humanoid" robots that model the body mechanisms and movements of animals that walk upright on two legs, such as humans and monkeys, and expectations for their practical use are rising. Legged mobile robots are less stable than crawler-type robots, which makes posture control and walking control harder, but they are superior in that they can walk and run flexibly, for example climbing stairs and stepping over obstacles.
[0004]
A stationary robot that is fixed in a specific place, such as an arm-type robot, operates only in a fixed, local work space, for tasks such as assembling and sorting parts. A mobile robot, by contrast, has an unrestricted work space: it can move freely along a predetermined route or with no set route, perform predetermined or arbitrary human tasks on people's behalf, and provide a variety of services in place of humans, dogs, or other living beings.
[0005]
One use of legged mobile robots is to perform various difficult tasks in industrial and production activities on behalf of people. Examples include dangerous or difficult work such as maintenance in nuclear power plants, thermal power plants, and petrochemical plants, transporting and assembling parts in manufacturing plants, cleaning high-rise buildings, and rescue at fire sites and elsewhere.
[0006]
Another use of legged mobile robots is less about such work support and more closely tied to daily life, namely "coexistence" with humans or "entertainment". A robot of this kind faithfully reproduces the motion mechanisms of relatively intelligent legged animals such as humans, dogs (pets), and bears, along with rich emotional expression using the limbs. It is also required not merely to execute pre-entered motion patterns faithfully, but to respond dynamically and in a lively way to words and attitudes received from the user (or another robot), such as "praising", "scolding", or "hitting".
[0007]
In the conventional toy machine, the relationship between the user operation and the response operation is fixed, and the operation of the toy cannot be changed according to the user's preference. As a result, the user eventually gets tired of toys that repeat only the same operation.
[0008]
On the other hand, an intelligent robot that performs an autonomous operation generally has a function of recognizing information in the outside world and reflecting its own behavior on the information. That is, the robot realizes autonomous thinking and action control by changing an emotion model or an instinct model based on input information such as a voice, an image, and a tactile sensation from an external environment to determine an action. That is, if the robot prepares an emotion model or an instinct model, it is possible to realize realistic communication with a human at a higher intellectual level.
[0009]
For a robot to act autonomously in response to changes in its environment, actions have conventionally been described as combinations of simple action descriptions, each of which takes an action in response to the information from a single observation. On top of this mapping from inputs to actions, introducing features such as randomness, internal state (emotion and instinct), learning, and growth makes it possible to produce complex actions that are not uniquely determined. The part that produces these actions is called a behavior generation unit.
[0010]
For example, in an autonomous robot apparatus whose neck, fitted with a stereo camera consisting of two CCD cameras as its eyes, is movable relative to the body, and which also has left and right legs that allow it to walk on two feet, a behavior generation unit that generates neck motions is called a neck motion generation unit.
[0011]
When the robot needs to use the neck motion generation unit to recognize something close to human eye level, such as the user's face, the robot apparatus first uses the stereo camera consisting of the two CCD cameras as its eyes and performs face recognition based on images obtained by photographing the human face.
[0012]
Incidentally, Japanese Patent Application Laid-Open No. 14-072291 discloses a camera well suited to letting photographers easily photograph themselves. In this technique, a reflecting member that can move between a stored position and an exposed position is provided on the front of the camera, and photographers see their own face reflected in the reflecting member, which makes self-photography easy.
[0013]
[Patent Document 1]
JP-A-14-072291
[0014]
[Problems to be solved by the invention]
In the robot apparatus, providing the reflecting member disclosed in Patent Document 1 on the neck would be a problem in terms of appearance, given that the robot is a humanoid. In addition, capturing images for face recognition with a CCD camera requires highly precise adjustment, and the relationship with the viewing angle of the lens makes it difficult to decide where to install the reflecting member.
[0015]
However, when the robot performs face recognition based on an image obtained by photographing a face, it must detect the face, point the CCD camera toward it, and bring the face into the camera's field of view. If face detection takes a long time because the motor that moves the robot's camera is slow or its range of rotation is limited, people may become frustrated or lose interest.
[0016]
In order to prevent such a situation, there is still a need for a means by which a person can easily recognize whether or not he or she is in the field of view of the robot.
[0017]
The present invention has been made in view of the above circumstances, and has as its object to provide a robot device capable of easily presenting whether or not a face to be subjected to face recognition is included in the visual field of the robot device.
[0018]
Another object of the present invention is to provide a method of presenting a face recognition situation by a robot device, which can easily present whether or not a face to be subjected to face recognition is in the visual field of the robot device.
[0019]
[Means for Solving the Problems]
To solve the above problems, a robot apparatus according to the present invention is a robot apparatus that recognizes a human face present in the external environment using a small camera, and has a cover portion surrounding a hole that does not block the viewing angle of the lens of the small camera, the cover portion being mirror-finished and used to present face recognition.
[0020]
A method of presenting a face recognition situation by a robot apparatus according to the present invention is a method for a robot apparatus that recognizes a human face present in the external environment using a small camera. It includes a step of reflecting the human face targeted for face recognition on a mirror-finished cover portion surrounding a hole that does not block the viewing angle of the lens of the small camera, and the face recognition situation is presented by showing, through the reflecting step, either the face reflected on the cover portion or the face falling within the hole and not reflected.
[0021]
The present invention may also be an information processing apparatus that captures images using a small camera, characterized by having a cover portion surrounding a hole that does not block the viewing angle of the lens of the small camera, the cover portion being mirror-finished to reflect the image and used to present the imaging situation.
[0022]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, a bipedal walking robot apparatus shown as one configuration example of the present invention will be described in detail with reference to the drawings. This humanoid robot apparatus is a practical robot that supports human activities in various situations in the living environment and daily life, and is also an entertainment robot that can act according to its internal state (anger, sadness, joy, enjoyment, etc.) and express the basic movements humans make. That is, it can control its behavior autonomously based on the results of recognizing external stimuli such as sound and images.
[0023]
As shown in FIG. 1, in the robot apparatus 1, a head unit 3 is connected to a predetermined position of a trunk unit 2, and two left and right arm units 4R/L and two left and right leg units 5R/L are connected (R and L are suffixes indicating right and left, respectively; the same applies hereinafter).
[0024]
As shown in FIGS. 1 and 2, the robot apparatus 1 includes, in the head unit 3, two small cameras using CCD (charge coupled device) or CMOS (complementary metal-oxide semiconductor) image sensors as visual sensors, located at positions corresponding to human eyes. In the following, these are referred to as eye portions 400L and 400R for convenience. The eye portions 400L and 400R are positioned close together, like the left and right eyes of a human, so their parallax poses little problem when capturing an image for face recognition.
[0025]
The robot apparatus 1 captures a human face existing in the external environment using the small cameras constituting the eye portions 400L and 400R, and performs image recognition processing such as face recognition, color recognition, and feature extraction based on the captured image.
[0026]
As shown in FIGS. 3 and 4, the eye portions 400L and 400R have a cover 410 surrounding a hole 412 that does not block the viewing angle of the lenses 420L and 420R of the small cameras 200L and 200R. The cover 410 is formed of, for example, plastic and constitutes a front cover for the camera lenses 420L and 420R. In particular, as shown in FIG. 5, the hole 412 of the cover 410 is cut in the shape of a curved surface close to a pyramidal or conical surface that forms an angle substantially equal to the viewing angle (indicated by a dashed line 425) of the camera lens 420, and the cover around the hole 412 is a convex curved surface close to a sphere. Further, the convex portion of the cover 410 is given a mirror surface with high reflectance by vapor-depositing aluminum, silver, or chromium on it.
[0027]
By making the cover 410 a mirror surface with high reflectance, a user in the external environment can see his or her own reflection, as shown in FIGS. 6(a), (b) and (c). In FIG. 6(a), the user's face 450 and body 451 appear on the left side of the mirror surface of the cover 410. In FIG. 6(b), the face 450 and the body 451 of the user appear on the right side of the mirror surface of the cover 410. In FIG. 6(c), however, the body 451 is reflected in the lower center of the mirror surface, but the head overlaps the hole 412 and cannot be seen by the user.
[0028]
When the face 450 and the body 451 of the user are reflected on the left side of the mirror surface of the cover 410, the robot apparatus 1 does not recognize the face (FIG. 6(a)). Likewise, when the face 450 and the body 451 of the user are reflected on the right side of the mirror surface of the cover 410, the face is not recognized (FIG. 6(b)). However, the face is recognized when the body 451 is reflected in the lower center of the mirror surface but the head overlaps the hole 412 and is therefore invisible to the user (FIG. 6(c)). That is, the state of FIG. 6(c) indicates that the face 450 is within the viewing angle of the lens of the robot's camera. Of course, in some cases, it is sufficient for the face 450 to fall within the hole 412 for face recognition to be possible.
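The geometric condition behind FIG. 6(c), that the face lies within the viewing angle of the lens, can be illustrated with a minimal sketch. The camera frame (optical axis along +z), the function name `in_viewing_cone`, and the half-angle values are assumptions made for illustration; the patent itself relies on the mirror-finished cover rather than on any such computation.

```python
import math

def in_viewing_cone(point, half_angle_deg):
    """Return True if `point` (x, y, z in a hypothetical camera frame whose
    +z axis is the optical axis) lies inside a cone of the given half-angle."""
    x, y, z = point
    if z <= 0:
        return False  # at or behind the lens plane
    # Angle between the optical axis and the ray to the point.
    off_axis_deg = math.degrees(math.atan2(math.hypot(x, y), z))
    return off_axis_deg <= half_angle_deg

# A face straight ahead is inside the cone; one 45 degrees off-axis is
# outside a cone with a 30 degree half-angle.
print(in_viewing_cone((0.0, 0.0, 1.0), 30.0))
print(in_viewing_cone((1.0, 0.0, 1.0), 30.0))
```

Because the hole 412 is cut at roughly the viewing angle, a face that disappears into the hole corresponds, under these assumptions, to a point for which this check succeeds.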
[0029]
In this way, the user can confirm whether the face is reflected on the mirror surface of the cover 410 or falls within the hole and is not reflected, and is thereby presented with the face recognition status of the robot apparatus.
[0030]
That is, the robot apparatus carries out the method for presenting the face recognition status, which includes a step of reflecting the human face targeted for face recognition on the mirror-finished cover section surrounding the hole that does not block the viewing angle of the lens of the small camera. Through this reflecting step, it can present the face recognition status as either a state in which the face is reflected on the cover section or a state in which the face falls within the hole and is not reflected.
[0031]
Therefore, since the user can easily confirm the direction and the viewing angle of the camera, the face recognition processing of the robot device can be promptly performed.
[0032]
FIG. 7 schematically shows a bipedal walking robot device including the aforementioned eye portions 400L and 400R. As shown in FIG. 7, a head unit 250 of the robot apparatus 1 is provided with two CCD cameras 200R and 200L, and a stereo image processing device 210 is provided downstream of the CCD cameras 200R and 200L. A right-eye image 201R and a left-eye image 201L captured by the two CCD cameras (hereinafter referred to as right eye 200R and left eye 200L) are input to the stereo image processing device 210. The stereo image processing device 210 calculates disparity data (distance information) from the images 201R and 201L, and alternately produces, frame by frame, a color image (YUV: luminance Y, color difference UV) 202 and a parallax image (YDR: luminance Y, parallax D, reliability R) 203. Here, the parallax is the difference between the positions at which a point in space is projected onto the left eye and the right eye, and it changes according to the distance from the camera.
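The relation between parallax and distance mentioned above can be sketched with the standard rectified pinhole stereo model, in which depth equals focal length times baseline divided by disparity. The function name and the numeric values for focal length and baseline below are hypothetical; the patent does not state how the stereo image processing device 210 performs this computation.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (metres) of a point seen by a rectified stereo pair.

    disparity_px: horizontal shift between left and right images, in pixels
    focal_px:     focal length expressed in pixels (assumed value)
    baseline_m:   distance between the two camera centres (assumed value)
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Larger disparity means the point is closer to the cameras.
near = depth_from_disparity(20.0, focal_px=500.0, baseline_m=0.06)
far = depth_from_disparity(10.0, focal_px=500.0, baseline_m=0.06)
print(near, far)
```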
[0033]
The color image 202 and the parallax image 203 are input to a CPU (control unit) 220 built into the trunk 260 of the robot device 1. An actuator 230 is provided at each joint of the robot device 1; a control signal 231 serving as a command from the CPU 220 is supplied to it, and the motor is driven according to the command value. A potentiometer is attached to each joint (actuator), and the rotation angle of the motor at that time is sent to the CPU 220. The sensors 240, such as the potentiometers attached to the actuators, touch sensors attached to the soles, and a gyro sensor attached to the trunk, measure the current state of the robot device, such as the current joint angles, ground contact information, and posture information, and output it to the CPU 220 as sensor data 241. The CPU 220 receives the color image 202 and the parallax image 203 from the stereo image processing device 210 together with the sensor data 241, such as all joint angles of the actuators, and processes these data with the software described later, which allows it to perform various operations autonomously.
[0034]
FIG. 8 is a schematic diagram illustrating the configuration of the software that operates the robot device according to the present embodiment. The software according to the present embodiment is configured in units of objects; it recognizes the position of the robot device, its movement amount, surrounding obstacles, landmarks, a landmark map, actionable areas, and the like, and performs various kinds of recognition processing to finally output the action sequence that the robot device should take. Note that two coordinate systems are used to indicate the position of the robot device: for example, a camera coordinate system of a world reference system (hereinafter also referred to as absolute coordinates), whose origin is a specific object such as a landmark, and a robot center coordinate system (hereinafter also referred to as relative coordinates), whose origin is the center of the robot device itself.
[0035]
Objects communicate with each other asynchronously to operate the entire system. Each object exchanges data and starts (Invoke) programs by message communication and an inter-object communication method using a shared memory. As shown in FIG. 8, the software 300 of the robot apparatus according to the present embodiment includes: a movement amount calculation unit (Kinematics Odometry) KINE 310 that calculates the movement amount of the robot apparatus; a plane extraction unit (Plane Extractor) PLEX 320 that extracts planes in the environment; an obstacle grid calculation unit (Occupancy Grid) OG 330 that recognizes obstacles in the environment; a landmark position detection unit (Landmark Sensor) CLS 340 that, in an environment including artificial landmarks, specifies the self-position (position and posture) of the robot device and the position information of landmarks described later, based on its own sensor information and its own motion information supplied from the movement amount calculation unit; an absolute coordinate calculation unit (Localization) LZ 350 that converts robot center coordinates into absolute coordinates; and an action determination unit (Situated Behavior Layer) SBL 360 that determines the action to be taken by the robot apparatus. Processing is performed for each of these objects.
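The conversion from robot center coordinates to absolute coordinates performed by the absolute coordinate calculation unit LZ 350 can be illustrated, in a simplified planar form, by a rigid transform. This is a sketch under the assumption that the robot's pose in the absolute frame is known as a position and heading `(x, y, theta)`; the function name and the planar simplification are not taken from the patent.

```python
import math

def robot_to_absolute(point_robot, robot_pose):
    """Map a 2-D point given in robot center coordinates into absolute
    coordinates, given the robot pose (x, y, theta) in the absolute frame.
    theta is the robot's heading in radians (hypothetical convention)."""
    px, py = point_robot
    x, y, theta = robot_pose
    c, s = math.cos(theta), math.sin(theta)
    # Rotate by the robot's heading, then translate by its position.
    return (x + c * px - s * py, y + s * px + c * py)

# A landmark 1 m ahead of a robot standing at (1, 2) and rotated by
# 90 degrees ends up near (1, 3) in absolute coordinates.
print(robot_to_absolute((1.0, 0.0), (1.0, 2.0, math.pi / 2)))
```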
[0036]
FIG. 9 schematically shows the configuration of the degrees of freedom of the joints provided in the robot apparatus 1. The neck joint that supports the head unit 3 has three degrees of freedom: a neck joint yaw axis 101, a neck joint pitch axis 102, and a neck joint roll axis 103.
[0037]
Each arm unit 4R/L constituting the upper limb comprises a shoulder joint pitch axis 107, a shoulder joint roll axis 108, an upper arm yaw axis 109, an elbow joint pitch axis 110, a forearm yaw axis 111, a wrist joint pitch axis 112, a wrist joint roll axis 113, and a hand 114. The hand 114 is actually a multi-joint, multi-degree-of-freedom structure including a plurality of fingers. However, since the operation of the hand 114 contributes little to, and has little influence on, the posture control and walking control of the robot apparatus 1, it is assumed in this specification to have zero degrees of freedom. Therefore, each arm has seven degrees of freedom.
[0038]
The trunk unit 2 has three degrees of freedom, namely, a trunk pitch axis 104, a trunk roll axis 105, and a trunk yaw axis 106.
[0039]
Each of the leg units 5R/L constituting the lower limb comprises a hip joint yaw axis 115, a hip joint pitch axis 116, a hip joint roll axis 117, a knee joint pitch axis 118, an ankle joint pitch axis 119, an ankle joint roll axis 120, and a foot 121. In the present specification, the intersection of the hip joint pitch axis 116 and the hip joint roll axis 117 defines the hip joint position of the robot device 1. Although the foot of the human body is actually a structure including a sole with multiple joints and multiple degrees of freedom, the sole of the robot apparatus 1 is assumed to have zero degrees of freedom. Therefore, each leg has six degrees of freedom.
[0040]
Summarizing the above, the robot apparatus 1 as a whole has a total of 3 + 7 × 2 + 3 + 6 × 2 = 32 degrees of freedom. However, the robot device 1 for entertainment is not necessarily limited to 32 degrees of freedom. Needless to say, the degree of freedom, that is, the number of joints, can be appropriately increased or decreased according to design / production constraints and required specifications.
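The tally above can be reproduced with a short sketch. The per-unit counts come from the preceding paragraphs; the dictionary layout and the function name `total_dof` are only illustrative.

```python
# Degrees of freedom per unit, as stated in the description.
DOF_PER_UNIT = {"neck": 3, "arm": 7, "trunk": 3, "leg": 6}
# Number of each unit on the robot (arms and legs come in left/right pairs).
UNIT_COUNT = {"neck": 1, "arm": 2, "trunk": 1, "leg": 2}

def total_dof(dof_per_unit=DOF_PER_UNIT, unit_count=UNIT_COUNT):
    """Sum the joint degrees of freedom over all units."""
    return sum(dof_per_unit[name] * unit_count[name] for name in dof_per_unit)

print(total_dof())  # 3 + 7*2 + 3 + 6*2
```

Changing an entry in either dictionary shows how the total shifts when joints are added or removed to suit design constraints.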
[0041]
Each degree of freedom of the robot device 1 described above is actually implemented using an actuator. The actuators are preferably small and lightweight, to meet requirements such as eliminating excess bulges in the external appearance so as to approximate the human body shape, and controlling the posture of an unstable structure that walks on two legs.
[0042]
FIG. 10 schematically shows a control system configuration of the robot device 1. As shown in the figure, the robot apparatus 1 comprises the trunk unit 2, the head unit 3, the arm units 4R/L, and the leg units 5R/L, which represent the human body and limbs, and a control unit 10 that performs adaptive control for realizing cooperative operation among these units.
[0043]
The operation of the entire robot apparatus 1 is controlled in an integrated manner by the control unit 10. The control unit 10 comprises a main control unit 11 including main processing components (not shown) such as a CPU (Central Processing Unit), a DRAM, and a flash ROM, and a peripheral circuit 12 including interfaces (not shown) for transmitting and receiving commands.
[0044]
The installation location of the control unit 10 is not particularly limited. Although it is mounted on the trunk unit 2 in FIG. 10, it may be mounted on the head unit 3. Alternatively, the control unit 10 may be provided outside the robot device 1 so as to communicate with the body of the robot device 1 by wire or wirelessly.
[0045]
Each joint degree of freedom in the robot apparatus 1 shown in FIG. 10 is realized by the corresponding actuator. In other words, the head unit 3 is provided with a neck joint yaw axis actuator A₂, a neck joint pitch axis actuator A₃, and a neck joint roll axis actuator A₄ that realize the neck joint yaw axis 101, the neck joint pitch axis 102, and the neck joint roll axis 103, respectively.
[0046]
In addition, the head unit 3 is provided with a CCD (Charge Coupled Device) camera for imaging the external situation, a distance sensor for measuring the distance to an object located ahead, a microphone for collecting external sound, a speaker for outputting sound, a touch sensor for detecting pressure received through physical actions by the user such as "stroking" and "hitting", and the like.
[0047]
Further, the trunk unit 2 is provided with a trunk pitch axis actuator A₅, a trunk roll axis actuator A₆, and a trunk yaw axis actuator A₇ that realize the trunk pitch axis 104, the trunk roll axis 105, and the trunk yaw axis 106, respectively. The trunk unit 2 also includes a battery serving as the power supply for starting the robot device 1. This battery is a rechargeable battery.
[0048]
The arm unit 4R/L is subdivided into an upper arm unit 41R/L, an elbow joint unit 42R/L, and a forearm unit 43R/L, and is provided with a shoulder joint pitch axis actuator A₈, a shoulder joint roll axis actuator A₉, an upper arm yaw axis actuator A₁₀, an elbow joint pitch axis actuator A₁₁, an elbow joint roll axis actuator A₁₂, a wrist joint pitch axis actuator A₁₃, and a wrist joint roll axis actuator A₁₄ that realize the shoulder joint pitch axis 107, shoulder joint roll axis 108, upper arm yaw axis 109, elbow joint pitch axis 110, forearm yaw axis 111, wrist joint pitch axis 112, and wrist joint roll axis 113.
[0049]
The leg unit 5R/L is subdivided into a thigh unit 51R/L, a knee unit 52R/L, and a shin unit 53R/L, and is provided with a hip joint yaw axis actuator A₁₆, a hip joint pitch axis actuator A₁₇, a hip joint roll axis actuator A₁₈, a knee joint pitch axis actuator A₁₉, an ankle joint pitch axis actuator A₂₀, and an ankle joint roll axis actuator A₂₁ that realize the hip joint yaw axis 115, hip joint pitch axis 116, hip joint roll axis 117, knee joint pitch axis 118, ankle joint pitch axis 119, and ankle joint roll axis 120. The actuators A₂, A₃, ... used for the joints are more preferably small AC servo actuators of a direct-gear-coupled type in which the servo control system is integrated into a single chip and mounted in the motor unit.
[0050]
For each mechanism unit such as the trunk unit 2, the head unit 3, the arm units 4R/L, and the leg units 5R/L, sub-control units 20, 21, 22R/L, and 23R/L for actuator drive control are provided. Further, ground contact confirmation sensors 30R/L for detecting whether the soles of the leg units 5R/L have landed are mounted, and a posture sensor 31 for measuring posture is provided in the trunk unit 2.
[0051]
The ground contact confirmation sensor 30R/L is configured by, for example, a proximity sensor or a micro switch installed on the sole of the foot. The posture sensor 31 is configured by, for example, a combination of an acceleration sensor and a gyro sensor.
[0052]
Based on the output of the ground contact confirmation sensors 30R/L, it is possible to determine whether each of the left and right legs is currently a support leg or a swing leg during an operation such as walking or running. In addition, the inclination and posture of the trunk can be detected from the output of the posture sensor 31.
[0053]
The main control unit 11 can dynamically correct the control target in response to the output of each of the sensors 30R/L and 31. More specifically, by performing adaptive control on each of the sub-control units 20, 21, 22R/L, and 23R/L, it can realize whole-body motion patterns in which the upper limbs, trunk, and lower limbs of the robot apparatus 1 are driven in a coordinated manner.
[0054]
For whole-body motion of the robot apparatus 1, the foot motion, ZMP (Zero Moment Point) trajectory, trunk motion, upper limb motion, waist height, and the like are set, and commands instructing motion according to these settings are transferred to the sub-control units 20, 21, 22R/L, and 23R/L. Each of the sub-control units 20, 21, ... interprets the command received from the main control unit 11 and outputs a drive control signal to each of the actuators A₂, A₃, .... Here, "ZMP" refers to a point on the floor at which the moment due to the floor reaction force during walking becomes zero, and "ZMP trajectory" refers to the trajectory along which the ZMP moves, for example, during the walking operation of the robot apparatus 1. The concept of ZMP and its application to the stability criterion of walking robots are described in "LEGGED LOCOMOTION ROBOTS" by Miomir Vukobratovic (Ichiro Kato, "Walking Robots and Artificial Feet" (Nikkan Kogyo Shimbun)).
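As a rough illustration of the ZMP definition, consider the simplified case of a flat floor with purely vertical reaction forces at a few contact points. Under that assumption, the point about which the horizontal moment of the reaction forces vanishes is the force-weighted average of the contact positions. The sketch below uses this simplification; the function name and the contact data are hypothetical, and nothing here is taken from the patent's own control method.

```python
def zmp_from_contacts(contacts):
    """Estimate the ZMP on a flat floor from vertical contact forces.

    contacts: iterable of (x, y, fz) tuples, where (x, y) is a contact
    position on the floor and fz the vertical reaction force there.
    Returns the (x, y) point about which the horizontal moments of the
    reaction forces sum to zero.
    """
    contacts = list(contacts)
    total_fz = sum(fz for _, _, fz in contacts)
    if total_fz <= 0:
        raise ValueError("no net ground reaction force (robot not in contact)")
    zx = sum(x * fz for x, _, fz in contacts) / total_fz
    zy = sum(y * fz for _, y, fz in contacts) / total_fz
    return zx, zy

# Two contact points carrying 10 N and 30 N: the ZMP lies three quarters
# of the way toward the more heavily loaded point.
print(zmp_from_contacts([(0.0, 0.0, 10.0), (1.0, 0.0, 30.0)]))
```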
[0055]
As described above, in the robot apparatus 1, each of the sub-control units 20, 21, ... outputs a drive control signal to each of the actuators A₂, A₃, ... to control the drive of each unit. Thereby, the robot apparatus 1 stably transitions to the target posture and can walk in a stable posture.
[0056]
In addition to the posture control described above, the control unit 10 of the robot apparatus 1 collectively processes information from various sensors such as an acceleration sensor, a touch sensor, and a ground contact confirmation sensor, image information from the CCD camera, and audio information from the microphone. In the control unit 10, although not shown, the various sensors such as the acceleration sensor, gyro sensor, touch sensor, distance sensor, microphone, and speaker, each actuator, the CCD camera, and the battery are connected to the main control unit 11 via corresponding hubs.
[0057]
The main control unit 11 sequentially captures sensor data, image data, and audio data supplied from each of the above-described sensors, and sequentially stores these at predetermined positions in the DRAM via the respective internal interfaces. Further, the main control unit 11 sequentially takes in remaining battery power data indicating the remaining battery power supplied from the battery, and stores the data at a predetermined position in the DRAM. The sensor data, image data, audio data, and remaining battery data stored in the DRAM are used when the main control unit 11 controls the operation of the robot device 1.
[0058]
When the power of the robot apparatus 1 is turned on, the main control unit 11 reads a control program and stores it in the DRAM. In addition, based on the sensor data, image data, audio data, and remaining battery data sequentially stored in the DRAM as described above, the main control unit 11 judges its own state and that of its surroundings, and whether there has been any instruction or action from the user.
[0059]
Further, the main control unit 11 determines an action according to its own situation based on this judgment result and the control program stored in the DRAM, and drives the necessary actuators based on the determination result, thereby causing the robot apparatus 1 to take actions such as so-called "body gestures" and "hand gestures".
[0060]
In this way, the robot device 1 can judge its own state and that of its surroundings based on the control program, and can act autonomously according to instructions and actions from the user.
[0061]
By the way, the robot device 1 can act autonomously according to the internal state. Accordingly, an example of a software configuration of a control program in the robot device 1 will be described with reference to FIGS. As described above, this control program is stored in the flash ROM 12 in advance, and is read out at the initial stage of turning on the power of the robot apparatus 1.
[0062]
In FIG. 11, the device driver layer 40 is located at the lowest layer of the control program, and includes a device driver set 41 including a plurality of device drivers. In this case, each device driver is an object permitted to directly access hardware used in a normal computer such as a CCD camera and a timer, and performs processing upon receiving an interrupt from the corresponding hardware.
[0063]
The robotic server object 42 is located at the lowest layer of the device driver layer 40, and comprises: a virtual robot 43, which is a software group that provides an interface for accessing hardware such as the various sensors and actuators 28₁ to 28ₙ described above; a power manager 44, which is a software group that manages switching of power supplies; a device driver manager 45, which is a software group that manages various other device drivers; and a designed robot 46, which is a software group that manages the mechanism of the robot apparatus 1.
[0064]
The manager object 47 includes an object manager 48 and a service manager 49. The object manager 48 is a software group that manages the activation and termination of each software group included in the robotic server object 42, the middleware layer 50, and the application layer 51, and the service manager 49 is a software group that manages the connection of each object based on the connection information between objects described in a connection file stored in the memory card.
[0065]
The middleware layer 50 is located in the layer above the robotic server object 42 and is composed of a software group that provides basic functions of the robot device 1 such as image processing and sound processing. The application layer 51 is located in the layer above the middleware layer 50 and is composed of a software group that determines the action of the robot device 1 based on the processing results produced by the software groups constituting the middleware layer 50.
[0066]
FIG. 12 shows specific software configurations of the middleware layer 50 and the application layer 51.
[0067]
As shown in FIG. 12, the middleware layer 50 comprises a recognition system 70, which has signal processing modules 60 to 68 for noise detection, temperature detection, brightness detection, scale recognition, distance detection, posture detection, touch sensing, motion detection, and color recognition together with an input semantics converter module 69, and an output system 79, which has an output semantics converter module 78 together with signal processing modules 71 to 77 for posture management, tracking, motion reproduction, walking, fall recovery, LED lighting, and sound reproduction.
[0068]
Each of the signal processing modules 60 to 68 of the recognition system 70 takes in the corresponding data among the sensor data, image data, and audio data read from the DRAM by the virtual robot 43 of the robotic server object 42, performs predetermined processing based on that data, and gives the processing result to the input semantics converter module 69. Here, for example, the virtual robot 43 is configured as a part that exchanges or converts signals according to a predetermined communication protocol.
[0069]
Based on the processing results given from the signal processing modules 60 to 68, the input semantics converter module 69 recognizes its own state and that of its surroundings, such as "noisy", "hot", "bright", "detected a ball", "detected a fall", "stroked", "hit", "heard the do-mi-so scale", "detected a moving object", or "detected an obstacle", as well as commands and actions from the user, and outputs the recognition results to the application layer 51.
[0070]
As shown in FIG. 13, the application layer 51 includes five modules: a behavior model library 80, a behavior switching module 81, a learning module 82, an emotion model 83, and an instinct model 84.
[0071]
As shown in FIG. 14, the behavior model library 80 is provided with independent behavior models corresponding to several preselected condition items such as "when the remaining battery power is low", "when recovering from a fall", "when avoiding obstacles", "when expressing emotions", and "when a ball is detected".
[0072]
When a recognition result is given from the input semantics converter module 69, or when a certain period of time has passed since the last recognition result was given, each of these behavior models determines the subsequent action, referring as necessary to the corresponding emotion parameter value held in the emotion model 83 and the corresponding desire parameter value held in the instinct model 84, as described later, and outputs the determination result to the action switching module 81.
[0073]
In this embodiment, as the method for determining the next action, each behavior model uses an algorithm called a finite probability automaton, which, as shown in the drawings, stochastically determines to which of the nodes NODE₀ to NODEₙ a transition is made from one node (state) NODE₀ to NODEₙ, based on the transition probabilities P₁ to Pₙ set for the arcs ARC₁ to ARCₙ₁ connecting the nodes NODE₀ to NODEₙ.
[0074]
Specifically, each behavior model has, for each of the nodes NODE₀ to NODEₙ that form that behavior model, a state transition table 90 as shown in the drawings.
[0075]
In this state transition table 90, the input events (recognition results) serving as transition conditions at the nodes NODE₀ to NODEₙ are listed in order of priority in the column "input event name", and further conditions on each transition condition are described in the corresponding rows of the columns "data name" and "data range".
[0076]
Therefore, at the node NODE₁₀₀ represented by the state transition table 90, when the recognition result "ball detected (BALL)" is given, the condition for transitioning to another node is that the "size (SIZE)" of the ball given together with the recognition result is in the range "0 to 1000"; and when the recognition result "obstacle detected (OBSTACLE)" is given, the condition for transitioning to another node is that the "distance (DISTANCE)" to the obstacle given together with the recognition result is in the range "0 to 100".
[0077]
Also, at this node NODE₁₀₀, even when no recognition result is input, a transition to another node can be made when, among the parameter values of the emotions and desires held in the emotion model 83 and the instinct model 84 that the behavior model refers to periodically, the parameter value of any of "joy", "surprise", or "sadness" held in the emotion model 83 is in the range "50 to 100".
[0078]
Further, in the state transition table 90, the names of the nodes to which a transition can be made from the nodes NODE₀ to NODEₙ are listed in the row "transition destination node" of the column "transition probability to another node". The probability of transitioning to each of the other nodes NODE₀ to NODEₙ when all the conditions described in the columns "input event name", "data name", and "data range" are met is described in the corresponding place in the column "transition probability to another node", and the action to be output when transitioning to that node NODE₀ to NODEₙ is described in the row "output action" of the same column. Note that the sum of the probabilities in each row of the column "transition probability to another node" is 100 [%].
[0079]
Therefore, at the node NODE₁₀₀ represented by the state transition table 90, for example, when the recognition result "ball detected (BALL)" is given and the "SIZE (size)" of the ball is in the range "0 to 1000", a transition to "node NODE₁₂₀ (node 120)" can be made with a probability of "30 [%]", and the action "ACTION1" is output at that time.
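The table-driven stochastic transition described above can be sketched as follows. Only the BALL and OBSTACLE conditions and the 30 [%] transition to NODE120 with ACTION1 come from the text; the remaining 70 [%] entry, the OBSTACLE destination, the action name `MOVE_BACK`, and the data layout are hypothetical placeholders chosen so that each row sums to 100 [%].

```python
import random

# Hypothetical excerpt of a state transition table for NODE100.
# Each row: input event, data name, data range, and a list of
# (destination node, probability in percent, output action).
TABLE = {
    "NODE100": [
        {"event": "BALL", "data": "SIZE", "range": (0, 1000),
         "transitions": [("NODE120", 30, "ACTION1"),   # from the text
                         ("NODE100", 70, None)]},      # assumed remainder
        {"event": "OBSTACLE", "data": "DISTANCE", "range": (0, 100),
         "transitions": [("NODE100", 100, "MOVE_BACK")]},  # assumed
    ],
}

def step(node, event, value, rng=random):
    """Pick the next node and output action for a recognition result."""
    for row in TABLE[node]:
        low, high = row["range"]
        if row["event"] == event and low <= value <= high:
            dests = row["transitions"]
            weights = [prob for _, prob, _ in dests]
            assert sum(weights) == 100  # each row must sum to 100 [%]
            dest, _, action = rng.choices(dests, weights=weights)[0]
            return dest, action
    return node, None  # no transition condition met: stay put

rng = random.Random(0)
print(step("NODE100", "BALL", 500, rng))
```

Passing a seeded `random.Random` instance keeps runs reproducible, which is convenient when checking that the observed transition frequencies approach the tabulated probabilities.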
[0080]
Each behavior model is configured so that the nodes NODE₀ to NODEₙ, each described by such a state transition table 90, are connected to one another. When a recognition result is given from the input semantics converter module 69, the next action is stochastically determined using the state transition table of the corresponding node NODE₀ to NODEₙ, and the determination result is output to the action switching module 81.
[0081]
The action switching module 81 illustrated in FIG. 14 selects, from among the actions output from the behavior models of the behavior model library 80, the action output from the behavior model with the highest predetermined priority, and sends a command to execute that action (hereinafter referred to as an action command) to the output semantics converter module 78 of the middleware layer 50. Note that, in this embodiment, a behavior model shown lower in FIG. 15 has a higher priority.
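The priority-based selection performed by the action switching module 81 can be sketched as below. The model names, their ordering, and the action strings are hypothetical; the text only states that a predetermined priority order exists among the behavior models.

```python
def select_action(proposals, priority):
    """Choose the action from the highest-priority behavior model.

    proposals: dict mapping behavior model name -> proposed action,
               or None if that model proposes nothing this cycle
    priority:  list of model names, highest priority first (assumed order)
    Returns (model name, action), or None if no model proposed an action.
    """
    for model in priority:
        action = proposals.get(model)
        if action is not None:
            return model, action
    return None

# Hypothetical priority order and proposals for one cycle.
PRIORITY = ["low_battery", "fall_recovery", "obstacle_avoidance", "ball"]
proposals = {"ball": "CHASE_BALL", "obstacle_avoidance": "STEP_ASIDE"}
print(select_action(proposals, PRIORITY))
```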
[0082]
Further, the behavior switching module 81 notifies the learning module 82, the emotion model 83, and the instinct model 84 that the behavior is completed, based on the behavior completion information provided from the output semantics converter module 78 after the behavior is completed.
[0083]
Meanwhile, the learning module 82 receives, from among the recognition results given from the input semantics converter module 69, the recognition results of teaching received as actions from the user, such as "hit" or "stroked".
[0084]
Then, based on this recognition result and the notification from the action switching module 81, the learning module 82 changes the corresponding transition probability of the corresponding behavior model in the behavior model library 80 so as to lower the occurrence probability of the action when the robot has been "hit (scolded)" and to raise the occurrence probability of the action when it has been "stroked (praised)".
[0085]
The emotion model 83, for its part, holds, for each of a total of six emotions, namely "joy", "sadness", "anger", "surprise", "disgust", and "fear", a parameter indicating the intensity of that emotion. The emotion model 83 periodically updates the parameter values of these emotions based on specific recognition results such as "hit" and "stroked" given from the input semantics converter module 69, the elapsed time, the notification from the action switching module 81, and the like.
[0086]
Specifically, let ΔE[t] be the amount of variation of an emotion at that time, calculated by a predetermined arithmetic expression based on the recognition result given from the input semantics converter module 69, the behavior of the robot device 1 at that time, the elapsed time since the last update, and the like; let E[t] be the current parameter value of the emotion; and let ke be the coefficient representing the sensitivity of the emotion. The emotion model 83 calculates the parameter value E[t+1] of the emotion in the next cycle using the following equation (11), and updates the parameter value of the emotion by replacing the current parameter value E[t] with this result. The emotion model 83 updates the parameter values of all emotions in the same way.
$$E[t+1] = E[t] + k_e \times \Delta E[t] \tag{11}$$
The degree to which each recognition result and the notification from the output semantics converter module 78 affect the variation ΔE[t] of the parameter value of each emotion is determined in advance. For example, a recognition result such as "hit" has a large effect on the variation ΔE[t] of the parameter value of the emotion "anger", and a recognition result such as "stroked" has a large effect on the variation ΔE[t] of the parameter value of the emotion "joy".
[0087]
Here, the notification from the output semantics converter module 78 is so-called action feedback information (action completion information), that is, information on the result of an action having appeared, and the emotion model 83 also changes the emotions based on such information. For example, a behavior such as "shouting" lowers the emotion level of anger. Note that the notification from the output semantics converter module 78 is also input to the learning module 82 described above, and the learning module 82 changes the corresponding transition probability of the behavior model based on the notification.
[0088]
The feedback of the action result may be made by the output of the action switching module 81 (the action to which the emotion is added).
[0089]
On the other hand, the instinct model 84 holds, for each of four mutually independent desires, “exercise”, “affection”, “appetite”, and “curiosity”, a parameter indicating the strength of that desire. The instinct model 84 periodically updates the parameter values of these desires based on the recognition result given from the input semantics converter module 69, the elapsed time, the notification from the action switching module 81, and the like.
[0090]
Specifically, for “exercise desire”, “affection desire”, and “curiosity”, the instinct model 84 calculates the variation ΔI [k] of the desire at that time by a predetermined arithmetic expression based on the recognition result, the elapsed time, the notification from the output semantics converter module 78, and the like. With I [k] as the current parameter value of the desire and ki as the coefficient representing the sensitivity of the desire, it calculates, in a predetermined cycle, the parameter value I [k + 1] of the desire in the next cycle using the following equation (12), and updates the parameter value of the desire by replacing the current parameter value I [k] with this result. The instinct model 84 updates the parameter values of each desire except “appetite” in the same way.
I [k + 1] = I [k] + ki × ΔI [k]
The degree to which the recognition result and the notification from the output semantics converter module 78 affect the variation ΔI [k] of the parameter value of each desire is determined in advance. For example, the notification from the output semantics converter module 78 has a large effect on the variation ΔI [k] of the parameter value of “fatigue”.
[0091]
In this specific example, the parameter value of each emotion and each desire (instinct) is restricted to the range from 0 to 100, and the values of the coefficients ke and ki are set individually for each emotion and each desire.
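As a rough illustration of equations (11) and (12) together with the range of 0 to 100 described above, the update could be sketched in Python as follows. The class name `ParameterModel`, the starting values, the sensitivity coefficients, and the variation amounts are all hypothetical. The specification does not disclose the arithmetic expression that yields ΔE [t] or ΔI [k], so the variations are simply passed in by the caller.

```python
from dataclasses import dataclass

PARAM_MIN, PARAM_MAX = 0.0, 100.0  # range stated in the text


def clamp(value, low=PARAM_MIN, high=PARAM_MAX):
    """Keep a parameter value inside the permitted range."""
    return max(low, min(high, value))


@dataclass
class ParameterModel:
    """Hypothetical holder for emotion or desire parameters.

    values      -- current parameter value per emotion/desire (E[t] or I[k])
    sensitivity -- coefficient per emotion/desire (ke or ki), set individually
    """
    values: dict
    sensitivity: dict

    def update(self, deltas):
        """Apply new = current + coefficient * variation, then clamp."""
        for name, delta in deltas.items():
            gain = self.sensitivity[name]
            self.values[name] = clamp(self.values[name] + gain * delta)
        return self.values


# Illustrative numbers only: a "hit" recognition result is assumed to
# produce a large positive variation for "anger".
emotion = ParameterModel(
    values={"joy": 50.0, "anger": 50.0},
    sensitivity={"joy": 0.5, "anger": 2.0},
)
emotion.update({"anger": 30.0, "joy": -10.0})
```

With these assumed numbers, "anger" would rise past the upper bound and is held at 100, while "joy" falls to 45. The same class can stand in for the instinct model by passing the desires and their ki coefficients, and leaving "appetite" out of the variations passed to `update`.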
[0092]
On the other hand, as shown in FIG. 12, the output semantics converter module 78 of the middleware layer 50 gives action commands such as “move forward”, “be pleased”, “squeal”, or “tracking (following the ball)”, given from the action switching module 81 of the application layer 51 as described above, to the corresponding signal processing modules 71 to 77 of the output system 79.
[0093]
When an action command is given, the signal processing modules 71 to 77 generate, based on the action command, servo command values to be given to the corresponding actuators to perform the action, audio data of the sound to be output from the speaker, and/or drive data to be given to the LEDs, and sequentially send these data to the corresponding actuators, speaker, or LEDs via the virtual robot 43 of the robotic server object 42 and the signal processing circuit.
[0094]
In this way, the robot apparatus 1 can perform autonomous actions according to its own (internal) and surrounding (external) conditions, and instructions and actions from the user, based on the above-described control program.
[0095]
Such a control program is provided via a recording medium on which it is recorded in a format readable by the robot device. Examples of recording media for the control program include magnetic-reading media (for example, a magnetic tape, a flexible disk, or a magnetic card) and optical-reading media (for example, a CD-ROM, an MO, a CD-R, or a DVD). Recording media also include storage media such as semiconductor memories (so-called memory cards, whether rectangular, square, or of any other shape, and IC cards). The control program may also be provided via the so-called Internet or the like.
[0096]
These control programs are reproduced via a dedicated read driver device, a personal computer, or the like, and transmitted to the robot device 1 via a wired or wireless connection to be read. When the robot device 1 includes a drive device for a miniaturized storage medium such as a semiconductor memory or an IC card, the control program can be read directly from the storage medium.
[0097]
The face recognition processing will be described below. The robot apparatus includes a face detector that detects a face area from an image frame. The face detector receives image data captured by the small camera and reduces it into scale images of nine stages. It searches all of these images for rectangular areas corresponding to a face, reduces overlapping candidate areas, and detects face information (FaceInfo) such as the position, size, and feature amount of each area finally determined to be a face. The detected face information is sent to the short-term storage unit.
[0098]
The short-term storage unit is an object that holds information on the external environment of the robot device 1 for a relatively short time. It receives a sound (voice) detection result from the sound detection unit, a color detection result from the color detection unit, a face detection result from the face detection unit, and a joint angle sensor output from the joint angle detection unit. It integrates these pieces of detection information so as to maintain consistency in time and space, treats them as meaningful integrated information, and holds them for a relatively short time, for example, 15 seconds. The integrated information is passed to the operation command unit and the neck operation generation unit.
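The holding behavior described above can be sketched as a small store keyed by object ID whose entries expire after a retention period. This is only an illustration: the class name `ShortTermMemory`, the dictionary layout, and the merge-by-ID rule are assumptions, and the actual integration in time and space is not specified in the text. The current time is passed in explicitly so the example is deterministic.

```python
class ShortTermMemory:
    """Hypothetical store that keeps integrated detection info briefly."""

    def __init__(self, retention_seconds=15.0):
        # 15 seconds is the example retention time given in the text.
        self.retention_seconds = retention_seconds
        self._items = {}  # object_id -> (last_update_time, info dict)

    def store(self, object_id, info, now):
        """Merge new detection info into the entry for object_id."""
        _, existing = self._items.get(object_id, (now, {}))
        merged = {**existing, **info}
        self._items[object_id] = (now, merged)

    def expire(self, now):
        """Drop entries not updated within the retention period."""
        self._items = {
            object_id: (stamp, info)
            for object_id, (stamp, info) in self._items.items()
            if now - stamp <= self.retention_seconds
        }

    def get(self, object_id, now):
        """Return the integrated info for object_id, or None if expired."""
        self.expire(now)
        entry = self._items.get(object_id)
        return entry[1] if entry else None


memory = ShortTermMemory()
memory.store("person-1", {"face_position": (120, 80)}, now=0.0)
memory.store("person-1", {"voice_direction_deg": 15.0}, now=5.0)
info = memory.get("person-1", now=10.0)   # both detections, merged
gone = memory.get("person-1", now=25.0)   # None: older than 15 seconds
```

A higher module would then refer to an entry only by its object ID, which matches the way the tracking command is described in the following paragraphs.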
[0099]
The operation command unit is a higher module such as a situation-dependent behavior layer. It generates a command instructing the neck operation generation unit to move the neck. The robot apparatus 1 can perform the tracking operation simply by specifying an ID (object ID) related to the integrated information stored in the short-term storage unit, that is, simply by specifying what to track.
[0100]
The neck motion generation unit is a module that calculates the joint angles of the neck on receiving a command to move the neck from the motion command unit. When it receives a “tracking” command (in the present embodiment, the target object ID), it calculates and outputs the joint angles of the neck for the direction in which the object is present, based on information received from the short-term storage unit. On receiving the object ID, it selects the most suitable object information and turns the neck in the direction in which that object can be captured.
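One way to picture the joint angle calculation is a two-axis (yaw and pitch) neck that is pointed at the position of the tracked object. The sketch below is not the method of the specification, which does not give the calculation. The coordinate frame (x forward, y left, z up, with the origin at the neck), the function names, and the angle limits are all assumptions made for illustration.

```python
import math

# Hypothetical mechanical limits of the neck joints, in radians.
YAW_LIMIT = math.radians(60.0)
PITCH_LIMIT = math.radians(30.0)


def neck_angles_toward(x, y, z):
    """Return (yaw, pitch) pointing at a target in the assumed neck frame.

    Assumed frame: x forward, y to the left, z up. Positive yaw turns
    left and positive pitch looks up.
    """
    yaw = math.atan2(y, x)
    pitch = math.atan2(z, math.hypot(x, y))
    return yaw, pitch


def clamp(angle, limit):
    """Restrict an angle to the range [-limit, +limit]."""
    return max(-limit, min(limit, angle))


def tracking_command(target_position):
    """Joint angles for a tracked object, limited to the reachable range."""
    yaw, pitch = neck_angles_toward(*target_position)
    return clamp(yaw, YAW_LIMIT), clamp(pitch, PITCH_LIMIT)


# Target ahead and to the left at the same height as the neck.
yaw, pitch = tracking_command((1.0, 1.0, 0.0))
print(math.degrees(yaw), math.degrees(pitch))
```

When the target lies outside the assumed limits, the command saturates at the limit, so the neck alone cannot bring the face into view.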
[0101]
When recognizing a face, it is necessary to detect the face, point the CCD camera in the direction of the face, and bring the face into the field of view of the camera. Because the motor that moves the robot's camera is slow or its rotation angle range is limited, there is a problem that it takes time to detect a face. According to the present invention, however, whether or not a face to be subjected to face recognition is included in the visual field of the robot apparatus can be indicated easily. The robot apparatus can therefore perform the face recognition processing quickly.
[0102]
Further, the robot apparatus may execute the face recognition situation presenting method of the present invention while interacting with the user to be subjected to face recognition, and present whether or not the face to be subjected to face recognition is included in the visual field.
[0103]
The application of the present invention is not limited only to the above embodiment. The following modifications can be given. For example, the mirror surface may be installed at another location so as to project a range equivalent to the field of view of the robot device. Further, a margin may be provided between the display range of the mirror and the viewing angle of the camera so that the confirmation can be made more reliably. Further, the convex mirror portion of the cover may have a shape other than a spherical shape, for example, a conical shape or a polygonal pyramid shape.
[0104]
Further, the present invention may be applied to an information processing apparatus that captures images using a small camera, such as a mobile phone, a portable information processing terminal (PDA), or a portable personal computer. In that case, the apparatus is provided with a cover portion surrounding a hole that does not block the viewing angle of the lens of the small camera, the cover portion is mirror-finished, and the image reflected on it is used to present the imaging situation.
[0105]
[Effect of the invention]
According to the present invention, in a robot apparatus that recognizes a human face present in an external environment using a small camera, a cover section surrounding a hole that does not block the viewing angle of the lens of the small camera is mirror-finished and used to present the face recognition situation, so that whether or not a face to be subjected to face recognition is included in the visual field of the robot device can be presented easily.
[Brief description of the drawings]
FIG. 1 is a diagram showing an external configuration of a robot apparatus, and is a perspective view showing a humanoid biped walking robot apparatus.
FIG. 2 is a diagram showing an arrangement of eyes provided on a head.
FIG. 3 is an enlarged view of an eye part.
FIG. 4 is a sectional view of a head.
FIG. 5 is a cross-sectional view of an eye and also shows a viewing angle.
FIG. 6 is a diagram for explaining a method of presenting a face recognition situation.
FIG. 7 is a block diagram schematically illustrating a robot device.
FIG. 8 is a schematic diagram illustrating a configuration of software that operates the robot device.
FIG. 9 is a diagram schematically illustrating a degree of freedom configuration model of the robot device according to the embodiment of the present invention.
FIG. 10 is a block diagram showing a circuit configuration of the robot device.
FIG. 11 is a block diagram showing a software configuration of the robot device.
FIG. 12 is a block diagram showing a configuration of a middleware layer in a software configuration of the robot apparatus.
FIG. 13 is a block diagram showing a configuration of an application layer in the software configuration of the robot device.
FIG. 14 is a block diagram showing a configuration of a behavior model library of an application layer.
FIG. 15 is a diagram illustrating a finite probability automaton serving as information for determining an action of the robot apparatus.
FIG. 16 is a diagram showing a state transition table prepared for each node of the finite probability automaton.
[Explanation of symbols]
1 robot device, 3 head, 400L, 400R eye portions, 410 cover portion (mirror surface), 412 hole portion

Claims

A robot device that recognizes a human face present in an external environment using a small camera, comprising:
a cover portion surrounding a hole that does not block the viewing angle of the lens of the small camera,
wherein the cover portion is mirror-finished and used for presenting the face recognition situation.

The robot device according to claim 1, wherein the cover is formed by depositing aluminum, silver, or chromium on a plastic that forms a part of a spherical surface.

A method of presenting a face recognition situation by a robot device that recognizes a human face present in an external environment using a small camera, the method comprising:
a step of reflecting the human face to be subjected to the face recognition on a mirror-finished cover portion surrounding a hole that does not block the viewing angle of the lens of the small camera,
wherein the face recognition situation is presented by a state in which the face is reflected on the cover portion by the reflecting step, or a state in which the face falls within the hole and is not reflected.