JP4131392B2

JP4131392B2 - Robot apparatus, robot control method, recording medium, and program

Info

Publication number: JP4131392B2
Application number: JP2003019065A
Authority: JP
Inventors: 康治浅野
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2003-01-28
Filing date: 2003-01-28
Publication date: 2008-08-13
Anticipated expiration: 2023-01-28
Also published as: JP2004230480A

Abstract

PROBLEM TO BE SOLVED: To improve the recognition accuracy of a robot's recognition function. SOLUTION: When an action decided by an action deciding mechanism part 103 requires recognition processing, the action deciding mechanism part 103 sends a distance estimating part 111 a command to adjust the distance between the robot and a user to a distance appropriate for recognition processing. A distance determining part 113 determines whether the distance between the robot and the user estimated by the distance estimating part 111 is within a prescribed range supplied from a threshold setting part 112. When the distance is determined to be outside the prescribed range, a distance adjusting part 114 sends the action deciding mechanism part 103 a command to adjust the distance between the robot and the user. Based on the command output from the distance adjusting part 114, the action deciding mechanism part 103 outputs action command information to an attitude transition mechanism part 104 or a voice synthesizing part 105. This constitution is applicable to, for instance, a robot. COPYRIGHT: (C)2004,JPO&NCIPI

Description

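[0001]
[Technical Field of the Invention]
The present invention relates to a robot apparatus, a robot control method, a recording medium, and a program, and in particular to a robot apparatus, a robot control method, a recording medium, and a program that can improve the recognition accuracy of a robot's recognition function by, for example, adjusting the distance between a user and the robot.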
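[0002]
[Prior Art]
In recent years, robots equipped with recognition functions such as a voice recognition device or an image recognition device (in this specification, including stuffed-toy-like ones) have been commercialized as toys and the like. For example, a robot equipped with a voice recognition device recognizes speech uttered by a user and, based on the recognition result, autonomously performs actions such as making a gesture or outputting synthesized sound.
[0003]
When such a robot recognizes speech uttered by a user who is too far away, the signal value of the user's speech waveform picked up by the microphone mounted on the robot is attenuated, and the noise level becomes relatively high. That is, the S/N ratio (signal-to-noise ratio) of the user's speech signal acquired by the microphone is low. In addition, in general, the greater the distance between the user (speaker) and the robot (the microphone mounted on it), the more strongly the waveform of the speech signal is affected by reverberation. Consequently, when the user is too far from the robot, the recognition accuracy of the robot's voice recognition device deteriorates.
[0004]
Conversely, when the user is too close to the robot, the signal value of the user's speech waveform picked up by the microphone exceeds the range the microphone can detect. The acquired speech waveform is therefore saturated and distorted relative to the original waveform. The voice recognition device then has to recognize this distorted waveform, so the accuracy of voice recognition deteriorates.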
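[0005]
A method has therefore been proposed that, together with voice recognition, performs situation detection such as ambient noise detection (detecting the influence of ambient noise) and insufficient-power and excessive-power detection (detecting situations in which the power of the input speech satisfies a specific threshold condition), and uses the voice recognition result and the situation detection result to address the degradation of voice recognition accuracy in robots (see, for example, Non-Patent Document 1).
[0006]
[Non-Patent Document 1]
岩沢, 大中, 藤田, 「状況検知を利用したロボット用音声認識インタフェースの一手法とその評価」 (A speech recognition interface method for robots using situation detection, and its evaluation), 人工知能学会研究会資料 (SIG technical report), Japanese Society for Artificial Intelligence (社団法人人工知能学会), November 2002, pp. 33-38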
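[0007]
[Problems to Be Solved by the Invention]
However, the method described in Non-Patent Document 1 does not take into account the distance between the user (speaker) and the robot (the microphone mounted on it). As a result, the following problem can arise: even though the user is speaking to the robot in a loud voice, the speech signal input to the microphone has a low S/N ratio because the user is far from the robot, the user's speech cannot be recognized, and the robot asks the user to speak even louder.
[0008]
Likewise, in a robot equipped with an image recognition device that identifies the user from a captured image of the user, identification accuracy deteriorates when the user is too far from the robot, owing to factors such as the resolution of the imaging device. For example, when the image recognition device identifies the user from a face image captured by the imaging device, the farther the user is from the imaging device, the fewer pixels the face region occupies in the captured image. Fewer effective pixels are then available for recognition, so the recognition accuracy of the image recognition device may deteriorate.
[0009]
When the user is too close to the robot, the whole of the user's face region may not fit within the frame of the image output by the imaging device, and the accuracy with which the image recognition device identifies the user may deteriorate.
[0010]
The present invention has been made in view of these circumstances. When a recognition device for voice, images, or the like mounted on a robot recognizes the voice, image, or the like of a recognition target such as a user, the invention improves the recognition accuracy of the recognition device by adjusting the distance between the robot and the user so that it becomes a distance appropriate for the recognition device.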
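[0011]
[Means for Solving the Problems]
The robot apparatus of the present invention is a robot apparatus having a recognition function for recognizing a predetermined recognition target, comprising: a plurality of imaging means for imaging the surroundings and outputting image signals; voice input means for inputting voice; ultrasonic output means for emitting an ultrasonic pulse and receiving the wave reflected from the recognition target; distance estimation means for estimating the distance to the recognition target based on the outputs of the plurality of imaging means, the voice input means, and the ultrasonic output means; and distance adjustment means for adjusting the distance to the recognition target based on the distance estimated by the distance estimation means. When the difference between the distance estimated from the output of the ultrasonic output means and the distance estimated from the outputs of the plurality of imaging means and the voice input means is large, and the distance estimated from the output of the ultrasonic output means is short, the distance estimation means determines that the object detected by the ultrasonic output means is an obstacle and estimates the distance to the recognition target with the output of the ultrasonic output means excluded. When it is determined that there is an obstacle, the distance adjustment means adjusts the distance to the user by causing the robot apparatus to act so that the user moves; when it is determined that there is no obstacle, it adjusts the distance to the recognition target by moving the robot apparatus.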
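[0012]
The distance estimation means can estimate the distance to the recognition target by stereo processing of the image signals output by the plurality of imaging means.
[0013]
The recognition target recognized by the plurality of imaging means may be a user, and the distance estimation means can detect the user's face region from the image signals output by the imaging means and estimate the distance to the user based on that face region.
[0014]
The recognition target recognized by the voice input means may be a user, and the distance estimation means can estimate the distance to the user based on the loudness of the user's voice input to the voice input means.
[0018]
Voice output means for outputting voice may further be provided, and the distance adjustment means can adjust the distance to the user by causing the voice output means to output voice so that the user moves.
[0019]
The distance adjustment means can adjust the distance to the user by causing the robot apparatus to perform a motion that prompts the user to move.
[0020]
Determination means for determining whether the distance estimated by the distance estimation means is within a predetermined range may further be provided, and the distance adjustment means can adjust the distance to the recognition target based on the determination result of the determination means.
[0021]
Range setting means for setting the predetermined range may further be provided.
[0022]
The range setting means can measure the ambient background noise based on the output of the voice input means and set the predetermined range dynamically according to the level of that background noise.
The range setting means can also acquire the operating state of the robot apparatus, estimate the noise component generated by the robot apparatus itself, and set the predetermined range dynamically according to the level of that noise component.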
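[0023]
The robot control method of the present invention is a robot control method for controlling a robot apparatus having a recognition function for recognizing a predetermined recognition target, comprising: an imaging step of imaging the surroundings and outputting image signals; a voice input step of inputting voice; an ultrasonic output step of emitting an ultrasonic pulse and receiving the wave reflected from the recognition target; a distance estimation step of estimating the distance to the recognition target based on the results of the imaging step, the voice input step, and the ultrasonic output step; and a distance adjustment step of adjusting the distance to the recognition target based on the result of the distance estimation step. In the distance estimation step, when the difference between the distance estimated from the result of the ultrasonic output step and the distance estimated from the results of the imaging step and the voice input step is large, and the distance estimated from the result of the ultrasonic output step is short, the object that reflected the ultrasonic pulse is determined to be an obstacle, and the distance to the recognition target is estimated with the result of the ultrasonic output step excluded. In the distance adjustment step, when it is determined that there is an obstacle, the distance to the user is adjusted by causing the robot apparatus to act so that the user moves; when it is determined that there is no obstacle, the distance to the recognition target is adjusted by moving the robot apparatus.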
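[0024]
The program on the recording medium of the present invention is a program that causes a computer to control a robot apparatus having a recognition function for recognizing a predetermined recognition target, comprising: an imaging step of imaging the surroundings and outputting image signals; a voice input step of inputting voice; an ultrasonic output step of emitting an ultrasonic pulse and receiving the wave reflected from the recognition target; a distance estimation step of estimating the distance to the recognition target based on the results of the imaging step, the voice input step, and the ultrasonic output step; and a distance adjustment step of adjusting the distance to the recognition target based on the result of the distance estimation step. In the distance estimation step, when the difference between the distance estimated from the result of the ultrasonic output step and the distance estimated from the results of the imaging step and the voice input step is large, and the distance estimated from the result of the ultrasonic output step is short, the object that reflected the ultrasonic pulse is determined to be an obstacle, and the distance to the recognition target is estimated with the result of the ultrasonic output step excluded. In the distance adjustment step, when it is determined that there is an obstacle, the distance to the user is adjusted by causing the robot apparatus to act so that the user moves; when it is determined that there is no obstacle, the distance to the recognition target is adjusted by moving the robot apparatus.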
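[0025]
The program of the present invention is a program that causes a computer to control a robot apparatus having a recognition function for recognizing a predetermined recognition target, and causes the computer to execute processing comprising: an imaging step of imaging the surroundings and outputting image signals; a voice input step of inputting voice; an ultrasonic output step of emitting an ultrasonic pulse and receiving the wave reflected from the recognition target; a distance estimation step of estimating the distance to the recognition target based on the results of the imaging step, the voice input step, and the ultrasonic output step; and a distance adjustment step of adjusting the distance to the recognition target based on the result of the distance estimation step. In the distance estimation step, when the difference between the distance estimated from the result of the ultrasonic output step and the distance estimated from the results of the imaging step and the voice input step is large, and the distance estimated from the result of the ultrasonic output step is short, the object that reflected the ultrasonic pulse is determined to be an obstacle, and the distance to the recognition target is estimated with the result of the ultrasonic output step excluded. In the distance adjustment step, when it is determined that there is an obstacle, the distance to the user is adjusted by causing the robot apparatus to act so that the user moves; when it is determined that there is no obstacle, the distance to the recognition target is adjusted by moving the robot apparatus.
[0026]
In the present invention, when the difference between the distance estimated from the ultrasonic output result and the distance estimated from the imaging result and the voice input result is large, and the distance estimated from the ultrasonic output result is short, the object that reflected the ultrasonic pulse is determined to be an obstacle, and the distance to the recognition target is estimated with the ultrasonic output result excluded. When it is determined that there is an obstacle, the distance to the user is adjusted by causing the robot apparatus to act so that the user moves; when it is determined that there is no obstacle, the distance to the recognition target is adjusted by moving the robot apparatus.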
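[0027]
[Embodiments of the Invention]
FIG. 1 is a front perspective view of a biped walking robot 1 to which the present invention is applied, and FIG. 2 is a perspective view of the robot 1 from the rear. FIG. 3 is a perspective view for explaining the axis configuration of the robot 1.
[0028]
The robot 1 is configured with a head unit 12 disposed on top of a trunk unit 11, arm units 13A and 13B of identical construction attached at predetermined positions on the upper left and right of the trunk unit 11, and leg units 14A and 14B of identical construction attached at predetermined positions on the lower left and right of the trunk unit 11. The head unit 12 is provided with a touch sensor 51.
[0029]
In the trunk unit 11, a frame 21 forming the upper trunk and a waist base 22 forming the lower trunk are connected via a waist joint mechanism 23. By driving actuators A1 and A2 of the waist joint mechanism 23, which is fixed to the waist base 22 of the lower trunk, the upper trunk can be rotated independently about the mutually orthogonal roll axis 24 and pitch axis 25 shown in FIG. 3.
[0030]
The head unit 12 is attached, via a neck joint mechanism 27, to the center of the upper surface of a shoulder base 26 fixed to the upper end of the frame 21. By driving actuators A3 and A4 of the neck joint mechanism 27, the head unit 12 can be rotated independently about the mutually orthogonal pitch axis 28 and yaw axis 29 shown in FIG. 3.
[0031]
The arm units 13A and 13B are attached to the left and right of the shoulder base 26 via shoulder joint mechanisms 30. By driving actuators A5 and A6 of the corresponding shoulder joint mechanism 30, each arm unit can be rotated independently about the mutually orthogonal pitch axis 31 and roll axis 32 shown in FIG. 3.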
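[0032]
In each of the arm units 13A and 13B, an actuator A8 forming the forearm is connected, via an elbow joint mechanism 33, to the output shaft of an actuator A7 forming the upper arm, and a hand 34 is attached to the tip of the forearm.
[0033]
In the arm units 13A and 13B, driving the actuator A7 rotates the forearm about the yaw axis 35 shown in FIG. 3, and driving the actuator A8 rotates the forearm about the pitch axis 36 shown in FIG. 3.
[0034]
The leg units 14A and 14B are attached to the waist base 22 of the lower trunk via hip joint mechanisms 37. By driving actuators A9 to A11 of the corresponding hip joint mechanism 37, each leg unit can be rotated independently about the mutually orthogonal yaw axis 38, roll axis 39, and pitch axis 40 shown in FIG. 3.
[0035]
In each of the leg units 14A and 14B, the lower end of a frame 41 forming the thigh is connected, via a knee joint mechanism 42, to a frame 43 forming the lower leg, and the lower end of the frame 43 is connected to a foot 45 via an ankle joint mechanism 44.
[0036]
In the leg units 14A and 14B, driving an actuator A12 forming the knee joint mechanism 42 rotates the lower leg about the pitch axis 46 shown in FIG. 3, and driving actuators A13 and A14 of the ankle joint mechanism 44 rotates the foot 45 independently about the mutually orthogonal pitch axis 47 and roll axis 48 shown in FIG. 3.
[0037]
On the back side of the waist base 22 forming the lower trunk of the trunk unit 11, a control unit 52 is disposed. The control unit 52 is a box housing a main control part 61, peripheral circuits 62 (both shown in FIG. 4), and other components described later.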
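[0038]
FIG. 4 shows a configuration example of the actuators of the robot 1 and their control system.
[0039]
The control unit 52 houses the main control part 61, which governs the operation of the robot 1 as a whole, the peripheral circuits 62 such as a power supply circuit and a communication circuit, a battery 74 (FIG. 5), and the like.
[0040]
The control unit 52 is connected to sub-control parts 63A to 63D disposed in the respective constituent units (the trunk unit 11, the head unit 12, the arm units 13A and 13B, and the leg units 14A and 14B). It supplies the required power supply voltage to the sub-control parts 63A to 63D and communicates with them.
[0041]
The sub-control parts 63A to 63D are each connected to the actuators A1 to A14 in the corresponding constituent units and, based on the various control commands supplied from the main control part 61, drive those actuators to the specified states.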
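[0042]
FIG. 5 is a block diagram showing an example of the internal electrical configuration of the robot 1.
[0043]
The head unit 12 contains, each at a predetermined position, an external sensor part 71 and a speaker 72 that functions as the "mouth" of the robot 1. The external sensor part 71 consists of CCD (Charge Coupled Device) cameras 81L and 81R that function as the "eyes", microphones 82-1 to 82-N that function as the "ears", the touch sensor 51, an ultrasonic sensor 83, and the like. The control unit 52 contains an internal sensor part 73 consisting of a battery sensor 91, an acceleration sensor 92, and the like.
[0044]
The CCD cameras 81L and 81R of the external sensor part 71 image the surroundings and send the resulting image signal S1A to the main control part 61. The microphones 82-1 to 82-N pick up the ambient background noise and the various spoken commands given by the user as voice input, such as "walk", "stop", or "raise your right hand", and each sends the resulting voice signal S1B to the main control part 61. In the following, the N microphones 82-1 to 82-N are referred to simply as the microphone 82 when they need not be distinguished.
[0045]
The touch sensor 51 is provided, for example, on the top of the head unit 12 as shown in FIGS. 1 and 2. It detects the pressure applied by physical contact from the user, such as "patting" or "hitting", and sends the detection result to the main control part 61 as a pressure detection signal S1C.
[0046]
The ultrasonic sensor 83 has a sound source and a microphone (neither shown) and emits an ultrasonic pulse from its internal sound source. The ultrasonic sensor 83 receives, with its microphone, the wave reflected back from the user or another object, obtains the time S1D from emitting the ultrasonic pulse to receiving the reflected wave (hereinafter referred to as the lag time where appropriate), and sends it to the main control part 61.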
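[0047]
The battery sensor 91 of the internal sensor part 73 detects the remaining energy of the battery 74 at a predetermined cycle and sends the detection result to the main control part 61 as a remaining battery level detection signal S2A. The acceleration sensor 92 detects the acceleration of the robot 1's movement in three axial directions (x, y, and z axes) at a predetermined cycle and sends the detection result to the main control part 61 as an acceleration detection signal S2B.
[0048]
An external memory 75 stores programs, data, control parameters, and the like, and supplies the programs and data as needed to a memory 61A built into the main control part 61. The external memory 75 also receives data and the like from the memory 61A and stores it. The external memory 75 is detachable from the robot 1.
[0049]
The main control part 61 has the built-in memory 61A. The memory 61A stores programs and data, and the main control part 61 performs various kinds of processing by executing the programs stored in the memory 61A. Specifically, the main control part 61 judges the conditions around and inside the robot 1, commands from the user, and whether the user has acted on the robot, based on two groups of signals. The first group is the image signal S1A, the voice signal S1B, the pressure detection signal S1C, and the lag time S1D supplied respectively from the CCD cameras 81L and 81R, the microphone 82, the touch sensor 51, and the ultrasonic sensor 83 of the external sensor part 71 (hereinafter collectively referred to as the external sensor signal S1). The second group is the remaining battery level detection signal S2A and the acceleration detection signal S2B supplied respectively from the battery sensor 91, the acceleration sensor 92, and the like of the internal sensor part 73 (hereinafter collectively referred to as the internal sensor signal S2).
[0050]
The main control part 61 then decides the action of the robot 1 based on these judgments, on the control program stored in advance in the memory 61A, and on the various control parameters stored in the external memory 75 loaded at the time. It generates control commands based on the decision and sends them to the corresponding sub-control parts 63A to 63D. The sub-control parts 63A to 63D control the driving of the corresponding actuators among A1 to A14 based on the control commands supplied from the main control part 61. As a result, the robot 1 performs actions such as swinging the head unit 12 up, down, left, and right, raising the arm unit 13A or 13B, or walking by driving the leg units 14A and 14B alternately.
[0051]
As needed, the main control part 61 also supplies a predetermined voice signal S3 to the speaker 72 so that sound based on the voice signal S3 is output externally. In addition, the main control part 61 outputs a drive signal to LEDs (not shown), which are provided at predetermined positions on the head unit 12 and serve as the robot's "eyes" in appearance, to make the LEDs blink.
[0052]
In this way, the robot 1 acts autonomously based on the conditions (states) around and inside it, commands from the user, and whether the user has acted on it.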
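[0053]
FIG. 6 shows an example of the functional configuration of the main control part 61 of FIG. 5. The functional configuration shown in FIG. 6 is realized by the main control part 61 executing the control program stored in the memory 61A.
[0054]
The main control part 61 consists of the following: a state recognition information processing part 101 that recognizes specific external states; a model storage part 102 that stores models of the robot 1's emotion, instinct, growth state, and the like, which are updated based on the recognition results of the state recognition information processing part 101; an action deciding mechanism part 103 that decides the action of the robot 1 based on those recognition results; an attitude transition mechanism part 104 that actually makes the robot 1 act based on the decision of the action deciding mechanism part 103; a voice synthesizing part 105 that generates synthesized sound; and a distance control part 110 that controls adjustment of the distance between the robot 1 and the user based on commands from the action deciding mechanism part 103.
[0055]
While the robot 1 is powered on, voice signals, image signals, pressure detection signals, and the like are constantly input to the state recognition information processing part 101 from the microphone 82, the CCD cameras 81L and 81R, the touch sensor 51, and so on. Based on these signals, the state recognition information processing part 101 recognizes specific external states, specific actions by the user, instructions from the user, and the like, and constantly outputs state recognition information representing the recognition results to the model storage part 102 and the action deciding mechanism part 103. Note that the state recognition information processing part 101 outputs this state recognition information even when the distance between the user and the robot 1 is not optimal (appropriate) for voice or image recognition and accurate recognition is therefore difficult.
[0056]
The state recognition information processing part 101 has a voice recognition part 101A, an image recognition part 101B, and a pressure processing part 101C.
[0057]
The voice recognition part 101A performs voice recognition on the voice signal S1B supplied from each of the microphones 82-1 to 82-N. It then notifies the model storage part 102 and the action deciding mechanism part 103 of commands such as "walk", "stop", and "raise your right hand", and other voice recognition results, as state recognition information.
[0058]
The image recognition part 101B performs image recognition processing using the image signal S1A supplied from the CCD cameras 81L and 81R. When it detects, for example, "something red and round" or "a plane perpendicular to the ground and at least a predetermined height", it notifies the model storage part 102 and the action deciding mechanism part 103 of image recognition results such as "there is a ball" or "there is a wall" as state recognition information.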
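[0059]
Since users can generally be expected to speak to the robot 1 from the front, the CCD cameras 81L and 81R that image the surroundings are assumed to be installed in the head unit 12 (FIG. 1) so that their imaging direction is the front of the robot 1.
[0060]
If the user speaks from a direction other than the front of the robot 1, for example from the side or the rear, the CCD cameras 81L and 81R cannot image the user. In that case, the direction of the sound source can be estimated from, for example, the power differences and phase differences of the voice signals arriving at the microphones 82-1 to 82-N, and the head unit 12 can be turned toward the microphone that yields the highest voice level so that the CCD cameras 81L and 81R can image the user. For voice recognition, the voice data output by the microphone with the highest voice level (basically the microphone facing the front once the robot 1 has turned toward the user) is used as the recognition input.
[0061]
Alternatively, a microphone with directivity in the same direction as the imaging direction of the CCD cameras 81L and 81R may be used as the microphone 82, and the head unit 12 may be moved in the direction that maximizes the voice level input to the microphone 82, so that the CCD cameras 81L and 81R can image the user.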
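The level-based part of this direction estimate can be illustrated with a short sketch. It is not the patent's implementation: it uses only the per-microphone level (the phase-difference cue is omitted), and the function names and the table of microphone mounting bearings are hypothetical.

```python
import math

def rms_level(samples):
    """Root-mean-square level of one microphone's samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def loudest_microphone(frames):
    """Index of the microphone whose frame has the highest RMS level.

    frames: list of non-empty per-microphone sample lists.
    """
    levels = [rms_level(f) for f in frames]
    return max(range(len(levels)), key=levels.__getitem__)

def head_bearing_for(mic_index, mic_bearings_deg):
    """Bearing (degrees) to turn the head toward; mounting angles are assumed."""
    return mic_bearings_deg[mic_index]
```

For example, with three microphones assumed to face -90, 0, and 90 degrees, the head would be turned toward the bearing of whichever microphone reports the highest level.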
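[0062]
The pressure processing part 101C processes the pressure detection signal S1C supplied from the touch sensor 51. When it detects pressure at or above a predetermined threshold for a short time, it recognizes that the robot was "hit (scolded)"; when it detects pressure below the threshold for a long time, it recognizes that the robot was "patted (praised)". It notifies the model storage part 102 and the action deciding mechanism part 103 of the recognition result as state recognition information.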
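The two-condition rule above can be written as a small classifier. This is a sketch under assumptions: the patent gives no numeric values, so the pressure threshold, the boundary between "short" and "long", and the label strings are all hypothetical.

```python
def classify_touch(pressure, duration_s,
                   pressure_threshold=0.5, short_duration_s=0.3):
    """Classify a touch as "hit", "patted", or None (unrecognized).

    The threshold values are illustrative defaults, not values from the patent.
    """
    if pressure >= pressure_threshold and duration_s < short_duration_s:
        return "hit"      # strong and brief: scolded
    if pressure < pressure_threshold and duration_s >= short_duration_s:
        return "patted"   # gentle and sustained: praised
    return None           # the source does not say how other cases are handled
```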
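[0063]
The model storage part 102 stores and manages an emotion model, an instinct model, and a growth model, which represent the states of the robot 1's emotion, instinct, and growth, respectively.
[0064]
The emotion model represents the state (degree) of emotions such as "joy", "sadness", "anger", and "fun", each as a value in a predetermined range (for example, -1.0 to 1.0), and changes those values based on the state recognition information from the state recognition information processing part 101, the passage of time, and so on. The instinct model represents the state (degree) of instinctive desires such as "appetite", "desire for sleep", and "desire for exercise", each as a value in a predetermined range, and changes those values in the same way. The growth model represents the state (degree) of growth such as "childhood", "adolescence", "middle age", and "old age", each as a value in a predetermined range, and likewise changes those values.
[0065]
The model storage part 102 sends the states of emotion, instinct, and growth represented by the values of the emotion, instinct, and growth models to the action deciding mechanism part 103 as state information.
[0066]
In addition to the state recognition information from the state recognition information processing part 101, the model storage part 102 is supplied by the action deciding mechanism part 103 with action information indicating the robot 1's current or past actions, for example "walked for a long time". Even when given the same state recognition information, the model storage part 102 generates different state information depending on the action indicated by the action information.
[0067]
For example, when the robot 1 greets the user and is patted on the head, the action information that it greeted the user and the state recognition information that it was patted on the head are given to the model storage part 102, and the model storage part 102 increases the value of the emotion model representing "joy".
[0068]
On the other hand, when the robot 1 is patted on the head while performing some task, the action information that it is performing a task and the state recognition information that it was patted on the head are given to the model storage part 102, and in this case the model storage part 102 does not change the value of the emotion model representing "joy".
[0069]
In this way, the model storage part 102 sets the emotion model values by referring not only to the state recognition information but also to the action information indicating the robot 1's current or past actions. This avoids unnatural emotional changes, such as increasing the "joy" value when the user pats the robot's head as a prank while it is performing some task.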
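The greeting versus working example can be sketched as follows. This is only an illustration of the idea that the same recognition result is interpreted in light of the current action; the class name, the label strings, and the step size of 0.2 are hypothetical and do not come from the patent.

```python
class EmotionModel:
    """Toy emotion model holding a single "joy" value in [-1.0, 1.0]."""

    def __init__(self):
        self.joy = 0.0

    def update(self, recognition, behavior, step=0.2):
        """Update joy from state recognition info plus action info."""
        if recognition == "patted" and behavior == "greeting":
            # Patted after greeting the user: joy increases (clamped to range).
            self.joy = max(-1.0, min(1.0, self.joy + step))
        # Patted while working on a task: joy is deliberately left unchanged.
        return self.joy
```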
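[0070]
As with the emotion model, the model storage part 102 increases and decreases the values of the instinct model and the growth model based on both the state recognition information and the action information. It also increases and decreases the value of each of the emotion, instinct, and growth models based on the values of the other models.
[0071]
The action deciding mechanism part 103 decides the next action based on the state recognition information from the state recognition information processing part 101, the state information from the model storage part 102, the passage of time, and so on. When the decided action, for example "dance", does not require voice or image recognition processing, it sends the content of that action to the attitude transition mechanism part 104 as action command information.
[0072]
That is, the action deciding mechanism part 103 manages, as a behavior model defining the robot 1's actions, a finite automaton whose states correspond to the actions the robot 1 can take. It transitions the state of this finite automaton based on the state recognition information from the state recognition information processing part 101, the values of the emotion, instinct, or growth model in the model storage part 102, the passage of time, and so on, and decides the action corresponding to the state after the transition as the next action to take.
[0073]
The action deciding mechanism part 103 transitions the state when it detects a predetermined trigger. For example, it transitions the state when the time spent on the action corresponding to the current state reaches a predetermined time, when it receives specific state recognition information, or when a value of the emotion, instinct, or growth state indicated by the state information from the model storage part 102 falls to or below, or rises to or above, a predetermined threshold.
[0074]
As described above, the action deciding mechanism part 103 transitions the state of the behavior model based not only on the state recognition information from the state recognition information processing part 101 but also on the values of the emotion, instinct, and growth models in the model storage part 102. Therefore, even for the same state recognition information, the destination state differs depending on those model values (the state information).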
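A minimal sketch of such a trigger-driven transition function is shown below. The state names, the joy threshold, and the timeout are hypothetical; the point is only that the same recognition input can lead to different next states depending on the model value.

```python
def next_state(state, recognition, joy, elapsed_s,
               joy_threshold=0.5, timeout_s=10.0):
    """Return the next behavior state of a toy finite automaton."""
    if recognition == "patted":
        # Same recognition info, different destination depending on emotion.
        return "dance" if joy >= joy_threshold else "greet"
    if elapsed_s >= timeout_s:
        # Trigger: the current action has run for the predetermined time.
        return "idle"
    return state  # no trigger detected: stay in the current state
```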
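[0075]
On the other hand, the decided action may require voice or image recognition processing. Examples are "converse with the user", in which the robot recognizes the user's speech and responds, and "wave to the user", in which it recognizes the user (the user's face image) and waves. In such cases, the action deciding mechanism part 103 sends the distance control part 110 a command to adjust the distance between the robot 1 and the user so that the voice recognition part 101A and the image recognition part 101B of the state recognition information processing part 101 can recognize accurately. Also, as described above, the state recognition information processing part 101 may detect the user's face image with poor recognition accuracy, for example from a skin-colored region of the image signal, and send state recognition information indicating the detection to the action deciding mechanism part 103. In that case, the action deciding mechanism part 103 sends the distance control part 110 a command to adjust the distance between the robot 1 and the user so that the user's face image can be recognized more accurately.
[0076]
When the distance control part 110 supplies the action deciding mechanism part 103 with a command indicating that the distance between the robot 1 and the user has been adjusted, the action deciding mechanism part 103 acquires the state recognition information being supplied from the state recognition information processing part 101 at that moment as accurate state recognition information, and performs the action it decided earlier, such as "converse with the user" or "wave to the user" (it sends the content of that action to the attitude transition mechanism part 104 as action command information).
[0077]
In addition to action command information that moves the head, limbs, and so on of the robot 1, the action deciding mechanism part 103 also generates action command information that makes the robot 1 speak. This information is supplied to the voice synthesizing part 105 and contains, among other things, text corresponding to the synthesized sound to be generated. On receiving action command information from the action deciding mechanism part 103, the voice synthesizing part 105 generates synthesized sound based on the text it contains and supplies it to the speaker 72 for output. For example, when the action deciding mechanism part 103 receives from the distance control part 110 (described later) a command to adjust the distance between the robot 1 and the user by having the robot 1 speak, it sends the voice synthesizing part 105 action command information containing text such as "Please move back a little" or "Please come a little closer". The speaker 72 then outputs that utterance.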
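[0078]
The distance control part 110 consists of a distance estimating part 111, a threshold setting part 112, a distance determining part 113, and a distance adjusting part 114. As described above, the action deciding mechanism part 103 supplies the distance control part 110 with a command to adjust the distance between the robot 1 and the user. The external sensor signal S1, that is, the image signal S1A, the voice signal S1B, the pressure detection signal S1C, and the lag time S1D from the CCD cameras 81L and 81R, the microphone 82, the touch sensor 51, and the ultrasonic sensor 83 of the external sensor part 71 (FIG. 5), is also constantly supplied to the distance control part 110.
[0079]
The distance estimating part 111 estimates the distance between the robot 1 and the user based on the various signals from the external sensor part 71 and sends the estimated distance to the distance determining part 113 and the distance adjusting part 114 as estimated distance information.
[0080]
Specifically, when the image signal S1A supplied from the CCD cameras 81L and 81R contains the user's face image, the distance estimating part 111 estimates the distance between the robot 1 and the user based on the region of the face image in the image signal S1A (hereinafter referred to as the face image region). A human face image is detected, for example, by detecting skin-colored regions in the image signal S1A. Because distance estimation from the face image region can be performed with a single CCD camera, it may use the image signal of only one of the CCD cameras 81L and 81R.
[0081]
The distance estimating part 111 also estimates the distance between the robot 1 and the user by stereo processing using the CCD cameras 81L and 81R. The principle of stereo processing is described in detail later.
[0082]
Further, the distance estimating part 111 estimates the distance between the robot 1 and the user based on the lag time S1D supplied from the ultrasonic sensor 83.
[0083]
The distance estimating part 111 also estimates the distance between the robot 1 and the user based on the voice supplied from the microphone 82. That is, when the voice signal S1B is supplied from the microphone 82, in other words when the user has spoken, the distance estimating part 111 estimates the distance from the loudness (level) of the input voice. It also estimates the direction of the user who spoke (the sound source direction) based on the voice supplied from each of the microphones 82-1 to 82-N.
[0084]
As described above, the distance estimating part 111 estimates the distance between the robot 1 and the user separately from each of the signals supplied from the CCD cameras 81L and 81R, the microphone 82, the ultrasonic sensor 83, and so on, and then estimates an overall distance from those individual results. The overall distance is output from the distance estimating part 111 to the distance determining part 113 and the distance adjusting part 114 as estimated distance information.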
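[0085]
The threshold setting part 112 stores in advance a predetermined range R used as a threshold by the distance determining part 113 described later (for example, the range from a first position to a second position relative to the position of the robot 1) and a threshold D1 used by the distance adjusting part 114, and supplies the range R and the threshold D1 to the distance determining part 113 and the distance adjusting part 114, respectively.
[0086]
The predetermined range R is the range of distances between the robot 1 and the user that is appropriate for the robot 1 to recognize the user's face image, the user's speech, and so on. It can be determined based on, for example, the performance of the CCD cameras 81L and 81R, the microphone 82, the voice recognition part 101A, and the image recognition part 101B. The threshold D1 is used to judge whether the user is near to or far from the robot 1. Because the distance adjusting part 114 described later adjusts the distance by comparing it with the threshold D1, setting D1 to a value within the range R causes the distance between the robot 1 and the user to be adjusted into the range R. The threshold D1 is therefore set to, for example, the value that bisects the range R.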
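[0087]
The distance determining part 113 is supplied with the distance between the robot 1 and the user (an estimate) as estimated distance information from the distance estimating part 111 and with the predetermined range R from the threshold setting part 112. The distance determining part 113 then determines whether the distance between the robot 1 and the user is within the range R.
[0088]
When the distance determining part 113 determines that the distance supplied from the distance estimating part 111 is within the range R, it sends the action deciding mechanism part 103 a command indicating that the distance between the robot 1 and the user is within the range R.
[0089]
When the distance determining part 113 determines that the distance supplied from the distance estimating part 111 is not within the range R, it sends the distance adjusting part 114 a command to adjust the distance between the robot 1 and the user.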
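[0090]
As described above, the distance adjusting part 114 is supplied with the estimated distance information from the distance estimating part 111. It is also supplied by the threshold setting part 112 with the threshold D1 for judging whether the user is farther away than (distance greater than) or nearer than (distance less than) a predetermined position. On receiving a command to adjust the distance from the distance determining part 113, the distance adjusting part 114 judges, based on the estimated distance information and the threshold D1, whether the distance between the robot 1 and the user is greater or less than D1. According to that judgment, it sends the action deciding mechanism part 103 the appropriate command signals for adjusting the distance.
[0091]
One way to adjust the distance between the robot 1 and the user is for the robot to speak, prompting the user to come closer to or move away from the robot 1. In this case, the distance adjusting part 114 sends the action deciding mechanism part 103 an action request command such as "ask the user by voice to come closer" or "ask the user by voice to move away". On receiving this command, the action deciding mechanism part 103 sends the voice synthesizing part 105 action command information containing text such as "Please move back a little" or "Please come a little closer", as described above. The voice synthesizing part 105 generates the corresponding synthesized sound prompting the user to move, and the speaker 72 outputs it. The user who hears it moves away from or toward the robot 1 accordingly, and the distance between the robot 1 and the user is thereby adjusted.
[0092]
Another way is for the robot 1 to make a motion (gesture) that prompts the user to come closer or move away. In this case, the distance adjusting part 114 sends the action deciding mechanism part 103 an action request command such as "beckon" when it wants the user to come closer or "shoo away" when it wants the user to move away. On receiving this command, the action deciding mechanism part 103 sends the content of the action to the attitude transition mechanism part 104 as action command information. Based on that information, the attitude transition mechanism part 104 generates attitude transition information for moving the robot 1 from its current attitude to the next attitude and sends it to the sub-control parts 63A to 63D. The sub-control parts 63A to 63D then move the arm units 13A and 13B, the leg units 14A and 14B, and so on, which serve as the robot 1's arms and legs. As a result, the robot 1 beckons to or shoos the user, and the user who sees the motion moves away from or toward the robot 1 accordingly, so that the distance is adjusted.
[0093]
Yet another way is for the robot 1 itself to act (move) to adjust its distance from the user. In this case, the distance adjusting part 114 sends the action deciding mechanism part 103 an action request command such as "walk (move) toward the user (forward)" or "walk (move) away from the user (backward)". On receiving this command, the action deciding mechanism part 103 sends the content of the action to the attitude transition mechanism part 104 as action command information. The attitude transition mechanism part 104 then generates attitude transition information and sends it to the sub-control parts 63A to 63D as described above, so that the robot 1 moves forward or backward relative to the user. The distance between the robot 1 and the user is thereby adjusted.
[0094]
As described above, the attitude transition mechanism part 104 generates, based on the action command information supplied from the action deciding mechanism part 103, attitude transition information for moving the robot 1 from its current attitude to the next attitude, and sends it to the sub-control parts 63A to 63D.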
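[0095]
Next, the principle of estimating distance from the output of the ultrasonic sensor 83 is described with reference to FIG. 7.
[0096]
The ultrasonic sensor 83 has a sound source and a microphone (neither shown) and, as shown in FIG. 7, emits an ultrasonic pulse from the sound source. It receives with the microphone the wave reflected back from an obstacle and obtains the time from emitting the pulse to receiving the reflected wave (the lag time). The distance to the obstacle can be obtained from this lag time.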
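The lag time covers the round trip to the reflecting object and back, so the one-way distance is half the lag time multiplied by the speed of sound. The sketch below assumes propagation in air at roughly 343 m/s (about 20 °C); the patent does not state a value, and the function name is illustrative.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees Celsius

def distance_from_lag(lag_time_s, speed=SPEED_OF_SOUND_M_PER_S):
    """One-way distance (m) to the reflecting object from the round-trip lag time (s)."""
    if lag_time_s < 0:
        raise ValueError("lag time must be non-negative")
    return speed * lag_time_s / 2.0
```

A lag of 10 ms therefore corresponds to an object roughly 1.7 m away under this assumption.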
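[0097]
Next, the principle of estimating the distance between the robot 1 and the user by stereo processing (processing by the stereo matching method) using the image signals from the CCD cameras 81L and 81R is described with reference to FIGS. 8 to 12.
[0098]
Stereo processing associates pixels across multiple images obtained by photographing the same object with cameras from two or more directions (different lines of sight), and thereby obtains the parallax between corresponding pixels and the distance from the cameras to the object.
[0099]
Here, the CCD cameras 81L and 81R are called the reference camera 81L and the detection camera 81R, and the images they output are called the reference camera image and the detection camera image. When the reference camera 81L and the detection camera 81R photograph the user as the imaging target, as shown in FIG. 8, a reference camera image containing a projection of the user is obtained from the reference camera 81L, and a detection camera image containing a projection of the user is obtained from the detection camera 81R. If a point P on the user's mouth, for example, appears in both images, parallax information can be obtained from the position of P in the reference camera image and its position in the detection camera image, that is, from the corresponding points (corresponding pixels), and the position of P in three-dimensional space (its three-dimensional position) can be obtained by the principle of triangulation.
[0100]
Stereo processing therefore first requires detecting corresponding points. One method of doing so is area-based matching using the epipolar line.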
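[0101]
As shown in FIG. 9, in the reference camera 81L, the point P on the user is projected onto the point n_a where the straight line L connecting P and the optical center (lens center) O₁ of the reference camera 81L intersects the imaging surface S₁ of the reference camera 81L.
[0102]
In the detection camera 81R, the point P on the user is projected onto the point n_b where the straight line connecting P and the optical center (lens center) O₂ of the detection camera 81R intersects the imaging surface S₂ of the detection camera 81R.
[0103]
In this case, the straight line L is projected onto the imaging surface S₂ as the line L₂ along which S₂, the surface on which the detection camera image is formed, intersects the plane passing through the three points O₁, O₂, and n_a (or P). Since P lies on the straight line L, its projection n_b on the imaging surface S₂ lies on L₂, the projection of L. This line L₂ is called the epipolar line. The point n_b corresponding to n_a can therefore exist only on the epipolar line L₂, so the search for n_b need only cover L₂.
[0104]
An epipolar line can be considered for each pixel of the reference camera image formed on the imaging surface S₁. If the positional relationship between the reference camera 81L and the detection camera 81R is known, the epipolar line for each pixel can be obtained, for example, by calculation.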
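[0105]
The corresponding point n_b can be detected from among the points on the epipolar line L₂ by, for example, the following area-based matching.
[0106]
In area-based matching, as shown in FIG. 10A, a small block, for example rectangular, centered on the point n_a in the reference camera image (for example, with n_a at the intersection of its diagonals) is extracted from the reference camera image (hereinafter referred to as the reference block). As shown in FIG. 10B, a small block of the same size as the reference block, centered on a point on the epipolar line L₂ projected onto the detection camera image, is extracted from the detection camera image (hereinafter referred to as the detection block).
[0107]
In the embodiment of FIG. 10B, six points n_b1 to n_b6 are provided on the epipolar line L₂ as centers of detection blocks. These six points are the projections onto the imaging surface S₂ of the detection camera 81R of points that divide the straight line L in three-dimensional space (FIG. 9) at fixed intervals, for example the points at distances of 1 m, 2 m, 3 m, 4 m, 5 m, and 6 m from the reference camera 81L. They therefore correspond to distances of 1 m, 2 m, 3 m, 4 m, 5 m, and 6 m from the reference camera 81L, respectively.
[0108]
In area-based matching, detection blocks centered on each of the points n_b1 to n_b6 on the epipolar line L₂ are extracted from the detection camera image, and the correlation between each detection block and the reference block is computed using a predetermined evaluation function. The center point n_b of the detection block with the highest correlation with the reference block centered on n_a is taken as the point corresponding to n_a.
[0109]
Suppose, for example, that the evaluation function takes smaller values for higher correlation and that the evaluation values (values of the evaluation function) shown in FIG. 11 are obtained for the points n_b1 to n_b6 on the epipolar line L₂. In this case, the point n_b3, which has the smallest evaluation value (the highest correlation), is detected as the point corresponding to n_a. It is also possible to interpolate using the evaluation values near the minimum among those obtained for n_b1 to n_b6 (black circles in FIG. 11), find the point where the evaluation value becomes even smaller (the cross in FIG. 11), and detect that point as the final corresponding point.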
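The candidate search and the interpolation step can be sketched as follows. The sum of absolute differences is used as the evaluation function, and a three-point parabola fit is used for the interpolation; the parabola fit is an assumption, since the patent says only that interpolation is performed. Blocks are plain nested lists of pixel values, and all function names are illustrative.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def best_candidate(reference_block, detection_blocks):
    """Index of the detection block with the lowest cost, plus all costs."""
    costs = [sad(reference_block, blk) for blk in detection_blocks]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs

def refine_parabolic(costs, i):
    """Sub-sample position of the cost minimum near index i (parabola fit)."""
    if i == 0 or i == len(costs) - 1:
        return float(i)                 # no neighbour on one side
    c0, c1, c2 = costs[i - 1], costs[i], costs[i + 1]
    denom = c0 - 2 * c1 + c2
    if denom == 0:
        return float(i)                 # flat neighbourhood: keep the sample
    return i + 0.5 * (c0 - c2) / denom
```

If the candidates correspond to 1 m, 2 m, and so on, as in FIG. 10B, a refined index such as 2.17 would fall between the third and fourth set points.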
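[0110]
In the embodiment of FIG. 10, as described above, the set points are the projections onto the imaging surface S₂ of the detection camera 81R of points that divide the straight line L in three-dimensional space at equal intervals. These points can be set, for example, when the reference camera 81L and the detection camera 81R are calibrated. If this is done for the epipolar line of every pixel of the imaging surface S₁ of the reference camera 81L, and a set point/distance table associating the points set on each epipolar line (hereinafter referred to as set points) with distances from the reference camera 81L is prepared in advance as shown in FIG. 12A, then the distance from the reference camera 81L (the distance to the user) can be obtained immediately by detecting the set point that is the corresponding point and looking it up in the table. In other words, the distance can be obtained directly from the corresponding point.
[0111]
Alternatively, once the corresponding point n_b in the detection camera image is detected for the point n_a in the reference camera image, the parallax (parallax information) between the two points n_a and n_b can be obtained. If the positional relationship between the reference camera 81L and the detection camera 81R is known, the distance to the user can then be obtained from the parallax by the principle of triangulation. The distance can be calculated from the parallax by a predetermined computation. If that computation is performed beforehand and a parallax/distance table associating the parallax ζ with distance is prepared in advance as shown in FIG. 12B, the distance from the reference camera 81L can again be obtained immediately by detecting the corresponding point, obtaining the parallax, and looking it up in the table.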
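As an illustration of the "predetermined computation" and of a precomputed parallax/distance table, the sketch below uses the standard relation Z = f·B/d for two parallel, rectified cameras, where f is the focal length in pixels, B the baseline between the optical centers, and d the disparity in pixels. The parallel-camera geometry and the example values are assumptions; the patent requires only that the positional relationship between the cameras be known.

```python
def distance_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (m) for a rectified parallel stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def build_disparity_table(max_disparity_px, focal_px, baseline_m):
    """Precompute a disparity -> distance table for integer disparities."""
    return {d: distance_from_disparity(d, focal_px, baseline_m)
            for d in range(1, max_disparity_px + 1)}
```

With an assumed focal length of 500 pixels and a baseline of 0.06 m, a disparity of 10 pixels maps to about 3 m, and the table lookup replaces the division at run time.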
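[0112]
The parallax and the distance to the user correspond one to one, so obtaining the parallax is equivalent to obtaining the distance to the user.
[0113]
Blocks made up of multiple pixels, such as the reference block and the detection block, are used to detect corresponding points in order to reduce the influence of noise and to make detection reliable by clarifying the correlation between the pattern of pixels around the pixel (point) n_a in the reference camera image and the pattern of pixels around the corresponding point (pixel) n_b in the detection camera image. In particular, for reference and detection camera images with little variation, the larger the block, the more reliable the detection of corresponding points becomes.
[0114]
In area-based matching, the evaluation function for the correlation between the reference block and the detection block can be, for example, the sum of the absolute differences between the pixel values of the pixels in the reference block and the corresponding pixels in the detection block, the sum of the squares of those differences, or the normalized cross correlation.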
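Two of the evaluation functions named above can be sketched as follows. The normalized cross correlation shown here is the zero-mean variant, which is one common definition and an assumption on my part; unlike the difference-based functions, it is larger for better matches. Function names are illustrative.

```python
import math

def _flatten(block):
    """Flatten a nested-list pixel block into one list."""
    return [p for row in block for p in row]

def ssd(block_a, block_b):
    """Sum of squared differences: 0 for identical blocks, larger is worse."""
    return sum((a - b) ** 2 for a, b in zip(_flatten(block_a), _flatten(block_b)))

def ncc(block_a, block_b):
    """Zero-mean normalized cross correlation in [-1, 1]; larger is better."""
    a, b = _flatten(block_a), _flatten(block_b)
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a constant block carries no pattern to correlate
    return sum(x * y for x, y in zip(da, db)) / denom
```

Because the correlation is normalized, a block and a uniformly brighter copy of it still score close to 1, which the difference-based functions do not provide.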
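[0115]
Stereo processing has been described only briefly above. Stereo processing (the stereo matching method) is also described in, for example, 安居院, 長尾, 「Ｃ言語による画像処理入門」 (Introduction to Image Processing in C), 昭晃堂, p. 127.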
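[0116]
Next, the operation processing of the robot 1 performed by the main control part 61 of FIG. 6 is described with reference to the flowchart of FIG. 13. This processing starts when the robot 1 is powered on.
[0117]
First, in step S1, the action deciding mechanism part 103 decides the action of the robot 1 based on the state recognition information from the state recognition information processing part 101, the state information from the model storage part 102, the passage of time, and so on, and the process proceeds to step S2.
[0118]
In step S2, the action deciding mechanism part 103 determines whether the action decided in step S1 requires recognition processing. As described above, an example of an action that does not require recognition processing is "dance", and an example of one that does is "converse with the user".
[0119]
When it is determined in step S2 that the action decided in step S1 requires recognition processing, the process proceeds to step S3, where the action deciding mechanism part 103 sends the distance estimating part 111 a command to adjust the distance, and then proceeds to step S4.
[0120]
In step S4, on receiving the command to adjust the distance from the action deciding mechanism part 103, the distance estimating part 111 performs distance estimation processing, which estimates the distance between the robot 1 and the user who is the target of recognition processing (the recognition target). The details of the distance estimation processing are described later with reference to FIG. 14. Through this processing, the distance between the robot 1 and the user is estimated, and estimated distance information representing that distance is supplied from the distance estimating part 111 to the distance determining part 113 and the distance adjusting part 114.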
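[0121]
After step S4, the process proceeds to step S5, where the distance determining part 113 determines whether the distance between the robot 1 and the user supplied from the distance estimating part 111 in step S4 falls within the predetermined range R, which has been supplied in advance by the threshold setting part 112 and set in the distance determining part 113.
[0122]
When it is determined in step S5 that the distance is not within the range R, the process proceeds to step S6, where the distance determining part 113 outputs to the distance adjusting part 114 a command to adjust the distance between the robot 1 and the user, and then proceeds to step S7.
[0123]
In step S7, on receiving that command from the distance determining part 113, the distance adjusting part 114 executes the processing for adjusting the distance between the robot 1 and the user described later (distance adjustment processing). The process then returns from step S7 to step S4, and steps S4 to S7 are repeated. By repeating steps S4 to S7, the distance between the robot 1 and the user is brought within the range R, that is, to a distance appropriate for recognition processing.
[0124]
When it is determined in step S5 that the distance is within the range R, that is, when the distance between the robot 1 and the user is appropriate for recognition processing, the process proceeds to step S8, where the distance determining part 113 outputs to the action deciding mechanism part 103 a command indicating that the distance is within the predetermined range, and then proceeds to step S9.
[0125]
When it is determined in step S2 that the action decided in step S1 does not require recognition processing, the process likewise proceeds to step S9. In step S9, the action deciding mechanism part 103 sends action command information corresponding to the action decided in step S1 to the attitude transition mechanism part 104 or the voice synthesizing part 105.
[0126]
Depending on the content of the robot 1's action, for example saying "goodbye" while waving, action command information may be sent to both the attitude transition mechanism part 104 and the voice synthesizing part 105.
[0127]
After step S9, the process proceeds to step S10, where the action deciding mechanism part 103 determines whether to end the robot operation processing. If it determines not to end, the process returns to step S1 and the subsequent processing is repeated.
[0128]
When it is determined in step S10 that the robot operation processing is to end, for example because the user has turned off the robot 1, the robot operation processing ends.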
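The estimate, check, adjust loop of steps S2 to S8 can be sketched as below. The function names and the callable parameters are hypothetical, and the `max_rounds` guard is my addition; the flowchart as described simply repeats steps S4 to S7 until the distance falls within the range R.

```python
def adjust_until_in_range(needs_recognition, estimate_distance, adjust_distance,
                          r_min, r_max, max_rounds=20):
    """Run steps S2 to S8 for one decided action.

    Returns the number of adjustment rounds performed, or None if the distance
    never entered [r_min, r_max] within max_rounds (a guard not in the patent).
    """
    if not needs_recognition:            # S2: e.g. "dance" goes straight to S9
        return 0
    rounds = 0
    while True:
        distance = estimate_distance()   # S4: distance estimation processing
        if r_min <= distance <= r_max:   # S5: within the predetermined range R?
            return rounds                # S8: report "in range", then S9
        if rounds >= max_rounds:
            return None
        adjust_distance(distance)        # S6 and S7: distance adjustment processing
        rounds += 1
```

The two callables stand in for the distance estimating part 111 and the distance adjusting part 114; in a simulation they can simply read and update a stored position.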
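[0129]
Next, the distance estimation processing performed by the distance estimating part 111 in step S4 of FIG. 13 is described with reference to the flowchart of FIG. 14.
[0130]
In step S21, the distance estimating part 111 determines whether the user's face image has been detected in the image signals captured by the CCD cameras 81L and 81R (whether the image signals contain a face image).
[0131]
When it is determined in step S21 that a face image has been detected in the image signals captured by the CCD cameras 81L and 81R, the process proceeds to step S22, where the distance estimating part 111 estimates the distance between the robot 1 and the user from the size of the face image region in the image signal, and then proceeds to step S23. One way to estimate the distance from the size of the face image region is to prepare in advance a table associating the size (number of pixels) of the face image region with the distance between the robot 1 and the user, and to estimate the distance from the face image region size using that table.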
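Such a table lookup might look like the sketch below. The table entries are invented for illustration (a real table would come from calibrating the actual camera), and choosing the nearest entry rather than interpolating is also an assumption.

```python
# (face region size in pixels, distance in metres): hypothetical calibration data
FACE_AREA_TABLE = [(400, 3.0), (1600, 1.5), (6400, 0.75), (25600, 0.4)]

def distance_from_face_area(pixel_count, table=FACE_AREA_TABLE):
    """Distance of the table entry whose face area is closest to pixel_count."""
    nearest = min(table, key=lambda entry: abs(entry[0] - pixel_count))
    return nearest[1]
```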
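[0132]
When it is determined in step S21 that no face image has been detected in the image signals captured by the CCD cameras 81L and 81R, for example because the image signals contain no face image region of sufficient size, step S22 is skipped and the process proceeds to step S23. In step S23, the distance estimating part 111 estimates the distance between the robot 1 and the user by the stereo processing described above, using the image signals from the CCD cameras 81L and 81R, and proceeds to step S24.
[0133]
In step S24, the distance estimating part 111 estimates the distance between the robot 1 and the user from the lag time output by the ultrasonic sensor 83 to the distance control part 110, and proceeds to step S25.
[0134]
In step S25, the distance estimating part 111 determines whether there has been voice input to the microphone 82, that is, whether the user has spoken to the robot 1. When it is determined that there has been voice input, the process proceeds to step S26, where the distance estimating part 111 uses the voice signal supplied from the microphone 82 to the distance control part 110 to estimate the distance between the robot 1 and the user from the loudness of the user's voice. The distance estimating part 111 also estimates the direction of the user who spoke (the sound source direction) from the voice signals of the microphones 82-1 to 82-N, and proceeds to step S27.
[0135]
When it is determined in step S25 that there has been no voice input to the microphone 82, step S26 is skipped and the process proceeds to step S27. In step S27, the distance estimating part 111 estimates the overall distance between the robot 1 and the user from the distances estimated in steps S22, S23, S24, and S26.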
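[0136]
The estimation accuracy of the distances obtained from the face image region size, stereo processing, the lag time, and the user's input voice is compared below. Estimating distance from the user's input voice relates the voice volume (the level of the voice signal) to distance. However, the volume at which people normally speak varies from person to person, with some naturally quiet and others naturally loud, so the correlation between voice volume and distance can vary from user to user.
[0137]
With the method of estimating distance from the face image region size, the error in the estimated distance can become large when the face image region is extremely small relative to the effective pixel area of the CCD cameras 81L and 81R (the user is too far from the robot 1) or so large that it does not fit (the user is too close to the robot 1).
[0138]
The method of estimating distance by stereo processing is likewise affected by factors such as the size of the user's image in the image signal.
[0139]
By contrast, the method of estimating distance from the lag time output by the ultrasonic sensor 83 is not affected by the user's image or voice as the other three methods are, and is therefore considered the most reliable.
[0140]
One possible method of overall distance estimation is therefore to assign each of the four estimated distances a weight according to its reliability, multiply each estimate by its weight, and take the resulting weighted average as the overall distance.
[0141]
Other variations are possible. For example, the user direction used for image recognition, such as distance estimation by stereo processing or from the face image region (the front of the head unit 12), may differ from the user direction (sound source direction) that the distance estimating part 111 estimates from the voice signals of the microphones 82-1 to 82-N. In that case, the voice picked up by the microphones 82-1 to 82-N may be regarded as having been uttered by a different user who is not the recognition target, and the overall distance may be estimated with the voice-based estimate excluded. As another example, the estimate from voice input, which is generally considered the least reliable, may be compared with the estimates from the other three methods; if it differs from them extremely, it may be discarded and the overall distance estimated using only the other three. A further method of overall distance estimation is to take the median of the estimates obtained by the four methods, or by several of them.
[0142]
The distance estimated from the output of the ultrasonic sensor 83 is considered the most reliable when there is no obstacle between the robot 1 and the user, but when there is an obstacle between them, the sensor may detect the obstacle instead. Therefore, when the distance estimated from the output of the ultrasonic sensor 83 is extremely small compared with the distance estimated by stereo processing or from the face image region size, the distance estimating part 111 can assume that an obstacle lies between the robot 1 and the user, very close to the robot 1, and that the ultrasonic sensor 83 is detecting that obstacle, and can estimate the overall distance with the output of the ultrasonic sensor 83 excluded.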
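The weighted average and the two exclusion rules can be combined as in the sketch below. The sensor keys, the weights, and the `ratio` used to decide what counts as "extremely" different are hypothetical; the patent names the rules but gives no numeric criteria.

```python
from statistics import median

def fuse_distances(estimates, weights, ratio=0.5):
    """Combine per-sensor distance estimates (m) into one overall value.

    estimates: dict with optional keys "face", "stereo", "ultrasonic", "voice";
               a value of None means that sensor produced no estimate.
    Returns (overall distance or None, obstacle_suspected).
    """
    usable = {k: v for k, v in estimates.items() if v is not None}
    obstacle = False

    # Ultrasonic estimate far shorter than the image-based ones: obstacle echo.
    image = [usable[k] for k in ("face", "stereo") if k in usable]
    if "ultrasonic" in usable and image and usable["ultrasonic"] < ratio * median(image):
        obstacle = True
        del usable["ultrasonic"]

    # Voice estimate far from the remaining estimates: discard it.
    others = [v for k, v in usable.items() if k != "voice"]
    if "voice" in usable and others:
        ref = median(others)
        if abs(usable["voice"] - ref) > ratio * ref:
            del usable["voice"]

    if not usable:
        return None, obstacle
    total = sum(weights[k] for k in usable)   # weights are assumed positive
    return sum(weights[k] * v for k, v in usable.items()) / total, obstacle
```

The returned flag corresponds to passing the obstacle determination along with the estimated distance, so that the adjustment step can choose between asking the user to move and moving the robot.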
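[0143]
As described above, the distance estimating part 111 estimates the distance between the robot 1 and the user from the output signals of the various sensors of the external sensor part 71. It is not essential to use all four estimation methods described above; only some of them may be used, or other estimation methods may be adopted. One such method uses a PSD (Position Sensitive Detector): an LED (Light Emitting Diode) provided together with the PSD emits light, and the PSD receives the light reflected back from an object. Based on the position information of the reflected light, the PSD estimates the distance between the object and the PSD by the principle of triangulation.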
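[0144]
Next, a first embodiment of the distance adjustment processing performed by the distance adjusting part 114 in step S7 of FIG. 13 (the first distance adjustment processing) is described with reference to the flowchart of FIG. 15.
[0145]
First, in step S41, the distance adjusting part 114 determines whether the distance between the robot 1 and the user, estimated by the distance estimating part 111 in step S4 of FIG. 13 and supplied to the distance adjusting part 114, is greater than the threshold D1 supplied from the threshold setting part 112.
[0146]
When it is determined in step S41 that the distance is greater than the threshold D1, the process proceeds to step S42, where the distance adjusting part 114 sends the action deciding mechanism part 103 an action request command to "ask the user by voice to come closer", and the distance adjustment processing ends.
[0147]
When it is determined in step S41 that the distance is not greater than the threshold D1, the process proceeds to step S43, where the distance adjusting part 114 determines whether the distance is less than the threshold D1.
[0148]
When it is determined in step S43 that the distance is less than the threshold D1, the process proceeds to step S44, where the distance adjusting part 114 sends the action deciding mechanism part 103 an action request command to "ask the user by voice to move away", and the distance adjustment processing ends.
[0149]
When it is determined in step S43 that the distance is not less than the threshold D1, the distance adjusting part 114 ends the distance adjustment processing.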
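Steps S41 to S44 amount to a three-way comparison against the threshold D1, sketched below. The returned request strings are hypothetical labels for the action request commands.

```python
def first_distance_adjustment(distance, d1):
    """Steps S41 to S44: choose the spoken request from the threshold D1."""
    if distance > d1:                       # S41: user is farther than D1
        return "ask_user_to_come_closer"    # S42
    if distance < d1:                       # S43: user is nearer than D1
        return "ask_user_to_move_away"      # S44
    return None                             # exactly at D1: nothing to request
```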
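[0150]
When the recognition processing required by the action decided in step S1 of FIG. 13 is voice recognition, the distance adjusting part 114 may, instead of the action request commands "ask the user by voice to come closer" and "ask the user by voice to move away" in steps S42 and S44, send the action deciding mechanism part 103 action request commands to "ask the user by voice to speak louder" and "ask the user by voice to speak more quietly".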
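[0151]
Next, a second embodiment of the distance adjustment processing performed by the distance adjusting part 114 in step S7 of FIG. 13 (the second distance adjustment processing) is described with reference to the flowchart of FIG. 16. Description of the parts of FIG. 16 that are the same as in the flowchart of FIG. 15 is omitted where appropriate. FIG. 16 differs from FIG. 15 in the processing performed by the distance adjusting part 114 when the distance between the robot 1 and the user is determined to be greater than the threshold D1 (step S52) and when it is determined to be less than the threshold D1 (step S54).
[0152]
In the second distance adjustment processing of FIG. 16, steps S51 and S53 perform the same processing as steps S41 and S43 of FIG. 15, respectively. When it is determined in step S51 that the distance is greater than the threshold D1, the process proceeds to step S52, where the distance adjusting part 114 sends the action deciding mechanism part 103 an action request command to "make a motion prompting the user to come closer", such as beckoning, and the distance adjustment processing ends.
[0153]
When it is determined in step S53 that the distance is less than the threshold D1, the process proceeds to step S54, where the distance adjusting part 114 sends the action deciding mechanism part 103 an action request command to "make a motion prompting the user to move away", such as shooing with a hand, and the distance adjustment processing ends.
[0154]
In step S52 or S54, when the recognition processing required by the action decided in step S1 of FIG. 13 is voice recognition and the distance estimating part 111 estimated the distance based on the loudness of the user's voice input from the microphone 82, the distance adjusting part 114 may instead send the action deciding mechanism part 103 an action request command for a gesture prompting the user to speak louder or more quietly, such as the robot 1 cupping a hand to its ear.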
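[0155]
Next, a third embodiment of the distance adjustment processing performed by the distance adjusting part 114 in step S7 of FIG. 13 (the third distance adjustment processing) is described with reference to the flowchart of FIG. 17. Description of the parts of FIG. 17 that are the same as in the flowchart of FIG. 15 is omitted where appropriate. FIG. 17 differs from FIG. 15 in the processing performed by the distance adjusting part 114 when the distance between the robot 1 and the user is determined to be greater than the threshold D1 (step S62) and when it is determined to be less than the threshold D1 (step S64).
[0156]
In the third distance adjustment processing of FIG. 17, steps S61 and S63 perform the same processing as steps S41 and S43 of FIG. 15, respectively. When it is determined in step S61 that the distance is greater than the threshold D1, the process proceeds to step S62, where the distance adjusting part 114 sends the action deciding mechanism part 103 an action request command for the robot 1 itself to "move toward the user", and the distance adjustment processing ends.
[0157]
When it is determined in step S63 that the distance is less than the threshold D1, the process proceeds to step S64, where the distance adjusting part 114 sends the action deciding mechanism part 103 an action request command for the robot 1 itself to "move away from the user", and the distance adjustment processing ends.
[0158]
Three distance adjustment methods for the distance adjusting part 114 have been described: the robot 1 speaks to ask the user to come closer or move away (the first distance adjustment processing, FIG. 15); the robot 1 makes a motion prompting the user to come closer or move away (the second, FIG. 16); and the robot 1 itself moves toward or away from the user (the third, FIG. 17). It is not necessary to adopt only one of these methods. For example, the distance adjusting part 114 can select one of the three at random. It is also possible for the robot 1 to speak to the user and make a prompting motion at the same time.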
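[0159]
Next, a fourth embodiment of the distance adjustment process performed by the distance adjustment unit 114 in step S7 of FIG. 13 (the fourth distance adjustment process) will be described with reference to the flowchart of FIG. 18. Description of the parts of FIG. 18 that are the same as in the flowchart of FIG. 15 is omitted as appropriate. That is, FIG. 18 differs from FIG. 15 in the processing that the distance adjustment unit 114 performs when the distance between the robot 1 and the user is determined to be farther than the threshold D1 (steps S72 to S74) and in the processing it performs when that distance is determined to be nearer than the threshold D1 (steps S76 to S78).
[0160]
In the distance adjustment process shown in the flowchart of FIG. 18, either the method in which the robot 1 speaks to ask the user to come closer or move away (the first distance adjustment process shown in FIG. 15) or the method in which the robot 1 moves toward or away from the user (the third distance adjustment process shown in FIG. 17) is selected according to whether an obstacle is present in the direction in which the robot 1 would move.
[0161]
In the fourth distance adjustment process of FIG. 18, steps S71 and S75 perform the same processing as steps S41 and S43 of FIG. 15, respectively. If it is determined in step S71 that the distance between the robot 1 and the user is farther than the threshold D1, the process proceeds to step S72, where the distance adjustment unit 114 determines whether there is an obstacle on the plane in the direction from the robot 1 toward the user.
[0162]
If it is determined in step S72 that there is an obstacle on the plane in the direction from the robot 1 toward the user, the process proceeds to step S73, where, as in step S42 of FIG. 15, the distance adjustment unit 114 sends to the action determination mechanism unit 103 the action request command "speak to ask the user to come closer", and the distance adjustment process ends.
[0163]
If, on the other hand, it is determined in step S72 that there is no obstacle on the plane in the direction from the robot 1 toward the user, the process proceeds to step S74, where, as in step S62 of FIG. 17, the distance adjustment unit 114 sends to the action determination mechanism unit 103 an action request command for the robot 1 itself to "move toward the user", and the distance adjustment process ends.
[0164]
If it is determined in step S75 that the distance between the robot 1 and the user is nearer than the threshold D1, the process proceeds to step S76, where the distance adjustment unit 114 determines whether there is an obstacle on the plane in the direction behind the robot 1.
[0165]
If it is determined in step S76 that there is an obstacle on the plane in the direction behind the robot 1, the process proceeds to step S77, where, as in step S44 of FIG. 15, the distance adjustment unit 114 sends to the action determination mechanism unit 103 the action request command "speak to ask the user to move away", and the distance adjustment process ends.
[0166]
If, on the other hand, it is determined in step S76 that there is no obstacle on the plane in the direction behind the robot 1 as seen from the user, the process proceeds to step S78, where, as in step S64 of FIG. 17, the distance adjustment unit 114 sends to the action determination mechanism unit 103 an action request command for the robot 1 itself to "move away from the user", and the distance adjustment process ends.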
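The branching of steps S71 to S78 can be sketched as follows. This is a minimal illustration in Python, not the implementation of the embodiment: the `Request` enum, the function name, and the boolean obstacle flags are hypothetical names introduced here, and the sketch assumes the obstacle checks of steps S72 and S76 have already been evaluated by the caller.

```python
from enum import Enum


class Request(Enum):
    """Action requests sent to the action determination mechanism (illustrative names)."""
    ASK_USER_TO_APPROACH = "speak: ask the user to come closer"
    MOVE_TOWARD_USER = "robot moves toward the user"
    ASK_USER_TO_MOVE_AWAY = "speak: ask the user to move away"
    MOVE_AWAY_FROM_USER = "robot moves away from the user"


def fourth_distance_adjustment(distance, d1, obstacle_ahead, obstacle_behind):
    """Return the action request for the S71 to S78 flow, or None if distance equals d1."""
    if distance > d1:  # S71: user is farther than threshold D1
        if obstacle_ahead:  # S72: obstacle between robot and user
            return Request.ASK_USER_TO_APPROACH  # S73
        return Request.MOVE_TOWARD_USER  # S74
    if distance < d1:  # S75: user is nearer than threshold D1
        if obstacle_behind:  # S76: obstacle behind the robot
            return Request.ASK_USER_TO_MOVE_AWAY  # S77
        return Request.MOVE_AWAY_FROM_USER  # S78
    return None
```

For example, `fourth_distance_adjustment(3.0, 1.5, True, False)` selects the spoken request because the path toward the user is blocked.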
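[0167]
Here, the determination in steps S72 and S76 of whether there is an obstacle can be made, for example, as follows. As described above, when the distance between the robot 1 and the user that the distance estimation unit 111 estimated from the output of the ultrasonic sensor 83 differs greatly from estimates based on the outputs of other sensors, such as the distance estimated by stereo processing of the image signals from the CCD cameras 81L and 81R, and moreover the distance estimated from the output of the ultrasonic sensor 83 is extremely close to the robot 1, the object detected by the ultrasonic sensor 83 can be regarded as an obstacle rather than the user, and it can be determined that there is an obstacle between the robot 1 and the user.
[0168]
Accordingly, if the distance estimation unit 111, when it estimates the distance between the robot 1 and the user and determines that the object detected by the ultrasonic sensor 83 is an obstacle, sends that information to the distance adjustment unit 114 together with the estimated distance information, the ultrasonic sensor 83 can also serve as a sensor that detects obstacles in the direction from the robot 1 toward the user.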
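The obstacle test described above combines two conditions on the ultrasonic estimate. A minimal sketch in Python follows; the function name and both numeric thresholds are assumptions made for illustration, since the text gives no concrete values for "differs greatly" or "extremely close".

```python
def ultrasonic_hit_is_obstacle(ultrasonic_m, stereo_m, max_gap_m=0.5, very_near_m=0.3):
    """Return True when the ultrasonic echo is judged to come from an obstacle.

    Both conditions from the text must hold: the ultrasonic estimate and the
    stereo estimate disagree strongly, and the ultrasonic estimate is very
    close to the robot. max_gap_m and very_near_m are illustrative values
    that are not taken from the source.
    """
    disagree = abs(ultrasonic_m - stereo_m) > max_gap_m
    very_near = ultrasonic_m < very_near_m
    return disagree and very_near
```

With these assumed thresholds, an ultrasonic reading of 0.2 m against a stereo reading of 2.0 m is treated as an obstacle, while two readings that roughly agree are not.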
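[0169]
An obstacle behind the robot 1, on the other hand, could be detected, for example, by having the robot 1 turn around (turn its face to the rear) and using the ultrasonic sensor 83 mounted on the front of the robot 1. However, actions of the robot 1 that require recognition processing often involve conversing with the user, so turning its back on the user (turning its face away) is unnatural as a motion, and turning back again to perform recognition also lengthens the processing time. Another method is to mount a second ultrasonic sensor, similar to the ultrasonic sensor 83, on the back of the robot 1 to detect obstacles. However, mounting many sensors of various kinds on the robot 1 raises cost. Therefore, when the distance between the robot 1 and the user is nearer than the threshold D1, the user can be asked to move away from the robot 1, and the robot 1 can be made not to move in the direction away from the user (the rearward direction).
[0170]
FIG. 19 is a flowchart of a fifth embodiment of the distance adjustment process of the distance adjustment unit 114 (the fifth distance adjustment process) for such a case. The distance adjustment process of FIG. 19 is described below; description of the parts that are the same as in FIG. 18 is omitted as appropriate.
[0171]
That is, steps S91 to S95 perform the same processing as steps S71 to S75 of FIG. 18, respectively. If it is determined in step S95 that the distance between the robot 1 and the user is nearer than the threshold D1, the process proceeds to step S96, where, as in step S77 of FIG. 18, the distance adjustment unit 114 sends to the action determination mechanism unit 103 the action request command "speak to ask the user to move away", and the distance adjustment process ends.
[0172]
In the distance adjustment processes of the distance adjustment unit 114 shown in FIGS. 18 and 19, whether the robot 1 moves or instead performs an action (utterance) that prompts the user to move is decided by whether there is an obstacle on the plane in the direction from the robot 1 toward the user or in the opposite direction (behind the robot 1). Alternatively, the robot 1 may, for example, first start moving and, if it detects an obstacle while moving, then take an action (utterance) that prompts the user to move. In this case, one way for the robot 1 to detect an obstacle while moving is, for example, to detect a change in the torque of the actuators A9 to A14, since the motor torque increases when the leg unit 14A or 14B hits an obstacle.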
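The torque-based detection mentioned above can be illustrated with a simple comparison against a baseline. This is only a sketch under stated assumptions: the function name, the idea of a per-actuator baseline, and the `max_rise` value are hypothetical, and the text does not say how the torque change is actually evaluated.

```python
def leg_hit_obstacle(torques, baseline, max_rise=0.5):
    """Return True if any leg actuator torque exceeds its baseline by more than max_rise.

    torques and baseline are equal-length sequences, one entry per leg
    actuator (for example A9 to A14). Units and max_rise are illustrative.
    """
    return any(t - b > max_rise for t, b in zip(torques, baseline))
```

A caller could switch from "robot moves" to "ask the user to move" as soon as this function returns True during walking.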
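[0173]
As described above, the distance between the robot 1 and the user is estimated from the signals output by the CCD cameras 81L and 81R, the microphone 82, the ultrasonic sensor 83, and so on, and when the estimated distance falls outside the predetermined range of distances suitable for recognition processing, the distance between the robot 1 and the user is adjusted. The recognition accuracy of the recognition function of the robot 1 can thereby be improved.
[0174]
In the embodiments described above, the predetermined range R and the threshold D1 are each set in the threshold setting unit 112 as predetermined values. However, the predetermined range R and the threshold D1 can also be changed adaptively according to, for example, the surrounding conditions and the state of the user.
[0175]
FIG. 20 shows an example of the functional configuration of the main control unit 61, corresponding to FIG. 6, in which the predetermined range R and the threshold D1 that the threshold setting unit 112 supplies to the distance determination unit 113 and the distance adjustment unit 114 are changed dynamically in this way. In the figure, parts corresponding to those in FIG. 6 are given the same reference numerals, and their description is omitted below as appropriate. That is, the main control unit 61 of FIG. 20 is configured basically in the same way as in FIG. 6.
[0176]
The action determination mechanism unit 103 determines the next action on the basis of the state recognition information from the state recognition information processing unit 101, the state information from the model storage unit 102, the passage of time, and so on. When the determined action, such as "talk with the user" or "wave to the user", requires speech recognition processing or image recognition processing, the action determination mechanism unit 103 sends to the distance control unit 110 a command to adjust the distance between the robot 1 and the user so that the speech recognition unit 101A and the image recognition unit 101B of the state recognition information processing unit 101 can recognize accurately. Together with that command, the action determination mechanism unit 103 also supplies the distance control unit 110 with the current operating state of the robot 1, for example "stepping in place".
[0177]
As in FIG. 6, the distance estimation unit 111 estimates the distance between the robot 1 and the user on the basis of the various signals from the CCD cameras 81L and 81R, the microphone 82, the ultrasonic sensor 83, and so on. The distance estimation unit 111 then sends the estimated distance information to the threshold setting unit 112 as well as to the distance determination unit 113 and the distance adjustment unit 114.
[0178]
In speech recognition, the level of ambient noise can affect the recognition processing. For example, when the robot 1 and the user are so far apart that the user's voice is hard to distinguish from the ambient noise, bringing the robot 1 and the user closer together makes the user's voice louder at the robot 1. The appropriate distance between the robot 1 and the user therefore depends on the level of ambient noise. Accordingly, the threshold setting unit 112 obtains the ambient noise and sets the range of distances between the robot 1 and the user that is suitable for recognition processing (predetermined range R₁).
[0179]
To that end, the threshold setting unit 112 measures the background noise around the robot 1 on the basis of the audio signal from the microphone 82. This can be calculated, for example, by collecting, for a predetermined time, those portions of the audio signal picked up by the microphone 82 that do not contain the user's speech (periods outside speech intervals) and taking the average of the power values of the collected audio signal.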
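The background noise measurement described above, averaging signal power over non-speech periods, can be sketched as follows. The framing scheme, the function name, and the per-frame speech flags are assumptions introduced for illustration; the text does not specify how speech intervals are detected or how long the collection window is.

```python
def background_noise_power(samples, is_speech, frame_len=160):
    """Average power of the frames that are flagged as containing no user speech.

    samples: sequence of audio sample values.
    is_speech: one boolean per frame, True where the user's speech is present
               (assumed to come from a separate speech interval detector).
    Returns None when no non-speech frame is available.
    """
    powers = []
    for i, speech in enumerate(is_speech):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if speech or not frame:
            continue  # skip speech intervals and frames past the end of the data
        powers.append(sum(x * x for x in frame) / len(frame))
    if not powers:
        return None
    return sum(powers) / len(powers)
```

A larger returned value would correspond to a noisier environment and, following the text, to a nearer predetermined range R₁.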
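[0180]
The threshold setting unit 112 also uses the current operating state of the robot 1, supplied from the action determination mechanism unit 103 to the distance control unit 110, to estimate the influence of noise generated by the robot 1 itself (the noise component) on the audio input to the microphone 82. As the operating state of the robot 1, various kinds of information are supplied from the action determination mechanism unit 103 to the distance control unit 110, ranging from what may be called the operating mode of the robot 1, such as walking, standing still, or sitting, down to the operating states of the actuators A1 to A14 that correspond to the joints of the robot 1. For example, when the robot 1 is walking or stepping in place, the threshold setting unit 112 estimates the levels of the impact sound between the feet 45 of the robot 1 and the floor, the motor sound of the actuators A1 to A14 of the robot 1, and the like as the influence of noise generated by the robot 1 itself.
[0181]
The threshold setting unit 112 then sets the range of distances between the robot 1 and the user that is suitable for recognition (predetermined range R₁) from the background noise around the robot 1, the current operating state of the robot 1, and so on, as described above. For example, when the ambient noise input from the microphone 82 is so loud that the user's voice is hard to distinguish, setting the predetermined range R₁ to nearer values causes the user to be asked to come closer to the robot 1, which raises the S/N of the user's audio signal.
[0182]
Also, when a face image is detected in the image signals from the CCD cameras 81L and 81R, the threshold setting unit 112 sets a range of distances between the robot 1 and the user that is suitable for recognition (predetermined range R₂) from the size of the detected face image region. This is because the error in the distance between the robot 1 and the user estimated from the size of the face image region becomes large when the detected face image region is either too large or too small relative to the image frame output by the CCD cameras 81L and 81R. Accordingly, the distance between the robot 1 and the user at which the height and width in pixels of the rectangle containing the face image region are each, for example, about 1/2 of the height and width in effective pixels of the CCD cameras 81L and 81R (the size of the face image region at this point is hereinafter referred to as the appropriate value, as applicable) can be taken as the optimum distance for recognition, and a range with a predetermined margin around that optimum distance can be taken as the range suitable for recognition based on the detected size of the face image region (predetermined range R₂).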
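One way to picture the face-based range R₂ is shown below. The sketch assumes a simple pinhole-camera relation in which the face's height in pixels is inversely proportional to distance; the text does not state how the optimum distance is obtained, so this relation, the function name, and the margin value are all assumptions.

```python
def face_based_range(distance_m, face_px, frame_px, target_ratio=0.5, margin_m=0.25):
    """Return (near, far) around the distance at which the face fills target_ratio of the frame.

    distance_m: current estimated distance to the user.
    face_px, frame_px: height of the face rectangle and of the image frame, in pixels.
    Assumes apparent face size scales as 1 / distance (pinhole model).
    """
    ratio = face_px / frame_px
    optimum = distance_m * ratio / target_ratio  # distance giving the target ratio
    return (max(optimum - margin_m, 0.0), optimum + margin_m)
```

Under this assumption, a face that fills a quarter of the frame height at 2.0 m would fill half of it at 1.0 m, so the range is centered on 1.0 m.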
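[0183]
Furthermore, when an audio signal from the microphone 82 is detected, the threshold setting unit 112 sets a range of distances between the robot 1 and the user that is suitable for recognition (predetermined range R₂) from the level of that audio signal. This is because, for example, when the user speaks too close to the robot 1 and the voice is too loud, the waveform picked up by the microphone 82 exceeds the dynamic range of the microphone 82 and is distorted relative to the original speech waveform. Conversely, when the user speaks too far from the robot 1 and the voice is too quiet, the user's voice that should be recognized becomes hard to distinguish from the ambient noise. Accordingly, the distance between the robot 1 and the user at which the average level of the user's voice input to the microphone 82 is, for example, around the center value of the range of sound levels that the microphone 82 can measure (the midpoint of its dynamic range) (the level of the user's voice at this point is likewise hereinafter referred to as the appropriate value, as applicable) can be taken as the optimum distance for recognition, and a range with a predetermined margin around that optimum distance can be taken as the range of distances suitable for recognition based on the detected voice level (predetermined range R₂).
[0184]
Furthermore, the threshold setting unit 112 dynamically sets the predetermined range R, which the distance determination unit 113 uses as its threshold, on the basis of the range of distances suitable for recognition calculated from the background noise around the robot 1, the current operating state of the robot 1, and so on (predetermined range R₁), and the range of distances suitable for recognition calculated from the size of the face image region detected in the image signals output by the CCD cameras 81L and 81R and from the level of the audio signal output by the microphone 82 (predetermined range R₂).
[0185]
The threshold setting unit 112 further calculates, for example, the center value of the dynamically set predetermined range R and sets it as the threshold D1. The threshold setting unit 112 then sends the value of the predetermined range R to the distance determination unit 113 and the threshold D1 to the distance adjustment unit 114.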
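[0186]
Next, the process by which the threshold setting unit 112 dynamically determines the predetermined range R will be described with reference to the flowchart of FIG. 21. This process is executed, for example, at a fixed period while the robot 1 is powered on.
[0187]
First, in step S111, the threshold setting unit 112 measures the background noise around the robot 1 on the basis of the audio signal from the microphone 82, and the process proceeds to step S112.
[0188]
In step S112, the threshold setting unit 112 acquires the current operating state of the robot 1, which is supplied from the action determination mechanism unit 103 to the distance control unit 110, and the process proceeds to step S113.
[0189]
In step S113, the threshold setting unit 112 calculates the range of distances between the robot 1 and the user that is suitable for recognition from the background noise around the robot 1 calculated in step S111, the current operating state of the robot 1 acquired in step S112, and so on, sets it as the predetermined range R₁, and the process proceeds to step S114.
[0190]
In step S114, the threshold setting unit 112 determines whether the user's voice has been input from the microphone 82 to the distance control unit 110, or whether a face image has been detected in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110.
[0191]
If it is determined in step S114 that no user voice has been input from the microphone 82 to the distance control unit 110 and no face image has been detected in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110, the processing of steps S115 to S118, described below, is skipped and the process proceeds to step S119.
[0192]
If, on the other hand, it is determined in step S114 that the user's voice has been input from the microphone 82 to the distance control unit 110 or that a face image has been detected in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110, the process proceeds to step S115, where the threshold setting unit 112 determines whether the level of the user's voice input from the microphone 82 to the distance control unit 110, or the size of the face image region in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110, is at the appropriate value. If only one of the user's voice input and the face image has been detected, step S115 processes only the signal that was detected.
[0193]
If it is determined in step S115 that the level of the voice input from the microphone 82 to the distance control unit 110 and the size of the face image region in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110 are at their appropriate values, the processing of steps S116 to S118, described below, is skipped and the process proceeds to step S119.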
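[0194]
If, on the other hand, it is determined in step S115 that the level of the voice input from the microphone 82 to the distance control unit 110, or the size of the face image region in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110, is not at the appropriate value, the process proceeds to step S116, where the threshold setting unit 112 acquires the estimated distance information supplied from the distance estimation unit 111, that is, the distance between the robot 1 and the user, and the process proceeds to step S117.
[0195]
In step S117, the threshold setting unit 112 determines whether the distance between the robot 1 and the user acquired in step S116 is within the predetermined range R₁ suitable for recognition that was calculated in step S113.
[0196]
If it is determined in step S117 that the distance between the robot 1 and the user is not within the predetermined range R₁ suitable for recognition, the processing of step S118 is skipped and the process proceeds to step S119.
[0197]
If, on the other hand, it is determined in step S117 that the distance between the robot 1 and the user is within the predetermined range R₁ suitable for recognition, the process proceeds to step S118, where the threshold setting unit 112 calculates the predetermined range R₂, which is the range of distances between the robot 1 and the user that brings the level of the user's voice or the size of the face image region to the appropriate value, and sets that predetermined range R₂, rather than the predetermined range R₁ set in step S113, as the range of distances between the robot 1 and the user suitable for recognition (predetermined range R). The process then proceeds to step S119. For example, when the distance between the robot 1 and the user that the threshold setting unit 112 acquired from the distance estimation unit 111 in step S116 is within the predetermined range R₁, but the predetermined range R₂ calculated in step S118 is, for instance because the face image region obtained from the input image is small, a range of distances nearer to the robot 1 than the predetermined range R₁, the predetermined range R₂ is adopted and set as the predetermined range R of distances suitable for recognition.
[0198]
Here, when it is determined in step S117 that the distance between the robot 1 and the user that the threshold setting unit 112 acquired from the distance estimation unit 111 in step S116 is not within the predetermined range R₁, the distance adjustment unit 114 will send a command to adjust the distance between the robot 1 and the user to the action determination mechanism unit 103 regardless of the voice level or the size of the face image region, so the threshold setting unit 112 deliberately does not calculate the predetermined range R₂ (step S118 is skipped).
[0199]
On the other hand, when the distance between the robot 1 and the user is within the predetermined range R₁ and yet the level of the voice input from the microphone 82 to the distance control unit 110 is not appropriate, or the size of the face image region in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110 is not appropriate, a range giving better recognition accuracy (predetermined range R₂) exists, so that predetermined range R₂ is calculated in step S118.
[0200]
Another way of dealing with a face image region whose size is not appropriate is to provide the CCD cameras 81L and 81R with a zoom mechanism and adjust the size of the face image region with it. However, the zoom ratio is limited, so the range over which the user's face image region can be optimized is also limited. Providing a zoom mechanism is also subject to structural (space) constraints of the robot 1, cost constraints, and so on. It is therefore desirable to adjust the size of the face image region by having the user or the robot 1 adjust the distance, as described above.
[0201]
In step S119, the threshold setting unit 112 calculates the threshold D1 from the predetermined range R, outputs the predetermined range R to the distance determination unit 113 and the threshold D1 to the distance adjustment unit 114, and ends the process.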
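The flow of steps S113 to S119 can be summarized in a short sketch. It is a hedged illustration only: the function and parameter names are hypothetical, `compute_r2` stands in for the unspecified R₂ calculation of step S118, and the sketch assumes that R falls back to R₁ whenever step S118 is skipped, which the text implies but does not state outright.

```python
def determine_range(r1, estimated_distance, voice_ok, face_ok, compute_r2):
    """Return (R, D1) following the S114 to S119 branching.

    r1: (near, far) range from step S113.
    voice_ok, face_ok: True if the signal is at its appropriate value, False if
        it is not, None if that signal was not detected at all.
    compute_r2: callable returning the (near, far) range R2 (step S118).
    """
    r = r1  # assumption: R defaults to R1 when S118 is skipped
    detected = [ok for ok in (voice_ok, face_ok) if ok is not None]  # S114
    if detected and not all(detected):  # S115: a detected signal is off its appropriate value
        near, far = r1
        if near <= estimated_distance <= far:  # S116 and S117: distance is inside R1
            r = compute_r2()  # S118: adopt R2 instead of R1
    d1 = (r[0] + r[1]) / 2.0  # S119: threshold D1 as the center value of R
    return r, d1
```

For instance, with R₁ = (1.0, 3.0), a user at 2.0 m, and a face region that is too small, the call adopts whatever nearer range `compute_r2` returns and derives D1 from it.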
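[0202]
As described above, by adaptively changing the values of the predetermined range R and the threshold D1 according to the surrounding conditions, the state of the user, and so on, the distance between the robot 1 and the user can always be adjusted to the distance best suited to the recognition function, so the recognition accuracy of the recognition function can be improved.
[0203]
In the embodiments above, the output signals from the various sensors of the external sensor unit 71, such as the microphone 82, the CCD cameras 81L and 81R, and the touch sensor 51, are always input to the state recognition information processing unit 101, which performs recognition processing and constantly outputs state recognition information to the action determination mechanism unit 103. Alternatively, only when the action determination mechanism unit 103 has determined the next action and that action requires recognition processing, the action determination mechanism unit 103 may send a command to the state recognition information processing unit 101 to have it perform recognition processing, and then receive the state recognition information.
[0204]
Also, although the threshold D1 output from the threshold setting unit 112 to the distance adjustment unit 114 has been described as the center value of the predetermined range R, two thresholds may be set instead, for example the upper limit of the predetermined range R (the value on the side far from the robot 1) and its lower limit (the value on the side near the robot 1). In this case, for example, in the flowchart of the first distance adjustment process shown in FIG. 15, the upper limit can be used in place of the threshold D1 in step S41 and the lower limit in place of the threshold D1 in step S43.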
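The two-threshold variant of the first distance adjustment process can be sketched as below. The function name and the returned strings are illustrative assumptions; only the comparisons against the upper and lower limits come from the text.

```python
def first_adjustment_with_limits(distance, lower, upper):
    """First distance adjustment process using two thresholds instead of D1.

    Returns the action request as a string, or None when the distance
    already lies between the lower and upper limits of range R.
    """
    if distance > upper:  # step S41, with the upper limit in place of D1
        return "speak: ask the user to come closer"
    if distance < lower:  # step S43, with the lower limit in place of D1
        return "speak: ask the user to move away"
    return None
```

Compared with a single center threshold, this leaves the whole interior of R as a zone in which no request is issued.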
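[0205]
The program that executes the series of processes described above can be stored (recorded) temporarily or permanently on a removable recording medium such as a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, or a semiconductor memory. Such a removable recording medium can be provided as so-called packaged software, and the program is installed in the memory 61A.
[0206]
In addition to being installed in the memory 61A from a removable recording medium as described above, the program can be transferred from a download site to a computer wirelessly via an artificial satellite for digital satellite broadcasting, or transferred to a computer by wire via a network such as a LAN (Local Area Network) or the Internet; the computer can receive the program transferred in this way and install it in the memory 61A.
[0207]
In this specification, the steps describing the program recorded on the recording medium include not only processing performed in time series in the described order but also processing that is executed in parallel or individually and not necessarily in time series.
[0208]
Besides being estimated from the outputs of the microphone 82 and so on as described above, the distance between the robot 1 and the user may be detected outside the robot 1, for example by using a surveillance camera, and supplied to the robot 1, for example wirelessly.
[0209]
[Effects of the Invention]
As described above, according to the present invention, the recognition accuracy of a robot's recognition function can be improved.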
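[Brief Description of the Drawings]
[FIG. 1] A perspective view showing the external configuration of a robot to which the present invention is applied.
[FIG. 2] A rear perspective view showing the external configuration of the robot of FIG. 1.
[FIG. 3] A schematic diagram for explaining the robot of FIG. 1.
[FIG. 4] A block diagram mainly for explaining the parts related to control of the robot of FIG. 1.
[FIG. 5] A block diagram showing the internal configuration of the robot of FIG. 1.
[FIG. 6] A block diagram showing the configuration of the main control unit of FIG. 5.
[FIG. 7] A diagram explaining the processing of the ultrasonic sensor.
[FIG. 8] A diagram showing the reference camera and the detection camera imaging the user.
[FIG. 9] A diagram for explaining epipolar lines.
[FIG. 10] A diagram showing a reference camera image and a detection camera image.
[FIG. 11] A diagram showing the transition of evaluation values.
[FIG. 12] A diagram showing a set point/distance table and a parallax/distance table.
[FIG. 13] A flowchart explaining the robot operation processing of the robot of FIG. 1.
[FIG. 14] A flowchart explaining the distance estimation process of step S4 in FIG. 13.
[FIG. 15] A flowchart explaining the distance adjustment process of step S7 in FIG. 13.
[FIG. 16] A flowchart explaining the distance adjustment process of step S7 in FIG. 13.
[FIG. 17] A flowchart explaining the distance adjustment process of step S7 in FIG. 13.
[FIG. 18] A flowchart explaining the distance adjustment process of step S7 in FIG. 13.
[FIG. 19] A flowchart explaining the distance adjustment process of step S7 in FIG. 13.
[FIG. 20] A block diagram showing the configuration of the main control unit of FIG. 5.
[FIG. 21] A flowchart explaining the dynamic range determination process of the threshold setting unit when the predetermined range R is determined dynamically.
[Explanation of Reference Numerals]
1 robot, 61 main control unit, 63 sub-control unit, 71 external sensor unit, 72 speaker, 81L CCD camera, 81R CCD camera, 82 microphone, 83 ultrasonic sensor, 101 state recognition information processing unit, 102 model storage unit, 103 action determination mechanism unit, 104 posture transition mechanism unit, 105 speech synthesis unit, 110 distance control unit, 111 distance estimation unit, 112 threshold setting unit, 113 distance determination unit, 114 distance adjustment unit
[0001]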
BACKGROUND OF THE INVENTION
The present invention relates to a robot apparatus, a robot control method, a recording medium, and a program, and in particular to a robot apparatus, a robot control method, a recording medium, and a program that can improve the recognition accuracy of a robot's recognition function by, for example, adjusting the distance between the user and the robot.
[0002]
[Prior art]
In recent years, robots (in this specification, including stuffed-toy types) equipped with recognition functions such as a speech recognition device or an image recognition device have been commercialized as toys and the like. For example, a robot equipped with a speech recognition device recognizes speech uttered by the user and, based on the recognition result, autonomously performs actions such as making a certain gesture or outputting synthesized sound.
[0003]
When a robot equipped with a speech recognition device recognizes speech uttered by a user and the user is too far from the robot, the signal value of the user's speech waveform acquired by the microphone mounted on the robot is attenuated, and the noise level becomes relatively high. That is, the S/N ratio (signal-to-noise ratio) of the user's speech signal acquired by the microphone becomes low. In general, moreover, the greater the distance between the user (speaker) and the robot (the microphone mounted on it), the more strongly the waveform of the speech signal is affected by reverberation. Therefore, when the user and the robot are too far apart, the recognition accuracy of the robot's speech recognition device deteriorates.
[0004]
Conversely, when the user and the robot are too close, the signal value of the user's speech waveform acquired by the microphone mounted on the robot exceeds the microphone's detectable range. The waveform acquired by the microphone is therefore saturated and distorted relative to the original speech waveform. When the user and the robot are too close, the robot's speech recognition device has to recognize such a distorted waveform, and the accuracy of speech recognition deteriorates.
[0005]
A method has therefore been proposed that, alongside speech recognition, performs situation detection such as ambient noise detection, which detects the influence of ambient noise, and power shortage detection and power overload detection, which detect that the power of the input speech satisfies certain threshold conditions, and that uses the speech recognition result together with the situation detection result to cope with degraded speech recognition accuracy in a robot (see, for example, Non-Patent Document 1).
[0006]
[Non-Patent Document 1]
Iwasawa, Onaka, Fujita, “A Method and Evaluation of Speech Recognition Interface for Robots Using Situation Detection”, Japanese Society for Artificial Intelligence, Artificial Intelligence Society, November 2002, p. 33-38
[0007]
[Problems to be solved by the invention]
However, the method of Non-Patent Document 1 does not take into account the distance between the user (speaker) and the robot (the microphone mounted on it). Consequently, a problem can arise in which, for example, the user is speaking to the robot loudly enough, but because the user is far from the robot the speech signal input to the microphone has a low S/N and the robot cannot recognize the user's speech, and the robot then asks the user to speak even louder.
[0008]
Meanwhile, in a robot equipped with an image recognition device, when the image recognition device identifies a user from an image obtained by imaging the user and the user is too far from the robot, the accuracy of user identification deteriorates because of the resolution of the imaging device that captured the image. For example, when the robot's image recognition device identifies a user from the user's face image captured by the imaging device, the number of pixels in the face region of the captured image decreases as the distance from the robot's imaging device to the user increases. As a result, fewer effective pixels are available to the image recognition device for recognition, and its recognition accuracy may deteriorate.
[0009]
Also, when the distance between the user and the robot is too close, the entire face area of the user does not fit within the image frame output by the imaging device, and the user's identification accuracy by the image recognition device may deteriorate.
[0010]
The present invention has been made in view of such circumstances. For example, when a speech, image, or other recognition device mounted on a robot recognizes the speech or image of a recognition target such as a user, the invention adjusts the distance between the robot and the user to one that is appropriate for the recognition device, thereby improving the recognition accuracy of the robot's recognition device.
[0011]
[Means for Solving the Problems]
The robot apparatus of the present invention is a robot apparatus having a recognition function for recognizing a predetermined recognition target, comprising: a plurality of imaging means for imaging the surroundings and outputting image signals; voice input means for inputting voice; ultrasonic output means for emitting an ultrasonic pulse and receiving the reflected wave reflected from the recognition target; distance estimation means for estimating the distance to the recognition target based on the output results of the plurality of imaging means, the voice input means, and the ultrasonic output means; and distance adjustment means for adjusting the distance to the recognition target based on the distance estimated by the distance estimation means. When the distance estimated from the output result of the ultrasonic output means differs greatly from the distance estimated from the output results of the plurality of imaging means and the voice input means, and the distance estimated from the output result of the ultrasonic output means is short, the distance estimation means determines that the object detected by the ultrasonic output means is an obstacle, excludes the output result of the ultrasonic output means, and estimates the distance to the recognition target. When it is determined that there is an obstacle, the distance adjustment means adjusts the distance to the user by causing the robot apparatus to act so that the user moves; when it is determined that there is no obstacle, it adjusts the distance to the recognition target by moving the robot apparatus.
[0012]
The distance estimation means can estimate the distance to the recognition target by performing stereo processing using the image signals output from the plurality of imaging means.
[0013]
The recognition target recognized by the plurality of imaging means is a user, and the distance estimation means can detect the user's face region using the image signals output from the imaging means and estimate the distance to the user based on that face region.
[0014]
The recognition target recognized by the voice input means is a user, and the distance estimation means can estimate the distance to the user based on the loudness of the user's voice input to the voice input means.
[0018]
Voice output means for outputting voice can be further provided, and the distance adjustment means can adjust the distance to the user by causing the voice output means to output voice so that the user moves.
[0019]
The distance adjustment means can adjust the distance to the user by causing the robot device to perform an operation that prompts the user to move.
[0020]
Determination means for determining whether the distance estimated by the distance estimation means is within a predetermined range is further provided, and the distance adjustment means is configured to adjust the distance to the recognition target based on the determination result of the determination means. Can be.
[0021]
Range setting means for setting a predetermined range can be further provided.
[0022]
The range setting means can measure the ambient background noise based on the output result of the voice input means and dynamically set the predetermined range according to the magnitude of the background noise.
Further, the range setting means can acquire the operating state of the robot apparatus, estimate the noise component generated by the robot apparatus itself, and dynamically set the predetermined range according to the magnitude of that noise component.
[0023]
The robot control method of the present invention is a robot control method for controlling a robot apparatus having a recognition function for recognizing a predetermined recognition target, comprising: an imaging step of imaging the surroundings and outputting an image signal; a voice input step of inputting voice; an ultrasonic output step of emitting an ultrasonic pulse and receiving the reflected wave reflected from the recognition target; a distance estimation step of estimating the distance to the recognition target based on the processing results of the imaging step, the voice input step, and the ultrasonic output step; and a distance adjustment step of adjusting the distance to the recognition target based on the processing result of the distance estimation step. In the distance estimation step, when the distance estimated from the processing result of the ultrasonic output step differs greatly from the distance estimated from the processing results of the imaging step and the voice input step, and the distance estimated from the processing result of the ultrasonic output step is short, the recognition target that reflected the ultrasonic pulse is determined to be an obstacle, the processing result of the ultrasonic output step is excluded, and the distance to the recognition target is estimated. In the distance adjustment step, when it is determined that there is an obstacle, the distance to the user is adjusted by causing the robot apparatus to act so that the user moves; when it is determined that there is no obstacle, the distance to the recognition target is adjusted by moving the robot apparatus.
[0024]
The program of the recording medium of the present invention is a program for causing a computer to control a robot apparatus having a recognition function for recognizing a predetermined recognition target, the program including: an imaging step of imaging the surroundings and outputting an image signal; a voice input step of inputting voice; an ultrasonic output step of emitting an ultrasonic pulse and receiving the reflected wave reflected from the recognition target; a distance estimation step of estimating the distance to the recognition target based on the processing results of the imaging step, the voice input step, and the ultrasonic output step; and a distance adjustment step of adjusting the distance to the recognition target based on the processing result of the distance estimation step. In the distance estimation step, when the distance estimated from the processing result of the ultrasonic output step differs greatly from the distance estimated from the processing results of the imaging step and the voice input step, and the distance estimated from the processing result of the ultrasonic output step is short, the recognition target that reflected the ultrasonic pulse is determined to be an obstacle, the processing result of the ultrasonic output step is excluded, and the distance to the recognition target is estimated. In the distance adjustment step, when it is determined that there is an obstacle, the distance to the user is adjusted by causing the robot apparatus to act so that the user moves; when it is determined that there is no obstacle, the distance to the recognition target is adjusted by moving the robot apparatus.
[0025]
The program of the present invention is a program for causing a computer to control a robot apparatus having a recognition function for recognizing a predetermined recognition target, and causes the computer to execute processing including: an imaging step of imaging the surroundings and outputting an image signal; a voice input step of inputting voice; an ultrasonic output step of emitting an ultrasonic pulse and receiving the reflected wave reflected from the recognition target; a distance estimation step of estimating the distance to the recognition target based on the processing results of the imaging step, the voice input step, and the ultrasonic output step; and a distance adjustment step of adjusting the distance to the recognition target based on the processing result of the distance estimation step. In the distance estimation step, when the distance estimated from the processing result of the ultrasonic output step differs greatly from the distance estimated from the processing results of the imaging step and the voice input step, and the distance estimated from the processing result of the ultrasonic output step is short, the recognition target that reflected the ultrasonic pulse is determined to be an obstacle, the processing result of the ultrasonic output step is excluded, and the distance to the recognition target is estimated. In the distance adjustment step, when it is determined that there is an obstacle, the distance to the user is adjusted by causing the robot apparatus to act so that the user moves; when it is determined that there is no obstacle, the distance to the recognition target is adjusted by moving the robot apparatus.
[0026]
In the present invention, when the distance estimated from the ultrasonic output result differs greatly from the distance estimated from the imaging result and the voice input result, and the distance estimated from the ultrasonic output result is short, the recognition target that reflected the ultrasonic pulse is determined to be an obstacle, the ultrasonic output result is excluded, and the distance to the recognition target is estimated. When it is determined that there is an obstacle, the distance to the user is adjusted by causing the robot apparatus to act so that the user moves; when it is determined that there is no obstacle, the distance to the recognition target is adjusted by moving the robot apparatus.
[0027]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is a front perspective view of a biped walking robot 1 to which the present invention is applied, and FIG. 2 is a perspective view of the robot 1 from the back side. FIG. 3 is a perspective view for explaining the axis configuration of the robot 1.
[0028]
In the robot 1, a head unit 12 is disposed on the upper portion of the body unit 11, and arm units 13A and 13B having the same configuration are attached to predetermined positions on the upper left and right of the body unit 11, respectively. In addition, leg units 14A and 14B having the same configuration are respectively attached to predetermined positions on the lower left and right sides of the body unit 11. The head unit 12 is provided with a touch sensor 51.
[0029]
The body unit 11 is constructed by connecting a frame 21 that forms the upper part of the trunk and a waist base 22 that forms the lower part of the trunk via a waist joint mechanism 23. By driving the actuators A1 and A2 of the waist joint mechanism 23, the upper part of the trunk can be rotated independently about the mutually orthogonal roll axis 24 and pitch axis 25 shown in FIG. 3.
[0030]
The head unit 12 is attached via a neck joint mechanism 27 to the center of the upper surface of a shoulder base 26 fixed to the upper end of the frame 21. By driving the actuators A3 and A4 of the neck joint mechanism 27, the head unit 12 can be rotated independently about the pitch axis 28 and the yaw axis 29 shown in FIG. 3.
[0031]
The arm units 13A and 13B are attached to the left and right of the shoulder base 26, respectively, via shoulder joint mechanisms 30. By driving the actuators A5 and A6 of the corresponding shoulder joint mechanism 30, each arm unit can be rotated independently about the pitch axis 31 and the roll axis 32 shown in FIG. 3.
[0032]
Each of the arm units 13A and 13B is constructed by connecting an actuator A8 that forms the forearm to the output shaft of an actuator A7 that forms the upper arm via an elbow joint mechanism 33, and attaching a hand 34 to the tip of the forearm.
[0033]
In the arm units 13A and 13B, driving the actuator A7 rotates the forearm about the yaw axis 35 shown in FIG. 3, and driving the actuator A8 rotates the forearm about the pitch axis 36 shown in FIG. 3.
[0034]
The leg units 14A and 14B are attached to the waist base 22 at the lower part of the trunk via hip joint mechanisms 37. By driving the actuators A9 to A11 of the corresponding hip joint mechanism 37, each leg unit can be rotated independently about the mutually orthogonal yaw axis 38, roll axis 39, and pitch axis 40 shown in FIG. 3.
[0035]
Each of the leg units 14A and 14B is constructed by connecting the lower end of a frame 41 that forms the thigh to a frame 43 that forms the lower leg via a knee joint mechanism 42, and connecting the lower end of the frame 43 to a foot 45 via an ankle joint mechanism 44.
[0036]
Accordingly, in the leg units 14A and 14B, driving the actuator A12 that forms the knee joint mechanism 42 rotates the lower leg about the pitch axis 46 shown in FIG. 3, and driving the actuators A13 and A14 of the ankle joint mechanism 44 rotates the foot 45 independently about the mutually orthogonal pitch axis 47 and roll axis 48 shown in FIG. 3.
[0037]
A control unit 52 is disposed on the back side of the waist base 22, which forms the lower part of the trunk of the body unit 11. The control unit 52 is a box that houses, among other things, a main control unit 61 and a peripheral circuit 62 (both shown in FIG. 4), which are described later.
[0038]
FIG. 4 shows a configuration example of the actuator of the robot 1 and its control system.
[0039]
The control unit 52 houses a main control unit 61 that controls the operation of the entire robot 1, a peripheral circuit 62 such as a power supply circuit and a communication circuit, a battery 74 (FIG. 5), and the like.
[0040]
The control unit 52 is connected to sub-control units 63A to 63D disposed in the respective component units (the body unit 11, the head unit 12, the arm units 13A and 13B, and the leg units 14A and 14B), supplies the necessary power supply voltages to the sub-control units 63A to 63D, and communicates with them.
[0041]
The sub-control units 63A to 63D are each connected to the actuators A1 to A14 in the corresponding component unit and, based on the various control commands supplied from the main control unit 61, drive the actuators A1 to A14 in that component unit to the designated state.
[0042]
FIG. 5 is a block diagram illustrating an example of an electrical internal configuration of the robot 1.
[0043]
In the head unit 12, an external sensor unit 71, made up of CCD (Charge Coupled Device) cameras 81L and 81R that function as the "eyes" of the robot 1, microphones 82-1 to 82-N that function as its "ears", the touch sensor 51, an ultrasonic sensor 83, and so on, and a speaker 72 that functions as its "mouth" are disposed at predetermined positions. In the control unit 52, an internal sensor unit 73 made up of a battery sensor 91, an acceleration sensor 92, and so on is disposed.
[0044]
The CCD cameras 81L and 81R of the external sensor unit 71 image the surroundings and send the resulting image signal S1A to the main control unit 61. The microphones 82-1 to 82-N pick up the various command voices given by the user as voice input, such as "walk", "stop", or "raise your right hand", together with the ambient background noise, and each send the resulting audio signal S1B to the main control unit 61. In the following, the N microphones 82-1 to 82-N are referred to as the microphone 82 when they need not be distinguished.
[0045]
The touch sensor 51 is provided, for example, on the top of the head unit 12 as shown in FIGS. 1 and 2. It detects the pressure applied by a physical action from the user, such as stroking or hitting, and sends the detection result to the main control unit 61 as a pressure detection signal S1C.
[0046]
The ultrasonic sensor 83 has a sound source and a microphone (neither shown), and emits an ultrasonic pulse from the sound source inside the ultrasonic sensor 83. Further, the ultrasonic sensor 83 receives, with the microphone, the reflected wave that is reflected by the user or another object and returns, obtains the time from the emission of the ultrasonic pulse until the reception of the reflected wave (hereinafter referred to as the lag time) S1D, and sends it to the main control unit 61.
[0047]
The battery sensor 91 of the internal sensor unit 73 detects the remaining energy of the battery 74 at a predetermined cycle, and sends the detection result to the main control unit 61 as a remaining battery level detection signal S2A. The acceleration sensor 92 detects the acceleration of the movement of the robot 1 in three axis directions (x axis, y axis, and z axis) at a predetermined cycle, and sends the detection result to the main control unit 61 as an acceleration detection signal S2B.
[0048]
The external memory 75 stores programs, data, control parameters, and the like, and supplies the programs and data to the memory 61A built in the main control unit 61 as necessary. The external memory 75 receives data from the memory 61A and stores it. The external memory 75 is detachable from the robot 1.
[0049]
The main control unit 61 has a built-in memory 61A. The memory 61A stores programs and data, and the main control unit 61 performs various processes by executing the programs stored in the memory 61A. That is, the main control unit 61 determines the surrounding and internal conditions of the robot 1, commands from the user, the presence or absence of an action from the user, and the like, based on the image signal S1A, the audio signal S1B, the pressure detection signal S1C, and the lag time S1D (hereinafter collectively referred to as the external sensor signal S1) supplied from the CCD cameras 81L and 81R, the microphone 82, the touch sensor 51, and the ultrasonic sensor 83 of the external sensor unit 71, respectively, and on the remaining battery level detection signal S2A and the acceleration detection signal S2B (hereinafter collectively referred to as the internal sensor signal S2) supplied from the battery sensor 91 and the acceleration sensor 92 of the internal sensor unit 73, respectively.
[0050]
The main control unit 61 then determines the action of the robot 1 based on the determination results for the surrounding and internal conditions of the robot 1, commands from the user, and the presence or absence of an action from the user, together with the control program stored in advance in the internal memory 61A or the various control parameters stored in the external memory 75 loaded at that time. It generates a control command based on the determination result and sends it to the corresponding sub-control units 63A to 63D. Based on the control command supplied from the main control unit 61, the sub-control units 63A to 63D control the driving of the corresponding ones of the actuators A1 to A14. As a result, the robot 1 performs actions such as swinging the head unit 12 up and down or left and right, raising the arm unit 13A or the arm unit 13B, or walking by alternately driving the leg units 14A and 14B.
[0051]
Moreover, the main control unit 61 supplies a predetermined audio signal S3 to the speaker 72 as needed, so that sound based on the audio signal S3 is output to the outside. Further, the main control unit 61 blinks an LED (not shown), which is provided at a predetermined position of the head unit 12 and functions as an “eye” in appearance, by outputting a drive signal to the LED.
[0052]
In this way, the robot 1 behaves autonomously based on surrounding and internal situations (states), instructions from the user, presence / absence of actions, and the like.
[0053]
FIG. 6 shows a functional configuration example of the main control unit 61 of FIG. Note that the functional configuration shown in FIG. 6 is realized by the main control unit 61 executing a control program stored in the memory 61A.
[0054]
The main control unit 61 includes a state recognition information processing unit 101 that recognizes specific external states; a model storage unit 102 that stores models of the emotion, instinct, and growth states of the robot 1, which are updated based on the recognition results of the state recognition information processing unit 101 and the like; a behavior determination mechanism unit 103 that determines the behavior of the robot 1 based on the recognition results of the state recognition information processing unit 101 and the like; a posture transition mechanism unit 104 that actually causes the robot 1 to perform an action based on the determination result of the behavior determination mechanism unit 103; a voice synthesis unit 105 that generates synthesized sound; and a distance control unit 110 that controls the adjustment of the distance between the robot 1 and the user based on commands from the behavior determination mechanism unit 103.
[0055]
Audio signals, image signals, pressure detection signals, and the like from the microphone 82, the CCD cameras 81L and 81R, the touch sensor 51, and the like are constantly input to the state recognition information processing unit 101 while the robot 1 is powered on. Based on these signals, the state recognition information processing unit 101 recognizes specific external states, specific actions from the user, instructions from the user, and the like, and constantly outputs state recognition information representing the recognition result to the model storage unit 102 and the action determination mechanism unit 103. Here, even when accurate recognition is difficult, for example because the distance between the user and the robot 1 is not an optimal (appropriate) distance for the state recognition information processing unit 101 to perform voice recognition, image recognition, or the like, the state recognition information processing unit 101 still outputs state recognition information representing the above-described recognition result to the model storage unit 102 and the action determination mechanism unit 103.
[0056]
The state recognition information processing unit 101 includes a voice recognition unit 101A, an image recognition unit 101B, and a pressure processing unit 101C.
[0057]
The voice recognition unit 101A performs voice recognition on the audio signal S1B given from each of the microphones 82-1 to 82-N. The voice recognition unit 101A then notifies the model storage unit 102 and the action determination mechanism unit 103 of the voice recognition result, for example, commands such as “walk”, “stop”, and “lift the right hand”, as state recognition information.
[0058]
The image recognition unit 101B performs image recognition processing using the image signal S1A given from the CCD cameras 81L and 81R. When, as a result of the processing, the image recognition unit 101B detects, for example, “a red round object” or “a plane perpendicular to the ground and higher than a predetermined height”, it notifies the model storage unit 102 and the action determination mechanism unit 103 of an image recognition result such as “there is a ball” or “there is a wall” as state recognition information.
[0059]
Here, since the user is generally expected to speak to the robot 1 from its front in most cases, the CCD cameras 81L and 81R that capture the surrounding situation are assumed to be installed in the head unit 12 (FIG. 1) so that their imaging direction is the front direction of the robot 1.
[0060]
Note that when the user speaks from a direction other than the front of the robot 1, for example, from the side or the back, the CCD cameras 81L and 81R cannot capture the user. Therefore, for example, the direction of the sound source is estimated from the power difference or phase difference of the audio signals reaching the microphones 82-1 to 82-N, and the head unit 12 is moved toward the one of the microphones 82-1 to 82-N from which the maximum audio level is obtained, so that the CCD cameras 81L and 81R can capture the user. In voice recognition, for example, the voice data output from the microphone from which the maximum audio level is obtained (basically, the microphone facing the front when the robot 1 faces the user) is subjected to voice recognition.
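As an illustration of estimating the sound source direction from a phase (arrival time) difference, the following is a minimal sketch for a single pair of microphones. It assumes a far-field source, a speed of sound of 343 m/s, and a known microphone spacing; the function name and parameters are hypothetical and are not taken from this specification.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def direction_from_delay(delay_s: float, mic_spacing_m: float) -> float:
    """Return the source angle in degrees (0 = straight ahead).

    delay_s is the arrival-time difference between the two microphones.
    Under the far-field assumption, sin(angle) = c * delay / spacing.
    """
    if mic_spacing_m <= 0:
        raise ValueError("microphone spacing must be positive")
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp to [-1, 1] so that measurement noise cannot break asin().
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A delay of zero gives 0 degrees (the source is straight ahead), and the largest physically possible delay gives 90 degrees (the source is directly to one side).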
[0061]
Further, for example, a microphone having directivity in the same direction as the imaging direction of the CCD cameras 81L and 81R is adopted as the microphone 82, and the head unit 12 is moved in a direction in which the sound level input to the microphone 82 is maximized. Thus, it is possible to allow the CCD cameras 81L and 81R to image the user.
[0062]
The pressure processing unit 101C processes the pressure detection signal S1C given from the touch sensor 51. When, as a result of the processing, the pressure processing unit 101C detects, for example, a pressure that is at or above a predetermined threshold and of short duration, it recognizes that the robot has been “struck”; when it detects a pressure that is below the predetermined threshold and of long duration, it recognizes that the robot has been “stroked (praised)”. It notifies the model storage unit 102 and the action determination mechanism unit 103 of the recognition result as state recognition information.
[0063]
The model storage unit 102 stores and manages an emotion model, an instinct model, and a growth model that express the emotion, instinct, and growth state of the robot 1, respectively.
[0064]
Here, the emotion model represents, for example, the states (degrees) of emotions such as “joyfulness”, “sadness”, “anger”, and “fun” by values in a predetermined range (for example, −1.0 to 1.0), and changes those values based on the state recognition information from the state recognition information processing unit 101, the passage of time, and the like. The instinct model represents, for example, the states (degrees) of instinctive desires such as “appetite”, “desire for sleep”, and “desire for exercise” by values in a predetermined range, and changes those values based on the state recognition information from the state recognition information processing unit 101, the passage of time, and the like. The growth model represents, for example, growth states (degrees) such as “childhood”, “adolescence”, “mature age”, and “old age” by values in a predetermined range, and changes those values based on the state recognition information from the state recognition information processing unit 101, the passage of time, and the like.
[0065]
The model storage unit 102 sends the emotion, instinct, and growth states represented by the values of the emotion model, instinct model, and growth model as described above to the action determination mechanism unit 103 as state information.
[0066]
Note that, in addition to the state recognition information supplied from the state recognition information processing unit 101, the model storage unit 102 is supplied from the behavior determination mechanism unit 103 with behavior information indicating the current or past behavior of the robot 1, specifically, the content of a behavior such as “walked for a long time”. Even when the same state recognition information is given, the model storage unit 102 generates different state information depending on the behavior of the robot 1 indicated by the behavior information.
[0067]
That is, for example, when the robot 1 greets the user and the user strokes its head, behavior information indicating that the robot greeted the user and state recognition information indicating that its head was stroked are given to the model storage unit 102. In this case, the model storage unit 102 increases the value of the emotion model representing “joyfulness”.
[0068]
On the other hand, when the robot 1 is stroked while performing some kind of work, behavior information indicating that the work is being performed and state recognition information indicating that the head has been stroked are given to the model storage unit 102. In this case, the value of the emotion model representing “joyfulness” is not changed in the model storage unit 102.
[0069]
As described above, the model storage unit 102 sets the value of the emotion model while referring not only to the state recognition information but also to the behavior information indicating the current or past behavior of the robot 1. This makes it possible to avoid an unnatural emotional change, such as an increase in the value of the emotion model representing “joyfulness” when, for example, the user strokes the robot's head while the robot is performing some task.
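The behavior-dependent update described above can be sketched as follows. This is only an illustrative sketch: the class name, the string labels for the state recognition information and behavior information, and the step size of 0.2 are all assumptions introduced here, not values from this specification. Only the value range of −1.0 to 1.0 comes from the text.

```python
from dataclasses import dataclass, field

JOY_STEP = 0.2  # hypothetical increment; the specification gives no value


def clamp(value: float, low: float = -1.0, high: float = 1.0) -> float:
    """Keep a model value inside the predetermined range."""
    return max(low, min(high, value))


@dataclass
class ModelStore:
    """Toy stand-in for the model storage unit 102 (emotion model only)."""
    emotions: dict = field(default_factory=lambda: {"joyfulness": 0.0})

    def update(self, state_recognition: str, behavior: str) -> None:
        # The same recognition result ("head_stroked") changes the emotion
        # only when the current behavior makes that change natural.
        if state_recognition == "head_stroked" and behavior == "greeting":
            self.emotions["joyfulness"] = clamp(
                self.emotions["joyfulness"] + JOY_STEP
            )
        # Stroked while working: the value is deliberately left unchanged.
```

With this sketch, a stroke received while greeting raises “joyfulness”, whereas the same stroke received while working leaves it as it was.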
[0070]
Note that the model storage unit 102 increases or decreases the values of the instinct model and the growth model based on both the state recognition information and the behavior information, as in the emotion model. In addition, the model storage unit 102 is configured to increase or decrease the values of the emotion model, instinct model, and growth model based on the values of other models.
[0071]
The action determination mechanism unit 103 determines the next action based on the state recognition information from the state recognition information processing unit 101, the state information from the model storage unit 102, the passage of time, and the like. When the content of the determined action does not require voice recognition processing or image recognition processing, as with “dancing”, for example, it sends the content of the action to the posture transition mechanism unit 104 as action command information.
[0072]
That is, the behavior determination mechanism unit 103 manages, as a behavior model that defines the behavior of the robot 1, a finite automaton in which the behaviors that the robot 1 can take are associated with states. It transitions the state in this finite automaton based on the state recognition information from the state recognition information processing unit 101, the values of the emotion model, instinct model, or growth model in the model storage unit 102, the passage of time, and the like, and determines the behavior corresponding to the state after the transition as the behavior to be taken next.
[0073]
Here, the behavior determination mechanism unit 103 transitions the state when it detects that a predetermined trigger has occurred. That is, the behavior determination mechanism unit 103 transitions the state when, for example, the time during which the behavior corresponding to the current state has been executed reaches a predetermined time, when specific state recognition information is received, or when the value of the emotion, instinct, or growth state indicated by the state information supplied from the model storage unit 102 falls to or below, or rises to or above, a predetermined threshold.
[0074]
As described above, the behavior determination mechanism unit 103 is based not only on the state recognition information from the state recognition information processing unit 101 but also on the emotion model, instinct model, growth model value, etc. in the model storage unit 102. Since the state in the behavior model is transitioned, even if the same state recognition information is input, the transition destination of the state differs depending on the value (state information) of the emotion model, instinct model, and growth model.
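The trigger-driven finite automaton described above can be sketched as a single transition function. The state names, trigger labels, timeout, and threshold below are hypothetical examples chosen for illustration; the specification does not enumerate them.

```python
from typing import Optional

JOY_THRESHOLD = 0.5  # hypothetical threshold on the "joyfulness" value
TIMEOUT_S = 10.0     # hypothetical maximum duration of one behavior


def next_state(state: str, recognition: Optional[str],
               elapsed_s: float, joy: float) -> str:
    """Return the state after applying whichever trigger currently holds."""
    if state == "idle":
        if recognition == "heard_walk":
            return "walking"
        if recognition == "head_stroked":
            # Same state recognition information, different destination
            # depending on the emotion model value (state information).
            return "dancing" if joy >= JOY_THRESHOLD else "idle"
        return state
    # For "walking" or "dancing": stop on command or when time runs out.
    if recognition == "heard_stop" or elapsed_s >= TIMEOUT_S:
        return "idle"
    return state
```

The second branch shows the point made in the text: the same input (“head_stroked”) leads to “dancing” when the joyfulness value is high and leaves the robot idle when it is low.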
[0075]
On the other hand, the behavior determination mechanism unit 103 determines the next behavior based on the state recognition information from the state recognition information processing unit 101, the state information from the model storage unit 102, the passage of time, and the like, and the content of the determined behavior may require voice recognition processing or image recognition processing. Examples are “conversing with the user”, in which the robot recognizes the voice uttered by the user and makes a corresponding utterance, and “waving a hand to the user”, in which the robot recognizes the user (the user's face image) and waves to the user. In that case, the behavior determination mechanism unit 103 sends to the distance control unit 110 a command to adjust the distance between the robot 1 and the user so that the voice recognition unit 101A and the image recognition unit 101B of the state recognition information processing unit 101 described above can perform recognition with high accuracy. Further, as described above, when the state recognition information processing unit 101 detects, for example, the user's face image determined from a skin color region of the image signal or the like, even as a recognition result of poor accuracy, and sends state recognition information indicating the detection to the behavior determination mechanism unit 103, the behavior determination mechanism unit 103 sends to the distance control unit 110 a command to adjust the distance between the robot 1 and the user so that the user's face image can be recognized more accurately.
[0076]
When a command indicating that the distance between the robot 1 and the user has been adjusted is supplied from the distance control unit 110 to the behavior determination mechanism unit 103, the behavior determination mechanism unit 103 acquires the state recognition information supplied from the state recognition information processing unit 101 at the same timing as that command as state recognition information with good recognition accuracy. The behavior determination mechanism unit 103 itself then carries out the previously determined behavior, such as “conversing with the user” or “waving a hand to the user” described above (that is, it sends the content of the behavior to the posture transition mechanism unit 104 as behavior command information).
[0077]
In addition, the behavior determination mechanism unit 103 generates behavior command information for causing the robot 1 to speak, as well as behavior command information for operating the head, limbs, and the like of the robot 1 as described above. The behavior command information for causing the robot 1 to speak is supplied to the voice synthesis unit 105, and it includes text or the like corresponding to the synthesized sound to be generated by the voice synthesis unit 105. When the voice synthesis unit 105 receives the behavior command information from the behavior determination mechanism unit 103, it generates a synthesized sound based on the text included in the behavior command information and supplies the synthesized sound to the speaker 72 for output. For example, when the behavior determination mechanism unit 103 receives, from the distance control unit 110 described later, a command to adjust the distance between the robot 1 and the user by means of an utterance of the robot 1, it sends behavior command information including text such as “Please move a little farther away” or “Please come a little closer” to the voice synthesis unit 105. In this case, the speaker 72 outputs a voice (an utterance by the robot 1) such as “Please move a little farther away” or “Please come a little closer”.
[0078]
The distance control unit 110 includes a distance estimation unit 111, a threshold setting unit 112, a distance determination unit 113, and a distance adjustment unit 114. As described above, the distance control unit 110 is supplied with a command to adjust the distance between the robot 1 and the user from the behavior determination mechanism unit 103. Further, the image signal S1A, the audio signal S1B, the pressure detection signal S1C, and the lag time S1D from the CCD cameras 81L and 81R, the microphone 82, the touch sensor 51, and the ultrasonic sensor 83 of the external sensor unit 71 (FIG. 5), that is, the external sensor signal S1, are also constantly supplied to the distance control unit 110.
[0079]
The distance estimation unit 111 estimates the distance between the robot 1 and the user on the basis of various signals from the external sensor unit 71, and sends the estimated distance between the robot 1 and the user to the distance determination unit 113 and the distance adjustment unit 114 as estimated distance information.
[0080]
That is, when the image signal S1A supplied from the CCD cameras 81L and 81R includes the face image of the user, the distance estimation unit 111 estimates the distance between the robot 1 and the user from the area of the face image in the image signal S1A (hereinafter referred to as the face image area). The human face image is detected, for example, by detecting a skin color region from the image signal S1A. Note that the estimation of the distance using the user's face image area can be performed with a single CCD camera, and may therefore be performed from the image signal of either one of the CCD cameras 81L and 81R.
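One way to turn a face image area into a distance is sketched below. It assumes a pinhole camera, under which the apparent area of a face falls off with the square of the distance, and it assumes a calibration pair (a reference area measured at a known reference distance). Both the model and the function name are assumptions made for illustration; the specification does not state the formula it uses.

```python
import math


def distance_from_face_area(face_area_px: float,
                            ref_area_px: float,
                            ref_distance_m: float) -> float:
    """Estimate distance from the face image area.

    Pinhole-camera assumption: area is proportional to 1 / distance**2,
    so distance = ref_distance * sqrt(ref_area / area).
    """
    if face_area_px <= 0 or ref_area_px <= 0:
        raise ValueError("areas must be positive")
    return ref_distance_m * math.sqrt(ref_area_px / face_area_px)
```

For example, if a face covers 10000 pixels at 1 m, a face covering 2500 pixels would be estimated to be 2 m away. Differences in face size between users are ignored in this sketch.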
[0081]
The distance estimation unit 111 estimates the distance between the robot 1 and the user by performing stereo processing using the CCD cameras 81L and 81R. A detailed description of the principle of stereo processing will be given later.
[0082]
Furthermore, the distance estimation unit 111 estimates the distance between the robot 1 and the user based on the lag time S1D supplied from the ultrasonic sensor 83.
[0083]
The distance estimation unit 111 also estimates the distance between the robot 1 and the user based on the sound supplied from the microphone 82. That is, when the audio signal S1B is supplied from the microphone 82 to the distance estimation unit 111, in other words, when the user utters a voice, the distance estimation unit 111 estimates the distance between the robot 1 and the user from the magnitude (level) of the input user's voice. The distance estimation unit 111 also estimates the direction of the user who uttered the voice (the sound source direction) based on the sound supplied from each of the microphones 82-1 to 82-N.
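A rough sketch of estimating distance from the voice level follows. It assumes free-field propagation, in which the sound pressure level drops by about 6 dB for each doubling of distance, and it assumes that the user speaks at a known loudness calibrated at a reference distance. These are strong simplifying assumptions introduced here, and the function name is hypothetical.

```python
def distance_from_level(level_db: float,
                        ref_level_db: float,
                        ref_distance_m: float) -> float:
    """Estimate distance from a measured voice level in dB.

    Inverse-distance law (free field): the level falls by 20*log10(d/d_ref),
    so d = d_ref * 10 ** ((ref_level - level) / 20).
    """
    if ref_distance_m <= 0:
        raise ValueError("reference distance must be positive")
    return ref_distance_m * 10 ** ((ref_level_db - level_db) / 20.0)
```

In a reverberant room or with speakers of varying loudness, an estimate of this kind would be coarse, which is one reason to combine it with the other estimates.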
[0084]
The distance estimation unit 111 estimates the distance between the robot 1 and the user from each of the signals supplied from the CCD cameras 81L and 81R, the microphone 82, the ultrasonic sensor 83, and the like as described above, and then comprehensively estimates the distance between the robot 1 and the user from those estimation results. The comprehensively estimated distance is output from the distance estimation unit 111 to the distance determination unit 113 and the distance adjustment unit 114 as estimated distance information.
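The specification does not say how the individual estimates are combined. Purely as an illustration, the sketch below takes the median of whichever estimates are currently available, which tolerates a single outlying sensor. The choice of the median and the function name are assumptions, not the method of this specification.

```python
from statistics import median
from typing import Iterable, Optional


def fuse_estimates(estimates_m: Iterable[Optional[float]]) -> Optional[float]:
    """Combine per-sensor distance estimates (None = sensor has no estimate)."""
    available = [d for d in estimates_m if d is not None]
    if not available:
        return None  # no sensor could estimate the distance
    return median(available)
```

For instance, the estimates could be passed in the order face image area, stereo processing, ultrasonic sensor, and voice level, with None for any source that produced no result in the current cycle.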
[0085]
The threshold setting unit 112 stores in advance a predetermined range R (for example, a range from a first position to a second position relative to the position of the robot 1) used as a threshold by the distance determination unit 113 described later, and a threshold D1 used by the distance adjustment unit 114. It supplies the predetermined range R and the threshold D1 to the distance determination unit 113 and the distance adjustment unit 114, respectively.
[0086]
Here, the predetermined range R is, for example, the range of distances between the robot 1 and the user that is appropriate for the robot 1 to recognize the face image of the user or the voice uttered by the user. It can be determined based on the performance of the CCD cameras 81L and 81R, the microphone 82, the voice recognition unit 101A, the image recognition unit 101B, and the like. The threshold D1 is used to determine whether the user is near to or far from the robot 1. Because the distance adjustment unit 114 described later adjusts the distance between the robot 1 and the user by comparing it with the threshold D1, setting the threshold D1 to a value within the predetermined range R, for example, causes the distance between the robot 1 and the user to be adjusted so that it falls within the predetermined range R. Therefore, the threshold D1 is set, for example, to a value that divides the predetermined range R into two.
[0087]
The distance determination unit 113 is supplied with a distance (estimated value) between the robot 1 and the user as estimated distance information from the distance estimation unit 111 and a predetermined range R from the threshold setting unit 112. Then, the distance determination unit 113 determines whether the distance between the robot 1 and the user is within a predetermined range R.
[0088]
When the distance determination unit 113 determines that the distance between the robot 1 and the user supplied from the distance estimation unit 111 is within the predetermined range R, it sends to the behavior determination mechanism unit 103 a command indicating that the distance between the robot 1 and the user is within the predetermined range R.
[0089]
On the other hand, when the distance determination unit 113 determines that the distance between the robot 1 and the user supplied from the distance estimation unit 111 is not within the predetermined range R, it sends to the distance adjustment unit 114 a command to adjust the distance between the robot 1 and the user.
[0090]
As described above, the distance adjustment unit 114 is supplied with the estimated distance information between the robot 1 and the user from the distance estimation unit 111. In addition, the threshold D1 for determining whether the distance between the robot 1 and the user is farther (larger) or closer (smaller) than a predetermined position is supplied from the threshold setting unit 112 to the distance adjustment unit 114. When the distance adjustment unit 114 receives a command to adjust the distance from the distance determination unit 113, it determines, based on the estimated distance information supplied from the distance estimation unit 111 and the threshold D1, whether the distance between the robot 1 and the user is larger or smaller than the threshold D1. The distance adjustment unit 114 then sends various command signals for adjusting the distance between the robot 1 and the user to the action determination mechanism unit 103 according to the determination result.
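The two-stage decision made by the distance determination unit 113 and the distance adjustment unit 114 can be sketched as one function. The return labels and the function name are hypothetical; the range R and the threshold D1 follow the description above, with D1 assumed to lie inside R.

```python
from typing import Tuple

IN_RANGE = "in_range"
SHORTEN = "shorten_distance"    # the user is too far from the robot
LENGTHEN = "lengthen_distance"  # the user is too close to the robot


def decide(distance_m: float,
           range_r: Tuple[float, float],
           threshold_d1: float) -> str:
    """Return what should happen for an estimated robot-user distance."""
    nearest, farthest = range_r
    # Stage 1 (distance determination unit 113): is the distance inside R?
    if nearest <= distance_m <= farthest:
        return IN_RANGE
    # Stage 2 (distance adjustment unit 114): compare with threshold D1.
    return SHORTEN if distance_m > threshold_d1 else LENGTHEN
```

With R = (0.5, 1.5) and D1 = 1.0, a user at 3.0 m yields a request to shorten the distance and a user at 0.2 m yields a request to lengthen it. How the request is carried out (an utterance, a gesture, or the robot moving) is a separate choice.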
[0091]
Here, one method of adjusting the distance between the robot 1 and the user is, for example, to make an utterance that prompts the user to approach or move away from the robot 1. In this case, for example, the distance adjustment unit 114 sends an action request command such as “make an utterance prompting the user to approach or move away” to the action determination mechanism unit 103. Upon receiving this command, the action determination mechanism unit 103 sends action command information including text such as “Please move a little farther away” or “Please come a little closer” to the voice synthesis unit 105, as described above. The voice synthesis unit 105 then generates a synthesized sound that prompts the user to move, such as “Please move a little farther away” or “Please come a little closer”, and outputs it from the speaker 72. The distance between the robot 1 and the user is adjusted when the user who has heard the voice moves away from or approaches the robot 1 accordingly.
[0092]
In addition, another method of adjusting the distance between the robot 1 and the user is, for example, to perform an operation (a gesture of the robot 1) that prompts the user to approach or move away from the robot 1. In this case, for example, the distance adjustment unit 114 sends to the action determination mechanism unit 103 an action request command such as “beckon to the user” when it wants the user to approach, or “shoo the user away” when it wants the user to move away. The behavior determination mechanism unit 103 that has received this command sends the content of the behavior to the posture transition mechanism unit 104 as behavior command information, as described above. The posture transition mechanism unit 104 generates posture transition information for transitioning the posture of the robot 1 from the current posture to the next posture based on the behavior command information supplied from the behavior determination mechanism unit 103, and sends it to the sub-control units 63A to 63D. As described above, the sub-control units 63A to 63D move the arm units 13A and 13B and the leg units 14A and 14B, which are the hands and feet of the robot 1. As a result, the robot 1 performs an operation such as “beckoning” or “shooing away” toward the user, and the distance between the robot 1 and the user is adjusted when the user who sees the operation moves away from or approaches the robot 1 accordingly.
[0093]
Furthermore, another method of adjusting the distance between the robot 1 and the user is, for example, for the robot 1 itself to act (move) so as to adjust its distance from the user. In that case, the distance adjustment unit 114 sends, for example, a behavior request command such as “walk (move) toward the user (forward)” or “walk (move) away from the user (backward)” to the behavior determination mechanism unit 103. The behavior determination mechanism unit 103 that has received this command sends the content of the behavior to the posture transition mechanism unit 104 as behavior command information. Then, as described above, the posture transition mechanism unit 104 generates posture transition information and sends it to the sub-control units 63A to 63D, so that the robot 1 moves forward or backward with respect to the user. As a result, the distance between the robot 1 and the user is adjusted.
[0094]
As described above, the posture transition mechanism unit 104 generates posture transition information for transitioning the posture of the robot 1 from the current posture to the next posture based on the behavior command information supplied from the behavior determination mechanism unit 103, and sends it to the sub-control units 63A to 63D.
[0095]
Next, the principle of estimating the distance from the output of the ultrasonic sensor 83 will be described with reference to FIG.
[0096]
The ultrasonic sensor 83 has a sound source and a microphone (neither shown), and emits ultrasonic pulses from the sound source as shown in FIG. Further, the ultrasonic sensor 83 receives, with the microphone, the reflected wave that is reflected by an obstacle and returns, and obtains the time from the emission of the ultrasonic pulse until the reception of the reflected wave (the lag time). The distance to the obstacle can be obtained from the lag time thus obtained.
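The conversion from lag time to distance can be sketched as follows. Because the pulse travels to the obstacle and back, the one-way distance is half the path covered during the lag time. The speed of sound of 343 m/s (air at about 20 degrees C) and the function name are assumptions made for this sketch.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def distance_from_lag_time(lag_time_s: float) -> float:
    """Return the one-way distance in meters for a round-trip lag time."""
    if lag_time_s < 0:
        raise ValueError("lag time must be non-negative")
    # The pulse covers the distance twice (out and back), hence the / 2.
    return SPEED_OF_SOUND_M_S * lag_time_s / 2.0
```

Under these assumptions, a lag time of 10 ms corresponds to an obstacle roughly 1.7 m away.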
[0097]
Next, the principle of estimating the distance between the robot 1 and the user by performing stereo processing (processing by the stereo matching method) using the image signals from the CCD cameras 81L and 81R will be described with reference to FIG. 8 to FIG. 12.
[0098]
Stereo processing associates pixels between multiple images obtained by capturing the same object with cameras from two or more directions (different line-of-sight directions), and thereby obtains the distance from the cameras to the object.
[0099]
That is, suppose that the CCD cameras 81L and 81R are now referred to as the reference camera 81L and the detection camera 81R, respectively, and that the images they output are referred to as the reference camera image and the detection camera image. Then, for example, as shown in FIG., when the reference camera 81L and the detection camera 81R capture the user as the object to be imaged, a reference camera image including the projected image of the user is obtained from the reference camera 81L, and a detection camera image including the projected image of the user is obtained from the detection camera 81R. If, for example, a certain point P on the user's mouth appears in both the reference camera image and the detection camera image, parallax information can be obtained from the position on the reference camera image where the point P appears and the position on the detection camera image where it appears, that is, from the corresponding points (corresponding pixels). The position of the point P in three-dimensional space (its three-dimensional position) can then be obtained using the principle of triangulation.
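For the simplest camera arrangement, triangulation from the parallax reduces to a single formula. The sketch below assumes two identical, parallel (rectified) cameras separated by a known baseline, with the focal length expressed in pixels; under that assumption the depth is Z = f × B / d, where d is the horizontal disparity. The function and parameter names are hypothetical, and a general camera arrangement would need a fuller triangulation.

```python
def depth_from_disparity(x_ref_px: float, x_det_px: float,
                         focal_length_px: float, baseline_m: float) -> float:
    """Return the depth of a point from its corresponding pixel columns.

    x_ref_px: column of the point in the reference (left) camera image.
    x_det_px: column of the same point in the detection (right) camera image.
    """
    disparity = x_ref_px - x_det_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front")
    # Similar triangles: Z / baseline = focal_length / disparity.
    return focal_length_px * baseline_m / disparity
```

For example, with a focal length of 800 pixels and a baseline of 0.1 m, a disparity of 40 pixels corresponds to a depth of about 2 m. Nearer points produce larger disparities.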
[0100]
Therefore, in stereo processing, it is first necessary to detect corresponding points. As a detection method, for example, there is an area-based matching method using an epipolar line.
[0101]
That is, as shown in FIG. 9, in the reference camera 81L, the point P on the user is projected onto the intersection n_a of the imaging surface S₁ of the reference camera 81L with the straight line L connecting the point P and the optical center (lens center) O₁ of the reference camera 81L.
[0102]
In the detection camera 81R, the point P on the user is projected onto the intersection n_b of the imaging surface S₂ of the detection camera 81R with the straight line connecting the point P and the optical center (lens center) O₂ of the detection camera 81R.
[0103]
In this case, the straight line L is projected onto the imaging surface S₂ as the line of intersection L₂ between the imaging surface S₂, on which the detection camera image is formed, and the plane passing through the three points consisting of the optical centers O₁ and O₂ and the point n_a (or the point P). Since the point P lies on the straight line L, the point n_b onto which the point P is projected on the imaging surface S₂ lies on the straight line L₂ onto which the straight line L is projected. This straight line L₂ is called an epipolar line. That is, the corresponding point n_b of the point n_a can exist only on the epipolar line L₂, and the search for the corresponding point n_b can therefore be confined to the epipolar line L₂.
[0104]
Here, an epipolar line exists, for example, for each pixel constituting the reference camera image formed on the imaging surface S₁. If the positional relationship between the reference camera 81L and the detection camera 81R is known, the epipolar line existing for each pixel can be obtained, for example, by calculation.
[0105]
The corresponding point n_b can be detected from among the points on the epipolar line L₂ by, for example, area-based matching as follows.
[0106]
That is, in area-based matching, as shown in FIG., a small rectangular block (hereinafter referred to as a reference block as appropriate) whose center (for example, the intersection of its diagonals) is the point n_a on the reference camera image is extracted from the reference camera image, and, as shown in FIG. 10B, a small block of the same size as the reference block (hereinafter referred to as a detection block as appropriate) centered on a certain point on the epipolar line L₂ projected onto the detection camera image is extracted from the detection camera image.
[0107]
Here, in the embodiment of FIG. 10B, six points n_b1 through n_b6 are provided on the epipolar line L₂ as centers of detection blocks. These six points n_b1 through n_b6 are the projections, onto the imaging surface S₂ of the detection camera 81R, of points that divide the straight line L in the three-dimensional space shown in FIG. 9 at a predetermined constant interval, that is, points whose distances from the reference camera 81L are, for example, 1 m, 2 m, 3 m, 4 m, 5 m, and 6 m, respectively. They therefore correspond to points at distances of 1 m, 2 m, 3 m, 4 m, 5 m, and 6 m from the reference camera 81L, respectively.
[0108]
In area-based matching, detection blocks centered on each of the points n_b1 through n_b6 provided on the epipolar line L₂ are extracted from the detection camera image, and the correlation between each detection block and the reference block is calculated using a predetermined evaluation function. The center point n_b of the detection block having the highest correlation with the reference block centered on the point n_a is then obtained as the corresponding point of the point n_a.
[0109]
That is, for example, when a function whose value becomes smaller as the correlation becomes higher is used as the evaluation function, assume that evaluation values (values of the evaluation function) such as those shown in FIG. 11 are obtained for the points n_b1 through n_b6 on the epipolar line L₂. In this case, the point n_b3, which has the smallest evaluation value (highest correlation), is detected as the corresponding point of the point n_a. Alternatively, in FIG. 11, interpolation may be performed using the evaluation values near the minimum among those obtained for the points n_b1 through n_b6 (indicated by black circles in FIG. 11) to obtain a point at which the evaluation value becomes still smaller (indicated by a cross in FIG. 11), and that point may be detected as the final corresponding point.
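The search over the set points and the optional interpolation around the minimum can be sketched as follows. This is only an illustrative sketch under stated assumptions: images are plain lists of rows of grayscale values, the evaluation function is the sum of absolute differences, and the interpolation is a parabola fitted through the minimum and its two neighbours (the specification does not say which interpolation is used). All function names are hypothetical.

```python
def extract_block(image, center, half):
    """Return the (2*half+1)-square block of `image` centred on (row, col).

    The caller must make sure the block lies entirely inside the image.
    """
    r, c = center
    return [row[c - half:c + half + 1] for row in image[r - half:r + half + 1]]


def sad(block_a, block_b):
    """Sum of absolute differences: a smaller value means a higher correlation."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def match_on_epipolar_line(ref_image, n_a, det_image, set_points, half=1):
    """Evaluate every candidate centre on the epipolar line.

    Returns (index of the best set point, list of evaluation values).
    """
    ref_block = extract_block(ref_image, n_a, half)
    costs = [sad(ref_block, extract_block(det_image, p, half)) for p in set_points]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs


def refine_minimum(costs, best):
    """Fit a parabola through the minimum and its neighbours.

    Returns a fractional index; falls back to `best` at the ends of the list
    or when the three values are equal.
    """
    if best == 0 or best == len(costs) - 1:
        return float(best)
    left, mid, right = costs[best - 1], costs[best], costs[best + 1]
    denom = left - 2 * mid + right
    if denom == 0:
        return float(best)
    return best + 0.5 * (left - right) / denom
```

The fractional index returned by `refine_minimum` would still have to be mapped back to a position on the epipolar line (or to a distance); how that is done is left open here.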
[0110]
In the embodiment of FIG. 10, as described above, the points obtained by projecting onto the imaging surface S₂ of the detection camera 81R the points that divide the straight line L in the three-dimensional space at predetermined equal intervals are set on the epipolar line; this setting can be performed, for example, at the time of calibration of the reference camera 81L and the detection camera 81R. If such a setting is performed for each epipolar line existing for each pixel constituting the imaging surface S₁ of the reference camera 81L, and a set point / distance table that associates the points set on the epipolar line (hereinafter referred to as set points as appropriate) with distances from the reference camera 81L is created in advance as shown in FIG. 12A, then the distance from the reference camera 81L (the distance to the user) can be obtained immediately by detecting the set point that is the corresponding point and referring to the set point / distance table. In other words, the distance can be obtained directly from the corresponding point.
[0111]
On the other hand, if the corresponding point n_b on the detection camera image is detected for the point n_a on the reference camera image, the parallax (parallax information) between the two points n_a and n_b can be obtained. Further, if the positional relationship between the reference camera 81L and the detection camera 81R is known, the distance to the user can be obtained from the parallax between the two points n_a and n_b by the principle of triangulation. The distance can be calculated from the parallax by performing a predetermined calculation. If, however, that calculation is performed in advance and a parallax / distance table associating the parallax ζ with the distance is created in advance as shown in FIG. 12B, the distance from the reference camera 81L can be obtained immediately by detecting the corresponding point, obtaining the parallax, and referring to the parallax / distance table.
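The specification does not spell out the "predetermined calculation". As a hedged illustration, for the common special case of two parallel, rectified pinhole cameras (an assumption, not something stated here), triangulation reduces to distance = focal length × baseline / parallax, and a parallax / distance table can be precomputed from it. The function names, the focal length, and the baseline are hypothetical.

```python
def distance_from_parallax(parallax_px, focal_length_px, baseline_m):
    """Triangulated distance (m) for parallel, rectified cameras (assumed geometry)."""
    if parallax_px <= 0:
        raise ValueError("parallax must be positive")
    return focal_length_px * baseline_m / parallax_px


def build_parallax_distance_table(max_parallax_px, focal_length_px, baseline_m):
    """Precompute the distance for every integer parallax from 1 to max_parallax_px."""
    return {
        d: distance_from_parallax(d, focal_length_px, baseline_m)
        for d in range(1, max_parallax_px + 1)
    }


# Hypothetical calibration values: 500-pixel focal length, 6 cm baseline.
PARALLAX_DISTANCE_TABLE = build_parallax_distance_table(64, 500.0, 0.06)
```

At run time, looking up `PARALLAX_DISTANCE_TABLE[parallax]` replaces the division, which is the point of preparing the table in advance.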
[0112]
Here, the parallax and the distance to the user have a one-to-one correspondence. Therefore, obtaining the parallax and obtaining the distance to the user are equivalent to each other.
[0113]
Note that blocks composed of a plurality of pixels, such as the reference block and the detection block, are used for detecting corresponding points in order to reduce the influence of noise and to detect corresponding points reliably by making explicit use of the correlation between the pattern features of the pixels surrounding the pixel (point) n_a on the reference camera image and those of the pixels surrounding the corresponding point (pixel) n_b on the detection camera image. In particular, for reference camera images and detection camera images with little variation, the larger the block size, the more reliable the detection of corresponding points becomes, owing to the correlation between the images.
[0114]
In area-based matching, as the evaluation function for evaluating the correlation between the reference block and the detection block, the sum of the absolute values of the differences between the pixel values of the pixels constituting the reference block and those of the corresponding pixels constituting the detection block, the sum of the squares of those differences, normalized cross-correlation, and the like can be used.
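The three evaluation functions named above can be sketched as follows, with each block flattened to a sequence of pixel values. The function names are hypothetical, and the zero-mean form of normalized cross-correlation used here is one common variant chosen as an assumption. Note that the first two functions become smaller as the correlation rises, whereas normalized cross-correlation becomes larger.

```python
import math


def sum_abs_diff(ref, det):
    """Sum of absolute differences (smaller = more similar)."""
    return sum(abs(r - d) for r, d in zip(ref, det))


def sum_sq_diff(ref, det):
    """Sum of squared differences (smaller = more similar)."""
    return sum((r - d) ** 2 for r, d in zip(ref, det))


def normalized_cross_correlation(ref, det):
    """Zero-mean normalized cross-correlation in [-1, 1] (larger = more similar)."""
    mean_r = sum(ref) / len(ref)
    mean_d = sum(det) / len(det)
    num = sum((r - mean_r) * (d - mean_d) for r, d in zip(ref, det))
    den = math.sqrt(sum((r - mean_r) ** 2 for r in ref) *
                    sum((d - mean_d) ** 2 for d in det))
    # A perfectly flat block has no pattern to correlate; report 0 in that case.
    return num / den if den else 0.0
```

Because normalized cross-correlation is insensitive to a uniform gain or offset between the two images, it can rate a brighter copy of a block as a perfect match while the difference-based functions do not.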
[0115]
The stereo processing has been briefly described above. Stereo processing (the stereo matching method) is also described in, for example, Yakuin and Nagao, “Introduction to Image Processing in C Language”, Shosodo, pp. 127, etc.
[0116]
Next, the operation process of the robot 1 performed by the main control unit 61 of FIG. 6 will be described with reference to the flowchart of FIG. This process is started at the same time when the robot 1 is powered on.
[0117]
First, in step S1, the action determination mechanism unit 103 determines the action of the robot 1 based on the state recognition information from the state recognition information processing unit 101, the state information from the model storage unit 102, the passage of time, and the like, and the process proceeds to step S2.
[0118]
In step S2, the action determination mechanism unit 103 determines whether or not the action determined in step S1 is an action that requires recognition processing. Here, as described above, actions that do not require recognition processing include, for example, “dancing”, and actions that require recognition processing include, for example, “conversation with the user”.
[0119]
If it is determined in step S2 that the action determined in step S1 is an action that requires recognition processing, the process proceeds to step S3, where the action determination mechanism unit 103 sends a command to adjust the distance to the distance estimation unit 111, and the process proceeds to step S4.
[0120]
In step S4, upon receiving the command to adjust the distance from the action determination mechanism unit 103, the distance estimation unit 111 performs a distance estimation process that estimates the distance between the robot 1 and the user who is the target of the recognition processing (recognition target). The details of the distance estimation process will be described later with reference to FIG. 14; by this process, the distance between the robot 1 and the user is estimated, and estimated distance information representing the estimated distance is supplied from the distance estimation unit 111 to the distance determination unit 113 and the distance adjustment unit 114.
[0121]
After the process of step S4, the process proceeds to step S5, where the distance determination unit 113 determines whether or not the distance between the robot 1 and the user, supplied from the distance estimation unit 111 by the process of step S4, is within the predetermined range R that has been supplied in advance from the threshold setting unit 112 and set in the distance determination unit 113.
[0122]
If it is determined in step S5 that the distance between the robot 1 and the user is not within the predetermined range R, the process proceeds to step S6, where the distance determination unit 113 outputs to the distance adjustment unit 114 a command to adjust the distance between the robot 1 and the user, and the process proceeds to step S7.
[0123]
In step S7, upon receiving from the distance determination unit 113 the command to adjust the distance between the robot 1 and the user, the distance adjustment unit 114 executes a process for adjusting the distance between the robot 1 and the user (distance adjustment process), which is described later. The process then returns from step S7 to step S4, and the processes of steps S4 to S7 are repeated thereafter. By repeating the processes of steps S4 to S7, the distance between the robot 1 and the user is brought within the predetermined range R, that is, to a distance suitable for the recognition processing.
[0124]
On the other hand, when it is determined in step S5 that the distance between the robot 1 and the user is within the predetermined range R, that is, when the distance between the robot 1 and the user is appropriate for the recognition processing, the process proceeds to step S8, where the distance determination unit 113 outputs to the action determination mechanism unit 103 a command indicating that the distance between the robot 1 and the user is within the predetermined range, and the process then proceeds to step S9.
[0125]
Also, when it is determined in step S2 described above that the action determined in step S1 is not an action that requires recognition processing, the process proceeds to step S9, where the action determination mechanism unit 103 sends action command information corresponding to the action determined in step S1 to the posture transition mechanism unit 104 or the voice synthesis unit 105.
[0126]
Here, actions that do not require recognition processing include, for example, “dancing”, and actions that do require recognition processing include, for example, “conversing with the user”. In addition, depending on the content of the action of the robot 1, for example “waving its hand” while uttering “Goodbye”, the action command information may be sent to both the posture transition mechanism unit 104 and the voice synthesis unit 105.
[0127]
After the process of step S9, the process proceeds to step S10, where the action determination mechanism unit 103 determines whether or not to end the robot operation process. If it is determined not to end the process, the process returns to step S1 and the subsequent processes are repeated.
[0128]
Further, when it is determined in step S10 that the robot operation process is to be ended, that is, for example, when the power of the robot 1 is turned off by the user, the robot operation process is ended.
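One pass through steps S1 to S9 can be sketched as below. This is a simplified illustration, not the implementation of the main control unit 61: the units are reduced to hypothetical callables passed in by the caller, and `max_adjustments` is an added safeguard (an assumption) so that the sketch cannot loop forever, whereas the flowchart as described repeats steps S4 to S7 until the distance is within range.

```python
def robot_operation_step(decide_action, needs_recognition, estimate_distance,
                         in_range, adjust_distance, execute, max_adjustments=10):
    """Run steps S1 to S9 once and return the action that was executed."""
    action = decide_action()                    # S1: decide the action
    if needs_recognition(action):               # S2 (S3: request distance adjustment)
        for _ in range(max_adjustments):
            distance = estimate_distance()      # S4: estimate the distance
            if in_range(distance):              # S5 (S8: report "within range R")
                break
            adjust_distance(distance)           # S6, S7: adjust the distance
    execute(action)                             # S9: issue the action command
    return action
```

Step S10 corresponds to the caller invoking this function repeatedly until, for example, the power is turned off.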
[0129]
Next, the distance estimation process of the distance estimation unit 111 in step S4 of FIG. 13 will be described with reference to the flowchart of FIG.
[0130]
In step S21, the distance estimation unit 111 determines whether or not the user's face image is detected from the image signals captured by the CCD cameras 81L and 81R (whether or not the face signal is included in the image signal).
[0131]
If it is determined in step S21 that a face image has been detected from the image signals captured by the CCD cameras 81L and 81R, the process proceeds to step S22, where the distance estimation unit 111 estimates the distance between the robot 1 and the user based on the size of the face image area in the image signal, and the process proceeds to step S23. Here, as a method of estimating the distance between the robot 1 and the user from the size of the face image area in the image signal, there is, for example, a method in which a table associating the size (number of pixels) of the face image area with the distance between the robot 1 and the user is prepared in advance, and the distance between the robot 1 and the user is estimated from the size of the face image area based on that table.
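A table lookup of this kind might look like the sketch below. The table entries are hypothetical calibration values invented for illustration, and choosing the entry with the nearest area is just one possible lookup policy; the specification does not say how sizes between table entries are handled.

```python
import bisect

# Hypothetical calibration: (face area in pixels, distance in metres),
# sorted by ascending area. A larger face area means a shorter distance.
FACE_AREA_TO_DISTANCE = [(400, 3.0), (900, 2.0), (3600, 1.0), (14400, 0.5)]


def distance_from_face_area(area_px, table=FACE_AREA_TO_DISTANCE):
    """Return the distance of the table entry whose face area is nearest to area_px."""
    areas = [area for area, _ in table]
    i = bisect.bisect_left(areas, area_px)
    if i == 0:
        return table[0][1]
    if i == len(table):
        return table[-1][1]
    lower, upper = table[i - 1], table[i]
    # Pick whichever neighbouring entry is closer in area.
    return lower[1] if area_px - lower[0] <= upper[0] - area_px else upper[1]
```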
[0132]
On the other hand, when it is determined in step S21 that no face image has been detected from the image signals captured by the CCD cameras 81L and 81R, that is, for example, when no face image area of sufficient size exists in the image signals output by the CCD cameras 81L and 81R, step S22 is skipped and the process proceeds to step S23. In step S23, the distance estimation unit 111 performs the above-described stereo processing using the image signals from the CCD cameras 81L and 81R to estimate the distance between the robot 1 and the user, and the process proceeds to step S24.
[0133]
In step S24, the distance estimation unit 111 estimates the distance between the robot 1 and the user from the lag time output from the ultrasonic sensor 83 to the distance control unit 110, and proceeds to step S25.
[0134]
In step S25, the distance estimation unit 111 determines whether or not there has been a voice input to the microphone 82, that is, whether or not the user has spoken to the robot 1. If it is determined that there has been a voice input to the microphone 82, the process proceeds to step S26, where the distance estimation unit 111 estimates the distance between the robot 1 and the user from the magnitude of the user's voice input to the microphone 82, using the audio signal supplied from the microphone 82 to the distance control unit 110. Further, the distance estimation unit 111 estimates the direction of the user who has spoken (sound source direction) from the sound signals of the microphones 82-1 to 82-N, and the process proceeds to step S27.
[0135]
If it is determined in step S25 that there is no voice input to the microphone 82, the process of step S26 is skipped and the process proceeds to step S27. In step S27, the distance estimation unit 111 estimates the total distance between the robot 1 and the user from the distances between the robot 1 and the user estimated in steps S22, S23, S24, and S26.
[0136]
Here, the estimation accuracy of the distances estimated from the size of the face image area, stereo processing, the lag time, and the input user's voice is compared. Estimating the distance from the user's input voice relies on a correlation between the voice volume (the level of the voice signal) and the distance, but the volume at which a person normally speaks varies from person to person. Since a user may be soft-spoken or loud, the degree of correlation between voice volume and distance may vary depending on the user.
[0137]
With the method of estimating the distance from the size of the face image area, when the face image area is extremely small relative to the effective pixel area of the CCD cameras 81L and 81R (the robot 1 and the user are too far apart), or when the face image area is too large to fit within it (the robot 1 and the user are too close), the distance between the robot 1 and the user estimated from such a face image area may contain a large error.
[0138]
The method of estimating the distance by stereo processing is also affected by the size of the image portion of the user in the image signal, as in the case of estimating the distance from the size of the face image area.
[0139]
On the other hand, the method of estimating the distance from the lag time output from the ultrasonic sensor 83 is considered to have the highest reliability because it is not affected by the user's image and sound unlike the other three methods described above.
[0140]
Therefore, as a method of estimating the total distance, for example, a method can be adopted in which weights corresponding to the respective reliabilities are assigned to the distance values obtained as the four estimation results, each estimation result is multiplied by its weight, and the resulting average (weighted average) is taken as the total distance.
[0141]
Further, for example, when the direction of the user on whom image recognition such as stereo processing or distance estimation using the face image area has been performed (the front direction of the head unit 12) differs from the direction of the user (sound source direction) estimated by the distance estimation unit 111 from the sound signals of the microphones 82-1 to 82-N, the sound collected by the microphones 82-1 to 82-N may be regarded as having been emitted by another user who is not the recognition target, and the total distance may be estimated with the distance estimation result from the input voice excluded. In addition, for example, the distance estimation result from the voice input, which is generally considered to have the lowest reliability, may be compared with the distance estimation results of the other three methods, and when it differs extremely from them, it may be discarded and the total distance between the robot 1 and the user estimated using only the results of the other three methods. As yet another method of estimating the total distance, the median of the distance estimation results obtained by the four methods, or by a plurality of the four methods, may be adopted.
[0142]
Here, the distance estimated from the output of the ultrasonic sensor 83 is considered to have the highest reliability when there is no obstacle between the robot 1 and the user; if there is an obstacle between the two, however, the obstacle may be detected instead. Therefore, for example, when the distance estimated from the output of the ultrasonic sensor 83 is extremely small (short) compared with the distances estimated by stereo processing and from the size of the face image area, the distance estimation unit 111 may regard an obstacle as being present very close to the robot 1, between the robot 1 and the user, and as having been detected by the ultrasonic sensor 83, and may estimate the total distance with the output of the ultrasonic sensor 83 excluded.
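The combination rules described above (weighted average, excluding the voice-based estimate when the sound comes from another direction, excluding the ultrasonic estimate when it looks like an obstacle, and the median alternative) can be sketched as follows. The weights and the `outlier_ratio` threshold are hypothetical values chosen only for illustration; the specification states only that the ultrasonic method is considered the most reliable and the voice method the least.

```python
from statistics import median

# Hypothetical reliability weights (ultrasonic highest, voice lowest).
DEFAULT_WEIGHTS = {"face": 2.0, "stereo": 2.0, "ultrasonic": 4.0, "voice": 1.0}


def total_distance(estimates, weights=DEFAULT_WEIGHTS,
                   voice_direction_matches=True, outlier_ratio=0.5):
    """Combine per-method distance estimates (m) into one weighted average.

    `estimates` maps a method name to its estimate; a method that produced
    no estimate (no face detected, no voice input) is simply absent.
    """
    usable = dict(estimates)
    # A voice from a different direction is treated as another speaker's.
    if not voice_direction_matches:
        usable.pop("voice", None)
    # An ultrasonic reading far shorter than every image-based estimate is
    # treated as an obstacle in front of the user and excluded.
    image_based = [usable[m] for m in ("face", "stereo") if m in usable]
    if "ultrasonic" in usable and image_based:
        if usable["ultrasonic"] < outlier_ratio * min(image_based):
            usable.pop("ultrasonic")
    if not usable:
        raise ValueError("no usable distance estimate")
    weight_sum = sum(weights[m] for m in usable)
    return sum(weights[m] * d for m, d in usable.items()) / weight_sum


def median_distance(estimates):
    """Alternative combination: the median of the available estimates."""
    return median(estimates.values())
```

The median variant needs no weights and is naturally robust to a single wildly wrong estimate, at the cost of ignoring the differences in reliability between methods.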
[0143]
As described above, the distance estimation unit 111 estimates the distance between the robot 1 and the user from the output signals of the various sensors of the external sensor unit 71. In estimating the distance between the robot 1 and the user, it is not always necessary to adopt all four of the above-described distance estimation methods; only one of them may be adopted, or another distance estimation method may be adopted. As another distance estimation method, for example, there is a distance estimation method using a PSD (Position Sensitive Detector). Distance estimation by a PSD works as follows. That is, an LED (Light Emitting Diode) provided together with the PSD emits light, and the PSD receives the light reflected back by an object. The PSD then estimates the distance between the object and the PSD by the principle of triangulation, based on the position information of the reflected light.
[0144]
Next, a first embodiment (first distance adjustment process) of the distance adjustment process of the distance adjustment unit 114 in step S7 of FIG. 13 will be described with reference to the flowchart of FIG.
[0145]
First, in step S41, the distance adjustment unit 114 determines whether or not the distance between the robot 1 and the user estimated by the distance estimation unit 111 in step S4 of FIG. 13 is farther (larger) than the threshold value D1 supplied from the threshold setting unit 112 to the distance adjustment unit 114.
[0146]
If it is determined in step S41 that the distance between the robot 1 and the user is greater than the threshold value D1, the process proceeds to step S42, where the distance adjustment unit 114 sends to the action determination mechanism unit 103 an action request command to the effect of “speak so as to have the user come closer”, and the distance adjustment process ends.
[0147]
On the other hand, if it is determined in step S41 that the distance between the robot 1 and the user is not greater than the threshold value D1, the process proceeds to step S43, where the distance adjustment unit 114 determines whether or not the distance between the robot 1 and the user is closer (smaller) than the threshold value D1.
[0148]
If it is determined in step S43 that the distance between the robot 1 and the user is closer than the threshold value D1, the process proceeds to step S44, where the distance adjustment unit 114 sends to the action determination mechanism unit 103 an action request command to the effect of “speak so as to have the user move away”, and the distance adjustment process ends.
[0149]
On the other hand, when it is determined in step S43 that the distance between the robot 1 and the user is not closer than the threshold value D1, the distance adjustment unit 114 ends the distance adjustment process.
[0150]
Here, when the recognition processing required by the action determined in step S1 of FIG. 13 is voice recognition processing, then in steps S42 and S44, instead of the action request commands to the effect of “speak so as to have the user come closer” and “speak so as to have the user move away”, action request commands to the effect of “speak so as to ask the user to speak louder” and “speak so as to ask the user to speak more quietly” may be sent from the distance adjustment unit 114 to the action determination mechanism unit 103.
[0151]
Next, a second embodiment (second distance adjustment process) of the distance adjustment process of the distance adjustment unit 114 in step S7 of FIG. 13 will be described with reference to the flowchart of FIG. In FIG. 16, the description of the same parts as those in the flowchart of FIG. 15 is omitted as appropriate. That is, FIG. 16 differs from FIG. 15 in the process performed by the distance adjustment unit 114 when the distance between the robot 1 and the user is determined to be greater than the threshold value D1 (the process of step S52) and in the process performed by the distance adjustment unit 114 when the distance between the robot 1 and the user is determined to be closer than the threshold value D1 (the process of step S54).
[0152]
In the second distance adjustment process of FIG. 16, the same processes as in steps S41 and S43 of FIG. 15 are performed in steps S51 and S53, respectively. When it is determined in step S51 that the distance between the robot 1 and the user is greater than the threshold value D1, the process proceeds to step S52, where the distance adjustment unit 114 sends to the action determination mechanism unit 103 an action request command to the effect of “perform a motion prompting the user to approach, for example, beckoning”, and the distance adjustment process ends.
[0153]
When it is determined in step S53 that the distance between the robot 1 and the user is closer than the threshold value D1, the process proceeds to step S54, where the distance adjustment unit 114 sends to the action determination mechanism unit 103 an action request command to the effect of “perform a motion prompting the user to move away, for example, a shooing motion with the hand”, and the distance adjustment process ends.
[0154]
Here, in step S52 or S54, when the recognition processing required by the action determined in step S1 of FIG. 13 is voice recognition processing, and the distance estimation unit 111 estimates the distance between the robot 1 and the user based on the volume of the user's voice input from the microphone 82 to the distance estimation unit 111, an action request command for a gesture prompting the user to speak in a louder or quieter voice, for example, holding the hand of the robot 1 to its ear, may be sent from the distance adjustment unit 114 to the action determination mechanism unit 103.
[0155]
Next, a third embodiment (third distance adjustment process) of the distance adjustment process of the distance adjustment unit 114 in step S7 of FIG. 13 will be described with reference to the flowchart of FIG. In FIG. 17, the description of the same parts as those in the flowchart of FIG. 15 is omitted as appropriate. That is, FIG. 17 differs from FIG. 15 in the process performed by the distance adjustment unit 114 when the distance between the robot 1 and the user is determined to be greater than the threshold value D1 (the process of step S62) and in the process performed by the distance adjustment unit 114 when the distance between the robot 1 and the user is determined to be closer than the threshold value D1 (the process of step S64).
[0156]
In the third distance adjustment process of FIG. 17, the same processes as in steps S41 and S43 of FIG. 15 are performed in steps S61 and S63, respectively. If it is determined in step S61 that the distance between the robot 1 and the user is greater than the threshold value D1, the process proceeds to step S62, where the distance adjustment unit 114 sends to the action determination mechanism unit 103 an action request command to the effect that the robot 1 itself “moves in the user direction”, and the distance adjustment process ends.
[0157]
If it is determined in step S63 that the distance between the robot 1 and the user is closer than the threshold value D1, the process proceeds to step S64, where the distance adjustment unit 114 sends to the action determination mechanism unit 103 an action request command to the effect that the robot 1 itself “moves away from the user”, and the distance adjustment process ends.
[0158]
As described above, the distance adjustment methods of the distance adjustment unit 114 include the method in which the robot 1 speaks so that the user approaches or moves away (first distance adjustment process shown in FIG. 15), the method in which the robot 1 performs a motion prompting the user to move away or approach (second distance adjustment process shown in FIG. 16), and the method in which the robot 1 itself moves toward the user or away from the user (third distance adjustment process shown in FIG. 17). However, it is not necessary to adopt only one of these three distance adjustment methods in a fixed manner. For example, the distance adjustment unit 114 may randomly select any of the above three methods. Further, for example, the distance adjustment unit 114 may have the robot 1 both speak so that the user approaches or moves away and perform a motion prompting the user to move away or approach.
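The three methods and the random selection among them can be sketched as follows. The request strings paraphrase the text, and all names are hypothetical. One assumption deserves emphasis: the text names the threshold D1 in both comparisons, whereas this sketch takes the far and near bounds as two separate parameters so that a non-empty acceptable range exists between them.

```python
import random

# (request when too far, request when too near) for each method.
REQUESTS = {
    "speech": ("speak so that the user comes closer",
               "speak so that the user moves away"),
    "gesture": ("beckon the user", "wave the user away"),
    "move": ("move toward the user", "move away from the user"),
}


def distance_adjustment_request(distance, far_threshold, near_threshold,
                                method=None, rng=random):
    """Return the action request for the given distance, or None if none is needed.

    If `method` is None, one of the three methods is selected at random.
    """
    if method is None:
        method = rng.choice(sorted(REQUESTS))
    too_far_request, too_near_request = REQUESTS[method]
    if distance > far_threshold:
        return too_far_request
    if distance < near_threshold:
        return too_near_request
    return None
```

Passing a seeded `random.Random` instance as `rng` makes the random selection reproducible, which is convenient when testing.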
[0159]
Next, a fourth embodiment (fourth distance adjustment process) of the distance adjustment process of the distance adjustment unit 114 in step S7 of FIG. 13 will be described with reference to the flowchart of FIG. In FIG. 18, the description of the same parts as those in the flowchart of FIG. 15 is omitted as appropriate. That is, FIG. 18 differs from FIG. 15 in the process performed by the distance adjustment unit 114 when the distance between the robot 1 and the user is determined to be greater than the threshold value D1 (the process of steps S72 to S74) and in the process performed by the distance adjustment unit 114 when the distance between the robot 1 and the user is determined to be closer than the threshold value D1 (the process of steps S76 to S78).
[0160]
In the distance adjustment process shown in the flowchart of FIG. 18, either the method in which the robot 1 speaks so that the user approaches or moves away (first distance adjustment process shown in FIG. 15) or the method in which the robot 1 moves toward the user or away from the user (third distance adjustment process shown in FIG. 17) is selected, depending on whether there is an obstacle in the direction in which the robot 1 would move.
[0161]
In the fourth distance adjustment process of FIG. 18, the same processes as in steps S41 and S43 of FIG. 15 are performed in steps S71 and S75, respectively. If it is determined in step S71 that the distance between the robot 1 and the user is greater than the threshold value D1, the process proceeds to step S72, where the distance adjustment unit 114 determines whether or not there is an obstacle on the plane in the direction from the robot 1 toward the user.
[0162]
When it is determined in step S72 that there is an obstacle on the plane in the direction from the robot 1 toward the user, the process proceeds to step S73, where the distance adjustment unit 114, as in step S42 of the first distance adjustment process, sends to the action determination mechanism unit 103 an action request command to the effect of “speak so as to have the user come closer”, and the distance adjustment process ends.
[0163]
On the other hand, if it is determined in step S72 that there is no obstacle on the plane in the direction from the robot 1 toward the user, the process proceeds to step S74, where the distance adjustment unit 114, as in step S62 of FIG. 17, sends to the action determination mechanism unit 103 an action request command to the effect that the robot 1 itself “moves in the user direction”, and the distance adjustment process ends.
[0164]
If it is determined in step S75 that the distance between the robot 1 and the user is closer than the threshold value D1, the process proceeds to step S76, where the distance adjustment unit 114 determines whether or not there is an obstacle on the plane in the back direction of the robot 1.
[0165]
When it is determined in step S76 that there is an obstacle on the plane in the back direction of the robot 1, the process proceeds to step S77, where the distance adjustment unit 114, as in step S44 of the first distance adjustment process, sends to the action determination mechanism unit 103 an action request command to the effect of “speak so as to have the user move away”, and the distance adjustment process ends.
[0166]
On the other hand, if it is determined in step S76 that there is no obstacle on the plane behind the robot 1 as seen from the user, the process proceeds to step S78, and the distance adjustment unit 114, as in step S64 of FIG. 17, sends to the action determination mechanism unit 103 an action request command to the effect that "the robot 1 itself moves away from the user", and the distance adjustment process ends.
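The branching of steps S71 to S78 can be illustrated with a minimal Python sketch. The function name, the enumeration, and the boolean obstacle flags are hypothetical names introduced here for illustration only; the specification describes the decision, not an implementation.

```python
from enum import Enum


class Action(Enum):
    """Action request commands (names are illustrative assumptions)."""
    ASK_USER_TO_APPROACH = "speak so that the user approaches"    # step S73
    MOVE_TOWARD_USER = "robot moves toward the user"              # step S74
    ASK_USER_TO_MOVE_AWAY = "speak so that the user moves away"   # step S77
    MOVE_AWAY_FROM_USER = "robot moves away from the user"        # step S78
    NONE = "no adjustment"


def fourth_distance_adjustment(distance, d1, obstacle_ahead, obstacle_behind):
    """Choose between speaking and moving, depending on obstacles."""
    if distance > d1:  # step S71: user is farther than threshold D1
        # step S72: is the path toward the user blocked?
        if obstacle_ahead:
            return Action.ASK_USER_TO_APPROACH
        return Action.MOVE_TOWARD_USER
    if distance < d1:  # step S75: user is closer than threshold D1
        # step S76: is the path behind the robot blocked?
        if obstacle_behind:
            return Action.ASK_USER_TO_MOVE_AWAY
        return Action.MOVE_AWAY_FROM_USER
    return Action.NONE
```

For example, `fourth_distance_adjustment(2.0, 1.0, True, False)` selects the speech action of step S73, because the user is too far away and the path toward the user is blocked.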
[0167]
Here, the determination of whether there is an obstacle in steps S72 and S76 can be performed, for example, as follows. That is, as described above, the distance between the robot 1 and the user estimated by the distance estimation unit 111 from the output of the ultrasonic sensor 83 is compared with the distance estimated from the outputs of other sensors, for example the image signals from the CCD cameras 81L and 81R. If the two differ greatly and the distance estimated from the output of the ultrasonic sensor 83 is very close to the robot 1, the object detected by the ultrasonic sensor 83 can be regarded as an obstacle located between the robot 1 and the user rather than as the user.
[0168]
Accordingly, when the distance estimation unit 111, in estimating the distance between the robot 1 and the user, determines that the object detected by the ultrasonic sensor 83 is an obstacle, it sends that information to the distance adjustment unit 114 together with the estimated distance information. In this way, the ultrasonic sensor 83 can also serve as a sensor for detecting obstacles in the direction from the robot 1 toward the user.
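A minimal sketch of this check follows. The function name, the two numeric thresholds, and the simple averaging used when no obstacle is found are all assumptions made for illustration; the specification only states that a large disagreement combined with a very short ultrasonic distance indicates an obstacle, whose reading is then excluded.

```python
def classify_ultrasonic_echo(ultrasonic_m, other_m, max_gap_m=0.5, near_m=0.3):
    """Return (is_obstacle, distance_to_user_m).

    ultrasonic_m: distance estimated from the ultrasonic sensor 83.
    other_m:      distance estimated from other sensors (e.g. stereo cameras).
    max_gap_m and near_m are hypothetical thresholds, not values from the text.
    """
    gap = abs(ultrasonic_m - other_m)
    is_obstacle = gap > max_gap_m and ultrasonic_m < near_m
    if is_obstacle:
        # Exclude the ultrasonic reading and rely on the other sensors.
        return True, other_m
    # Assumed fusion rule for the no-obstacle case: plain average.
    return False, (ultrasonic_m + other_m) / 2.0
```

With the default thresholds, an ultrasonic reading of 0.2 m against a camera estimate of 1.5 m is treated as an obstacle, and the distance to the user is taken to be 1.5 m.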
[0169]
On the other hand, an obstacle behind the robot 1 can be detected, for example, by having the robot 1 turn around (turn its face to the rear) and use the ultrasonic sensor 83 arranged on its front. However, actions for which the robot 1 requires recognition processing are often interactions with the user; turning its back (or its face) away from the user and then turning around again to perform recognition is unnatural as behavior and takes a long processing time. Another possible method is to attach a sensor similar to the ultrasonic sensor 83 to the back of the robot 1 to detect obstacles, but attaching many sensors to the robot 1 raises a cost problem. Therefore, when the distance between the robot 1 and the user is closer than the threshold value D1, the user can be asked to move away from the robot 1, so that the robot 1 never moves in the direction away from the user (backward).
[0170]
FIG. 19 shows a flowchart of the fifth embodiment (fifth distance adjustment process) of the distance adjustment process of the distance adjustment unit 114 in such a case. The distance adjustment process of the distance adjustment unit 114 in FIG. 19 will be described below, but the description of the same parts as in FIG. 18 will be omitted as appropriate.
[0171]
That is, in steps S91 to S95, the same processing as in steps S71 to S75 of FIG. 18 is performed. If it is determined in step S95 that the distance between the robot 1 and the user is closer than the threshold value D1, the process proceeds to step S96, and the distance adjustment unit 114, as in step S77 of FIG. 18, sends to the action determination mechanism unit 103 an action request command to the effect of "speak so that the user moves away", and the distance adjustment process ends.
[0172]
In the distance adjustment processes of the distance adjustment unit 114 shown in FIGS. 18 and 19, whether there is an obstacle on the plane in the direction of the user from the robot 1, or in the direction opposite to the user (behind the robot 1), is determined in advance. Alternatively, the robot 1 may first be made to move, and when an obstacle is detected during the movement, an action (speech) that prompts the user to move may be taken. In this case, one method of detecting an obstacle while the robot 1 is moving is to detect the change in torque of the actuators A9 to A14, because the motor torque increases when the leg unit 14A or 14B hits the obstacle.
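The torque-based variant can be sketched as follows. The function names, the baseline torque, and the rise ratio are hypothetical; the specification only says that a rise in actuator torque indicates that a leg has hit an obstacle.

```python
def hit_obstacle(torque_samples, baseline_torque, rise_ratio=1.5):
    """Report a collision when any leg-actuator torque rises well above baseline.

    rise_ratio is an assumed tuning parameter, not a value from the text.
    """
    limit = baseline_torque * rise_ratio
    return any(t > limit for t in torque_samples)


def move_or_ask(torque_samples, baseline_torque):
    """Keep moving unless a collision is detected, then fall back to speech."""
    if hit_obstacle(torque_samples, baseline_torque):
        return "speak to prompt the user to move"
    return "keep moving"
```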
[0173]
As described above, the distance between the robot 1 and the user is estimated based on signals output from the CCD cameras 81L and 81R, the microphone 82, the ultrasonic sensor 83, and the like, and when the estimated distance is outside the predetermined range appropriate for recognition processing, the distance between the robot 1 and the user is adjusted, so that the recognition accuracy of the recognition function of the robot 1 can be improved.
[0174]
In the above-described embodiment, each of the predetermined range R and the threshold value D1 is set in the threshold setting unit 112 as a predetermined value. However, each of the predetermined range R and the threshold value D1 can be adaptively changed depending on, for example, the surrounding situation and the user's state.
[0175]
FIG. 20 shows a functional configuration example of the main control unit 61, corresponding to FIG. 6, in which the predetermined range R and the threshold value D1 that the threshold setting unit 112 supplies to the distance determination unit 113 and the distance adjustment unit 114 are changed dynamically. In the figure, portions corresponding to those in FIG. 6 are denoted by the same reference numerals, and their description is omitted below as appropriate. That is, the main control unit 61 in FIG. 20 is basically configured in the same manner as in FIG. 6.
[0176]
The action determination mechanism unit 103 determines the next action based on the state recognition information from the state recognition information processing unit 101, the state information from the model storage unit 102, the passage of time, and the like. When the determined action requires voice recognition processing or image recognition processing, for example "talk with the user" or "wave a hand to the user", the action determination mechanism unit 103 sends the distance control unit 110 a command to adjust the distance between the robot 1 and the user so that the voice recognition unit 101A and the image recognition unit 101B of the state recognition information processing unit 101 can recognize with high accuracy. Together with this command, the action determination mechanism unit 103 also supplies the distance control unit 110 with the current operation state of the robot 1, such as "stepping in place".
[0177]
The distance estimation unit 111 estimates the distance between the robot 1 and the user based on various signals from the CCD cameras 81L and 81R, the microphone 82, the ultrasonic sensor 83, and the like, as in the case of FIG. 6. The distance estimation unit 111 then sends the estimated distance information to the threshold setting unit 112 in addition to the distance determination unit 113 and the distance adjustment unit 114.
[0178]
In speech recognition, the magnitude of ambient noise may affect the recognition process. For example, when the robot 1 and the user are too far apart and the user's voice is difficult to distinguish from the surrounding noise, the user's voice can be made relatively louder by reducing the distance between the robot 1 and the user. The appropriate distance between the robot 1 and the user therefore varies with the magnitude of ambient noise. Accordingly, the threshold setting unit 112 obtains the ambient noise and sets a range of the distance between the robot 1 and the user appropriate for the recognition process (predetermined range R₁).
[0179]
To this end, the threshold setting unit 112 measures the background noise around the robot 1 based on the audio signal from the microphone 82. The background noise can be calculated, for example, by collecting, from the audio signals picked up by the microphone 82, those in a predetermined period that does not contain the user's speech (a period outside the speech period) and averaging their power values.
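The averaging described here can be sketched as follows. The frame-based layout, the per-frame speech flags, and the function names are assumptions introduced for illustration; how speech periods are detected is outside the scope of this sketch.

```python
def mean_power(samples):
    """Mean squared amplitude of one frame of audio samples."""
    return sum(s * s for s in samples) / len(samples)


def background_noise_power(frames, is_speech):
    """Average the power of the frames that contain no user speech.

    frames:    list of sample lists picked up by the microphone.
    is_speech: list of booleans, True where a frame lies in a speech period
               (assumed to be supplied by a separate speech detector).
    Returns None when every frame is speech, since no noise estimate exists.
    """
    noise = [mean_power(f) for f, speech in zip(frames, is_speech) if not speech]
    if not noise:
        return None
    return sum(noise) / len(noise)
```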
[0180]
Further, the threshold setting unit 112 uses the current operation state of the robot 1, supplied from the action determination mechanism unit 103 to the distance control unit 110, to estimate the influence (noise component) of noise generated by the robot 1 itself on the sound input to the microphone 82. The operation state supplied from the action determination mechanism unit 103 to the distance control unit 110 ranges, for example, from the operation mode of the robot 1, such as walking, standing still, or sitting, to the operation states of the actuators A1 to A14 corresponding to the joints of the robot 1. For example, when the robot 1 is walking or stepping in place, the threshold setting unit 112 estimates the level of the impact sound between the foot 45 of the robot 1 and the floor, the motor sound of the actuators A1 to A14, and the like as the influence of noise generated by the robot 1 itself.
[0181]
Then, the threshold setting unit 112 sets the range of the distance between the robot 1 and the user suitable for recognition (predetermined range R₁) from the background noise around the robot 1 and the current operation state of the robot 1 as described above. For example, when the ambient noise input from the microphone 82 is too loud for the user's voice to be distinguished, setting the predetermined range R₁ to closer values brings the user nearer to the robot 1 and raises the S/N of the user's voice signal.
[0182]
In addition, when the threshold setting unit 112 detects a face image in the image signals from the CCD cameras 81L and 81R, it sets a range of the distance between the robot 1 and the user suitable for recognition (predetermined range R₂) based on the size of the detected face image area. This is because, if the detected face image area is too large or too small relative to the image frame output by the CCD cameras 81L and 81R, the error in the distance between the robot 1 and the user estimated from the size of the face image area increases. Therefore, the distance between the robot 1 and the user at which the numbers of vertical and horizontal pixels of the rectangle containing the face image area are, for example, about half of the effective pixel counts of the CCD cameras 81L and 81R is taken as the optimum distance for recognition (the size of the face image area at this time is hereinafter referred to as the appropriate value where appropriate), and a range with a predetermined margin around the optimum distance is set as the range suitable for recognition obtained by detecting the size of the face image area (predetermined range R₂).
[0183]
Further, when the threshold setting unit 112 detects an audio signal from the microphone 82, it sets a range of the distance between the robot 1 and the user suitable for recognition (predetermined range R₂) from the magnitude of the audio signal. For example, when the user speaks too close to the robot 1 and the voice is too loud, the waveform picked up by the microphone 82 exceeds the dynamic range of the microphone 82 and is distorted relative to the original voice waveform. Conversely, when the user speaks too far from the robot 1, it becomes difficult to distinguish the user's voice to be recognized from ambient noise. Therefore, the distance between the robot 1 and the user at which the average level of the user's voice input to the microphone 82 is about the center (the midpoint of the dynamic range) of the range of voice levels that the microphone 82 can measure is taken as the optimum distance for recognition (the loudness of the user's voice at this time is hereinafter also referred to as the appropriate value where appropriate), and a range with a predetermined margin around the optimum distance is set as the range of the distance between the robot 1 and the user suitable for recognition obtained by detecting the loudness of the voice (predetermined range R₂).
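As a rough illustration of the "appropriate value" test for voice level, the sketch below checks whether the average level lies near the midpoint of the measurable range. The function name and the tolerance fraction are assumptions; the specification says only that the level should be about the center of the dynamic range.

```python
def level_is_appropriate(avg_level, min_level, max_level, tolerance=0.1):
    """True when avg_level is near the midpoint of the measurable range.

    tolerance is a hypothetical fraction of the full range that counts as
    'about the center'; the text does not specify a value.
    """
    midpoint = (min_level + max_level) / 2.0
    allowed = tolerance * (max_level - min_level)
    return abs(avg_level - midpoint) <= allowed
```

A level near the top of the range suggests the user is too close (risk of clipping), and a level near the bottom suggests the user is too far (voice buried in noise); in both cases the function returns False.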
[0184]
Further, the threshold setting unit 112 dynamically sets the predetermined range R used as the threshold in the distance determination unit 113 from two ranges: the range of the distance between the robot 1 and the user suitable for recognition calculated from the background noise around the robot 1 and the current operation state of the robot 1 described above (predetermined range R₁), and the range of the distance between the robot 1 and the user suitable for recognition calculated from the size of the face image area detected in the image signals output from the CCD cameras 81L and 81R and from the magnitude of the audio signal output from the microphone 82 (predetermined range R₂).
[0185]
Furthermore, the threshold value setting unit 112 calculates, for example, the center value from the predetermined range R that is dynamically set, and sets it as the threshold value D1. Then, the threshold setting unit 112 sends the value of the predetermined range R to the distance determination unit 113 and the threshold D1 to the distance adjustment unit 114.
[0186]
Next, the dynamic determination process of the predetermined range R by the threshold setting unit 112 will be described with reference to the flowchart of FIG. 21. This process is executed at a constant cycle, for example, while the robot 1 is powered on.
[0187]
First, in step S111, the threshold setting unit 112 measures the background noise around the robot 1 based on the audio signal from the microphone 82, and the process proceeds to step S112.
[0188]
In step S112, the threshold setting unit 112 acquires the current operation state of the robot 1 supplied from the behavior determination mechanism unit 103 to the distance control unit 110, and proceeds to step S113.
[0189]
In step S113, the threshold setting unit 112 calculates the range of the distance between the robot 1 and the user suitable for recognition from the background noise around the robot 1 measured in step S111, the current operation state of the robot 1 acquired in step S112, and the like, sets it as the predetermined range R₁, and proceeds to step S114.
[0190]
In step S114, the threshold setting unit 112 determines whether there has been a voice input from the user from the microphone 82 to the distance control unit 110, or whether a face image has been detected in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110.
[0191]
In step S114, it is determined that there is no voice input from the microphone 82 to the distance control unit 110 and that no face image is detected in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110. In this case, the processing from step S115 to step S118 described later is skipped, and the process proceeds to step S119.
[0192]
On the other hand, if it is determined in step S114 that there has been a voice input from the user from the microphone 82 to the distance control unit 110, or that a face image has been detected in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110, the process proceeds to step S115, and the threshold setting unit 112 determines whether the loudness of the user's voice input from the microphone 82 to the distance control unit 110, or the size of the face image area in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110, is the appropriate value. If only one of the user's voice input and the face image has been detected, only the detected signal is processed in step S115.
[0193]
If it is determined in step S115 that the loudness of the sound input from the microphone 82 to the distance control unit 110 and the size of the face image area in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110 are the appropriate values, the processes in steps S116 to S118 described later are skipped, and the process proceeds to step S119.
[0194]
On the other hand, if it is determined in step S115 that the loudness of the sound input from the microphone 82 to the distance control unit 110, or the size of the face image area in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110, is not the appropriate value, the process proceeds to step S116, where the threshold setting unit 112 acquires the estimated distance information supplied from the distance estimation unit 111, that is, the distance between the robot 1 and the user, and the process proceeds to step S117.
[0195]
In step S117, the threshold setting unit 112 determines whether the distance between the robot 1 and the user acquired in step S116 is within the predetermined range R₁ suitable for recognition calculated in step S113.
[0196]
If it is determined in step S117 that the distance between the robot 1 and the user is not within the predetermined range R₁ suitable for recognition, the process of step S118 is skipped and the process proceeds to step S119.
[0197]
On the other hand, if it is determined in step S117 that the distance between the robot 1 and the user is within the predetermined range R₁ suitable for recognition, the process proceeds to step S118. There, the threshold setting unit 112 calculates the predetermined range R₂, that is, the range of the distance between the robot 1 and the user at which the user's voice level or the size of the face image area takes the appropriate value. It then sets the predetermined range R₂, rather than the predetermined range R₁ set in step S113, as the range of the distance between the robot 1 and the user suitable for recognition (predetermined range R), and the process proceeds to step S119. For example, suppose that the distance between the robot 1 and the user acquired from the distance estimation unit 111 in step S116 is within the predetermined range R₁, but the face image area obtained from the input image is small, so that the predetermined range R₂ calculated in step S118 is a range of distances closer to the robot 1 than the predetermined range R₁. In that case, the predetermined range R₂ is adopted and set as the predetermined range R of the distance between the robot 1 and the user suitable for recognition.
[0198]
Here, if it is determined in step S117 that the distance between the robot 1 and the user acquired from the distance estimation unit 111 in step S116 is not within the predetermined range R₁, the distance adjustment unit 114 sends a command to adjust the distance between the robot 1 and the user to the action determination mechanism unit 103 regardless of the loudness of the voice or the size of the face image area. The threshold setting unit 112 therefore does not calculate the predetermined range R₂ (step S118 is skipped).
[0199]
On the other hand, if the distance between the robot 1 and the user is within the predetermined range R₁ and yet the loudness of the sound input from the microphone 82 to the distance control unit 110 is not the appropriate value, or the size of the face image area in the image signals input from the CCD cameras 81L and 81R to the distance control unit 110 is not the appropriate value, a range with better recognition accuracy (predetermined range R₂) exists, and so the predetermined range R₂ is calculated in step S118.
[0200]
As another countermeasure when the size of the face image area is not the appropriate value, the CCD cameras 81L and 81R could be provided with a zoom mechanism that adjusts the size of the face image area. However, since the zoom ratio is limited, there is a limit to the range over which the user's face image area can be optimized. Providing a zoom mechanism also subjects the robot 1 to structural (placement) and cost restrictions. Therefore, as described above, it is desirable that the size of the face image area be adjusted by the user or the robot 1 adjusting the distance.
[0201]
In step S119, the threshold setting unit 112 calculates the threshold D1 from the predetermined range R, outputs the predetermined range R to the distance determination unit 113, and outputs the threshold D1 to the distance adjustment unit 114, and ends the process.
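The flow of steps S113 to S119 can be summarized in a short sketch. The function signature, the tuple representation of a range, and the `compute_r2` callback are hypothetical; how R₁ and R₂ themselves are derived from noise, face size, or voice level is left outside this sketch.

```python
from typing import Callable, Tuple

Range = Tuple[float, float]  # (lower limit, upper limit) in metres, an assumed unit


def determine_range(r1: Range, detected: bool, appropriate: bool,
                    distance: float,
                    compute_r2: Callable[[], Range]) -> Tuple[Range, float]:
    """Return the predetermined range R and the threshold D1.

    r1:          range from background noise and operation state (step S113).
    detected:    a user voice or a face image was detected (step S114).
    appropriate: the voice level / face size is the appropriate value (step S115).
    distance:    estimated robot-user distance (step S116).
    compute_r2:  callback producing R2 (step S118); only called when needed.
    """
    r = r1
    if detected and not appropriate:
        lower, upper = r1
        if lower <= distance <= upper:   # step S117
            r = compute_r2()             # step S118: adopt R2 instead of R1
    d1 = (r[0] + r[1]) / 2.0             # step S119: D1 as the center of R
    return r, d1
```

Passing R₂ as a callback mirrors the text's point that R₂ is not calculated at all when the distance already lies outside R₁.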
[0202]
As described above, the distance between the robot 1 and the user is always adjusted to the optimum distance for the recognition function by adaptively changing the values of the predetermined range R and the threshold value D1 according to the surrounding situation and the user's condition. Therefore, the recognition accuracy of the recognition function can be improved.
[0203]
In the above embodiment, output signals from the various sensors, such as the microphone 82 and the CCD cameras 81L and 81R of the external sensor unit 71 and the touch sensor 51, are always input to the state recognition information processing unit 101, which performs the recognition process and always outputs the state recognition information to the action determination mechanism unit 103. Alternatively, only when the action determination mechanism unit 103 has determined the next action and that action requires recognition processing, the action determination mechanism unit 103 may send a command to the state recognition information processing unit 101 to cause it to perform recognition processing, and then receive the state recognition information.
[0204]
The threshold value D1 output from the threshold setting unit 112 to the distance adjustment unit 114 is the center value of the predetermined range R, but two threshold values can be used instead, for example the upper limit (the value farther from the robot 1) and the lower limit (the value closer to the robot 1) of the predetermined range R. In this case, for example, in the flowchart of the first distance adjustment process shown in FIG. 15, the upper limit can be used in place of the threshold value D1 in step S41, and the lower limit can be used in place of the threshold value D1 in step S43.
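A minimal sketch of this two-threshold variant of the first distance adjustment process follows. The function name and the returned strings are illustrative assumptions; only the use of the upper limit in step S41 and the lower limit in step S43 comes from the text.

```python
def first_distance_adjustment(distance, predetermined_range):
    """Pick a speech action using the limits of range R as two thresholds."""
    lower, upper = predetermined_range
    if distance > upper:   # step S41, with the upper limit in place of D1
        return "speak so that the user approaches"
    if distance < lower:   # step S43, with the lower limit in place of D1
        return "speak so that the user moves away"
    return None            # already within R: no adjustment requested
```

Compared with a single center threshold, the two limits leave a dead band inside R in which no adjustment is requested.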
[0205]
A program for executing the above-described series of processes can be stored (recorded) temporarily or permanently on a removable recording medium such as a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, or a semiconductor memory. Such a removable recording medium can be provided as so-called package software and installed in the memory 61A.
[0206]
Besides being installed in the memory 61A from a removable recording medium as described above, the program can be transferred from a download site to a computer wirelessly via an artificial satellite for digital satellite broadcasting, or transferred to a computer via a network such as a LAN (Local Area Network) or the Internet, and the computer can receive the program transferred in this way and install it in the memory 61A.
[0207]
In the present specification, the steps describing the program recorded on the recording medium include not only processes performed in chronological order according to the described order, but also processes that are executed individually without necessarily being performed in chronological order.
[0208]
Instead of being estimated from the output of the microphone 82 or the like as described above, the distance between the robot 1 and the user may be detected outside the robot 1, for example with a monitoring camera, and supplied to the robot 1.
[0209]
[Effect of the invention]
As described above, according to the present invention, the recognition accuracy of the recognition function of the robot can be improved.
[Brief description of the drawings]
FIG. 1 is a perspective view showing an external configuration of a robot to which the present invention is applied.
FIG. 2 is a rear perspective view showing an external configuration of the robot shown in FIG. 1;
FIG. 3 is a schematic diagram for explaining the robot of FIG. 1;
FIG. 4 is a block diagram for mainly explaining a portion related to control of the robot of FIG. 1.
FIG. 5 is a block diagram showing an internal configuration of the robot shown in FIG. 1.
FIG. 6 is a block diagram showing a configuration of a main control unit in FIG. 5.
FIG. 7 is a diagram illustrating processing of an ultrasonic sensor.
FIG. 8 is a diagram illustrating a state in which a user is photographed with a reference camera and a detection camera.
FIG. 9 is a diagram for explaining epipolar lines.
FIG. 10 is a diagram illustrating a reference camera image and a detection camera image.
FIG. 11 is a diagram showing transition of evaluation values.
FIG. 12 is a diagram showing a set point / distance table and a parallax / distance table.
FIG. 13 is a flowchart for explaining robot operation processing of the robot of FIG. 1.
FIG. 14 is a flowchart for explaining the distance estimation processing in step S4 of FIG. 13.
FIG. 15 is a flowchart illustrating the distance adjustment processing in step S7 of FIG. 13.
FIG. 16 is a flowchart for explaining the distance adjustment processing in step S7 of FIG. 13.
FIG. 17 is a flowchart illustrating the distance adjustment processing in step S7 of FIG. 13.
FIG. 18 is a flowchart illustrating the distance adjustment processing in step S7 of FIG. 13.
FIG. 19 is a flowchart illustrating the distance adjustment processing in step S7 of FIG. 13.
FIG. 20 is a block diagram illustrating a configuration of a main control unit in FIG. 5.
FIG. 21 is a flowchart for describing dynamic range determination processing of a threshold setting unit when a predetermined range R is dynamically determined.
[Explanation of symbols]
DESCRIPTION OF SYMBOLS 1 Robot, 61 Main control part, 63 Sub control part, 71 External sensor part, 72 Speaker, 81L CCD camera, 81R CCD camera, 82 Microphone, 83 Ultrasonic sensor, 101 State recognition information processing part, 102 Model storage part, 103 Action determination mechanism section, 104 posture transition mechanism section, 105 speech synthesis section, 110 distance control section, 111 distance estimation section, 112 threshold setting section, 113 distance determination section, 114 distance adjustment section

Claims

In a robot apparatus having a recognition function for recognizing a predetermined recognition target,
A plurality of imaging means for imaging a surrounding situation and outputting an image signal;
Voice input means for inputting voice;
Ultrasonic output means for emitting an ultrasonic pulse and receiving a reflected wave reflected from the recognition target;
Distance estimation means for estimating a distance to the recognition target based on output results of the plurality of imaging means, the voice input means, and the ultrasonic output means ;
Distance adjusting means for adjusting the distance to the recognition object based on the distance estimated by the distance estimating means;
With
When the difference between the distance estimated from the output result of the ultrasonic output means and the distance estimated from the output results of the plurality of imaging means and the voice input means is large, and the distance estimated from the output result of the ultrasonic output means is short, the distance estimation means determines that the object detected by the ultrasonic output means is an obstacle, and estimates the distance to the recognition target excluding the output result of the ultrasonic output means,
When it is determined that there is an obstacle, the distance adjusting means adjusts the distance to the user by causing the robot apparatus to act so that the user moves, and when it is determined that there is no obstacle, adjusts the distance to the recognition target by moving the robot apparatus.
A robot apparatus characterized by the above.

The robot apparatus according to claim 1, wherein the distance estimation unit estimates a distance to the recognition target by performing stereo processing using image signals output from the plurality of imaging units.

The recognition target recognized by the plurality of imaging units is a user,
The distance estimation unit detects an area of the user's face using an image signal output from the imaging unit, and estimates a distance to the user based on the face area. The robot apparatus as described in.

The recognition target recognized by the voice input means is a user,
The robot apparatus according to claim 1, wherein the distance estimation unit estimates a distance to the user based on a volume of a voice uttered by the user input to the voice input unit.

It further comprises audio output means for outputting audio,
The robot apparatus according to claim 1 , wherein the distance adjusting unit adjusts a distance to the user by causing the audio output unit to output a sound so that the user moves.

The robot apparatus according to claim 1, wherein the distance adjustment unit adjusts a distance to the user by causing the robot apparatus to perform an operation that prompts the user to move.

A determination unit for determining whether the distance estimated by the distance estimation unit is within a predetermined range;
The robot apparatus according to claim 1, wherein the distance adjustment unit adjusts a distance to the recognition target based on a determination result of the determination unit.

The robot apparatus according to claim 7 , further comprising range setting means for setting the predetermined range.

The robot apparatus according to claim 8, wherein the range setting means measures ambient background noise based on the output result of the voice input means, and dynamically sets the predetermined range according to the magnitude of the background noise.

The range setting means acquires an operation state of the robot apparatus, estimates a noise component generated by the robot apparatus itself, and dynamically sets the predetermined range according to the magnitude of the noise component. The robot apparatus according to claim 8 .

In a robot control method for controlling a robot apparatus having a recognition function for recognizing a predetermined recognition target,
An imaging step of imaging the surrounding situation and outputting an image signal;
A voice input step for inputting voice;
An ultrasonic output step of emitting an ultrasonic pulse and receiving a reflected wave reflected from the recognition target;
A distance estimation step of estimating a distance to the recognition target based on processing results of the imaging step, the voice input step, and the ultrasonic output step ;
A distance adjusting step of adjusting a distance to the recognition target based on a result of the processing of the distance estimating step;
Including
In the processing of the distance estimation step, the difference between the distance estimated from the processing result of the ultrasonic output step and the distance estimated from the processing result of the imaging step and the voice input step is large. When the distance estimated from the processing result is short, it is determined that the recognition target reflected by the ultrasonic pulse is an obstacle, and the distance to the recognition target is determined by excluding the processing result of the ultrasonic output step. Estimate
In the process of the distance adjustment step, when it is determined that there is an obstacle, it is determined that there is no obstacle by adjusting the distance to the user by causing the robot device to act so that the user moves. If the robot apparatus is moved, the distance to the recognition target is adjusted.
Robot control wherein the.
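
The obstacle determination and the choice between the two adjustment strategies in the method claim above could be sketched as follows. This is a minimal illustration under assumptions, not the claimed implementation: the disagreement threshold, the simple averaging of sensor estimates, the range check (borrowed from the dependent apparatus claims), and the names `Estimate`, `estimate_distance`, and `choose_adjustment` are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

DISAGREEMENT_M = 0.5  # assumed threshold for a "large" difference, in metres


@dataclass
class Estimate:
    distance_m: float  # fused distance to the recognition target
    obstacle: bool     # True if the ultrasonic echo came from an obstacle


def estimate_distance(ultrasonic_m, image_m, voice_m):
    """Fuse three per-sensor distance estimates (simple averaging is assumed)."""
    other_m = mean([image_m, voice_m])
    # Large disagreement with the ultrasonic reading being the shorter one:
    # treat the reflector as an obstacle and exclude the ultrasonic result.
    if other_m - ultrasonic_m > DISAGREEMENT_M:
        return Estimate(other_m, True)
    return Estimate(mean([ultrasonic_m, image_m, voice_m]), False)


def choose_adjustment(estimate, min_m, max_m):
    """Pick how to adjust the distance, given the acceptable range."""
    if min_m <= estimate.distance_m <= max_m:
        return "none"
    if estimate.obstacle:
        # The robot cannot safely close the gap itself, so it prompts the user.
        return "ask_user_to_move"
    return "move_robot"
```

For example, an ultrasonic reading of 0.4 m against image and voice estimates of about 2 m would be classified as an obstacle, and with an acceptable range of 0.3 m to 1.5 m the sketch would choose to prompt the user rather than move the robot.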

A recording medium on which a computer-readable program is recorded, the program causing a computer to control a robot apparatus having a recognition function for recognizing a predetermined recognition target, the program including:
An imaging step of imaging the surrounding situation and outputting an image signal;
A voice input step of inputting voice;
An ultrasonic output step of emitting an ultrasonic pulse and receiving a reflected wave reflected from the recognition target;
A distance estimation step of estimating a distance to the recognition target based on processing results of the imaging step, the voice input step, and the ultrasonic output step; and
A distance adjustment step of adjusting the distance to the recognition target based on a result of the processing of the distance estimation step,
wherein
In the processing of the distance estimation step, when the difference between the distance estimated from the processing result of the ultrasonic output step and the distance estimated from the processing results of the imaging step and the voice input step is large, and the distance estimated from the processing result of the ultrasonic output step is the shorter of the two, it is determined that the object that reflected the ultrasonic pulse is an obstacle, and the distance to the recognition target is estimated by excluding the processing result of the ultrasonic output step, and
In the processing of the distance adjustment step, when it is determined that there is an obstacle, the distance to the user is adjusted by causing the robot apparatus to act so that the user moves, and when it is determined that there is no obstacle, the distance to the recognition target is adjusted by moving the robot apparatus.

A program for causing a computer to control a robot apparatus having a recognition function for recognizing a predetermined recognition target, the program causing the computer to execute processing including:
An imaging step of imaging the surrounding situation and outputting an image signal;
A voice input step of inputting voice;
An ultrasonic output step of emitting an ultrasonic pulse and receiving a reflected wave reflected from the recognition target;
A distance estimation step of estimating a distance to the recognition target based on processing results of the imaging step, the voice input step, and the ultrasonic output step; and
A distance adjustment step of adjusting the distance to the recognition target based on a result of the processing of the distance estimation step,
wherein
In the processing of the distance estimation step, when the difference between the distance estimated from the processing result of the ultrasonic output step and the distance estimated from the processing results of the imaging step and the voice input step is large, and the distance estimated from the processing result of the ultrasonic output step is the shorter of the two, it is determined that the object that reflected the ultrasonic pulse is an obstacle, and the distance to the recognition target is estimated by excluding the processing result of the ultrasonic output step, and
In the processing of the distance adjustment step, when it is determined that there is an obstacle, the distance to the user is adjusted by causing the robot apparatus to act so that the user moves, and when it is determined that there is no obstacle, the distance to the recognition target is adjusted by moving the robot apparatus.