JP2004133637A

JP2004133637A - Face detector, face detection method and program, and robot apparatus

Info

Publication number: JP2004133637A
Application number: JP2002296783A
Authority: JP
Inventors: Hidehiko Morisada; 森貞　英彦
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2002-10-09
Filing date: 2002-10-09
Publication date: 2004-04-30

Abstract

PROBLEM TO BE SOLVED: To detect a face image with high efficiency even when the face is not frontal, for example when it is inclined or turned in a different direction. SOLUTION: A face detection device detects a face by template matching. When an input image is supplied, it is determined whether a rotation angle R_found, at which a face was detected in the preceding input image, exists. If no R_found exists, a correlation value is obtained using the template image, and a determination part 302 detects a face on the basis of that correlation value. If no face is detected, the template image is rotated by, for example, 90° from its present state, and the series of processing is repeated until a face is detected or matching has been performed for all rotation angles. If, on the other hand, the rotation angle of the template image from the preceding input image has been stored, the template image rotated by the stored angle is matched against the input image. When a face is detected, the rotation angle R = R_found at that time is stored, and the next input image is processed. COPYRIGHT: (C)2004,JPO

Description

【０００１】
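【Technical Field of the Invention】
The present invention relates to a face detection device and a face detection method for detecting the face of an object in an input image, to a robot apparatus that carries the face detection device in order to improve, for example, its entertainment value, and to a program that causes a computer to execute the face detection operation.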
【０００２】
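【Prior Art】
A mechanical device that uses electrical or magnetic action to perform motions resembling those of a human (or other living being) is called a "robot". Robots began to spread in Japan at the end of the 1960s, but most of them were industrial robots, such as manipulators and transport robots, intended to automate factory production work or make it unmanned.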
【０００３】
最近では、人間のパートナーとして生活を支援する、即ち住環境その他の日常生活上の様々な場面における人的活動を支援する実用ロボットの開発が進められている。このような実用ロボットは、産業用ロボットとは異なり、人間の生活環境の様々な局面において、個々に個性の相違した人間、又は様々な環境への適応方法を自ら学習する能力を備えている。例えば、犬又は猫のように４足歩行の動物の身体メカニズム及びその動作を模した「ペット型」ロボット、或いは、２足直立歩行を行う人間等の身体メカニズム及びその動作をモデルにしてデザインされた「人間型」又は「人間形」ロボット（Ｈｕｍａｎｏｉｄ　Ｒｏｂｏｔ）等のロボット装置は、既に実用化されつつある。
【０００４】
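Because these robot apparatuses can, unlike industrial robots, perform a variety of motions that emphasize entertainment value, they are sometimes called entertainment robots. Some of them carry various external sensors, such as a CCD (Charge Coupled Device) camera and a microphone, recognize external conditions from the outputs of those sensors, and act autonomously in response to external information and their internal state.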
【０００５】
ところで、かかるエンターテインメント型のロボット装置において、対話中にその相手となる人間の顔や、移動中に視界内に入る人間の顔を検出して、その人間の顔を見ながら対話や動作を行うことができれば、人間が普段行う場合と同様に、その自然性から考えて最も望ましく、エンターテインメントロボット装置としてのエンターテインメント性をより一層向上させ得るものと考えられる。
【０００６】
例えば、下記特許文献１には、顔の向き傾き及び表情等による顔パターンの変化に左右されることなく顔パターンを認識するための顔認識装置が開示されている。この特許文献１に記載の技術においては、顔認識装置は、同じ顔画像が入力され、この顔画像に対し夫々異なる変形を除去するための変換を施して正規化された顔パターンを出力する、互いに独立して動作可能な複数の逆変換部と、この逆変換部から出力された複数の顔パターンと、予め用意された複数の人物の参照パターンと比較して類似度を計算する識別部と、この識別結果に基づいて顔画像に対応する人物を特定する結合部とを備える。逆変換部が除去する変形は、顔の位置ずれ、顔のカメラに映りこむ大きさの違い、顔の上下左右の向きの違い、顔の傾きの違い等の変形要素の組み合わせからなり、変形がない顔領域として切り出した正方形領域を、画面に平行な軸周りの回転を考慮したり、画面に垂直な軸周りの回転を考慮したりした領域に変換する。これにより、顔が前後に傾いていたり、顔の向きが違っているような場合においても、その逆変換部にて逆変換すれば認識することができる。
【０００７】
【特許文献１】
特開平１２−０９０１９１号公報
【０００８】
【発明が解決しようとする課題】
しかしながら、上述の特許文献１に記載の技術においては、予め想定される変形要素に対応した変形方法の数の逆変換部を設ける必要があり、装置が大型化するため、ロボット装置等の限られたリソースしか持たない装置に搭載するのは不向きである。また、顔の上下左右の向き又は傾き等の大きさを予め設定しておかなければならず、移動可能なロボット装置に搭載する場合、どれくらいの変形になるか予想できない場合等には認識率が低下する。例えば、上述の特許文献１においては、逆変換部では、予め設定する変形度合いにおいて、変形度合いが小さくなる可能性が大きく、変形度合いが大きくなる可能性が低いとして、変形度合いが小さく設定された逆変換部からの出力に対して識別部で算出された類似度ほど大きくなるような重みづけをして人物を特定しているが、ロボット装置に搭載された場合、ロボット装置が転倒すれば、撮影される顔画像は標準の顔パターンから大きく外れたものになるため、認識率が低下してしまう。一方、変形度合いが大きいほど、識別部で算出された類似度に対して大きい重み付けをした場合、変形により、元の顔パターンが別人に近い顔に変形されることになり、認識率が低下してしまう。
【０００９】
本発明は、このような従来の実情に鑑みて提案されたものであり、顔が傾いたり向きが違っているような非正面顔であっても、高効率に検出可能な顔検出装置、顔検出方法及びプログラム並びにこれを搭載したロボット装置を提供することを目的とする。
【００１０】
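【Means for Solving the Problem】
To achieve the above object, a face detection device according to the present invention detects the face of an object in an input image and comprises: correlation calculating means for obtaining the correlation between the input image and a template image of a predetermined size representing an average face image; and determining means for determining, on the basis of the correlation, whether the input image contains a face image. When the determining means determines that the input image contains no face image, the correlation calculating means obtains the correlation with the input image using a template image rotated by a predetermined angle about an axis perpendicular to the screen. When the determining means determines that the input image contains a face image, the correlation calculating means uses the template image in effect at that determination to obtain the correlation with the next input image.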
【００１１】
本発明においては、判定手段により入力画像に顔画像が含まれないと判定された場合には、テンプレート画像を所定角度回転させたテンプレート画像を使用して入力画像との相関を再び算出することにより、入力画像に含まれる顔画像が正面顔でない場合においても正面顔のテンプレート画像を使用して検出することができると共に、入力画像に顔画像が含まれると判定された場合には、判定時の回転角度（０°を含む）のテンプレート画像を使用して次の入力画像との相関を求めるため、マッチングの処理が高速化する。
【００１２】
また、供給された入力情報に基づいて動作を行うロボット装置に搭載されることができ、例えばロボット装置が転倒した場合等で正面顔が撮影できないような状況においても顔検出を可能とする。
【００１３】
更に、上記相関算出手段は、上記ロボット装置に備えられたロボット装置自身の姿勢を検出する姿勢検出手段からの姿勢検出結果に基づき、上記回転角度を決定してもよく、例えば転倒した場合、姿勢情報に基づき、テンプレート画像の回転角を推定することにより、テンプレート画像を順次回転させて顔を検出するより処理が更に高速化する。
【００１４】
本発明に係る顔検出方法は、入力画像から対象物の顔を検出する顔検出装置の顔検出方法において、入力画像と平均的な顔画像を示す所定サイズのテンプレート画像との相関を求める相関算出工程と、上記相関に基づき、上記入力画像に顔画像が含まれるか否かを判定する判定工程とを有し、上記相関算出工程では、上記判定工程にて上記入力画像に顔画像が含まれないと判定された場合には、該テンプレート画像を画面に垂直な方向を軸として所定の角度回転させたテンプレート画像を使用して上記入力画像との相関を求め、上記判定工程にて上記入力画像に顔画像が含まれると判定された場合には、該判定時のテンプレート画像を使用して次の入力画像との相関が求められることを特徴とする。
【００１５】
また、本発明に係るプログラムは、上述した顔検出処理をコンピュータに実行させるものである。
【００１６】
本発明に係るロボット装置は、供給された入力情報に基づいて動作を行うロボット装置において、画像を撮像する撮像部と、上記撮像部から供給される入力画像から対象物の顔を検出する顔検出部とを備え、上記顔検出部は、上記入力画像と平均的な顔画像を示す所定サイズのテンプレート画像との相関を求める相関算出手段と、上記相関に基づき、該入力画像に顔画像が含まれるか否かを判定する判定手段とを有し、上記相関算出手段は、上記判定手段により上記入力画像に顔画像が含まれないと判定された場合には、該テンプレート画像を画面に垂直な方向を軸として所定の角度回転させたテンプレート画像を使用して上記入力画像との相関を求め、上記判定手段により上記入力画像に顔画像が含まれると判定された場合には、該判定時のテンプレート画像を使用して次の入力画像との相関を求めることを特徴とする。
【００１７】
【発明の実施の形態】
（１）第１の実施の形態
本実施の形態における顔検出装置は、例えば後述するロボット装置に搭載することができる。以下、ロボット装置に搭載して周囲の人間の顔を認識するのに好適な顔検出装置について説明するが、ロボット装置の構成についての詳細は後述する。ロボット装置は、ＣＣＤカメラと、ＣＣＤカメラにより取得したフレーム画像を記憶するメモリと、このメモリに記憶されたフレーム画像の中から人間の顔画像を検出する顔検出タスク機能を有する顔検出モジュールとを備えている。テンプレート画像を使用した顔検出において、通常、テンプレートマッチングで使用される平均顔は、正面から撮影された一般的なものを使用して行うため、例えば、逆さから写した場合等、正面から写した顔以外（以下、非正面顔という。）を検出することが難しい。例えば、ロボット装置において、画像を取得するためのＣＣＤカメラが例えばロボット装置の顔部に搭載されていると、転倒して仰向けになったロボット装置を使用者等が覗き込んだ際等に写される顔画像は、通常の正面顔とは逆方向、即ち、正面顔を画面に垂直な方向を軸として略１８０°回転した状態の非正面顔となる。このような非正面顔が撮影された場合であっても顔検出を可能とするため、本実施の形態においては、正面顔のテンプレート画像を使用しても顔検出できなかった場合は、テンプレート画像を所定の角度回転して使用すると共に、顔検出された場合は、その回転角度を記憶し、次の入力画像のマッチングの際は、記憶した回転角度で回転したテンプレート画像を使用してマッチングを行うものである。
【００１８】
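FIG. 1 is a block diagram schematically showing the functions of the face detection module in the first embodiment of the present invention. As shown in FIG. 1, the face detection module 300 takes as its input image a frame image obtained by imaging means such as a CCD camera. It comprises a template matching unit (correlation calculating means) 301 that obtains the correlation between the input image and a template image of a predetermined size representing an average face image, a determination unit 302 that determines from the correlation whether the input image contains a face image, and a face extraction unit 303 that extracts the face image when one is determined to be present.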
【００１９】
テンプレートマッチング部３０１に供給される入力画像は、用意されたテンプレート画像における顔の大きさと一致させるため、フレーム画像を例えば複数のスケールに変換した後、所定の大きさに切り出した画像とすることができ、テンプレートマッチング部３０１は、各スケール毎の入力画像についてマッチングを行う。テンプレート画像としては、例えば１００人程度の人物の平均からなる平均的な顔画像を使用することができる。本実施の形態においては、テンプレート画像は正面顔とし、このときの回転角Ｒ＝０°とする。
【００２０】
判定部３０２は、テンプレートマッチング部３０１におけるテンプレートマッチングにおいて、所定の閾値以上の相関値を示した場合にその入力画像に顔画像が含まれると判定し、顔抽出部３０３により、該当する顔領域を抽出する。
【００２１】
ここで、判定部３０２において、いずれのマッチング結果も所定の閾値未満である場合は、入力画像にはテンプレート画像が示す顔が含まれていないと判定し、その判定結果をテンプレートマッチング部３０１に返す。マッチング部３０１は、入力画像に顔画像が含まれないと判定された場合、テンプレート画像を画面に垂直な方向を軸として所定角度回転し、入力画像に対して回転後の回転テンプレート画像を使用して再びテンプレートマッチングを行う。本実施の形態においては、顔画像が検出されなかった場合、テンプレート画像を画面に垂直な軸に対して所定方向に９０°ずつ、即ち、回転角Ｒ＝９０°、１８０°、２７０°となるよう、顔が検出されるまでテンプレート画像を順次回転させた回転テンプレート画像を使用するものとする。なお、この回転方向は、右でも左でもよく、順次回転させる回転角は９０°に限らず、例えば１８０°、４５°等適宜設定するものとする。
【００２２】
判定部３０２は、入力画像とＲ＝９０°として回転させた変形後のテンプレート画像とのマッチング結果を基に顔画像が含まれるか否かを判定する。そして、上述した如く、相関値が所定の閾値以上である場合、顔画像が含まれると判定する。顔画像が含まれると判定された場合は、その結果をテンプレートマッチング部３０１に入力し、テンプレートマッチング部３０１では、この顔画像が含まれると判定された回転角度（以下、Ｒ＝Ｒ＿ｆｏｕｎｄという。）が記憶される。テンプレートマッチング部３０１は、顔画像が検出された際の回転角度を記憶し、その回転角度のテンプレート画像を使用して次回の入力画像に対してマッチングを行う。ここで、顔画像が検出されなかった場合は、更に９０°回転させ、回転角Ｒ＝１８０°としたテンプレート画像を使用して再びマッチングをとる。
【００２３】
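The images input to the template matching unit 301 are captured at intervals of, for example, 40 msec. The stored angle is reused because, once a face image has been detected, the orientation of the face does not change abruptly in the short time before the next input image arrives. If, for example, a face image rotated by 180° has been detected, detecting the face in the next input image with the template image at that same rotation angle is far more efficient than rotating the frontal-face template step by step and matching again from the start.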
【００２４】
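FIG. 2 is a flowchart showing the face detection method in this embodiment. As shown in FIG. 2, when an input image is supplied to the template matching unit 301 (step S1), it is determined whether a rotation angle R_found, at which a face was detected in the previous input image, exists (step S2). If no face was detected in the previous input image, R_found does not exist, so a correlation value is obtained with the template image at rotation angle R = 0°, that is, the frontal-face template image (step S3). The determination unit 302 then determines whether this correlation value is at or above a predetermined threshold (step S4). If the correlation value is below the threshold in step S4, the face is regarded as not detected, and it is determined whether rotation of the template image has finished; when the template is rotated by 90° at a time in a given direction, for example, this means checking whether it has been rotated 270° from its initial state (step S5). If it has not, the template image is rotated from its current state by the predetermined angle, 90° in this embodiment (step S6). This series of processing is repeated until a face is detected in step S4, or until step S5 finds that 270° has been reached, that is, until template matching has been performed for all rotation angles.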
【００２５】
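If, on the other hand, step S2 finds that a face image was detected in the previous input image and the rotation angle R_found of the template image at that time has been stored, the input image is matched against the template image rotated to R_found (step S7). Processing then proceeds to the face detection determination of step S4 described above.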
【００２６】
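If step S4 determines that the correlation value is at or above the predetermined threshold and a face has been detected, the rotation angle at that time is stored as R = R_found (step S8), processing returns to step S1, and the next input image is processed.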
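The flow of steps S1 to S8 can be sketched in Python as follows. This is only an illustration under several assumptions that are not in the source: the input image is a square 2-D list of the same size as the template, rotation is limited to multiples of 90°, the threshold value 0.8 is arbitrary, and the names `RotatingMatcher`, `rotated` and `correlation` are hypothetical. Clearing the stored angle after a complete miss is likewise an assumption.

```python
from math import sqrt

def rotate90(img):
    """Rotate a 2-D list of pixel rows by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def rotated(template, angle):
    """Return the template rotated by a multiple of 90 degrees."""
    out = template
    for _ in range((angle // 90) % 4):
        out = rotate90(out)
    return out

def correlation(a, b):
    """Normalized correlation between two equally sized 2-D lists."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

class RotatingMatcher:
    def __init__(self, template, threshold=0.8, step=90):
        self.template = template
        self.threshold = threshold   # assumed value, not from the source
        self.step = step
        self.r_found = None          # angle stored at the last detection

    def detect(self, image):
        """Return the template angle at which a face is found, or None."""
        # S2: start from R_found if stored (S7), otherwise from 0 degrees (S3).
        start = self.r_found if self.r_found is not None else 0
        for k in range(360 // self.step):            # S5/S6: try each angle once
            angle = (start + k * self.step) % 360
            score = correlation(image, rotated(self.template, angle))
            if score >= self.threshold:              # S4: threshold test
                self.r_found = angle                 # S8: remember the angle
                return angle
        self.r_found = None                          # assumption: forget on a miss
        return None
```

When consecutive frames show the face at the same orientation, the second call to `detect` succeeds on its first correlation, which is the speed-up the flowchart aims for.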
【００２７】
このように構成された本実施の形態においては、前回の入力画像で顔画像が検出された場合は、そのときのテンプレート画像の回転角を記憶しておき、この回転角としたテンプレート画像を使用して、次の入力画像に対してマッチングを行うため、マッチング処理が高速化する。また、正面顔のテンプレート画像を使用し、例えば回転角０°の正面顔でマッチングを行い、顔が検出できなかった場合に、例えば９０°等、所定角度回転させて非正面顔としたテンプレート画像を使用して顔検出を行う動作を繰り返すことにより、正面顔のテンプレート画像のみを使用して、非正面顔を検出することが可能となり、極めて高効率で顔検出を行うことができる。
【００２８】
ここで、非正面顔を検出しようとした際に全ての方向に対して演算を行う場合に比して演算量を低減するために、上述のように、所定角度ずつ順次回転させたテンプレート画像を使用するのではなく、例えば、回転角Ｒ＝１８０°のテンプレート画像のみ等、所定の回転角のテンプレート画像のみのマッチングを行ってもよい。
【００２９】
また、前回のマッチングにおいて、回転角Ｒ＿ｆｏｕｎｄが記憶されていた場合に、次の入力画像において、顔画像が検出されなかった場合は、回転角をＲ＿ｆｏｕｎｄから更に所定方向に９０°回転させるものとしたが、再び回転角Ｒ＝０°から処理を開始してもよい。また、本実施の形態においては、所定方向に９０°回転するものとしたが、Ｒ＝０°の次に、Ｒ＝１８０°としたり、Ｒ＝９０°の次にＲ＝２７０°としたりする等、適宜回転角度を選択できるようにしてもよい。
【００３０】
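Furthermore, when a rotation angle R_found has been stored and input images arrive at intervals of, for example, 40 msec as described above, the face is likely to be detected at R_found in the next input image as well. If the face cannot be detected in the next input image with the template image at R_found, it is likely to be detected near R_found, so face detection may be performed with template images at rotation angles R_found ± α.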
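One way to read this variant is as an ordering of candidate angles: R_found first, then R_found ± α, then the remaining coarse steps. The sketch below illustrates that ordering. The function name `candidate_angles`, the default α of 15° and the fallback to the coarse 90° steps after the ± α attempts are assumptions, since the source gives no value for α and does not say what follows.

```python
def candidate_angles(r_found, alpha=15, step=90):
    """Return template rotation angles in the order they might be tried."""
    if r_found is None:
        # No stored angle: plain coarse search starting from the frontal face.
        return [k * step for k in range(360 // step)]
    # Stored angle first, then its neighborhood R_found +/- alpha.
    near = [r_found % 360, (r_found + alpha) % 360, (r_found - alpha) % 360]
    # Assumed fallback: continue in coarse steps from the stored angle.
    rest = [(r_found + k * step) % 360 for k in range(1, 360 // step)]
    return near + [a for a in rest if a not in near]
```

For a stored angle of 180° this yields 180°, 195° and 165° before any of the coarse angles.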
【００３１】
更にまた、テンプレートマッチングにより顔検出されなかった場合、テンプレート画像を回転するものとしたが、Ｒ＝０°のテンプレート画像と共に、例えばＲ＝９０°、１８０°、２７０°で回転した回転後のテンプレート画像を予め準備するようにしてもよい。
【００３２】
（２）第２の実施の形態
次に、本発明の第２の実施の形態について説明する。本第２の実施の形態は、テンプレートマッチングの際に姿勢情報が供給され、これに基づき、得られる顔画像の回転角を予測し、テンプレート画像の回転角を選択するようにした点が上述の第１の実施の形態と異なる。
【００３３】
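That is, the robot apparatus is provided with a posture sensor or the like that detects its own posture, and the posture information from this sensor is supplied to the template matching unit. As described above, if the robot apparatus has fallen over, for example, and a nearby person looks down into it, the face image contained in the image the robot apparatus acquires, that is, in the input image supplied to the face detection module, is expected to be rotated by roughly 180° from a normal frontal face about an axis perpendicular to the screen. Supplying this posture information to the template matching unit and letting it select the rotation angle of the template image therefore speeds up processing.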
【００３４】
図３は、本発明の第２の実施の形態における顔検出方法を示すフローチャートである。図３に示すように、入力画像が供給されると（ステップＳ１１）、姿勢情報が供給されているか否かが判定される（ステップＳ１２）。ここで、姿勢情報が供給されている場合は、その姿勢情報に基づき最も可能性が高いと考えられる回転角Ｒ＿ｓｅｎｓｏｒを選択し、この回転角Ｒ＿ｓｅｎｓｏｒのテンプレート画像を使用して、テンプレートマッチングを行い（ステップＳ２０）、その結果、相関値が所定の閾値以上であるか否かが判定される（ステップＳ１５）。
【００３５】
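If, on the other hand, no posture information is supplied in step S12, face detection is performed in the same way as in the first embodiment. That is, it is determined whether a rotation angle R_found from a detection in the previous input image exists (step S13); if it does not, a correlation value is obtained with the template image at R = 0°, that is, the frontal-face template image (step S14). The determination unit 302 then determines whether this correlation value is at or above the predetermined threshold (step S15). If it is below the threshold, the face is regarded as not detected, and it is determined whether the template image has been rotated 270° from the frontal face at R = 0° (step S16); if not, the template image is rotated 90° from its current state (step S17). Processing then returns to step S14 and template matching is performed again.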
【００３６】
一方、ステップＳ１３において、前回の入力画像において顔画像が検出され、そのときのテンプレート画像の回転角Ｒ＿ｆｏｕｎｄが記憶されている場合、回転角をＲ＿ｆｏｕｎｄとしたテンプレート画像を使用し、入力画像に対してマッチングを行い（ステップＳ１８）、相関値が所定の閾値以上か否かを判定する（ステップＳ１５）。
【００３７】
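If step S15 determines that the correlation value is at or above the predetermined threshold and a face has been detected, the rotation angle at that time is stored as R = R_found (step S19), processing returns to step S11, and the next input image is processed.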
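The branch order of the second embodiment (posture information first, then a stored angle, then the frontal template) can be summarized in a short Python sketch. The function names and the posture labels are hypothetical. Only the association of a fallen, face-up robot with roughly 180° comes from the text; the 0° entry for a standing robot is an assumption.

```python
def angle_from_posture(posture):
    """Hypothetical mapping from a posture label to R_sensor (None if unknown)."""
    table = {
        "standing": 0,          # assumption: upright camera sees upright faces
        "fallen_on_back": 180,  # from the text: faces appear rotated about 180 degrees
    }
    return table.get(posture)

def initial_angle(r_sensor=None, r_found=None):
    """Pick the first template rotation angle to try (steps S12 and S13)."""
    if r_sensor is not None:   # posture information supplied -> step S20
        return r_sensor
    if r_found is not None:    # angle stored from the previous frame -> step S18
        return r_found
    return 0                   # frontal template -> step S14
```

Giving the sensor estimate priority over R_found follows the flowchart order, in which step S12 is evaluated before step S13.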
【００３８】
このように構成された本実施の形態においても、第１の実施の形態と同様に、前回の入力画像において顔検出された場合には、そのテンプレート画像の回転角Ｒ＿ｆｏｕｎｄを記憶しておき、この回転角Ｒ＿ｆｏｕｎｄのテンプレート画像を使用してマッチングを行うため、処理が高速化すると共に、姿勢情報が入力された場合は、姿勢情報に基づきテンプレート画像の回転角を予測し、この予測した回転角Ｒ＿ｓｅｎｓｏｒのテンプレート画像を使用してマッチングを行うので、例えばロボット装置が転倒する等、急な動作にも対応して高効率で短時間に顔検出することが可能となる。
【００３９】
（３）第１の適用例
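Next, a first application example of the present invention, which detects faces by applying the template matching described in the first and second embodiments, is described (see Japanese Patent Application No. 2002-163622). This application example is also provided in a robot apparatus, and face detection is carried out by the control unit that governs overall control of the robot apparatus, together with the internal memory and other components provided inside it. In the first and second embodiments, an image whose template matching result is at or above the predetermined threshold is extracted as a face image. In this application example, an image judged to be a face image by template matching is instead treated as a face candidate, and identification means such as a support vector machine then determines whether it really is a face.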
【００４０】
図４は、本発明の適用例を示す顔検出装置の機能を模式的に示すブロック図である。本適用例における顔検出タスク機能に関するコントロール部の処理内容を機能的に分類すると、図４に示すように、入力画像スケール変換部３６０、ウィンドウ切出部３６１、テンプレートマッチング部３６２、前処理部３６３、パターン識別部３６４及び重なり判定部３６５に分けることができる。
【００４１】
入力画像スケール変換部３６０は、ロボット装置の頭部等に設けられたＣＣＤカメラからの画像信号Ｓ１Ａに基づくフレーム画像を内部メモリから読み出して、当該フレーム画像を縮小率が相異なる複数のスケール画像に変換する。この適用例の場合、２５３４４（＝１７６×１４４）画素からなるフレーム画像に対して、これを０．８倍ずつ順次縮小して５段階（１．０倍、０．８倍、０．６４倍、０．５１倍、０．４１倍）のスケール画像（以下、これを第１〜第５のスケール画像と呼ぶ）に変換する。
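The five scale factors quoted above follow from repeated multiplication by 0.8. The Python sketch below reproduces them for the 176 × 144 frame. The function name `scale_pyramid` is hypothetical, and truncating the scaled width and height to whole pixels is an assumption, since the text does not say how fractional sizes are handled.

```python
def scale_pyramid(width=176, height=144, ratio=0.8, levels=5):
    """Return (scale, width, height) for each level of the image pyramid."""
    out = []
    for i in range(levels):
        s = ratio ** i
        # Truncation to whole pixels is an assumption, not stated in the text.
        out.append((round(s, 2), int(width * s), int(height * s)))
    return out

scales = [s for s, _, _ in scale_pyramid()]
# scales == [1.0, 0.8, 0.64, 0.51, 0.41]
```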
【００４２】
続くウィンドウ切出部３６１は、第１〜第５のスケール画像のうち、まず第１のスケール画像に対して、画像左上を起点として順に画像右下まで、適当な画素（例えば２画素）分を右側又は下側にずらしながらスキャンするようにして、４００（＝２０×２０）画素の矩形領域（以下、この領域をウィンドウ画像と呼ぶ。）を順次切り出す。
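The raster scan described above can be sketched as a generator of window origins. The name `window_origins` is hypothetical, and the sketch assumes that only windows lying completely inside the image are cut out.

```python
def window_origins(width, height, win=20, step=2):
    """Yield the top-left (x, y) of each win x win window, scanning
    left to right and then top to bottom in steps of `step` pixels."""
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            yield x, y

origins = list(window_origins(176, 144))
# 79 positions across by 63 down: 4977 windows on the first scale image
```

The count of several thousand windows per scale image shows why a cheap first-stage test matters before the heavier pattern identification.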
【００４３】
その際、ウィンドウ切出部３６１は、第１のスケール画像から切り出した複数のウィンドウ画像のうち先頭のウィンドウ画像を後段のテンプレートマッチング部３６２に送出する。
【００４４】
テンプレートマッチング部３６２は、ウィンドウ切出部３６１から得られた先頭のウィンドウ画像について、正規化相関法や誤差二乗法等の演算処理を実行してピーク値をもつ関数曲線に変換した後、当該関数曲線に対して認識性能が落ちない程度に十分に低い閾値を設定して、当該閾値を基準として当該ウィンドウ画像が顔画像か否かを判断する。この際、上述の第１及び第２の実施の形態において、説明した如く、顔が検出されない場合は、テンプレート画像を所定角度回転させ、再度テンプレートマッチングを行うと共に、前回の入力画像において顔検出されている場合には、そのとき回転角Ｒ＿ｆｏｕｎｄが記憶されており、この回転角Ｒ＿ｆｏｕｎｄのテンプレート画像を使用してマッチング処理を行う。
【００４５】
本適用例の場合においても、テンプレートマッチング部３６２では、例えば１００人程度の人物の平均からなる平均的な顔画像をテンプレート画像として、かかる顔画像か否かの判断基準となる閾値を設定するようになされている。これにより当該ウィンドウ画像について、テンプレート画像となる平均的な顔画像との大まかなマッチングをとり得るようになされている。
【００４６】
このようにしてテンプレートマッチング部３６２は、ウィンドウ切出部３６１から得られたウィンドウ画像について、テンプレート画像によるマッチングをとり、顔画像であると判断された場合には、当該ウィンドウ画像をスコア画像として後段の前処理部３６３に送出する一方、顔画像でないと判断された場合には、当該ウィンドウ画像をそのまま後段の重なり判定部３６５に送出する。
【００４７】
この時点で顔画像であると判断されたウィンドウ画像（スコア画像）には、実際には顔画像以外の判断誤りの画像が大量に含まれるが、日常のシーンの中では顔に類似した背景画像が多く存在することはあまりないため、ほとんどのウィンドウ画像は顔画像ではないと判断されることとなり極めて有効である。
【００４８】
実際に上述した正規化相関法や誤差二乗法等の演算処理は、後段の前処理部及びパターン識別部における演算処理と比較すると、演算量が１０分の１から１００分の１程度で済むと共に、実験上この段階で顔画像以外の画像を８０〔％〕以上はふるい落とすことができることが確認されたため、コントロール部全体としては大幅な演算量の削減につながることがわかる。
【００４９】
前処理部３６３は、テンプレートマッチング部３６２から得られたスコア画像について、矩形領域でなる当該スコア画像から人間の顔画像とは無関係である背景部分に相当する４隅の領域を除去するために、当該４隅の領域を切り取ったマスクを用いて、４００（＝２０×２０）画素あるスコア画像から３６０画素分を抽出する。
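Going from 400 to 360 pixels means that 40 pixels are masked out, 10 per corner. The text does not give the shape of the mask, so the Python sketch below assumes triangular corner cuts with rows of 4, 3, 2 and 1 pixels, which is one shape that gives exactly that count. The name `corner_mask` is hypothetical.

```python
def corner_mask(size=20, cut=4):
    """Return a size x size boolean mask that is False in the four
    triangular corners (assumed shape) and True elsewhere."""
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            dx = min(x, size - 1 - x)   # distance to the nearer vertical edge
            dy = min(y, size - 1 - y)   # distance to the nearer horizontal edge
            row.append(dx + dy >= cut)  # drop pixels close to a corner
        mask.append(row)
    return mask

kept = sum(v for row in corner_mask() for v in row)
# kept == 360
```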
【００５０】
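Then, to cancel the gradient in the subject's shading caused by the illumination at the time of imaging, the preprocessing unit 363 corrects the gray levels of the 360 extracted pixels of the score image, using a calculation method based on, for example, the root-mean-square (RMS) error, so as to form a plane referenced to the part of the image best suited as a face image.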
【００５１】
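The preprocessing unit 363 then applies histogram equalization to the contrast-enhanced 360-pixel score image, so that detection does not depend on the gain of the CCD camera 50 or on the strength of the illumination.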
【００５２】
次いで、前処理部３６３は、ガボア・フィルタリング（Ｇａｂｏｒ　Ｆｉｌｔｅｒｉｎｇ）処理を行うことにより、当該３６０画素分のスコア画像をベクトル変換し、得られたベクトル群を更に１本のパターンベクトルに変換する。
【００５３】
パターン識別部３６４は、外部から供給される学習用のデータすなわち教師データを用いて、暫定的な識別関数を得た後、当該識別関数を前処理部３６３からパターンベクトルとして得られた３６０画素分のスコア画像に試して顔の検出を行う。そして、検出に成功したものを顔データとして出力する。また検出に失敗したものを非顔データとして学習データに追加して、更に学習をし直す。
【００５４】
パターン識別部３６４における顔認識に関して、例えば、パターン認識の分野で最も学習汎化能力が高いとされるサポートベクタマシン（Ｓｕｐｐｏｒｔ　Ｖｅｃｔｏｒ　Ｍａｃｈｉｎｅ：ＳＶＭ）を用いて該当する顔か否かの識別を行うことができる。
【００５５】
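For support vector machines themselves, see for example the report by B. Schölkopf et al. (B. Schölkopf, C. Burges, A. Smola, Advances in Kernel Methods: Support Vector Learning, The MIT Press, 1999). Preliminary experiments conducted by the applicant have shown that face recognition with a support vector machine gives better results than methods using principal component analysis (PCA) or neural networks.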
【００５６】
そして、パターン識別部３６４は、前処理部３６３から与えられたスコア画像に基づくパターンベクトルについて、当該スコア画像内に顔データが存在するか否かを判断し、存在する場合のみ当該スコア画像の画像領域における左上位置（座標）及びその大きさ（縦横の画素数）と、当該スコア画像の切出し元となるスケール画像のフレーム画像に対する縮小率（すなわち上述の５段階のうちの該当する段階）とをリスト化し、これをリストデータとして内部メモリに格納する。
【００５７】
この後、パターン識別部３６４は、ウィンドウ切出部３６１に対して、第１のスケール画像のうち先頭のウィンドウ画像の顔検出が終了した旨を通知することにより、当該ウィンドウ切出部３６１から第１のスケール画像のうち次にスキャンされたウィンドウ画像をテンプレートマッチング部３６２に送出させる。
【００５８】
そしてテンプレートマッチング部３６２は、当該ウィンドウ画像がテンプレート画像にマッチングした場合のみスコア画像とし、そのときのテンプレート画像の回転角Ｒ＿ｆｏｕｎｄを記憶すると共に、上記スコア画像を前処理部３６３に送出する。前処理部３６３は、当該スコア画像をパターンベクトルに変換してパターン識別部３６４に送出する。パターン識別部３６４は、パターンベクトルから識別結果として得られた顔データに基づいてリストデータを生成して内部メモリに格納する。
【００５９】
このようにウィンドウ切出部３６１おいて第１のスケール画像から切り出した全てのウィンドウ画像について、スキャン順にテンプレートマッチング部３６２、前処理部３６３及びパターン識別部３６４の各処理を行うことにより、当該第１のスケール画像から撮像結果に存在する顔画像を含むスコア画像を複数検出することができる。
【００６０】
この後、パターン識別部３６４は、入力画像スケール変換部３６０に対して、第１のスケール画像の顔検出が終了した旨を通知することにより、当該入力画像スケール変換部３６０から第２のスケール画像をウィンドウ切出部３６１に送出させる。
【００６１】
そして第２のスケール画像についても、上述した第１のスケール画像と同様の処理を行って、当該第２のスケール画像から撮像結果に存在する顔画像を含むスコア画像を複数検出した後、第３〜第５のスケール画像についても同様の処理を順次行う。テンプレートマッチング部３６２で記憶された回転角Ｒ＿ｆｏｕｎｄは、次の入力画像における同スケール画像とのマッチングをとる際に使用することができる。また、上述の第２の実施の形態と同様に、姿勢情報から推定される回転角Ｒ＿ｓｅｎｓｏｒのテンプレート画像を使用してマッチングを行うようにしてもよい。
【００６２】
かくしてパターン識別部３６４は、撮像画像であるフレーム画像を５段階に縮小した第１〜第５のスケール画像について、当該撮像画像内に存在する顔画像を含むスコア画像をそれぞれ複数検出した後、その結果得られた回転角Ｒ＿ｆｏｕｎｄを含むリストデータをそれぞれ内部メモリに格納する。この場合、元のフレーム画像内での顔画像のサイズによっては、全くスコア画像が得られない場合もあるが、少なくとも１以上（２又は３以上でもよい）のスケール画像でスコア画像が得られれば、顔検出処理を続行することとする。
【００６３】
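Because the window cutting unit 361 scans while shifting by 2 pixels at a time, the region where a face actually exists and its neighboring regions are highly correlated, so in each scale image the multiple score images containing a face image include image regions that overlap between adjacent score images.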
【００６４】
そこで続く重なり判定部３６５は、内部メモリに格納されている第１〜第５のスケール画像ごとに複数のリストデータをそれぞれ読み出して、当該各リストデータに含まれるスコア画像同士を比較して、相互に重なり合う領域を含むか否かを判定する。
【００６５】
重なり判定部３６５は、当該判定結果に基づいて、スコア画像同士で重なり合う領域を除去することにより、各スケール画像において、最終的に複数のスコア画像を互いに重なることなく寄せ集めた単一の画像領域として得ることができ、当該画像領域を顔決定データとして新たに内部メモリに格納する。
【００６６】
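When the template matching unit 362 has judged a window image not to be a face image, the overlap determination unit 365 does nothing with it and stores nothing in the internal memory.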
【００６７】
次に、この適用例における動作について説明する。以上の構成において、このロボット装置では、ＣＣＤカメラにより撮像したフレーム画像を縮小率が相異なる複数のスケール画像に変換した後、当該各スケール画像の中からそれぞれ所定サイズのウィンドウ画像を所定画素ずつずらすようにスキャンしながら１枚ずつ切り出す。
【００６８】
このウィンドウ画像について、平均的な顔画像を表すテンプレート画像を用いてマッチングをとって大まかに顔画像であるか否かを判断するようにして、明らかに顔画像でないウィンドウ画像を除去することにより、後段の顔検出処理に要する演算量及び時間をその分減少させることができる。また、顔画像でないと判定された場合は、テンプレート画像を所定の回転角で順次回転させたテンプレート画像を使用しマッチングを行なうことにより、非正面顔であっても検出を可能にする。
【００６９】
続いてテンプレートマッチングで顔画像であると判断されたウィンドウ画像（すなわちスコア画像）について、当該スコア画像の矩形領域の４隅部分を除去した後、濃淡補正及び続くコントラスト強調の平滑化を行い、更に１本のパターンベクトルに変換する。
【００７０】
そして当該パターンベクトルについて、元のスコア画像内での顔検出を行って顔データ又は非顔データを判断し、顔データが存在するスコア画像の画像領域の位置（座標）及びその大きさ（画素数）と、当該スコア画像の切出し元となるスケール画像のフレーム画像に対する縮小率とをリスト化したリストデータを生成する。
【００７１】
このように各スケール画像毎にそれぞれ全てのスコア画像についてリストデータを生成した後、当該各リストデータに含まれるスコア画像同士を比較して、相互に重なり合う領域を除去した顔決定データを求めることにより、元のフレーム画像から顔画像を検出することができる。
【００７２】
このような顔検出タスク処理のうち特にテンプレートマッチング処理は、比較的構成が簡易な演算器にもたやすく実装できる上に、画像圧縮等で利用されるブロックマッチングの手法と類似する処理であることからＣＰＵを用いた高速処理を行うハードウェアが数多く存在する。従ってテンプレートマッチング処理に関してはさらなる高速化が可能である。
【００７３】
以上の構成によれば、このロボット装置において、ＣＣＤカメラにより撮像したフレーム画像について顔画像を検出する顔検出タスク処理の際、当該フレーム画像を相異なる縮小率で縮小した各スケール画像の中からそれぞれ所定サイズのウィンドウ画像を所定画素ずつずらすようにスキャンしながら１枚ずつ切り出した後、平均的な顔画像を表すテンプレート画像を用いてマッチングをとって大まかに顔画像であるか否かを判断するようにして、明らかに顔画像でないウィンドウ画像を除去するようにしたことにより、当該テンプレートマッチングで顔画像であると判断されたスコア画像に対する種々の顔検出処理に要する演算量及び時間をその分減少させることができ、ロボット装置全体の制御を司る主制御部３８１の処理負担を軽減させることができ、かくしてリアルタイム性を格段と向上し得るロボット装置を実現できると共に、テンプレート画像を順次回転させてマッチングを行うため、非正面顔であっても見逃すことなく検出することができる。
【００７４】
（４）第２の適用例
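Consider a method that extracts face candidates by template matching (first step) and then determines face regions among those candidates with an SVM or the like (second step). Because the first step decides the face candidates simply by the magnitude of the normalized correlation value, missed candidates can be reduced by lowering the threshold or by reducing the thinning-out. Lowering the threshold, however, increases the amount of computation, which may be undesirable in a resource-limited environment such as a robot apparatus. Raising the threshold reduces the number of candidate images to be judged in the second step and therefore the amount of computation, but it may also discard images that really are faces, so that face images are missed. A second application example of the present invention, suited to such cases, is described below.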
【００７５】
テンプレート画像と同一サイズの顔領域（顔画像）が入力画像内に存在する場合、この顔画像とテンプレート画像との相関をとれば、テンプレート画像サイズ近傍では最も相関値が大きくなる。よって、顔領域の候補を絞り込む際に、局所的な絞り込みを行うアルゴリズムを使用することにより、本来顔である画像を見逃すことなく顔候補画像を低減して後段の第２の工程にて顔判定する計算量を低減することができる。具体的には、入力画像と所定サイズの平均顔のテンプレート画像との正規化相関をとった相関値の集合であるマッチング結果における相関値の局所最大値に基づき候補となる顔領域を抽出するようにする。
【００７６】
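That is, as shown in FIG. 5(a), consider a window image (the input image after scale conversion) W2 cut from an arbitrary scale image, with vertical size (length of the side in the y direction, hereinafter "height") hei_s and horizontal size (length of the side in the x direction, hereinafter "width") wid_s. As shown in FIG. 5(b), a template image T2_1, which is an average face image with a first template image size of, for example, height hei_t × width wid_t, is scanned over the window image W2 while being shifted by a predetermined number of pixels (for example, 1 pixel) at a time, and the matching result, which is the set of correlation values between the shifted template image T2_1 and the input image, is obtained. In this matching result the correlation values are arranged two-dimensionally as the template image T2_1 moves, and, as shown in FIG. 6, a template matching result image R2 of height hei_r × width wid_r representing those correlation values is obtained. Here the height hei_r of the template matching result image R2 is hei_s − (hei_t + 1), and the width wid_r of R2 is wid_s − (wid_t + 1).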
【００７７】
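Next, this template matching result image R2 is divided into regions of a predetermined size, for example the same size as the first template image size. In each divided region partitioned at the first template image size, the point (position) with the maximum correlation value is found, and among the maximum points obtained from the divided regions, those at or above a predetermined threshold are extracted as face candidates.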
【００７８】
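That is, when normalized correlation is computed with an average-face template image, there is no guarantee that a face image will always give a higher correlation value than an arbitrary pattern. If, however, a face image of the same size as the template image is present, the correlation value takes its maximum within a neighborhood about the size of the template image. Extracting as face candidates the points whose correlation value is the maximum within a divided region and is also at or above a predetermined threshold therefore narrows down the face candidates more effectively than simply extracting every point whose correlation value is at or above the threshold.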
【００７９】
ここで、本第２の適用例においては、任意の大きさのテンプレート画像を使用することができるが、使用するテンプレート画像サイズを切り替えて、テンプレート画像サイズを選択することにより、入力画像に対して準備できる全てのテンプレート画像サイズに対して演算をする場合に比して、演算量を減らして高効率化することができる。例えば、一度顔が検出された場合に、次に顔検出する際はそのテンプレート画像サイズを使用することができる。また、例えば、ロボット装置に設けられた距離センサを使用し、この距離センサからの距離情報に基づき入力画像に含まれる対象物との間の距離を認識することにより、対象物の顔領域の大きさを予測してテンプレート画像サイズを選択する対象距離切り替え手段を設ける等することができ、目的に応じてテンプレート画像サイズを切り替えることができる。
【００８０】
このウィンドウ画像について、平均的な顔画像を表すテンプレート画像を用いてマッチングをとって当該テンプレート画像との相関値の集合であるマッチング結果画像を生成する。このように各スケール画像毎にそれぞれ全てのウィンドウ画像についてスキャン順にそれぞれマッチング結果画像を生成する。以下、マッチング結果画像から顔候補を検出する工程について詳細に説明する。
【００８１】
図７は、テンプレートマッチング部において、テンプレートマッチング結果画像Ｒ２から顔候補となる画素を検出する各処理工程を示すフローチャートである。図７に示すように、先ず、テンプレートマッチング結果画像Ｒ２が入力されると、マッチング結果画像Ｒ２をテンプレート画像サイズに分割し、その分割領域の１つ、例えば０≦ｘ≦ｗｉｄ＿ｔ−１、０≦ｙ≦ｈｅｉ＿ｔ−１において、最も相関値が高い点（座標）を抽出する（ステップＳ２１）。以下、マッチング結果画像Ｒ２をテンプレート画像サイズに分割した領域を分割領域ｒｎ、分割領域ｒｎにおいて、相関値が最も大きい点（座標）をｌｏｃａｌ＿ｍａｘ（ｘ，ｙ）という。ここでは、この各分割領域内において最も相関値が高い画素を抽出するが、本適用例においては、マッチング結果画像において分割された分割領域を左から右へ一行ずつ順に処理を行う場合について説明する。
【００８２】
次に、ｌｏｃａｌ＿ｍａｘ（ｘ，ｙ）が所定の閾値（ｔｈ１）より大きいか否かを判定し（ステップＳ２２）、大きい場合は、顔候補として追加する（ステップＳ２３）。本適用例における顔検出装置は、スケールと共に入力画像に含まれると想定される顔の大きさのテンプレート画像サイズを選択する手段を有しているが、テンプレート画像サイズは異なる大きさの複数種類あり、複数種類ある各テンプレート画像サイズ毎にマッチング結果画像Ｒ２を算出して顔候補を抽出すると、同一の点が抽出される場合がある。従って、ステップＳ２３において、顔候補として同一の点がある場合、即ち、異なるテンプレート画像サイズで顔候補を抽出した際に既に抽出されている場合はこの点は追加しない。
【００８３】
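Next, for the input image region of the template image size corresponding to a point extracted as a face candidate, the occupancy rate of skin-color pixels contained in that region is obtained. In this application example, a skin-color table 100 is referenced when obtaining this rate. It is then determined whether the skin-color pixel occupancy rate is greater than a predetermined threshold (th2) (step S24). If it is, the points surrounding local_max(x, y), for example its 8 neighboring points, are added as face candidates (step S25). As in step S23, any of these 8 neighboring points that has already been extracted as a face candidate is not added again.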
【００８４】
ステップＳ２２でｌｏｃａｌ＿ｍａｘ（ｘ，ｙ）が閾値ｔｈ１未満だった場合、ステップＳ２４でｌｏｃａｌ＿ｍａｘ（ｘ，ｙ）に相当する入力画像における肌色画素占有率が閾値ｔｈ２未満であった場合、及びステップＳ２５で顔候補の追加が終了した後は、いずれもステップＳ２６に進み、次の顔候補を抽出するために次の分割領域に移り、処理を進める。
【００８５】
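First, in the matching result image R2, processing moves to the adjacent divided region shifted in the x direction by the template image size, that is, by wid_t (step S26). Next, it is determined whether the x coordinate of the shifted region (x + wid_t) is greater than the width (side in the x direction) wid_r of the matching result image (step S27). If it is, the divided region lies outside the matching result image, so processing moves to the next row: to the divided region with 0 ≤ x ≤ wid_t − 1 that is shifted in the y direction by the template image size, that is, by hei_t (step S28). It is then determined whether the y coordinate of the divided region is greater than the height (side in the y direction) hei_r of the matching result image (step S29). If it is, the maximum correlation values of all divided regions in the matching result image have been obtained, and processing ends.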
【００８６】
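If, on the other hand, step S27 or step S29 determines that the divided region is contained in the matching result image, processing returns to step S21 and the point with the highest correlation value in that divided region is extracted.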
【００８７】
このようにマッチング結果画像Ｒ２をテンプレート画像サイズに区切った分割領域における相関値の最大値を求めているため、ステップＳ２６において、隣接する分割領域に移る場合は、ｘ方向にｗｉｄ＿ｔだけずれるものとしたが、マッチング結果画像Ｒ２は、テンプレート画像サイズ以下のサイズであれば、任意の大きさに分割することができる。その際、分割する画像の大きさの幅（ｘ方向の辺）ｗｉｄ＿ｓｔｅｐ、高さ（ｙ方向）ｈｅｉ＿ｓｔｅｐとすると、ステップＳ２６及びステップＳ２８において、夫々ｘ方向にｗｉｄ＿ｓｔｅｐ、又はｙ方向にｈｅｉ＿ｓｔｅｐ移動することにより、次の分割領域に進むことができる。
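The block-wise search of steps S21 to S29 can be sketched in Python as follows, using the general block size wid_step × hei_step. The function name `face_candidates` is hypothetical, the matching result is assumed to be a 2-D list of correlation values, and the skin-color test with 8-neighbor expansion (steps S24 and S25) is left out to keep the sketch short.

```python
def face_candidates(result, wid_step, hei_step, th1):
    """Return (x, y) of each block-wise maximum that exceeds th1.
    `result` is a 2-D list of correlation values (matching result image)."""
    hei_r, wid_r = len(result), len(result[0])
    candidates = []
    for y0 in range(0, hei_r, hei_step):          # next row of blocks (S28, S29)
        for x0 in range(0, wid_r, wid_step):      # next block in the row (S26, S27)
            # S21: point with the highest correlation inside this block.
            value, x, y = max(
                (result[y][x], x, y)
                for y in range(y0, min(y0 + hei_step, hei_r))
                for x in range(x0, min(x0 + wid_step, wid_r))
            )
            # S22, S23: keep it if above th1 and not already listed.
            if value > th1 and (x, y) not in candidates:
                candidates.append((x, y))
    return candidates

demo = [
    [0.92, 0.20, 0.90, 0.10],
    [0.30, 0.95, 0.20, 0.10],
    [0.10, 0.10, 0.10, 0.10],
    [0.10, 0.60, 0.10, 0.85],
]
points = face_candidates(demo, wid_step=2, hei_step=2, th1=0.8)
```

In `demo`, four values exceed 0.8 but only three candidates are returned, because 0.92 and 0.95 fall in the same block and only the larger survives.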
【００８８】
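FIG. 8 shows the points detected as face candidates from the window image W2 in the template matching unit. In FIG. 8, the white points are those extracted as face candidates from the matching result image R2 shown in FIG. 6. For comparison, FIG. 9 shows an example in which every point of the matching result image R2 at or above the threshold is extracted as a face candidate. Compared with FIG. 9, far fewer points are extracted as face candidates by the template matching unit in this application example, which greatly reduces the amount of computation in the subsequent processing. As in the first embodiment, when the template matching unit extracts no face candidate, or when the points extracted as face candidates fall below a predetermined threshold, the template image can be rotated as appropriate, for example by rotating template image T2_1 by 180° to obtain template image T2_2 as shown in FIG. 5(c), and matching can be performed again. As in the second embodiment, the rotation angle of the template image may also be estimated from posture information, and matching performed with the template image at the estimated angle.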
【００８９】
このように、ウィンドウ画像について、平均的な顔画像を表すテンプレート画像を用いてマッチングをとって大まかに顔画像であるか否かを判断する際に、テンプレートマッチング結果画像を所定のサイズに仕切り、相関値の最大値を顔候補として抽出して明らかに顔画像でないウィンドウ画像を除去することにより、本来顔である領域を見逃すことなく、後段の顔検出処理に要する演算量及び時間を減少させることができ、かくしてリアルタイム性を格段と向上した顔検出装置及びこれを搭載したロボット装置を実現することができる。
【００９０】
また、相関値が最大となる点と共にその周囲においても顔検索範囲とすることにより、顔検出精度を向上することができる。更に、所定の閾値以上の肌色占有率又は顔の色占有率を有する場合のみ、顔検索範囲として設定することにより、顔検出精度を保ちつつ顔候補を減らして後段の演算量を減らすことができる。更にまた、テンプレート画像のサイズを適宜切り替えることにより、更に演算量を減らすことができる。
【００９１】
（５）ロボット装置の構成
次に、上述の第１及び第２の実施の形態におけるような顔検出モジュールを有するロボット装置について説明する。先ず、ロボット装置の構成について説明する。
【００９２】
図１０に示すように、本実施の形態におけるロボット装置１は、周囲環境（或いは外部刺激）や内部状態に応じて自律行動をする自律型のロボット装置であり、「犬」等の動物を模した形状のいわゆるペット型ロボットとされ、胴体部ユニット２の前後左右にそれぞれ脚部ユニット３Ａ，３Ｂ，３Ｃ，３Ｄが連結されると共に、胴体部ユニット２の前端部に頭部ユニット４が連結されて構成されている。
【００９３】
胴体部ユニット２には、図１１に示すように、ＣＰＵ（Ｃｅｎｔｒａｌ　Ｐｒｏｃｅｓｓｉｎｇ　Ｕｎｉｔ）１０、ＤＲＡＭ（Ｄｙｎａｍｉｃ　Ｒａｎｄｏｍ　Ａｃｃｅｓｓ　Ｍｅｍｏｒｙ）１１、フラッシュＲＯＭ（Ｒｅａｄ　Ｏｎｌｙ　Ｍｅｍｏｒｙ）１２、ＰＣ（Ｐｅｒｓｏｎａｌ　Ｃｏｍｐｕｔｅｒ）カードインターフェイス回路１３及び信号処理回路１４が内部バス１５を介して相互に接続されることにより形成されたコントロール部１６と、このロボット装置１の動力源としてのバッテリ１７とが収納されている。また、胴体部ユニット２には、ロボット装置１の向きや動きの加速度を検出するための角速度センサ１８及び加速度センサ１９が収納されている。また、胴体部ユニット２には、鳴き声等の音声又はメロディを出力するためのスピーカ２０が、図１０に示すように所定位置に配置されている。また、胴体部ユニット２の尻尾部５には、使用者からの操作入力を検出する検出機構としての操作スイッチ２１が備えられている。操作スイッチ２１は、使用者による操作の種類を検出できるスイッチであって、ロボット装置１は、操作スイッチ２１によって検出される操作の種類に応じて、例えば「誉められた」か「叱られた」かを認識する。
【００９４】
頭部ユニット４には、ロボット装置１の「目」に相当し、外部の状況や対象物の色、形、動き等を撮像するためのＣＣＤ（Ｃｈａｒｇｅ　Ｃｏｕｐｌｅｄ　Ｄｅｖｉｃｅ）カメラ２２と、前方に位置する対象物までの距離を測定するための距離センサ２３と、ロボット装置１の左右の「耳」に相当し、外部音を集音するためのマイクロホン２４と、例えばＬＥＤ（Ｌｉｇｈｔ　Ｅｍｉｔｔｉｎｇ　Ｄｉｏｄｅ）を備えた発光部２５等が、図１０に示すように所定位置にそれぞれ配置されている。ただし、発光部２５は、構成の説明等においては、必要に応じてＬＥＤ２５と示す。また、頭部ユニット４内部には、図１０には図示しないが、ユーザの頭部ユニット４に対する接触を間接的に検出するための検出機構として頭部スイッチ２６が備えられている。頭部スイッチ２６は、例えば、使用者の接触によって頭部が動かされた場合、その傾き方向を検出できるスイッチであって、ロボット装置１は、頭部スイッチ２６によって検出される頭部の傾き方向に応じて、「誉められた」か「叱られた」かを認識する。
【００９５】
各脚部ユニット３Ａ〜３Ｄの関節部分、各脚部ユニット３Ａ〜３Ｄと胴体部ユニット２との連結部分、頭部ユニット４と胴体部ユニット２との連結部分には、自由度数分のアクチュエータ２８_１〜２８_ｎ及びポテンショメータ２９_１〜２９_ｎがそれぞれ配設されている。アクチュエータ２８_１〜２８_ｎは、例えば、サーボモータを備えている。サーボモータの駆動により、脚部ユニット３Ａ〜３Ｄが制御されて目標の姿勢、或いは動作に遷移する。各脚部ユニット３Ａ〜３Ｄの先端の「肉球」に相当する位置には、主としてユーザからの接触を検出する検出機構としての肉球スイッチ２７Ａ〜２７Ｄが設けられ、ユーザによる接触等を検出できるようになっている。
【００９６】
ロボット装置１は、この他にも、ここでは図示しないが、該ロボット装置１の内部状態とは別の動作状態（動作モード）を表すための発光部や、充電中、起動中、起動停止等、内部電源の状態を表す状態ランプ等を、適切な箇所に適宜備えていてもよい。
【００９７】
そして、ロボット装置１において、操作スイッチ２１、頭部スイッチ２６及び肉球スイッチ２７等の各種スイッチと、角速度センサ１８、加速度センサ１９、距離センサ２３等の各種センサと、スピーカ２０、マイク２４、発光部２５、各アクチュエータ２８_１〜２８_ｎ、各ポテンショメータ２９_１〜２９_ｎは、それぞれ対応するハブ３０_１〜３０_ｎを介してコントロール部１６の信号処理回路１４と接続されている。一方、ＣＣＤカメラ２２及びバッテリ１７は、それぞれ信号処理回路１４と直接接続されている。
【００９８】
信号処理回路１４は、上述の各種スイッチから供給されるスイッチデータ、各種センサから供給されるセンサデータ、画像データ及び音声データを順次取り込み、これらをそれぞれ内部バス１５を介してＤＲＡＭ１１内の所定位置に順次格納する。また信号処理回路１４は、これとともにバッテリ１７から供給されるバッテリ残量を表すバッテリ残量データを順次取り込み、ＤＲＡＭ１１内の所定位置に格納する。
【００９９】
このようにしてＤＲＡＭ１１に格納された各スイッチデータ、各センサデータ、画像データ、音声データ及びバッテリ残量データは、ＣＰＵ１０が当該ロボット装置１の動作制御を行う際に使用される。
【０１００】
ＣＰＵ１０は、ロボット装置１の電源が投入された初期時において、フラッシュＲＯＭ１２に格納された制御プログラムを読み出して、ＤＲＡＭ１１に格納する。又は、ＣＰＵ１０は、図１０に図示しない胴体部ユニット２のＰＣカードスロットに装着された半導体メモリ装置、例えば、メモリカード３１に格納された制御プログラムをＰＣカードインターフェイス回路１３を介して読み出してＤＲＡＭ１１に格納する。
【０１０１】
ＣＰＵ１０は、上述のように信号処理回路１４よりＤＲＡＭ１１に順次格納される各センサデータ、画像データ、音声データ、及びバッテリ残量データに基づいて自己及び周囲の状況や、使用者からの指示及び働きかけの有無を判断している。
【０１０２】
さらに、ＣＰＵ１０は、この判断結果及びＤＲＡＭ１１に格納した制御プログラムに基づいて続く行動を決定すると共に、当該決定結果に基づいて必要なアクチュエータ２８_１〜２８_ｎを駆動させることにより、頭部ユニット４を上下左右に振らせたり、各脚部ユニット３Ａ〜３Ｄを駆動させて歩行させるなどの行動を行わせる。
【０１０３】
また、この際ＣＰＵ１０は、必要に応じて音声データを生成し、これを信号処理回路１４を介して音声信号としてスピーカ２０に与えることにより当該音声信号に基づく音声を外部に出力させたり、上述の発光部２５におけるＬＥＤの点灯及び消灯を指示する信号を生成し、発光部２５を点灯したり消灯したりする。
【０１０４】
このようにしてこのロボット装置１においては、自己及び周囲の状況や、使用者からの指示及び働きかけに応じて自律的に行動し得るようになされている。
【０１０５】
（６）制御プログラムのソフトウェア構成
ここで、ロボット装置１における上述の制御プログラムのソフトウェア構成は、図１２に示すようになる。この図１２において、デバイス・ドライバ・レイヤ４０は、この制御プログラムの最下位層に位置し、複数のデバイス・ドライバからなるデバイス・ドライバ・セット４１から構成されている。この場合、各デバイス・ドライバは、ＣＣＤカメラ２２（図１１）やタイマ等の通常のコンピュータで用いられるハードウェアに直接アクセスすることを許されたオブジェクトであり、対応するハードウェアからの割り込みを受けて処理を行う。
【０１０６】
また、ロボティック・サーバ・オブジェクト４２は、デバイス・ドライバ・レイヤ４０の最下位層に位置し、例えば上述の各種センサやアクチュエータ２８_１〜２８_ｎ等のハードウェアにアクセスするためのインターフェースを提供するソフトウェア群でなるバーチャル・ロボット４３と、電源の切換えなどを管理するソフトウェア群でなるパワーマネージャ４４と、他の種々のデバイス・ドライバを管理するソフトウェア群でなるデバイス・ドライバ・マネージャ４５と、ロボット装置１の機構を管理するソフトウェア群でなるデザインド・ロボット４６とから構成されている。
【０１０７】
マネージャ・オブジェクト４７は、オブジェクト・マネージャ４８及びサービス・マネージャ４９から構成されている。オブジェクト・マネージャ４８は、ロボティック・サーバ・オブジェクト４２、ミドル・ウェア・レイヤ５０、及びアプリケーション・レイヤ５１に含まれる各ソフトウェア群の起動や終了を管理するソフトウェア群であり、サービス・マネージャ４９は、メモリカード３１（図１１）に格納されたコネクションファイルに記述されている各オブジェクト間の接続情報に基づいて各オブジェクトの接続を管理するソフトウェア群である。
【０１０８】
ミドル・ウェア・レイヤ５０は、ロボティック・サーバ・オブジェクト４２の上位層に位置し、画像処理や音声処理などのこのロボット装置１の基本的な機能を提供するソフトウェア群から構成されている。また、アプリケーション・レイヤ５１は、ミドル・ウェア・レイヤ５０の上位層に位置し、当該ミドル・ウェア・レイヤ５０を構成する各ソフトウェア群によって処理された処理結果に基づいてロボット装置１の行動を決定するためのソフトウェア群から構成されている。
【０１０９】
なお、ミドル・ウェア・レイヤ５０及びアプリケーション・レイヤ５１の具体なソフトウェア構成をそれぞれ図１３に示す。
【０１１０】
ミドル・ウェア・レイヤ５０は、図１３に示すように、騒音検出用、温度検出用、明るさ検出用、音階認識用、距離検出用、姿勢検出用、接触検出用、操作入力検出用、動き検出用及び色認識用の各信号処理モジュール６０〜６９並びに入力セマンティクスコンバータモジュール７０などを有する認識系７１と、出力セマンティクスコンバータモジュール７９並びに姿勢管理用、トラッキング用、モーション再生用、歩行用、転倒復帰用、ＬＥＤ点灯用及び音再生用の各信号処理モジュール７２〜７８などを有する出力系８０とから構成されている。
【０１１１】
認識系７１の各信号処理モジュール６０〜６９は、ロボティック・サーバ・オブジェクト４２のバーチャル・ロボット４３によりＤＲＡＭ１１（図１１）から読み出される各センサデータや画像データ及び音声データのうちの対応するデータを取り込み、当該データに基づいて所定の処理を施して、処理結果を入力セマンティクスコンバータモジュール７０に与える。ここで、例えば、バーチャル・ロボット４３は、所定の通信規約によって、信号の授受或いは変換をする部分として構成されている。
【０１１２】
入力セマンティクスコンバータモジュール７０は、これら各信号処理モジュール６０〜６９から与えられる処理結果に基づいて、「うるさい」、「暑い」、「明るい」、「ドミソの音階が聞こえた」、「障害物を検出した」、「転倒を検出した」、「叱られた」、「誉められた」、「動く物体を検出した」又は「ボールを検出した」などの自己及び周囲の状況や、使用者からの指令及び働きかけを認識し、認識結果をアプリケーション・レイヤ５１（図１１）に出力する。
【０１１３】
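As shown in FIG. 14, the application layer 51 consists of five modules: a behavior model library 90, a behavior switching module 91, a learning module 92, an emotion model 93 and an instinct model 94.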
【０１１４】
行動モデルライブラリ９０には、図１５に示すように、「バッテリ残量が少なくなった場合」、「転倒復帰する」、「障害物を回避する場合」、「感情を表現する場合」、「ボールを検出した場合」などの予め選択されたいくつかの条件項目にそれぞれ対応させて、それぞれ独立した行動モデル９０_１〜９０_ｎが設けられている。
【０１１５】
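When a recognition result is given by the input semantics converter module 70, or when a fixed time has elapsed since the last recognition result was given, each of these behavior models 90_1 to 90_n decides its next behavior, referring as necessary to the corresponding emotion parameter value held in the emotion model 93 and the corresponding desire parameter value held in the instinct model 94, as described later, and outputs the decision to the behavior switching module 91.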
【０１１６】
なお、この実施の形態の場合、各行動モデル９０_１〜９０_ｎは、次の行動を決定する手法として、図１６に示すような１つのノード（状態）ＮＯＤＥ_０〜ＮＯＤＥ_ｎから他のどのノードＮＯＤＥ_０〜ＮＯＤＥ_ｎに遷移するかを各ノードＮＯＤＥ_０〜ＮＯＤＥ_ｎに間を接続するアークＡＲＣ_１〜ＡＲＣ_ｎに対してそれぞれ設定された遷移確率Ｐ_１〜Ｐ_ｎに基づいて確率的に決定する有限確率オートマトンと呼ばれるアルゴリズムを用いる。
【０１１７】
具体的に、各行動モデル９０_１〜９０_ｎは、それぞれ自己の行動モデル９０_１〜９０_ｎを形成するノードＮＯＤＥ_０〜ＮＯＤＥ_ｎにそれぞれ対応させて、これらノードＮＯＤＥ_０〜ＮＯＤＥ_ｎごとに図１７に示すような状態遷移表１００を有している。
【０１１８】
この状態遷移表１００では、そのノードＮＯＤＥ_０〜ＮＯＤＥ_ｎにおいて遷移条件とする入力イベント（認識結果）が「入力イベント名」の列に優先順に列記され、その遷移条件についてのさらなる条件が「データ名」及び「データ範囲」の列における対応する行に記述されている。
【０１１９】
したがって、図１７の状態遷移表１００で表されるノードＮＯＤＥ_１００では、「ボールを検出（ＢＡＬＬ）」という認識結果が与えられた場合に、当該認識結果と共に与えられるそのボールの「大きさ（ＳＩＺＥ）」が「０から１０００」の範囲であることや、「障害物を検出（ＯＢＳＴＡＣＬＥ）」という認識結果が与えられた場合に、当該認識結果と共に与えられるその障害物までの「距離（ＤＩＳＴＡＮＣＥ）」が「０から１００」の範囲であることが他のノードに遷移するための条件となっている。
【０１２０】
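At this node NODE_100, even when no recognition result is input, a transition to another node is possible when, among the emotion and desire parameter values held in the emotion model 93 and the instinct model 94 that the behavior models 90_1 to 90_n refer to periodically, the parameter value of "joy (JOY)", "surprise (SURPRISE)" or "sadness (SADNESS)" held in the emotion model 93 is in the range "50 to 100".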
【０１２１】
また、状態遷移表１００では、「他のノードヘの遷移確率」の欄における「遷移先ノード」の行にそのノードＮＯＤＥ_０〜　ＮＯＤＥ_ｎから遷移できるノード名が列記されていると共に、「入力イベント名」、「データ値」及び「データの範囲」の列に記述された全ての条件が揃ったときに遷移できる他の各ノードＮＯＤＥ_０〜ＮＯＤＥ_ｎへの遷移確率が「他のノードヘの遷移確率」の欄内の対応する箇所にそれぞれ記述され、そのノードＮＯＤＥ_０〜ＮＯＤＥ_ｎに遷移する際に出力すべき行動が「他のノードヘの遷移確率」の欄における「出力行動」の行に記述されている。なお、「他のノードヘの遷移確率」の欄における各行の確率の和は１００［％］となっている。
【０１２２】
したがって、図１７の状態遷移表１００で表されるノードＮＯＤＥ_１００では、例えば「ボールを検出（ＢＡＬＬ）」し、そのボールの「ＳＩＺＥ（大きさ）」が「０から１０００」の範囲であるという認識結果が与えられた場合には、「３０［％］」の確率で「ノードＮＯＤＥ_１２０（ｎｏｄｅ　１２０）」に遷移でき、そのとき「ＡＣＴＩＯＮ１」の行動が出力されることとなる。
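Choosing a transition from one row of such a state transition table amounts to a weighted random draw. The Python sketch below illustrates this with the standard `random.choices` function. Only the 30 % transition to NODE_120 with ACTION1 comes from the text; the second entry of `row_ball`, the function name `next_transition` and the tuple layout are hypothetical.

```python
import random

def next_transition(row, rng=random):
    """Pick (destination, action) from one state-transition row.
    `row` holds (destination, action, percent) tuples summing to 100."""
    targets = [(dest, action) for dest, action, _ in row]
    weights = [percent for _, _, percent in row]
    if sum(weights) != 100:
        raise ValueError("transition probabilities must sum to 100 %")
    # Weighted random draw: the essence of the finite probabilistic automaton.
    return rng.choices(targets, weights=weights, k=1)[0]

# BALL detected with SIZE in 0..1000 at NODE_100.
row_ball = [
    ("NODE_120", "ACTION1", 30),      # from the text
    ("NODE_100", "ACTION_STAY", 70),  # hypothetical remainder of the row
]
dest, action = next_transition(row_ball, random.Random(0))
```

Passing a seeded `random.Random` instance makes a run repeatable, which helps when checking a table by hand.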
【０１２３】
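Each behavior model 90_1 to 90_n is built by linking many nodes NODE_0 to NODE_n, each described by such a state transition table 100. When, for example, a recognition result is given by the input semantics converter module 70, the model uses the state transition table of the corresponding node NODE_0 to NODE_n to decide the next behavior probabilistically, and outputs the decision to the behavior switching module 91.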
【０１２４】
図１４に示す行動切換モジュール９１は、行動モデルライブラリ９０の各行動モデル９０_１〜９０_ｎからそれぞれ出力される行動のうち、予め定められた優先順位の高い行動モデル９０_１〜９０_ｎから出力された行動を選択し、当該行動を実行すべき旨のコマンド（以下、これを行動コマンドという。）をミドル・ウェア・レイヤ５０の出力セマンティクスコンバータモジュール７９に送出する。なお、この実施の形態においては、図１５において下側に表記された行動モデル９０_１〜９０_ｎほど優先順位が高く設定されている。
【０１２５】
また、行動切換モジュール９１は、行動完了後に出力セマンティクスコンバータモジュール７９から与えられる行動完了情報に基づいて、その行動が完了したことを学習モジュール９２、感情モデル９３及び本能モデル９４に通知する。
【０１２６】
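The learning module 92, for its part, receives from the input semantics converter module 70 those recognition results that represent teaching given by the user, such as "scolded" or "praised". On the basis of this recognition result and the notification from the behavior switching module 91, the learning module 92 changes the corresponding transition probabilities of the corresponding behavior models 90_1 to 90_n in the behavior model library 90, lowering the probability that a behavior occurs when the robot was "scolded" and raising it when the robot was "praised".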
【０１２７】
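The emotion model 93 holds, for each of six emotions in total, "joy", "sadness", "anger", "surprise", "disgust" and "fear", a parameter representing the strength of that emotion. It periodically updates these parameter values on the basis of specific recognition results such as "scolded" and "praised" given by the input semantics converter module 70, the elapsed time, notifications from the behavior switching module 91, and so on.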
【０１２８】
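Specifically, let ΔE[t] be the variation of an emotion at that time, calculated by a predetermined formula from the recognition result given by the input semantics converter module 70, the behavior of the robot apparatus 1 at that time, the time elapsed since the last update and so on; let E[t] be the current parameter value of that emotion; and let k_e be a coefficient representing the sensitivity of that emotion. The emotion model 93 calculates the parameter value E[t+1] of that emotion in the next cycle by equation (1) and updates the parameter by replacing the current value E[t] with it. The emotion model 93 updates the parameter values of all emotions in the same way.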
【０１２９】
【Equation 1】
$$E[t+1] = E[t] + k_e \times \Delta E[t] \qquad (1)$$
【０１３０】
なお、各認識結果や出力セマンティクスコンバータモジュール７９からの通知が各情動のパラメータ値の変動量△Ｅ［ｔ］にどの程度の影響を与えるかは予め決められており、例えば「叩かれた」といった認識結果は「怒り」の情動のパラメータ値の変動量△Ｅ［ｔ］に大きな影響を与え、「撫でられた」といった認識結果は「喜び」の情動のパラメータ値の変動量△Ｅ［ｔ］に大きな影響を与えるようになっている。
【０１３１】
ここで、出力セマンティクスコンバータモジュール７９からの通知とは、いわゆる行動のフィードバック情報（行動完了情報）であり、行動の出現結果の情報であり、感情モデル９３は、このような情報によっても感情を変化させる。これは、例えば、「吠える」といった行動により怒りの感情レベルが下がるといったようなことである。なお、出力セマンティクスコンバータモジュール７９からの通知は、上述した学習モジュール９２にも入力されており、学習モジュール９２は、その通知に基づいて行動モデル９０_１〜９０_ｎの対応する遷移確率を変更する。
【０１３２】
なお、行動結果のフィードバックは、行動切換モジュール９１の出力（感情が付加された行動）によりなされるものであってもよい。
【０１３３】
一方、本能モデル９４は、「運動欲（ｅｘｅｒｃｉｓｅ）」、「愛情欲（ａｆｆｅｃｔｉｏｎ）」、「食欲（ａｐｐｅｔｉｔｅ）」及び「好奇心（ｃｕｒｉｏｓｉｔｙ）」の互いに独立した４つの欲求について、これら欲求ごとにその欲求の強さを表すパラメータを保持している。そして、本能モデル９４は、これらの欲求のパラメータ値を、それぞれ入力セマンティクスコンバータモジュール７１から与えられる認識結果や、経過時間及び行動切換モジュール９１からの通知などに基づいて周期的に更新する。
【０１３４】
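Specifically, for "exercise", "affection" and "curiosity", let ΔI[k] be the variation of a desire at that time, calculated by a predetermined formula from the recognition result, the elapsed time, the notification from the output semantics converter module 79 and so on; let I[k] be the current parameter value of that desire; and let k_i be a coefficient representing the sensitivity of that desire. At a predetermined cycle the instinct model 94 calculates the parameter value I[k+1] of that desire in the next cycle using equation (2) and updates the parameter by replacing the current value I[k] with the result. The instinct model 94 updates the parameter value of each desire other than "appetite" in the same way.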
【０１３５】
【Equation 2】
$$I[k+1] = I[k] + k_i \times \Delta I[k] \qquad (2)$$
【０１３６】
なお、認識結果及び出力セマンティクスコンバータモジュール７９からの通知などが各欲求のパラメータ値の変動量△Ｉ［ｋ］にどの程度の影響を与えるかは予め決められており、例えば出力セマンティクスコンバータモジュール７９からの通知は、「疲れ」のパラメータ値の変動量△Ｉ［ｋ］に大きな影響を与えるようになっている。
【０１３７】
なお、本実施の形態においては、各情動及び各欲求（本能）のパラメータ値がそれぞれ０から１００までの範囲で変動するように規制されており、また係数ｋ_ｅ、ｋ_ｉの値も各情動及び各欲求ごとに個別に設定されている。
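The periodic update of an emotion or desire parameter, together with the 0 to 100 limit stated above, might look like the following Python sketch. It assumes an additive update of the form E[t+1] = E[t] + k_e × ΔE[t] (and likewise for I[k]), which is what the surrounding definitions suggest. The function name `update_parameter` and the sample values are hypothetical.

```python
def update_parameter(current, delta, k, lo=0.0, hi=100.0):
    """Return the next parameter value: current + k * delta (assumed additive
    form), limited to the range lo..hi as stated in the text."""
    return max(lo, min(hi, current + k * delta))

joy = 50.0
joy = update_parameter(joy, delta=20.0, k=0.5)   # sensitivity k_e = 0.5 (sample value)
# joy == 60.0
```

Because k is set per emotion and per desire, the same variation can move one parameter strongly and another only slightly.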
【０１３８】
一方、ミドル・ウェア・レイヤ５０の出力セマンティクスコンバータモジュール７９は、図１３に示すように、上述のようにしてアプリケーション・レイヤ５１の行動切換モジュール９１から与えられる「前進」、「喜ぶ」、「鳴く」又は「トラッキング（ボールを追いかける）」といった抽象的な行動コマンドを出力系８０の対応する信号処理モジュール７２〜７８に与える。
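[0139]
When given a behavior command, these signal processing modules 72 to 78 generate, based on the command, the servo command values to be given to the corresponding actuators 28_1 to 28_n (FIG. 11) to perform the behavior, the audio data of the sound to be output from the speaker 20 (FIG. 11), and/or the drive data to be given to the LEDs of the light emitting unit 25 (FIG. 11). They send these data to the corresponding actuators 28_1 to 28_n, the speaker 20 or the light emitting unit 25, passing in order through the virtual robot 43 of the robotic server object 42 and the signal processing circuit 14 (FIG. 11).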
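[0140]
In this way, the robot device 1 can behave autonomously, based on the control program, according to its own (internal) state, the surrounding (external) situation, and instructions and actions from the user.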
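[0141]
In such a robot device 1, the face detection processing described above can be performed by the face detection module 33 of the middleware layer 50. FIG. 18 is a block diagram showing the components, extracted from the robot device shown in FIGS. 11 to 15, that are needed to control its behavior by face detection.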
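[0142]
As described above, image data captured by the CCD camera 22 is stored at a predetermined location in the DRAM 11 and supplied to the virtual robot 43 in the robotic server object 42. The virtual robot 43 reads the image data from the DRAM 11 and supplies it to the face detection module 33 in the middleware layer 50. The face detection module 33 performs the face detection processing described in the first and second embodiments above, and the result is supplied to the behavior model library 90 in the application layer 51, so that the result is reflected in the behavior of the robot device.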
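[0143]
That is, the behavior model library 90 determines the subsequent behavior, referring as necessary to the parameter values of the emotions and desires, and gives the decision to the behavior switching module 91. The behavior switching module 91 then sends a behavior command based on the decision to the walking module 75 in the output system 80 of the middleware layer 50.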
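[0144]
When given a behavior command, the walking module 75 generates, based on the command, the servo command values to be given to the corresponding actuators 28_1 to 28_n to perform the behavior, and sends this data to the corresponding actuators 28_1 to 28_n, passing in order through the virtual robot 43 of the robotic server object 42 and the signal processing circuit 14 (FIG. 11). As a result, the behavior of the robot device 1 is controlled, and it exhibits a behavior such as approaching the target object.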
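[0145]
For example, based on the size, orientation and so on of the face image detected by the face detection processing of the face detection module 33, the robot device 1 can look toward the target object having the detected face or move so as to approach it. The robot device 1 can then be controlled to stop moving when, for example, the detected face image is a frontal face and the distance data from the distance sensor 23 indicates that it has come within a predetermined range of the target object, when it has advanced a predetermined distance since starting to move, or when contact is detected by the contact detection module 66.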
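The three stop conditions listed above can be written as a single predicate. The sketch below is only an illustration under stated assumptions: the function name `should_stop`, its parameter names and the units are hypothetical and do not come from the robot device's actual software.

```python
def should_stop(is_frontal_face, distance, stop_range,
                travelled, max_travel, contact_detected):
    """Return True when any of the three stop conditions holds.

    1. A frontal face is detected and the distance sensor reports the
       target within the predetermined range.
    2. The robot has advanced the predetermined distance since it
       started moving.
    3. Contact has been detected.
    """
    near_target = is_frontal_face and distance <= stop_range
    moved_enough = travelled >= max_travel
    return near_target or moved_enough or contact_detected


# Hypothetical readings: frontal face seen at 0.3 (same unit as stop_range).
print(should_stop(True, 0.3, 0.5, 0.2, 2.0, False))
```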
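[0146]
In addition, for example, moving object detection means may be provided that detects a moving object based on the image data acquired by the CCD camera 22 of FIG. 11, and sound source direction estimation means may be provided that estimates the direction of a sound source from audio data acquired by audio detection means such as the microphone 24 of FIG. 11. The moving object direction, based on the position of the moving object detected by the moving object detection means, or the sound source direction estimated by the sound source direction estimation means, may then be used when controlling movement based on the face detection result described above.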
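[0147]
In the embodiment described above, the present invention is applied to the four-legged walking robot device 1 configured as shown in FIG. 10, but the present invention is not limited to this and can be widely applied to various other robot devices and to various devices other than robot devices. For example, the robot device may walk on two legs, and the means of locomotion is not limited to legged locomotion.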
[0148]
Although the embodiment above has been described as a software configuration, the present invention is not limited to this, and each function can also be implemented in hardware.
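[0149]
[Effects of the invention]
As described in detail above, the face detection device according to the present invention is a face detection device that detects the face of a target object from an input image, comprising correlation calculating means for obtaining the correlation between the input image and a template image of a predetermined size representing an average face image, and determining means for determining, based on the correlation, whether the input image includes a face image. When the determining means determines that the input image does not include a face image, the correlation calculating means obtains the correlation with the input image using a template image rotated by a predetermined angle about an axis perpendicular to the screen; when the determining means determines that the input image includes a face image, the correlation calculating means obtains the correlation with the next input image using the template image used at the time of that determination. A face image can therefore be detected from the input image even when the face image it contains is a non-frontal face that does not face the front.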
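[0150]
The program according to the present invention causes a computer to execute the face detection processing described above; with such a program, the face detection processing can be realized in software.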
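[0151]
Furthermore, the robot device according to the present invention is a robot device that operates based on supplied input information, comprising an imaging unit that captures images and a face detection unit that detects the face of a target object from the input image supplied from the imaging unit. The face detection unit has correlation calculating means for obtaining the correlation between the input image and a template image of a predetermined size representing an average face image, and determining means for determining, based on the correlation, whether the input image includes a face image. When the determining means determines that the input image does not include a face image, the correlation calculating means obtains the correlation with the input image using a template image rotated by a predetermined angle about an axis perpendicular to the screen; when the determining means determines that the input image includes a face image, the correlation calculating means obtains the correlation with the next input image using the template image used at the time of that determination. Therefore, even when a target object such as a person whose face is to be detected is not facing the robot device, the robot device can detect the face of the target object and perform a behavior that depends on the face detection result, such as approaching the target object.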
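[Brief description of the drawings]
[FIG. 1] A block diagram schematically showing the functions of the face detection module according to the first embodiment of the present invention.
[FIG. 2] A flowchart showing the face detection method according to the same embodiment.
[FIG. 3] A flowchart showing the face detection method according to the second embodiment of the present invention.
[FIG. 4] A block diagram schematically showing the functions of a face detection device according to the first application example of the present invention.
[FIG. 5] (a) and (b) are schematic diagrams showing an input image (window image) and a template image, respectively.
[FIG. 6] A diagram showing a matching result image, which is the set of correlation values obtained from the input image (window image) and the template image.
[FIG. 7] A flowchart showing the processing steps for detecting face candidate pixels from the template matching result image in the first application example of the present invention.
[FIG. 8] A diagram showing the result of extracting face candidates from the matching result image in the template matching unit of the face detection device in the first application example of the present invention.
[FIG. 9] A diagram showing the result of extracting, as face candidates, the points in the matching result image that are equal to or greater than a predetermined threshold.
[FIG. 10] A perspective view showing the external configuration of the robot device according to the embodiment of the present invention.
[FIG. 11] A block diagram showing the circuit configuration of the robot device.
[FIG. 12] A block diagram showing the software configuration of the robot device.
[FIG. 13] A block diagram showing the configuration of the middleware layer in the software configuration of the robot device.
[FIG. 14] A block diagram showing the configuration of the application layer in the software configuration of the robot device.
[FIG. 15] A block diagram showing the configuration of the behavior model library of the application layer.
[FIG. 16] A diagram used to explain the finite probability automaton that serves as information for determining the behavior of the robot device.
[FIG. 17] A diagram showing the state transition table prepared for each node of the finite probability automaton.
[FIG. 18] A block diagram showing the components, extracted from the robot device shown in FIGS. 11 to 15, that are needed to control its behavior by face detection.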
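[Explanation of reference numerals]
1 robot device, 10 CPU, 11 DRAM, 14 signal processing circuit, 22 CCD camera, 28_1 to 28_n actuators, 33 face detection module, 42 robotic server object, 43 virtual robot, 50 middleware layer, 51 application layer, 68 motion detection signal processing module, 70 input semantics converter module, 71 recognition system, 73 tracking signal processing module, 75 walking module, 79 output semantics converter module, 80 output system, 90 behavior model library, 91 behavior switching module
[0001]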
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a face detection device and a face detection method for detecting the face of a target object from an input image, to a robot device equipped with the face detection device to improve, for example, its entertainment value, and to a program that causes a computer to execute the face detection operation.
[0002]
[Prior art]
A mechanical device that uses electric or magnetic action to perform motions resembling those of a human (living being) is called a "robot". Robots began to spread in Japan at the end of the 1960s, but most of them were industrial robots, such as manipulators and transfer robots, intended to automate and unman production work in factories.
[0003]
In recent years, practical robots have been developed that support daily life as partners to humans, that is, that support human activities in various situations in the living environment and elsewhere in everyday life. Unlike industrial robots, such practical robots have the ability to learn by themselves how to adapt to humans with differing personalities and to various environments in the many aspects of the human living environment. For example, "pet-type" robots that imitate the body mechanisms and movements of four-legged animals such as dogs or cats, and "humanoid" robots designed on the model of the body mechanisms and movements of humans and other beings that walk upright on two legs, are already being put to practical use.
[0004]
Compared with industrial robots, these robot devices can perform various operations with an emphasis on entertainment, and are therefore sometimes called entertainment robots. Some such robot devices are equipped with various external sensors, such as a CCD (Charge Coupled Device) camera and a microphone, recognize the external situation based on the outputs of these sensors, and act autonomously according to external information and their internal state.
[0005]
Incidentally, if such an entertainment robot device could detect the face of a person it is conversing with, or the face of a person who enters its field of view while it is moving, and carry on the conversation or action while looking at that person's face, this would be the most natural behavior, just as humans ordinarily do, and the entertainment value of the robot device could be further improved.
[0006]
For example, Patent Literature 1 below discloses a face recognition device for recognizing a face pattern without being affected by changes in the face pattern due to face tilt and facial expression. In the technology described in Patent Literature 1, the face recognition device comprises: a plurality of inverse transform units, operable independently of one another, that receive the same face image, each apply a transformation that removes a different deformation from the face image, and output a normalized face pattern; an identification unit that compares the face patterns output from the inverse transform units with reference patterns of a plurality of persons prepared in advance and calculates similarities; and a combining unit that specifies the person corresponding to the face image based on the identification results. The deformations removed by the inverse transform units consist of combinations of deformation elements such as displacement of the face position, differences in the size of the face as captured by the camera, differences in the up/down and left/right orientation of the face, and differences in face tilt. A square region cut out as an undeformed face region is converted into a region that takes into account rotation about an axis parallel to the screen or rotation about an axis perpendicular to the screen. Thus, even when the face is tilted back and forth or faces a different direction, it can be recognized through the inverse transformation performed by the inverse transform units.
[0007]
[Patent Document 1]
JP-A-12-090191
[0008]
[Problems to be solved by the invention]
However, the technology described in Patent Document 1 requires as many inverse transform units as there are deformation methods corresponding to the deformation elements assumed in advance, which increases the size of the device, so it is not suitable for mounting on a device with limited resources. In addition, the degree of deformation, such as the up/down and left/right orientation of the face or the amount of tilt, must be set in advance, and the recognition rate drops for a face that deviates greatly from the preset degrees of deformation. For example, in Patent Literature 1, on the assumption that small deformations are more likely, the outputs of the inverse transform units are weighted so that the similarity calculated by the identification unit becomes larger as the preset degree of deformation is smaller, and the person is then specified. When the device is mounted on a robot device, however, and the robot device falls over, the captured face image deviates greatly from the standard face pattern, so the recognition rate decreases. Conversely, if the similarity were weighted higher as the degree of deformation is larger, the deformation could make the original face pattern resemble another person's face, and the recognition rate would again decrease.
[0009]
The present invention has been proposed in view of these conventional circumstances, and its object is to provide a face detection device, a face detection method, and a program capable of detecting with high efficiency even a non-frontal face that is tilted or turned in a different direction, as well as a robot device equipped with them.
[0010]
[Means for Solving the Problems]
To achieve the above object, the face detection device according to the present invention is a face detection device that detects the face of a target object from an input image, comprising: correlation calculating means for obtaining the correlation between the input image and a template image of a predetermined size representing an average face image; and determining means for determining, based on the correlation, whether the input image includes a face image. When the determining means determines that the input image does not include a face image, the correlation calculating means obtains the correlation with the input image using a template image obtained by rotating the template image by a predetermined angle about an axis perpendicular to the screen. When the determining means determines that the input image includes a face image, the correlation calculating means obtains the correlation with the next input image using the template image used at the time of that determination.
[0011]
In the present invention, when the determining means determines that the input image does not include a face image, the correlation with the input image is calculated again using the template image rotated by a predetermined angle, so a face image can be detected with a frontal-face template image even when the face image in the input image is not a frontal face. When it is determined that the input image includes a face image, the correlation with the next input image is obtained using the template image at that rotation angle (including 0°), which speeds up the matching process.
[0012]
Further, it can be mounted on a robot device that operates based on the supplied input information, and enables face detection even in a situation where a front face cannot be photographed, for example, when the robot device falls down.
[0013]
Further, the correlation calculation means may determine the rotation angle based on a posture detection result from the posture detection means for detecting the posture of the robot device provided in the robot device. By estimating the rotation angle of the template image based on the information, the processing speed is further increased as compared with the case where the template image is sequentially rotated to detect a face.
[0014]
The face detection method according to the present invention is a face detection method for a face detection device that detects the face of a target object from an input image, comprising: a correlation calculating step of obtaining the correlation between the input image and a template image of a predetermined size representing an average face image; and a determining step of determining, based on the correlation, whether the input image includes a face image. In the correlation calculating step, when it is determined in the determining step that the input image does not include a face image, the correlation with the input image is obtained using a template image obtained by rotating the template image by a predetermined angle about an axis perpendicular to the screen; when it is determined in the determining step that the input image includes a face image, the correlation with the next input image is obtained using the template image used at the time of that determination.
[0015]
A program according to the present invention causes a computer to execute the above-described face detection processing.
[0016]
The robot device according to the present invention is a robot device that operates based on supplied input information, comprising an imaging unit that captures images and a face detection unit that detects the face of a target object from the input image supplied from the imaging unit. The face detection unit has correlation calculating means for obtaining the correlation between the input image and a template image of a predetermined size representing an average face image, and determining means for determining, based on the correlation, whether the input image includes a face image. When the determining means determines that the input image does not include a face image, the correlation calculating means obtains the correlation with the input image using a template image obtained by rotating the template image by a predetermined angle about an axis perpendicular to the screen; when the determining means determines that the input image includes a face image, the correlation calculating means obtains the correlation with the next input image using the template image used at the time of that determination.
[0017]
BEST MODE FOR CARRYING OUT THE INVENTION
(1) First embodiment
The face detection device according to the present embodiment can be mounted on, for example, a robot device described later. The following describes a face detection device that is mounted on a robot device and is suited to recognizing the faces of surrounding people; the configuration of the robot device is described in detail later. The robot device has a CCD camera, a memory that stores the frame images acquired by the CCD camera, and a face detection module with a face detection task function that detects human face images in the frame images stored in the memory. In face detection using a template image, the average face used for template matching is usually a typical face photographed from the front, so it is difficult to detect a face other than one photographed from the front (hereinafter, a non-frontal face), for example a face captured upside down. For example, if the CCD camera that acquires images is mounted on the face of the robot device, then when a user or other person looks down at the robot device after it has fallen over onto its back, the captured face image is a non-frontal face that is inverted relative to a normal frontal face, that is, a frontal face rotated by about 180° about an axis perpendicular to the screen. To enable face detection even when such a non-frontal face is captured, the present embodiment rotates the template image by a predetermined angle when no face can be detected with the frontal-face template image, stores the rotation angle when a face is detected, and matches the next input image using the template image rotated by the stored angle.
[0018]
FIG. 1 is a block diagram schematically showing the functions of the face detection module according to the first embodiment of the present invention. As shown in FIG. 1, the face detection module 300 takes as its input image a frame image captured by imaging means such as a CCD camera, and comprises a template matching unit (correlation calculating means) 301 that obtains the correlation between the input image and a template image of a predetermined size representing an average face image, a determination unit 302 that determines, based on the correlation, whether the input image includes a face image, and a face extraction unit 303 that extracts the face image when it is determined that one is included.
[0019]
The input image supplied to the template matching unit 301 may be an image obtained by converting a frame image into, for example, a plurality of scales and then cutting out the image to a predetermined size in order to match the size of the face in the prepared template image. The template matching unit 301 performs matching on the input image for each scale. As the template image, for example, an average face image composed of an average of about 100 persons can be used. In the present embodiment, the template image is a front face, and the rotation angle R at this time is set to 0 °.
[0020]
The determination unit 302 determines that the input image includes a face image when the correlation value obtained in the template matching performed by the template matching unit 301 is equal to or greater than a predetermined threshold, and the face extraction unit 303 extracts the corresponding face region.
[0021]
If the determination unit 302 finds that none of the matching results reaches the predetermined threshold, it determines that the face represented by the template image is not included in the input image and returns this result to the template matching unit 301. When it is determined that the input image does not include a face image, the template matching unit 301 rotates the template image by a predetermined angle about an axis perpendicular to the screen and performs template matching on the input image again using the rotated template image. In the present embodiment, when no face image is detected, the template image is rotated in 90° steps in a predetermined direction about the axis perpendicular to the screen, that is, to rotation angles R = 90°, 180° and 270°, and the rotated template images obtained by rotating the template in this order are used until a face is detected. The rotation may be clockwise or counterclockwise, and the step angle is not limited to 90°; it may be set as appropriate, for example to 180° or 45°.
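The sequence of rotated templates described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the template is assumed to be a list of pixel rows, the helper names `rotate90` and `rotated_templates` are hypothetical, and only steps that are multiples of 90° are handled (a 45° step would require resampling, which is outside this sketch).

```python
def rotate90(image):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotated_templates(template, step=90):
    """Yield (angle, template) for angle = 0, step, 2*step, ... below 360.

    Assumes step is a positive multiple of 90 degrees.
    """
    if step <= 0 or step % 90 != 0:
        raise ValueError("this sketch only supports multiples of 90 degrees")
    current = template
    for angle in range(0, 360, step):
        yield angle, current
        for _ in range(step // 90):
            current = rotate90(current)


# Tiny 2 x 2 stand-in for the average-face template.
for angle, rotated in rotated_templates([[1, 2], [3, 4]]):
    print(angle, rotated)
```

Generating the rotations lazily matches the "rotate only when needed" behavior; the same generator could also be consumed once up front to build a table of pre-rotated templates.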
[0022]
The determination unit 302 determines whether a face image is included based on the result of matching the input image against the rotated template image with R = 90°. As described above, a face image is determined to be included when the correlation value is equal to or greater than the predetermined threshold. In that case the result is input to the template matching unit 301, which stores the rotation angle at which the face image was determined to be included (hereinafter, R = R_found). The template matching unit 301 thus stores the rotation angle at which a face image was detected and matches the next input image using the template image at that rotation angle. If no face image is detected at R = 90°, the template image is rotated a further 90° and matching is performed again using the template image with rotation angle R = 180°.
[0023]
This is because images are input to the template matching unit 301 at intervals of, for example, 40 msec, and once a face image has been detected, the orientation of the face does not change abruptly in the short time before the next input image arrives. If, for example, a face image rotated by 180° has been detected, detecting the face in the next input image with the template image at that same rotation angle is far more efficient than starting again from the frontal face and rotating the template sequentially.
[0024]
FIG. 2 is a flowchart illustrating the face detection method according to the present embodiment. As shown in FIG. 2, when an input image is supplied to the template matching unit 301 (step S1), it is determined whether a rotation angle R_found, at which a face was detected in the previous input image, exists (step S2). If no face was detected in the previous input image, no rotation angle R_found exists, so the correlation value is obtained with the rotation angle R of the template image set to 0°, that is, using the frontal-face template image (step S3). The determination unit 302 then determines whether the correlation value is equal to or greater than a predetermined threshold (step S4). If the correlation value in step S4 is below the threshold, it is determined that no face has been detected, and it is then determined whether the rotation of the template image is complete, that is, when the template image is rotated in 90° steps in the predetermined direction, whether it has been rotated 270° from its initial state (step S5). If not, the template image is rotated from its current state by the predetermined angle, 90° in the present embodiment (step S6). This series of steps is repeated until a face is detected in step S4 or the rotation reaches 270° in step S5, that is, until template matching has been performed for all rotation angles.
[0025]
On the other hand, if in step S2 a face image was detected in the previous input image and the rotation angle R_found of the template image at that time has been stored, matching is performed using the template image with rotation angle R_found (step S7). The process then proceeds to the face detection determination in step S4 described above.
[0026]
If the correlation value is found in step S4 to be equal to or greater than the predetermined threshold and it is determined that a face has been detected, the rotation angle at that time is stored as R_found (step S8), and the process returns to step S1 to process the next input image.
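The flow of steps S1 to S8 can be summarized in a short Python sketch. It is an illustration under assumptions rather than the patent's implementation: the class name `RotatingFaceDetector` and the `correlate(image, angle)` callable (which stands in for matching the input image against the template rotated by `angle`) are hypothetical, and clearing the stored angle when no rotation angle matches is a choice made here, since the text does not say what happens to R_found in that case.

```python
class RotatingFaceDetector:
    """Template matching that remembers the last successful rotation angle."""

    def __init__(self, correlate, threshold, step=90):
        self.correlate = correlate  # hypothetical: (image, angle) -> correlation value
        self.threshold = threshold
        self.step = step
        self.r_found = None         # rotation angle of the last detection, if any

    def process(self, image):
        """Return the rotation angle at which a face was found, or None."""
        # S2: start from R_found if stored (S7), else from the frontal face R = 0 (S3).
        start = 0 if self.r_found is None else self.r_found
        # S5: give up once every rotation angle has been tried.
        for k in range(360 // self.step):
            angle = (start + k * self.step) % 360          # S6: rotate by one step
            if self.correlate(image, angle) >= self.threshold:  # S4
                self.r_found = angle                       # S8: remember the angle
                return angle
        self.r_found = None  # assumption: forget the angle when nothing matched
        return None


# Toy stand-in: an "image" is just the angle of the face it contains.
detector = RotatingFaceDetector(lambda img, a: 1.0 if a == img else 0.0, 0.5)
print(detector.process(180), detector.process(180))
```

In this toy run the first frame needs three matches (0°, 90°, 180°) while the second frame needs only one, which is the speed-up the stored angle is meant to provide.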
[0027]
In the present embodiment configured as described above, when a face image is detected in the previous input image, the rotation angle of the template image at that time is stored and the next input image is matched using the template image at that rotation angle, which speeds up the matching process. In addition, matching starts with the frontal-face template image, that is, the template with a rotation angle of 0°, and if no face can be detected, the template image is rotated by a predetermined angle, for example 90°, and face detection is repeated with the rotated template serving as a non-frontal face image. A non-frontal face can thus be detected using only a frontal-face template image, and face detection can be performed with very high efficiency.
[0028]
To reduce the amount of computation compared with searching all orientations when trying to detect a non-frontal face, matching may also be restricted to template images with predetermined rotation angles, for example only the template image with rotation angle R = 180°, instead of using template images rotated sequentially by a predetermined angle as described above.
[0029]
When the rotation angle R_found has been stored from the previous matching and no face image is detected in the next input image, the template is rotated a further 90° in the predetermined direction from R_found; alternatively, processing may restart from the rotation angle R = 0°. Also, although the template is rotated in 90° steps in a predetermined direction in the present embodiment, the order of rotation angles may be chosen as appropriate, for example R = 180° after R = 0°, or R = 270° after R = 90°.
[0030]
Further, when the rotation angle R_found has been stored and input images arrive at intervals of, for example, 40 msec as described above, the face is highly likely to be detected at the rotation angle R_found in the next input image as well. If the face nevertheless cannot be detected in the next input image with the template image at rotation angle R_found, it is likely to be found near R_found, so face detection may be performed using template images with rotation angles R_found ± α.
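The R_found ± α search order can be sketched as follows. The function name `candidate_angles` is hypothetical, and trying +α before −α is an arbitrary choice made for this illustration; the text does not specify an order or a value for α.

```python
def candidate_angles(r_found, alpha):
    """Angles to try, in order: R_found, R_found + alpha, R_found - alpha.

    All angles are in degrees and wrapped into the range [0, 360).
    """
    return [r_found % 360, (r_found + alpha) % 360, (r_found - alpha) % 360]


print(candidate_angles(180, 15))
print(candidate_angles(0, 30))
```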
[0031]
Furthermore, although the template image is described above as being rotated when no face is detected by template matching, rotated template images with, for example, R = 90°, 180° and 270° may instead be prepared in advance together with the template image with R = 0°.
[0032]
(2) Second embodiment
Next, a second embodiment of the present invention will be described. The second embodiment differs from the first in that posture information is supplied at the time of template matching, and the rotation angle of the face image to be obtained is predicted from it in order to select the rotation angle of the template image.
[0033]
That is, the robot device is provided with a posture sensor or the like that detects its posture, and posture information from the posture sensor is supplied to the template matching unit. As described above, when the robot device has fallen over and a nearby person looks into it, for example, the face image in the image obtained by the robot device, that is, in the input image supplied to the face detection module, is expected to be rotated by approximately 180° from a normal frontal face about an axis perpendicular to the screen. Supplying such posture information to the template matching unit and selecting the rotation angle of the template image accordingly therefore speeds up processing.
[0034]
FIG. 3 is a flowchart illustrating the face detection method according to the second embodiment of the present invention. As shown in FIG. 3, when an input image is supplied (step S11), it is determined whether posture information has been supplied (step S12). If posture information has been supplied, the rotation angle R_sensor considered most likely is selected based on the posture information, and template matching is performed using the template image with rotation angle R_sensor (step S20). It is then determined whether the resulting correlation value is equal to or greater than a predetermined threshold (step S15).
[0035]
On the other hand, if posture information has not been supplied in step S12, face detection is performed in the same manner as in the first embodiment described above. That is, it is determined whether a rotation angle R_found, at which a face was detected in the previous input image, exists (step S13). If no rotation angle R_found exists, the correlation value is obtained with the rotation angle of the template image set to R = 0°, that is, using the frontal-face template image (step S14). The determination unit 12 then determines whether the correlation value is equal to or greater than a predetermined threshold (step S15). If the correlation value is below the threshold, it is determined that no face could be detected, and it is determined whether the template image has been rotated through 270° from the frontal face at R = 0° (step S16). If not, the template image is rotated 90° from its current state (step S17), and the process returns to step S14 to perform template matching.
[0036]
On the other hand, if in step S13 a face image was detected in the previous input image and the rotation angle R_found of the template image at that time has been stored, matching is performed using the template image with rotation angle R_found (step S18), and it is determined whether the correlation value is equal to or greater than the predetermined threshold (step S15).
[0037]
If the correlation value is found in step S15 to be equal to or greater than the predetermined threshold and it is determined that a face has been detected, the rotation angle at that time is stored as R_found (step S19), and the process returns to step S11 to process the next input image.
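The choice of the first rotation angle in the second embodiment (steps S12, S13, S14, S18 and S20) can be sketched as a small selection function. The posture labels, the table `POSTURE_TO_ANGLE` and the function name `select_start_angle` are hypothetical; the text only gives the example that a robot lying on its back is expected to see faces rotated by about 180°.

```python
# Hypothetical mapping from a posture label to the expected face rotation (degrees).
POSTURE_TO_ANGLE = {"upright": 0, "on_back": 180}


def select_start_angle(posture=None, r_found=None):
    """Pick the first template rotation angle to try.

    Priority: posture-derived R_sensor, then stored R_found, then frontal face.
    """
    r_sensor = POSTURE_TO_ANGLE.get(posture)
    if r_sensor is not None:
        return r_sensor   # S12 -> S20: posture information is available
    if r_found is not None:
        return r_found    # S13 -> S18: reuse the last successful angle
    return 0              # S14: frontal-face template


print(select_start_angle("on_back", r_found=90))
print(select_start_angle(None, r_found=90))
```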
[0038]
In the present embodiment configured as described above, as in the first embodiment, when a face is detected in the previous input image, the rotation angle R_found of the template image is stored and matching is performed using the template image with rotation angle R_found, which speeds up processing. In addition, when posture information is input, the rotation angle of the template image is predicted from the posture information and matching is performed using the template image with the predicted rotation angle R_sensor, so a face can be detected efficiently and in a short time even after a sudden movement, such as when the robot device falls over.
[0039]
(3) First application example
Next, a first application example of the present invention, in which faces are detected by applying the template matching described in the first and second embodiments, will be described (see Japanese Patent Application No. 2002-163622). In this application example as well, the face detection processing is performed by a control unit provided in the robot device, which controls the robot device as a whole, together with its built-in internal memory. In the second embodiment, an image whose correlation value in template matching is equal to or greater than a predetermined threshold is extracted as a face image. In this application example, by contrast, an image judged to be a face image by template matching is treated as a face candidate, and identification means such as a support vector machine then determines whether the candidate is actually a face.
[0040]
FIG. 4 is a block diagram schematically showing the functions of a face detection device according to an application example of the present invention. As shown in FIG. 4, the processing performed by the control unit for the face detection task function in this application example can be divided functionally into an input image scale conversion unit 360, a window cutout unit 361, a template matching unit 362, a preprocessing unit 363, a pattern identification unit 364 and an overlap determination unit 365.
[0041]
The input image scale conversion unit 360 reads from the internal memory a frame image based on the image signal S1A from the CCD camera provided on the head or elsewhere on the robot device, and converts it into a plurality of scale images with different reduction ratios. In this application example, a frame image of 25344 (= 176 × 144) pixels is reduced successively by a factor of 0.8 to give scale images at five levels (1.0, 0.8, 0.64, 0.51 and 0.41 times; hereinafter, the first to fifth scale images).
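The five reduction ratios follow from repeatedly multiplying by 0.8, as the short Python sketch below shows. The function name `scale_pyramid` is hypothetical, and truncating the scaled width and height to whole pixels is an assumption made here; the text gives only the ratios, not the resulting image sizes.

```python
def scale_pyramid(width=176, height=144, factor=0.8, levels=5):
    """Return (ratio, width, height) for each scale level.

    ratio is factor**level rounded to two decimals; the pixel sizes are
    truncated to integers (an assumption of this sketch).
    """
    result = []
    for level in range(levels):
        scale = factor ** level
        result.append((round(scale, 2), int(width * scale), int(height * scale)))
    return result


for ratio, w, h in scale_pyramid():
    print(ratio, w, h)
```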
[0042]
Subsequently, the window cutout unit 361 processes the first to fifth scale images in order, beginning with the first scale image. Starting from the upper left of the image and scanning toward the lower right while shifting to the right or downward by an appropriate number of pixels (for example, two pixels), it sequentially cuts out rectangular areas of 400 (= 20 × 20) pixels (hereinafter, such an area is referred to as a window image).
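The scan order can be sketched in Python as follows. This is a minimal sketch assuming a row-major list-of-lists image; the function names are hypothetical, and the handling of the right and bottom image edges (windows that do not fit are simply skipped) is an assumption not stated in the text.

```python
def window_origins(img_w, img_h, win=20, step=2):
    """Yield the upper-left (x, y) of each window, left to right, then top to bottom."""
    for y in range(0, img_h - win + 1, step):
        for x in range(0, img_w - win + 1, step):
            yield x, y


def cut_window(image, x, y, win=20):
    """Cut a win x win window image out of a row-major list-of-lists image."""
    return [row[x:x + win] for row in image[y:y + win]]


# Number of 20 x 20 windows in the first (176 x 144) scale image at a 2-pixel step.
origins = list(window_origins(176, 144))
print(len(origins))
```

The count of windows per scale image gives a sense of why a cheap first-stage filter matters before the more expensive identification stage.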
[0043]
At this time, the window cutout unit 361 sends the first window image among the plurality of window images cut out from the first scale image to the template matching unit 362 at the subsequent stage.
[0044]
The template matching unit 362 applies an arithmetic process such as the normalized correlation method or the squared error method to the first window image obtained from the window cutout unit 361, thereby converting it into a function curve having a peak value. It then sets for that function curve a threshold that is low enough not to degrade recognition performance, and determines on the basis of this threshold whether or not the window image is a face image. At this time, as described in the first and second embodiments, if no face is detected, the template image is rotated by a predetermined angle and template matching is performed again. If a face was detected in the previous input image, the rotation angle R_found at that time has been stored, and matching is performed using the template image at the rotation angle R_found.
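The interplay between normalized correlation, the threshold, and the stored rotation angle R_found can be sketched as follows. This is an illustrative Python sketch, not the device's implementation: the function names are hypothetical, the rotation step is fixed at 90° (the example angle given in the abstract), zero-mean normalization is one common form of normalized correlation, and falling back to the remaining angles when the stored angle fails is an assumption.

```python
import math


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-sized 2-D lists, in [-1, 1]."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den if den else 0.0


def rotate90(img, times):
    """Rotate a square 2-D list clockwise by 90 degrees, `times` times."""
    for _ in range(times % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img


def match_with_rotation(window, template, threshold, r_found=None):
    """Return the template rotation angle (degrees) at which the window matches, or None.

    If r_found (the angle stored from the previous input image) is given,
    that angle is tried first; the other angles are tried afterwards.
    """
    angles = [0, 90, 180, 270]
    if r_found in angles:
        angles.remove(r_found)
        angles.insert(0, r_found)
    for angle in angles:
        if normalized_correlation(window, rotate90(template, angle // 90)) >= threshold:
            return angle
    return None
```

Trying R_found first means that, while the face keeps the same orientation from frame to frame, only one correlation per window is needed.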
[0045]
In this application example as well, the template matching unit 362 uses as the template image an average face image formed by averaging the faces of, for example, about 100 persons, and a threshold is set as the criterion for determining whether or not a window image is a face image. The window image can thus be roughly matched against the average face image serving as the template image.
[0046]
In this way, the template matching unit 362 performs matching using the template image on each window image obtained from the window cutout unit 361. When the window image is determined to be a face image, it is sent as a score image to the preprocessing unit 363 in the subsequent stage. When it is determined not to be a face image, the window image is sent as it is to the overlap determination unit 365 in the subsequent stage.
[0047]
At this point, the window images determined to be face images (score images) in fact include a large number of misjudged images that are not faces. In everyday scenes, however, background images that resemble a face are rare, so most window images are determined not to be face images, which makes this stage extremely effective.
[0048]
In practice, the arithmetic processing described above, such as the normalized correlation method or the squared error method, requires only about one tenth to one hundredth of the computation required by the preprocessing unit and the pattern identification unit in the subsequent stages. Experiments have confirmed that 80% or more of the images other than face images can be eliminated at this stage, which shows that the amount of computation for the control unit as a whole is greatly reduced.
[0049]
To remove from the rectangular score image obtained from the template matching unit 362 the four corner regions, which correspond to background unrelated to the human face, the preprocessing unit 363 applies a mask with the four corners cut away and extracts 360 pixels from the score image of 400 (= 20 × 20) pixels.
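One mask that yields exactly 360 of the 400 pixels is sketched below in Python. The shape of the removed corner regions is not given in the text; the triangular corners of 10 pixels each (4 + 3 + 2 + 1) used here are an assumption chosen only because they reproduce the stated pixel count, and the function names are hypothetical.

```python
def corner_mask(size=20, cut=4):
    """Return a size x size mask (True = keep) with a triangle removed at each corner."""
    mask = [[True] * size for _ in range(size)]
    for y in range(cut):
        for x in range(cut - y):
            # Mirror the same triangle into all four corners.
            mask[y][x] = False
            mask[y][size - 1 - x] = False
            mask[size - 1 - y][x] = False
            mask[size - 1 - y][size - 1 - x] = False
    return mask


def apply_mask(window, mask):
    """Flatten the masked window into a list of the kept pixel values."""
    return [v for row, mrow in zip(window, mask) for v, keep in zip(row, mrow) if keep]


mask = corner_mask()
print(sum(sum(row) for row in mask))  # number of pixels kept
```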
[0050]
Next, in order to cancel the gradient in gray level that illumination at the time of imaging imposes on the subject, the preprocessing unit 363 corrects the gray values of the 360 pixels, using a calculation method based on, for example, the root mean square (RMS) error, so that a plane is formed with reference to the part of the extracted 360-pixel score image that is most appropriate as a face image.
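One common way to realize such a correction is to fit a plane to the gray values by least squares (which minimizes the RMS error) and subtract its tilt. The Python sketch below does this for a full rectangular window rather than the masked 360 pixels, which is a simplification; the function name is hypothetical, and the text does not confirm that this is the exact calculation used.

```python
def remove_shading_plane(window):
    """Subtract the tilt of the least-squares plane z = a*x + b*y + c from a 2-D list."""
    h, w = len(window), len(window[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    # With coordinates centered on a full rectangular grid, the normal
    # equations decouple, so a and b can be computed independently.
    sxx = h * sum((x - cx) ** 2 for x in range(w))
    syy = w * sum((y - cy) ** 2 for y in range(h))
    a = sum((x - cx) * window[y][x] for y in range(h) for x in range(w)) / sxx
    b = sum((y - cy) * window[y][x] for y in range(h) for x in range(w)) / syy
    # Only the tilt is removed; the mean brightness is left unchanged.
    return [[window[y][x] - a * (x - cx) - b * (y - cy) for x in range(w)]
            for y in range(h)]
```

For a window whose gray values lie exactly on a plane, the output is flat at the window's mean brightness.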
[0051]
Subsequently, the preprocessing unit 363 enhances the contrast of the 360-pixel score image and applies histogram equalization to the result, so that detection is possible regardless of the gain of the CCD camera 50 or the intensity of illumination.
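Assuming that the histogram process referred to here is conventional histogram equalization, it can be sketched in Python as follows. The function name and the 256 gray levels are assumptions; the text does not state the bit depth of the score image.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray values in [0, levels - 1]."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    # Cumulative distribution of gray values.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # all pixels share one gray value; nothing to spread
        return list(pixels)
    scale = (levels - 1) / (n - cdf_min)
    return [round((cdf[v] - cdf_min) * scale) for v in pixels]
```

Because the output depends only on the rank order of the gray values, a uniform change in camera gain or illumination intensity leaves the result largely unchanged.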
[0052]
Next, the preprocessing unit 363 applies Gabor filtering to the 360-pixel score image to convert it into vectors, and further converts the resulting group of vectors into a single pattern vector.
[0053]
The pattern identification unit 364 obtains a provisional identification function using learning data (teacher data) supplied from the outside, and then detects faces by applying this identification function to the 360-pixel score image supplied as a pattern vector from the preprocessing unit 363. Successful detections are output as face data. Failed detections are added to the learning data as non-face data, and learning is performed again.
[0054]
For face identification, the pattern identification unit 364 can determine whether or not a score image is a face by using, for example, a support vector machine (SVM), which is regarded as having the highest generalization ability in learning in the field of pattern recognition.
[0055]
For the support vector machine itself, see, for example, the report by Schölkopf et al. (B. Schölkopf, C. Burges, A. Smola, Advances in Kernel Methods: Support Vector Learning, The MIT Press, 1999). Preliminary experiments conducted by the present applicant have shown that face recognition using the support vector machine gives better results than methods using principal component analysis (PCA) or a neural network.
[0056]
The pattern identification unit 364 then determines, for the pattern vector based on the score image supplied from the preprocessing unit 363, whether or not face data is present in the score image. If face data is present, it lists the upper-left position (coordinates) and size (numbers of pixels in the vertical and horizontal directions) of the image region of the score image, together with the reduction rate, relative to the frame image, of the scale image from which the score image was cut out (that is, the corresponding stage among the five stages described above), and stores the result in the internal memory as list data.
[0057]
Thereafter, the pattern identification unit 364 notifies the window cutout unit 361 that face detection for the first window image in the first scale image has been completed, whereupon the window cutout unit 361 sends the next window image of the first scale image to the template matching unit 362.
[0058]
The template matching unit 362 then treats the window image as a score image only when it matches the template image, stores the rotation angle R_found of the template image at that time, and sends the score image to the preprocessing unit 363. The preprocessing unit 363 converts the score image into a pattern vector and sends the pattern vector to the pattern identification unit 364. The pattern identification unit 364 generates list data based on the face data obtained as the identification result from the pattern vector, and stores the list data in the internal memory.
[0059]
As described above, by performing the processes of the template matching unit 362, the preprocessing unit 363, and the pattern identification unit 364 on all the window images cut out from the first scale image in the window cutout unit 361 in the scanning order, A plurality of score images including a face image existing in the imaging result can be detected from one scale image.
[0060]
Thereafter, the pattern identification unit 364 notifies the input image scale conversion unit 360 that face detection for the first scale image has been completed, whereupon the input image scale conversion unit 360 sends the second scale image to the window cutout unit 361.
[0061]
The second scale image is subjected to the same processing as the first scale image described above. After a plurality of score images including the face images present in the imaging result have been detected from the second scale image, the same processing is performed in turn on the third to fifth scale images. The rotation angle R_found stored in the template matching unit 362 can be used when matching the same scale image of the next input image. Further, as in the second embodiment described above, matching may be performed using a template image at the rotation angle R_sensor estimated from the posture information.
[0062]
In this way, for each of the first to fifth scale images obtained by reducing the captured frame image in five stages, the pattern identification unit 364 detects a plurality of score images including the face images present in the captured image, and stores the resulting list data, including the rotation angle R_found, in the internal memory. Depending on the size of the face image in the original frame image, no score image at all may be obtained from some scale images. As long as a score image is obtained from at least one (or at least two or three) of the scale images, however, the face detection process continues.
[0063]
Because the window cutout unit 361 scans in steps as fine as two pixels, the plurality of score images that include a face image in each scale image show high correlation both in the region where the face actually exists and in its neighborhood. Adjacent score images therefore contain image regions that overlap each other.
[0064]
The overlap determination unit 365 in the subsequent stage therefore reads the list data for each of the first to fifth scale images stored in the internal memory, compares the score images contained in the list data, and determines whether or not they include regions that overlap each other.
[0065]
Based on the determination result, the overlap determination unit 365 removes the regions where score images overlap each other, so that in each scale image a single image region is finally obtained in which the plurality of score images are gathered without overlapping. This image region is newly stored in the internal memory as face determination data.
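The text does not specify how overlapping score images are combined. One simple possibility, shown here only as an assumed illustration, is to merge any rectangles that share area into their common bounding box. Rectangles are (x, y, width, height) tuples within one scale image, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if rectangles a and b, given as (x, y, width, height), share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def merge_overlapping(rects):
    """Merge mutually overlapping rectangles into their bounding boxes."""
    merged = []
    for rect in rects:
        rect = tuple(rect)
        changed = True
        while changed:
            changed = False
            for other in merged:
                if overlaps(rect, other):
                    x0, y0 = min(rect[0], other[0]), min(rect[1], other[1])
                    x1 = max(rect[0] + rect[2], other[0] + other[2])
                    y1 = max(rect[1] + rect[3], other[1] + other[3])
                    rect = (x0, y0, x1 - x0, y1 - y0)
                    merged.remove(other)
                    changed = True
                    break  # restart the scan with the enlarged rectangle
        merged.append(rect)
    return merged
```

For example, two 20 × 20 score images offset by one 2-pixel scan step collapse into a single 22 × 20 region, while a distant score image is left untouched.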
[0066]
When the template matching unit 362 determines that a window image is not a face image, the overlap determination unit 365 does nothing with it and does not store it in the internal memory.
[0067]
Next, the operation of this application example will be described. In the above configuration, the robot apparatus converts a frame image captured by the CCD camera into a plurality of scale images having different reduction rates, and then cuts out window images of a predetermined size one by one from each scale image while scanning in steps of a predetermined number of pixels.
[0068]
Each window image is matched against a template image representing an average face image to determine roughly whether or not it is a face image, and window images that are clearly not face images are removed. The amount of computation and the time required for the subsequent face detection processing can be reduced accordingly. When a window image is determined not to be a face image, matching is repeated using template images obtained by successively rotating the template image by a predetermined rotation angle, so that even a non-frontal face can be detected.
[0069]
Subsequently, for each window image determined to be a face image by template matching (that is, each score image), the four corners of the rectangular area of the score image are removed, gradation correction is applied, followed by contrast enhancement and histogram equalization, and the result is converted into a single pattern vector.
[0070]
Face detection is then performed on the pattern vector within the original score image to determine whether it is face data or non-face data, and list data is generated that records the position (coordinates) and size (number of pixels) of the image region of each score image in which face data exists, together with the reduction rate, relative to the frame image, of the scale image from which the score image was cut out.
[0071]
After list data has been generated in this manner for all the score images of each scale image, the score images contained in the list data are compared and face determination data is obtained from which mutually overlapping regions have been removed. The face image can thus be detected from the original frame image.
[0072]
Among these face detection task processes, template matching in particular can easily be implemented on an arithmetic unit of relatively simple configuration. It also resembles the block matching method used in image compression and the like, and many types of hardware exist that perform such processing at high speed using a CPU. The template matching process can therefore be made even faster.
[0073]
According to the configuration described above, during the face detection task process that detects a face image in a frame image captured by the CCD camera, the robot apparatus cuts out window images of a predetermined size one by one from each of the scale images obtained by reducing the frame image at different reduction rates, scanning in steps of a predetermined number of pixels, and then determines roughly whether or not each is a face image by matching it against a template image representing an average face image. Removing the window images that are clearly not face images in this way reduces, by that amount, the computation and time required for the various face detection processes applied to the score images that template matching has determined to be face images, and so reduces the processing load on the main control unit 381 that controls the entire robot apparatus. A robot apparatus with significantly improved real-time performance can thus be realized, and because matching is performed while the template image is successively rotated, even non-frontal faces can be detected without being missed.
[0074]
(4) Second application example
In a method that detects a face area by extracting face candidates through template matching (first step) and then determining the face area from among the face candidates using an SVM or the like (second step), the first step decides the face candidates simply on the magnitude of the normalized correlation value. To reduce the number of face candidates that are overlooked, the threshold can be lowered or the thinning of the scan can be reduced. Lowering the threshold, however, increases the amount of computation, which may be undesirable in an environment with limited resources such as a robot device. Conversely, raising the threshold reduces the number of candidate images passed to face determination in the second step and therefore the amount of computation, but images that are actually faces may also be removed from the candidates, so that face images are overlooked. A second application example of the present invention suited to such cases is therefore described.
[0075]
When a face area (face image) of the same size as the template image exists in the input image, the correlation value between that face image and the template image is largest in the vicinity of the template image size. Therefore, when narrowing down the face area candidates, an algorithm that narrows them down locally is used, which reduces the number of face candidate images without overlooking images that are actually faces and so reduces the amount of computation in the subsequent face determination. Specifically, face area candidates are extracted on the basis of local maxima of the correlation value in the matching result, which is the set of correlation values obtained by normalized correlation between the input image and a template image of an average face of a predetermined size.
[0076]
That is, as shown in FIG. 5A, consider a window image (the input image after scale conversion) W2 cut out from an arbitrary scale image, with vertical size (length of the side in the y-axis direction, hereinafter referred to as height) hei_s and horizontal size (length of the side in the x-axis direction, hereinafter referred to as width) wid_s. As shown in FIG. 5B, the window image W2 is scanned with a template image T2₁, an average face image having a first template image size of height hei_t × width wid_t, and a matching result is obtained as the set of correlation values between the template image T2₁ and the input image at each position as the template is moved in steps of a predetermined number of pixels (for example, one pixel). In this matching result the correlation values are arranged two-dimensionally according to the movement of the template image T2₁, and, as shown in FIG. 6, a template matching result image R2 of height hei_r × width wid_r representing the correlation values is obtained. Here, the height hei_r of the template matching result image R2 is hei_s − (hei_t + 1), and its width wid_r is wid_s − (wid_t + 1).
[0077]
Next, the template matching result image R2 is divided into regions of a predetermined size, for example the same size as the first template image size, and the point (position) having the maximum correlation value is found in each divided region. Among the maximum points obtained from the divided regions, those whose correlation value is equal to or greater than a predetermined threshold are extracted as face candidates.
[0078]
That is, when normalized correlation is performed using a template image of an average face, the correlation value of a face image is not guaranteed to be higher than that of an arbitrary pattern. However, when a face image of the same size as the template image is present, the correlation value takes its maximum at a size near the template image size. By extracting as face candidates the points that have the maximum correlation value within a divided region and are equal to or greater than a predetermined threshold, the face candidates can be narrowed down more effectively than when every point whose correlation value is equal to or greater than a predetermined threshold is extracted as a face candidate.
[0079]
In the second application example, a template image of any size can be used. By switching among and selecting the template image size to be used, the amount of computation can be reduced and efficiency increased compared with performing the calculation for every template image size that can be prepared. For example, once a face has been detected, the same template image size can be used for the next detection. Further, for example, a target distance switching unit can be provided that uses a distance sensor of the robot device to recognize the distance to the target object contained in the input image from the distance information, predicts the size of the face area of the target object, and selects the template image size accordingly. The template image size can thus be switched according to the purpose.
[0080]
For this window image, matching is performed using a template image representing an average face image to generate a matching result image that is a set of correlation values with the template image. In this way, matching result images are generated for all the window images for each scale image in the order of scanning. Hereinafter, a process of detecting a face candidate from a matching result image will be described in detail.
[0081]
FIG. 7 is a flowchart showing the processing steps by which the template matching unit detects pixels that are face candidates from the template matching result image R2. As shown in FIG. 7, when the template matching result image R2 is input, it is first divided into regions of the template image size, and the point (coordinates) having the highest correlation value is extracted in one of the divided regions, for example 0 ≦ x ≦ wid_t−1, 0 ≦ y ≦ hei_t−1 (step S21). Hereinafter, a region obtained by dividing the matching result image R2 into the template image size is referred to as a divided region rn, and the point (coordinates) having the largest correlation value in the divided region rn is referred to as local_max (x, y). The pixel having the highest correlation value is extracted in each of the divided regions. In this application example, the case is described in which the divided regions of the matching result image are processed in order, line by line, from left to right.
[0082]
Next, it is determined whether or not local_max (x, y) is larger than a predetermined threshold (th1) (step S22). If it is larger, the point is added as a face candidate (step S23). The face detection device in this application example has means for selecting, along with the scale, a template image size matching the face size assumed to be contained in the input image. Because there are a plurality of template image sizes, however, the same point may be extracted more than once when the matching result image R2 is calculated and face candidates are extracted for each of several template image sizes. Therefore, if the point to be added in step S23 is identical to an existing face candidate, that is, if it has already been extracted as a face candidate with a different template image size, the point is not added.
[0083]
Next, the occupancy of skin color pixels is determined in the input image area of the template image size that corresponds to the point extracted as a face candidate. In this application example, the skin color table 100 is referred to when calculating the skin color pixel occupancy. It is then determined whether or not the skin color pixel occupancy is greater than a predetermined threshold (th2) (step S24). If it is greater, points surrounding local_max (x, y), for example the eight neighboring points around it, are added as face candidates (step S25). As in step S23, neighboring points that have already been extracted as face candidates are not added again.
[0084]
When local_max (x, y) is less than the threshold th1 in step S22, when the skin color pixel occupancy in the input image area corresponding to local_max (x, y) is less than the threshold th2 in step S24, or when the addition of face candidates in step S25 has been completed, the process proceeds to step S26 and moves on to the next divided region in order to extract the next face candidate.
[0085]
First, in the matching result image R2, the process moves to the next divided region, shifted in the x direction by the template image size, that is, by wid_t (step S26). Next, for the divided region at the x coordinate (x + wid_t) shifted by wid_t, if the x coordinate is larger than the width (side in the x direction) wid_r of the matching result image, the divided region is judged not to be contained in the matching result image (step S27). The process then moves to the next line, that is, to the divided region with 0 ≦ x ≦ wid_t−1 that is shifted in the y direction by the template image size, hei_t (step S28). Next, it is determined whether or not the y coordinate of the divided region is larger than the height (side in the y direction) hei_r of the matching result image (step S29). If it is larger, the maximum correlation value has been obtained for all the divided regions in the matching result image, and the process ends.
[0086]
On the other hand, if it is determined in step S27 or step S29 that the divided region is contained in the matching result image, the process returns to step S21 and the point having the highest correlation value in that divided region is extracted.
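The flow of steps S21 to S29 can be summarized in the following Python sketch. It is an illustration under stated assumptions rather than the device's implementation: the function name is hypothetical, the matching result image is passed in as a 2-D list of correlation values, and skin_occupancy is a hypothetical callback standing in for the lookup in the skin color table. The nested loops replace the explicit coordinate checks of steps S26 to S29.

```python
def extract_face_candidates(result, wid_t, hei_t, th1, skin_occupancy=None, th2=0.0):
    """Pick face-candidate points from a matching result image.

    result: 2-D list of correlation values (hei_r rows, wid_r columns).
    skin_occupancy: hypothetical callback (x, y) -> fraction of skin-color
    pixels in the input-image area corresponding to that point.
    """
    hei_r, wid_r = len(result), len(result[0])
    candidates = []
    seen = set()

    def add(x, y):
        # Points already extracted are not added again (steps S23 and S25).
        if 0 <= x < wid_r and 0 <= y < hei_r and (x, y) not in seen:
            seen.add((x, y))
            candidates.append((x, y))

    for top in range(0, hei_r, hei_t):        # S28/S29: next line of divided regions
        for left in range(0, wid_r, wid_t):   # S26/S27: next divided region in x
            # S21: point with the highest correlation value in this divided region.
            value, x, y = max(
                (result[yy][xx], xx, yy)
                for yy in range(top, min(top + hei_t, hei_r))
                for xx in range(left, min(left + wid_t, wid_r))
            )
            if value <= th1:                  # S22
                continue
            add(x, y)                         # S23
            if skin_occupancy is not None and skin_occupancy(x, y) > th2:  # S24
                for dy in (-1, 0, 1):         # S25: the eight neighboring points
                    for dx in (-1, 0, 1):
                        if dx or dy:
                            add(x + dx, y + dy)
    return candidates
```

At most one local maximum per divided region passes step S22, so the number of candidates handed to the later stages is bounded by the number of divided regions (plus up to eight neighbors each), rather than by the number of points above the threshold.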
[0087]
In the description above, the maximum correlation value is obtained in divided regions produced by dividing the matching result image R2 into the template image size, so the shift to the adjacent divided region in step S26 is by wid_t in the x direction. The matching result image R2 can, however, be divided into regions of any size equal to or smaller than the template image size. In that case, if the width (side in the x direction) and height (side in the y direction) of the divided region are wid_step and hei_step, the process can proceed to the next divided region by shifting by wid_step in the x direction in step S26 and by hei_step in the y direction in step S28.
[0088]
FIG. 8 is a diagram illustrating the points detected as face candidates from the window image W2 by the template matching unit. In FIG. 8, the points shown in white are the points extracted as face candidates from the matching result image R2 shown in FIG. 6. For comparison, FIG. 9 illustrates an example in which all points of the matching result image R2 that are equal to or greater than a threshold are extracted as face candidates. Comparison with FIG. 9 shows that in this application example the number of points extracted as face candidates by the template matching unit is significantly reduced. As a result, the amount of computation in the subsequent processing can be drastically reduced. As in the first embodiment, when no face candidate is extracted by the template matching unit, or when the number of points extracted as face candidates is less than a predetermined threshold, matching can be performed again with the template image rotated as appropriate, for example using a template image T2₂ obtained by rotating the template image T2₁ by 180°. Further, as in the second embodiment, the rotation angle of the template image may be estimated from the posture information, and matching may be performed using the template image at the estimated rotation angle.
[0089]
In this way, when a window image is matched against the template image representing the average face image to determine roughly whether or not it is a face image, the template matching result image is partitioned into regions of a predetermined size and the maximum correlation value in each region is extracted as a face candidate. Window images that are clearly not face images are thereby removed, and the amount of computation and the time required for the subsequent face detection processing can be reduced without overlooking regions that are actually faces. A face detection device with greatly improved real-time performance, and a robot device equipped with it, can thus be realized.
[0090]
In addition, face detection accuracy can be improved by setting the face search range to include not only the point where the correlation value is maximal but also the points around it. Furthermore, by setting this face search range only when the skin color occupancy or face color occupancy is equal to or greater than a predetermined threshold, the number of face candidates can be reduced while face detection accuracy is maintained, and the amount of computation in the subsequent stage can be reduced. The amount of computation can be reduced still further by switching the size of the template image as appropriate.
[0091]
(5) Robot device configuration
Next, a robot apparatus having a face detection module as in the first and second embodiments will be described. First, the configuration of the robot device will be described.
[0092]
As shown in FIG. 10, the robot apparatus 1 according to the present embodiment is an autonomous robot apparatus that acts autonomously according to the surrounding environment (or external stimuli) and its internal state, and imitates an animal such as a "dog". Leg units 3A, 3B, 3C, and 3D are connected to the front, rear, left, and right of the body unit 2, and a head unit 4 is connected to the front end of the body unit 2.
[0093]
As shown in FIG. 11, the body unit 2 houses a control unit 16, formed by connecting a CPU (Central Processing Unit) 10, a DRAM (Dynamic Random Access Memory) 11, a flash ROM (Read Only Memory) 12, a PC (Personal Computer) card interface circuit 13, and a signal processing circuit 14 to one another via an internal bus 15, and a battery 17 serving as the power source of the robot device 1. The body unit 2 also houses an angular velocity sensor 18 and an acceleration sensor 19 for detecting the orientation of the robot apparatus 1 and the acceleration of its movement. A speaker 20 for outputting sounds such as cries or melodies is arranged at a predetermined position in the body unit 2, as shown in the figure. The tail 5 of the body unit 2 is provided with an operation switch 21 as a detection mechanism for detecting operation inputs from the user. The operation switch 21 can detect the type of operation performed by the user, and the robot device 1 recognizes, for example, whether it has been "praised" or "scolded" according to the type of operation detected by the operation switch 21.
[0094]
In the head unit 4, a CCD (Charge Coupled Device) camera 22, which corresponds to the "eyes" of the robot apparatus 1 and captures the external situation and the color, shape, movement, and the like of objects; a distance sensor 23 for measuring the distance to an object located in front; a microphone 24, which corresponds to the left and right "ears" of the robot apparatus 1 and collects external sounds; and a light emitting unit 25 comprising, for example, LEDs (Light Emitting Diodes) are each arranged at predetermined positions, as shown in the figure. In the description of the configuration and elsewhere, the light emitting unit 25 is referred to as the LED 25 where necessary. Although not shown in FIG. 10, a head switch 26 is provided inside the head unit 4 as a detection mechanism for indirectly detecting the user's contact with the head unit 4. The head switch 26 can detect, for example, the direction in which the head is tilted when it is moved by the user's contact, and the robot apparatus 1 recognizes whether it has been "praised" or "scolded" according to the tilt direction of the head detected by the head switch 26.
[0095]
At the joints of the leg units 3A to 3D, at the connections between the leg units 3A to 3D and the body unit 2, and at the connection between the head unit 4 and the body unit 2, actuators 28₁ to 28ₙ and potentiometers 29₁ to 29ₙ are provided in numbers corresponding to the respective degrees of freedom. The actuators 28₁ to 28ₙ each include, for example, a servomotor. Driving the servomotors controls the leg units 3A to 3D so that they transition to a target posture or motion. At the positions corresponding to the "paw pads" at the tips of the leg units 3A to 3D, paw pad switches 27A to 27D are provided as detection mechanisms, mainly for detecting contact by the user.
[0096]
Although not shown here, the robot apparatus 1 may also be provided, at appropriate places, with a light emitting unit for indicating an operation state (operation mode) distinct from the internal state of the robot apparatus 1, such as a charging operation, a starting operation, or a start/stop operation, and with a status lamp indicating the state of the internal power supply.
[0097]
In the robot apparatus 1, the various switches such as the operation switch 21, the head switch 26, and the paw switches 27, the various sensors such as the angular velocity sensor 18, the acceleration sensor 19, and the distance sensor 23, the speaker 20, the microphone 24, the light emitting unit 25, the actuators 28₁ to 28ₙ, and the potentiometers 29₁ to 29ₙ are connected to the signal processing circuit 14 of the control unit 16 via the corresponding hubs 30₁ to 30ₙ. The CCD camera 22 and the battery 17, on the other hand, are each connected directly to the signal processing circuit 14.
[0098]
The signal processing circuit 14 sequentially takes in the switch data supplied from the various switches described above, the sensor data supplied from the various sensors, the image data, and the audio data, and sequentially stores them at predetermined positions in the DRAM 11 via the internal bus 15. The signal processing circuit 14 also sequentially takes in the remaining battery data indicating the remaining battery level supplied from the battery 17 and stores it at a predetermined position in the DRAM 11.
[0099]
The switch data, sensor data, image data, audio data, and remaining battery data stored in the DRAM 11 in this manner are used when the CPU 10 controls the operation of the robot device 1.
[0100]
At the initial stage when the power of the robot apparatus 1 is turned on, the CPU 10 reads the control program stored in the flash ROM 12 and stores it in the DRAM 11. Alternatively, the CPU 10 reads a control program stored in a semiconductor memory device, for example, a memory card 31 mounted in a PC card slot (not shown) of the body unit 2, and stores it in the DRAM 11.
[0101]
Based on the sensor data, image data, audio data, and remaining battery data sequentially stored in the DRAM 11 by the signal processing circuit 14 as described above, the CPU 10 determines the situation of the robot apparatus itself and its surroundings, and whether there have been instructions or actions from the user.
[0102]
Further, the CPU 10 determines the subsequent action based on the result of this determination and the control program stored in the DRAM 11, and drives the necessary actuators 28₁ to 28ₙ based on the determined result, thereby swinging the head unit 4 up and down and left and right, or driving the leg units 3A to 3D to walk.
[0103]
At this time, the CPU 10 also generates audio data as necessary and supplies it to the speaker 20 as an audio signal via the signal processing circuit 14, so that sound based on the audio signal is output to the outside, or generates a signal instructing the LEDs of the light emitting unit 25 to turn on and off, so that the light emitting unit 25 is lit and extinguished.
[0104]
In this way, the robot apparatus 1 can autonomously act according to the situation of itself and the surroundings, and instructions and actions from the user.
[0105]
(6) Software configuration of control program
The software configuration of the above-described control program in the robot apparatus 1 is as follows. In FIG. 12, the device driver layer 40 is located at the lowest layer of the control program and consists of a device driver set 41 made up of a plurality of device drivers. Each device driver is an object permitted to directly access hardware used in an ordinary computer, such as the CCD camera 22 (FIG. 11) or a timer, and performs processing in response to an interrupt from the corresponding hardware.
[0106]
The robotic server object 42 is located at the lowest layer of the device driver layer 40 and consists of a virtual robot 43, which is a group of software that provides an interface for accessing hardware such as the above-described various sensors and the actuators 28₁ to 28ₙ; a power manager 44, which is a group of software that manages switching of the power supply and the like; a device driver manager 45, which is a group of software that manages various other device drivers; and a designed robot 46, which is a group of software that manages the mechanism of the robot apparatus 1.
[0107]
The manager object 47 consists of an object manager 48 and a service manager 49. The object manager 48 is a group of software that manages the activation and termination of each group of software included in the robotic server object 42, the middleware layer 50, and the application layer 51. The service manager 49 is a group of software that manages the connections between objects based on the inter-object connection information described in a connection file stored in the memory card 31 (FIG. 11).
[0108]
The middleware layer 50 is located in the layer above the robotic server object 42 and consists of a group of software that provides basic functions of the robot apparatus 1 such as image processing and audio processing. The application layer 51 is located in the layer above the middleware layer 50 and consists of a group of software that determines the action of the robot apparatus 1 based on the processing results of the groups of software constituting the middleware layer 50.
[0109]
FIG. 13 shows specific software configurations of the middleware layer 50 and the application layer 51.
[0110]
As shown in FIG. 13, the middleware layer 50 consists of a recognition system 71, which has signal processing modules 60 to 69 for noise detection, temperature detection, brightness detection, scale recognition, distance detection, posture detection, contact detection, operation input detection, motion detection, and color recognition together with an input semantics converter module 70, and an output system 80, which has an output semantics converter module 79 together with signal processing modules 72 to 78 for posture management, tracking, motion reproduction, walking, fall recovery, LED lighting, and sound reproduction.
[0111]
Each of the signal processing modules 60 to 69 of the recognition system 71 takes in the corresponding data among the sensor data, image data, and audio data read from the DRAM 11 (FIG. 11) by the virtual robot 43 of the robotic server object 42, performs predetermined processing based on that data, and gives the processing result to the input semantics converter module 70. Here, for example, the virtual robot 43 is configured as a part that exchanges or converts signals according to a predetermined communication protocol.
[0112]
Based on the processing results given from the signal processing modules 60 to 69, the input semantics converter module 70 recognizes the situation of the robot apparatus itself and its surroundings, such as "noisy", "hot", "bright", "heard the do-mi-so scale", "detected an obstacle", "detected a fall", "scolded", "praised", "detected a moving object", or "detected a ball", as well as commands and actions from the user, and outputs the recognition result to the application layer 51 (FIG. 11).
[0113]
As shown in FIG. 14, the application layer 51 is composed of five modules: a behavior model library 90, a behavior switching module 91, a learning module 92, an emotion model 93, and an instinct model 94.
[0114]
As shown in FIG. 15, the behavior model library 90 is provided with behavior models 90₁ to 90ₙ, each associated with one of several pre-selected condition items such as "when the remaining battery power is low", "when recovering from a fall", "when avoiding an obstacle", and "when expressing emotions".
[0115]
When a recognition result is given from the input semantics converter module 70, or when a certain time has elapsed since the last recognition result was given, each of these behavior models 90₁ to 90ₙ determines the subsequent action, referring as necessary to the corresponding emotion parameter value held in the emotion model 93 and the corresponding desire parameter value held in the instinct model 94, as described later, and outputs the determination result to the behavior switching module 91.
[0116]
In this embodiment, each of the behavior models 90₁ to 90ₙ uses, as its method of determining the next action, an algorithm called a finite probability automaton, in which the node NODE₀ to NODEₙ to which a transition is made from one node (state) NODE₀ to NODEₙ, as illustrated in the drawings, is determined probabilistically based on the transition probabilities P₁ to Pₙ set for the arcs ARC₁ to ARCₙ connecting the nodes NODE₀ to NODEₙ.
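The probabilistic choice of the next node can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the node names, the arcs, and the probabilities in `ARCS` are hypothetical, and the function `next_node` is introduced only for this example.

```python
import random

# Hypothetical arcs: for each node, the reachable nodes and the transition
# probability set for the connecting arc (values are illustrative only).
ARCS = {
    "NODE0": [("NODE0", 0.2), ("NODE1", 0.5), ("NODE2", 0.3)],
    "NODE1": [("NODE0", 1.0)],
    "NODE2": [("NODE1", 0.6), ("NODE2", 0.4)],
}


def next_node(current, rng=random):
    """Pick the destination node at random, weighted by the arc probabilities."""
    destinations = [dst for dst, _ in ARCS[current]]
    weights = [p for _, p in ARCS[current]]
    return rng.choices(destinations, weights=weights, k=1)[0]


# Walk a few steps from NODE0 with a seeded generator for repeatability.
rng = random.Random(0)
node = "NODE0"
path = [node]
for _ in range(5):
    node = next_node(node, rng)
    path.append(node)
print(path)
```

Passing a seeded `random.Random` instance keeps a run repeatable; the behavior itself remains probabilistic in the sense described above.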
[0117]
Specifically, each of the behavior models 90₁ to 90ₙ has, for each of the nodes NODE₀ to NODEₙ forming that behavior model, a state transition table 100 as illustrated in the drawings.
[0118]
In this state transition table 100, the input events (recognition results) that serve as transition conditions at the node NODE₀ to NODEₙ are listed in order of priority in the "input event name" column, and further conditions on each transition condition are described in the corresponding rows of the "data name" and "data range" columns.
[0119]
Thus, at the node NODE₁₀₀ represented by the state transition table 100 in the drawings, the condition for transitioning to another node is that, when the recognition result "ball detected (BALL)" is given, the "size" of the ball given together with the recognition result is in the range "0 to 1000", or that, when the recognition result "obstacle detected (OBSTACLE)" is given, the "distance" to the obstacle given together with the recognition result is in the range "0 to 100".
[0120]
At this node NODE₁₀₀, even when no recognition result is input, a transition to another node is possible when, among the parameter values of the emotions and desires held in the emotion model 93 and the instinct model 94, which the behavior models 90₁ to 90ₙ refer to periodically, the parameter value of any of "joy", "surprise", or "sadness" held in the emotion model 93 is in the range "50 to 100".
[0121]
In the state transition table 100, the names of the nodes to which a transition can be made from the node NODE₀ to NODEₙ are listed in the "transition destination node" row of the "transition probability to another node" column; the probability of transition to each of the other nodes NODE₀ to NODEₙ to which a transition can be made when all the conditions described in the "input event name", "data name", and "data range" columns are met is described in the corresponding place in the "transition probability to another node" column; and the action to be output when transitioning to that node NODE₀ to NODEₙ is described in the "output action" row of the "transition probability to another node" column. The sum of the probabilities in each row of the "transition probability to another node" column is 100 [%].
[0122]
Thus, at the node NODE₁₀₀ represented by the state transition table 100 in the drawings, when, for example, the recognition result "ball detected (BALL)" is given with the "SIZE (size)" of the ball in the range "0 to 1000", a transition to "node NODE₁₂₀ (node 120)" is made with a probability of "30 [%]", and the action "ACTION1" is output at that time.
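A state transition table of this kind can be sketched as a small data structure. The example below is an assumption-laden illustration: the event names, the data ranges, and the 30 [%] transition to NODE120 with ACTION1 follow the text, while the dictionary layout, the other destinations, their probabilities, "ACTION2", and the function `step` are hypothetical.

```python
import random

# Hypothetical rows for node NODE100, listed in priority order.
# Each transition is (destination node, probability in percent, output action).
TABLE_NODE100 = [
    {
        "event": "BALL", "data_name": "SIZE", "data_range": (0, 1000),
        "transitions": [("NODE120", 30, "ACTION1"), ("NODE100", 70, None)],
    },
    {
        "event": "OBSTACLE", "data_name": "DISTANCE", "data_range": (0, 100),
        "transitions": [("NODE101", 100, "ACTION2")],
    },
]


def step(table, event, data, rng=random):
    """Return (next_node, action), or None when no row's condition is met."""
    for row in table:
        low, high = row["data_range"]
        value = data.get(row["data_name"])
        if row["event"] != event or value is None or not (low <= value <= high):
            continue
        # The probabilities in each row sum to 100 [%].
        assert sum(p for _, p, _ in row["transitions"]) == 100
        outcomes = [(node, action) for node, _, action in row["transitions"]]
        weights = [p for _, p, _ in row["transitions"]]
        return rng.choices(outcomes, weights=weights, k=1)[0]
    return None


print(step(TABLE_NODE100, "OBSTACLE", {"DISTANCE": 50}))
print(step(TABLE_NODE100, "BALL", {"SIZE": 2000}))
```

Keeping the rows in a list preserves the priority order of the input events, so the first row whose condition is met decides the transition.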
[0123]
Each of the behavior models 90₁ to 90ₙ is configured by connecting a number of nodes NODE₀ to NODEₙ, each described by such a state transition table 100. When a recognition result is given from the input semantics converter module 70 or the like, the behavior model determines the next action probabilistically using the state transition table of the corresponding node NODE₀ to NODEₙ and outputs the determination result to the behavior switching module 91.
[0124]
The behavior switching module 91 selects, from among the actions output by the behavior models 90₁ to 90ₙ of the behavior model library 90, the action output by the behavior model with the highest predetermined priority, and sends a command to execute that action (hereinafter referred to as an action command) to the output semantics converter module 79 of the middleware layer 50. In this embodiment, the lower a behavior model 90₁ to 90ₙ is drawn in the figure, the higher its priority is set.
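The priority-based selection can be sketched as follows. The model names, the priority values, and the function `select_action` are hypothetical and serve only to illustrate choosing the output of the highest-priority behavior model that actually produced an action.

```python
def select_action(outputs, priority):
    """Return (model, action) from the highest-priority model that output an action.

    outputs:  dict mapping model name -> action, or None if the model is idle
    priority: dict mapping model name -> integer (larger means higher priority)
    """
    candidates = [(model, action) for model, action in outputs.items()
                  if action is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda item: priority[item[0]])


# Hypothetical priorities and outputs for three behavior models.
PRIORITY = {"battery_low": 3, "avoid_obstacle": 2, "express_emotion": 1}
outputs = {"battery_low": None,
           "avoid_obstacle": "turn_left",
           "express_emotion": "wag_tail"}
print(select_action(outputs, PRIORITY))
```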
[0125]
Further, the action switching module 91 notifies the learning module 92, the emotion model 93, and the instinct model 94 that the action has been completed, based on the action completion information provided from the output semantics converter module 79 after the action is completed.
[0126]
The learning module 92, for its part, receives, from among the recognition results given from the input semantics converter module 70, the recognition results of teachings received from the user, such as "scolded" or "praised". Based on such a recognition result and the notification from the behavior switching module 91, the learning module 92 changes the corresponding transition probability of the corresponding behavior model 90₁ to 90ₙ in the behavior model library 90 so as to lower the occurrence probability of the action when "scolded" and raise it when "praised".
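One way such an adjustment could work is sketched below. The text does not specify how the probabilities are changed, so the fixed step size and the renormalization back to 100 [%] are assumptions, as is the function name `adjust`.

```python
def adjust(probabilities, node, feedback, step=5.0):
    """Raise or lower the transition probability toward `node`.

    probabilities: dict mapping destination node -> percent (summing to 100)
    feedback:      "praised" raises the probability, "scolded" lowers it
    The result is rescaled so that the row again sums to 100 (an assumption).
    """
    if feedback not in ("praised", "scolded"):
        return dict(probabilities)
    delta = step if feedback == "praised" else -step
    updated = dict(probabilities)
    updated[node] = min(100.0, max(0.0, updated[node] + delta))
    total = sum(updated.values())
    if total == 0:
        return dict(probabilities)
    return {name: 100.0 * p / total for name, p in updated.items()}


row = {"NODE120": 30.0, "NODE100": 70.0}
print(adjust(row, "NODE120", "praised"))
print(adjust(row, "NODE120", "scolded"))
```

Rescaling keeps each row consistent with the rule that the probabilities in a row sum to 100 [%].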
[0127]
The emotion model 93 holds, for each of a total of six emotions, "joy", "sadness", "anger", "surprise", "disgust", and "fear", a parameter indicating the intensity of that emotion. The emotion model 93 periodically updates the parameter values of these emotions based on specific recognition results such as "scolded" and "praised" given from the input semantics converter module 70, the elapsed time, the notification from the behavior switching module 91, and the like.
[0128]
Specifically, let ΔE[t] be the amount of variation of an emotion at that time, calculated by a predetermined arithmetic expression based on the recognition result given from the input semantics converter module 70, the behavior of the robot apparatus 1 at that time, the elapsed time since the last update, and the like; let E[t] be the current parameter value of the emotion; and let k_e be a coefficient representing the sensitivity of the emotion. The emotion model 93 then calculates the parameter value E[t+1] of the emotion in the next cycle by Equation (1) and updates the parameter value of the emotion by replacing the current parameter value E[t] with the result. The emotion model 93 updates the parameter values of all the emotions in the same manner.
[0129]
(Equation 1)
$$E[t+1] = E[t] + k_e \times \Delta E[t] \tag{1}$$
[0130]
The degree to which each recognition result and the notification from the output semantics converter module 79 affect the variation ΔE[t] of the parameter value of each emotion is determined in advance. For example, a recognition result such as "hit" has a large effect on the variation ΔE[t] of the parameter value of the emotion "anger", and a recognition result such as "stroked" has a large effect on the variation ΔE[t] of the parameter value of the emotion "joy".
[0131]
Here, the notification from the output semantics converter module 79 is so-called action feedback information (action completion information), that is, information on the result of an action having been expressed, and the emotion model 93 also changes the emotions according to such information. For example, an action such as "barking" lowers the emotion level of anger. The notification from the output semantics converter module 79 is also input to the above-described learning module 92, and the learning module 92 changes the corresponding transition probability of the behavior models 90₁ to 90ₙ based on the notification.
[0132]
The feedback of the action result may be made by the output of the action switching module 91 (the action to which the emotion is added).
[0133]
The instinct model 94 holds, for each of four mutually independent desires, "exercise", "affection", "appetite", and "curiosity", a parameter indicating the strength of that desire. The instinct model 94 periodically updates these desire parameter values based on the recognition result given from the input semantics converter module 70, the elapsed time, the notification from the behavior switching module 91, and the like.
[0134]
Specifically, for "exercise", "affection", and "curiosity", let ΔI[k] be the amount of variation of the desire at that time, calculated by a predetermined arithmetic expression based on the recognition result, the elapsed time, the notification from the output semantics converter module 79, and the like; let I[k] be the current parameter value of the desire; and let k_i be a coefficient representing the sensitivity of the desire. The instinct model 94 calculates the parameter value I[k+1] of the desire in the next cycle by Equation (2) at a predetermined cycle and updates the parameter value of the desire by replacing the current parameter value I[k] with the result. The instinct model 94 updates the parameter value of each desire other than "appetite" in the same manner.
[0135]
(Equation 2)
$$I[k+1] = I[k] + k_i \times \Delta I[k] \tag{2}$$
[0136]
The degree to which the recognition result, the notification from the output semantics converter module 79, and the like affect the variation ΔI[k] of the parameter value of each desire is determined in advance. For example, the notification from the output semantics converter module 79 has a large effect on the variation ΔI[k] of the parameter value of "fatigue".
[0137]
In the present embodiment, the parameter values of each emotion and each desire (instinct) are regulated so as to vary within the range of 0 to 100, and the values of the coefficients k_e and k_i are also set individually for each emotion and each desire.
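The periodic update of an emotion or desire parameter can be sketched as follows. The sketch assumes the additive form next = current + sensitivity × variation suggested by the definitions of E[t], ΔE[t], and k_e (and likewise I[k], ΔI[k], and k_i), and it assumes that clamping is how the values are kept within 0 to 100. The sensitivity values and the function names are hypothetical.

```python
def update_parameter(current, delta, sensitivity, low=0.0, high=100.0):
    """Return the next-cycle value: current + sensitivity * delta, kept in [low, high]."""
    nxt = current + sensitivity * delta
    return min(high, max(low, nxt))


# Hypothetical per-emotion sensitivity coefficients (k_e), set individually.
SENSITIVITY = {"joy": 0.8, "anger": 1.2}


def update_all(values, deltas):
    """Update every parameter using its own sensitivity coefficient."""
    return {name: update_parameter(value, deltas.get(name, 0.0), SENSITIVITY[name])
            for name, value in values.items()}


print(update_all({"joy": 50.0, "anger": 90.0}, {"joy": 10.0, "anger": 20.0}))
```

The same helper applies to the desire parameters by supplying k_i values in place of k_e.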
[0138]
Meanwhile, as shown in FIG. 13, the output semantics converter module 79 of the middleware layer 50 gives action commands such as "move forward", "be pleased", "squeal", or "tracking (follow the ball)", which are given from the behavior switching module 91 of the application layer 51 as described above, to the corresponding signal processing modules 72 to 78 of the output system 80.
[0139]
When an action command is given, these signal processing modules 72 to 78 generate, based on the action command, the data to be given to the corresponding actuators 28₁ to 28ₙ (FIG. 11) for performing the action, the audio data of the sound to be output from the speaker 20 (FIG. 11), and/or the drive data to be given to the LEDs of the light emitting unit 25 (FIG. 11), and send these data sequentially to the corresponding actuators 28₁ to 28ₙ, the speaker 20, or the light emitting unit 25 via the virtual robot 43 of the robotic server object 42 and the signal processing circuit 14 (FIG. 11).
[0140]
In this way, the robot apparatus 1 is able to act autonomously, based on the control program, according to its own (internal) situation, the surrounding (external) situation, and instructions and actions from the user.
[0141]
In such a robot device 1, the above-described face detection processing can be performed by the face detection module 33 of the middleware layer 50. FIG. 18 is a block diagram showing the components required to control the behavior of the robot apparatus shown in FIGS. 11 to 15 by face detection.
[0142]
As described above, the image data captured by the CCD camera 22 is stored at a predetermined location in the DRAM 11 and supplied to the virtual robot 43 in the robotic server object 42. The virtual robot 43 reads the image data from the DRAM 11 and supplies it to the face detection module 33 in the middleware layer 50. In the face detection module 33, the face detection processing described in the first and second embodiments is performed, and the processing result is supplied to the behavior model library 90 in the application layer 51 and reflected in the behavior of the robot apparatus.
[0143]
That is, the behavior model library 90 determines a subsequent behavior while referring to the parameter value of the emotion or the parameter value of the desire as needed, and gives the determination result to the behavior switching module 91. Then, the behavior switching module 91 sends a behavior command based on the determination result to the walking module 75 in the output system 80 of the middleware layer 50.
[0144]
When an action command is given, the walking module 75 generates, based on the action command, the data to be given to the corresponding actuators 28₁ to 28ₙ for performing the action, and sends this data sequentially to the corresponding actuators 28₁ to 28ₙ via the virtual robot 43 of the robotic server object 42 and the signal processing circuit 14 (FIG. 11). As a result, the behavior of the robot apparatus 1 is controlled, and an action such as approaching an object is expressed.
[0145]
For example, based on the size, direction, and the like of the face image detected by the face detection processing of the face detection module 33, the robot apparatus 1 can look in the direction of the target having the detected face or move so as to approach the target. The robot apparatus 1 can also be controlled so that it starts moving when the detected face image is a frontal face, and stops moving when it has advanced a predetermined distance, when it determines from the distance data of the distance sensor 23 that it has come within a predetermined range of the target, or when contact is detected by the contact detection module 66.
[0146]
Also, for example, moving object detecting means for detecting a moving object based on the image data acquired by the CCD camera 22 in FIG. 11 may be provided, together with sound source direction estimating means that acquires audio data through audio detection means such as the microphone 24 and estimates the sound source direction from the audio data. The moving object direction based on the position of the moving object detected by the moving object detecting means, or the sound source direction estimated by the sound source direction estimating means, may then be used when controlling movement based on the face detection result.
[0147]
In the above-described embodiment, the case has been described in which the present invention is applied to the quadruped walking robot apparatus 1 configured as shown in FIG. 10. However, the present invention is not limited to this and can be widely applied to robot apparatuses of various other forms and to various apparatuses other than robot apparatuses. For example, the robot apparatus may be a bipedal walking type, and the moving means is not limited to a legged moving system.
[0148]
Further, in the above-described embodiments, the functions have been described as being implemented in software. However, the present invention is not limited to this, and each function may be implemented in hardware.
[0149]
[Effects of the invention]
As described in detail above, the face detection device according to the present invention is a face detection device that detects a face of a target object from an input image, and has correlation calculating means for obtaining a correlation between the input image and a template image of a predetermined size representing an average face image, and determining means for determining, based on the correlation, whether a face image is included in the input image. When the determining means determines that the input image does not include a face image, the correlation calculating means obtains the correlation with the input image using a template image obtained by rotating the template image by a predetermined angle about an axis perpendicular to the screen; when the determining means determines that the input image includes a face image, the correlation calculating means obtains the correlation with the next input image using the template image at the time of that determination. Therefore, even when the face image included in the input image is a non-frontal face that does not face the front, the face image can be detected from the input image.
[0150]
Further, a program according to the present invention causes a computer to execute the above-described face detection processing. According to such a program, the above-described face detection processing can be realized by software.
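The rotate-and-retry matching summarized above can be sketched in software as follows. This is a minimal pure-Python illustration under several assumptions: images are small 2-D lists of gray levels, the predetermined angle is 90 degrees, the correlation is a normalized correlation with a threshold of 0.9, and, when the remembered angle fails, the remaining angles are tried in a fixed order. The class and function names are hypothetical.

```python
from math import sqrt


def rotate90(img):
    """Rotate a 2-D list 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotated(template, angle):
    """Rotate the template by a multiple of 90 degrees."""
    out = template
    for _ in range((angle // 90) % 4):
        out = rotate90(out)
    return out


def correlation(a, b):
    """Normalized correlation between two equal-size 2-D lists."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def best_match(image, template):
    """Slide the template over the image and return the best correlation."""
    th, tw = len(template), len(template[0])
    best = -1.0
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            best = max(best, correlation(window, template))
    return best


class RotatingFaceDetector:
    def __init__(self, template, threshold=0.9, angles=(0, 90, 180, 270)):
        self.template = template
        self.threshold = threshold
        self.angles = angles
        self.r_found = None  # angle at which a face was last detected

    def detect(self, image):
        """Return the angle at which a face is detected, or None."""
        order = list(self.angles)
        if self.r_found is not None:
            # Start with the template orientation that matched last time.
            order.remove(self.r_found)
            order.insert(0, self.r_found)
        for angle in order:
            if best_match(image, rotated(self.template, angle)) >= self.threshold:
                self.r_found = angle
                return angle
        self.r_found = None
        return None


# Hypothetical 3x3 "average face" template and an input rotated by 90 degrees.
TEMPLATE = [[9, 9, 9], [1, 5, 1], [1, 1, 1]]
detector = RotatingFaceDetector(TEMPLATE)
print(detector.detect(rotated(TEMPLATE, 90)))
```

Remembering the angle means that, while the face orientation stays the same from frame to frame, the next input image needs only one matching pass.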
[0151]
Further, the robot apparatus according to the present invention is a robot apparatus that operates based on supplied input information, and has an imaging unit that captures an image and a face detection unit that detects a face of a target object from the input image supplied from the imaging unit. The face detection unit has correlation calculating means for obtaining a correlation between the input image and a template image of a predetermined size representing an average face image, and determining means for determining, based on the correlation, whether a face image is included in the input image. When the determining means determines that the input image does not include a face image, the correlation calculating means obtains the correlation with the input image using a template image obtained by rotating the template image by a predetermined angle about an axis perpendicular to the screen; when the determining means determines that the input image includes a face image, the correlation calculating means obtains the correlation with the next input image using the template image at the time of that determination. Therefore, even when a target object such as a human whose face is to be detected is not facing the front of the robot apparatus, the face of the target object can be detected, and an action corresponding to the face detection result, such as approaching the target object, can be executed.
[Brief description of the drawings]
FIG. 1 is a block diagram schematically showing functions of a face detection module according to a first embodiment of the present invention.
FIG. 2 is a flowchart illustrating a face detection method according to the present embodiment.
FIG. 3 is a flowchart illustrating a face detection method according to a second embodiment of the present invention.
FIG. 4 is a block diagram schematically illustrating functions of a face detection device according to a first application example of the present invention.
FIGS. 5A and 5B are schematic diagrams showing an input image (window image) and a template image, respectively.
FIG. 6 is a diagram showing a matching result image which is a set of correlation values obtained from an input image (window image) and a template image.
FIG. 7 is a flowchart showing processing steps for detecting a pixel as a face candidate from a template matching result image in the first application example of the present invention.
FIG. 8 is a diagram illustrating a result of extracting a face candidate from a matching result image in a template matching unit of the face detection device according to the first application example of the present invention.
FIG. 9 is a diagram showing a result of extracting a matching result image that is equal to or more than a predetermined threshold value as a face candidate.
FIG. 10 is a perspective view illustrating an external configuration of a robot device according to an embodiment of the present invention.
FIG. 11 is a block diagram showing a circuit configuration of the robot device.
FIG. 12 is a block diagram showing a software configuration of the robot device.
FIG. 13 is a block diagram showing a configuration of a middleware layer in a software configuration of the robot apparatus.
FIG. 14 is a block diagram showing a configuration of an application layer in a software configuration of the robot apparatus.
FIG. 15 is a block diagram showing a configuration of a behavior model library of the application layer.
FIG. 16 is a diagram used to explain a finite probability automaton that is information for determining an action of the robot apparatus.
FIG. 17 is a diagram showing a state transition table prepared for each node of the finite probability automaton.
FIG. 18 is a block diagram showing components necessary for controlling the behavior of the robot device shown in FIGS. 11 to 15 by face detection.
[Explanation of symbols]
1 robot apparatus, 10 CPU, 11 DRAM, 14 signal processing circuit, 22 CCD camera, 28₁ to 28ₙ actuators, 33 face detection module, 42 robotic server object, 43 virtual robot, 50 middleware layer, 51 application layer, 68 signal processing module for motion detection, 70 input semantics converter module, 71 recognition system, 73 tracking signal processing module, 75 walking module, 79 output semantics converter module, 80 output system, 90 behavior model library, 91 behavior switching module

Claims

1. A face detection device that detects a face of a target object from an input image, comprising:
correlation calculating means for calculating a correlation between the input image and a template image of a predetermined size representing an average face image; and
determining means for determining, based on the correlation, whether a face image is included in the input image,
wherein, when the determining means determines that the input image does not include a face image, the correlation calculating means calculates the correlation with the input image using a template image obtained by rotating the template image by a predetermined angle about an axis perpendicular to the screen, and when the determining means determines that the input image includes a face image, the correlation calculating means calculates the correlation with the next input image using the template image at the time of that determination.

2. The face detection device according to claim 1, further comprising an extraction unit that extracts the face image when the determination unit determines that the image includes a face image.

3. The face detection device according to claim 1, wherein the face detection device is mounted on a robot device that operates based on supplied input information.

4. The face detection device according to claim 3, wherein the correlation calculating means determines the predetermined angle based on a posture detection result from posture detecting means, provided in the robot device, for detecting a posture of the robot device.

5. A face detection method of a face detection device that detects a face of a target object from an input image, comprising:
a correlation calculation step of calculating a correlation between the input image and a template image of a predetermined size representing an average face image; and
a determination step of determining, based on the correlation, whether a face image is included in the input image,
wherein, in the correlation calculation step, when it is determined in the determination step that the input image does not include a face image, the correlation with the input image is calculated using a template image obtained by rotating the template image by a predetermined angle about an axis perpendicular to the screen, and when it is determined in the determination step that the input image includes a face image, the correlation with the next input image is calculated using the template image at the time of that determination.

6. The face detection method according to claim 5, further comprising an extraction step of extracting the face image when it is determined in the determination step that the input image includes a face image.

7. The face detection method according to claim 5, wherein the face detection device is mounted on a robot device that operates based on supplied input information.

8. The face detection method according to claim 7, wherein, in the correlation calculation step, the predetermined angle is determined based on a posture detection result from posture detection means, provided in the robot device, for detecting a posture of the robot device.

9. A program for causing a computer to execute an operation of detecting a face of a target object from an input image, the program comprising:
a correlation calculation step of calculating a correlation between the input image and a template image of a predetermined size representing an average face image; and
a determination step of determining, based on the correlation, whether a face image is included in the input image,
wherein, in the correlation calculation step, when it is determined in the determination step that the input image does not include a face image, the correlation with the input image is calculated using a template image obtained by rotating the template image by a predetermined angle about an axis perpendicular to the screen, and when it is determined in the determination step that the input image includes a face image, the correlation with the next input image is calculated using the template image at the time of that determination.

10. A robot apparatus that operates based on supplied input information, comprising:
an imaging unit that captures an image; and
a face detection unit that detects a face of a target object from the input image supplied from the imaging unit,
wherein the face detection unit includes a correlation calculation unit that calculates a correlation between the input image and a template image of a predetermined size representing an average face image, and a determining unit that determines, based on the correlation, whether a face image is included in the input image, and wherein, when the determining unit determines that the input image does not include a face image, the correlation calculation unit calculates the correlation with the input image using a template image obtained by rotating the template image by a predetermined angle about an axis perpendicular to the screen, and when the determining unit determines that the input image includes a face image, the correlation calculation unit calculates the correlation with the next input image using the template image at the time of that determination.

11. The robot apparatus according to claim 10, further comprising a posture detection unit that detects the posture of the robot apparatus itself, wherein the face detection unit selects the predetermined angle based on a posture detection result from the posture detection unit.