JP5482412B2

JP5482412B2 - Robot, position estimation method and program

Info

Publication number: JP5482412B2
Application number: JP2010105126A
Authority: JP
Inventors: 岳今井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2010-04-30
Filing date: 2010-04-30
Publication date: 2014-05-07
Anticipated expiration: 2030-04-30
Also published as: JP2011233072A

Description

The present invention relates to a robot, a position estimation method, and a program that estimate the position of a person using a camera capable of panning and tilting.

For a humanoid or pet-type communication robot (hereinafter simply referred to as a robot), it is desirable to make eye contact with the user in order to give the user a sense of familiarity. For the robot to make eye contact with the user, it needs a function that moves the robot's neck so that the robot's eyes meet the user's. This function can be realized, for example, by imaging the user's face with a camera installed at the position of the robot's eyes or nose and controlling the neck so that the user's face is positioned at the center of the captured image.

However, while the robot is communicating with the user, the robot may need to make a gesture, shift its line of sight to something other than the user, or use its neck or camera for another task. Even in such cases, it is desirable for the robot to make eye contact with the user at appropriate moments so that communication with the user remains natural. Depending on the task the robot is performing, however, the user may fall outside the robot's field of view. For the robot to make eye contact smoothly even then, it must estimate the user's position by observing the user, including when the user is out of view or when only part of the user is in view.

Conventionally, a method has been proposed in which the user's position is estimated by sound source detection using an array microphone provided on the robot, and the robot's neck is moved according to the estimated position to make eye contact with the user. With this method, however, the robot cannot estimate the user's position unless the user speaks, and therefore cannot make eye contact. In addition, depending on the environment in which the robot is used, noise such as sound reflections makes it difficult to estimate the user's position accurately.

On the other hand, a method has been proposed in which a particle filter is configured for the user to be observed, with the likelihood given by the degree of coincidence between a silhouette image of a person and a simulation image that simulates the silhouette image (Patent Document 1). However, when the user's position is estimated from images captured by a camera installed at the position of the robot's eyes or nose, the field of view changes with the posture of the robot's neck (that is, the posture of the camera), and so the background changes as well. If the user is tracked by background subtraction, comparing a background image prepared with the neck in one specific posture against an observed image captured with the neck in a different posture, the changed background makes it difficult to estimate the user's position accurately from the background difference. Moreover, obtaining a background difference that allows the user's position to be estimated regardless of the neck posture requires preparing an enormous number of background images in advance and performing an enormous number of comparison operations, which places a heavy load on the processor that executes them.

There is also a method of estimating the user's position by recognizing the user's face in the observed image captured by the camera, but face recognition requires complex computation and places a heavy load on the processor that performs it. In addition, unless the user's face is turned toward the camera within a certain angular range, prominent facial features such as the eyes, nose, and mouth do not appear in the captured image, so the face is not recognized. It is therefore difficult to estimate the user's position accurately from face recognition results.

Techniques for tracking or detecting an object using a color histogram have also been proposed (Patent Document 2, Patent Document 3, and Patent Document 4).

Japanese Patent No. 3664784
Japanese Patent Laid-Open No. 11-136664
JP 2002-247440 A
JP 2006-508461 A

The conventional position estimation methods have the problem that it is difficult to estimate the position of the user being observed both relatively simply and accurately.

Accordingly, an object of the present invention is to provide a robot, a position estimation method, and a program capable of estimating the position of the user being observed relatively simply and accurately.

According to one aspect of the present invention, there is provided a robot comprising: a camera rotatable about at least one axis; an image generation unit that calculates, from an image captured by the camera, an observed value of an image that appears to be the observation target; an image prediction unit that calculates a predicted value of the image that appears to be the observation target, based on past estimates of the position of the observation target and the current posture of the camera; an image comparison unit that compares the observed value with the predicted value and calculates their degree of coincidence as a likelihood; a position estimation unit that estimates the position of the observation target based on the likelihood; an observation policy determination unit that determines an observation policy based on an evaluation of the uncertainty of the position of the observation target estimated by the position estimation unit; and a control unit that controls the rotational position of the camera about the at least one axis based on the observation policy.

According to one aspect of the present invention, there is provided a position estimation method comprising: an image generation step in which an image generation unit calculates, from an image captured by a camera rotatable about at least one axis, an observed value of an image that appears to be the observation target; an image prediction step in which an image prediction unit calculates a predicted value of the image that appears to be the observation target, based on past estimates of the position of the observation target and the current posture of the camera; an image comparison step in which an image comparison unit compares the observed value with the predicted value and calculates their degree of coincidence as a likelihood; a position estimation step in which a position estimation unit estimates the position of the observation target based on the likelihood; an observation policy determination step in which an observation policy determination unit determines an observation policy based on an evaluation of the uncertainty of the position of the observation target estimated in the position estimation step; and a control step in which a control unit controls the rotational position of the camera about the at least one axis based on the observation policy. In the image generation step, a face area detection unit extracts the face area of the person being observed from the image captured by the camera, a foreground area extraction unit extracts a foreground area from the image captured by the camera, and the image generation unit generates a person-like image based on the extracted face area and foreground area.

According to one aspect of the present invention, there is provided a program that causes a computer to execute position estimation processing for estimating the position of an observation target. The program causes the computer to execute: an image generation procedure that calculates, from an image captured by a camera rotatable about at least one axis, an observed value of an image that appears to be the observation target and stores it in a storage unit; an image prediction procedure that calculates a predicted value of the image that appears to be the observation target, based on past estimates of the position of the observation target and the current posture of the camera, and stores it in the storage unit; an image comparison procedure that compares the observed value with the predicted value, calculates their degree of coincidence as a likelihood, and stores it in the storage unit; a position estimation procedure that estimates the position of the observation target based on the likelihood; an observation policy determination procedure that determines an observation policy based on an evaluation of the uncertainty of the position of the observation target estimated by the position estimation procedure; and a control procedure that controls the rotational position of the camera about the at least one axis based on the observation policy. The image generation procedure generates a person-like image based on the extracted face area and foreground area.

According to the disclosed robot, position estimation method, and program, the position of the user being observed can be estimated relatively simply and accurately.

FIG. 1 is a block diagram showing an example of a robot in one embodiment of the present invention.
FIG. 2 is a diagram explaining the calculation of a predicted image using a particle filter.
FIG. 3 is a diagram explaining an example in which a predicted image is generated from a template that represents the face as a circle and the body as an ellipse.
FIG. 4 is a diagram showing an example of a face area and a template area.
FIG. 5 is a flowchart explaining the process of generating a person-like image from an observed image.
FIG. 6 is a diagram showing an example of a color histogram for a block of 20 × 20 pixels.
FIG. 7 is a diagram showing the result of computing a multi-valued foreground-likeness image using the Bhattacharyya distance when extracting the foreground area with color histograms.
FIG. 8 is a flowchart explaining the operation of the robot.
FIG. 9 is a flowchart explaining the processing of steps S13 to S15 of FIG. 8 in more detail.
FIG. 10 is a flowchart explaining the processing of step S24 of FIG. 8 in more detail.
FIG. 11 is a block diagram showing an example of a computer system.

In the disclosed robot, position estimation method, and program, an observed value is calculated for an image that appears to be the observation target (that is, an image estimated to be a person) extracted from an image captured by a camera rotatable about at least one axis. A predicted value of the image that appears to be the observation target is calculated from past estimates of the position of the observation target and the posture of the camera. The position of the observation target is then estimated based on a likelihood calculated by comparing the observed value with the predicted value.

The estimated position of the observation target can be used to control the posture of the camera, based on an evaluation of factors such as the uncertainty of the estimate.

Embodiments of the disclosed robot, position estimation method, and program are described below with reference to the drawings.

FIG. 1 is a block diagram showing an example of a robot in one embodiment of the present invention. In this embodiment, the present invention is applied to a humanoid communication robot.

The communication robot 1 has the following components, connected as shown in FIG. 1: a robot body 11, a face area detection unit 12, a foreground area extraction unit 13, an image generation unit 14 that generates a person-like image, an image prediction unit 15 that predicts a person-like image, an image comparison unit 16 that compares person-like images, a position estimation unit 17 that estimates the position of the person, an observation policy determination unit 18, and a neck control unit 19 that controls the rotation of the neck of the robot 1. Needless to say, the robot body 11 may also have a known movement mechanism (not shown) that allows the robot 1 to walk or travel.

The robot body 11 has a camera 21 capable of panning and tilting, a neck angle (or posture) acquisition unit 22, and a neck rotation drive unit 23. The camera 21 has a known swiveling structure, is provided for example at the position of the eyes or nose of the robot body 11, and captures the image seen from the robot 1. Panning of the camera 21 about the pan axis and tilting about the tilt axis (hereinafter simply referred to as pan/tilt) are controlled by the neck rotation drive unit 23 by a known method. The neck angle acquisition unit 22 acquires the neck angle of the camera 21, that is, the posture of the camera 21, by a known method. The neck angle of the camera 21 includes a pan angle and a tilt angle.

The face area detection unit 12 extracts, from the image captured by the camera 21, the face area of the user with whom the robot 1 is to communicate (that is, the observation target). As described later, the face area detection unit 12 detects an area that appears to be the user's face (that is, a face area); it does not perform personal authentication by recognizing the user's face. The foreground area extraction unit 13 extracts a foreground area from the image captured by the camera 21. The image generation unit 14 generates a person-like image, that is, a user-like image, based on the extracted face area and foreground area. The image prediction unit 15, in turn, predicts a person-like image, that is, a user-like image, based on the neck angle acquired by the neck angle acquisition unit 22 and the past estimates of the person's position from the position estimation unit 17. The image comparison unit 16 compares the image (or observed value) generated by the image generation unit 14 with the predicted image (or predicted value) produced by the image prediction unit 15, and outputs the comparison result to the position estimation unit 17. For example, the image comparison unit 16 has a known particle filter that uses the degree of coincidence between the observed image and the predicted image as the likelihood, and outputs a comparison result including the likelihood to the position estimation unit 17.

A particle filter discretizes the continuous posterior probability density distribution of the state of the observation target. For the state of each member of a sample population called a particle set, it sequentially evaluates the consistency with the observation result and performs resampling. In this way, the posterior probability density distribution simulated by the particles converges to the true probability density distribution.
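The weighting and resampling cycle described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names (`match_likelihood`, `resample`, `estimate`), the particular coincidence score, and the use of a weighted mean as the position estimate are all assumptions made for the example. Each particle is a state tuple, and images are plain nested lists.

```python
import random


def match_likelihood(observed, predicted):
    """Hypothetical coincidence score: mean observed person-likeness
    over the pixels that the predicted template marks as 'person'."""
    total, count = 0.0, 0
    for obs_row, pred_row in zip(observed, predicted):
        for obs, pred in zip(obs_row, pred_row):
            if pred:
                total += obs
                count += 1
    return total / count if count else 0.0


def resample(particles, weights, rng=random):
    """Draw a new particle set with probability proportional to weight."""
    if sum(weights) <= 0:
        return list(particles)  # no information: keep the set unchanged
    return rng.choices(particles, weights=weights, k=len(particles))


def estimate(particles, weights):
    """Weighted mean of the particle states (assumed estimator)."""
    total = sum(weights)
    dims = len(particles[0])
    return tuple(
        sum(w * p[i] for p, w in zip(particles, weights)) / total
        for i in range(dims)
    )
```

A plain weighted mean is only adequate while any angular component of the state stays well away from a wrap-around boundary; a circular mean would be needed otherwise.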

The position estimation unit 17 estimates the position of the person, that is, the position of the user, based on the comparison result from the image comparison unit 16. The observation policy determination unit 18 determines the observation policy for the user, that is, the rules by which the user is to be observed, based on an evaluation of factors such as the uncertainty of the estimated user position. For example, if the observation policy concerns how often the user's position is confirmed, the position is confirmed at intervals of a first time when the distance between the currently estimated user position and the previously estimated user position is within a threshold, and at intervals of a second time, shorter than the first time, when the distance exceeds the threshold. The neck control unit 19 controls the neck rotation drive unit 23 based on the observation policy determined by the observation policy determination unit 18, thereby controlling the posture of the camera 21 so that the user comes into the field of view of the camera 21.
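The confirmation-frequency example above can be written as a small rule. The sketch below is illustrative only: the function name `confirmation_interval`, the representation of positions as Cartesian coordinate tuples, and the units are assumptions, since the text does not specify them.

```python
import math


def confirmation_interval(current, previous, threshold,
                          first_interval, second_interval):
    """Return how often the user's position should be confirmed.

    current, previous: estimated positions as coordinate tuples
    (assumed Cartesian). If the user moved no farther than `threshold`
    since the previous estimate, the longer first interval is used;
    otherwise the shorter second interval is used.
    """
    if second_interval >= first_interval:
        raise ValueError("second interval must be shorter than the first")
    moved = math.dist(current, previous)
    return first_interval if moved <= threshold else second_interval
```

For instance, with a threshold of 0.3 and intervals of 5.0 and 1.0, a user who moved 0.1 would be confirmed every 5.0 time units, and a user who moved 1.0 every 1.0 time unit.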

At least some of the face area detection unit 12, the foreground area extraction unit 13, the image generation unit 14, the image prediction unit 15, the image comparison unit 16, the position estimation unit 17, the observation policy determination unit 18, and the neck control unit 19 may be provided externally to the robot 1. In that case, the externally provided parts are connected to the robot 1 through a suitable interface (not shown) provided on the robot 1. The robot 1 and the externally provided parts may communicate by a known method, for example over a wireless network (not shown). In other words, the neck rotation drive unit 23 and other parts of the robot 1 may be controlled remotely.

The image prediction unit 15 calculates a predicted image of what appears to be the user, based on the neck angle acquired by the neck angle acquisition unit 22, using for example a known particle filter. FIG. 2 is a diagram explaining the calculation of a predicted image using a particle filter. In FIG. 2, the position of the user 30 is represented by the parameters d, h, and θ. The parameter d is the shortest distance from the pan axis of the camera 21 (or neck) of the robot 1 to the user 30; in this example it is the distance to the position of the body 32 of the user 30. The parameter h is the height from the reference position of the camera 21 of the robot 1 to the center of the face 31. The parameter θ is the angle between the line of sight FY of the camera 21 in its home position and the direction of the shortest distance d.

In the example of FIG. 2, the robot 1 is a pet-type communication robot in the shape of a teddy bear. For example, the camera 21 is installed in the bear's eye (or nose), the neck rotation drive unit 23 is provided in the bear's neck, and the neck angle acquisition unit 22 is provided in the bear's torso.

The predicted image can be generated from the estimated position of the face 31 of the user 30 and a human-shaped template representing the silhouette of a person. Let α and β be the number of pixels per degree in the horizontal and vertical directions, (uc, vc) the coordinates of the image center, pan and tilt the pan and tilt angles, a and b the distances from the pan axis and the tilt axis, respectively, to the optical center (or optical axis) of the camera 21, and c the distance between the pan axis and the tilt axis along the optical axis. The coordinates (uf, vf) of the face 31 in the image can then be calculated from the following equations. In this example, the pan axis is perpendicular to the horizontal plane on which the robot 1 is placed, and the tilt axis is parallel to the horizontal plane and orthogonal to the pan axis.

uf = uc + α × atan(d × sin(θ - pan) / (d × cos(θ - pan) - a))
vf = vc + β × atan(d × (atan(h/d) - tilt) / ((d - c) × cos(atan(h/d) - tilt) - b))

In particular, when the parameters a, b, and c are small enough to be ignored, the calculation can be simplified as follows.

uf = uc + α × (θ - pan)
vf = vc + β × (atan(h/d) - tilt)
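As a check on the simplification, the following sketch evaluates the simplified equations for both coordinates and the full horizontal equation, and shows that the two horizontal forms agree when a = 0. The function names are hypothetical, and the code assumes that all angles are given in degrees and that atan results are converted to degrees, because α and β are defined as pixels per degree.

```python
import math


def face_u_full(uc, alpha, d, theta_deg, pan_deg, a):
    """Full horizontal equation, keeping the pan-axis offset a."""
    x = math.radians(theta_deg - pan_deg)
    angle = math.atan(d * math.sin(x) / (d * math.cos(x) - a))
    return uc + alpha * math.degrees(angle)


def face_uv_simplified(uc, vc, alpha, beta, d, h, theta_deg,
                       pan_deg, tilt_deg):
    """Simplified equations, valid when a, b and c are negligible."""
    uf = uc + alpha * (theta_deg - pan_deg)
    vf = vc + beta * (math.degrees(math.atan(h / d)) - tilt_deg)
    return uf, vf


# With a = 0, atan(sin(x) / cos(x)) = x for |x| < 90 degrees, so both
# forms give the same horizontal coordinate.
uf_full = face_u_full(320.0, 10.0, 2.0, 30.0, 10.0, 0.0)
uf_simple, vf_simple = face_uv_simplified(
    320.0, 240.0, 10.0, 10.0, 2.0, 0.5, 30.0, 10.0, 0.0)
```

A nonzero offset a increases the apparent angle for a user who is off the optical axis, which is why the correction matters mainly at short distances d.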

FIG. 3 is a diagram explaining an example in which a predicted image is generated from a template that represents the face as a circle and the body as an ellipse. Strictly speaking, the shape of the person changes with the tilt angle, but in this example only a rough estimate is required, so for convenience of explanation the error caused by the change in shape is assumed to be negligible. The fixed human-shaped template 30A shown in FIG. 3 is anchored at (uf, vf) and has a circular face template portion 31A and an elliptical body template portion 32A. The size of the human-shaped template 30A is varied according to the distance d of the user 30 from the camera 21.
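A binary version of such a template can be rasterized as in the sketch below. Everything numeric here is an assumption for illustration: the face radius at unit distance, the body proportions, the inverse scaling with d, and the convention that the v coordinate increases downward so that the body lies below the face. The function name `human_template` is likewise hypothetical.

```python
def human_template(width, height, uf, vf, d, face_radius_at_unit=40.0):
    """Rasterize a circle (face) plus ellipse (body) template.

    The template is anchored at the face center (uf, vf) and shrinks
    in inverse proportion to the distance d (assumed scaling law).
    Returns a height x width nested list of 0/1 values.
    """
    r = face_radius_at_unit / d          # face radius in pixels
    body_rx, body_ry = 1.5 * r, 3.0 * r  # hypothetical body proportions
    body_v = vf + r + body_ry            # body center, directly below face
    mask = [[0] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            in_face = (u - uf) ** 2 + (v - vf) ** 2 <= r ** 2
            in_body = (((u - uf) / body_rx) ** 2
                       + ((v - body_v) / body_ry) ** 2) <= 1.0
            if in_face or in_body:
                mask[v][u] = 1
    return mask


far = human_template(64, 64, 32, 10, 10.0)
near = human_template(64, 64, 32, 10, 5.0)
```

Parts of the template that fall outside the image are simply clipped, which corresponds to the case where only part of the user is in view.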

When generating an image that represents the degree of person-likeness from the image captured and observed by the camera 21, a combination of several different generation methods is used. This is because, owing to the pan/tilt of the camera 21, prominent features of the user such as the face 31 may not be in the field of view of the camera 21. For example, a person-like image may be generated from the observed image based on at least one of the following indices or criteria JR1 to JR3.

(JR1) When a face area 31B is detected, the face area 31B has an extremely high degree (or probability) of person-likeness.

(JR2) The area of a human-shaped template 30A like the one described above, anchored at the face area 31B, has a high degree (or probability) of person-likeness.

(JR3) An area extracted as foreground by the foreground area extraction unit 13 (that is, a foreground area) has a higher degree (or probability) of person-likeness than an area treated as background (that is, a background area).

FIG. 4 is a diagram showing an example of a face area and a template area. In FIG. 4, when the face area 31B is detected, the human-shaped area 30B, which includes the body area 32B anchored at the image of the face area 31B, is very likely to be a person. The size of the body area 32B (or the human-shaped area 30B) is proportional to the size of the detected face area 31B. A value indicating person-likeness can therefore be determined from the observed image, for example according to Table 1.

FIG. 5 is a flowchart explaining the process of generating a person-like image from an observed image. In FIG. 5, step S1 is executed by the face area detection unit 12 of FIG. 1, which detects the user's face in the image captured by the camera 21 and extracts the face area 31B and the body area 32B. Step S2 is executed by the foreground area extraction unit 13 of FIG. 1, which extracts the foreground area from the image captured by the camera 21. Step S3 is executed by the image generation unit 14 of FIG. 1, which determines the value of each pixel according to Table 1 based on the extracted face area 31B and foreground area, and generates a person-like image, that is, a user-like image.
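Step S3 can be sketched as a per-pixel lookup that applies the criteria JR1 to JR3 in order of priority. The actual values of Table 1 are not reproduced in this text, so the scores below are hypothetical placeholders that only preserve the ordering stated in JR1 to JR3 (face above template, template above foreground, foreground above background). The function name and the mask representation (nested lists of 0/1) are also assumptions.

```python
# Hypothetical person-likeness scores; NOT the values of Table 1.
FACE_SCORE = 1.0        # JR1: detected face area
TEMPLATE_SCORE = 0.7    # JR2: human-shaped template anchored at the face
FOREGROUND_SCORE = 0.4  # JR3: foreground area
BACKGROUND_SCORE = 0.0  # everything else


def person_likeness(face_mask, template_mask, foreground_mask):
    """Combine three binary masks into one person-likeness image."""
    image = []
    for f_row, t_row, g_row in zip(face_mask, template_mask,
                                   foreground_mask):
        row = []
        for f, t, g in zip(f_row, t_row, g_row):
            if f:
                row.append(FACE_SCORE)
            elif t:
                row.append(TEMPLATE_SCORE)
            elif g:
                row.append(FOREGROUND_SCORE)
            else:
                row.append(BACKGROUND_SCORE)
        image.append(row)
    return image
```

When no face is detected, the face and template masks are all zero and the result depends on the foreground mask alone, which is the situation the combination of criteria is meant to cover.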

To extract the foreground area in step S2, captured images covering the entire movable range of the neck of the robot 1 are divided into a suitable number of blocks (or sections), and the feature value of each block is stored in advance, for example in a storage unit (not shown) in the foreground area extraction unit 13. The feature value of each block in the area the camera 21 is currently imaging (that is, the area currently visible from the camera 21) is compared with the feature value of the corresponding block stored in the storage unit, and the parts that differ are treated as the foreground area. The feature value used here should represent the characteristics of the area and should be insensitive to illumination changes, pan/tilt errors, and similar effects. In this example, a color histogram is used as the feature value. A color histogram is obtained by converting the image to, for example, an HSV image (Hue, Saturation, Value) and counting, for the hue value of each pixel, the frequency of occurrence in each of a set of value ranges.
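A hue histogram for one block can be computed with the standard library alone, as in the sketch below. The function name `hue_histogram`, the choice of five equal-width hue bins, the normalization to relative frequencies, and the input format (a flat sequence of 8-bit RGB tuples for the block's pixels) are assumptions made for the example.

```python
import colorsys


def hue_histogram(block_pixels, bins=5):
    """Normalized hue histogram of a block.

    block_pixels: iterable of (r, g, b) tuples with 0-255 components.
    Each pixel is converted to HSV and its hue (0.0-1.0) is counted in
    one of `bins` equal-width ranges.
    """
    counts = [0] * bins
    for r, g, b in block_pixels:
        hue, _sat, _val = colorsys.rgb_to_hsv(r / 255.0, g / 255.0,
                                              b / 255.0)
        index = min(int(hue * bins), bins - 1)  # guard the hue == 1.0 edge
        counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


# A 20 x 20 block that is half pure red and half pure green.
block = [(255, 0, 0)] * 200 + [(0, 255, 0)] * 200
hist = hue_histogram(block)
```

Using hue rather than raw RGB values is one way to reduce sensitivity to brightness changes, since a change in illumination intensity mostly affects the value channel.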

FIG. 6 is a diagram showing an example of a color histogram for a rectangular block of 20 × 20 pixels in a captured image. In FIG. 6, (a) shows the image captured by the camera 21, and (b) shows the color histogram calculated for the rectangular block in (a). In the color histogram, C1 to C5 denote mutually different color components.

背景領域を記憶部に記憶する時には、カメラ２１はロボット１の首の可動範囲で撮像を行い、首の可動範囲全域に対して左右、上下とも一定角度毎に特徴量を記憶部に記憶する。記憶部に記憶する際の角度と撮像した画像の特徴量の対応付けは、現在のパン角及びチルト角（以下、パン・チルト角と言う）と、各ブロックの画像内の位置から決定できる。首の可動範囲に対する撮像画像を記憶するのではなく、特徴量を記憶部に格納するので、撮像画像を記憶する場合と比較すると記憶部に求められる記憶容量を大幅に削減可能となる。 When storing the background area in the storage unit, the camera 21 captures an image within the movable range of the neck of the robot 1 and stores the feature amount in the storage unit at a certain angle both right and left and up and down with respect to the entire movable range of the neck. The association between the angle at the time of storing in the storage unit and the feature amount of the captured image can be determined from the current pan angle and tilt angle (hereinafter referred to as pan / tilt angle) and the position of each block in the image. Rather than storing the captured image for the movable range of the neck, the feature amount is stored in the storage unit, so that the storage capacity required for the storage unit can be greatly reduced as compared to storing the captured image.

一方、前景領域を抽出する際には、記憶部に記憶されている背景領域の特徴量とカメラ２１が現在撮像している画像の特徴量の比較に基づいて抽出する。この時、比較方法として、バタチャリア距離（Bhattacharyya Distance）や正規化相関等を用いることができる。人物らしき画像を生成する際には、この正規化相関の値やバタチャリア距離の値を利用することができる。つまり、例えば色ヒストグラムの比較で、比較に使った尺度（正規化相関やバタチャリア距離）の尺度を適切に変換した値を人物らしさの尺度に使うことができる。 On the other hand, the foreground region is extracted based on a comparison between the feature amount of the background region stored in the storage unit and the feature amount of the image currently captured by the camera 21. The Bhattacharyya distance, the normalized correlation, or the like can be used as the comparison method. The value of the normalized correlation or of the Bhattacharyya distance can also be used when the person-like image is generated. That is, in a comparison of color histograms, for example, a suitably converted value of the measure used for the comparison (normalized correlation or Bhattacharyya distance) can be used as a measure of person-likeness.

例えば、０〜１の多値化画像を求めるには、バタチャリア距離は０〜１の値をとり、バタチャリアの値が０に近い程比較対象である特徴量が似ているため、例えばバタチャリアの値をそのまま多値化画像の画素の値として利用することができる。又、０〜１の多値化画像を求めるには、正規化相関の値は−１〜１の値を取り、１に近い程比較対象である特徴量が似ているため、例えば正規化相関の値を−０．５倍した後に１を加算した値を多値化画像の画素の値として利用することができる。 For example, to obtain a multi-valued image with values from 0 to 1, the Bhattacharyya distance itself takes a value from 0 to 1, and the closer it is to 0, the more similar the compared feature amounts are, so the Bhattacharyya distance can be used directly as the pixel value of the multi-valued image. Likewise, the normalized correlation takes a value from −1 to 1, and the closer it is to 1, the more similar the compared feature amounts are, so a value obtained by multiplying the normalized correlation by −0.5 and then adding 1 can, for example, be used as the pixel value of the multi-valued image.
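
The two comparison measures can be sketched in a few lines. The sketch assumes the form of the Bhattacharyya distance that is bounded to 0 to 1, namely sqrt(1 − BC) with BC the Bhattacharyya coefficient, because that matches the range stated above; the function names are hypothetical. Note also that scaling a correlation in [−1, 1] by −0.5 and adding 1 gives values in [0.5, 1.5], so the sketch uses an offset of 0.5 instead, as an assumption, to land exactly on [0, 1].

```python
import math

def bhattacharyya_distance(p, q):
    """Distance in [0, 1] between two normalized histograms (0 = identical)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))              # guard against rounding below 0

def normalized_correlation(p, q):
    """Zero-mean normalized correlation in [-1, 1] (1 = same shape)."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    num = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mq) ** 2 for b in q))
    return num / den if den else 0.0

def foreground_from_correlation(r):
    """Map r in [-1, 1] to [0, 1]; similar blocks (r near 1) score near 0.

    Assumption: offset 0.5 rather than 1, so the result stays within [0, 1].
    """
    return 0.5 - 0.5 * r

background = [0.5, 0.0, 0.0, 0.5, 0.0]
same = bhattacharyya_distance(background, background)
different = bhattacharyya_distance(background, [0.0, 1.0, 0.0, 0.0, 0.0])
```

With either measure, a value near 0 marks a block that still looks like the stored background and a value near 1 marks a likely foreground block.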

図７は、図６の如き色ヒストグラムを用いた前景領域の抽出において、バタチャリア距離を用いて前景らしき多値化画像を算出した結果を示す図である。図７中、（ａ）はカメラ２１からの撮像画像を示し、（ｂ）は（ａ）の撮像画像について算出した色ヒストグラムと記憶部に記憶されている背景部分の色ヒストグラムの比較の際にバタチャリア距離を用いて算出した多値化画像を示す。ここでは便宜上、多値化画像を黒画素で示している。 FIG. 7 is a diagram showing the result of calculating a foreground-like multi-valued image using the Bhattacharyya distance when the foreground region is extracted with a color histogram such as that of FIG. 6. In FIG. 7, (a) shows a captured image from the camera 21, and (b) shows the multi-valued image calculated with the Bhattacharyya distance by comparing the color histogram calculated for the captured image in (a) with the color histogram of the background stored in the storage unit. For convenience, the multi-valued image is shown here with black pixels.

多値化した前景情報（以下、前景度と言う）から実際に人物らしき画像を決定する方法の一例を表２に示す。この場合の処理は、図５の処理と同様の手順を行えば良い。尚、説明の便宜上、表２では前景度が０〜１に正規化されているものとする。 Table 2 shows an example of a method for actually determining a person-like image from multi-valued foreground information (hereinafter referred to as foreground level). The processing in this case may be performed in the same procedure as the processing in FIG. For convenience of explanation, it is assumed in Table 2 that the foreground level is normalized to 0 to 1.

尚、人領域の色ヒストグラムを記憶部に記憶しておき、カメラ２１からの撮像画像の各ブロック（又は、各領域）内の代表的な色が人領域ではどの程度の頻度（即ち、色頻度）であるかを示す情報を利用することも考えられる。この場合、例えば表３のように人物らしさの値、即ち、顔領域３１Ｂらしさの値及び体領域３２Ｂらしさの値を決定すれば良い。この場合の処理も、図５の処理と同様の手順を行えば良い。表３は、カメラ２１からの撮像画像から人物らしさを決定する方法の一例を示す。尚、説明の便宜上、表３では人領域（即ち、顔領域３１Ｂ及び体領域３２Ｂ）における色頻度は０〜１に正規化されているものとする。 It is also conceivable to store a color histogram of the human area in the storage unit and to use information indicating how frequently the representative color of each block (or each area) of the captured image from the camera 21 occurs in the human area (that is, the color frequency). In this case, the person-likeness values, that is, the likeness values for the face area 31B and for the body area 32B, may be determined as shown in Table 3, for example. The processing in this case may also follow the same procedure as in FIG. 5. Table 3 shows an example of a method for determining person-likeness from the captured image from the camera 21. For convenience of explanation, the color frequency in the human area (that is, the face area 31B and the body area 32B) is assumed in Table 3 to be normalized to 0 to 1.

尚、図１の顔領域抽出部１２に周知の顔識別機能が備えられている場合には、人型テンプレートを顔識別された人物毎に変更しても良い。 If the face area extraction unit 12 of FIG. 1 has a known face identification function, the human template may be changed for each person whose face is identified.

又、上記の尤度を計算する際に、顔識別が利用できる場合には、識別された顔より手前に他の人物がいる確率は低いため、その条件に当てはまるパーティクルの尤度を低くしても良い。 Also, when face identification is available for the likelihood calculation described above, the probability that another person is standing in front of an identified face is low, so the likelihood of particles that satisfy that condition may be lowered.

図１の画像比較部１６における尤度計算を行う際、カメラ２１の前に手をかざした場合等には、撮像画像の全領域が前景と判断されてしまう。このため、顔領域３１Ｂが検出されておらず、且つ、撮像画像中の多くの領域が前景領域である場合には、観測結果を無効として何も観測されていないという扱いとしても良い。 When the image comparison unit 16 of FIG. 1 calculates the likelihood, the entire captured image is judged to be foreground if, for example, a hand is held in front of the camera 21. For this reason, when the face area 31B has not been detected and most of the captured image is foreground, the observation result may be treated as invalid, that is, as if nothing had been observed.
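
The rule for discarding an occluded observation can be sketched as a small predicate. The function name and the 0.8 threshold for "most of the image" are hypothetical; the source does not give a numeric value.

```python
def observation_is_valid(face_detected, foreground_flags, max_foreground_ratio=0.8):
    """Return False when the frame should be treated as 'nothing observed'.

    face_detected: whether a face area was found in the frame.
    foreground_flags: one boolean per block, True if the block is foreground.
    max_foreground_ratio: assumed cut-off for 'most blocks are foreground'.
    """
    if face_detected or not foreground_flags:
        return True
    ratio = sum(1 for f in foreground_flags if f) / len(foreground_flags)
    return ratio < max_foreground_ratio

# A hand covering the lens: no face, 9 of 10 blocks are foreground.
covered = observation_is_valid(False, [True] * 9 + [False])
```

When the predicate returns False, the likelihood update for that frame would simply be skipped.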

撮像画像だけで必要な推定精度が得られない場合には、超音波センサやアレイマイク等により、ロボット１に対してどの方向に人物がいる可能性があるかを示す情報をも尤度計算に利用しても良い。例えば、超音波センサの反応がある方向のパーティクルは尤度が高い。超音波センサやアレイマイク等は、反射を受け取ることもあるが、今回の推定結果と前回までの推定結果等を併用することにより、反射による誤検知を防ぐことができる。 If the required estimation accuracy cannot be obtained with only the captured image, information indicating in which direction a person may be present with respect to the robot 1 is also used for likelihood calculation by an ultrasonic sensor, an array microphone, or the like. May be used. For example, particles in a direction in which the ultrasonic sensor reacts have a high likelihood. Ultrasonic sensors, array microphones, and the like may receive reflections, but erroneous detection due to reflections can be prevented by using the current estimation result and the previous estimation result together.

又、照明条件の大きな変化や、コミュニケーションロボットのようにカメラの姿勢が固定されていない場合は、カメラの設置位置（又は、取り付け位置）が設計位置からずれたりすることがある。このような場合には、位置推定精度が低下しないように、背景の特徴量を定期的に記憶部に記憶し直すことが有効である。このため、背景の特徴量は、人領域である確率が極めて低い領域については、記憶部に記憶し直すようにしても良い。 In addition, when the lighting condition changes greatly or the posture of the camera is not fixed like a communication robot, the installation position (or attachment position) of the camera may deviate from the design position. In such a case, it is effective to periodically store the background feature amount in the storage unit so that the position estimation accuracy does not deteriorate. For this reason, the background feature amount may be stored again in the storage unit for a region having a very low probability of being a human region.

人物の存在確率がある程度ばらついてきたら、その人物にカメラを向けて確認する機能設けても良い。この場合、パーティクルの平均値（即ち、顔の予測値）に向かってロボットの首を動かす。首を動かしている途中で人物が一部カメラの視界に入ったりして推定値が変化した場合には、適宜首の動かし方を変更すれば良い。これにより、カメラの視界に入らない人物の位置の推定を定期的に行い、位置推定精度を一定レベルに保つことができる。 If a person's existence probability varies to some extent, a function may be provided in which a camera is pointed at the person and checked. In this case, the robot's neck is moved toward the average value of the particles (ie, the predicted value of the face). If the estimated value changes due to a person entering the field of view of the camera while moving the neck, the manner of moving the neck may be changed as appropriate. This makes it possible to periodically estimate the position of a person who does not enter the field of view of the camera and maintain the position estimation accuracy at a certain level.

複数の人物を追跡する場合は、複数の人推定プログラムを動作させ、予測画像を複数の画像を足し合わせることにより作成しても良い。 In the case of tracking a plurality of persons, a plurality of person estimation programs may be operated to create a predicted image by adding a plurality of images.

コミュニケーションロボットの場合、通りすがりの人物は検出する必要がなく、コミュニケーションロボットが対面した人物のみの位置を推定すれば良い。このため、カメラの撮像画像中、人物が存在し得ない場所で顔が検出された場合には、新たな人物の推定を開始すれば良い。 In the case of a communication robot, it is not necessary to detect a passing person, and it is only necessary to estimate the position of only the person that the communication robot faces. For this reason, when a face is detected in a place where no person can exist in the captured image of the camera, the estimation of a new person may be started.

又、パーティクルの分布が広がり、確度が落ちてきた場合には、該当する方向の様子を確認する等、確度を改善する対策を取った後に、確度が改善しないようであれば人物が立ち去ったと判断すれば良い。 Also, when the particle distribution spreads and the accuracy drops, measures to improve the accuracy, such as checking the corresponding direction, are taken first; if the accuracy still does not improve, it may be determined that the person has left.

コミュニケーションロボットによる首のかしげ動作等、首がロール軸を中心としてロール方向に回転可能なカメラの場合、適切な座標変換を行った画像を入力にする。例えば、事前に撮像画像中の回転中心を求めておき、回転中心を中心として回転させた撮像画像において予測画像を生成することにより、ロール角がある場合でも人物が追跡可能となる。 In the case of a camera in which the neck can rotate in the roll direction around the roll axis, such as a neck movement by a communication robot, an image with appropriate coordinate transformation is input. For example, by obtaining the rotation center in the captured image in advance and generating the predicted image in the captured image rotated around the rotation center, the person can be tracked even when there is a roll angle.

図８は、ロボット１の動作を説明するフローチャートである。図８において、ステップＳ１１は、例えばデフォルトの設定に応じて首制御部１９により首回転駆動部２３を制御してロボット１の首を動しながら、首の可動範囲全域に関する背景領域の特徴量（以下、背景特徴量とも言う）を算出して記憶部に記憶する。 FIG. 8 is a flowchart for explaining the operation of the robot 1. In FIG. 8, in step S11, the neck control unit 19 controls the neck rotation drive unit 23, for example according to default settings, to move the neck of the robot 1 while the feature amount of the background region (hereinafter also referred to as the background feature amount) is calculated over the entire movable range of the neck and stored in the storage unit.

ステップＳ１２は、カメラ２１からの撮像画像と首角度取得部２２からのパン・チルト角を示す角度位置情報を取得する。ステップＳ１２において、首角度取得部２２は、例えばロボット１の首（又は、カメラ２１）のパン角を制御する首回転駆動部２３内のモータ（図示せず）及びチルト角を制御する首回転駆動部２３内のモータの夫々の回転角度位置を検出する角度センサが出力する角度位置信号から首角度を取得することができる。又、パン角及びチルト角を制御するモータ自体が回転角度位置を示す角度位置信号を出力する機能を備えている場合には、各モータが出力する角度位置信号から首角度を取得すれば良い。モータに回転角度位置を問い合わせる処理を省略する場合には、モータに対する指示（目標回転角度θdistと到達時間duration）とモータ指示時の回転角度θ0、指示時刻t0、現在時刻tから線形補完により近似的に現在の回転角度を求めることもできる。この場合、現在角度θは、例えばθ＝θ0＋（θdist−θ0）×（t−t0）／durationなる式から計算可能である。 In step S12, the captured image from the camera 21 and the angular position information indicating the pan/tilt angle from the neck angle acquisition unit 22 are acquired. In step S12, the neck angle acquisition unit 22 can acquire the neck angle from, for example, the angular position signals output by angle sensors that detect the rotational angle positions of the motor (not shown) in the neck rotation drive unit 23 that controls the pan angle of the neck (or the camera 21) of the robot 1 and of the motor in the neck rotation drive unit 23 that controls the tilt angle. If the motors that control the pan and tilt angles can themselves output angular position signals indicating their rotational angle positions, the neck angle may be acquired from those signals. If querying the motors for their rotational angle positions is to be omitted, the current rotation angle can also be approximated by linear interpolation from the instruction issued to the motor (target rotation angle θdist and arrival time duration), the rotation angle θ0 at the time of the instruction, the instruction time t0, and the current time t. In this case, the current angle θ can be calculated from, for example, θ = θ0 + (θdist − θ0) × (t − t0) / duration.
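
The interpolation formula above translates directly into code. The function name is hypothetical, and clamping the progress to the interval from 0 to 1 (so the estimate never overshoots the target after the motion should have finished) is an added assumption, not something the source states.

```python
def estimate_angle(theta0, theta_dist, t0, duration, t):
    """Approximate the current motor angle without querying the motor.

    Implements theta = theta0 + (theta_dist - theta0) * (t - t0) / duration.
    """
    if duration <= 0:
        return theta_dist  # assumption: a zero-length move is already complete
    progress = (t - t0) / duration
    progress = min(max(progress, 0.0), 1.0)  # assumption: clamp outside the move
    return theta0 + (theta_dist - theta0) * progress

# Commanded at t0 = 0 s to go from 10 deg to 30 deg in 2 s; query at t = 1 s.
halfway = estimate_angle(10.0, 30.0, 0.0, 2.0, 1.0)
```

The same function would be called once for the pan motor and once for the tilt motor.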

ステップＳ１３は、パン・チルト角から現在のカメラ２１の視野におけるブロック毎の背景特徴量を取得して記憶部に記憶する。ステップＳ１３の処理については、より詳細に後述する。ステップＳ１４は、撮像画像を複数のブロックに分割し、ブロック毎の特徴量を計算して記憶部に記憶する。ステップＳ１５は、ステップＳ１３で取得されたブロック単位の背景特徴量とステップＳ１４で計算されたブロック単位の特徴量を比較して前景領域を抽出し、記憶部に記憶する。一方、ステップＳ１６は、カメラ２１からの撮像画像から顔領域３１Ｂを抽出し、記憶部に記憶する。ステップＳ１７は、抽出した顔領域３１Ｂと抽出した前景領域に基づいて人物らしき画像を生成し、記憶部に記憶する。 In step S13, the background feature amount for each block in the field of view of the current camera 21 is acquired from the pan / tilt angle and stored in the storage unit. The process of step S13 will be described in detail later. In step S14, the captured image is divided into a plurality of blocks, a feature amount for each block is calculated and stored in the storage unit. In step S15, the foreground region is extracted by comparing the background feature quantity in block units acquired in step S13 with the feature quantity in block units calculated in step S14, and is stored in the storage unit. On the other hand, in step S16, the face area 31B is extracted from the captured image from the camera 21 and stored in the storage unit. In step S17, a person-like image is generated based on the extracted face area 31B and the extracted foreground area, and stored in the storage unit.

ステップＳ１８は、人物らしき画像の仮説（又は、候補）を一つ選択し、ステップＳ１９は、首回転駆動部２３により首を駆動して所定量動かして状態遷移を発生させる。ステップＳ２０は、パン・チルト角を示す角度位置情報と位置推定部１７からの人物の位置の過去の推定結果に基づいて上記の如く予測画像を生成し、記憶部に記憶する。ステップＳ２１は、少なくとも生成された予測画像と人らしき画像に基づいて尤度を計算し、記憶部に記憶する。ステップＳ２２は、全ての仮説について尤度の計算が終了したか否かを判定し、判定結果がＮＯであると処理はステップＳ１８へ戻る。 In step S18, one hypothesis (or candidate) of an image that looks like a person is selected, and in step S19, the neck is driven by the neck rotation driving unit 23 and moved by a predetermined amount to generate a state transition. In step S20, a predicted image is generated as described above based on the angular position information indicating the pan / tilt angle and the past estimation result of the position of the person from the position estimation unit 17, and is stored in the storage unit. In step S21, the likelihood is calculated based on at least the generated predicted image and the person-like image, and is stored in the storage unit. In step S22, it is determined whether or not the likelihood calculation has been completed for all hypotheses. If the determination result is NO, the process returns to step S18.

一方、ステップＳ２２の判定結果がＹＥＳであると、ステップＳ２３は、上記パーティクルフィルタによるリサンプリング処理を行う。ステップＳ２４は、必要であれば首を動かす制御指示を首回転駆動部２３に出力する。ステップＳ２５は、人物がいる確率が閾値より低い、即ち、確率が極めて低い場所に関して背景領域を更新し、処理はステップＳ１２へ戻る。 On the other hand, if the decision result in the step S22 is YES, a step S23 performs a resampling process by the particle filter. In step S24, a control instruction to move the neck if necessary is output to the neck rotation driving unit 23. In step S25, the background area is updated for a place where the probability that a person is present is lower than the threshold, that is, the probability is extremely low, and the process returns to step S12.
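
The resampling in step S23 can be illustrated as follows. The source only says that the particle filter performs resampling; the multinomial scheme used here (drawing particles with probability proportional to their likelihood) is one common choice and is an assumption, as are the function name and the (θ, d, h) tuple layout.

```python
import random

def resample(particles, likelihoods, rng=random):
    """Draw a new particle set with probability proportional to likelihood.

    particles: list of hypotheses, e.g. (theta, d, h) tuples.
    likelihoods: one non-negative weight per particle (from step S21).
    """
    if sum(likelihoods) <= 0:
        # Assumption: keep the set unchanged if the observation carried no information.
        return list(particles)
    return rng.choices(particles, weights=likelihoods, k=len(particles))

hypotheses = [(0.0, 1.0, 0.3), (10.0, 1.0, 0.3), (20.0, 1.2, 0.4)]
survivors = resample(hypotheses, [0.0, 0.0, 1.0])
```

Hypotheses with high likelihood are duplicated and those with low likelihood tend to disappear, which concentrates the particle set around plausible positions before the next state transition.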

尚、図８において、ステップＳ１６を省略した場合には、ステップＳ１７は抽出した前景領域に基づいて人物らしき画像を生成すれば良い。又、ステップＳ２５を省略しても良い。この場合、ステップＳ２４の後、処理はステップＳ１２へ戻る。 In FIG. 8, if step S16 is omitted, step S17 may generate a person-like image based on the extracted foreground region. Further, step S25 may be omitted. In this case, after step S24, the process returns to step S12.

ところで、ステップＳ１３は、水平方向及び垂直方向の１°当たりの画素数をα、βとすると、画像中の顔３１の座標（uf, vf）は、画像中心の座標を（uc, vc）、パン・チルト角を夫々pan, tilt、パン軸とチルト軸とカメラ２１の光学中心との距離を夫々a, bで示すと、予測画像の生成時と同様に計算することができる。予測画像の生成時には予測の(θ, d, ｈ)から、画像中の点を求める必要があるが、ステップＳ１３では、背景特徴量の記憶の際には距離dが十分に遠い距離であると仮定して、パン・チルトの基準位置からのチルト方向の角度をφとし、背景の(θ, φ)の周辺に対する画像特徴量を記憶する。例えば、dθ，dφ毎に特徴量を記憶する場合、パン・チルトをdθ，dφずつ首の可動範囲内全域で行い、各回転位置（パン・チルト位置）において、撮像画像の中心付近で画像特徴量を計算して記憶部に記憶する。首の可動範囲の限界に達した場合には、可動範囲の外側についてもdθ，dφ毎に画像特徴量を計算して記憶部に記憶する。パン・チルトをdθ，dφより大きく行って複数のブロックの画像特徴量を一度に記憶部に記憶しても良いが、撮像画像の中心付近の画像特徴量を記憶する方が特徴量の比較時に誤差が生じる可能性が低くなる。 In step S13, let α and β be the numbers of pixels per 1° in the horizontal and vertical directions, (uc, vc) the coordinates of the image center, pan and tilt the pan and tilt angles, and a and b the distances from the pan axis and the tilt axis to the optical center of the camera 21; the coordinates (uf, vf) of the face 31 in the image can then be calculated in the same way as when the predicted image is generated. When the predicted image is generated, a point in the image must be obtained from the predicted (θ, d, h). In step S13, however, the distance d is assumed to be sufficiently large when the background feature amount is stored, the angle in the tilt direction from the pan/tilt reference position is denoted φ, and the image feature amount for the neighborhood of (θ, φ) in the background is stored. For example, when a feature amount is stored for every dθ and dφ, the neck is panned and tilted in steps of dθ and dφ over its entire movable range, and at each rotational position (pan/tilt position) the image feature amount is calculated near the center of the captured image and stored in the storage unit. When the limit of the movable range of the neck is reached, image feature amounts are also calculated for every dθ and dφ for the area outside the movable range and stored in the storage unit. The pan/tilt steps may be made larger than dθ and dφ so that the image feature amounts of several blocks are stored at once, but storing feature amounts taken near the center of the captured image makes errors less likely when the feature amounts are compared.

パン・チルト角（pan, tilt）から、撮像画像中の各ブロックの画像特徴量を取得するには、記憶部に記憶している（θ，φ）毎の特徴量と現在の撮像画像中の座標（u, v）との対応を求める必要がある。この例では、a, bを無視できるくらい十分にdが長いと仮定し、以下の式により求める。 In order to obtain the image feature amount of each block in the captured image from the pan / tilt angle (pan, tilt), the feature amount for each (θ, φ) stored in the storage unit and the current captured image It is necessary to find the correspondence with the coordinates (u, v). In this example, it is assumed that d is long enough to ignore a and b, and is obtained by the following equation.

u = uc + α × (θ - pan)
v = vc + β × (φ - tilt)

上記の式を(θ,φ)を求める式に変換すると、以下の式が得られる。 When the above equation is converted into an equation for obtaining (θ, φ), the following equation is obtained.

θ = (u - uc) / α + pan
φ = (v - vc) / β + tilt

これにより、撮像画像内の座標（u, v）における、記憶部に記憶されている（θ,φ）の画像特徴量を取得できる。この（θ,φ）に対応する座標（u, v）付近で画像特徴量を計算して、取得した画像特徴量と比較する。（θ,φ）に対する特徴量は、隣接する（θ＋dθ,φ＋dφ）の特徴量と重複するブロックから計算しても良い。 This makes it possible to acquire, for the coordinates (u, v) in the captured image, the image feature amount stored in the storage unit for (θ, φ). An image feature amount is calculated in the vicinity of the coordinates (u, v) corresponding to this (θ, φ) and compared with the acquired image feature amount. The feature amount for (θ, φ) may be calculated from a block that overlaps the block used for the adjacent (θ + dθ, φ + dφ).
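
The pair of conversions between stored background angles (θ, φ) and image coordinates (u, v) can be written as two small functions. This is a sketch under the same far-distance assumption as the text (a and b ignored); the function names and the example camera parameters are hypothetical.

```python
def angle_to_pixel(theta, phi, pan, tilt, uc, vc, alpha, beta):
    """u = uc + alpha * (theta - pan), v = vc + beta * (phi - tilt)."""
    return uc + alpha * (theta - pan), vc + beta * (phi - tilt)

def pixel_to_angle(u, v, pan, tilt, uc, vc, alpha, beta):
    """theta = (u - uc) / alpha + pan, phi = (v - vc) / beta + tilt."""
    return (u - uc) / alpha + pan, (v - vc) / beta + tilt

# Assumed example: 320 x 240 image, 8 pixels per degree on both axes,
# neck currently at pan = 10 deg, tilt = -5 deg.
cam = dict(pan=10.0, tilt=-5.0, uc=160.0, vc=120.0, alpha=8.0, beta=8.0)
u, v = angle_to_pixel(12.5, -2.0, **cam)
theta, phi = pixel_to_angle(u, v, **cam)
```

Applying one function after the other returns the starting values, which is a quick way to check that α, β and the image center are used consistently in both directions.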

図９は、撮像画像中の各ブロックの背景特徴量を取得して前景を抽出する図８のステップＳ１３〜Ｓ１５の処理をより詳細に説明するフローチャートである。ブロックの分割方法は、各座標（u, v）を中心として動的に決定しても、予めブロックに分割しておいて各座標（u, v）を含むブロックの特徴を求めるようにしても良い。 FIG. 9 is a flowchart for explaining in more detail the processing in steps S13 to S15 in FIG. 8 for acquiring the background feature amount of each block in the captured image and extracting the foreground. The block division method may be determined dynamically around each coordinate (u, v), or may be divided into blocks in advance to obtain the feature of the block including each coordinate (u, v). good.

図９において、ステップＳ３１は、カメラ２１の視野内の（θ，φ）の範囲を求める。ステップＳ３２は、記憶部に記憶されている範囲内の離散値（θ，φ）に対応する座標（u, v）を求める。ステップＳ３３は、求めた座標（u, v）の中から一つの座標を選択する。ステップＳ３４は、選択した座標（u, v）に対する背景特徴量を対応する（θ，φ）から取得する。ステップＳ３５は、撮像画像中の座標（u, v）の周辺領域に対する特徴量を計算する。ステップＳ３６は、背景特徴量と撮像画像中の特徴量を比較して、座標（u, v）が含まれるブロックが前景領域か否かを決定するか、或いは、前景領域である確率を示す前景度を求める。ステップＳ３７は、ステップＳ３２で抽出した全ての（u, v）についての処理が終了したか否かを判定し、判定結果がＮＯであると処理はステップＳ３３へ戻る。一方、ステップＳ３７の判定結果がＹＥＳであると、処理は終了する。 In FIG. 9, step S31 obtains the range of (θ, φ) within the field of view of the camera 21. Step S32 obtains the coordinates (u, v) corresponding to the discrete values (θ, φ) stored in the storage unit that fall within that range. Step S33 selects one of the obtained coordinates (u, v). Step S34 acquires the background feature amount for the selected coordinates (u, v) from the corresponding (θ, φ). Step S35 calculates the feature amount for the region surrounding the coordinates (u, v) in the captured image. Step S36 compares the background feature amount with the feature amount in the captured image and either decides whether the block containing the coordinates (u, v) is a foreground region or obtains a foreground level indicating the probability that it is a foreground region. Step S37 determines whether all the coordinates (u, v) extracted in step S32 have been processed; if the determination result is NO, the process returns to step S33. If the determination result in step S37 is YES, the process ends.
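
The loop of FIG. 9 can be sketched as a single function. The background store is modeled as a dictionary keyed by (θ, φ), and the feature extractor and distance measure are passed in as callables; all of these names and data structures are assumptions made for the illustration.

```python
def foreground_map(background, pan, tilt, width, height, alpha, beta,
                   feature_at, distance):
    """Return {(theta, phi): foreground level} for stored points now in view.

    background: dict mapping (theta, phi) -> stored background feature.
    feature_at: callable (u, v) -> feature of the current image around (u, v).
    distance: callable (stored, current) -> value in [0, 1], 0 = same.
    """
    uc, vc = width / 2.0, height / 2.0
    levels = {}
    for (theta, phi), stored in background.items():   # S32/S33: each stored point
        u = uc + alpha * (theta - pan)
        v = vc + beta * (phi - tilt)
        if not (0 <= u < width and 0 <= v < height):  # S31: outside the field of view
            continue
        current = feature_at(u, v)                    # S35: feature around (u, v)
        levels[(theta, phi)] = distance(stored, current)  # S34 + S36
    return levels

# Toy data: two-bin features and half the L1 distance as the measure.
store = {(0, 0): [1.0, 0.0], (5, 0): [0.0, 1.0], (90, 0): [1.0, 0.0]}
l1_half = lambda p, q: 0.5 * sum(abs(a - b) for a, b in zip(p, q))
levels = foreground_map(store, 0.0, 0.0, 320, 240, 8.0, 8.0,
                        lambda u, v: [1.0, 0.0], l1_half)
```

In the toy data the point at θ = 90° lies outside the field of view and is skipped, the point at (0, 0) matches the stored background, and the point at (5, 0) differs completely.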

図１０は、図８のステップＳ２４の処理をより詳細に説明するフローチャートである。図１０において、ステップＳ４１は、現在推定中の全ての人物に対して、推定値θの分散を計算する。ステップＳ４２は、全ての人物の中で最も推定値θの分散が大きい人物とその分散を求める。ステップＳ４３は、推定値θの分散が閾値より大きいか否かを判定する。ステップＳ４３の判定結果がＹＥＳであると、ステップＳ４４は、観測対象である人物の推定値（平均推定値）の（θ，d，ｈ）を求める。ステップＳ４５は、パン・チルト角を夫々θ，atan(h/d)に制御するように制御指示を首回転駆動部２３に出力する。ステップＳ４５の後、或いは、ステップＳ４３の判定結果がＮＯであると、処理は終了する。 FIG. 10 is a flowchart for explaining the process of step S24 of FIG. 8 in more detail. In FIG. 10, step S41 calculates the variance of the estimated value θ for all persons currently estimated. In step S42, the person having the largest variance of the estimated value θ among all the persons and its variance are obtained. Step S43 determines whether or not the variance of the estimated value θ is larger than the threshold value. If the decision result in the step S43 is YES, a step S44 obtains (θ, d, h) of the estimated value (average estimated value) of the person to be observed. In step S45, a control instruction is output to the neck rotation drive unit 23 so as to control the pan / tilt angle to θ and atan (h / d), respectively. After step S45 or when the determination result in step S43 is NO, the process ends.
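
Steps S41 to S45 can be sketched as follows. The sketch assumes that each tracked person is represented by a list of (θ, d, h) particles with θ in degrees, that θ does not wrap around within a particle set (so a plain mean and variance are meaningful), and that d is positive, in which case `atan2(h, d)` equals the atan(h/d) of the text. The function name and return convention are hypothetical.

```python
import math

def choose_gaze(people, var_threshold):
    """Return (pan_deg, tilt_deg) for the most uncertain person, or None.

    people: dict mapping a person id to a list of (theta_deg, d, h) particles.
    var_threshold: variance of theta above which the robot should look.
    """
    def mean(values):
        return sum(values) / len(values)

    def theta_variance(particles):
        m = mean([p[0] for p in particles])
        return mean([(p[0] - m) ** 2 for p in particles])

    if not people:
        return None
    target = max(people, key=lambda k: theta_variance(people[k]))  # S41, S42
    if theta_variance(people[target]) <= var_threshold:            # S43
        return None
    particles = people[target]                                     # S44
    theta = mean([p[0] for p in particles])
    d = mean([p[1] for p in particles])
    h = mean([p[2] for p in particles])
    return theta, math.degrees(math.atan2(h, d))                   # S45

tracked = {
    "a": [(0.0, 1.0, 1.0), (20.0, 1.0, 1.0)],   # theta variance 100
    "b": [(5.0, 2.0, 0.5), (5.0, 2.0, 0.5)],    # theta variance 0
}
gaze = choose_gaze(tracked, var_threshold=25.0)
```

A return value of None corresponds to the NO branch of step S43, in which no control instruction is issued.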

尚、ロボット１の目を向けてアイコンタクトを取らせる観測対象であるユーザの決定方法については、必ずしも推定値θの分散が最大のユーザに決定するではなく、別の注目するユーザを優先する等、ロボット１の用途や使用環境に応じて適切に決定すれば良い。上記の例では、推定値θの分散が閾値より大きいか否かに応じてロボット１にパン・チルトを行わせるか否かを決定しているが、例えば最後に顔領域３１Ｂが検出されてからの経過時間等、様々な条件に応じてロボット１のパン・チルトを制御可能である。 Note that the user to be observed, that is, the user toward whom the robot 1 turns its eyes to make eye contact, need not always be the user with the largest variance of the estimated value θ; the choice may be made as appropriate for the application and operating environment of the robot 1, for example by giving priority to another user of interest. In the above example, whether the robot 1 pans and tilts is decided by whether the variance of the estimated value θ exceeds a threshold, but the pan/tilt of the robot 1 can also be controlled according to various other conditions, such as the time elapsed since the face area 31B was last detected.

図１１は、コンピュータシステムの一例を示すブロック図である。図１１に示すコンピュータシステム１００は、ＣＰＵ１０１、記憶部１０２、インタフェース（Ｉ／Ｆ）１０３、入力装置１０４、及び表示部１０５がバス１０６により接続された構成を有する。ＣＰＵ１０１は、記憶部１０２に格納されたプログラムを実行することによりコンピュータシステム１００全体を制御する。記憶部１０２は、半導体記憶装置、磁気記録媒体、光記録媒体、光磁気記録媒体等で形成可能であり、上記のプログラムや各種データを格納すると共に、ＣＰＵ１０１が実行する演算の中間結果や演算結果等を一時的に格納する一時メモリとしても機能する。Ｉ／Ｆ１０３は、カメラからの撮像画像を受信したり、記憶部１０２に格納するデータをネットワーク（図示せず）から受信することができる。入力装置１０４は、キーボード等により形成可能である。表示部１０５は、ディスプレイ等により形成可能である。入力装置１０４及び表示部１０５は、タッチパネルのように入力装置と表示部の両方の機能を有する入出力装置で形成しても良い。入力装置１０４は、ユーザがロボット１に指示等を入力する必要がない場合には省略可能であり、表示部１０５は、ロボット１がユーザに対してメッセージ等を表示する必要がない場合には省略可能である。又、表示部１０５の代わりに、音声出力部（図示せず）を設けてユーザに対するメッセージ等を音声で出力するようにしても良いことは言うまでもない。 FIG. 11 is a block diagram illustrating an example of a computer system. The computer system 100 illustrated in FIG. 11 has a configuration in which a CPU 101, a storage unit 102, an interface (I/F) 103, an input device 104, and a display unit 105 are connected by a bus 106. The CPU 101 controls the entire computer system 100 by executing a program stored in the storage unit 102. The storage unit 102 can be formed of a semiconductor storage device, a magnetic recording medium, an optical recording medium, a magneto-optical recording medium, or the like. It stores the above program and various data, and also functions as a temporary memory that temporarily holds intermediate and final results of the calculations executed by the CPU 101. The I/F 103 can receive captured images from the camera and can receive data to be stored in the storage unit 102 from a network (not shown). The input device 104 can be formed by a keyboard or the like. The display unit 105 can be formed by a display or the like. The input device 104 and the display unit 105 may be formed by an input/output device that has the functions of both, such as a touch panel. The input device 104 can be omitted when the user does not need to input instructions or the like to the robot 1, and the display unit 105 can be omitted when the robot 1 does not need to display messages or the like to the user. Needless to say, a voice output unit (not shown) may be provided instead of the display unit 105 to output messages to the user by voice.

ＣＰＵ１０１は、記憶部１０２に格納されたプログラムを実行することにより、コンピュータシステム１００をロボットとして機能させる。つまり、プログラムは、ＣＰＵ１０１にロボットの各部の機能を実現させる。言い換えると、プログラムは、ＣＰＵ１０１に少なくともロボットの位置推定処理の手順を実行させるものであり、記憶部１０２を含む適切なコンピュータ読み取り可能な記憶媒体に格納されていても良い。従って、ＣＰＵ１０１は、図８乃至図１０の処理を実行可能である。 The CPU 101 causes the computer system 100 to function as a robot by executing a program stored in the storage unit 102. That is, the program causes the CPU 101 to realize the function of each unit of the robot. In other words, the program causes the CPU 101 to execute at least the procedure of the robot's position estimation process, and may be stored in any suitable computer-readable storage medium, including the storage unit 102. The CPU 101 can therefore execute the processes of FIGS. 8 to 10.

上記実施例によれば、コミュニケーションロボットは任意のタイミングで任意のユーザとアイコンタクトを取れるので、コミュニケーションロボットとユーザ間のコミュニケーションがスムーズに行え、より自然な対人動作（Interactive Operation）が可能となる。ここで、コミュニケーションとは、言語による対話に限らない。ユーザの推定位置の確度を保つために、コミュニケーションロボットが時々ユーザに視線を向ける動作も、例えば幼児が遊びながら時折母親の存在を確認する動作に似た動作となり、ロボットの動作が非常に自然になる。又、コミュニケーションロボットの動作に対するユーザの表情等をすぐ確認することができる等、コミュニケーションに必要なセンシング（Sensing）性能が向上し、コミュニケーションロボットの基本性能の大幅な向上につながる。 According to the above embodiment, the communication robot can make eye contact with any user at any time, so communication between the communication robot and the user proceeds smoothly and more natural interactive operation becomes possible. Here, communication is not limited to verbal dialogue. The action of occasionally turning its line of sight toward the user, which the communication robot performs to maintain the accuracy of the user's estimated position, also resembles, for example, the way an infant occasionally checks for its mother while playing, and so makes the robot's behavior very natural. In addition, the sensing performance required for communication improves, for example because the user's facial expression in response to the robot's actions can be checked immediately, which leads to a significant improvement in the basic performance of the communication robot.

ところで、上記実施例では、カメラはパン・チルト可能であるため、直交する２軸（例えば、ｘｙ座標系のｘ軸及びｙ軸）を中心に回転可能であるか、或いは、パン・チルト・ロール可能であるため、互いに直交する３軸（例えば、ｘｙｚ座標系のｘ軸、ｙ軸及びｚ軸）を中心に回転可能である。しかし、ユーザの位置を推定できるのであれば、カメラは少なくとも１軸を中心に回転可能であれば良い。 In the above embodiment, since the camera can pan and tilt, it can rotate around two orthogonal axes (for example, the x axis and the y axis of the xy coordinate system), or pan, tilt, and roll. Therefore, it is possible to rotate around three axes orthogonal to each other (for example, the x axis, the y axis, and the z axis of the xyz coordinate system). However, as long as the position of the user can be estimated, it is sufficient that the camera can rotate about at least one axis.

以上の実施例を含む実施形態に関し、更に以下の付記を開示する。
（付記１）
少なくとも１軸を中心に回転可能なカメラと、
前記カメラの撮像画像から観測対象と推定される観測対象らしき画像の観測値を計算する画像生成部と、
観測対象の位置の過去の推定結果と現在の前記カメラの姿勢に基づいて観測対象らしき画像の予測値を計算する画像予測部と、
前記観測値と前記予測値を比較して前記観測値と前記予測値の一致度を尤度として計算する画像比較部と、
前記尤度に基づいて前記観測対象の位置を推定する位置推定部を備えた、ロボット。
（付記２）
前記カメラからの撮像画像から観測対象である人物の顔領域を抽出する顔領域検出部と、
前記カメラからの撮像画像から前景領域を抽出する前景領域抽出部を更に備え、
前記画像生成部は、抽出された顔領域及び前景領域に基づいて人物らしき画像を生成する、付記１記載のロボット。
（付記３）
前記前景領域抽出部は、前記前景領域を抽出する際、前記カメラの可動範囲全域において撮像画像を分割した各ブロックの特徴量を記憶部に予め記憶しておき、前記カメラの現在の撮像画像の各ブロックの特徴量と前記記憶部に予め記憶してある該当ブロックの特徴量を比較し、差異がある部分を前景領域とする、付記２記載のロボット。
（付記４）
前記前景領域抽出部は、前記記憶部に記憶されている背景領域の特徴量と前記カメラの現在の撮像画像の特徴量の比較に基づいて前記前景領域を抽出する際に、バタチャリア距離（Bhattacharyya Distance）又は正規化相関を用い、人物らしき画像を生成する際には前記正規化相関の値又はバタチャリア距離の値をそのまま多値化画像の画素の値として利用する、付記３記載のロボット。
（付記５）
前記位置推定部により推定された前記観測対象の位置の不確実さの評価に基づいて観測方針を決定する観測方針決定部を更に備えた、付記１乃至４のいずれか１項記載のロボット。
（付記６）
前記観測方針に基づいて前記カメラの前記少なくとも１軸に対する回転位置を制御する制御部を更に備えた、付記５記載のロボット。
（付記７）
人物の存在確率がある程度ばらついてきたら、前記制御部を制御することで顔の予測値に向かって前記カメラを動かす、付記６記載のロボット。
（付記８）
少なくとも１軸を中心に回転可能なカメラの撮像画像から観測対象と推定される観測対象らしき画像の観測値を画像生成部により計算する画像生成工程と、
観測対象の位置の過去の推定結果と現在の前記カメラの姿勢に基づいて観測対象らしき画像の予測値を画像予測部により計算する画像予測工程と、
画像比較部により前記観測値と前記予測値を比較して前記観測値と前記予測値の一致度を尤度として計算する画像比較工程と、
前記尤度に基づいて前記観測対象の位置を位置推定部により推定する位置推定工程を含み、
前記画像生成工程は、前記カメラからの撮像画像から観測対象である人物の顔領域を顔領域検出部により抽出し、前記カメラからの撮像画像から前景領域を前景領域抽出部により抽出し、抽出された顔領域及び前景領域に基づいて前記画像生成部により人物らしき画像を生成する、位置推定方法。
（付記９）
前記前景領域を抽出する際、前記カメラの可動範囲全域において撮像画像を分割した各ブロックの特徴量を記憶部に予め記憶しておき、前記カメラの現在の撮像画像の各ブロックの特徴量と前記記憶部に予め記憶してある該当ブロックの特徴量を比較し、差異がある部分を前景領域とする、付記８記載の位置推定方法。
（付記１０）
前記記憶部に記憶されている背景領域の特徴量と前記カメラの現在の撮像画像の特徴量の比較に基づいて前記前景領域を抽出する際に、バタチャリア距離（Bhattacharyya Distance）又は正規化相関を用い、人物らしき画像を生成する際には前記正規化相関の値又はバタチャリア距離の値をそのまま多値化画像の画素の値として利用する、付記９記載の位置推定方法。
（付記１１）
前記位置推定工程により推定された前記観測対象の位置の不確実さの評価に基づいて観測方針を観測方針決定部により決定する観測方針決定工程を更に含む、付記８乃至１０のいずれか１項記載の位置推定方法。
（付記１２）
前記観測方針に基づいて前記カメラの前記少なくとも１軸に対する回転位置を制御部により制御する制御工程を更に含む、付記１１記載の位置推定方法。
（付記１３）
コンピュータに観測対象の位置を推定させる位置推定処理を実行させるプログラムであって、
少なくとも１軸を中心に回転可能なカメラの撮像画像から観測対象と推定される観測対象らしき画像の観測値を計算して記憶部に格納する画像生成手順と、
観測対象の位置の過去の推定結果と現在の前記カメラの姿勢に基づいて観測対象らしき画像の予測値を計算して前記記憶部に格納する画像予測手順と、
前記観測値と前記予測値を比較して前記観測値と前記予測値の一致度を尤度として計算して前記記憶部に記憶する画像比較手順と、
前記尤度に基づいて前記観測対象の位置を推定する位置推定手順と、
前記カメラからの撮像画像から観測対象である人物の顔領域を抽出して前記記憶部に記憶する顔領域抽出手順と、
前記カメラからの撮像画像から前景領域を抽出して前記記憶部に記憶する前景領域抽出手順
を前記コンピュータに実行させ、
前記画像生成手順は、抽出された顔領域及び前景領域に基づいて人物らしき画像を生成する、プログラム。
（付記１４）
前記前景領域抽出手順は、前記前景領域を抽出する際、前記カメラの可動範囲全域において撮像画像を分割した各ブロックの特徴量を記憶部に予め記憶しておき、前記カメラの現在の撮像画像の各ブロックの特徴量と前記記憶部に予め記憶してある該当ブロックの特徴量を比較し、差異がある部分を前景領域とする、付記１３記載のプログラム。
（付記１５）
前記前景領域抽出手順は、前記記憶部に記憶されている背景領域の特徴量と前記カメラの現在の撮像画像の特徴量の比較に基づいて前記前景領域を抽出する際に、バタチャリア距離（Bhattacharyya Distance）又は正規化相関を用い、人物らしき画像を生成する際には前記正規化相関の値又はバタチャリア距離の値をそのまま多値化画像の画素の値として利用する、付記１４記載のプログラム。
（付記１６）
前記位置推定手順により推定された前記観測対象の位置の不確実さの評価に基づいて観測方針を決定する観測方針決定手順
を更に前記コンピュータに実行させる、付記１３乃至１５のいずれか１項記載のプログラム。
（付記１７）
前記観測方針に基づいて前記カメラの前記少なくとも１軸に対する回転位置を制御する制御手順
を更に前記コンピュータに実行させる、付記１６記載のプログラム。 The following additional notes are further disclosed with respect to the embodiment including the above examples.
(Appendix 1)
A camera rotatable about at least one axis;
An image generation unit that calculates an observation value of an image that seems to be an observation target estimated from an image captured by the camera;
An image prediction unit that calculates a predicted value of an image that seems to be an observation target based on the past estimation result of the position of the observation target and the current posture of the camera;
An image comparison unit that compares the observed value with the predicted value and calculates the degree of coincidence between the observed value and the predicted value as a likelihood;
A robot comprising a position estimation unit that estimates the position of the observation target based on the likelihood.
(Appendix 2)
The robot according to appendix 1, further comprising:
a face area detection unit that extracts a face area of a person to be observed from a captured image from the camera; and
a foreground region extraction unit that extracts a foreground region from a captured image from the camera,
wherein the image generation unit generates a person-like image based on the extracted face area and foreground region.
(Appendix 3)
The robot according to appendix 2, wherein, when extracting the foreground region, the foreground region extraction unit stores in advance, in a storage unit, a feature amount of each block obtained by dividing captured images over the entire movable range of the camera, compares the feature amount of each block of the current captured image of the camera with the feature amount of the corresponding block stored in advance in the storage unit, and sets a portion having a difference as the foreground region.
(Appendix 4)
The robot according to appendix 3, wherein the foreground region extraction unit uses a Bhattacharyya distance or a normalized correlation when extracting the foreground region based on a comparison between the feature amount of the background region stored in the storage unit and the feature amount of the current captured image of the camera, and, when generating the person-like image, uses the normalized correlation value or the Bhattacharyya distance value as is as the pixel value of a multi-valued image.
(Appendix 5)
The robot according to any one of appendices 1 to 4, further comprising an observation policy determination unit that determines an observation policy based on an evaluation of the uncertainty of the position of the observation target estimated by the position estimation unit.
(Appendix 6)
The robot according to appendix 5, further comprising a control unit that controls a rotational position of the camera with respect to the at least one axis based on the observation policy.
(Appendix 7)
The robot according to appendix 6, wherein, when the presence probability of the person varies to some extent, the control unit moves the camera toward the predicted value of the face.
(Appendix 8)
An image generation step of calculating an observation value of an image that seems to be an observation object estimated from an image captured by a camera rotatable around at least one axis by an image generation unit;
An image prediction step of calculating a predicted value of an image that seems to be an observation target by an image prediction unit based on a past estimation result of the position of the observation target and the current posture of the camera;
An image comparison step of comparing the observed value and the predicted value by an image comparing unit and calculating the degree of coincidence of the observed value and the predicted value as a likelihood;
Including a position estimation step of estimating the position of the observation target by a position estimation unit based on the likelihood,
A position estimation method wherein, in the image generation step, a face area of a person to be observed is extracted from a captured image from the camera by a face area detection unit, a foreground area is extracted from a captured image from the camera by a foreground area extraction unit, and a person-like image is generated by the image generation unit based on the extracted face area and foreground area.
(Appendix 9)
The position estimation method according to appendix 8, wherein, when extracting the foreground region, a feature amount of each block obtained by dividing captured images over the entire movable range of the camera is stored in advance in a storage unit, the feature amount of each block of the current captured image of the camera is compared with the feature amount of the corresponding block stored in advance in the storage unit, and a portion having a difference is set as the foreground region.
(Appendix 10)
The position estimation method according to appendix 9, wherein a Bhattacharyya distance or a normalized correlation is used when extracting the foreground region based on a comparison between the feature amount of the background region stored in the storage unit and the feature amount of the current captured image of the camera, and, when the person-like image is generated, the normalized correlation value or the Bhattacharyya distance value is used as is as the pixel value of a multi-valued image.
(Appendix 11)
The position estimation method according to any one of appendices 8 to 10, further including an observation policy determination step of determining, by an observation policy determination unit, an observation policy based on an evaluation of the uncertainty of the position of the observation target estimated in the position estimation step.
(Appendix 12)
The position estimation method according to appendix 11, further comprising a control step of controlling, by a control unit, a rotational position of the camera relative to the at least one axis based on the observation policy.
(Appendix 13)
A program for causing a computer to execute a position estimation process of estimating the position of an observation target, the process including:
An image generation procedure for calculating an observation value of an image that seems to be an observation object from a captured image of a camera that can rotate about at least one axis and storing the observation value in a storage unit;
An image prediction procedure for calculating a predicted value of an image that seems to be an observation target based on the past estimation result of the position of the observation target and the current posture of the camera, and storing it in the storage unit,
An image comparison procedure for comparing the observed value with the predicted value and calculating the degree of coincidence between the observed value and the predicted value as a likelihood and storing it in the storage unit;
A position estimation procedure for estimating the position of the observation target based on the likelihood;
A face area extraction procedure for extracting a face area of a person to be observed from a captured image from the camera and storing it in the storage unit;
Causing the computer to execute a foreground area extraction procedure for extracting a foreground area from a captured image from the camera and storing the foreground area in the storage unit;
wherein the image generation procedure generates a person-like image based on the extracted face area and foreground area.
(Appendix 14)
The program according to appendix 13, wherein, when extracting the foreground region, the foreground region extraction procedure stores in advance, in the storage unit, a feature amount of each block obtained by dividing captured images over the entire movable range of the camera, compares the feature amount of each block of the current captured image of the camera with the feature amount of the corresponding block stored in advance in the storage unit, and sets a portion having a difference as the foreground region.
(Appendix 15)
The program according to appendix 14, wherein the foreground region extraction procedure uses a Bhattacharyya distance or a normalized correlation when extracting the foreground region based on a comparison between the feature amount of the background region stored in the storage unit and the feature amount of the current captured image of the camera, and, when the person-like image is generated, the normalized correlation value or the Bhattacharyya distance value is used as is as the pixel value of a multi-valued image.
(Appendix 16)
The program according to any one of appendices 13 to 15, further causing the computer to execute an observation policy determination procedure for determining an observation policy based on an evaluation of the uncertainty of the position of the observation target estimated by the position estimation procedure.
(Appendix 17)
The program according to appendix 16, further causing the computer to execute a control procedure for controlling a rotational position of the camera relative to the at least one axis based on the observation policy.
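
The block-wise foreground extraction described in appendices 3 and 4 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each block's feature amount is a small histogram, and it uses the bounded form sqrt(1 - BC) of the Bhattacharyya distance (the Hellinger form, where BC is the Bhattacharyya coefficient) so that each value lies in [0, 1]. The function names and the histogram representation are hypothetical.

```python
import math

def normalize(hist):
    """Scale a histogram so its bins sum to 1 (an empty histogram stays all-zero)."""
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else [0.0] * len(hist)

def bhattacharyya_distance(p, q):
    """Bounded variant sqrt(1 - BC) for two normalized histograms (assumed form)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))

def foreground_map(background_blocks, current_blocks):
    """Compare each current block with the stored background block.

    The distance is kept as is, giving one multi-valued 'pixel' per block
    instead of a thresholded foreground/background bit.
    """
    return [
        bhattacharyya_distance(normalize(bg), normalize(cur))
        for bg, cur in zip(background_blocks, current_blocks)
    ]

if __name__ == "__main__":
    stored = [[4, 0, 0, 0], [1, 1, 1, 1]]    # background histograms per block
    current = [[4, 0, 0, 0], [0, 0, 0, 8]]   # second block has changed
    print(foreground_map(stored, current))
```

Keeping the raw distance rather than a binary mask means that blocks differing only slightly from the background contribute only weakly to the person-like image.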

以上、開示の位置推定方法、ロボット及びプログラムを実施例により説明したが、本発明は上記実施例に限定されるものではなく、本発明の範囲内で種々の変形及び改良が可能であることは言うまでもない。 The disclosed position estimation method, robot, and program have been described above by way of examples. Needless to say, the present invention is not limited to these examples, and various modifications and improvements are possible within the scope of the present invention.
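
As one possible reading of the estimation loop in appendices 1, 5 and 6, the sketch below reweights a set of candidate positions by how well a predicted person-like image matches the observed one, then chooses an observation policy from the spread of the result. Everything specific here is an assumption made for illustration: the one-dimensional image, the Gaussian-shaped prediction, the exponential likelihood, the variance threshold, and the policy names.

```python
import math

def predict_image(position, camera_pan, width=8, spread=1.0):
    """Hypothetical predicted person-like image: a 1-D bump where the person would appear."""
    center = position - camera_pan + width / 2
    return [math.exp(-((x - center) ** 2) / (2 * spread ** 2)) for x in range(width)]

def likelihood(observed, predicted):
    """Degree of coincidence: 1 for a perfect match, decaying with mean squared error."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.exp(-mse / 0.05)

def estimate(hypotheses, weights, observed, camera_pan):
    """Reweight position hypotheses by likelihood; return (mean, variance, new_weights)."""
    scored = [w * likelihood(observed, predict_image(h, camera_pan))
              for h, w in zip(hypotheses, weights)]
    total = sum(scored)
    new_weights = [s / total for s in scored] if total > 0 else list(weights)
    mean = sum(h * w for h, w in zip(hypotheses, new_weights))
    var = sum(w * (h - mean) ** 2 for h, w in zip(hypotheses, new_weights))
    return mean, var, new_weights

def observation_policy(variance, threshold=1.0):
    """Assumed rule: look at the target again only when the estimate is too uncertain."""
    return "observe_target" if variance > threshold else "continue_task"

if __name__ == "__main__":
    hyps, w = [1.0, 3.0, 5.0], [1 / 3] * 3
    seen = predict_image(3.0, camera_pan=4.0)   # person actually at position 3
    mean, var, _ = estimate(hyps, w, seen, camera_pan=4.0)
    print(mean, var, observation_policy(var))
```

When the observation contains no person-like response, the weights stay nearly uniform, the variance stays large, and the assumed policy asks for the camera to be turned back toward the target.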

１コミュニケーションロボット
１１ロボット本体
１２顔領域検出部
１３前景領域抽出部
１４画像生成部
１５画像予測部
１６画像比較部
１７位置推定部
１８観測方針決定部
１９首制御部
２１カメラ
２２首角度取得部
２３首回転駆動部
１０１ＣＰＵ
１０２記憶部
DESCRIPTION OF SYMBOLS
1 Communication robot
11 Robot main body
12 Face area detection unit
13 Foreground area extraction unit
14 Image generation unit
15 Image prediction unit
16 Image comparison unit
17 Position estimation unit
18 Observation policy determination unit
19 Neck control unit
21 Camera
22 Neck angle acquisition unit
23 Neck rotation drive unit
101 CPU
102 Storage unit

Claims

A camera rotatable about at least one axis;
An image generation unit that calculates an observation value of an image that seems to be an observation target estimated from an image captured by the camera;
An image prediction unit that calculates a predicted value of an image that seems to be an observation target based on the past estimation result of the position of the observation target and the current posture of the camera;
An image comparison unit that compares the observed value with the predicted value and calculates the degree of coincidence between the observed value and the predicted value as a likelihood;
A position estimation unit that estimates the position of the observation target based on the likelihood;
An observation policy determination unit that determines an observation policy based on an evaluation of the uncertainty of the position of the observation target estimated by the position estimation unit;
A robot comprising a control unit that controls a rotational position of the camera with respect to the at least one axis based on the observation policy.

The robot according to claim 1, further comprising:
a face area detection unit that extracts a face area of a person to be observed from a captured image from the camera; and
a foreground region extraction unit that extracts a foreground region from a captured image from the camera,
wherein the image generation unit generates a person-like image based on the extracted face area and foreground region.

The robot according to claim 2, wherein, when extracting the foreground region, the foreground region extraction unit stores in advance, in a storage unit, a feature amount of each block obtained by dividing captured images over the entire movable range of the camera, compares the feature amount of each block of the current captured image of the camera with the feature amount of the corresponding block stored in advance in the storage unit, and sets a portion having a difference as the foreground region.

An image generation step of calculating an observation value of an image that seems to be an observation object estimated from an image captured by a camera rotatable around at least one axis by an image generation unit;
An image prediction step of calculating a predicted value of an image that seems to be an observation target by an image prediction unit based on a past estimation result of the position of the observation target and the current posture of the camera;
An image comparison step of comparing the observed value and the predicted value by an image comparing unit and calculating the degree of coincidence of the observed value and the predicted value as a likelihood;
A position estimation step of estimating the position of the observation target based on the likelihood by a position estimation unit;
An observation policy determination step for determining an observation policy by an observation policy determination unit based on an evaluation of the uncertainty of the position of the observation target estimated by the position estimation step;
A control step of controlling a rotational position of the camera relative to the at least one axis based on the observation policy by a control unit;
A position estimation method wherein, in the image generation step, a face area of a person to be observed is extracted from a captured image from the camera by a face area detection unit, a foreground area is extracted from a captured image from the camera by a foreground area extraction unit, and a person-like image is generated by the image generation unit based on the extracted face area and foreground area.

A program for causing a computer to execute a position estimation process of estimating the position of an observation target, the process including:
An image generation procedure for calculating an observation value of an image that seems to be an observation object from a captured image of a camera that can rotate about at least one axis and storing the observation value in a storage unit;
An image prediction procedure for calculating a predicted value of an image that seems to be an observation target based on the past estimation result of the position of the observation target and the current posture of the camera, and storing it in the storage unit,
An image comparison procedure for comparing the observed value with the predicted value and calculating the degree of coincidence between the observed value and the predicted value as a likelihood and storing it in the storage unit;
A position estimation procedure for estimating the position of the observation target based on the likelihood;
A face area extraction procedure for extracting a face area of a person to be observed from a captured image from the camera and storing it in the storage unit;
A foreground area extraction procedure for extracting a foreground area from a captured image from the camera and storing it in the storage unit;
An observation policy determination procedure for determining an observation policy based on an evaluation of the uncertainty of the position of the observation target estimated by the position estimation procedure;
Causing the computer to execute a control procedure for controlling a rotational position of the camera relative to the at least one axis based on the observation policy;
wherein the image generation procedure generates a person-like image based on the extracted face area and foreground area.