JP2003331266A - Interactive agent apparatus, interactive agent method, and interactive agent program

Info

Publication number: JP2003331266A
Application number: JP2002140457A
Authority: JP
Inventors: Shiro Takada; 司郎高田; Shinjiro Kawato; 慎二郎川戸; Kenji Mase; 健二間瀬
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2002-05-15
Filing date: 2002-05-15
Publication date: 2003-11-21

Abstract

PROBLEM TO BE SOLVED: To provide an interactive agent apparatus, an interactive agent method, and an interactive agent program that can identify the direction of a user's face at a high recognition rate.

SOLUTION: A first camera 1 takes a picture of the user. A computer 9 for image processing detects the distance between the user's eyes in the image taken by the first camera 1. A computer 10 for voice processing makes a speaker 7 perform speaking processing. A computer 11 for overall control determines whether the user faces the front based on how the distance detected by the computer 9 for image processing changes in response to the speaking processing.

COPYRIGHT: (C)2004,JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a dialogue agent device, a dialogue agent method, and a dialogue agent program used, for example, to take photographs in cooperation with a user.

[0002]

[Prior Art] Photographs, videotapes, and the like have long been used as media for recording experiences. With the spread of digital cameras, users also edit the images they have taken according to their own intentions.

[0003] Meanwhile, photography devices known as "Print Club" are installed in amusement facilities, large stores, and the like, and are widely used by relatively young people. With such a photography device, the user selects a preferred background from various backgrounds prepared in advance, is photographed together with that background, and the photograph is printed out on the spot.

[0004]

[Problems to Be Solved by the Invention] However, in a photography device of this kind, photographs are taken when the user operates the operation buttons in accordance with operating instructions given by voice or on a display. Because the photography device is merely a machine that takes photographs as the user operates it, users lose interest after repeated use.

[0005] Meanwhile, research is advancing on agents that achieve a given goal while interacting with humans autonomously and cooperatively. Applying such agent technology to the photography device described above is expected to make it possible to take photographs in line with the user's intention while understanding that intention. This could attract the interest of a wide range of age groups. The range of applications of the photography device is also expected to expand beyond simply taking photographs for fun.

[0006] To take photographs in cooperation with the user, it is first important to have the user face the camera. Image processing techniques can be used to identify the orientation of the user's face. However, conventional image processing techniques cannot identify the orientation of the user's face with a high recognition rate.

[0007] Next, it is important to identify the number and positions of the users. Image processing techniques can also be used for this purpose. However, it is sometimes difficult to distinguish people from other objects, such as luggage, by image processing. In addition, the recognition rate for the number of people drops when one person is partially or completely hidden behind another.

[0008] Furthermore, when a photograph is taken, it is desirable to shoot with a layout that matches the user's intention without relying on manual operation by the user.

[0009] An object of the present invention is to provide a dialogue agent device, a dialogue agent method, and a dialogue agent program capable of identifying the orientation of a user's face with a high recognition rate.

[0010] Another object of the present invention is to provide a dialogue agent device, a dialogue agent method, and a dialogue agent program capable of identifying the number and positions of users with a high recognition rate.

[0011] Still another object of the present invention is to provide a dialogue agent device, a dialogue agent method, and a dialogue agent program with which photographs can easily be taken with a layout that matches the user's intention.

[0012]

[Means for Solving the Problems and Effects of the Invention] The dialogue agent device according to the first invention is a dialogue agent device that identifies the orientation of a user's face, comprising: imaging means for imaging the user; detection means for detecting the distance between the user's eyes in the image obtained by the imaging means; utterance means for speaking to the user; and determination means for determining the orientation of the user's face based on the change in the distance between the eyes that the detection means detects in association with the utterance by the utterance means.

[0013] In the dialogue agent device according to the present invention, the user is imaged, and the distance between the user's eyes is detected in the image obtained by the imaging. An utterance is made to the user, and the orientation of the user's face is determined based on the change in the distance between the eyes detected in association with the utterance.

[0014] By combining the detection of the distance between the user's eyes in the image with an utterance in this way, whether or not the user's face is turned to the front can be identified with a high recognition rate, based on the change in the distance between the eyes that accompanies the utterance.
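The determination described in paragraphs [0012] to [0014] can be illustrated with a minimal sketch. It is not the decision rule of the invention, which this part of the description does not specify. It assumes a simple projection model in which the inter-eye distance seen by the camera is largest when the face is frontal, so that an increase after the utterance suggests the user turned toward the camera. The function name `judge_face_turn`, the `tolerance` value, and the three labels are hypothetical.

```python
from statistics import median


def judge_face_turn(before_px, after_px, tolerance=0.1):
    """Classify how the apparent inter-eye distance changed around an utterance.

    before_px / after_px: inter-eye distances (pixels) measured in frames
    captured before and after the utterance. The tolerance and the three
    labels are illustrative assumptions, not values from the source.
    """
    if not before_px or not after_px:
        raise ValueError("need at least one measurement on each side")
    base = median(before_px)  # the median damps single-frame detection noise
    if base <= 0:
        raise ValueError("distances must be positive")
    ratio = median(after_px) / base
    if ratio >= 1.0 + tolerance:
        # A wider projected distance suggests the face turned toward the camera.
        return "turned_to_front"
    if ratio <= 1.0 - tolerance:
        # A narrower projected distance suggests the face turned away.
        return "turned_away"
    return "unchanged"
```

For example, `judge_face_turn([40, 41, 39], [50, 52, 51])` classifies the change as a turn toward the front, because the median distance grows from 40 to 51 pixels.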

[0015] The dialogue agent method according to the second invention is a dialogue agent method for identifying the orientation of a user's face, comprising the steps of: imaging the user; detecting the distance between the user's eyes in the image obtained by the imaging; speaking to the user; and determining the orientation of the user's face based on the change in the distance between the eyes detected in association with the utterance.

[0016] In the dialogue agent method according to the present invention, the user is imaged, and the distance between the user's eyes is detected in the image obtained by the imaging. An utterance is made to the user, and the orientation of the user's face is determined based on the change in the distance between the eyes detected in association with the utterance.

[0017] By combining the detection of the distance between the user's eyes in the image with an utterance in this way, whether or not the user's face is turned to the front can be identified with a high recognition rate, based on the change in the distance between the eyes that accompanies the utterance.

[0018] The dialogue agent program according to the third invention is a computer-readable dialogue agent program for identifying the orientation of a user's face, and causes a computer to execute: a process of imaging the user; a process of detecting the distance between the user's eyes in the image obtained by the imaging; a process of speaking to the user; and a process of determining the orientation of the user's face based on the change in the distance between the eyes detected in association with the utterance.

[0019] In the dialogue agent program according to the present invention, the user is imaged, and the distance between the user's eyes is detected in the image obtained by the imaging. An utterance is made to the user, and the orientation of the user's face is determined based on the change in the distance between the eyes detected in association with the utterance.

[0020] By combining the detection of the distance between the user's eyes in the image with an utterance in this way, whether or not the user's face is turned to the front can be identified with a high recognition rate, based on the change in the distance between the eyes that accompanies the utterance.

[0021] The dialogue agent device according to the fourth invention is a dialogue agent device that identifies the number and positions of users, comprising: imaging means for imaging the users; estimation means for estimating the number of users under a predetermined judgment condition by image processing based on the image obtained by the imaging means; utterance means for speaking to the users based on the number of users estimated by the estimation means; recognition means for recognizing the users' reply to the utterance by the utterance means; and determination means for determining the number and positions of the users by switching the judgment condition of the estimation means based on the reply recognized by the recognition means and by causing the estimation of the number of users by the estimation means, the utterance by the utterance means, the recognition of the reply by the recognition means, and the imaging by the imaging means to be performed.

[0022] In the dialogue agent device according to the present invention, the users are imaged, and the number of users is estimated under a predetermined judgment condition by image processing based on the image obtained by the imaging. An utterance is made to the users based on the estimated number, and the users' reply to the utterance is recognized. The judgment condition is switched based on the recognized reply, and the estimation of the number of users, the utterance, the recognition of the reply, and the imaging are performed, whereby the number and positions of the users are determined.

[0023] By combining dialogue processing with image recognition processing that uses different judgment conditions in this way, the number and positions of the users can be identified with a high recognition rate.
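The loop described in paragraphs [0021] to [0023] (estimate, ask, recognize the reply, switch the judgment condition, and try again) can be sketched as follows. This is only an illustration under stated assumptions: the judgment condition is modelled as a minimum region area, which is a hypothetical choice, and `capture`, `conditions`, and `confirm` are placeholder names standing in for the imaging, estimation, and dialogue steps.

```python
def identify_users(capture, conditions, confirm):
    """Estimate, confirm by dialogue, and re-estimate under the next condition.

    capture:    callable returning [(x_position, area_px), ...] candidate regions
    conditions: minimum-area thresholds to try in order (hypothetical rule)
    confirm:    callable(count) -> bool; asks the users and recognizes the reply
    """
    for min_area in conditions:
        regions = capture()  # take a fresh image for every attempt
        # Keep only regions large enough to count as a person under this condition.
        positions = [x for x, area in regions if area >= min_area]
        if confirm(len(positions)):
            return len(positions), positions
    return None  # no condition was confirmed by the users
```

With candidate regions `[(120, 5000), (300, 4800), (420, 900)]` and thresholds `[500, 2000]`, the first attempt proposes three people. If the users answer no, the stricter threshold drops the small region (for instance a bag) and the second attempt proposes two.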

[0024] The dialogue agent method according to the fifth invention is a dialogue agent method for identifying the number and positions of users, comprising the steps of: imaging the users; estimating the number of users under a predetermined judgment condition by image processing based on the image obtained by the imaging; speaking to the users based on the estimated number of users; recognizing the users' reply to the utterance; and determining the number and positions of the users by switching the judgment condition based on the recognized reply and by causing the estimation of the number of users, the utterance, the recognition of the reply, and the imaging to be performed.

[0025] In the dialogue agent method according to the present invention, the users are imaged, and the number of users is estimated under a predetermined judgment condition by image processing based on the image obtained by the imaging. An utterance is made to the users based on the estimated number, and the users' reply to the utterance is recognized. The judgment condition is switched based on the recognized reply, and the estimation of the number of users, the utterance, the recognition of the reply, and the imaging are performed, whereby the number and positions of the users are determined.

[0026] By combining dialogue processing with image recognition processing that uses different judgment conditions in this way, the number and positions of the users can be identified with a high recognition rate.

[0027] The dialogue agent program according to the sixth invention is a computer-readable dialogue agent program for identifying the number and positions of users, and causes a computer to execute: a process of imaging the users; a process of estimating the number of users under a predetermined judgment condition by image processing based on the image obtained by the imaging; a process of speaking to the users based on the estimated number of users; a process of recognizing the users' reply to the utterance; and a process of determining the number and positions of the users by switching the judgment condition based on the recognized reply and by causing the estimation of the number of users, the utterance, the recognition of the reply, and the imaging to be performed.

[0028] In the dialogue agent program according to the present invention, the users are imaged, and the number of users is estimated under a predetermined judgment condition by image processing based on the image obtained by the imaging. An utterance is made to the users based on the estimated number, and the users' reply to the utterance is recognized. The judgment condition is switched based on the recognized reply, and the estimation of the number of users, the utterance, the recognition of the reply, and the imaging are performed, whereby the number and positions of the users are determined.

[0029] By combining dialogue processing with image recognition processing that uses different judgment conditions in this way, the number and positions of the users can be identified with a high recognition rate.

[0030] The dialogue agent device according to the seventh invention is a dialogue agent device that photographs a user, comprising: first imaging means for imaging the user; second imaging means for imaging the user; first control means for controlling the imaging direction of the first imaging means so that the image of the user is located at a fixed position in the image obtained by the first imaging means, and for controlling the imaging direction of the second imaging means in conjunction with the control of the imaging direction of the first imaging means; recognition means for recognizing spoken instructions from the user; and second control means for controlling the second imaging means so that an image conforming to a layout instruction is obtained when the recognition means recognizes an image layout instruction, and for controlling the second imaging means so that a still image is captured by the second imaging means when the recognition means recognizes an instruction to take a photograph.

[0031] In the dialogue agent device according to the present invention, the user is imaged using the first imaging means and is also imaged using the second imaging means. The imaging direction of the first imaging means is controlled so that the image of the user is located at a fixed position in the image obtained by the first imaging means, and the imaging direction of the second imaging means is controlled in conjunction with the control of the imaging direction of the first imaging means. Spoken instructions from the user are recognized. When an image layout instruction is recognized, the second imaging means is controlled so that an image conforming to the layout instruction is obtained; when an instruction to take a photograph is recognized, the second imaging means is controlled so that a still image is captured by the second imaging means.

[0032] By combining image recognition processing and dialogue processing in this way, photographs can easily be taken, in cooperation with the user, with the layout the user intends. In addition, because the first imaging means tracks the user, the second imaging means is interlocked with the first imaging means, and the second imaging means is controlled in accordance with the user's spoken layout instructions, photographs can be taken with the layout the user intends even if the user moves.
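The interlocked control described in paragraphs [0030] to [0032] can be sketched as a single update step. This is an illustration under stated assumptions, not the control law of the invention: a proportional correction keeps the user at a fixed target position in the first camera's image, the same correction is applied to the second camera, and a recognized layout command adds an offset to the second camera only. The command words, the offsets, the `deg_per_pixel` gain, and the sign conventions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PanTilt:
    pan: float = 0.0   # degrees, positive = right (assumed convention)
    tilt: float = 0.0  # degrees, positive = up (assumed convention)


# Hypothetical layout commands: extra offset applied only to the second camera.
LAYOUT_OFFSETS = {
    "left": (-2.0, 0.0),
    "right": (2.0, 0.0),
    "up": (0.0, 2.0),
    "down": (0.0, -2.0),
}


def update_cameras(cam1, cam2, user_xy, target_xy, command=None, deg_per_pixel=0.05):
    """Apply one tracking step; return True when a still image should be taken."""
    # Pixel error of the user relative to the fixed target position (image y grows downward).
    dx = user_xy[0] - target_xy[0]
    dy = user_xy[1] - target_xy[1]
    d_pan, d_tilt = dx * deg_per_pixel, -dy * deg_per_pixel
    # The first camera tracks the user; the second camera follows the same correction.
    for cam in (cam1, cam2):
        cam.pan += d_pan
        cam.tilt += d_tilt
    if command in LAYOUT_OFFSETS:
        # A layout instruction changes the framing of the second camera only.
        off_pan, off_tilt = LAYOUT_OFFSETS[command]
        cam2.pan += off_pan
        cam2.tilt += off_tilt
        return False
    return command == "shoot"
```

Because the layout offset accumulates on the second camera while both cameras keep receiving the same tracking correction, the requested framing is preserved as the user moves.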

[0033] The dialogue agent method according to the eighth invention is a dialogue agent method for photographing a user, comprising the steps of: imaging the user using first imaging means; imaging the user using second imaging means; controlling the imaging direction of the first imaging means so that the image of the user is located at a fixed position in the image obtained by the first imaging means, and controlling the imaging direction of the second imaging means in conjunction with the control of the imaging direction of the first imaging means; recognizing spoken instructions from the user; and controlling the second imaging means so that an image conforming to a layout instruction is obtained when an image layout instruction is recognized, and controlling the second imaging means so that a still image is captured by the second imaging means when an instruction to take a photograph is recognized.

[0034] In the dialogue agent method according to the present invention, the user is imaged using the first imaging means and is also imaged using the second imaging means. The imaging direction of the first imaging means is controlled so that the image of the user is located at a fixed position in the image obtained by the first imaging means, and the imaging direction of the second imaging means is controlled in conjunction with the control of the imaging direction of the first imaging means. Spoken instructions from the user are recognized. When an image layout instruction is recognized, the second imaging means is controlled so that an image conforming to the layout instruction is obtained; when an instruction to take a photograph is recognized, the second imaging means is controlled so that a still image is captured by the second imaging means.

[0035] By combining image recognition processing and dialogue processing in this way, photographs can easily be taken, in cooperation with the user, with the layout the user intends. In addition, because the first imaging means tracks the user, the second imaging means is interlocked with the first imaging means, and the second imaging means is controlled in accordance with the user's spoken layout instructions, photographs can be taken with the layout the user intends even if the user moves.

[0036] The dialogue agent program according to the ninth invention is a computer-readable dialogue agent program for photographing a user, and causes a computer to execute: a process of imaging the user using first imaging means; a process of imaging the user using second imaging means; a process of controlling the imaging direction of the first imaging means so that the image of the user is located at a fixed position in the image obtained by the first imaging means, and of controlling the imaging direction of the second imaging means in conjunction with the control of the imaging direction of the first imaging means; a process of recognizing spoken instructions from the user; and a process of controlling the second imaging means so that an image conforming to a layout instruction is obtained when an image layout instruction is recognized, and of controlling the second imaging means so that a still image is captured by the second imaging means when an instruction to take a photograph is recognized.

[0037] In the dialogue agent program according to the present invention, the user is imaged using the first imaging means and is also imaged using the second imaging means. The imaging direction of the first imaging means is controlled so that the image of the user is located at a fixed position in the image obtained by the first imaging means, and the imaging direction of the second imaging means is controlled in conjunction with the control of the imaging direction of the first imaging means. Spoken instructions from the user are recognized. When an image layout instruction is recognized, the second imaging means is controlled so that an image conforming to the layout instruction is obtained; when an instruction to take a photograph is recognized, the second imaging means is controlled so that a still image is captured by the second imaging means.

[0038] By combining image recognition processing and dialogue processing in this way, photographs can easily be taken, in cooperation with the user, with the layout the user intends. In addition, because the first imaging means tracks the user, the second imaging means is interlocked with the first imaging means, and the second imaging means is controlled in accordance with the user's spoken layout instructions, photographs can be taken with the layout the user intends even if the user moves.

[0039]

[Embodiments of the Invention] FIG. 1 is a block diagram showing the configuration of a dialogue agent device according to an embodiment of the present invention. FIG. 2 is a schematic diagram for explaining the operation of the dialogue agent device of FIG. 1. The dialogue agent device of FIG. 1 is a photography device used to take photographs in cooperation with a user.

[0040] The dialogue agent device of FIG. 1 includes a first camera 1, a second camera 2, and a third camera 3. The first to third cameras 1 to 3 are video cameras. The first camera 1 is mounted so that a drive device 4 can swing it left and right and up and down. Similarly, the second camera 2 is mounted so that a drive device 5 can swing it left and right and up and down.

[0041] As shown in FIG. 2, the first camera 1 and the second camera 2 are arranged so as to image the user from almost directly in front. The third camera 3 is arranged so as to image the user from above.

[0042] The dialogue agent device of FIG. 1 also includes a monitor device 6, a speaker 7, and a microphone 8. The monitor device 6 is placed almost directly in front of the user so that the user can see it.

[0043] Furthermore, the dialogue agent device of FIG. 1 includes an image processing computer 9, a voice processing computer 10, and an overall control computer 11. These computers 9, 10, and 11 are, for example, personal computers, and perform various operations in accordance with an operating system (OS) and a dialogue agent program described later.

[0044] The images obtained by the first to third cameras 1 to 3 are supplied to the image processing computer 9 as image data. The image processing computer 9 performs image processing on the image data supplied from the first to third cameras 1 to 3 in accordance with an image processing program. This image processing includes image recognition processing, extraction of the point between the eyebrows (described later), and the like. The images obtained by the first to third cameras 1 to 3 are displayed on the screen of the monitor device 6 as appropriate.

[0045] The voice processing computer 10 performs voice processing: it outputs speech from the speaker 7 and receives the user's speech through the microphone 8 as voice data. This voice processing includes speech recognition processing, speech synthesis processing, utterance processing, listening processing, and the like.

[0046] The overall control computer 11 performs overall control of the image processing computer 9 and the voice processing computer 10 in accordance with the dialogue agent program, and also controls the first to third cameras 1 to 3 and the drive devices 4 and 5.

【００４７】 In the present embodiment, the dialogue agent method according to the dialogue agent program is executed by the image processing computer 9, the voice processing computer 10, and the overall control computer 11; however, the dialogue agent method according to the dialogue agent program may instead be executed by a single computer.

【００４８】 The operation of the dialogue agent device of FIG. 1 according to the dialogue agent program is described below.

【００４９】 When a photograph is taken with the dialogue agent device of FIG. 1, it is important first to make the user face the front (the direction of the first and second cameras 1 and 2). To do so, the orientation of the user's face must be identified.

【００５０】 The process by which the dialogue agent device of FIG. 1 identifies the orientation of the user's face is now described. FIGS. 3 and 4 are flowcharts showing this face orientation identification process. FIG. 5 is a schematic diagram for explaining an example of the face orientation identification process.

【００５１】 First, the overall control computer 11 causes the first camera 1 to start imaging the user (step S1). The image obtained by the first camera 1 is supplied to the image processing computer 9 as image data and is also displayed on the screen of the monitor device 6.

【００５２】 Based on the image data, the image processing computer 9 uses between-the-eyes extraction processing to detect the distance between the user's eyes on the image (step S2). For this purpose it is possible to use, for example, the between-the-eyes extraction technique proposed in Shinjiro Kawato and Nobuji Tetsutani, "Eye detection and tracking for the purpose of outputting eye positions to an eye camera," IEICE Technical Report, PRMU2001-153, pp. 1-6 (2001). For example, as shown in the upper part of FIG. 5, the distance L1 between the user's eyes is detected on the image VD1 obtained by the first camera 1.

【００５３】 The overall control computer 11 then stores the distance between the eyes detected by the image processing computer 9 as the maximum value (step S3).

【００５４】 After that, the voice processing computer 10 performs utterance processing (step S4), so that speech is output from the speaker 7. For example, an utterance such as "Hey, look over here" is made. A simple utterance such as "Hello" may be made instead.

【００５５】 Based on the image data obtained by the first camera 1, the image processing computer 9 uses between-the-eyes extraction processing to detect the distance between the user's eyes on the image (step S5).

【００５６】 Next, the overall control computer 11 determines whether the distance between the eyes detected by the image processing computer 9 has become longer than the maximum value (step S6). If it has, the overall control computer 11 stores the detected distance as the new maximum value (step S7). If the detected distance is less than or equal to the maximum value, the process proceeds to step S8.

【００５７】 Here, an increase in the distance between the eyes on the image can be taken to mean that the user has turned more toward the front. For example, as shown in the lower part of FIG. 5, when the distance L1 between the eyes on the image VD1 obtained by the first camera 1 becomes longer following the utterance processing, it can be determined that the user has turned more toward the front. Conversely, when the distance between the user's eyes on the image obtained by the first camera 1 becomes shorter, it can be determined that the user has turned to the side.
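The geometric reasoning behind this criterion can be sketched under a simplifying assumption that is not stated in the source: the head rotates only about the vertical axis by a yaw angle $\theta$ (with $\theta = 0$ when the face points at the first camera 1), and the camera is far enough away that the projection is approximately orthographic. Writing $D$ for the true distance between the eyes and $s$ for the image scale (both illustrative symbols), the distance measured on the image is approximately

```latex
L_1(\theta) \approx s\,D\cos\theta ,
\qquad
\max_{\theta} L_1(\theta) = s\,D \quad \text{at } \theta = 0 .
```

Under this model L1 grows as the user turns toward the camera and shrinks as the user turns away, which is consistent with treating the largest observed value as the frontal distance.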

【００５８】 Next, the overall control computer 11 determines whether utterances have been made a predetermined number of times (step S8). If not, the process returns to step S4 and the cycle of steps S4 to S8 is repeated. The content of the utterance in step S4 depends on the determination result of step S6 in the previous cycle. For example, if it was determined in step S6 of the previous cycle that the distance between the eyes became longer, the utterance in step S4 of the next cycle is something like "Look over here a bit more"; if it was determined that the distance became shorter, the utterance is something like "Please, look over here."

【００５９】 Because the distance between the eyes on the screen when the user faces fully front is not known in advance, after the utterance processing has been performed the predetermined number of times, the overall control computer 11 takes the stored maximum value to be the distance between the eyes when the user faces the front (step S9).

【００６０】 After that, the voice processing computer 10 performs utterance processing (step S10), so that speech is output from the speaker 7. For example, an utterance such as "Thank you" is made.

【００６１】 Based on the image data obtained by the first camera 1, the image processing computer 9 uses between-the-eyes extraction processing to detect the distance between the user's eyes on the image (step S11).

【００６２】 Next, the overall control computer 11 determines whether the distance between the eyes detected by the image processing computer 9 is greater than or equal to the maximum value (step S12). If it is, the overall control computer 11 determines that the user's face is currently facing the front (step S13).

【００６３】 If the distance between the eyes detected by the image processing computer 9 is smaller than the maximum value, the process returns to step S10, and steps S10 to S12 are repeated until the detected distance becomes greater than or equal to the maximum value. In this case, the utterance processing is used to make the user face the front.

【００６４】 In this way, by combining detection of the distance between the user's eyes on the image with utterance processing, whether the user's face is facing the front can be identified at a high recognition rate based on changes in the distance between the eyes.
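The flow of steps S1 to S13 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names, the `distances` iterator (standing in for successive measurements from the image processing computer 9), the `speak` callback (standing in for the voice processing computer 10), and the `max_tries` safety limit are all assumptions introduced here. The sketch also treats "not longer than the maximum" as the cue for the pleading prompt.

```python
from typing import Callable, Iterator


def calibrate_frontal_distance(distances: Iterator[float],
                               speak: Callable[[str], None],
                               n_utterances: int = 3) -> float:
    """Steps S1-S9: keep the largest eye distance seen over a fixed number of utterances."""
    max_dist = next(distances)           # S2-S3: first measurement is the initial maximum
    prompt = "Hey, look over here."
    for _ in range(n_utterances):        # S8: stop after a predetermined number of utterances
        speak(prompt)                    # S4: utterance
        d = next(distances)              # S5: measure again
        if d > max_dist:                 # S6: longer than the stored maximum?
            max_dist = d                 # S7: store the new maximum
            prompt = "Look over here a bit more."
        else:
            prompt = "Please, look over here."
    return max_dist                      # S9: treated as the frontal distance


def wait_until_frontal(distances: Iterator[float],
                       speak: Callable[[str], None],
                       max_dist: float,
                       max_tries: int = 10) -> bool:
    """Steps S10-S13: speak and measure until the distance reaches the maximum."""
    for _ in range(max_tries):           # max_tries is an added safety limit (assumption)
        speak("Thank you.")              # S10: utterance
        if next(distances) >= max_dist:  # S11-S12: measure and compare
            return True                  # S13: the face is judged to be frontal
    return False
```

Passing a list's `append` method as `speak` and a plain iterator of numbers as `distances` is enough to exercise both functions without any camera or audio hardware.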

【００６５】 Next, the process by which the dialogue agent device of FIG. 1 identifies the number and positions of users is described. FIGS. 6 and 7 are flowcharts showing this identification process. FIG. 8 is a schematic diagram for explaining an example of the process of identifying the number and positions of users.

【００６６】 First, the overall control computer 11 causes the first and third cameras 1 and 3 to start imaging (step S21). The images obtained by the first and third cameras 1 and 3 are supplied to the image processing computer 9 as image data. The image obtained by the first camera 1 is also displayed on the screen of the monitor device 6.

【００６７】 The image processing computer 9 estimates the number of users by image recognition processing based on the image data supplied from the first camera 1 (step S22). Here, the image processing computer 9 estimates the number of users based on a first determination condition. The first determination condition is, for example, the presence or absence of a substantially circular skin-colored region in an image obtained by imaging the users from the front.

【００６８】 As shown in the upper part of FIG. 7, when the image VD1 obtained by the first camera 1 contains four skin-colored regions 51, 52, 53, and 54, the number of users is estimated to be four. In this case, an object such as a bag whose color is close to skin color may be mistaken for a human face. For example, the region 54 is a beige bag.

【００６９】 Next, the voice processing computer 10 performs dialogue processing based on the number of people estimated by the image processing computer 9 (step S23), so that speech is output from the speaker 7. For example, an utterance such as "Thank you for coming today, all four of you" is made. In this case, rather than asking the number of people directly with a question such as "How many of you came?", the utterance indirectly checks whether the estimate obtained by the image recognition processing is correct. To attract the users' interest, an utterance that gives the impression that the photographing apparatus has human-like eyes and reasoning ability is preferable. The users reply to this utterance, and the reply is supplied to the voice processing computer 10 through the microphone 8.

【００７０】 The voice processing computer 10 recognizes the content of the users' reply by speech recognition processing and determines whether the reply is negative (step S24). In response to the above utterance, a user replies, for example, "There aren't four of us."

【００７１】 If the reply is negative, the image processing computer 9 estimates the number of users by image recognition processing based on the image data supplied from the third camera 3 (step S25). Here, the image processing computer 9 estimates the number of users based on a second determination condition. The second determination condition is, for example, the presence or absence of a substantially circular black region in an image obtained by imaging the users from above.

【００７２】 As shown in the lower part of FIG. 7, when the image VD3 obtained by the third camera 3 contains three black regions 61, 62, and 63, the number of users is estimated to be three. In this case, the skin-colored region 54 in the upper part of FIG. 7 is not judged to be a person.

【００７３】 Next, the voice processing computer 10 performs dialogue processing based on the number of people estimated by the image processing computer 9 (step S26), so that speech is output from the speaker 7. For example, an utterance such as "Sorry, there were three of you" is made. The users reply to this utterance, and the reply is supplied to the voice processing computer 10 through the microphone 8.

【００７４】 The voice processing computer 10 recognizes the content of the users' reply by speech recognition processing and determines whether the reply is negative (step S27).

【００７５】 If the reply is negative, the process returns to step S22, and steps S22 to S27 are repeated until the reply becomes affirmative.

【００７６】 If the reply is affirmative in step S24 or step S27, the image processing computer 9 fixes the number and positions of the users by image recognition processing based on the image data supplied from the first camera 1 or the image data supplied from the third camera 3 (step S28).

【００７７】 In the dialogue processing of step S26, a user replies, for example, "That's right." In that case, as shown in the lower part of FIG. 7, the black regions 61, 62, and 63 in the image VD3 obtained by the third camera 3 can be confirmed to be the users' heads. The number and positions of the users can thereby be identified.

【００７８】 In this way, by combining dialogue processing with image recognition processing that uses different determination conditions, the number and positions of the users can be identified at a high recognition rate.
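The alternation between determination conditions in steps S22 to S28 can be sketched as follows. This is a minimal illustration under stated assumptions: each estimator is a hypothetical callable returning the centres of the regions it found (for example, skin-colored blobs from the front camera or dark blobs from the overhead camera), `confirm` stands in for the utterance plus speech recognition of the reply, and `max_rounds` is a limit added here so that the loop always terminates.

```python
from typing import Callable, List, Sequence, Tuple

Region = Tuple[float, float]  # (x, y) centre of a detected region


def identify_users(estimators: Sequence[Callable[[], List[Region]]],
                   confirm: Callable[[int], bool],
                   max_rounds: int = 3) -> List[Region]:
    """Try each determination condition in turn until the users confirm the count."""
    for _ in range(max_rounds):          # added limit; the source repeats until affirmed
        for estimate in estimators:      # S22 (first condition), S25 (second condition)
            regions = estimate()
            if confirm(len(regions)):    # S23-S24 or S26-S27: speak, then check the reply
                return regions           # S28: number and positions are fixed
    return []                            # no estimate was confirmed within max_rounds
```

With a front-view estimator that also picks up a beige bag and a top-view estimator that does not, the first count of four is rejected and the second count of three is accepted, mirroring the example in the text.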

【００７９】 In this example, the different determination conditions used are the presence or absence of a substantially circular skin-colored region in an image obtained by imaging the users from the front and the presence or absence of a substantially circular black region in an image obtained by imaging the users from above; however, other determination conditions may be used.

【００８０】 Next, the photographing process performed by the dialogue agent device of FIG. 1 is described. FIG. 8 is a flowchart showing the photographing process performed by the dialogue agent device of FIG. 1. FIG. 9 is a schematic diagram for explaining an example of the photographing process.

【００８１】 First, the overall control computer 11 uses the driving devices 4 and 5 to initialize the imaging directions of the first and second cameras 1 and 2 so that the user's face is located in the central part of the imaging area (image area) of each camera (step S31). As a result, as shown in FIG. 9(a), the user's face is located in the central part of the image VD1 obtained by the first camera 1 and in the central part of the image VD2 obtained by the second camera 2.

【００８２】 In this case, the image processing computer 9 uses between-the-eyes extraction processing to detect the position of the point between the user's eyes on the image based on the image data, and the overall control computer 11 adjusts the imaging directions of the first and second cameras 1 and 2 based on the detected position. Here, as mentioned above, it is possible to use, for example, the between-the-eyes extraction technique proposed in Shinjiro Kawato and Nobuji Tetsutani, "Eye detection and tracking for the purpose of outputting eye positions to an eye camera," IEICE Technical Report, PRMU2001-153, pp. 1-6 (2001).

【００８３】 The image processing computer 9 determines whether the user has moved based on the image data supplied from the first camera 1 (step S32). When the user has moved, the position of the user shifts on the image VD1 obtained by the first camera 1 and on the image VD2 obtained by the second camera 2, as shown in FIG. 9(b).

【００８４】 When the user has moved, the overall control computer 11 uses the driving device 4 to move the imaging direction of the first camera 1 so that the user is located in the central part of the imaging area of the first camera 1 (step S33). The overall control computer 11 also uses the driving device 5 to move the imaging direction of the second camera 2 in conjunction with the first camera 1 (step S34). As a result, as shown in FIG. 9(c), the user's face is located in the central part of the image VD1 obtained by the first camera 1 and in the central part of the image VD2 obtained by the second camera 2.

【００８５】 In this way, the user's face is detected by the between-the-eyes extraction processing and is then tracked continuously. The first camera 1 is thus controlled so that the user is always located in the central part of its imaging area even when the user moves, and the second camera 2 moves in conjunction with the first camera 1. Therefore, unless the user gives a layout instruction, described later, the second camera 2 is also controlled so that the user is always located in the central part of its imaging area.
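The centering control of steps S32 to S34 can be sketched as follows. The source does not specify how the correction is computed, so this is only one plausible formulation: `centering_offset` returns the normalised displacement of the detected face from the image centre, with a small deadband (an assumption added here) so that detection jitter does not move the drive, and `second_camera_target` makes the second camera follow the first while applying any layout offset. All names and parameters are hypothetical.

```python
from typing import Tuple


def centering_offset(face_xy: Tuple[float, float],
                     image_size: Tuple[int, int],
                     deadband: float = 0.05) -> Tuple[float, float]:
    """Normalised (horizontal, vertical) offset of the face from the image centre.

    Each component is 0 at the centre and +/-1 at the image border.
    Offsets inside the deadband are reported as (0.0, 0.0).
    """
    (x, y), (w, h) = face_xy, image_size
    dx = (x - w / 2) / (w / 2)
    dy = (y - h / 2) / (h / 2)
    if abs(dx) < deadband and abs(dy) < deadband:
        return (0.0, 0.0)                # S32: the user is treated as not having moved
    return (dx, dy)                      # S33: correction to feed to the first camera's drive


def second_camera_target(first_direction: Tuple[float, float],
                         layout_offset: Tuple[float, float] = (0.0, 0.0)
                         ) -> Tuple[float, float]:
    """S34: the second camera follows the first, shifted by any layout offset."""
    return (first_direction[0] + layout_offset[0],
            first_direction[1] + layout_offset[1])
```

Keeping the layout offset separate from the tracking direction is one way to let the second camera stay linked to the first while still honouring a spoken layout instruction.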

【００８６】 Next, the voice processing computer 10 determines whether the user has given a voice instruction (step S35). If there is no voice instruction from the user, the process returns to step S32, and steps S32 to S35 are repeated until the user gives a voice instruction.

【００８７】 A voice instruction from the user is supplied to the voice processing computer 10 through the microphone 8. The voice processing computer 10 recognizes the content of the user's voice instruction by speech recognition processing.

【００８８】 When the user has given a voice instruction, the voice processing computer 10 determines whether the instruction is a layout instruction (step S36). If it is a layout instruction, the overall control computer 11 controls the second camera 2 or the driving device 5 according to the layout instruction (step S37). Here, a layout instruction is an instruction concerning the placement or size of the user on the image. Examples of layout instructions include "right", "left", "up", "down", "to the middle", "zoom in", and "zoom out".

【００８９】 For example, when the user's voice instruction is "right", the overall control computer 11 uses the driving device 5 to move the imaging direction of the second camera 2 to the right. As a result, as shown in FIG. 9(d), the user moves to the right on the image VD2 obtained by the second camera 2. On the image VD1 obtained by the first camera 1, the user always remains in the central part.

【００９０】 When the user's voice instruction is "zoom in a little", the overall control computer 11 increases the imaging magnification of the second camera 2. As a result, as shown in FIG. 9(e), the image VD2 obtained by the second camera 2 is enlarged.

【００９１】 The user can also say "freeze" as a layout instruction to temporarily freeze the image displayed on the screen of the monitor device 6.

【００９２】 After that, the process returns to step S32, and steps S32 to S37 are repeated. If the user's instruction is not a layout instruction in step S36, the voice processing computer 10 determines whether the instruction is a photographing instruction (step S38). A photographing instruction is an instruction to take a still image (photograph). For example, the user says "Take it" as a photographing instruction.

【００９３】 If the user's instruction is a photographing instruction, the overall control computer 11 takes a still image of the user with the second camera 2 (step S39). The still image of the user taken by the second camera 2 is displayed on the screen of the monitor device 6. After that, the process returns to step S32, and steps S32 to S37 are repeated. In this way, the user can keep taking still images until satisfied while looking at the still images displayed on the screen of the monitor device 6.

【００９４】 When the user selects a still image displayed on the screen of the monitor device 6, the selected still image is printed out.

【００９５】 If the user's instruction is not a photographing instruction in step S38, the voice processing computer 10 performs dialogue processing (step S40), and the process returns to step S32. Here, the dialogue processing is, for example, a greeting.

【００９６】 In this way, by combining between-the-eyes extraction processing with dialogue processing, a photograph can easily be taken, in cooperation with the user, in the layout the user intends. In addition, because the first camera 1 detects and tracks the user while the second camera 2 is linked to the first camera 1 and is controlled according to the user's spoken layout instructions, a photograph can be taken in the layout the user intends even when the user moves.
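The branching of steps S36, S38, and S40 amounts to classifying each recognized utterance as a layout instruction, a photographing instruction, or ordinary dialogue. The sketch below illustrates this with simple phrase matching on already-recognized text. The English phrase lists are illustrative renderings of the Japanese commands, and the matching rule is an assumption; the source does not describe how the speech recognizer distinguishes the instruction types.

```python
import re

# Illustrative English renderings of the spoken commands named in the text.
LAYOUT_PHRASES = ("right", "left", "up", "down", "middle",
                  "zoom in", "zoom out", "freeze")
SHOOT_PHRASES = ("take it", "shoot")


def classify_instruction(utterance: str) -> str:
    """Return 'layout' (S36), 'shoot' (S38), or 'dialogue' (S40)."""
    # Normalise to lowercase words padded with spaces so whole phrases can be matched.
    text = " " + " ".join(re.findall(r"[a-z]+", utterance.lower())) + " "
    if any(f" {p} " in text for p in LAYOUT_PHRASES):
        return "layout"      # S37: control the second camera or its drive
    if any(f" {p} " in text for p in SHOOT_PHRASES):
        return "shoot"       # S39: take a still image with the second camera
    return "dialogue"        # S40: respond with, for example, a greeting
```

Checking layout phrases before photographing phrases follows the order of the flowchart; a real system would need a more robust grammar than whole-phrase matching.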

【００９７】 In the present embodiment, the first camera 1 and the second camera 2 correspond to the imaging means, the first camera 1 corresponds to the first imaging means, and the third camera 3 corresponds to the second imaging means. The image processing computer 9 corresponds to the detection means and the estimation means, the voice processing computer 10 and the speaker 7 correspond to the utterance means, the voice processing computer 10 and the microphone 8 correspond to the recognition means, and the overall control computer 11 corresponds to the determination means. Further, the overall control computer 11 and the driving devices 4 and 5 correspond to the first control means and the second control means.

[Brief description of drawings]

【図１】 FIG. 1 is a block diagram showing the configuration of a dialogue agent device according to an embodiment of the present invention.

【図２】 FIG. 2 is a schematic diagram for explaining the operation of the dialogue agent device of FIG. 1.

【図３】 FIG. 3 is a flowchart showing the process by which the dialogue agent device of FIG. 1 identifies the orientation of the user's face.

【図４】 FIG. 4 is a flowchart showing the process by which the dialogue agent device of FIG. 1 identifies the orientation of the user's face.

【図５】 FIG. 5 is a schematic diagram for explaining an example of the face orientation identification process.

【図６】 FIG. 6 is a flowchart showing the process by which the dialogue agent device of FIG. 1 identifies the number and positions of users.

【図７】 FIG. 7 is a flowchart showing the process by which the dialogue agent device of FIG. 1 identifies the number and positions of users.

【図８】 FIG. 8 is a schematic diagram for explaining an example of the process of identifying the number and positions of users.

【図９】 FIG. 9 is a schematic diagram for explaining an example of the photographing process.

[Explanation of symbols]

1 first camera
2 second camera
3 third camera
4, 5 driving devices
6 monitor device
7 speaker
8 microphone
9 image processing computer
10 voice processing computer
11 overall control computer

Continuation of front page: (72) Inventor: Kenji Mase, c/o Advanced Telecommunications Research Institute International, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto. F-terms (reference): 5B057 BA02 DA06 DA13 DB02 DB06 DC03 DC05 DC08 DC09 DC25 DC32; 5C022 AA13 AB61 AB62 AC69 AC71 AC72 AC74

Claims

1. A dialogue agent device for identifying the orientation of a user's face, comprising: imaging means for capturing an image of the user; detection means for detecting the distance between the user's eyes in the image obtained by the imaging means; utterance means for speaking to the user; and determination means for determining the orientation of the user's face based on a change, accompanying the utterance by the utterance means, in the distance between the eyes detected by the detection means.

2. A dialogue agent method for identifying the orientation of a user's face, comprising the steps of: capturing an image of the user; detecting the distance between the user's eyes in the image obtained by the image capturing; speaking to the user; and determining the orientation of the user's face based on a change, accompanying the utterance, in the detected distance between the eyes.

3. A computer-readable dialogue agent program for identifying the orientation of a user's face, the program causing a computer to execute: a process of capturing an image of the user; a process of detecting the distance between the user's eyes in the image obtained by the image capturing; a process of speaking to the user; and a process of determining the orientation of the user's face based on a change, accompanying the utterance, in the detected distance between the eyes.

4. A dialogue agent device for identifying the number and positions of users, comprising: imaging means for capturing an image of the users; estimation means for estimating the number of users under a predetermined determination condition by image processing based on the image obtained by the imaging means; utterance means for speaking to the users based on the number of users estimated by the estimation means; recognition means for recognizing the users' response to the utterance by the utterance means; and determination means for determining the number and positions of the users by switching the determination condition of the estimation means based on the users' response recognized by the recognition means and causing the estimation of the number of users by the estimation means, the utterance by the utterance means, and the recognition of the users' response by the recognition means to be performed.

5. An interactive agent method for identifying the number and positions of users, comprising the steps of: capturing an image of the users; estimating the number of users under a predetermined determination condition by image processing based on the image obtained by the image capture; uttering speech to the users based on the estimated number of users; recognizing a response of the users to the utterance; and determining the number and positions of the users by switching the determination condition based on the recognized response of the users and carrying out the estimation of the number of users, the utterance, the recognition of the response of the users, and the imaging.

6. A computer-readable interactive agent program for identifying the number and positions of users, the program causing a computer to execute: a process of capturing an image of the users; a process of estimating the number of users under a predetermined determination condition by image processing based on the image obtained by the image capture; a process of uttering speech to the users based on the estimated number of users; a process of recognizing a response of the users to the utterance; and a process of determining the number and positions of the users by switching the determination condition based on the recognized response of the users and carrying out the estimation of the number of users, the utterance, the recognition of the response, and the imaging.

7. An interactive agent device for taking a photograph of a user, comprising: first imaging means for imaging the user; second imaging means for imaging the user; first control means for controlling the imaging direction of the first imaging means so that the image of the user is located at a fixed position in the image obtained by the first imaging means, and for controlling the imaging direction of the second imaging means in conjunction with the control of the imaging direction of the first imaging means; recognition means for recognizing a voice instruction from the user; and second control means for controlling the second imaging means so that an image in accordance with a layout instruction is obtained when the layout instruction for the image is recognized by the recognition means, and for controlling the second imaging means so that a still image is taken by the second imaging means when a photographing instruction is recognized by the recognition means.

8. An interactive agent method for taking a photograph of a user, comprising the steps of: imaging the user using first imaging means; imaging the user using second imaging means; controlling the imaging direction of the first imaging means so that the image of the user is located at a fixed position in the image obtained by the first imaging means, and controlling the imaging direction of the second imaging means in conjunction with the control of the imaging direction of the first imaging means; recognizing a voice instruction from the user; and controlling the second imaging means so that an image in accordance with a layout instruction is obtained when the layout instruction for the image is recognized by the recognition, and controlling the second imaging means so that a still image is taken by the second imaging means when a photographing instruction is recognized by the recognition.

9. A computer-readable interactive agent program for taking a photograph of a user, the program causing a computer to execute: a process of imaging the user using first imaging means; a process of imaging the user using second imaging means; a process of controlling the imaging direction of the first imaging means so that the image of the user is located at a fixed position in the image obtained by the first imaging means, and controlling the imaging direction of the second imaging means in conjunction with the control of the imaging direction of the first imaging means; a process of recognizing a voice instruction from the user; and a process of controlling the second imaging means so that an image in accordance with a layout instruction is obtained when the layout instruction for the image is recognized by the recognition, and controlling the second imaging means so that a still image is taken by the second imaging means when a photographing instruction is recognized by the recognition.
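
Illustrative note (not part of the claims): the determination recited in claims 1 to 3 can be sketched as a comparison of the inter-eye distance measured before and after an utterance. The sketch below is only one possible reading. It assumes that the inter-eye distance in the image grows as the face turns toward the camera, so that a sufficient relative increase after the utterance is treated as the user facing the front. The function name, the `min_ratio` threshold, and the sample pixel values are hypothetical and do not come from the specification.

```python
from statistics import median
from typing import Sequence


def faces_front_after_utterance(
    before_px: Sequence[float],
    after_px: Sequence[float],
    min_ratio: float = 1.1,
) -> bool:
    """Decide whether the user turned to face the front after an utterance.

    before_px: inter-eye distances (pixels) from frames before the utterance.
    after_px:  inter-eye distances (pixels) from frames after the utterance.
    min_ratio: hypothetical threshold on the relative increase (assumption).
    """
    if not before_px or not after_px:
        raise ValueError("need at least one measurement before and after")
    # The median damps single-frame detection noise.
    baseline = median(before_px)
    if baseline <= 0:
        raise ValueError("inter-eye distance must be positive")
    # Assumed rule: a clear increase means the face turned toward the camera.
    return median(after_px) / baseline >= min_ratio


if __name__ == "__main__":
    # Hypothetical measurements around a single utterance.
    print(faces_front_after_utterance([40.0, 41.0, 39.5], [52.0, 51.0, 53.5]))
    print(faces_front_after_utterance([40.0, 41.0, 39.5], [40.5, 39.0, 41.0]))
```

A ratio is used rather than an absolute difference so that the rule does not depend on how far the user stands from the camera. An actual device would need to calibrate the threshold against its own camera and detector.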