JP2005254433A

JP2005254433A - Personal robot and personal robot control program

Info

Publication number: JP2005254433A
Application number: JP2004073603A
Authority: JP
Inventors: Shuji Ono; 修司小野
Original assignee: Fuji Photo Film Co Ltd
Current assignee: Fujifilm Holdings Corp
Priority date: 2004-03-15
Filing date: 2004-03-15
Publication date: 2005-09-22
Anticipated expiration: 2024-03-15
Also published as: JP4549703B2

Abstract

PROBLEM TO BE SOLVED: To enable smoother communication between a user and a personal robot.

SOLUTION: A personal robot that communicates with humans has an image pickup part that picks up an image of an object, and a shell part that is installed outside the image pickup part, has an opening part through which the image pickup part picks up the image, and can change the position of the opening part. The personal robot is also equipped with an image pickup range control part that controls the range picked up by the image pickup part by controlling the position of the opening part of the shell part. The shell part blocks part of the light incident on the image pickup part and is installed as at least a part of an eyeball of the personal robot.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to an interpersonal robot and an interpersonal robot control program. In particular, the present invention relates to an interpersonal robot that communicates with people, and to an interpersonal robot control program that controls the interpersonal robot.

In recent years, various robots have been developed as robotics technology has advanced. Among them, interpersonal robots, which are used for purposes such as companionship and customer service and which communicate with humans, have attracted attention in the mass media.
Some conventional interpersonal robots have a function whereby, when spoken to by a user, they recognize the approximate direction from which the voice was heard and turn their body in that direction to communicate with the user.

Since no prior art documents are known at the present time, a description of prior art documents is omitted.

However, when a conventional interpersonal robot is spoken to by many users at the same time and turns its body every time it switches the communication target, the switching operation takes a certain amount of time, so communication does not proceed smoothly. Conversely, when the interpersonal robot keeps facing a fixed direction without turning its body even though it has switched the communication target, it is difficult for the users to determine which user the interpersonal robot is communicating with.

Accordingly, an object of the present invention is to provide an interpersonal robot and an interpersonal robot control program that can solve the above problems. This object is achieved by the combinations of features described in the independent claims. The dependent claims define further advantageous specific examples of the present invention.

To solve the above problems, a first aspect of the present invention is an interpersonal robot that communicates with people, comprising: an imaging unit that captures an image of a subject; a shell unit that is provided outside the imaging unit, has an opening through which the imaging unit captures images, and can change the position of the opening; and an imaging range control unit that controls the range captured by the imaging unit by controlling the position of the opening of the shell unit. The shell unit may block part of the light incident on the imaging unit. The shell unit may be provided as at least a part of an eyeball of the interpersonal robot. The interpersonal robot may further include a speaker selection unit that selects, within an image captured by the imaging unit, a region containing a person who is speaking to the interpersonal robot, and the imaging range control unit may control the range captured by the imaging unit based on the region selected by the speaker selection unit.

A second aspect of the present invention is an interpersonal robot control program that causes an interpersonal robot to function as an interpersonal robot comprising: an imaging unit that captures an image of a subject; a shell unit that is provided outside the imaging unit, has an opening through which the imaging unit captures images, and can change the position of the opening; and an imaging range control unit that controls the range captured by the imaging unit by controlling the position of the opening of the shell unit. The interpersonal robot may further include a speaker selection unit that selects, within an image captured by the imaging unit, a region containing a person who is speaking to the interpersonal robot, and the imaging range control unit may control the range captured by the imaging unit based on the region selected by the speaker selection unit.

The above summary of the invention does not enumerate all of the necessary features of the present invention, and sub-combinations of these feature groups may also constitute inventions.

According to the present invention, a user and an interpersonal robot can communicate more smoothly.

Hereinafter, the present invention will be described through embodiments of the invention. However, the following embodiments do not limit the invention according to the claims, and not all combinations of the features described in the embodiments are essential to the solution of the invention.

FIG. 1 shows an overview of an interpersonal robot 10 according to an embodiment of the present invention. The interpersonal robot 10 performs two-way communication, for example by responding with voice, text, images, or the like when spoken to by a user. Specifically, the interpersonal robot 10 recognizes the user's intention by recognizing the user's movements and facial expressions based on images input using a digital camera or the like, and by recognizing the user's speech based on sound input using a microphone or the like. Based on the result of recognizing the user's intention, the interpersonal robot 10 then communicates with the user autonomously by selecting a voice stored in advance, or by synthesizing a voice as necessary, and outputting it using a speaker or the like.

Here, instead of communicating with the user by outputting voice from a speaker, the interpersonal robot 10 may communicate with the user by displaying text and images on a display device such as an LCD panel. Also, instead of communicating with the user autonomously, the interpersonal robot 10 may communicate with the user by transmitting the input images and speech of the user to an external control terminal, receiving an operator's voice from the control terminal, and outputting that voice using a speaker or the like.

So that users find it more approachable, the interpersonal robot 10 preferably has a form modeled on a person, as shown in this figure, or a form modeled on an animal such as a dog or a cat. The interpersonal robot 10 according to the present embodiment includes an eyeball 15 containing a device, such as a digital camera, for inputting images of the user.

When communicating with users, the interpersonal robot 10 according to the present embodiment controls the eyeball 15, which is an image input device, in a way that is visible from the outside. This informs the user concerned and the other users which direction the interpersonal robot 10 is looking in, that is, which user the interpersonal robot 10 is communicating with, and is intended to make communication with users smoother.

FIG. 2 is a block diagram showing an example of the configuration of the interpersonal robot 10 according to the embodiment of the present invention. The interpersonal robot 10 includes an imaging unit 100, a primary memory 110, an imaging control unit 120, a sound collection unit 130, a speaker selection unit 140, a shell unit 150, and an imaging range control unit 160. The imaging unit 100 includes an optical system 102, a CCD 104, and an imaging signal processing unit 106, and captures an image of a subject. The optical system 102 includes, for example, a focus lens and a zoom lens, and forms an image of the subject on the light receiving surface of the CCD 104. The CCD 104 includes a plurality of light receiving elements, and outputs the charge accumulated in each light receiving element by the optical image of the subject formed on the light receiving surface by the optical system 102 to the imaging signal processing unit 106 as an analog voltage signal. The imaging signal processing unit 106 separates the analog voltage signal representing the subject image, received from the CCD 104, into R, G, and B components. The imaging signal processing unit 106 then performs A/D conversion on the analog signals separated into the R, G, and B components, and outputs the resulting digital image data representing the subject image to the primary memory 110. The primary memory 110 is a volatile memory such as a DRAM, and stores the digital image data output by the imaging signal processing unit 106. The imaging control unit 120 controls the capture of images of the subject by driving mechanical members included in the imaging unit 100. For example, the imaging control unit 120 controls exposure, white balance, zoom operation, aperture operation, and the like.

The sound collection unit 130 is, for example, a microphone; it inputs sound and outputs the input sound to the speaker selection unit 140. The speaker selection unit 140 uses an image recognition technique or the like to recognize, within the image represented by the image data stored in the primary memory 110, a person who is speaking to the interpersonal robot 10, and selects a region containing that person. Here, the speaker selection unit 140 may recognize the person who is speaking to the interpersonal robot 10 based on the sound received from the sound collection unit 130. The speaker selection unit 140 then outputs information indicating the selected region to the imaging range control unit 160.

The shell unit 150 is provided outside the imaging unit 100, has an opening through which the imaging unit 100 captures images, and can change the position of the opening. The imaging range control unit 160 controls the range captured by the imaging unit 100 by controlling the position of the opening of the shell unit 150. Specifically, the imaging range control unit 160 controls the range captured by the imaging unit 100 based on the region selected by the speaker selection unit 140. For example, the imaging range control unit 160 may control the position of the opening of the shell unit 150 so that the region selected by the speaker selection unit 140 is captured at the center of the image.
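As one non-limiting illustration, the centering control described above might be sketched as follows. The sketch assumes a simple pinhole camera model with a known field of view; the function name `aperture_offset_deg` and all of its parameters are hypothetical and do not appear in the embodiment.

```python
import math


def aperture_offset_deg(region, image_size, fov_deg):
    """Return the (pan, tilt) change in degrees that would center the region.

    region:     (left, top, right, bottom) of the selected region, in pixels
    image_size: (width, height) of the captured image, in pixels
    fov_deg:    (horizontal, vertical) field of view in degrees (assumed known)

    Positive pan means the region lies to the right of the image center;
    positive tilt means it lies below the image center (image y grows downward).
    """
    left, top, right, bottom = region
    width, height = image_size
    center_x = (left + right) / 2.0
    center_y = (top + bottom) / 2.0
    # Focal length in pixels under the assumed pinhole model.
    focal_x = (width / 2.0) / math.tan(math.radians(fov_deg[0]) / 2.0)
    focal_y = (height / 2.0) / math.tan(math.radians(fov_deg[1]) / 2.0)
    pan = math.degrees(math.atan((center_x - width / 2.0) / focal_x))
    tilt = math.degrees(math.atan((center_y - height / 2.0) / focal_y))
    return pan, tilt
```

Under these assumptions, a region that is already centered yields an offset of zero, and the returned angles could be passed to whatever mechanism moves the opening of the shell unit 150.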

According to the interpersonal robot 10 of the present embodiment, the imaging range of the imaging unit 100 is controlled using the shell unit 150 provided outside the imaging unit 100, so a user can determine, based on the position of the opening in the shell unit 150, which range the interpersonal robot 10 is imaging, that is, which user the interpersonal robot 10 is communicating with. This allows users to communicate with the interpersonal robot 10 more smoothly.

Furthermore, even when the interpersonal robot 10 is surrounded and spoken to by a plurality of users and frequently switches the communication target, it can switch the communication target in a way that users can recognize by changing the position of the opening in the shell unit 150, without changing the orientation of the interpersonal robot 10 itself. The interpersonal robot 10 can therefore switch the communication target in a shorter time, which improves responsiveness. In addition, because fewer parts need to be driven when switching the communication target, the power consumption of the interpersonal robot 10 can be reduced, and the interpersonal robot 10 can be made cheaper and smaller.

FIG. 3 shows an example of a cross section of the eyeball 15 according to the embodiment of the present invention. The eyeball 15 includes the shell unit 150, the optical system 102, and the CCD 104. The shell unit 150 is made of, for example, a light-shielding member configured to cover the optical system 102 and the CCD 104, and has an opening 155 formed by making a hole in part of the member. The shell unit 150 limits the imaging range of the imaging unit 100 by blocking part of the light incident on the optical system 102. Specifically, the shell unit 150 limits the imaging range of the imaging unit 100 by allowing only light that enters through the opening 155 to reach the optical system 102 and blocking all other light. Here, the size of the opening 155 may vary, and it may be larger or smaller than the opening 155 shown in this figure.

The shell unit 150 also changes the imaging range of the imaging unit 100 by changing the position of the opening 155 under the control of the imaging range control unit 160. For example, the shell unit 150 is provided as a light-shielding, bowl-shaped member with a hole at its bottom that serves as the opening 155, and is controlled by an actuator driven by the imaging range control unit 160 so as to be rotatable about, for example, the optical center of the optical system 102. Alternatively, the shell unit 150 may be configured as a member of which one part is translucent and the other part is light-shielding, with the translucent part functioning as the opening 155. Here, the translucent member used as the opening 155 need not be colorless and transparent, and may have some color. The shell unit 150 may also be a member that is translucent as a whole, configured so that the part corresponding to the opening 155 has a different color from the other parts.

Alternatively, the shell unit 150 may be configured as a liquid crystal panel using translucent electrodes, and may control the imaging range of the imaging unit 100 by controlling the electrodes so that light is transmitted in the part corresponding to the opening 155 and blocked in the parts other than the opening 155.
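Purely as an illustrative sketch, the liquid crystal variant could be modeled as a grid of cells that are each either transmitting or blocking. The flat rectangular grid, the circular opening, and the function name `lcd_shell_mask` are assumptions made for this example only; the embodiment does not specify the panel layout or the shape of the opening.

```python
def lcd_shell_mask(rows, cols, center, radius):
    """Return a rows x cols grid of booleans for a hypothetical panel.

    True  = the cell transmits light (it is part of the opening 155)
    False = the cell blocks light
    center is (row, col) of the opening; radius is measured in cells.
    """
    center_row, center_col = center
    return [
        [
            (row - center_row) ** 2 + (col - center_col) ** 2 <= radius ** 2
            for col in range(cols)
        ]
        for row in range(rows)
    ]
```

In this model, moving the opening amounts to recomputing the mask with a different `center`, with no mechanical motion involved.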

FIG. 4 shows an example of changes in the position of the opening 155 according to the embodiment of the present invention. FIG. 4(a) shows an example of the external appearance of the interpersonal robot 10 when the opening 155 is positioned on the left as seen from the user. In this case, the imaging unit 100 captures an image of a subject on the right side as seen from the interpersonal robot 10. FIG. 4(b) shows an example of the external appearance of the interpersonal robot 10 when the opening 155 is positioned on the right as seen from the user. In this case, the imaging unit 100 captures an image of a subject on the left side as seen from the interpersonal robot 10.

As is clear from FIGS. 4(a) and 4(b), a user can easily determine, based on the position of the opening 155, which direction the interpersonal robot 10 is imaging. Furthermore, because the shell unit 150 is provided as at least a part of the eyeball 15, the user can easily determine which direction the interpersonal robot 10 is imaging in the same way that one judges the gaze direction of a human. By determining whether or not the interpersonal robot 10 is looking at him or her, the user can then determine whether or not the interpersonal robot 10 is communicating with him or her. In other words, the user can communicate with the interpersonal robot 10 with eye contact. This allows users to communicate with the interpersonal robot 10 more smoothly.

In addition, when a wide-angle lens such as a fisheye lens is used in the optical system 102, using the shell unit 150 to block light incident from directions other than the required direction makes it possible to prevent so-called stray light and capture clearer images, while still securing a wide range that can be imaged by changing the position of the opening 155.

FIG. 5 is a flowchart showing an example of the processing flow in the interpersonal robot 10 according to the embodiment of the present invention. First, the sound collection unit 130 inputs sound (S1000). Here, the sound collection unit 130 may be composed of a plurality of microphones and may input sound at mutually different positions using the respective microphones. Next, the speaker selection unit 140 determines whether or not the sound input by the sound collection unit 130 contains a user's speech (S1010). For example, the speaker selection unit 140 decomposes the signal representing the input sound into a plurality of frequency components using a known technique such as the fast Fourier transform, and determines that the input sound contains speech when the level of the components in a frequency band characteristic of the human voice is equal to or greater than a predetermined reference value. Alternatively, the speaker selection unit 140 may determine that the input sound contains speech when, using a known speech recognition technique, it recognizes a specific word in the input sound, such as a name given to the interpersonal robot 10. In this way, the interpersonal robot 10 can be made to respond only when it is spoken to by a user who wishes to communicate with it.
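By way of example only, the frequency-based determination in S1010 might be sketched as follows. The embodiment specifies neither the frequency band nor the reference value, so the 300 to 3400 Hz band, the threshold, the use of the peak in-band magnitude as the "level", and the function names are all assumptions of this sketch.

```python
import cmath


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two (>= 1)."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def contains_speech(samples, sample_rate, band=(300.0, 3400.0), threshold=0.1):
    """Return True if the peak normalized magnitude inside `band` reaches `threshold`.

    band and threshold are illustrative values, not taken from the embodiment.
    """
    n = len(samples)
    spectrum = fft(samples)
    bin_hz = sample_rate / n  # frequency spacing between adjacent FFT bins
    in_band = [
        abs(spectrum[k]) / n
        for k in range(1, n // 2)
        if band[0] <= k * bin_hz <= band[1]
    ]
    return bool(in_band) and max(in_band) >= threshold
```

A practical detector would likely process short overlapping frames and adapt the threshold to background noise; the sketch only shows the band-level comparison itself.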

Here, in addition to determining whether or not the input sound contains a user's speech, the speaker selection unit 140 may detect the source position or the propagation direction of the speech. For example, when the sound collection unit 130 is composed of a plurality of microphones, the speaker selection unit 140 may perform the speech determination on the sound input at each microphone and detect the source position or propagation direction of the speech based on the position at which the loudest speech was input. Alternatively, when the sound collection unit 130 is composed of at least two microphones, the speaker selection unit 140 may use a known sound localization technique to detect the source position of the speech based on the differences, between microphones, in the arrival timing of the sound waves representing the same speech.
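As a hedged illustration of the timing-based approach with two microphones, the sketch below estimates the delay between the two signals by brute-force cross-correlation and converts it into an arrival angle. It assumes a distant (far-field) source, a speed of sound of 343 m/s, and a known microphone spacing; the function names are hypothetical, and the embodiment does not state which localization technique is used.

```python
import math


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag in samples by which sig_b trails sig_a.

    The lag that maximizes the cross-correlation is chosen; a negative
    result means sig_b leads sig_a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += value * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(lag_samples, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Convert a delay into an angle from the broadside of the microphone pair.

    Far-field assumption: sin(angle) = path difference / microphone spacing.
    """
    path_difference = lag_samples / sample_rate * speed_of_sound
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing_m))
    return math.degrees(math.asin(ratio))
```

A single pair of microphones resolves only one angle, so locating a position rather than a direction would need additional microphones or other information.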

If the input sound does not contain speech (S1010: No), the interpersonal robot 10 returns the processing to S1000 and inputs sound again. On the other hand, if the input sound contains speech (S1010: Yes), the imaging range control unit 160 controls the imaging range of the imaging unit 100 (S1020). Here, if the speaker selection unit 140 has detected the source position of the speech, the imaging range control unit 160 controls the imaging range of the imaging unit 100 so that this position is included in the imaging range. If the source position of the speech has not been detected, the imaging range control unit 160 may change the imaging range according to a predetermined rule, for example by moving it successively to the right, so that the entire surroundings are imaged through repeated image capture.

The imaging range control unit 160 may also control the imaging range of the imaging unit 100 not only by changing the position of the opening of the shell unit 150 but also by changing the orientation of the interpersonal robot 10. Even in that case, controlling the imaging range by changing the position of the opening of the shell unit 150 wherever possible shortens the time required to change the imaging range, which improves the responsiveness of the interpersonal robot 10.

Next, the imaging unit 100 captures an image (S1030). Here, for the detection of mouth movement described later, the imaging unit 100 preferably captures a moving image or a plurality of temporally consecutive still images. The speaker selection unit 140 then recognizes whether or not a person is included in the image captured by the imaging unit 100 (S1040). For example, the speaker selection unit 140 recognizes whether or not a person is included based on the distribution of skin-colored regions in the captured image. Here, when the imaging unit 100 has captured a moving image or a plurality of still images, the speaker selection unit 140 may perform the recognition processing on at least some of the frame images or still images. If no person is included in the captured image (S1040: No), the interpersonal robot 10 returns the processing to S1000 and causes the sound collection unit 130 to input sound again.

On the other hand, if a person is included in the captured image (S1040: Yes), the speaker selection unit 140 uses a known image recognition technique such as pattern matching to recognize the face portion in the image of the recognized person and to extract the mouth portion from the face image (S1050). The speaker selection unit 140 then determines whether or not the mouth image is changing in synchronization with the speech (S1060). For example, the speaker selection unit 140 recognizes that a person is talking when the image of that person's mouth, as recognized in a plurality of frame images of a moving image or in a plurality of temporally consecutive still images, changes at intervals no longer than a predetermined time interval. The speaker selection unit 140 then determines that the mouth image is changing in synchronization with the speech, that is, that the mouth is moving in synchronization with the speech, when the time periods in which the input sound contains speech and the time periods in which the person is talking coincide at a rate equal to or greater than a predetermined reference value. Here, when a plurality of people are included in the captured image, the speaker selection unit 140 may detect the mouth image for each of the people and determine, for each, whether or not the mouth image is changing in synchronization with the speech.
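One possible reading of the synchronization test in S1060 is sketched below for illustration. It assumes that speech activity and mouth movement have already been reduced to one boolean flag per frame, and it measures the "rate of coincidence" as the number of frames where both are active divided by the number of frames where either is active. That measure, the 0.7 reference value, and the function names are assumptions of this sketch, not details given in the embodiment.

```python
def is_synchronized(voice_flags, mouth_flags, min_ratio=0.7):
    """Return True if mouth movement overlaps speech activity closely enough.

    voice_flags[i] is True when the sound in frame i contains speech;
    mouth_flags[i] is True when the person's mouth image changed in frame i.
    """
    pairs = list(zip(voice_flags, mouth_flags))
    both = sum(1 for voice, mouth in pairs if voice and mouth)
    either = sum(1 for voice, mouth in pairs if voice or mouth)
    if either == 0:
        return False  # neither speech nor mouth movement was observed
    return both / either >= min_ratio


def select_speakers(voice_flags, mouth_flags_by_person, min_ratio=0.7):
    """Return the ids of the people whose mouths move in sync with the speech."""
    return [
        person_id
        for person_id, mouth_flags in mouth_flags_by_person.items()
        if is_synchronized(voice_flags, mouth_flags, min_ratio)
    ]
```

Applying the same test to every person in the image, as `select_speakers` does, corresponds to the case in which a plurality of people are captured.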

If the mouth image is not changing in synchronization with the speech (S1060: No), the interpersonal robot 10 returns the processing to S1000 and causes the sound collection unit 130 to input sound again. On the other hand, if the mouth image is changing in synchronization with the speech (S1060: Yes), the speaker selection unit 140 recognizes the person corresponding to that image as the person who is speaking to the interpersonal robot 10, and selects, within the captured image, a region containing at least part of that person (S1070). For example, the speaker selection unit 140 selects a rectangle circumscribing the person. The imaging range control unit 160 then changes the imaging range by controlling the shell unit 150 so that the region selected by the speaker selection unit 140 is positioned at the center of the image captured by the imaging unit 100 (S1080).
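For illustration only, one pass through steps S1000 to S1080 of FIG. 5 might be organized as in the sketch below. The `robot` object and every method called on it are hypothetical placeholders standing in for the units described above; none of these names comes from the embodiment.

```python
def run_once(robot):
    """Run one pass through S1000 to S1080.

    Returns the selected region, or None whenever the flow would return
    to S1000. `robot` is any object providing the hypothetical methods
    used below.
    """
    sound = robot.input_sound()                        # S1000
    if not robot.contains_speech(sound):               # S1010: No
        return None
    # S1020: aim at the voice if it was located; locate_voice may return
    # None, in which case point_toward is expected to sweep by a fixed rule.
    robot.point_toward(robot.locate_voice(sound))
    frames = robot.capture_frames()                    # S1030
    people = robot.find_people(frames)                 # S1040
    if not people:                                     # S1040: No
        return None
    for person in people:
        mouth = robot.extract_mouth(frames, person)    # S1050
        if robot.mouth_in_sync(mouth, sound):          # S1060: Yes
            region = robot.bounding_region(person)     # S1070
            robot.center_on(region)                    # S1080
            return region
    return None                                        # S1060: No
```

A caller would invoke `run_once` repeatedly, which mirrors the way the flowchart returns to S1000 after each branch that ends without a selected speaker.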

As described above, by controlling the shell unit 150 so as to image a region containing the user who spoke to the interpersonal robot 10, that user can recognize that the interpersonal robot 10 is looking at him or her, so communication with eye contact can be established between the user and the interpersonal robot 10. This allows the user to communicate with the interpersonal robot 10 more smoothly.

In this example, the interpersonal robot 10 detects the user who has spoken to it by using both the sound input through the sound collection unit 130 and the image captured by the imaging unit 100, but it may instead detect that user using only one of the sound and the image. For example, the interpersonal robot 10 may detect the position from which the speaking voice originates using a plurality of microphones or the like, and detect the user who has spoken by controlling the imaging range based only on the detected position. Alternatively, for example, the interpersonal robot 10 may capture images while sequentially changing the position of the opening in the shell unit 150 and the orientation of the interpersonal robot 10, perform person recognition on the captured images, and detect the user who has spoken by detecting changes in the mouth image.
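The specification does not say how the position of the voice is obtained from a plurality of microphones. One common approach, shown here only as an illustrative assumption, estimates the direction of a distant sound source from the difference in arrival time between two microphones, using sin(theta) = c * delay / d. The function name, the parameters, and the default speed of sound are hypothetical.

```python
import math


def bearing_from_delay(
    delay_s: float,
    mic_spacing_m: float,
    speed_of_sound_m_s: float = 343.0,
) -> float:
    """Estimate the bearing of a distant sound source in degrees.

    Assumptions: two microphones, a far-field (plane wave) source, and a
    known arrival-time difference delay_s. A bearing of 0 degrees is
    straight ahead (perpendicular to the line joining the microphones).
    """
    if mic_spacing_m <= 0:
        raise ValueError("microphone spacing must be positive")
    sine = speed_of_sound_m_s * delay_s / mic_spacing_m
    # Measurement noise can push the value slightly outside [-1, 1].
    sine = max(-1.0, min(1.0, sine))
    return math.degrees(math.asin(sine))
```

A delay of zero corresponds to a source directly in front, and the sign of the delay indicates on which side of the microphone pair the source lies. The resulting bearing could then be used to set the imaging range without relying on the captured image.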

FIG. 6 is a block diagram showing an example of the hardware configuration of a computer 1500 according to the present embodiment. The computer 1500 includes: a CPU peripheral section having a CPU 1505, a RAM 1520, a graphic controller 1575, and a display device 1580, which are interconnected by a host controller 1582; an input/output section having a communication interface 1530, a hard disk drive 1540, and a CD-ROM drive 1560, which are connected to the host controller 1582 by an input/output controller 1584; and a legacy input/output section having a ROM 1510, a flexible disk drive 1550, and an input/output chip 1570, which are connected to the input/output controller 1584.

The host controller 1582 connects the RAM 1520 to the CPU 1505 and the graphic controller 1575, which access the RAM 1520 at a high transfer rate. The CPU 1505 operates based on programs stored in the ROM 1510 and the RAM 1520 and controls each section. The graphic controller 1575 acquires image data that the CPU 1505 or the like generates in a frame buffer provided in the RAM 1520 and displays it on the display device 1580. Alternatively, the graphic controller 1575 may internally include a frame buffer that stores the image data generated by the CPU 1505 or the like.

The input/output controller 1584 connects the host controller 1582 to the hard disk drive 1540, the communication interface 1530, and the CD-ROM drive 1560, which are relatively high-speed input/output devices. The hard disk drive 1540 stores programs and data used by the CPU 1505 in the computer 1500. The communication interface 1530 communicates with the interpersonal robot 10 via a network and provides programs and data to the interpersonal robot 10. The CD-ROM drive 1560 reads a program or data from a CD-ROM 1595 and provides it to the hard disk drive 1540 and the communication interface 1530 via the RAM 1520.

The ROM 1510 and the relatively low-speed input/output devices, namely the flexible disk drive 1550 and the input/output chip 1570, are also connected to the input/output controller 1584. The ROM 1510 stores a boot program that the computer 1500 executes at startup, programs that depend on the hardware of the computer 1500, and the like. The flexible disk drive 1550 reads a program or data from a flexible disk 1590 and provides it to the hard disk drive 1540 and the communication interface 1530 via the RAM 1520. The input/output chip 1570 connects the flexible disk drive 1550, as well as various input/output devices via, for example, a parallel port, a serial port, a keyboard port, or a mouse port.

The program provided to the communication interface 1530 via the RAM 1520 is stored on a recording medium such as the flexible disk 1590, the CD-ROM 1595, or an IC card and is supplied by the user. The program is read from the recording medium, provided to the communication interface 1530 via the RAM 1520, and transmitted to the interpersonal robot 10 via the network. The program transmitted to the interpersonal robot 10 is installed and executed on the interpersonal robot 10, and causes it to function as the interpersonal robot 10 described with reference to FIGS. 1 to 5.

The program described above may be stored on an external storage medium. In addition to the flexible disk 1590 and the CD-ROM 1595, usable storage media include optical recording media such as a DVD or PD, magneto-optical recording media such as an MD, tape media, and semiconductor memories such as an IC card. A storage device such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet may also be used as the recording medium, with the program provided to the computer 1500 via the network.

The present invention has been described above using an embodiment, but the technical scope of the invention is not limited to the scope described in that embodiment. It is apparent to those skilled in the art that various modifications or improvements can be made to the embodiment. It is apparent from the claims that embodiments with such modifications or improvements can also fall within the technical scope of the present invention.

FIG. 1 is a diagram showing an overview of the interpersonal robot 10 according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the configuration of the interpersonal robot 10 according to the embodiment of the present invention.
FIG. 3 is a diagram showing an example of a cross section of the eyeball 15 according to the embodiment of the present invention.
FIG. 4 is a diagram showing an example of a change in the position of the opening 155 according to the embodiment of the present invention.
FIG. 5 is a flowchart showing an example of the flow of processing in the interpersonal robot 10 according to the embodiment of the present invention.
FIG. 6 is a block diagram showing an example of the configuration of the computer 1500 according to the embodiment of the present invention.

Explanation of symbols

10 interpersonal robot, 15 eyeball, 100 imaging unit, 102 optical system, 104 CCD, 106 imaging signal processing unit, 110 primary memory, 120 imaging control unit, 130 sound collection unit, 140 speaker selection unit, 150 shell unit, 155 opening, 160 imaging range control unit

Claims

1. An interpersonal robot that communicates with a person, comprising:
an imaging unit that captures an image of a subject;
a shell portion that is provided outside the imaging unit, has an opening through which the imaging unit captures the image, and is capable of changing the position of the opening; and
an imaging range control unit that controls the range captured by the imaging unit by controlling the position of the opening of the shell portion.

2. The interpersonal robot according to claim 1, wherein the shell portion blocks a part of the light incident on the imaging unit.

3. The interpersonal robot according to claim 1, wherein the shell portion is provided as at least a part of an eyeball of the interpersonal robot.

4. The interpersonal robot according to claim 1, further comprising a speaker selection unit that selects, in the image captured by the imaging unit, a region including a person talking to the interpersonal robot,
wherein the imaging range control unit controls the range captured by the imaging unit based on the region selected by the speaker selection unit.

5. An interpersonal robot control program for operating an interpersonal robot, the program causing the interpersonal robot to function as an interpersonal robot comprising:
an imaging unit that captures an image of a subject;
a shell portion that is provided outside the imaging unit, has an opening through which the imaging unit captures the image, and is capable of changing the position of the opening; and
an imaging range control unit that controls the range captured by the imaging unit by controlling the position of the opening of the shell portion.

6. The interpersonal robot control program according to claim 5, wherein the interpersonal robot further comprises a speaker selection unit that selects, in the image captured by the imaging unit, a region including a person talking to the interpersonal robot, and
the imaging range control unit controls the range captured by the imaging unit based on the region selected by the speaker selection unit.