JP2008254122A

JP2008254122A - Robot

Info

Publication number: JP2008254122A
Application number: JP2007099175A
Authority: JP
Inventors: Takahiro Ohashi; 孝裕大橋; Asuka Shiina; あす香椎名
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2007-04-05
Filing date: 2007-04-05
Publication date: 2008-10-23
Anticipated expiration: 2027-04-05
Also published as: JP4976903B2

Abstract

PROBLEM TO BE SOLVED: To provide a robot capable of autonomously controlling its behavior in accordance with the surrounding environment, the situation of a person, and the reaction from the person.

SOLUTION: The robot stores map data, which associates position information with speech output information indicating whether the robot may speak, the speech volume, and the speech tone, and person situation data, which associates the speech output information with image conditions and sound conditions. An approach action control means 48 is equipped with: an environment information detection means 110 for detecting, from the map data, the speech output information corresponding to the present position; a person situation determination means 120 for determining the situation of a target person based on the image conditions, the processing result of a photographed image, and a speech recognition result; and a response action control means 130 which extracts, from the person situation data, the speech output information corresponding to the determined situation of the person, and decides whether speech to the target person is permitted, the speech volume, and the speech tone based on the extracted speech output information and the speech output information detected by the environment information detection means 110.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a robot capable of interacting with humans, and more particularly to a robot capable of autonomous movement.

Conventionally, robots are known that can interact with a human by recognizing input human speech and outputting predetermined information as speech (see, for example, Patent Document 1 and Patent Document 2).
The robot described in Patent Document 1 is an interactive robot apparatus that actively and proactively delivers information from the Internet to a user. When this robot detects that the user is in an area where dialogue is not possible, or detects that the user is in an area where dialogue is possible but cannot detect the user's speech, it operates its movable parts to perform spontaneous, non-reactive motions that draw the user's attention and induce the user to approach the robot. In addition, when the user speaks, this robot gives priority to the user's speech by stopping text output and browser operation.

The dialogue apparatus described in Patent Document 2 includes a robot that presents, through motion, operation information stored in a database and the results of CPU computation, and outputs speech at a timing determined from the voice and body motion of the conversation partner. Specifically, the dialogue apparatus detects when the human side yields the right to speak, using the results of analyzing how voice and body motion correlate with changes in the right to speak when two humans actually converse, and speaks at that timing to achieve a natural and smooth dialogue.
JP 2002-108388 A (paragraphs 0014 to 0025, FIG. 1)
JP 2001-162573 A (paragraphs 0009 to 0020, FIG. 1)

However, because the robots described in Patent Document 1 and Patent Document 2 actively engage in dialogue or actively deliver information from the Internet, users may be dissatisfied if the robot speaks as designed in a usage environment or situation that differs from the one assumed. For example, when a device that has been used in a noisy place is moved to a quiet place, the user must change the settings to lower the device's speech volume. Likewise, when a device that has been speaking in a friendly, familiar tone is repurposed for business use, the user must change the settings to give it a business tone. There is therefore a demand for a robot that can autonomously control its behavior according to the surrounding environment, such as the place where the device is actually used, the situation of the target person, and the reaction from that person.

Accordingly, an object of the present invention is to solve the above problems and provide a robot that can autonomously control its behavior according to the surrounding environment, the situation of a person, and the reaction from the person.

The present invention was devised to achieve the above object. The robot according to claim 1 comprises: current position detection means for detecting the current position of the robot on a preset map; image processing means for processing a captured image, obtained by photographing a person targeted for communication with imaging means, so that the situation of the person can be determined; speech processing means for performing speech recognition so that the situation of the person can be determined from speech, and for producing speech; and approach behavior control means for controlling approach behavior toward the target person before speaking. The robot further comprises storage means for storing map data, created by associating preset speech output information indicating at least one of whether the robot may speak, the speech volume, and the speech tone with position information indicating a position on the map, and person situation data, created by associating the speech output information with preset image conditions indicating the situation of a person and preset sound conditions indicating a reaction from the person. The approach behavior control means comprises: environment information detection means for detecting, from the map data, the speech output information corresponding to the detected current position as information attributable to the robot's environment; person situation determination means for determining the situation of the target person based on the preset image conditions, the processing result of the captured image, and the result of the speech recognition; and response behavior control means for extracting, from the person situation data, the speech output information corresponding to the situation of the person determined by the person situation determination means, and determining at least one of whether speech to the target person is permitted, the speech volume, and the speech tone based on the extracted speech output information and the speech output information detected by the environment information detection means.

According to this configuration, in order to control approach behavior toward the target person before speaking, the robot uses the response behavior control means to determine at least one of whether speech to the target person is permitted, the speech volume, and the speech tone, based on the speech output information detected for the current position and the speech output information extracted in accordance with the situation of the person and the reaction from the person. The robot can therefore change whether it speaks, its speech volume, and its speech tone according to its current location, the situation shown in the currently captured image of the target person, and the person's current speech. Here, prohibiting speech does not mean prohibiting vocalization (say); it means prohibiting conversation (talk).

The robot according to claim 2 is the robot according to claim 1, further comprising noise measurement means for measuring noise around the robot and detecting the noise level, wherein the map data is created by associating the speech output information with the position information for each preset noise level, and the environment information detection means detects, from the map data created for each preset noise level, the speech output information corresponding to the detected current position and the detected noise level as information attributable to the robot's environment.

According to this configuration, the robot can change whether it speaks, its speech volume, and its speech tone according to its current location, the noise measured at that location, the situation shown in the currently captured image of the target person, and the reaction from the person. In other words, the location-dependent speech output information of the robot is determined appropriately by taking both position information and noise into account. Thus, for example, when a quiet place where the volume should ordinarily be lowered has become noisy because it is being used for a special purpose, the robot can keep the volume as it is or raise it.
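The noise-dependent lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the `SpeechOutput` record, the two noise bands, the 65 dB boundary, the area names, and all table values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechOutput:
    allowed: bool  # whether the robot may talk at this location
    volume: int    # illustrative volume level, 0 (mute) to 3 (loud)
    tone: str      # illustrative tone label


# Hypothetical boundary between the "quiet" and "noisy" bands.
NOISY_DB = 65.0


def noise_band(level_db: float) -> str:
    """Map a measured noise level to one of the preset noise bands."""
    return "noisy" if level_db >= NOISY_DB else "quiet"


# One map entry per (area, noise band); every value here is invented.
MAP_DATA = {
    ("conference_room", "quiet"): SpeechOutput(True, 1, "formal"),
    ("conference_room", "noisy"): SpeechOutput(True, 2, "formal"),
    ("hall", "quiet"): SpeechOutput(True, 2, "friendly"),
    ("hall", "noisy"): SpeechOutput(True, 3, "friendly"),
}


def detect_environment_info(area: str, level_db: float) -> SpeechOutput:
    """Return the speech output information for the current area and noise."""
    return MAP_DATA[(area, noise_band(level_db))]
```

With this table, a conference room that is normally quiet yields a low volume, while the same room measured at a high noise level yields a higher one, which mirrors the "special purpose" example in the text.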

The robot according to claim 3 is the robot according to claim 1 or claim 2, wherein the response behavior control means comprises: overall behavior control means that compares the speech output information extracted from the person situation data with the speech output information detected by the environment information detection means and, when the two differ, converts each piece of speech output information into a numerical value, weights the values, and combines them to calculate an integrated value, and that permits speech to the person when the calculated integrated value is smaller than a preset value indicating permission to speak; and speech level adjustment means that, when speech to the person is permitted, adjusts the speech volume level or switches the speech tone based on the integrated value.

According to this configuration, even when the speech output information for the current location differs from the speech output information for the situation shown in the currently captured image of the target person and the reaction from the person, the robot calculates an integrated value of the two. Based on the calculated integrated value, the robot then adjusts the speech volume level or switches the speech tone when speech is permitted. When speech is permitted, the robot can also adjust the speech volume level by taking into account the distance from the position where the robot actually speaks to the position of the target person.
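A minimal sketch of the weighted integration, assuming each piece of speech output information has already been converted to a "restraint" score between 0 (speak freely) and 1 (stay silent). The score scale, the weights, the permission threshold of 0.8, and the volume bands are all hypothetical choices made for illustration; the text only states that the values are weighted, combined, and compared with a preset value.

```python
def integrate(env_score: float, person_score: float,
              w_env: float = 0.4, w_person: float = 0.6) -> float:
    """Weighted sum of the environment score and the person-situation score."""
    return w_env * env_score + w_person * person_score


def decide_speech(env_score: float, person_score: float,
                  permit_threshold: float = 0.8) -> tuple[bool, int]:
    """Return (speech permitted, volume level 0 to 3)."""
    # Integration is only needed when the two sources disagree;
    # using the shared value otherwise is an assumption of this sketch.
    if env_score == person_score:
        value = env_score
    else:
        value = integrate(env_score, person_score)

    # Speech is permitted when the integrated value is below the preset value.
    if value >= permit_threshold:
        return False, 0

    # Hypothetical mapping: the lower the restraint, the louder the speech.
    if value < 0.3:
        return True, 3
    if value < 0.6:
        return True, 2
    return True, 1
```

For example, `decide_speech(0.2, 1.0)` still permits speech but at the lowest volume, because the person-side score dominates the weighted sum without reaching the threshold.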

The robot according to claim 4 is the robot according to claim 3, further comprising: autonomous movement control means for outputting a drive signal to drive means that drives at least one of a head, arms, and legs each connected to the torso of the robot, thereby autonomously moving the at least one part; scenario storage means for storing a previously created scenario that specifies gestures, which are body motions that move the at least one part when a predetermined utterance is made; gesture integration means for extracting from the scenario the gesture corresponding to the utterance to be made to the target person and outputting a command specifying the extracted gesture to the autonomous movement control means; and gesture adjustment means for, when the speech volume level has been adjusted by the speech level adjustment means, adjusting the range of motion of the part for the gesture specified by the command in proportion to the adjusted speech volume level.

According to this configuration, when the speech volume level has been adjusted, the robot adjusts the range of motion of each part for the gesture in proportion to the adjusted speech volume level. The robot therefore gestures with a relatively small range of motion when the speech volume is relatively low, and with a relatively large range of motion when the speech volume is relatively high, so its gestures while speaking appear natural. Here, gestures include tilting the head, raising or spreading the arms, moving the legs up and down, and similar motions used to express various intentions that accompany the content of the speech.
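The proportional adjustment can be illustrated with a short sketch. The joint names, the use of angles in degrees, and the reference volume level at which the scenario's gesture is played unscaled are assumptions for illustration only.

```python
def scale_gesture(base_angles_deg: dict[str, float],
                  volume_level: int,
                  reference_level: int = 2) -> dict[str, float]:
    """Scale each joint's range of motion in proportion to the volume level.

    base_angles_deg holds the range of motion specified by the scenario for
    the (hypothetical) reference volume level.
    """
    factor = volume_level / reference_level
    return {joint: angle * factor for joint, angle in base_angles_deg.items()}
```

With a reference level of 2, a gesture of `{"arm_raise": 40.0}` becomes 20 degrees at volume level 1 and 60 degrees at volume level 3.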

The robot according to claim 5 is the robot according to any one of claims 1 to 4, wherein the image processing means has face recognition means for extracting the face region of the target person from the captured image and gaze detection means for detecting the gaze direction of the target person from the extracted face region, the robot further comprising: interest estimation means that, after speech on a predetermined topic to the target person has started, calculates an interest level by quantifying the gaze direction detected by the gaze detection means, determines whether the calculated interest level has risen, judges that the person is interested in the topic when the interest level has risen, and records the determination result; and topic control means that interrupts the speech on the predetermined topic when the interest level has fallen.

According to this configuration, after speaking to the target person about a predetermined topic, the robot can calculate an interest level from the person's gaze direction and acquire information on topics for which the interest level rose. In general, people direct their gaze toward a speaker when they are interested in what the speaker is saying, so the gaze direction detected after the robot has voiced a topic reflects the person's interest in that topic. The calculated interest level therefore reflects the person's interest. When the interest level falls, specifically when the person looks away from the robot, the robot stops presenting the current topic. A person to whom the robot presents a topic thus does not have to keep listening to a topic of no interest, and is more likely to feel at ease with the robot.
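One way to quantify gaze direction is sketched below. The text does not give a formula, so the cosine mapping, the angle convention (0 degrees means looking straight at the robot), and the handling of an unchanged interest level are assumptions of this sketch.

```python
import math


def interest_from_gaze(gaze_angle_deg: float) -> float:
    """Map a gaze angle to an interest level in [0, 1].

    0 degrees (looking at the robot) gives 1.0; 90 degrees or more gives
    approximately 0.0. The cosine mapping is a hypothetical choice.
    """
    return max(0.0, math.cos(math.radians(gaze_angle_deg)))


def update_topic_state(prev_interest: float,
                       gaze_angle_deg: float) -> tuple[float, str]:
    """Compare the new interest level with the previous one."""
    current = interest_from_gaze(gaze_angle_deg)
    if current > prev_interest:
        return current, "interested"  # judged interested in the topic
    if current < prev_interest:
        return current, "interrupt"   # stop speaking about this topic
    return current, "unchanged"       # case not specified in the text
```

A caller would record the returned label for the current topic and stop the utterance when the label is `"interrupt"`.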

The robot according to claim 6 is the robot according to any one of claims 1 to 4, wherein the speech processing means has speech information detection means for detecting, from input speech, person speech information indicating at least one of the volume, sound quality, and speaking rate of the target person's voice, the robot further comprising: interest estimation means that, after speech on a predetermined topic to the target person has started, calculates an interest level by quantifying the recognition result of the speech processing means or the person speech information detected by the speech information detection means, determines whether the calculated interest level has risen, judges that the person is interested in the topic when the interest level has risen, and records the determination result; and topic control means that interrupts the speech on the predetermined topic when the interest level has fallen.

According to this configuration, after speaking to the target person about a predetermined topic, the robot can calculate an interest level from the person speech information detected from the person's voice and acquire information on topics for which the interest level rose. When people lose interest in what their conversation partner is saying, their replies become quieter, the sound quality of their voice drops, or they speak more slowly, so the person speech information detected after the robot has voiced a topic reflects the person's interest in that topic. The calculated interest level therefore reflects the person's interest. When the interest level falls, specifically when the person's replies become quieter, lower in sound quality, or slower, the robot stops presenting the current topic. A person to whom the robot presents a topic thus does not have to keep listening to a topic of no interest, and is more likely to feel at ease with the robot.
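A voice-based interest level could be quantified along these lines. The sketch is hypothetical: it uses only volume and speaking rate (sound quality is left out because the text does not say how it is measured), and it averages each feature's ratio to a baseline taken before the topic started.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechFeatures:
    volume_db: float        # loudness of the person's reply
    rate_per_second: float  # speaking rate, e.g. syllables per second


def interest_from_voice(current: SpeechFeatures,
                        baseline: SpeechFeatures) -> float:
    """Average of the feature ratios; below 1.0 suggests falling interest."""
    ratios = (
        current.volume_db / baseline.volume_db,
        current.rate_per_second / baseline.rate_per_second,
    )
    return sum(ratios) / len(ratios)
```

A reply that is both quieter and slower than the baseline gives a value below 1.0, which the interest estimation step would treat as a drop.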

The robot according to claim 7 is the robot according to claim 5 or claim 6, further comprising person information storage means storing a plurality of topics, wherein the topic control means switches among the topics stored in the person information storage means based on the interest level and speaks to the target person about the topic switched to.

According to this configuration, the robot stores a plurality of topics and switches the topic it presents in accordance with changes in the interest level calculated from information acquired after speaking to the target person. The robot can therefore estimate, based on the interest level during the dialogue, which topics the conversation partner prefers, and speak accordingly.
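Topic switching driven by the interest level might look like the sketch below. The round-robin selection order and the rule "switch only when interest has dropped" are assumptions; the text only says that the topic is switched based on the interest level.

```python
def next_topic(topics: list[str], current: str, interest_delta: float) -> str:
    """Keep the current topic unless interest has dropped.

    interest_delta is the change in interest level since the topic started
    (negative means interest fell). Moving to the next stored topic in
    order is a hypothetical selection rule.
    """
    if interest_delta >= 0:
        return current
    index = topics.index(current)
    return topics[(index + 1) % len(topics)]
```

A richer implementation could instead rank the stored topics by the interest levels recorded for them in earlier conversations.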

According to the invention of claim 1, the robot can autonomously control its behavior according to the surrounding environment and the situation of a person.
According to the invention of claim 2, the robot can speak in different ways depending on the noise, even at the same location.
According to the invention of claim 3, the robot can integrate the behavior determined according to the surrounding environment with the behavior determined according to the situation of the person.
According to the invention of claim 4, the robot can gesture naturally in response to changes in speech volume.

According to the invention of claim 5 or claim 6, the robot can quantitatively acquire information about the conversation partner's interest in a topic.
According to the invention of claim 7, the robot estimates the topics the conversation partner prefers and speaks about them, so the partner is more likely to feel goodwill and familiarity toward the robot. In addition, by switching topics to keep the conversation with the person going, the robot can effectively collect information such as the person's preferences.

Hereinafter, the best mode for carrying out the robot of the present invention (hereinafter referred to as the "embodiment") will be described in detail with reference to the drawings. First, the overall configuration of a robot control system A including a robot according to the embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram schematically showing the configuration of the robot control system including the robot according to the embodiment of the present invention.

(Configuration of robot control system A)
As shown in FIG. 1, the robot control system A comprises a robot R, a base station 1 connected to the robot R by wireless communication, a management computer 3 connected to the base station 1 via a robot dedicated network 2, and a terminal 5 connected to the management computer 3 via a network 4.

As shown in FIG. 1, the robot control system A has a plurality of robots R_A, R_B, and R_C with a movement function (referred to simply as robot R when no particular robot is specified). Each robot R executes tasks in accordance with a task execution plan (task schedule) set in advance for each robot R in the management computer 3.

Here, an autonomous mobile biped walking robot is described as an example.
The robot R executes tasks in accordance with execution commands input from the management computer 3, and at least one robot R is placed in a task execution area set in advance as the area in which the robot R executes tasks.
FIG. 1 illustrates a robot R_A executing a task of guiding a visitor to a predetermined place such as a conference room (guidance task), a robot R_B executing a task of handing a package to a person (package delivery task), and a robot R_C waiting until a new task is assigned.

As shown in FIG. 2, the robot R has a head R1, arms R2, legs R3, a torso R4, and a rear storage section R5. The head R1, arms R2, and legs R3, each connected to the torso R4, are driven by actuators (drive means), and bipedal walking is controlled by an autonomous movement control unit 50 (see FIG. 6). Details of this bipedal walking are disclosed in, for example, Japanese Patent Application Laid-Open No. 2001-62760.

For example, when executing a guidance task, the robot R guides a person H within a predetermined guidance area (a movement area such as an office or corridor). Here, the robot R emits light (for example, infrared light, ultraviolet light, or laser light) and radio waves to its surroundings to detect whether a person H carrying a tag T is present in the surrounding area, identifies the position of the detected person H and approaches, and uses the tag T to identify who the person H is. The tag T receives the infrared light and radio waves that the robot R emits to identify the person's position (distance and direction). Based on a signal indicating the light-receiving direction contained in the received infrared light and the robot ID contained in the received radio waves, the tag T generates a reception report signal containing a tag identification number and returns it to the robot R. The robot R that receives this reception report signal can recognize the distance and direction to the person H wearing the tag T based on the signal, and can approach the person H.

When the robot R moves autonomously within the guidance area to execute a task (for example, a guidance task or a package delivery task), it emits laser slit light or infrared light to survey the road surface condition or search for marks on the road surface. That is, the robot R keeps track of where it is within the movement area. When it is in the normal movement area, it irradiates the road surface with laser slit light to detect steps, undulations, obstacles, and the like. When it is within the installation area of a mark M, it irradiates the road surface with infrared light to detect the mark M and confirm or correct its own position. Here, the mark M is a member made of, for example, a reflective material that retroreflects infrared light. Each mark M has position data, which is stored in a storage unit 30 (see FIG. 6) as part of the map data. The map data includes the position data of the marks M installed at specific locations in the guidance area and data on the installation area of each mark M, obtained by giving the position data a predetermined extent (range). The installation area of a mark M is an area within a predetermined distance of the mark M, and may be set arbitrarily, for example as a circular area with a radius of 1 to 3 m centered on the mark M, or a rectangular area extending 3 m in front of the mark M (on the robot side).
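The choice between laser slit light and infrared light can be sketched for the circular installation area case. The function names, the 2 m default radius (within the 1 to 3 m range mentioned above), and the 2D coordinate representation are assumptions for illustration.

```python
import math

Point = tuple[float, float]  # (x, y) position on the map, in metres


def in_mark_area(robot_xy: Point, mark_xy: Point, radius_m: float = 2.0) -> bool:
    """True if the robot is inside the circular installation area of a mark."""
    return math.dist(robot_xy, mark_xy) <= radius_m


def select_floor_sensor(robot_xy: Point, marks: list[Point],
                        radius_m: float = 2.0) -> str:
    """Use infrared near a mark M, laser slit light everywhere else."""
    if any(in_mark_area(robot_xy, mark, radius_m) for mark in marks):
        return "infrared"
    return "laser_slit"
```

A rectangular installation area would replace the distance test with a bounds check in the mark's local frame.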

図１に戻って、ロボット制御システムＡの構成の説明を続ける。
基地局１は、ロボットＲと管理用コンピュータ３との間のデータ交換を仲介するものである。
具体的には、基地局１は、管理用コンピュータ３から出力された実行命令をロボットＲに送信すると共に、ロボットＲから送信されたロボットＲの状態に関するデータ（ステータス情報）やロボットＲが実行命令を受信したことを示す信号（受信報告信号）を受信して、管理用コンピュータ３に出力するものである。
基地局１は、ロボットＲと管理用コンピュータ３との間のデータ交換を確実に行えるようにするために、タスク実行エリア内に少なくとも一つ設けられている。
なお、タスク実行エリアが建物の数フロアに亘って設定されている場合には、フロア毎に設けられていることが好ましく、一つの基地局１では総てのタスク実行エリアをカバーできない場合には、複数の基地局１がタスク実行エリア内に設けられていることが好ましい。 Returning to FIG. 1, the description of the configuration of the robot control system A will be continued.
The base station 1 mediates data exchange between the robot R and the management computer 3.
Specifically, the base station 1 transmits an execution command output from the management computer 3 to the robot R, and also transmits data (status information) on the state of the robot R transmitted from the robot R and the execution command from the robot R. Is received and a signal (reception report signal) is received and output to the management computer 3.
At least one base station 1 is provided in the task execution area in order to ensure data exchange between the robot R and the management computer 3.
When the task execution area extends over several floors of a building, a base station 1 is preferably provided on each floor. When one base station 1 cannot cover the entire task execution area, a plurality of base stations 1 are preferably provided in the task execution area.

ロボット専用ネットワーク２は、基地局１と、管理用コンピュータ３と、ネットワーク４とを接続するものであり、ＬＡＮ（Local Area Network）などにより実現されるものである。 The robot dedicated network 2 connects the base station 1, the management computer 3, and the network 4, and is realized by a LAN (Local Area Network) or the like.

管理用コンピュータ３は、複数のロボットＲを管理するものであり、基地局１、ロボット専用ネットワーク２を介してロボットＲの移動・発話などの各種制御を行うと共に、ロボットＲに対して必要な情報を提供する。ここで、必要な情報とは、検知された人物の氏名や、ロボットＲの周辺の地図（ローカル地図）などがこれに相当し、これらの情報は、管理用コンピュータ３の記憶部３ａに記憶されている。 The management computer 3 manages a plurality of robots R. It performs various kinds of control, such as movement and speech of the robots R, via the base station 1 and the robot dedicated network 2, and provides the robots R with necessary information. Here, the necessary information includes, for example, the name of a detected person and a map of the surroundings of the robot R (local map). This information is stored in the storage unit 3a of the management computer 3.

図３は、図１に示したロボットシステムで用いられるローカル地図の一例を示す図である。ここでは、案内領域３０１は、図３（ａ）に示すように、建物のあるフロアの長方形の領域である。ロボットＲやロボットＲが案内すべき人物は、案内領域３０１の出入口３０２の外側の通路３０３を通って案内領域３０１に入る。出入口３０２の内側には、ホール３０４が広がっており、ホール３０４の奥の隅には受付３０５が配置され、案内領域３０１の壁側には個室として仕切られた警備室３０６、談話室３０７および会議室３０８がそれぞれ設けられている。受付３０５は、Ｌ字型のカウンタテーブル３０５ａと、受付スタッフが配置されるカウンタスペース３０５ｂとから成る。カウンタスペース３０５ｂには、基地局１が設置されている。この案内領域３０１は、図３（ｂ）に示すように、場所に応じてロボットＲによって検出される騒音レベルが異なっている。ロボットＲが案内タスクを行う時間帯において、ホール３０４の騒音レベルは、例えば「６０ｄＢ」である。同様に、警備室３０６、談話室３０７および会議室３０８の騒音レベルは、それぞれ、例えば、「７０ｄＢ」、「１００ｄＢ」、「５０ｄＢ」である。なお、管理用コンピュータ３は、通路や部屋などのローカル地図の情報を位置座標データと関連づけて登録したローカルマップ（ローカル地図データ）と、ローカルマップを集積したタスク実行エリアの地図情報であるグローバルマップとを記憶部３ａ（図１参照）に保持している。 FIG. 3 is a diagram showing an example of a local map used in the robot system shown in FIG. 1. Here, as shown in FIG. 3(a), the guidance area 301 is a rectangular area on one floor of a building. The robot R and the persons to be guided by the robot R enter the guidance area 301 through the passage 303 outside the entrance 302 of the guidance area 301. A hall 304 extends inside the entrance 302, a reception 305 is located in a far corner of the hall 304, and a security room 306, a common room 307, and a conference room 308, each partitioned off as a private room, are provided along the walls of the guidance area 301. The reception 305 consists of an L-shaped counter table 305a and a counter space 305b where reception staff are stationed. The base station 1 is installed in the counter space 305b. As shown in FIG. 3(b), the noise level detected by the robot R in the guidance area 301 differs from place to place. During the time period in which the robot R performs the guidance task, the noise level in the hall 304 is, for example, 60 dB. Similarly, the noise levels in the security room 306, the common room 307, and the conference room 308 are, for example, 70 dB, 100 dB, and 50 dB, respectively.
The management computer 3 holds, in the storage unit 3a (see FIG. 1), local maps (local map data), in which local map information such as passages and rooms is registered in association with position coordinate data, and a global map, which is map information for the task execution area assembled from the local maps.

また、管理用コンピュータ３は、ロボットＲに実行させるタスクに関する情報（タスクデータ）を記憶するタスク情報データベースを記憶部３ａ（図１参照）に保持している。
図４に示すように、タスク情報データベース４００には、タスク毎に割り当てられた固有の識別子であるタスクＩＤ、タスクの優先度、タスクの重要度、タスクを実行させるロボットの識別子であるロボットＩＤ、案内や運搬（荷物配達）などのタスクの内容、タスク実行エリア内におけるタスクを開始する位置（開始位置）、タスク実行エリア内におけるタスクを終了する位置（終了位置）、タスクの実行に要する時間（所要時間）、そしてタスクの開始予定時刻（開始時刻）、タスクの終了予定時刻（終了時刻）、そしてタスクの状態などが、情報項目として含まれている。 Further, the management computer 3 holds a task information database that stores information (task data) related to tasks to be executed by the robot R in the storage unit 3a (see FIG. 1).
As shown in FIG. 4, the task information database 400 includes the following information items: a task ID, which is a unique identifier assigned to each task; the task priority; the task importance; a robot ID, which identifies the robot that is to execute the task; the task content, such as guidance or transportation (package delivery); the position in the task execution area where the task starts (start position); the position in the task execution area where the task ends (end position); the time required to execute the task (required time); the scheduled start time of the task (start time); the scheduled end time of the task (end time); and the task status.

また、管理用コンピュータ３は、ロボットＲに実行させるタスクの実行計画（タスクスケジュール）を、ロボットＲ毎に設定するものである。
図５に示すように、タスクスケジュールテーブル５００は、ロボットＲに実行させるタスクの実行順位、タスク情報データベース４００（図４参照）に登録されたタスクを特定するためのタスクＩＤ、タスクの優先度、タスクの内容、そしてタスクの状態を情報項目として含むテーブルである。
このタスクスケジュールテーブル５００では、これら情報項目が、タスク実行エリア内に配置されたロボットＲ毎に整理されており、どの様なタスクが、どのような順番で各ロボットＲに割り当てられているのかを把握できるようになっている。 The management computer 3 sets an execution plan (task schedule) for tasks to be executed by the robot R for each robot R.
As shown in FIG. 5, the task schedule table 500 includes an execution order of tasks to be executed by the robot R, a task ID for identifying a task registered in the task information database 400 (see FIG. 4), a task priority, It is a table that includes task contents and task status as information items.
In this task schedule table 500, these information items are organized for each robot R deployed in the task execution area, so that it is possible to see which tasks are assigned to each robot R and in what order.
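The relationship between the task information database 400 and the per-robot task schedule table 500 can be illustrated with a small sketch. The class name, field names, and the rule of ordering tasks by descending priority are assumptions made for illustration; the publication lists the information items but does not state how the execution order is derived.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """A reduced task record; the field names are hypothetical."""
    task_id: int
    priority: int
    robot_id: int
    content: str
    status: str

def build_schedule(tasks):
    """Group tasks by robot ID and number them in execution order.

    Assumption: a higher priority value means earlier execution.
    """
    table = {}
    for t in sorted(tasks, key=lambda t: -t.priority):
        rows = table.setdefault(t.robot_id, [])
        rows.append({
            "order": len(rows) + 1,   # execution order for this robot
            "task_id": t.task_id,
            "priority": t.priority,
            "content": t.content,
            "status": t.status,
        })
    return table
```

Calling `build_schedule` on a list of `Task` records returns one list of rows per robot ID, which mirrors the way the table lets an operator see what each robot R will do and in which order.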

再び、図１に戻って、ロボット制御システムＡの構成の説明を続ける。
端末５は、ネットワーク４を介して管理用コンピュータ３に接続し、管理用コンピュータ３の記憶部３ａに、人物に関する情報などを登録する、もしくは登録されたこれらの情報を修正するものである。また、端末５は、ロボットＲに実行させるタスクの登録や、管理用コンピュータ３において設定されるタスクスケジュールの変更や、ロボットＲの動作命令の入力などを行うものである。 Returning to FIG. 1 again, the description of the configuration of the robot control system A will be continued.
The terminal 5 is connected to the management computer 3 via the network 4 and registers information related to a person in the storage unit 3a of the management computer 3 or corrects the registered information. The terminal 5 is used for registering tasks to be executed by the robot R, changing a task schedule set in the management computer 3, inputting an operation command for the robot R, and the like.

以下、ロボットＲについて詳細に説明する。 Hereinafter, the robot R will be described in detail.

［ロボット］
ロボットＲは、図６に示すように、頭部Ｒ１、腕部Ｒ２、脚部Ｒ３、胴部Ｒ４および背面格納部Ｒ５に加えて、これら各部Ｒ１〜Ｒ５の適所に、カメラＣ，Ｃ、スピーカＳ、マイクＭＣ，ＭＣ、画像処理部１０、音声処理部２０、記憶部３０、主制御部４０、自律移動制御部５０、無線通信部６０、バッテリ７０、対象検知部８０、および周辺状態検知部９０を有する。
さらに、ロボットＲは、ロボットＲの向いている方向を検出するジャイロセンサＳＲ１や、予め設定された地図上におけるロボットＲの存在する位置座標を取得するためのＧＰＳ（Global Positioning System）受信器ＳＲ２を有している。 [robot]
As shown in FIG. 6, in addition to the head R1, the arm R2, the leg R3, the trunk R4, and the rear housing R5, the robot R has, at appropriate positions in these parts R1 to R5, cameras C and C, a speaker S, microphones MC and MC, an image processing unit 10, an audio processing unit 20, a storage unit 30, a main control unit 40, an autonomous movement control unit 50, a wireless communication unit 60, a battery 70, a target detection unit 80, and a peripheral state detection unit 90.
Furthermore, the robot R has a gyro sensor SR1 that detects the direction in which the robot R is facing, and a GPS (Global Positioning System) receiver SR2 for acquiring the position coordinates of the robot R on a preset map.

［カメラ］
カメラ（撮影手段）Ｃ，Ｃは、ロボットＲの前方移動方向側の映像をデジタルデータとして取り込むことができるものであり、例えば、カラーＣＣＤ(Charge-Coupled Device)カメラが使用される。カメラＣ，Ｃは、左右に平行に並んで配置され、撮影した画像は画像処理部１０に出力される。このカメラＣ，Ｃと、スピーカＳおよびマイクＭＣ，ＭＣは、いずれも頭部Ｒ１の内部に配設される。スピーカ（音声出力手段）Ｓは、音声処理部２０で音声合成された所定の音声を発することができる。 [camera]
Cameras (photographing means) C and C are capable of capturing an image of the robot R in the forward movement direction as digital data. For example, a color CCD (Charge-Coupled Device) camera is used. The cameras C and C are arranged side by side in parallel on the left and right, and the captured image is output to the image processing unit 10. The cameras C and C, the speaker S, and the microphones MC and MC are all disposed inside the head R1. The speaker (sound output means) S can emit a predetermined sound synthesized by the sound processing unit 20.

［画像処理部］
画像処理部（画像処理手段）１０は、カメラＣ，Ｃが撮影した画像（撮影画像）を処理して、撮影された画像からロボットＲの周囲の状況を把握するため、周囲の障害物や人物の認識を行う部分である。この画像処理部１０は、ステレオ処理部１１ａ、移動体抽出部１１ｂ、顔認識部１１ｃおよび視線検出部１１ｄを含んで構成される。
ステレオ処理部１１ａは、左右のカメラＣ，Ｃが撮影した２枚の画像の一方を基準としてパターンマッチングを行い、左右の画像中の対応する各画素の視差を計算して視差画像を生成し、生成した視差画像および元の画像を移動体抽出部１１ｂに出力する。なお、この視差は、ロボットＲから撮影された物体までの距離を表すものである。 [Image processing unit]
The image processing unit (image processing means) 10 processes the images (captured images) taken by the cameras C and C and recognizes surrounding obstacles and persons in order to grasp the situation around the robot R from the captured images. The image processing unit 10 includes a stereo processing unit 11a, a moving body extraction unit 11b, a face recognition unit 11c, and a line-of-sight detection unit 11d.
The stereo processing unit 11a performs pattern matching on the basis of one of the two images taken by the left and right cameras C and C, calculates the parallax of each corresponding pixel in the left and right images, and generates a parallax image. The generated parallax image and the original image are output to the moving object extraction unit 11b. This parallax represents the distance from the robot R to the photographed object.
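As an illustration of how a parallax value relates to distance, the standard pinhole relation for two parallel cameras, Z = f·B/d, can be sketched as follows. The publication does not give this formula or any camera parameters; the function name and the focal length and baseline used in the example are assumptions.

```python
def disparity_to_distance(disparity_px, focal_px, baseline_m):
    """Convert a disparity in pixels to a distance in meters (Z = f * B / d).

    Assumes two parallel, rectified cameras. focal_px is the focal length in
    pixels and baseline_m the spacing between the cameras; both are hypothetical.
    """
    if disparity_px <= 0:
        return float("inf")   # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px
```

With an assumed focal length of 500 pixels and a 0.1 m baseline, a disparity of 20 pixels corresponds to a distance of about 2.5 m, and larger disparities correspond to nearer objects.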

移動体抽出部１１ｂは、ステレオ処理部１１ａから出力されたデータに基づき、撮影した画像中の移動体を抽出するものである。移動する物体（移動体）を抽出するのは、移動する物体が人物であると推定して、人物の認識をするためである。
移動体の抽出をするために、移動体抽出部１１ｂは、過去の数フレーム（コマ）の画像を記憶しており、最も新しいフレーム（画像）と、過去のフレーム（画像）を比較して、パターンマッチングを行い、各画素の移動量を計算し、移動量画像を生成する。そして、視差画像と、移動量画像とから、カメラＣ，Ｃから所定の距離範囲内で、移動量の多い画素がある場合に、人物があると推定し、その所定距離範囲のみの視差画像として、移動体を抽出し、顔認識部１１ｃへ移動体の画像を出力する。 The moving body extraction unit 11b extracts a moving body in the photographed image based on the data output from the stereo processing unit 11a. The reason for extracting the moving object (moving body) is to recognize the person by estimating that the moving object is a person.
To extract a moving body, the moving body extraction unit 11b stores the images of the past several frames, compares the newest frame with the past frames by pattern matching, calculates the movement amount of each pixel, and generates a movement amount image. Then, if the parallax image and the movement amount image show pixels with a large movement amount within a predetermined distance range from the cameras C and C, it estimates that a person is present, extracts the moving body as a parallax image limited to that predetermined distance range, and outputs the image of the moving body to the face recognition unit 11c.
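The extraction step described above can be sketched on small nested lists. This is a simplified illustration, not the implementation of the moving body extraction unit 11b: the function name, the use of per-pixel distances instead of raw parallax values, and the threshold parameters are all assumptions.

```python
def extract_moving_body(distance_m, motion_px, near_m, far_m, motion_thresh):
    """Return the distance image limited to [near_m, far_m], or None.

    distance_m and motion_px are equally sized lists of rows. A person is
    assumed to be present if any pixel inside the distance range has a
    movement amount of at least motion_thresh (hypothetical threshold).
    """
    in_range = [[near_m <= d <= far_m for d in row] for row in distance_m]
    person_present = any(
        inside and m >= motion_thresh
        for range_row, motion_row in zip(in_range, motion_px)
        for inside, m in zip(range_row, motion_row)
    )
    if not person_present:
        return None
    # Keep only the pixels inside the distance range; mask the rest with None.
    return [
        [d if inside else None for d, inside in zip(dist_row, range_row)]
        for dist_row, range_row in zip(distance_m, in_range)
    ]
```

Pixels outside the distance range are masked with `None`, so a later stage such as face recognition only sees the region in which the person is assumed to be.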

顔認識部（顔認識手段）１１ｃは、抽出した移動体の一部分の大きさ、形状などから顔領域および顔の位置を認識する。なお、同様にして、抽出した移動体の一部分の大きさ、形状などから手の位置も認識される。
認識された顔の位置は、ロボットＲが移動するときの情報として、また、その人とのコミュニケーションを取るため、主制御部４０に出力されると共に、視線検出部１１ｄに出力される。 The face recognition unit (face recognition means) 11c recognizes the face area and the face position from the size, shape, etc. of a part of the extracted moving body. Similarly, the position of the hand is also recognized from the size and shape of a part of the extracted moving body.
The recognized face position is output to the main control unit 40 and also to the line-of-sight detection unit 11d as information when the robot R moves and to communicate with the person.

視線検出部（視線検出手段）１１ｄは、顔認識部１１ｃで抽出された顔領域から認識対象とする人物の視線方向を検出する。
視線検出部１１ｄは、目周辺の画像を解析して目が閉じているかどうかを判断し、目が閉じられていない場合に瞳孔を検出し、検出した瞳孔の位置と眼球の位置から視線方向を検出する。例えば、視線方向は、顔認識部１１ｃで認識された顔の位置および姿勢、並びに瞳孔の中心位置の関係で求められる。この場合、視線検出部１１ｄで検出される視線方向は、眼球の中心位置と瞳孔の中心位置とを結ぶベクトルとして求められる。求められた視線方向は、興味度を算出する際に用いるために主制御部４０に出力される。 The line-of-sight detection unit (line-of-sight detection means) 11d detects the line-of-sight direction of the person to be recognized from the face area extracted by the face recognition unit 11c.
The line-of-sight detection unit 11d analyzes the image around the eyes to determine whether the eyes are closed, detects the pupil if the eyes are not closed, and detects the line-of-sight direction from the detected pupil position and the eyeball position. For example, the line-of-sight direction is obtained from the relationship between the position and posture of the face recognized by the face recognition unit 11c and the center position of the pupil. In this case, the line-of-sight direction detected by the line-of-sight detection unit 11d is obtained as a vector connecting the center position of the eyeball and the center position of the pupil. The obtained line-of-sight direction is output to the main control unit 40 for use in calculating the degree of interest.
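The vector connecting the eyeball center and the pupil center can be computed as in the following sketch. The function name, the 3D coordinate convention, and the normalization to unit length are assumptions added for illustration.

```python
import math

def gaze_direction(eyeball_center, pupil_center):
    """Return the unit vector pointing from the eyeball center to the pupil center.

    Both arguments are (x, y, z) tuples in the same (hypothetical) coordinate
    frame. Returns None if the two points coincide.
    """
    v = [p - e for e, p in zip(eyeball_center, pupil_center)]
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0:
        return None
    return tuple(c / norm for c in v)
```

For example, an eyeball center at the origin and a pupil center at (0, 0, 2) give a line-of-sight direction along the z axis.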

［音声処理部］
音声処理部２０は、音声合成部２１ａと、音声認識部２１ｂと、音源定位部２１ｃと、騒音測定部２１ｄおよび発話情報検出部２１ｅを有する。
音声合成部２１ａは、主制御部４０が決定し、出力してきた発話行動の指令に基づき、文字情報（テキストデータ）から音声データを生成し、スピーカＳに音声を出力する部分である。音声データの生成には、予め記憶部３０に記憶している文字情報（テキストデータ）と音声データとの対応関係を利用する。なお、音声データは、管理用コンピュータ３から取得され、記憶部３０に保存される。
音声認識部（音声認識手段）２１ｂは、マイクＭＣ，ＭＣから音声データが入力され、入力された音声データから文字情報（テキストデータ）を生成し、主制御部４０に出力するものである。音声認識部２１ｂは、音声から人物の状況を判別可能に音声認識する。例えば、「はい」、「何ですか？」のような音声を認識することで、主制御部４０において、「人物からの反応がある」と判別することが可能となる。なお、音声データと文字情報（テキストデータ）との対応関係は、記憶部３０に予め記憶されている。
音源定位部２１ｃは、マイクＭＣ，ＭＣ間の音圧差および音の到達時間差に基づいて音源位置（ロボットＲが認識する平面状の位置）を特定し、主制御部４０に出力するものである。音源位置は、例えば、ロボットＲの立っている方向（ｚ軸方向）周りの回転角θ_zで表される。 [Audio processor]
The speech processing unit 20 includes a speech synthesis unit 21a, a speech recognition unit 21b, a sound source localization unit 21c, a noise measurement unit 21d, and an utterance information detection unit 21e.
The voice synthesizer 21a is a part that generates voice data from character information (text data) and outputs the voice to the speaker S based on the speech action command determined and output by the main controller 40. For the generation of the voice data, the correspondence between the character information (text data) stored in the storage unit 30 in advance and the voice data is used. The audio data is acquired from the management computer 3 and stored in the storage unit 30.
The voice recognition unit (voice recognition means) 21b receives voice data from the microphones MC and MC, generates character information (text data) from the input voice data, and outputs it to the main control unit 40. The voice recognition unit 21b recognizes speech in such a way that the situation of a person can be determined from the voice. For example, by recognizing utterances such as "Yes" or "What is it?", the main control unit 40 can determine that there is a reaction from the person. The correspondence between voice data and character information (text data) is stored in the storage unit 30 in advance.
The sound source localization unit 21c identifies the sound source position (a position on the plane recognized by the robot R) based on the sound pressure difference and the arrival time difference between the microphones MC and MC, and outputs it to the main control unit 40. The sound source position is represented by, for example, a rotation angle θ_z about the direction in which the robot R stands (z-axis direction).
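One common way to turn an arrival time difference between two microphones into a bearing is the far-field relation sin θ = c·Δt/d. The sketch below uses only the arrival time difference and ignores the sound pressure difference that the sound source localization unit also uses; the function name, the speed-of-sound constant, and the microphone spacing are assumptions, not values from the publication.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # approximate value in air at room temperature

def bearing_from_delay(delay_s, mic_spacing_m):
    """Estimate the bearing in degrees from the inter-microphone delay.

    Assumes a distant source (plane wave) and two microphones separated by
    mic_spacing_m. 0 degrees is straight ahead; the sign follows the delay.
    """
    s = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))   # clamp against measurement noise
    return math.degrees(math.asin(s))
```

A delay of zero yields a bearing of 0 degrees (source straight ahead), and the largest physically possible delay yields roughly 90 degrees (source to one side).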

騒音測定部（騒音測定手段）２１ｄは、ロボットＲの周囲の騒音を測定して騒音のレベルを検出する。検出された騒音のレベルは、発話音量を調整する際に用いるために主制御部４０に出力される。
発話情報検出部（発話情報検出）２１ｅは、マイクＭＣ，ＭＣから入力する音声から認識対象とする人物の音声に関する人物発話情報を検出する。人物発話情報は、人物の音声の音量、音質、話速のうちの少なくとも１つを示すものである。検出された人物発話情報は、興味度を算出する際に用いるために主制御部４０に出力される。 The noise measuring unit (noise measuring means) 21d measures the noise around the robot R and detects the noise level. The detected noise level is output to the main control unit 40 for use in adjusting the speech volume.
The utterance information detection unit (speech information detection) 21e detects person utterance information related to the voice of the person to be recognized from the voices input from the microphones MC and MC. The person utterance information indicates at least one of the volume, sound quality, and speech speed of the person's voice. The detected person utterance information is output to the main control unit 40 for use in calculating the degree of interest.

[記憶部]
記憶部（記憶手段）３０は、例えば、一般的なハードディスク等から構成され、管理用コンピュータ３から送信された必要な情報（ローカル地図データ、会話用データなど）を記憶するものである。本実施形態では、会話用データとして、通常用途で発話される通常口調用データと、特別な用途で発話される特別口調用データとが記憶される。ここで、通常口調は、例えば、人が日常的な場面で用いる口語の口調であり、特別口調は、例えば、人がビジネスや儀礼などの場面で用いる敬語の口調である。
また、記憶部３０は、後記するように、主制御部４０の各種動作を行うために必要な情報を記憶している。 [Memory]
The storage unit (storage means) 30 is composed of, for example, a general hard disk or the like, and stores necessary information (local map data, conversation data, etc.) transmitted from the management computer 3. In this embodiment, normal tone data uttered for normal use and special tone data uttered for special purposes are stored as conversation data. Here, the normal tone is, for example, a spoken tone used by a person in an everyday scene, and the special tone is, for example, an honorific tone used by a person in a business or ritual scene.
Further, the storage unit 30 stores information necessary for performing various operations of the main control unit 40, as will be described later.

[主制御部]
主制御部４０は、画像処理部１０、音声処理部２０、記憶部３０、自律移動制御部５０、無線通信部６０、対象検知部８０、および周辺状態検知部９０を統括制御するものである。また、ジャイロセンサＳＲ１、およびＧＰＳ受信器ＳＲ２が検出したデータは、主制御部４０に出力され、ロボットＲの行動を決定するために利用される。この主制御部４０は、例えば、管理用コンピュータ３と通信を行うための制御、管理用コンピュータ３から取得したタスク実行命令に基づいて所定のタスクを実行するための制御、ロボットＲを目的地に移動させるための制御、人物を識別するための制御、人物と対話するための制御を行うために、種々の判断を行ったり、各部の動作のための指令を生成したりする。 [Main control section]
The main control unit 40 performs overall control of the image processing unit 10, the audio processing unit 20, the storage unit 30, the autonomous movement control unit 50, the wireless communication unit 60, the target detection unit 80, and the peripheral state detection unit 90. The data detected by the gyro sensor SR1 and the GPS receiver SR2 is output to the main control unit 40 and used to determine the behavior of the robot R. The main control unit 40 makes various determinations and generates commands for the operation of each unit in order to perform, for example, control for communicating with the management computer 3, control for executing a predetermined task based on a task execution command acquired from the management computer 3, control for moving the robot R to a destination, control for identifying a person, and control for interacting with a person.

［自律移動制御部］
自律移動制御部５０は、主制御部４０の指示に従い頭部Ｒ１、腕部Ｒ２および脚部Ｒ３を駆動するものである。この自律移動制御部５０は、図示を省略するが、頭部Ｒ１を駆動する頭部制御部、腕部Ｒ２を駆動する腕部制御部、脚部Ｒ３を駆動する脚部制御部を有し、これら頭部制御部、腕部制御部および脚部制御部は、頭部Ｒ１、腕部Ｒ２および脚部Ｒ３を駆動するアクチュエータに駆動信号を出力する。この自律移動制御部５０および脚部Ｒ３は移動手段を構成する。 [Autonomous Movement Control Unit]
The autonomous movement control unit 50 drives the head R1, the arm R2, and the leg R3 in accordance with instructions from the main control unit 40. Although not shown, the autonomous movement control unit 50 includes a head control unit that drives the head R1, an arm control unit that drives the arm R2, and a leg control unit that drives the leg R3. These head control unit, arm control unit, and leg control unit output drive signals to actuators that drive the head R1, arm R2, and leg R3. This autonomous movement control part 50 and leg part R3 comprise a moving means.

［無線通信部］
無線通信部６０は、管理用コンピュータ３とデータの送受信を行う通信装置である。無線通信部６０は、公衆回線通信装置６１ａおよび無線通信装置６１ｂを有する。
公衆回線通信装置６１ａは、携帯電話回線やＰＨＳ(Personal Handyphone System)回線などの公衆回線を利用した無線通信手段である。一方、無線通信装置６１ｂは、IEEE802.11b規格に準拠するワイヤレスＬＡＮなどの、近距離無線通信による無線通信手段である。
無線通信部６０は、管理用コンピュータ３からの接続要求に従い、公衆回線通信装置６１ａまたは無線通信装置６１ｂを選択して管理用コンピュータ３とデータ通信を行う。 [Wireless communication part]
The wireless communication unit 60 is a communication device that transmits and receives data to and from the management computer 3. The wireless communication unit 60 includes a public line communication device 61a and a wireless communication device 61b.
The public line communication device 61a is a wireless communication means using a public line such as a mobile phone line or a PHS (Personal Handyphone System) line. On the other hand, the wireless communication device 61b is a wireless communication unit using short-range wireless communication such as a wireless LAN conforming to the IEEE802.11b standard.
The wireless communication unit 60 selects the public line communication device 61a or the wireless communication device 61b in accordance with a connection request from the management computer 3 and performs data communication with the management computer 3.

バッテリ７０は、ロボットＲの各部の動作や処理に必要な電力の供給源である。このバッテリ７０は、充填式の構成をもつものが使用され、バッテリ補給エリア（図１参照）で電力が補給される。 The battery 70 is a power supply source necessary for the operation and processing of each unit of the robot R. The battery 70 has a rechargeable configuration and is replenished with electric power in a battery replenishment area (see FIG. 1).

［対象検知部］
対象検知部（対象検知手段）８０は、ロボットＲの周囲にタグＴを備える人物が存在するか否かを検知するものである。対象検知部８０は、複数の発光部８１（図６では１つのみ表示した）を備える。これら発光部８１は、例えば、ＬＥＤから構成され、ロボットＲの頭部Ｒ１外周に沿って前後左右などに配設される（図示は省略する）。対象検知部８０は、発光部８１から、各発光部８１を識別する発光部ＩＤを示す信号を含む赤外光をそれぞれ発信すると共に、この赤外光を受信したタグＴから受信報告信号を受信する。いずれかの赤外光を受信したタグＴは、その赤外光に含まれる発光部ＩＤに基づいて、受信報告信号を生成するので、ロボットＲは、この受信報告信号に含まれる発光部ＩＤを参照することにより、当該ロボットＲから視てどの方向にタグＴが存在するかを特定することができる。また、対象検知部８０は、タグＴから取得した受信報告信号の電波強度に基づいて、タグＴまでの距離を特定する機能を有する。したがって、対象検知部８０は、受信報告信号に基づいて、タグＴの位置（距離および方向）を、人物の位置として特定することができる。さらに、対象検知部８０は、発光部８１から赤外光を発光するだけではなく、ロボットＩＤを示す信号を含む電波を図示しないアンテナから発信する。これにより、この電波を受信したタグＴは、赤外光を発信したロボットＲを正しく特定することができる。なお、対象検知部８０およびタグＴについての詳細は、例えば、特開２００６−１９２５６３号公報に開示されている。 [Target detection unit]
The target detection unit (target detection means) 80 detects whether a person carrying a tag T is present around the robot R. The target detection unit 80 includes a plurality of light emitting units 81 (only one is shown in FIG. 6). These light emitting units 81 are composed of, for example, LEDs, and are arranged at the front, rear, left, and right along the outer periphery of the head R1 of the robot R (not shown). The target detection unit 80 transmits, from each light emitting unit 81, infrared light containing a signal that indicates the light emitting unit ID identifying that light emitting unit 81, and receives a reception report signal from a tag T that has received the infrared light. A tag T that has received any of the infrared light generates a reception report signal based on the light emitting unit ID contained in that infrared light, so the robot R can determine in which direction the tag T lies, as viewed from the robot R, by referring to the light emitting unit ID contained in the reception report signal. The target detection unit 80 also has a function of determining the distance to the tag T based on the radio field strength of the reception report signal acquired from the tag T. The target detection unit 80 can therefore identify the position (distance and direction) of the tag T as the position of the person based on the reception report signal. Furthermore, the target detection unit 80 not only emits infrared light from the light emitting units 81 but also transmits, from an antenna (not shown), a radio wave containing a signal that indicates the robot ID. This allows a tag T that has received the radio wave to correctly identify the robot R that transmitted the infrared light. Details of the target detection unit 80 and the tag T are disclosed in, for example, Japanese Patent Application Laid-Open No. 2006-192563.
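The way a reception report signal yields a direction and a distance can be sketched as follows. The mapping from light emitting unit IDs to bearings, the normalized strength scale, the thresholds, and the dictionary layout of the report are all hypothetical; the publication says only that direction follows from the light emitting unit ID and distance from the radio field strength.

```python
# Hypothetical layout: four light emitting units around the head,
# given as light emitting unit ID -> bearing in degrees.
EMITTER_BEARING_DEG = {1: 0, 2: 90, 3: 180, 4: 270}

def distance_class(strength):
    """Map a normalized signal strength (0.0 to 1.0) to a coarse distance class.

    The thresholds are illustrative assumptions, not values from the source.
    """
    if strength >= 0.75:
        return "near"
    if strength >= 0.4:
        return "middle"
    return "far"

def locate_tag(report):
    """Return (bearing_deg, distance_class) for one reception report signal.

    report is a dict with the keys "emitter_id" and "strength" (assumed format).
    """
    bearing = EMITTER_BEARING_DEG[report["emitter_id"]]
    return bearing, distance_class(report["strength"])
```

For instance, a report naming light emitting unit 2 with a strong signal would place the tag T, and therefore the person, close to the robot at a bearing of 90 degrees under these assumptions.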

［周辺状態検知部］
周辺状態検知部９０は、ロボットＲの周辺状態を検知するものであり、ジャイロセンサＳＲ１やＧＰＳ受信器ＳＲ２によって検出された自己位置データを取得可能になっている。また、周辺状態検知部９０は、探索域に向かってスリット光を照射するレーザ照射部９１と、探索域に向かって赤外線を照射する赤外線照射部９２と、スリット光または赤外線が照射された探索域を撮像する床面カメラ９３とを有する。この周辺状態検知部９０は、床面カメラ９３で撮像したスリット光画像（スリット光が照射されたときの画像）を解析して路面状態を検出する。また、周辺状態検知部９０は、床面カメラ９３で撮像した赤外線画像（赤外線が照射されたときの画像）を解析してマークＭ（図２参照）を検出し、検出されたマークＭの位置（座標）からマークＭとロボットＲとの相対的な位置関係を計算する。なお、周辺状態検知部９０についての詳細は、例えば、特開２００６−１６７８４４号公報に開示されている。 [Ambient condition detector]
The peripheral state detection unit 90 detects the peripheral state of the robot R and can acquire the self-position data detected by the gyro sensor SR1 and the GPS receiver SR2. The peripheral state detection unit 90 has a laser irradiation unit 91 that irradiates slit light toward a search area, an infrared irradiation unit 92 that irradiates infrared rays toward the search area, and a floor camera 93 that images the search area irradiated with the slit light or the infrared rays. The peripheral state detection unit 90 detects the road surface state by analyzing a slit light image (an image taken while the slit light is irradiated) captured by the floor camera 93. The peripheral state detection unit 90 also analyzes an infrared image (an image taken while the infrared rays are irradiated) captured by the floor camera 93 to detect the mark M (see FIG. 2), and calculates the relative positional relationship between the mark M and the robot R from the position (coordinates) of the detected mark M. Details of the peripheral state detection unit 90 are disclosed in, for example, Japanese Patent Application Laid-Open No. 2006-167844.

［主制御部の構成］
図７は、図６に示したロボットの主制御部の構成を示すブロック図である。
主制御部４０は、静止障害物統合部４１と、オブジェクトデータ統合部４２と、行動パターン部４３と、身振り統合部４４と、内部状態検出部４５と、行動計画管理部４６と、モチベーション管理部４７とを備えている。 [Configuration of main controller]
FIG. 7 is a block diagram showing the configuration of the main control unit of the robot shown in FIG.
The main control unit 40 includes a stationary obstacle integration unit 41, an object data integration unit 42, a behavior pattern unit 43, a gesture integration unit 44, an internal state detection unit 45, a behavior plan management unit 46, and a motivation management unit 47.

静止障害物統合部４１は、周辺状態検知部９０で検知されたロボットＲの周辺状態に関する情報を統合し、行動パターン部４３に出力するものである。例えば、静止障害物統合部４１が、ロボットＲの進路の床面に段ボール箱などの障害物を検知した場合や、床面の段差を検知した場合には、行動パターン部４３は、この統合された障害物情報に基づいて、図示しない局所回避モジュールによって迂回経路を探索する。 The stationary obstacle integration unit 41 integrates the information on the peripheral state of the robot R detected by the peripheral state detection unit 90 and outputs it to the behavior pattern unit 43. For example, when the stationary obstacle integration unit 41 detects an obstacle such as a cardboard box on the floor in the path of the robot R, or detects a step in the floor, the behavior pattern unit 43 searches for a detour route using a local avoidance module (not shown) based on this integrated obstacle information.

オブジェクトデータ統合部４２は、ロボットＲの姿勢データ、画像処理部１０、対象検知部８０および音源定位部２１ｃからの入力データに基づいて、対象物（オブジェクト）に関する識別データ（オブジェクトデータ）を統合し、この統合したオブジェクトデータを記憶部３０のオブジェクトデータ記憶手段３１に出力するものである。これにより、オブジェクトデータ記憶手段３１には、オブジェクトデータをオブジェクト別かつ時刻別に記録したデータであるオブジェクトマップが生成される。 The object data integration unit 42 integrates identification data (object data) related to the object (object) based on the posture data of the robot R, the input data from the image processing unit 10, the target detection unit 80, and the sound source localization unit 21c. The integrated object data is output to the object data storage means 31 of the storage unit 30. As a result, an object map, which is data in which object data is recorded for each object and for each time, is generated in the object data storage unit 31.

［オブジェクトマップの構成］
ここで、図８を参照して、オブジェクトデータ記憶手段３１に記憶されるオブジェクトマップの構成を説明する。図８は、オブジェクトデータの一例を示す図である。
オブジェクトマップは、時刻別に分類された複数の時刻別データ８０１を備えている。この時刻別データ８０１には、それぞれ、時刻情報としてのカウント８０２と、姿勢データ、カメラ姿勢および騒音レベルと、表８０３が付されている。姿勢データは、例えば顔の位置（ｘ，ｙ，ｚ）と顔の向き（θx，θy，θz）で表され、カメラ姿勢は、例えばパン、チルト、ロールの各軸周りの回転角度（pan，tilt，role）で表される。また、騒音レベルは、騒音測定部２１ｄ（図６参照）によって検出されたものであり、デシベル（ｄＢ）で表される。また、この表８０３では、列に識別すべき対象（オブジェクト）が配され、行に、このオブジェクトを特徴付ける複数の項目が配されており、オブジェクト別に（列ごとに）レコードが蓄積されている。以下に、各項目の詳細を説明する。 [Composition of object map]
Here, the configuration of the object map stored in the object data storage means 31 will be described with reference to FIG. FIG. 8 is a diagram illustrating an example of object data.
The object map includes a plurality of time-specific data 801 classified by time. Each time-specific data 801 carries a count 802 as time information, posture data, a camera posture, a noise level, and a table 803. The posture data is represented by, for example, the face position (x, y, z) and the face orientation (θx, θy, θz), and the camera posture is represented by, for example, the rotation angles about the pan, tilt, and roll axes (pan, tilt, role). The noise level is detected by the noise measurement unit 21d (see FIG. 6) and is expressed in decibels (dB). In the table 803, the targets (objects) to be identified are arranged in columns and the items characterizing each object are arranged in rows, so that records are accumulated for each object (for each column). Each item is described in detail below.

オブジェクトナンバ８０４は、ロボットＲがオブジェクトを検出した順番に最大Ｎ個まで付されるものであり、この表８０３では、「０」〜「１０」の１１個（Ｎ＝１１）のオブジェクトを管理できるようになっている。
ボディ位置８０５は、画像処理部１０から出力される位置座標データであり、ロボットＲが認識している座標平面における人物（オブジェクト）の重心位置座標（ｘ，ｙ）で表される。
速度８０６は、画像処理部１０から出力される速度データであり、ロボットＲが認識している座標平面における人物（オブジェクト）の移動速度（Ｖｘ，Ｖｙ）で表される。 The object number 804 is assigned to up to N objects in the order in which the robot R detects them. In this table 803, eleven objects (N = 11), numbered "0" to "10", can be managed.
The body position 805 is position coordinate data output from the image processing unit 10 and is represented by the gravity center position coordinates (x, y) of the person (object) on the coordinate plane recognized by the robot R.
The speed 806 is speed data output from the image processing unit 10 and is represented by the moving speed (Vx, Vy) of the person (object) on the coordinate plane recognized by the robot R.

人物ＩＤ８０７は、人物を識別するための識別番号である。
人物確度８０８は、人物ＩＤ８０７の確度を示すものであり、完全一致を１００％として定められている。
人物ライフカウント８０９は、人物ＩＤ８０７に登録されたデータのオブジェクトデータ上での経過時間を表している。 The person ID 807 is an identification number for identifying a person.
The person accuracy 808 indicates the accuracy of the person ID 807, with a perfect match defined as 100%.
The person life count 809 represents the elapsed time on the object data of the data registered in the person ID 807.

ＲＦＩＤ識別番号８１０は、タグに記録された人物（オブジェクト）の識別番号であり、対象検知部８０から出力されたものである。
ＲＦＩＤ位置８１１は、対象検知部８０から出力される位置データであり、ロボットＲの周囲におけるタグ（オブジェクト）までの距離および方向で定まる領域で表される。
ＲＦＩＤ確度８１２は、ＲＦＩＤ識別番号８１０のデータ（識別番号）の確度を示すものである。
ＲＦＩＤライフカウント８１３は、ＲＦＩＤ識別番号８１０に登録されたデータ（識別番号）のオブジェクトマップ上での経過時間を表している。 The RFID identification number 810 is an identification number of a person (object) recorded on the tag and is output from the target detection unit 80.
The RFID position 811 is position data output from the target detection unit 80, and is represented by an area determined by the distance and direction to the tag (object) around the robot R.
The RFID accuracy 812 indicates the accuracy of the data (identification number) of the RFID identification number 810.
The RFID life count 813 represents an elapsed time on the object map of data (identification number) registered in the RFID identification number 810.

音源位置８１４は、音源定位部２１ｃから出力されるデータであり、ロボットＲが認識している座標平面における発声する人物（オブジェクト）の角度θｚで表される。
音源確度８１５は、音源位置８１４のデータの確度を示すものである。
音源ライフカウント８１６は、音源位置８１４に登録されたデータ（位置座標）のオブジェクトマップ上での経過時間を表している。 The sound source position 814 is data output from the sound source localization unit 21c, and is represented by an angle θz of a person (object) speaking on a coordinate plane recognized by the robot R.
The sound source accuracy 815 indicates the accuracy of the data of the sound source position 814.
The sound source life count 816 represents the elapsed time on the object map of the data (position coordinates) registered at the sound source position 814.

オブジェクトライフカウント８１７は、オブジェクトに対して、人物データ、ＲＦＩＤ識別データ、音源識別データのいずれかが初めて入力されたときに開始されたカウントを表すものである。
ＴＯＴＡＬ＿ＩＤ８１８は、人物ＩＤ８０７とＲＦＩＤ識別番号８１０に基づいてオブジェクトデータ統合部４２で決定されたオブジェクトの識別番号である。
ＴＯＴＡＬ＿確度８１９は、人物確度８０８とＲＦＩＤ確度８１２とに基づいてオブジェクトデータ統合部４２で決定されたオブジェクトの識別番号の確度を示すものである。 The object life count 817 represents a count started when any one of person data, RFID identification data, and sound source identification data is input to an object for the first time.
TOTAL_ID 818 is an object identification number determined by the object data integration unit 42 based on the person ID 807 and the RFID identification number 810.
The TOTAL_accuracy 819 indicates the accuracy of the object identification number determined by the object data integration unit 42 based on the person accuracy 808 and the RFID accuracy 812.
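The publication states that TOTAL_ID 818 and TOTAL_accuracy 819 are determined from the person ID and accuracy and the RFID identification number and accuracy, but it does not give the rule. The sketch below therefore uses a purely hypothetical rule (adopt whichever identifier has the higher accuracy) to show how such a record might be combined; the class and field names are also assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ObjectRecord:
    """A reduced, hypothetical view of one column of the table 803."""
    person_id: Optional[int] = None
    person_accuracy: float = 0.0   # percent, 100 = perfect match
    rfid_id: Optional[int] = None
    rfid_accuracy: float = 0.0     # percent

def total_id(rec):
    """Return (TOTAL_ID, TOTAL_accuracy) for a record.

    Assumed rule: pick the identifier with the higher accuracy; on a tie the
    person ID wins. The source does not specify the actual rule.
    """
    candidates = [
        (rec.person_accuracy, rec.person_id),
        (rec.rfid_accuracy, rec.rfid_id),
    ]
    candidates = [(acc, ident) for acc, ident in candidates if ident is not None]
    if not candidates:
        return None, 0.0
    acc, ident = max(candidates, key=lambda pair: pair[0])
    return ident, acc
```

Other combination rules, such as requiring both identifiers to agree before reporting a high accuracy, would fit the description equally well.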

The description of the configuration of the main control unit 40 continues with reference to FIG. 7.
As described later, the behavior pattern unit 43 includes approach behavior control means 48, which controls the approach behavior toward a person before the robot R speaks to that person as part of a predetermined behavior (behavior pattern).
The behavior pattern unit 43 also stores various programs (modules) for executing behavior patterns and, when executing a behavior pattern, refers to the storage unit 30 and reflects its contents in the behavior pattern.

In the present embodiment, as shown in FIG. 7, the storage unit 30 includes, in addition to the object data storage means 31, local map data storage means 32, motivation index storage means 33, scenario storage means 34, and person information storage means 35. The storage unit 30 also stores position information and the like of predetermined persons. Here, the position information of a person is information indicating the whereabouts of that person, and is created in advance in association with, for example, the day of the week and the time.

The local map data storage means 32 stores the map of the surroundings of the robot R (local map) described with reference to FIG. 3. This local map is acquired from, for example, the management computer 3.
The motivation index storage means 33 stores the motivation index. The motivation index is managed by the motivation management unit 47, so its details are described later.

The scenario storage means 34 stores scenarios (scripts) corresponding to various behavior patterns. Some scenarios relate to motion, for example stopping 1 m short of a person or obstacle (object) encountered while walking, or raising the arm R2 to a predetermined position 10 seconds after stopping; others relate to utterances.
The scenario storage means 34 also stores scenarios created in advance that specify gestures, that is, body motions that move at least one of the head R1, the arm R2, and the leg R3 when a predetermined utterance is made.

The person information storage means 35 stores a plurality of topics favored by predetermined persons. In the present embodiment, the person information storage means 35 stores topics by person and by genre. The topics stored in the person information storage means 35 are used for the utterance behavior of the robot R.

The behavior pattern unit 43 includes modules that execute behavior patterns suited to various scenes and situations, using the object data storage means 31, the local map data storage means 32, the scenario storage means 34, and the person information storage means 35 as appropriate. Examples of the modules include a destination movement module, a local avoidance module, a delivery module, a guidance module, and a person handling module.

The destination movement module performs route search (for example, searching for a route between nodes) and movement from the current position of the robot R to a destination such as a task execution position within a task execution area. This module obtains the shortest distance to the destination while referring to the map data and the current position.
The local avoidance module searches for a detour route that avoids an obstacle, based on the obstacle information integrated by the stationary obstacle integration unit 41, when an obstacle is detected during walking.
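
The route search performed by the destination movement module can be illustrated with a small sketch. The description only says that a route between nodes is searched and the shortest distance is obtained; the use of Dijkstra's algorithm, the graph layout, and the function name `shortest_distance` are assumptions made here for illustration, not the method stated in the specification.

```python
import heapq


def shortest_distance(graph, start, goal):
    """Return the shortest distance from start to goal, or None if unreachable.

    graph maps a node name to a dict {neighbor: edge_length}. This layout is
    a hypothetical stand-in for the node data in the robot's map.
    """
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, length in graph.get(node, {}).items():
            candidate = dist + length
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None


# Hypothetical node graph with distances in meters.
nodes = {"A": {"B": 1.0, "C": 4.0}, "B": {"C": 2.0}, "C": {}}
print(shortest_distance(nodes, "A", "C"))
```

In this sketch the current position and the destination are assumed to have already been snapped to the nearest map nodes.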

The delivery module executes the operation of receiving (gripping) an article from the person who requests its transport (the requester) and the operation of handing the received article to the recipient (releasing the article).
The guidance module executes, for example, a task of guiding a visitor who has arrived at a guidance start point in the task execution area to the reception staff at the reception 305 in the guidance area 301 (see FIG. 3).
The person handling module performs, for example, utterance, posture changes, and raising, lowering, and gripping with the arm R2 according to a predetermined scenario when an article transport task or a guidance task is executed.

The gesture integration unit (gesture integration means) 44 extracts from the scenario storage means 34 the gesture corresponding to the utterance to be made to the target person, and outputs a command specifying the extracted gesture to the autonomous movement control unit 50. Gestures made with the head R1 include, for example, tilting the head R1 downward to express a "bow", "thanks", "agreement", or "apology", and tilting the head R1 to the left or right to convey "I don't understand". Gestures made with the arm R2 include, for example, raising the arm R2 to express "joy" or "praise", and spreading the arms R2 downward to the left and right or shaking hands to convey "welcome". Gestures made with the leg R3 include, for example, running on the spot to convey "joy" or "high spirits".

The internal state detection unit 45 detects the internal state of the robot R and outputs the detection result to the motivation management unit 47. In the present embodiment, the internal state detection unit 45 detects the remaining charge of the battery 70, and the detected remaining charge is output to the motivation management unit 47. The internal state detection unit 45 also generates data on the state of the robot R (current position, remaining battery charge, task execution status, and so on) as status information at predetermined time intervals, and outputs the generated status information to the management computer 3 via the wireless communication unit 60. The management computer 3 then registers the received status information, for each robot R, in a robot information database (not shown) stored in the storage unit 3a.

The action plan management unit 46 manages an action plan for executing the various modules of the behavior pattern unit 43 according to a predetermined schedule. In the present embodiment, the action plan management unit 46 manages an action plan for executing predetermined tasks based on task execution commands acquired from the management computer 3, and selects the modules needed for the work currently to be executed as appropriate. The action plan management unit 46 also selects the modules needed for an action plan directed at an identification target, as appropriate, based on instructions from the motivation management unit 47.

The motivation management unit 47 manages the motivation index and, when battery recharging is not required and there is no task currently to be executed, instructs the action plan management unit 46 to add an action plan for acting proactively based on the motivation index. Here, the motivation index indicates how likely a future action toward a known, predetermined object is to be executed. In the present embodiment, the object is a human. The motivation index is determined from an elapsed time index, which relates to the time elapsed since the robot R's most recent action toward the person, and an action index, which relates to the number and duration of the robot R's past actions toward the target person.

Specifically, the elapsed time index is set to grow in proportion to the time elapsed since the robot R's most recent action toward the target person, and the action index is set to grow in proportion to the number and duration of the robot R's past actions toward the target person. In other words, if the robot R is anthropomorphized, a large elapsed time index means strong nostalgia for the person, and a large action index means strong intimacy with the person. With these settings, the motivation index can be used to select, from among a plurality of known persons, a person for whom nostalgia is strong or a person with whom intimacy is strong. Because the action index represents intimacy, it is hereinafter referred to as the emotion index.

The motivation index managed by the motivation management unit 47 is stored in the motivation index storage means 33. In the present embodiment, the motivation index storage means 33 stores a motivation index table. As shown in FIG. 9, the motivation index table 900 includes as information items the name of the person, "frequency", which indicates the number of times the person has been selected as a target, "recovery", which indicates the recovery of the time index and the emotion index, "index", which shows the breakdown of the motivation index, and "value/MAX", which gives the value of each index and its maximum. In the motivation index table 900, the motivation index is abbreviated as "moti index" (モチ指数). The motivation index, whose maximum value is "100", is defined as the sum of the elapsed time index (written as "time index"), whose maximum value is "50", and the action index (written as "emotion index"), whose maximum value is "50". The motivation index table 900 also displays the values of the motivation index and the other indexes as bar graphs.
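
The relationship between the three indexes can be sketched as follows. The maxima of 50, 50, and 100 and the proportional growth come from the description; the gain parameters, the field names, and the helper `pick_target` are hypothetical and only illustrate one possible reading.

```python
from dataclasses import dataclass

TIME_INDEX_MAX = 50.0     # maximum of the elapsed time index ("time index")
EMOTION_INDEX_MAX = 50.0  # maximum of the action index ("emotion index")


@dataclass
class PersonHistory:
    hours_since_last_action: float  # time since the robot last acted toward the person
    action_count: int               # number of past actions toward the person
    action_hours: float             # total duration of those actions


def time_index(h, gain=1.0):
    # Grows in proportion to elapsed time, capped at its maximum (gain is assumed).
    return min(TIME_INDEX_MAX, gain * h.hours_since_last_action)


def emotion_index(h, count_gain=1.0, hours_gain=1.0):
    # Grows with the number and duration of past actions, capped at its maximum.
    return min(EMOTION_INDEX_MAX,
               count_gain * h.action_count + hours_gain * h.action_hours)


def motivation_index(h):
    # Sum of the two indexes, so its maximum is 100.
    return time_index(h) + emotion_index(h)


def pick_target(people):
    # Hypothetical selection rule: the person with the largest motivation index.
    return max(people, key=lambda name: motivation_index(people[name]))
```

For example, `pick_target({"A": PersonHistory(10, 5, 2.5), "B": PersonHistory(1000, 100, 100)})` selects "B", whose two indexes are both saturated at 50.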

[Configuration of approach behavior control means]
As shown in FIG. 10, the approach behavior control means 48 includes environment information detection means 110, person situation determination means 120, and response behavior control means 130, and uses them to perform the control described later based on the various information and data stored in the storage unit 30.

In addition to the local map described above (see FIG. 3), the local map data storage means 32 of the storage unit 30 stores map data created in advance by associating utterance output information, which indicates at least one of whether the robot R may speak, the utterance volume, and the utterance tone, with position information indicating a position on the map. In the present embodiment, the map data is created so that utterance output information is associated with position information for each preset noise level.

The storage unit 30 also stores person situation data created in advance by associating image conditions that indicate a person's situation and voice conditions that indicate the result of voice recognition with utterance output information.
Here, a person's situation refers to a situation in which the person is concentrating on something that must be done now, or to any of various situations in which the person would feel confused, bothered, or uncomfortable if spoken to by someone else. A person's situation differs from person to person, and even for the same person it varies with conditions such as time, place, and scene. In the present embodiment, therefore, the configuration handles two cases as examples of universal situations in which anyone would prefer not to be spoken to: "the person is resting (sleeping)" and "the person is already in conversation".

The person situation data indicating that a person is resting is, for example, data that associates a resting image condition characterizing a resting person, a voice condition indicating that no voice of the person is detected, and information indicating that the robot R is not permitted to speak.
The resting image condition refers to cases in the captured image such as the following: the person's eyes have remained closed for a predetermined time; the person's face has remained turned downward, compared with when it faces the front, for a predetermined time; or the person's face has remained lower than its position when the person is seated for a predetermined time.

The person situation data indicating that a person is in conversation is, for example, data that associates an in-conversation image condition characterizing a plurality of persons in conversation, a voice condition indicating that no voice of the person is detected, and information indicating that the robot R is not permitted to speak.
The in-conversation image condition refers to cases in the captured image such as a person opening and closing the mouth, or the face orientations or gaze directions of two persons being opposite to each other.

In the person situation data, the conditions associated with the information indicating that speech is not permitted are not limited to image conditions and voice conditions. Other conditions that can be associated include, for example, place, time, ambient sound volume (noise level), the person's reaction, or a combination of these. In that case, the person situation data indicating that a person is resting may be, for example, data that associates conditions on the place, the time, the noise level, and the person's reaction when the robot R calls out with information indicating that the robot R is not permitted to speak. Specifically, when "place where the person is = rest area" AND "time = lunch break" AND "noise level = Low" AND "reaction = 0" hold, it can be determined that "the person is resting". Note that not permitting speech here does not mean forbidding the robot from uttering anything (say), but forbidding it from holding a conversation (talk). Accordingly, when the other conditions indicate that "the person is resting", the robot R finally calls out the person's name or the like.
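
The AND rule given above for judging that a person is resting can be written down directly. The four conditions and their values come from the description; the string encodings such as "rest_area" and the names `Observation`, `is_resting`, and `talk_permitted` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    place: str        # where the person is, e.g. "rest_area" (assumed encoding)
    time_slot: str    # e.g. "lunch_break" (assumed encoding)
    noise_level: str  # preset noise level, e.g. "Low"
    reaction: int     # 0 means no reaction when the robot called out


def is_resting(obs):
    # All four conditions must hold at once (logical AND).
    return (obs.place == "rest_area"
            and obs.time_slot == "lunch_break"
            and obs.noise_level == "Low"
            and obs.reaction == 0)


def talk_permitted(obs):
    # Conversation (talk) is not permitted while the person is judged to be resting.
    return not is_resting(obs)
```

A person who reacts to the call (`reaction=1`) in the same place and time slot is not judged to be resting, so `talk_permitted` returns `True` for that observation.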

Next, the configuration of the approach behavior control means 48 is described in detail.
<Environment information detection means>
As shown in FIG. 10, the environment information detection means 110 detects, from the map data stored in the local map data storage means 32, the utterance output information corresponding to the current position detected by, for example, the GPS receiver SR2, as information attributable to the environment of the robot R. The detected utterance output information is output to the response behavior control means 130.
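
One way to picture the lookup performed by the environment information detection means 110 is a table keyed by map area and noise level. The association of utterance output information with position and noise level is from the description; the rectangular areas, the area names, and the table contents below are entirely hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UtteranceOutput:
    allowed: bool                 # whether the robot may speak
    volume: Optional[str] = None  # e.g. "low", "normal", "high"
    tone: Optional[str] = None    # e.g. "normal", "special"


# Hypothetical map areas as rectangles (x0, y0, x1, y1) in meters.
AREAS = {
    "lobby": (0.0, 0.0, 10.0, 10.0),
    "meeting_room": (10.0, 0.0, 20.0, 10.0),
}

# Hypothetical map data: (area, noise level) -> utterance output information.
MAP_DATA = {
    ("lobby", "Low"): UtteranceOutput(True, "normal", "normal"),
    ("lobby", "High"): UtteranceOutput(True, "high", "normal"),
    ("meeting_room", "Low"): UtteranceOutput(False),
}


def area_of(x, y):
    # Return the name of the area containing the position, or None.
    for name, (x0, y0, x1, y1) in AREAS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None


def detect_environment_info(x, y, noise_level):
    # Look up the utterance output information for the current position.
    return MAP_DATA.get((area_of(x, y), noise_level))
```

Positions or noise levels with no entry return `None` in this sketch; how the robot should behave in that case is not stated in the description.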

<Person situation determination means>
The person situation determination means 120 determines the situation of the target person based on the image conditions used to create the person situation data stored in the storage unit 30, the result of processing the captured image in the image processing unit 10, and the result of voice recognition in the voice recognition unit 21b. The person situation determination means 120 includes individual situation estimation means 121, group situation estimation means 122, utterance timing determination means 123, and interest estimation means 124.

The individual situation estimation means 121 determines whether the target person is resting based on the resting image condition, the processing result of the captured image, and the result of voice recognition. Here, the processing result of the captured image includes the position of the face recognized by the face recognition unit 11c of the image processing unit 10 and information on whether the eyes are closed, as determined by the line-of-sight detection unit 11d. The result of voice recognition includes information on whether the person's voice has been input.
The group situation estimation means 122 determines whether any other person is recognized near the target person and, if so, determines whether a plurality of persons including the target person are in conversation based on the in-conversation image condition and the processing result of the captured image. Any of the following may be used to recognize a plurality of persons: identification by the object data integration unit 42, identification by image processing in the image processing unit 10, or identification by tag detection in the target detection unit 80 (see FIG. 6).
The utterance timing determination means 123 determines that it is time to speak when the target person has been determined to be neither resting nor in conversation and the gaze direction detected by the line-of-sight detection unit 11d is directed toward the robot R. Here, the gaze direction being directed toward the robot R does not mean only that it coincides exactly with the direction of the robot R; it is sufficient for the gaze direction to fall within a preset range.
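
The utterance timing rule can be sketched as a check on two flags and an angular tolerance. The three conditions are from the description; representing the gaze as a 2D vector, the 15 degree default tolerance, and the function names are assumptions.

```python
import math


def angle_between(u, v):
    # Angle in radians between two 2D vectors; pi if either has zero length.
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0:
        return math.pi
    cos = (u[0] * v[0] + u[1] * v[1]) / norm
    return math.acos(max(-1.0, min(1.0, cos)))


def is_utterance_timing(resting, in_conversation, gaze,
                        person_pos, robot_pos, tolerance_deg=15.0):
    """Return True when the robot may start speaking.

    gaze is the person's gaze direction as a 2D vector; tolerance_deg stands
    in for the preset range around the direction of the robot (assumed value).
    """
    if resting or in_conversation:
        return False
    to_robot = (robot_pos[0] - person_pos[0], robot_pos[1] - person_pos[1])
    return angle_between(gaze, to_robot) <= math.radians(tolerance_deg)
```

With the person at the origin and the robot at (2, 0), a gaze of (1, 0.1) is about 6 degrees off the robot direction and still counts as looking at the robot, while a gaze of (0, 1) does not.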

The interest estimation means 124 calculates, after an utterance on a predetermined topic has been made to the target person, an interest level that quantifies the interest the person shows in that topic, and determines whether the calculated interest level has risen. If the interest level has risen, it determines that the person is interested in the topic, and it records the determination result in the storage unit 30.
In the present embodiment, the interest estimation means 124 has three modes for calculating the interest level: an image determination mode based on a captured image of the person, a voice determination mode based on the person's input voice, and an integrated determination mode that combines the two. The modes can be switched as appropriate.

Specifically, in the image determination mode, the interest estimation means 124 calculates an interest level quantified from the vector indicating the gaze direction detected by the line-of-sight detection unit 11d after an utterance on a predetermined topic has been made to the target person. In this case, the interest estimation means 124 calculates the interest level so that it is at its maximum when the detected gaze direction coincides with the direction of the robot R.
In the voice determination mode, the interest estimation means 124 calculates an interest level by quantifying either the recognition results, such as keywords and phrases, recognized by the voice recognition unit 21b, or the person's utterance information (volume, sound quality, and speaking rate) detected by the utterance information detection unit 21e, after an utterance on a predetermined topic has been made to the target person. In this case, the interest estimation means 124 calculates the interest level so that it is larger the more keywords are detected, the louder the volume, the higher the sound quality, and the faster the speaking rate. Predetermined weights may be assigned to individual keywords such as "interesting" and phrases such as "tell me more".
In the integrated determination mode, the interest estimation means 124 calculates the final interest level as the sum of the interest level calculated in the image determination mode and the interest level calculated in the voice determination mode, each multiplied by a predetermined weight.
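
The three determination modes can be sketched as below. The description fixes only the qualitative behavior (maximum when the gaze coincides with the robot direction, larger for more keywords, louder volume, higher sound quality, faster speech, and a weighted sum for integration). The cosine-based score, the keyword weights, the normalization of voice features to the range 0 to 1, and the mode weights are assumptions.

```python
import math

# Hypothetical keyword and phrase weights.
KEYWORD_WEIGHTS = {"interesting": 1.0, "tell me more": 2.0}


def image_interest(gaze, to_robot):
    # Cosine similarity mapped to [0, 1]; equals 1.0 when the gaze
    # direction coincides with the direction of the robot.
    norm = math.hypot(*gaze) * math.hypot(*to_robot)
    if norm == 0:
        return 0.0
    cos = (gaze[0] * to_robot[0] + gaze[1] * to_robot[1]) / norm
    return (cos + 1.0) / 2.0


def voice_interest(recognized, volume, quality, rate):
    # volume, quality, and rate are assumed to be normalized to [0, 1].
    keyword_score = sum(KEYWORD_WEIGHTS.get(w, 0.0) for w in recognized)
    return keyword_score + volume + quality + rate


def integrated_interest(image_level, voice_level, w_image=0.5, w_voice=0.5):
    # Weighted sum of the two per-mode interest levels.
    return w_image * image_level + w_voice * voice_level


def interest_rose(previous, current):
    # The person is judged to be interested when the level has risen.
    return current > previous
```

Because the two modes produce values on different scales in this sketch, the mode weights would in practice also have to account for that difference.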

<Response behavior control means>
The response behavior control means 130 extracts, from the person situation data stored in the storage unit 30, the utterance output information corresponding to the person's situation and reaction as determined by the person situation determination means 120, and determines at least one of whether to speak to the target person, the utterance volume, and the utterance tone based on the extracted utterance output information and the utterance output information detected by the environment information detection means 110. The response behavior control means 130 includes utterance level adjustment means 131, gesture adjustment means 132, topic control means 133, and overall behavior control means 134.

The utterance level adjustment means 131 adjusts the utterance volume level or switches the utterance tone based on the integrated value calculated by the overall behavior control means 134, when the overall behavior control means 134 has permitted speech to the person. Here, adjusting the utterance volume level means outputting an instruction to, for example, lower by 50% or raise by 50% the initial volume level specified in the utterance behavior command output to the speech synthesis unit 21a (see FIG. 6). In the present embodiment, when speech is permitted, the utterance level adjustment means 131 also takes into account the distance from the position where the robot R actually speaks to the position of the target person. That is, it raises the utterance volume level when the person is farther away than a preset utterance distance range and lowers it when the person is closer than that range. The distance to the target person can be, for example, detected by the stereo processing unit 11a (see FIG. 6), detected by the target detection unit 80 (see FIG. 6), or calculated from the object map (see FIG. 8).
Switching the utterance tone means switching the initial tone specified in the utterance behavior command output to the speech synthesis unit 21a (see FIG. 6) to another tone. In the present embodiment, the normal tone data and special tone data stored in the storage unit 30 are switched as appropriate.
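
The distance-based volume adjustment can be sketched as follows. Raising the level when the person is beyond the utterance distance range and lowering it when the person is inside that range is from the description, as is the 50% example; the range limits of 1 m and 3 m and the function name are assumptions.

```python
def adjust_volume(base_level, distance_m, near_m=1.0, far_m=3.0, step=0.5):
    """Return the adjusted utterance volume level.

    base_level is the initial level specified in the utterance command.
    near_m and far_m bound the preset utterance distance range (assumed
    values); step=0.5 corresponds to the 50% example in the text.
    """
    if distance_m > far_m:
        return base_level * (1.0 + step)  # person is far away: speak louder
    if distance_m < near_m:
        return base_level * (1.0 - step)  # person is close: speak more quietly
    return base_level                     # within the range: keep the level
```

With an initial level of 60, a person 5 m away gives 90, a person 0.5 m away gives 30, and a person 2 m away leaves the level at 60.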

When the utterance level adjustment means 131 has adjusted the utterance volume level, the gesture adjustment means 132 adjusts the movement range of at least one of the head R1, the arm R2, and the leg R3 in the gesture in proportion to the adjusted utterance volume level. The gesture adjustment means 132 adjusts the movement range of each part written in the command that the gesture integration unit 44 outputs to the autonomous movement control unit 50 to specify the gesture. For example, when the utterance volume level is lowered by 50%, the initial movement range written in the command is likewise shortened by 50%; conversely, when the utterance volume level is raised by 50%, the initial movement range written in the command is likewise lengthened by 50%. Movement here includes both linear and rotational movement.
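
The proportional scaling of gesture movement ranges can be sketched in a few lines. The proportionality to the adjusted volume level is from the description; the dictionary of per-part movement ranges and its key names are hypothetical stand-ins for the contents of the gesture command.

```python
def scale_gesture(ranges, base_volume, adjusted_volume):
    """Scale each movement range by the same ratio as the volume change.

    ranges maps a part name to its initial movement range (degrees or
    meters); the key names used by callers are assumptions.
    """
    if base_volume <= 0:
        raise ValueError("base_volume must be positive")
    ratio = adjusted_volume / base_volume
    return {part: value * ratio for part, value in ranges.items()}


# A 50% lower volume halves every movement range.
print(scale_gesture({"head_pitch_deg": 30.0, "arm_lift_deg": 90.0}, 60, 30))
```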

The topic control means 133 provides a topic stored in the person information storage means 35 and interrupts the utterance on the topic being provided when the interest estimation means 124 determines that the target person is not interested in it (that is, when the interest level has fallen). In the present embodiment, the topic control means 133 also switches among the topics stored in the person information storage means 35 based on the interest level calculated by the interest estimation means 124, and speaks to the target person about the new topic.
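
A minimal sketch of the topic switching follows. The description says only that the current topic is interrupted when the interest level falls and that topics are switched based on the interest level; the rule of moving to the first topic not yet tried, and the name `choose_topic`, are assumptions.

```python
def choose_topic(topics, current, tried, previous_interest, current_interest):
    """Return the topic to talk about next, or None when no topic is left.

    topics is the person's stored topic list in preference order; tried is
    the set of topics already abandoned. Both structures are hypothetical.
    """
    if current_interest >= previous_interest:
        return current  # interest has not fallen: keep the current topic
    abandoned = set(tried) | {current}
    for topic in topics:
        if topic not in abandoned:
            return topic  # interrupt and switch to an untried topic
    return None
```

For example, with topics `["soccer", "music", "travel"]`, a drop in interest while talking about "soccer" switches to "music", and a further drop with "soccer" already in `tried` switches to "travel".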

The behavior overall control means 134 determines whether an utterance is permitted by integrating the utterance output information extracted from the person situation data with the utterance output information detected by the environment information detection means 110. When it determines that the utterance is permitted, it decides on an approach behavior that reflects the integrated utterance volume or utterance tone. In this embodiment, when deciding on the approach behavior, the behavior overall control means 134 gives priority to the utterance output information detected by the environment information detection means 110 over the utterance output information extracted from the person situation data. That is, the behavior overall control means 134 first gives weight to the environment around the robot R: it extracts the utterance availability, utterance volume, and utterance tone from the utterance output information detected by the environment information detection means 110 and determines whether an utterance is possible. Next, it adds the situation of the person recognized on the spot to the judgment and again determines whether an utterance is possible. It then takes the utterance volume and utterance tone detected by the environment information detection means 110 as the final integrated values. The behavior overall control means 134 also determines whether any of the following four end conditions is satisfied and, when one is satisfied, ends the approach behavior toward the target person.
First end condition: the environment does not become one in which an utterance is possible even after a preset time has elapsed.
Second end condition: the target person is still in conversation even after a preset time has elapsed.
Third end condition: the target person is still resting even after a preset time has elapsed.
Fourth end condition: a preset end timing has been reached. The fourth end condition includes cases such as when the battery needs to be replenished or when the time to execute a task has arrived.
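The environment-first decision and the four end conditions can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the class `UtteranceOutput`, the functions `decide_approach` and `should_end`, the string labels for volume and tone, and the use of a single shared waiting clock for the first three conditions are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UtteranceOutput:
    """Hypothetical container for utterance output information."""
    allowed: bool  # utterance availability
    volume: str    # illustrative label, e.g. "low" or "high"
    tone: str      # illustrative label, e.g. "polite" or "casual"


def decide_approach(env: UtteranceOutput,
                    person: UtteranceOutput) -> Optional[UtteranceOutput]:
    """Environment-priority integration; returns None when speech is not permitted."""
    # First, judge from the environment around the robot.
    if not env.allowed:
        return None
    # Next, add the situation of the recognized person to the judgment.
    if not person.allowed:
        return None
    # The environment's volume and tone become the final integrated values.
    return UtteranceOutput(True, env.volume, env.tone)


def should_end(waited_s: float, timeout_s: float, env_allows: bool,
               in_conversation: bool, resting: bool,
               end_timing_reached: bool) -> bool:
    """True when any of the four end conditions holds (one shared clock assumed)."""
    timed_out = waited_s >= timeout_s
    return (
        (timed_out and not env_allows)      # first end condition
        or (timed_out and in_conversation)  # second end condition
        or (timed_out and resting)          # third end condition
        or end_timing_reached               # fourth end condition
    )
```

For example, `decide_approach(UtteranceOutput(True, "low", "polite"), UtteranceOutput(True, "high", "casual"))` yields the environment's "low" volume and "polite" tone, mirroring the priority described above.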

[Robot operation]
The operation of the robot R shown in FIG. 6 will be described with reference to FIG. 11 (and to FIGS. 1, 6, 7, and 10 as appropriate), focusing mainly on the operation of the approach behavior control means 48. FIG. 11 is a flowchart showing the operation of the robot R shown in FIG. 6. Through the main control unit 40, the robot R acquires information such as the local map in advance from the management computer 3 via the wireless communication unit 60. In this embodiment, it is assumed that, when the robot R has no task to execute at present, it moves to the location of a person selected according to an action plan added by the motivation management unit 47 of the main control unit 40, and that the approach behavior control means 48 of the main control unit 40 starts operating before the robot R makes an utterance to interact with that person. It is also assumed that the interest estimation means 124 of the approach behavior control means 48 is set to the image determination mode, in which the degree of interest is calculated from a captured image of the person.

The approach behavior control means 48 then uses the environment information detection means 110 to detect, from the map data stored in the local map data storage means 32, the utterance output information corresponding to the detected current position and noise level, as information originating from the environment (step S1). Next, using the behavior overall control means 134 of the response behavior control means 130, the approach behavior control means 48 determines whether the current position is an environment in which an utterance is possible, based on the utterance output information corresponding to the current position and noise level (step S2).

If the current position is an environment in which an utterance is possible (step S2: Yes), the approach behavior control means 48 uses the group situation estimation means 122 of the person situation determination means 120 to determine whether no other person is recognized near the target person; that is, the group situation estimation means 122 determines whether the target person is alone (step S3). If the target person is not alone (step S3: No), the group situation estimation means 122 then determines whether a plurality of persons including the target person are in conversation (step S4).

If the plurality of persons including the target person are not in conversation (step S4: No), the person situation determination means 120 uses the individual situation estimation means 121 to determine whether the target person is resting (step S5). If the target person is alone in step S3 (step S3: Yes), the person situation determination means 120 skips step S4 and proceeds to step S5.

If the target person is not resting (step S5: No), the person situation determination means 120 uses the utterance timing determination means 123 to decide that it is time to speak (step S6). The utterance timing determination means 123 decides the utterance timing after confirming that the person's line of sight is directed toward the robot R. Then, when the behavior overall control means 134 of the response behavior control means 130 determines that the utterance is permitted, the approach behavior control means 48 uses the utterance level adjustment means 131 to adjust the utterance volume level or switch the utterance tone, taking the utterance output information detected in step S1 as the integrated value. If the utterance is accompanied by a gesture and the utterance volume level has been adjusted, the gesture adjustment means 132 adjusts the movement width of each part involved in the gesture in proportion to the utterance volume level (step S7).
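The proportional gesture adjustment of step S7 can be illustrated with a minimal sketch. The function name `scale_gesture`, the part names, the units of the widths, and the idea of a reference volume level at which the scenario's gesture widths apply unchanged are assumptions made for illustration; the source states only that the movement width is adjusted in proportion to the utterance volume level.

```python
from typing import Dict


def scale_gesture(base_widths: Dict[str, float],
                  volume_level: float,
                  reference_level: float) -> Dict[str, float]:
    """Scale each part's movement width in proportion to the volume level.

    base_widths: movement width per part at the reference volume level
                 (hypothetical units, e.g. degrees of joint travel).
    """
    if reference_level <= 0:
        raise ValueError("reference_level must be positive")
    ratio = volume_level / reference_level
    return {part: width * ratio for part, width in base_widths.items()}
```

Halving the volume level relative to the reference halves every movement width, so a quieter utterance is accompanied by a correspondingly smaller gesture.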

The robot R then actually makes the utterance. Specifically, the topic control means 133 of the response behavior control means 130 instructs the speech synthesis unit 21a to output speech based on the scenario stored in the scenario storage means 34, and instructs the gesture integration unit 44 to execute the gesture that accompanies the utterance. The robot R thereby speaks and performs the accompanying gesture (step S8). For example, when speaking to the target person in the conversation room 307 shown in FIG. 3, the robot R speaks in a friendly, colloquial tone at a relatively high volume and moves the arms R2 and other parts with relatively large motions. When speaking to the target person in the conference room 308 shown in FIG. 3, the robot R speaks in a polite (honorific) tone at a relatively low volume and moves the arms R2 and other parts with relatively small motions. The robot R then recognizes the person's input voice with the speech recognition unit 21b (step S9). The approach behavior control means 48 then uses the interest estimation means 124 of the person situation determination means 120 to calculate the degree of interest from the detected line-of-sight direction and estimates whether the target person is interested in the topic (step S10).

The approach behavior control means 48 then uses the behavior overall control means 134 of the response behavior control means 130 to determine whether an end condition is satisfied (step S11). If an end condition is satisfied (step S11: Yes), the approach behavior control means 48 ends the process. If no end condition is satisfied (step S11: No), the approach behavior control means 48 uses the topic control means 133 of the response behavior control means 130 to develop the topic to be provided next, based on the degree of interest estimated by the interest estimation means 124 (step S12), and returns to step S1.

If, in step S2, the current position is not an environment in which an utterance is possible (step S2: No), the approach behavior control means 48 proceeds to step S11 and determines whether an end condition is satisfied. Likewise, if a plurality of persons including the target person are in conversation in step S4 (step S4: Yes), or if the target person is resting in step S5 (step S5: Yes), the approach behavior control means 48 proceeds to step S11.
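The branching of steps S1 to S12 described above can be traced with a small sketch that records which steps one pass of the flowchart visits. The names `Observation`, `steps_before_s11`, and `run_iteration` are hypothetical, and the sketch simplifies steps S6 and S7 by assuming that the gaze check and the permission judgment both succeed once step S5 answers No.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Hypothetical snapshot of the judgments made during one pass."""
    env_allows_speech: bool      # outcome of step S2
    person_alone: bool           # outcome of step S3
    group_in_conversation: bool  # outcome of step S4
    person_resting: bool         # outcome of step S5


def steps_before_s11(obs: Observation) -> List[str]:
    """Return the steps visited before the end-condition check (S11)."""
    path = ["S1", "S2"]
    if not obs.env_allows_speech:
        return path                      # S2: No -> S11
    path.append("S3")
    if not obs.person_alone:
        path.append("S4")
        if obs.group_in_conversation:
            return path                  # S4: Yes -> S11
    path.append("S5")
    if obs.person_resting:
        return path                      # S5: Yes -> S11
    # Speak: timing, level adjustment, utterance, recognition, interest.
    path += ["S6", "S7", "S8", "S9", "S10"]
    return path


def run_iteration(obs: Observation, end_condition_met: bool) -> List[str]:
    """One pass of the flow; S12 is visited only when no end condition holds."""
    path = steps_before_s11(obs) + ["S11"]
    if not end_condition_met:
        path.append("S12")               # develop the next topic, then back to S1
    return path
```

Calling `run_iteration(Observation(True, False, True, False), False)` gives the path S1, S2, S3, S4, S11, S12: the robot finds the target in conversation, skips speaking, and loops back because no end condition has been met yet.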

[Specific example of topic development]
A specific example of topic development will now be described with reference to FIG. 12 (and to FIG. 10 as appropriate). Assume that, for example, nine kinds of topics are stored in the person information storage means 35. As shown in FIG. 12, these topics can be described using, for example, nine nodes 1201 to 1209 and the links connecting them. Node 1201 represents the topic "sports" and is connected to each of nodes 1202 to 1204, which represent the topics "track and field", "swimming", and "ball games", respectively. Using the topic control means 133, the robot R starts providing a topic from, for example, node 1201. When topic provision at node 1201 ends, the robot R selects the next topic from among nodes 1202 to 1204 according to, for example, the average degree of interest during the topic provision. The other nodes 1205 to 1209 in FIG. 12 are of the same kind as nodes 1201 to 1204, so a detailed description of them is omitted. With this arrangement, the robot R utters information that the person is presumed to like without relying heavily on questions, so the dialogue becomes natural and the conversation partner can easily feel an affinity for the robot R.

As another example, each of the nodes 1201 to 1209 may contain a question asking whether the person likes the theme of that node, together with information about that theme. In this case, when the robot R starts providing a topic from node 1201 using the topic control means 133, it selects and utters, as the initial topic, the question "Do you like sports?" at node 1201. When the robot R recognizes the reply "Yes", it selects and utters a question from one of nodes 1202 to 1204 connected to node 1201, for example "Do you like ball games?" at node 1204. In the same way, it selects and utters the questions "Do you like baseball?" and "Do you like the Giants?". When the reply "Yes" is recognized for all of the questions, the topic control means 133 finally settles on "Giants" as the theme and then utters information about the "Giants". If the reply "No" is recognized along the way, the topic control means 133 asks a similar question about another theme at the same level as the theme just asked about. For example, when the robot R recognizes the reply "No" to the question "Do you like ball games?", it selects the corresponding question from node 1202 or node 1203.
Because the person's interests are narrowed down by repeated questions, the information the person likes can be estimated in a short time and provided efficiently.
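The question-driven narrowing can be sketched as a walk down a topic tree. The tree below uses only theme names that appear in the text; how they map onto nodes 1205 to 1209, the order in which sibling themes are asked, and the choice to settle on the parent theme when every child is rejected are assumptions for illustration. The callback `likes` stands in for uttering "Do you like ...?" and recognizing "Yes" or "No".

```python
from typing import Callable, Dict, List, Optional

# Hypothetical topic tree: each theme maps to its child themes.
TOPIC_TREE: Dict[str, List[str]] = {
    "sports": ["ball games", "track and field", "swimming"],
    "ball games": ["baseball"],
    "baseball": ["Giants"],
}


def narrow_theme(root: str, likes: Callable[[str], bool]) -> Optional[str]:
    """Descend the tree by yes/no answers and return the final theme.

    Returns None if the person rejects the root theme itself.
    """
    if not likes(root):
        return None
    theme = root
    while True:
        children = TOPIC_TREE.get(theme, [])
        # Ask about each child in turn; on "No", move to a sibling at the same level.
        chosen = next((child for child in children if likes(child)), None)
        if chosen is None:
            return theme  # leaf reached, or every child was rejected
        theme = chosen
```

If the person answers "Yes" to every question, the walk asks about sports, ball games, baseball, and the Giants in that order and settles on "Giants", matching the example in the text.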

As yet another example, the results of a prior survey of predetermined persons' interests, obtained for example by questionnaire, can be associated with the nodes 1201 to 1209 for each person. In this case, the robot R can develop topics differently for each recognized person, based on the survey results. As a further example, the topic control means 133 may select one of the nodes 1201 to 1209 at random.

According to this embodiment, in order to control its approach behavior toward the target person before speaking, the robot R can change whether it speaks, its utterance volume, and its utterance tone according to its current location and the measured noise, the situation indicated by the currently captured image of the target person, and the speech recognition result. Because the robot R adjusts the movement width of each part involved in a gesture in proportion to the adjusted utterance volume level, its gestures during speech appear natural. Furthermore, based on the degree of interest calculated after making an utterance on a topic, the robot R can determine whether the person it is talking with is interested in that topic, estimate the topics the person likes, and speak about them.

A preferred embodiment of the present invention has been described above, but the present invention is not limited to that embodiment. For example, the behavior overall control means 134 was described as giving priority, when deciding on the approach behavior, to the utterance output information detected by the environment information detection means 110 over the utterance output information extracted from the person situation data; it may instead be configured to give priority to the utterance output information extracted from the person situation data. In that case, the behavior overall control means 134 first gives weight to the situation of the recognized person and then adds the environment around the robot R to the judgment.

The behavior overall control means 134 may also be configured to integrate the utterance output information detected by the environment information detection means 110 and the utterance output information extracted from the person situation data at the same time. In that case, for example, the behavior overall control means 134 compares the utterance output information extracted from the person situation data with the utterance output information detected by the environment information detection means 110. If the two differ, it converts each piece of utterance output information into a numerical value, weights the values, and calculates an integrated value; it permits an utterance to the person when the calculated integrated value is smaller than a preset value that indicates permission to speak. If the two are the same, it judges whether the robot R may speak based on that utterance output information and, when it determines that the robot R may speak, instructs the speech synthesis unit 21a to use the utterance volume and utterance tone given by that utterance output information.
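The simultaneous, weighted integration can be sketched as below. The source does not say how utterance output information is converted into a number, so the `restriction` score (0.0 for "speak freely" up to 1.0 for "do not speak"), the class `ScoredOutput`, the default weights, and the default threshold are all assumptions; only the rule "permit when the integrated value is smaller than the preset value" comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredOutput:
    """Hypothetical utterance output information with a numeric encoding."""
    allowed: bool       # utterance availability stated by this information
    restriction: float  # assumed scale: 0.0 (unrestricted) to 1.0 (forbidden)


def permit_utterance(person: ScoredOutput, env: ScoredOutput,
                     w_person: float = 0.5, w_env: float = 0.5,
                     permit_below: float = 0.5) -> bool:
    """Decide whether an utterance to the person is permitted."""
    if person == env:
        # Identical information: follow it directly.
        return person.allowed
    # Different information: weight the numeric values and integrate them.
    integrated = w_person * person.restriction + w_env * env.restriction
    # Permit when the integrated value is smaller than the preset value.
    return integrated < permit_below
```

Shifting the weights toward the person's situation or toward the environment changes which source dominates, which is one way to move between the environment-priority and person-priority variants described earlier.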

In this embodiment, the map data stored in the local map data storage means 32 was described as being created by associating the utterance output information with position information for each preset noise level, but the map data is not limited to this. For example, the map data may be created by associating the utterance output information with position information for each item of information such as a time period, or for each predetermined event.

The map data also need not be created for each noise level; it may simply associate the utterance output information with position information. In that case, a separate table associating noise levels with utterance output information is prepared, and the environment information detection means 110 can refer to the noise level detected by the noise measurement unit 21d of the voice processing unit 20, the map data, and the separately prepared table, determine the utterance output information corresponding to the current position according to a predetermined rule, and output it to the response behavior control means 130.

In this embodiment, the degree of interest in the image determination mode was described as being calculated from the line-of-sight direction, but it may instead be calculated using the orientation of the face or the distance from the robot R to the person's face. Here, the distance from the robot R to the person's face indicates how far a person drawn into the topic is leaning forward. The degree of interest may also be calculated by quantifying the facial expression or whether the person is nodding.

In this embodiment, the robot was described as an autonomous mobile robot capable of bipedal walking, but the invention is not limited to this and can also be applied to various moving bodies such as autonomous mobile robots that move on wheels, industrial robots, and automobiles.

FIG. 1 is a diagram schematically showing the configuration of a robot system including a robot according to an embodiment of the present invention.
FIG. 2 is a diagram schematically showing an example of self-position detection and object detection by the robot.
FIG. 3 is a diagram showing an example of a local map used in the robot system shown in FIG. 1.
FIG. 4 is a diagram showing an example of a task information database stored in the storage means of the management computer shown in FIG. 1.
FIG. 5 is a diagram showing an example of a task schedule table stored in the storage means of the management computer shown in FIG. 1.
FIG. 6 is a block diagram showing the configuration of the robot according to the embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of the main control unit of the robot shown in FIG. 6.
FIG. 8 is a diagram showing an example of object data.
FIG. 9 is a diagram showing an example of motivation index data.
FIG. 10 is a block diagram showing the configuration of the approach behavior control means shown in FIG. 7.
FIG. 11 is a flowchart showing the operation of the robot shown in FIG. 6.
FIG. 12 is a diagram schematically showing an example of topic development.

Explanation of symbols

A Robot system
R Robot
R1 Head
R2 Arm
R3 Leg
R4 Torso
R5 Back storage unit
1 Base station
2 Robot dedicated network
3 Management computer
3a Storage unit
4 Network
5 Terminal
10 Image processing unit (image processing means)
11a Stereo processing unit
11b Moving body extraction unit
11c Face recognition unit (face recognition means)
11d Line-of-sight detection unit (line-of-sight detection means)
20 Voice processing unit (voice processing means)
21a Speech synthesis unit
21b Speech recognition unit (speech recognition means)
21c Sound source localization unit
21d Noise measurement unit (noise measurement means)
21e Utterance information detection unit (utterance information detection means)
30 Storage unit (storage means)
31 Object data storage means
32 Local map data storage means
33 Motivation index storage means
34 Scenario storage means
35 Person information storage means
40 Main control unit
41 Stationary obstacle integration unit
42 Object data integration unit
43 Behavior pattern unit
44 Gesture integration unit (gesture integration means)
45 Internal state detection unit
46 Action plan management unit
47 Motivation management unit
48 Approach behavior control means
50 Autonomous movement control unit (autonomous movement control means)
60 Wireless communication unit
70 Battery
80 Target detection unit (target detection means)
90 Peripheral state detection unit
110 Environment information detection means
120 Person situation determination means
121 Individual situation estimation means
122 Group situation estimation means
123 Utterance timing determination means
124 Interest estimation means
130 Response behavior control means
131 Utterance level adjustment means
132 Gesture adjustment means
133 Topic control means
134 Behavior overall control means
C Camera (photographing means)
MC Microphone
S Speaker (audio output means)
SR1 Gyro sensor
SR2 GPS receiver (self-position detection means)

Claims

A robot comprising: means for detecting the current position of the robot on a preset map; image processing means for processing a captured image, obtained by photographing a person targeted for communication with photographing means, so that the situation of the person can be determined from the image; voice processing means for performing speech recognition so that the situation of the person can be determined from voice, and for making utterances; and approach behavior control means for controlling approach behavior toward the target person before the utterance is made, the robot further comprising:
storage means for storing map data created by associating utterance output information, which indicates at least one of preset utterance availability, utterance volume, and utterance tone, with position information indicating a position on the map, and
person situation data created by associating the utterance output information with a preset image condition indicating a situation of a person and a sound condition indicating a reaction from the person,
wherein the approach behavior control means includes:
environment information detection means for detecting, from the map data, the utterance output information corresponding to the detected current position as information originating from the environment of the robot;
person situation determination means for determining the situation of the target person based on the preset image condition, the processing result of the captured image, and the result of the speech recognition; and
response behavior control means for extracting, from the person situation data, the utterance output information corresponding to the situation of the person determined by the person situation determination means, and for determining at least one of utterance availability, utterance volume, and utterance tone for the target person based on the extracted utterance output information and the utterance output information detected by the environment information detection means.

The robot according to claim 1, further comprising noise measuring means for measuring noise around the robot and detecting a noise level,
wherein the map data is created by associating the utterance output information with the position information for each preset noise level, and
the environment information detection means detects, from the map data created for each preset noise level, the utterance output information corresponding to the detected current position and the detected noise level as information originating from the environment of the robot.

The robot according to claim 1, wherein the response behavior control means includes:
behavior overall control means for comparing the utterance output information extracted from the person situation data with the utterance output information detected by the environment information detection means and, when the two differ, converting each piece of utterance output information into a numerical value, weighting the values, and calculating an integrated value, and for permitting an utterance to the person when the calculated integrated value is smaller than a preset value indicating permission to speak; and
utterance level adjustment means for adjusting the utterance volume level or switching the utterance tone based on the integrated value when the utterance to the person is permitted.

The robot according to claim 3, further comprising:
autonomous movement control means for autonomously moving at least one part among a head, arms, and legs, each connected to the torso of the robot, by outputting a drive signal to driving means that drives the at least one part;
scenario storage means for storing a scenario created in advance that specifies a gesture, which is a body motion that moves the at least one part when a predetermined utterance is made;
gesture integration means for extracting from the scenario the gesture corresponding to the utterance made to the target person and outputting a command specifying the extracted gesture to the autonomous movement control means; and
gesture adjustment means for adjusting, when the utterance volume level has been adjusted by the utterance level adjustment means, the movement width of the part in the gesture specified by the command in proportion to the adjusted utterance volume level.

The robot according to any one of claims 1 to 4, wherein the image processing means includes:
face recognition means for extracting a face region of the target person from the captured image; and
line-of-sight detection means for detecting the line-of-sight direction of the target person from the extracted face region,
the robot further comprising:
interest estimation means for calculating a degree of interest by quantifying the line-of-sight direction detected by the line-of-sight detection means after an utterance on a predetermined topic has been started toward the target person, determining whether the calculated degree of interest has risen, determining that the person is interested in the topic when the degree of interest has risen, and recording the determination result; and
topic control means for interrupting the utterance on the predetermined topic when the degree of interest has fallen.

The robot according to any one of claims 1 to 4, wherein the voice processing means includes
utterance information detection means for detecting, from input voice, person utterance information indicating at least one of the volume, sound quality, and speaking rate of the target person's voice,
the robot further comprising:
interest estimation means for calculating a degree of interest by quantifying, after an utterance on a predetermined topic has been started toward the target person, the recognition result of speech recognition by the voice processing means or the person utterance information detected by the utterance information detection means, determining whether the calculated degree of interest has risen, determining that the person is interested in the topic when the degree of interest has risen, and recording the determination result; and
topic control means for interrupting the utterance on the predetermined topic when the degree of interest has fallen.

The robot according to claim 5 or claim 6, further comprising person information storage means for storing a plurality of topics,
wherein the topic control means switches among the topics stored in the person information storage means based on the degree of interest and makes an utterance on the switched topic toward the target person.