
JP5411789B2 - Communication robot

Info

Publication number: JP5411789B2
Application number: JP2010095771A
Authority: JP
Inventors: さち恵坂田
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2010-04-19
Filing date: 2010-04-19
Publication date: 2014-02-12
Anticipated expiration: 2030-04-19
Also published as: JP2011227237A

Description

The present invention relates to a communication robot that communicates with a conversation target.

Communication robots and automatic response systems that communicate with a conversation target (a person) have been proposed (see, for example, Patent Document 1). The automatic response system described in Patent Document 1 stores the speech data of unknown words that cannot be recognized, in association with their attribute information, and uses the stored speech data in utterances to the conversation target.

Patent Document 1: JP 2002-297179 A

However, conventional communication robots sometimes speak in error after running speech recognition on a monologue, an unknown word, or non-speech noise. In that case the conversation target cannot understand what the robot said, and the communication breaks down.
In addition, the automatic response system described in Patent Document 1 requires a database for handling unknown words, and its dialogue processing is complicated.

Accordingly, an object of the present invention is to provide a communication robot that has a simple configuration and with which communication is easy to continue.

To solve the problem described above, the communication robot according to the first invention of the present application has a speech recognition unit that at least performs speech recognition on the speech of a conversation target input by a speech input unit, a speech output unit that outputs speech based on utterance information, and a movable part that performs predetermined motions. When the speech input unit inputs the speech of the conversation target, the speech recognition unit calculates and outputs the reliability of the speech recognition result. The communication robot comprises: a response action determination unit that, based on the reliability calculated by the speech recognition unit for the speech of the conversation target, calculates an evaluation value for deciding whether to perform an unanswerable action, which indicates that the speech of the conversation target was recognized but cannot be answered, and that determines that the unanswerable action is to be performed when the evaluation value is less than a preset threshold; a response action selection unit that, when the response action determination unit determines that the unanswerable action is to be performed, selects the unanswerable action from the predetermined response actions that the communication robot can perform; a response action command unit that commands at least one of the speech output unit and the movable part to execute the unanswerable action selected by the response action selection unit; and an average volume calculation unit that calculates, for each conversation target, the average volume of that conversation target's speech. The response action determination unit calculates a volume coefficient by dividing the volume of the speech of the conversation target input by the speech input unit by the average volume calculated by the average volume calculation unit, and calculates the value obtained by multiplying the reliability by the volume coefficient as the evaluation value.
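The evaluation-value calculation of the first invention can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the patent: the class and function names, the use of a simple running mean for the average volume, and the sample numbers are all hypothetical.

```python
from collections import defaultdict


class AverageVolumeTracker:
    """Keeps a running mean of speech volume for each conversation target (assumed design)."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def update(self, target_id, volume):
        # Record one more volume sample for this conversation target.
        self._sum[target_id] += volume
        self._count[target_id] += 1

    def average(self, target_id):
        if self._count[target_id] == 0:
            raise ValueError("no volume samples for this conversation target")
        return self._sum[target_id] / self._count[target_id]


def evaluation_value_by_volume(reliability, volume, average_volume):
    """Evaluation value = reliability x (volume / average volume)."""
    if average_volume <= 0:
        raise ValueError("average volume must be positive")
    volume_coefficient = volume / average_volume
    return reliability * volume_coefficient


def should_perform_unanswerable_action(evaluation_value, threshold):
    """The unanswerable action is chosen when the value is below the threshold."""
    return evaluation_value < threshold
```

With an assumed threshold of 0.5, speech recognized with reliability 0.8 at half the speaker's usual volume gives an evaluation value of 0.4, so the unanswerable action would be chosen; the same reliability at the usual volume gives 0.8 and would not.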

With this configuration, the response action determination unit of the communication robot calculates the evaluation value as an index of whether the robot should answer the conversation target (that is, an index of whether the conversation target spoke to the communication robot). The response action determination unit then compares the calculated evaluation value with the threshold to determine whether to answer the conversation target. This determination requires neither an unknown-word database nor complicated processing.

When the evaluation value is low, the speech of the conversation target is likely to be something that needs no answer (for example, a monologue, an unknown word, or non-speech noise). Even if the robot answered such speech, the conversation target could not understand the answer, and continuing the communication would be difficult. On the other hand, even when the speech needs no answer, if the communication robot does nothing at all, the conversation target cannot tell whether the robot is listening. For this reason, when the evaluation value is low, the response action selection unit of the communication robot selects the unanswerable action, which indicates that the speech of the conversation target was recognized but cannot be answered.
Furthermore, when the volume of the conversation target's speech is low (for example, a monologue), the communication robot can avoid speaking in error when there is no need to answer the conversation target.

To solve the problem described above, the communication robot according to the second invention of the present application has a speech recognition unit that at least performs speech recognition on the speech of a conversation target input by a speech input unit, a speech output unit that outputs speech based on utterance information, and a movable part that performs predetermined motions. When the speech input unit inputs the speech of the conversation target, the speech recognition unit calculates and outputs the reliability of the speech recognition result. The communication robot comprises: a response action determination unit that, based on the reliability calculated by the speech recognition unit for the speech of the conversation target, calculates an evaluation value for deciding whether to perform an unanswerable action, which indicates that the speech of the conversation target was recognized but cannot be answered, and that determines that the unanswerable action is to be performed when the evaluation value is less than a preset threshold; a response action selection unit that, when the response action determination unit determines that the unanswerable action is to be performed, selects the unanswerable action from the predetermined response actions that the communication robot can perform; and a response action command unit that commands at least one of the speech output unit and the movable part to execute the unanswerable action selected by the response action selection unit. The speech recognition unit further calculates the utterance time of the speech of the conversation target input by the speech input unit. The response action determination unit calculates an utterance time coefficient by dividing the utterance time calculated by the speech recognition unit by a preset utterance reference time, and calculates, as the evaluation value, a value obtained by multiplying the reliability by at least the utterance time coefficient.
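The utterance-time variant of the evaluation value can be sketched in the same way. This is a hedged illustration only: the function name, the optional `extra_coefficients` argument (standing in for the "at least" wording, which allows further coefficients), and the sample values are assumptions.

```python
def evaluation_value_by_utterance_time(reliability, utterance_time_s,
                                       reference_time_s, extra_coefficients=()):
    """Evaluation value = reliability x (utterance time / reference time) x any extra coefficients."""
    if reference_time_s <= 0:
        raise ValueError("utterance reference time must be positive")
    utterance_time_coefficient = utterance_time_s / reference_time_s
    value = reliability * utterance_time_coefficient
    # Further coefficients (for example a volume coefficient) may be multiplied in.
    for coefficient in extra_coefficients:
        value *= coefficient
    return value
```

A long utterance raises the coefficient above 1 and therefore raises the evaluation value, which makes the unanswerable action less likely to be chosen for speech that probably needs a real answer.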

With this configuration, when the utterance time of the conversation target's speech is long, an answer to the conversation target is likely to be needed, and the communication robot can avoid mistakenly selecting the unanswerable action in that case.

In the communication robot according to the third invention of the present application, it is preferable that, when the evaluation value is less than the threshold, the response action determination unit determines that the robot is to perform, as the unanswerable action, the utterance of a one-word sound, a predetermined motion of the movable part, or a combination of the two.

With this configuration, when the evaluation value is less than the threshold, the response action determination unit of the communication robot selects an unanswerable action that prompts the conversation target to continue the communication. The unanswerable action is, for example, a one-word sound (pronounced "e", "he", or "n"), a predetermined motion of the movable part (tilting the head), or a combination of these (saying "n" while tilting the head).
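The choice among these unanswerable actions might be sketched as below. The data structure, the motion label `tilt_head`, and the random choice among candidates are assumptions made for illustration; the text does not state how one candidate is picked.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResponseAction:
    utterance: Optional[str]  # one-word sound to speak, or None
    motion: Optional[str]     # motion of the movable part, or None


# Candidates taken from the examples in the text; the labels are hypothetical.
UNANSWERABLE_ACTIONS = (
    ResponseAction("e", None),
    ResponseAction("he", None),
    ResponseAction("n", None),
    ResponseAction(None, "tilt_head"),
    ResponseAction("n", "tilt_head"),
)


def select_response_action(evaluation_value, threshold, rng=random):
    """Return an unanswerable action when the value is below the threshold, else None."""
    if evaluation_value >= threshold:
        return None  # a normal answer would be handled elsewhere
    return rng.choice(UNANSWERABLE_ACTIONS)
```

Passing a seeded `random.Random` instance as `rng` makes the selection repeatable in tests.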

The present invention has the following excellent effects.
The first and second inventions determine whether to perform the unanswerable action without requiring an unknown-word database or complicated processing, so the configuration can be kept simple. In addition, when the evaluation value is low, the first and second inventions select the unanswerable action, which indicates that the speech of the conversation target was recognized but cannot be answered, so communication is easier to continue.

The first invention calculates the evaluation value taking into account the average volume for each conversation target, so the robot can avoid speaking in error when there is no need to answer the conversation target, and communication becomes easier to continue.
The second invention calculates the evaluation value taking into account the utterance time of the conversation target's speech, so the robot can avoid mistakenly selecting the unanswerable action when an answer to the conversation target is likely to be needed, and communication becomes easier to continue.
In the third invention, the communication robot performs an unanswerable action that prompts the conversation target to continue the communication, so communication becomes easier to continue.

FIG. 1 is a diagram schematically showing the configuration of a robot system including a robot according to an embodiment of the present invention.
FIG. 2 is a diagram schematically showing an example of self-position detection and object detection by the robot.
FIG. 3 is a diagram showing an example of a local map used in the robot system shown in FIG. 1.
FIG. 4 is a diagram showing an example of the task information database stored in the storage means of the management computer shown in FIG. 1.
FIG. 5 is a diagram showing an example of the task schedule table stored in the storage means of the management computer shown in FIG. 1.
FIG. 6 is a block diagram showing the configuration of the robot according to the embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of the main control unit of the robot shown in FIG. 6.
FIG. 8 is a block diagram showing the configuration of the response action control unit shown in FIG. 7.
FIG. 9 is a diagram showing an example of the rule DB stored in the rule DB storage means shown in FIG. 8.
FIG. 10 is a diagram showing an example of the action DB stored in the rule DB storage means shown in FIG. 8.
FIG. 11 is a diagram showing an example of the situation DB stored in the situation DB storage means shown in FIG. 8.
FIG. 12 is an explanatory diagram of the method of setting the threshold used by the response action determination unit of FIG. 8.
FIG. 13 is a flowchart showing the operation of the response action control unit of FIG. 8.

Hereinafter, a mode for carrying out the communication robot of the present invention (hereinafter, "robot") is described in detail with reference to the drawings; this mode is referred to below as the "embodiment". First, the overall configuration of a robot control system A including the robot according to the embodiment of the present invention is described with reference to FIG. 1.

(Configuration of robot control system A)
As shown in FIG. 1, the robot control system A comprises a robot R, a base station 1 connected to the robot R by wireless communication, a management computer 3 connected to the base station 1 via a robot-dedicated network 2, and a terminal 5 connected to the management computer 3 via a network 4.

As shown in FIG. 1, the robot control system A has a plurality of robots R_A, R_B, and R_C with a movement function (referred to simply as robot R when no particular robot is meant). Each robot R executes tasks in accordance with a task execution plan (task schedule) set in advance for each robot R in the management computer 3.

Here, an autonomous mobile biped walking robot is described as an example.
The robot R executes tasks in accordance with execution commands input from the management computer 3. At least one robot R is placed in a task execution area, which is set in advance as the area in which the robots R execute tasks.
FIG. 1 illustrates a robot R_A executing a task of guiding a visitor to a predetermined place such as a conference room (guidance task), a robot R_B executing a task of handing a package to a person (package delivery task), and a robot R_C waiting until a new task is assigned. In this example, three battery replenishment areas B_1, B_2, and B_3 are provided in the task execution area, and a robot R can charge its battery as necessary (battery charging task).

As shown in FIG. 2, the robot R has a head R1, arms R2, legs R3, a torso R4, and a rear housing R5. The head R1, arms R2, and legs R3, each connected to the torso R4, are each driven by actuators (driving means), and biped walking is controlled by an autonomous movement control unit 50 (see FIG. 6). Details of this biped walking are disclosed in, for example, Japanese Patent Application Laid-Open No. 2001-62760.

When executing a guidance task, for example, the robot R guides a person H in a predetermined guidance area (a movement area such as an office or a corridor). Here, the robot R emits light (for example, infrared light, ultraviolet light, or laser light) and radio waves into its surroundings to detect whether a person H carrying a tag T is present in the surrounding area, identifies the position of the detected person H and approaches, and uses the tag T to identify who the person H is. The tag T receives the infrared light and radio waves that the robot R emits to determine the position (distance and direction) of the person. Based on a signal indicating the light-receiving direction contained in the received infrared light and a robot ID contained in the received radio waves, the tag T generates a reception report signal containing a tag identification number and returns it to the robot R. On receiving the reception report signal, the robot R recognizes the distance and direction to the person H wearing the tag T and can approach that person.

When the robot R moves autonomously within the guidance area to execute a task (for example, a guidance task or a package delivery task), it emits laser slit light or infrared light to probe the road surface condition or to search for marks on the road surface. That is, the robot R keeps track of where it is within the movement area. In a normal part of the movement area, it irradiates the road surface with laser slit light to detect steps, undulations, obstacles, and the like. Within the installation area of a mark M, it irradiates the road surface with infrared light to detect the mark M and then confirms and corrects its own position. The mark M is a member made of, for example, a reflective material that retroreflects infrared light. Each mark M has position data, which is stored in the storage unit 30 (see FIG. 6) as part of the map data. The map data includes the position data of the marks M installed at specific locations in the guidance area and data on the installation area of each mark M, obtained by giving the position data a predetermined width (range). The installation area of a mark M is an area within a predetermined distance of the mark M and can be set arbitrarily, for example as a circular area with a radius of 1 to 3 m centered on the mark M, or as a rectangular area extending 3 m in front of the mark M (on the robot side).
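The switch between laser slit light and infrared light can be pictured with a small membership test on a circular installation area. This is a sketch under assumptions: the function names, the 2 m default radius (inside the 1 to 3 m range given above), and the planar coordinates are hypothetical.

```python
import math


def in_mark_area(robot_xy, mark_xy, radius_m):
    """True when the robot lies inside the circular installation area of a mark."""
    dx = robot_xy[0] - mark_xy[0]
    dy = robot_xy[1] - mark_xy[1]
    return math.hypot(dx, dy) <= radius_m


def choose_road_sensing(robot_xy, marks, radius_m=2.0):
    """Pick infrared inside any mark installation area, laser slit light elsewhere."""
    for mark_xy in marks:
        if in_mark_area(robot_xy, mark_xy, radius_m):
            return "infrared"
    return "laser_slit"
```

For a mark at (10.0, 5.0), a robot at (9.0, 5.0) is 1 m away and would use infrared, while a robot at the origin would keep using laser slit light.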

Returning to FIG. 1, the description of the configuration of the robot control system A continues.
The base station 1 mediates data exchange between the robot R and the management computer 3.
Specifically, the base station 1 transmits execution commands output from the management computer 3 to the robot R. It also receives, from the robot R, data on the state of the robot R (status information) and a signal indicating that the robot R has received an execution command (reception report signal), and outputs them to the management computer 3.
At least one base station 1 is provided in the task execution area so that data can be exchanged reliably between the robot R and the management computer 3.
When the task execution area extends over several floors of a building, a base station 1 is preferably provided on each floor. When a single base station 1 cannot cover the whole task execution area, a plurality of base stations 1 are preferably provided in the task execution area.

The robot-dedicated network 2 connects the base station 1, the management computer 3, and the network 4, and is implemented by a LAN (Local Area Network) or the like.

The management computer 3 manages the plurality of robots R. It performs various kinds of control, such as movement and speech of the robots R, via the base station 1 and the robot-dedicated network 2, and provides the robots R with the information they need. This information includes, for example, the name of a detected person and a map of the surroundings of the robot R (local map), and is stored in a storage unit 3a of the management computer 3.

Here, as shown in FIG. 3(a), the guidance area 301 is a rectangular area on one floor of a building. The robot R and the person it is to guide enter the guidance area 301 through a corridor 303 outside an entrance 302 of the guidance area 301. A hall 304 extends inside the entrance 302, a reception desk 305 is located in a far corner of the hall 304, and a plurality of conference rooms 306 (306a, 306b, 306c) partitioned as private rooms are provided along the wall of the guidance area 301. The reception desk 305 consists of an L-shaped counter table 305a and a counter space 305b where reception staff are stationed. The base station 1 is installed in the counter space 305b. The management computer 3 holds, in the storage unit 3a (see FIG. 1), local maps (local map data), in which information on local areas such as passages and rooms is registered in association with position coordinate data, and a global map, which is the map information of the task execution area assembled from the local maps.

The management computer 3 also holds, in the storage unit 3a (see FIG. 1), a task information database that stores information on the tasks to be executed by the robots R (task data).
As shown in FIG. 4, the task information database 400 contains the following information items: a task ID, which is a unique identifier assigned to each task; the priority of the task; the importance of the task; a robot ID, which identifies the robot that is to execute the task; the content of the task, such as guidance or transport (package delivery); the position in the task execution area where the task starts (start position); the position in the task execution area where the task ends (end position); the time required to execute the task (required time); the scheduled start time of the task (start time); the scheduled end time of the task (end time); and the status of the task.
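One record of such a task information database might be modeled as below. The field names follow the information items listed above, but the types, units, and the `register_task` helper are assumptions for illustration; the text does not specify a storage format.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    task_id: int               # unique identifier assigned to each task
    priority: int              # task priority
    importance: int            # task importance
    robot_id: str              # robot that is to execute the task
    content: str               # e.g. "guidance" or "package delivery"
    start_position: tuple      # (x, y) in the task execution area (assumed form)
    end_position: tuple        # (x, y) in the task execution area (assumed form)
    required_time_min: float   # time required, in minutes (assumed unit)
    start_time: str            # scheduled start time
    end_time: str              # scheduled end time
    status: str                # e.g. "waiting", "running", "done" (assumed values)


def register_task(database, record):
    """Store a record under its task ID, rejecting duplicate IDs."""
    if record.task_id in database:
        raise ValueError(f"task ID {record.task_id} is already registered")
    database[record.task_id] = record
```

Keying the records by task ID mirrors the role of the task ID as the unique identifier used elsewhere to look tasks up.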

The management computer 3 also sets, for each robot R, an execution plan (task schedule) for the tasks that the robot R is to execute.
As shown in FIG. 5, the task schedule table 500 contains the following information items: the execution order of the tasks to be executed by the robot R, the task ID that identifies a task registered in the task information database 400 (see FIG. 4), the priority of the task, the content of the task, and the status of the task.
In the task schedule table 500, these information items are organized for each robot R placed in the task execution area, so that it is possible to see which tasks are assigned to each robot R and in what order.
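A per-robot schedule of this shape could be derived from task records as sketched below. The ordering rule (higher priority first, ties kept in input order) is an assumption; the text says only that the table holds an execution order, not how it is decided.

```python
from collections import defaultdict


def build_schedule(tasks):
    """Group task dicts by robot ID and number them in execution order."""
    per_robot = defaultdict(list)
    for task in tasks:
        per_robot[task["robot_id"]].append(task)

    schedule = {}
    for robot_id, robot_tasks in per_robot.items():
        # Assumed rule: higher priority runs first; sorted() keeps ties in input order.
        ordered = sorted(robot_tasks, key=lambda t: -t["priority"])
        schedule[robot_id] = [
            {
                "order": order,
                "task_id": t["task_id"],
                "priority": t["priority"],
                "content": t["content"],
                "status": t["status"],
            }
            for order, t in enumerate(ordered, start=1)
        ]
    return schedule
```

Each row carries the same five items as the task schedule table 500: execution order, task ID, priority, content, and status.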

Returning again to FIG. 1, the description of the configuration of the robot control system A continues.
The terminal 5 connects to the management computer 3 via the network 4 and is used to register information about persons and the like in the storage unit 3a of the management computer 3, or to correct such registered information. The terminal 5 is also used to register tasks to be executed by the robots R, to change the task schedule set in the management computer 3, and to input operation commands for the robots R.

The robot R is described in detail below.

[Robot]
In addition to the head R1, arms R2, legs R3, torso R4, and rear housing R5, the robot R has, at appropriate places in these parts R1 to R5 and as shown in FIG. 6, cameras C, C, a speaker S, microphones MC, MC, an image processing unit 10, an audio processing unit 20, a storage unit 30, a main control unit 40, an autonomous movement control unit 50, a wireless communication unit 60, a battery 70, an object detection unit 80, and a surrounding state detection unit 90.
The robot R further has a gyro sensor SR1 that detects the direction in which the robot R is facing, and a GPS (Global Positioning System) receiver SR2 for acquiring the position coordinates of the robot R on a preset map.

[Camera]
The cameras (visual sensors) C, C capture images of the area ahead of the robot R in its direction of travel as digital data; for example, color CCD (Charge-Coupled Device) cameras are used. The cameras C, C are arranged side by side in parallel on the left and right, and the captured images are output to the image processing unit 10. The cameras C, C, the speaker S, and the microphones (speech input unit) MC, MC are all located inside the head R1. The speaker (speech output unit) S can emit predetermined speech synthesized by the audio processing unit 20.

[Image processing unit]
The image processing unit 10 processes the images captured by the cameras C, C (captured images) and recognizes surrounding obstacles and persons in order to grasp the situation around the robot R from those images. The image processing unit 10 includes a stereo processing unit 11a, a moving object extraction unit 11b, and a face recognition unit 11c. Together with the cameras C, C, the image processing unit 10 can function as external information acquisition means.
The stereo processing unit 11a performs pattern matching using one of the two images captured by the left and right cameras C, C as the reference, calculates the parallax of each pair of corresponding pixels in the left and right images to generate a parallax image, and outputs the generated parallax image and the original images to the moving object extraction unit 11b. The parallax represents the distance from the robot R to the photographed object.
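How a parallax value maps to a distance can be shown with the standard relation for a rectified pair of parallel cameras, Z = f * B / d. The text does not give this formula or any camera parameters, so the function name and the focal length and baseline used in the note below are assumptions.

```python
def disparity_to_distance(disparity_px, focal_length_px, baseline_m):
    """Distance for a rectified parallel stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity (or no match).
        return float("inf")
    return focal_length_px * baseline_m / disparity_px
```

With an assumed focal length of 500 pixels and a 0.1 m baseline, a parallax of 25 pixels corresponds to 2 m; larger parallax means a nearer object.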

移動体抽出部１１ｂは、ステレオ処理部１１ａから出力されたデータに基づき、撮影した画像中の移動体を抽出するものである。移動する物体（移動体）を抽出するのは、移動する物体が人物であると推定して、人物の認識をするためである。
移動体の抽出をするために、移動体抽出部１１ｂは、過去の数フレーム（コマ）の画像を記憶しており、最も新しいフレーム（画像）と、過去のフレーム（画像）を比較して、パターンマッチングを行い、各画素の移動量を計算し、移動量画像を生成する。そして、視差画像と、移動量画像とから、カメラＣ，Ｃから所定の距離範囲内で、移動量の多い画素がある場合に、人物があると推定し、その所定距離範囲のみの視差画像として、移動体を抽出し、顔認識部１１ｃへ移動体の画像を出力する。 The moving body extraction unit 11b extracts a moving body in the photographed image based on the data output from the stereo processing unit 11a. The reason for extracting the moving object (moving body) is to recognize the person by estimating that the moving object is a person.
To extract a moving body, the moving body extraction unit 11b stores the images of the past several frames, compares the newest frame (image) with the past frames (images) by pattern matching, calculates the movement amount of each pixel, and generates a movement amount image. Then, when the parallax image and the movement amount image show pixels with a large movement amount within a predetermined distance range from the cameras C and C, the unit presumes that a person is present, extracts the moving body as a parallax image limited to that distance range, and outputs the image of the moving body to the face recognition unit 11c.
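
The extraction step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the absolute intensity difference between two frames stands in for the pattern-matching-based movement amount, distances are given directly in metres instead of as parallax values, and the function name, thresholds, and parameters are all hypothetical.

```python
def extract_moving_body(prev_frame, curr_frame, distance_m,
                        near_m, far_m, motion_threshold, min_pixels):
    """Return a distance image limited to [near_m, far_m] if enough pixels
    in that range show a large movement amount, otherwise None.

    prev_frame, curr_frame: grayscale images as lists of rows.
    distance_m: per-pixel distance (None where no stereo match exists).
    """
    rows, cols = len(curr_frame), len(curr_frame[0])
    moving_count = 0
    limited = [[None] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            d = distance_m[y][x]
            if d is None or not (near_m <= d <= far_m):
                continue  # outside the predetermined distance range
            limited[y][x] = d
            # Stand-in for the movement amount of this pixel.
            movement = abs(curr_frame[y][x] - prev_frame[y][x])
            if movement >= motion_threshold:
                moving_count += 1
    # Presume a person is present only if enough pixels are moving.
    return limited if moving_count >= min_pixels else None


prev = [[0, 0], [0, 0]]
curr = [[0, 50], [0, 60]]
dist = [[1.0, 1.5], [5.0, 1.2]]
body = extract_moving_body(prev, curr, dist, 0.5, 2.0, 30, 2)
```

In this toy input two pixels in the 0.5 m to 2.0 m range move strongly, so the range-limited image is returned; the pixel at 5.0 m is masked out.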

顔認識部１１ｃは、抽出した移動体の一部分の大きさ、形状などから顔領域および顔の位置を認識する。なお、同様にして、抽出した移動体の一部分の大きさ、形状などから手の位置も認識される。
認識された顔の位置は、ロボットＲが移動するときの情報として、また、その人とのコミュニケーションを取るため、主制御部４０に出力される。 The face recognition unit 11c recognizes the face area and the face position from the size, shape, etc. of a part of the extracted moving body. Similarly, the position of the hand is also recognized from the size and shape of a part of the extracted moving body.
The recognized face position is output to the main control unit 40, both as information used when the robot R moves and for communicating with that person.

［音声処理部］
音声処理部２０は、音声合成部２１ａと、音声認識部２１ｂと、音源定位部２１ｃとを有する。
音声合成部２１ａは、主制御部４０が決定し、出力してきた発話行動の指令に基づき、文字情報（テキストデータ）から音声データを生成し、スピーカＳに合成音声を出力する部分である。音声データの生成には、予め記憶部３０に記憶している文字情報（テキストデータ）と音声データとの対応関係を利用する。なお、音声データは、管理用コンピュータ３から取得され、記憶部３０に保存される。 [Voice processing unit]
The voice processing unit 20 includes a voice synthesis unit 21a, a voice recognition unit 21b, and a sound source localization unit 21c.
The voice synthesis unit 21a generates voice data from character information (text data) based on the utterance action command determined and output by the main control unit 40, and outputs the synthesized speech to the speaker S. The voice data is generated using the correspondence, stored in advance in the storage unit 30, between character information (text data) and voice data. The voice data is acquired from the management computer 3 and stored in the storage unit 30.

音声認識部２１ｂは、マイクＭＣ，ＭＣから音声データ（例えば、人物の音声）が入力され、入力された音声データから文字情報（テキストデータ）を音声認識により生成するものである。具体的には、音声認識部２１ｂは、例えば、隠れマルコフモデル（ＨＭＭ）を用いた音声認識を行い、入力された音声データに含まれる単語毎に音声認識の結果の信頼度を示す単語信頼度を算出する。ここで、単語信頼度は、音声認識の分野では、公知の隠れマルコフモデルを用いてモデル化された単語モデルに当てはめた場合に、その単語であることの確からしさを示す尤度に基づいて算出できる。 The voice recognition unit 21b receives voice data (for example, a person's voice) from the microphones MC and MC and generates character information (text data) from the input voice data by voice recognition. Specifically, the voice recognition unit 21b performs voice recognition using, for example, a hidden Markov model (HMM), and calculates, for each word included in the input voice data, a word reliability indicating the reliability of the recognition result. As is known in the field of speech recognition, the word reliability can be calculated from the likelihood that indicates how probable the word is when the input is fitted to a word model built with a hidden Markov model.

このような単語信頼度を算出する音声認識エンジンとしては、例えば、オープンソースソフトウェアであるＪｕｌｉｕｓ音声認識システムを利用することができる（http://julius.sourceforge.jp/index.php?q=doc/cm.html参照）。 As a speech recognition engine that calculates such word reliability, for example, the open-source Julius speech recognition system can be used (see http://julius.sourceforge.jp/index.php?q=doc/cm.html).

また、音声認識部２１ｂは、入力された音声データの音量と、その発話時間とを算出する。そして、音声認識部２１ｂは、文字情報（テキストデータ）と、入力された音声データに含まれる全ての単語の単語信頼度と、入力された音声データの音量と、算出した発話時間とを主制御部４０に出力する。 The voice recognition unit 21b also calculates the volume of the input voice data and its utterance time. The voice recognition unit 21b then outputs the character information (text data), the word reliability of every word included in the input voice data, the volume of the input voice data, and the calculated utterance time to the main control unit 40.

この音量は、次のようにして算出することができる。まず、音声認識部２１ｂは、音声データを、例えば１０ｍｓ程度の所定の長さ（フレーム長）の音声フレームに分割する。そして、音声認識部２１ｂは、音声フレームごとのパワースペクトルを算出することにより、その音声フレームにおける音量を求めることができる。さらに、音声認識部２１ｂは、当該発話区間における音量の最大値や平均値を当該発話における音量として算出して用いることができる。
また、発話長は、当該発話区間に含まれる音声フレーム数を計数し、フレーム長に乗ずることにより算出することができる。 This volume can be calculated as follows. First, the voice recognition unit 21b divides the voice data into voice frames having a predetermined length (frame length) of about 10 ms, for example. Then, the voice recognition unit 21b can obtain the volume in the voice frame by calculating the power spectrum for each voice frame. Furthermore, the voice recognition unit 21b can calculate and use the maximum value or the average value of the volume in the utterance section as the volume in the utterance.
The utterance length can be calculated by counting the number of audio frames included in the utterance section and multiplying by the frame length.
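
The volume and utterance-length calculation described above can be sketched in Python as follows. The sketch makes two assumptions that are not in the source: the mean squared amplitude of each frame is used as a simple stand-in for a value derived from the power spectrum, and the function names and the 16 kHz sampling rate are hypothetical.

```python
def frame_powers(samples, sample_rate_hz, frame_ms=10):
    """Split the utterance into fixed-length frames and return the mean
    squared amplitude of each complete frame."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(s * s for s in frame) / frame_len)
    return powers


def utterance_volume_and_length(samples, sample_rate_hz, frame_ms=10):
    """Return the maximum and mean frame power and the utterance length,
    where length = number of frames x frame length."""
    powers = frame_powers(samples, sample_rate_hz, frame_ms)
    if not powers:
        return None
    return {
        "max_power": max(powers),
        "mean_power": sum(powers) / len(powers),
        "length_s": len(powers) * frame_ms / 1000,
    }


# Two 10 ms frames at a hypothetical 16 kHz sampling rate.
result = utterance_volume_and_length([1.0] * 160 + [2.0] * 160, 16000)
```

Either the maximum or the mean can then serve as the volume of the utterance, matching the two options named in the text.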

この他、音声入力部２１ｂは、変換した音声データを分析し、入力された音声データに音声（人物の音声）が含まれているかどうか、すなわち音声が存在するかどうかを検出してもよい。 In addition, the voice input unit 21b may analyze the converted voice data and detect whether or not voice (person's voice) is included in the input voice data, that is, whether or not voice is present.

音源定位部２１ｃは、マイクＭＣ，ＭＣ間の音圧差および音の到達時間差に基づいて音源位置（ロボットＲが認識する平面状の位置）を特定し、主制御部４０に出力するものである。音源位置は、例えば、ロボットＲの立っている方向（ｚ軸方向）周りの回転角θzで表される。 The sound source localization unit 21 c specifies a sound source position (a planar position recognized by the robot R) based on the sound pressure difference between the microphones MC and MC and the sound arrival time difference, and outputs the sound source position to the main control unit 40. The sound source position is represented by, for example, a rotation angle θz around the direction in which the robot R stands (z-axis direction).
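
As a rough illustration of how a rotation angle could be derived from the arrival time difference, the following Python sketch uses the far-field relation sin(theta) = c * dt / d for two microphones. This is an assumption made for illustration: the source also uses the sound pressure difference, which is omitted here, and the function name, the speed of sound, and the microphone spacing are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def azimuth_from_time_difference(delta_t_s, mic_spacing_m):
    """Estimate the source direction in degrees (0 = straight ahead) from the
    arrival time difference between two microphones, far-field assumption."""
    ratio = SPEED_OF_SOUND_M_S * delta_t_s / mic_spacing_m
    # Clamp to the valid asin domain to tolerate measurement noise.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# A delay equal to half the maximum possible delay for 0.2 m spacing.
angle = azimuth_from_time_difference(0.5 * 0.2 / SPEED_OF_SOUND_M_S, 0.2)
```

With only two microphones the sign of the delay distinguishes left from right, but front and back remain ambiguous unless further cues are used.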

[記憶部]
記憶部３０は、例えば、一般的なハードディスク等から構成され、管理用コンピュータ３から送信された必要な情報（ローカル地図データ、会話用データなど）を記憶するものである。また、記憶部３０は、後記するように、主制御部４０の各種動作を行うために必要な情報を記憶している。 [Storage unit]
The storage unit 30 is composed of, for example, a general hard disk and stores necessary information (local map data, conversation data, etc.) transmitted from the management computer 3. Further, the storage unit 30 stores information necessary for performing various operations of the main control unit 40, as will be described later.

[主制御部]
主制御部４０は、画像処理部１０、音声処理部２０、記憶部３０、自律移動制御部５０、無線通信部６０、対象検知部８０、および周辺状態検知部９０を統括制御するものである。また、ジャイロセンサＳＲ１、およびＧＰＳ受信器ＳＲ２が検出したデータは、主制御部４０に出力され、ロボットＲの行動を決定するために利用される。この主制御部４０は、例えば、管理用コンピュータ３と通信を行うための制御、管理用コンピュータ３から取得したタスク実行命令に基づいて所定のタスクを実行するための制御、ロボットＲを目的地に移動させるための制御、人物を識別するための制御、人物と対話するための制御を行うために、種々の判断を行ったり、各部の動作のための指令を生成したりする。 [Main control section]
The main control unit 40 performs overall control of the image processing unit 10, the voice processing unit 20, the storage unit 30, the autonomous movement control unit 50, the wireless communication unit 60, the target detection unit 80, and the peripheral state detection unit 90. The data detected by the gyro sensor SR1 and the GPS receiver SR2 is output to the main control unit 40 and used to determine the behavior of the robot R. The main control unit 40 makes various determinations and generates commands for the operation of each unit in order to perform, for example, control for communicating with the management computer 3, control for executing a predetermined task based on a task execution command acquired from the management computer 3, control for moving the robot R to a destination, control for identifying a person, and control for interacting with a person.

［自律移動制御部］
自律移動制御部５０は、主制御部４０の指示に従い頭部Ｒ１、腕部Ｒ２、脚部Ｒ３および胴部Ｒ４を駆動するものである。この自律移動制御部５０は、図示を省略するが、頭部Ｒ１の首関節を駆動させる首制御部、腕部Ｒ２の手の先の指関節を駆動させる手制御部、腕部Ｒ２の肩関節、肘関節、手首関節を駆動させる腕制御部、脚部Ｒ３に対して胴部Ｒ４を水平方向に回転駆動させる腰制御部、脚部Ｒ３の股関節、膝関節、足首関節を駆動させる足制御部を有している。これら首制御部、手制御部，腕制御部、腰制御部および足制御部は、頭部Ｒ１、腕部Ｒ２、脚部Ｒ３および胴部Ｒ４を駆動するアクチュエータに駆動信号を出力する。 [Autonomous Movement Control Unit]
The autonomous movement control unit 50 drives the head R1, the arm R2, the leg R3, and the torso R4 in accordance with instructions from the main control unit 40. Although not shown, the autonomous movement control unit 50 includes a neck control unit that drives the neck joint of the head R1, a hand control unit that drives the finger joints at the tip of the hand of the arm R2, an arm control unit that drives the shoulder, elbow, and wrist joints of the arm R2, a waist control unit that rotates the torso R4 horizontally relative to the leg R3, and a foot control unit that drives the hip, knee, and ankle joints of the leg R3. The neck, hand, arm, waist, and foot control units output drive signals to the actuators that drive the head R1, the arm R2, the leg R3, and the torso R4.

［無線通信部］
無線通信部６０は、管理用コンピュータ３とデータの送受信を行う通信装置である。無線通信部６０は、公衆回線通信装置６１ａおよび無線通信装置６１ｂを有する。
公衆回線通信装置６１ａは、携帯電話回線やＰＨＳ(Personal Handyphone System)回線などの公衆回線を利用した無線通信手段である。一方、無線通信装置６１ｂは、IEEE802.11b規格に準拠するワイヤレスＬＡＮなどの、近距離無線通信による無線通信手段である。
無線通信部６０は、管理用コンピュータ３からの接続要求に従い、公衆回線通信装置６１ａまたは無線通信装置６１ｂを選択して管理用コンピュータ３とデータ通信を行う。 [Wireless communication part]
The wireless communication unit 60 is a communication device that transmits and receives data to and from the management computer 3. The wireless communication unit 60 includes a public line communication device 61a and a wireless communication device 61b.
The public line communication device 61a is wireless communication means that uses a public line such as a mobile phone line or a PHS (Personal Handyphone System) line. The wireless communication device 61b, on the other hand, is wireless communication means based on short-range wireless communication, such as a wireless LAN conforming to the IEEE 802.11b standard.
In accordance with a connection request from the management computer 3, the wireless communication unit 60 selects the public line communication device 61a or the wireless communication device 61b and performs data communication with the management computer 3.

バッテリ７０は、ロボットＲの各部の動作や処理に必要な電力の供給源である。このバッテリ７０は、充填式の構成をもつものが使用される。ロボットＲは、バッテリ補給エリア（図１参照）でバッテリ７０の充電器に嵌合され、バッテリ充電される。 The battery 70 is the source of the power required for the operation and processing of each unit of the robot R. A rechargeable battery is used as the battery 70. In the battery replenishment area (see FIG. 1), the robot R is fitted into the charger for the battery 70 and the battery is charged.

［対象検知部］
対象検知部８０は、ロボットＲの周囲にタグＴを備える人物が存在するか否かを検知するものである。対象検知部８０は、複数の発光部８１（図６では１つのみ表示した）を備える。これら発光部８１は、例えば、ＬＥＤから構成され、ロボットＲの頭部Ｒ１外周に沿って前後左右などに配設される（図示は省略する）。対象検知部８０は、発光部８１から、各発光部８１を識別する発光部ＩＤを示す信号を含む赤外光をそれぞれ発信すると共に、この赤外光を受信したタグＴから受信報告信号を受信する。いずれかの赤外光を受信したタグＴは、その赤外光に含まれる発光部ＩＤに基づいて、受信報告信号を生成するので、ロボットＲは、この受信報告信号に含まれる発光部ＩＤを参照することにより、当該ロボットＲから視てどの方向にタグＴが存在するかを特定することができる。また、対象検知部８０は、タグＴから取得した受信報告信号の電波強度に基づいて、タグＴまでの距離を特定する機能を有する。したがって、対象検知部８０は、受信報告信号に基づいて、タグＴの位置（距離および方向）を、人物の位置として特定することができる。さらに、対象検知部８０は、発光部８１から赤外光を発光するだけではなく、ロボットＩＤを示す信号を含む電波を図示しないアンテナから発信する。これにより、この電波を受信したタグＴは、赤外光を発信したロボットＲを正しく特定することができる。なお、対象検知部８０およびタグＴについての詳細は、例えば、特開２００６−１９２５６３号公報に開示されている。この対象検知部８０は、外部情報取得手段として機能することができる。 [Target detection unit]
The target detection unit 80 detects whether a person carrying a tag T is present around the robot R. The target detection unit 80 includes a plurality of light emitting units 81 (only one is shown in FIG. 6). These light emitting units 81 are composed of, for example, LEDs, and are arranged at the front, rear, left, right, and so on along the outer periphery of the head R1 of the robot R (not shown). The target detection unit 80 emits, from each light emitting unit 81, infrared light containing a signal indicating the light emitting unit ID that identifies that light emitting unit 81, and receives a reception report signal from a tag T that has received the infrared light. A tag T that has received any of the infrared light generates the reception report signal based on the light emitting unit ID contained in that infrared light, so the robot R can determine in which direction the tag T lies as viewed from the robot R by referring to the light emitting unit ID contained in the reception report signal. The target detection unit 80 also has a function of determining the distance to the tag T based on the radio wave intensity of the reception report signal acquired from the tag T. The target detection unit 80 can therefore determine the position (distance and direction) of the tag T, based on the reception report signal, as the position of the person. In addition to emitting infrared light from the light emitting units 81, the target detection unit 80 transmits a radio wave containing a signal indicating the robot ID from an antenna (not shown). This allows a tag T that has received the radio wave to correctly identify the robot R that emitted the infrared light. Details of the target detection unit 80 and the tag T are disclosed in, for example, Japanese Patent Application Laid-Open No. 2006-192563. The target detection unit 80 can function as an external information acquisition unit.

また、タグＴは、それぞれタグＴを備えた人物に対応付けられた固有のタグ識別番号（個人識別情報）を有しており、このタグ識別番号を受信報告信号に含ませてロボットＲに送信する。そして、対象検知部８０は、タグＴから受信したタグ識別番号を主制御部４０に出力する。これによって、ロボットＲは、タグＴから送信された受信報告信号に含まれるタグ識別番号によって、タグＴを備えた人物を特定することができる。 Each tag T has a unique tag identification number (personal identification information) associated with the person carrying that tag T, and transmits this tag identification number to the robot R by including it in the reception report signal. The target detection unit 80 then outputs the tag identification number received from the tag T to the main control unit 40. The robot R can thereby identify the person carrying the tag T from the tag identification number contained in the reception report signal transmitted from the tag T.
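
How a reception report signal could be turned into a person's position can be sketched in Python as follows. Everything concrete here is a hypothetical assumption for illustration: the layout of four light emitting units, the signal-strength thresholds, the coarse distance classes, and the field names of the report are not given in the source, which only states that direction follows from the light emitting unit ID and distance from the radio wave intensity.

```python
# Hypothetical layout: light emitting unit ID -> direction seen from the robot.
EMITTER_BEARING_DEG = {1: 0, 2: 90, 3: 180, 4: 270}


def locate_tag(report):
    """Derive direction, coarse distance, and tag ID from one reception report.

    report: dict with hypothetical keys 'tag_id', 'emitter_id', 'rssi_dbm'.
    Returns None if the reported light emitting unit ID is unknown.
    """
    bearing = EMITTER_BEARING_DEG.get(report["emitter_id"])
    if bearing is None:
        return None
    rssi = report["rssi_dbm"]
    # Hypothetical thresholds mapping radio wave intensity to a distance class.
    if rssi >= -50:
        distance = "near"
    elif rssi >= -70:
        distance = "middle"
    else:
        distance = "far"
    return {"tag_id": report["tag_id"], "bearing_deg": bearing, "distance": distance}


position = locate_tag({"tag_id": 7, "emitter_id": 2, "rssi_dbm": -60})
```

The returned tag ID is what allows the position to be attributed to a specific person.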

［周辺状態検知部］
周辺状態検知部９０は、ロボットＲの周辺状態を検知するものであり、ジャイロセンサＳＲ１やＧＰＳ受信器ＳＲ２によって検出された自己位置データを取得可能になっている。また、周辺状態検知部９０は、探索域に向かってスリット光を照射するレーザ照射部９１と、探索域に向かって赤外線を照射する赤外線照射部９２と、スリット光または赤外線が照射された探索域を撮像する床面カメラ９３とを有する。この周辺状態検知部９０は、床面カメラ９３で撮像したスリット光画像（スリット光が照射されたときの画像）を解析して路面状態を検出する。また、周辺状態検知部９０は、床面カメラ９３で撮像した赤外線画像（赤外線が照射されたときの画像）を解析してマークＭ（図２参照）を検出し、検出されたマークＭの位置（座標）からマークＭとロボットＲとの相対的な位置関係を計算する。なお、周辺状態検知部９０についての詳細は、例えば、特開２００６−１６７８４４号公報に開示されている。この周辺状態検知部９０は、外部情報取得手段として機能することができる。 [Ambient condition detector]
The peripheral state detection unit 90 detects the peripheral state of the robot R and can acquire the self-position data detected by the gyro sensor SR1 and the GPS receiver SR2. The peripheral state detection unit 90 includes a laser irradiation unit 91 that irradiates a search area with slit light, an infrared irradiation unit 92 that irradiates the search area with infrared light, and a floor camera 93 that images the search area irradiated with the slit light or the infrared light. The peripheral state detection unit 90 detects the road surface state by analyzing the slit light image (the image taken while the slit light is irradiated) captured by the floor camera 93. The peripheral state detection unit 90 also analyzes the infrared image (the image taken while the infrared light is irradiated) captured by the floor camera 93 to detect a mark M (see FIG. 2), and calculates the relative positional relationship between the mark M and the robot R from the position (coordinates) of the detected mark M. Details of the peripheral state detection unit 90 are disclosed in, for example, Japanese Patent Application Laid-Open No. 2006-167844. The peripheral state detection unit 90 can function as an external information acquisition unit.

［主制御部の構成］
主制御部４０は、静止障害物統合部４１と、オブジェクトデータ統合部４２と、行動パターン部４３と、身振り統合部４４と、内部状態検出部４５と、行動計画管理部４６とを備えている。 [Configuration of main controller]
The main control unit 40 includes a stationary obstacle integration unit 41, an object data integration unit 42, a behavior pattern unit 43, a gesture integration unit 44, an internal state detection unit 45, and an action plan management unit 46.

静止障害物統合部４１は、周辺状態検知部９０で検知されたロボットＲの周辺状態に関する情報を統合し、行動パターン部４３に出力するものである。例えば、静止障害物統合部４１が、ロボットＲの進路の床面に段ボール箱などの障害物を検知した場合や、床面の段差を検知した場合には、行動パターン部４３は、この統合された障害物情報に基づいて、図示しない局所回避モジュールによって迂回経路を探索する。 The stationary obstacle integration unit 41 integrates the information on the peripheral state of the robot R detected by the peripheral state detection unit 90 and outputs it to the behavior pattern unit 43. For example, when the stationary obstacle integration unit 41 detects an obstacle such as a cardboard box on the floor in the path of the robot R, or detects a step in the floor, the behavior pattern unit 43 searches for a detour route with a local avoidance module (not shown) based on this integrated obstacle information.

オブジェクトデータ統合部４２は、ロボットＲの姿勢データ、画像処理部１０、対象検知部８０および音源定位部２１ｃからの入力データに基づいて、対象物（オブジェクト）に関する識別データ（オブジェクトデータ）を統合し、この統合したオブジェクトデータを記憶部３０のオブジェクトデータ記憶手段３１に出力するものである。これにより、オブジェクトデータ記憶手段３１には、オブジェクトデータをオブジェクト別かつ時刻別に記録したデータであるオブジェクトマップが生成される。 The object data integration unit 42 integrates identification data (object data) related to the object (object) based on the posture data of the robot R, the input data from the image processing unit 10, the target detection unit 80, and the sound source localization unit 21c. The integrated object data is output to the object data storage means 31 of the storage unit 30. As a result, an object map, which is data in which object data is recorded for each object and for each time, is generated in the object data storage unit 31.

行動パターン部４３は、行動パターンを実行するための各種プログラム（モジュール）を格納すると共に、この行動パターンを実行するときに、記憶部３０を参照して、行動パターンに反映するものである。 The behavior pattern unit 43 stores various programs (modules) for executing the behavior pattern, and reflects the behavior pattern by referring to the storage unit 30 when the behavior pattern is executed.

また、行動パターン部４３は、応答行動制御部４７を備える。この応答行動制御部４７は、例えば、ロボットＲが発話を含むタスクを実行する際に行動パターン部４３によって生成され、音声入力の状況や入力された音声の音声認識処理の状況などを、例えば数ミリ秒程度の周期で常時監視して、その時々の状況に応じた行動を実行するものである。なお、応答行動制御部４７の詳細については後記する。 The behavior pattern unit 43 also includes a response behavior control unit 47. The response behavior control unit 47 is generated by the behavior pattern unit 43 when, for example, the robot R executes a task that includes an utterance. It constantly monitors the voice input status, the status of voice recognition processing on the input voice, and the like, at a period of, for example, about several milliseconds, and executes behavior suited to the situation at each moment. Details of the response behavior control unit 47 will be described later.

本実施形態では、図７に示すように、記憶部３０に、オブジェクトデータ記憶手段３１のほかに、ローカル地図データ記憶手段３２と、ルールＤＢ記憶手段３３と、状況ＤＢ記憶手段３４と、発話情報記憶手段３５とを備えている。 In this embodiment, as shown in FIG. 7, the storage unit 30 includes, in addition to the object data storage unit 31, a local map data storage unit 32, a rule DB storage unit 33, a situation DB storage unit 34, and an utterance information storage unit 35.

ローカル地図データ記憶手段３２は、図３を参照して説明したロボットＲの周辺の地図（ローカル地図）を記憶するものである。このローカル地図は、例えば、管理用コンピュータ３から取得される。 The local map data storage means 32 stores a map (local map) around the robot R described with reference to FIG. This local map is acquired from the management computer 3, for example.

ルールＤＢ記憶手段３３は、各種行動パターンに対応したシナリオ（台本）、状況に応じたルール（応答行動）、ルールを実行するための具体的な動作内容や発話内容等を記憶するものである。また、ルールとは、状況に応じたロボットＲの動作内容や発話内容を示す。ここで、ルールのうち、人物の音声を音声認識できたが人物の音声に対して回答できないことを示すものを回答不能行動と呼ぶ。また、シナリオは、例えば、歩行中に人物や障害物（オブジェクト）に遭遇したときにオブジェクトの１ｍ手前で立ち止まるといったもの、立ち止まってから１０秒後に腕部Ｒ２を所定位置まで上げるといったものなど動作に関するものと、発話に関するものとがある。また、ルールＤＢ記憶手段３３は、所定の発話を行うときに頭部Ｒ１、腕部Ｒ２、脚部Ｒ３および胴部Ｒ４のうちの少なくとも１つの部位を移動させる身体動作である身振りを指定する予め作成されたシナリオを記憶する。なお、ルールＤＢ記憶手段３３の記憶するルールＤＢや動作ＤＢについては後記する。 The rule DB storage unit 33 stores scenarios (scripts) corresponding to various behavior patterns, rules (response behaviors) suited to the situation, and the specific operation contents, utterance contents, and the like for executing the rules. A rule indicates the operation content or utterance content of the robot R suited to the situation. Among the rules, one indicating that the person's voice was recognized but that the robot cannot answer it is called an unanswerable action. Scenarios include those relating to operations, for example stopping 1 m short of an object when a person or obstacle (object) is encountered while walking, or raising the arm R2 to a predetermined position 10 seconds after stopping, and those relating to utterances. The rule DB storage unit 33 also stores scenarios created in advance that specify gestures, which are body motions that move at least one of the head R1, the arm R2, the leg R3, and the torso R4 when a predetermined utterance is made. The rule DB and the operation DB stored in the rule DB storage unit 33 will be described later.

状況ＤＢ記憶手段３４は、現在状況に関する情報（状況ＤＢ）を記憶するものである。本実施形態では、状況ＤＢは、カメラＣ，Ｃを介して取得された画像を処理する画像処理部１０の処理結果、マイクＭＣ，ＭＣに入力された音声を認識する音声認識部２１ｂの処理結果、対象検知部８０によるタグＴの認識結果等を周囲状況として含む状況を示すデータを格納する。状況ＤＢ記憶手段３４の記憶する情報は、ルールＤＢ記憶手段３３に記憶されたルールの選択時に利用される。この状況ＤＢ記憶手段３４に記憶される状況ＤＢの具体例については後記する。 The situation DB storage unit 34 stores information on the current situation (situation DB). In the present embodiment, the situation DB is the processing result of the image processing unit 10 that processes the images acquired via the cameras C and C, and the processing result of the voice recognition unit 21b that recognizes the voice input to the microphones MC and MC. The data indicating the situation including the recognition result of the tag T by the object detection unit 80 as the surrounding situation is stored. Information stored in the situation DB storage unit 34 is used when a rule stored in the rule DB storage unit 33 is selected. A specific example of the situation DB stored in the situation DB storage unit 34 will be described later.

また、状況ＤＢ記憶手段３４は、人物毎の音声データの平均音量を格納した平均音量ＤＢ（不図示）と、音量履歴データＤＢ（不図示）とを記憶するものである。本実施形態では、個人識別情報を用いて個人識別をおこなうため、平均音量ＤＢは、個人識別情報と、個人識別情報が示す人物の平均音量とを対応付けて格納する。また、この音量履歴データＤＢは、どの人物がどのくらいの音量で何回発話したかを管理するものである。なお、平均音量ＤＢおよび音量履歴データＤＢは、後記する平均音量算出手段４７４により書き込まれる。 The situation DB storage unit 34 also stores an average volume DB (not shown), which holds the average volume of the voice data of each person, and a volume history data DB (not shown). In this embodiment, individuals are identified using personal identification information, so the average volume DB stores each piece of personal identification information in association with the average volume of the person it indicates. The volume history data DB records which person spoke, at what volume, and how many times. The average volume DB and the volume history data DB are written by the average sound volume calculation means 474 described later.

なお、本実施形態では、平均音量は人物が装着しているタグＴ（図６参照）を識別するタグ識別番号に基づいて発話者である人物を特定し、この人物を特定するタグ識別番号に対応付けて状況ＤＢに格納されている平均音量を入力するようにしたが、これに限定されるものではない。例えば、オブジェクトデータ統合部４２（図７参照）によって統合される人物を示すオブジェクトデータによって人物を特定し、この人物が発話したときの音量を、この人物を特定するオブジェクトデータに対応付けて状況ＤＢ記憶手段３４にこの人物の音量の履歴として記憶しておく。そして、この人物が再度発話したときに、このオブジェクトデータに対応付けられて状況ＤＢ記憶手段３４に記憶されている過去に発話したときの音量の平均を算出して平均音量を取得するようにしてもよい。
これによって、タグＴ（図６参照）を装着していない人物であっても、２回目以降の発話では、平均音量を参照することができる。
また、ロボットＲ（図６参照）の近傍に複数の人物がいて、各人物のタグ識別番号の認識が困難な場合でも、音源定位部２１ｃ（図６参照）から出力される音源定位情報に基づいて、発話を行った人物のオブジェクトデータを特定することができる。そして、特定したオブジェクトデータに対応付けてこの発話の音量を状況ＤＢ記憶手段３４にこの人物の音量の履歴として記憶しておくことにより、一度発話を行った人物に対しては、タグ識別番号を特定できなくとも、２度目以降の発話の際にはこの人物の平均音量を参照することができる。 In the present embodiment, the person who is the speaker is identified from the tag identification number of the tag T (see FIG. 6) worn by that person, and the average volume stored in the situation DB in association with that tag identification number is used as input; however, the invention is not limited to this. For example, a person may be identified by the object data representing a person that is integrated by the object data integration unit 42 (see FIG. 7), and the volume at which this person speaks may be stored in the situation DB storage unit 34, in association with the object data identifying this person, as this person's volume history. Then, when this person speaks again, the average volume may be obtained by averaging the volumes of the past utterances stored in the situation DB storage unit 34 in association with this object data.
In this way, even for a person who is not wearing a tag T (see FIG. 6), the average volume can be referred to from the second utterance onward.
Even if there are a plurality of persons near the robot R (see FIG. 6) and it is difficult to recognize the tag identification number of each person, the sound source localization information output from the sound source localization unit 21c (see FIG. 6) is used. Thus, it is possible to specify object data of the person who made the utterance. Then, by storing the volume of this utterance in association with the specified object data in the situation DB storage means 34 as a history of the volume of this person, a tag identification number is assigned to the person who has spoken once. Even if it cannot be specified, the average volume of this person can be referred to in the second and subsequent utterances.
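
The per-person volume history and average described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the class name, the key scheme that falls back from a tag identification number to an object data identifier, and the in-memory dictionary standing in for the situation DB are all hypothetical.

```python
from collections import defaultdict


def person_key(tag_id=None, object_id=None):
    """Prefer the tag identification number; fall back to the object data ID."""
    if tag_id is not None:
        return ("tag", tag_id)
    if object_id is not None:
        return ("object", object_id)
    return None


class VolumeHistory:
    """In-memory stand-in for the volume history and average volume data."""

    def __init__(self):
        self._history = defaultdict(list)

    def record(self, key, volume):
        """Append the volume of one utterance to the person's history."""
        self._history[key].append(volume)

    def average(self, key):
        """Return the average of past volumes, or None for a first utterance."""
        volumes = self._history.get(key)
        if not volumes:
            return None
        return sum(volumes) / len(volumes)


history = VolumeHistory()
speaker = person_key(object_id=3)      # person without a readable tag
first = history.average(speaker)       # no history yet
history.record(speaker, 60.0)
history.record(speaker, 70.0)
later = history.average(speaker)
```

As in the text, no average is available for the first utterance of an untagged person, but one can be referred to from the second utterance onward.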

発話情報記憶手段３５は、ロボットＲの発話に用いられる情報を記憶するものである。発話情報記憶手段３５は、各種行動パターンに対応したシナリオで定められた会話情報を記憶する。ここで、会話情報は、例えば、挨拶を示す定型文「○○さん、こんにちは」、確認を示す定型文「これを、△△さんへ渡すのですね」等が含まれる。また、発話情報記憶手段３５は、ルールＤＢ記憶手段３３に記憶されたルールを実行時の発話内容の情報等を記憶している。ここで、ルールを実行時の発話内容の情報は、例えば、返事を示す「ハイ」、時刻を示す定型文「□時□分です」等が含まれる。これらの情報（会話用データ）は、例えば、管理用コンピュータ３から送信される。 The utterance information storage unit 35 stores information used for the utterances of the robot R. It stores the conversation information defined in the scenarios corresponding to the various behavior patterns. The conversation information includes, for example, a fixed greeting phrase such as “Hello, Mr./Ms. ○○” and a fixed confirmation phrase such as “You want me to hand this to Mr./Ms. △△, correct?”. The utterance information storage unit 35 also stores information such as the utterance contents used when the rules stored in the rule DB storage unit 33 are executed. This includes, for example, the reply “Hai” (“Yes”) and a fixed phrase giving the time, “It is □ hours □ minutes”. This information (conversation data) is transmitted from, for example, the management computer 3.

行動パターン部４３は、オブジェクトデータ記憶手段３１、ローカル地図データ記憶手段３２、ルールＤＢ記憶手段３３、状況ＤＢ記憶手段３４および発話情報記憶手段３５を適宜利用して様々な場面や状況に応じた行動パターンを実行するモジュールを備えている。モジュールの例としては、目的地移動モジュール、局所回避モジュール、デリバリモジュール、案内モジュール、応答行動制御モジュール、人対応モジュール等がある。 The behavior pattern unit 43 includes modules that execute behavior patterns suited to various scenes and situations, making use as appropriate of the object data storage unit 31, the local map data storage unit 32, the rule DB storage unit 33, the situation DB storage unit 34, and the utterance information storage unit 35. Examples of the modules include a destination movement module, a local avoidance module, a delivery module, a guidance module, a response behavior control module, and a person handling module.

目的地移動モジュールは、ロボットＲの現在位置から、例えば、タスク実行エリア内のタスク実行位置等の目的地までの経路探索（例えばノード間の経路を探索）及び移動を行うものである。この目的地移動モジュールは、地図データと現在位置とを参照しつつ、目的地までの最短距離を求める。
局所回避モジュールは、歩行中に障害物が検知されたときに、静止障害物統合部４１で統合された障害物情報に基づいて、障害物を回避する迂回経路を探索するものである。 The destination movement module performs route search (for example, search for a route between nodes) and movement from the current position of the robot R to a destination such as a task execution position in the task execution area. This destination movement module obtains the shortest distance to the destination while referring to the map data and the current position.
The local avoidance module searches for a detour route for avoiding an obstacle based on the obstacle information integrated by the stationary obstacle integration unit 41 when an obstacle is detected during walking.
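
The route search between nodes performed by the destination movement module can be illustrated with a short Python sketch. The source does not name a search algorithm; Dijkstra's algorithm is used here as one common choice, and the function name, the node names, and the edge lengths are hypothetical.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm over a node graph.

    graph: dict mapping node -> dict of neighbour -> edge length in metres.
    Returns (total_distance, [nodes on the route]) or None if unreachable.
    """
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + length, neighbour, path + [neighbour]))
    return None


# Hypothetical node map: current position A, task execution position C.
node_map = {"A": {"B": 2.0, "C": 5.0}, "B": {"C": 1.0}, "C": {}}
route = shortest_route(node_map, "A", "C")
```

Here the route through node B (3.0 m) is preferred over the direct edge (5.0 m). A detour search by the local avoidance module could reuse the same function after removing the edges blocked by a detected obstacle, although the source does not say how that module is implemented.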

デリバリモジュールは、荷物配達タスクを実行するときに動作するものであり、物品の運搬を依頼する人物（依頼人）から物品を受け取る（把持する）動作や、受け取った物品を受取人に渡す（物品を手放す）動作を実行するものである。
案内モジュールは、例えば、タスク実行エリア内の案内開始地点に来訪した来訪客を案内領域３０１（図３参照）の受付３０５にいる受付スタッフのもとへ案内するタスクを実行するものである。 The delivery module operates when a package delivery task is executed. It performs the operation of receiving (gripping) an article from the person requesting its transport (the client) and the operation of handing the received article to the recipient (releasing the article).
The guidance module executes, for example, a task for guiding a visitor who has visited a guidance start point in the task execution area to a reception staff in the reception 305 of the guidance area 301 (see FIG. 3).

人対応モジュールは、例えば、物品運搬タスクや案内タスクの実行時に所定のシナリオに基づいて、発話、姿勢の変更、腕部Ｒ２の上下移動や把持等を行うものである。なお、人対応モジュールは、タスクの実行にかかわらず、軽い挨拶やお天気の話題等を、目的をもって意図的に発話することもできる。 The person handling module performs, for example, utterances, posture changes, raising and lowering of the arm R2, gripping, and the like based on a predetermined scenario when an article transport task or a guidance task is executed. Regardless of task execution, the person handling module can also deliberately utter a light greeting, a remark about the weather, or the like with a specific purpose.

また、人対応モジュールには、様々な人に挨拶を行うという動作を実行する出会い応対モジュールや、特定の相手に向けて説明や質疑応答などのサービスを実行するプレゼンＱＡ（プレゼンテーションと質疑応答）モジュールなどのサブモジュールが含まれている。 The person handling module also contains submodules such as an encounter response module, which performs the operation of greeting various people, and a presentation QA (presentation and question-and-answer) module, which provides services such as explanations and question-and-answer sessions for a specific person.

応答行動制御モジュールは、例えば、人対応モジュールなどの発話を含むタスクの実行時において、ロボットＲが入力音声を検出したときに、この入力音声に対する行動を制御するためのモジュールである。応答行動制御モジュールは、このような行動を行う必要があるときに行動パターン部４３によって起動され、起動によって行動パターン部４３に応答行動制御部４７が生成される。応答行動制御部４７の詳細については後記する。 The response behavior control module controls the behavior of the robot R in response to an input voice when the robot R detects that input voice during execution of a task that includes utterances, such as one run by the person handling module. The response behavior control module is activated by the behavior pattern unit 43 when such behavior is required, and its activation generates the response behavior control unit 47 in the behavior pattern unit 43. Details of the response behavior control unit 47 will be described later.

身振り統合部４４は、対象とする人物に対して行う発話に対応した身振りをルールＤＢ記憶手段３３から抽出し、抽出した身振りを指定するコマンドを自律移動制御部５０に出力するものである。頭部Ｒ１の動作による身振りは、例えば、頭部Ｒ１を下方に傾けることで「お辞儀」、「礼」、「同意」、「謝罪」等を表示する動作や、頭部Ｒ１を左右に傾ける（かしげる）ことで「分からない」という意思表示を伝える動作が含まれる。また、腕部Ｒ２の動作による身振りは、例えば、腕部Ｒ２を上げることで「喜び」、「賞賛」等を表示する動作や、腕部Ｒ２を下方左右に広げることや握手を行うことで「歓迎」という意思表示を伝える動作が含まれる。また、脚部Ｒ３の動作による身振りは、例えば、その場で駆け足をすることで「喜び」、「元気」等の意思表示を伝える動作が含まれる。 The gesture integration unit 44 extracts gestures corresponding to utterances to be performed on the target person from the rule DB storage unit 33 and outputs a command for designating the extracted gestures to the autonomous movement control unit 50. Gestures by the movement of the head R1, for example, by tilting the head R1 downward, displaying “bowing”, “thanks”, “agreement”, “apology”, etc., and tilting the head R1 left and right ( It includes an action that conveys an intentional expression of “I don't know”. The gesture by the movement of the arm part R2 is, for example, an action of displaying “joy”, “praise”, etc. by raising the arm part R2, expanding the arm part R2 to the left and right, or shaking hands. This includes an action to convey an expression of “welcome”. The gesture by the operation of the leg portion R3 includes, for example, an operation of transmitting intention indications such as “joy” and “goodness” by running on the spot.

内部状態検出部４５は、ロボットＲの内部状態を検出するものである。本実施形態では、内部状態検出部４５は、充電状況（充電器に嵌合されたか否かを示す情報）およびバッテリ７０の残量を検出する。また、内部状態検出部４５は、ロボットＲの状態（現在位置、充電状況、バッテリ残量、タスク実行状況など）に関するデータを所定時間間隔毎にステータス情報として生成する。また、内部状態検出部４５は、生成したステータス情報を無線通信部６０を介して管理用コンピュータ３に出力する。そして、管理用コンピュータ３は、入力されたステータス情報を記憶部３ａに格納された図示しないロボット情報データベースにロボットＲ毎に登録する。 The internal state detection unit 45 detects the internal state of the robot R. In the present embodiment, the internal state detection unit 45 detects the charging status (information indicating whether or not the robot R is fitted to a charger) and the remaining charge of the battery 70. The internal state detection unit 45 also generates data on the state of the robot R (current position, charging status, remaining battery charge, task execution status, etc.) as status information at predetermined time intervals, and outputs the generated status information to the management computer 3 via the wireless communication unit 60. The management computer 3 then registers the input status information, for each robot R, in a robot information database (not shown) stored in the storage unit 3a.

行動計画管理部４６は、行動パターン部４３が備える各種モジュールを所定のスケジュールで実行する行動計画を管理するものである。本実施形態では、行動計画管理部４６は、管理用コンピュータ３から取得したタスク実行命令に基づいて予め定められたタスクを実行するための行動計画を管理し、現在実行すべき作業に必要なモジュールを適宜選択する。 The action plan management unit 46 manages an action plan for executing the various modules of the behavior pattern unit 43 according to a predetermined schedule. In the present embodiment, the action plan management unit 46 manages an action plan for executing a predetermined task based on a task execution command acquired from the management computer 3, and selects, as appropriate, the modules needed for the work to be executed at that time.

［応答行動制御部の構成］ [Configuration of Response Behavior Control Unit]
図８に示すように、応答行動制御部４７は、応答行動判定手段（応答行動判定部）４７１と、行動選択手段（応答行動選択部）４７２と、行動指令手段（応答行動指令部）４７３と、平均音量算出手段（平均音量算出部）４７４とを備え、これらによって記憶部３０に記憶された各種の情報やデータに基づいて後記する制御を行う。 As shown in FIG. 8, the response behavior control unit 47 includes response behavior determination means (response behavior determination unit) 471, action selection means (response action selection unit) 472, action command means (response action command unit) 473, and average volume calculation means (average volume calculation unit) 474, which perform the control described later based on the various information and data stored in the storage unit 30.

記憶部３０に備えられたルールＤＢ記憶手段３３は、前記したようにルールＤＢ（応答行動リスト）と、動作ＤＢ（動作データベース）とを記憶している。ここで、ルールＤＢおよび動作ＤＢの具体例について図９および図１０を参照して説明する。 The rule DB storage means 33 provided in the storage unit 30 stores the rule DB (response action list) and the action DB (action database) as described above. Here, specific examples of the rule DB and the action DB will be described with reference to FIGS. 9 and 10.

ルールＤＢ（応答行動リスト）９００は、回答不能行動を含めたルール（応答行動）を複数格納したものである。図９に示すように、ルールＤＢ９００は、項目として、ルールＩＤ９０１、ルール内容９０２、回答不能フラグ９０３、および、動作ＩＤ９０４を有している。 The rule DB (response action list) 900 stores a plurality of rules (response actions) including unanswerable actions. As illustrated in FIG. 9, the rule DB 900 includes, as items, a rule ID 901, a rule content 902, an answer impossible flag 903, and an operation ID 904.

ルールＩＤ９０１は、ルールを一意に識別する識別子である。 The rule ID 901 is an identifier that uniquely identifies a rule.
ルール内容９０２は、ルールの内容を示している。 The rule content 902 indicates the content of the rule.

回答不能フラグ９０３は、そのルールが回答不能行動であるか、回答不能行動以外のルールであるか否かを示すフラグである。例えば、ルールＩＤ＝「１」のルールは、回答不能フラグ＝「０」のため、回答不能行動以外のルールであることを示す。また、例えば、ルールＩＤ＝「７」のルールは、回答不能フラグ＝「１」のため、回答不能行動であることを示す。以後、回答不能行動以外のルールを、通常のルールとする。 The unanswerable flag 903 indicates whether the rule is an unanswerable action or a rule other than an unanswerable action. For example, the rule with rule ID = “1” has unanswerable flag = “0”, so it is a rule other than an unanswerable action. The rule with rule ID = “7” has unanswerable flag = “1”, so it is an unanswerable action. Hereinafter, rules other than unanswerable actions are referred to as normal rules.

動作ＩＤ９０４は、ルール内容９０２と、動作ＤＢ（図１０参照）の動作内容との対応関係を示す。例えば、ルールＩＤ＝「１」のルールは、動作ＤＢ（図１０参照）の動作ＩＤ＝「４」に対応することを示す。これは、動作ＤＢ（図１０参照）において、「驚く（＝びっくりする）」を示している。また、例えば、ルールＩＤ＝「７」のルールは、動作ＤＢ（図１０参照）の動作ＩＤ＝「１１」に対応することを示す。これは、動作ＤＢ（図１０参照）において、「首をかしげて、「ん」と発音する」を示している。 The action ID 904 indicates the correspondence between the rule content 902 and the action content in the action DB (see FIG. 10). For example, the rule with rule ID = “1” corresponds to action ID = “4” in the action DB (see FIG. 10), which is the action “be surprised”. Likewise, the rule with rule ID = “7” corresponds to action ID = “11” in the action DB (see FIG. 10), which is the action “tilt the head and utter ‘n’”.

図１０に示すように、動作ＤＢ１０００は、項目として、動作ＩＤ１００１と、動作内容１００２と、可動部の部位の一例として、首１００３、掌１００４、腰１００５、腕１００６および口１００７とを有している。ここで、部位は、例えば、首（頭部Ｒ１）、掌や腕（腕部Ｒ２）、腰（脚部Ｒ３、胴部Ｒ４）、口（スピーカＳ）を指す。 As shown in FIG. 10, the action DB 1000 has, as items, an action ID 1001, action content 1002, and, as examples of movable parts, a neck 1003, a palm 1004, a waist 1005, an arm 1006, and a mouth 1007. Here, the parts refer to, for example, the neck (head R1), the palm and arm (arm R2), the waist (leg R3, torso R4), and the mouth (speaker S).
例えば、動作ＩＤ＝「５」は、首、腰および腕を使用することで、「顔や体をターゲットに向けて手を挙げる」という動作を行うことを示す。また、例えば、動作ＩＤ＝「１１」は、首および口を使用することで、「首をかしげて、「ん」と発音する」という動作を行うことを示す。 For example, action ID = “5” indicates that the neck, waist, and arm are used to perform the action “turn the face and body toward the target and raise a hand”. Action ID = “11” indicates that the neck and mouth are used to perform the action “tilt the head and utter ‘n’”.

また、動作ＩＤ＝「７」，「８」については、詳細は図示していないが、自律移動制御部５０によって動かす腕部Ｒ２の関節の自由度や各関節の回転角度ごとに動作を定めたため、異なる動作ＩＤを付与した。ここで、関節の自由度は、関節を前後方向に曲げる、上下方向に曲げる、回転させる等の動きを示す。なお、腕部Ｒ２以外の部位にも同様に設定できる。 Although the details are not shown, action IDs “7” and “8” are assigned as different action IDs because the action is defined separately for each degree of freedom of the joints of the arm R2 moved by the autonomous movement control unit 50 and for each joint rotation angle. Here, the degree of freedom of a joint refers to movements such as bending the joint forward and backward, bending it up and down, and rotating it. Parts other than the arm R2 can be set in the same way.
また、動作ＩＤ＝「９」，「１０」については、詳細は図示していないが、音声合成部２１ａによって合成する音声の音量ごとに、異なる動作ＩＤを付与した。なお、図１０に示した動作以外に、例えば、「腰をひねる」、「腕をぶらぶら振る」、「手指を閉じたり開いたりする」、「把持した旗等の物品を振る」等の動作を含んでもよい。 Similarly, although the details are not shown, action IDs “9” and “10” are assigned as different action IDs for each volume of the speech synthesized by the speech synthesizer 21a. In addition to the actions shown in FIG. 10, actions such as “twisting the waist”, “swinging the arms loosely”, “opening and closing the fingers”, and “waving a held article such as a flag” may also be included.
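The two-step lookup described above (rule ID, then action ID, then movable parts) can be sketched as follows. This is only an illustrative sketch: the class names, field names, and the helper `parts_for_rule` are assumptions, and only entries whose values are stated in the text are filled in (rule IDs 1 and 7, action IDs 5 and 11). The remaining rows of FIGS. 9 and 10 are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    unanswerable: bool  # unanswerable flag 903 (True corresponds to "1")
    action_id: int      # action ID 904, a key into the action DB


@dataclass(frozen=True)
class Action:
    content: str        # action content 1002
    parts: tuple        # movable parts used by the action


# Only the entries described in the text; names are hypothetical.
RULE_DB = {
    1: Rule(unanswerable=False, action_id=4),
    7: Rule(unanswerable=True, action_id=11),
}

ACTION_DB = {
    5: Action("turn the face and body toward the target and raise a hand",
              ("neck", "waist", "arm")),
    11: Action("tilt the head and utter 'n'", ("neck", "mouth")),
}


def parts_for_rule(rule_id):
    """Resolve rule ID -> action ID -> movable parts.

    Returns None when the rule or its action is not part of this sketch.
    """
    rule = RULE_DB.get(rule_id)
    if rule is None:
        return None
    action = ACTION_DB.get(rule.action_id)
    return action.parts if action is not None else None
```

Keeping the two tables separate mirrors the text: several rules can point at the same action ID, and the parts used by an action are defined once in the action DB.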

図８に戻って、応答行動制御部４７の構成の説明を続ける。 Returning to FIG. 8, the description of the configuration of the response behavior control unit 47 is continued.
記憶部３０に備えられた状況ＤＢ記憶手段３４は、前記したように状況ＤＢを記憶している。状況ＤＢは、画像処理または音声認識処理の結果を周囲状況として含む状況を示すデータを格納したものである。本実施形態では、状況ＤＢは、周囲状況のほかに、内部状況を示すデータとしてバッテリを充電中であるか否かを示すデータを格納している。 The situation DB storage means 34 provided in the storage unit 30 stores the situation DB as described above. The situation DB stores data indicating situations, including the results of image processing or speech recognition processing as the surrounding situation. In the present embodiment, in addition to the surrounding situation, the situation DB stores data indicating whether or not the battery is being charged, as data indicating the internal situation.

ここで、状況ＤＢの具体例について図１１を参照して説明する。 Here, a specific example of the situation DB will be described with reference to FIG. 11.
図１１に示すように、状況ＤＢ１１００は、現在状況の項目として、状況ＩＤ１１０１、状況内容１１０２、現在表示１１０３を有している。現在表示１１０３は、状況内容１１０２の現在値を示すものである。ここで、ｏｎは「１」を示し、ｏｆｆは「０」を示すこととした。現在表示１１０３の項目への情報の書き込みは、オブジェクトデータ統合部４２や内部状態検出部４５が行う。 As shown in FIG. 11, the situation DB 1100 has, as items of the current situation, a situation ID 1101, situation content 1102, and a current display 1103. The current display 1103 indicates the current value of the situation content 1102. Here, “on” denotes “1” and “off” denotes “0”. Information is written to the current display 1103 item by the object data integration unit 42 and the internal state detection unit 45.

具体的には、オブジェクトデータ統合部４２は、図１１に示す状況ＩＤ＝「０」，「３」で示される状況について、画像処理部１０からの画像処理結果に基づいて、該当する情報を状況ＤＢに書き込む。なお、状況ＩＤ＝「３」に示す「接近してくる人物がいる」という状況を認識する方法は、例えば、画像処理部１０による画像処理の結果、認識された顔画像の画素数が増加した場合にそのように認識することができる。また、オブジェクトデータ統合部４２は、例えば、状況ＩＤ＝「１」，「２」で示される状況について、対象検知部８０からの入力データに基づいて、該当する情報を状況ＤＢに書き込む。オブジェクトデータ統合部４２は、例えば、状況ＩＤ＝「４」，「５」で示される状況について、音声がマイクＭＣ，ＭＣから入力したときに状況ＤＢに書き込む。オブジェクトデータ統合部４２は、例えば、状況ＩＤ＝「７」で示される状況について、音源定位部２１ｃからの入力データに基づいて、該当する情報を状況ＤＢに書き込む。オブジェクトデータ統合部４２は、例えば、状況ＩＤ＝「８」で示される状況について、オブジェクトデータ記憶手段３１（図７参照）に記憶されたオブジェクトマップに基づいて、該当する情報を状況ＤＢに書き込む。例えば、状況ＩＤ＝「６」で示される状況について、内部状態検出部４５が該当する情報を状況ＤＢに書き込む。 Specifically, for the situations indicated by situation IDs “0” and “3” in FIG. 11, the object data integration unit 42 writes the corresponding information to the situation DB based on the image processing result from the image processing unit 10. The situation “a person is approaching” indicated by situation ID = “3” can be recognized, for example, when the number of pixels of the recognized face image increases as a result of image processing by the image processing unit 10. For the situations indicated by situation IDs “1” and “2”, the object data integration unit 42 writes the corresponding information to the situation DB based on, for example, input data from the target detection unit 80. For the situations indicated by situation IDs “4” and “5”, it writes to the situation DB when, for example, sound is input from the microphones MC, MC. For the situation indicated by situation ID = “7”, it writes the corresponding information based on, for example, input data from the sound source localization unit 21c. For the situation indicated by situation ID = “8”, it writes the corresponding information based on, for example, the object map stored in the object data storage means 31 (see FIG. 7). For the situation indicated by situation ID = “6”, for example, the internal state detection unit 45 writes the corresponding information to the situation DB.
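The assignment of situation IDs to their data sources can be summarized in a small sketch. The source labels, the class `SituationDB`, and the check that rejects a write from an unexpected source are all assumptions made for illustration; the text only states which input each situation ID is based on and that the current display holds 1 (on) or 0 (off).

```python
# Data source for each situation ID, as listed in the text.
# The string labels themselves are hypothetical names.
SOURCE_BY_SITUATION_ID = {
    0: "image_processing",
    3: "image_processing",
    1: "target_detection",
    2: "target_detection",
    4: "microphone",
    5: "microphone",
    6: "internal_state",
    7: "sound_source_localization",
    8: "object_map",
}


class SituationDB:
    """Holds the current display value (1 = on, 0 = off) per situation ID."""

    def __init__(self):
        self.current = {sid: 0 for sid in SOURCE_BY_SITUATION_ID}

    def write(self, situation_id, on, source):
        # Assumption: reject writes from a source the text does not name
        # for this situation ID.
        if SOURCE_BY_SITUATION_ID[situation_id] != source:
            raise ValueError("unexpected source for this situation ID")
        self.current[situation_id] = 1 if on else 0

    def is_on(self, situation_id):
        return self.current[situation_id] == 1
```

For example, `SituationDB().write(2, True, "target_detection")` would mark the situation with ID 2 as on.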

図８に戻って、応答行動制御部４７の構成の説明を続ける。 Returning to FIG. 8, the description of the configuration of the response behavior control unit 47 is continued.
応答行動判定手段４７１は、音声処理部２０から、音声データを音声認識した文字情報と、音声データに含まれる全ての単語の単語信頼度と、その音声データの音量と、その音声データの発話時間とが入力される。また、応答行動判定手段４７１は、対象検知部８０から個人識別情報が入力される。そして、応答行動判定手段４７１は、この単語信頼度から評価値を算出し、評価値と閾値未満とを比較して回答不能行動を行うか否かを判定する。さらに、応答行動判定手段４７１は、入力された文字情報と、回答不能行動を行うか否かの判定結果とを行動選択手段４７２に出力する。以下、評価値の算出について、４つの具体例を説明する。なお、この評価値とは、回答不能行動を行うか否かの判定基準となる値である。 The response behavior determination unit 471 receives, from the voice processing unit 20, the character information obtained by speech recognition of the voice data, the word reliability of every word contained in the voice data, the volume of the voice data, and the utterance time of the voice data. It also receives personal identification information from the target detection unit 80. The response behavior determination unit 471 calculates an evaluation value from the word reliabilities and compares the evaluation value with a threshold to determine whether or not to perform an unanswerable action. It then outputs the input character information and the determination result to the action selection unit 472. Four specific examples of calculating the evaluation value are described below. The evaluation value is the value that serves as the criterion for deciding whether or not to perform an unanswerable action.

＜第１例：単語信頼度＞ <First example: word reliability>
ここで、例えば、正常に音声認識された「、開発のエピソードを教えて。」について、各単語の単語信頼度が、単語「、」の単語信頼度「０．４２７」、単語「開発」の単語信頼度「０．９７８」、単語「の」の単語信頼度「０．３４２」、単語「エピソード」の単語信頼度「０．２３５」、単語「を」の単語信頼度「０．１０６」、単語「教え」の単語信頼度「０．４８７」、単語「て」の単語信頼度「０．１７４」、単語「。」の単語信頼度「０．９３５」であったとする。また、例えば、閾値が「０．３」であったとする。この場合、応答行動判定手段４７１は、各単語の単語信頼度の平均値「０．４６０５」を算出し、この平均値を評価値とする。そして、応答行動判定手段４７１は、評価値「０．４６０５」が閾値「０．３」未満でないため、回答不能行動を行わないと判定する。 Suppose, for example, that for the correctly recognized utterance “、開発のエピソードを教えて。” (“Tell me an episode from the development.”), the word reliabilities are “0.427” for the word “、”, “0.978” for “開発” (development), “0.342” for “の” (no), “0.235” for “エピソード” (episode), “0.106” for “を” (o), “0.487” for “教え” (teach), “0.174” for “て” (te), and “0.935” for “。”. Suppose also that the threshold is “0.3”. In this case, the response behavior determination unit 471 calculates the average of the word reliabilities, “0.4605”, and uses it as the evaluation value. Since the evaluation value “0.4605” is not less than the threshold “0.3”, the response behavior determination unit 471 determines that the unanswerable action is not to be performed.

また、別の例として、音声データ「へー、そうなの。」が未知語のため、誤って「ふーん。拾うの。」と音声認識された場合について説明する。この場合、各単語の単語信頼度が、単語「ふーん」の単語信頼度「０．２３３」、単語「。」の単語信頼度「０．０２４」、単語「拾う」の単語信頼度「０．２３９」、単語「の」の単語信頼度「０．０５２」、単語「。」の単語信頼度「０．３５２」であったとする。また、例えば、閾値が「０．３」であったとする。この場合、応答行動判定手段４７１は、各単語の単語信頼度の平均値「０．１８」を算出し、この平均値を評価値とする。そして、応答行動判定手段４７１は、評価値「０．１８」が閾値「０．３」未満であるため、回答不能行動を行うと判定する。 As another example, consider the case where the voice data “へー、そうなの。” (“Oh, is that so.”) is an unknown word and is erroneously recognized as “ふーん。拾うの。” (“Hmm. Picking it up.”). Suppose the word reliabilities are “0.233” for the word “ふーん”, “0.024” for “。”, “0.239” for “拾う” (pick up), “0.052” for “の” (no), and “0.352” for “。”, and that the threshold is again “0.3”. In this case, the response behavior determination unit 471 calculates the average of the word reliabilities, “0.18”, and uses it as the evaluation value. Since the evaluation value “0.18” is less than the threshold “0.3”, the response behavior determination unit 471 determines that the unanswerable action is to be performed.

このように、誤って音声認識されたときには、個々の単語の単語信頼度には高いものが含まれることもあるが、平均スコアは比較的低くなる。逆に、正しく音声認識できたときには、個々の単語の中には低い単語信頼度のものも含まれるが、平均スコアは比較的高くなる。そのため、平均スコアを用いることで、より精度よく音声認識の成否を判定することができる。 As described above, when the voice is recognized by mistake, the word reliability of each word may be high, but the average score is relatively low. On the contrary, when the voice can be recognized correctly, some words have low word reliability, but the average score is relatively high. Therefore, the success or failure of speech recognition can be determined with higher accuracy by using the average score.
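The two worked cases of the first example can be reproduced with a short sketch. The function names and the variable names are assumptions chosen for illustration; the reliabilities and the threshold of 0.3 are the example values given in the text.

```python
from statistics import mean

THRESHOLD = 0.3  # example threshold used in the text


def evaluation_value(word_reliabilities):
    """First example: the evaluation value is the mean word reliability."""
    return mean(word_reliabilities)


def should_do_unanswerable_action(word_reliabilities, threshold=THRESHOLD):
    """True when the evaluation value is less than the threshold."""
    return evaluation_value(word_reliabilities) < threshold


# Word reliabilities of the correctly recognized utterance.
correct = [0.427, 0.978, 0.342, 0.235, 0.106, 0.487, 0.174, 0.935]
# Word reliabilities of the misrecognized utterance.
misrecognized = [0.233, 0.024, 0.239, 0.052, 0.352]

print(round(evaluation_value(correct), 4), should_do_unanswerable_action(correct))
print(round(evaluation_value(misrecognized), 2), should_do_unanswerable_action(misrecognized))
```

Note that the misrecognized utterance still contains one word above the threshold (0.352), which is why the decision is made on the average rather than on any single word.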

なお、本実施形態では、単語信頼度の平均スコアに基づいて、回答要否の判定を行うようにしたが、これに限定されるものではなく、音声認識の結果の信頼度を示す他の評価値（スコア）に基づいて判定してもよい。 In the present embodiment, the necessity of answer is determined based on the average score of word reliability. However, the present invention is not limited to this, and other evaluations indicating the reliability of the result of speech recognition. You may determine based on a value (score).

＜第２例：単語信頼度と平均音量とを考慮＞ <Second example: considering word reliability and average volume>
ここで、応答行動判定手段４７１は、対象検知部８０から入力された個人識別情報を検索条件として平均音量ＤＢを検索し、平均音量ＤＢから、コミュニケーションを行っている人物の平均音量を抽出する。また、応答行動判定手段４７１は、音声処理部２０から入力された音声データの音量（つまり、現在の音量）を、平均音量ＤＢから抽出した平均音量で除算して音量係数を算出する。そして、応答行動判定手段４７１は、第１例の単語信頼度の平均値に音量係数を乗算して、評価値を算出する。つまり、応答行動判定手段４７１は、下記の式（１）および式（２）を用いて、評価値を算出する。 Here, the response behavior determination unit 471 searches the average volume DB using the personal identification information input from the target detection unit 80 as the search key, and extracts from it the average volume of the person with whom the robot is communicating. The response behavior determination unit 471 then calculates a volume coefficient by dividing the volume of the voice data input from the voice processing unit 20 (that is, the current volume) by the average volume extracted from the average volume DB. It multiplies the average word reliability of the first example by this volume coefficient to obtain the evaluation value. That is, the response behavior determination unit 471 calculates the evaluation value using equations (1) and (2) below.

評価値＝単語信頼度の平均値×音量係数・・・式（１） Evaluation value = average word reliability × volume coefficient ... Equation (1)
音量係数＝現在の音量／平均音量・・・式（２） Volume coefficient = current volume / average volume ... Equation (2)
なお、音量の単位は、例えば、デシベル（ｄＢ）である。 The unit of volume is, for example, the decibel (dB).

ここで、音量係数について説明する。 Here, the volume coefficient will be described.
ロボットＲ（図６参照）が発話中に対話対象となる人物が発話する場合、音量が小さい場合は回答が不要な相槌や独り言などであることが多く、音量が大きい場合は回答が必要な質問や要求などであることが多い。そこで、応答行動判定手段４７１は、音量の大小で平均スコアを補正するための音量係数を、式（２）によって算出している。 When the person being spoken to speaks while the robot R (see FIG. 6) is speaking, a low volume often indicates a backchannel response or a monologue that needs no answer, whereas a high volume often indicates a question or request that needs an answer. The response behavior determination unit 471 therefore uses equation (2) to calculate a volume coefficient that corrects the average score according to the volume.
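Equations (1) and (2) can be sketched as below. The function names are assumptions, and the volumes in the usage note are hypothetical values chosen only to show the effect of the correction; the text does not give example volumes.

```python
def volume_coefficient(current_volume, average_volume):
    """Equation (2): current volume divided by the speaker's average volume."""
    if average_volume <= 0:
        raise ValueError("average volume must be positive")
    return current_volume / average_volume


def evaluation_with_volume(average_reliability, current_volume, average_volume):
    """Equation (1): average word reliability times the volume coefficient."""
    return average_reliability * volume_coefficient(current_volume, average_volume)
```

With a hypothetical average reliability of 0.4605, a current volume of 30 dB, and an average volume of 60 dB for that speaker, the coefficient is 0.5 and the evaluation value drops to about 0.23, which falls below the example threshold of 0.3 even though the recognition itself scored well. A quiet utterance is thus pushed toward "no answer needed".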

＜第３例：単語信頼度と発話時間とを考慮＞ <Third example: considering word reliability and utterance time>
ここで、応答行動判定手段４７１は、対象検知部８０から入力された発話時間を発話基準時間で除算して発話時間係数を算出する。そして、応答行動判定手段４７１は、第１例の単語信頼度の平均値に発話時間係数を乗算して、評価値を算出する。つまり、応答行動判定手段４７１は、下記の式（３）および式（４）を用いて、評価値を算出する。なお、発話基準時間は、人物の相槌等、ロボットＲが回答する必要がない音声データの分布から、例えば、１．５秒と予め設定される。 Here, the response behavior determination unit 471 calculates an utterance time coefficient by dividing the utterance time input from the target detection unit 80 by the utterance reference time. It then multiplies the average word reliability of the first example by the utterance time coefficient to obtain the evaluation value. That is, the response behavior determination unit 471 calculates the evaluation value using equations (3) and (4) below. The utterance reference time is set in advance, for example to 1.5 seconds, based on the distribution of voice data that the robot R does not need to answer, such as a person's backchannel responses.

評価値＝単語信頼度の平均値×発話時間係数・・・式（３） Evaluation value = average word reliability × utterance time coefficient ... Equation (3)
発話時間係数＝発話時間／発話基準時間・・・式（４） Utterance time coefficient = utterance time / utterance reference time ... Equation (4)
なお、時間の単位は、例えば、秒である。 The unit of time is, for example, the second.

ここで、発話時間係数について説明する。 Here, the utterance time coefficient will be described.
ロボットＲ（図６参照）が発話中に対話対象となる人物が発話する場合、発話長が短い場合は回答が不要な相槌や独り言などであることが多く、発話長が長い場合は回答が必要な質問や要求などであることが多い。そこで、応答行動判定手段４７１は、発話長の長短で平均スコアを補正するための発話時間係数を、式（４）によって算出している。 When the person being spoken to speaks while the robot R (see FIG. 6) is speaking, a short utterance is often a backchannel response or a monologue that needs no answer, whereas a long utterance is often a question or request that needs an answer. The response behavior determination unit 471 therefore uses equation (4) to calculate an utterance time coefficient that corrects the average score according to the length of the utterance.

＜第４例：単語信頼度と平均音量と発話時間とを考慮＞ <Fourth example: considering word reliability, average volume, and utterance time>
ここで、応答行動判定手段４７１は、第１例の単語信頼度の平均値に第２例の音量係数（式（２）参照）と第３例の発話時間係数（式（４）参照）とを乗算して、評価値を算出する。つまり、応答行動判定手段４７１は、下記の式（５）を用いて、評価値を算出する。 Here, the response behavior determination unit 471 multiplies the average word reliability of the first example by the volume coefficient of the second example (see equation (2)) and the utterance time coefficient of the third example (see equation (4)) to obtain the evaluation value. That is, the response behavior determination unit 471 calculates the evaluation value using equation (5) below.

評価値＝単語信頼度の平均値×音量係数×発話時間係数・・・式（５） Evaluation value = average value of word reliability × volume coefficient × speech time coefficient (5)

なお、応答行動判定手段４７１は、前記した何れの方法で評価値を算出しても良く、どの方法で評価値を算出するか予め設定できるようにしておいても良い。 Note that the response behavior determination unit 471 may calculate the evaluation value by any of the methods described above, and may be set in advance by which method the evaluation value is calculated.
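Since the method can be chosen in advance, the four examples can be expressed as one function with a selectable method. The following is a minimal sketch: the enum `Method`, the function name, and the parameter names are assumptions, while the 1.5-second reference time is the example value from the text.

```python
from enum import Enum
from statistics import mean


class Method(Enum):
    RELIABILITY_ONLY = 1   # first example
    WITH_VOLUME = 2        # second example, equations (1) and (2)
    WITH_UTTERANCE = 3     # third example, equations (3) and (4)
    WITH_BOTH = 4          # fourth example, equation (5)


REFERENCE_TIME_S = 1.5  # example utterance reference time from the text


def evaluation_value(method, reliabilities, volume=None, average_volume=None,
                     utterance_time_s=None):
    """Compute the evaluation value with the preselected method."""
    value = mean(reliabilities)
    if method in (Method.WITH_VOLUME, Method.WITH_BOTH):
        value *= volume / average_volume              # volume coefficient
    if method in (Method.WITH_UTTERANCE, Method.WITH_BOTH):
        value *= utterance_time_s / REFERENCE_TIME_S  # utterance time coefficient
    return value
```

Because both coefficients are plain multipliers on the same average, the fourth example falls out of applying both corrections in sequence.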

応答行動判定手段４７１が判定に用いる閾値は、予め定められた値が設定される。以下、図１２を参照して、閾値の設定手法の一例について説明する。まず、回答が必要な音声と、回答が不必要な音声とについての正解付き教示データを準備する。ここで、正解付き教示データは、例えば、人物の属性（性別、年齢）、発話内容（語彙）、音量、および、発話時間が異なる複数のデータである。そして、それぞれの正解付き教示データにおいて、スコアと頻度（データ数）とを求め、閾値を設定する。例えば、図１２に実線で図示した回答が必要な正解付き教示データＤ２と、図１２に破線で図示した回答が不必要な正解付き教示データＤ１との交点αを求める。さらに、この交点αに対応するスコアを閾値とする。これによって、応答行動判定手段４７１は、要否何れかの方に誤判定が偏ることなく、精度よく要否判定を行うことができる。 A predetermined value is set as the threshold used by the response behavior determination unit 471. An example of a threshold setting method is described below with reference to FIG. 12. First, labeled teaching data is prepared for voices that require an answer and voices that do not. Here, the labeled teaching data is, for example, a plurality of data items that differ in the speaker's attributes (gender, age), utterance content (vocabulary), volume, and utterance time. Then, for each set of labeled teaching data, the score and frequency (number of data items) are obtained and the threshold is set. For example, the intersection α is obtained between the labeled teaching data D2 that requires an answer, shown by the solid line in FIG. 12, and the labeled teaching data D1 that does not require an answer, shown by the broken line in FIG. 12. The score corresponding to this intersection α is then used as the threshold. This allows the response behavior determination unit 471 to judge accurately whether an answer is needed, without misjudgments being biased toward either “needed” or “not needed”.
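One way to locate such an intersection is to bin the scores of each labeled set into a histogram and look for the point where the two frequency curves cross. The sketch below is an assumption about how this could be done, not the procedure of FIG. 12 itself: the bin count, the score range of 0 to 1, the linear interpolation between bin centers, and the premise that D1 dominates at low scores and D2 at high scores are all illustrative choices.

```python
def histogram(scores, bins=10, lo=0.0, hi=1.0):
    """Count how many scores fall into each of `bins` equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in scores:
        index = min(int((s - lo) / width), bins - 1)
        counts[index] += 1
    return counts


def crossing_threshold(not_needed_scores, needed_scores, bins=10):
    """Score at which the D1 (no answer needed) curve drops to or below
    the D2 (answer needed) curve, interpolated between bin centers.

    Returns None if the curves never cross in that direction.
    """
    d1 = histogram(not_needed_scores, bins)
    d2 = histogram(needed_scores, bins)
    width = 1.0 / bins
    for i in range(1, bins):
        prev_diff = d1[i - 1] - d2[i - 1]
        diff = d1[i] - d2[i]
        if prev_diff > 0 and diff <= 0:
            prev_center = (i - 0.5) * width
            fraction = prev_diff / (prev_diff - diff)
            return prev_center + fraction * width
    return None
```

Placing the threshold at the crossing point means that, near the threshold, the two kinds of misjudgment occur at similar frequencies, which matches the stated aim of not biasing errors toward either side.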

図８に戻って、応答行動制御部４７の構成の説明を続ける。 Returning to FIG. 8, the description of the configuration of the response behavior control unit 47 is continued.
行動選択手段４７２は、応答行動判定手段４７１から文字情報と判定結果とが入力されると共に、この判定結果に応じた行動を選択する。ここで、行動選択手段４７２は、例えば、この判定結果に応じて、回答不能フラグを用いて、図９のルールＤＢから応答行動を選択する。具体的には、行動選択手段４７２は、回答不能行動を行うという判定結果が入力された場合、ルールＤＢを参照し、回答不能フラグ＝「１」のルールを選択する。そして、行動選択手段４７２は、選択したルールのルールＩＤ（例えば、ルールＩＤ＝「７」）を行動指令手段４７３に出力する。 The action selection unit 472 receives the character information and the determination result from the response behavior determination unit 471, and selects an action according to the determination result. Here, the action selection unit 472 selects a response action from the rule DB of FIG. 9 according to the determination result, using, for example, the unanswerable flag. Specifically, when it receives a determination result indicating that an unanswerable action is to be performed, the action selection unit 472 refers to the rule DB and selects a rule with unanswerable flag = “1”. It then outputs the rule ID of the selected rule (for example, rule ID = “7”) to the action command unit 473.

また、行動選択手段４７２は、回答不能行動を行わないという判定結果が入力された場合、入力された音声に対して回答を行う行動を選択する。この場合、行動選択手段４７２は、人物に対して回答する行動を示す情報と、入力された文字情報とを行動指令手段４７３に出力する。 When it receives a determination result indicating that no unanswerable action is to be performed, the action selection unit 472 selects the action of answering the input voice. In this case, the action selection unit 472 outputs information indicating the action of answering the person, together with the input character information, to the action command unit 473.

また、行動選択手段４７２は、判定結果が入力されない場合（つまり、コミュニケーションを行っていない場合）、通常のルールを選択してもよい。例えば、行動選択手段４７２は、ルールＤＢおよび状況ＤＢを参照し、回答不能フラグ＝「０」のルール（通常のルール）を選択する。具体的には、行動選択手段４７２は、状況ＤＢを参照して、人がいる（状況ＩＤ＝「２」）という状況の場合には、人がいたとき一番近い人を見る（ルールＩＤ＝「３」）を選択する。また、例えば、行動選択手段４７２は、状況ＤＢを参照して、小さな音がした（状況ＩＤ＝「４」）という状況の場合には、小さな音がした方を見る（ルールＩＤ＝「４」）を選択する。そして、行動選択手段４７２は、選択したルールのルールＩＤ（例えば、ルールＩＤ＝「３」または「４」）を行動指令手段４７３に出力する。 When no determination result is input (that is, when no communication is taking place), the action selection unit 472 may select a normal rule. For example, the action selection unit 472 refers to the rule DB and the situation DB and selects a rule with unanswerable flag = “0” (a normal rule). Specifically, when the situation DB shows that a person is present (situation ID = “2”), the action selection unit 472 selects “when people are present, look at the nearest person” (rule ID = “3”). Likewise, when the situation DB shows that a small sound was heard (situation ID = “4”), it selects “look toward the small sound” (rule ID = “4”). The action selection unit 472 then outputs the rule ID of the selected rule (for example, rule ID = “3” or “4”) to the action command unit 473.
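The three cases handled by the action selection (unanswerable action, answer, and normal rule when there is no dialogue) can be sketched as a single function. The table names, the function `select_action`, and the tuple return format are assumptions; the rule IDs, flags, and situation IDs are the ones named in the text. Picking the first rule whose flag is 1 is also an assumption, since the text does not say how to choose among several unanswerable rules.

```python
# Rule ID -> unanswerable flag, for the rules named in the text.
UNANSWERABLE_FLAG = {3: 0, 4: 0, 7: 1}

# Situation ID -> normal rule ID, for the two pairs named in the text.
RULE_FOR_SITUATION = {2: 3, 4: 4}


def select_action(judgement, active_situations=()):
    """Select an action from the determination result.

    judgement: True  -> perform an unanswerable action
               False -> answer the person
               None  -> no determination result (no dialogue in progress)
    """
    if judgement is True:
        # Assumption: take the first rule whose unanswerable flag is 1.
        rule_id = next(r for r, flag in UNANSWERABLE_FLAG.items() if flag == 1)
        return ("rule", rule_id)
    if judgement is False:
        return ("answer", None)
    for situation_id in active_situations:
        rule_id = RULE_FOR_SITUATION.get(situation_id)
        if rule_id is not None and UNANSWERABLE_FLAG[rule_id] == 0:
            return ("rule", rule_id)
    return None
```

For instance, `select_action(None, [2])` yields rule ID 3 and `select_action(True)` yields rule ID 7, matching the examples in the text.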

行動指令手段４７３は、行動選択手段４７２が選択した行動の実行を、音声処理部２０、行動パターン部４３および身振り統合部４４を介して、スピーカＳおよび可動部の少なくとも一方に指令するものである。その一例としては、行動選択手段４７２が回答不能行動を選択した場合（例えば、ルールＩＤ＝「７」）、行動指令手段４７３は、ルールＤＢを参照し、そのルールＩＤを有するルールについて、その動作ＩＤ（例えば、動作ＩＤ＝「１１」）を抽出する。そして、行動指令手段４７３は、動作ＤＢを参照し、動作ＩＤ＝「１１」で定義されている「首」および「口」に、回答不能行動を行わせる動作指令を出力する。これにより、身振り統合部４４は、頭部Ｒ１に首をかしげる動作（身振り）を実行させる。また、音声処理部２０は、発話情報記憶手段３５に記憶された音声「ん」をスピーカＳから発話させる。 The action command unit 473 commands at least one of the speaker S and the movable parts to execute the action selected by the action selection unit 472, via the voice processing unit 20, the behavior pattern unit 43, and the gesture integration unit 44. For example, when the action selection unit 472 has selected an unanswerable action (for example, rule ID = “7”), the action command unit 473 refers to the rule DB and extracts the action ID of the rule having that rule ID (for example, action ID = “11”). It then refers to the action DB and outputs an action command that causes the “neck” and “mouth” defined for action ID = “11” to perform the unanswerable action. As a result, the gesture integration unit 44 causes the head R1 to perform the head-tilting gesture, and the voice processing unit 20 causes the speaker S to utter the voice “n” stored in the utterance information storage means 35.

また、例えば、行動選択手段４７２が人物に対して回答するという行動を選択した場合、行動指令手段４７３は、入力された文字情報に対する適切な回答を発話情報記憶手段３５から抽出する。そして、行動指令手段４７３は、抽出した回答（音声）の発話を音声処理部２０に指令する。これにより、音声処理部２０は、発話情報記憶手段３５に記憶された回答（音声）をスピーカＳから発話させる。 For example, when the action selection unit 472 selects an action of answering a person, the action command unit 473 extracts an appropriate answer for the input character information from the utterance information storage unit 35. Then, the behavior command unit 473 commands the voice processing unit 20 to utter the extracted answer (voice). Thus, the voice processing unit 20 causes the answer (voice) stored in the utterance information storage unit 35 to utter from the speaker S.

また、例えば、行動選択手段４７２が通常のルールを選択した場合、行動指令手段４７３は、ルールＤＢを参照し、そのルールＩＤを有するルールについて、その動作ＩＤを抽出する。そして、行動指令手段４７３は、動作ＤＢを参照し、その動作ＩＤで定義されている部位にそのルールに対応する動作指令を行う。例えば、行動指令手段４７３は、ルールＩＤ＝「３」の場合、動作ＤＢ＝「５」なので、「首」、「腰」および「腕」に、「頭や体をターゲットに向ける」というルールを行わせる動作指令を出力する。 Likewise, when the action selection unit 472 has selected a normal rule, the action command unit 473 refers to the rule DB and extracts the action ID of the rule having that rule ID. It then refers to the action DB and issues the action command corresponding to the rule to the parts defined for that action ID. For example, when the rule ID is “3”, the action ID is “5”, so the action command unit 473 outputs an action command that causes the “neck”, “waist”, and “arm” to carry out the rule “turn the head and body toward the target”.

平均音量算出手段４７４は、音声処理部２０から音声データの音量が入力されると共に、対象検知部８０から個人識別情報が入力される。そして、平均音量算出手段４７４は、この音声データの音量と個人識別情報とを用いて、人物毎に音声データの平均音量を算出する。ここで、例えば、平均音量算出手段４７４は、音声処理部２０から音声データの音量が入力される都度、その音声データの音量と、その音声データを発話した人物の個人識別情報とを対応付けて音量履歴データＤＢに格納する。そして、平均音量算出手段４７４は、人物毎に、音量履歴データＤＢに格納された音量の累計値を発話回数で除算して平均音量を算出する。さらに、平均音量算出手段４７４は、この人物の個人識別情報と、人物の平均音量とを対応付けて平均音量ＤＢに書き込む。 The average sound volume calculation means 474 receives the sound data volume from the sound processing unit 20 and personal identification information from the target detection unit 80. Then, the average volume calculation means 474 calculates the average volume of the audio data for each person using the volume of the audio data and the personal identification information. Here, for example, every time the volume of the voice data is input from the voice processing unit 20, the average volume calculation means 474 associates the volume of the voice data with the personal identification information of the person who spoke the voice data. Store in the volume history data DB. Then, the average volume calculation unit 474 calculates the average volume for each person by dividing the total volume value stored in the volume history data DB by the number of utterances. Further, the average volume calculation means 474 writes the personal identification information of the person and the average volume of the person in association with each other in the average volume DB.
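The per-person averaging described above can be sketched as follows. The class name, attribute names, and the use of string person IDs are assumptions; the computation (cumulative volume divided by the number of utterances, stored per person) follows the text.

```python
from collections import defaultdict


class AverageVolumeCalculator:
    """Keeps a volume history per person and the resulting average volume."""

    def __init__(self):
        self.volume_history = defaultdict(list)  # stands in for the volume history DB
        self.average_volume = {}                 # stands in for the average volume DB

    def add_utterance(self, person_id, volume):
        """Record one utterance and update that person's average volume."""
        history = self.volume_history[person_id]
        history.append(volume)
        # Cumulative volume divided by the number of utterances.
        self.average_volume[person_id] = sum(history) / len(history)
        return self.average_volume[person_id]
```

The stored average is what the second and fourth examples look up by personal identification information when computing the volume coefficient.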

［ロボットの動作］ [Operation of the Robot]
図１３を参照し、ロボットＲの動作として、応答行動制御部４７の動作を中心に説明する（適宜図８参照）。また、図１３では、前記した第１例の方法で評価値を計算することとして説明する。 The operation of the robot R will be described with reference to FIG. 13, focusing on the operation of the response behavior control unit 47 (see FIG. 8 as appropriate). FIG. 13 is described on the assumption that the evaluation value is calculated by the method of the first example described above.

まず、応答行動制御部４７は、応答行動判定手段４７１に、音声処理部２０から音声データに含まれる全ての単語の単語信頼度が入力される（ステップＳ１）。また、応答行動制御部４７は、応答行動判定手段４７１によって、単語信頼度の平均を算出して、その平均値を評価値として算出する（ステップＳ２）。そして、応答行動制御部４７は、応答行動判定手段４７１によって、評価値が閾値未満であるか否かを判定する（ステップＳ３）。 First, the response behavior control unit 47 receives the word reliability of all words included in the speech data from the speech processing unit 20 to the response behavior determination unit 471 (step S1). Further, the response behavior control unit 47 calculates the average word reliability by the response behavior determination means 471 and calculates the average value as an evaluation value (step S2). And the response action control part 47 determines whether an evaluation value is less than a threshold value by the response action determination means 471 (step S3).

評価値が閾値未満でない場合（ステップＳ３でＮｏ）、応答行動制御部４７は、行動選択手段４７２によって、人物に対して回答するという行動を選択する（ステップＳ４）。 When the evaluation value is not less than the threshold (No in step S3), the response behavior control unit 47 uses the action selection unit 472 to select the action of answering the person (step S4).
一方、評価値が閾値未満の場合（ステップＳ３でＹｅｓ）、応答行動制御部４７は、行動選択手段４７２によって、回答不能行動を選択する（ステップＳ５）。 On the other hand, when the evaluation value is less than the threshold (Yes in step S3), the response behavior control unit 47 uses the action selection unit 472 to select an unanswerable action (step S5).

ステップＳ４またはステップＳ５の処理に続いて、応答行動制御部４７は、行動指令手段４７３によって、行動選択手段４７２が選択した行動（回答不能行動または人物に対して回答するという行動）の実行を指令する（ステップＳ６）。 Subsequent to the processing of step S4 or step S5, the response action control unit 47 instructs the action command means 473 to execute the action selected by the action selection means 472 (behavior that cannot be answered or action that answers a person). (Step S6).

本実施形態においては、ロボットＲは、未知語データベースおよび複雑な対話処理を必要とせずに、単純な閾値判定で回答不能行動を行うか否かを判定するため、ロボットＲの構成を簡易にすることができる。また、ロボットＲは、評価値が低い場合には回答不能行動を選択する。このとき、ロボットＲとコミュニケーションしている人物は、ロボットＲが回答不能行動を行うことによって、自らの音声（例えば、何らかの質問）にロボットＲが回答できないことを容易に把握できる。従って、この人物は、例えば、同じ音声を大きな声で遅い速度で発音する、その音声の内容を変更する等、ロボットＲに対して再度コミュニケーションを試みる可能性が高い。これによって、ロボットＲは、人物の音声に対して適切に回答できる可能性が高くなり、コミュニケーションを継続しやすくできる。 In the present embodiment, since the robot R determines whether or not to perform an unanswerable action by simple threshold determination without requiring an unknown word database and complicated interaction processing, the configuration of the robot R is simplified. be able to. Further, the robot R selects an unanswerable action when the evaluation value is low. At this time, a person communicating with the robot R can easily grasp that the robot R cannot answer his / her voice (for example, some question) by the robot R performing an unanswerable action. Therefore, this person has a high possibility of trying to communicate with the robot R again, for example, generating the same voice with a loud voice at a slow speed or changing the content of the voice. This increases the possibility that the robot R can appropriately answer the voice of a person, and can easily continue communication.

In this embodiment, the robot R performs, as the unanswerable behavior, a motion such as tilting its head and uttering "n". This unanswerable behavior is the same as the gesture that, in communication between humans, prompts the other party to continue the communication. The person communicating with the robot R can therefore understand very easily from this behavior that the robot R cannot answer what he or she said. This makes it still easier for the robot R to continue the communication.

The unanswerable behavior has been described here as tilting the head and uttering "n", but the present invention is not limited to this. The unanswerable behavior may be, for example, a one-word utterance such as "e", "he", or "n". It may also consist of a motion alone, for example tilting the head. With any of these unanswerable behaviors, the present invention provides the same effects as this embodiment.

In this embodiment, the robot R calculates the evaluation value taking the utterance time of the voice into account (third and fourth examples). This prevents the robot R from speaking by mistake when it does not need to answer the conversation target, and makes the communication easier to continue.

In this embodiment, the robot R calculates the evaluation value taking each person's average volume into account (second and fourth examples). This prevents the robot R from mistakenly selecting the unanswerable behavior when the person is in fact likely to need an answer, and makes the communication easier to continue.
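As a rough sketch of how such evaluation values might be computed, the Python below follows the formulas stated in the claims: a volume coefficient (the current volume divided by the person's average volume) and an utterance time coefficient (the utterance time divided by a preset reference time), each multiplied by the recognition reliability. The function names, the sample numbers, the assumption that volumes are positive values, and the choice to multiply both coefficients together when both are used are illustrative assumptions, not details confirmed by this section.

```python
import math


def volume_coefficient(current_volume: float, average_volume: float) -> float:
    """Current volume divided by the person's average volume."""
    if average_volume <= 0:
        raise ValueError("average_volume must be positive")
    return current_volume / average_volume


def utterance_time_coefficient(utterance_time: float, reference_time: float) -> float:
    """Utterance time divided by a preset reference time."""
    if reference_time <= 0:
        raise ValueError("reference_time must be positive")
    return utterance_time / reference_time


def evaluation_value(reliability: float, *coefficients: float) -> float:
    """Multiply the recognition reliability by each supplied coefficient.

    Combining several coefficients by multiplication is an assumption
    of this sketch.
    """
    return reliability * math.prod(coefficients)


# Usage with made-up numbers: reliability 0.8, a voice at half the
# person's average volume, and an utterance a quarter of the reference time.
vol = volume_coefficient(current_volume=30.0, average_volume=60.0)
dur = utterance_time_coefficient(utterance_time=0.5, reference_time=2.0)
print(evaluation_value(0.8, vol))        # volume only
print(evaluation_value(0.8, dur))        # utterance time only
print(evaluation_value(0.8, vol, dur))   # both (assumed combination)
```

In this sketch a quiet or very short utterance lowers the evaluation value even when the recognition reliability is high, which pushes the robot toward the unanswerable behavior instead of a mistaken reply.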

In this embodiment, the robot R can also calculate an evaluation value that takes each person's average volume into account by a method other than the second and fourth examples. Because a person's volume does not vary greatly, with a maximum change of about plus or minus 6 dB, the response behavior determination means 471 can calculate the volume coefficient by equation (6) below. The response behavior determination means 471 then calculates the evaluation value using this volume coefficient.

Volume coefficient = {6 - (average volume - current volume)} / 6 ... Equation (6)

According to equation (6), when the volume of the input voice equals the average volume, the volume coefficient is 1.0. When the volume is at its practical lower limit, (average volume - 6) dB, the volume coefficient is 0.0. When the volume is at its practical upper limit, (average volume + 6) dB, the volume coefficient is 2.0. Using equation (6) thus yields a volume coefficient that responds sensitively to changes in volume while staying within an appropriate range.
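Equation (6) and the three cases above can be checked with a short Python sketch. The function name `volume_coefficient_eq6`, the parameter `max_change_db`, and the sample average of 60 dB are assumptions introduced for illustration; only the formula itself comes from the description.

```python
def volume_coefficient_eq6(
    current_volume_db: float,
    average_volume_db: float,
    max_change_db: float = 6.0,
) -> float:
    """Equation (6): {6 - (average volume - current volume)} / 6."""
    return (max_change_db - (average_volume_db - current_volume_db)) / max_change_db


average = 60.0  # assumed average volume of one person, in dB

print(volume_coefficient_eq6(average, average))        # 1.0 (equal to the average)
print(volume_coefficient_eq6(average - 6.0, average))  # 0.0 (practical lower limit)
print(volume_coefficient_eq6(average + 6.0, average))  # 2.0 (practical upper limit)
```

The sketch does not clamp the result, so a volume outside the plus or minus 6 dB range would give a coefficient below 0.0 or above 2.0. Whether the embodiment limits such values is not stated in this section.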

In this embodiment, the situation DB storage means 34 is provided to store the situation DB on its own, but it may instead be shared with the object data storage means 31, which stores the object map. Because the object map records object data per object and per time, part of the object map can be used as the situation DB.

In this embodiment, the movable parts that realize the motions of the robot R in the motion DB have been described as the neck, palms, waist, arms, and mouth (speaker), but this is only an example; fingers, shoulders, and legs may also be included. These parts may also be defined in finer subdivisions.

In this embodiment, the robot R has been described as an autonomous mobile robot capable of bipedal walking, but the invention is not limited to this and can also be applied to an autonomous mobile robot that moves on wheels. In that case, the same effects as this embodiment are obtained, except that the movable parts corresponding to the "legs" of the bipedal robot are "wheels".

DESCRIPTION OF SYMBOLS
A Robot system
R Robot (communication robot)
R1 Head
R2 Arm
R3 Leg
R4 Torso
R5 Back storage unit
1 Base station
2 Robot-dedicated network
3 Management computer
3a Storage unit
4 Network
5 Terminal
10 Image processing unit
20 Voice processing unit
21a Voice synthesis unit
21b Voice recognition unit
21c Sound source localization unit
30 Storage unit
31 Object data storage means
32 Local map data storage means
33 Rule DB storage means
34 Situation DB storage means
35 Utterance information storage means
40 Main control unit
41 Stationary obstacle integration unit
42 Object data integration unit
43 Behavior pattern unit
44 Gesture integration unit
45 Internal state detection unit
46 Behavior plan management unit
47 Response behavior control unit
471 Response behavior determination means (response behavior determination unit)
472 Behavior selection means (response behavior selection unit)
473 Behavior command means (response behavior command unit)
474 Average volume calculation means (average volume calculation unit)
50 Autonomous movement control unit
60 Wireless communication unit
70 Battery
80 Target detection unit
90 Peripheral state detection unit
C Camera (visual sensor)
MC Microphone (voice input unit)
S Speaker (voice output unit)
SR1 Gyro sensor
SR2 GPS receiver

Claims

A communication robot comprising a voice recognition unit that recognizes at least the voice of a conversation target input through a voice input unit, a voice output unit that outputs voice based on utterance information, and a movable unit that performs a predetermined motion, wherein
the voice recognition unit calculates and outputs a reliability of the voice recognition result when the voice of the conversation target is input through the voice input unit,
the communication robot further comprising:
a response behavior determination unit that calculates, based on the reliability calculated by the voice recognition unit for the voice of the conversation target, an evaluation value for deciding whether to perform an unanswerable behavior indicating that the voice of the conversation target was recognized but cannot be answered, and that determines that the unanswerable behavior is to be performed when the evaluation value is less than a preset threshold;
a response behavior selection unit that, when the response behavior determination unit determines that the unanswerable behavior is to be performed, selects the unanswerable behavior from among predetermined response behaviors that the communication robot can perform;
a response behavior command unit that commands at least one of the voice output unit and the movable unit to execute the unanswerable behavior selected by the response behavior selection unit; and
an average volume calculation unit that calculates, for each conversation target, an average volume of the voice of that conversation target,
wherein the response behavior determination unit calculates a volume coefficient by dividing the volume of the voice of the conversation target input through the voice input unit by the average volume of the voice of the conversation target calculated by the average volume calculation unit, and calculates, as the evaluation value, a value obtained by multiplying the reliability by the volume coefficient.

A communication robot comprising a voice recognition unit that recognizes at least the voice of a conversation target input through a voice input unit, a voice output unit that outputs voice based on utterance information, and a movable unit that performs a predetermined motion, wherein
the voice recognition unit calculates and outputs a reliability of the voice recognition result when the voice of the conversation target is input through the voice input unit,
the communication robot further comprising:
a response behavior determination unit that calculates, based on the reliability calculated by the voice recognition unit for the voice of the conversation target, an evaluation value for deciding whether to perform an unanswerable behavior indicating that the voice of the conversation target was recognized but cannot be answered, and that determines that the unanswerable behavior is to be performed when the evaluation value is less than a preset threshold;
a response behavior selection unit that, when the response behavior determination unit determines that the unanswerable behavior is to be performed, selects the unanswerable behavior from among predetermined response behaviors that the communication robot can perform; and
a response behavior command unit that commands at least one of the voice output unit and the movable unit to execute the unanswerable behavior selected by the response behavior selection unit,
wherein the voice recognition unit further calculates an utterance time of the voice of the conversation target input through the voice input unit, and
the response behavior determination unit calculates an utterance time coefficient by dividing the utterance time calculated by the voice recognition unit by a preset utterance reference time, and calculates, as the evaluation value, a value obtained by multiplying at least the reliability by the utterance time coefficient.

The communication robot according to claim 1, wherein, when the evaluation value is less than the threshold, the response behavior determination unit determines that the communication robot is to perform, as the unanswerable behavior, the utterance of a one-word voice, a predetermined motion of the movable unit, or a combination of the utterance of the one-word voice and the predetermined motion of the movable unit.