JP2006243555A

JP2006243555A - Response determination system, robot, event output server, and response determining method

Info

Publication number: JP2006243555A
Application number: JP2005061557A
Authority: JP
Inventors: Toru Iwazawa; 透岩沢
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2005-03-04
Filing date: 2005-03-04
Publication date: 2006-09-14

Abstract

PROBLEM TO BE SOLVED: To determine an appropriate response to a speaker's voice while also taking the speaker's position into consideration. SOLUTION: A robot 100 receives events indicating that data relating to voice input from a first microphone 306, a second microphone 316, and a third microphone 326, provided for a first participant 300, a second participant 310, and a third participant 320 respectively, satisfies designated conditions, together with voice identification information associated with the speakers. The robot 100 acquires position information of the speaker specified by the voice identification information associated with each input event, and determines its speech and behavior based on the event and the position information of the speaker. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a response determination system, a robot, an event output server, and a response determination method.

Patent Document 1 discloses a communication robot that performs communication actions with communication targets that each carry a unique wireless tag. The communication robot comprises at least: a tag information database that records tag information; acquisition means for acquiring tag information from a communication target; recognition means for individually recognizing one or more communication targets present nearby or in the surroundings, based on the result obtained by the acquisition means; specifying means for specifying, based on the recognition result, one communication target with which to perform a communication action; and execution means for performing the communication action with the communication target specified by the specifying means.

A conventional communication robot having this configuration operates as follows. When performing a communication action, the communication robot acquires tag information from a communication target using the acquisition means. The recognition means individually recognizes each person who is a communication target. Based on the recognition result, the specifying means specifies one participant among those present near or around the communication robot. The execution means performs a communication action with that participant. Patent Document 1 states that this allows participants nearby or in the surroundings to be recognized individually, and allows the robot to take communication actions suited to the specified participant.
Patent Document 1: JP 2004-216513 A

However, a conventional robot (system) can only recognize communication targets present near or around the robot, and the robot can do no more than speak to them on its own initiative based on history information and the like.

Even if the robot can identify its conversation partner, it cannot communicate smoothly with that partner unless it can also recognize what the partner says and the partner's state. In particular, when communicating with multiple partners, it has been difficult to recognize each partner's utterances and situation and to respond appropriately.

The present invention has been made in view of the above circumstances, and its object is to provide a technique for responding appropriately to a speaker's voice while also taking the speaker's position into account.

According to the present invention, there is provided a response determination system comprising:
a response determination unit that receives an event indicating that data relating to voice input from a voice input device assigned to a speaker satisfies a predetermined condition, together with voice identification information associated with the speaker, and determines a response to the speaker; and
a position information acquisition unit that acquires position information of the speaker specified by the voice identification information associated with the event received by the response determination unit,
wherein the response determination unit determines the response based on the event and the position information of the speaker.

Here, the data relating to voice includes voice data and speech recognition results based on the voice data. The voice input device can be, for example, a microphone, and the microphone can be a close-talking microphone.

Here, the response determination system can be any system that responds in some way to a speaker's voice, for example a robot control system that controls an autonomous mobile or interactive robot, a spoken dialogue system, or an information retrieval system that uses speech recognition.

According to the response determination system of the present invention, a response can be made that is adapted to both the event, which is based on voice input from the voice input device assigned to the speaker, and the speaker's position. For example, when the response determination system is a robot control system, the response determination unit can determine the robot's speech and behavior based on the event and the speaker's position information. The robot can then, in response to an event, behave in a way adapted to the speaker's position, such as turning toward the speaker, approaching the speaker, or speaking in a way that reflects the speaker's position.
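
As a rough illustration of how a response might be chosen from an event and a speaker position, here is a minimal Python sketch. The event labels, the action names, the fallback behavior, and the `locate` callback are all assumptions made for this example; the source does not prescribe any particular mapping from events to behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Position = Tuple[float, float]  # hypothetical (x, y) coordinates of a speaker


@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical label, e.g. "utterance_detected"
    voice_id: str  # voice identification information sent with the event


# Hypothetical mapping from event kind to a position-aware action.
ACTIONS = {
    "utterance_detected": "turn_toward",
    "input_failure": "approach",
    "recognition_result": "reply_facing",
}


class ResponseDeterminer:
    """Chooses a response from an event plus the speaker's position."""

    def __init__(self, locate: Callable[[str], Optional[Position]]):
        # `locate` stands in for the position information acquisition unit.
        self._locate = locate

    def determine(self, event: Event) -> Tuple[str, Optional[Position]]:
        position = self._locate(event.voice_id)
        if position is None:
            # Assumed fallback when the speaker cannot be located.
            return ("search_for_speaker", None)
        return (ACTIONS.get(event.kind, "ignore"), position)
```

In this sketch the determiner never inspects audio itself; it only combines the event kind with whatever position the lookup returns for the accompanying voice identification information.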

According to the present invention, processing is based on voice input from a voice input device such as a close-talking microphone, so a robust response determination system that is highly resistant to noise and the like is provided. Furthermore, connecting the voice input device to the response determination system wirelessly preserves the speaker's physical freedom of movement. In addition, because processing depends on the speaker's voice and position information, a response determination system with high interface transparency is provided.

The response determination system of the present invention may be provided within a single system, or may be distributed across multiple systems connected to one another via a network. For example, when the response determination system is a robot control system, it may be provided in the robot, or in a server that can communicate with the robot over a wireless or other network. Alternatively, some functions of the response determination system may be provided in the robot and the remaining functions in the server.

For a robot that moves autonomously or converses with speakers, control that lets the speaker and the robot communicate smoothly is desirable. Conventionally, when a robot converses with a user or performs some action for the user, attempts have been made to identify the user by image recognition or by voice characteristics. With such methods, however, it is difficult to recognize the speaker unless conditions are favorable, for example the speaker is very close to the robot or there is no noise. As described in Patent Document 1, attempts have also been made to recognize a communication target by having the target carry a unique wireless tag. As noted above, however, that method can only recognize communication targets that are nearby; like the other conventional techniques, it has difficulty grasping what the speaker says.

According to the response determination system of the present invention, the speaker's voice is input from the voice input device and is accompanied by voice identification information, so it is easy to determine which speaker said what. Even when communicating with multiple speakers, the content of each speaker's utterances can be determined.

The response determination system of the present invention can further include an event output unit that receives the data relating to voice together with the voice identification information, detects whether the data satisfies a predetermined condition, and, when the condition is satisfied, outputs an event indicating that the condition is satisfied to the response determination unit together with the voice identification information.

When the response determination system of the present invention is a robot control system, the event output unit may be provided in the robot, or in a server that can communicate with the robot wirelessly or by other means. For example, where the robot converses with many speakers at the same time, multiple servers can each be given the function of the event output unit, with events and other data being input from those servers to a robot, or to a single server, that has the function of the behavior determination unit.

In the response determination system of the present invention, the event output unit can monitor the power of the voice input from the voice input device and, when the power remains at or below a predetermined value for a predetermined time, output an event indicating a voice input failure.

With this configuration, when voice input from the voice input device is interrupted by some failure, an appropriate response can be made that takes the speaker's position into account.

In the response determination system of the present invention, the event output unit can monitor the power of the voice input from the voice input device and, when the power reaches or exceeds a predetermined value, output an event indicating utterance detection.

With this configuration, when a speaker's utterance is input to the voice input device, an appropriate response can be made that takes the speaker's position into account. For example, if the response determination system is a robot control system and the robot is controlled to turn toward the speaker when an event indicating utterance detection is output, the speaker can see that the robot has turned toward them because they spoke. Such adaptive behavior increases the robot's interface transparency and makes the robot's behavior easier for the speaker to understand. An utterance can also be detected based on, for example, the harmonic structure of the voice or the length of time for which the voice power stays at or above a predetermined value; the event output unit can detect such a state and output an event indicating utterance detection.
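
The two power-based conditions described above (power at or below a value for a set time, and power at or above a value) can be sketched as a small per-device monitor. This is only an illustration under stated assumptions: the class name, the event labels, the use of timestamped power samples, and the requirement that the silence floor be lower than the speech threshold are not taken from the source.

```python
from typing import Optional


class PowerMonitor:
    """Watches one device's voice power and reports hypothetical event labels."""

    def __init__(self, speech_threshold: float, silence_floor: float,
                 max_silence_s: float):
        # Assumes silence_floor < speech_threshold.
        self.speech_threshold = speech_threshold
        self.silence_floor = silence_floor
        self.max_silence_s = max_silence_s
        self._quiet_since: Optional[float] = None
        self._failure_reported = False
        self._speaking = False

    def feed(self, t: float, power: float) -> Optional[str]:
        """Process one (timestamp in seconds, power) sample."""
        # Track how long the power has stayed at or below the floor.
        if power <= self.silence_floor:
            if self._quiet_since is None:
                self._quiet_since = t
        else:
            self._quiet_since = None
            self._failure_reported = False

        # Report an utterance once, when power first reaches the threshold.
        if power >= self.speech_threshold:
            if not self._speaking:
                self._speaking = True
                return "utterance_detected"
            return None
        self._speaking = False

        # Report an input failure once per continuous low-power stretch.
        if (self._quiet_since is not None and not self._failure_reported
                and t - self._quiet_since >= self.max_silence_s):
            self._failure_reported = True
            return "input_failure"
        return None
```

One monitor would be kept per voice identification value, so that each reported event can be forwarded together with the identifier of the device that produced it.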

The response determination system of the present invention can further include a speech recognition unit that receives voice input from a voice input device assigned to a speaker together with the voice identification information associated with that speaker, performs speech recognition on the voice, and outputs the speech recognition result together with the voice identification information. When the speech recognition unit outputs a speech recognition result, the event output unit can output an event indicating a speech recognition result together with that result.

With this configuration, when a speaker's utterance is input to the voice input device and speech recognition is performed, an appropriate response can be made that takes the speaker's position into account.

The response determination system of the present invention can further include a tag reader that reads tag identification information from an identification tag assigned to the speaker, and the position information acquisition unit can acquire the speaker's position information based on the tag identification information read by the tag reader.

The identification tag is a tag that can be read by radio waves, electromagnetic waves, ultrasonic waves, infrared light, or the like, for example an active or passive RFID (Radio Frequency Identification) tag, an ultrasonic tag, or an infrared tag.

The response determination system of the present invention can further include an identification information storage unit that associates the voice identification information and the tag identification information of the same speaker. Based on the voice identification information associated with an event output by the event output unit, the position information acquisition unit can refer to the identification information storage unit and acquire the position information of the identification tag that has the corresponding tag identification information.
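
The two-step lookup described here (voice identification information to tag identification information, then tag to position) can be sketched as follows. The class and method names, the string identifiers, and the two-dimensional position are hypothetical; the source does not specify how the storage unit or the tag reader report their data.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[float, float]  # hypothetical (x, y) position of a tag


class TagPositionLookup:
    """Resolves a voice ID to the last known position of the speaker's tag."""

    def __init__(self, voice_to_tag: Dict[str, str]):
        # Plays the role of the identification information storage unit.
        self._voice_to_tag = dict(voice_to_tag)
        self._tag_positions: Dict[str, Position] = {}

    def on_tag_read(self, tag_id: str, position: Position) -> None:
        # Called whenever the (assumed) tag reader reports a tag position.
        self._tag_positions[tag_id] = position

    def position_for_voice(self, voice_id: str) -> Optional[Position]:
        tag_id = self._voice_to_tag.get(voice_id)
        if tag_id is None:
            return None  # unknown speaker
        return self._tag_positions.get(tag_id)  # None if tag not yet read
```

Keeping the association table separate from the position table means a speaker can swap microphones or tags by changing one entry, without touching the position data.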

In the response determination system of the present invention, the behavior determination unit can receive events indicating that data relating to voice input from multiple voice input devices, each assigned to one of multiple speakers, satisfies a predetermined condition, together with the voice identification information associated with each speaker.

According to the present invention, even when communicating with multiple partners, the data relating to each speaker's voice and each speaker's position information are acquired in association with that speaker. An appropriate response can therefore be made to each of the partners.

In the response determination system of the present invention, the position information acquisition unit can acquire the speaker's position information relative to the robot, and the behavior determination unit can determine the robot's speech and behavior so that the robot acts with awareness of the speaker's position.

This makes the speaker's position information easier to acquire, and simplifies control when the robot turns toward or approaches the speaker.
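
To show why a robot-relative position simplifies this control, the following sketch converts such a position directly into a turn angle and a distance. The coordinate frame (robot at the origin, x forward, y to the left) and the function name are assumptions for illustration only.

```python
import math
from typing import Tuple


def bearing_and_distance(x: float, y: float) -> Tuple[float, float]:
    """Return (turn angle in degrees, distance) to a speaker at (x, y).

    Assumed frame: robot at the origin, x pointing forward, y to the left,
    so a positive angle means a counter-clockwise (left) turn.
    """
    return math.degrees(math.atan2(y, x)), math.hypot(x, y)
```

With a position in some room-fixed frame instead, the robot would first have to subtract its own position and heading before this step.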

According to the present invention, there is provided an autonomous mobile or interactive robot comprising:
any one of the response determination systems described above; and
a behavior execution unit that executes, as speech and behavior, the response determined by the response determination unit.

According to the robot of the present invention, speech and behavior can be executed that are adapted to both the event based on voice input from the voice input device assigned to the speaker and the speaker's position. The robot of the present invention can also be configured to perform its speech and behavior face to face with the speaker. For such a robot, control that lets the speaker and the robot communicate smoothly is desirable. According to the robot of the present invention, the speaker's voice is input from the voice input device and is accompanied by voice identification information, so it is easy to determine which speaker said what. Even when communicating with multiple speakers, the content of each speaker's utterances can be determined. The speaker and the robot can thus communicate smoothly.

According to the present invention, there is provided an event output server that is connected via a network to a communication terminal device, which includes a voice output unit that outputs voice input from a voice input device, and to an autonomous mobile or interactive robot, and that relays between them,
wherein the robot receives an event indicating that data relating to voice input from a voice input device assigned to a speaker satisfies a predetermined condition, together with voice identification information associated with the speaker, acquires position information of the speaker specified by the voice identification information associated with the event, and executes speech and behavior determined based on the event and the position information of the speaker, and
the event output server comprises:
a voice input unit that receives, from the communication terminal device, the voice output by the voice output unit together with the voice identification information;
an event output unit that detects whether the voice received by the voice input unit satisfies a predetermined condition and, when the condition is satisfied, outputs an event corresponding to the condition together with the voice identification information; and
a data output unit that transmits the event output by the event output unit to the robot together with the voice identification information.
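
The relay role of the event output server can be sketched as below. This is a simplified illustration, not the claimed implementation: conditions are modeled as predicates over a single power value, the `send_to_robot` callback stands in for the network transmission, and all names are hypothetical.

```python
from typing import Callable, Dict


class EventOutputServer:
    """Receives voice data with a voice ID and forwards matching events."""

    def __init__(self,
                 conditions: Dict[str, Callable[[float], bool]],
                 send_to_robot: Callable[[str, str], None]):
        # Maps a hypothetical event name to the condition that triggers it.
        self._conditions = conditions
        # Stands in for the data output unit's transmission to the robot.
        self._send_to_robot = send_to_robot

    def receive(self, voice_id: str, power: float) -> None:
        # Voice input unit: accept data from a communication terminal.
        for event_name, is_met in self._conditions.items():
            # Event output unit: check each predetermined condition.
            if is_met(power):
                # Data output unit: forward the event with its voice ID.
                self._send_to_robot(event_name, voice_id)
```

A usage example, with an in-memory list in place of the robot:

```python
sent = []
server = EventOutputServer(
    {"utterance_detected": lambda power: power >= 0.5},
    lambda event_name, voice_id: sent.append((event_name, voice_id)),
)
server.receive("mic-1", 0.7)  # forwarded
server.receive("mic-2", 0.1)  # below threshold, nothing sent
```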

According to the event output server of the present invention, the event is transmitted to the robot together with the voice identification information, so the robot can execute speech and behavior adapted to both the event based on voice input from the voice input device assigned to the speaker and the speaker's position.

In the event output server of the present invention, the event output unit can monitor the power of the voice input from the voice input device and, when the power remains at or below a predetermined value for a predetermined time, output an event indicating a voice input failure.

In the event output server of the present invention, the event output unit can monitor the power of the voice input from the voice input device and, when the power reaches or exceeds a predetermined value, output an event indicating utterance detection.

The event output server of the present invention can further include a speech recognition unit that receives voice input from a voice input device assigned to a speaker together with the voice identification information associated with that speaker, performs speech recognition on the voice, and outputs the speech recognition result together with the voice identification information. When the speech recognition unit outputs a speech recognition result, the event output unit can output an event indicating a speech recognition result together with that result.

According to the present invention, there is provided a response determination method comprising the steps of:
inputting an event indicating that data relating to voice input from a voice input device assigned to a speaker satisfies a predetermined condition, together with voice identification information associated with the speaker;
acquiring position information of the speaker specified by the voice identification information associated with the event input in the step of inputting the event together with the voice identification information; and
determining a response to the speaker based on the event and the position information of the speaker.

The response determination method of the present invention can be used to control an autonomous mobile or interactive robot; in the step of determining the response to the speaker, the robot's speech and behavior can be determined based on the event and the speaker's position information.

According to the response determination method of the present invention, a response can be determined that is adapted to both the event based on voice input from the voice input device assigned to the speaker and the speaker's position. For example, when a robot is controlled by the response determination method of the present invention, the robot can, in response to an event, behave in a way adapted to the speaker's position, such as turning toward the speaker, approaching the speaker, or speaking in a way that reflects the speaker's position.

The response determination method of the present invention can further include, before the step of inputting the event together with the voice identification information, a step of receiving the data relating to voice together with the voice identification information, detecting whether the data satisfies a predetermined condition, and, when the condition is satisfied, outputting an event indicating that the condition is satisfied together with the voice identification information.

In the response determination method of the present invention, the step of outputting the event together with the voice identification information can include a step of monitoring the power of the voice input from the voice input device, and a step of outputting an event indicating a voice input failure when the power remains at or below a predetermined value for a predetermined time.

In the response determination method of the present invention, the step of outputting the event together with the voice identification information can include a step of monitoring the power of the voice input from the voice input device, and a step of outputting an event indicating utterance detection when the power reaches or exceeds a predetermined value.

The response determination method of the present invention can further include a step of receiving voice input from a voice input device assigned to a speaker together with the voice identification information associated with that speaker, performing speech recognition on the voice, and outputting the speech recognition result together with the voice identification information. In the step of outputting the event together with the voice identification information, when the speech recognition result is output, an event indicating a speech recognition result can be output together with that result.

In the response determination method of the present invention, the step of inputting the event can input events indicating that data relating to voice input from multiple voice input devices, each assigned to one of multiple speakers, satisfies a predetermined condition, together with the voice identification information associated with each speaker.

Any combination of the above components, and any conversion of the expression of the present invention among a method, an apparatus, a system, a recording medium, a computer program, and the like, is also valid as an aspect of the present invention.

According to the present invention, an appropriate response can be made to a speaker's voice while also taking the speaker's position into account.

Next, the best mode for carrying out the invention is described in detail with reference to the drawings. In the following drawings, the configuration of parts not related to the essence of the present invention is omitted.

The following embodiments are described taking as an example the case where the response determination system and the response determination method are a robot control system and a robot control method that control a robot that moves autonomously and converses with speakers.

In the following embodiments, each person who wishes to communicate with the robot is given a microphone, a communication terminal device including a voice output unit that transmits the voice input to the microphone to the robot, and an identification tag that the robot uses to acquire position information.

(First embodiment)
In the present embodiment, the response determination system is incorporated in the robot.

FIG. 1 is a schematic diagram showing the relationship between the robot and the participants, who are speakers, in the present embodiment.
Here, communication between the robot 100 and a first participant 300, a second participant 310, and a third participant 320 is described as an example.

The first participant 300 carries a first identification tag 302, a first voice output unit 304, and a first microphone 306. The second participant 310 carries a second identification tag 312, a second voice output unit 314, and a second microphone 316. The third participant 320 carries a third identification tag 322, a third voice output unit 324, and a third microphone 326.

The first participant 300 is described below as an example.
The first microphone 306 receives the voice of the first participant 300. The first microphone 306 can be a headset microphone so that the participant can move freely. The first microphone 306 is connected to the first audio output unit 304 given to the first participant 300.

The first audio output unit 304 is, for example, a portable wireless communication device such as a PDA (Personal Digital Assistant). The first audio output unit 304 transmits the voice input from the first microphone 306 to the robot 100. At this time, the first audio output unit 304 transmits voice identification information that identifies itself to the robot 100 together with the voice. The first participant 300 can carry the first audio output unit 304 by, for example, placing it in a mesh pocket of a bag or backpack.

The robot 100 and the first audio output unit 304 can communicate via, for example, a wireless LAN. When they communicate via a wireless LAN, they can use, for example, TCP (Transmission Control Protocol) / IP (Internet Protocol). In this case, the port number or IP address assigned to each device can also be used as the voice identification information of, for example, the first audio output unit 304.

The first identification tag 302 is a tag that can be read by radio waves, electromagnetic waves, ultrasonic waves, infrared rays, or the like, for example an active or passive RFID (Radio Frequency Identification) tag, an ultrasonic tag, or an infrared tag. The first identification tag 302 stores tag identification information unique to the tag. The first participant 300 wears the first identification tag 302 on a part of the body, such as the chest. The robot 100 can identify the first participant 300 by reading the tag identification information from the first identification tag 302. The robot 100 can also acquire the position information of the first identification tag 302 from, for example, the reading strength of the tag. Here, the position information can be the position of the first participant 300 relative to the robot 100, such as the distance between the robot 100 and the first participant 300 and the direction of the first participant 300 as seen from the robot 100.

For example, when the first identification tag 302 is an ultrasonic tag, the first identification tag 302 periodically transmits ultrasonic waves, which the reader of the robot 100 receives. The robot 100 can acquire the position information of the first participant 300 from the arrival time and reception angle of the ultrasonic waves received by the reader.
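As a rough illustration of how an arrival time and a reception angle could be turned into a relative position, the following Python sketch converts them into a distance and planar coordinates. It is not taken from the patent: the function name `tag_position`, the assumption that the arrival time is a one-way time of flight with a known emission time, the speed of sound value, and the angle convention are all assumptions made for this example.

```python
import math

# Assumed value: speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_S = 343.0


def tag_position(arrival_time_s: float, reception_angle_deg: float) -> tuple[float, float, float]:
    """Return (distance_m, x_m, y_m) of a tag relative to the robot.

    arrival_time_s: one-way time of flight of the ultrasonic pulse
        (assumes the emission time is known to the reader).
    reception_angle_deg: bearing of the tag, 0 = straight ahead,
        positive = to the robot's left (hypothetical convention).
    """
    distance = SPEED_OF_SOUND_M_S * arrival_time_s
    angle = math.radians(reception_angle_deg)
    x = distance * math.cos(angle)  # forward component
    y = distance * math.sin(angle)  # leftward component
    return distance, x, y
```

The distance and bearing correspond to the "distance" and "direction" that the description lists as examples of position information; the Cartesian components are added only for convenience.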

The second identification tag 312 of the second participant 310 and the third identification tag 322 of the third participant 320 have the same configuration as the first identification tag 302 of the first participant 300. Likewise, the second audio output unit 314 of the second participant 310 and the third audio output unit 324 of the third participant 320 have the same configuration as the first audio output unit 304 of the first participant 300.

The robot 100 is configured to receive the voices of a plurality of participants simultaneously and to acquire the position information of a plurality of participants simultaneously. In the present embodiment, the robot 100 is an autonomous mobile robot.

FIG. 2 is an external view showing an example of the robot in the present embodiment.
The robot 100 is configured, for example, by connecting a body 1 and a head 2. Wheels 3A and 3B are attached to the left and right of the lower part of the body 1, and each wheel can rotate forward and backward independently.

The head 2 can rotate within a predetermined range about a vertical axis attached vertically to the body 1 and about a horizontal axis set at 90 degrees to the vertical axis. The vertical axis passes through the center of the head 2. The horizontal axis passes through the center of the head 2 and extends horizontally in the left-right direction when the body 1 and the head 2 face forward. In other words, the head 2 can rotate within a predetermined range with two degrees of freedom: left-right and up-down.

A speaker 12 and an internal microphone 13 are provided on the surface of the body 1. A CCD camera 21A and a CCD camera 21B are provided on the surface of the head 2.

FIG. 3 is a block diagram showing an example of the electrical configuration of the robot 100.
The body 1 houses a controller 10 that controls the entire robot, a battery 11 that serves as the power source of the robot, the speaker 12, the internal microphone 13, an actuator 14A and an actuator 14B that drive the two wheels, a communication interface (I/F) 24, a tag reader 26, and the like.

The internal microphone 13 collects ambient sound, including utterances from a specific speaker, and sends the resulting audio data to the controller 10.

The controller 10 contains a CPU 10A and a memory 10B. The CPU 10A performs various kinds of processing by executing a control program stored in the memory 10B.

The communication interface 24 communicates with the communication terminal devices (the first audio output unit 304, the second audio output unit 314, and the third audio output unit 324 in FIG. 1) given to the participants (the first participant 300, the second participant 310, and the third participant 320 in FIG. 1).

The tag reader 26 reads tag identification information from the identification tags (the first identification tag 302, the second identification tag 312, and the third identification tag 322 in FIG. 1) attached to the participants (the first participant 300, the second participant 310, and the third participant 320 in FIG. 1).

The head 2 houses the CCD camera 21A and the CCD camera 21B, as well as an actuator 22A and an actuator 22B that rotate the head 2, and the like.

The CCD camera 21A and the CCD camera 21B capture images of the surroundings and send the resulting image data to the controller 10.

The actuator 22A and the actuator 22B rotate the head 2 of the robot 100 up and down and left and right.

Based on the audio data obtained via the internal microphone 13 and the communication interface 24 and on the image data obtained from the CCD camera 21A and the CCD camera 21B, the controller 10 reads information from the memory 10B as needed, analyzes the situation and the speech and behavior of the participants, and determines the corresponding speech and behavior of the robot 100.

The controller 10 controls the actuator 14A, the actuator 14B, the actuator 22A, the actuator 22B, the tag reader 26, and the like to make the robot 100 perform the determined action. The controller 10 also generates synthesized speech and supplies it to the speaker 12 so that the robot 100 outputs the determined utterance.

FIG. 4 is a block diagram showing in detail the configuration of the controller 10 of the robot 100 in the present embodiment. FIG. 4 shows blocks in functional units, not the hardware configuration of the controller 10.
The controller 10 (response determination system) of the robot 100 includes a behavior determination unit 110 (response determination unit) and a position information acquisition unit 108. The behavior determination unit 110 receives an event, together with voice identification information associated with a speaker, indicating that data relating to voice input from the first microphone 306, the second microphone 316, the third microphone 326, or the like (voice input devices) given to the speakers meets a predetermined condition, and determines the response to that speaker. The position information acquisition unit 108 acquires the position information of the speaker specified by the voice identification information associated with the event received by the response determination unit. The behavior determination unit 110 determines the speech and behavior (response) of the robot 100 based on the event and the position information of the speaker.

The controller 10 of the robot 100 includes a voice input unit 102, a voice recognition unit 104, an event output unit 106, the position information acquisition unit 108, the behavior determination unit 110, part of a behavior execution unit 112, a voice recognition dictionary 114, a condition storage unit 116, a corresponding behavior storage unit 118, an identification information storage unit 120, a robot behavior storage unit 130, and a scenario storage unit 132. The behavior execution unit 112 includes a mechanical control unit 134, a speech synthesis unit 136, and an output unit 138, which are realized by the controller 10, as well as the actuator 14A, the actuator 14B, the actuator 22A, the actuator 22B, and the speaker 12.

The communication interface 24 receives various data from the plurality of audio output units (the first audio output unit 304, the second audio output unit 314, and the third audio output unit 324) via a network 400.

The voice input unit 102 receives the audio data from the plurality of audio output units, received by the communication interface 24, in association with the respective voice identification information. The voice input unit 102 outputs the received audio data, together with the voice identification information, to the voice recognition unit 104 and the event output unit 106. The voice input unit 102 also receives the audio data collected by the internal microphone 13 and outputs it to the voice recognition unit 104.

The voice input unit 102 can receive the audio data from the communication interface 24 and the audio data from the internal microphone 13 at the same time, but it can also turn off the input from one of them and keep only the input from the other turned on.

The processing of each component when the voice input unit 102 receives audio data from the internal microphone 13 is described later. The processing when the voice input unit 102 receives audio data from the communication interface 24 is described below.

The voice recognition unit 104 performs voice recognition on the audio data received by the voice input unit 102. The voice recognition dictionary 114 includes a voice recognition word storage unit that stores a voice recognition vocabulary, which is a set of voice recognition words. The voice recognition unit 104 matches the audio data received by the voice input unit 102 against the voice recognition vocabulary stored in the voice recognition dictionary 114. When the audio data is recognized, the voice recognition unit 104 outputs the voice recognition result to the event output unit 106 in association with the voice identification information.

Based on the audio data output from the voice input unit 102 and the voice recognition result output from the voice recognition unit 104, the event output unit 106 detects whether the audio data meets a predetermined condition. When a condition is met, the event output unit 106 outputs an event indicating that the condition has been met, in association with the voice identification information. When the event output unit 106 has obtained a voice recognition result from the voice recognition unit 104, it outputs the voice recognition result together with the event.

The condition storage unit 116 stores each predetermined condition in association with the event indicating that the condition has been met. The event output unit 106 refers to the condition storage unit 116 to detect whether the audio data meets a predetermined condition.

When the event output unit 106 outputs an event and voice identification information, the behavior determination unit 110 acquires, from the position information acquisition unit 108, the position information of the participant specified by that voice identification information. The identification information storage unit 120 stores the voice identification information and the tag identification information of each participant in association with each other. On acquiring the event and the voice identification information from the event output unit 106, the behavior determination unit 110 refers to the identification information storage unit 120, reads the tag identification information associated with the voice identification information, and requests the position information acquisition unit 108 to acquire the position information for that tag identification information. The position information acquisition unit 108 has the tag reader 26 read the participants' identification tags and acquires the position information of the identification tag that has the target tag identification information. The position information acquisition unit 108 then notifies the behavior determination unit 110 of this position information. The behavior determination unit 110 takes the notified position information as the position information of the target participant.

The behavior determination unit 110 determines the speech and behavior to be executed by the behavior execution unit 112, based on the event acquired from the event output unit 106 and the position information of the participant. The corresponding behavior storage unit 118 stores each event in association with the speech and behavior of the robot 100 that corresponds to it. The behavior determination unit 110 refers to the corresponding behavior storage unit 118 and reads the speech and behavior of the robot 100 corresponding to the event acquired from the event output unit 106.

The robot behavior storage unit 130 stores utterance data and motion data of the robot for specific situations. The scenario storage unit 132 stores scenario information.

The behavior determination unit 110 determines the speech and behavior of the robot 100 based on the speech and behavior read from the corresponding behavior storage unit 118 and on the position information of the speaker, referring to the robot behavior storage unit 130 and the scenario storage unit 132 as necessary.

The behavior determination unit 110 sends the determined speech and behavior as a command to the mechanical control unit 134 and the speech synthesis unit 136. Based on the command sent from the behavior determination unit 110, the mechanical control unit 134 generates control signals for driving the actuator 14A, the actuator 14B, the actuator 22A, and the actuator 22B, and sends them to the actuators 14A, 14B, 22A, and 22B. The actuators 14A, 14B, 22A, and 22B are then driven according to the control signals.

The speech synthesis unit 136 generates synthesized speech based on the command sent from the behavior determination unit 110. The output unit 138 is supplied with the digital data of the synthesized speech from the speech synthesis unit 136. The output unit 138 D/A-converts this digital data into an analog audio signal and supplies it to the speaker 12 for output.

Next, the processing of each component when the voice input unit 102 receives audio data from the internal microphone 13 is described.
In this case, no voice identification information is associated with the audio data received by the voice input unit 102. Although not shown here, the controller 10 can be configured to recognize the speaker using, for example, image data input from the CCD camera 21A or the CCD camera 21B. When the speaker can be recognized, the voice recognition unit 104, the event output unit 106, and the behavior determination unit 110 can perform the same processing as in the case described above, where the voice input unit 102 receives audio data from the communication interface 24.

When the speaker cannot be recognized, the audio data received by the voice input unit 102 and the voice recognition result produced by the voice recognition unit 104 can be input directly from the voice input unit 102 and the voice recognition unit 104 to the behavior determination unit 110, and the behavior determination unit 110 can determine the speech and behavior of the robot 100 by referring to the robot behavior storage unit 130 and the scenario storage unit 132. Alternatively, even when the speaker cannot be recognized, the audio data and the voice recognition result can be input to the event output unit 106, which then detects whether a predetermined condition is met.

FIG. 5 is a diagram showing an example of the internal configuration of the condition storage unit 116. The following description also refers to FIG. 4.
The condition storage unit 116 includes an event column and a condition column. The event column includes a number column and a content column.

For example, the content of the event with number "1" is "voice input failure", and its condition is "no voice input for a predetermined time". That is, this condition is met when, because of some failure, the voice input unit 102 cannot receive the voice from the first participant 300, the second participant 310, or the third participant 320.

Possible causes of a voice input failure include, for example, a failure such as loss of power in the communication terminal device that includes an audio output unit, or a broken connection between a microphone and its audio output unit. In the present embodiment, each microphone is configured so that, even when the participant is not speaking, the audio power does not remain at zero, because of ambient sound and noise specific to the audio output unit. The event output unit 106 can therefore detect that the condition "no voice input for a predetermined time" is met when the power of the audio output from the voice input unit 102 remains at zero for the predetermined time or longer. Alternatively, the robot 100 can periodically transmit test data to each communication terminal device and detect whether the condition "no voice input for a predetermined time" is met according to whether a response arrives within a predetermined time. In that case, the event output unit 106 detects that the condition is met when no response arrives within the predetermined time after the test data is transmitted to a communication terminal device.

The content of the event with number "2" is "speech detection", and its condition is "audio level is at or above a predetermined threshold". That is, this condition is met when any of the first participant 300, the second participant 310, and the third participant 320 speaks.

The event output unit 106 can detect that the condition "audio level is at or above a predetermined threshold" is met when the power of the audio output from the voice input unit 102 reaches or exceeds the predetermined threshold. Alternatively, when the power reaches or exceeds the threshold, the event output unit 106 can collect and analyze the audio associated with that voice identification information for a predetermined time, determine whether it contains features of a human voice, and detect whether the condition is met according to the result of that determination.

The content of the event with number "3" is "voice recognition result", and its condition is "voice recognition result obtained". That is, this condition is met when the voice recognition unit 104 outputs a voice recognition result.
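The three example conditions of FIG. 5 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patent's implementation: the `Event` class, the `detect_events` function, and the default values `silence_limit_s` and `level_threshold` are hypothetical, since the description leaves the "predetermined time" and "predetermined threshold" unspecified.

```python
from dataclasses import dataclass
from typing import List, Optional

# Event numbers follow the example condition table of FIG. 5.
VOICE_INPUT_FAILURE = 1
SPEECH_DETECTED = 2
RECOGNITION_RESULT = 3


@dataclass
class Event:
    number: int
    voice_id: str  # voice identification information of the source
    recognition_result: Optional[str] = None


def detect_events(
    voice_id: str,
    zero_power_seconds: float,
    level: float,
    recognition_result: Optional[str] = None,
    silence_limit_s: float = 5.0,   # assumed "predetermined time"
    level_threshold: float = 0.1,   # assumed "predetermined threshold"
) -> List[Event]:
    """Return the events whose conditions are met for one audio source."""
    events = []
    # Event 1: audio power has stayed at zero for the predetermined time.
    if zero_power_seconds >= silence_limit_s:
        events.append(Event(VOICE_INPUT_FAILURE, voice_id))
    # Event 2: audio level is at or above the threshold.
    if level >= level_threshold:
        events.append(Event(SPEECH_DETECTED, voice_id))
    # Event 3: a voice recognition result was obtained; it travels with the event.
    if recognition_result is not None:
        events.append(Event(RECOGNITION_RESULT, voice_id, recognition_result))
    return events
```

Each returned event carries the voice identification information, which is what later allows the speaker's position to be looked up.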

FIG. 6 is a diagram showing an example of the internal configuration of the corresponding behavior storage unit 118. The following description also refers to FIG. 4.
The corresponding behavior storage unit 118 has an event number column and a behavior column. The event number column stores numbers corresponding to those in the number column of the event column of the condition storage unit 116.

For example, the behavior when the event with number "1" is output is "(1) Approach the corresponding speaker. (2) Switch to voice input via the internal microphone." Based on the position information of the participant associated with this event, the behavior determination unit 110 causes the behavior execution unit 112 to make the robot 100 approach the corresponding speaker. During this movement, the behavior determination unit 110 repeatedly acquires the position information of that participant from the position information acquisition unit 108 and, once the distance to the participant falls within a predetermined distance, causes behavior (2) to be executed. In behavior (2), the voice input unit 102 selectively receives audio data from the internal microphone 13, so that the voice of the corresponding participant is input directly through the internal microphone 13 of the robot 100.

The behavior when the event with number "2" is output is "Turn toward the speaker." Based on the position information of the participant associated with this event, the behavior determination unit 110 causes the behavior execution unit 112 to make the robot 100 turn toward the corresponding speaker.

The behavior when the event with number "3" is output is "(1) Turn toward the speaker. (2) Output the corresponding voice." Based on the position information of the participant associated with this event, the behavior determination unit 110 causes the behavior execution unit 112 to make the robot 100 turn toward the corresponding speaker. The behavior determination unit 110 then refers to the robot behavior storage unit 130 and the scenario storage unit 132, determines the spoken response corresponding to the voice recognition result, and causes the behavior execution unit 112 to output it.
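The event-to-behavior table of FIG. 6 and its use of the speaker's position can be sketched as a small lookup in Python. The action names, the `decide_behavior` function, the command format, and the 1.0 m default for the "predetermined distance" are assumptions made for this illustration; the patent does not specify them.

```python
from typing import List, Optional, Tuple

# Example table of FIG. 6: event number -> ordered list of behaviors.
# The action names are hypothetical labels for the behaviors in the text.
RESPONSES = {
    1: ["approach_speaker", "switch_to_internal_microphone"],
    2: ["face_speaker"],
    3: ["face_speaker", "output_spoken_response"],
}

Command = Tuple[str, Optional[float]]  # (action, bearing in degrees if relevant)


def decide_behavior(
    event_number: int,
    distance_m: float,
    bearing_deg: float,
    approach_distance_m: float = 1.0,  # assumed "predetermined distance"
) -> List[Command]:
    """Return the commands to execute now for an event and a speaker position."""
    commands: List[Command] = []
    for action in RESPONSES.get(event_number, []):
        if action == "approach_speaker":
            if distance_m > approach_distance_m:
                commands.append((action, bearing_deg))
                break  # behavior (2) waits until the robot is close enough
        elif action == "face_speaker":
            commands.append((action, bearing_deg))
        else:
            commands.append((action, None))
    return commands
```

Calling `decide_behavior` again with each newly acquired position mirrors the repeated position acquisition described for event "1": while the speaker is far away only the approach command is returned, and once the speaker is within range the switch to the internal microphone is returned instead.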

FIG. 7 is a diagram showing an example of the internal configuration of the identification information storage unit 120.
The identification information storage unit 120 includes a name column, a voice identification information column, and a tag identification information column.
The name column stores the names of the participants. The voice identification information column stores the identification information of each audio output unit. The tag identification information column stores the tag identification information of each identification tag.

Here, for example, the participant named "Sakura" is associated with voice identification information "1111" and tag identification information "0001". The participant named "Momo" is associated with voice identification information "1112" and tag identification information "0002". The participant named "Taro" is associated with voice identification information "1113" and tag identification information "0003".

The identification information storage unit 120 does not necessarily need a name column. However, storing the participants' names in the name column allows the robot 100 to address a participant by name, which makes communication between the participants and the robot 100 smoother. The identification information storage unit 120 can further include columns that store other information about each participant, such as gender and age. This allows the robot 100 to respond in a way suited to each participant.

FIG. 8 is a flowchart showing the processing procedure of the controller 10 of the robot 100 in the present embodiment.
The processing procedure of the controller 10 (the response determination method) includes the following steps: receiving data relating to voice input from the first microphone 306, the second microphone 316, the third microphone 326, and so on (voice input devices) assigned to the speakers, together with voice identification information, detecting whether the data relating to the voice meets a predetermined condition (S100), and, if the condition is met (YES in S100), outputting an event indicating that the condition is met together with the voice identification information (S102); receiving the event indicating that the predetermined condition is met together with the voice identification information associated with the speaker, and acquiring position information of the speaker identified by the voice identification information associated with the received event (S104); and determining the behavior (response) of the robot 100 toward the speaker based on the event and the position information of the speaker (S106).

This is described in detail below.
The event output unit 106 constantly monitors the voice data input from the voice input unit 102 and the voice recognition results input from the voice recognition unit 104. When it detects that the data relating to the voice meets any of the conditions stored in the condition storage unit 116 (YES in S100), the event output unit 106 outputs the event corresponding to that condition to the behavior determination unit 110 in association with the voice identification information (S102).

When the event output unit 106 outputs an event and voice identification information, the behavior determination unit 110 refers to the identification information storage unit 120 and reads the tag identification information associated with that voice identification information. The behavior determination unit 110 then notifies the position information acquisition unit 108 of the tag identification information. The position information acquisition unit 108 detects the identification tag having that tag identification information, calculates its position, and notifies the behavior determination unit 110 of the result. In this way, the behavior determination unit 110 acquires the position information of the participant who has the tag identification information corresponding to the event (S104).

The behavior determination unit 110 refers to the corresponding behavior storage unit 118 and reads the behavior corresponding to the event. It also refers to the robot behavior storage unit 130 and the scenario storage unit 132 as necessary. The behavior determination unit 110 determines the behavior of the robot 100 based on the information read from the corresponding behavior storage unit 118, the robot behavior storage unit 130, and the scenario storage unit 132, and on the position information of the participant (S106). It then notifies the mechanical control unit 134 and the speech synthesis unit 136 of the determined behavior.

The behavior execution unit 112, which includes the mechanical control unit 134 and the speech synthesis unit 136, executes the behavior determined by the behavior determination unit 110 (S108).

When the series of behaviors determined by the behavior determination unit 110 is finished, it is determined whether to end the processing of the controller 10 (S110). If the processing is not to end (NO in S110), the procedure returns to step S100. If the processing of the controller 10 is to end (YES in S110), the procedure ends.
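The flow of steps S100 to S110 can be summarized as a simple loop. The sketch below is an assumption-laden illustration, not the controller's actual implementation: the function `run_controller` and its four callback parameters are hypothetical stand-ins for the event output unit, the position information acquisition unit, the behavior determination unit, and the behavior execution unit, and the end-of-processing decision (S110) is modeled simply as running out of input.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def run_controller(
    frames: Iterable[Tuple[str, Any]],
    detect_event: Callable[[str, Any], Optional[int]],
    locate: Callable[[str], Any],
    decide: Callable[[int, Any], Any],
    execute: Callable[[Any], None],
) -> None:
    """Process (voice_id, data) pairs following the S100 to S110 flow."""
    for voice_id, data in frames:
        # S100: does the voice-related data meet any stored condition?
        event = detect_event(voice_id, data)
        if event is None:
            continue  # NO in S100: keep monitoring
        # S102 is implicit here: the event travels together with voice_id.
        # S104: acquire the position of the speaker identified by voice_id.
        position = locate(voice_id)
        # S106: determine the robot's behavior from event and position.
        behavior = decide(event, position)
        # S108: execute the behavior.
        execute(behavior)
    # S110: in this sketch, processing ends when the input is exhausted.
```

Passing the units in as callbacks keeps the loop independent of how detection, localization, and execution are actually realized.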

Next, a specific example is described with reference to FIGS. 1 to 8 as appropriate.
Consider, for example, a case in which the robot 100 converses with the first participant 300 (Sakura), the second participant 310 (Momo), and the third participant 320 (Taro).

The first participant 300, "Sakura", wears the first identification tag 302, which has tag identification information "0001", and is given the first voice output unit 304, which has voice identification information "1111". Speech uttered by "Sakura" is picked up by the first microphone 306 and transmitted from the first voice output unit 304 to the robot 100 in association with the voice identification information "1111".

The second participant 310, "Momo", wears the second identification tag 312, which has tag identification information "0002", and is given the second voice output unit 314, which has voice identification information "1112". Speech uttered by "Momo" is picked up by the second microphone 316 and transmitted from the second voice output unit 314 to the robot 100 in association with the voice identification information "1112".

The third participant 320, "Taro", wears the third identification tag 322, which has tag identification information "0003", and is given the third voice output unit 324, which has voice identification information "1113". Speech uttered by "Taro" is picked up by the third microphone 326 and transmitted from the third voice output unit 324 to the robot 100 in association with the voice identification information "1113".

For example, when no voice data associated with the voice identification information "1111" has been input for a predetermined time, the event output unit 106 of the robot 100 detects that the voice data associated with the voice identification information "1111" meets the condition associated with the event "voice input failure". The event output unit 106 outputs the number "1" indicating that event to the behavior determination unit 110 together with the voice identification information "1111".
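One way to picture the "voice input failure" condition is a per-speaker timestamp that is checked against a timeout. The sketch below is illustrative only: the class `SilenceMonitor`, its method names, and the use of seconds as the time unit are assumptions, since the specification states only that no input arrives "for a predetermined time". The event number 1 is taken from the text.

```python
from typing import Dict, List, Tuple

VOICE_INPUT_FAILURE = 1  # event number given in the text


class SilenceMonitor:
    """Tracks when voice data last arrived for each voice identification."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def on_audio(self, voice_id: str, now: float) -> None:
        """Record that voice data for voice_id arrived at time `now`."""
        self.last_seen[voice_id] = now

    def check(self, now: float) -> List[Tuple[int, str]]:
        """Return (event, voice_id) for every speaker silent past the timeout."""
        return [
            (VOICE_INPUT_FAILURE, voice_id)
            for voice_id, seen in sorted(self.last_seen.items())
            if now - seen >= self.timeout_s
        ]
```

A speaker must be registered with an initial `on_audio` call (for example when the session starts) so that a microphone that never delivers data is still noticed.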

Based on the voice identification information "1111", the behavior determination unit 110 refers to the identification information storage unit 120 and reads the corresponding tag identification information "0001". The behavior determination unit 110 then notifies the position information acquisition unit 108 of the tag identification information "0001". The position information acquisition unit 108 acquires the position information of the first identification tag 302, which is the identification tag having the tag identification information "0001", and notifies the behavior determination unit 110 of it.

The behavior determination unit 110 decides to have the robot 100 execute the behavior associated with the number "1", which indicates the event "voice input failure". Specifically, it determines a movement for approaching "Sakura" based on the position information of the first identification tag 302 acquired from the position information acquisition unit 108, and causes the behavior execution unit 112 to execute that movement. When the robot 100 has come close to "Sakura", the behavior determination unit 110 switches the voice input unit 102 so that it takes its voice input from the internal microphone 13. Thus, when an event indicating "voice input failure" is output, the robot 100 approaches the speaker concerned, so that when it picks up the speaker's voice through the internal microphone 13 it can avoid ambient noise and long-distance speech and can perform voice recognition and the like more accurately. This particular behavior is not mandatory, however. The robot 100 may simply turn toward the speaker, or it may turn toward the speaker and say something like "Sakura-chan, come over here" to prompt the speaker to come closer to the robot 100.

Because the robot 100 also knows that the participant with the voice input failure is named "Sakura", it can be made to say something like "Sakura-chan, wait a moment" before it moves toward "Sakura". When the robot 100 has come close to "Sakura", it can also be made to say something like "Sakura-chan, please say that again". After that, voice recognition and the like can be performed based on the voice input from the internal microphone 13.

As another example, the robot 100 may turn toward "Sakura" and say something like "Sakura-chan, I can't hear your voice. Try asking the big brother near you".

As another example, when the power of the voice in the voice data associated with the voice identification information "1111" reaches or exceeds a predetermined threshold, the event output unit 106 of the robot 100 detects that this voice data meets the condition associated with the event "utterance detection". The event output unit 106 outputs the number "2" indicating that event to the behavior determination unit 110 together with the voice identification information "1111".
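The "utterance detection" condition compares voice power against a threshold. The specification does not say how power is computed, so the sketch below assumes the mean of the squared samples over one frame, which is a common definition. The function names and the frame-based interface are hypothetical, and only the event number 2 comes from the text.

```python
from typing import Optional, Sequence, Tuple

UTTERANCE_DETECTION = 2  # event number given in the text


def frame_power(samples: Sequence[float]) -> float:
    """Mean-square power of one frame of audio samples (assumed definition)."""
    return sum(s * s for s in samples) / len(samples)


def detect_utterance(
    voice_id: str, samples: Sequence[float], threshold: float
) -> Optional[Tuple[int, str]]:
    """Return (event, voice_id) if the frame power reaches the threshold."""
    if samples and frame_power(samples) >= threshold:
        return (UTTERANCE_DETECTION, voice_id)
    return None
```

A practical detector would usually smooth the power over several frames to avoid firing on clicks, but that refinement is outside what the text describes.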

The behavior determination unit 110 decides to have the robot 100 execute the behavior associated with the number "2", which indicates the event "utterance detection". Specifically, it determines a behavior of turning toward "Sakura" based on the position information of the first identification tag 302 acquired from the position information acquisition unit 108. The behavior determination unit 110 causes the behavior execution unit 112 to execute the determined behavior.

As another example, when a voice recognition result is output for the voice data associated with the voice identification information "1111", the event output unit 106 detects that this voice data meets the condition associated with the event "voice recognition result". The event output unit 106 outputs the number "3" indicating that event to the behavior determination unit 110 together with the voice identification information "1111".

The behavior determination unit 110 decides to have the robot 100 execute the behavior associated with the number "3", which indicates the event "voice recognition result". Specifically, it determines a behavior of turning toward "Sakura" based on the position information of the first identification tag 302 acquired from the position information acquisition unit 108. The behavior determination unit 110 then refers to the robot behavior storage unit 130 and the scenario storage unit 132, determines the behavior corresponding to the voice recognition result, and causes the robot 100 to execute it. For example, if the voice recognition result is "Hello", the robot 100 is made to say something like "Sakura-chan, hello".
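The three events described so far map to behaviors that all depend on the speaker's position. The following sketch shows one possible dispatch, under several assumptions: positions are 2D coordinates in a shared frame, "turning toward" is expressed as a heading angle computed with `atan2`, and the action tuples, the `SCENARIO` table, and the function names are hypothetical. Only the event numbers and the example reply come from the text.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical scenario table: recognized phrase -> reply template.
SCENARIO = {"hello": "{name}-chan, hello"}


def heading_to(robot_xy: Point, speaker_xy: Point) -> float:
    """Heading in degrees from the robot to the speaker (0 = +x axis)."""
    dx = speaker_xy[0] - robot_xy[0]
    dy = speaker_xy[1] - robot_xy[1]
    return math.degrees(math.atan2(dy, dx))


def decide_behavior(
    event: int,
    robot_xy: Point,
    speaker_xy: Point,
    name: str = "",
    recognized: Optional[str] = None,
) -> List[tuple]:
    """Return a list of action tuples for the given event number."""
    if event == 1:
        # Voice input failure: approach, then listen via the internal microphone.
        return [("approach", speaker_xy), ("use_internal_microphone",)]
    # Events 2 and 3 both begin by turning toward the speaker.
    actions: List[tuple] = [("turn", heading_to(robot_xy, speaker_xy))]
    if event == 3 and recognized in SCENARIO:
        # Voice recognition result: also reply according to the scenario.
        actions.append(("say", SCENARIO[recognized].format(name=name)))
    return actions
```

For instance, `decide_behavior(3, (0.0, 0.0), (0.0, 2.0), "Sakura", "hello")` yields a turn action followed by a spoken reply addressed to Sakura.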

As a further example, when the voice power of the voice data associated with the voice identification information "1111" and that of the voice data associated with the voice identification information "1112" reach or exceed the predetermined threshold at substantially the same time, the robot 100 can perform the same processing for each of the participants concerned in turn. In doing so, the robot 100 turns toward the speakers. For example, it can turn alternately toward "Sakura" and "Momo".

Next, consider a case in which the robot 100 poses a quiz, saying "Answer the question I am about to ask when I say 'se-no' (ready, go)", and obtains answers from three children simultaneously. Here, for example, event 2 "utterance detection" is set not to be executed.

When the voices of the three children are input via the communication interface 24, the voice recognition unit 104 performs voice recognition on each set of voice data. The robot 100 turns toward each child whose voice has been recognized. If the voice recognition results for the three children are output at different times, the child whose recognition result is detected first is processed first, followed by the next child.

For example, suppose the robot 100 says "Is a fish a creature or a plant? Tell me which. Se-no!" and the answers of Sakura, Momo, and Taro are recognized, in that order, as "plant", "creature", and "creature". The robot 100 first turns toward "Sakura", then toward "Momo", and finally toward "Taro". The robot 100 then performs the behavior corresponding to the voice recognition results, for example by saying "The correct answer is creature. Momo-chan and Taro-chan got it right. Too bad, Sakura-chan. Good luck next time".
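The simultaneous-answer quiz can be sketched as a small grading step over the recognition results in the order they arrive. This is an illustration under assumptions: the function `grade_simultaneous` and its return shape are hypothetical, and exact string comparison stands in for whatever matching the scenario storage actually uses.

```python
from typing import List, Sequence, Tuple


def grade_simultaneous(
    results: Sequence[Tuple[str, str]], correct: str
) -> Tuple[List[str], List[str], List[str]]:
    """Grade (name, answer) pairs given in recognition order.

    Returns (order in which to face the children, names with the correct
    answer, names with a wrong answer).
    """
    face_order = [name for name, _ in results]
    right = [name for name, answer in results if answer == correct]
    wrong = [name for name, answer in results if answer != correct]
    return face_order, right, wrong


# The example from the text: recognition order Sakura, Momo, Taro.
order, right, wrong = grade_simultaneous(
    [("Sakura", "plant"), ("Momo", "creature"), ("Taro", "creature")],
    correct="creature",
)
```

With the example input, the robot would face Sakura, Momo, and Taro in that order, then announce Momo and Taro as correct.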

Next, consider a case in which the robot 100 poses a quiz, saying "Answer the question I am about to ask. First come, first served", and obtains answers from the three children in order of speed. Here, for example, both event 2 "utterance detection" and event 3 "voice recognition result" are set to be executed.

In this case, for example, "record utterance start time" can be set as the behavior corresponding to event 2 "utterance detection". This allows the robot 100 to identify the participant who started speaking first.

For example, suppose the robot 100 asks "What is the black bird that flies in the sky?" and Sakura and Momo, in that order, answer "Yes, it's a crow" and "Crow". Both answers are correct. Sakura began answering first, but Momo's "Crow" has fewer syllables and takes less time to say, so its voice recognition result may be output first. If only the order in which the voice recognition results are output were considered, Momo would be treated as the one who answered correctly. However, it was Sakura who thought of the correct answer and began speaking first, so the robot 100 can treat Sakura, whose utterance was detected first and whose answer was correct, as the one who answered correctly.

FIG. 9 is a flowchart showing the procedure of the behavior determination unit 110 for performing this processing.
Although not shown in the figure, the controller 10 includes a storage area that holds an utterance start queue, which stores the voice identification information associated with utterance detection events in the order in which the events were output, and a voice recognition result queue, which stores the voice identification information associated with voice recognition result events in the order in which the events were output.

When the event output unit 106 outputs an utterance detection event (YES in S200), the behavior determination unit 110 adds the voice identification information associated with that event to the utterance start queue (S202).

When the event output unit 106 outputs a voice recognition result event (YES in S204), the behavior determination unit 110 determines whether the voice identification information associated with that event is the same as the voice identification information at the head of the utterance start queue (S206). If it is the same (YES in S206), the behavior determination unit 110 refers to the corresponding behavior storage unit 118, the identification information storage unit 120, the robot behavior storage unit 130, the scenario storage unit 132, and so on, and determines the behavior of the robot 100 toward the speaker associated with that voice identification information (S208).

Next, that voice identification information is deleted from the utterance start queue (S210). It is then determined whether the voice identification information now at the head of the utterance start queue is in the voice recognition result queue (S212). If it is (YES in S212), that voice identification information is deleted from the voice recognition result queue (S214). The procedure then returns to step S208, and the behavior of the robot 100 toward the speaker associated with that voice identification information is determined.

On the other hand, if in step S206 the voice identification information is not the same as that at the head of the utterance start queue (NO in S206), the voice identification information is added to the voice recognition result queue (S216).

If in step S212 the voice identification information at the head of the utterance start queue is not in the voice recognition result queue (NO in S212), or after step S216, it is determined whether to end the processing (S218). If the processing is not to end (NO in S218), the procedure returns to step S200 and waits.

If the processing is to end in step S218 (YES in S218), the procedure ends.
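The two-queue procedure of FIG. 9 can be sketched as follows. This is a minimal illustration rather than the controller's actual code: the class `AnswerOrderer` and its method names are hypothetical, the recognized text is stored alongside the voice identification information for convenience, and the handling of a recognition result that arrives while the utterance start queue is empty (it is simply held as pending) is an assumption, since the text does not cover that case.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class AnswerOrderer:
    """Handles recognition results in the order the utterances began."""

    def __init__(self) -> None:
        # Utterance start queue: voice ids in order of utterance detection.
        self.start_queue: Deque[str] = deque()
        # Voice recognition result queue: results waiting for their turn.
        self.pending: Dict[str, str] = {}

    def on_utterance_detected(self, voice_id: str) -> None:
        """S200/S202: append the speaker to the utterance start queue."""
        self.start_queue.append(voice_id)

    def on_recognition_result(self, voice_id: str, text: str) -> List[Tuple[str, str]]:
        """S204 onward: return the (voice_id, text) pairs to respond to now."""
        handled: List[Tuple[str, str]] = []
        # S206: is this speaker at the head of the utterance start queue?
        if not self.start_queue or self.start_queue[0] != voice_id:
            self.pending[voice_id] = text  # S216: hold the result
            return handled
        while True:
            handled.append((voice_id, text))  # S208: decide behavior
            self.start_queue.popleft()        # S210: remove from start queue
            # S212: does the new head already have a held result?
            if not self.start_queue or self.start_queue[0] not in self.pending:
                return handled
            voice_id = self.start_queue[0]
            text = self.pending.pop(voice_id)  # S214: remove held result
```

In the crow example, Sakura ("1111") starts speaking before Momo ("1112") but Momo's result arrives first. Momo's result is held, and once Sakura's result arrives both are released in the order Sakura, Momo.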

In this way, the robot 100 can determine its response to other participants in accordance with utterances from a plurality of participants.

The processing described above as specific examples can be realized by making appropriate settings in the corresponding behavior storage unit 118, the robot behavior storage unit 130, and the scenario storage unit 132.

As described above, the robot 100 in the present embodiment executes behavior that depends on the event and on the position of the speaker, based on the output of an event triggered by the speaker's voice, so that communication between the robot 100 and the speaker can be made smoother.

(Second Embodiment)
The present embodiment differs from the first embodiment in that the event output unit is provided not in the robot 100 but in an event output server located outside the robot 100. In the present embodiment, part of the response determination system is incorporated in the robot 100 and the remaining part is incorporated in the event output server.

FIG. 10 is a block diagram showing the configuration of the event output server in the present embodiment.

In the present embodiment, the voice data output from the first voice output unit 304, the second voice output unit 314, and the third voice output unit 324, together with its voice identification information, is transmitted to the first event output server 200 or the second event output server 220. The first event output server 200 or the second event output server 220 detects whether the received voice data meets a predetermined condition and, if it does, transmits an event indicating that the condition is met to the robot 100 together with the voice identification information. The robot 100 determines and executes its behavior based on the event and voice identification information output from the first event output server 200 or the second event output server 220.

Although only two event output servers are shown here, one event output server may be provided for each participant, or a single event output server may be shared by a plurality of participants. The following description takes as an example the case in which voice data from the first voice output unit 304 is transmitted to the first event output server 200 and voice data from the second voice output unit 314 and the third voice output unit 324 is transmitted to the second event output server 220.

The first event output server 200 includes a voice input unit 202, a voice recognition unit 204, an event output unit 206, a data output unit 208, a voice recognition dictionary 210, a condition storage unit 212, and a communication interface (I/F) 214. The second event output server 220 has the same configuration as the first event output server 200.

The communication interface 214 receives data from the first voice output unit 304 via the network 400.

The voice input unit 202 takes in the voice data from the first voice output unit 304 received by the communication interface 214, in association with the voice identification information. The voice input unit 202 outputs the voice data, together with the voice identification information, to the voice recognition unit 204 and the event output unit 206.

The voice recognition unit 204 performs voice recognition on the voice data input by the voice input unit 202. The voice recognition dictionary 210 includes a voice recognition word storage unit that stores a voice recognition vocabulary, which is a set of words to be recognized. The voice recognition unit 204 matches the voice data input by the voice input unit 202 against the voice recognition vocabulary stored in the voice recognition dictionary 210. When the voice data has been recognized, the voice recognition unit 204 outputs the voice recognition result to the event output unit 206 in association with the voice identification information.

Based on the voice data output from the voice input unit 202 and the voice recognition result output from the voice recognition unit 204, the event output unit 206 detects whether the voice data meets a predetermined condition and, if it does, outputs an event indicating that the condition is met in association with the voice identification information. When the event output unit 206 has obtained a voice recognition result from the voice recognition unit 204, it outputs the voice recognition result together with the event.

The condition storage unit 212 has the same configuration as the condition storage unit 116 described with reference to FIG. 4 in the first embodiment. The event output unit 206 refers to the condition storage unit 212 to detect whether the voice data meets a predetermined condition.

The data output unit 208 performs processing for transmitting data such as the events output by the event output unit 206 to the robot 100. The communication interface 214 transmits the data to the robot 100 in accordance with instructions from the data output unit 208.
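The data sent from an event output server to the robot consists of an event, the voice identification information, and optionally a voice recognition result. The specification does not define a wire format, so the sketch below simply assumes a JSON message encoded as UTF-8. The field names `event`, `voice_id`, and `recognition_result` and both function names are hypothetical.

```python
import json
from typing import Optional, Tuple


def encode_event(
    event: int, voice_id: str, recognition_result: Optional[str] = None
) -> bytes:
    """Server side: serialize an event for transmission to the robot."""
    message = {"event": event, "voice_id": voice_id}
    if recognition_result is not None:
        # Only voice recognition result events carry recognized text.
        message["recognition_result"] = recognition_result
    return json.dumps(message, ensure_ascii=False).encode("utf-8")


def decode_event(payload: bytes) -> Tuple[int, str, Optional[str]]:
    """Robot side: recover (event, voice_id, recognition_result)."""
    message = json.loads(payload.decode("utf-8"))
    return message["event"], message["voice_id"], message.get("recognition_result")
```

Because every message carries the voice identification information, the robot can treat events from any number of event output servers uniformly.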

FIG. 11 is a block diagram showing the configuration of the robot 100 in the present embodiment.
This robot 100 differs from the robot 100 shown in FIG. 4 in that it does not have the event output unit 106 or the condition storage unit 116 and instead has a data input unit 122.

The communication interface 24 receives events associated with voice identification information from the first event output server 200 and the second event output server 220 via the network 400.

The data input unit 122 takes in the data received by the communication interface 24 from the plurality of event output servers, namely the first event output server 200 and the second event output server 220. The data input unit 122 outputs this data to the behavior determination unit 110.

Based on the event output from the data input unit 122, the behavior determination unit 110 reads the behavior of the robot 100 corresponding to that event from the corresponding behavior storage unit 118. The behavior determination unit 110 then refers to the identification information storage unit 120 and reads the tag identification information corresponding to the voice identification information in question. The subsequent processing is the same as that of the robot 100 in the first embodiment.

The voice input unit 102 takes in the voice data picked up by the internal microphone 13 and outputs it to the voice recognition unit 104. The voice recognition unit 104 performs voice recognition on this voice data.

FIG. 12 is a flowchart showing the processing procedure of the first voice output unit 304, the first event output server 200, and the robot 100 in the present embodiment.

The first voice output unit 304 continuously transmits the voice data of the first participant 300 input from the first microphone 306 to the first event output server 200 (S300).

第一のイベント出力サーバ２００において、音声入力部２０２は第一の音声出力部３０４から送信された音声データを音声認識部２０４およびイベント出力部２０６に出力する。音声認識部２０４は、音声データを音声認識した場合、音声認識結果をイベント出力部２０６に出力する。イベント出力部２０６において、音声入力部２０２または音声認識部２０４から出力される音声に関するデータが所定の条件に合致した場合（Ｓ３０２）、イベント出力部２０６は、その条件に合致したことを示すイベントを音声識別情報に対応づけて出力する。イベント、音声識別情報、および音声認識結果がロボット１００に送信される（Ｓ３０４）。 In the first event output server 200, the voice input unit 202 outputs the voice data transmitted from the first voice output unit 304 to the voice recognition unit 204 and the event output unit 206. When the speech recognition unit 204 recognizes speech data, the speech recognition unit 204 outputs the speech recognition result to the event output unit 206. In the event output unit 206, when the data related to the voice output from the voice input unit 202 or the voice recognition unit 204 meets a predetermined condition (S302), the event output unit 206 displays an event indicating that the condition is met. Output in correspondence with the voice identification information. The event, the voice identification information, and the voice recognition result are transmitted to the robot 100 (S304).
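The condition check in S302 can be sketched as follows. This is a minimal Python sketch, not the implementation of the embodiment: it assumes the three conditions recited in the claims (voice power staying at or below a threshold for a set time, voice power exceeding a threshold, and a speech recognition result being output). The event names, the frame format, the threshold, and the time limit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Event:
    kind: str                    # hypothetical event name
    voice_id: str                # voice identification information of the speaker
    text: Optional[str] = None   # speech recognition result, if any


# One frame: (duration in seconds, voice power, recognized text or None).
Frame = Tuple[float, float, Optional[str]]


def detect_events(
    voice_id: str,
    frames: Iterable[Frame],
    power_threshold: float = 0.1,   # assumed value
    silence_limit: float = 3.0,     # assumed value, in seconds
) -> Iterator[Event]:
    """Yield events for one speaker's voice stream, tagged with voice_id."""
    silent_for = 0.0
    speaking = False
    failure_reported = False
    for duration, power, text in frames:
        if power > power_threshold:
            # Power exceeds the threshold: report the start of an utterance once.
            silent_for = 0.0
            failure_reported = False
            if not speaking:
                speaking = True
                yield Event("utterance_detected", voice_id)
        else:
            # Power at or below the threshold: accumulate silent time.
            speaking = False
            silent_for += duration
            if silent_for >= silence_limit and not failure_reported:
                failure_reported = True
                yield Event("voice_input_failure", voice_id)
        if text is not None:
            # A recognition result is forwarded together with its event.
            yield Event("speech_recognized", voice_id, text)
```

Because every event carries the voice identification information, the receiving side can tell which speaker an event belongs to without inspecting the audio itself.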

In the robot 100, when the behavior determination unit 110 acquires an event from the first event output server 200, it acquires the position information of the corresponding participant based on the tag identification information that corresponds to the voice identification information associated with the event (S306). The behavior determination unit 110 then refers to the corresponding behavior storage unit 118, the identification information storage unit 120, the robot behavior storage unit 130, and the scenario storage unit 132, and determines the behavior of the robot 100 based on the position information of the participant (S308). Next, it causes the mechanical control unit 134, the speech synthesis unit 136, and other units to execute the behavior (S310).
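Steps S306 and S308 amount to a chain of lookups: voice identification information to tag identification information, tag to position, and event to behavior. The Python sketch below illustrates that chain under stated assumptions. The table contents, identifiers, event names, and utterances are hypothetical, and the fixed position table merely stands in for whatever the tag reader reports.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    turn_to_deg: float   # direction the robot should face, in degrees
    utterance: str       # what the robot should say


# Stand-in for the identification information storage unit 120 (values hypothetical).
VOICE_TO_TAG = {"voice-300": "tag-302", "voice-310": "tag-312"}

# Stand-in for tag positions relative to the robot, in metres (values hypothetical).
TAG_POSITIONS = {"tag-302": (1.0, 1.0), "tag-312": (-2.0, 0.0)}

# Stand-in for the corresponding behavior storage unit 118 (values hypothetical).
EVENT_TO_UTTERANCE = {
    "utterance_detected": "Yes?",
    "voice_input_failure": "Is your microphone working?",
}


def decide_behavior(event_kind: str, voice_id: str) -> Behavior:
    """Pick a behavior from the event and the speaker's position."""
    tag_id = VOICE_TO_TAG[voice_id]               # voice ID -> tag ID
    x, y = TAG_POSITIONS[tag_id]                  # tag ID -> position (S306)
    bearing = math.degrees(math.atan2(y, x))      # direction of the speaker
    utterance = EVENT_TO_UTTERANCE[event_kind]    # event -> behavior (S308)
    return Behavior(turn_to_deg=bearing, utterance=utterance)
```

For instance, `decide_behavior("voice_input_failure", "voice-300")` returns a behavior that faces the first participant's tag and asks about the microphone.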

The present embodiment provides the same effects as the first embodiment. In addition, because the event output unit is provided in an event output server outside the robot 100 and processing such as speech recognition is also performed by the event output server, the processing of the robot 100 itself can be simplified. Furthermore, by providing a plurality of event output servers and distributing the processing of voices from a plurality of speakers among them, processing such as speech recognition can be performed efficiently.

Each component of the robot and the event output server described in the above embodiments is realized by any combination of hardware and software, centered on the CPU of an arbitrary computer, a memory, a program loaded into the memory that implements the components shown in the figures, a storage unit such as a hard disk that stores the program, and a network connection interface. Those skilled in the art will understand that there are various modifications to the implementation method and apparatus.

The embodiments of the present invention have been described above with reference to the drawings, but they are examples of the present invention, and various configurations other than those described above can also be adopted.

The above embodiments show a plurality of conditions and their corresponding events, for example in FIGS. 5 and 6. The robot or the event output server can be configured to judge in parallel whether each of these conditions is met, or to judge whether only one of the conditions is met. Which conditions trigger event output can be set as appropriate for the way the robot is used.

Although the above embodiments show a configuration in which the robot 100 has the tag reader 26, the tag reader can also be provided outside the robot 100. In this case, the robot 100 can acquire the position information of a speaker based on the information of the speaker's identification tag read by that tag reader.

For example, when the identification tag is an RFID tag, the robot 100 can acquire the position information of the speaker as follows. First, a plurality of tag readers are installed at predetermined positions in the room where the robot 100 and the speaker are present, and the robot 100 stores the positions of these tag readers in advance. For example, three or more tag readers are installed, and each tag reader transmits to the robot 100 the radio wave intensity of the speaker's identification tag that it has read. Based on this information, the robot 100 acquires the position information of the identification tag held by the speaker.
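The text does not say how the radio wave intensities are turned into a position. One common approach, shown here purely as an assumption, is to convert each intensity to a distance with a log-distance path-loss model and then solve for the tag position by least-squares trilateration. The function names and the model parameters (`tx_power_dbm`, `path_loss_exponent`) in this Python sketch are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rssi_to_distance(rssi_dbm: float, tx_power_dbm: float = -40.0,
                     path_loss_exponent: float = 2.0) -> float:
    """Assumed model: rssi = tx_power - 10 * n * log10(distance in metres)."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10.0 * path_loss_exponent))


def trilaterate(readers: Sequence[Point], distances: Sequence[float]) -> Point:
    """Estimate (x, y) of a tag from three or more reader positions and distances."""
    if len(readers) < 3 or len(readers) != len(distances):
        raise ValueError("need three or more readers, one distance per reader")
    (x0, y0), d0 = readers[0], distances[0]
    # Subtracting the first circle equation from each of the others
    # leaves linear equations of the form a1 * x + a2 * y = b.
    rows: List[Tuple[float, float, float]] = []
    for (xi, yi), di in zip(readers[1:], distances[1:]):
        a1 = 2.0 * (xi - x0)
        a2 = 2.0 * (yi - y0)
        b = d0 ** 2 - di ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2
        rows.append((a1, a2, b))
    # Solve the 2x2 normal equations of the least-squares problem.
    s11 = sum(a1 * a1 for a1, _, _ in rows)
    s12 = sum(a1 * a2 for a1, a2, _ in rows)
    s22 = sum(a2 * a2 for _, a2, _ in rows)
    t1 = sum(a1 * b for a1, _, b in rows)
    t2 = sum(a2 * b for _, a2, b in rows)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("tag readers must not all lie on one line")
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)
```

Three readers that are not on one line are the minimum for a unique position in the plane, which is consistent with the "three or more tag readers" in the text. Real signal strengths are noisy, so additional readers would improve the least-squares estimate.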

Although the above embodiments show a form in which each speaker (participant) is given a communication terminal device including an audio output unit, together with an identification tag, the position information of the speaker can also be acquired from the position of the communication terminal device without providing an identification tag. For example, the position information of the communication terminal device can be acquired by using a PDA terminal with a GPS function as the communication terminal device, or by using the radio waves transmitted from the communication terminal device.

In this way, the position information of the speaker can be acquired by various methods, which are not limited to those described in the above embodiments.

The above embodiments show examples in which the robot 100 moves according to the position information of the speaker, for instance by turning toward or approaching the speaker. However, the robot 100 can also be configured to only speak according to the position information of the speaker. For example, when a speaker says something like "Let's go over to the lake" and the robot detects from that speaker's position information that the speaker is approaching a dangerous area, the robot can simply say something like "○○-chan, it's dangerous to go that way."
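The position check behind such a speech-only response might look like the Python sketch below. It covers only the position test, not the recognition of the utterance itself. The circular danger zone, its coordinates, the warning margin, and the function name are all hypothetical.

```python
import math
from typing import Optional, Tuple

# Hypothetical danger zones: name -> (center (x, y) in metres, radius in metres).
DANGER_ZONES = {"lake": ((10.0, 0.0), 3.0)}


def speech_only_response(name: str, position: Tuple[float, float],
                         warn_margin: float = 2.0) -> Optional[str]:
    """Return a warning utterance if the speaker is near a danger zone, else None."""
    for center, radius in DANGER_ZONES.values():
        # Warn once the speaker comes within warn_margin of the zone's edge.
        if math.dist(position, center) <= radius + warn_margin:
            return f"{name}, it's dangerous to go that way."
    return None
```

Returning `None` when no zone is nearby lets the caller fall back to whatever behavior the event alone would trigger.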

When the robot 100 communicates with a plurality of speakers, it can also be controlled so that it executes its behavior efficiently according to the events output for those speakers. For example, suppose that children are lined up in the order Sakura, Momo, Taro, and that they say "Hello" and are recognized in the order Sakura, Taro, Momo. If the robot 100 responded to each event as it arrived, it would first have to face Sakura, then pass over Momo to face Taro, and then turn back toward Momo. To avoid this, when an event is output for one speaker, the robot 100 can wait for a predetermined time, check whether events have been output for other speakers, and only then act. When events have been output for a plurality of speakers, the robot 100 can act efficiently according to the position information of those speakers. In the example above, if Sakura, Momo, and Taro all say "Hello" and are recognized within the predetermined time, the robot 100 can use their position information to face Sakura, Momo, and Taro in that order and say "Hello" or the like to each.
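The waiting-and-reordering idea can be sketched in Python as follows. This is an illustration under assumptions rather than the method of the embodiment: events are collected for a fixed window after the first one, and the speakers are then visited in a single sweep ordered by bearing. The function name, the window length, and the coordinate convention (x forward, y to the robot's left) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def plan_turn_order(
    events: Sequence[Tuple[float, str]],        # (timestamp in seconds, voice ID), sorted by time
    positions: Dict[str, Tuple[float, float]],  # voice ID -> (x, y) relative to the robot
    wait_seconds: float = 2.0,                  # assumed waiting time
) -> List[str]:
    """Return the speakers to face, ordered for a single left-to-right sweep."""
    if not events:
        return []
    first_time = events[0][0]
    batch: List[str] = []
    for timestamp, voice_id in events:
        # Keep only events that arrive within the waiting window, one per speaker.
        if timestamp - first_time <= wait_seconds and voice_id not in batch:
            batch.append(voice_id)

    def bearing(voice_id: str) -> float:
        x, y = positions[voice_id]
        return math.atan2(y, x)

    # Largest bearing (leftmost) first. This assumes all speakers are in front
    # of the robot, so bearings do not wrap around at +/-180 degrees.
    return sorted(batch, key=bearing, reverse=True)


# The example from the text: the children stand in the order Sakura, Momo, Taro,
# but greet the robot in the order Sakura, Taro, Momo.
positions = {"Sakura": (1.0, 1.0), "Momo": (1.0, 0.0), "Taro": (1.0, -1.0)}
greetings = [(0.0, "Sakura"), (0.5, "Taro"), (1.0, "Momo")]
order = plan_turn_order(greetings, positions)
```

With these inputs, `order` follows the standing order of the children rather than the order in which they spoke, so the robot never has to turn back.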

Although the above embodiments show a configuration in which the robot 100 has the internal microphone 13, the robot 100 can also be configured without it. In this case, for example, the robot 100 can act as appropriate without using the internal microphone 13, such as by turning toward the speaker whose voice input device has a voice input failure and saying something, or by bringing a new voice input device to that speaker.

Although not specifically described in the above embodiments, the controller 10 of the robot 100 can also analyze the surrounding situation based additionally on the image data sent from the CCD cameras 21A and 21B, and determine the behavior of the robot 100 accordingly.

The above embodiments describe, as an example, a robot that both moves autonomously and converses with speakers, but the robot may have only one of these functions.

The above embodiments also describe, as an example, the case where the response determination system and the response determination method are a robot control system and a robot control method. However, they can also be applied to, for example, a voice dialogue system or method that uses an anthropomorphic agent, or an information retrieval system or method that uses speech recognition. In such cases as well, when a user speaks into a voice input device such as a microphone, the system can respond appropriately according to the data related to the voice and the position of the user.

The above embodiments describe, as an example, the case where the network 400 is a wireless LAN, but the network 400 can also be any of various other wireless networks, such as Bluetooth, or a wired network.

Industrial applicability

The present invention can be applied to systems and methods that respond in some way to a speaker's voice, such as control systems for autonomously moving or interactive robots, voice dialogue systems, and information retrieval systems that use speech recognition.

FIG. 1 is a schematic diagram showing the relationship between the robot in the embodiment and the participants who are speakers.
FIG. 2 is an external view showing an example of the robot in the embodiment.
FIG. 3 is a block diagram showing an example of the electrical configuration of the robot.
FIG. 4 is a block diagram showing in detail the configuration of the controller of the robot in the embodiment.
FIG. 5 is a diagram showing an example of the internal configuration of the condition storage unit.
FIG. 6 is a diagram showing an example of the internal configuration of the corresponding behavior storage unit.
FIG. 7 is a diagram showing an example of the internal configuration of the identification information storage unit.
FIG. 8 is a flowchart showing the processing procedure of the controller of the robot in the embodiment.
FIG. 9 is a flowchart showing the processing procedure of the behavior determination unit in the embodiment.
FIG. 10 is a block diagram showing the configuration of the event output server in the embodiment.
FIG. 11 is a block diagram showing the configuration of the robot in the embodiment.
FIG. 12 is a flowchart showing the processing procedure of the first audio output unit, the first event output server, and the robot in the embodiment.

Explanation of symbols

1 Body
2 Head
3A, 3B Wheel
10 Controller
10A CPU
10B Memory
11 Battery
12 Speaker
13 Internal microphone
14A, 14B Actuator
21A, 21B CCD camera
22A, 22B Actuator
24 Communication I/F
26 Tag reader
100 Robot
102 Voice input unit
104 Voice recognition unit
106 Event output unit
108 Position information acquisition unit
110 Behavior determination unit
112 Behavior execution unit
114 Speech recognition dictionary
116 Condition storage unit
118 Corresponding behavior storage unit
120 Identification information storage unit
122 Data input unit
130 Robot behavior storage unit
132 Scenario storage unit
134 Mechanical control unit
136 Speech synthesis unit
138 Output unit
200 First event output server
202 Voice input unit
204 Voice recognition unit
206 Event output unit
208 Data output unit
210 Speech recognition dictionary
212 Condition storage unit
214 Communication I/F
220 Second event output server
222 Voice input unit
224 Voice recognition unit
226 Data output unit
228 Speech recognition dictionary
300 First participant
302 First identification tag
304 First audio output unit
306 First microphone
310 Second participant
312 Second identification tag
314 Second audio output unit
316 Second microphone
320 Third participant
322 Third identification tag
324 Third audio output unit
326 Third microphone

Claims

1. A response determination system comprising:
a response determination unit that receives an event indicating that data related to voice input from a voice input device assigned to a speaker matches a predetermined condition, together with voice identification information associated with the speaker, and determines a response to the speaker; and
a position information acquisition unit that acquires position information of the speaker identified by the voice identification information associated with the event received by the response determination unit,
wherein the response determination unit determines the response based on the event and the position information of the speaker.

2. The response determination system according to claim 1,
further comprising an event output unit that receives the data related to the voice together with the voice identification information, detects whether the data related to the voice matches a predetermined condition, and, when the condition is met, outputs an event indicating that the condition is met to the response determination unit together with the voice identification information.

3. The response determination system according to claim 2,
wherein the event output unit monitors the power of the voice input from the voice input device and outputs an event indicating a voice input failure when the power of the voice remains at or below a predetermined value for a predetermined time.

4. The response determination system according to claim 2 or 3,
wherein the event output unit monitors the power of the voice input from the voice input device and outputs an event indicating utterance detection when the power of the voice exceeds a predetermined value.

5. The response determination system according to any one of claims 2 to 4,
further comprising a speech recognition unit that receives the voice input from the voice input device assigned to the speaker together with the voice identification information associated with the speaker, performs speech recognition on the voice, and outputs a speech recognition result together with the voice identification information,
wherein the event output unit, when the speech recognition result is output from the speech recognition unit, outputs an event indicating a speech recognition result together with the speech recognition result.

6. The response determination system according to any one of claims 1 to 5,
further comprising a tag reader that reads tag identification information from an identification tag attached to the speaker,
wherein the position information acquisition unit acquires the position information of the speaker based on the tag identification information read by the tag reader.

7. The response determination system according to claim 6,
further comprising an identification information storage unit that stores the voice identification information and the tag identification information of the same speaker in association with each other,
wherein the position information acquisition unit refers to the identification information storage unit based on the voice identification information associated with the event output by the event output unit, and acquires the position information of the identification tag having the corresponding tag identification information.

8. The response determination system according to any one of claims 1 to 7,
wherein the response determination unit receives an event indicating that data related to voice input from a plurality of voice input devices respectively assigned to a plurality of speakers matches a predetermined condition, together with the voice identification information associated with each speaker.

9. The response determination system according to any one of claims 1 to 8,
wherein the response determination unit determines the behavior of an autonomously moving or interactive robot based on the event and the position information of the speaker.

10. The response determination system according to claim 9,
wherein the position information acquisition unit acquires the position information of the speaker relative to the robot, and
the response determination unit determines the behavior of the robot so that the robot behaves with awareness of the position of the speaker.

11. An autonomously moving or interactive robot comprising:
the response determination system according to any one of claims 1 to 9; and
a behavior execution unit that executes, as behavior, the response determined by the response determination unit.

12. An event output server that is connected via a network to a communication terminal device, which includes an audio output unit that outputs voice input from a voice input device, and to an autonomously moving or interactive robot, and that relays between them,
wherein the robot
receives an event indicating that data related to voice input from the voice input device assigned to a speaker matches a predetermined condition, together with voice identification information associated with the speaker, acquires position information of the speaker identified by the voice identification information associated with the event, and executes behavior determined based on the event and the position information of the speaker,
the event output server comprising:
a voice input unit that receives, from the communication terminal device, the voice output by the audio output unit together with the voice identification information;
an event output unit that detects whether the voice received by the voice input unit matches a predetermined condition and, when the condition is met, outputs an event corresponding to the condition together with the voice identification information; and
a data output unit that transmits the event output by the event output unit to the robot together with the voice identification information.

13. The event output server according to claim 12,
wherein the event output unit monitors the power of the voice input from the voice input device and outputs an event indicating a voice input failure when the power of the voice remains at or below a predetermined value for a predetermined time.

14. The event output server according to claim 12 or 13,
wherein the event output unit monitors the power of the voice input from the voice input device and outputs an event indicating utterance detection when the power of the voice exceeds a predetermined value.

15. The event output server according to any one of claims 12 to 14,
further comprising a speech recognition unit that receives the voice input from the voice input device assigned to the speaker together with the voice identification information associated with the speaker, performs speech recognition on the voice, and outputs a speech recognition result together with the voice identification information,
wherein the event output server, when the speech recognition result is output from the speech recognition unit, outputs an event indicating a speech recognition result together with the speech recognition result.

16. A response determination method comprising:
receiving an event indicating that data related to voice input from a voice input device assigned to a speaker matches a predetermined condition, together with voice identification information associated with the speaker;
acquiring position information of the speaker identified by the voice identification information associated with the event received in the step of receiving the event together with the voice identification information; and
determining a response to the speaker based on the event and the position information of the speaker.

17. The response determination method according to claim 16,
further comprising, before the step of receiving the event together with the voice identification information, a step of receiving the data related to the voice together with the voice identification information, detecting whether the data related to the voice matches a predetermined condition, and, when the condition is met, outputting an event indicating that the condition is met together with the voice identification information.

18. The response determination method according to claim 17,
wherein the step of outputting the event together with the voice identification information includes:
monitoring the power of the voice input from the voice input device; and
outputting an event indicating a voice input failure when the power of the voice remains at or below a predetermined value for a predetermined time.

19. The response determination method according to claim 17 or 18,
wherein the step of outputting the event together with the voice identification information includes:
monitoring the power of the voice input from the voice input device; and
outputting an event indicating utterance detection when the power of the voice is equal to or greater than a predetermined value.

20. The response determination method according to any one of claims 17 to 19,
further comprising a step of receiving the voice input from the voice input device assigned to the speaker together with the voice identification information associated with the speaker, performing speech recognition on the voice, and outputting a speech recognition result together with the voice identification information,
wherein the step of outputting the event together with the voice identification information includes outputting, when the speech recognition result is output, an event indicating the speech recognition result together with the speech recognition result.

21. The response determination method according to any one of claims 16 to 20,
wherein, in the step of receiving the event, an event indicating that data related to voice input from a plurality of voice input devices respectively assigned to a plurality of speakers matches a predetermined condition is received together with the voice identification information associated with each speaker.

22. The response determination method according to any one of claims 16 to 21,
wherein, in the step of determining the response, the behavior of an autonomously moving or interactive robot is determined based on the event and the position information of the speaker.

23. A program that causes a computer to function as:
means for receiving an event indicating that data related to voice input from a voice input device assigned to a speaker matches a predetermined condition, together with voice identification information associated with the speaker;
position information acquisition means for acquiring position information of the speaker identified by the voice identification information associated with the event; and
response determination means for determining a response to the speaker based on the event and the position information of the speaker.

24. The program according to claim 23,
wherein the response determination means determines the behavior of an autonomously moving or interactive robot based on the event and the position information of the speaker.

25. A program that causes a computer to function as:
means for receiving data related to voice input from a voice input device assigned to a speaker, together with voice identification information associated with the speaker;
event output means for detecting whether the data related to the voice matches a predetermined condition and, when the condition is met, outputting an event indicating that the condition is met together with the voice identification information;
position information acquisition means for acquiring position information of the speaker identified by the voice identification information associated with the event; and
response determination means for determining a response to the speaker based on the event and the position information of the speaker.

26. The program according to claim 25,
wherein the response determination means determines the behavior of an autonomously moving or interactive robot based on the event and the position information of the speaker.