JP5332602B2

JP5332602B2 - Service providing equipment

Info

Publication number: JP5332602B2
Application number: JP2008333609A
Authority: JP
Inventors: 信弥櫻田
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2008-12-26
Filing date: 2008-12-26
Publication date: 2013-11-06
Anticipated expiration: 2028-12-26
Also published as: JP2010156741A

Description

The present invention relates to a mobile service providing apparatus that transmits and receives information by voice.

In recent years, bipedal walking robots that provide various services to humans have been developed. These bipedal robots exchange information with a server wirelessly (see Non-Patent Documents 1 and 2).

An invention has also been disclosed concerning an interactive robot that holds a voice conversation with a human and provides a service according to the content of that conversation (see Patent Document 1). In that invention, robots cooperate with one another by exchanging information over a predetermined communication protocol in order to provide services to humans.

An invention has also been disclosed in which an identifier or instruction information is superimposed on the voice uttered by a robot, so that robots communicate information with one another through voice conversation (see Patent Document 2).

Non-Patent Document 1: Robot Watch, "ASIMO takes a step into the real world: cooperative operation by multiple units shown publicly", [online], [retrieved December 25, 2008], Internet <URL: http://robot.watch.impress.co.jp/cda/news/2007/12/12/793.html>
Non-Patent Document 2: Robot Watch, "ASIMO takes a step into the real world: cooperative operation by multiple units shown publicly", System overview. Dynamically allocating functions (enlarged photo), [online], [retrieved December 25, 2008], Internet <URL: http://robot.watch.impress.co.jp/cda/parts/image_for_link/23110-793-10-1.html>
Patent Document 1: International Publication No. 2005/086051 pamphlet
Patent Document 2: JP 2000-207170 A

As described above, the robots of Non-Patent Documents 1 and 2 and the robot of Patent Document 1 exchange information with a server or with other robots by data communication, without any voice conversation. A person receiving the service therefore cannot tell what information the robots are exchanging, and may become uneasy.

In addition, when several robots are all of the same model and their synthesized voices have the same timbre, the voices cannot be told apart, so it is impossible to tell which robot spoke, and communication is hindered as a result.

Furthermore, when information is exchanged among a plurality of robots by voice alone, as in Patent Document 2, the amount of information that can be carried is smaller than with wireless or other data communication, so it is sometimes impossible to pass on all the information the robots need.

Accordingly, an object of the present invention is to provide a service providing apparatus with which a person receiving the service can also understand the content of the information exchanged among a plurality of apparatuses, with which apparatuses can be identified even when they share the same voice timbre, and with which all necessary information can be exchanged.

When the service providing apparatus of the present invention has emitted voice toward another apparatus through its sound emitting means and then picks up, through its sound collecting means, voice containing identification information of another apparatus, it exchanges predetermined information with the apparatus corresponding to that identification information through the sound emitting means and the sound collecting means, and also exchanges auxiliary information with it through the communication means.

As a result, the person receiving the service can also understand the content of the information, and it is possible to identify which apparatus uttered a voice even when the voice timbres are the same. Moreover, all necessary information can be exchanged even when voice alone cannot carry all of it.

The service providing apparatus of the present invention also directs the sound emitting means toward the direction of arrival of the voice detected by the direction detecting means. The apparatus can thus emit sound while facing the speaker, which resembles how humans behave in conversation and does not strike the user as unnatural.

The service providing apparatus of the present invention also uses a part of the sample signals for speech synthesis as a pseudo noise signal and superimposes information by controlling the polarity of that pseudo noise signal. Individual identification information can thereby be exchanged without sounding unnatural to a listener.

According to the present invention, information is exchanged among a plurality of service providing apparatuses using voice on which individual identification information is superimposed, so a person receiving the service can also understand the content of the information, and it is possible to identify which apparatus uttered a voice even when the voice timbres are the same. In addition, when the extraction means extracts identification information from the voice of another apparatus, the apparatus exchanges information with that apparatus by voice and also through the communication means, so all necessary information can be exchanged even when voice alone cannot carry all of it.

A service providing system including the service providing apparatus of the present invention is described below, taking a robot as an example of the service providing apparatus. FIG. 1 is a block diagram of the service providing system.

In this embodiment, A/D converters and D/A converters are omitted from the description, and all processing is assumed to be digital unless otherwise noted.

As shown in FIG. 1, the service providing system 1 includes, as an example, robots 2 to 4. The robots 2 to 4 converse by voice with a user 5 to whom services are provided, and provide various services according to the content of the conversation. Since the robots 2 to 4 have the same configuration, the configuration of the robot 2 is described below as representative. The robot 3 is assumed to be within the range that voice uttered by the robot 2 can reach, and the robot 4 is assumed to be on the far side of a wall 6, where voice uttered by the robot 2 does not reach.

The robot 2 includes a sensor unit 11, a conversation engine 12, an utterance unit 13, a storage unit 14, a receiving unit 15, a communication unit 16, an operation unit 17, and a control unit 18.

In the robot 2, the control unit 18 outputs a control signal to the conversation engine 12 so that the robot outputs voice corresponding to, for example, human movement detected by the sensor unit 11 or the content of voice information recognized by the receiving unit 15 and the conversation engine 12.

The conversation engine 12 creates an utterance sentence (text information) according to the control signal from the control unit 18 and outputs it to the utterance unit 13.

The utterance unit 13 synthesizes voice based on the text information input from the conversation engine 12, superimposes the individual ID (individual identification information) read from the storage unit 14 on the synthesized voice, and emits the result from a speaker (not shown). When the individual ID is superimposed on the synthesized voice, it is preferably processed first so that it does not sound unnatural. For example, the individual ID may be superimposed as an acoustic watermark (for example, in the ultrasonic band).

The utterance unit 13 also superimposes predetermined information input from the control unit 18 on the synthesized voice and emits it from the speaker.

The storage unit 14 stores the individual ID, information on the other robots 3 and 4, and the like.

In the robot 2, voice uttered by the other robots 3 and 4 (other apparatuses) or by the user 5 (the person to whom services are provided) is picked up by a microphone (not shown) of the receiving unit 15, speech recognition is performed, and the result is output to the conversation engine 12. The receiving unit 15 also extracts and decodes the individual ID superimposed on the voice signal; when an individual ID is extracted, the receiving unit 15 outputs to the control unit 18 the result of decoding which robot the ID belongs to. When no individual ID can be extracted, the receiving unit 15 notifies the control unit 18 of that fact. The robot 2 is provided with a plurality of microphones (at least two) so that the direction of arrival of voice can be detected.

The conversation engine 12 analyzes the input content and outputs the analysis result to the control unit 18.

The control unit 18 performs processing according to the analysis result from the conversation engine 12. It also performs processing according to predetermined information superimposed on the voice signal and processing according to information received via the communication unit 16. For example, when the user 5 says "move to the living room", the conversation engine 12 analyzes the utterance and the robot moves to the living room. When another apparatus (for example, the robot 3) says "move to the living room", the robot moves to the living room based on the movement instruction superimposed on that voice. At this time, status information (such as remaining battery level and current position) is exchanged with the other apparatus via the communication unit 16 as auxiliary information.

That is, when the control unit 18 has emitted voice calling out to other robots and then picks up voice containing identification information of another robot, in other words when an individual ID can be extracted from the voice signal, it causes the conversation engine 12 to output voice corresponding to the analysis result of the conversation engine 12, exchanges information (converses) with the robot corresponding to that ID (the robot 3), and also exchanges information through the communication unit 16. Alternatively, the control unit 18 controls the operation unit 17 according to the analysis result of the conversation engine 12 to perform a predetermined operation.

When the control unit 18 has emitted voice calling out to other robots but picks up no voice containing identification information of another robot (for example, when the robot 3 has also moved to the far side of the wall 6), in other words when no individual ID can be extracted from a voice signal, it determines that no other robot is within voice range and exchanges information with the other robots (the robots 3 and 4) through the communication unit 16. The communication unit 16 may use either wireless or wired communication.

When the control unit 18 picks up voice that contains no identification information of another robot without having emitted a calling voice, it determines that the voice is that of the user 5. It then causes the conversation engine 12 to output voice corresponding to the analysis result of the conversation engine 12 to converse with the user 5, or controls the operation unit 17 to perform a predetermined operation.

As noted above, the receiving unit 15 has a plurality of microphones, so when voice is picked up the control unit 18 can detect its direction of arrival (the direction in which the robot 3 or the user 5 is speaking). Specifically, the control unit 18 detects the direction of arrival by comparing the gains of the voice signals picked up by the microphones. On detecting the direction of arrival, the control unit 18 controls the operation unit 17 to turn the speaker of the utterance unit 13 (corresponding to the mouth of the robot 2) toward that direction. The robot 2 can thus emit sound while facing the speaker, which resembles how humans behave in conversation and does not strike the user as unnatural.
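The gain comparison used for direction detection can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the names `rms` and `arrival_direction`, the use of RMS level as the "gain", and the idea of describing each microphone by a facing angle are hypothetical and do not appear in this description.

```python
import math


def rms(samples):
    """Root-mean-square level of one microphone's samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def arrival_direction(mic_signals, mic_angles_deg):
    """Return the facing angle of the microphone with the highest level.

    mic_signals    -- list of sample lists, one per microphone
    mic_angles_deg -- assumed facing angle of each microphone, in degrees
    """
    levels = [rms(signal) for signal in mic_signals]
    loudest = max(range(len(levels)), key=lambda i: levels[i])
    return mic_angles_deg[loudest]


# Two microphones facing -45 and +45 degrees; the second one is louder.
left = [0.1, -0.1, 0.1, -0.1]
right = [0.5, -0.5, 0.5, -0.5]
print(arrival_direction([left, right], [-45, 45]))  # 45
```

A real robot would presumably interpolate between microphones or use time differences as well; picking the loudest microphone is simply the smallest rule that fits the comparison described here.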

Next, the utterance unit 13 and the receiving unit 15 are described in detail. FIG. 2(A) is a block diagram showing details of the utterance unit. FIG. 2(B) is a block diagram showing details of the receiving unit.

The utterance unit 13 includes a pseudo noise generation unit 21, a polarity control unit 22, a speech synthesis unit 23, an amplifier 24, and a speaker 25.

The pseudo noise generation unit 21 generates a pseudo noise code sequence (PN code; the waveform of the pseudo noise signal) with high autocorrelation, such as an M-sequence or a Gold sequence.

The polarity control unit 22 superimposes the individual ID on the pseudo noise by controlling the polarity of the pseudo noise code sequence generated by the pseudo noise generation unit 21 according to the individual ID read from the storage unit 14. That is, when a bit of the individual ID read from the storage unit 14 is "1", the pseudo noise generation unit 21 outputs the PN code with its polarity unchanged, and when the bit is "0", it outputs the PN code with its polarity inverted (opposite phase).
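The polarity control described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation of this patent: the function names `m_sequence` and `superimpose_id`, the 10-stage shift register, and the feedback polynomial x^10 + x^3 + 1 are assumptions, chosen only because they give a 1023-chip M-sequence, the period length used as an example elsewhere in this description.

```python
def m_sequence(taps=(3, 10), nbits=10):
    """Return one period (2**nbits - 1 chips) of an M-sequence as +1/-1 chips.

    Uses a Fibonacci linear feedback shift register. The default taps
    (stages 3 and 10) are an assumed example of a maximal-length register.
    """
    state = [1] * nbits                # any non-zero seed works
    chips = []
    for _ in range(2 ** nbits - 1):
        chips.append(1 if state[-1] else -1)   # output of the last stage
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]        # shift by one stage
    return chips


def superimpose_id(id_bits, pn):
    """Emit the PN code once per ID bit: as-is for 1, inverted for 0."""
    out = []
    for bit in id_bits:
        sign = 1 if bit == 1 else -1
        out.extend(sign * chip for chip in pn)
    return out


pn = m_sequence()
modulated = superimpose_id([1, 0, 1], pn)
print(len(pn), len(modulated))  # 1023 3069
```

In this sketch each ID bit occupies exactly one PN period, so a 3-bit ID becomes 3 x 1023 chips. How the chips are then mixed into the synthesized voice is left out.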

On the receiving side, the superimposed bit data (individual ID) can be demodulated into "1" and "0" by detecting the phase of the calculated correlation value.

The speech synthesis unit 23 synthesizes voice from the pseudo noise output by the pseudo noise generation unit 21 and other waveforms (the other waveform generation units are not shown) according to the utterance sentence output by the conversation engine 12.

The voice signal synthesized by the speech synthesis unit 23 is amplified by the amplifier 24 and emitted as voice. The emitted voice is picked up by the microphone 31 of another robot and input to the receiving unit 15 shown in FIG. 2(B).

As shown in FIG. 2(B), the receiving unit 15 includes a microphone 31, a speech recognition unit 32, a matched filter 33, and a demodulation unit 34.

The speech recognition unit 32 recognizes the voice picked up by the microphone 31 and outputs text information to the conversation engine 12.

The matched filter 33 is a correlation calculation unit that computes the correlation between the input voice signal and the pseudo noise. It is implemented as an FIR filter whose coefficients are set to the pseudo noise code sequence (PN code) generated by the pseudo noise generation unit 21 on the transmitting side. Because the PN code has very high autocorrelation, the matched filter 33 outputs a correlation peak (a correlation value at or above a predetermined level) when the input voice contains the PN code. The matched filter 33 outputs a positive correlation peak when the phase is non-inverted and a negative correlation peak when the phase is inverted.

The demodulation unit 34 demodulates data from the output of the matched filter 33. That is, it demodulates bit data "1" when a positive correlation peak is input from the matched filter 33 and bit data "0" when a negative correlation peak is input. The output period of the pseudo noise is determined in advance, and once a correlation peak is input the demodulation unit 34 keeps outputting that bit for the length of one pseudo noise period. For example, if the pseudo noise period is 1023 samples, a positive correlation peak causes "1" to be output for 1023 consecutive samples.
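The matched filtering and demodulation steps can be sketched as follows. The sketch makes several simplifying assumptions that are not in this description: a 7-chip Barker code stands in for the much longer PN code, the received signal contains only the PN chips (no voice or noise), the names `matched_filter` and `demodulate` are hypothetical, and the demodulator emits one bit per peak instead of holding the bit for a full PN period.

```python
PN = [1, 1, 1, -1, -1, 1, -1]   # 7-chip Barker code, a short stand-in PN code


def matched_filter(signal, pn):
    """Sliding correlation of the signal with the PN code.

    Equivalent to an FIR filter whose taps are the PN code in reverse order.
    out[n] is the correlation of pn with the len(pn) samples ending at n.
    """
    length = len(pn)
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(length):
            idx = n - (length - 1) + k
            if idx >= 0:
                acc += pn[k] * signal[idx]
        out.append(acc)
    return out


def demodulate(correlation, threshold):
    """Map positive peaks to bit 1 and negative peaks to bit 0."""
    bits = []
    for value in correlation:
        if value >= threshold:
            bits.append(1)
        elif value <= -threshold:
            bits.append(0)
    return bits


id_bits = [1, 0, 1, 1]
received = [(1 if bit else -1) * chip for bit in id_bits for chip in PN]
print(demodulate(matched_filter(received, PN), threshold=4))  # [1, 0, 1, 1]
```

With this code the aligned correlation is +7 or -7 while every misaligned position stays within 1 in magnitude, so any threshold between those values separates peaks from the rest. A longer code such as a 1023-chip sequence widens that margin, which is what lets the peak survive when voice is mixed in.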

In this way, the demodulation unit 34 demodulates the individual ID and other information superimposed on the pseudo noise. In terms of frequency characteristics, the pseudo noise is itself a noise sound such as white noise, and it can be made perceptually equivalent to ordinary noise sounds.

Next, the operation of the robot is described with reference to a flowchart. FIG. 3 is a flowchart for explaining the operation of the robot.

When the robot 2 picks up voice with the microphone 31 (s1: Y), the receiving unit 15 performs speech recognition and checks for and decodes an individual ID (s2). If the voice contains no individual ID (s3: N), the robot 2 determines that it is a call from the user 5 and checks the speech recognition result (the content of the user 5's request) (s4). If information needs to be conveyed to other robots, for example because the request requires several robots to work together or must be passed on to another robot (s5: Y), the robot 2 calls out by voice to the surrounding robots (s6). If the request does not require conveying information to other robots (s5: N), the robot 2 converses or acts according to the request (s7). When the conversation or action ends, processing returns to step s1.

When the robot 2 picks up voice with the microphone 31 (s1: Y), the receiving unit 15 performs speech recognition and checks for and decodes an individual ID (s2). If the voice contains an individual ID (s3: Y), the robot 2 determines that it is a response from another robot, and exchanges information by voice and by communication or acts according to the information (s8). When the conversation or action ends, processing returns to step s1.

While no voice is picked up, the robot 2 waits for a fixed time (s1: N). If the microphone 31 still picks up no voice after the fixed time has elapsed (s1: N, s9: Y), the robot 2 determines that no other robot is within voice range and exchanges information with the other robots by communication (s10). When the exchange ends, processing returns to step s1.
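The branching of steps s1 to s10 can be summarized as a single decision function. The following Python sketch is only one reading of the flowchart: the names `Action` and `next_action` and the four input parameters are hypothetical, and the recognition, ID extraction, and timer that would produce those inputs are not modeled.

```python
from enum import Enum


class Action(Enum):
    CALL_OTHER_ROBOTS = "s6"   # call out by voice to surrounding robots
    SERVE_USER = "s7"          # converse or act on the user's request
    VOICE_AND_COMM = "s8"      # exchange by voice and by communication
    COMM_ONLY = "s10"          # no robot in voice range: communication only
    KEEP_WAITING = "s1"        # keep listening


def next_action(voice_heard, individual_id, needs_other_robot, timed_out):
    """Choose the next step from the conditions checked in the flowchart.

    voice_heard       -- s1: the microphone picked up voice
    individual_id     -- s2/s3: decoded ID, or None when none was found
    needs_other_robot -- s5: the user's request must reach another robot
    timed_out         -- s9: the fixed waiting time has elapsed
    """
    if voice_heard:
        if individual_id is not None:          # s3: Y, another robot answered
            return Action.VOICE_AND_COMM
        if needs_other_robot:                  # s3: N, s5: Y
            return Action.CALL_OTHER_ROBOTS
        return Action.SERVE_USER               # s3: N, s5: N
    if timed_out:                              # s1: N, s9: Y
        return Action.COMM_ONLY
    return Action.KEEP_WAITING                 # s1: N, s9: N


print(next_action(True, "robot3", False, False).value)  # s8
```

Every branch returns to listening afterwards, which matches the way each path in the flowchart ends by going back to step s1.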

As described above, when the robot of the present invention needs to convey information to another robot, it calls out by voice, and when another robot responds by voice, the two exchange information by voice. Because the robots converse with each other, people (users) around them can tell what information is being exchanged and need not feel uncomfortable or uneasy.

In addition, because the robots can exchange information both by voice and by communication, information can be conveyed reliably by communication even when voice alone cannot carry everything needed. Sending over the communication link the same information that is conveyed by voice also conveys the information twice, so even if speech recognition fails, the robot can reliably act on the information without malfunctioning. For example, suppose the user says "move to the living room" to the robot 2, but the robot 2 has little battery charge left and needs another robot 3 to take its place. The robot 2 calls out to the robot 3 and exchanges status information (auxiliary information) via the communication unit 16. As a result, the robot 3, which has more battery charge, moves to the living room. If instead the robot 3 has less battery charge than the robot 2, the robot 3 says something such as "Please move instead", and the robot 2 moves to the living room.
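The battery example can be written as a small decision rule. This is a hedged sketch only: the function name `choose_mover`, the robot identifiers, and the use of a dictionary of battery levels as the status information are assumptions made for illustration, and ties are resolved in favor of the robot that received the request.

```python
def choose_mover(own_id, own_battery, peer_batteries):
    """Pick the robot that should carry out the move.

    own_id         -- identifier of the robot that received the request
    own_battery    -- its remaining battery level (any consistent unit)
    peer_batteries -- dict of peer id -> battery level, as received
                      in the status information (auxiliary information)
    """
    best_id, best_level = own_id, own_battery
    for peer_id, level in peer_batteries.items():
        if level > best_level:     # only hand over to a strictly better peer
            best_id, best_level = peer_id, level
    return best_id


print(choose_mover("robot2", 20, {"robot3": 80}))  # robot3
print(choose_mover("robot2", 20, {"robot3": 10}))  # robot2
```

The first call corresponds to the robot 3 taking over, the second to the robot 3 asking the robot 2 to move after all.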

In the above description, a robot picks up voice, recognizes the robot that spoke by detecting an ID in that voice, and only then also exchanges information with that robot by communication; however, other methods may be used. For example, the robots may confirm in advance that communication between them is possible. When information needs to be conveyed, a robot then calls out by voice to the surrounding robots, and if there is a voice response from which an individual ID can be detected, it exchanges information by voice and by communication. If there is no voice response, it determines that no other robot is within voice range and exchanges information with the other robots by communication. With this method too, people (users) nearby can tell what information the robots are exchanging and need not feel uncomfortable or uneasy.

The service providing system 1 is not limited to three service providing apparatuses and may include more robots (service providing apparatuses).

Although a robot has been described as an example of the service providing apparatus, the present invention is not limited to robots and is widely applicable to any information processing apparatus that communicates by performing speech synthesis and speech recognition.

FIG. 1 is a block diagram of the service providing system.
FIG. 2 is a block diagram of the utterance unit and the receiving unit.
FIG. 3 is a flowchart for explaining the operation of the robot.

Explanation of symbols

1: service providing system
2 to 4: robots

Claims

A service providing apparatus that transmits and receives information among a plurality of apparatuses that communicate by performing speech synthesis and speech recognition, the apparatus comprising:
speech synthesis means for synthesizing voice;
storage means for storing individual identification information;
identification information superimposing means for superimposing the identification information on the voice synthesized by the speech synthesis means;
sound emitting means for emitting the voice on which the identification information is superimposed;
sound collecting means for collecting voice;
extraction means for extracting identification information of another apparatus from the voice collected by the sound collecting means;
communication means for connecting to another apparatus to perform information communication; and
control means for, when identification information of another apparatus is extracted by the extraction means, transmitting predetermined information to that other apparatus by superimposing the predetermined information on the voice synthesized by the speech synthesis means, and for connecting, via the communication means, to the other apparatus corresponding to the extracted identification information and transmitting auxiliary information that assists the predetermined information.

The service providing apparatus according to claim 1, wherein the auxiliary information includes the same information as the predetermined information.

The service providing apparatus according to claim 1, further comprising:
direction detecting means for detecting the direction of arrival of the voice collected by the sound collecting means; and
an operation unit that directs the sound emitting means toward the direction of arrival detected by the direction detecting means.

The service providing apparatus according to any one of claims 1 to 3, wherein
the speech synthesis means synthesizes voice from a plurality of sample signals, and
the identification information superimposing means uses a part of the sample signals as a pseudo noise signal and superimposes the information by controlling the polarity of the pseudo noise signal.