JP5610283B2 - External device control apparatus, external device control method and program

Info

Publication number: JP5610283B2
Application number: JP2010203137A
Authority: JP
Inventors: 貴志住吉
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2010-09-10
Filing date: 2010-09-10
Publication date: 2014-10-22
Anticipated expiration: 2030-09-10
Also published as: JP2012060506A

Description

The present invention relates to an external device control apparatus, an external device control method, and a program, and more particularly to a technique for recognizing a user's voice and operating an external device according to the recognized voice.

In recent years, many robots that support human life have been developed. A humanoid robot, one type of robot, has the same shape as a human; its advantages include placing little psychological burden on people and adapting easily to living spaces designed for humans. Many such robots use speech recognition and gesture functions so that they can communicate with people in a natural way.

Meanwhile, electronic devices with a speech recognition function are also becoming widespread. In car navigation systems, speech recognition offers the overwhelming advantage of hands-free input compared with a remote control or touch panel, so it spread relatively early. In portable terminals such as mobile phones and smartphones, input devices such as keyboards are small and hard to use; voice input is simpler by comparison and is gradually spreading.

In electrical appliances such as air conditioners and televisions, functions are diversifying while operation grows more complex, so users cannot make full use of those functions. Technologies such as speech recognition and speech understanding are expected to solve this problem, but at present they are not widely adopted because of technical issues such as recognition accuracy and because of cost.

From another perspective, the electronic devices in use today tend to incorporate a wide variety of functions, for reasons such as the pursuit of greater convenience, stimulating consumers' willingness to buy, and universal design. For example, technologies have been developed for remotely operating devices over a network while away from home, and for recommending television programs or running an air conditioner in an energy-saving mode by acquiring and analyzing sensor information about the user and the environment.

At present, most of these functions are implemented separately in each electronic device. In the future, however, all electronic devices are expected to be connected to a network and to exchange information with one another to realize more advanced functions. One possible implementation is a centralized configuration in which a center server aggregates and analyzes information from each electronic device and controls each device appropriately.

Against this background, a useful system is one in which a robot plays the role of the center server: it collects information from each electronic device, clarifies the user's latent requests through dialogue with the user, and uses these results together to control the electronic devices. For example, Patent Document 1 discloses an invention in which a robot converses with a user and controls other electronic devices connected to a network.

Patent Document 1: JP 2005-333495 A

Various other methods of linking the robot and electronic devices described above are conceivable. However, once electronic devices themselves are equipped with speech recognition and robots have spread to homes, none of the known methods can be said to make full use of the speech recognition function. For example, when a system is configured so that a robot governs the control of other electronic devices, as in Patent Document 1, various problems arise.

One problem arises when the user's request is a clear one: to operate a particular electronic device. If the user tries to operate the device through the robot, inconvenient situations can occur. For example, the user must call the robot over if it is far away, or the robot may judge that the user is bored and start small talk, which gets in the way of operating the device.

In such cases, the user will likely ignore the robot and try to operate the electronic device directly. However, voice input to an electronic device generally receives poorer language understanding than a robot provides, and often only specific words are accepted. A user who normally operates electronic devices through the robot is therefore unlikely to utter the appropriate words and succeed in the operation. In such a situation, the robot can hardly be said to be used effectively.

Another problem is that the user may feel that something is off when the robot controls electronic devices over the network. As a robot's functions and appearance approach those of a human, users more readily regard it as an equal to a human; once they do, it appears far more natural to them for the robot to operate electronic devices in the same way a human would.

One aspect of the present invention is an external device control apparatus that recognizes a user's voice and controls an external device, comprising: a receiving unit that receives voice; a speech recognition unit that converts the voice received by the receiving unit into a character string; a state information storage area that stores state information including the character string converted by the speech recognition unit; an action information storage area that stores action information associating actions of the external device control apparatus with conditions on the state information; an action planning unit that refers to the state information storage area and the action information storage area and determines whether the state information including the character string satisfies a condition associated with operating an external device by voice and/or gesture; a pattern generation unit that, when the action planning unit determines that the state information including the character string satisfies the condition, generates a voice waveform and/or gesture pattern corresponding to the external device operation; and an output unit that outputs the voice waveform and/or gesture according to the pattern.

According to the present invention, a user can learn the appropriate way to operate an external device by voice and/or gesture.

A block diagram schematically showing the configuration of the service robot according to the first embodiment of the present invention.
A flowchart showing the operation of the external device management program of the service robot according to the first embodiment of the present invention.
A diagram showing an example of the device database of the service robot according to the first embodiment of the present invention.
A diagram showing an example of the vocabulary database of the service robot according to the first embodiment of the present invention.
A flowchart showing the operation of the action planning program of the service robot according to the first embodiment of the present invention.
A flowchart showing the operation of the action planning program of the service robot according to the first embodiment of the present invention.
A flowchart showing the operation of the speech recognition program of the service robot according to the first embodiment of the present invention.
A diagram showing the operation of the user speech synthesis program of the service robot according to the first embodiment of the present invention.
A flowchart showing the operation of the system speech synthesis program of the service robot according to the first embodiment of the present invention.
A diagram showing an example of the internal state database of the service robot according to the first embodiment of the present invention.
A diagram showing an example of the action database of the service robot according to the first embodiment of the present invention.
A diagram showing an example of the action queue of the service robot according to the first embodiment of the present invention.
A diagram showing an example of the internal state database of the service robot according to the first embodiment of the present invention.
A diagram showing an example of the internal state database of the service robot according to the first embodiment of the present invention.
A diagram showing an example of the action queue of the service robot according to the first embodiment of the present invention.
A diagram showing an example of the action queue of the service robot according to the first embodiment of the present invention.
A block diagram schematically showing the configuration of the service robot according to the second embodiment of the present invention.
A diagram showing an example of the gesture database of the service robot according to the second embodiment of the present invention.
A flowchart showing the operation of the action planning program of the service robot according to the second embodiment of the present invention.
A flowchart showing the operation of the action planning program of the service robot according to the second embodiment of the present invention.
A diagram showing an example of the action queue of the service robot according to the second embodiment of the present invention.

Embodiments of the present invention are described below. For clarity, the following description and drawings are omitted and simplified as appropriate. In the drawings, the same elements are given the same reference numerals, and duplicate descriptions are omitted where necessary for clarity.

The service robot of this embodiment is characterized by its function for controlling external electronic devices. Such electronic devices include air conditioners, televisions, microwave ovens, stereo sets, lighting devices, and drive control devices for opening and closing members such as doors and windows.

The service robot has a speech recognition function and recognizes words spoken by the user. When the user instructs it by voice to operate an external device, the service robot operates that external device. In a preferred configuration, the service robot converses with the user and operates the external device while communicating with the user.

When operating the external device, the service robot performs an action that imitates a human operating that device by voice and/or gesture. Depending on the functions of the target external device, the service robot uses voice, gesture, or a combination of the two. This lets the user learn the appropriate way to operate the external device by voice and/or gesture. It also reduces the likelihood that the user will find something non-human in the service robot.

<First embodiment>
First, a first embodiment of the present invention is described. In this embodiment, the service robot outputs voice to operate an external device. FIG. 1 is a block diagram schematically showing the configuration of the service robot of this embodiment. The service robot has the external device control function that characterizes this embodiment and is thus an external device control apparatus. This embodiment describes, in particular, the external device control function of the service robot.

The service robot 10 includes a CPU (central processing unit) 20, a storage device 30, and a bus 40. As input/output devices it further includes an NIF (network interface) 50, a speaker 60, a microphone 70, a temperature sensor 80, a timer 81, a movement mechanism control device 90, and a movement mechanism 91.

The CPU 20 is an arithmetic unit that executes the programs, described later, held in the storage device 30. The storage device 30 is a data storage device having a volatile medium such as DRAM or SRAM, a storage device having a nonvolatile medium such as a hard disk drive, or a combination of these.

As data for external device control, the storage device 30 stores a device database (DB) 311, an action DB 312, a vocabulary DB 313, an internal state DB 314, an action queue 315, a system voice quality DB 316, and a user voice quality DB 317. Each is stored in a corresponding storage area of the storage device 30.

In this embodiment, the information stored in the storage device 30 does not depend on any particular data structure and may be represented in any form. For example, a data structure chosen appropriately from tables, lists, databases, and queues can hold the information. Several of the DBs above may be combined into a single DB, and a single DB may be split across multiple files.

The storage device 30 further stores the following programs: an external device management program 321, an action planning program 322, a speech recognition program 323, a system speech synthesis program 324, and a user speech synthesis program 325. Each program performs its defined processing when executed by the CPU 20.

Accordingly, any description below that has a program as its subject may equally be read with the CPU 20 as its subject. Operating according to the programs above, the CPU 20 functions as an action planning unit, a speech recognition unit, and an output pattern generation unit. Processing executed by a program is also processing performed by the service robot 10 on which the program runs. Some or all of the programs may be implemented in dedicated hardware. The programs can be installed on the service robot 10 from a program distribution server or a computer-readable medium and stored in the storage device 30.

The CPU 20, the storage device 30, the NIF 50, the speaker 60, the microphone 70, the temperature sensor 80, the timer 81, and the movement mechanism control device 90 are connected to the bus 40, which the devices use to exchange data with one another.

The NIF 50 is a device for transmitting and receiving data between the service robot 10 and external devices. The specific communication contents are described later. The speaker 60 converts audio data received over the bus 40 into air vibrations and outputs them. The microphone 70 detects air vibrations, converts them into audio data, and outputs the data to the bus 40. The temperature sensor 80 detects the temperature and outputs it to the bus 40. The timer 81 outputs the current time to the bus 40.

The movement mechanism control device 90 controls the drive of the movement mechanism 91 according to commands received from other programs over the bus 40. This embodiment assumes that the movement mechanism 91 is wheeled, but it may be any means of moving the service robot 10, such as legs, motors, or steering.

Next, the operation of the external device management program 321 is described with reference to FIG. 2. The external device management program 321 manages external devices by means of the device DB 311 and the vocabulary DB 313: it adds, changes, and deletes their entries. Examples of the device DB 311 and the vocabulary DB 313 are shown in FIG. 3 and FIG. 4, respectively, and are described later.

As shown in the flowchart of FIG. 2, the external device management program 321 is started after the service robot 10 starts up (S101). When it receives a join event packet from an external device via the NIF 50 (Y in S102), it adds an entry to the device DB 311 according to the contents of the packet (S103).

When it receives a leave event packet from the NIF 50 (Y in S104), the external device management program 321 deletes from the device DB 311 the entry matching the device ID specified in the packet (S105). When it receives a vocabulary data packet from the NIF 50 (Y in S106), it updates the vocabulary DB 313 according to the contents of the packet (S107). While the service robot 10 is running, the external device management program 321 thus continuously monitors the packets received by the NIF 50 and keeps the device DB 311 and the vocabulary DB 313 up to date.
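The packet handling in steps S102 to S107 can be pictured with the following minimal Python sketch. It is an illustration only, not the patented implementation: the packet layout (`type`, `device_id`, `attributes`, `entries`), the class name `ExternalDeviceManager`, and the choice to key the vocabulary DB by device ID are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalDeviceManager:
    """Hypothetical stand-in for the external device management program 321."""
    device_db: dict = field(default_factory=dict)      # device DB 311: device ID -> attributes
    vocabulary_db: dict = field(default_factory=dict)  # vocabulary DB 313: device ID -> {operation: phrase}

    def handle_packet(self, packet: dict) -> None:
        kind = packet["type"]  # assumed field; the source does not define a packet format
        if kind == "join":            # Y in S102 -> S103: add an entry
            self.device_db[packet["device_id"]] = dict(packet["attributes"])
        elif kind == "leave":         # Y in S104 -> S105: delete the matching entry
            self.device_db.pop(packet["device_id"], None)
        elif kind == "vocabulary":    # Y in S106 -> S107: update the vocabulary
            self.vocabulary_db.setdefault(packet["device_id"], {}).update(packet["entries"])


manager = ExternalDeviceManager()
manager.handle_packet({"type": "join", "device_id": "ac-1",
                       "attributes": {"device_type": "air conditioner"}})
manager.handle_packet({"type": "vocabulary", "device_id": "ac-1",
                       "entries": {"power_on": "power on"}})
```

In a running robot, `handle_packet` would be called for every packet the NIF 50 delivers; unknown packet types are simply ignored in this sketch.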

FIG. 3 shows a preferred example of the device DB 311. It shows the contents of the device DB 311 after the external device management program 321 has received a join event packet from each of three external devices. A join event packet carries fields corresponding to the columns of the device DB 311 in FIG. 3: "device ID", "device type", "voice function supported/not supported", "synthesis profile", "position", and "recommended speech recognition position".

"Device ID" is an identifier unique to the external device. "Device type" is an identifier representing the kind of external device. "Voice function supported/not supported" indicates whether the external device can be operated by speech recognition. "Synthesis profile" is information that the service robot 10 uses when outputting voice to the external device through the system speech synthesis program 324; details are given later.

"Position" is the coordinates of the external device in physical space, determined from information acquired by the device's own positioning function or from information set by the user. "Recommended speech recognition position" is the position recommended for a user operating the external device by voice, and is specified relative to the external device.
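As a rough illustration of one row of the device DB 311, the sketch below models the six columns as a Python dataclass. The field names, the use of two-dimensional coordinates, and the sample values are assumptions for the example; the source only states that the recommended position is given relative to the device.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DeviceEntry:
    """One hypothetical row of the device DB 311."""
    device_id: str                            # "device ID"
    device_type: str                          # "device type"
    voice_supported: bool                     # "voice function supported/not supported"
    synthesis_profile: str                    # "synthesis profile"
    position: Tuple[float, float]             # "position" (2-D coordinates assumed)
    recommended_offset: Tuple[float, float]   # recommended position, relative to the device

    def recommended_position(self) -> Tuple[float, float]:
        """Absolute position from which the device should be addressed by voice."""
        return (self.position[0] + self.recommended_offset[0],
                self.position[1] + self.recommended_offset[1])


entry = DeviceEntry("ac-1", "air conditioner", True, "profile-a", (2.0, 3.0), (0.0, -1.5))
target = entry.recommended_position()  # (2.0, 1.5)
```

Storing the offset rather than an absolute position means the recommended position follows the device if its "position" is later updated.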

FIG. 4 shows a preferred example of the vocabulary DB 313. The vocabulary DB 313 stores each operation of an external device in association with the vocabulary used to invoke it. An operation is the action that the external device performs in response to the vocabulary. In the example of FIG. 4, three operations (actions of the external device) and the vocabulary corresponding to each are registered in the vocabulary DB 313.

Instead of obtaining the attribute information of an external device (the information stored in the device DB 311 and the vocabulary DB 313) from the join event packet, the service robot 10 may use external device information stored in advance in the storage device 30 in association with the device ID. In this configuration, the join event packet need not carry the external device information. The service robot 10 may also obtain the information through an external network, or the user may register it in the service robot 10.

Next, the operation of the action planning program 322 is described with reference to FIGS. 5A and 5B. The action planning program 322 controls the actions of the service robot 10. It decides (plans) the future actions of the service robot 10 and directs other programs and devices to carry them out.

The action planning program 322 decides to operate an external device when the state information registered in the internal state DB 314 satisfies a prescribed condition. As state information, the internal state DB 314 can store the recognition result of the user's voice (the character string converted from the voice) as well as the temperature detected by the temperature sensor 80, the time measured by the timer 81, user settings, and so on. The information stored in the internal state DB 314 depends on the design and operation of the service robot 10.

The action DB 312 stores information associating prescribed conditions on the state information in the internal state DB 314 with external device operations. As noted above, the state information of this embodiment includes the recognition result of the user's voice. According to the action DB 312, an external device is operated when the internal state DB 314 contains an instruction, given by the user's voice, to operate that device.

FIGS. 5A and 5B are flowcharts showing the operation of the action planning program 322 in this embodiment. FIG. 5B shows the details of step 207 in FIG. 5A. The action planning program 322 executes the operations shown in these figures repeatedly. As shown in FIG. 5A, the action planning program 322 is started after the service robot 10 starts up (S201).

The action planning program 322 first acquires information from the sensors of the service robot 10 (the temperature sensor 80 and the timer 81) and updates the internal state DB 314 (S202). Next, it refers to the action DB 312 and the internal state DB 314, compares the condition of each item in the action DB 312 against the contents of the internal state DB 314, and checks whether the condition is satisfied (S203). If an item's condition is satisfied (Y in S203), it adds the action described in that item's action column of the action DB 312 to the action queue 315 (S204).

Next, the action planning program 322 checks whether at least one action is registered in the action queue 315 (S205). If none is registered (N in S205), it returns to step 202. If one is registered (Y in S205), it takes the data representing the action at the head of the action queue 315 and identifies the type of that action (S206). In the subsequent steps, it executes processing according to the identified type.
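One pass through steps S202 to S206 might look like the following Python sketch. It assumes, purely for illustration, that each item of DB 312 is a dictionary with a `condition` (key/value pairs that must all equal the internal state) and an `action`; the source does not specify how conditions are encoded, and the names `plan_step` and `read_sensors` are hypothetical.

```python
from collections import deque


def plan_step(read_sensors, internal_state, action_db, action_queue):
    """Run one pass of S202 to S206 and return the next action, or None."""
    internal_state.update(read_sensors())                      # S202: refresh internal state DB 314
    for rule in action_db:                                     # S203: check every item of DB 312
        condition = rule["condition"]
        if all(internal_state.get(k) == v for k, v in condition.items()):
            action_queue.append(dict(rule["action"]))          # S204: enqueue the matching action
    if not action_queue:                                       # N in S205: nothing to do this pass
        return None
    return action_queue.popleft()                              # S206: take the head of queue 315


state = {"recognized_text": "turn on the air conditioner"}
rules = [{"condition": {"recognized_text": "turn on the air conditioner"},
          "action": {"type": "external_device_operation", "operation": "power_on"}}]
queue = deque()
next_action = plan_step(lambda: {"temperature": 29.5}, state, rules, queue)
```

Note that in this simplified form a rule fires on every pass for as long as its condition holds; a real system would need to clear or change the triggering state, for example through a "state change" action.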

If the type identified in step 206 is anything other than "external device operation" (other types in S206), the action planning program 322 executes step 207, selecting and executing the one process among several that corresponds to the identified action type. FIG. 5B shows the processes that are selectively executed in step 207.

As shown in FIG. 5B, when the action type is "move", the action planning program 322 issues a command to the movement mechanism control device 90 (S207a). When the action type is "system voice output", it issues a command to the system speech synthesis program 324 (S207b). When the action type is "state change", it updates the internal state DB 314 (S207c).

When the action type is "network transmission", the action planning program 322 issues a transmission command to the NIF 50 (S207d). When the action type is "user voice output", it issues a command to the user speech synthesis program 325 (S207e). When the action type is "operation complete", it adds a network transmission action that commands the target external device to turn speech recognition ON (S207f). "Operation complete" means that operation of the external device has finished. Steps 207a to 207f are described later.
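The selection among S207a to S207f amounts to a dispatch on the action type. The Python sketch below shows one way to express it, assuming hypothetical type strings (`move`, `system_voice_output`, and so on) and handler callables that stand in for the movement mechanism control device 90, the speech synthesis programs 324 and 325, the internal state DB 314, and the NIF 50. Here the handlers only record what they were asked to do.

```python
from collections import deque


def dispatch(action, handlers, action_queue):
    """Handle one action whose type is not "external device operation" (step 207)."""
    kind = action["type"]
    if kind == "operation_complete":                       # S207f: re-enable recognition on the device
        action_queue.append({
            "type": "network_send",
            "device_id": action["device_id"],
            "command": "voice_recognition_on",
        })
    else:                                                  # S207a to S207e
        handlers[kind](action)


log = []
handlers = {
    "move": lambda a: log.append(("movement_controller", a["target"])),      # S207a
    "system_voice_output": lambda a: log.append(("system_tts", a["text"])),  # S207b
    "state_change": lambda a: log.append(("internal_state", a["updates"])),  # S207c
    "network_send": lambda a: log.append(("nif", a["command"])),             # S207d
    "user_voice_output": lambda a: log.append(("user_tts", a["text"])),      # S207e
}
queue = deque()
dispatch({"type": "move", "target": (2.0, 1.5)}, handlers, queue)
dispatch({"type": "operation_complete", "device_id": "ac-1"}, handlers, queue)
```

After the second call the queue holds a network transmission action, which a later pass would hand to the `network_send` handler.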

Next, the case where the operation type identified in step 206 is “external device operation” is described. For “external device operation” to be registered in the operation queue 315, the internal state DB 314 must hold a recognition result of user speech that corresponds to an instruction to operate an external device. In this configuration example, the voice recognition program 323 stores the character string of the voice recognition result in the internal state DB 314, as described later. The items of the internal state DB 314 associated with an external device operation may consist only of the user's recognized speech or may include other items.

The operation planning program 322 first refers to the device DB 311 and selects the external device to be operated (S208). It then acquires the recommended voice recognition position of that external device from the device DB 311 and adds to the operation queue 315 a “move” operation for moving the service robot 10 to the recommended position (S209).

Next, the operation planning program 322 adds to the operation queue 315 a “network transmission” operation that commands “voice recognition OFF” to the target external device (S210). It then refers to the vocabulary DB 313 of the target external device, selects the vocabulary corresponding to the operation content, and adds to the operation queue 315 a “user voice output” operation for outputting that vocabulary as speech (S211). Finally, it adds to the operation queue 315 a “network transmission” operation for sending the external device a packet that commands execution of the operation content (S212).
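Steps S208 to S212 expand one “external device operation” into four queued operations. The following Python sketch illustrates that expansion under stated assumptions: the function name, the dictionary layouts of the device DB and vocabulary DB, and the idea that the operation already names its target device (which simplifies the selection in S208) are hypothetical. The sample values are those of the air conditioner example given later in this description.

```python
def plan_external_device_operation(operation, device_db, vocabulary_db):
    """Expand an external device operation into queue entries (sketch of S208 to S212)."""
    device_id = operation["device"]                 # S208: target device (selection simplified)
    position = device_db[device_id]["recommended_position"]
    phrase = vocabulary_db[device_id][operation["content"]]
    return [
        # S209: move to the recommended voice recognition position
        {"type": "move", "content": position},
        # S210: turn the device's voice recognition OFF over the network
        {"type": "network_transmission",
         "content": {"device": device_id, "command": "voice recognition OFF"}},
        # S211: speak the registered vocabulary in the user's voice
        {"type": "user_voice_output",
         "content": {"device": device_id, "text": phrase}},
        # S212: send the operation itself over the network
        {"type": "network_transmission",
         "content": {"device": device_id, "command": operation["content"]}},
    ]


# Hypothetical DB contents using the values of the air conditioner example.
device_db = {"HTC_AC_10A": {"recommended_position": (10, 12, 10)}}
vocabulary_db = {"HTC_AC_10A": {
    "air conditioner ON, set temperature 26°C": "Please set the air conditioner to 26°C",
}}
plan = plan_external_device_operation(
    {"device": "HTC_AC_10A", "content": "air conditioner ON, set temperature 26°C"},
    device_db, vocabulary_db)
```

Returning a list rather than appending directly keeps the ordering explicit: the caller appends the four entries to the operation queue in the order S209, S210, S211, S212.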

After executing steps 202 to 206, the operation planning program 322 performs processing according to the operations added to the operation queue 315 (S207). As described above, the operations registered in the operation queue 315 are the “move” to the recommended position (S209), the “network transmission” commanding “voice recognition OFF” (S210), the “user voice output” of the registered vocabulary (S211), and the “network transmission” of the operation execution command packet (S212).

The operation planning program 322 executes these operations in the order listed. Specifically, it first commands the mechanism control device 90 to move to the recommended position (S207a). The mechanism control device 90 controls the moving mechanism 91 according to the command and moves the service robot 10 to the recommended position.

Next, the operation planning program 322 commands the NIF 50 to transmit a “voice recognition OFF” instruction to the target external device (S207d). In response, the NIF 50 transmits the “voice recognition OFF” instruction to the target external device. On receiving the instruction, the external device turns its voice recognition function OFF.

Next, the operation planning program 322 commands the user speech synthesis program 325 to output the vocabulary corresponding to the external device operation (S207e). In response to the command, the user speech synthesis program 325 generates a speech waveform representing the vocabulary and outputs it through the speaker 60.

Next, the operation planning program 322 commands the NIF 50 to transmit an operation execution instruction packet to the target external device (S207d). In response, the NIF 50 transmits the packet to the target external device. On receiving it, the external device operates according to the instruction indicated by the packet.

After the speech output, the user speech synthesis program 325 adds a “voice operation complete” operation to the operation queue 315 (S405 in FIG. 7). The operation planning program 322 then adds to the operation queue 315 an operation for transmitting a “voice recognition ON” instruction to the target external device (S207f).

Thereafter, in accordance with the instruction registered in the operation queue 315, the user speech synthesis program 325 commands the NIF 50 to transmit a “voice recognition ON” instruction to the target external device (S207d). In response, the NIF 50 transmits the “voice recognition ON” instruction to the target external device. On receiving the instruction, the external device turns its voice recognition function ON. After this series of operations, the external device is therefore ready for the user to control it again by voice recognition.

In this way, the service robot 10 recognizes the user's speech and operates the external device in accordance with the spoken instruction to operate it. In this embodiment, the voice recognition program 323 performs recognition of the user's speech. The operation of the voice recognition program 323 is described with reference to FIG. 6.

The voice recognition program 323 is started after the service robot 10 starts up (S301). It first acquires audio data from the microphone 70 (S302) and continues to acquire audio data until the user's voice is detected in it (S303). Techniques for determining whether a user's voice has been detected are widely known. For example, the voice recognition program 323 uses a general voice detection method such as that described in the Japanese edition of “Fundamentals of Speech Recognition” by Lawrence Rabiner and Biing-Hwang Juang, translation supervised by Sadaoki Furui, published by NTT Advanced Technology Corporation.

Next, the voice recognition program 323 extracts the features of the user's voice from the acquired audio data and registers them in the user voice quality DB 317 (S304). The voice features are, for example, the waveform of each phoneme itself, the fundamental frequency, and the duration; using these parameters in the speech synthesis program 324 makes it possible to create synthesized speech close to the user's voice. Techniques for extracting voice features are widely known and can be realized, for example, by speaker adaptation in HMM (Hidden Markov Model) speech synthesis, as in the HMM-based Speech Synthesis System (HTS).

Next, the voice recognition program 323 performs voice recognition on the user's speech. If speech is recognized (Y in S305), it registers the character string of the recognition result in the internal state DB 314. If speech is not recognized (N in S305), the voice recognition program 323 returns to the first step 302.
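One pass through the flow of FIG. 6 (S302 to S305 and the subsequent registration of the result) can be sketched in Python as below. The function name and all four callables (`read_audio`, `detect_voice`, `extract_features`, `recognize`) are hypothetical placeholders for the microphone 70 and for the detection, feature extraction, and recognition methods, which the description leaves to known techniques. The state name “voice input” is taken from the air conditioner example later in this description.

```python
def recognition_pass(read_audio, detect_voice, extract_features, recognize,
                     user_voice_quality_db, internal_state):
    """Run one pass of the recognition loop; return True if a result was stored."""
    audio = read_audio()                        # S302: acquire audio data
    while not detect_voice(audio):              # S303: repeat until a voice is detected
        audio = read_audio()
    # S304: register the extracted voice features for later synthesis
    user_voice_quality_db.update(extract_features(audio))
    text = recognize(audio)                     # S305: attempt recognition
    if text is None:
        return False                            # N in S305: caller goes back to S302
    internal_state["voice input"] = text        # store the recognition result string
    return True
```

A caller would invoke `recognition_pass` repeatedly for as long as the robot is running, which corresponds to the return to step 302 in the flowchart.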

As described with reference to FIG. 5, in the preferred configuration the service robot 10 outputs the operation instruction speech to the external device in the user's voice. In this configuration example, the user speech synthesis program 325 synthesizes the output speech (its waveform) in the user's voice. The operation of the user speech synthesis program 325 is described below with reference to the flowchart of FIG. 7.

The user speech synthesis program 325 is started after the service robot 10 starts up (S401). It first waits until it receives a speech output command (S402). On receiving a command, it computes the waveform of the synthesized speech (S403). Various techniques for synthesizing speech from a character string are known; the user speech synthesis program 325 can use, for example, a general speech synthesis method such as that described in “An Introduction to Text-to-Speech Synthesis” by Thierry Dutoit, Kluwer Academic Publishers.

At this time, the user speech synthesis program 325 changes the parameters used for speech synthesis on the basis of the synthesis profile described in the device DB 311 and the user voice quality DB 317. For example, in the device DB shown in FIG. 3, “voice quality” and “recommended speech rate” are registered as the synthesis profile. “Voice quality” is a parameter that specifies the voice quality of the synthesized speech, such as vocal tract characteristics. “Recommended speech rate” indicates the speech rate of the synthesized speech.

For example, when creating synthesized speech for “HTC_AC_10A”, “voice quality” is specified, so the user speech synthesis program 325 uses the specified parameters. Since the “speech rate” is slow, it creates synthesized speech with a slower speech rate than usual (for example, 3 morae per second instead of the usual 4 morae per second). Since the “volume” is “+3 dB”, it creates synthesized speech whose volume is 3 dB higher than usual.
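The effect of such a synthesis profile on the synthesis parameters can be illustrated with a short Python sketch. The function names, the dictionary keys, and the fallback to the user's own voice quality when the profile specifies none are assumptions for illustration; the 3 and 4 morae per second figures and the +3 dB offset come from the example above. The conversion from decibels to a linear amplitude factor uses the standard relation gain = 10^(dB/20).

```python
def synthesis_parameters(profile, user_voice_quality,
                         normal_rate=4.0, slow_rate=3.0):
    """Derive synthesis parameters from a per-device synthesis profile (sketch)."""
    # Use the profile's voice quality if given, else the user's (assumed fallback).
    voice_quality = profile.get("voice_quality", user_voice_quality)
    # Speech rate in morae per second.
    rate = slow_rate if profile.get("speech_rate") == "slow" else normal_rate
    # Volume offset in dB converted to a linear amplitude factor.
    gain_db = float(profile.get("volume_db", 0.0))
    return {
        "voice_quality": voice_quality,
        "rate_mora_per_s": rate,
        "amplitude_gain": 10 ** (gain_db / 20),
    }


def utterance_seconds(n_morae, rate_mora_per_s):
    """Duration of an utterance of n_morae morae at the given rate."""
    return n_morae / rate_mora_per_s


params = synthesis_parameters(
    {"voice_quality": "profile_voice", "speech_rate": "slow", "volume_db": 3.0},
    user_voice_quality="user_voice")
```

With these values the amplitude factor is about 1.41, and a 12-mora phrase takes 4 seconds at the slow rate instead of 3 seconds at the usual rate.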

Next, the user speech synthesis program 325 outputs the waveform of the created synthesized speech to the speaker 60 (S404). When output to the speaker 60 is complete, it adds an operation of type “voice operation complete” to the operation queue 315 (S405).

In the preferred configuration, the service robot 10 uses a system voice that differs from the user's voice when conversing with the user. In this configuration, the system voice is generated by the system speech synthesis program 324. FIG. 8 is a flowchart showing the operation of the system speech synthesis program 324.

The system speech synthesis program 324 is started after the service robot 10 starts up (S501). It first waits until it receives a speech output command (S502). On receiving a command (Y in S502), it computes the waveform of the synthesized speech on the basis of the system voice quality DB 316, which describes the features of the system voice (S503).

To synthesize speech from a character string, the system speech synthesis program 324 can use a general speech synthesis method such as that described in “An Introduction to Text-to-Speech Synthesis” by Thierry Dutoit, Kluwer Academic Publishers. The system speech synthesis program 324 outputs the waveform of the created synthesized speech to the speaker 60 (S504).

A specific example of external device operation by the service robot 10 is described below. In this example, the service robot 10 operates the cooling function of an air conditioner in accordance with the user's spoken instruction. FIG. 9 shows the internal state DB 314 as updated by the operation planning program 322 (S202 in FIG. 5A).

In this internal state DB 314, the information from the temperature sensor 80 (detected temperature) is reflected in the item with the state name “temperature sensor”, and the information from the timer 81 (time) is reflected in the item with the state name “current time”. In addition, “user upper limit temperature” is separately set to 28°C; this is one of the optional items that the user can set.

The operation planning program 322 compares the operation DB 312 shown in FIG. 10 with the internal state DB 314 shown in FIG. 9 (S203). Among the conditions registered in the operation DB 312 of FIG. 10, the condition “temperature sensor sensed temperature > user upper limit temperature” is satisfied. The operation planning program 322 therefore adds the corresponding operations to the operation queue 315 (S204).
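The comparison of the operation DB with the internal state (S203) and the queuing of matching operations (S204) can be sketched in Python as follows. Representing each condition as a callable over the internal state, the function name, and the sensed temperature of 30 are assumptions for illustration; the state names and the 28°C upper limit follow the example above.

```python
def matching_operations(operation_db, internal_state):
    """Collect the operations of every entry whose condition holds (sketch of S203, S204)."""
    queued = []
    for entry in operation_db:
        if entry["condition"](internal_state):      # S203: compare with internal state
            queued.extend(entry["operations"])      # S204: add the corresponding operations
    return queued


# Hypothetical operation DB with a single entry for the temperature condition.
operation_db = [{
    "condition": lambda s: s["temperature sensor"] > s["user upper limit temperature"],
    "operations": [
        {"type": "system_voice_output", "content": "Shall I turn on the cooler?"},
        {"type": "state_change", "content": {"cooler question": 1}},
    ],
}]

hot = {"temperature sensor": 30, "user upper limit temperature": 28}   # 30 is assumed
queued = matching_operations(operation_db, hot)
```

When the sensed temperature does not exceed the upper limit, the function returns an empty list and nothing is added to the operation queue.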

FIG. 11 shows the operation queue 315 in which the operations corresponding to the condition “temperature sensor sensed temperature > user upper limit temperature” are registered. The type of the first operation is “system voice output”. The operation planning program 322 therefore commands the system speech synthesis program 324 to output the speech “Shall I turn on the cooler?” (S207b in FIG. 5B).

The second and third operations in the operation queue 315 of FIG. 11 are of type “state change”, so the operation planning program 322 updates the internal state DB 314 (S207c). FIG. 12 shows the updated internal state DB 314. The state value of “cooler question” is “1”, and the state value of “waiting time” is the state value of “current time” plus 10.

Next, suppose the user says “Yes, please”. The voice recognition program 323 performs voice recognition on the user's speech (see FIG. 6) and recognizes the words “Yes, please”. It updates the internal state DB 314 according to the recognition result (S306 in FIG. 6). FIG. 13 shows the updated internal state DB 314. The state value of “voice input” is “Yes, please”.

The operation planning program 322 compares the internal state DB 314 of FIG. 13 with the operation DB 312 of FIG. 10. The condition of the third entry in the operation DB 312 is satisfied. The operation planning program 322 therefore adds the corresponding operations to the operation queue 315 (S204). FIG. 14 shows the resulting operation queue 315. The first operation type is “system voice output”, with the content “Understood”. The second operation type is “external device operation”, with the content “air conditioner ON, set temperature 26°C”.

In accordance with the instructions registered in the operation queue 315 shown in FIG. 14, the operation planning program 322 first commands the system speech synthesis program 324 to output system speech (S207b). The processing of the system speech synthesis program 324 causes the speech “Understood” to be output from the speaker 60. Specifically, in response to the command, the system speech synthesis program 324 computes the synthesized speech “Understood” on the basis of the system voice quality DB 316 (S503 in FIG. 8) and outputs the synthesized speech waveform to the speaker (S504).

Next, the operation planning program 322 executes the “external device operation” processing (S208 to S212). Specifically, it adds a series of operations for operating the air conditioner to the operation queue 315 (S209 to S212). FIG. 15 shows the operation queue 315 with the new operations added.

The operation planning program 322 sequentially processes the “move”, “network output”, “voice output”, and “network output” operations registered in the operation queue 315 of FIG. 15. Specifically, it instructs the mechanism control device 90 to move to the recommended voice recognition position (10, 12, 10), instructs the NIF 50 to transmit a “voice recognition OFF” instruction to “HTC_AC_10A”, instructs the user speech synthesis program 325 to output the speech “Please set the air conditioner to 26°C” in the user's voice, and then commands the NIF 50 to transmit the instruction “air conditioner ON, set temperature 26°C” to “HTC_AC_10A”.

As a result, the service robot 10 moves in front of the air conditioner “HTC_AC_10A”, and the voice recognition function of the air conditioner “HTC_AC_10A” is turned OFF. The service robot 10 then uses the user speech synthesis program 325 to emit the speech “Please set the air conditioner to 26°C” from the speaker 60 in a voice close to the user's, and commands the air conditioner “HTC_AC_10A” over the network to switch to “air conditioner ON, set temperature 26°C”.

A user who watches this series of actions by the service robot 10 learns that, to turn on the air conditioner personally, it is enough to stand at that position and say “Please set the air conditioner to 26°C”, just as the service robot 10 demonstrated.

As described above, according to this embodiment, the service robot operates voice-input-capable electronic devices in the same way a human would, which reduces the likelihood that the user perceives the service robot as non-human. In addition, because the service robot routinely demonstrates how to give voice input to the electronic device, the user can learn where to stand and which vocabulary can be used. When the user later operates the electronic device, the user need only recall how the robot did it, which improves convenience.

The service robot 10 may have a configuration different from the preferred configuration described above. It is preferable that the service robot 10 can move to the recommended voice recognition position of the external device by means of the moving mechanism 91, but the service robot 10 need not include the moving mechanism 91. Alternatively, even if it has the moving mechanism 91, it may output speech without moving to the recommended voice recognition position. In such configurations, the service robot 10 emits the speech for operating the external device from a fixed position.

As described above, it is preferable that the service robot 10, while outputting speech to operate the external device, also sends an instruction to the external device over the network. This allows the external device to be operated reliably. Depending on the design, the service robot 10 may operate the external device using output speech alone.

In a configuration that transmits commands to the external device over the network, it is preferable, as described above, that the service robot 10 stops the function of the external device that operates in response to voice recognition. This avoids a conflict between the voice command and the command sent over the network. If no problem arises in the operation of the external device, the external device may receive both commands.

In the preferred configuration described above, the service robot 10 converses with the user, determines whether the user's speech in that conversation includes an instruction to operate an external device, and, if it does, outputs speech corresponding to the external device operation in accordance with that instruction. Whether the user's words constitute an external device operation instruction depends not only on the words themselves but also on the values of other items of the state information. The service robot 10 can thus operate an external device within a natural conversation with the user. An external device operation instruction may also be conditioned on the user's words alone; a direct instruction from the user to the service robot 10 to operate an external device is one such case.

In the preferred configuration described above, the service robot 10 stores information from the temperature sensor 80 and the timer 81 in the internal state DB. Depending on the design, the service robot 10 may operate the external device on the basis of the user's words alone, without using these data. The service robot 10 also need not include devices such as the temperature sensor 80 and the timer 81, and may instead acquire the information of such devices over the network.

As described above, the service robot 10 uses the user's voice for the speech output that operates the external device. This allows the user to clearly distinguish the speech the service robot 10 uses for conversation from the speech directed at the external device. In addition, when the external device has a speaker adaptation function, using the user's voice allows the voice recognition of the external device to be appropriately adapted to the user.

To imitate the user's voice more accurately, it is preferable that the service robot 10 keeps updating the user voice quality DB, but it need not have an update function. To distinguish the conversation speech from the speech directed at the external device, the service robot 10 may also use different system voices.

As described above, it is preferable that the service robot 10 uses the synthesis profile to generate the speech for operating the external device. This makes it possible to generate speech better suited to operating the external device. The synthesis profile described above is one example, and a synthesis profile can include other characteristics. Depending on the design, the service robot 10 may also generate speech without using a synthesis profile.

The above description of other configurations in this embodiment also applies to the second embodiment described below, except for matters relating to speech output.

<Second Embodiment>
A second embodiment of the present invention is described. FIG. 16 shows the service robot of this embodiment. It differs from the configuration of the first embodiment shown in FIG. 1 in that the mechanism control device 90 of the service robot 10 further includes a gesture mechanism 92, that the storage device 30 stores a gesture DB 313b instead of the vocabulary DB 313, and that the service robot 10 includes a gesture generation program 325b instead of the user speech synthesis program 325.

The gesture mechanism 92 consists of mechanical parts corresponding to the parts of the human body used for gestures (head, arms, chest, and legs), each with an appearance and range of motion close to those of a human. Only some of these parts may be implemented, and some may be shared with the moving mechanism 91. In that configuration, the mechanism control device 90 controls the moving mechanism 91 and the gesture mechanism 92 appropriately, avoiding conflicts between movement and gesture operations and ensuring that these operations do not put the service robot 10 in a dangerous state.

Next, the operation of the external device management program 321 is described. It is almost the same as in the first embodiment described with reference to FIG. 2, so only the differences are described here. In steps 106 and 107, the external device management program 321 receives a gesture data packet, rather than a vocabulary data packet, from the NIF 50 and updates the contents of the gesture DB 313b according to the contents of the gesture data packet.

As shown in FIG. 17, the gesture DB 313b contains operations and the gesture sequence information corresponding to each. In the example of FIG. 17, “[0-10] joint A: (30-50, 5)” means that, in time frames 0 to 10, “joint A” of the gesture mechanism 92 is moved from 30 degrees to 50 degrees with an angular velocity upper limit of 5. Instead of updating the gesture DB 313b from gesture data packets, the external device management program 321 may use gesture information stored in advance in the storage device 30.
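A gesture sequence entry in the notation of the FIG. 17 example could be parsed as in the Python sketch below. The regular expression, the function names, and the field names of the returned dictionary are assumptions for illustration; the description does not define a textual grammar beyond the single example. The unit of the angular velocity limit is also not stated, so the comparison at the end assumes, purely for illustration, degrees per time frame.

```python
import re

# Matches entries such as "[0-10] joint A: (30-50, 5)"; "~" is also accepted
# as the separator between the start and end angles.
_STEP = re.compile(
    r"\[(\d+)-(\d+)\]\s*([^:]+?)\s*:\s*\((\d+)\s*[-~]\s*(\d+)\s*,\s*(\d+)\)")


def parse_gesture_step(text):
    """Parse one gesture sequence entry into a dict of its fields."""
    match = _STEP.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized gesture step: {text!r}")
    start, end, joint, angle_from, angle_to, limit = match.groups()
    return {
        "start_frame": int(start),
        "end_frame": int(end),
        "joint": joint,
        "angle_from_deg": int(angle_from),
        "angle_to_deg": int(angle_to),
        "angular_velocity_limit": int(limit),
    }


def average_velocity(step):
    """Average angle change per time frame needed to complete the step."""
    frames = step["end_frame"] - step["start_frame"]
    if frames <= 0:
        raise ValueError("end frame must be after start frame")
    return abs(step["angle_to_deg"] - step["angle_from_deg"]) / frames


step = parse_gesture_step("[0-10] joint A: (30-50, 5)")
within_limit = average_velocity(step) <= step["angular_velocity_limit"]
```

For the example entry, the joint moves 20 degrees over 10 frames, an average of 2 per frame, which is within the limit of 5 under the assumed unit.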

Next, the operation of the operation planning program 322 is described with reference to FIG. 18. It is substantially the same as in the first embodiment described with reference to FIGS. 5A and 5B, so only the differences are described here. In step 206, for an operation of type “gesture output”, the operation planning program 322 sends a command to the gesture generation program 325b (S607e).

When the type is “operation complete”, the operation planning program 322 adds to the operation queue 315 a network transmission operation that instructs the target external device to turn gesture recognition ON (S607f). In step 609, the operation planning program 322 acquires the recommended gesture recognition position from the device DB 311 and adds a move operation to that position. The device DB 311 has the same format as that shown in FIG. 3 but holds the recommended gesture recognition position instead of the recommended voice recognition position.

In S610, the operation planning program 322 adds a network transmission operation that instructs the target external device to turn gesture recognition OFF. In S611, it creates the sequence to be output as a gesture on the basis of the gesture DB 313b of the target external device and adds a gesture output operation.

In the operation example of the operation planning program 322, the processing that follows the "external device operation" operation differs from the first embodiment, and the operation queue 315 becomes as shown in FIG. 19. The operation planning program 322 sequentially processes the queued operations: "movement", "network output", "gesture generation", and "network output".

As a result, the service robot 10 moves in front of the air conditioner "HTC_AC_10A", and the voice recognition function of the air conditioner "HTC_AC_10A" is turned OFF. The service robot 10 then performs a gesture according to the pattern generated by the gesture generation program 325b. Specifically, the gesture mechanism 92 moves in accordance with the gesture sequence corresponding to "air conditioner ON, set temperature 26 °C" in FIG. 17, imitating a human gesture. Finally, the service robot 10 sends the air conditioner "HTC_AC_10A" a command to apply "air conditioner ON, set temperature 26 °C".
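The queued sequence described above can be sketched as follows. This is a minimal Python illustration, not the implementation in the specification: the `Operation` class, the function names, the parameter keys, and the message strings are all hypothetical, and the recognition-OFF message follows the S610 description. Only the four operation types and their order are taken from the text.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Operation:
    """One entry in the operation queue (hypothetical structure)."""
    kind: str
    params: dict = field(default_factory=dict)


def expand_device_operation(device_id: str, command: str, position: tuple) -> list:
    """Expand one "external device operation" into the four queued operations."""
    return [
        # Move to the recommended gesture recognition position for the device.
        Operation("movement", {"position": position}),
        # Tell the device to stop reacting to gestures before the demonstration.
        Operation("network output", {"device": device_id,
                                     "message": "gesture recognition OFF"}),
        # Perform the gesture that corresponds to the command.
        Operation("gesture generation", {"device": device_id, "command": command}),
        # Send the actual operation command over the network.
        Operation("network output", {"device": device_id, "message": command}),
    ]


def process_sequentially(queue: deque, handle: Callable[[Operation], None]) -> list:
    """Pop operations from the front of the queue and handle them in order."""
    processed = []
    while queue:
        op = queue.popleft()
        handle(op)
        processed.append(op.kind)
    return processed
```

For example, a caller could build the queue with `deque(expand_device_operation("HTC_AC_10A", "air conditioner ON, set temperature 26 °C", (1.0, 2.0)))`, where the coordinates are placeholder values, and pass a handler that dispatches each operation to the corresponding mechanism.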

Thus, a user who watches this series of movements by the service robot 10 learns that, to turn on the air conditioner, the user can stand at that position and make the same gesture that the service robot 10 demonstrated.

Preferred embodiments of the present invention have been described above, but the present invention is not limited to these embodiments. A person skilled in the art can readily change, add to, or convert each element of the above embodiments within the scope of the present invention. For example, when an external device has both a voice recognition function and an image recognition function, and operating that external device involves both voice and gesture, the service robot performs both voice output and the gesture.

The present invention is applicable mainly to robots that operate in homes, offices, and similar environments.

10 service robot, 30 storage device, 40 bus, 60 speaker,
70 microphone, 80 temperature sensor, 81 timer, 90 moving mechanism control device,
91 moving mechanism, 92 gesture mechanism, 311 device DB, 312 operation DB,
313 vocabulary DB, 313b gesture DB,
314 internal state DB, 315 operation queue, 316 system voice quality DB,
317 user voice quality DB, 321 external device management program,
322 operation planning program, 323 voice recognition program,
324 system voice synthesis program, 325 user voice synthesis program,
325b gesture generation program

Claims

An external device control device that recognizes a user's voice and controls an external device, comprising:
a receiving unit that receives voice;
a voice recognition unit that converts the voice received by the receiving unit into a character string;
a state information storage area that stores state information including the character string converted by the voice recognition unit;
an operation information storage area that stores operation information associating operations of the external device control device with conditions on the state information;
an operation planning unit that refers to the state information storage area and the operation information storage area to determine whether the state information including the character string satisfies a condition associated with an external device operation performed by voice and/or gesture;
a pattern generation unit that generates a voice waveform and/or gesture pattern corresponding to the external device operation when the operation planning unit determines that the state information including the character string satisfies the condition;
an output unit that performs an output operation of the voice waveform and/or gesture according to the pattern; and
a network interface for data communication with the external device,
wherein the operation planning unit
transmits, using the network interface, an operation instruction corresponding to the external device operation to the external device, and
before the output operation by the output unit, instructs the external device, using the network interface, to stop its function of recognizing the output operation and executing the corresponding operation.

  The external device control device according to claim 1,
  wherein the pattern generation unit converts a natural language character string corresponding to the external device operation into a voice waveform, and
  the output unit outputs the voice waveform as air vibration.

  The external device control device according to claim 2,
  further comprising a user voice quality storage area that stores user voice quality information including information indicating characteristics of the user's voice,
  wherein the pattern generation unit converts the character string into the voice waveform so as to approximate the characteristics of the user's voice indicated by the user voice quality information.

The external device control device according to claim 3,
wherein the voice recognition unit extracts voice features from voice uttered by the user and updates the information in the user voice quality storage area.

  The external device control device according to claim 2,
  wherein the pattern generation unit further comprises a storage area that stores a synthesis profile specifying parameters used when converting a character string into a voice waveform, and
  the pattern generation unit converts an input character string into a voice waveform based on the synthesis profile.

An external device control device that recognizes a user's voice and controls an external device, comprising:
a receiving unit that receives voice;
a voice recognition unit that converts the voice received by the receiving unit into a character string;
a state information storage area that stores state information including the character string converted by the voice recognition unit;
an operation information storage area that stores operation information associating operations of the external device control device with conditions on the state information;
an operation planning unit that refers to the state information storage area and the operation information storage area to determine whether the state information including the character string satisfies a condition associated with an external device operation performed by voice and/or gesture;
a pattern generation unit that generates a gesture sequence corresponding to the external device operation when the operation planning unit determines that the state information including the character string satisfies the condition; and
an output unit that includes a plurality of movable parts and performs a gesture by moving the plurality of movable parts according to the gesture sequence.

  An external device control device that recognizes a user's voice and controls an external device, comprising:
  a receiving unit that receives voice;
  a voice recognition unit that converts the voice received by the receiving unit into a character string;
  a state information storage area that stores state information including the character string converted by the voice recognition unit;
  an operation information storage area that stores operation information associating operations of the external device control device with conditions on the state information;
  an operation planning unit that refers to the state information storage area and the operation information storage area to determine whether the state information including the character string satisfies a condition associated with an external device operation performed by voice and/or gesture;
  a pattern generation unit that generates a voice waveform and/or gesture pattern corresponding to the external device operation when the operation planning unit determines that the state information including the character string satisfies the condition;
  an output unit that performs an output operation of the voice waveform and/or gesture according to the pattern;
  a moving mechanism; and an area that stores information indicating a recommended output operation position for the external device,
  wherein the moving mechanism moves the external device control device to the recommended output operation position for the external device, and
  the output unit executes the output operation at the recommended output operation position.

  A method for controlling an external device, performed by a control device that recognizes a user's voice and controls the external device, the method comprising:
  receiving the user's voice;
  converting the received voice into a character string;
  storing state information including the converted character string in a data storage device;
  comparing operation information, which is stored in the data storage device and associates operations of the control device with conditions on the state information, with the state information including the character string, to determine whether the state information including the character string satisfies a condition associated with an external device operation performed by voice and/or gesture;
  generating, if the state information including the character string satisfies the condition, a voice waveform and/or gesture pattern corresponding to the external device operation;
  performing an output operation of the voice waveform and/or gesture according to the pattern;
  transmitting an operation instruction corresponding to the external device operation to the external device, using a network interface for communicating with the external device; and
  before the output operation, instructing the external device, using the network interface, to stop its function of recognizing the output operation and executing the corresponding operation.

  A program that causes a processor to execute processing for controlling the operation of an external device control device that includes the processor and a data storage device and that recognizes a user's voice to control an external device, the program causing the processor to execute the steps of:
  acquiring state information including a character string converted from the received voice of the user;
  referring to operation information that is stored in the data storage device and associates operations of the control device with conditions on the state information;
  comparing the state information including the character string with the operation information to determine whether the state information including the character string satisfies a condition associated with an external device operation performed by voice and/or gesture;
  generating, if the state information including the character string satisfies the condition, a voice waveform and/or gesture pattern corresponding to the external device operation, and determining to perform an output operation of the voice waveform and/or gesture according to the pattern;
  transmitting an operation instruction corresponding to the external device operation to the external device, using a network interface for communicating with the external device; and
  before the output operation, sending the external device, using the network interface, a command to stop its function of recognizing the output operation and executing the corresponding operation.