JP6316214B2

JP6316214B2 - SYSTEM, SERVER, ELECTRONIC DEVICE, SERVER CONTROL METHOD, AND PROGRAM

Info

Publication number: JP6316214B2
Application number: JP2015005190A
Authority: JP
Inventors: 靖典山下; 岩野　裕利; 裕利岩野; 礼徳永; 新開　誠; 誠新開
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2015-01-14
Filing date: 2015-01-14
Publication date: 2018-04-25
Anticipated expiration: 2035-01-14
Also published as: JP2016130800A

Description

The present disclosure relates to controlling the operation of a device based on voice recognition, and more specifically to a technique for controlling the operation of a device through dialogue.

Robots with a dialogue function are known. For example, Japanese Patent Laid-Open No. 2003-69732 (Patent Document 1) discloses "a robot that is portable, has a dialogue function, can take in information from a network, gives specific messages in response to the user's emotions, and has a schedule function" (see [Problem] in [Abstract]). This robot "includes a voice input unit 4 that recognizes and inputs the user's voice, a voice output unit 5 that outputs synthesized voice, a storage unit 22, and a control unit 2, and is configured to execute a dialogue function so as to output a voice responding to the user's voice. The storage unit stores schedule data input by the user's voice and, in response to the user's inquiry, outputs the content of the schedule by voice according to the schedule data" (see [Solution] in [Abstract]).

In addition, Japanese Patent Laid-Open No. 2002-344573 (Patent Document 2) discloses "a voice playback timer that announces, in the voice of a TV personality or celebrity, that a preset time has arrived, and that allows the voice to be freely selected, and a mobile phone having the voice playback timer" (see [Problem] in [Abstract]).

Patent Document 1: JP 2003-69732 A
Patent Document 2: JP 2002-344573 A

There is a need for a technique that makes it easy to configure a robot having a dialogue function for operations such as voice output of a schedule or notification that a time has arrived. There is also a need for a technique that provides not merely a dialogue but a dialogue that can entertain the user. Furthermore, there is a need for a technique that allows the uttered voice to be changed.

The present disclosure has been made to solve the problems described above. An object in one aspect is to provide a system that makes it easy for a robot having a dialogue function to output a schedule by voice, give notification that a time has arrived, and the like. An object in another aspect is to provide a system that can provide not merely a dialogue but a dialogue that can entertain the user. An object in yet another aspect is to provide a system in which the uttered voice can be changed.

An object in another aspect is to provide a server, an electronic device, a server control method, or a program for implementing the method that makes it easy to configure voice output of a schedule, notification that a time has arrived, and the like by a robot having a dialogue function. An object in another aspect is to provide a server, an electronic device, a server control method, or a program for implementing the method that can provide not merely a dialogue but a dialogue that can entertain the user. An object in yet another aspect is to provide a server, an electronic device, a server control method, or a program for implementing the method in which the uttered voice can be changed.

According to one embodiment, a system that operates electronic devices by utterance is provided. The system includes a plurality of electronic devices that accept utterances and a server capable of communicating with the plurality of electronic devices. Each electronic device includes voice input means for accepting an utterance and transmission means for transmitting identification information of the electronic device and the content of the accepted utterance to the server. The server includes storage means for holding the identification information and the content of the utterance, generation means for generating a command corresponding to the content of the utterance, and transmission means for transmitting the identification information and the command to the electronic device. The electronic device further includes receiving means for receiving the command from the server and operation means for executing an operation based on the command.
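The exchange described in this embodiment can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the class names `Server`, `Device`, and `Command`, the `"speak"` action, and the keyword rule used to generate a command are all assumptions made for illustration, and the network is replaced by a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    device_id: str   # identifies the device that should act
    action: str      # hypothetical action name, e.g. "speak"
    payload: str     # hypothetical action argument


@dataclass
class Server:
    # Storage means: utterance contents kept per device identifier.
    utterances: dict[str, list[str]] = field(default_factory=dict)

    def handle(self, device_id: str, utterance: str) -> Command:
        # Hold the identification information together with the utterance.
        self.utterances.setdefault(device_id, []).append(utterance)
        # Generation means: map the utterance to a command (toy keyword rule).
        if "alarm" in utterance.lower():
            return Command(device_id, "speak", "Tell me the time you want to set.")
        return Command(device_id, "speak", "Sorry, I did not understand.")


@dataclass
class Device:
    device_id: str
    spoken: list[str] = field(default_factory=list)

    def on_utterance(self, server: Server, utterance: str) -> None:
        # Transmission means: send our identifier and the utterance.
        command = server.handle(self.device_id, utterance)
        # Receiving and operating means: act only on commands addressed to us.
        if command.device_id == self.device_id and command.action == "speak":
            self.spoken.append(command.payload)
```

Because the command carries the device identifier, a server that talks to several devices can route each reply to the device that heard the utterance.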

According to another embodiment, a server that controls electronic devices is provided. The server includes input means for accepting the content of utterances via a plurality of electronic devices, storage means for holding identification information of each electronic device in association with the content of the utterance, generation means for generating a command corresponding to the content of the utterance, and transmission means for transmitting the command to the electronic device specified by the identification information associated with the content of the utterance from which the command was generated.

According to another embodiment, an electronic device is provided. The electronic device includes communication means for communicating with a server, voice input means for accepting an utterance, and operation means for executing an operation of the electronic device. The communication means transmits the content of the utterance accepted by the voice input means to the server and receives a command corresponding to the content of the utterance from the server. The operation means executes an operation based on the received command.

According to another embodiment, a method for controlling a server is provided. The control method includes the steps of: accepting an utterance that includes an instruction for a future operation by at least one electronic device; holding identification information of the at least one electronic device and the content of the instruction; generating a command corresponding to the content of the instruction; and transmitting the identification information and the command to the at least one electronic device when the time at which the future operation is to be performed arrives.
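One way to picture the last step of this method, sending a command only when its time arrives, is a priority queue ordered by due time. The sketch below is an illustration under assumptions: the `Scheduler` class, its method names, and the command string are hypothetical, and the caller is assumed to supply the current time.

```python
import heapq
from datetime import datetime


class Scheduler:
    """Holds future instructions and releases them once they are due."""

    def __init__(self) -> None:
        # Heap entries: (due time, sequence number, device id, command).
        self._pending: list[tuple[datetime, int, str, str]] = []
        self._seq = 0  # tie-breaker so equal due times keep insertion order

    def register(self, device_id: str, due: datetime, command: str) -> None:
        heapq.heappush(self._pending, (due, self._seq, device_id, command))
        self._seq += 1

    def pop_due(self, now: datetime) -> list[tuple[str, str]]:
        """Return (device id, command) pairs whose time has arrived."""
        due_items = []
        while self._pending and self._pending[0][0] <= now:
            _, _, device_id, command = heapq.heappop(self._pending)
            due_items.append((device_id, command))
        return due_items
```

A server loop could call `pop_due` periodically with the current time and transmit each returned pair to the named device.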

According to still another aspect, a program for causing a computer to execute the above method is provided.

The above and other objects, features, aspects, and advantages of the present invention will become apparent from the following detailed description of the invention, understood in conjunction with the accompanying drawings.

FIG. 1 is a diagram illustrating one mode in which a user uses the system 100 according to the present embodiment.
FIG. 2 is a block diagram illustrating an example of the configuration of the system 100.
FIG. 3 is a block diagram illustrating the configuration of a computer system that implements the server 120.
FIG. 4 is a diagram illustrating one mode of data storage in the hard disk 5 of the server 120.
FIG. 5 is a diagram illustrating messages output by the system 100.
FIG. 6 shows data defining the times output by the system 100.
FIG. 7 is a block diagram illustrating the hardware configuration of the robot 110.
FIG. 8 is a flowchart illustrating part of the processing executed when the system 100 registers an alarm setting.
FIG. 9 is a flowchart illustrating the processing for changing an alarm setting.
FIG. 10 is a flowchart illustrating the processing for confirming an alarm setting.
FIG. 11 is a flowchart illustrating the processing for canceling an alarm setting.
FIG. 12 is a diagram conceptually illustrating one mode of the data stored in the hard disk 5 of the server 120.
FIG. 13 is a flowchart illustrating part of the processing executed by the system 100.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the following description, identical parts are denoted by identical reference numerals. Their names and functions are also the same. Detailed descriptions of them will therefore not be repeated.

<First Embodiment>
The technical idea of the present embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating one mode in which a user uses the system 100 according to the present embodiment. The system 100 includes a plurality of electronic devices (for example, robots 110A, 110B, ... 110N) and a server 120. The robots 110A, 110B, and 110N are collectively referred to as the robot 110. The server 120 and the robot 110 are connected to each other via the Internet 130 or another network. The robot 110 is implemented, for example, as a vacuum cleaner with a self-propelled function. The electronic device is not limited to a robot and may be a microwave oven, an air conditioner, or another device. The electronic device only needs to have at least a voice input function, a function for communicating with the server 120, and an operation function.

(Alarm setting)
In one aspect, the alarm time can be set, for example, on a 24-hour basis and can be specified in 10-minute increments. A user 150 who wants to use the alarm function can set the time by speaking to the robot each time. Once the user 150 has set a time, the same time can easily be set again from the second time onward.
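As a small illustration of the 24-hour, 10-minute granularity mentioned above, the following Python sketch validates a recognized time string. The function name `parse_alarm_time`, the `H:MM` input format, and the choice to reject (rather than round) off-grid times are assumptions for illustration only.

```python
import re
from datetime import time
from typing import Optional


def parse_alarm_time(text: str, step_minutes: int = 10) -> Optional[time]:
    """Parse 'H:MM' and accept it only on a step_minutes boundary."""
    match = re.fullmatch(r"\s*(\d{1,2}):(\d{2})\s*", text)
    if match is None:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    # Reject values outside a 24-hour clock.
    if hour > 23 or minute > 59:
        return None
    # Reject times that do not fall on the configured increment.
    if minute % step_minutes != 0:
        return None
    return time(hour, minute)
```

With the default step, `parse_alarm_time("7:20")` yields a valid time, while `"7:25"` and `"24:00"` are rejected by returning `None`.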

If, for example, the desired time cannot be determined within about 30 seconds after the robot 110 asks for the time to be set, a timeout occurs and the alarm setting is not completed. The time until the timeout is not limited to the value given here; other values may be used. The time until the timeout may also be set by a user of the system 100. The user in this case may be either the operator of the server 120 or the user of the robot 110.

In another aspect, the robot 110 may sing a "wake-up song" at the set time. In this case, a user who wants to stop the song partway through can do so by pressing and holding the start/stop button (not shown) on the body of the robot 110 for a predetermined time (for example, 2 seconds) or longer.

More specifically, in one aspect, the user 150 says "Set an alarm" (message 151). When the robot 110A or the server 120 recognizes the spoken content of message 151, the robot 110A outputs the response "Okay. Tell me the time you want to set." (message 152). On hearing message 152, the user says "7:20" (message 153). When the robot 110A or the server 120 recognizes the spoken content of message 153, the robot 110A says "I set the alarm. I'll wake you up at 7:20." (message 154).

In this way, a user who wants to wake up pleasantly to the voice of the robot 110 can set the alarm by specifying the time in a spoken conversation.

Thereafter, when the set time arrives, the robot 110 outputs a message or a song by voice based on data from the server 120. This voice may be that of the voice actor used when the alarm was set, or, in another aspect, the singing voice of a person other than that voice actor may be output.

(Second and subsequent settings)
If the user has already set the alarm once, the alarm function can easily be set again for the same time by spoken conversation. For example, when the user says "Set an alarm," the robot 110 says "I set the alarm. I'll wake you up at 7:20 ♪". In this case, the robot 110 speaks based on the time that has already been set. The alarm time is stored in the robot 110 or in the server 120.

(Change of alarm time)
The user can change the alarm time of the robot 110 by spoken conversation. For example, the user says "Set an alarm." When the content of the user's utterance is recognized by the robot 110 or the server 120, the robot 110 says "Okay ♪ Tell me the time you want to set." The user says "6:20." When the content of the user's utterance is recognized by the robot 110 or the server 120, the robot 110 says "I set the alarm. I'll wake you up at 6:20 ♪". In this way, the alarm time setting is changed.

The way of setting the alarm time is not limited to the above. For example, instead of uttering the new time (for example, "6:20"), the user may say "Wake me up one hour earlier than usual." If the robot 110 has stored 7:20 as the user's standard alarm time, then on recognizing this utterance it can calculate 6:20 as the new time and register it as the alarm time.
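The relative setting described above amounts to simple clock arithmetic. The following Python sketch shows one way to compute it; the function name `shift_alarm` and the wrap-around behavior at midnight are assumptions, since the text only gives the 7:20 to 6:20 example.

```python
from datetime import time

MINUTES_PER_DAY = 24 * 60


def shift_alarm(usual: time, delta_minutes: int) -> time:
    """Shift a time of day by delta_minutes, wrapping around midnight."""
    total = (usual.hour * 60 + usual.minute + delta_minutes) % MINUTES_PER_DAY
    return time(total // 60, total % 60)
```

For the example in the text, `shift_alarm(time(7, 20), -60)` gives 6:20. Working in minutes modulo one day keeps the result a valid time of day even when the shift crosses midnight.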

(Confirmation of alarm setting time)
The user can confirm the alarm time set on the robot 110 by spoken conversation. For example, the user says "Check the alarm." When the utterance is recognized by the robot 110 or the server 120, the robot 110 says, based on the data stored in the robot 110 or the server 120, "The alarm is set ♪ I'll wake you up at 7:00."

In another aspect, there may be cases where no alarm is set. In this case, even if the user says "Set an alarm" and the utterance is recognized, the robot 110 detects that no alarm for the user is stored in the robot 110 or the server 120. Based on the result of that detection, the robot 110 says "No alarm is set."

(Cancellation of alarm settings)
The user can cancel the alarm function of the robot 110 by spoken conversation. For example, the user says "Cancel the alarm." When the utterance is recognized by the robot 110 or the server 120, the robot 110 says "Okay. Is it all right to cancel the alarm?" When the user utters a message conveying the confirmation (for example, "OK"), the robot 110 says "I canceled the alarm setting."

In another aspect, the user can withdraw the instruction to cancel the alarm setting. For example, the user says "Cancel the alarm." When the utterance is recognized by the robot 110 or the server 120, the robot 110 says "Okay. Is it all right to cancel the alarm?" The user, having had a change of mind, says "Never mind." When the utterance is recognized by the robot 110 or the server 120, the robot 110 says "The alarm is still set. I'll wake you up at 7:00 ♪".
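The two cancellation dialogues above can be viewed as a small state machine with a pending-confirmation state. The Python sketch below is illustrative only: the class `AlarmDialogue`, the exact phrases matched, the reply given when no alarm exists, and the rule that any reply other than "OK" keeps the alarm are all assumptions not stated in the text.

```python
from typing import Optional


class AlarmDialogue:
    """Toy two-step cancellation dialogue; all phrases are illustrative."""

    def __init__(self, alarm: Optional[str]) -> None:
        self.alarm = alarm                  # e.g. "7:00", or None when unset
        self.awaiting_confirmation = False  # True after the robot asks back

    def hear(self, utterance: str) -> str:
        text = utterance.strip().lower()
        if self.awaiting_confirmation:
            self.awaiting_confirmation = False
            if text == "ok":
                self.alarm = None
                return "I canceled the alarm setting."
            # Assumption: any other reply keeps the alarm ("Never mind" case).
            return f"The alarm is still set. I'll wake you up at {self.alarm}."
        if text == "cancel the alarm":
            if self.alarm is None:
                # Assumption: reply used when there is nothing to cancel.
                return "No alarm is set."
            self.awaiting_confirmation = True
            return "Okay. Is it all right to cancel the alarm?"
        return "Sorry, I did not understand."
```

Keeping the confirmation as explicit state means the same recognized word ("OK") is only treated as consent immediately after the robot has asked the question.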

The configuration of the system 100 according to the present embodiment will be described with reference to FIG. 2. FIG. 2 is a block diagram illustrating an example of the configuration of the system 100. The system 100 includes a voice input unit 210, a voice recognition processing unit 220, an extraction unit 230, a storage unit 240, an operation unit 250, a command generation unit 260, and a timing unit 270.

The voice input unit 210 accepts input of an utterance to the system 100 (for example, an instruction to set or change an alarm). The voice input unit 210 outputs a signal corresponding to the utterance to the voice recognition processing unit 220. The signal includes, for example, identification information of the utterance and the instruction contained in the utterance. The instruction may be either voice data or text data.

The voice recognition processing unit 220 performs voice recognition processing on the identification information and the instruction contained in the utterance, based on the signal sent from the voice input unit 210, and outputs the result of the processing to the extraction unit 230.

The extraction unit 230 extracts the identification information of the utterance and the content of the instruction contained in the utterance from the recognition result sent from the voice recognition processing unit 220, and stores the extracted data in the storage unit 240.

The storage unit 240 holds data given to the system 100, data generated in the system 100, and the like.

The operation unit 250 includes a voice output unit 251. Based on the output from the voice recognition processing unit 220, the operation unit 250 executes the operation specified by the command generated by the command generation unit 260. Operations performed by the operation unit 250 may include, for example, output of an alarm sound, output of other sounds, and, when the system 100 includes a cleaning function, operation of the cleaning mechanism.

Upon receiving a command from the command generation unit 260, the voice output unit 251 outputs a response by voice based on the signal provided from the voice recognition processing unit 220.

The command generation unit 260 generates a command for causing the operation unit 250 to execute an operation. For example, the command generation unit 260 generates the command using the data held in the storage unit 240 and the time data measured by the timing unit 270. The command generation unit 260 sends the generated command to the operation unit 250. The command includes the identification information of the device and the specific content of the operation instruction.

The timing unit 270 measures the time in the system 100. The timing unit 270 may measure time based on its own internal clock, or may receive accurate time information from another information communication device connected to the system 100. The accurate time information may include, for example, time information obtained from a GPS (Global Positioning System) signal or another positioning signal.

[Server configuration]
The configuration of the server 120 according to the present embodiment will be described with reference to FIG. 3. FIG. 3 is a block diagram illustrating the configuration of a computer system that implements the server 120.

The server 120 includes, as its main components, a CPU (Central Processing Unit) 1 that executes programs, a mouse 2 and a keyboard 3 that receive instructions input by a user of the server 120, a RAM (Random Access Memory) 4 that stores, in a volatile manner, data generated by the CPU 1 executing a program or data input via the mouse 2 or the keyboard 3, a hard disk 5 that stores data in a non-volatile manner, an optical disc drive 6, a communication IF (Interface) 7, and a monitor 8. The components are connected to one another by a bus. A CD-ROM 9 or another optical disc is loaded into the optical disc drive 6. The communication IF 7 includes, but is not limited to, a USB (Universal Serial Bus) interface, a wired LAN (Local Area Network), a wireless LAN, and a Bluetooth (registered trademark) interface.

The processing in the server 120 is implemented by the hardware and by software executed by the CPU 1. Such software may be stored in the hard disk 5 in advance. The software may also be stored on the CD-ROM 9 or another computer-readable non-volatile data recording medium and distributed as a program product. Alternatively, the software may be provided as a downloadable program product by an information provider connected to the Internet or another network. Such software is read from the data recording medium by the optical disc drive 6 or another data reading device, or downloaded via the communication IF 7, and then temporarily stored in the hard disk 5. The software is read from the hard disk 5 by the CPU 1 and stored in the RAM 4 in the form of an executable program. The CPU 1 executes the program.

Each module of the server 120 shown in FIG. 3 is a general-purpose component. It can therefore be said that one of the essential parts of the present embodiment is the program stored in the server 120. Since the operation of the hardware of the server 120 is well known, a detailed description will not be repeated.

The data recording medium is not limited to a CD-ROM, an FD (Flexible Disk), or a hard disk, and may be any non-volatile data recording medium that carries the program in a fixed manner, such as a magnetic tape, a cassette tape, an optical disc (MO (Magnetic Optical Disc) / MD (Mini Disc) / DVD (Digital Versatile Disc)), an IC (Integrated Circuit) card (including a memory card), an optical card, or a semiconductor memory such as a mask ROM, an EPROM (Electronically Programmable Read-Only Memory), an EEPROM (Electronically Erasable Programmable Read-Only Memory), or a flash ROM.

The term "program" here may include not only a program directly executable by the CPU but also a program in source form, a compressed program, an encrypted program, and the like.

[Data structure]
The data structure of the server 120 according to the present embodiment will be described with reference to FIGS. 4 to 6. FIG. 4 is a diagram illustrating one mode of data storage in the hard disk 5 of the server 120. FIG. 5 is a diagram illustrating messages output by the system 100. FIG. 6 shows data defining the times output by the system 100.

As shown in FIG. 4, in one aspect, the hard disk 5 holds a type 410 and a template 420. The type 410 represents the kind of the template 420. For example, the type 410 may include information representing a preset "standard" and information representing a "derived" kind, which is a variation of the standard. The type 410 may include other classifications.

The template 420 represents a message defined as an output of the system 100. The content of a template is defined, for example, by the provider of the system 100 according to the present embodiment. A derived-type template is defined as utterance content considered similar or related to the content of a standard template.

Referring to FIG. 5, in one aspect, the hard disk 5 of the server 120 holds a message ID (Identification) 510 and a template 520. The message ID 510 identifies each of the templates held as the template 520. The template 520 is created, for example, using data obtained from the actual speech of one or more people, such as voice actors. The template 520 may be changed periodically or irregularly. In yet another aspect, utterances of a user of the system 100 may be registered as the template 520.

Although the example shown in FIG. 5 concerns utterances relating to an alarm, examples for implementing the technical idea of the present embodiment are not limited to such utterances. For example, utterances for configuring the operation of a device may be used, such as operation settings for an air conditioner (ON, OFF, temperature setting, timer setting, etc.), a rice cooker (timer setting for the start time, cooking mode, etc.), or a microwave oven (output, cooking time, etc.).

Referring to FIG. 6, in one aspect, the hard disk 5 holds a data ID 610 and time data 620. The data ID 610 identifies each item of time data defined in the time data 620. Like the message template 520 shown in FIG. 5, the time data 620 is input to the server 120 as utterances by a voice actor or another real person. In another aspect, the time data 620 may be realized by synthesized speech or by utterances of a user of the system 100.
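The relationship between message templates (identified by message ID) and time data (identified by data ID) can be sketched as two lookup tables that are combined into one response. The Python below is a simplified illustration: the IDs, the template wording, the `{time}` placeholder, and the function `compose` are hypothetical, and plain text stands in for the recorded voice data described in the text.

```python
from typing import Optional

# Hypothetical table contents; the actual tables are those of FIGS. 5 and 6.
TEMPLATES = {
    "M001": "Okay. Tell me the time you want to set.",
    "M002": "I set the alarm. I'll wake you up at {time}.",
}
TIME_DATA = {
    "T0620": "6:20",
    "T0720": "7:20",
}


def compose(message_id: str, data_id: Optional[str] = None) -> str:
    """Look up a template by message ID and fill in a time by data ID."""
    template = TEMPLATES[message_id]
    if data_id is None:
        # Template without a time slot is returned unchanged.
        return template
    return template.format(time=TIME_DATA[data_id])
```

Keeping times in a separate table means a new voice only needs one recording per template and one per time value, rather than one per combination.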

The configuration of the robot 110 according to the present embodiment will be described with reference to FIG. 7. FIG. 7 is a block diagram illustrating the hardware configuration of the robot 110. In one aspect, the robot 110 includes an operation panel 710, a microphone 720, a speaker 730, a monitor 740, a controller 750, a memory 760, a motor 770, wheels 780, and a communication I/F 790. The controller 750 includes a voice recognition processor 755.

操作パネル７１０は、ロボット１１０に対する命令の入力を受け付ける。操作パネル７１０は、たとえば、タッチパネル、トグルスイッチその他の物理的なスイッチとして実現される。 The operation panel 710 receives an instruction input to the robot 110. The operation panel 710 is realized as, for example, a touch panel, toggle switch, or other physical switch.

マイク７２０は、ロボット１１０に対する音声の入力を受け付けて、当該音声に応じた信号をコントローラ７５０に出力する。 The microphone 720 receives an input of sound to the robot 110 and outputs a signal corresponding to the sound to the controller 750.

スピーカ７３０は、コントローラ７５０から送られる信号に基づいて音声を出力する。 The speaker 730 outputs sound based on a signal sent from the controller 750.

モニタ７４０は、コントローラ７５０から送られる信号に基づいてロボット１１０の動作の状態その他の情報などを表示する。モニタ７４０は、たとえば、液晶モニタ、有機ＥＬ（Electro Luminescence）モニタとして実現される。 The monitor 740 displays the operating state of the robot 110 and other information based on a signal sent from the controller 750. The monitor 740 is implemented as, for example, a liquid crystal monitor or an organic EL (electroluminescence) monitor.

コントローラ７５０は、操作パネル７１０から送られる命令に基づいてロボット１１０の動作を制御する。別の局面において、コントローラ７５０は、マイク７２０から送られる信号に含まれる命令に基づいて、ロボット１１０の動作を制御し得る。さらに別の局面において、コントローラ７５０は、メモリ７６０に保持されているデータ、通信Ｉ／Ｆ７９０を介してロボット１１０の外部から受信したデータに基づいて、ロボット１１０の動作を制御し得る。 The controller 750 controls the operation of the robot 110 based on a command sent from the operation panel 710. In another aspect, the controller 750 may control the operation of the robot 110 based on a command included in a signal sent from the microphone 720. In yet another aspect, the controller 750 may control the operation of the robot 110 based on data held in the memory 760 and data received from outside the robot 110 via the communication I/F 790.

音声認識処理プロセッサ７５５は、マイク７２０から送られる信号に対して音声認識処理を実行し、その処理の結果をメモリ７６０に格納する。音声認識処理プロセッサ７５５は、ＣＰＵその他のプロセッサとして実現される。また、音声認識処理プロセッサ７５５は、コントローラ７５０に与えられる信号を音声信号に変換し、当該音声信号をスピーカ７３０に送る。スピーカ７３０は、その音声信号に基づいて音声を出力する。 The voice recognition processor 755 performs voice recognition processing on the signal sent from the microphone 720 and stores the result of the processing in the memory 760. The voice recognition processor 755 is realized as a CPU or other processor. Further, the voice recognition processor 755 converts a signal given to the controller 750 into a voice signal, and sends the voice signal to the speaker 730. The speaker 730 outputs sound based on the voice signal.

メモリ７６０は、ロボット１１０に予め規定された動作を実行させるためのプログラムおよび当該プログラムの実行に必要なデータを保持している。メモリ７６０は、フラッシュメモリ、ハードディスクその他の記憶装置によって実現される。たとえば、メモリ７６０は、機器ＩＤ７６１と、ユーザＩＤ７６２とを保持している。機器ＩＤ７６１は、ロボット１１０に与えられた固有の識別番号を表わす。ユーザＩＤ７６２は、ロボット１１０のユーザとして登録されたユーザを識別する。当該ユーザは、たとえばロボット１１０の購入者、使用者などである。ユーザＩＤ７６２は、たとえば、当該ユーザによって任意に入力される。 The memory 760 holds a program for causing the robot 110 to execute a predetermined operation and data necessary for executing the program. The memory 760 is realized by a flash memory, a hard disk, or other storage device. For example, the memory 760 holds a device ID 761 and a user ID 762. The device ID 761 represents a unique identification number given to the robot 110. User ID 762 identifies a user registered as a user of robot 110. The user is, for example, a purchaser or user of the robot 110. The user ID 762 is arbitrarily input by the user, for example.

モータ７７０は、コントローラ７５０から送られる信号に基づいて駆動する。モータ７７０は、その回転力を車輪７８０に与える。車輪７８０は、ロボット１１０の動作を３６０度移動できるように構成されている。車輪７８０が回転すると、ロボット１１０はその方向に移動する。 The motor 770 is driven based on a signal sent from the controller 750. The motor 770 applies its rotational force to the wheels 780. The wheels 780 are configured so that the robot 110 can move in any direction through 360 degrees. When the wheels 780 rotate, the robot 110 moves in the corresponding direction.

通信Ｉ／Ｆ７９０は、ネットワークに接続されて、当該ネットワークを介して他の装置とロボット１１０との通信を仲介する。通信Ｉ／Ｆ７９０は、たとえば、無線ＬＡＮ（Local Area Network）によって実現される。通信の種類は特に限定されない。 The communication I/F 790 is connected to a network and mediates communication between other apparatuses and the robot 110 via the network. The communication I/F 790 is realized by, for example, a wireless LAN (Local Area Network). The type of communication is not particularly limited.

［制御構造］ [Control structure]
図８から図１１を参照して、本実施の形態に係るシステム１００の制御構造について説明する。図８は、システム１００が目覚まし設定の登録を行なうときに実行する処理の一部を表わすフローチャートである。図９は、目覚まし設定の変更を行なうための処理を表わすフローチャートである。図１０は、目覚まし設定の確認処理を表わすフローチャートである。図１１は、目覚まし設定を取り消す処理を表わすフローチャートである。以下の処理は、たとえば、システム１００に含まれる１つ以上のプロセッサによって実現され得る。１つ以上のプロセッサは、たとえば、ロボット１１０やサーバ１２０に含まれるものである。 With reference to FIGS. 8 to 11, a control structure of system 100 according to the present embodiment will be described. FIG. 8 is a flowchart showing a part of processing executed when system 100 registers the alarm setting. FIG. 9 is a flowchart showing a process for changing the alarm setting. FIG. 10 is a flowchart showing the confirmation process of the alarm setting. FIG. 11 is a flowchart showing a process for canceling the alarm setting. The following processing may be realized by one or more processors included in the system 100, for example. The one or more processors are included in the robot 110 or the server 120, for example.

（目覚まし設定の登録） (Registration of alarm settings)
図８を参照して、ステップＳ８１０にて、システム１００は、目覚まし設定を要求する発話の入力を受け付ける。たとえば、ロボット１１０のコントローラ７５０は、マイク７２０を介して、ロボット１１０のユーザによる目覚まし設定を要求する発話（たとえばメッセージ１５１）の入力を受ける。 Referring to FIG. 8, in step S810, system 100 accepts an input of an utterance requesting alarm setting. For example, the controller 750 of the robot 110 receives, via the microphone 720, an input of an utterance (for example, message 151) by the user of the robot 110 requesting an alarm setting.

ステップＳ８２０にて、システム１００は、発話の内容を音声認識処理する。たとえば、ある局面において、サーバ１２０のＣＰＵ１は、音声認識処理部２２０として発話の内容を音声認識処理する。別の局面において、ロボット１１０の音声認識処理プロセッサ７５５は、音声認識処理部２２０として、当該発話の内容を音声認識処理してもよい。 In step S820, system 100 performs speech recognition processing on the content of the utterance. For example, in one aspect, the CPU 1 of the server 120 performs speech recognition processing on the content of the utterance as the speech recognition processing unit 220. In another aspect, the voice recognition processor 755 of the robot 110 may perform speech recognition processing on the content of the utterance as the speech recognition processing unit 220.

ステップＳ８３０にて、システム１００は、当該発話の内容に基づいて、設定時刻の入力を促す音声（たとえばメッセージ１５２）を出力する。当該音声データは、たとえば、システム１００において予め登録されている声優の音声を録音したデータその他人間の音声を録音したデータとして実現される。他の局面において、当該音声データは、システム１００のユーザの音声を録音することによって得られたデータとして、あるいは、合成音のデータとして実現されてもよい。以下の処理においてシステム１００が音声を出力するためのデータも同様である。 In step S830, system 100 outputs a voice (for example, message 152) that prompts the user to input a set time based on the content of the utterance. The voice data is realized, for example, as data obtained by recording voice actor voices registered in advance in the system 100 or other data obtained by recording human voices. In another aspect, the voice data may be realized as data obtained by recording the voice of the user of the system 100 or as synthesized sound data. The same applies to data for the system 100 to output sound in the following processing.

ある局面において、サーバ１２０のＣＰＵ１は、ロボット１１０のスピーカ７３０を介して当該音声を出力する。ロボット１１０のユーザは、ロボット１１０の近傍にいる場合には、当該音声を聴取できる。 In one aspect, the CPU 1 of the server 120 outputs the sound via the speaker 730 of the robot 110. When the user of the robot 110 is in the vicinity of the robot 110, the user can listen to the sound.

ステップＳ８４０にて、システム１００は、音声認識処理の結果に基づいて、設定時刻を認識できたか否かを判断する。より具体的には、たとえば、コントローラ７５０またはＣＰＵ１は、入力された設定時刻を認識できたかどうかを判断する。システム１００は、設定時刻を認識できたと判断すると（ステップＳ８４０にてＹＥＳ）、制御をステップＳ８７０に切り換える。そうでない場合には（ステップＳ８４０にＮＯ）、システム１００は、制御をステップＳ８５０に切り換える。 In step S840, system 100 determines whether the set time has been recognized based on the result of the voice recognition process. More specifically, for example, the controller 750 or the CPU 1 determines whether or not the input set time has been recognized. When system 100 determines that the set time has been recognized (YES in step S840), control is switched to step S870. Otherwise (NO in step S840), system 100 switches control to step S850.

ステップＳ８５０にて、システム１００は、内部のクロックによる計測の結果に基づいて、設定時刻の認識処理がタイムアウトになったか否かを判断する。システム１００は、設定時刻の認識処理がタイムアウトになったと判断すると（ステップＳ８５０にてＹＥＳ）、処理を終了する。そうでない場合には（ステップＳ８５０にてＮＯ）、システム１００は、制御をステップＳ８６０に切り換える。 In step S850, system 100 determines whether the recognition processing for the set time has timed out based on the result of measurement using the internal clock. When system 100 determines that the set time recognition process has timed out (YES in step S850), it ends the process. Otherwise (NO in step S850), system 100 switches control to step S860.

ステップＳ８６０にて、システム１００は、再度発話を促すメッセージ（たとえばメッセージ１５２）を音声で出力する。その後、制御は、ステップＳ８４０に戻される。 In step S860, system 100 outputs a message (for example, message 152) prompting the user to speak again by voice. Thereafter, control is returned to step S840.

ステップＳ８７０にて、システム１００は、機器（たとえばロボット１１０）の識別情報（たとえば機器ＩＤ７６１）と目覚まし設定の時刻とを関連付けて保存する。たとえば、ある局面において、ＣＰＵ１は、ハードディスク５に、当該識別情報と目覚まし設定の時刻とを保存する。別の局面において、ロボット１１０のコントローラ７５０が、メモリ７６０に機器の識別情報と目覚まし設定の時刻とを保存してもよい。 In step S870, system 100 associates and stores identification information (for example, device ID 761) of the device (for example, robot 110) and the alarm setting time. For example, in one aspect, the CPU 1 stores the identification information and the alarm setting time in the hard disk 5. In another aspect, the controller 750 of the robot 110 may store the device identification information and the alarm setting time in the memory 760.

ステップＳ８８０にて、システム１００は、発話された時刻で目覚まし設定ができた旨のメッセージ（命令内容を確認するための情報）を音声で出力する（たとえばメッセージ１５４）。ある局面において、ロボット１１０のスピーカ７３０は、サーバ１２０から送られる信号に基づき当該メッセージを音声で出力する。その後、システム１００は、登録処理を終了する。 In step S880, system 100 outputs a message (information for confirming the content of the command) that the alarm has been set at the time of utterance by voice (for example, message 154). In one aspect, the speaker 730 of the robot 110 outputs the message by voice based on a signal sent from the server 120. Thereafter, the system 100 ends the registration process.
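
The registration flow of FIG. 8 (steps S830 to S880) can be sketched as a prompt-and-retry loop. The following Python fragment is only a minimal sketch under stated assumptions, not the implementation of the embodiment: the names `parse_time` and `register_alarm`, the `HH:MM` text format, the `listen` and `speak` callbacks, and the use of an attempt count in place of the clock-based timeout of step S850 are all hypothetical.

```python
import re
from typing import Callable, Dict, Optional, Tuple


def parse_time(utterance: str) -> Optional[Tuple[int, int]]:
    """Hypothetical recognizer: extract HH:MM from recognized text (S840)."""
    match = re.search(r"\b(\d{1,2}):(\d{2})\b", utterance)
    if match is None:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    if hour > 23 or minute > 59:
        return None
    return hour, minute


def register_alarm(device_id: str,
                   listen: Callable[[], str],
                   speak: Callable[[str], None],
                   store: Dict[str, Tuple[int, int]],
                   max_attempts: int = 3) -> Optional[Tuple[int, int]]:
    """Return the stored time, or None if recognition timed out."""
    speak("What time should I wake you?")                    # S830
    for attempt in range(max_attempts):
        recognized = parse_time(listen())                    # S840
        if recognized is not None:
            store[device_id] = recognized                    # S870
            hour, minute = recognized
            speak(f"Alarm set for {hour:02d}:{minute:02d}.")  # S880
            return recognized
        if attempt + 1 >= max_attempts:                      # S850: timed out
            return None
        speak("Sorry, please say the time again.")           # S860
    return None
```

Keying the stored time by device ID mirrors step S870, in which the identification information of the device and the alarm time are saved in association with each other.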

（設定の変更） (Change of settings)
図９を参照して、ステップＳ９１０にて、システム１００は、ユーザによる発話の音声認識処理の結果に基づいて、目覚まし設定の変更を促すメッセージの入力を検出する。 Referring to FIG. 9, in step S910, system 100 detects, based on the result of speech recognition processing of the user's utterance, input of a message requesting a change to the alarm setting.

ステップＳ９２０にて、システム１００は、予め保存されているデータに基づいて、設定したい時刻の発話のメッセージを音声で出力する。当該音声データは、たとえば、システム１００において予め登録されている声優の音声を録音したデータその他人間の音声を録音したデータとして実現される。他の局面において、当該音声データは、システム１００のユーザの音声を録音することによって得られたデータとして、あるいは、合成音のデータとして実現されてもよい。 In step S920, system 100 outputs by voice, based on data stored in advance, a message prompting the user to say the time to be set. The voice data is realized, for example, as a recording of a voice actor's voice registered in advance in the system 100 or as another recording of a human voice. In another aspect, the voice data may be realized as data obtained by recording the voice of the user of the system 100 or as synthesized sound data.

ステップＳ９３０にて、システム１００は、当該メッセージの音声認識処理の結果に基づいて、設定時刻を認識できたか否かを判断する。システム１００は、設定時刻を認識できたと判断すると（ステップＳ９３０にてＹＥＳ）、制御をステップＳ９６０に切り換える。そうでない場合には（ステップＳ９３０にてＮＯ）、システム１００は、制御をステップＳ９４０に切り換える。 In step S930, system 100 determines whether or not the set time has been recognized based on the result of the voice recognition process for the message. When system 100 determines that the set time has been recognized (YES in step S930), control is switched to step S960. Otherwise (NO in step S930), system 100 switches control to step S940.

ステップＳ９４０にて、システム１００は、設定時刻の認識処理がタイムアウトしたか否かを判断する。システム１００は、設定時刻の認識処理がタイムアウトしたと判断すると（ステップＳ９４０にてＹＥＳ）、変更ができない旨のメッセージを出力して、当該変更処理を終了する。そうでない場合には（ステップＳ９４０にてＮＯ）、システム１００は、制御をステップＳ９５０に切り換える。 In step S940, system 100 determines whether the set time recognition process has timed out. When system 100 determines that the set time recognition process has timed out (YES in step S940), system 100 outputs a message indicating that the change cannot be made, and ends the change process. Otherwise (NO in step S940), system 100 switches control to step S950.

ステップＳ９５０にて、システム１００は、再度、時刻の入力を促すメッセージを音声で出力する。その後、制御は、ステップＳ９３０に戻される。 In step S950, system 100 again outputs a message that prompts the user to input the time. Thereafter, control is returned to step S930.

ステップＳ９６０にて、システム１００は、音声認識処理の結果に基づいて、設定を止める等の入力を検出したか否かを判断する。この判断は、たとえば、設定を止める旨の命令がシステム１００に対して与えられたか否かに基づいて行なわれる。システム１００は、当該入力を検出したと判断すると（ステップＳ９６０にてＹＥＳ）、制御をステップＳ９８０に切り換える。そうでない場合には（ステップＳ９６０にてＮＯ）、システム１００は、制御をステップＳ９７０に切り換える。 In step S960, system 100 determines whether an input for stopping the setting or the like has been detected based on the result of the speech recognition process. This determination is made based on, for example, whether an instruction to stop the setting is given to the system 100. When system 100 determines that the input has been detected (YES in step S960), control is switched to step S980. Otherwise (NO in step S960), system 100 switches control to step S970.

ステップＳ９７０にて、システム１００は、時刻が設定中であるか否かを判断する。より具体的には、システム１００は目覚ましを設定する時刻が記憶部２４０に保存されているか否かを判断する。システム１００は、時刻が設定中であると判断すると（ステップＳ９７０にてＹＥＳ）、制御をステップＳ９９０に切り換える。そうでない場合には（ステップＳ９７０にてＮＯ）、システム１００は、制御をステップＳ９８０に切り換える。 In step S970, system 100 determines whether the time is being set. More specifically, the system 100 determines whether or not the time for setting the alarm is stored in the storage unit 240. When system 100 determines that the time is being set (YES in step S970), system 100 switches control to step S990. Otherwise (NO in step S970), system 100 switches control to step S980.

ステップＳ９８０にて、システム１００は、予め準備されている音声データに基づいて、目覚まし設定を中止した旨のメッセージを音声で出力する。 In step S980, system 100 outputs a voice message indicating that alarm setting has been canceled based on voice data prepared in advance.

ステップＳ９９０にて、システム１００は、予め準備されている音声データに基づいて、たとえば「目覚まし設定したままです。○○時○○分に起こすからね。」とのメッセージを音声で出力する。 In step S990, system 100 outputs by voice, based on voice data prepared in advance, a message such as “The alarm is still set. I'll wake you at XX hours XX minutes.”

（目覚まし設定の確認処理） (Alarm setting confirmation process)
図１０を参照して、ステップＳ１０１０にて、システム１００は、音声認識処理の結果に基づいて、目覚ましを確認すべき旨の命令を表すユーザ発話を検出する。 Referring to FIG. 10, in step S1010, system 100 detects, based on the result of the speech recognition process, a user utterance representing a command to check the alarm.

ステップＳ１０２０にて、システム１００は、検出されたユーザ発話に基づいて、記憶部２４０を参照して目覚まし設定の内容を確認する。たとえば、ＣＰＵ１は、ハードディスク５を参照して、当該ユーザに関連付けられた目覚まし設定の有無を確認する。 In step S1020, system 100 refers to storage unit 240 based on the detected user utterance and confirms the contents of the alarm setting. For example, the CPU 1 refers to the hard disk 5 and confirms whether there is an alarm setting associated with the user.

ステップＳ１０３０にて、システム１００は、記憶部２４０に保存されている内容に基づいて、目覚まし設定が存在しているか否かを判断する。システム１００は、目覚まし設定が存在していると判断すると（ステップＳ１０３０にてＹＥＳ）、制御をステップＳ１０４０に切り換える。そうでない場合には（ステップＳ１０３０にてＮＯ）、システム１００は、制御をステップＳ１０５０に切り換える。 In step S1030, system 100 determines whether there is a wake-up setting based on the content stored in storage unit 240. When system 100 determines that the alarm setting exists (YES in step S1030), control is switched to step S1040. Otherwise (NO in step S1030), system 100 switches control to step S1050.

ステップＳ１０４０にて、システム１００は、予め保存されている音声データに基づいて、目覚まし設定している旨のメッセージを音声で出力する。 In step S1040, system 100 outputs a message indicating that the alarm is set based on the voice data stored in advance.

ステップＳ１０５０にて、システム１００は、予め保存されている音声データに基づいて、目覚まし設定していない旨のメッセージを音声で出力する。 In step S1050, system 100 outputs a message indicating that the alarm is not set based on voice data stored in advance.
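
The confirmation process of FIG. 10 amounts to a lookup followed by one of two spoken messages. A minimal Python sketch follows, assuming (hypothetically) that the alarm settings are held in a dictionary keyed by user ID and that the message of step S1040 also reads back the stored time; the function name `confirm_alarm` and the message wording are illustrative only.

```python
from typing import Callable, Dict, Optional, Tuple


def confirm_alarm(alarms: Dict[str, Tuple[int, int]],
                  user_id: str,
                  speak: Callable[[str], None]) -> bool:
    """Report whether an alarm exists for user_id (steps S1020 to S1050)."""
    setting: Optional[Tuple[int, int]] = alarms.get(user_id)   # S1020
    if setting is None:                                        # S1030: NO
        speak("No alarm is set.")                              # S1050
        return False
    hour, minute = setting                                     # S1030: YES
    speak(f"The alarm is set for {hour:02d}:{minute:02d}.")    # S1040
    return True
```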

（目覚まし設定の取り消し） (Cancellation of alarm settings)
図１１を参照して、ステップＳ１１１０にて、システム１００は、音声認識処理の結果に基づいて、目覚まし設定を取り消す旨を表すユーザ発話の入力を検出する。 Referring to FIG. 11, in step S1110, system 100 detects, based on the result of the voice recognition process, input of a user utterance requesting cancellation of the alarm setting.

ステップＳ１１２０にて、システム１００は、当該ユーザ発話が検出されたことに基づいて、目覚ましが設定されているか否かを判断する。たとえば、ＣＰＵ１は、ハードディスク５を参照して、当該ユーザに関連付けられている目覚まし設定の有無を確認する。システム１００は、目覚ましが設定されていると判断すると（ステップＳ１１２０にてＹＥＳ）、制御をステップＳ１１３０に切り換える。そうでない場合には（ステップＳ１１２０にてＮＯ）、システム１００は、制御をステップＳ１１２５に切り換える。 In step S1120, system 100 determines whether or not an alarm is set based on detection of the user utterance. For example, the CPU 1 refers to the hard disk 5 and checks whether there is an alarm setting associated with the user. If system 100 determines that the alarm is set (YES in step S1120), control is switched to step S1130. Otherwise (NO in step S1120), system 100 switches control to step S1125.

ステップＳ１１２５にて、システム１００は、予め保存されている音声データに基づいて、目覚ましが設定されていない旨のメッセージを音声で出力する。 In step S1125, system 100 outputs a message indicating that the alarm is not set based on the voice data stored in advance.

ステップＳ１１３０にて、システム１００は、予め保存されている音声データに基づいて、目覚まし設定の取り消しを確認するメッセージを音声で出力する。 In step S1130, system 100 outputs a message for confirming cancellation of the alarm setting based on the voice data stored in advance.

ステップＳ１１４０にて、システム１００は、取り消しを実行する旨の指示が入力されたか否かを判断する。この判断は、システム１００に対する信号の内容に基づいて行なわれる。システム１００は、当該指示が入力されたと判断すると（ステップＳ１１４０にてＹＥＳ）、制御をステップＳ１１６０に切り換える。そうでない場合には（ステップＳ１１４０にてＮＯ）、システム１００は、制御をステップＳ１１５０に切り換える。 In step S1140, system 100 determines whether or not an instruction to execute cancellation has been input. This determination is made based on the content of the signal for system 100. When system 100 determines that the instruction has been input (YES in step S1140), system 100 switches control to step S1160. Otherwise (NO in step S1140), system 100 switches control to step S1150.

ステップＳ１１５０にて、システム１００は、現在の時刻がタイムアウトしたか否かを判断する。システム１００は、現在の時刻がタイムアウトしたと判断すると（ステップＳ１１５０にてＹＥＳ）、制御をステップＳ１１８０に切り換える。そうでない場合には（ステップＳ１１５０にてＮＯ）、システム１００は、制御をステップＳ１１５５に切り換える。 In step S1150, system 100 determines, based on the current time, whether a timeout has occurred. If system 100 determines that a timeout has occurred (YES in step S1150), system 100 switches control to step S1180. Otherwise (NO in step S1150), system 100 switches control to step S1155.

ステップＳ１１５５にて、システム１００は、予め保存されている音声データに基づいて、指示の入力を促すメッセージを音声で出力する。その後、制御は、ステップＳ１１４０に戻される。 In step S1155, system 100 outputs a message prompting input of an instruction by voice based on voice data stored in advance. Thereafter, control is returned to step S1140.

ステップＳ１１６０にて、システム１００は、目覚まし設定のデータを消去する。たとえば、ＣＰＵ１は、ハードディスク５に保存されている当該ユーザに関連するデータを削除する。 In step S1160, system 100 deletes the alarm setting data. For example, the CPU 1 deletes data related to the user stored in the hard disk 5.

ステップＳ１１７０にて、システム１００は、予め保存されている音声データに基づいて、目覚まし設定を取り消した旨のメッセージを音声で出力する。 In step S1170, system 100 outputs a message indicating that the alarm setting has been canceled based on the voice data stored in advance.

ステップＳ１１８０にて、システム１００は、予め保存されている音声データに基づいて、目覚まし設定が残っている旨のメッセージを音声で出力する。 In step S1180, system 100 outputs a message indicating that the alarm setting remains, based on the voice data stored in advance.
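
The cancellation flow of FIG. 11 can be read as a confirm-before-delete loop with a timeout. The sketch below is one possible rendering in Python under stated assumptions: the set `YES_WORDS`, the function name `cancel_alarm`, the message wording, and the use of a prompt count in place of the time-based check of step S1150 are hypothetical and do not come from the source.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical set of replies treated as "execute the cancellation".
YES_WORDS = {"yes", "ok", "cancel it"}


def cancel_alarm(user_id: str,
                 alarms: Dict[str, Tuple[int, int]],
                 listen: Callable[[], Optional[str]],
                 speak: Callable[[str], None],
                 max_prompts: int = 3) -> bool:
    """Return True if the alarm was deleted (steps S1120 to S1180)."""
    if user_id not in alarms:                            # S1120: NO
        speak("No alarm is set.")                        # S1125
        return False
    speak("Do you want to cancel the alarm?")            # S1130
    for prompts_left in range(max_prompts, 0, -1):
        reply = listen()
        if reply is not None and reply.strip().lower() in YES_WORDS:  # S1140
            del alarms[user_id]                          # S1160
            speak("The alarm has been cancelled.")       # S1170
            return True
        if prompts_left == 1:                            # S1150: timed out
            break
        speak("Please tell me whether to cancel.")       # S1155
    speak("The alarm is still set.")                     # S1180
    return False
```

Deleting the record only after an explicit confirmation keeps the stored setting intact when the user does not answer, which corresponds to the branch ending at step S1180.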

［データ構造］ [Data structure]
図１２を参照して、本実施の形態に係るシステム１００を実現するサーバ１２０のデータ構造について説明する。図１２は、サーバ１２０が備えるハードディスク５が格納するデータの一態様を概念的に表わす図である。ある局面において、ハードディスク５は、ユーザＩＤ１２１０と、機器ＩＤ１２２０と、目覚まし設定時刻１２３０と、音声バージョン１２４０とを保持している。 With reference to FIG. 12, the data structure of server 120 that implements system 100 according to the present embodiment will be described. FIG. 12 is a diagram conceptually showing one mode of data stored in hard disk 5 included in server 120. In one aspect, the hard disk 5 holds a user ID 1210, a device ID 1220, an alarm setting time 1230, and a voice version 1240.

ユーザＩＤ１２１０は、ロボット１１０のユーザを識別する。機器ＩＤ１２２０は、当該ロボットを識別する。 The user ID 1210 identifies the user of the robot 110. The device ID 1220 identifies the robot.

目覚まし設定時刻１２３０は、機器ＩＤ１２２０によって特定される機器（たとえばロボット１１０Ａなど）が目覚ましを鳴動すべき時刻を表わす。音声バージョン１２４０は、目覚ましが鳴動するときに出力される音声を発話するときの発話方法を表わす。たとえばユーザＩＤ１２１０が「user００１」で特定される機器（robot００１０）については、声優Ａによる音声が出力される。 Alarm setting time 1230 represents the time at which the device specified by device ID 1220 (for example, robot 110A) should sound the alarm. Voice version 1240 indicates the speaking style used for the voice that is output when the alarm sounds. For example, for the device (robot0010) associated with the user whose user ID 1210 is “user001”, the voice of voice actor A is output.
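
The record layout of FIG. 12 can be pictured as one row per user with four fields. The following Python sketch is illustrative only: the class name `AlarmRecord`, the helper functions, the in-memory dictionary standing in for hard disk 5, and the sample alarm time of 7:00 are assumptions; the identifiers “user001” and “robot0010” and voice actor A are the examples given in the text.

```python
from dataclasses import dataclass
from datetime import time
from typing import Dict, Optional


@dataclass(frozen=True)
class AlarmRecord:
    user_id: str        # user ID 1210
    device_id: str      # device ID 1220
    alarm_time: time    # alarm setting time 1230
    voice_version: str  # voice version 1240


def save(table: Dict[str, AlarmRecord], record: AlarmRecord) -> None:
    """Store one record, keyed by user ID (an assumed choice of key)."""
    table[record.user_id] = record


def lookup(table: Dict[str, AlarmRecord], user_id: str) -> Optional[AlarmRecord]:
    """Return the record for user_id, or None if no alarm is registered."""
    return table.get(user_id)


if __name__ == "__main__":
    table: Dict[str, AlarmRecord] = {}
    save(table, AlarmRecord("user001", "robot0010", time(7, 0), "voice_actor_A"))
    print(lookup(table, "user001"))
```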

［制御構造］ [Control structure]
図１３を参照して、システム１００の制御構造についてさらに説明する。図１３は、システム１００において実行される処理の一部を表すフローチャートである。 The control structure of the system 100 will be further described with reference to FIG. 13. FIG. 13 is a flowchart showing a part of processing executed in the system 100.

ステップＳ１３１０にて、サーバ１２０は、内蔵するクロックからの出力に基づいて、目覚まし設定された時刻の到来を検知する。 In step S1310, server 120 detects the arrival of the set alarm time based on the output from the built-in clock.

ステップＳ１３２０にて、サーバ１２０は、音声再生用のコンテンツデータを記憶部２４０から読み出す。コンテンツデータは、たとえば、楽曲等を含み得る。 In step S1320, server 120 reads content data for audio reproduction from storage unit 240. The content data can include, for example, music.

ステップＳ１３３０にて、サーバ１２０は、コンテンツデータおよび目覚まし設定を用いてロボット１１０に再生させるための音声を生成する。 In step S1330, server 120 generates a sound to be reproduced by robot 110 using the content data and the alarm setting.

ステップＳ１３４０にて、サーバ１２０は、生成した音声と機器ＩＤとを含む信号を生成する。 In step S1340, server 120 generates a signal including the generated voice and device ID.

ステップＳ１３５０にて、サーバ１２０は、生成した信号を当該機器ＩＤによって識別されるロボット１１０に送信する。 In step S1350, server 120 transmits the generated signal to robot 110 identified by the device ID.

ステップＳ１３６０にて、ロボット１１０は、受信した信号に基づいて、コンテンツを目覚まし音声として再生する。より具体的には、たとえば、ロボット１１０のスピーカ７３０は、サーバ１２０から送られた信号に基づいて、ある声優によるメッセージを目覚まし音声として出力する。 In step S1360, robot 110 reproduces the content as an alarm sound based on the received signal. More specifically, for example, the speaker 730 of the robot 110 outputs a message by a certain voice actor as a wake-up sound based on a signal sent from the server 120.
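
Steps S1310 to S1350 on the server side can be sketched as: find the devices whose alarm time has arrived, bundle the audio with each device ID, and transmit. The Python fragment below is a sketch under assumptions: the function names, the dictionary-shaped signal, the minute-level comparison, and the `send` callback standing in for the network transmission are all hypothetical.

```python
from datetime import time
from typing import Callable, Dict, List


def due_devices(alarms: Dict[str, time], now: time) -> List[str]:
    """S1310: device IDs whose alarm time falls in the current minute."""
    return sorted(dev for dev, t in alarms.items()
                  if (t.hour, t.minute) == (now.hour, now.minute))


def build_signal(device_id: str, alarm: time, content: bytes) -> dict:
    """S1330 to S1340: bundle the audio to be played with the device ID."""
    return {"device_id": device_id,
            "alarm_time": alarm.strftime("%H:%M"),
            "audio": content}


def dispatch(alarms: Dict[str, time], now: time, content: bytes,
             send: Callable[[str, dict], None]) -> int:
    """S1350: send one signal per due device; return how many were sent."""
    due = due_devices(alarms, now)
    for device_id in due:
        send(device_id, build_signal(device_id, alarms[device_id], content))
    return len(due)
```

Carrying the device ID inside the signal follows step S1340; the receiving robot then plays the audio as described for step S1360.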

なお、システム１００から出力される音声は、特定の声優の音声に限られない。また、出力される音声は、同一人物による音声に限られず、複数の人物の各々による音声が用いられてもよい。たとえば、システム１００は、目覚ましの設定時、変更時、確認時、設定された時刻の到来時の各々を異なる音声で出力してもよい。 Note that the sound output from the system 100 is not limited to the sound of a specific voice actor. Further, the output sound is not limited to the sound of the same person, and the sound of each of a plurality of persons may be used. For example, the system 100 may output different sounds at the time of setting the alarm, at the time of changing, at the time of confirmation, and at the time when the set time arrives.

また、他の局面において、設定された時刻が到来したときに、ロボット１１０は、発話に代えて、他の動作を実行してもよい。たとえば、ロボット１１０が掃除機能を備えている場合、ロボット１１０は、掃除運転を開始してもよい。 In another aspect, when the set time arrives, the robot 110 may execute another operation instead of the utterance. For example, when the robot 110 has a cleaning function, the robot 110 may start the cleaning operation.

＜第２の実施の形態＞ <Second Embodiment>
以下、第２の実施の形態について説明する。第２の実施の形態に係るシステムは、後述する点を除いて、第１の実施の形態に係るシステム１００の構成と同様の構成によって実現される。したがって、本実施の形態に係るシステムの構成の説明は繰り返さない。 Hereinafter, a second embodiment will be described. The system according to the second embodiment is realized by the same configuration as the configuration of the system 100 according to the first embodiment, except as described below. Therefore, the description of the system configuration according to the present embodiment will not be repeated.

第２の実施の形態に係るシステムは、エアコン、電子レンジその他の機器を用いて実現される。当該機器は、当該機器に固有な機能（たとえば、冷暖房機能、加熱機能等）に加えて、少なくとも、音声認識機能と、サーバ１２０との通信機能と、音声出力機能とを備える。 The system according to the second embodiment is realized using an air conditioner, a microwave oven, and other devices. The device includes at least a voice recognition function, a communication function with the server 120, and a voice output function in addition to functions unique to the device (for example, an air conditioning function, a heating function, and the like).

たとえば、当該機器がエアコンの場合、ある局面において、ユーザは、エアコンの動作（運転開始時刻の設定、確認、変更、削除など）を対話で設定することができる。この場合、ロボット１１０の代わりに、通信機能と音声認識機能とを備えるエアコンまたは当該エアコンのリモートコントローラ、もしくは、エアコンとの通信機能および音声認識機能を備える通信端末が、エアコンのユーザと対話し得る。 For example, when the device is an air conditioner, in one aspect, the user can set the operation of the air conditioner (setting, confirmation, change, deletion, etc. of the operation start time) through dialogue. In this case, instead of the robot 110, an air conditioner having a communication function and a voice recognition function, a remote controller of the air conditioner, or a communication terminal having a voice recognition function and a function for communicating with the air conditioner can interact with the user of the air conditioner.

別の局面において、電子レンジが音声認識機能を備えてもよい。たとえば、ユーザが食材を電子レンジに入れた後に、食材の名称、出力および調理時間を電子レンジに発話する。たとえば、最初に、ユーザは「グラタンを加熱して。」と発話する。その発話が電子レンジあるいはサーバ１２０によって認識されると、電子レンジは、たとえば「わかった。グラタンを加熱するよ。標準の調理でいい？」と発話する。ユーザが、「いいよ。」と発話すると、電子レンジは「了解。グラタンを１０００ｗで３分間加熱するね。」と発話し、電子レンジは運転を開始する。その後、指定された時間が経過すると、電子レンジは、「グラタンができたよ。熱いから気を付けて。」と発話する。このようにすると、たとえば、一人暮らしのユーザは食事を楽しむことができる。なお、運転時間の確認、変更、キャンセルなどは、第１の実施の形態における目覚まし設定時刻の確認、変更、キャンセルと同様に実現可能である。 In another aspect, the microwave oven may include a voice recognition function. For example, after putting food into the microwave oven, the user tells the microwave oven the name of the food, the output, and the cooking time. For example, the user first says, “Heat the gratin.” When the utterance is recognized by the microwave oven or the server 120, the microwave oven says, for example, “OK. I'll heat the gratin. Is standard cooking all right?” When the user says, “Sure.”, the microwave oven says, “Got it. I'll heat the gratin at 1000 W for 3 minutes.”, and starts operating. Then, when the specified time has elapsed, the microwave oven says, “The gratin is ready. Be careful, it's hot.” In this way, for example, a user living alone can enjoy a meal. Note that confirmation, change, cancellation, and the like of the operating time can be realized in the same manner as confirmation, change, and cancellation of the alarm setting time in the first embodiment.

別の局面において、炊飯器が用いられてもよい。当該炊飯器は、ロボット１１０と同様に、音声認識機能と、音声出力機能と、サーバ１２０との通信機能とを備える。この場合、ユーザは炊飯器にコメを入れる。ユーザは「ご飯を炊いて」と発話する。その発話が炊飯器あるいはサーバ１２０によって認識されると、炊飯器は、「わかった。何合炊くの？」と発話する。ユーザが、たとえば「３合」と発話する。その発話が炊飯器あるいはサーバ１２０によって認識されると、炊飯器は、「わかった。何時に食べるの？」と発話する。ユーザが、たとえば「７時」と発話する。その発話が炊飯器あるいはサーバ１２０によって認識されると、炊飯器は、「わかった。７時までに３合炊くよ。」と発話する。その後、炊飯器は、炊飯に必要な水が満たされていることを確認すると、タイマー設定を行ない、７時にユーザがご飯を食べられるように炊飯する。 In another aspect, a rice cooker may be used. Like the robot 110, the rice cooker includes a voice recognition function, a voice output function, and a function for communicating with the server 120. In this case, the user puts rice in the rice cooker and says, “Cook the rice.” When the utterance is recognized by the rice cooker or the server 120, the rice cooker says, “OK. How many go shall I cook?” The user says, for example, “3 go.” When the utterance is recognized by the rice cooker or the server 120, the rice cooker says, “OK. What time will you eat?” The user says, for example, “7 o'clock.” When the utterance is recognized by the rice cooker or the server 120, the rice cooker says, “OK. I'll cook 3 go by 7 o'clock.” After that, on confirming that the water required for cooking has been added, the rice cooker sets a timer and cooks the rice so that the user can eat at 7 o'clock.

以上のようにして、本実施の形態によれば、機器が発話するため、その機器のユーザは楽しみながら機器を使用することができる。 As described above, according to the present embodiment, since the device speaks, the user of the device can use the device while having fun.

＜第３の実施の形態＞ <Third Embodiment>
以下、第３の実施の形態について説明する。第３の実施の形態に係るシステムは、後述する点を除いて、第１の実施の形態に係るシステム１００の構成と同様の構成によって実現される。したがって、本実施の形態に係るシステムの構成の説明は繰り返さない。 The third embodiment will be described below. The system according to the third embodiment is realized by the same configuration as the configuration of the system 100 according to the first embodiment, except as described below. Therefore, the description of the system configuration according to the present embodiment will not be repeated.

第３の実施の形態に係るシステムは、スーパーマーケット、ブティックその他の商業施設にも用いられる。たとえば、閉店前のタイムサービスが行なわれる場合、当該商業施設の管理者は、ロボット１１０に対して、タイムサービスを開始する時刻を設定し、変更し、確認し、解除（消去）することができる。あるいは、ロボット１１０は、潜在的な顧客からの発話に対してプロモーション（宣伝）を行なうように構成されてもよい。 The system according to the third embodiment is also used in supermarkets, boutiques, and other commercial facilities. For example, when a limited-time sale is held before closing, the manager of the commercial facility can set, change, confirm, and cancel (delete) on the robot 110 the time at which the sale starts. Alternatively, the robot 110 may be configured to deliver promotions (advertisements) in response to utterances from potential customers.

＜第４の実施の形態＞ <Fourth Embodiment>
以下、第４の実施の形態について説明する。第４の実施の形態に係るシステムは、後述する点を除いて、第１の実施の形態に係るシステム１００の構成と同様の構成によって実現される。したがって、本実施の形態に係るシステムの構成の説明は繰り返さない。 Hereinafter, a fourth embodiment will be described. The system according to the fourth embodiment is realized by the same configuration as the configuration of the system 100 according to the first embodiment, except as described below. Therefore, the description of the system configuration according to the present embodiment will not be repeated.

本実施の形態に係るシステムは、複数のロボットに一斉に通知を発話させることができる。すなわち、目覚ましその他の通知機能の実現の対象者は一人のユーザに限られない。たとえば、システム１００は、予め作成されたグループに含まれる複数のロボットの各々に対して上記通知機能を実現してもよい。この場合、サーバ１２０は、当該グループに含まれる各ロボットのネットワークアドレスを保持しており、指定された時刻が到来すると、各ネットワークアドレスに対して、目覚ましのための信号を送信する。このようにすると、効率的に複数ユーザに対する通知が実現される。 The system according to the present embodiment can cause a plurality of robots to utter a notification all at once. That is, the target of realizing the alarm function and other notification functions is not limited to one user. For example, the system 100 may realize the notification function for each of a plurality of robots included in a group created in advance. In this case, the server 120 holds the network address of each robot included in the group, and transmits a wake-up signal to each network address when the designated time arrives. In this way, notification to a plurality of users is realized efficiently.

さらに別の局面において、あるユーザが通知に対する応答を発話で行った場合に、その旨を、当該グループに含まれる他のユーザにも通知される構成が用いられてもよい。 In still another aspect, when a certain user makes a response to the notification by utterance, a configuration may be used in which other users included in the group are also notified of this.
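
The group notification described above, in which the server 120 holds the network address of each robot in a group and transmits to every address when the designated time arrives, can be sketched as a simple fan-out. In the Python fragment below the function name `notify_group`, the `send` callback, and the choice to skip an unreachable robot rather than abort are assumptions made for illustration; the addresses in the usage example are reserved documentation addresses, not real hosts.

```python
from typing import Callable, Dict, List


def notify_group(groups: Dict[str, List[str]],
                 group_name: str,
                 payload: bytes,
                 send: Callable[[str, bytes], None]) -> List[str]:
    """Send payload to every robot address in the group.

    Returns the addresses that accepted the transmission. An address whose
    send raises OSError is skipped so that one unreachable robot does not
    block the others (an assumed policy, not stated in the source).
    """
    delivered: List[str] = []
    for address in groups.get(group_name, []):
        try:
            send(address, payload)
        except OSError:
            continue
        delivered.append(address)
    return delivered


if __name__ == "__main__":
    groups = {"family": ["192.0.2.10", "192.0.2.11"]}
    print(notify_group(groups, "family", b"wake", lambda addr, data: None))
```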

＜第５の実施の形態＞ <Fifth Embodiment>
以下、第５の実施の形態について説明する。第５の実施の形態に係るシステムは、後述する点を除いて、第１の実施の形態に係るシステム１００の構成と同様の構成によって実現される。したがって、本実施の形態に係るシステムの構成の説明は繰り返さない。 The fifth embodiment will be described below. The system according to the fifth embodiment is realized by the same configuration as the configuration of the system 100 according to the first embodiment, except as described below. Therefore, the description of the system configuration according to the present embodiment will not be repeated.

本実施の形態に係るシステムにおいて、ロボット１１０が発話する内容は、目覚ましの時刻設定に関するメッセージに限られない。ロボット１１０は、目覚まし以外のメッセージを発話することができる。たとえば、ロボット１１０は、「今日は電気設備の点検があります。１時から２時まで停電します。」というメッセージのように、お知らせのためのメッセージを発話してもよい。 In the system according to the present embodiment, the content spoken by the robot 110 is not limited to a message related to the alarm time setting. The robot 110 can utter a message other than an alarm clock. For example, the robot 110 may utter a message for notification such as a message “There is an electrical equipment inspection today. A power failure will occur from 1 to 2”.

このような発話は、プッシュ型およびプル型のいずれの態様でも実現できる。たとえば、プッシュ型の場合、サーバ１２０が当該ユーザのスケジュールを読み出して、該当するスケジュールを当該ユーザのロボット１１０に通知する。プル型の場合、ユーザがたとえば、「今日の予定は何かある？」というようにスケジュールを問いかける発話を行ない、その発話がサーバ１２０またはロボット１１０によって認識されると、サーバ１２０は当該ユーザのスケジュールを検索して、該当するスケジュールが存在した場合に、スケジュールの内容を音声で出力する。このような構成により、ユーザは、ロボット１１０を秘書として使用することができる。 Such an utterance can be realized by either a push type or a pull type. For example, in the push type, the server 120 reads the user's schedule and notifies the user's robot 110 of the corresponding schedule. In the case of the pull type, when the user makes an utterance asking the schedule such as “What is the schedule for today?” And the utterance is recognized by the server 120 or the robot 110, the server 120 determines the schedule of the user. If the corresponding schedule exists, the contents of the schedule are output by voice. With such a configuration, the user can use the robot 110 as a secretary.
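
The difference between the push type and the pull type described above can be made concrete with a small sketch. In the following Python fragment the nested-dictionary schedule store, the function names, the keyword test on the word "schedule", and the message wording are all hypothetical; the source only states that the server either notifies the robot of the schedule on its own initiative (push) or answers a spoken inquiry (pull).

```python
from datetime import date
from typing import Callable, Dict, List, Optional

# user ID -> date -> list of schedule entries (assumed layout)
Schedules = Dict[str, Dict[date, List[str]]]


def todays_items(schedules: Schedules, user_id: str, today: date) -> List[str]:
    return schedules.get(user_id, {}).get(today, [])


def push_schedule(schedules: Schedules, user_id: str, today: date,
                  notify: Callable[[str, str], None]) -> int:
    """Push type: the server initiates, notifying the user's robot."""
    items = todays_items(schedules, user_id, today)
    for item in items:
        notify(user_id, item)
    return len(items)


def pull_schedule(schedules: Schedules, user_id: str, today: date,
                  utterance: str) -> Optional[str]:
    """Pull type: answer only when the user asks and an entry exists."""
    if "schedule" not in utterance.lower():
        return None
    items = todays_items(schedules, user_id, today)
    if not items:
        return None
    return "Today's schedule: " + "; ".join(items)
```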

別の局面において、他のユーザも関与するスケジュールが検索された場合、ロボット１１０は、ユーザに「Ａさんとの約束があります。」という発話をする。ユーザが「了解。」と返答すると、その返答はロボット１１０からサーバ１２０に送られる。サーバ１２０は、ユーザの返答を認識すると、当該他のユーザが使用する他のロボットを通じて、「Ｘさん（ロボット１１０のユーザ）との約束があります。」と発話する。Ａさんが「了解」と返答すると、その返答は、Ａさんが使用するロボットからサーバ１２０に送られる。サーバ１２０は、ロボット１１０に対して、「Ａさんが約束を確認したよ。」と発話する。このようにすると、複数のユーザが共有するスケジュールが履行されることが確実になる。 In another aspect, when a schedule that also involves another user is found, the robot 110 says to the user, “You have an appointment with Mr. A.” When the user replies, “OK.”, the reply is sent from the robot 110 to the server 120. On recognizing the user's reply, the server 120 says, through another robot used by the other user, “You have an appointment with Mr. X (the user of the robot 110).” When Mr. A replies, “OK”, the reply is sent from the robot used by Mr. A to the server 120. The server 120 causes the robot 110 to say, “Mr. A has confirmed the appointment.” This makes it more certain that a schedule shared by a plurality of users is kept.

＜第６の実施の形態＞ <Sixth Embodiment>
以下、第６の実施の形態について説明する。第６の実施の形態に係るシステムは、後述する点を除いて、第１の実施の形態に係るシステム１００の構成と同様の構成によって実現される。したがって、本実施の形態に係るシステムの構成の説明は繰り返さない。 Hereinafter, a sixth embodiment will be described. The system according to the sixth embodiment is realized by the same configuration as the configuration of the system 100 according to the first embodiment, except as described below. Therefore, the description of the system configuration according to the present embodiment will not be repeated.

本実施の形態に係るシステムは、学習機能を備える。たとえば、システムは、各ユーザの目覚まし設定の履歴を保存していてもよい。その上で、システムは、ユーザによる目覚まし設定の時刻が通常と異なる場合には、その履歴に基づいて、その目覚まし設定の時刻が正しいかどうかを確認するメッセージ（たとえば「本当？その時刻でいいの？」）と発話してもよい。このような発話が出力されると、ユーザは、発話した時刻が正しいかどうかを再度確認することになるので、誤った時刻設定を防止することができる。 The system according to the present embodiment has a learning function. For example, the system may store a history of each user's alarm settings. When the alarm time set by the user differs from the usual time, the system may, based on that history, utter a message asking whether the alarm time is correct (for example, “Really? Is that time OK?”). When such an utterance is output, the user rechecks whether the spoken time is correct, so an incorrect time setting can be prevented.
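A minimal sketch of such a history check is given below. The disclosure does not say how "different from usual" is judged, so the median of past alarm times and the 60-minute tolerance are assumptions made purely for illustration, as are the function names `needs_confirmation` and `respond`.

```python
from statistics import median
from typing import List


def to_minutes(hhmm: str) -> int:
    """Convert an "HH:MM" string to minutes after midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def needs_confirmation(history: List[str], requested: str, tolerance_min: int = 60) -> bool:
    """Return True when the requested time deviates from the user's usual time.

    Assumption: "usual" is the median of the stored alarm times, and a
    deviation above tolerance_min minutes triggers a confirmation.
    Wrap-around at midnight is ignored in this sketch.
    """
    if not history:
        return False  # nothing learned yet, so nothing to compare against
    usual = median(to_minutes(t) for t in history)
    return abs(to_minutes(requested) - usual) > tolerance_min


def respond(history: List[str], requested: str) -> str:
    """Pick the message the robot would utter for this alarm request."""
    if needs_confirmation(history, requested):
        return "Really? Is that time OK?"
    return f"Alarm set for {requested}."
```

For a user whose history is 06:30, 06:45, 06:30, a request for 16:30 would prompt the confirmation message, while 07:00 would be accepted directly.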

＜第７の実施の形態＞ <Seventh Embodiment>
以下、第７の実施の形態について説明する。第７の実施の形態に係るシステムは、後述する点を除いて、第１の実施の形態に係るシステム１００の構成と同様の構成によって実現される。したがって、本実施の形態に係るシステムの構成の説明は繰り返さない。 The seventh embodiment will be described below. The system according to the seventh embodiment is realized by the same configuration as the configuration of the system 100 according to the first embodiment, except as described below. Therefore, the description of the system configuration according to the present embodiment will not be repeated.

本実施の形態に係るシステムは、サーバ１２０からロボット１１０に一度送信されてロボット１１０によって使用された音声データがロボット１１０に保存され、ロボット１１０によって再度利用される点で、前述の実施の形態と異なる。 The system according to the present embodiment differs from the above-described embodiments in that voice data once transmitted from the server 120 to the robot 110 and used by the robot 110 is stored in the robot 110 and reused by the robot 110.

すなわち、ロボット１１０は、声優その他の人の発話を含むメッセージと、当該メッセージを識別するＩＤとを保存する。ロボット１１０は、予め設定された目覚まし時刻の到来に基づいて、メッセージを音声で出力し、あるいは、予め指定された歌を歌う。その後、ロボット１１０が、ＩＤを含む命令を受信すると、当該受信した命令に含まれるＩＤがロボット１１０に保存されているか否かを確認する。ロボット１１０は、当該ＩＤがロボット１１０に保存されていることを確認すると、保存されているＩＤに関連付けられているメッセージあるいは楽曲を音声で出力する。このようにすると、サーバ１２０とロボット１１０との間の通信が不安定な場合であっても、ロボット１１０は、ローカルに保存されたデータを用いて、設定された時刻に確実にメッセージを発話することができる。 That is, the robot 110 stores a message containing an utterance by a voice actor or another person, together with an ID identifying the message. When a preset alarm time arrives, the robot 110 outputs the message by voice or sings a song designated in advance. Thereafter, when the robot 110 receives a command including an ID, it checks whether the ID included in the received command is stored in the robot 110. On confirming that the ID is stored, the robot 110 outputs by voice the message or music associated with the stored ID. In this way, even when communication between the server 120 and the robot 110 is unstable, the robot 110 can reliably utter the message at the set time using locally stored data.
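The ID lookup described above might be sketched as follows. The class name `RobotAudioCache` and its methods are hypothetical. The fallback to the server when the ID is not stored locally is also an assumption; the text only describes the case in which the ID is found.

```python
from typing import Callable, Dict, Tuple


class RobotAudioCache:
    """Hypothetical local store on the robot: message ID -> audio data."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def save(self, message_id: str, audio: bytes) -> None:
        """Keep audio that was received from the server and already used once."""
        self._store[message_id] = audio

    def handle_command(
        self, message_id: str, fetch_from_server: Callable[[str], bytes]
    ) -> Tuple[str, bytes]:
        """Return (source, audio) for the ID carried in a received command."""
        if message_id in self._store:
            # ID is stored locally: no server round trip is needed.
            return ("local", self._store[message_id])
        # Assumed fallback (not in the text): fetch once, then keep a copy.
        audio = fetch_from_server(message_id)
        self.save(message_id, audio)
        return ("server", audio)
```

Because the local branch never calls `fetch_from_server`, a cached message can still be played when the link to the server 120 is down, which is the benefit the paragraph above describes.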

＜第８の実施の形態＞ <Eighth Embodiment>
以下、第８の実施の形態について説明する。第８の実施の形態に係るシステムは、後述する点を除いて、第１の実施の形態に係るシステム１００の構成と同様の構成によって実現される。したがって、本実施の形態に係るシステムの構成の説明は繰り返さない。 The eighth embodiment will be described below. The system according to the eighth embodiment is realized by the same configuration as the configuration of the system 100 according to the first embodiment, except as described below. Therefore, the description of the system configuration according to the present embodiment will not be repeated.

本実施の形態に係るシステムは、ロボット１１０が複数種類の音声を出力できる点で前述の各実施の形態に係るシステムと異なる。たとえば、サーバ１２０は、複数の声優の各々の音声を出力するための音声データを予め保持している。ユーザは、どの声優の音声による発話を望むかを示す情報をサーバ１２０に送信する。当該情報は、声優のＩＤを含む。サーバ１２０は、その情報を受信すると、声優のＩＤを取り出し、当該ユーザに対するメッセージとして、当該声優の音声を用いたメッセージを生成し、そのメッセージをロボット１１０に送信する。このようにして、ユーザは、希望の声優の音声による目覚ましを楽しむことができる。 The system according to the present embodiment differs from the systems according to the above-described embodiments in that the robot 110 can output a plurality of types of voices. For example, the server 120 holds in advance voice data for outputting the voice of each of a plurality of voice actors. The user transmits to the server 120 information indicating which voice actor's voice the user wants the utterance to be in. The information includes the voice actor's ID. Upon receiving the information, the server 120 extracts the voice actor's ID, generates a message using that voice actor's voice as a message for the user, and transmits the message to the robot 110. In this way, the user can enjoy a wake-up call in the voice of the desired voice actor.

別の局面において、サーバ１２０は、複数の声優のいずれかの音声による発話をランダムにロボット１１０に出力し得る。ランダムな出力は、たとえば乱数発生器により発生される乱数を用いて実現され得る。このようにすると、ロボット１１０のユーザは、どの声優による目覚まし音声がロボット１１０から出力されるか事前に知ることができないので、通常のアラーム機能に加えて、ちょっとしたサプライズを享受し得る。 In another aspect, the server 120 may output to the robot 110 an utterance in the voice of one of the plurality of voice actors, chosen at random. The random output can be realized using, for example, a random number generated by a random number generator. In this way, the user of the robot 110 cannot know in advance which voice actor's wake-up voice will be output from the robot 110, and so can enjoy a small surprise in addition to the normal alarm function.
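Both selection modes of this embodiment, a voice actor chosen by ID and a voice actor chosen at random, can be illustrated with the sketch below. The table `VOICE_DATA`, the IDs `va01` to `va03`, the file names, and the message layout are hypothetical placeholders; the disclosure does not define a data format.

```python
import random
from typing import Dict, Optional

# Hypothetical table held by the server: voice actor ID -> voice data reference.
VOICE_DATA: Dict[str, str] = {
    "va01": "voice-actor-1.wav",
    "va02": "voice-actor-2.wav",
    "va03": "voice-actor-3.wav",
}


def choose_voice(requested_id: Optional[str] = None,
                 rng: Optional[random.Random] = None) -> str:
    """Return the requested voice actor ID, or a random one if none was given."""
    if requested_id is not None:
        if requested_id not in VOICE_DATA:
            raise KeyError(f"unknown voice actor ID: {requested_id}")
        return requested_id
    rng = rng or random.Random()
    # Random mode: the user cannot know in advance which voice will be used.
    return rng.choice(sorted(VOICE_DATA))


def build_message(text: str, requested_id: Optional[str] = None,
                  rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Assemble the message the server would send to the robot."""
    voice_id = choose_voice(requested_id, rng)
    return {"voice_actor_id": voice_id, "audio": VOICE_DATA[voice_id], "text": text}
```

Passing a seeded `random.Random` instance makes the random mode reproducible for testing, while omitting it gives a different voice from call to call.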

［構成の要約］ [Configuration summary]
以上より、本開示に係る技術的特徴は、たとえば、システム１００、サーバ１２０、ロボット１１０として、以下のように要約し得る。 As described above, the technical features according to the present disclosure can be summarized as the system 100, the server 120, and the robot 110, for example, as follows.

（１）ある実施の形態に従うと、発話によりロボット１１０を動作させるシステム１００が提供される。システム１００は、発話を受け付ける複数のロボット（たとえば、ロボット１１０Ａ，１１０Ｂ，・・・，１１０Ｎ）と、複数のロボット１１０と通信可能なサーバ１２０とを備える。ロボット１１０は、発話を受け付けるための音声入力部（たとえば、マイク７２０）と、ロボット１１０の識別情報および受け付けられた発話の内容をサーバに送信するための送信部（たとえば、通信Ｉ／Ｆ７９０）とを備える。サーバは、識別情報および発話の内容を保持するためのハードディスク５と、発話の内容に応じた命令を生成するための生成部（たとえばＣＰＵ１）と、識別情報および命令を当該ロボット１１０に送信するための送信部（たとえば通信Ｉ／Ｆ７）とを備える。ロボット１１０は、サーバから、命令を受信するための受信部（たとえば通信Ｉ／Ｆ７９０）と、命令に基づいて動作を実行する動作部（たとえばモータ７７０，車輪７８０）とを備える。 (1) According to an embodiment, a system 100 for operating a robot 110 by utterance is provided. The system 100 includes a plurality of robots (for example, robots 110A, 110B, ..., 110N) that accept utterances, and a server 120 that can communicate with the plurality of robots 110. The robot 110 includes a voice input unit (for example, a microphone 720) for accepting an utterance, and a transmission unit (for example, a communication I / F 790) for transmitting the identification information of the robot 110 and the content of the accepted utterance to the server. The server includes a hard disk 5 for holding the identification information and the content of the utterance, a generation unit (for example, the CPU 1) for generating a command corresponding to the content of the utterance, and a transmission unit (for example, a communication I / F 7) for transmitting the identification information and the command to the robot 110. The robot 110 includes a receiving unit (for example, the communication I / F 790) for receiving the command from the server, and an operation unit (for example, a motor 770 and wheels 780) that executes an operation based on the command.

（２）別の実施の形態に従うと、ロボット１１０を制御するサーバ１２０が提供される。このサーバ１２０は、複数のロボット１１０を介して発話の内容を受け付けるための通信Ｉ／Ｆ７と、ロボット１１０の識別情報および発話の内容を対応づけて保持するためのハードディスク５と、発話の内容に応じた命令を生成するためのＣＰＵ１と、命令を、当該命令を生成するための発話の内容に対応づけられた識別情報により特定されるロボット１１０に送信するための通信Ｉ／Ｆ７とを備える。 (2) According to another embodiment, a server 120 for controlling the robot 110 is provided. The server 120 includes a communication I / F 7 for accepting the content of utterances via a plurality of robots 110, a hard disk 5 for holding the identification information of the robot 110 and the content of the utterance in association with each other, a CPU 1 for generating a command corresponding to the content of the utterance, and the communication I / F 7 for transmitting the command to the robot 110 specified by the identification information associated with the content of the utterance used to generate the command.

（３）別の局面において、通信Ｉ／Ｆ７は、命令を確認するための情報をロボット１１０に送信する。 (3) In another aspect, the communication I / F 7 transmits information for confirming the command to the robot 110.

（４）別の局面において、通信Ｉ／Ｆ７は、ロボット１１０からの発話の内容に基づいて、命令を、当該命令を生成するための発話の内容に対応づけられた識別情報により特定されるロボット１１０に送信する。 (4) In another aspect, the communication I / F 7 transmits, based on the content of the utterance from the robot 110, the command to the robot 110 specified by the identification information associated with the content of the utterance used to generate the command.

（５）別の局面において、サーバ１２０は、少なくとも１つのロボット１１０に音声を出力させるための音声データをさらに保持するように構成されている。通信Ｉ／Ｆ７は、少なくとも１つのロボット１１０に対して、音声データをさらに送信する。 (5) In another aspect, the server 120 is configured to further hold audio data for causing the at least one robot 110 to output audio. The communication I / F 7 further transmits voice data to at least one robot 110.

（６）別の局面において、サーバは、複数の音声データを保持するように構成されている。通信Ｉ／Ｆ７は、当該複数の音声データのうちのいずれかの音声データを選択する発話の入力を受け付けるように構成されている。ＣＰＵ１は、選択された音声データを用いて命令を生成する。 (6) In another aspect, the server is configured to hold a plurality of audio data. The communication I / F 7 is configured to accept an utterance input for selecting any one of the plurality of voice data. The CPU 1 generates a command using the selected audio data.

（７）別の局面において、ロボット１１０は、発話に基づいて、当該発話者が当該ロボット１１０に登録されたユーザであるか否かを判断するように構成されている。当該発話者がユーザである場合に、サーバ１２０の通信Ｉ／Ｆ７は、当該ロボット１１０の識別情報および受け付けられた発話を受信する。 (7) In another aspect, the robot 110 is configured to determine whether or not the speaker is a user registered in the robot 110 based on the utterance. When the speaker is a user, the communication I / F 7 of the server 120 receives the identification information of the robot 110 and the accepted utterance.

（８）別の実施の形態に従うと、ある局面において、ロボット１１０は、サーバ１２０と通信するための通信Ｉ／Ｆ７９０と、発話を受け付けるためのマイク７２０と、ロボット１１０の動作を実行する動作部（たとえば、モータ７７０、車輪７８０）とを備える。 (8) According to another embodiment, in one aspect, the robot 110 includes a communication I / F 790 for communicating with the server 120, a microphone 720 for accepting an utterance, and an operation unit (for example, a motor 770 and wheels 780) that executes the operation of the robot 110.

通信Ｉ／Ｆ７９０は、マイク７２０が受け付けた発話の内容をサーバ１２０に送信し、発話の内容に応じた命令をサーバ１２０から受信する。動作部は、受信した命令に基づいて動作を実行する。たとえば、車輪７８０は、モータ７７０の運転に応じて回転し、ロボット１１０を移動する。 Communication I / F 790 transmits the content of the utterance accepted by microphone 720 to server 120, and receives a command corresponding to the content of the utterance from server 120. The operation unit executes an operation based on the received command. For example, the wheel 780 rotates according to the operation of the motor 770 and moves the robot 110.

（９）別の局面において、ロボット１１０は、音声を出力するための音声データをサーバから受信するように構成された受信部（たとえば通信Ｉ／Ｆ７９０）をさらに備える。動作部は、音声データに基づいて音声を出力するための音声出力部（たとえば、スピーカ７３０）を含む。 (9) In another aspect, the robot 110 further includes a receiving unit (for example, a communication I / F 790) configured to receive audio data for outputting audio from the server. The operation unit includes an audio output unit (for example, a speaker 730) for outputting audio based on the audio data.

（１０）別の局面において、ロボット１１０は、音声データを保存するためのメモリと、次の発話がロボット１１０に与えられた場合に、次の発話を出力するための音声データがメモリに保存されているか否かを確認するための確認部（たとえばコントローラ７５０）とをさらに備える。送信されるべき音声データが当該ロボット１１０に保存されている場合に、スピーカ７３０は、メモリに保存されている音声データに基づいて音声を出力する。 (10) In another aspect, the robot 110 further includes a memory for storing audio data, and a confirmation unit (for example, a controller 750) for confirming, when a next utterance is given to the robot 110, whether audio data for outputting the next utterance is stored in the memory. When the audio data to be transmitted is stored in the robot 110, the speaker 730 outputs audio based on the audio data stored in the memory.

（１１）別の局面において、サーバ１２０は、複数の音声データを保持するように構成されている。マイク７２０は、当該複数の音声データのうちのいずれかの音声データを選択する発話の入力を受け付ける。コントローラ７５０は、生成部として、選択された音声データを用いて命令を生成する。スピーカ７３０は、選択された音声データに基づく音声を出力する。 (11) In another aspect, the server 120 is configured to hold a plurality of audio data. The microphone 720 receives an input of an utterance that selects any one of the plurality of audio data. The controller 750 generates a command using the selected audio data as a generation unit. The speaker 730 outputs sound based on the selected sound data.

（１２）別の局面において、ロボット１１０は、命令に基づいて、サーバと通信可能な他の機器（たとえば、他のロボット）に対して、当該機器を制御するための信号を送信する。 (12) In another aspect, the robot 110 transmits a signal for controlling the device to another device (for example, another robot) that can communicate with the server based on the command.

（１３）別の局面において、ロボット１１０は、発話に基づいて、当該発話者が当該ロボット１１０に登録されたユーザであるか否かを判断するための認証部をさらに備える。認証部は、たとえば、コントローラ７５０が認証処理を実行することにより実現される。当該発話者がユーザである場合に、通信Ｉ／Ｆ７９０は、ロボット１１０の識別情報および受け付けられた発話をサーバ１２０に送信する。 (13) In another aspect, the robot 110 further includes an authentication unit for determining whether the speaker is a user registered in the robot 110 based on the utterance. The authentication unit is realized, for example, when the controller 750 executes an authentication process. When the speaker is a user, the communication I / F 790 transmits the identification information of the robot 110 and the received utterance to the server 120.

（１４）別の局面において、ロボット１１０は、人感センサーと、人感センサーからの出力に基づいてユーザの近傍に移動するための移動部（たとえば、車輪７８０）とをさらに備える。 (14) In another aspect, the robot 110 further includes a human sensor and a moving unit (for example, a wheel 780) for moving to the vicinity of the user based on an output from the human sensor.

（１５）好ましくは、人感センサーは、サーバからの指示に従って起動するように構成されている。 (15) Preferably, the human sensor is configured to be activated in accordance with an instruction from the server.

（１６）他の実施の形態に従うと、電子機器を制御するサーバの制御方法が提供される。制御方法は、複数のロボット１１０を介して発話の内容を受け付けるステップと、ロボット１１０の識別情報および発話の内容を対応付けて保持するステップと、発話の内容に応じた命令を生成するステップと、前記命令を、当該命令を生成するための発話の内容に対応付けられた識別情報により特定されるロボット１１０に送信するステップとを備える。 (16) According to another embodiment, a server control method for controlling an electronic device is provided. The control method includes a step of accepting the content of the utterance via the plurality of robots 110, a step of holding the identification information of the robot 110 and the content of the utterance in association with each other, a step of generating a command according to the content of the utterance, Transmitting the command to the robot 110 specified by the identification information associated with the content of the utterance for generating the command.
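The four steps of the control method in item (16), namely accepting an utterance, holding it in association with the robot's identification information, generating a command, and sending the command to the robot identified by that information, can be sketched as below. The class `Server`, the `outbox` list standing in for the communication I/F, and the keyword rule in `generate_command` are hypothetical simplifications and not the disclosed implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Server:
    """Hypothetical stand-in for the server 120."""

    # Step 2 storage: robot identification information -> utterance contents.
    utterances: Dict[str, List[str]] = field(default_factory=dict)
    # Stand-in for transmission: (robot ID, command) pairs in send order.
    outbox: List[Tuple[str, dict]] = field(default_factory=list)

    def receive(self, robot_id: str, utterance: str) -> dict:
        """Steps 1 to 4 for a single utterance arriving from one robot."""
        self.utterances.setdefault(robot_id, []).append(utterance)  # hold
        command = self.generate_command(utterance)                   # generate
        self.outbox.append((robot_id, command))                      # send back
        return command

    @staticmethod
    def generate_command(utterance: str) -> dict:
        """Assumed toy rule: treat "wake me" as an alarm request."""
        if "wake me" in utterance.lower():
            return {"action": "set_alarm", "source": utterance}
        return {"action": "speak", "text": "Sorry, I did not understand."}
```

Because each command is queued with the ID of the robot whose utterance produced it, one server instance can serve robots 110A, 110B, and so on without mixing up their replies.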

以上から、上述の各実施の形態によれば、１つのサーバ１２０が複数の電子機器（たとえばロボット１１０Ａ，１１０Ｂ・・・１１０Ｎ）に対し、それぞれ命令を送信し、それぞれの電子機器からの発話内容に応じた動作を実行させる。 As described above, according to each of the above-described embodiments, one server 120 transmits a command to each of a plurality of electronic devices (for example, robots 110A, 110B, ..., 110N) and causes each electronic device to execute an operation corresponding to the content of the utterance from that electronic device.

また、別の局面において、サーバ１２０の中の１つの機能を複数の電子機器にそれぞれ動作させるものであってもよい。その際、当該機能の諸設定は各電子機器毎に行われるものであってもよい。 In another aspect, one function in the server 120 may be operated by a plurality of electronic devices. In that case, various settings of the function may be performed for each electronic device.

今回開示された実施の形態はすべての点で例示であって制限的なものではないと考えられるべきである。本発明の範囲は上記した説明ではなくて特許請求の範囲によって示され、特許請求の範囲と均等の意味および範囲内でのすべての変更が含まれることが意図される。 The embodiment disclosed this time should be considered as illustrative in all points and not restrictive. The scope of the present invention is defined by the terms of the claims, rather than the description above, and is intended to include any modifications within the scope and meaning equivalent to the terms of the claims.

１ＣＰＵ、２マウス、３キーボード、４ＲＡＭ、５ハードディスク、６光ディスク駆動装置、８，７４０モニタ、９ＲＯＭ、１００システム、１１０，１１０Ａ，１１０Ｂ，１１０Ｎロボット、１２０サーバ、１３０インターネット、２１０音声入力部、２２０音声認識処理部、２３０抽出部、２４０記憶部、２５０動作部、２５１音声出力部、２６０命令生成部、２７０計時部、４１０タイプ、４２０，５２０テンプレート、６２０時刻データ、７１０操作パネル、７２０マイク、７３０スピーカ、７５０コントローラ、７５５音声認識処理プロセッサ、７６０メモリ、７７０モータ、７８０車輪。 1 CPU, 2 mouse, 3 keyboard, 4 RAM, 5 hard disk, 6 optical disk drive, 8, 740 monitor, 9 ROM, 100 system, 110, 110A, 110B, 110N robot, 120 server, 130 Internet, 210 voice input unit, 220 voice recognition processing unit, 230 extraction unit, 240 storage unit, 250 operation unit, 251 voice output unit, 260 command generation unit, 270 timing unit, 410 type, 420, 520 template, 620 time data, 710 operation panel, 720 microphone, 730 speaker, 750 controller, 755 voice recognition processor, 760 memory, 770 motor, 780 wheel.

Claims

A system for operating an electronic device by speaking,
A plurality of electronic devices that accept the utterance;
A server capable of communicating with the plurality of electronic devices,
The electronic device is
Voice input means for accepting the utterance;
Transmission means for transmitting the identification information of the electronic device and the content of the accepted utterance to the server,
The server
Storage means for associating and holding the content of the utterance including the identification information and time setting;
Generating means for generating an instruction based on the time setting corresponding to the identification information among the time settings held in the storage means for the contents of the utterance not including the time setting;
A transmission means for transmitting the identification information and the command to the electronic device,
The generating means is configured to:
generate, when an utterance whose content is to cancel the command based on the time setting is received from the electronic device, an output command, directed to the electronic device, for a message confirming cancellation of the command, and
generate, when an utterance whose content is to withdraw the cancellation of the command is received from the electronic device as a response to the message, an output command, directed to the electronic device, for a message including the time setting corresponding to the identification information of the electronic device,
The electronic device is
Receiving means for receiving the command from the server;
An operation means for executing an operation based on the instruction.

A server for controlling an electronic device,
Input means for accepting the content of an utterance via a plurality of the electronic devices;
Storage means for associating and holding the content of the utterance including identification information and time setting of the electronic device;
Generating means for generating an instruction based on the time setting corresponding to the identification information among the time settings held in the storage means for the content of the utterance not including the time setting;
Transmission means for transmitting the command to the electronic device specified by the identification information associated with the content of the utterance for generating the command,
The generating means is configured to:
generate, when an utterance whose content is to cancel the command based on the time setting is received from the electronic device, an output command, directed to the electronic device, for a message confirming cancellation of the command, and
generate, when an utterance whose content is to withdraw the cancellation of the command is received from the electronic device as a response to the message, an output command, directed to the electronic device, for a message including the time setting corresponding to the identification information of the electronic device.

The server according to claim 2, wherein the transmission unit transmits information for confirming the command to the electronic device.

The transmission means transmits the command to the electronic device specified by the identification information associated with the content of the utterance for generating the command based on the content of the utterance from the electronic device. The server according to claim 2.

The server is configured to further hold audio data for causing the at least one electronic device to output audio,
The server according to claim 2, wherein the transmission unit is configured to further transmit the audio data to the at least one electronic device.

The server is configured to hold a plurality of the audio data;
The input means is configured to accept an input of an utterance for selecting any one of the plurality of voice data.
The server according to claim 5, wherein the generation unit is configured to generate the command using the selected voice data.

The electronic device is configured to determine whether the speaker is a user registered in the electronic device based on the utterance,
The voice recognition server according to claim 2, wherein the input means is configured to receive the identification information of the electronic device and the accepted utterance when the speaker is the user.

Electronic equipment,
A communication means for communicating with the server;
Voice input means for accepting utterances;
Operating means for executing the operation of the electronic device,
The communication means includes
Transmitting the content of the utterance accepted by the voice input means to the server;
Based on the content of the utterance including the time setting transmitted to the server in the past, the command generated by the server according to the content of the utterance not including the time setting is received from the server,
Based on the content of the utterance to cancel the content of the utterance including the time setting transmitted to the server in the past, the output command of the message confirming the cancellation generated by the server is received,
Receiving a message output command including a time setting generated by the server and transmitted to the server in the past, based on the utterance of the content to cancel the cancellation in response to the message;
The operation means is an electronic device that performs an operation based on a received command.

Receiving means configured to receive voice data for outputting voice from the server;
The electronic apparatus according to claim 8, wherein the operation unit includes an audio output unit for outputting audio based on the audio data.

A memory for storing the audio data;
When the next utterance is given to the electronic device, it further comprises confirmation means for confirming whether audio data for outputting the next utterance is stored in the memory,
The electronic device according to claim 9, wherein the audio output means is configured to output audio based on the audio data stored in the memory when the audio data to be transmitted is stored in the electronic device.

The server is configured to hold a plurality of the audio data;
The voice input means is configured to accept an input of an utterance that selects any one of the plurality of voice data.
The server generates the command using the selected voice data,
The electronic device according to claim 9, wherein the sound output unit is configured to output sound based on the selected sound data.

12. The transmitter according to claim 8, further comprising: a transmission unit configured to transmit a signal for controlling the device to another device capable of communicating with the server based on the command. The electronic device described.

Further comprising authentication means for determining whether or not the speaker is a user registered in the electronic device based on the utterance;
The communication means is configured to transmit the identification information of the electronic device and the accepted utterance to the server when the utterer is the user. The electronic device described.

With human sensor,
The electronic device according to any one of claims 8 to 13, further comprising moving means for moving to a vicinity of a user based on an output from the human sensor.

The electronic device according to claim 14, wherein the human sensor is configured to be activated in accordance with an instruction from the server.

A server control method for controlling an electronic device,
Receiving utterance content via a plurality of the electronic devices;
A step of associating and holding the content of the utterance including identification information of the electronic device and time setting;
Generating a command based on a time setting corresponding to the identification information among the held time settings for the content of the utterance not including a time setting;
Transmitting the command to the electronic device specified by the identification information associated with the content of the utterance for generating the command,
wherein the step of generating the command includes:
Generating, when an utterance whose content is to cancel the command based on the time setting is received from the electronic device, an output command, directed to the electronic device, for a message confirming cancellation of the command; and
Generating, when an utterance whose content is to withdraw the cancellation of the command is received from the electronic device as a response to the message, an output command, directed to the electronic device, for a message including the time setting corresponding to the identification information of the electronic device.

A program for causing a computer to execute the method according to claim 16.