JP6373709B2

JP6373709B2 - Dialogue device

Info

Publication number: JP6373709B2
Application number: JP2014202219A
Authority: JP
Inventors: 圭司坂; 俊介山縣
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2014-09-30
Filing date: 2014-09-30
Publication date: 2018-08-15
Anticipated expiration: 2034-09-30
Also published as: JP2016071248A; WO2016052520A1

Description

The present invention relates to a dialogue device and a dialogue system that recognize a user's voice and respond to it.

In recent years, robots such as nursing-care and comfort robots and housework robots have gradually become part of users' daily lives. For example, dialogue robots (dialogue devices) that have a voice recognition function and respond to a user's utterances have been developed, as disclosed in Patent Documents 1 to 4. Dialogue robots that, in addition to voice recognition, accumulate the user's life information and use it to assist or support the user have also been developed, as disclosed in Patent Documents 5 and 6.

International Publication No. WO05/076258A1 (published August 18, 2005)
JP 2006-043780 (published February 16, 2006)
JP 2010-128281 (published June 10, 2010)
JP 2003-022092 (published January 24, 2003)
JP 2004-17184 (published January 22, 2004)
JP 2007-152444 (published June 21, 2007)

With conventional dialogue robots, complex voice recognition is difficult for reasons of performance and cost, and the responses are patterned or simple, so the robots tend to be uninteresting and users easily tire of them. Systems have therefore been developed in which a dialogue robot is connected to a server device and receives and outputs response content based on voice recognition performed by the server device. In this case, however, the response is delayed compared with a dialogue robot that recognizes speech and responds on its own. In addition, the response content cannot be received if communication is cut off. As a result, the user may feel stressed and find the conversation difficult.

The present invention has been made in view of the above problems, and its object is to provide a dialogue device and a dialogue system that can smoothly output a plurality of pieces of information by voice and provide a comfortable dialogue environment without stressing the user.

To solve the above problems, a dialogue device according to one aspect of the present invention includes: voice recognition means for recognizing input voice; a response information storage unit that stores main response information indicating response content corresponding to the result of the voice recognition, and sub-response information that is associated with the main response information and indicates response content to be added to the response content indicated by the main response information; time calculation means for estimating the time at which the input voice will be input; sub-response information generation means for acquiring, before the estimated input time (the time thus estimated), material information used to generate or update the sub-response information, and for generating or updating the sub-response information; and output control means for outputting by voice, in response to the input voice, the response content indicated by the sub-response information together with the response content indicated by the main response information obtained by referring to the response information storage unit.

With the dialogue device according to one aspect of the present invention, voice output of the response content indicated by the sub-response information can be added to voice output of the response content indicated by the main response information, so the device can respond to input voice with a plurality of pieces of information. In addition, because the sub-response information is generated or updated before the estimated input time of the input voice, the responses can be varied. With this configuration, a plurality of pieces of information can be output smoothly by voice, and a comfortable dialogue environment can be provided without stressing the user.

A diagram showing the schematic configuration of the dialogue device according to Embodiment 1 of the present invention.
A diagram showing the schematic configuration of the dialogue system according to Embodiment 1 of the present invention.
A diagram illustrating an example of the main response information and sub-response information used by the dialogue device according to Embodiment 1 of the present invention.
A diagram illustrating another example of the main response information and sub-response information used by the dialogue device according to Embodiment 1 of the present invention.
A diagram showing the schematic configuration of a user's home in which the dialogue device according to Embodiment 2 of the present invention is installed.
A diagram showing the schematic configuration of the dialogue device according to Embodiment 2 of the present invention.
A diagram illustrating the operation modes of the dialogue device according to Embodiment 2 of the present invention.

[Embodiment 1]
An embodiment of the present invention is described below with reference to FIGS. 1 to 4.

(Configuration of dialogue system)
FIG. 1 is a diagram showing the configuration of a dialogue system 100 according to the present embodiment. As shown in FIG. 1, the dialogue system 100 includes a dialogue device 10, a management server 30, information providing servers 31-1 and 31-2, and a communication terminal 70, which are connected via a communication network. The Internet, for example, can be used as this communication network. A telephone network, a mobile communication network, a CATV (cable television) network, a satellite communication network, or the like can also be used.

The dialogue device 10 has a voice recognition function, and the user can converse with the dialogue device 10 by speaking in natural language. The dialogue device 10 may be a dialogue robot, or it may be a smartphone, tablet terminal, personal computer, home appliance (household electronic device), or the like that has a voice recognition function.

The management server 30 is a device that manages the dialogue device 10. The information providing servers 31-1 and 31-2 are devices that provide various kinds of information to the dialogue device 10. The communication terminal 70 is a communication terminal owned by the user of the dialogue device 10 and is used, for example, to register information about the user in the management server 30. Details are described later.

For simplicity, FIG. 1 shows one dialogue device 10, one communication terminal 70, and two information providing servers 31-1 and 31-2, but their numbers are not limited to these. Likewise, FIG. 1 depicts the dialogue device 10 as a dialogue robot and the communication terminal 70 as a smartphone, but they are not limited to these forms. The management server 30 can manage any type of dialogue device 10; that is, different types of dialogue device 10, such as a dialogue robot and a smartphone, may be connected to the management server 30.

(Dialogue device)
The configuration of the dialogue device 10 is described next. When voice (input voice) is input, the dialogue device 10 performs voice recognition and conducts a dialogue according to the recognition result. As shown in FIG. 1, the dialogue device 10 includes a voice input unit 11, a voice output unit 12, a control unit 13, a data storage unit 14, and a communication unit 15.

The voice input unit 11 is a voice input device such as a microphone, and the voice output unit 12 is a voice output device such as a speaker.

The control unit 13 is a block that controls the operation of each unit of the dialogue device 10. The control unit 13 is a computer device built around an arithmetic processing unit such as a CPU (Central Processing Unit) or a dedicated processor. The control unit 13 reads and executes programs stored in the data storage unit 14 for carrying out the various controls of the dialogue device 10, and thereby controls the operation of each unit of the dialogue device 10 in an integrated manner.

The data storage unit 14 includes RAM (Random Access Memory), ROM (Read Only Memory), an HDD (Hard Disk Drive), and the like, and is a block that stores the various kinds of information (data) used by the dialogue device 10. The data storage unit 14 includes a response information storage unit 141. The response information storage unit 141 is a database in which main response information is registered in association with words and phrases. Main response information is registered not only for single words but also for combinations of words. A plurality of pieces of main response information may be registered for a given word or phrase, in which case the one actually output by voice is selected from among them. The words, phrases, and main response information can all be stored as text data. Known techniques can be used to build such a database and to obtain response information from it.

In addition, sub-response information is registered in the response information storage unit 141 in association with main response information. Sub-response information indicates response content that is added to the response content indicated by the main response information. As described later, if sub-response information is not yet stored in the response information storage unit 141, it is generated and stored at a predetermined time; if it is already stored, it is updated at the predetermined time. Sub-response information is described later using specific examples. Sub-response information can also be stored in the response information storage unit 141 as text data.
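The storage layout described above could be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: the class names `ResponseEntry` and `ResponseInfoStore` and their methods are hypothetical, and the example sentences are taken from the "good morning" example given later in the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResponseEntry:
    """One row of the store: candidate main responses plus an optional sub-response."""
    main_responses: List[str]
    sub_response: Optional[str] = None  # None means "not generated yet"


@dataclass
class ResponseInfoStore:
    """Hypothetical stand-in for the response information storage unit 141."""
    entries: Dict[str, ResponseEntry] = field(default_factory=dict)

    def register_main(self, phrase: str, response: str) -> None:
        # Several main responses may be registered for the same word or phrase.
        entry = self.entries.setdefault(phrase, ResponseEntry(main_responses=[]))
        entry.main_responses.append(response)

    def set_sub(self, phrase: str, sub_response: Optional[str]) -> None:
        # Generation and update are the same operation: overwrite the stored text.
        self.entries[phrase].sub_response = sub_response

    def lookup(self, phrase: str) -> Optional[ResponseEntry]:
        return self.entries.get(phrase)


store = ResponseInfoStore()
store.register_main("good morning", "Good morning.")
store.set_sub("good morning", "It looks like it will rain today.")
```

Keeping everything as plain text keyed by the recognized word or phrase matches the statement that words, phrases, and both kinds of response information can be stored as text data.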

The control unit 13 also functions as a voice recognition unit (voice recognition means) 16, a time calculation unit (time calculation means) 17, a material information acquisition unit 18, a sub-response information generation unit (sub-response information generation means) 19, an output control unit (output control means) 20, and a voice synthesis unit 21.

The voice recognition unit 16 is a block that recognizes input voice from the user. Specifically, the voice recognition unit 16 converts the voice data input from the voice input unit 11 into text data and analyzes the text data to extract words and phrases. Known techniques can be used for the voice recognition processing.

The time calculation unit 17 is a block that estimates (calculates) the time at which input voice will be input. The time estimated by the time calculation unit 17 is referred to as the estimated input time. The material information acquisition unit 18 is a block that acquires the material information, described later, used to generate or update sub-response information. The sub-response information generation unit 19 is a block that receives material information from the material information acquisition unit 18 before the estimated input time calculated by the time calculation unit 17 and generates or updates sub-response information. The generated or updated sub-response information is stored in the response information storage unit 141.

In the present embodiment, the time calculation unit 17 calculates the estimated input time of one specific input voice, and the sub-response information generation unit 19 generates or updates all the sub-response information stored in the response information storage unit 141 before that estimated input time. For example, if the specific input voice is "good morning", all the sub-response information can be generated or updated before the estimated input time of "good morning", that is, before the time at which the user's day is estimated to begin. The user then hears response content based on sub-response information that is generated or updated every day.

Of course, the configuration may instead calculate an estimated input time for each input voice and, before the estimated input time calculated for a given input voice, generate or update the sub-response information added to the main response information associated with the voice recognition result of that input voice.
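The first configuration above, in which all sub-response information is regenerated once before the estimated input time of one specific utterance, might be sketched as follows. The function name `refresh_if_due`, the 30-minute lead time, and the representation of generators as zero-argument callables are assumptions made for illustration; the description does not specify how far in advance the update runs.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional


def refresh_if_due(
    now: datetime,
    estimated_input: datetime,
    last_refresh: Optional[datetime],
    sub_responses: Dict[str, str],
    generators: Dict[str, Callable[[], str]],
    lead: timedelta = timedelta(minutes=30),  # assumed lead time
) -> Optional[datetime]:
    """Regenerate every sub-response once the lead window before the estimate opens.

    Returns the time of the most recent refresh (unchanged if nothing was done).
    """
    due_at = estimated_input - lead
    if now < due_at:
        return last_refresh  # too early
    if last_refresh is not None and last_refresh >= due_at:
        return last_refresh  # already refreshed for this estimate
    for phrase, generate in generators.items():
        sub_responses[phrase] = generate()
    return now
```

A caller would invoke this periodically (for example from a timer) and keep the returned timestamp, so that the sub-responses are rebuilt at most once per estimated input time.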

The calculation of the estimated input time by the time calculation unit 17 is described next using specific examples. The time calculation unit 17 may calculate the estimated input time from information on past input times of the input voice. In this case, for example, the previous input time of the input voice may be used as the estimated input time, or the average of the input times within a predetermined past period (for example, the most recent week or month) may be used. Alternatively, the time calculation unit 17 may calculate the estimated input time from the user's life information. In this case, for example, the user's wake-up time may be used as the estimated input time. If the dialogue device 10 has an alarm clock function, the wake-up time may be obtained from the time at which the alarm is set to sound. These are only examples, and the calculation is not limited to them. Life information is not limited as long as it relates to the user's living conditions or living environment.
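The alternatives described above could be sketched as follows. The description presents them as independent options; combining them into a single fallback chain (recent average, then previous input, then alarm time) is an assumption made here for illustration, as are the function name `estimate_input_time` and its parameters.

```python
from datetime import datetime, time, timedelta
from typing import List, Optional


def estimate_input_time(
    past_inputs: List[datetime],
    now: datetime,
    window: timedelta = timedelta(days=7),
    alarm: Optional[time] = None,
) -> Optional[time]:
    """Estimate the time of day at which the utterance is expected."""
    recent = [t for t in past_inputs if now - window <= t <= now]
    if recent:
        # Average time of day, in seconds since midnight.
        # Note: a plain mean is misleading for times that straddle midnight.
        secs = [t.hour * 3600 + t.minute * 60 + t.second for t in recent]
        mean = round(sum(secs) / len(secs))
        return time(mean // 3600, (mean % 3600) // 60, mean % 60)
    if past_inputs:
        # No recent history: fall back to the previous input time.
        return max(past_inputs).time().replace(microsecond=0)
    # No history at all: fall back to life information such as the alarm time.
    return alarm
```

For morning greetings the simple arithmetic mean is adequate; for utterances that occur around midnight a circular mean would be needed instead.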

The sub-response information generation unit 19 generates or updates sub-response information using, as material information, information on past input times of the input voice. Alternatively, it uses, as material information, life information relating to the user's living conditions or living environment. Life information may be any such information, for example information on the weather and traffic in the area where the dialogue device is installed, the user's schedule (plans), information on the user's life patterns, and information on the user's health. Life information may also be information obtained from the result of voice recognition of the input voice by the voice recognition unit 16, information received from outside via the communication unit 15, information detected by a state detection unit that detects the state of the user or the user's surroundings, or a combination of these. The state detection unit is described in Embodiment 2. Sub-response information is described later using specific examples.

If sub-response information is not registered in the response information storage unit 141, the sub-response information generation unit 19 generates it using the material information. Because this generation can be regarded as updating sub-response information that contains no information (empty), the generation of sub-response information may be treated as part of the update processing.

The sub-response information generation unit 19 has been described above as generating or updating sub-response information. Alternatively, it may generate the sub-response information before the estimated input time and clear it at a predetermined time or after a predetermined voice input (for example, "good night").

Processing in which the sub-response information generation unit 19 receives sub-response information from outside via the communication network 60 may also be treated as part of the generation or update of sub-response information.

The output control unit 20 is a block that performs voice output by causing the voice output unit 12 to output voice data. In response to input voice from the voice input unit 11, the output control unit 20 outputs by voice the response content indicated by the sub-response information together with the response content indicated by the main response information obtained by referring to the response information storage unit 141.
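The behaviour of the output control unit could be sketched as follows: look up the recognized phrase, pick one of the registered main responses, and append the sub-response if one is stored. The function name `build_reply`, the dictionary layout, and the use of `random.choice` to select among several main responses are assumptions; the description only says that one of the candidates is selected.

```python
import random
from typing import Callable, Dict, Optional, Sequence


def build_reply(
    phrase: str,
    store: Dict[str, dict],
    choose: Callable[[Sequence[str]], str] = random.choice,
) -> Optional[str]:
    """Return the text to speak for a recognized phrase, or None if unregistered."""
    entry = store.get(phrase)
    if entry is None:
        return None
    main = choose(entry["main"])  # one of possibly several main responses
    sub = entry.get("sub")        # pre-generated; no network access at reply time
    return f"{main} {sub}" if sub else main


example_store = {
    "good morning": {
        "main": ["Good morning."],
        "sub": "It looks like it will rain today.",
    },
    "i'm home": {"main": ["Welcome back."]},
}
```

Because both parts are read from local storage, the reply does not depend on generating anything at the moment the user speaks.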

The voice synthesis unit 21 is a block that generates voice data. The voice synthesis unit 21 generates voice data for the response content indicated by the main response information and for the response content indicated by the sub-response information. The generated voice data is output via the voice output unit 12.

By referring to the response information storage unit 141 in this way, the dialogue device 10 can return a response to the user's utterance, that is, it can converse with the user.

The communication unit 15 is a block that communicates with the outside. The communication unit 15 receives life information from the management server 30 and the information providing servers 31-1 and 31-2.

As described above, the dialogue device 10 can add voice output of the response content indicated by the sub-response information to voice output of the response content indicated by the main response information, so it can respond to input voice with a plurality of pieces of information. In addition, because the sub-response information is generated or updated before the estimated input time, the responses can be varied. The dialogue device 10 can thus output a plurality of pieces of information smoothly by voice and provide a comfortable dialogue environment without stressing the user.

When the material information used to generate or update sub-response information is acquired from outside, the sub-response information is generated or updated before the estimated input time of the input voice. Therefore, even if communication with the outside is interrupted at the moment the input voice is input, the generated or updated sub-response information can still be provided to the user. Furthermore, because the sub-response information is obtained by referring to the response information storage unit 141 when the input voice is input, the device can respond (output voice) more quickly than a device that generates or updates the sub-response information, or receives it from outside, at input time.
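One way to picture the robustness described above is an update step that only overwrites the stored sub-response when the material information is actually obtained. The function name `update_sub_response`, the callables `fetch_material` and `compose`, and the choice to keep the previous text when the fetch fails are all assumptions for illustration; the description does not say what happens when the advance update itself fails.

```python
from typing import Callable, Dict


def update_sub_response(
    sub_responses: Dict[str, str],
    phrase: str,
    fetch_material: Callable[[], str],
    compose: Callable[[str], str],
) -> bool:
    """Try to refresh one sub-response ahead of time; keep the old one on failure."""
    try:
        material = fetch_material()  # e.g. a request to an information server
    except OSError:
        # Network errors (ConnectionError, timeouts) derive from OSError.
        return False
    sub_responses[phrase] = compose(material)
    return True
```

At reply time the device reads only from `sub_responses`, so a dropped connection at that moment does not affect the answer.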

If the dialogue device 10 includes an imaging unit, it may be configured to analyze the user's facial expression and position from images input from the imaging unit and to converse on that basis. It may also be configured to identify the user from images or the like obtained from the imaging unit before conversing.

(Management server and information providing servers)
Next, the management server 30 and the information providing servers 31-1 and 31-2 are described.

The management server 30 is a device that manages the dialogue device 10. When a plurality of dialogue devices 10 are connected to the management server 30, it manages each of them individually. The management server 30 also provides (transmits) life information to the dialogue device 10. As described later, the life information provided by the management server 30 is life information acquired (received) from the communication terminal 70. The management server 30 may be a cloud server that provides a cloud service, but is not limited to this. The management server 30 may be a single server or a plurality of servers connected via a communication network.

The information providing servers 31-1 and 31-2 are devices that provide life information for the user. The life information they provide may be of any kind, for example weather information, traffic information, disaster information, or local information issued by public authorities. In the following description, the information providing server 31-1 is a weather information providing server 31-1 that provides weather information, and the information providing server 31-2 is a traffic information providing server 31-2 that provides traffic information.

The management server 30, the weather information providing server 31-1, and the traffic information providing server 31-2 may each transmit the user's life information to the dialogue device 10 individually. Alternatively, the material information from the weather information providing server 31-1 and the traffic information providing server 31-2 may first be collected at the management server 30 and then transmitted from the management server 30 to the dialogue device 10.

In the dialogue system 100, sub-response information can be generated or updated from the life information provided by the management server 30 and the information providing servers 31-1 and 31-2, which makes responses such as the following possible. By using the weather information provided by the weather information providing server 31-1 when generating or updating the sub-response information, the device can respond to the input voice "Good morning." by outputting "Good morning." (the voice output for the main response information) and adding "It looks like it will rain today." (the voice output for the sub-response information).
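The weather example above might be sketched as a small mapping from forecast to sub-response text. Only the "rain" sentence comes from the description; the forecast keys, the "snow" entry, and the function names are hypothetical placeholders, and no real weather service API is implied.

```python
from typing import Optional

# Only the "rain" sentence appears in the description; "snow" is a made-up placeholder.
WEATHER_SUB_RESPONSES = {
    "rain": "It looks like it will rain today.",
    "snow": "It looks like it will snow today.",
}


def sub_response_from_weather(forecast: Optional[str]) -> Optional[str]:
    """Turn weather material information into sub-response text, if any applies."""
    if forecast is None:
        return None
    return WEATHER_SUB_RESPONSES.get(forecast)


def reply_to_good_morning(forecast: Optional[str]) -> str:
    """Main response, with the weather sub-response appended when available."""
    sub = sub_response_from_weather(forecast)
    return "Good morning." if sub is None else f"Good morning. {sub}"
```

When no sub-response applies, the device simply falls back to the main response alone.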

The management server 30 may also allow recorded voice to be registered from the communication terminal 70 or from another communication terminal (not shown). In this case, acquisition of the registered recorded voice as sub-response information by the sub-response information generation unit 19 is also treated as generation or update of sub-response information. Because the recorded voice is already voice data, it is transmitted to the dialogue device 10 as it is and no voice synthesis is performed on it in the dialogue device 10. For example, suppose the voice message "There is cake in the refrigerator" is registered in the management server 30 from the communication terminal (not shown) of the user's mother. In response to the user's input voice "I'm home", the dialogue device 10 can then output "Welcome back" using the main response information and add, using the sub-response information, "Here is a message from your mother: 'There is cake in the refrigerator.'" Advanced responses of this kind are thus also possible.
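The distinction between text that must be synthesized and recorded voice that is played as is could be sketched as follows. Representing a reply as a list of segments, with `str` for text and `bytes` for recorded audio, is an assumption made for illustration, as is the `synthesize` callable standing in for the voice synthesis unit 21.

```python
from typing import Callable, List, Union

Segment = Union[str, bytes]  # str: text to synthesize; bytes: recorded voice data


def render(segments: List[Segment], synthesize: Callable[[str], bytes]) -> List[bytes]:
    """Convert reply segments to audio, skipping synthesis for recorded voice."""
    audio = []
    for segment in segments:
        if isinstance(segment, bytes):
            audio.append(segment)  # already voice data: pass through unchanged
        else:
            audio.append(synthesize(segment))
    return audio
```

For the example above, the segments would be the text "Welcome back.", the text introducing the message, and the mother's recording as raw audio.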

(Communication terminal)
The communication terminal 70 is a device that can communicate with other devices via the communication network 60. The communication terminal 70 is configured so that the user's life information can be registered in the management server 30. The communication terminal 70 is assumed to be a general-purpose device such as a tablet terminal, smartphone, or personal computer with built-in software (an application) for registering the user's life information. The life information that can be registered in the management server 30 from the communication terminal 70 is not limited as long as it relates to living conditions or the living environment; examples include the user's schedule, the area where the user lives, the user's wake-up time, and the train or bus lines the user often uses (for example, for commuting to work or school). The life information may be entered into the communication terminal 70 by the user, or the communication terminal 70 may acquire it automatically or manually. For example, the lines the user often uses may be obtained by determining the user's usual range of activity from the base stations the terminal uses most frequently and taking the lines within that range. These are all examples.

（主応答情報及び副応答情報） (Main response information and secondary response information)
次に、主応答情報及び副応答情報を用いた応答について具体例を図３及び４を参照して説明する。 Next, specific examples of responses using the main response information and the secondary response information will be described with reference to FIGS. 3 and 4.

図３の（ａ）は、対話装置１０が取得している、副応答情報の生成または更新に用いられる材料情報の一例を示す。図３の（ａ）は、取得している材料情報には、「晴」を示す天気の情報、「なし」を示す交通の情報、「燃えるごみの日」、「１０時に習い事のピアノ」、「１９時に食事会」を示すスケジュールの情報、「昨日の起床時刻は７時３分」を示す生活ログの情報があることを示している。 (a) of FIG. 3 shows an example of the material information that the dialogue apparatus 10 has acquired and that is used for generating or updating the secondary response information. It shows that the acquired material information includes weather information indicating “sunny”, traffic information indicating “none”, schedule information indicating “burnable garbage day”, “piano lesson at 10:00”, and “dinner party at 19:00”, and life log information indicating “yesterday's wake-up time was 7:03”.

ここで、「なし」という交通の情報を取得しているとは、言い換えれば、交通の情報は取得していない、ということである。スケジュールの情報は、上記したように通信端末７０から管理サーバ３０に登録したものを対話装置１０が取得する構成でも、対話装置１０にユーザが直接登録でき、対話装置１０はそれを取得する構成であってもよい。生活ログとは、対話装置１０が取得するユーザの生活情報であり、対話装置１０は生活ログを記録しデータ格納部１４に格納する。あるいは、通信端末７０が生活ログを記録し管理サーバ３０に送信し、対話装置１０は管理サーバ３０から生活ログを取得するという構成であってもよい。 Here, having acquired traffic information of “none” means, in other words, that no traffic information has been acquired. The schedule information may be information registered in the management server 30 from the communication terminal 70 as described above and then acquired by the dialogue apparatus 10, or the user may register it directly in the dialogue apparatus 10, which then acquires it. The life log is the user's life information acquired by the dialogue apparatus 10; the dialogue apparatus 10 records the life log and stores it in the data storage unit 14. Alternatively, the communication terminal 70 may record the life log and transmit it to the management server 30, and the dialogue apparatus 10 may acquire the life log from the management server 30.

副応答情報生成部１９は、時刻算出部１７が算出した特定の入力音声（例えば、「おはよう」）の推定入力時刻よりも前に、材料情報を取得しておき、材料情報を基に副応答情報を生成または更新する。ここでは、対話装置１０は、毎日、特定の入力音声である「おはよう」の推定入力時刻よりも前に、材料情報を取得しておき、材料情報を基に副応答情報を生成または更新するものとする。 The secondary response information generation unit 19 acquires the material information before the estimated input time, calculated by the time calculation unit 17, of a specific input voice (for example, “Good morning”), and generates or updates the secondary response information based on the material information. Here, it is assumed that every day the dialogue apparatus 10 acquires the material information before the estimated input time of the specific input voice “Good morning” and generates or updates the secondary response information based on it.
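The timing condition described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `should_refresh`, the `last_refresh` bookkeeping, and the 30-minute lead time are hypothetical and do not appear in the specification, which only requires that generation or update happen before the estimated input time.

```python
from datetime import datetime, timedelta


def should_refresh(now, estimated_input, last_refresh, lead=timedelta(minutes=30)):
    """Decide whether to acquire material information and refresh now.

    Returns True when `now` lies in [estimated_input - lead, estimated_input)
    and no refresh has been done inside that window yet.
    `lead` is an assumed value; the specification does not fix one.
    """
    window_start = estimated_input - lead
    if not (window_start <= now < estimated_input):
        return False
    # Refresh once per window: skip if a refresh already happened in it.
    return last_refresh is None or last_refresh < window_start
```

A caller could evaluate this periodically and, when it returns True, fetch the weather, traffic, and schedule data and rebuild the secondary response information, so that nothing has to be fetched at the moment the user speaks.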

図３の（ｃ）は、入力音声を音声認識した結果である音声認識単語に応じた応答内容を示す主応答情報のデータベースの一例である。図３の（ｃ）に示すデータベースでは、さらに、各主応答情報には、副応答情報を付加するか否かを示す情報が対応付けられている。 (C) of FIG. 3 is an example of the database of the main response information which shows the response content according to the speech recognition word which is the result of carrying out the speech recognition of the input speech. In the database shown in FIG. 3C, each main response information is further associated with information indicating whether or not the sub response information is added.
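One way to picture the database of (c) of FIG. 3 is a mapping from each speech recognition word to a main response and a flag saying whether secondary response information is appended. The sketch below is hypothetical: the names `MAIN_RESPONSES` and `build_reply`, the concrete entries, and their flag values are illustrative assumptions, not the contents of the figure.

```python
# Hypothetical in-memory stand-in for the main response database.
# recognized word -> (main response text, append secondary response?)
# The entries and flag values are illustrative, not taken from FIG. 3(c).
MAIN_RESPONSES = {
    "おはよう": ("おはよう", True),
    "いってきます": ("いってらっしゃい", True),
    "おやすみ": ("おやすみ", False),
}


def build_reply(recognized_word, secondary=None):
    """Return the text to speak, or None if the word is not in the database."""
    entry = MAIN_RESPONSES.get(recognized_word)
    if entry is None:
        return None
    main_text, append_secondary = entry
    # Append the secondary response only when the flag allows it
    # and a secondary response is actually available.
    if append_secondary and secondary:
        return f"{main_text}。{secondary}"
    return main_text
```

For instance, `build_reply("おはよう", "今日は燃えるごみの日だよ")` joins the main and secondary responses into one utterance, while a word whose flag is off yields the main response alone.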

図３の（ｄ）は、主応答情報「おはよう」に付加される副応答情報の一例であり、材料情報を基に生成または更新したものを示している。図３の（ｄ）のように副応答情報が複数ある場合には、副応答情報生成部１９は、副応答情報に優先度を設定する。そして、出力制御部２０は、優先度に従って副応答情報を特定し、特定した副応答情報で示される応答内容を音声出力する。副応答情報は、主応答情報毎に設けられているが、ここでは、「おはよう」以外の主応答情報に付加される副応答情報の例については説明しない。 (d) of FIG. 3 is an example of the secondary response information added to the main response information “Good morning”, and shows information generated or updated based on the material information. When there are multiple items of secondary response information, as in (d) of FIG. 3, the secondary response information generation unit 19 sets a priority for each item. The output control unit 20 then selects secondary response information according to the priority and outputs by voice the response content indicated by the selected secondary response information. Secondary response information is provided for each item of main response information, but examples of secondary response information added to main response information other than “Good morning” are not described here.

普段とは異なる状況を伝える副応答情報、緊急性を要する内容を伝えるものである副応答情報には、優先度を高く設定する。例えば、交通情報、スケジュール登録、悪天候を材料情報として生成または更新した副応答情報には高い優先度を付ける。本実施の形態では、優先度は１から３まであり、１の方が優先されるものである。 A high priority is set for secondary response information that conveys a situation different from usual or content that is urgent. For example, a high priority is given to secondary response information generated or updated using traffic information, a registered schedule, or bad weather as material information. In the present embodiment, the priority ranges from 1 to 3, with 1 being the highest.

出力制御部２０は、優先度が１の副応答情報は必ず出力する。また、優先度が２の副応答情報は優先度が１の副応答情報が無い場合に、ランダムに１つ出力する。また、優先度が３の副応答情報は、優先度が１の副応答情報及び優先度が２の副応答情報が無い場合にランダムに出力する。 The output control unit 20 always outputs secondary response information with priority 1. When there is no secondary response information with priority 1, it outputs one randomly chosen item with priority 2. When there is no secondary response information with priority 1 or priority 2, it outputs secondary response information with priority 3 at random.
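The priority rule above can be sketched as a small selection function. The name `select_secondary` and the `(priority, text)` tuple layout are hypothetical. Two points are assumptions rather than statements of the specification: all priority-1 items are output (the text also allows selecting among them), and exactly one priority-3 item is chosen (the text only says priority 3 is output at random).

```python
import random


def select_secondary(responses, rng=random):
    """Pick the secondary responses to speak.

    `responses` is a list of (priority, text) pairs with priority 1, 2, or 3.
    Returns a list of texts (possibly empty).
    """
    by_priority = {1: [], 2: [], 3: []}
    for priority, text in responses:
        if priority not in by_priority:
            raise ValueError(f"unsupported priority: {priority}")
        by_priority[priority].append(text)

    # Priority 1 is always output (assumption: output every such item).
    if by_priority[1]:
        return list(by_priority[1])
    # Otherwise fall back to one random item of the best remaining level.
    for level in (2, 3):
        if by_priority[level]:
            return [rng.choice(by_priority[level])]
    return []
```

Passing a seeded `random.Random` instance as `rng` makes the random fallback reproducible, which can help when testing the priority-2 and priority-3 branches.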

この具体例では、図３の（ｂ）に示すように、対話装置１０は、「おはよう」という入力音声に対して、「おはよう」という主応答情報に、優先度が１の「今日は燃えるごみの日だよ」という副応答情報を付加して音声出力する。優先度が１の副応答情報が複数有る場合には、複数出力してもよいし、選択して出力してもよい。 In this specific example, as shown in (b) of FIG. 3, in response to the input voice “Good morning”, the dialogue apparatus 10 outputs by voice the main response information “Good morning” with the priority-1 secondary response information “Today is burnable garbage day” appended. When there are multiple items of secondary response information with priority 1, several of them may be output, or one may be selected and output.

別の具体例を図４を用いて説明する。図４の（ａ）は、材料情報の一例、図４の（ｃ）は、入力音声を音声認識した結果である音声認識単語に応じた応答内容を示す主応答情報のデータベースの一例である。図４の（ｄ）は、主応答情報「いってきます」に付加される副応答情報の一例である。この具体例では、図４の（ｂ）に示すように、対話装置１０は、「いってきます」という入力音声に対して、「いってらっしゃい」という主応答情報に、「傘忘れていない？」という副応答情報を付加して音声出力する。 Another specific example will be described with reference to FIG. 4. (a) of FIG. 4 is an example of the material information, and (c) of FIG. 4 is an example of the database of main response information indicating the response content corresponding to each speech recognition word obtained by recognizing the input voice. (d) of FIG. 4 is an example of the secondary response information added to the main response information for “Ittekimasu” (“I'm off”). In this specific example, as shown in (b) of FIG. 4, in response to the input voice “Ittekimasu”, the dialogue apparatus 10 outputs by voice the main response information “Itterasshai” (“See you later”) with the secondary response information “Haven't you forgotten your umbrella?” appended.

〔実施の形態２〕 [Embodiment 2]
以下では、本発明の別の実施の形態の対話装置１０ａについて図５〜７を用いて説明する。なお説明の便宜上、実施の形態１にて説明した部材と同じ機能を有する部材については、同じ符号を付記し、その説明を省略する。 A dialogue apparatus 10a according to another embodiment of the present invention is described below with reference to FIGS. 5 to 7. For convenience of explanation, members having the same functions as those described in Embodiment 1 are given the same reference numerals, and their description is omitted.

対話装置１０の提供するサービス（対話装置１０の動作）は主に対話であったが、対話装置１０ａは、対話以外にも、ユーザに各種サービスを提供可能に設けられている。そのため、対話装置１０ａは、提供可能なサービス毎の動作モードを有している。対話装置１０ａが提供可能なサービスとしては、例えば、対話、家電の操作、ユーザの生活情報の記録、ユーザへの音声アドバイスが挙げられるが、これらに限定されない。対話は、ユーザからの入力音声に対して行うものであるが、ユーザへの音声アドバイスとは、ユーザからの入力音声が無くても、対話装置１０ａから自発的に音声出力（発話）される情報である。対話装置１０ａによるサービスの提供については後に具体例を用いて説明する。 The service provided by the dialogue apparatus 10 (the operation of the dialogue apparatus 10) was mainly dialogue, whereas the dialogue apparatus 10a is configured to provide the user with various services in addition to dialogue. The dialogue apparatus 10a therefore has an operation mode for each service it can provide. Examples of services that the dialogue apparatus 10a can provide include, but are not limited to, dialogue, operation of home appliances, recording of the user's life information, and voice advice to the user. Dialogue is performed in response to input voice from the user, whereas voice advice to the user is information that the dialogue apparatus 10a outputs by voice (utters) spontaneously even when there is no input voice from the user. The provision of services by the dialogue apparatus 10a will be described later using specific examples.

対話装置１０ａは、図５に示すように、ユーザ宅４０にある家電を赤外線通信や無線ＬＡＮ通信などで操作可能に設けられている。家電は、例えば、空気調和機（エアコン）、洗濯機、冷蔵庫、調理器具、照明装置、給湯機器、撮影機器、各種ＡＶ（Audio-Visual）機器、各種家庭用ロボット（例えば、掃除ロボット、家事支援ロボット、動物型ロボット等）等である。以下では、対話装置１０ａが操作できる家電として、エアコン５０−１、テレビ５０−２、冷蔵庫５０−３を用いて説明を行うが、操作対象の家電はこれらに限定されない。 As shown in FIG. 5, the interactive device 10 a is provided so that home appliances in the user's home 40 can be operated by infrared communication or wireless LAN communication. Home appliances are, for example, air conditioners (air conditioners), washing machines, refrigerators, cooking utensils, lighting devices, hot water supply equipment, photographing equipment, various AV (Audio-Visual) equipment, various household robots (for example, cleaning robots, housework support) Robot, animal type robot, etc.). Below, although demonstrated using the air-conditioner 50-1, the television 50-2, and the refrigerator 50-3 as a household appliance which the dialogue apparatus 10a can operate, the household appliance of operation object is not limited to these.

対話装置１０ａは、図６に示すように、実施の形態１の対話装置１０の構成に加え、動作部２２及び状態検知部２４を備えている。動作部２２は、対話装置１０ａの各種動作を実行するブロックである。状態検知部２４は、ユーザないしその周囲の状態を検知する装置であればよく、例えば、人感センサ、撮像部（カメラ）、温度センサ等が挙げられる。しかし、これらに限定されない。 As shown in FIG. 6, the dialogue apparatus 10 a includes an operation unit 22 and a state detection unit 24 in addition to the configuration of the dialogue apparatus 10 according to the first embodiment. The operation unit 22 is a block that executes various operations of the interactive apparatus 10a. The state detection unit 24 may be any device that detects a user or a surrounding state, and examples thereof include a human sensor, an imaging unit (camera), and a temperature sensor. However, it is not limited to these.

また、対話装置１０ａの制御部１３ａは、制御部１３と同様の機能に加え、モード設定部（モード設定手段）２３としての機能を有する。モード設定部２３は、音声入力部１１から入力された入力音声の音声認識の結果に基づき提供するサービスを決定し、決定したサービスを提供する動作モードに対話装置１０ａを設定する。よって、対話装置１０ａは、ユーザとの対話から、例えば、エアコン５０−１を操作したいことを類推した場合には、エアコン５０−１を操作する動作モードに対話装置１０ａを設定し、操作を行うことが可能となる。 The control unit 13 a of the interactive apparatus 10 a has a function as a mode setting unit (mode setting unit) 23 in addition to the same function as the control unit 13. The mode setting unit 23 determines a service to be provided based on the result of speech recognition of the input speech input from the speech input unit 11, and sets the interactive apparatus 10a to an operation mode that provides the determined service. Therefore, for example, when it is inferred that the user wants to operate the air conditioner 50-1 from the dialog with the user, the dialog apparatus 10a sets the dialog apparatus 10a to the operation mode for operating the air conditioner 50-1, and performs the operation. It becomes possible.

また、対話装置１０ａのデータ格納部１４ａは、モード情報格納部１４３を含み、モード情報格納部１４３には、サービス毎に、そのサービスを提供する動作モードに対話装置１０ａを設定するための情報が格納されている。 The data storage unit 14a of the dialogue apparatus 10a includes a mode information storage unit 143, which stores, for each service, information for setting the dialogue apparatus 10a to the operation mode that provides that service.

対話装置１０ａから家電を操作する際には、赤外線を用いて家電の位置を検出してもよいし、状態検知部２４が撮像部を有している場合には、この撮像部が取得した情報で家電の位置を検出してもよい。 When operating the home appliance from the interactive apparatus 10a, the position of the home appliance may be detected using infrared rays, and when the state detection unit 24 has an imaging unit, information acquired by the imaging unit You may detect the position of home appliances.

実施の形態の対話システムは、図２の対話装置１０が対話装置１０ａに置き換わったものである。実施の形態の対話システムは、さらに、エアコン５０−１、テレビ５０−２、及び冷蔵庫５０−３が通信ネットワーク６０に接続しており、管理サーバ３０が、これら家電からの情報を取得する構成であってもよい。この場合に管理サーバ３０が取得する情報としては、例えば、エアコン５０−１、テレビ５０−２、及び冷蔵庫５０−３の、設定状況、動作状況を示す情報、周囲環境の情報が挙げられる。管理サーバ３０は、これら家電から取得した情報のうちユーザの生活情報、例えば、エアコン５０−１のＯＮ／ＯＦＦや設定温度の情報、冷蔵庫５０−３を開ける回数の情報、テレビ５０−２のＯＮ／ＯＦＦの情報を、対話装置１０ａに送信する。 The dialogue system of this embodiment is the system of FIG. 2 with the dialogue apparatus 10 replaced by the dialogue apparatus 10a. In addition, the air conditioner 50-1, the television 50-2, and the refrigerator 50-3 may be connected to the communication network 60 so that the management server 30 acquires information from these home appliances. Information acquired by the management server 30 in this case includes, for example, information indicating the setting status and operation status of the air conditioner 50-1, the television 50-2, and the refrigerator 50-3, and information on their surrounding environment. Of the information acquired from these home appliances, the management server 30 transmits the user's life information to the dialogue apparatus 10a, for example, the ON/OFF state and set temperature of the air conditioner 50-1, the number of times the refrigerator 50-3 is opened, and the ON/OFF state of the television 50-2.

このような構成であると、対話装置１０ａは、エアコン５０−１、テレビ５０−２、及び冷蔵庫５０−３から得た生活情報も推定入力時刻の算出及び副応答情報の生成または更新に利用することができる。よって、この場合、例えば、「いってきます」という入力音声に対して、主応答情報で出力される音声である「いってらっしゃい。」に、副応答情報で出力される音声である「エアコンとテレビが点いているので消してね。」を付加することができる。 With such a configuration, the dialogue apparatus 10a can also use the life information obtained from the air conditioner 50-1, the television 50-2, and the refrigerator 50-3 to calculate the estimated input time and to generate or update the secondary response information. In this case, for example, in response to the input voice “Ittekimasu” (“I'm off”), the voice “The air conditioner and the TV are on, so please turn them off.”, output from the secondary response information, can be appended to the voice “Itterasshai.” (“See you later.”), output from the main response information.
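As a rough illustration of how appliance states could become secondary response information, the sketch below builds the reminder sentence from ON/OFF states. The function name `appliance_reminder` and the dictionary layout are hypothetical; the specification does not define a data format for the life information received from the appliances.

```python
def appliance_reminder(states):
    """Build a reminder from appliance ON/OFF states.

    `states` maps an appliance name to True when it is currently ON.
    Returns the reminder text, or None when nothing is switched on.
    """
    switched_on = [name for name, is_on in states.items() if is_on]
    if not switched_on:
        return None
    # Join the names in insertion order, e.g. "エアコンとテレビ".
    return "と".join(switched_on) + "が点いているので消してね。"
```

The returned text could then be stored as the secondary response for the input voice 「いってきます」 ahead of the estimated time at which the user leaves home.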

なお、管理サーバ３０を介さず、エアコン５０−１、テレビ５０−２、及び冷蔵庫５０−３から直接対話装置１０ａに生活情報を送信する構成であってもよい。この場合、エアコン５０−１、テレビ５０−２、及び冷蔵庫５０−３が、対話装置１０ａに生活情報を提供する情報提供装置である。 In addition, the structure which transmits life information directly to the dialogue apparatus 10a from the air conditioner 50-1, the television 50-2, and the refrigerator 50-3 without using the management server 30 may be used. In this case, the air conditioner 50-1, the television 50-2, and the refrigerator 50-3 are information providing devices that provide life information to the interactive device 10a.

また、冷蔵庫５０−３が音声録音及び再生機能を有している場合、「ただいま」という入力音声に対して、主応答情報で出力される音声である「おかえり。」に、副応答情報で出力される音声である「冷蔵庫さんがお母さんの伝言を聞いているよ。」を付加することができる。この場合、冷蔵庫の伝言が再生されるまで、一定時間ごとに、副応答情報で出力される音声の出力を繰り返してもよい。冷蔵庫５０−３への音声録音は、直接行う構成であっても、実施の形態１に記載のように、管理サーバ３０を介して行う構成でもよい。 When the refrigerator 50-3 has voice recording and playback functions, in response to the input voice “Tadaima” (“I'm home”), the voice “The refrigerator has taken a message from your mother.”, output from the secondary response information, can be appended to the voice “Okaeri.” (“Welcome back.”), output from the main response information. In this case, the voice output from the secondary response information may be repeated at regular intervals until the message in the refrigerator is played back. Voice may be recorded on the refrigerator 50-3 directly, or via the management server 30 as described in Embodiment 1.

次に、対話装置１０ａの動作の具体例について、図７を用いて説明する。 Next, a specific example of the operation of the interactive apparatus 10a will be described with reference to FIG.

例えば、「おはよう」という入力音声を音声認識すると、生活状態が「起床」であると把握して、生活ログとして「起床時間」を記録しデータ格納部１４に格納する。なお、生活ログとは、対話装置１０ａが取得するユーザの生活情報である。この「起床時間」の過去の記録を基に平均起床時刻を算出することで、実施の形態１で記載したように、「おはよう」という入力音声の推定入力時刻の算出ができる。 For example, when the input voice “Good morning” is recognized, the apparatus determines that the living state is “waking up”, records the “wake-up time” as a life log, and stores it in the data storage unit 14. The life log is the user's life information acquired by the dialogue apparatus 10a. By calculating the average wake-up time from the past records of this “wake-up time”, the estimated input time of the input voice “Good morning” can be calculated as described in Embodiment 1.
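A minimal sketch of the averaging step follows, assuming the recorded wake-up times are plain times of day that do not straddle midnight (an assumption made here for simplicity; the specification does not say how the average is computed). The function name `average_time_of_day` is hypothetical.

```python
from datetime import time


def average_time_of_day(times):
    """Arithmetic mean of times of day, rounded to the nearest second.

    Assumes all values fall on the same side of midnight, which is
    reasonable for morning wake-up times.
    """
    if not times:
        raise ValueError("no wake-up records")
    total_seconds = sum(t.hour * 3600 + t.minute * 60 + t.second for t in times)
    mean = round(total_seconds / len(times))
    return time(mean // 3600, (mean % 3600) // 60, mean % 60)
```

For example, records of 7:03 and 6:57 average to 7:00, which could then serve as the estimated input time of 「おはよう」.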

さらに、このとき対話装置１０ａは、対話装置１０ａの動作モードを、スリープモード（またはセキュリティモード）から復帰させ、例えば、音声出力を最小限に抑えた対話モードに変更する。これは、起床時ユーザは忙しいことが多いための配慮である。 At this time, the dialogue apparatus 10a also returns its operation mode from the sleep mode (or security mode) and changes it to, for example, a dialogue mode in which voice output is kept to a minimum. This is out of consideration for the fact that users are often busy when they get up.

対話装置１０ａは、起床時に必要な情報（例えば、天気やニュースの情報）を副応答情報として出力する。例えば、主応答情報に応じて「おはよう。」を、副応答情報に応じて「今日は晴れだよ。」を音声出力する。 The dialogue apparatus 10a outputs information necessary for waking up (for example, weather and news information) as auxiliary response information. For example, “Good morning” is output according to the main response information, and “Today is sunny” according to the sub response information.

また、起床平均時刻と今回記録した「起床時刻」を比較し、例えば、「早起きだね。」や「遅刻するよ。」を副応答情報として出力してもよい。また、副応答情報生成部１９は、例えば、「早起きだね」の副応答情報には、起床平均時刻よりも前の所定時間になると倒れるフラグを付けておき、出力制御部２０はフラグが倒れた副応答情報は出力しないようになっていてもよい。これは、起床平均時刻の直前や起床平均時刻の後に「早起きだね」が出力されないための処置である。 The average wake-up time may also be compared with the “wake-up time” recorded this time, and, for example, “You're up early.” or “You'll be late.” may be output as secondary response information. The secondary response information generation unit 19 may, for example, attach to the “You're up early” secondary response information a flag that is cleared at a predetermined time before the average wake-up time, and the output control unit 20 may refrain from outputting secondary response information whose flag has been cleared. This prevents “You're up early” from being output just before or after the average wake-up time.
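The comparison above can be sketched as follows. The function name `wake_up_comment` and the 30-minute margin are hypothetical, and applying the same margin to the late case is an assumption; the specification only states that the early comment stops being valid a predetermined time before the average wake-up time.

```python
from datetime import datetime, timedelta


def wake_up_comment(average_wake, actual_wake, margin=timedelta(minutes=30)):
    """Choose a comment by comparing today's wake-up time with the average.

    The early comment is returned only up to `margin` before the average,
    mirroring the flag that is cleared at that point. Returns None when
    the wake-up time is close to the average.
    """
    if actual_wake <= average_wake - margin:
        return "早起きだね。"
    if actual_wake >= average_wake + margin:
        return "遅刻するよ。"
    return None
```

With an average of 7:00, waking at 6:00 yields the early comment, waking at 8:00 yields the late comment, and waking at 6:45 yields no comment because the early flag would already be cleared.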

また、起床平均時刻から、この時刻以前に、生活情報を取得して副応答情報の生成または更新を行ったり、エアコン５０−１の運転を行ったりする。また、起床平均時刻を所定時刻経過しても入力音声「おはよう」を受信しないと、例えば、「もう朝だよ、起きなくていいの？」を音声アドバイスとして音声出力して通知する。 Based on the average wake-up time, the dialogue apparatus 10a also acquires life information and generates or updates the secondary response information, or operates the air conditioner 50-1, before that time. If the input voice “Good morning” is not received even after a predetermined time has passed since the average wake-up time, the apparatus gives a notification by outputting voice advice such as “It's already morning. Shouldn't you get up?”
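The condition for the spontaneous wake-up advice can be sketched as below. The function name `needs_wake_up_advice`, the `greeted_today` flag, and the 30-minute grace period are hypothetical stand-ins for the "predetermined time" in the text.

```python
from datetime import datetime, timedelta


def needs_wake_up_advice(now, average_wake, greeted_today, grace=timedelta(minutes=30)):
    """Return True when the apparatus should speak the wake-up advice.

    `greeted_today` is True once the input voice 「おはよう」 has been
    recognized today. `grace` is an assumed value for the predetermined
    time allowed after the average wake-up time.
    """
    return (not greeted_today) and now >= average_wake + grace
```

Because voice advice does not depend on input voice, a check like this would have to be driven by a timer rather than by the speech recognition path.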

同様に、例えば、「いただきます」あるいは「ごちそうさま」という入力音声を音声認識すると、生活状態が「食事（朝食）」であると把握して、生活ログとして「食事回数」を記録しデータ格納部１４に格納する。この場合、「食事回数」のデータを参照して、食べていない日があれば、例えば「朝ごはん食べた方がいいよ。」を音声アドバイスとして音声出力して通知する。 Similarly, when the input voice “Itadakimasu” or “Gochisosama” (phrases said before and after a meal) is recognized, the apparatus determines that the living state is “meal (breakfast)”, records the “number of meals” as a life log, and stores it in the data storage unit 14. In this case, the apparatus refers to the “number of meals” data and, if there is a day on which the user did not eat, gives a notification by outputting voice advice such as “You should eat breakfast.”

これらのように、対話装置１０ａは、入力音声の音声認識の結果に基づき、各種サービスを提供することができる。よって、ユーザは、対話装置に話し掛けるだけでサービスの提供を受けることができ、快適な生活環境を享受できる。 As described above, the dialogue apparatus 10a can provide various services based on the result of voice recognition of the input voice. Therefore, the user can receive a service simply by talking to the interactive device, and can enjoy a comfortable living environment.

〔実施の形態３〕 [Embodiment 3]
実施の形態１及び２にて説明した対話装置１０及び１０ａは、それぞれ、集積回路（ＩＣチップ）等に形成された論理回路（ハードウェア）によって実現してもよいし、ＣＰＵ（Central Processing Unit）を用いてソフトウェアによって実現してもよい。 The dialogue apparatuses 10 and 10a described in Embodiments 1 and 2 may each be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).

後者の場合、対話装置１０及び１０ａは、それぞれ、各機能を実現するソフトウェアであるプログラムの命令を実行するＣＰＵ、上記プログラム及び各種データがコンピュータ（またはＣＰＵ）で読み取り可能に記録されたＲＯＭ（Read Only Memory）または記憶装置（これらを「記録媒体」と称する）、上記プログラムを展開するＲＡＭ（Random Access Memory）等を備えている。そして、コンピュータ（またはＣＰＵ）が上記プログラムを上記記録媒体から読み取って実行することにより、本発明の目的が達成される。上記記録媒体としては、「一時的でない有形の媒体」、例えば、テープ、ディスク、カード、半導体メモリ、プログラマブルな論理回路等を用いることができる。また、上記プログラムは、該プログラムを伝送可能な任意の伝送媒体（通信ネットワークや放送波等）を介して上記コンピュータに供給されてもよい。なお、本発明は、上記プログラムが電子的な伝送によって具現化された、搬送波に埋め込まれたデータ信号の形態でも実現され得る。 In the latter case, each of the dialogue apparatuses 10 and 10a includes a CPU that executes instructions of a program that is software that realizes each function, and a ROM (Read that records the above program and various data so that the computer (or CPU) can read them. Only Memory) or a storage device (these are referred to as “recording media”), a RAM (Random Access Memory) for expanding the program, and the like. And the objective of this invention is achieved when a computer (or CPU) reads the said program from the said recording medium and runs it. As the recording medium, a “non-temporary tangible medium” such as a tape, a disk, a card, a semiconductor memory, a programmable logic circuit, or the like can be used. The program may be supplied to the computer via an arbitrary transmission medium (such as a communication network or a broadcast wave) that can transmit the program. The present invention can also be realized in the form of a data signal embedded in a carrier wave in which the program is embodied by electronic transmission.

本発明は上述した各実施の形態に限定されるものではなく、種々の変更が可能であり、異なる実施の形態にそれぞれ開示された技術的手段を適宜組み合わせて得られる実施の形態についても本発明の技術的範囲に含まれる。さらに、各実施の形態にそれぞれ開示された技術的手段を組み合わせることにより、新しい技術的特徴を形成することができる。 The present invention is not limited to the above-described embodiments, and various modifications are possible, and the present invention also relates to embodiments obtained by appropriately combining technical means disclosed in different embodiments. Is included in the technical scope. Furthermore, a new technical feature can be formed by combining the technical means disclosed in each embodiment.

〔まとめ〕 [Summary]
本発明の態様１に係る対話装置（１０）は、入力音声を音声認識する音声認識手段（音声認識部１６）と、上記音声認識の結果に応じた応答内容を示す主応答情報、及び、当該主応答情報が示す応答内容に付加される応答内容を示す副応答情報を格納する応答情報格納部（１４１）と、上記入力音声が入力される時刻を推定する時刻算出手段（時刻算出部１７）と、上記推定された時刻である推定入力時刻よりも前に、上記副応答情報の生成または更新に用いる材料情報を取得して上記副応答情報を生成または更新する副応答情報生成手段（副応答情報生成部１９）と、上記入力音声が入力されると、上記応答情報格納部を参照して得られる上記主応答情報が示す応答内容と共に上記副応答情報が示す応答内容を音声出力する出力制御手段（出力制御部２０）と、を備えている。 A dialogue apparatus (10) according to aspect 1 of the present invention includes: speech recognition means (speech recognition unit 16) for recognizing input voice; a response information storage unit (141) that stores main response information indicating response content corresponding to the result of the speech recognition, and secondary response information indicating response content to be added to the response content indicated by the main response information; time calculation means (time calculation unit 17) for estimating the time at which the input voice will be input; secondary response information generation means (secondary response information generation unit 19) for acquiring, before the estimated input time, which is the estimated time, material information used for generating or updating the secondary response information, and generating or updating the secondary response information; and output control means (output control unit 20) for, when the input voice is input, outputting by voice the response content indicated by the secondary response information together with the response content indicated by the main response information obtained by referring to the response information storage unit.

上記構成によると、入力音声の推定入力時刻よりも前に、副応答情報を生成または更新し、入力音声が入力されると、主応答情報が示す応答内容と共に上記副応答情報が示す応答内容を音声出力する。このように、入力音声に対して、主応答情報が示す応答内容の音声出力に、副応答情報が示す応答内容の音声出力を付加できるので、複数の情報での応答が可能である。また、副応答情報は、入力音声の推定入力時刻よりも前に生成または更新されるので、変化に富んだ応答が可能である。このように、上記構成によると、複数の情報をスムーズに音声出力でき、ユーザにストレスを与えることなく快適な対話環境を提供できる。 According to the above configuration, the secondary response information is generated or updated before the estimated input time of the input voice, and when the input voice is input, the response content indicated by the secondary response information is output by voice together with the response content indicated by the main response information. Because the voice output of the response content indicated by the secondary response information can be added to the voice output of the response content indicated by the main response information, a response containing multiple pieces of information is possible. In addition, because the secondary response information is generated or updated before the estimated input time of the input voice, the responses can vary from one occasion to the next. Thus, according to the above configuration, multiple pieces of information can be output smoothly by voice, and a comfortable dialogue environment can be provided without causing the user stress.

また、副応答情報の生成または更新に用いられる材料情報を外部から取得する場合、入力音声の入力時に外部との通信が途絶えていても、入力音声の推定入力時刻より前に副応答情報を生成または更新するので、この生成または更新後に入力音声が入力されると、生成または更新された副応答情報をユーザに提供することができる。また、入力音声の入力時には、副応答情報も応答情報格納部を参照して得るため、副応答情報を入力時に生成または更新したり外部から受信したりする装置よりも、すばやい応答（音声出力）が可能である。 In addition, when acquiring material information used for generating or updating secondary response information from outside, secondary response information is generated before the estimated input time of input speech even if communication with the outside is interrupted when input speech is input. Alternatively, since the update is performed, when the input voice is input after the generation or the update, the generated or updated side response information can be provided to the user. Further, since the secondary response information is also obtained by referring to the response information storage unit when the input voice is input, the response (voice output) is quicker than the device that generates or updates the secondary response information at the time of input or receives it from the outside. Is possible.

本発明の態様２に係る対話装置では、上記態様１において、上記時刻算出手段は、上記入力音声の過去の入力時刻の情報、または、ユーザの生活状態ないし生活環境に関する生活情報を基に上記推定入力時刻を算出する。 In the interactive apparatus according to aspect 2 of the present invention, in the aspect 1, the time calculation means estimates the time based on information on the past input time of the input speech or life information on a user's life state or living environment. Calculate the input time.

上記構成によると、入力音声の過去の入力時刻の情報、または、ユーザの生活状態ないし生活環境に関する生活情報によって算出された推定入力時刻よりも前に、副応答情報が生成または更新される。入力音声の過去の入力時刻の情報またはユーザの生活情報を用いることで、副応答情報の生成または更新の時期をユーザの生活パターンに則したものとすることができる。 According to the above configuration, the auxiliary response information is generated or updated before the estimated input time calculated based on the past input time information of the input voice or the life information on the user's living state or living environment. By using the past input time information of the input voice or the user's life information, it is possible to make the timing of generation or update of the sub-response information in accordance with the user's life pattern.

本発明の態様３に係る対話装置では、上記態様１または２において、上記材料情報は、上記入力音声の過去の入力時刻の情報、または、ユーザの生活状態ないし生活環境に関する生活情報である。 In the dialog device according to aspect 3 of the present invention, in the above aspect 1 or 2, the material information is information on the past input time of the input voice or life information on a user's living state or living environment.

上記構成によると、入力音声の過去の入力時刻の情報、または、ユーザの生活状態ないし生活環境に関する生活情報に基づき副応答情報が生成または更新される。よって、入力音声の過去の入力時刻の情報またはユーザの生活情報を用いることで、副応答情報の応答内容を、例えば、ユーザのよく口にする音声や生活パターンに沿ったものとすることができる。よって、副応答情報として、ユーザにとって有益な情報を提供することが可能になる。 According to the above configuration, the secondary response information is generated or updated based on the past input time information of the input voice or the life information on the user's living state or living environment. Therefore, by using the past input time information of the input voice or the life information of the user, the response content of the secondary response information can be set in accordance with, for example, the voice or life pattern often spoken by the user. . Therefore, it is possible to provide useful information for the user as the secondary response information.

生活情報は、ユーザの生活状態ないし生活環境に関する情報であればどのような情報でもよく、例えば、対話装置が設置された地域の天気や交通に関する情報、ユーザの生活パターンに関する情報、ユーザの健康に関する情報が挙げられる。 The living information may be any information as long as it is information related to the user's living state or living environment. For example, information related to the weather and traffic in the area where the interactive device is installed, information related to the user's life pattern, and information related to the user's health Information.

上記生活情報は、音声認識手段による上記入力音声の音声認識の結果であってもよい。また、外部装置から受信した情報、あるいは、ユーザないしその周囲の状態を検知する自装置が有する状態検知部が検知した情報であってもよい。また、これらの情報の組み合わせであってもよい。自装置が有する状態検知部とは、ユーザないしその周囲の状態を検知することができる装置であればよく、例えば、人感センサ、カメラ、温度センサ等が挙げられる。しかし、これらに限定されない。 The life information may be the result of speech recognition of the input voice by the speech recognition means. It may also be information received from an external device, information detected by a state detection unit that the apparatus itself has and that detects the state of the user or the user's surroundings, or a combination of these. The state detection unit of the apparatus itself may be any device capable of detecting the state of the user or the surroundings; examples include, but are not limited to, a human presence sensor, a camera, and a temperature sensor.

本発明の態様４に係る対話装置では、上記態様１から３のいずれか１つにおいて、上記時刻算出手段は、上記入力音声のうちの特定の入力音声の推定入力時刻を算出し、上記副応答情報生成手段は、上記特定の入力音声の推定入力時刻よりも前に全ての上記副応答情報を生成または更新する。 In the interactive device according to aspect 4 of the present invention, in any one of the aspects 1 to 3, the time calculation means calculates an estimated input time of a specific input voice among the input voices, and the sub response The information generating means generates or updates all the sub response information before the estimated input time of the specific input voice.

上記構成によると、特定の入力音声の音声認識の結果に応じた応答内容を示す主応答情報に付加される副応答情報だけでなく、全ての副応答情報を、特定の入力音声の推定入力時刻よりも前に生成または更新することができる。このように、特定の入力音声の推定入力時刻を用いることで、例えば、「おはよう」という入力音声の推定入力時刻よりも前に、つまり、ユーザの一日の始まりと推定される時刻前に、全ての副応答情報を生成または更新することができる。ユーザは、一日毎に生成または更新された副応答情報が示す応答内容の音声を聞くことができる。 According to the above configuration, not only the secondary response information added to the main response information indicating the response content corresponding to the speech recognition result of the specific input voice, but all of the secondary response information can be generated or updated before the estimated input time of the specific input voice. By using the estimated input time of a specific input voice in this way, all of the secondary response information can be generated or updated, for example, before the estimated input time of the input voice “Good morning”, that is, before the time estimated to be the start of the user's day. The user can thus hear response content indicated by secondary response information that is generated or updated each day.

本発明の態様５に係る対話装置では、上記態様１から４のいずれか１つにおいて、副応答情報生成手段は、上記副応答情報に優先度を設定し、上記応答情報格納部が優先度を設定された副応答情報を複数格納している場合、上記出力制御手段は、上記優先度に従って副応答情報を特定し、当該特定した副応答情報で示される応答内容を音声出力する。 In the interactive device according to aspect 5 of the present invention, in any one of the aspects 1 to 4, the secondary response information generating means sets a priority to the secondary response information, and the response information storage unit sets the priority. When a plurality of set sub response information are stored, the output control means specifies the sub response information according to the priority, and outputs the response content indicated by the specified sub response information by voice.

上記構成によると、副応答音声が複数ある場合、優先度に従った音声出力を行うことが可能となる。 According to the above configuration, when there are a plurality of auxiliary response voices, it is possible to perform voice output according to the priority.

本発明の態様６に係る対話装置では、上記態様１から５のいずれか１つにおいて、自装置が提供可能なサービス毎の動作モードを有しており、上記音声認識手段による上記入力音声の音声認識の結果に基づき提供するサービスを決定し、当該決定したサービスを提供する動作モードに自装置を設定するモード設定手段（モード設定部２３）をさらに備えている。 In the dialogue apparatus according to aspect 6 of the present invention, in any one of the aspects 1 to 5, the communication apparatus has an operation mode for each service that can be provided by the own apparatus, and the voice of the input voice by the voice recognition unit is provided. It further includes mode setting means (mode setting unit 23) for determining a service to be provided based on the recognition result and setting the own apparatus in an operation mode for providing the determined service.

According to the above configuration, the service to be provided can be determined based on the result of speech recognition of the input voice by the speech recognition means. Once the service to be provided has been determined, the dialogue device can set itself to the operation mode that provides that service and then provide the service. Examples of services to be provided include dialogue, operation of home appliances, recording of the user's life information, and voice advice to the user. Dialogue is performed in response to input voice from the user, whereas voice advice to the user is information that the dialogue device outputs as voice spontaneously, even when there is no input voice from the user. In this way, the user can receive a service simply by speaking to the dialogue device, and can enjoy a comfortable living environment.

Furthermore, the mode setting means may determine the service to be provided based on, for example, information received from an external device or information detected by a state detection unit of the device itself that detects the state of the user or the user's surroundings.
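
The mode setting described for aspect 6 can be sketched as a mapping from the recognized text to an operation mode. The keyword table, the `Mode` names, and the fallback to dialogue mode are hypothetical choices made for this example; the specification states only that the service is determined from the speech recognition result.

```python
from enum import Enum


class Mode(Enum):
    DIALOGUE = "dialogue"
    APPLIANCE_OPERATION = "operation of home appliances"
    LIFE_RECORDING = "recording of life information"
    VOICE_ADVICE = "voice advice"  # spontaneous output; not selected by speech here


# Hypothetical keyword table; a real device would use its own rules.
KEYWORD_TO_MODE = {
    "turn on": Mode.APPLIANCE_OPERATION,
    "turn off": Mode.APPLIANCE_OPERATION,
    "record": Mode.LIFE_RECORDING,
}


def decide_mode(recognized_text: str) -> Mode:
    """Choose the operation mode from the speech recognition result.

    Assumption: plain dialogue is the fallback when no keyword matches.
    """
    text = recognized_text.lower()
    for keyword, mode in KEYWORD_TO_MODE.items():
        if keyword in text:
            return mode
    return Mode.DIALOGUE


mode = decide_mode("Turn on the air conditioner")
```

In this sketch the utterance selects the appliance operation mode, while an utterance such as "Good morning" falls back to dialogue mode.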

A dialogue system according to aspect 7 of the present invention is configured by connecting, via a communication network, the dialogue device according to any one of aspects 1 to 6 above and an information providing device that provides the material information.

According to the above dialogue system, a smooth response to the user's input voice is possible, and a comfortable dialogue environment can be provided without causing the user stress.

An information providing device according to aspect 8 of the present invention is the information providing device included in the dialogue system of aspect 7 above.

By using the above information providing device, the dialogue system of aspect 8 above can be constructed.

The dialogue device, information providing device, or dialogue system according to each aspect of the present invention may be realized by a computer. In that case, a program that realizes the dialogue device, information providing device, or dialogue system on the computer by causing the computer to operate as each of the means included in it, and a computer-readable recording medium on which that program is recorded, also fall within the scope of the present invention.

The present invention is applicable to dialogue devices and the like that recognize a user's voice and respond to it.

DESCRIPTION OF SYMBOLS
10, 10a Dialogue device
11 Voice input unit
12 Voice output unit
13, 13a Control unit
14, 14a Data storage unit
15 Communication unit
16 Speech recognition unit (speech recognition means)
17 Time calculation unit (time calculation means)
18 Material information acquisition unit
19 Sub-response information generation unit (sub-response information generation unit)
20 Output control unit (output control means)
22 Operation unit
23 Mode setting unit (mode setting means)
30 Management server (external device, information providing device)
31-1, 31-2 Information providing server (external device, information providing device)
40 User's home
50-1 Air conditioner
50-2 Television
50-3 Refrigerator
100 Dialogue system
141 Response information storage unit
143 Mode information storage unit

Claims

1. A dialogue device comprising:
speech recognition means for performing speech recognition on an input voice;
a response information storage unit that stores main response information indicating response content corresponding to the result of the speech recognition, and sub-response information indicating response content to be added to the response content indicated by the main response information;
time calculation means for estimating the time at which the input voice is input;
sub-response information generating means for acquiring, before the estimated input time, which is the estimated time, material information used to generate or update the sub-response information, and generating or updating the sub-response information; and
output control means for, when the input voice is input, outputting the response content indicated by the sub-response information together with the response content indicated by the main response information obtained by referring to the response information storage unit.

2. The dialogue device according to claim 1, wherein the time calculation means calculates the estimated input time based on information on past input times of the input voice or on life information regarding the user's life state or living environment.

3. The dialogue device according to claim 1, wherein the material information is information on past input times of the input voice or life information regarding the user's life state or living environment.

4. The dialogue device according to claim 1, wherein the sub-response information generating means sets a priority for the sub-response information, and when the response information storage unit stores a plurality of pieces of sub-response information for which priorities have been set, the output control means identifies sub-response information in accordance with the priorities and outputs, as voice, the response content indicated by the identified sub-response information.

5. The dialogue device according to any one of claims 1 to 4, wherein the device has an operation mode for each service that the device itself can provide, and further comprises mode setting means for determining the service to be provided based on the result of speech recognition of the input voice by the speech recognition means and setting the device to the operation mode that provides the determined service.