JP2015064450A

JP2015064450A - Information processing device, server, and control program

Info

Publication number: JP2015064450A
Application number: JP2013197452A
Authority: JP
Inventors: 貴裕井上; Takahiro Inoue
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2013-09-24
Filing date: 2013-09-24
Publication date: 2015-04-09
Anticipated expiration: 2033-09-24
Also published as: JP6265670B2

Abstract

PROBLEM TO BE SOLVED: To reliably present an earlier phrase to a user without leaving it unpresented. SOLUTION: An interactive robot (100) includes: an answer acquisition unit (14) that obtains a phrase (6a) corresponding to a recognition result (4a) of a voice (1a); and a voice output unit (16) that, if a phrase (6b) is newly obtained before the phrase (6a) has been presented and the phrase (6a) needs to be presented to the user, presents the phrase (6b) and then presents the phrase (6a).

Description

The present invention relates to an information processing apparatus and the like that present a predetermined phrase to a user in response to a voice uttered by the user.

Dialogue systems that allow humans and robots to converse have long been studied extensively. For example, Patent Document 1 below discloses an interactive information system that can continue and develop a dialogue with a user more naturally. Patent Document 2 below discloses a dialogue method and a dialogue apparatus that maintain continuity of response style when a scenario transition occurs from a focal dialogue scenario to an auxiliary dialogue scenario.

The prior art, including the techniques disclosed in Patent Documents 1 and 2 above, presupposes one-question-one-answer communication within a "question/answer service", that is, a service in which the user is assumed to wait until the robot has finished answering a question.

Patent Document 1: JP 2006-171719 A (published June 29, 2006)
Patent Document 2: JP 2007-079397 A (published March 29, 2007)

In such a dialogue system, the earlier answer to an earlier call (question) from the user may be delayed, and it can then become interleaved with the later answer to a subsequent call. The prior art can ignore this phenomenon because of the premise stated above: the user does not make a later call while the earlier answer is still unpresented.

By contrast, this phenomenon cannot be ignored in "normal communication", which presupposes human-like interaction. In normal communication the user is assumed to request the next answer even if the robot has not yet presented its answer to the previous question. When the phenomenon occurs, the earlier answer may remain unpresented to the user.

The present invention has been made in view of the above problem. Its object is to provide an information processing apparatus and the like that reliably present the earlier phrase (answer) to an earlier call without leaving it unpresented, even when that phrase becomes interleaved with the later phrase to a subsequent call.

To solve the above problem, an information processing apparatus according to one aspect of the present invention presents a predetermined phrase to a user in response to a voice uttered by the user. The apparatus includes acquisition means and presentation means. The acquisition means acquires a first phrase associated with the result of recognizing the voice. If the acquisition means newly acquires a second phrase, different from the first phrase, before the first phrase has been presented, and the first phrase needs to be presented to the user, the presentation means presents the second phrase and then presents the first phrase.

To solve the above problem, a server according to one aspect of the present invention controls an information processing apparatus so that the apparatus presents a predetermined phrase to a user in response to a voice the user utters to it. The server includes specifying means, generation means, and transmission means. The specifying means specifies, in a predetermined phrase set, the phrase associated with the result of recognizing the voice. The generation means generates necessity information according to whether the phrase needs to be presented to the user. The transmission means transmits the phrase and the necessity information to the information processing apparatus.
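The three server-side means can be illustrated with a minimal Python sketch. It makes several assumptions that do not come from the document:
- The server receives already-recognized text. In the embodiment it actually receives voice information and recognizes it itself.
- The cloud phrase set is a plain dict.
- A "match" means the pattern occurs as a substring of the text.
- The helper names `specify_phrase`, `handle_request`, and `ServerReply` are hypothetical.
- The 1/0 values for the weather and sports entries are illustrative. Only the "normal" status of the story entry is stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical cloud phrase set: pattern (detection word) -> (phrase, must_present).
# Only the "interesting story" flag is taken from the text; the others are illustrative.
CLOUD_PHRASE_SET: Dict[str, Tuple[str, bool]] = {
    "weather": ("It's raining, take an umbrella with you", True),
    "sports news": ("Yesterday, Team A won big", True),
    "interesting story": ("Once upon a time, an old man and an old woman...", False),
}


@dataclass
class ServerReply:
    phrase: str
    important_flag: int  # necessity information: 1 = must be presented, 0 = normal


def specify_phrase(recognized_text: str) -> Optional[Tuple[str, bool]]:
    """Specifying means (sketch): first entry whose pattern occurs in the text."""
    text = recognized_text.lower()
    for pattern, entry in CLOUD_PHRASE_SET.items():
        if pattern in text:
            return entry
    return None


def handle_request(recognized_text: str) -> Optional[ServerReply]:
    """Specify the phrase, generate the flag, and package both for sending."""
    entry = specify_phrase(recognized_text)
    if entry is None:
        return None
    phrase, must_present = entry
    flag = 1 if must_present else 0  # generation means
    # The transmission means would send this object to the robot.
    return ServerReply(phrase=phrase, important_flag=flag)
```

Because the phrase and its flag travel together, the robot can later decide what to do with an overtaken phrase without making a second round trip.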

According to one aspect of the present invention, the information processing apparatus can reliably present the earlier phrase (first phrase) for an earlier call to the user without leaving it unpresented. This holds even when that phrase is interleaved with the later phrase (second phrase) for a subsequent call.

Further, according to one aspect of the present invention, the server can control the information processing apparatus so that the earlier phrase for an earlier call is reliably presented to the user rather than left unpresented. This holds even when that phrase is interleaved with the later phrase for a subsequent call.

FIG. 1 is a block diagram showing the main configuration of the dialogue robot according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the main configuration of the server according to the first embodiment of the present invention.
FIG. 3 is a schematic diagram outlining the dialogue system according to the first embodiment of the present invention.
FIG. 4 is a table showing examples of phrase sets: (a) shows an example of the phrase set that the dialogue robot holds in its storage unit, and (b) shows an example of the phrase set that the server holds in its storage unit.
FIG. 5 is a flowchart showing an example of the processing executed in the dialogue system.
FIG. 6 is a block diagram showing the main configuration of the dialogue robot according to the second embodiment of the present invention.
FIG. 7 is a flowchart showing an example of the processing executed by the dialogue robot.

[Embodiment 1]
The first embodiment (Embodiment 1) of the present invention will be described with reference to FIGS. 1 to 5.

(Overview of Dialogue System 300)
FIG. 3 is a schematic diagram of the dialogue system 300. As shown in FIG. 3, the dialogue system 300 includes a dialogue robot 100 and a server 200. With the dialogue system 300, the user can obtain various kinds of information by talking with the dialogue robot 100 in natural-language speech.

The dialogue robot (information processing apparatus) 100 is a device that presents a predetermined phrase (response sentence) to the user in response to a voice uttered by the user. It need not be a dialogue robot: any device that can accept voice input and present the predetermined phrase based on that input will do. For example, the dialogue robot 100 may also be realized as a tablet terminal, a smartphone, or a personal computer.

The server 200 is a device that controls the dialogue robot 100 so that the robot presents a predetermined phrase to the user in response to a voice the user utters to it. As shown in FIG. 3, the dialogue robot 100 and the server 200 are communicably connected via a communication network that follows a predetermined communication method.

In the dialogue system 300, the dialogue robot 100 can acquire a phrase in answer to the user's voice in either of the following two ways.

(1) Acquiring a phrase locally
If the dialogue robot 100's recognition result for the voice is contained in the phrase set (local dictionary) stored in the robot's storage unit, the dialogue robot 100 acquires the corresponding predetermined phrase for that recognition result from the storage unit.

For example, suppose the user says "Good morning" to the dialogue robot 100. The recognition result is contained in the phrase set, so the dialogue robot 100 acquires the corresponding phrase "Good morning, how are you feeling today?" from the storage unit (see the first row of the table in FIG. 4(a)) and presents it to the user by voice. As case (1) shows, the dialogue robot 100 can answer simple calls on its own.

(2) Acquiring a phrase from the cloud
If the recognition result is not contained in the phrase set, the dialogue robot 100 transmits (uploads) the voice to the server 200. The server 200 looks up the corresponding predetermined phrase for the recognition result in the phrase set (cloud dictionary) stored in its own storage unit. The dialogue robot 100 then acquires (downloads) the phrase the server 200 has identified.

For example, suppose the user asks the dialogue robot 100 "What's the weather today?". The recognition result is not contained in the phrase set (local dictionary), so the dialogue robot 100 acquires the phrase "It's raining, take an umbrella with you" from the server 200 and presents it to the user by voice. The cloud dictionary usually contains many more patterns (detection words) that can match a recognition result than the local dictionary does. As case (2) shows, the dialogue robot 100 can therefore return an appropriate phrase even for a complicated call.
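The choice between case (1) and case (2) amounts to a local lookup with a remote fallback. The following is a minimal Python sketch under these assumptions:
- The local dictionary is a dict.
- Matching is by substring.
- The server round trip is modelled as a callable that takes recognized text, whereas the embodiment actually uploads the voice itself.
- `acquire_phrase` and `fake_cloud` are hypothetical names.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical local dictionary (phrase set held on the robot); FIG. 4(a) has the real rows.
LOCAL_DICTIONARY: Dict[str, str] = {
    "good morning": "Good morning, how are you feeling today?",
}


def acquire_phrase(recognized_text: str,
                   query_cloud: Callable[[str], Optional[str]]) -> Tuple[str, Optional[str]]:
    """Return (source, phrase): "local" on a dictionary hit, otherwise "cloud"."""
    text = recognized_text.lower()
    for pattern, phrase in LOCAL_DICTIONARY.items():
        if pattern in text:
            return "local", phrase                   # case (1): no network needed
    return "cloud", query_cloud(recognized_text)     # case (2): ask the server


def fake_cloud(text: str) -> Optional[str]:
    """Stand-in for the upload/download round trip to server 200."""
    if "weather" in text.lower():
        return "It's raining, take an umbrella with you"
    return None
```

In a real device, `query_cloud` is the slow, failure-prone step. That latency is what allows a later answer to overtake an earlier one.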

In both case (1) and case (2), the dialogue robot 100 may be required to present another phrase before it presents the current one. For example, suppose the user asks the dialogue robot 100 "What's the weather today?" (earlier call). Before the robot presents "It's raining, take an umbrella with you" (earlier phrase), the user asks "By the way, any sports news?" (later call). The robot may then be required to present "Yesterday, Team A won big" (later phrase).

The earlier phrase for the earlier call and the later phrase for the later call become interleaved in this way because presentation of the earlier phrase can be delayed. In case (1), the delay can come from the heavy processing needed to extract (search for) an appropriate phrase in the local dictionary. In case (2), it can come from stalled communication between the dialogue robot 100 and the server 200. Delays are especially likely when the main phrase set resides on the server 200 side (a cloud configuration), as in this embodiment, because a degraded communication environment then holds up the answer. A dialogue that simulates human conversation cannot tolerate unnatural delays (awkward pauses), so the user is expected to ask for a later phrase even while the earlier phrase is still unpresented. The earlier phrase may then remain unpresented.

Therefore, suppose the later phrase is newly acquired before the earlier phrase has been presented, and the earlier phrase needs to be presented to the user. In that case, the dialogue robot 100 presents the later phrase and then presents the earlier phrase. Whenever the earlier phrase is judged necessary to present, the dialogue robot 100 thus always presents it rather than leaving it unpresented.
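The ordering rule can be simulated end to end with a minimal Python sketch. It assumes the following:
- Each user call carries an increasing request id.
- Phrases are listed in the order they arrive.
- A phrase that is not overtaken is spoken as soon as it arrives.

The function name and the tuple layout are hypothetical.

```python
from typing import Iterable, List, Tuple

# (request_id, phrase, must_present); request ids increase with each user call.
Arrival = Tuple[int, str, bool]


def presentation_order(arrivals: Iterable[Arrival]) -> List[str]:
    """Return the order in which phrases are spoken, given their arrival order."""
    spoken: List[str] = []
    newest_id = -1  # id of the newest call whose phrase has already been acquired
    for request_id, phrase, must_present in arrivals:
        if request_id < newest_id:
            # Overtaken: a later phrase was acquired first. Speak this earlier
            # phrase after it, but only if it must be presented.
            if must_present:
                spoken.append(phrase)
            continue
        newest_id = request_id
        spoken.append(phrase)
    return spoken
```

For instance, if the sports answer (id 1) arrives before the weather answer (id 0), the result is the sports phrase followed by the weather phrase. If the late phrase is "normal", it is simply dropped.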

The following description uses this example. The user makes the earlier call "What's the weather today?" as voice 1a, then makes the later call "By the way, any sports news?" as voice 1b. The dialogue robot 100 answers the earlier call with the earlier phrase (phrase 6a) "It's raining, take an umbrella with you", presented as voice 1c. It answers the later call with the later phrase (phrase 6b) "Yesterday, Team A won big", presented as voice 1d.

(Configuration of Dialogue Robot 100)
FIG. 1 is a block diagram showing the main configuration of the dialogue robot 100. As shown in FIG. 1, the dialogue robot 100 includes:
- a communication unit 50a (reception unit 51a, transmission unit 52a);
- a control unit 10a (voice detection unit 11, voice recognition unit 12, answer determination unit 13, answer acquisition unit 14, voice sending unit 15, voice output unit 16, flag determination unit 17, answer storage unit 18);
- a voice input/output unit 40 (microphone 41, speaker 42); and
- a storage unit 30a.

The voice input/output unit 40 controls voice input to and output from the dialogue robot 100. It includes the microphone 41 and the speaker 42.

The microphone 41 collects sound from around the dialogue robot 100. It outputs voice signals 2a and 2b, which represent voices 1a and 1b respectively, to the voice detection unit 11.

The speaker 42 converts voice signals 2c and 2d, input from the voice output unit 16, into voices 1c and 1d respectively, and outputs those voices. The speaker 42 may be built into the dialogue robot 100, attached externally through an external connection terminal, or connected over a communication link.

The control unit 10a centrally controls the various functions of the dialogue robot 100. It includes the voice detection unit 11, voice recognition unit 12, answer determination unit 13, answer acquisition unit 14, voice sending unit 15, voice output unit 16, flag determination unit 17, and answer storage unit 18.

The voice detection unit 11 detects the voice uttered by the user. When voice signal 2a or 2b is input from the microphone 41, the unit converts it into voice information 3a or 3b respectively, in a form the dialogue robot 100 can process digitally. It then outputs that voice information to the answer determination unit 13 and the voice recognition unit 12.

The voice recognition unit 12 recognizes the voice the user utters to the dialogue robot 100. When voice information 3a or 3b is input from the voice detection unit 11, the voice recognition unit 12 applies a predetermined speech recognition algorithm to obtain recognition result 4a or 4b respectively. Each recognition result includes at least the text converted from the corresponding voice information, that is, a written representation of what the user said. Any known speech recognition algorithm may be used. The voice recognition unit 12 outputs recognition results 4a and 4b to the answer determination unit 13.

The answer determination unit 13 determines the answer to return to the user based on the recognition result. When recognition result 4a or 4b is input from the voice recognition unit 12, the unit refers to the phrase set 5a stored in the storage unit 30a. It checks whether the phrase set 5a contains a pattern (detection word) that includes the text in the recognition result.
- If such a pattern exists, the unit determines the phrase associated with it as the answer to return to the user and outputs that phrase to the answer acquisition unit 14.
- If no such pattern exists, the unit outputs the voice information 3a received from the voice detection unit 11 to the voice sending unit 15. It may also output a stalling phrase, which defers the answer, to the answer acquisition unit 14 so that the user hears it.
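The decision made by the answer determination unit 13 can be sketched in Python under several assumptions:
- The phrase set is a dict of pattern to phrase.
- A hit means the pattern occurs in the recognized text.
- A seeded random choice stands in for however the robot picks a stalling phrase.
- The stalling phrases are English renderings of the FIG. 4(a) examples.
- `determine_answer` is a hypothetical name.

```python
import random
from typing import Dict, Tuple

# English renderings of the stalling phrases listed in FIG. 4(a).
STALLING_PHRASES = ["Just a moment", "Let's see", "Hmm"]


def determine_answer(recognized_text: str,
                     phrase_set: Dict[str, str],
                     rng: random.Random) -> Tuple[str, str]:
    """Return ("answer", phrase) on a local hit, else ("forward", stalling phrase).

    "forward" tells the caller to hand the voice information to the voice
    sending unit so that the server can produce the real answer.
    """
    text = recognized_text.lower()
    for pattern, phrase in phrase_set.items():
        if pattern in text:
            return "answer", phrase
    return "forward", rng.choice(STALLING_PHRASES)
```

Passing in the `random.Random` instance keeps the choice reproducible in tests. A device could equally rotate through the list or use a fixed phrase.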

FIG. 4 is a table showing examples of phrase sets. Part (a) shows the phrase set 5a (local dictionary) that the dialogue robot 100 holds in the storage unit 30a. Part (b) shows the phrase set 5b (cloud dictionary) that the server 200 holds in the storage unit 30b.
- A "phrase set" (dictionary) is a data set that associates a predetermined pattern (detection word) with a predetermined phrase and an importance flag 7.
- A "phrase" expresses a suitable answer to the predetermined pattern in a predetermined data format, for example text.
- The "importance flag" (necessity information) indicates whether the phrase 6a needs to be presented to the user. It may be, for example, a binary flag taking the value "1" or "0". An importance flag 7 of "1" may then indicate "important" (the phrase 6a must be presented to the user), and "0" may indicate "normal" (it need not be presented).
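One way to model a phrase-set row and its binary flag is sketched below in Python. It assumes the flag is stored as an integer, and it uses `IntEnum` so that any value other than 0 or 1 is rejected when the table is built. The class and function names are hypothetical. The sample rows are illustrative, except for the "normal" status of the story entry, which the text states.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Iterable, Tuple


class ImportanceFlag(IntEnum):
    NORMAL = 0     # need not be presented once overtaken
    IMPORTANT = 1  # must be presented to the user


@dataclass(frozen=True)
class PhraseSetEntry:
    pattern: str          # detection word
    phrase: str           # preferred answer, held as text
    flag: ImportanceFlag  # necessity information


def build_phrase_set(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, PhraseSetEntry]:
    """Index rows by pattern; ImportanceFlag(value) raises ValueError unless 0 or 1."""
    return {p: PhraseSetEntry(p, text, ImportanceFlag(f)) for p, text, f in rows}
```

Because `IntEnum` members compare equal to plain integers, code that checks `flag == 1` keeps working alongside code that uses the named constants.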

For example, recognition result 4a (for voice 1a) from the voice recognition unit 12 contains the text "What's the weather today". As FIG. 4(a) shows, the phrase set 5a contains no pattern that includes this text. The answer determination unit 13 therefore outputs the voice information 3a to the voice sending unit 15. It also outputs a stalling phrase to the answer acquisition unit 14; FIG. 4(a) lists phrases such as "Just a moment", "Let's see", and "Hmm".

The answer acquisition unit (acquisition means) 14 acquires the phrase (first phrase) 6a associated with the recognition result for voice 1a. When the answer determination unit 13 supplies the phrase 6a as the answer to return to the user, the answer acquisition unit 14 outputs it to the voice output unit 16. Likewise, when the reception unit 51a supplies a phrase (second phrase) 6b and an importance flag 7, the answer acquisition unit 14 outputs the phrase 6b to the voice output unit 16.

When the reception unit 51a instead supplies the phrase 6a and its importance flag 7, the answer acquisition unit 14 handles it as follows:
(1) If the phrase 6b was acquired before the phrase 6a was presented (that is, the answer acquisition unit 14 acquired the phrase 6b before the voice output unit 16 output the phrase 6a as voice 1c), the answer acquisition unit 14 outputs the phrase 6a and the importance flag 7 to the flag determination unit 17.
(2) Otherwise, it outputs the phrase 6a to the voice output unit 16.
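The two-way routing for a phrase received from the server can be sketched in Python. The sketch assumes each user call has an increasing request id, so "the phrase 6b was acquired before the phrase 6a was presented" reduces to "a phrase for a newer id has already been acquired". `AnswerRouter` and `Destination` are hypothetical names.

```python
from enum import Enum


class Destination(Enum):
    VOICE_OUTPUT = "voice output unit 16"
    FLAG_DETERMINATION = "flag determination unit 17"


class AnswerRouter:
    """Route phrases received from the server to one of two downstream units."""

    def __init__(self) -> None:
        self.newest_acquired = -1  # id of the newest call whose phrase was acquired

    def route(self, request_id: int) -> Destination:
        if request_id < self.newest_acquired:
            # Case (1): a phrase for a later call was acquired first.
            return Destination.FLAG_DETERMINATION
        # Case (2): nothing newer has been acquired, so present this phrase now.
        self.newest_acquired = request_id
        return Destination.VOICE_OUTPUT
```

With the running example, the sports answer (id 1) arrives first and goes to voice output. The late weather answer (id 0) is then routed to the flag determination unit.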

The voice sending unit 15 sends voice information 3a and 3b to the server 200 over the communication network that follows the predetermined communication method. When voice information 3a or 3b is input from the answer determination unit 13, the voice sending unit 15 outputs it to the transmission unit 52a.

The voice output unit 16 presents the phrases 6a and 6b to the user by outputting them as voice through the speaker 42. When phrase 6a or 6b is input from the answer acquisition unit 14, the voice output unit 16 outputs it to the speaker 42.

If the phrase 6b is newly acquired before the phrase 6a has been presented, the flag determination unit 17 uses the importance flag 7 to decide whether the phrase 6a needs to be presented to the user. When the answer acquisition unit 14 supplies the phrase 6a and the importance flag 7, the flag determination unit 17 checks whether the flag indicates "important" or "normal". If the flag indicates "important", the flag determination unit 17 outputs the phrase 6a to the answer storage unit 18.

If the flag indicates "normal", the flag determination unit 17 may discard the phrase 6a rather than outputting it to the answer storage unit 18. Alternatively, it may store the phrase 6a in a predetermined area of the storage unit 30a in case the user later asks for it again.

For example, suppose the user makes the earlier call "Tell me an interesting story" as voice 1a, followed by the later call "By the way, any sports news?" as voice 1b. The robot acquires the earlier phrase (phrase 6a) "Once upon a time, an old man and an old woman..." in answer to the earlier call (see the sixth row of the table in FIG. 4(b)). Its importance flag 7 indicates "normal". The dialogue robot 100 therefore need not present the phrase 6a, even after it has presented the later phrase (phrase 6b) "Yesterday, Team A won big" in answer to the later call.
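The three possible outcomes for an overtaken phrase (store, keep for a re-request, or discard) can be sketched in Python. Both storage areas are modelled as in-memory lists. Whether "normal" phrases are kept is a constructor option, mirroring the "may discard ... alternatively, may store" wording. The class name and return strings are hypothetical.

```python
from typing import List


class FlagDeterminationUnit:
    """Decide what happens to an overtaken phrase based on its importance flag."""

    def __init__(self, keep_normal_for_rerequest: bool = False) -> None:
        self.answer_store: List[str] = []    # phrases handed to the answer storage unit
        self.rerequest_area: List[str] = []  # optional area for "normal" phrases
        self.keep_normal = keep_normal_for_rerequest

    def handle(self, phrase: str, important_flag: int) -> str:
        if important_flag == 1:
            self.answer_store.append(phrase)    # will be presented after the later phrase
            return "stored"
        if self.keep_normal:
            self.rerequest_area.append(phrase)  # not presented, but kept in case of a re-request
            return "kept for re-request"
        return "discarded"
```

With the default setting, the story phrase from the example is discarded, while the weather phrase (flag 1) is stored for later presentation.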

Suppose the phrase 6b is newly acquired before the phrase 6a has been presented, and the phrase 6a needs to be presented to the user. In that case, the answer storage unit (storage means) 18 stores the phrase 6a in the predetermined storage unit 30a. Concretely, when the flag determination unit 17 supplies the phrase 6a, the answer storage unit 18 stores it in the storage unit 30a.

Under the same condition, the voice output unit (presentation means) 16 presents the phrase 6b and then the phrase 6a. Suppose the flag determination unit 17 has found that the importance flag 7 indicates "important", and the answer storage unit 18 has therefore stored the phrase 6a in the storage unit 30a. The voice output unit 16 first outputs the phrase 6b to the speaker 42. It then reads the phrase 6a back from the storage unit 30a and outputs it to the speaker 42.
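The final presentation step can be sketched in Python. The storage unit 30a is modelled as a `collections.deque` of stored earlier phrases, and the speaker as any callable that accepts a string. Draining the queue oldest-first is an assumption, since the text describes only a single stored phrase. The function name is hypothetical.

```python
from collections import deque
from typing import Callable, Deque


def present_later_then_stored(later_phrase: str,
                              stored: Deque[str],
                              speak: Callable[[str], None]) -> None:
    """Speak phrase 6b, then read back and speak any stored earlier phrases."""
    speak(later_phrase)
    while stored:
        speak(stored.popleft())  # oldest stored phrase first; removes it from storage
```

Passing `list.append` as `speak` records the spoken order, which makes the rule easy to check without audio hardware.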

The communication unit 50a communicates with external devices over a communication network that follows a predetermined communication method. It need only provide the essential function of communicating with external devices; the communication line, communication method, and communication medium are not restricted. The communication unit 50a can be implemented with, for example, an Ethernet (registered trademark) adapter. It can use communication methods or media such as IEEE 802.11 wireless communication or Bluetooth (registered trademark). The communication unit 50a includes the reception unit 51a and the transmission unit 52a.

The reception unit 51a receives the phrases 6a and 6b from the server 200 over the communication network that follows the predetermined communication method. It outputs the received phrases to the answer acquisition unit 14.

When voice information 3a or 3b is input from the voice sending unit 15, the transmitting unit 52a transmits that voice information to the server 200 over the network that follows the predetermined communication scheme.

The storage unit 30a is a storage device capable of storing the phrase set 5a and the phrase 6a. It can be implemented with, for example, a hard disk, an SSD (solid state drive), semiconductor memory, or a DVD.

(Configuration of server 200)
FIG. 2 is a block diagram showing the main configuration of the server 200. As shown in FIG. 2, the server 200 includes a communication unit 50b (receiving unit 51b, transmitting unit 52b), a control unit 10b (voice acquisition unit 21, voice recognition unit 22, answer identification unit 23, flag generation unit 24, information sending unit 25), and a storage unit 30b.

The communication unit 50b is the same as the communication unit 50a, so a detailed description is omitted. The communication unit 50b includes a receiving unit 51b and a transmitting unit 52b.

The receiving unit 51b receives the voice information 3a and 3b from the interactive robot 100 over a network that follows the predetermined communication scheme, and outputs the received voice information 3a and 3b to the voice acquisition unit 21.

When the phrase 6a or 6b and the important flag 7 are input from the information sending unit 25, the transmitting unit 52b sends that phrase and the important flag 7 to the interactive robot 100 over the network that follows the predetermined communication scheme.

The voice acquisition unit 21 acquires the voice information 3a and 3b from the interactive robot 100 over the network that follows the predetermined communication scheme. Specifically, when voice information 3a or 3b is input from the receiving unit 51b, the voice acquisition unit 21 outputs it to the voice recognition unit 22.

The voice recognition unit 22 recognizes the voice that the user uttered to the interactive robot 100. Specifically, when voice information 3a or 3b is input from the voice acquisition unit 21, the voice recognition unit 22 recognizes it according to a predetermined speech recognition algorithm to obtain a recognition result (recognition result 4a or 4b, respectively), and outputs that recognition result to the answer identification unit 23.

The answer identification unit (identification means) 23 identifies, in the phrase set 5b, the phrase associated with the result of recognizing the voice (recognition result 4a or 4b). Specifically, when recognition result 4a or 4b is input from the voice recognition unit 22, the answer identification unit 23 refers to the phrase set 5b stored in the storage unit 30b and extracts the pattern that matches the text contained in that recognition result. It then identifies the phrase associated with that pattern (phrase 6a or 6b) as the answer to return to the user, and outputs it to the flag generation unit 24.

For example, the text contained in recognition result 4a (the result of recognizing voice 1a) input from the voice recognition unit 22 is "What's the weather today". Because this text contains the pattern "weather", the answer identification unit 23, following the first row of the table in FIG. 4(b), identifies the phrase 6a "It's going to rain today, so take an umbrella with you" as the answer to return to the user, and outputs it to the flag generation unit 24. If the phrase set 5b contains no pattern matching the text (see the ninth row, "- no match -", of the table in FIG. 4(b)), the answer identification unit 23 identifies as the answer a phrase that tells the user that speech recognition or phrase identification failed (for example, "I don't understand at all"), and outputs it to the flag generation unit 24. In other words, a phrase is always identified, even when no pattern in the phrase set 5b matches the text.
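
The lookup with its guaranteed fallback can be illustrated as follows. This is a sketch, not the actual matching logic: the English patterns and phrases are stand-ins for the FIG. 4(b) rows, plain case-insensitive substring matching is an assumption, and when several patterns match, the first one in insertion order wins (the patent does not specify a tie-break).

```python
# Hypothetical excerpt of phrase set 5b: pattern -> phrase (English stand-ins
# for rows of the FIG. 4(b) table).
PHRASE_SET_5B = {
    "weather": "It's going to rain today, so take an umbrella with you.",
    "sports": "Team A won big yesterday.",
}

# Feedback phrase used when no pattern matches (the "- no match -" row).
NO_MATCH_PHRASE = "I don't understand at all."


def identify_answer(recognized_text: str) -> str:
    """Return the phrase of the first pattern found in the recognized text.

    A phrase is always returned: if nothing matches, the no-match
    feedback phrase is used instead.
    """
    text = recognized_text.lower()
    for pattern, phrase in PHRASE_SET_5B.items():
        if pattern in text:
            return phrase
    return NO_MATCH_PHRASE


print(identify_answer("What's the weather today?"))
```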

The flag generation unit (generation means) 24 generates an important flag according to whether the phrase needs to be presented to the user. Specifically, when the phrase 6a or 6b is input from the answer identification unit 23, the flag generation unit 24 determines whether an "importance" setting is assigned to that phrase (see the second column of the table in FIG. 4(b); a circle in that column indicates that the phrase in that row has "high" importance). If it is, the flag generation unit 24 generates an important flag 7 indicating "important" (the phrase needs to be presented to the user) and outputs the flag together with its corresponding phrase (phrase 6a or 6b) to the information sending unit 25. If it is not, the flag generation unit 24 generates an important flag 7 indicating "normal" (the phrase does not need to be presented to the user) and outputs that flag, together with the phrase, to the information sending unit 25.
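
A minimal sketch of the flag generation step, assuming the "importance" column of FIG. 4(b) can be modeled as a mapping from phrase to a boolean (True where the row is circled). The `ImportantFlag` enum and `IMPORTANCE_COLUMN` table are hypothetical, and treating phrases missing from the table as "normal" is an assumption.

```python
from enum import Enum


class ImportantFlag(Enum):
    IMPORTANT = "important"  # phrase must be presented to the user
    NORMAL = "normal"        # phrase may be dropped if superseded


# Hypothetical "importance" column of FIG. 4(b): True where the row is circled.
IMPORTANCE_COLUMN = {
    "It's going to rain today, so take an umbrella with you.": True,
    "Team A won big yesterday.": False,
}


def generate_flag(phrase: str) -> tuple[str, ImportantFlag]:
    """Pair a phrase with its important flag 7; unlisted phrases count as normal."""
    high = IMPORTANCE_COLUMN.get(phrase, False)
    return phrase, ImportantFlag.IMPORTANT if high else ImportantFlag.NORMAL
```

Returning the phrase and flag as a pair mirrors the text: both are handed on to the information sending unit 25 together.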

The information sending unit (transmission means) 25 sends the phrases 6a and 6b to the interactive robot 100 over the network that follows the predetermined communication scheme. Specifically, when the phrase 6a or 6b and the important flag 7 are input from the flag generation unit 24, the information sending unit 25 outputs that phrase and the important flag 7 to the transmitting unit 52b.
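
The patent only states that the phrase and the important flag 7 travel together from the server 200 to the interactive robot 100; it does not define a message format. Purely as an illustration, the sketch below assumes a UTF-8 JSON message with hypothetical field names `phrase` and `important_flag`.

```python
import json


# Hypothetical wire format: field names and JSON encoding are assumptions.
def encode_answer(phrase: str, important: bool) -> bytes:
    """Server side: serialize a phrase and its important flag 7."""
    message = {"phrase": phrase, "important_flag": "important" if important else "normal"}
    return json.dumps(message, ensure_ascii=False).encode("utf-8")


def decode_answer(payload: bytes) -> tuple[str, bool]:
    """Robot side: recover the phrase and whether it is flagged important."""
    message = json.loads(payload.decode("utf-8"))
    return message["phrase"], message["important_flag"] == "important"
```

`ensure_ascii=False` keeps Japanese phrases readable in the payload rather than escaping them.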

The storage unit 30b is a storage device capable of storing the phrase set 5b. Like the storage unit 30a, it can be implemented with, for example, a hard disk, an SSD, semiconductor memory, or a DVD.

(Processing executed in the dialogue system 300)
FIG. 5 is a flowchart showing an example of the processing executed in the dialogue system 300. In the following description, parenthesized labels such as "acquisition step" and "presentation step" denote steps included in the control method of the interactive robot 100.

When the voice detection unit 11 detects the voice 1a "What's the weather today?" uttered by the user (YES in step 1; "step" is abbreviated "S" below), the voice recognition unit 12 recognizes the voice 1a (S2). The answer determination unit 13 refers to the phrase set 5a (local dictionary) stored in the storage unit 30a and determines whether the phrase set 5a contains a pattern that matches the text in recognition result 4a (S3). If it does (YES in S3), the answer determination unit 13 fixes the phrase associated with that pattern as the answer to return to the user (S4). When the answer acquisition unit 14 acquires that phrase (S16), the voice output unit 16 presents it to the user as speech (plays the audio, S17), and the process ends. If it does not (NO in S3), the answer determination unit 13 fixes a phrase that puts the answer on hold, "Just a moment" (S5), and the voice output unit 16 presents that phrase to the user as speech (S6). The voice sending unit 15 also sends the voice information 3a to the server 200 (S7).
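
Steps S3 to S7 on the robot side amount to "answer locally if possible, otherwise say a hold phrase and forward the voice to the server". The sketch below is a hypothetical rendering of that branch: `speak` and `send_to_server` are injected callables standing in for the voice output unit 16 and the voice sending unit 15, and substring matching against phrase set 5a is an assumption.

```python
from typing import Callable

HOLD_PHRASE = "Just a moment."  # the hold phrase fixed in S5


def answer_or_defer(
    recognized_text: str,
    voice_information: bytes,
    phrase_set_5a: dict[str, str],
    speak: Callable[[str], None],
    send_to_server: Callable[[bytes], None],
) -> bool:
    """Answer from the local dictionary if possible (S3 YES -> S4, S16, S17).

    Otherwise speak the hold phrase (S5, S6) and forward the voice
    information to the server (S7). Returns True if answered locally.
    """
    for pattern, phrase in phrase_set_5a.items():
        if pattern in recognized_text:
            speak(phrase)
            return True
    speak(HOLD_PHRASE)
    send_to_server(voice_information)
    return False
```

Note that, as in S7, the raw voice information rather than the local recognition result is forwarded, because the server runs its own recognition in S9.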

When the receiving unit 51b of the server 200 receives the voice information 3a and the voice acquisition unit 21 acquires it (S8), the voice recognition unit 22 recognizes the voice 1a (S9). The answer identification unit 23 identifies, in the phrase set 5b (cloud dictionary), the phrase 6a associated with the result of recognizing the voice 1a (recognition result 4a) (S10). The flag generation unit 24 determines whether the phrase 6a needs to be presented to the user, that is, whether its importance is "high" (S11). If so (YES in S11), it generates an important flag 7 indicating "important" (S12); otherwise (NO in S11), it generates an important flag 7 indicating "normal" (S13). The information sending unit 25 sends the phrase 6a and the important flag 7 to the interactive robot 100 (S14), and the receiving unit 51a of the interactive robot 100 receives them (S15). The answer acquisition unit 14 acquires the phrase 6a and the important flag 7 (S16, acquisition step).

If, during S7 to S15, the voice detection unit 11 detects a further voice 1b from the user, "By the way, what's the sports news?" (that is, an interrupt occurs; YES in S18), the flag determination unit 17 determines, based on the important flag 7 received in S15, whether the phrase 6a needs to be presented to the user (S19). If it does (the important flag 7 indicates "important"; YES in S19), the answer storage unit 18 stores the phrase 6a in the storage unit 30a (S20).

The interactive robot 100 and the server 200 then execute S2 to S17 for the voice 1b (labeled "A" in the flowchart of FIG. 5). If no further voice is detected from the user during S7 to S15 of this pass (NO in S18), the answer acquisition unit 14 acquires the phrase 6b "Team A won big yesterday" as the answer to the voice 1b (S16, acquisition step), and the voice output unit 16 presents the phrase 6b to the user as voice 1d (S17, presentation step). The series of steps S2 to S17 called from "A" then ends, and control returns to the point immediately before S21.

If the phrase 6a acquired in S16 needs to be presented to the user (that is, if the answer storage unit 18 has stored the phrase 6a in the storage unit 30a because the flag determination unit 17 determined that the important flag 7 indicates "important"; YES in S21), the voice output unit 16 presents the phrase 6a to the user as voice 1c (S17, presentation step). After the phrase 6a has been presented, the voice output unit 16 may delete it from the storage unit 30a.

Thus, suppose communication stalls and delays the moment at which the interactive robot 100 presents the answer (phrase 6a), and the user utters a further voice 1b during S7 to S15, so that the phrase 6b is newly acquired before the phrase 6a has been presented. If the phrase 6a needs to be presented to the user, the interactive robot 100 presents the phrase 6b and then the phrase 6a.

If a further voice is detected from the user while S2 to S16 are being executed for the voice 1b (YES in the second S18), the interactive robot 100 can execute S19 and S20 and then execute S2 to S17 again. The number of such additional executions is arbitrary and may be set in advance.
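
One reading of the recursive "A" flow with repeated interrupts is that stored important phrases behave like a stack: the innermost (latest) answer is spoken first, and each stored phrase is spoken as its call returns through S21, most recent first. The sketch below models only that ordering; `speech_order` is hypothetical, it assumes every utterance after the first interrupted its predecessor before that answer was spoken, and it omits the optional preset limit on the number of nested executions.

```python
def speech_order(answers: list[tuple[str, bool]]) -> list[str]:
    """Order in which phrases are spoken under nested interrupts.

    answers[i] is the (phrase, important) pair for the i-th utterance;
    the list must be non-empty. Each utterance after the first interrupted
    the previous one before its answer was spoken.
    """
    stored: list[str] = []  # storage unit 30a, used as a stack (S20)
    for phrase, important in answers[:-1]:
        if important:
            stored.append(phrase)
    order = [answers[-1][0]]  # the innermost answer is presented first (S17)
    while stored:
        order.append(stored.pop())  # unwinding: most recently stored first (S21)
    return order


print(speech_order([("weather", True), ("sports", False), ("news", True)]))
```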

[Embodiment 2]
A second embodiment (Embodiment 2) of the present invention is described with reference to FIGS. 6 and 7. Only configurations added to, or differing from, those of Embodiment 1 are described. In other words, every configuration described in Embodiment 1 may also be included in Embodiment 2, and terms defined in Embodiment 1 have the same meaning in Embodiment 2.

(Differences from Embodiment 1)
FIG. 6 is a block diagram showing the main configuration of the interactive robot 101. The interactive robot 100 of Embodiment 1 was communicably connected to the server 200 to form the dialogue system 300, and when the phrase 6a or 6b was not in the phrase set 5a, it acquired that phrase from the server 200 (cloud configuration).

In the present embodiment, by contrast, the phrase sets 5a and 5b are both stored in the storage unit 30a of the interactive robot 101, and the interactive robot 101 acquires the phrase 6a or 6b from the storage unit 30a (stand-alone configuration). Accordingly, as shown in FIG. 6, the interactive robot 101 omits the communication unit 50a and the voice sending unit 15 of the interactive robot 100. (Since the only change is that the robot no longer needs to communicate with the server 200 to acquire phrases, the interactive robot 101 may still include the communication unit 50a or the voice sending unit 15.)

Because the interactive robot 101 does not communicate with the server 200 to acquire phrases, the presentation of a phrase to the user cannot be delayed by a stall in communication between the robot and the server. However, the phrase set 5b (a cloud dictionary containing more patterns than the phrase set 5a), which the server 200 managed centrally in Embodiment 1, is held by the interactive robot 101 (storage unit 30a) as a local dictionary in Embodiment 2. Extracting a suitable phrase from this larger local dictionary is computationally heavier, so the presentation can still be delayed. Consequently, as described above, the phrase answering an earlier utterance and the phrase answering a later utterance may cross, leaving the earlier phrase unpresented.

Therefore, if a later phrase is newly acquired before an earlier phrase has been presented, and the earlier phrase needs to be presented to the user, the interactive robot 101 presents the later phrase and then the earlier phrase. As a result, whenever the earlier phrase is judged to need presenting, the interactive robot 101 reliably presents it instead of leaving it unpresented.

(Configuration of interactive robot 101)
The answer determination unit 13 determines the answer to return to the user based on the result of recognizing the voice. Specifically, when recognition result 4a or 4b is input from the voice recognition unit 12, the answer determination unit 13 refers to the phrase set 5a stored in the storage unit 30a and determines whether the phrase set 5a contains a pattern that matches the text in that recognition result.

If it does, the answer determination unit 13 fixes the phrase associated with that pattern as the answer to return to the user and outputs it to the answer acquisition unit 14. If it does not, the answer determination unit 13 refers to the phrase set 5b stored in the storage unit 30a, extracts the pattern that matches the text, and fixes the phrase associated with that pattern (phrase 6a or 6b) as the answer to return to the user. In this case, the answer determination unit 13 may also output a phrase that puts the answer on hold to the answer acquisition unit 14, so that the hold phrase is presented to the user.

Next, the answer determination unit 13 generates an important flag according to whether the phrase needs to be presented to the user. Specifically, it determines whether an "importance" setting is assigned to the phrase 6a or 6b. If so, it generates an important flag 7 indicating "important" and outputs the flag together with its corresponding phrase (phrase 6a or 6b) to the answer acquisition unit 14. If not, it generates an important flag 7 indicating "normal" and outputs that flag, together with the phrase, to the answer acquisition unit 14.
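
The stand-alone two-tier lookup can be sketched as one function. Assumptions, all hypothetical: phrase set 5a maps pattern to phrase, phrase set 5b maps pattern to (phrase, high importance), a `None` flag marks an immediate answer from 5a (the text assigns no flag on that path), and the no-match fallback phrase, which the patent describes only for the server, is reused here.

```python
from typing import Optional

NO_MATCH_PHRASE = "I don't understand at all."  # assumed, mirroring the server


def determine_answer(
    text: str,
    phrase_set_5a: dict[str, str],
    phrase_set_5b: dict[str, tuple[str, bool]],
) -> tuple[str, Optional[str]]:
    """Return (phrase, flag); flag is None when answered directly from 5a."""
    # Tier 1: small local phrase set 5a, answered immediately.
    for pattern, phrase in phrase_set_5a.items():
        if pattern in text:
            return phrase, None
    # Tier 2: large phrase set 5b, also on the device but slower to search.
    for pattern, (phrase, high) in phrase_set_5b.items():
        if pattern in text:
            return phrase, "important" if high else "normal"
    return NO_MATCH_PHRASE, "normal"
```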

The answer acquisition unit (acquisition means) 14 acquires the phrase 6a, the phrase 6b, and the important flag 7 from the answer determination unit 13. Processing after acquisition is the same as in Embodiment 1.

(Processing executed by the interactive robot 101)
FIG. 7 is a flowchart showing an example of the processing executed by the interactive robot 101. It is obtained from the flowchart of FIG. 5 (processing executed in the dialogue system 300) by removing the communication steps (S7, S8, S14, S15) and the step in which the server 200 performs speech recognition again (S9), and by having the interactive robot 101 (answer determination unit 13) execute the steps previously executed in the server 200 (S10 to S13) as S22 to S25.

That is, the answer determination unit 13 identifies, in the phrase set 5b (the larger dictionary that served as the cloud dictionary in Embodiment 1, now stored locally), the phrase 6a associated with the result of recognizing the voice 1a (recognition result 4a) (S22). It then determines whether the phrase 6a needs to be presented to the user (S23). If its importance is "high" (YES in S23), it generates an important flag 7 indicating "important" (S24); otherwise (NO in S23), it generates an important flag 7 indicating "normal" (S25).

[Embodiment 3]
A third embodiment (Embodiment 3) of the present invention is described. Only configurations added to, or differing from, those of Embodiment 1 or 2 are described. In other words, every configuration described in Embodiment 1 or 2 may also be included in Embodiment 3, and terms defined in Embodiment 1 or 2 have the same meaning in Embodiment 3.

(Dynamic change of phrases)
The interactive robot (one that includes the control unit 10a) may present to the user a phrase that has been dynamically changed to reflect the situation at the time the phrase is acquired. In the example described earlier, the user asks the interactive robot "What's the weather today?" (voice 1a), and the robot presents the phrase 6a "It's raining, so take an umbrella with you" (voice 1c). If today's weather is instead "sunny", the robot can present the phrase 6a "It's sunny, so you won't need an umbrella".

Specifically, the answer determination unit 13 (stand-alone configuration) or the answer identification unit 23 (cloud configuration) acquires dynamically changing information (for example, text giving a weather forecast) from a predetermined web service, generates a phrase completed with that information (for example, by inserting text such as "sunny" or "rain" into the updatable slot "…" of the phrase "It's …"), and updates the phrase set 5b with the completed phrase. The answer acquisition unit 14 then acquires the phrase completed with the dynamically changing information obtained from the external service (for example, the web service providing the weather forecast). The interactive robot can thereby present dynamically changing phrases to the user.
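
Template completion can be sketched as follows. The `{forecast}` slot marker, the `templates` mapping, and the `fetch_forecast` callable (a stand-in for the call to the external weather service) are hypothetical; the patent does not name a service or API. The function returns an updated copy of phrase set 5b rather than mutating it in place.

```python
from typing import Callable

SLOT = "{forecast}"  # hypothetical marker for the updatable "…" part


def refresh_phrase_set(
    phrase_set_5b: dict[str, str],
    templates: dict[str, str],
    fetch_forecast: Callable[[], str],
) -> dict[str, str]:
    """Return a copy of phrase set 5b with every template completed.

    `fetch_forecast` stands in for the request to the external web service.
    """
    forecast = fetch_forecast()
    updated = dict(phrase_set_5b)
    for pattern, template in templates.items():
        updated[pattern] = template.replace(SLOT, forecast)
    return updated


phrases = refresh_phrase_set({}, {"weather": "It's {forecast} today."}, lambda: "sunny")
print(phrases["weather"])
```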

(Dynamic setting of the important flag)
The important flag 7 in the phrase set 5b is set in advance by the user for each pattern or phrase. That is, the user can register a pattern (detection word) in a record of the phrase set 5a or 5b and set the important flag 7 for that pattern. The user can also newly set or change the important flag 7 after registering the pattern.

Alternatively, the important flag 7 may be changed dynamically. For example, when the voice 1a or 1b is louder than usual (when the gain of the voice signal 2a or 2b exceeds a predetermined threshold, or the average of the gains obtained so far), the answer determination unit 13 (stand-alone configuration) or the flag generation unit 24 (cloud configuration) may generate an important flag 7 indicating "important". Conversely, when the voice is quieter than usual, it may generate an important flag 7 indicating "normal".
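
The loudness rule can be sketched as a small stateful helper. `VolumeFlagger` is hypothetical; it takes the gain as a precomputed number (how the gain of voice signal 2a or 2b is measured is not specified), uses a fixed threshold when one is given and otherwise the running average of previous gains, and, as an assumption, treats the very first utterance as "normal" because no history exists yet.

```python
from typing import Optional


class VolumeFlagger:
    """Flag an utterance "important" when it is louder than usual."""

    def __init__(self, threshold: Optional[float] = None) -> None:
        self.threshold = threshold  # fixed threshold, or None to use the average
        self._gain_sum = 0.0
        self._count = 0

    def flag(self, gain: float) -> str:
        if self.threshold is not None:
            reference = self.threshold
        elif self._count > 0:
            reference = self._gain_sum / self._count  # average of gains so far
        else:
            reference = gain  # first utterance: no history, treat as usual
        self._gain_sum += gain
        self._count += 1
        return "important" if gain > reference else "normal"
```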

Alternatively, when the voice 1a or 1b is determined to be a question (for example, when its pitch rises, or when recognition result 4a or 4b ends with a predetermined pattern), the answer determination unit 13 or the flag generation unit 24 may generate an important flag 7 indicating "important". Conversely, when the voice is determined not to be a question, it may generate an important flag 7 indicating "normal".
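
A heuristic question detector along these lines is sketched below. Both signals are assumptions for illustration: the ending patterns in `QUESTION_ENDINGS` are hypothetical, and "pitch rises" is approximated by comparing the mean of the last quarter of a pitch contour (in Hz, from some external pitch tracker) with the mean of the rest, scaled by a hypothetical `rise_ratio`.

```python
from statistics import mean
from typing import Sequence

# Hypothetical ending patterns that mark a question in the recognized text.
QUESTION_ENDINGS = ("?", "？", "か")


def is_interrogative(text: str, pitch_hz: Sequence[float], rise_ratio: float = 1.15) -> bool:
    """Heuristic: question-like ending, or final pitch clearly above the rest."""
    if text.rstrip().endswith(QUESTION_ENDINGS):
        return True
    if len(pitch_hz) < 2:
        return False
    n_tail = max(1, len(pitch_hz) // 4)  # last quarter of the pitch contour
    head, tail = pitch_hz[:-n_tail], pitch_hz[-n_tail:]
    return mean(tail) > mean(head) * rise_ratio


def flag_for(text: str, pitch_hz: Sequence[float]) -> str:
    """Map the question heuristic onto the important flag 7 values."""
    return "important" if is_interrogative(text, pitch_hz) else "normal"
```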

[Example of software implementation]
The control blocks of the interactive robot 100 and the server 200 (in particular, the control units 10a and 10b) may be implemented by logic circuits (hardware) formed on an integrated circuit (IC chip) or the like, or by software using a CPU (Central Processing Unit). In the latter case, the interactive robot 100 and the server 200 each include a CPU that executes the instructions of a program (control program), that is, the software implementing each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or CPU); and a RAM (Random Access Memory) into which the program is loaded. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. The recording medium may be a "non-transitory tangible medium" such as a tape, disk, card, semiconductor memory, or programmable logic circuit. The program may also be supplied to the computer via any transmission medium capable of carrying it (such as a communication network or broadcast waves). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

[Summary]
An information processing apparatus according to Aspect 1 of the present invention (interactive robot 100 or 101) presents a predetermined phrase to a user in response to a voice (1a, 1b) uttered by the user, and includes: acquisition means (answer acquisition unit 14) for acquiring a first phrase (6a) associated with the result (4a, 4b) of recognizing the voice; and presentation means (voice output unit 16) for, when a second phrase (6b) different from the first phrase is newly acquired by the acquisition means before the first phrase is presented and the first phrase needs to be presented to the user, presenting the second phrase and then presenting the first phrase.

When a human and a machine are expected to communicate naturally, the phrase that answers the user's utterance may be presented late. The user may then address the machine again before that phrase is presented. In that case, a phrase may be newly presented only in response to the later utterance, and the earlier phrase may never be presented.

Under the above configuration, the information processing apparatus presents the first phrase after presenting the second phrase. This happens when the second phrase is newly acquired before the first phrase is presented, and the first phrase needs to be presented to the user. The information processing apparatus can therefore reliably present the earlier phrase (the first phrase) to the user without leaving it unpresented.
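The ordering rule of aspect 1 can be illustrated with a minimal Python sketch. It is not the embodiment's implementation, and several parts of it are assumptions:
- The function `presentation_order`, its parameters, and the example phrases are hypothetical.
- Whether the first phrase is required is passed in as a plain boolean.
- A first phrase that is not required is discarded. This is an assumption, because this passage only specifies what happens in the required case.

```python
from typing import List, Optional


def presentation_order(first: Optional[str], second: str, first_required: bool) -> List[str]:
    """Return the phrases to present, in order (hypothetical helper for aspect 1).

    `first` is a phrase acquired earlier but not yet presented (None if there is
    none); `second` is a phrase newly acquired in the meantime.
    """
    order = [second]  # the newly acquired second phrase is presented first
    if first is not None and first_required:
        order.append(first)  # the earlier phrase follows instead of being dropped
    # Assumption: a first phrase that is not required is simply discarded.
    return order


print(presentation_order("It will rain tomorrow.", "Good morning!", first_required=True))
# → ['Good morning!', 'It will rain tomorrow.']
print(presentation_order("It will rain tomorrow.", "Good morning!", first_required=False))
# → ['Good morning!']
```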

In the information processing apparatus according to aspect 2 of the present invention, which builds on aspect 1, the acquisition means may further acquire necessity information (important flag 7). This information indicates whether the first phrase needs to be presented to the user. When the necessity information indicates that presentation is necessary, the presentation means may present the first phrase after presenting the second phrase.

Under the above configuration, the information processing apparatus can determine from the necessity information whether the first phrase needs to be presented to the user. If the necessity information indicates that it does, the apparatus presents the first phrase after presenting the second phrase. The information processing apparatus can therefore reliably present the first phrase to the user without leaving it unpresented.
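Aspect 2 requires only that the necessity information arrive together with the phrase. Its encoding is not specified here. The sketch below assumes a hypothetical JSON message with `phrase` and `important` fields and decodes it into a small record. The names `Answer` and `parse_answer`, the field names, and the default used when the flag is missing are all illustrative assumptions.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    """Hypothetical record for an acquired answer."""
    phrase: str      # the phrase to present (e.g. phrase 6a)
    important: bool  # necessity information (important flag 7)


def parse_answer(message: str) -> Answer:
    """Decode a hypothetical JSON answer message into a phrase and its flag."""
    data = json.loads(message)
    # Assumption: a message without a flag is treated as "not required".
    return Answer(phrase=data["phrase"], important=bool(data.get("important", False)))


first = parse_answer('{"phrase": "Your package arrives today.", "important": true}')
second = parse_answer('{"phrase": "Good morning!"}')
# Aspect 2: the flag of the unpresented first phrase decides whether it follows the second.
order = [second.phrase] + ([first.phrase] if first.important else [])
print(order)  # → ['Good morning!', 'Your package arrives today.']
```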

The information processing apparatus according to aspect 3 of the present invention, which builds on aspect 1 or 2, may further include storage means (answer storage unit 18). The storage means stores the first phrase in a predetermined storage unit (30a) when the second phrase is newly acquired before the first phrase is presented and the first phrase needs to be presented to the user. When the first phrase is stored in the predetermined storage unit, the presentation means may present the second phrase first, and then read the first phrase from the predetermined storage unit and present it.

Under the above configuration, when the first phrase needs to be presented to the user, the information processing apparatus saves it in the storage unit. If the first phrase is present in the storage unit, the apparatus presents it to the user after presenting the second phrase. The information processing apparatus can therefore reliably present the first phrase to the user without leaving it unpresented.
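The save-and-replay behaviour of aspect 3 can be modelled as a small state machine. This sketch illustrates the idea under stated assumptions and is not the device's actual control program:
- The class `Robot` and its methods are hypothetical.
- A `deque` stands in for the storage unit 30a.
- An unpresented phrase without the important flag is simply overwritten.
- Several saved phrases are replayed first-in, first-out.

The passage above does not fix any of these details.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass(frozen=True)
class Answer:
    phrase: str
    important: bool  # necessity information (important flag 7)


class Robot:
    """Hypothetical model of aspect 3: save a required, unpresented phrase and
    replay it after the newer phrase has been presented."""

    def __init__(self) -> None:
        self.pending: Optional[Answer] = None  # acquired but not yet presented
        self.storage: Deque[Answer] = deque()  # stands in for storage unit 30a
        self.spoken: List[str] = []            # record of presented phrases

    def acquire(self, answer: Answer) -> None:
        """Called whenever a new answer is acquired."""
        if self.pending is not None and self.pending.important:
            self.storage.append(self.pending)  # save the first phrase instead of dropping it
        self.pending = answer  # the newer phrase becomes the one presented next

    def present(self) -> None:
        """Present the current phrase, then every phrase saved in storage."""
        if self.pending is not None:
            self.spoken.append(self.pending.phrase)
            self.pending = None
        while self.storage:
            self.spoken.append(self.storage.popleft().phrase)


robot = Robot()
robot.acquire(Answer("It will rain tomorrow.", important=True))
robot.acquire(Answer("Good morning!", important=False))  # arrives before the first is presented
robot.present()
print(robot.spoken)  # → ['Good morning!', 'It will rain tomorrow.']
```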

A server (200) according to aspect 4 of the present invention controls an information processing apparatus so that the apparatus presents a predetermined phrase to a user in response to a voice that the user utters to the apparatus. The server includes:
- specifying means (answer specifying unit 23) for specifying, in a predetermined phrase set (5b), a phrase associated with the result of recognizing the voice;
- generation means (flag generation unit 24) for generating necessity information according to whether the phrase needs to be presented to the user; and
- transmission means (information transmission unit 25) for transmitting the phrase and the necessity information to the information processing apparatus.

Under the above configuration, the server transmits the phrase and the necessity information to the information processing apparatus according to the result of recognizing the voice. The phrase set held by the server usually contains more patterns that match a recognition result than the phrase set held by the information processing apparatus. As a result, the server can control the information processing apparatus to return an appropriate phrase even when the user addresses the apparatus with a complex utterance.

In addition, the server can use the necessity information to tell the information processing apparatus whether the phrase needs to be presented to the user. The server can therefore control the information processing apparatus so that the phrase is reliably presented to the user rather than left unpresented.
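The server side of aspect 4 can be sketched in the same way. Here a dictionary of regular expressions stands in for the phrase set 5b, and the necessity information is simply looked up from a value stored with each entry. This section does not describe how the flag generation unit 24 actually decides that a phrase must be presented. That rule, the JSON message format, and the names `PHRASE_SET`, `specify_phrase` and `build_response` are therefore all assumptions.

```python
import json
import re
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for phrase set 5b: pattern -> (phrase, must be presented?)
PHRASE_SET: Dict[str, Tuple[str, bool]] = {
    r"\bweather\b": ("It will rain tomorrow, so take an umbrella.", True),
    r"\bhello\b|\bgood morning\b": ("Good morning!", False),
}


def specify_phrase(recognition_result: str) -> Optional[Tuple[str, bool]]:
    """Role of the answer specifying unit: return the first entry whose pattern matches."""
    for pattern, entry in PHRASE_SET.items():
        if re.search(pattern, recognition_result, flags=re.IGNORECASE):
            return entry
    return None


def build_response(recognition_result: str) -> Optional[str]:
    """Roles of flag generation and transmission, sketched as a JSON message."""
    entry = specify_phrase(recognition_result)
    if entry is None:
        return None  # no matching phrase in the set
    phrase, important = entry
    return json.dumps({"phrase": phrase, "important": important})


print(build_response("What's the weather tomorrow?"))
# → {"phrase": "It will rain tomorrow, so take an umbrella.", "important": true}
```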

An interactive system (300) according to another aspect of the present invention includes the information processing apparatus according to any one of the above aspects and the server according to the above aspect.

The interactive system therefore provides the same effects as the information processing apparatus according to any one of the above aspects, or as the server according to the above aspect.

A control method according to another aspect of the present invention controls an information processing apparatus that presents a predetermined phrase to a user in response to a voice uttered by the user. The method includes:
- an acquisition step (S16) of acquiring a first phrase associated with the result of recognizing the voice; and
- a presentation step (S17) of presenting the first phrase after presenting a second phrase. This applies when the second phrase, which differs from the first phrase, is newly acquired in the acquisition step before the first phrase is presented, and the first phrase needs to be presented to the user.

The control method therefore provides the same effects as the information processing apparatus according to the above aspect.

The information processing apparatus and the server according to each aspect of the present invention may be realized by a computer. In that case, the scope of the present invention also covers the following:
- a control program for the information processing apparatus and a control program for the server, which realize the apparatus and the server on the computer by causing the computer to operate as each of their means; and
- computer-readable recording media on which these programs are recorded.

The present invention is not limited to the embodiments described above. Various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.

The present invention can be widely applied to apparatuses that present a predetermined phrase to a user in response to a voice uttered by the user.

1a Voice
1b Voice
4a Recognition result (result of recognition)
4b Recognition result (result of recognition)
5b Phrase set (predetermined phrase set)
6a Phrase (first phrase)
6b Phrase (second phrase)
7 Important flag (necessity information)
14 Answer acquisition unit (acquisition means)
16 Voice output unit (presentation means)
18 Answer storage unit (storage means)
23 Answer specifying unit (specifying means)
24 Flag generation unit (generation means)
25 Information transmission unit (transmission means)
30a Storage unit (predetermined storage unit)
100 Interactive robot (information processing apparatus)
101 Interactive robot (information processing apparatus)
200 Server
300 Interactive system

Claims

1. An information processing apparatus that presents a predetermined phrase to a user in response to a voice uttered by the user, comprising:
acquisition means for acquiring a first phrase associated with a result of recognizing the voice; and
presentation means for presenting the first phrase after presenting a second phrase, when the second phrase, which differs from the first phrase, is newly acquired by the acquisition means before the first phrase is presented, and the first phrase needs to be presented to the user.

2. The information processing apparatus according to claim 1, wherein
the acquisition means further acquires necessity information indicating whether the first phrase needs to be presented to the user, and
the presentation means presents the first phrase after presenting the second phrase when the necessity information indicates that presentation is necessary.

3. The information processing apparatus according to claim 1, further comprising storage means for storing the first phrase in a predetermined storage unit when the second phrase is newly acquired before the first phrase is presented and the first phrase needs to be presented to the user, wherein
when the first phrase is stored in the predetermined storage unit, the presentation means presents the second phrase and then reads the first phrase from the predetermined storage unit and presents it.

4. A server that controls an information processing apparatus so that the apparatus presents a predetermined phrase to a user in response to a voice that the user utters to the apparatus, comprising:
specifying means for specifying, in a predetermined phrase set, a phrase associated with a result of recognizing the voice;
generation means for generating necessity information according to whether the phrase needs to be presented to the user; and
transmission means for transmitting the phrase and the necessity information to the information processing apparatus.

5. A control program for causing a computer to function as the information processing apparatus according to claim 1, the control program causing the computer to function as each of the means.