JP2010152119A

JP2010152119A - Response generation device and program

Info

Publication number: JP2010152119A
Application number: JP2008330639A
Authority: JP
Inventors: Takakatsu Yoshimura; 貴克吉村; Kazuya Shimooka; 和也下岡; Hiroyuki Hoshino; 博之星野; Ryoko Hotta; 良子堀田; Yusuke Nakano; 雄介中野
Original assignee: Toyota Motor Corp; Toyota Central R&D Labs Inc
Current assignee: Toyota Motor Corp; Toyota Central R&D Labs Inc
Priority date: 2008-12-25
Filing date: 2008-12-25
Publication date: 2010-07-08
Anticipated expiration: 2028-12-25
Also published as: JP5195405B2

Abstract

PROBLEM TO BE SOLVED: To output a response sentence at an appropriate timing so that, when a user intends to enter an input sentence either spontaneously or in response to a response sentence from the device, the input is not obstructed by the device's output of a response sentence.

SOLUTION: A language analysis part 20 performs voice recognition on a voice signal input from a microphone 12 and then performs morphological analysis; an intention estimation part 22 estimates the intention of the user utterance on the basis of the analysis results; a response candidate generation part 124 generates response sentence candidates on the basis of a response candidate type associated with the estimated utterance intention; and an output timing measurement part 26 and an utterance waiting time clocking part 126 measure whether or not the waiting time determined for each combination of utterance intention and response candidate type has elapsed. When the waiting time elapses without the next utterance being input by the user, it is determined that the output timing of the response sentence has come, and an output part 28 converts the generated response sentence candidates into a voice signal and outputs it from a speaker 14.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a response generation device and a program, and more particularly to a response generation device and a program that output a response sentence at an appropriate timing.

Conventionally, as a spoken dialogue device that responds at an appropriate timing, a device has been proposed that calculates the speech rate as information about the input speech, extracts a word string from the input speech, predicts the word string that will be input next from the appearance probability of the extracted word string, predicts, based on the calculated speech rate, the subsequent input time until the predicted word string has been input, and determines the output timing of the response sentence based on that subsequent input time (see, for example, Patent Document 1). It is also described that this spoken dialogue device may use information such as utterance length, fundamental frequency, number of morphemes, and part-of-speech sequence as the information about the input speech.

In addition, a dialogue device has been proposed that, when the text data corresponding to the input speech contains no word that has appeared a predetermined number of times or more, outputs a back-channel response for each segment of speech input, and, when such a word exists, likewise outputs a back-channel response, searches for related content using that word as a keyword, and outputs a response sentence based on the retrieved content (see, for example, Patent Document 2).

Patent Document 1: JP 2008-241890 A
Patent Document 2: JP 2007-328283 A

However, because the spoken dialogue device of Patent Document 1 determines the output timing of the response sentence based on information about the input speech, response sentences are output at the same timing, such as immediately after the speech input, even when the response sentences to be output differ. As a result, the device may output a response sentence even though the user is about to make the next speech input spontaneously.

Furthermore, in the dialogue device of Patent Document 2, when a word that has appeared a predetermined number of times or more exists, a response sentence based on the retrieved content is output after the back-channel response. If the response sentence output first is something other than a back-channel response, however, the device may output the next response sentence even though the user is about to make the next speech input in order to reply to the response sentence output first.

The present invention has been made to solve the above problems, and its object is to provide a response generation device and a program that can output a response sentence at an appropriate timing so that, when the user is about to enter an input sentence either spontaneously or in response to a response sentence from the device, the input is not obstructed by the device's output of a response sentence.

To achieve the above object, the response generation device of the first invention comprises: input means for inputting an input sentence from a user; intention estimation means for estimating the intention expressed by the input sentence from the result of analyzing the structure of the input sentence input by the input means; response generation means for generating at least one response sentence corresponding to the intention expressed by the input sentence as estimated by the intention estimation means; and control means for performing control so that the response sentence generated by the response generation means is output after a waiting time determined by the combination of the intention estimated by the intention estimation means and the response sentence generated by the response generation means has elapsed, and so that the response sentence generated by the response generation means is not output if the next input sentence is input by the input means before the waiting time elapses.

The response generation program of the second invention is a program for causing a computer to function as: intention estimation means for estimating the intention expressed by an input sentence from the result of analyzing the structure of the input sentence input by input means for inputting an input sentence from a user; response generation means for generating at least one response sentence corresponding to the intention expressed by the input sentence as estimated by the intention estimation means; and control means for performing control so that the response sentence generated by the response generation means is output after a waiting time determined by the combination of the intention estimated by the intention estimation means and the response sentence generated by the response generation means has elapsed, and so that the response sentence generated by the response generation means is not output if the next input sentence is input by the input means before the waiting time elapses.

According to the first and second inventions, when an input sentence from the user is input by the input means, the intention estimation means estimates the intention expressed by the input sentence from the result of analyzing the structure of the input sentence, and the response generation means generates at least one response sentence corresponding to the estimated intention. The control means then performs control so that the response sentence generated by the response generation means is output after a waiting time determined by the combination of the estimated intention and the generated response sentence has elapsed. The control means also performs control so that the generated response sentence is not output if the next input sentence is input by the input means before the waiting time elapses.

By holding back the output of the response sentence until the waiting time determined by the combination of the estimated utterance intention and the response sentence has elapsed, the response sentence can be output at an appropriate timing so that, when the user is about to enter an input sentence spontaneously, the input is not obstructed by the device's output of a response sentence.

The response generation device of the third invention comprises: input means for inputting an input sentence from a user; intention estimation means for estimating the intention expressed by the input sentence from the result of analyzing the structure of the input sentence input by the input means; response generation means for generating at least one response sentence corresponding to the intention expressed by the input sentence as estimated by the intention estimation means; and control means for performing control so that one response sentence is output from among the at least one response sentence generated by the response generation means, so that, when a response sentence that has not yet been output exists, another response sentence from among those not yet output is output after a waiting time determined by the combination of the intention estimated by the intention estimation means and the response sentence that was output has elapsed, and so that this other response sentence is not output if the next input sentence is input by the input means before the waiting time elapses.

The response generation program of the fourth invention is a program for causing a computer to function as: intention estimation means for estimating the intention expressed by an input sentence from the result of analyzing the structure of the input sentence input by input means for inputting an input sentence from a user; response generation means for generating at least one response sentence corresponding to the intention expressed by the input sentence as estimated by the intention estimation means; and control means for performing control so that one response sentence is output from among the at least one response sentence generated by the response generation means, so that, when a response sentence that has not yet been output exists, another response sentence from among those not yet output is output after a waiting time determined by the combination of the intention estimated by the intention estimation means and the response sentence that was output has elapsed, and so that this other response sentence is not output if the next input sentence is input by the input means before the waiting time elapses.

According to the third and fourth inventions, when an input sentence from the user is input by the input means, the intention estimation means estimates the intention expressed by the input sentence from the result of analyzing the structure of the input sentence, and the response generation means generates at least one response sentence corresponding to the estimated intention. The control means then outputs one response sentence from among the generated response sentences and, when a response sentence that has not yet been output exists, performs control so that another response sentence from among those not yet output is output after a waiting time determined by the combination of the estimated intention and the response sentence that was output has elapsed. If the next input sentence is input by the input means before the waiting time elapses, the control means performs control so that this other response sentence is not output.

By outputting one response sentence and then holding back the next response sentence until the waiting time determined by the combination of the estimated utterance intention and the response sentence has elapsed, the response sentence can be output at an appropriate timing so that, when the user is about to enter an input sentence in reply to a response sentence from the device, the input is not obstructed by the device's output of a further response sentence.

The control means of the third and fourth inventions can also perform control so that, when no response sentence remains to be output and the waiting time has elapsed, a response sentence prompting the user to enter an input sentence is output. This allows the dialogue to move further forward.
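The control of the third and fourth inventions can be illustrated with a minimal Python sketch. The function name `run_turn`, the prompt text, the example sentences, and the numeric waiting times are assumptions made for illustration. Time is simulated rather than measured, and the user is modeled as either staying silent or starting to speak a fixed delay after each device output.

```python
from typing import List, Optional, Tuple

# Hypothetical sentence used to prompt the user for further input.
PROMPT = "Tell me more."


def run_turn(responses: List[Tuple[str, float]],
             user_reply_delay: Optional[float]) -> List[str]:
    """Return the sentences the device outputs for one user utterance.

    responses: (response sentence, waiting time in seconds after it is
        output) pairs, in output order.
    user_reply_delay: seconds after each device output at which the user
        would start speaking, or None if the user stays silent.
    """
    outputs = []
    for sentence, wait_after in responses:
        outputs.append(sentence)
        # The user spoke before the waiting time elapsed: hold back the rest.
        if user_reply_delay is not None and user_reply_delay < wait_after:
            return outputs
    # Nothing remains to be output and the last waiting time has elapsed.
    outputs.append(PROMPT)
    return outputs
```

With `[("I see.", 0.0), ("When are they coming?", 7.0)]` and a user who replies 3 seconds after a device output, both sentences are output (the first has no waiting time) and the prompt is held back. With a silent user, the prompt follows the second sentence.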

Furthermore, the waiting time used when the intention estimated by the estimation means is combined with a response sentence that asks a question or makes a statement about content not contained in the input sentence can be made longer than the waiting time used when the estimated intention is combined with a response sentence that answers, acknowledges, repeats, or confirms.

When a response sentence that asks a question or makes a statement about content not contained in the input sentence is output, the topic shifts to content the user did not intend. By lengthening the waiting time before the response sentence is output, the dialogue can continue without a change of topic if the user provides further input during the waiting time. Likewise, by lengthening the waiting time after output, the device can wait long enough for the user's reply to the response sentence it has output, and can avoid outputting the next response sentence in quick succession while the user is still thinking about how to reply.

The waiting time can also be shortened according to the time elapsed since the input sentence was first input, or, based on the user's past silence times from the output of a response sentence to the input of the next input sentence, made longer as that silence time becomes longer. By changing the waiting time dynamically in this way, response sentences can be output at an appropriate timing even when the user has grown accustomed to interacting with the device, or when there are individual differences such as whether the user tends to think carefully before replying or to reply immediately.
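The two ways of adjusting the waiting time can be sketched as follows. The text only states the direction of each adjustment, so the linear decay, the floor value, the reference silence, and all function and parameter names below are assumptions chosen for illustration.

```python
from statistics import mean
from typing import Sequence


def shorten_with_elapsed(base_wait: float, elapsed_s: float,
                         decay_per_min: float = 0.5,
                         floor: float = 1.0) -> float:
    """Shorten the waiting time as the dialogue goes on.

    Assumed rule: subtract decay_per_min seconds for every minute elapsed
    since the first input sentence, but never go below floor seconds.
    """
    return max(floor, base_wait - decay_per_min * (elapsed_s / 60.0))


def scale_with_silence(base_wait: float, past_silences: Sequence[float],
                       reference: float = 2.0) -> float:
    """Lengthen the waiting time for users who pause longer before replying.

    Assumed rule: scale base_wait by the ratio of the user's mean past
    silence (response output to next input) to a reference silence.
    """
    if not past_silences:
        return base_wait  # no history yet: keep the base value
    return base_wait * mean(past_silences) / reference
```

A user whose past silences average 4 seconds gets twice the base waiting time under these assumed parameters, while a user who answers within 1 second gets half of it.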

As described above, the response generation device and program of the present invention have the effect that a response sentence can be output at an appropriate timing so that, when the user is about to enter an input sentence either spontaneously or in response to a response sentence from the device, the input is not obstructed by the device's output of a response sentence.

Embodiments of the present invention are described in detail below with reference to the drawings. The embodiments describe a case in which the present invention is applied to a response generation device that takes an utterance from the user as input, executes predetermined processing, and produces voice output.

As shown in FIG. 1, the response generation device 10 according to the first embodiment includes a microphone 12 that picks up user utterances and generates a speech signal, a speaker 14 that produces voice output, and a computer 16 that is connected to the microphone 12 and the speaker 14 and executes predetermined processing for outputting a response sentence at an appropriate timing.

The computer 16 includes a CPU that controls the response generation device 10 as a whole, a ROM serving as a storage medium that stores various programs including the programs for the response generation processing and intention estimation processing described later, a RAM that temporarily stores data as a work area, an HDD serving as storage means in which various information is stored, an I/O (input/output) port, and a bus connecting these. The microphone 12 and the speaker 14 are connected to the I/O port.

Described in terms of functional blocks divided by function-implementing means determined by hardware and software, the computer 16 can be represented, as shown in FIG. 1, by a configuration that includes: a language analysis unit 20 that performs speech recognition on the speech signal input from the microphone 12 and uses a general-purpose morphological analyzer to perform morphological analysis on the character string information representing the recognized user utterance; an intention estimation unit 22 that estimates the intention of the user utterance based on the analysis result of the language analysis unit 20; a response generation unit 24 that generates a response sentence based on a response candidate type associated with the intention estimated by the intention estimation unit 22; an output timing measurement unit 26 that determines whether the output timing of the response sentence generated by the response generation unit 24 has arrived; and an output unit 28 that, when the output timing measurement unit 26 determines that the output timing has arrived, converts the generated response sentence into a speech signal and outputs it from the speaker 14.

As shown in FIG. 2, the intention estimation unit 22 estimates which of the following utterance intention classes the user utterance belongs to: "statement", "statement-type question", "Y/N question", "statement-type answer", "Y/N answer", "acknowledgment", or "other". This classification of user utterance intentions is defined on the basis of moves in discourse analysis (structural units that carry a discourse function, such as a question or an answer). To classify a user utterance into one of these intentions, a classifier that best represents the linguistic features of each class is designed for the classification task of sorting user utterances into multiple classes by their linguistic features.

When the utterance intention class is "statement-type question", the utterance is a sentence that asks about something, such as "What time is it now?", and takes the form of an interrogative sentence. A "Y/N question" likewise takes the form of an interrogative sentence, such as "Can you make gratin?", and asks for a Yes or No. Utterances other than "statement-type question" and "Y/N question" generally take the form of a declarative sentence. When the utterance intention class is "statement", the utterance is a sentence with definite semantic content, such as "My friend is coming over", and is characterized by containing a clause. Even when the verb is omitted, as in "My friend, over to visit", the sentence obtained by completing the omitted part ("is coming") contains a clause, so the utterance intention class is still "statement".

When the utterance intention class is "statement-type answer", the utterance is an answer such as "They made me gratin" in reply to the question "What did they make for you?". It answers a 5W1H-type question in a form other than Yes or No and is characterized by containing a clause. Even when the verb is omitted, as in the reply "Gratin" to the same question, the sentence obtained by completing the omitted part ("they made me") contains a clause, so the utterance intention class is still "statement-type answer". When the utterance intention class is "Y/N answer", the utterance is an answer such as "That's right." in reply to the question "Is your friend coming over?", that is, a sentence that answers a Yes/No question in the form of Yes or No.

When the utterance intention is "acknowledgment", the utterance is a sentence that expresses agreement, disagreement, or understanding, such as "I see", "No", or "Is that so". When the utterance intention class is "other", the utterance is, for example, a sentence corresponding to laughter in reply to the other party's remark "I always bring good weather with me, you know", or a sentence such as "Uh-huh".

The response generation unit 24 generates a response sentence based on a response candidate type selected from among the response candidate types associated with the utterance intention class estimated by the intention estimation unit 22.

As shown in FIG. 3, when the utterance intention class is "statement", the response candidate types "acknowledgment", "repetition", "confirmation", "probing question", and "statement" are associated with it. Similarly, "statement-type answer" is associated with the utterance intention class "statement-type question", "Y/N answer" with "Y/N question", "acknowledgment" with "statement-type answer", "acknowledgment" with "Y/N answer", and "acknowledgment" and "statement" with "acknowledgment". These associations are stored in a predetermined storage area. When multiple response candidate types are associated with an utterance intention class, one or more response candidate types are selected, for example by choosing at random from among them or by choosing a response candidate type that has not appeared in the past utterance history.
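The association and selection described above can be sketched in Python. The dictionary reproduces the associations listed in the text. The function name, the English class labels, and the rule of preferring unused types and falling back to a random choice are assumptions that combine the two selection strategies the text mentions. The class "other" is left unmapped because the text does not list response candidate types for it.

```python
import random

# Utterance intention class -> associated response candidate types.
RESPONSE_TYPES = {
    "statement": ["acknowledgment", "repetition", "confirmation",
                  "probing question", "statement"],
    "statement-type question": ["statement-type answer"],
    "Y/N question": ["Y/N answer"],
    "statement-type answer": ["acknowledgment"],
    "Y/N answer": ["acknowledgment"],
    "acknowledgment": ["acknowledgment", "statement"],
}


def select_response_type(intent, history, rng=random):
    """Pick one response candidate type for the estimated intent.

    history: response candidate types already used in the dialogue.
    Types absent from the history are preferred; if all have been used,
    the choice is made at random among all candidates (assumed rule).
    Returns None when no candidate type is associated with the intent.
    """
    candidates = RESPONSE_TYPES.get(intent, [])
    if not candidates:
        return None
    unused = [t for t in candidates if t not in history]
    return rng.choice(unused or candidates)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible when testing.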

Based on the selected response candidate type, the response generation unit 24 generates a response sentence such as those shown under "response sentence example" in FIG. 3. Specifically, when the user utterance is "My friend is coming over.", the utterance intention class is estimated to be "statement". If "acknowledgment" is selected as the response candidate type, a response sentence such as "I see." is generated. If the response candidate type is "repetition", a response sentence that repeats the content of the user utterance, such as "So your friend is coming over.", is generated. If the response candidate type is "confirmation", a response sentence that confirms the content of the user utterance, such as "Is your friend coming over?", is generated. If the response candidate type is "probing question", a response sentence that asks about something not contained in the user utterance, such as "When are they coming over?", is generated. If the response candidate type is "statement", a response sentence that offers a new remark in return, such as "I want a friend too.", is generated.

Response sentences can be generated using well-known techniques, for example by fitting words extracted from the user utterance into a format defined in advance for each response candidate type, or by following response sentence generation rules defined in advance according to the attributes of the words contained in the user utterance.
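The first technique, fitting extracted words into a per-type format, can be sketched as follows. The English templates, the slot names `subject`, `action`, and `wanted`, and the function name are all hypothetical. Extracting the slot values from the morphological analysis result is outside the scope of this sketch.

```python
# Hypothetical English templates, one per response candidate type.
TEMPLATES = {
    "acknowledgment": "I see.",
    "repetition": "So {subject} is {action}.",
    "confirmation": "Is {subject} {action}?",
    "probing question": "When is {subject} {action}?",
    "statement": "I want {wanted} too.",
}


def generate_response(response_type: str, slots: dict) -> str:
    """Fill the template for response_type with words from the utterance.

    slots maps slot names to words taken from the user utterance; slots a
    template does not use are ignored by str.format.
    """
    return TEMPLATES[response_type].format(**slots)


# Slots as they might be extracted from "My friend is coming over."
example_slots = {"subject": "your friend", "action": "coming over",
                 "wanted": "a friend"}
```

Calling `generate_response("confirmation", example_slots)` fills the confirmation template with the words taken from the utterance.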

The output timing measurement unit 26 measures how long the state in which no user utterance is input continues, and when that duration exceeds the "response sentence output waiting time", it determines that the output timing of the response sentence has arrived. If the next utterance is input by the user before the response sentence output waiting time elapses, measurement of the duration is interrupted and the output timing is never judged to have arrived.
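The behavior of the output timing measurement unit 26 can be sketched as a small timer object. The class and method names are assumptions, and the clock is injectable so that the logic can be exercised without real waiting; a real device would more likely be driven by audio input events.

```python
import time


class OutputTimer:
    """Tracks silence after a user utterance (illustrative sketch)."""

    def __init__(self, wait_time: float, clock=time.monotonic):
        self.wait_time = wait_time   # response sentence output waiting time
        self.clock = clock           # injectable time source, in seconds
        self.started = clock()       # moment the user utterance ended
        self.interrupted = False

    def on_user_utterance(self) -> None:
        """The user spoke again: stop measuring, never signal output."""
        self.interrupted = True

    def is_output_time(self) -> bool:
        """True once the silence has lasted for the waiting time."""
        if self.interrupted:
            return False
        return self.clock() - self.started >= self.wait_time
```

A caller would poll `is_output_time()` and hand the pending response sentence to the output unit only when it returns true.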

The response sentence output waiting time used for this determination is defined for each combination of the estimated utterance intention class and a response candidate type associated with it. For example, as shown in FIG. 3, for the combinations of the utterance intention class "statement" with each of the response candidate types "acknowledgment", "repetition", and "confirmation", the response sentence output waiting time is set to "0 (no waiting time)", and for the combinations of the utterance intention class "statement" with each of the response candidate types "probing question" and "statement", it is set to "T1 (for example, 7 seconds)". The waiting time is set to T1 when a response sentence based on "probing question" or "statement" is output for a user utterance of the class "statement" because such a response takes the dialogue in a different direction, for instance by asking about something not contained in the user's utterance or by offering a new remark in return. This avoids the situation in which, although the user is about to continue speaking spontaneously, the device outputs a response sentence that steers the dialogue elsewhere and disrupts its smooth flow.
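The waiting times given above can be held in a lookup table keyed by the combination of utterance intention class and response candidate type. Only the five combinations for the class "statement" are stated in the text; the default of 0 seconds for other combinations, the variable names, and the function name are assumptions, since the remaining entries of FIG. 3 are not reproduced here.

```python
T1 = 7.0  # seconds; the text gives 7 seconds as an example value

# (utterance intention class, response candidate type) -> waiting time.
WAIT_TIMES = {
    ("statement", "acknowledgment"): 0.0,
    ("statement", "repetition"): 0.0,
    ("statement", "confirmation"): 0.0,
    ("statement", "probing question"): T1,
    ("statement", "statement"): T1,
}


def output_wait_time(intent: str, response_type: str,
                     default: float = 0.0) -> float:
    """Look up the response sentence output waiting time.

    Combinations not listed fall back to `default` (an assumption).
    """
    return WAIT_TIMES.get((intent, response_type), default)
```

Keeping the table as data rather than code makes it straightforward to tune T1 or add further combinations without touching the control logic.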

Next, the response generation processing routine in the response generation device 10 of the first embodiment is described with reference to FIG. 4. This routine is performed by the CPU executing the response generation program stored in the ROM.

In step 100, it is determined whether a user utterance has been input from the microphone 12. If a user utterance has been input, the routine proceeds to step 102; if not, the determination in this step is repeated until one is input. Here, it is assumed that the user utterance "My friend is coming over" has been input.

In step 102, speech recognition is performed on the audio signal representing the input user utterance to obtain character string information, and morphological analysis is performed on that character string information. Next, in step 104, intention estimation processing is executed based on the result of the morphological analysis.

The intention estimation processing routine is now described with reference to FIG. 5.

In step 200, it is determined, based on the result of the morphological analysis, whether the intention of the user utterance is "question". If the sentence representing the user utterance is in interrogative form, it is determined to be a "question" and the process proceeds to step 202; otherwise, the process proceeds to step 208.

In step 202, it is determined whether the interrogative sentence representing the user utterance is a "Y/N question", that is, one answered with Yes or No. If it is, the process proceeds to step 204 and "Y/N question" is stored in the variable i. If it is not, the utterance is determined to be a "statement-type question", and the process proceeds to step 206, where "statement-type question" is stored in the variable i.

If it is determined in step 200 that the utterance intention is not "question" and the process proceeds to step 208, it is determined whether the intention of the user utterance is "statement-type". If the sentence representing the user utterance, or a sentence in which the omitted parts of the user utterance have been supplemented, contains a clause, the utterance is determined to be "statement-type" and the process proceeds to step 210; otherwise, the process proceeds to step 216.

In step 210, it is determined whether the intention of the user utterance is "statement answer". The previous response sentence, which the device output before the user utterance was input, is referenced. If the previous response sentence is an interrogative sentence, the utterance is determined to be a "statement answer" to it, and the process proceeds to step 212, where "statement answer" is stored in the variable i. Otherwise, the process proceeds to step 214, where "statement" is stored in the variable i.

If it is determined in step 208 that the utterance intention is not "statement-type" and the process proceeds to step 216, it is determined whether the intention of the user utterance is "Y/N answer". The previous response sentence, which the device output before the user utterance was input, is referenced. If the previous response sentence is an interrogative sentence, the utterance is determined to be a "Y/N answer" to it, and the process proceeds to step 218, where "Y/N answer" is stored in the variable i. Otherwise, the process proceeds to step 220, where "acknowledgment" is stored in the variable i.

Here, the user utterance "A friend is coming over to visit" is a declarative sentence, so the determination in step 200 is negative. It contains a clause, so the determination in step 208 is affirmative. It is not an answer to a previous response sentence, so the determination in step 210 is negative, and i = "statement" is stored. Next, in step 222, the variable i indicating the estimated utterance intention class is output, and the routine returns.
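The branching of steps 200 to 222 can be sketched as a small decision function. This is only an illustration under stated assumptions: the four boolean features are assumed to have been derived already from the morphological analysis and the dialogue history, and the names `UtteranceFeatures` and `estimate_intention` are hypothetical. How each feature is actually computed is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass
class UtteranceFeatures:
    """Hypothetical pre-computed features of one user utterance."""
    is_interrogative: bool            # sentence is in interrogative form
    is_yes_no_form: bool              # question answerable with Yes or No
    contains_clause: bool             # after supplementing omitted parts
    prev_response_was_question: bool  # device's previous response was a question


def estimate_intention(f: UtteranceFeatures) -> str:
    """Return the utterance intention class i, following steps 200 to 220."""
    if f.is_interrogative:                                   # step 200
        if f.is_yes_no_form:                                 # step 202
            return "Y/N question"                            # step 204
        return "statement-type question"                     # step 206
    if f.contains_clause:                                    # step 208
        if f.prev_response_was_question:                     # step 210
            return "statement answer"                        # step 212
        return "statement"                                   # step 214
    if f.prev_response_was_question:                         # step 216
        return "Y/N answer"                                  # step 218
    return "acknowledgment"                                  # step 220
```

For the worked example above (declarative, contains a clause, not an answer to a previous question), the function returns "statement".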

Returning to the response generation processing (FIG. 4), in step 106, one or more response candidate types are selected from among those corresponding to the utterance intention class i output by the intention estimation processing. Here, since i = "statement", one or more of the response candidate types "acknowledgment", "repetition", "confirmation", "probing question", and "statement" are selected with reference to FIG. 3. When multiple response candidate types are selected, it is preferable to combine a type whose response sentence output waiting time is "0 (no waiting)" with a type whose waiting time is "T1", rather than to select several types whose waiting time is "0 (no waiting)". Here, "acknowledgment" and "probing question" are selected.

Next, in step 108, response sentences based on the selected response candidate types are generated. For the response candidate type "acknowledgment", a response sentence such as "I see" is generated; for the response candidate type "probing question", a response sentence such as "When are they coming over?" is generated.

Next, in step 110, it is determined whether any of the selected response candidate types has a response sentence output waiting time of "0 (no waiting)". If so, the process proceeds to step 112, where, from among the response sentences generated in step 108, the response sentence "I see", which is based on the response candidate type whose waiting time is "0 (no waiting)", is converted by speech synthesis and output as voice from the speaker 14; the process then proceeds to step 114. If none of the selected response candidate types has a waiting time of "0 (no waiting)", the process proceeds directly to step 114.

In step 114, it is determined whether any of the selected response candidate types has a response sentence output waiting time of "T1". If so, the process proceeds to step 116; if not, the processing ends.

In step 116, it is determined whether the user has input a next utterance. If a next utterance has been input, the process returns to step 102 without outputting the response sentence generated in step 108 for the response candidate type whose response sentence output waiting time is "T1". If no next utterance has been input, the process proceeds to step 118, where it is determined whether the period during which no utterance has been input, measured from the input of the previous user utterance, has reached T1. If T1 has not elapsed, the process returns to step 116 and repeats. If T1 has elapsed, it is determined that the output timing has arrived for the response sentence based on the response candidate type whose waiting time is "T1", and the process proceeds to step 120. There, from among the response sentences generated in step 108, the response sentence "When are they coming over?", which is based on the response candidate type whose waiting time is "T1", is converted by speech synthesis and output as voice from the speaker 14, and the processing ends.
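The flow of steps 110 to 120 can be illustrated with a small simulation that avoids real timers. It is a sketch under assumptions, not the routine of FIG. 4 itself: the function name `respond` and the parameter `user_speaks_after` (seconds of silence before the user's next utterance, or `None` if the user stays silent) are hypothetical stand-ins for the microphone polling loop, and an utterance arriving exactly when the waiting time expires is assumed not to suppress the response.

```python
def respond(candidates, user_speaks_after=None):
    """Return the response sentences that would be output, in order.

    candidates: list of (sentence, wait_seconds) pairs from steps 106-108.
    user_speaks_after: seconds of silence before the next user utterance,
        or None if the user stays silent (hypothetical test input).
    """
    # Steps 110-112: responses with no waiting time are output at once.
    spoken = [sentence for sentence, wait in candidates if wait == 0]
    # Steps 114-120: delayed responses are output only if silence persists.
    for sentence, wait in candidates:
        if wait == 0:
            continue
        if user_speaks_after is not None and user_speaks_after < wait:
            break  # step 116: a next utterance arrived, so suppress the response
        spoken.append(sentence)  # step 120: waiting time elapsed in silence
    return spoken
```

With the example candidates `[("I see", 0), ("When are they coming over?", 7.0)]`, a silent user hears both sentences, whereas a user who speaks again after 3 seconds hears only "I see".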

As described above, according to the response generation device of the first embodiment, when a response sentence that steers the dialogue in a different direction is to be output, for example one that asks about matters not contained in the user's utterance or introduces a new rejoinder, a waiting time determined by the combination of the utterance intention and the response candidate type is provided before the response sentence is output, so that the response sentence can be output at an appropriate timing. This prevents the device from disrupting a smooth dialogue by outputting a response sentence while the user is still trying to continue speaking spontaneously.

In the first embodiment, the case has been described in which a response sentence based on the selected response candidate type is generated first and then output once the output timing arrives. Alternatively, the response sentence based on the selected response candidate type may be generated and output after the output waiting time corresponding to that type has elapsed.

Also, in the first embodiment, the case has been described in which the response sentence output waiting time is either "0 (no waiting)" or "T1". However, the waiting time may be set more finely in three or more levels, or a different waiting time may be set for each combination of utterance intention class and response candidate type.

Also, the case has been described in which the utterance intention is classified into one of "statement-type question", "Y/N question", "statement answer", "statement", "Y/N answer", and "acknowledgment". However, because fillers, murmurs, and other utterances whose intention cannot be identified may be input as user utterances, a class "other" may be added. In that case, an utterance that cannot be classified into any of the above classes is determined to be "other".

Next, a second embodiment is described. The second embodiment differs from the first embodiment in that, after the device outputs a response sentence, an utterance waiting time is provided during which the device waits for an utterance from the user. Configurations and processes that are the same as in the first embodiment are denoted by the same reference numerals, and their description is omitted.

As shown in FIG. 6, the response generation device 110 according to the second embodiment includes a microphone 12 that picks up user utterances and generates an audio signal, a speaker 14 that outputs voice, and a computer 16 that is connected to the microphone 12 and the speaker 14 and executes predetermined processing for outputting response sentences at an appropriate timing.

When the computer 16 is described in terms of functional blocks divided according to function-realizing means determined by hardware and software, it can be represented, as shown in FIG. 6, by a configuration including: a language analysis unit 20; an intention estimation unit 22; a response candidate generation unit 124 that generates response sentence candidates based on the response candidate types associated with the intention estimated by the intention estimation unit 22; a response candidate storage unit 30 that temporarily stores the response sentence candidates generated by the response candidate generation unit 124; an utterance waiting time measurement unit 126 that measures the waiting time for a user utterance after a response sentence is output and determines whether the output timing of the next response sentence has arrived; and an output unit 28 that, when the utterance waiting time measurement unit 126 determines that the output timing has arrived, selects one of the response sentence candidates stored in the response candidate storage unit 30, converts it into an audio signal, and outputs it from the speaker 14.

The response candidate generation unit 124 generates response sentence candidates based on the response candidate types associated with the utterance intention class estimated by the intention estimation unit 22. The response candidate types associated with each utterance intention class, and the example response sentences based on each response candidate type, are the same as in the first embodiment.

The response candidate generation unit 124 also determines one response sentence candidate, selected from among the generated candidates, as the response sentence to be output, and sends it to the output unit 28. The candidates that were not selected are stored in the response candidate storage unit 30. The one candidate is selected from the generated candidates by, for example, choosing at random or choosing a candidate that has not appeared in the past utterance history.

The utterance waiting time measurement unit 126 measures how long the state in which no user utterance is input continues after the first response sentence is output. When that duration reaches the "utterance waiting time", the unit determines that the output timing of the next response sentence has arrived. If the user inputs a next utterance before the utterance waiting time elapses, the measurement is interrupted, and the output timing of the next response sentence is not deemed to have arrived.

The utterance waiting time used for this determination is defined for each combination of the estimated utterance intention class and a response candidate type associated with that class. For example, as shown in FIG. 7, for combinations of the utterance intention class "statement" with each of the response candidate types "acknowledgment", "repetition", and "statement", the utterance waiting time is "T2". For combinations of the utterance intention class "statement" with each of the response candidate types "confirmation" and "probing question", the utterance waiting time is "T3". T3 is set longer than T2, for example T2 = 8 seconds and T3 = 15 seconds. This is because a response sentence of the latter types asks the user for an answer, so the user should be given sufficient time to think of one. T3 may also be set to ∞ so that the device waits until the user's next utterance. This prevents the device from disrupting a smooth dialogue by outputting the next response sentence while the user is still thinking about an answer to the response sentence already output.
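For illustration only, the FIG. 7 assignment for the "statement" class can be sketched as a two-tier lookup in which T2 and T3 are parameters, so that T3 can also be set to infinity. The key strings are illustrative English renderings of the type names (for example, `probing_question` for 深堀質問); `UTTERANCE_WAIT_TIER` and `utterance_wait` are hypothetical names, and the default values are the example values of 8 and 15 seconds given above.

```python
import math

# Hypothetical encoding of the FIG. 7 rows for the "statement" class.
UTTERANCE_WAIT_TIER = {
    ("statement", "acknowledgment"): "T2",
    ("statement", "repetition"): "T2",
    ("statement", "statement"): "T2",
    ("statement", "confirmation"): "T3",       # asks the user for an answer
    ("statement", "probing_question"): "T3",   # asks the user for an answer
}


def utterance_wait(intent_class, candidate_type, t2=8.0, t3=15.0):
    """Return the utterance waiting time in seconds for one combination."""
    if not t3 > t2:
        raise ValueError("T3 must be longer than T2")
    tier = UTTERANCE_WAIT_TIER[(intent_class, candidate_type)]
    return t2 if tier == "T2" else t3
```

Passing `t3=math.inf` models the variant in which the device waits until the user's next utterance.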

Next, the response generation processing routine of the response generation device 110 of the second embodiment is described with reference to FIG. 8. This routine is performed by the CPU executing a response generation program stored in the ROM.

In step 100, it is determined whether a user utterance has been input. Next, in step 102, speech recognition is performed on the audio signal representing the input user utterance to obtain character string information, and morphological analysis is performed on that character string information. Next, in step 104, intention estimation processing is executed based on the result of the morphological analysis, and the utterance intention class i is output. Here, as in the first embodiment, it is assumed that the utterance intention class i = "statement" has been output for the user utterance "A friend is coming over to visit".

Next, in step 300, response sentence candidates are generated based on the response candidate types corresponding to the utterance intention class i output by the intention estimation processing. If multiple response candidate types are associated with the utterance intention class i, response sentence candidates are generated for all of them. Here, based on each of the response candidate types "acknowledgment", "repetition", "confirmation", "probing question", and "statement" corresponding to i = "statement", response sentence candidates such as "I see" (acknowledgment), "So a friend is coming over" (repetition), "A friend is coming over?" (confirmation), "When are they coming over?" (probing question), and "I wish I had friends too" (statement) are generated.

Next, in step 302, one response sentence candidate is selected from among those generated in step 300, converted by speech synthesis, and output as voice from the speaker 14. Here, it is assumed that "When are they coming over?" (probing question) is selected and output. Next, in step 304, the remaining response sentence candidates that were not output are temporarily stored in the response candidate storage unit 30.

Next, in step 306, it is determined whether the user has input a next utterance. If a next utterance has been input, the process returns to step 102 without outputting the response sentence candidates stored in step 304. If no next utterance has been input, the process proceeds to step 308.

In step 308, it is determined whether the period during which no utterance has been input, measured from the output of the first response sentence in step 302, has reached the utterance waiting time associated with the response candidate type of the response sentence that was output. Here, this is the utterance waiting time "T3" of the response candidate type "probing question" for the utterance intention class i = "statement". If T3 has not elapsed, the process returns to step 306 and repeats. If T3 has elapsed, it is determined that the output timing of the next response sentence has arrived, and the process proceeds to step 310.

In step 310, it is determined whether any response sentence candidate is stored in the response candidate storage unit 30. If so, the process proceeds to step 312, where one response sentence candidate, for example "I wish I had friends too" (statement), is selected from the response candidate storage unit 30 and output, and the process returns to step 306. If no response sentence candidate is stored in the response candidate storage unit 30, the process proceeds to step 314, where a prompting response sentence that encourages the user to speak is generated and output, and the processing ends. Examples of prompting response sentences are "I'd like to hear more", "Is there anything else?", "And then, and then?", and "May I tell you about myself?".
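Steps 302 to 314 can be illustrated by the following simulation, which again replaces real timers with a single test input. Several points are assumptions of the sketch rather than statements of the specification: the timer is taken to restart after each output and to use the utterance waiting time of the most recently output sentence, stored candidates are taken in list order, and the names `run_turn` and `user_speaks_at` (seconds after the first output at which the user speaks, or `None`) are hypothetical.

```python
import math


def run_turn(candidates, first_index, user_speaks_at=None,
             prompt="Is there anything else?"):
    """Return the sentences the device would output, in order.

    candidates: list of (sentence, utterance_wait_seconds) pairs (step 300).
    first_index: index of the candidate output first (step 302).
    """
    remaining = list(candidates)
    sentence, wait = remaining.pop(first_index)  # steps 302 and 304
    outputs = [sentence]
    deadline = wait  # time at which the next response becomes due
    while True:
        # Steps 306-308: stop if the user speaks before the deadline.
        if user_speaks_at is not None and user_speaks_at < deadline:
            return outputs
        if math.isinf(deadline):
            return outputs  # infinite wait: nothing further is output
        if not remaining:  # step 310: no stored candidates are left
            outputs.append(prompt)  # step 314: prompting response
            return outputs
        sentence, wait = remaining.pop(0)  # step 312
        outputs.append(sentence)
        deadline += wait  # assumed: timer restarts after each output
```

For example, with the probing question output first and a user who stays silent, the stored candidates are output one by one and the prompting response comes last.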

As described above, according to the response generation device of the second embodiment, when the content of the response sentence to be output asks the user for an answer, a waiting time determined by the combination of the utterance intention and the response candidate type is provided after the response sentence is output. This gives the user sufficient time to think of an answer and allows the next response sentence to be output at an appropriate timing. It therefore prevents the device from disrupting a smooth dialogue by outputting the next response sentence while the user is still thinking about an answer to the response sentence already output.

In the second embodiment, the case has been described in which, when there are multiple response candidate types corresponding to the utterance intention class, response sentence candidates are generated for all of them. Alternatively, some of the response candidate types may be selected and response sentence candidates generated only for those. For example, for the utterance intention class i = "statement", the response candidate types "acknowledgment" and "probing question" may be selected, and "I see" and "When are they coming over?" generated as the response sentence candidates.

Also, in the second embodiment, the case has been described in which response sentence candidates based on the response candidate types are generated and then stored in the response candidate storage unit. Alternatively, the response candidate types corresponding to the utterance intention class may be stored temporarily, and when the output timing arrives, one of the stored response candidate types may be selected and a response sentence based on it generated and output.

Also, in the second embodiment, the case has been described in which the utterance waiting time is either "T2" or "T3". However, the waiting time may be set more finely in three or more levels, or a different waiting time may be set for each combination of utterance intention class and response candidate type.

Next, a third embodiment is described. The third embodiment differs from the first and second embodiments in that the response sentence output waiting time and the utterance waiting time are used together, and both are changed dynamically. Configurations and processes that are the same as in the first and second embodiments are denoted by the same reference numerals, and their description is omitted.

As shown in FIG. 9, the response generation device 210 according to the third embodiment includes a microphone 12 that picks up user utterances and generates an audio signal, a speaker 14 that outputs voice, and a computer 16 that is connected to the microphone 12 and the speaker 14 and executes predetermined processing for outputting response sentences at an appropriate timing.

When the computer 16 is described in terms of functional blocks divided according to function-realizing means determined by hardware and software, it can be represented, as shown in FIG. 9, by a configuration including a language analysis unit 20, an intention estimation unit 22, a response candidate generation unit 124, a response candidate storage unit 30, an output timing measurement unit 26, an utterance waiting time measurement unit 126, an output unit 28, and a waiting time control unit 32 that performs control to dynamically change the response sentence output waiting time and the utterance waiting time.

In the third embodiment, when one response sentence candidate selected from the generated response sentence candidates, or from those stored in the response candidate storage unit 30, is to be output, it is output after waiting for the response sentence output waiting time of the response candidate type corresponding to that candidate.

For example, suppose that response sentence candidates based on the response candidate types "acknowledgment", "repetition", "confirmation", "probing question", and "statement" are generated for the user utterance "A friend is coming over to visit" (i = "statement"), and that "When are they coming over?" (probing question) is to be output first. In the second embodiment, the first response sentence was output as soon as it was selected. Here, it is output only after the response sentence output waiting time "T1" of "probing question" for i = "statement" has elapsed. If T1 elapses without a next utterance from the user, then, for example, "I wish I had friends too" (statement) is selected from the remaining response sentence candidates and output.

When the second and subsequent response sentences are output, the response sentence output waiting time of the response sentence about to be output conflicts with the utterance waiting time of the previously output response sentence. In the above example, the utterance waiting time "T3" is set for the first response sentence "When are they coming over?" (probing question), and the response sentence output waiting time "T1" is set for the second response sentence "I wish I had friends too" (statement). In this case, the time from outputting "When are they coming over?" to outputting "I wish I had friends too" can be determined by giving priority to T1, by giving priority to T3, by taking the average of T1 and T3, or the like. However, in view of the aim of not obstructing the user's utterance with a response sentence output from the device, it is preferable to select the longer of the two waiting times.
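The alternatives for resolving this conflict can be written down as a small helper. The sketch is illustrative only: the function name `gap_before_next_response` and the policy labels are hypothetical, and the default follows the stated preference for the longer waiting time.

```python
def gap_before_next_response(prev_utterance_wait, next_output_wait,
                             policy="longer"):
    """Return the seconds to wait between two consecutive device responses.

    prev_utterance_wait: utterance waiting time of the response already
        output (T3 in the example above).
    next_output_wait: response sentence output waiting time of the response
        about to be output (T1 in the example above).
    """
    if policy == "longer":        # preferred: never cut the user off early
        return max(prev_utterance_wait, next_output_wait)
    if policy == "output_wait":   # give priority to T1
        return next_output_wait
    if policy == "utterance_wait":  # give priority to T3
        return prev_utterance_wait
    if policy == "average":       # mean of the two waiting times
        return (prev_utterance_wait + next_output_wait) / 2
    raise ValueError(f"unknown policy: {policy}")
```

With the example values T3 = 15 seconds and T1 = 7 seconds, the default policy yields a gap of 15 seconds.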

The waiting time control unit 32 dynamically changes the response sentence output waiting time and the utterance waiting time. For example, the waiting times are set relatively long at the start of a dialogue, when the user is not yet accustomed to it, and are changed to values shorter than the initial ones once a predetermined time has elapsed since the start of the dialogue. Alternatively, a history of silence durations in dialogue may be kept for each user; the user may be identified, for example by using a captured image of the user's face, and the waiting times may be determined by referring to that user's silence history. Specifically, T2 can be the average silence duration before the user's utterance following a response sentence based on the response candidate types "statement", "repetition", "statement answer", "Y/N answer", and "acknowledgment"; T3 can be the average silence duration before the user's utterance following a response sentence based on the response candidate types "confirmation" and "probing question"; and T1 can be the average silence duration over the entire dialogue.
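The history-based determination of T1, T2, and T3 amounts to three averages over the same log. The following is a minimal sketch, assuming the silence history is available as a list of (response candidate type, silence in seconds) pairs for one identified user. The record format, the type-name strings, and the name `waits_from_history` are assumptions of the sketch.

```python
from statistics import mean

# Response candidate types whose following silences feed T2 and T3.
T2_TYPES = {"statement", "repetition", "statement answer",
            "Y/N answer", "acknowledgment"}
T3_TYPES = {"confirmation", "probing_question"}


def waits_from_history(history):
    """Derive T1, T2 and T3 (seconds) from one user's silence history.

    history: list of (response_candidate_type, silence_seconds) pairs, where
        the silence is measured from a device response to the next utterance.
    Raises statistics.StatisticsError if a group has no samples.
    """
    t2_samples = [s for t, s in history if t in T2_TYPES]
    t3_samples = [s for t, s in history if t in T3_TYPES]
    all_samples = [s for _, s in history]
    return {
        "T1": mean(all_samples),  # average over the entire dialogue
        "T2": mean(t2_samples),
        "T3": mean(t3_samples),
    }
```

A practical implementation would also need a fallback for users with little or no history, which the sketch leaves out.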

As described above, according to the response generation device of the third embodiment, dynamically changing the waiting times makes it possible to output response sentences at an appropriate timing even when the user has become accustomed to dialogue with the device, or when there are individual differences such as whether the user tends to think carefully before responding or to answer immediately.

In the first to third embodiments, the case where the response is output as voice from a speaker has been described as an example, but the invention is not limited to this, and the response sentence may instead be displayed on a display. Likewise, although the case where the user's voice is input to a microphone has been described as an example, the user may instead enter text as the input sentence using a keyboard or the like.

A block diagram showing the schematic configuration of the response generation device according to the first embodiment.
A table showing utterance intention classes and utterance examples.
A table showing utterance intention classes, response candidate types, response sentence output waiting times, and response sentence examples.
A flowchart showing the response generation processing routine of the first embodiment.
A flowchart showing the intention estimation processing routine.
A block diagram showing the schematic configuration of the response generation device according to the second embodiment.
A table showing utterance intention classes, response candidate types, utterance waiting times, and response sentence examples.
A flowchart showing the response generation processing routine of the second embodiment.
A block diagram showing the schematic configuration of the response generation device according to the third embodiment.

Explanation of symbols

10, 110, 210 Response generation device
12 Microphone
14 Speaker
16 Computer
20 Language analysis unit
22 Intention estimation unit
26 Output timing measurement unit
28 Output unit
30 Response candidate storage unit
32 Waiting time control unit
124 Response candidate generation unit
126 Utterance waiting time measurement unit

Claims

1. A response generation device comprising:
input means for inputting an input sentence from a user;
intention estimation means for estimating the intention expressed by the input sentence from an analysis result obtained by analyzing the structure of the input sentence input by the input means;
response generation means for generating at least one response sentence corresponding to the intention of the input sentence estimated by the intention estimation means; and
control means for performing control so that the response sentence generated by the response generation means is output after a waiting time, determined by the combination of the intention estimated by the intention estimation means and the response sentence generated by the response generation means, has elapsed, and so that the response sentence generated by the response generation means is not output when the next input sentence is input by the input means before the waiting time elapses.

2. A response generation device comprising:
input means for inputting an input sentence from a user;
intention estimation means for estimating the intention expressed by the input sentence from an analysis result obtained by analyzing the structure of the input sentence input by the input means;
response generation means for generating at least one response sentence corresponding to the intention of the input sentence estimated by the intention estimation means; and
control means for performing control so that one response sentence is output from the at least one response sentence generated by the response generation means; so that, when an unoutput response sentence exists, another response sentence is output from the unoutput response sentences after a waiting time, determined by the combination of the intention estimated by the intention estimation means and the one response sentence that was output, has elapsed; and so that the other response sentence is not output when the next input sentence is input by the input means before the waiting time elapses.

3. The response generation device according to claim 2, wherein the control means performs control so as to output a response sentence that prompts input of an input sentence when no unoutput response sentence exists and the waiting time has elapsed.

4. The response generation device according to any one of claims 1 to 3, wherein the waiting time for a combination of the intention estimated by the intention estimation means with a response sentence that asks a question about, or makes a statement on, content not included in the input sentence is longer than the waiting time for a combination of that intention with a response sentence that is an answer, an acknowledgment, a repetition, or a confirmation.

5. The response generation device according to claim 1, wherein the waiting time is shortened according to the time elapsed since an input sentence was first input, or is set, based on the user's past silence times from the output of a response sentence to the input of the next input sentence, to be longer as the silence time is longer.

6. A response generation program for causing a computer to function as:
intention estimation means for estimating the intention expressed by an input sentence from an analysis result obtained by analyzing the structure of the input sentence input by input means for inputting an input sentence from a user;
response generation means for generating at least one response sentence corresponding to the intention of the input sentence estimated by the intention estimation means; and
control means for performing control so that the response sentence generated by the response generation means is output after a waiting time, determined by the combination of the intention estimated by the intention estimation means and the response sentence generated by the response generation means, has elapsed, and so that the response sentence generated by the response generation means is not output when the next input sentence is input by the input means before the waiting time elapses.

7. A response generation program for causing a computer to function as:
intention estimation means for estimating the intention expressed by an input sentence from an analysis result obtained by analyzing the structure of the input sentence input by input means for inputting an input sentence from a user;
response generation means for generating at least one response sentence corresponding to the intention of the input sentence estimated by the intention estimation means; and
control means for performing control so that one response sentence is output from the at least one response sentence generated by the response generation means; so that, when an unoutput response sentence exists, another response sentence is output from the unoutput response sentences after a waiting time, determined by the combination of the intention estimated by the intention estimation means and the one response sentence that was output, has elapsed; and so that the other response sentence is not output when the next input sentence is input by the input means before the waiting time elapses.