JP2009198871A

JP2009198871A - Voice interaction apparatus

Info

Publication number: JP2009198871A
Application number: JP2008041201A
Authority: JP
Inventors: Kazuya Shimooka (下岡 和也); Yusuke Nakano (中野 雄介)
Original assignee: Toyota Motor Corp; Toyota Central R&D Labs Inc
Current assignee: Toyota Motor Corp; Toyota Central R&D Labs Inc
Priority date: 2008-02-22
Filing date: 2008-02-22
Publication date: 2009-09-03

Abstract

PROBLEM TO BE SOLVED: To provide a voice interaction apparatus capable of continuing a dialogue without breakdown when the verb in the recognition result of an utterance about a user's action has low reliability.

SOLUTION: The voice interaction apparatus includes an inputting part 11 for inputting an utterance, a first utterance analyzing part 12 for analyzing the input utterance, a verb determining part 13 for determining whether a verb is included in the analysis result, a verb reliability determining part 14 for determining whether the verb is reliable when the analysis result is determined to include a verb, a response generating part 15 for generating a response to the utterance by using a predetermined response template and the verb when the verb is determined to be reliable, an emotion question generating part 16 for generating a question asking about the emotion corresponding to the content of the utterance by using a predetermined question template, and an outputting part 22 for outputting the generated response and question.

Description

The present invention relates to a voice dialogue apparatus and a voice dialogue program that are effective when many of the user's utterances concern the user's actions.

Techniques such as chat dialogue systems and attentive-listening dialogue systems have been proposed that recognize a user's utterance by speech recognition and output a response corresponding to the recognition result in order to converse with the user. One known example is a voice dialogue apparatus that sets a threshold on the reliability of the speech recognition result for a user utterance: if the reliability is at or above the threshold, it outputs a response corresponding to the recognition result; if the reliability is below the threshold, it rejects the recognition result and outputs a response described in an autonomous response dictionary (see, for example, Patent Document 1).

Patent Document 1: JP 2002-1967895 A

However, when the voice dialogue apparatus of Patent Document 1 obtains, for example, “I went to a hot spring on the weekend” as the recognition result of a user utterance and the reliability of that result is below the threshold, it searches the autonomous response dictionary and outputs a response. Specifically, responses such as “Say something”, “Isn't anyone there?”, or “Hey, hey” are output in reply to “I went to a hot spring on the weekend”. These are clearly inappropriate, and as a result the dialogue with the user breaks down.

The present invention has been made to solve the above problem. Its object is to provide a voice dialogue apparatus and a voice dialogue program that continue the dialogue with the user without breakdown when the verb in the recognition result of an utterance about the user's action (verb) has low reliability.

To achieve the above object, the voice dialogue apparatus according to claim 1 comprises: input means for inputting an utterance by a user; first utterance analysis means for analyzing a first utterance by the user input through the input means; verb determination means for determining whether the analysis result of the first utterance analysis means includes a verb; verb reliability determination means for determining, when the verb determination means determines that the analysis result of the utterance analysis means includes a verb, whether the verb is reliable; response generation means for generating a response to the first utterance by using a predetermined response template and the verb when the verb reliability determination means determines that the verb is reliable; emotion question generation means for generating, by using a predetermined emotion question template, a question asking about the emotion of the user's first utterance when the verb reliability determination means determines that the verb is unreliable; and output means for outputting the response generated by the response generation means and the question generated by the emotion question generation means.

According to the invention of claim 1, when the verb included in the user's utterance is unreliable, the apparatus asks a question about the user's emotion and can thereby steer the dialogue in a specific direction.

The voice dialogue apparatus according to claim 2 is the voice dialogue apparatus according to claim 1, further comprising: second utterance analysis means for analyzing the user's utterance input through the input means in reply to the question output by the output means; emotion expression storage means storing words that express emotions; emotion expression determination means for determining whether the analysis result of the second utterance analysis means includes a word expressing an emotion stored in the emotion expression storage means; emotion response generation means for generating a response sentence to the emotion expression utterance by using a predetermined emotion response template and the word expressing the emotion when the emotion expression determination means determines that the analysis result of the emotion response analysis means includes a word expressing an emotion; and backchannel generation means for generating a backchannel response to the emotion expression utterance by using a predetermined backchannel when the emotion expression determination means determines that the analysis result of the emotion response analysis means does not include a word expressing an emotion, wherein the output means outputs the response sentence generated by the emotion response generation means and the backchannel response generated by the backchannel generation means.

According to the invention of claim 2, the apparatus can continue the dialogue further by outputting a response sentence or a backchannel response to the user's answer to the question about the user's emotion.

The voice dialogue apparatus according to claim 3 is the voice dialogue apparatus according to claim 2, wherein the backchannel generation means generates a backchannel response by using a predetermined backchannel when the verb determination means determines that the analysis result of the utterance analysis means does not include a verb.

According to the invention of claim 3, when the user's first utterance does not include a verb, the apparatus can output a backchannel response and continue the conversation.

The voice dialogue program according to claim 4 causes a computer to function as: utterance analysis means for analyzing a first utterance input by a user; verb determination means for determining whether the analysis result of the utterance analysis means includes a verb; verb reliability determination means for determining, when the verb determination means determines that the analysis result of the utterance analysis means includes a verb, whether the verb is reliable; response generation means for generating a response to the first utterance by using a predetermined response template and the verb when the verb reliability determination means determines that the verb is reliable; and emotion question generation means for generating, by using a predetermined emotion question template, a question asking about the emotion corresponding to the content of the user's first utterance when the verb reliability determination means determines that the verb is unreliable.

According to the invention of claim 4, when the verb included in the user's utterance is unreliable, the program asks a question about the user's emotion and can thereby steer the dialogue in a specific direction.

As described above, the voice dialogue apparatus and voice dialogue program of the present invention can generate an appropriate response so that the dialogue with the user continues without breakdown when the verb in the recognition result of an utterance containing a verb that represents the user's action has low reliability.

Embodiments of the present invention are described in detail below with reference to the drawings. The present invention is not limited to the following embodiment and is also applicable to design variations within the scope of the claims.

FIG. 1 is a block diagram showing the main configuration of a voice dialogue apparatus according to the present invention. As shown in the figure, the voice dialogue apparatus according to the present embodiment includes an input unit 11, a first utterance analysis unit 12, a verb determination unit 13, a verb reliability determination unit 14, a response generation unit 15, an emotion question generation unit 16, a second utterance analysis unit 17, an emotion expression dictionary 18, an emotion expression determination unit 19, an emotion expression response generation unit 20, a backchannel generation unit 21, and an output unit 22.

The input unit 11 is, for example, a microphone; it picks up the user's utterance and generates an audio signal.

The first utterance analysis unit 12 refers to the words registered in a recognition dictionary database (not shown), recognizes the user utterance based on the audio signal generated by the input unit 11, selects a predetermined number (for example, 10) of top candidates in descending order of likelihood, and performs morphological analysis on them with a general-purpose morphological analyzer.

The verb determination unit 13 determines, based on the analysis result of the first utterance analysis unit 12, whether the user utterance includes a verb. It determines that a verb is included when a predetermined condition is satisfied and that no verb is included otherwise. In the present embodiment, the condition is that “eight or more of the top 10 candidates of the analysis result include a verb”.
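The condition used by the verb determination unit 13 can be sketched as follows. This is a minimal illustration, not the patented implementation: the representation of a candidate as a list of (surface form, part of speech) pairs, the tag string "verb", and the function names are all assumptions made for this example.

```python
from typing import Sequence, Tuple

# Assumed representation: one recognition candidate is a sequence of
# (surface form, part-of-speech) pairs from the morphological analyzer.
Morpheme = Tuple[str, str]


def candidate_has_verb(candidate: Sequence[Morpheme]) -> bool:
    """Return True if any morpheme in the candidate is tagged as a verb."""
    return any(pos == "verb" for _surface, pos in candidate)


def utterance_has_verb(
    candidates: Sequence[Sequence[Morpheme]], min_candidates: int = 8
) -> bool:
    """Apply the embodiment's condition: at least `min_candidates` of the
    top candidates (10 in the embodiment) must include a verb."""
    with_verb = sum(1 for c in candidates if candidate_has_verb(c))
    return with_verb >= min_candidates
```

With ten candidates, eight containing a verb satisfy the condition and seven do not, which matches the branch taken in step 104.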

When the verb determination unit 13 determines that the user utterance includes a verb, the verb reliability determination unit 14 determines whether that verb is reliable. It determines that the verb is reliable when a predetermined condition is satisfied and unreliable otherwise. In the present embodiment, the condition is that “the verb is included in seven or more of the top 10 candidates of the analysis result”.
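The reliability condition can be sketched in the same style. The sketch assumes each candidate has already been reduced to the set of verb base forms it contains; that representation, the function name, and the English stand-in verbs in the usage note are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Set


def find_reliable_verb(
    candidate_verbs: Sequence[Set[str]], min_candidates: int = 7
) -> Optional[str]:
    """Return the verb contained in at least `min_candidates` candidates,
    or None when no verb reaches that count (the verb is unreliable)."""
    counts: Counter = Counter()
    for verbs in candidate_verbs:
        # Each candidate is a set, so a verb counts once per candidate.
        counts.update(verbs)
    if not counts:
        return None
    verb, n = counts.most_common(1)[0]
    return verb if n >= min_candidates else None
```

For example, if "go" appears in eight of ten candidates it is returned as reliable, whereas if no verb appears in seven or more candidates the function returns None and the dialogue would proceed to the emotion question.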

The response generation unit 15 generates a response from the analysis result of the first utterance analysis unit 12 when the verb determination unit 13 determines that the analysis result includes a verb and the verb reliability determination unit 14 determines that the verb is reliable. For example, if the verb included in the analysis result is “go” (行く), the unit takes a template prepared in advance as shown in FIG. 2, for example 「〜んだぁ」 (“So you ...!”), combines it with the verb, matches the tense, and generates “So you went!” (行ったんだぁ) as the response.
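A minimal sketch of the template-based response follows. The source gives only the single example of 行く becoming 行ったんだぁ; the past-tense lookup table here is a hypothetical stand-in for whatever conjugation step the apparatus actually uses, and the names are illustrative.

```python
# Hypothetical base-form -> past-form table; only the 行く entry is
# taken from the example in the text.
PAST_FORMS = {"行く": "行った"}

# Template from the text, with a slot for the tense-matched verb.
RESPONSE_TEMPLATE = "{verb}んだぁ"


def generate_response(verb_base_form: str) -> str:
    """Fill the response template with the past form of the verb.

    Raises KeyError for verbs missing from the illustrative table.
    """
    past_form = PAST_FORMS[verb_base_form]
    return RESPONSE_TEMPLATE.format(verb=past_form)
```

Calling `generate_response("行く")` yields 行ったんだぁ, the response given in the example.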

The emotion question generation unit 16 generates a question asking about the user's emotion toward the content of the user utterance when the verb determination unit 13 determines that the analysis result of the first utterance analysis unit 12 includes a verb and the verb reliability determination unit 14 determines that the verb is unreliable. That is, rather than using the analysis result of the user utterance, the emotion question generation unit 16 asks about the emotion the user is presumed to have when talking about an action (verb). It generates the question from a template prepared in advance, for example “How did you feel?”.

Unlike the first utterance analysis unit 12, the second utterance analysis unit 17 refers to a recognition dictionary database (not shown) in which are registered a plurality of words that express emotions, such as “feels good” and “fun”, and a plurality of words that modify them, such as “very” and “extremely”. It recognizes the user utterance based on the audio signal generated by the input unit 11, selects the single most likely candidate, and performs morphological analysis on it with a general-purpose morphological analyzer.

As shown in FIG. 3, a plurality of words expressing emotions are registered in the emotion expression dictionary 18.

The emotion expression determination unit 19 uses the emotion expression dictionary 18 to determine whether the analysis result of the second utterance analysis unit 17 includes an emotion expression. Specifically, it determines that an emotion expression is included when the analysis result contains a word registered in the emotion expression dictionary 18, and otherwise determines that no emotion expression is included.
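The dictionary check can be sketched as a simple membership test. The sketch assumes that morphological analysis supplies base forms (so that 気持ちよかった is looked up as 気持ちよい); the text registers 気持ちよい in FIG. 3, while the 楽しい entry and the function name are assumptions for illustration.

```python
from typing import Iterable, Optional, Set

# Illustrative contents; only 気持ちよい is stated to be in FIG. 3.
EMOTION_DICTIONARY: Set[str] = {"気持ちよい", "楽しい"}


def find_emotion_expression(
    base_forms: Iterable[str], dictionary: Set[str] = EMOTION_DICTIONARY
) -> Optional[str]:
    """Return the first base form registered in the emotion expression
    dictionary, or None when the analysis result contains none."""
    for form in base_forms:
        if form in dictionary:
            return form
    return None
```

A result of None corresponds to the branch in which a backchannel response is generated instead of an emotion expression response.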

The emotion expression response generation unit 20 generates a response when the emotion expression determination unit 19 determines that the analysis result of the second utterance analysis unit 17 includes an emotion expression. The unit holds an emotion expression response generation database that stores response generation rules, including predetermined response generation templates. FIG. 4 shows this database. As shown in the figure, the database defines attribute information, such as “degree” and “time”, that the emotion represented by an emotion expression can have, together with response generation rules that produce a response such as “How much ...?” for the attribute “degree” and “When ...?” for the attribute “time”. The emotion expression response generation unit 20 identifies the attribute information that the emotion represented by the emotion expression in the analysis result can have, and generates a response asking about one of those attributes.
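The rule lookup described above can be sketched as two small tables. The English templates are loose stand-ins for the Japanese templates in FIG. 4, and the table contents, the choice of the first applicable attribute as default, and all names are assumptions made for this example.

```python
from typing import Dict, List, Optional

# Attribute -> response template (English stand-ins for FIG. 4).
RESPONSE_RULES: Dict[str, str] = {
    "degree": "How {emotion} was it?",
    "time": "When was it {emotion}?",
}

# Emotion expression -> attributes it can have (hypothetical entries).
EMOTION_ATTRIBUTES: Dict[str, List[str]] = {
    "pleasant": ["degree", "time"],
}


def generate_emotion_response(
    emotion: str, attribute: Optional[str] = None
) -> Optional[str]:
    """Ask about one attribute of the emotion; default to the first
    applicable attribute. Return None for unregistered emotions."""
    attributes = EMOTION_ATTRIBUTES.get(emotion, [])
    if not attributes:
        return None
    chosen = attribute if attribute in attributes else attributes[0]
    return RESPONSE_RULES[chosen].format(emotion=emotion)
```

Separating the attribute table from the template table keeps each template reusable across every emotion expression that shares the attribute.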

The backchannel generation unit 21 generates a backchannel response as the response to the user when the analysis result of the first utterance analysis unit 12 includes no verb, and when the analysis result of the second utterance analysis unit 17 is determined to include no emotion expression. It does so by referring to a backchannel database (not shown) in which backchannels such as “Oh?” (へぇ), “Hmm” (ふーん), “Is that so” (そうなんだ), “And then?” (それでそれで), and “Uh-huh” (ふむふむ) are registered in advance.

The output unit 22 is, for example, a speaker, and outputs the response utterance to the user as speech. The output unit 22 is not limited to speech output; it may instead display the response sentence on a screen or print it on paper.

Next, the operation of the voice dialogue apparatus of the present embodiment is described with reference to the flowchart shown in FIG. 5. In the present embodiment, “I went to a hot spring on the weekend” is taken as an example of the first user utterance.

In step 100, the input unit 11 determines whether a user utterance has been input. When a user utterance has been input and the corresponding audio signal has been generated, the process proceeds to step 102; when no user utterance has been input, this determination is repeated until one is input. In the present embodiment, “I went to a hot spring on the weekend” is input as the user utterance, the corresponding audio signal is generated, and the process proceeds to step 102.

In step 102, the first utterance analysis unit 12 recognizes the user utterance based on the audio signal generated by the input unit 11, referring to the words registered in the recognition dictionary database. In the present embodiment it selects the 10 most likely candidates and performs morphological analysis on them. FIG. 6 shows the top 10 candidates obtained as the analysis result.

In step 104, the verb determination unit 13 determines, based on the analysis result of FIG. 6, whether the user utterance includes a verb. If a verb is included, the process proceeds to step 106; if not, it proceeds to step 122. In the present embodiment, the verb determination unit 13 determines that the user utterance includes a verb when eight or more of the 10 candidates in the analysis result shown in FIG. 6 include a verb, and that it does not include a verb when seven or fewer do. In the analysis result shown in FIG. 6, all 10 candidates include a verb, so the user utterance is determined to include a verb and the process proceeds to step 106.

In step 106, the verb reliability determination unit 14 determines whether the verb included in the user utterance is reliable. If it is reliable, the process proceeds to step 126; if not, it proceeds to step 108. In the present embodiment, the verb reliability determination unit 14 determines that the verb is reliable when seven or more of the 10 candidates in the analysis result shown in FIG. 6 include that verb, and that it is unreliable when six or fewer do.

FIG. 7 shows the verbs included in the analysis result candidates of FIG. 6 and, for each verb, the number of candidates that include it. As FIG. 7 shows, none of the verbs “go” (行く), “resemble” (似る), “say” (言う), “be” (いる), and “boil” (煮る) is included in seven or more candidates, so the verb included in the user utterance is determined to be unreliable and the process proceeds to step 108.

In step 108, the emotion question generation unit 16 generates a question asking about the emotion the user is presumed to have when talking about the verb included in the utterance. Specifically, regardless of what that verb is, it generates a question such as “Oh, how did you feel?” from the template prepared in advance.

In step 110, the output unit 22 outputs the generated question.

In step 112, the input unit 11 determines whether the user's reply to the question output by the output unit 22 has been input. When a user utterance has been input and the corresponding audio signal has been generated, the process proceeds to step 114; when no user utterance has been input, this determination is repeated until one is input. In the present embodiment, assume that “It felt good” (気持ちよかったよ) is input as the user utterance; the corresponding audio signal is generated and the process proceeds to step 114.

In step 114, the second utterance analysis unit 17 recognizes the user utterance based on the audio signal generated by the input unit 11, referring to the recognition dictionary database in which words expressing emotions and words modifying them are registered, and performs morphological analysis. Because the user has been asked about their emotion, the content of the reply can be expected to be limited to expressions of emotion, which makes it possible to recognize it with a recognition dictionary database specialized to words expressing emotions and words modifying them. In the present embodiment, assume that “felt good” (気持ちよかった) is obtained as the most likely candidate of the analysis result.

In step 116, the emotion expression determination unit 19 uses the emotion expression dictionary 18 to determine whether the analysis result of the second utterance analysis unit 17 includes an emotion expression. If an emotion expression is determined to be included, the process proceeds to step 118; if not, it proceeds to step 122. In the present embodiment, “feels good” (気持ちよい) is registered as an emotion expression in FIG. 3, so the analysis result is determined to include the emotion expression “felt good” and the process proceeds to step 118.

In step 118, the emotion expression response generation unit 20 generates an emotion expression response based on the emotion expression and the response generation rules stored in the emotion expression response generation database shown in FIG. 4. In the present embodiment, the unit asks about “degree”, one of the attributes that the emotion represented by “felt good” can have according to the database, and uses the response generation template “How much ...?” to generate “How good did it feel?”.

In step 120, the output unit 22 outputs the generated emotion expression response.

On the other hand, suppose that in step 114 the most likely candidate of the analysis result of the second utterance analysis unit 17 is instead “bought a feeling” (気持ち買った).

In this case, in step 116, the emotion expression determination unit 19 uses the emotion expression dictionary 18 to determine whether this analysis result includes an emotion expression. Because “bought a feeling” is not registered in the emotion expression dictionary 18 shown in FIG. 3, the analysis result is determined not to include an emotion expression, and the process proceeds to step 122.

In step 122, the backchannel generation unit 21 selects one at random from the backchannels registered in the backchannel database and generates, for example, “Is that so”. In step 124, the output unit 22 outputs the backchannel response.
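The random selection in step 122 can be sketched as follows. The list stands in for the backchannel database, with English renderings of the examples given in the description; the function name and the injectable random generator are assumptions added so that the sketch can be tested deterministically.

```python
import random
from typing import Optional, Sequence

# Stand-in for the backchannel database (English renderings).
BACKCHANNELS: Sequence[str] = (
    "Oh?", "Hmm", "Is that so", "And then?", "Uh-huh",
)


def generate_backchannel(rng: Optional[random.Random] = None) -> str:
    """Pick one registered backchannel at random."""
    chooser = rng if rng is not None else random.Random()
    return chooser.choice(BACKCHANNELS)
```

Passing a seeded `random.Random` instance makes the choice reproducible; omitting it gives a fresh random choice on each call.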

As another example of the present embodiment, suppose that for the user utterance “I went to a hot spring on the weekend”, the first utterance analysis unit 12 selects the 10 candidates shown in FIG. 8 in step 102.

In this case, in step 104, the verb determination unit 13 determines, based on the analysis result of FIG. 8, whether the user utterance includes a verb. Because all 10 candidates in FIG. 8 include a verb, the user utterance is determined to include a verb under the condition described above.

In step 106, the verb reliability determination unit 14 determines, under the condition described above, whether the verb included in the user utterance is reliable. In this example, as shown in FIG. 9, the verb “go” is included in eight candidates, so it is determined to be reliable and the process proceeds to step 126.

In step 126, the response generation unit 15 generates a response to the input user utterance. Any known method may be used to generate the response. For example, using the verb “go” (行く) and the template 「〜んだぁ」 prepared in advance, it generates the response “So you went!” (行ったんだぁ). Alternatively, the response may be generated by applying the method described in JP 2007-206888 A. In that method, the response generation unit 15 performs case analysis on the most likely candidate of the analysis result of the user utterance, “I went to a hot spring on the weekend”, extracts the case elements and the predicate, and generates as a response either a confirmation of a case element, such as “You went to a hot spring?”, or a question about an omitted case element, such as “Who did you go with?”.

In step 128, the output unit 22 outputs the response.
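The branching of the flowchart in FIG. 5 can be summarized in a compact sketch. It is a simplification under stated assumptions: speech recognition and morphological analysis are taken as already done, each first-turn candidate is reduced to the set of verbs it contains, the backchannel is fixed rather than chosen at random, and the returned labels, English strings, and function names are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

EMOTION_QUESTION = "Oh, how did you feel?"
BACKCHANNEL = "Is that so"      # fixed here; step 122 picks at random
EMOTION_WORDS = {"pleasant"}    # stand-in for the emotion dictionary


def respond_to_first_utterance(candidate_verbs: List[Set[str]]) -> Tuple[str, str]:
    """Steps 104-110 and 122-128 for the user's first utterance."""
    # Step 104: do at least 8 candidates contain any verb?
    if sum(1 for verbs in candidate_verbs if verbs) < 8:
        return ("backchannel", BACKCHANNEL)          # steps 122-124
    # Step 106: is one verb contained in at least 7 candidates?
    counts = Counter(v for verbs in candidate_verbs for v in verbs)
    verb, n = counts.most_common(1)[0]
    if n >= 7:
        return ("response", verb)                    # steps 126-128
    return ("emotion_question", EMOTION_QUESTION)    # steps 108-110


def respond_to_emotion_answer(base_forms: List[str]) -> Tuple[str, str]:
    """Steps 116-124 for the user's reply to the emotion question."""
    for form in base_forms:
        if form in EMOTION_WORDS:
            return ("emotion_response", f"How {form} was it?")  # steps 118-120
    return ("backchannel", BACKCHANNEL)              # steps 122-124
```

The two functions correspond to the two dialogue turns: the first decides among a template response, an emotion question, and a backchannel; the second handles the reply to the emotion question.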

As described above, the voice dialogue apparatus according to the present embodiment does not generate an erroneous response even when the verb included in the user utterance is unreliable, so it can continue the dialogue with the user without breaking down. Furthermore, unlike simply returning a backchannel response, explicitly asking about the user's emotion makes it possible to prompt the user's next utterance, which also improves the naturalness of the dialogue.
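The overall branching summarized above (backchannel when no verb is found, emotion question when the verb is unreliable, templated response when it is reliable) can be sketched as below. The function name `choose_reply`, the strings `BACKCHANNEL` and `EMOTION_QUESTION`, and the `make_response` callback are all hypothetical stand-ins. The actual backchannel and question templates are those of the embodiment and are not reproduced here.

```python
BACKCHANNEL = "うんうん"           # hypothetical predetermined backchannel
EMOTION_QUESTION = "どうだった？"   # hypothetical emotion question template


def choose_reply(has_verb, reliable_verb, make_response):
    """Select the reply type and text for the first user utterance.

    has_verb: whether the analysis result contains a verb.
    reliable_verb: the verb judged reliable, or None if it is unreliable.
    make_response: callable that turns a reliable verb into a response
        (an assumed interface for this sketch).
    """
    if not has_verb:
        # No verb at all: fall back to a plain backchannel.
        return ("backchannel", BACKCHANNEL)
    if reliable_verb is None:
        # A verb exists but cannot be trusted: ask about the emotion instead
        # of risking a response built on a misrecognized verb.
        return ("emotion_question", EMOTION_QUESTION)
    return ("response", make_response(reliable_verb))


print(choose_reply(True, None, str))
```

The unreliable-verb branch is the point of the design: the reply never depends on the recognized verb, so a misrecognition cannot produce an erroneous response.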

FIG. 1 is a block diagram showing the main configuration of the voice dialogue apparatus according to the present invention.
FIG. 2 is a diagram showing a response template.
FIG. 3 is a diagram showing the configuration of the emotion expression dictionary.
FIG. 4 is a diagram showing the configuration of the emotion expression response generation database.
FIG. 5 is a flowchart showing the flow of operation of the voice dialogue apparatus according to the present invention.
FIG. 6 is a table showing an example (part 1) of candidates for the analysis result of a user utterance.
FIG. 7 is a table showing an example (part 1) of the verbs included in the candidates for the analysis result of a user utterance and the number of candidates containing each verb.
FIG. 8 is a table showing an example (part 2) of candidates for the analysis result of a user utterance.
FIG. 9 is a table showing an example (part 2) of the verbs included in the candidates for the analysis result of a user utterance and the number of candidates containing each verb.

Explanation of symbols

11 Input unit
12 First utterance analysis unit
13 Verb determination unit
14 Verb reliability determination unit
15 Response generation unit
16 Emotion question generation unit
17 Second utterance analysis unit
18 Emotion expression dictionary
19 Emotion expression determination unit
20 Emotion expression response generation unit
21 Backchannel generation unit
22 Output unit

Claims

1. A voice dialogue apparatus comprising:
input means for inputting a user's utterance;
first utterance analysis means for analyzing a first utterance by the user input by the input means;
verb determination means for determining whether a verb is included in the analysis result by the first utterance analysis means;
verb reliability determination means for determining whether the verb is reliable when the verb determination means determines that the verb is included in the analysis result by the first utterance analysis means;
response generation means for generating a response to the first utterance using a predetermined response template and the verb when the verb reliability determination means determines that the verb is reliable;
emotion question generation means for generating a question asking about an emotion corresponding to the content of the user's first utterance using a predetermined emotion question template when the verb reliability determination means determines that the verb is unreliable; and
output means for outputting the response generated by the response generation means and the question generated by the emotion question generation means.

2. The voice dialogue apparatus according to claim 1, further comprising:
second utterance analysis means for analyzing a user's utterance input by the input means in reply to the question output by the output means;
emotion expression storage means for storing words expressing emotions;
emotion expression determination means for determining whether a word expressing an emotion stored in the emotion expression storage means is included in the analysis result by the second utterance analysis means;
emotion response generation means for generating, when the emotion expression determination means determines that the analysis result by the emotion response analysis means includes a word expressing an emotion, a response sentence to the emotion expression utterance using a predetermined emotion response template and the word expressing the emotion; and
backchannel generation means for generating, when the emotion expression determination means determines that the analysis result by the emotion response analysis means does not include a word expressing an emotion, a backchannel response to the emotion expression utterance using a predetermined backchannel,
wherein the output means outputs the response sentence generated by the emotion response generation means and the backchannel response generated by the backchannel generation means.

3. The voice dialogue apparatus according to claim 2, wherein, when the verb determination means determines that no verb is included in the analysis result by the first utterance analysis means, the backchannel generation means generates a backchannel response using a predetermined backchannel.

4. A voice dialogue program for causing a computer to function as:
utterance analysis means for analyzing an input first utterance by a user;
verb determination means for determining whether a verb is included in the analysis result by the utterance analysis means;
verb reliability determination means for determining whether the verb is reliable when the verb determination means determines that the verb is included in the analysis result by the utterance analysis means;
response generation means for generating a response to the first utterance using a predetermined response template and the verb when the verb reliability determination means determines that the verb is reliable; and
emotion question generation means for generating a question asking about an emotion corresponding to the content of the user's first utterance using a predetermined emotion question template when the verb reliability determination means determines that the verb is unreliable.