JP2009198614A - Interaction device and program

Info

Publication number: JP2009198614A
Application number: JP2008038069A
Authority: JP
Inventors: Takakatsu Yoshimura; 貴克吉村; Kazuya Shimooka; 和也下岡; Ryoko Hotta; 良子堀田; Hiroyuki Hoshino; 博之星野; Yusuke Nakano; 雄介中野
Original assignee: Toyota Motor Corp; Toyota Central R&D Labs Inc
Current assignee: Toyota Motor Corp; Toyota Central R&D Labs Inc
Priority date: 2008-02-19
Filing date: 2008-02-19
Publication date: 2009-09-03
Anticipated expiration: 2028-02-19
Also published as: JP5045486B2

Abstract

PROBLEM TO BE SOLVED: To respond appropriately to arbitrary input so that a dialogue can be continued. SOLUTION: A no-input time counter 22 measures the no-input time. When the measured no-input time is equal to or longer than a predetermined time, a dialogue type discrimination unit 34 determines whether the dialogue type is the information-providing type or the information-acquiring type, based on the probability of the information-providing type and the probability of the information-acquiring type, each obtained from the previous determination result and the transition probabilities. When a speech extraction unit 20 extracts a speech signal, the dialogue type discrimination unit 34 determines the question-answering type if the input speech is judged to represent a request or question. If the input speech is judged not to represent a request or question, it determines either the information-receiving type or the information-acquiring type, based on the probabilities of those two types. A response generation unit 38 generates a response sentence to the user's utterance according to the determination result. COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a dialogue device and a program, and more particularly to a dialogue device and a program that conduct a dialogue in response to a user's utterance or input sentence.

Conventionally, a spoken dialogue device has been known that accurately grasps what the user is requesting, without placing restrictions on the user's speech, so that exchanges with the user proceed smoothly (Patent Document 1). This device combines a system-initiative dialogue system with a user-initiative dialogue system and switches appropriately between the two according to recognition results and dialogue history, thereby retaining the robustness of system-initiative dialogue while offering the flexibility of user-initiative dialogue.
Patent Document 1: JP 2003-228393 A

However, the technique described in Patent Document 1 cannot respond appropriately to unexpected input. For example, by using the confidence score that is output together with the speech recognition result, the device can reply with a fixed phrase such as "Please say that again" when the input appears to be unexpected. The user then concludes that dialogue with the system is not possible and breaks it off, so the dialogue cannot be continued.

The present invention has been made to solve the above problem, and its object is to provide a dialogue device and a program that can respond appropriately to arbitrary input and can keep the dialogue going.

To achieve the above object, a dialogue device according to a first invention comprises: input means for inputting at least one of an utterance and an input sentence from a user; no-input time measuring means for measuring the time during which no input to the input means continues; request/question determination means for determining whether at least one of the utterance and the input sentence input to the input means represents a request or question of the user; dialogue type discrimination means which, when the time measured by the no-input time measuring means is equal to or longer than a predetermined time, discriminates the dialogue with the user as an information-providing type in which the device itself spontaneously provides information, which, when the request/question determination means determines that a request or question of the user is represented, discriminates the dialogue as a question-answering type in which the device answers the user's request or question, and which, when the request/question determination means determines that no request or question of the user is represented, discriminates the dialogue as an information-receiving type in which the device receives information that the user spontaneously provides; response generation means for generating, for at least one of the utterance and the input sentence, a response sentence according to the discrimination result of the dialogue type discrimination means; and output means for outputting the response sentence generated by the response generation means.

A program according to a second invention causes a computer to function as: no-input time measuring means for measuring the time during which no input continues to input means that inputs at least one of an utterance and an input sentence from a user; request/question determination means for determining whether at least one of the utterance and the input sentence input to the input means represents a request or question of the user; dialogue type discrimination means which, when the time measured by the no-input time measuring means is equal to or longer than a predetermined time, discriminates the dialogue with the user as an information-providing type in which the device itself spontaneously provides information, which, when the request/question determination means determines that a request or question of the user is represented, discriminates the dialogue as a question-answering type in which the device answers the user's request or question, and which, when the request/question determination means determines that no request or question of the user is represented, discriminates the dialogue as an information-receiving type in which the device receives information that the user spontaneously provides; and response generation means for generating, for at least one of the utterance and the input sentence, a response sentence according to the discrimination result of the dialogue type discrimination means.

According to the first and second inventions, at least one of an utterance and an input sentence from the user is input to the input means, and the no-input time measuring means measures the time during which no input to the input means continues.
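The no-input time measurement described here can be sketched as a small timer that is reset whenever input arrives. This is only an illustrative sketch: the class name `NoInputTimer`, its methods, and the injectable clock are assumptions introduced for the example and do not appear in the specification.

```python
import time


class NoInputTimer:
    """Tracks how long the input means has received no input (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        # The injectable clock is an assumption that makes the sketch easy to test.
        self._clock = clock
        self._last_input = clock()  # time of the most recent input

    def notify_input(self):
        """Call whenever an utterance or input sentence arrives."""
        self._last_input = self._clock()

    def elapsed(self):
        """Seconds for which no input has been received."""
        return self._clock() - self._last_input

    def timed_out(self, threshold_s):
        """True when the no-input time is the predetermined time or more."""
        return self.elapsed() >= threshold_s
```

A caller would poll `timed_out()` with its chosen predetermined time and call `notify_input()` each time an utterance is captured.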

The request/question determination means determines whether at least one of the utterance and the input sentence input to the input means represents a request or question of the user.

When the time measured by the no-input time measuring means is equal to or longer than the predetermined time, the dialogue type discrimination means discriminates the dialogue with the user as the information-providing type, in which the device itself spontaneously provides information. When the request/question determination means determines that a request or question of the user is represented, it discriminates the dialogue as the question-answering type, in which the device answers the user's request or question. When the request/question determination means determines that no request or question of the user is represented, it discriminates the dialogue as the information-receiving type, in which the device receives information that the user spontaneously provides.

The response generation means then generates, for at least one of the utterance and the input sentence, a response sentence according to the discrimination result of the dialogue type discrimination means, and the output means outputs the generated response sentence.

In this way, the device discriminates whether the dialogue with the user is of the information-providing type, the question-answering type, or the information-receiving type, based on the no-input time and on whether the input utterance or input sentence represents a request or question, and generates a response sentence accordingly. It can therefore respond appropriately to arbitrary input and keep the dialogue going.
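The three-way discrimination of the first and second inventions can be sketched as follows. The function name `classify`, the enum `DialogueType`, and the default threshold of 5 seconds are assumptions made for illustration; the specification only speaks of a "predetermined time".

```python
from enum import Enum


class DialogueType(Enum):
    INFORMATION_PROVIDING = "information-providing"
    QUESTION_ANSWERING = "question-answering"
    INFORMATION_RECEIVING = "information-receiving"


def classify(no_input_s, is_request_or_question, threshold_s=5.0):
    """Return the dialogue type for the current turn.

    no_input_s: measured no-input time in seconds.
    is_request_or_question: result of the request/question determination
        (ignored when the no-input time has reached the threshold).
    threshold_s: the predetermined time (5.0 is an assumed value).
    """
    if no_input_s >= threshold_s:
        # Silence has lasted long enough: the device speaks first.
        return DialogueType.INFORMATION_PROVIDING
    if is_request_or_question:
        return DialogueType.QUESTION_ANSWERING
    return DialogueType.INFORMATION_RECEIVING
```

Every input falls into exactly one branch, which is how the scheme avoids having an "unexpected input" case.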

A dialogue device according to a third invention comprises: input means for inputting at least one of an utterance and an input sentence from a user; no-input time measuring means for measuring the time during which no input to the input means continues; request/question determination means for determining whether at least one of the utterance and the input sentence input to the input means represents a request or question of the user; dialogue type discrimination means which, when the time measured by the no-input time measuring means is equal to or longer than a predetermined time, discriminates the dialogue with the user as an information-acquiring type in which the device itself spontaneously asks a question, which, when the request/question determination means determines that a request or question of the user is represented, discriminates the dialogue as a question-answering type in which the device answers the user's request or question, and which, when the request/question determination means determines that no request or question of the user is represented, discriminates the dialogue as an information-receiving type in which the device receives information that the user spontaneously provides; response generation means for generating, for at least one of the utterance and the input sentence, a response sentence according to the discrimination result of the dialogue type discrimination means; and output means for outputting the response sentence generated by the response generation means.

A program according to a fourth invention causes a computer to function as: no-input time measuring means for measuring the time during which no input continues to input means that inputs at least one of an utterance and an input sentence from a user; request/question determination means for determining whether at least one of the utterance and the input sentence input to the input means represents a request or question of the user; dialogue type discrimination means which, when the time measured by the no-input time measuring means is equal to or longer than a predetermined time, discriminates the dialogue with the user as an information-acquiring type in which the device itself spontaneously asks a question, which, when the request/question determination means determines that a request or question of the user is represented, discriminates the dialogue as a question-answering type in which the device answers the user's request or question, and which, when the request/question determination means determines that no request or question of the user is represented, discriminates the dialogue as an information-receiving type in which the device receives information that the user spontaneously provides; and response generation means for generating, for at least one of the utterance and the input sentence, a response sentence according to the discrimination result of the dialogue type discrimination means.

According to the third and fourth inventions, at least one of an utterance and an input sentence from the user is input to the input means, and the no-input time measuring means measures the time during which no input to the input means continues.

When the time measured by the no-input time measuring means is equal to or longer than the predetermined time, the dialogue type discrimination means discriminates the dialogue with the user as the information-acquiring type, in which the device itself spontaneously asks a question. When the request/question determination means determines that a request or question of the user is represented, it discriminates the dialogue as the question-answering type, in which the device answers the user's request or question. When the request/question determination means determines that no request or question of the user is represented, it discriminates the dialogue as the information-receiving type, in which the device receives information that the user spontaneously provides.

In this way, the device discriminates whether the dialogue with the user is of the information-acquiring type, the question-answering type, or the information-receiving type, based on the no-input time and on whether the input utterance or input sentence represents a request or question, and generates a response sentence accordingly. It can therefore respond appropriately to arbitrary input and keep the dialogue going.

A dialogue device according to a fifth invention comprises: input means for inputting at least one of an utterance and an input sentence from a user; no-input time measuring means for measuring the time during which no input to the input means continues; request/question determination means for determining whether at least one of the utterance and the input sentence input to the input means represents a request or question of the user; transition history storage means for storing a transition history of the dialogue types in the dialogue between the user and the device, the dialogue types consisting of an information-providing type in which the device spontaneously provides information, an information-acquiring type in which the device spontaneously asks a question, a question-answering type in which the device answers the user's request or question, and an information-receiving type in which the device receives information that the user spontaneously provides; dialogue type discrimination means which, when the time measured by the no-input time measuring means is equal to or longer than a predetermined time, discriminates the dialogue with the user as either the information-providing type or the information-acquiring type based on a past discrimination result and the transition history, which, when the request/question determination means determines that a request or question of the user is represented, discriminates the dialogue as the question-answering type, and which, when the request/question determination means determines that no request or question of the user is represented, discriminates the dialogue as the information-receiving type; response generation means for generating, for at least one of the utterance and the input sentence, a response sentence according to the discrimination result of the dialogue type discrimination means; and output means for outputting the response sentence generated by the response generation means.

A program according to a sixth invention causes a computer to function as: no-input time measuring means for measuring the time during which no input continues to input means that inputs at least one of an utterance and an input sentence from a user; request/question determination means for determining whether at least one of the utterance and the input sentence input to the input means represents a request or question of the user; transition history storage means for storing a transition history of the dialogue types in the dialogue between the user and the device, the dialogue types consisting of an information-providing type in which the device spontaneously provides information, an information-acquiring type in which the device spontaneously asks a question, a question-answering type in which the device answers the user's request or question, and an information-receiving type in which the device receives information that the user spontaneously provides; dialogue type discrimination means which, when the time measured by the no-input time measuring means is equal to or longer than a predetermined time, discriminates the dialogue with the user as either the information-providing type or the information-acquiring type based on a past discrimination result and the transition history, which, when the request/question determination means determines that a request or question of the user is represented, discriminates the dialogue as the question-answering type, and which, when the request/question determination means determines that no request or question of the user is represented, discriminates the dialogue as the information-receiving type; and response generation means for generating, for at least one of the utterance and the input sentence, a response sentence according to the discrimination result of the dialogue type discrimination means.

According to the fifth and sixth inventions, at least one of an utterance and an input sentence from the user is input to the input means, and the no-input time measuring means measures the time during which no input to the input means continues.

When the time measured by the no-input time measuring means is equal to or longer than the predetermined time, the dialogue type discrimination means discriminates the dialogue with the user as either the information-providing type or the information-acquiring type, based on the past discrimination result and the transition history. When the request/question determination means determines that a request or question of the user is represented, it discriminates the dialogue as the question-answering type. When the request/question determination means determines that no request or question from the user is represented, it discriminates the dialogue as the information-receiving type.

In this way, the device discriminates whether the dialogue with the user is of the information-providing type, the information-acquiring type, the question-answering type, or the information-receiving type, based on the no-input time, the transition history of the dialogue types, and whether the input utterance or input sentence represents a request or question, and generates a response sentence accordingly. It can therefore respond appropriately to arbitrary input and keep the dialogue going.

When the request/question determination means determines that no request or question of the user is represented, the dialogue type discrimination means according to the fifth invention may discriminate the dialogue with the user as either the information-receiving type or the information-acquiring type, based on the past discrimination result and the transition history. By further discriminating between the information-receiving type and the information-acquiring type when the input utterance or input sentence does not represent a request or question, the device can respond more appropriately to arbitrary input.
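The history-based variant can be sketched as a choice between two candidate types in each branch. This is a minimal sketch under assumptions: the per-type probabilities are taken as already computed from the past discrimination result and the transition history, and choosing the candidate with the higher probability is one plausible reading rather than a rule quoted from the text. The function name `classify_with_history` is hypothetical.

```python
def classify_with_history(timed_out, is_request_or_question, probs):
    """Pick a dialogue type using per-type probabilities.

    timed_out: True when the no-input time reached the predetermined time.
    is_request_or_question: result of the request/question determination.
    probs: dict mapping type name to its probability for the current turn
        (assumed to come from the previous result and the transition history).
    """
    if timed_out:
        # Device-initiated turn: provide information or ask a question.
        candidates = ("information-providing", "information-acquiring")
    elif is_request_or_question:
        return "question-answering"
    else:
        # The user spoke but asked nothing: accept it or ask back.
        candidates = ("information-receiving", "information-acquiring")
    # On a tie, max() keeps the first candidate listed.
    return max(candidates, key=lambda t: probs.get(t, 0.0))
```

For example, with `probs = {"information-providing": 0.2, "information-acquiring": 0.5, "information-receiving": 0.3}`, a timeout selects the information-acquiring type.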

When the dialogue is discriminated as the information-providing type, the response generation means according to the first and fifth inventions may generate the response sentence using information selected from among the plural pieces of information stored in an information database that stores plural pieces of information to be provided to the user.

When the dialogue is discriminated as the question-answering type, the above response generation means may search a question-answer database, which stores question or request sentences in association with answer sentences to them, for the question or request sentence corresponding to the request or question represented by at least one of the utterance and the input sentence, and may generate the response sentence using the answer sentence associated with the retrieved question or request sentence.
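The question-answer lookup can be sketched as a search over stored question sentences followed by returning the associated answer. The matching method is not specified in this passage; the word-overlap (Jaccard) score below is an assumed stand-in, and the database entries and answers are made-up sample data.

```python
def tokenize(text):
    """Lower-case the text and split it into a set of words."""
    return set(text.lower().replace("?", " ").split())


# Hypothetical question-answer database: (stored question, answer sentence).
QA_DATABASE = [
    ("where is the toilet", "The toilet is next to the entrance."),
    ("what time do you close", "We close at 6 p.m."),
]


def answer(utterance, database=QA_DATABASE):
    """Return the answer paired with the best-matching stored question,
    or None when no stored question shares a word with the utterance."""
    query = tokenize(utterance)

    def overlap(entry):
        stored = tokenize(entry[0])
        union = query | stored
        return len(query & stored) / len(union) if union else 0.0

    best = max(database, key=overlap)
    return best[1] if overlap(best) > 0 else None
```

A real system would likely need a more robust matcher; the point of the sketch is only the retrieve-then-answer structure.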

The above dialogue device may further include analysis means for analyzing the structure of at least one of the utterance and the input sentence input through the input means. When the dialogue is discriminated as the information-receiving type, the response generation means may generate the response sentence using a response sentence that is stored in a response database, which stores structures in association with response sentences for those structures, and that corresponds to the structure obtained by the analysis means.

When the dialogue is discriminated as the information-acquiring type, the response generation means according to the third and fifth inventions may generate the response sentence using a question sentence selected from among the plural question sentences stored in a question/request database that stores plural question or request sentences addressed to the user.

The above input means may input an utterance from the user, and the request/question determination means may extract speech feature quantities from the utterance input through the input means and compare the extracted feature quantities with previously obtained feature quantities of utterances that represent a question or request, thereby determining whether at least one of the utterance and the input sentence represents a request or question of the user.
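The feature comparison can be sketched as a distance test against reference features of request/question utterances. Everything numeric here is an assumption: the reference values, the per-feature scales, the distance threshold, and the use of a scaled Euclidean distance are illustrative choices, since the passage does not say how the comparison is carried out. The feature pair (duration, fundamental frequency) follows the examples of speech features named in the description of the embodiment.

```python
import math

# Hypothetical reference features for request/question utterances:
# (duration in seconds, fundamental frequency in Hz).
REFERENCE = (1.2, 220.0)
# Hypothetical per-feature scales so both features contribute comparably.
SCALE = (0.5, 40.0)


def distance(features, reference=REFERENCE, scale=SCALE):
    """Scaled Euclidean distance between extracted and reference features."""
    return math.sqrt(
        sum(((f - r) / s) ** 2 for f, r, s in zip(features, reference, scale))
    )


def is_request_or_question(features, max_distance=1.0):
    """True when the utterance's features are close to the reference."""
    return distance(features) <= max_distance
```

In practice the reference would be estimated from labeled utterances, and a trained classifier could replace the fixed threshold.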

As described above, the dialogue device and program of the present invention discriminate the type of the dialogue with the user based on the no-input time and on whether the input utterance or input sentence represents a request or question, and generate a response sentence accordingly. This yields the effect that the device can respond appropriately to arbitrary input and keep the dialogue going.

Embodiments of the present invention are described in detail below with reference to the drawings. The embodiments describe the case where the present invention is applied to a dialogue device that responds by speech to a user's utterance.

As shown in FIG. 1, a dialogue device 10 according to a first embodiment comprises: an input unit 12, consisting of a microphone, that picks up the user's utterance and generates a speech signal; a computer 14 that generates a response utterance based on the speech signal input from the input unit 12; and a speech output unit 16, consisting of a loudspeaker, that outputs the generated response utterance as speech.

The computer 14 includes a CPU, a RAM, and a ROM storing a program for executing a response generation processing routine described later, and is functionally configured as follows. The computer 14 comprises: a speech extraction unit 20 that cuts out, from the signal input from the input unit 12, a speech signal representing the input speech; a no-input time counter 22 that measures, based on the signal input from the input unit 12, the no-input time during which no speech input continues; a speech recognition unit 24 that recognizes the user's utterance from the speech signal by referring to words registered in a recognition dictionary database (not shown); a feature extraction unit 26 that extracts speech feature quantities (for example, duration and fundamental frequency) from the speech signal; a transition probability storage unit 28 that stores, as the transition history, the transition probabilities among the four dialogue types (described later) in the dialogue with the user; a determination result storage unit 30 that stores the result of the previous discrimination by a dialogue type discrimination unit 34 described later; a probability calculation unit 32 that calculates, based on the stored transition probabilities and the previous discrimination result, the probability that the current dialogue is of each dialogue type; and the dialogue type discrimination unit 34, which discriminates the dialogue with the user as one of the four dialogue types based on the no-input time measured by the no-input time counter 22, the recognition result of the speech recognition unit 24, the feature quantities extracted by the feature extraction unit 26, and the probabilities calculated by the probability calculation unit 32.
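The roles of the transition probability storage unit 28 and the probability calculation unit 32 can be sketched as a first-order transition table. This is a sketch under assumptions: estimating the table by counting transitions in a recorded sequence of dialogue types, falling back to a uniform distribution for unseen types, and reading the current probabilities as the row for the previous result are illustrative choices, not details given in this passage. The function names are hypothetical.

```python
TYPES = (
    "information-providing",
    "information-acquiring",
    "question-answering",
    "information-receiving",
)


def estimate_transition_probabilities(history):
    """Estimate P(next type | previous type) from a sequence of past types."""
    counts = {p: {n: 0 for n in TYPES} for p in TYPES}
    for prev, nxt in zip(history, history[1:]):
        counts[prev][nxt] += 1
    table = {}
    for prev, row in counts.items():
        total = sum(row.values())
        # Fall back to a uniform row when prev was never followed by anything.
        table[prev] = {
            n: (c / total if total else 1 / len(TYPES)) for n, c in row.items()
        }
    return table


def current_type_probabilities(table, previous_type):
    """Probability of each dialogue type given the previous discrimination result."""
    return table[previous_type]
```

The returned dictionary can then be used to choose between candidate types, for example between the information-providing and information-acquiring types after a timeout.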

The four dialogue types are now described. The dialogue with the user is classified into four types: an information-providing dialogue, in which the dialogue device 10 spontaneously provides information to the user; an information-acquiring dialogue, in which the dialogue device 10 spontaneously asks the user a question; a question-answering dialogue, in which the dialogue device 10 answers the user's request or question; and an information-receiving dialogue, in which the dialogue device 10 receives information that the user spontaneously provides.

In an information-providing dialogue, the dialogue device 10 holds the initiative and is expected to provide the information: the dialogue device 10 volunteers its own state or information stored in it to the user. For example, when the user approaches the dialogue device 10, or stands silently in front of it, the dialogue device 10 offers information it holds (news, weather information, instructions on how to use the dialogue device 10, and so on).

In an information-acquiring dialogue, the dialogue device 10 holds the initiative and the user is expected to provide the information: the dialogue device 10 asks, on its own initiative, about the user's own information or requests, or about information the user knows. For example, the dialogue device 10 asks questions to obtain the user information it needs to narrow down what to present ("What would you like to know about?", "What did you come to see today?", and so on), or asks questions simply to keep the dialogue going.

In a question-answering dialogue, the user holds the initiative and the dialogue device 10 is expected to provide the information: the dialogue device 10 searches for and returns an answer to the user's question. For example, in response to a question or request from the user ("Where is the toilet?", "How late are you open?", and so on), the information that answers it is retrieved from a database and provided to the user.

In an information-accepting dialogue, the user holds the initiative and the user is expected to provide the information: the dialogue device 10 accepts information that the user volunteers, such as information about the user or information the user knows. For an input that is neither a question nor a request, the dialogue device 10 may give a back-channel response (for example, replying "Oh, is that so." to "I'm Akkun."), ask a follow-up question (for example, replying "Who did you come with?" to "I came to visit."), or expand the topic (for example, replying "The laundry will dry well." to "The weather is nice today."). In this way it listens as broadly as possible for inputs that the information-providing, information-acquiring, and question-answering dialogues do not anticipate, and keeps the dialogue going.

音声認識部２４は、入力音声に基づいて、ユーザ発話を認識すると共に、各認識結果に対する信頼度を出力する。例えば、入力された発話が「今日の天気は？」であるとき、認識結果として、「今日の天気は？」、「今日天気。」、及び「今日天気がいい。」が得られると共に、「今日の天気は？」に対する信頼度「０．８」、「今日天気。」に対する信頼度「０．５」、及び「今日天気がいい。」に対する信頼度「０．３」が得られる。 The voice recognition unit 24 recognizes the user utterance based on the input voice and outputs the reliability for each recognition result. For example, when the input utterance is “What is today's weather?”, As recognition results, “What is today's weather?”, “Today's weather.”, And “Today's weather is good.” A reliability “0.8” for “What is the weather today?”, A reliability “0.5” for “Today's weather.”, And a reliability “0.3” for “Good weather today” are obtained.

The feature extraction unit 26 extracts the fundamental frequency from the input voice signal as a voice feature. Based on the fundamental frequency extracted by the feature extraction unit 26, it can be determined whether the intonation of the utterance is rising.

The transition probability storage unit 28 stores the transition probabilities among the four dialogue types, as shown in FIG. 2, as information representing the transition history. The transition probability a_xy is calculated as follows.

First, correct dialogue transition history data such as the following is created, for example through a WOZ (Wizard of Oz) experiment.
User: I came to visit.
Dialogue device: Hello. Who did you come with? (information-accepting dialogue)
User: I came with a friend.
Dialogue device: Oh, is that so. (information-accepting dialogue)
Dialogue device: ○○ is on display today. (information-providing dialogue)
User: What is ○○?
Dialogue device: It is the ×× everyone is talking about. The famous △△ made it. (question-answering dialogue)
Dialogue device: Are you interested in ○○? (information-acquiring dialogue)
From dialogue transition history data such as the above (initial state → information-accepting dialogue → information-accepting dialogue → information-providing dialogue → question-answering dialogue → information-acquiring dialogue), the transition probability a_xy of moving from dialogue type x to dialogue type y is calculated according to equation (1):
$$a_{xy} = \frac{\text{number of transitions from dialogue type } x \text{ to dialogue type } y}{\text{number of times the dialogue was in dialogue type } x} \tag{1}$$
Here, the information-providing dialogue is dialogue type 1, the information-acquiring dialogue is dialogue type 2, the question-answering dialogue is dialogue type 3, and the information-accepting dialogue is dialogue type 4.

複数の対話遷移履歴データから、上記（１）式を用いて、４種類の対話型分類の遷移の全組み合わせについて遷移確率を算出して、遷移確率記憶部２８に記憶する。 From the plurality of dialog transition history data, the transition probability is calculated for all combinations of transitions of the four types of interactive classification using the above equation (1), and stored in the transition probability storage unit 28.
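As a rough illustration of equation (1), the following Python sketch counts transitions in annotated histories and divides by the number of times each type was left. The function name, the `INITIAL` label, and the choice to count only occurrences of x that have a successor (so that the final state of a history does not enter the denominator) are assumptions made for this example, not details given in the text.

```python
from collections import Counter

# The four dialogue types, numbered as in the text.
PROVIDE, ACQUIRE, ANSWER, ACCEPT = 1, 2, 3, 4
INITIAL = 0  # hypothetical label for the initial state


def estimate_transition_probabilities(histories):
    """Estimate a_xy = count(x -> y) / count(x) from annotated histories."""
    pair_counts = Counter()
    state_counts = Counter()
    for history in histories:
        # Pair each state with its successor; the last state has none.
        for x, y in zip(history, history[1:]):
            pair_counts[(x, y)] += 1
            state_counts[x] += 1
    return {(x, y): n / state_counts[x] for (x, y), n in pair_counts.items()}


# The example history from the text:
# initial -> accept -> accept -> provide -> answer -> acquire
history = [INITIAL, ACCEPT, ACCEPT, PROVIDE, ANSWER, ACQUIRE]
a = estimate_transition_probabilities([history])
print(a[(ACCEPT, ACCEPT)])   # 0.5
print(a[(ACCEPT, PROVIDE)])  # 0.5
```

With a single short history the estimates are coarse; pooling many histories, as the text describes, would fill in the remaining combinations.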

Based on the transition probabilities stored in the transition probability storage unit 28 and the dialogue type stored in the determination result storage unit 30 as the previous determination result, the probability calculation unit 32 takes the transition probability from the previously determined dialogue type to a given dialogue type as the probability that the dialogue currently being classified is of that type.
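In other words, the calculation amounts to reading one row of the transition table. A minimal sketch, assuming the table is held as a dictionary keyed by (previous type, next type); the type labels and the numeric values below are illustrative only and do not come from the text:

```python
DIALOGUE_TYPES = ("provide", "acquire", "answer", "accept")


def current_type_probabilities(transition, previous_type):
    """Look up a_xy for x = previous_type and every candidate type y."""
    return {t: transition.get((previous_type, t), 0.0) for t in DIALOGUE_TYPES}


# Illustrative values only.
transition = {
    ("accept", "accept"): 0.5,
    ("accept", "provide"): 0.3,
    ("accept", "acquire"): 0.2,
}
probs = current_type_probabilities(transition, "accept")
```

Transitions that never occurred in the training histories default to 0.0 here; a real system might prefer smoothing, which the text does not discuss.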

When the no-input time measured by the no-input time counter 22 is 5 seconds or more, the dialogue type determination unit 34 determines whether the current dialogue with the user is an information-providing dialogue or an information-acquiring dialogue, based on the probabilities of those two types calculated by the probability calculation unit 32. When an input voice has been cut out by the voice cutout unit 20, the dialogue type determination unit 34 determines whether the user's utterance is a question or request, based on the result of the voice recognition unit 24 and the voice features extracted by the feature extraction unit 26.

In this determination, if the sentence-final structure of the recognition result with the highest confidence obtained by the voice recognition unit 24 matches the structure of a question or request sentence (for example, 「〜か？」 ("...?"), 「〜教えて」 ("tell me ..."), or 「〜したい」 ("I want to ...")), the user's utterance is judged to be a question or request. In addition, the fundamental frequency extracted as the voice feature is compared with a fundamental frequency pattern obtained in advance for questions and requests (one that rises at the end of the utterance); if the two are similar, the user's utterance is judged to be a question or request.
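The two cues can be sketched as follows. This is a simplified stand-in, not the method of the text: the ending list adds a bare 「？」 as an assumption, the comparison against a pre-obtained fundamental frequency pattern is replaced by a plain "does the tail rise" check, and the thresholds (`tail_fraction`, `min_rise_hz`) and the OR combination of the two cues are likewise assumptions.

```python
# Sentence-final patterns named in the text, plus a bare "？" (assumption).
QUESTION_ENDINGS = ("か？", "か", "教えて", "したい", "？")


def ends_like_question(text):
    """Check the sentence-final structure of the best recognition result."""
    stripped = text.rstrip("。．！ 　")
    return stripped.endswith(QUESTION_ENDINGS)


def pitch_rises_at_end(f0_hz, tail_fraction=0.25, min_rise_hz=10.0):
    """Return True if the mean F0 of the final frames exceeds the rest."""
    voiced = [f for f in f0_hz if f > 0]  # 0 marks unvoiced frames (assumption)
    if len(voiced) < 4:
        return False
    tail_len = max(1, int(len(voiced) * tail_fraction))
    body, tail = voiced[:-tail_len], voiced[-tail_len:]
    return sum(tail) / len(tail) - sum(body) / len(body) >= min_rise_hz


def is_question_or_request(best_hypothesis, f0_hz):
    """Either cue alone is treated as sufficient (assumption)."""
    return ends_like_question(best_hypothesis) or pitch_rises_at_end(f0_hz)
```

For instance, an utterance ending in 「教えて」 is flagged by the text cue even with a flat contour, while a statement spoken with rising pitch is flagged by the prosodic cue.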

When the user's utterance is judged to be a question or request in this way, the current dialogue with the user is determined to be a question-answering dialogue. When it is judged not to be a question or request, the current dialogue is determined to be either an information-acquiring dialogue or an information-accepting dialogue, based on the probabilities of those two types calculated by the probability calculation unit 32.

また、対話装置１０のコンピュータ１４は、音声認識部２４によって認識されたユーザからの発話、及び後述する応答生成部３８によって生成された応答文を対話履歴として記憶する対話履歴記憶部３６と、対話型分類判別部３４によって判別された対話型分類、音声認識部２４によって認識されたユーザからの発話、及び対話履歴記憶部３６に記憶された対話履歴に基づいて、ユーザからの発話に対する応答文を生成する応答生成部３８とを備えている。 In addition, the computer 14 of the dialogue apparatus 10 includes a dialogue history storage unit 36 that stores, as a dialogue history, an utterance from the user recognized by the voice recognition unit 24 and a response sentence generated by a response generation unit 38 to be described later. Based on the interactive classification determined by the type classification determination unit 34, the utterance from the user recognized by the voice recognition unit 24, and the conversation history stored in the conversation history storage unit 36, a response sentence to the utterance from the user is obtained. And a response generation unit 38 that generates the response.

As shown in FIG. 3, the response generation unit 38 includes: an information-providing response generation unit 42 that generates a response sentence for an information-providing dialogue when the dialogue type determination unit 34 determines that type; an information-acquiring response generation unit 44 that generates a response sentence for an information-acquiring dialogue when the dialogue type determination unit 34 determines that type; a question-answering response generation unit 46 that generates a response sentence for a question-answering dialogue when the dialogue type determination unit 34 determines that type; and an information-accepting response generation unit 48 that generates a response sentence for an information-accepting dialogue when the dialogue type determination unit 34 determines that type.

The information-providing response generation unit 42 includes an information database 42A that stores multiple items of information to be provided to the user, and a response sentence generation unit 42B that generates a response sentence for an information-providing dialogue based on the dialogue history stored in the dialogue history storage unit 36 and the information in the information database 42A. Using the dialogue history, the response sentence generation unit 42B picks out the items in the information database 42A that have not yet been output, selects from them the item most closely related to the words in the user's previous utterance (obtained from the dialogue history), and generates a response sentence from the selected item.

For example, if the items not yet output include "The weather is nice today" and "○○○ is on display today", and the user's previous utterance was "Hello", then "The weather is nice today" is selected and used to generate the response sentence.

The information-acquiring response generation unit 44 includes a question database 44A that stores multiple question sentences for the user, and a response sentence generation unit 44B that generates a response sentence for an information-acquiring dialogue based on the dialogue history stored in the dialogue history storage unit 36 and the question sentences in the question database 44A. Using the dialogue history, the response sentence generation unit 44B picks out the question sentences in the question database 44A that have not yet been output, selects from them the one most closely related to the words in the user's previous utterance (obtained from the dialogue history), and generates a response sentence from the selected question. If the question database 44A contains a question marked as the one the dialogue device 10 most wants answered, that question is selected instead and used to generate the response sentence.

For example, if the question sentences not yet output include "What would you like to know about?" and "Are you interested in ○○○ (for example, the name of an exhibit)?", and the question the device most wants answered is "Are you interested in ○○○ (the name of an exhibit)?", then that question is selected and used to generate the response sentence.

The question-answering response generation unit 46 includes a question-answer database 46A that stores multiple pairs of a question or request sentence from the user and the corresponding answer sentence, and a response sentence generation unit 46B that generates a response sentence for a question-answering dialogue based on the dialogue history stored in the dialogue history storage unit 36 and the answer sentences in the question-answer database 46A. Using the dialogue history, the response sentence generation unit 46B picks out the answer sentences in the question-answer database 46A that have not yet been output, selects from the question sentences paired with them the one that best matches the user's utterance obtained from the voice recognition unit 24, and generates a response sentence from the answer paired with the selected question.

For example, suppose the answer sentences not yet output are "It's at the back right of the information counter" (「案内カウンタの右奥にあるよ」) and "It's in the special exhibition room on the second floor" (「２Ｆ特別展示室にあるよ」), paired with the question sentences "Where is the toilet?" (「トイレはどこ？」) and "Where is ○○ (the name of an exhibit)?" (「○○はどこにある？」). For the recognition result "The toilet?" (「トイレは？」), the most similar question sentence, "Where is the toilet?", is selected, and its paired answer, "It's at the back right of the information counter", is used to generate the response sentence.
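The selection in this example can be sketched in Python. The text does not say how similarity between the utterance and the stored questions is measured; the sketch below uses the character-level ratio from the standard library's `difflib.SequenceMatcher` purely as a stand-in, and the function and variable names are hypothetical.

```python
from difflib import SequenceMatcher

# Question/answer pairs taken from the example in the text.
QA_DATABASE = [
    ("トイレはどこ？", "案内カウンタの右奥にあるよ"),
    ("○○はどこにある？", "２Ｆ特別展示室にあるよ"),
]


def select_answer(utterance, qa_pairs, already_output):
    """Pick the not-yet-output answer whose question best matches the utterance."""
    candidates = [(q, a) for q, a in qa_pairs if a not in already_output]
    if not candidates:
        return None  # nothing left to say; the text does not cover this case

    def similarity(pair):
        return SequenceMatcher(None, utterance, pair[0]).ratio()

    _, best_answer = max(candidates, key=similarity)
    return best_answer


print(select_answer("トイレは？", QA_DATABASE, already_output=set()))
# 案内カウンタの右奥にあるよ
```

Filtering on `already_output` mirrors the use of the dialogue history to avoid repeating an answer.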

The information-accepting response generation unit 48 includes a response database 48A that stores multiple pairs of a sentence structure and the response sentence for that structure, a structure analysis unit 48C that analyzes the structure of the user's utterance obtained from the voice recognition unit 24, and a response sentence generation unit 48B that generates a response sentence for an information-accepting dialogue based on the structure analyzed by the structure analysis unit 48C and the response sentences in the response database 48A.

構造解析部４８Ｃは、音声認識部２４から得られたユーザからの発話に対して形態素解析を行い、また、単語辞書を用いて、ユーザからの発話の構造（品詞や意味）を解析する。応答文生成部４８Ｂは、応答データベース４８Ａを利用して、構造解析部４８Ｃにより解析された構造に対する応答文の候補を複数生成し、対話履歴を利用して、最も自然な応答文の候補を選択して、ユーザ発話に対する応答文を生成する。 The structure analysis unit 48C performs morphological analysis on the utterance from the user obtained from the speech recognition unit 24, and analyzes the structure (part of speech and meaning) of the utterance from the user using a word dictionary. The response sentence generation unit 48B generates a plurality of response sentence candidates for the structure analyzed by the structure analysis unit 48C using the response database 48A, and selects the most natural response sentence candidate using the dialogue history. Then, a response sentence to the user utterance is generated.

For example, suppose the response sentence for the keyword "○○ (a person's name)" is "Oh, is that so." (「へぇそうなんだ。」), the response sentence for the keywords "weather" (「天気」) and "good" (「良い」) is "The laundry will dry well." (「洗濯物がよく乾くね」), the response sentence for the predicate "came" (「来たよ」) is "Who did you come with?" (「誰と来たの？」), and the response sentence for the predicate "saw" (「見たよ」) is "When did you see it?" (「いつ見たの？」). If the recognized user utterance is "I came to visit with Akkun" (「あっくんと遊びに来たよ」), the analysis yields the person's name "Akkun" and the predicate "came", so "Who did you come with?" and "Oh, is that so." are generated as response candidates. Using the dialogue history, "Oh, is that so." is then selected as the most natural response, and the response sentence is generated.
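The candidate-generation half of this example can be sketched as a rule table. Plain substring matching stands in for the morphological analysis performed by the structure analysis unit 48C, 「あっくん」 stands in for the generic person-name keyword, and the rule format and function name are assumptions. The final "most natural" selection from the dialogue history is not modelled, since the text does not specify how it is made.

```python
# (trigger keywords or predicates, response sentence), after the example in the text.
RESPONSE_RULES = [
    (("あっくん",), "へぇそうなんだ。"),        # person-name keyword (stand-in for ○○)
    (("天気", "良い"), "洗濯物がよく乾くね"),   # topic keywords
    (("来たよ",), "誰と来たの？"),              # predicate
    (("見たよ",), "いつ見たの？"),              # predicate
]


def response_candidates(utterance, rules=RESPONSE_RULES):
    """Collect the response of every rule with a trigger found in the utterance."""
    return [
        response
        for triggers, response in rules
        if any(trigger in utterance for trigger in triggers)
    ]


print(response_candidates("あっくんと遊びに来たよ"))
# ['へぇそうなんだ。', '誰と来たの？']
```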

Next, the principle of this embodiment is described. Dialogues between a user and a dialogue device fall into two kinds, device-led and user-led, depending on whether the initiative lies with the dialogue device or with the user. A device-led dialogue device can show the user what it is able to respond to, but the user cannot directly input what he or she wants to say. The dialogue device 10 according to this embodiment therefore generates a response sentence for a question-answering dialogue when the user directly inputs what he or she wants to say, and responds to the user's utterance.

また、ユーザ主導型の対話装置は、ユーザの入力を待ち受け、入力に応じた応答を返すことができるが、ユーザの入力がない場合には、対話が成立しない、という問題がある。そこで、本実施の形態では、情報提供型対話に応じた応答文を生成して、ユーザの入力がない場合であっても対話を成立させる。 In addition, the user-initiated interactive device can wait for a user input and return a response according to the input, but there is a problem that a dialog is not established if there is no user input. Therefore, in this embodiment, a response sentence corresponding to the information provision type dialogue is generated, and the dialogue is established even when there is no user input.

Furthermore, with either kind of device, when an anthropomorphic dialogue device is used (one shaped like a two-dimensional character or an animal or humanoid robot that speaks in a human-like way), users may address it as they would a person, even though the device cannot accept arbitrary input. Consider a voice-operated guidance system installed in an exhibition hall, which listens for utterances about the hall such as "Where is the toilet?" or "How late are you open?". When it receives inputs it is not designed to accept (for example, "I'm Akkun.", "I came to visit.", or "The weather is nice today."), which users produce because of a human psychological tendency known as the intentional stance, the unexpected input causes the system either to respond incorrectly or to reject the utterance as unacceptable. When the system responds incorrectly or returns a stock rejection phrase such as "Please say that another way.", the user concludes that the device cannot hold a conversation and gives up.

Because unexpected utterances are in many cases information offered by the user, the dialogue device 10 of this embodiment, when the user makes an arbitrary input, generates a response sentence for an information-acquiring dialogue or an information-accepting dialogue and responds to the user's utterance.

As described above, in this embodiment the dialogue with the user is classified into a dialogue type for any input, and a response is generated according to the determined type, so that even unexpected utterances can be handled appropriately.

The response generation processing routine executed by the computer 14 is described below with reference to FIG. 4. First, in step 100, measurement of the no-input time is started. In step 102, it is determined whether a voice signal representing an input voice has been cut out from the signal supplied by the input unit 12. When the input unit 12 produces a signal corresponding to the user's utterance and a voice signal representing the input voice is cut out, then in step 104 the user's utterance is recognized from the voice signal by referring to the words registered in the recognition dictionary database, and a confidence value is calculated for each recognition result.

そして、ステップ１０６では、音声の特徴量として、入力された音声信号の基本周波数を抽出し、ステップ１０８において、上記ステップ１０４で認識された信頼度が最も高いユーザ発話、及び上記ステップ１０６で抽出された音声信号の基本周波数に基づいて、ユーザからの発話が、質問又は要求を表わしているか否かを判定する。 In step 106, the fundamental frequency of the input audio signal is extracted as the audio feature quantity. In step 108, the user utterance with the highest reliability recognized in step 104 and extracted in step 106. Based on the fundamental frequency of the voice signal, it is determined whether or not the utterance from the user represents a question or a request.

If it is determined in step 108 that the user's utterance represents a question or request, the dialogue with the user is determined to be a question-answering dialogue. In step 110, a response sentence for a question-answering dialogue is generated for the user's utterance, based on the utterance recognized in step 104 and the question-answer database 46A, and is output to the voice output unit 16, and the response generation processing routine ends.

If it is determined in step 108 that the user's utterance does not represent a question or request, then in step 112 the probability that the current dialogue is an information-acquiring dialogue and the probability that it is an information-accepting dialogue are each calculated from the transition probabilities stored in the transition probability storage unit 28 and the previous determination result. In step 114, it is determined whether the probability of an information-accepting dialogue calculated in step 112 is greater than the probability of an information-acquiring dialogue. If it is, the dialogue with the user is determined to be an information-accepting dialogue. In step 116, the structure of the user's utterance recognized in step 104 is analyzed, a response sentence for an information-accepting dialogue is generated for the utterance based on the analyzed structure and the response database 48A, the generated response sentence is converted to synthesized speech and output to the voice output unit 16, and the response generation processing routine ends.

If, in step 114, the probability of an information-accepting dialogue is less than or equal to the probability of an information-acquiring dialogue, the dialogue with the user is determined to be an information-acquiring dialogue. In step 118, a response sentence for an information-acquiring dialogue is generated based on the stored dialogue history and the user's previous utterance, the generated response sentence is converted to synthesized speech and output to the voice output unit 16, and the response generation processing routine ends.

If no voice signal has been cut out from the signal supplied by the input unit 12 in step 102, then in step 120 it is determined whether the measured no-input time is 5 seconds or more. If the time without voice input from the user is less than 5 seconds, the routine returns to step 102. If the measured no-input time is determined in step 120 to be 5 seconds or more, then in step 122 the probability that the current dialogue is an information-providing dialogue and the probability that it is an information-acquiring dialogue are each calculated from the transition probabilities stored in the transition probability storage unit 28 and the previous determination result. In step 124, it is determined whether the probability of an information-providing dialogue calculated in step 122 is greater than the probability of an information-acquiring dialogue. If it is, the dialogue with the user is determined to be an information-providing dialogue. In step 126, a response sentence for an information-providing dialogue is generated based on the stored dialogue history, the user's previous utterance, and the information database 42A, the generated response sentence is converted to synthesized speech and output to the voice output unit 16, and the response generation processing routine ends.

If, on the other hand, step 124 finds that the probability of the information acquisition dialogue is greater than or equal to the probability of the information providing dialogue, the dialogue with the user is determined to be an information acquisition dialogue. The process moves to step 118, where a response sentence suited to the information acquisition dialogue is generated, speech-synthesized, and output to the voice output unit 16, and the response generation processing routine ends.
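The no-input branch (steps 122 and 124) can be illustrated with a minimal Python sketch. It assumes that the probability of each dialogue type is read directly from a transition table indexed by the previous determination result; the table values, the string labels, and the function name `classify_on_timeout` are hypothetical and are not taken from the specification.

```python
from typing import Dict

PROVIDE = "information_providing"
ACQUIRE = "information_acquisition"
ANSWER = "question_answering"
ACCEPT = "information_acceptance"

# Hypothetical transition probabilities P(current type | previous type).
# The values are illustrative only; each row sums to 1.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    PROVIDE: {PROVIDE: 0.4, ACQUIRE: 0.3, ANSWER: 0.2, ACCEPT: 0.1},
    ACQUIRE: {PROVIDE: 0.2, ACQUIRE: 0.2, ANSWER: 0.1, ACCEPT: 0.5},
    ANSWER: {PROVIDE: 0.3, ACQUIRE: 0.3, ANSWER: 0.3, ACCEPT: 0.1},
    ACCEPT: {PROVIDE: 0.1, ACQUIRE: 0.5, ANSWER: 0.2, ACCEPT: 0.2},
}


def classify_on_timeout(previous: str,
                        transitions: Dict[str, Dict[str, float]] = TRANSITIONS) -> str:
    """Pick the dialogue type after the no-input time reaches the threshold."""
    row = transitions[previous]      # step 122 (assumed: a simple table lookup)
    p_provide = row[PROVIDE]
    p_acquire = row[ACQUIRE]
    # Step 124: providing wins only when strictly greater; ties go to acquisition.
    return PROVIDE if p_provide > p_acquire else ACQUIRE
```

Because step 124 tests "greater than", a tie between the two probabilities falls through to the information acquisition branch, which the sketch reproduces.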

The response sentence generated as described above is then output as speech by the voice output unit 16.

As described above, the dialogue device according to the first embodiment determines which of four dialogue types the dialogue with the user belongs to (information providing, information acquisition, question answering, or information acceptance) based on the no-input time, the transition probabilities of the dialogue types, and whether the user's utterance represents a request or question. By generating a response sentence suited to the determined dialogue type, the device can respond appropriately to arbitrary input and keep the dialogue going.
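The overall four-way determination summarized above might be sketched as follows. This is only an illustration under stated assumptions: the probabilities are passed in already computed, `is_request_or_question` is `None` when no utterance has been received, and the names `DialogueType`, `determine_dialogue_type`, and `NO_INPUT_THRESHOLD_S` are hypothetical.

```python
from enum import Enum
from typing import Optional


class DialogueType(Enum):
    PROVIDING = "information providing"
    ACQUISITION = "information acquisition"
    ANSWERING = "question answering"
    ACCEPTANCE = "information acceptance"


NO_INPUT_THRESHOLD_S = 5.0  # the embodiment uses 5 seconds


def determine_dialogue_type(no_input_s: float,
                            is_request_or_question: Optional[bool],
                            p_providing: float,
                            p_acquisition: float,
                            p_acceptance: float) -> Optional[DialogueType]:
    """Return the dialogue type, or None while the device should keep waiting."""
    if is_request_or_question is None:            # no utterance received
        if no_input_s < NO_INPUT_THRESHOLD_S:
            return None                           # keep waiting for input
        if p_providing > p_acquisition:
            return DialogueType.PROVIDING
        return DialogueType.ACQUISITION
    if is_request_or_question:                    # request or question
        return DialogueType.ANSWERING
    if p_acceptance > p_acquisition:              # statement from the user
        return DialogueType.ACCEPTANCE
    return DialogueType.ACQUISITION
```

A response generator could then dispatch on the returned value, for example through a dictionary that maps each `DialogueType` to its own generation function.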

In addition, by switching the response generation method according to the determined dialogue type, the device can accept any input from the user and realize a dialogue that does not break down.

Next, a second embodiment will be described. Parts configured in the same way as in the first embodiment are given the same reference numerals, and their description is omitted.

The second embodiment differs from the first in two respects: an imaging unit captures images of the area in front of the dialogue device to determine whether a user in front of the device is looking at it, and measurement of the no-input time starts once the user is determined to be looking at the device.

As shown in FIG. 5, the dialogue device 210 according to the second embodiment includes the input unit 12, an imaging unit 212 that captures images of the area in front of the device, a computer 214 that generates a response utterance based on the signal input through the input unit 12 and the image captured by the imaging unit 212, and the voice output unit 16.

The imaging unit 212 is positioned so that, when a user is in front of the device, it captures a region that includes the user's face. When the user is in front of the device and looking at it, a frontal face image, in which the user's face is facing forward, is captured.

The computer 214 includes the voice extraction unit 20; a front face detection unit 220 that detects the user's frontal face image in the image captured by the imaging unit 212; a no-input time counter 222 that measures the no-input time of input voice from the moment the front face detection unit 220 detects a frontal face image; the voice recognition unit 24; the feature amount extraction unit 26; the transition probability storage unit 28; the determination result storage unit 30; the probability calculation unit 32; the dialogue type determination unit 34; the dialogue history storage unit 36; and the response generation unit 38.

The front face detection unit 220 uses learned patterns of forward-facing face images to detect, in the image captured by the imaging unit 212, the frontal face image of a user in front of the device.

When the front face detection unit 220 detects the user's frontal face image, the no-input time counter 222 judges that the user is looking at the dialogue device 210 and starts measuring how long the state of no voice input from the user continues.
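The behavior of the no-input time counter 222 could be sketched in Python as below. The class name, its methods, and the injectable `clock` are hypothetical. The specification does not state what happens when the face is no longer detected or exactly when the counter is reset, so the sketch assumes that the counter keeps running once started and is cleared when voice input arrives.

```python
import time
from typing import Callable, Optional


class GazeGatedNoInputCounter:
    """Measures no-input time, starting only once a frontal face is detected."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._started_at: Optional[float] = None

    def on_frame(self, front_face_detected: bool) -> None:
        # Start timing at the first frame in which the user faces the device.
        if front_face_detected and self._started_at is None:
            self._started_at = self._clock()

    def on_voice_input(self) -> None:
        # Assumption: any voice input clears the counter.
        self._started_at = None

    def elapsed(self) -> float:
        # Zero until a frontal face has been seen.
        if self._started_at is None:
            return 0.0
        return self._clock() - self._started_at
```

Passing the clock in as a parameter keeps the counter testable without real waiting; in a deployment the default `time.monotonic` avoids jumps caused by wall-clock adjustments.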

In the response generation processing routine according to the second embodiment, it is first determined whether the user's frontal face image has been detected in the captured image; when it has, measurement of the no-input time starts. It is then determined whether a voice signal has been cut out of the signal input from the input unit 12. If no voice signal has been cut out, it is determined whether the measured no-input time is 5 seconds or more. If so, the device judges that voice input has been interrupted for a long time even though the user is looking at the dialogue device 210, determines whether the dialogue with the user is an information providing dialogue or an information acquisition dialogue based on the probability of each, and generates a response sentence suited to the determined dialogue type.

If, on the other hand, a voice signal is cut out of the input signal, it is determined, based on the speech recognition result and the fundamental frequency of the voice signal, whether the user's utterance represents a question or request. If it does, the dialogue with the user is determined to be a question answering dialogue, and a response sentence suited to that type is generated. If it does not, the device determines whether the dialogue is an information acceptance dialogue or an information acquisition dialogue based on the probability of each, and generates a response sentence suited to the determined dialogue type.

By starting the measurement of the no-input time at the moment the user is judged to be looking at the dialogue device, the time during which no input arrives from the user can be measured accurately, and based on the measured no-input time the dialogue with the user can be accurately determined to be an information providing dialogue.

Although the above embodiments describe the case in which the response to the user's utterance is given as speech output from a speaker, the invention is not limited to this; the response sentence may instead be shown on a display.

Next, a third embodiment will be described. Parts configured in the same way as in the first embodiment are given the same reference numerals, and their description is omitted.

The third embodiment differs from the first mainly in that the user enters text data.

As shown in FIG. 6, the dialogue device 310 according to the third embodiment includes an input unit 312, which consists of a keyboard and through which the user's input sentence is entered as text data; a computer 314 that generates a response sentence based on the text data entered through the input unit 312; and a display unit 316, which consists of a display and shows the generated response sentence.

The computer 314 includes a no-input time counter 322 that measures the no-input time based on input from the input unit 312; the transition probability storage unit 28; the determination result storage unit 30; the probability calculation unit 32; a dialogue type determination unit 334 that classifies the dialogue with the user into one of the four dialogue types based on the no-input time measured by the no-input time counter 322, the text data entered through the input unit 312, and the probabilities calculated by the probability calculation unit 32; the dialogue history storage unit 36; and the response generation unit 38.

Next, the response generation processing routine according to the third embodiment will be described with reference to FIG. 7. Processing that is the same as in the first embodiment is given the same reference numerals, and detailed description is omitted.

First, in step 100, measurement of the no-input time starts. In the next step 350, it is determined whether an input sentence has been entered as text data from the input unit 312. When the user has entered an input sentence through the input unit 312, step 352 determines whether the input sentence entered in step 350 represents a question or request.

If step 352 determines that the user's input sentence represents a question or request, the dialogue with the user is determined to be a question answering dialogue. In step 110, a response sentence suited to the question answering dialogue is generated in reply to the user's input sentence and output to the display unit 316, and the response generation processing routine ends.

If step 352 determines that the user's input sentence does not represent a question or request, step 112 calculates the probability of the information acquisition dialogue and the probability of the information acceptance dialogue. Step 114 then determines whether the probability of the information acceptance dialogue is greater than the probability of the information acquisition dialogue. If it is, the dialogue with the user is determined to be an information acceptance dialogue, and in step 116 a response sentence suited to the information acceptance dialogue is generated in reply to the user's input sentence, based on the structure of that input sentence and the response database 48A. The response sentence is output to the display unit 316, and the response generation processing routine ends.

If step 114 finds that the probability of the information acceptance dialogue is less than or equal to the probability of the information acquisition dialogue, the dialogue with the user is determined to be an information acquisition dialogue. In step 118, a response sentence suited to the information acquisition dialogue is generated based on the stored dialogue history and the user's previous input sentence and output to the display unit 316, and the response generation processing routine ends.

If there is no input from the input unit 312 in step 350, step 120 determines whether the measured no-input time is 5 seconds or more. If the time during which no input sentence has been entered by the user is less than 5 seconds, the process returns to step 350. If, on the other hand, step 120 determines that the measured no-input time is 5 seconds or more, step 122 calculates the probability of the information providing dialogue and the probability of the information acquisition dialogue. Step 124 then determines whether the probability of the information providing dialogue is greater than the probability of the information acquisition dialogue. If it is, the dialogue with the user is determined to be an information providing dialogue, and in step 126 a response sentence suited to the information providing dialogue is generated in reply to the user's input sentence, based on the dialogue history, the user's previous input sentence, and the information database 42A. The response sentence is output to the display unit 316, and the response generation processing routine ends.

If, on the other hand, step 124 finds that the probability of the information acquisition dialogue is greater than or equal to the probability of the information providing dialogue, the dialogue with the user is determined to be an information acquisition dialogue. The process moves to step 118, where a response sentence suited to the information acquisition dialogue is generated and output to the display unit 316, and the response generation processing routine ends.

The response sentence generated as described above is then displayed by the display unit 316.
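The loop between steps 350 and 120 (poll for a text input, otherwise check whether the no-input time has reached the threshold) might look like the following Python sketch. The use of `queue.Queue` as the source of input sentences, the polling interval, and the function name are assumptions made for illustration; the specification only describes the two checks.

```python
import queue
import time
from typing import Callable, Optional


def wait_for_text_or_timeout(inputs: "queue.Queue[str]",
                             timeout_s: float = 5.0,
                             poll_s: float = 0.05,
                             clock: Callable[[], float] = time.monotonic
                             ) -> Optional[str]:
    """Return the next input sentence, or None once timeout_s has elapsed."""
    started = clock()                          # step 100: start measuring
    while True:
        try:
            # Step 350: has an input sentence been entered?
            return inputs.get(timeout=poll_s)
        except queue.Empty:
            # Step 120: has the no-input time reached the threshold?
            if clock() - started >= timeout_s:
                return None
```

A `None` result corresponds to the branch that continues to steps 122 and 124, while a returned string corresponds to the branch that continues to step 352.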

In this way, the device determines which of the four dialogue types the dialogue with the user belongs to (information providing, information acquisition, question answering, or information acceptance) based on the no-input time, the transition probabilities of the dialogue types, and whether the user's input sentence represents a request or question. By generating a response sentence suited to the determined dialogue type, the device can respond appropriately to arbitrary input and keep the dialogue going.

Although the first to third embodiments describe the case in which the dialogue with the user is classified into one of four dialogue types, the invention is not limited to this. The dialogue may instead be classified into one of three types: information providing, question answering, and information acceptance. In that case, the dialogue is determined to be an information providing dialogue when the no-input time is a predetermined time or more, and an information acceptance dialogue when the user's utterance is determined not to be a question or request. Alternatively, the dialogue may be classified into one of three types: information acquisition, question answering, and information acceptance. In that case, the dialogue is determined to be an information acquisition dialogue when the no-input time is a predetermined time or more, and, when the user's utterance is determined not to be a question or request, it is determined to be either an information acquisition dialogue or an information acceptance dialogue based on the calculated probabilities of those two types.
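The two three-type variants can be contrasted in a short Python sketch. The variant labels, string constants, and function name are hypothetical. For the second variant the text does not say how a tie between the two probabilities is resolved, so the sketch assumes the same rule as step 114 (acceptance only when strictly greater).

```python
PROVIDE = "information_providing"
ACQUIRE = "information_acquisition"
ANSWER = "question_answering"
ACCEPT = "information_acceptance"


def classify_three_types(variant: str,
                         timed_out: bool,
                         is_request_or_question: bool,
                         p_acquire: float = 0.0,
                         p_accept: float = 0.0) -> str:
    """Classify with three dialogue types.

    variant "without_acquisition": providing / answering / acceptance.
    variant "without_providing":   acquisition / answering / acceptance.
    """
    if variant not in ("without_acquisition", "without_providing"):
        raise ValueError(f"unknown variant: {variant}")
    if timed_out:
        # Only one self-initiated type exists in each variant.
        return PROVIDE if variant == "without_acquisition" else ACQUIRE
    if is_request_or_question:
        return ANSWER
    if variant == "without_acquisition":
        return ACCEPT
    # Assumed tie rule: acceptance only when strictly greater.
    return ACCEPT if p_accept > p_acquire else ACQUIRE
```

In the first variant no probability comparison is needed at all, which is why the probability arguments default to zero.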

Although the case has been described in which the probability of each dialogue type is calculated using the previous determination result and the transition probabilities, the invention is not limited to this; the probability of each dialogue type may be calculated using a plurality of earlier determination results together with the transition probabilities.
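The specification does not say how several earlier determination results would be combined. One possible reading, shown here purely as an assumption, is to mix the transition rows of the last few results with weights that decay for older results. The function name, the `decay` parameter, and the weighting scheme are all hypothetical.

```python
from typing import Dict, List


def mixed_probabilities(history: List[str],
                        transitions: Dict[str, Dict[str, float]],
                        decay: float = 0.5) -> Dict[str, float]:
    """Weighted mix of transition rows; history is ordered oldest to newest."""
    if not history:
        raise ValueError("history must contain at least one determination result")
    # Weight 1 for the most recent result, decay for the one before, and so on.
    weights = [decay ** i for i in range(len(history))]
    total = sum(weights)
    result = {label: 0.0 for label in transitions[history[-1]]}
    for weight, previous in zip(weights, reversed(history)):
        for label, p in transitions[previous].items():
            result[label] += weight * p / total
    return result
```

With a history of length one this reduces to the single-row lookup of the embodiments, so the sketch is a strict generalization of that case.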

Although the case has been described in which the information database, the question database, the question answer database, and the response database are provided inside the dialogue device, the invention is not limited to this. These databases may be provided outside the dialogue device and connected to it by a network, so that the dialogue device accesses them over the network.

FIG. 1 is a schematic diagram showing the configuration of the dialogue device according to the first embodiment of the present invention.
FIG. 2 is a state transition diagram of the dialogue types.
FIG. 3 is a block diagram showing the configuration of the response generation unit of the dialogue device according to the first embodiment of the present invention.
FIG. 4 is a flowchart showing the response generation processing routine of the dialogue device according to the first embodiment of the present invention.
FIG. 5 is a schematic diagram showing the configuration of the dialogue device according to the second embodiment of the present invention.
FIG. 6 is a schematic diagram showing the configuration of the dialogue device according to the third embodiment of the present invention.
FIG. 7 is a flowchart showing the response generation processing routine of the dialogue device according to the third embodiment of the present invention.

Explanation of symbols

10, 210 Dialogue device
12 Microphone
14 Speaker
16 Recognition dictionary database
18 Concept word dictionary database
20, 220 Concept expression database
21 Unknown response database
22, 222 Computer
218 Adjective dictionary database

Claims

An input means for inputting at least one of an utterance and an input sentence by the user;
No-input time measuring means for measuring the time during which no input to the input means continues,
Request question determination means for determining whether at least one of the utterance and the input sentence input to the input means represents the request or question of the user;
Interactive determination means for determining that the dialogue with the user is an information providing type, in which the device itself voluntarily provides information, when the time measured by the no-input time measuring means is a predetermined time or more; determining that the dialogue with the user is a question response type, in which the device itself answers the user's request or question, when the request question determination means determines that the user's request or question is represented; and determining that the dialogue with the user is an information acceptance type, in which the device itself accepts information that the user voluntarily provides, when the request question determination means determines that the user's request or question is not represented;
Response generation means for generating a response sentence according to a determination result by the interactive determination means for at least one of the utterance and the input sentence;
Output means for outputting a response sentence generated by the response generation means;
A dialogue device including the above means.

An input means for inputting at least one of an utterance and an input sentence by the user;
No-input time measuring means for measuring the time during which no input to the input means continues,
Request question determination means for determining whether at least one of the utterance and the input sentence input to the input means represents the request or question of the user;
Interactive determination means for determining that the dialogue with the user is an information acquisition type, in which the device itself voluntarily asks a question, when the time measured by the no-input time measuring means is a predetermined time or more; determining that the dialogue with the user is a question response type, in which the device itself answers the user's request or question, when the request question determination means determines that the user's request or question is represented; and determining that the dialogue with the user is an information acceptance type, in which the device itself accepts information that the user voluntarily provides, when the request question determination means determines that the user's request or question is not represented;
Response generation means for generating a response sentence according to a determination result by the interactive determination means for at least one of the utterance and the input sentence;
Output means for outputting a response sentence generated by the response generation means;
A dialogue device including the above means.

An input means for inputting at least one of an utterance and an input sentence by the user;
No-input time measuring means for measuring the time during which no input to the input means continues,
Request question determination means for determining whether at least one of the utterance and the input sentence input to the input means represents the request or question of the user;
Transition history storage means for storing a transition history, over the dialogue between the user and the device itself, of a dialogue type classification consisting of an information providing type in which the device itself voluntarily provides information, an information acquisition type in which the device itself voluntarily asks a question, a question response type in which the device itself answers the user's request or question, and an information acceptance type in which the device itself accepts information that the user voluntarily provides;
Interactive determination means for determining, when the time measured by the no-input time measuring means is a predetermined time or more, that the dialogue with the user is either the information providing type or the information acquisition type based on a past determination result and the transition history; determining that the dialogue with the user is the question response type when the request question determination means determines that the user's request or question is represented; and determining that the dialogue with the user is the information acceptance type when the request question determination means determines that the user's request or question is not represented;
Response generation means for generating a response sentence according to a determination result by the interactive determination means for at least one of the utterance and the input sentence;
Output means for outputting a response sentence generated by the response generation means;
A dialogue device including the above means.

The dialogue device according to claim 3, wherein, when the request question determination means determines that the user's request or question is not represented, the interactive determination means determines, based on the past determination result and the transition history, that the dialogue with the user is either the information acceptance type or the information acquisition type.

The dialogue device according to claim 1, 3, or 4, wherein, when the dialogue is determined to be the information providing type, the response generation means generates the response sentence using information selected from a plurality of pieces of information stored in an information database that stores a plurality of pieces of information to be provided to the user.

The dialogue device according to claim 1, wherein, when the dialogue is determined to be the question response type, the response generation means searches the question sentences or request sentences stored in a question answer database, in which question sentences or request sentences are stored in association with answer sentences to them, for the question sentence or request sentence corresponding to the request or question represented by at least one of the utterance and the input sentence, and generates the response sentence using the answer sentence to the retrieved question sentence or request sentence.

The dialogue device according to claim 1, further including analysis means for analyzing the structure of at least one of the utterance and the input sentence input by the input means,
wherein, when the dialogue is determined to be the information acceptance type, the response generation means generates the response sentence using the response sentence that is stored in a response database, in which structures and response sentences to those structures are stored in association with each other, and that corresponds to the structure analyzed by the analysis means.

The dialogue device according to any one of claims 2 to 4, wherein, when the dialogue is determined to be the information acquisition type, the response generation means generates the response sentence using a question sentence selected from a plurality of question sentences stored in a question request database that stores a plurality of question sentences or request sentences addressed to the user.

The dialogue device according to any one of claims 1 to 8, wherein the input means inputs an utterance by the user,
and the request question determination means extracts a speech feature amount from the utterance input from the input means and determines, based on the extracted feature amount and a previously obtained feature amount of utterances representing a question or request, whether at least one of the utterance and the input sentence represents the user's request or question.

A program for causing a computer to function as:
No-input time measuring means for measuring the time during which there is no input to input means for inputting at least one of an utterance and an input sentence by a user;
Request question determination means for determining whether at least one of the utterance and the input sentence input to the input means represents the user's request or question;
Interactive determination means for determining that the dialogue with the user is an information providing type, in which the device itself voluntarily provides information, when the time measured by the no-input time measuring means is a predetermined time or more; determining that the dialogue with the user is a question response type, in which the device itself answers the user's request or question, when the request question determination means determines that the user's request or question is represented; and determining that the dialogue with the user is an information acceptance type, in which the device itself accepts information that the user voluntarily provides, when the request question determination means determines that the user's request or question is not represented; and
Response generation means for generating, for at least one of the utterance and the input sentence, a response sentence according to the determination result of the interactive determination means.

A program for causing a computer to function as:
No-input time measuring means for measuring the time during which there is no input to input means for inputting at least one of an utterance and an input sentence by a user;
Request question determination means for determining whether at least one of the utterance and the input sentence input to the input means represents the user's request or question;
Interactive determination means for determining that the dialogue with the user is an information acquisition type, in which the device itself voluntarily asks a question, when the time measured by the no-input time measuring means is a predetermined time or more; determining that the dialogue with the user is a question response type, in which the device itself answers the user's request or question, when the request question determination means determines that the user's request or question is represented; and determining that the dialogue with the user is an information acceptance type, in which the device itself accepts information that the user voluntarily provides, when the request question determination means determines that the user's request or question is not represented; and
Response generation means for generating, for at least one of the utterance and the input sentence, a response sentence according to the determination result of the interactive determination means.

A program for causing a computer to function as:
a no-input time measuring means for measuring the time during which there is no input to an input means through which a user inputs at least one of an utterance and an input sentence;
a request question determination means for determining whether at least one of the utterance and the input sentence input to the input means represents a request or question of the user;
a transition history storage means for storing a transition history of dialogue types in the dialogue between the user and the device itself, the dialogue types consisting of an information providing type in which the device itself spontaneously provides information, an information acquisition type in which the device itself spontaneously asks a question, a question response type in which the device itself answers a request or question of the user, and an information reception type in which the device itself accepts information that the user spontaneously provides;
a dialogue type determination means that determines, based on the past determination result and the transition history, whether the dialogue with the user is the information providing type or the information acquisition type when the time measured by the no-input time measuring means is equal to or longer than a predetermined time; determines that the dialogue with the user is the question response type when the request question determination means determines that the input represents a request or question of the user; and determines that the dialogue with the user is the information reception type when the request question determination means determines that the input does not represent a request or question of the user; and
a response generation means that generates, for at least one of the utterance and the input sentence, a response sentence according to the determination result of the dialogue type determination means.
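
One way to read the choice between the information providing type and the information acquisition type "based on the past determination result and the transition history" is sketched below. This is an assumption-laden illustration, not the claimed method: it counts how often each candidate type followed the previous dialogue type in the stored history and picks the more frequent one. The function name `choose_spontaneous_type`, the string labels, and the tie-breaking rule are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

PROVIDING = "information_providing"
ACQUIRING = "information_acquisition"


def choose_spontaneous_type(
    previous_type: str,
    transition_history: List[Tuple[str, str]],
) -> str:
    """Pick the spontaneous dialogue type after a long silence.

    transition_history holds (earlier_type, later_type) pairs observed
    in past dialogue. Only transitions that start from previous_type
    and end in one of the two spontaneous types are counted.
    """
    counts = Counter(
        later
        for earlier, later in transition_history
        if earlier == previous_type and later in (PROVIDING, ACQUIRING)
    )
    # Assumption: ties, including an empty history, fall back to providing.
    if counts[ACQUIRING] > counts[PROVIDING]:
        return ACQUIRING
    return PROVIDING


history = [
    ("question_response", ACQUIRING),
    ("question_response", ACQUIRING),
    ("question_response", PROVIDING),
    ("information_reception", PROVIDING),
]
selected = choose_spontaneous_type("question_response", history)
```

Raw counts are used here for simplicity; normalizing them into transition probabilities would give the same choice for a fixed previous type.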