JP2007102104A

JP2007102104A - Device and method for answer sentence generation, and program and storage medium thereof

Info

Publication number: JP2007102104A
Application number: JP2005295208A
Authority: JP
Inventors: Hajime Tsukada; 元塚田; Matthias Denecke; デネッケ・マティアス; Yoshihito Yasuda; 宜仁安田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-10-07
Filing date: 2005-10-07
Publication date: 2007-04-19
Anticipated expiration: 2025-10-07
Also published as: JP4755478B2

Abstract

PROBLEM TO BE SOLVED: To eliminate the need for complicated rule setting in an interaction system. SOLUTION: An answer sentence generation device 100 is equipped with an interaction corpus storage part 133, which stores a corpus of utterance pairs, each consisting of a user utterance and its answer sentence, labeled with values indicating the interaction state of each pair. An interaction state management part 140 decides which interaction state an input user utterance is in (which label value it corresponds to) based on the user utterances, answer sentences, and label values included in the interaction corpus. An answer sentence generation part 150 then retrieves from the interaction corpus the answer sentences given to similar user utterances in the decided interaction state. In other words, the answer sentence most likely to be suitable as an answer to the input user utterance is retrieved from the interaction corpus storage part 133. The retrieved answer sentence is then corrected and output as the answer sentence to the input user utterance. COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a response sentence generation technique for dialogue systems.

Natural language dialogue systems are widely used as a means of realizing natural communication between people and machines. Examples include automatic answering services by telephone and interactive services using the Web (World Wide Web) (see Non-Patent Document 1, etc.).

Conventional dialogue systems use a structured back end such as a database or an ontology (see Non-Patent Documents 2 to 4).
For example, consider a bus information guidance system (hereinafter, the system) whose back-end database has the fields “departure stop name”, “arrival stop name”, “day of the week”, and “time”, and suppose the user inputs “From Hon-Atsugi to Tsushin Kenkyujo-mae”.
In this case, by referring to the back-end database, the system can determine that the “departure stop name” is “Hon-Atsugi” and the “arrival stop name” is “Tsushin Kenkyujo-mae”. From the user's input and the fields defined in the back-end database, the system can also tell which information is missing from the query.
That is, the system can tell that “day of the week” and “time” are missing from the user's input, and can therefore decide to ask about “day of the week” and “time” next.
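The slot-filling behaviour of the conventional back end described above can be illustrated with a minimal Python sketch. The field identifiers and the helper name `missing_fields` are hypothetical and chosen only to mirror the bus-information example; they are not taken from any cited system.

```python
# Hypothetical field names modelled on the bus-information example in the text.
REQUIRED_FIELDS = ["departure_stop", "arrival_stop", "day_of_week", "time"]


def missing_fields(filled):
    """Return the required fields that the user input has not yet supplied."""
    return [field for field in REQUIRED_FIELDS if not filled.get(field)]


# The example utterance names only the two stops.
filled = {"departure_stop": "Hon-Atsugi", "arrival_stop": "Tsushin Kenkyujo-mae"}

# The system would ask about these fields next.
to_ask = missing_fields(filled)
print(to_ask)
```

Even this toy version shows why such back ends become costly: every field, and every rule about what to ask next, has to be defined by hand for each domain.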
Non-Patent Document 1: “Speech processing application system / voice interface”, [online], [retrieved September 1, 2005], Internet <URL: http://www.icot.or.jp/FTS/REPORTS/H10-reports/AITEC9903Re2_Folder/AITEC9903R2-sec3-2.htm>
Non-Patent Document 2: A. Rudnicky and X. Wu, An Agenda-Based Dialog Management Architecture for Spoken Language Systems, Proceedings of the Workshop on Automatic Speech Recognition and Understanding, 1999
Non-Patent Document 3: M. Denecke and A. H. Waibel, Dialogue Strategies Guiding Users to their Communicative Goals, Proceedings of Eurospeech, Rhodes, Greece, 1997
Non-Patent Document 4: A. Ferrieux and M. D. Sadek, An Efficient Data-Driven Model for Cooperative Spoken Dialogue, Proceedings of International Conference on Spoken Language Processing, Yokohama, Japan, 1994

However, creating the rules used by a back end of the kind described above is laborious. In particular, when users ask a wide range of questions, correspondingly complicated rules must be set in the back end, so building the back end takes a great deal of effort. In other words, the conventional technique has the problem that constructing a dialogue system, such as an interactive question answering system, is costly.
Accordingly, an object of the present invention is to solve this problem by providing a response sentence generation device and the like that reduce the construction cost of a dialogue system.

To solve the above problem, the response sentence generation device of the present invention uses, instead of a back end with complicated rules, a dialogue corpus containing utterance pairs, each consisting of a user utterance and its response sentence, together with a label value indicating the dialogue state of each pair. That is, based on the user utterances and label values contained in the dialogue corpus, the device determines which dialogue state the input user utterance is in (which label value it corresponds to). It then searches the dialogue corpus for user utterances similar to the input in that dialogue state and retrieves the response sentences to those utterances. In other words, it retrieves from the dialogue corpus the response candidate sentence most appropriate as a response to the input user utterance. The retrieved response candidate sentence is then corrected and output as the response sentence to the input user utterance.

That is, the present invention is a response sentence generation device that generates a response sentence to a user utterance, comprising:
a user utterance input unit that receives input of the user utterance;
a keyword extraction unit that analyzes the input user utterance and extracts the keywords contained in it;
an utterance state storage unit that stores the extracted keywords and document data containing those keywords;
a keyword addition unit that adds the extracted keywords to the utterance state storage unit;
a document data search unit that searches a document corpus storage unit, which stores a document corpus that is a set of predetermined document data, for document data containing the extracted keywords;
an utterance state update unit that outputs the retrieved document data to the utterance state storage unit and updates the document data in the utterance state storage unit;
a dialogue corpus storage unit that stores a dialogue corpus in which each user utterance of an utterance pair, consisting of a user utterance and its response sentence, is given a dialogue state label value identified by the combination of the type of the user utterance and the type of its response sentence;
a dialogue state estimation unit that determines the label value of the input user utterance based on the feature amount of the user utterance, calculated from the keywords stored in the utterance state storage unit and the retrieved document data, and on the keywords contained in the user utterances of the dialogue corpus and the label values of those user utterances;
a response sentence extraction unit that extracts, from the dialogue corpus storage unit, response sentences to user utterances having the same label value as the determined label value;
a correction location determination unit that determines the correction locations in each extracted response sentence;
a replacement candidate determination unit that selects replacement candidate words for each determined correction location from the words contained in at least one of the dialogue corpus and the document data in the utterance state storage unit;
a response candidate sentence creation unit that creates response candidate sentences by substituting the selected replacement candidate words at the correction locations of each extracted response sentence;
a response sentence selection unit that selects, from the created response candidate sentences, the one containing the most keywords stored in the utterance state storage unit; and
a response sentence output unit that outputs the selected response candidate sentence.

According to this configuration, the dialogue state estimation unit determines the label value of the dialogue state of the user utterance, and response candidate sentences for the user utterance are selected from the dialogue corpus using this label value as a key, so a response sentence can be generated without complicated rules. In addition, words in the response candidate sentences are replaced with words contained in the dialogue corpus or in the retrieved document data, and the candidate containing the most keywords stored in the utterance state storage unit is selected from the resulting candidates, so a sentence better suited as a response to the user utterance can be generated.
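The final selection step, choosing the candidate that contains the most stored keywords, can be sketched as follows. This is an illustrative sketch only: the function name `select_response`, the substring-based keyword matching, the tie-breaking by list order, and the sample sentences are all assumptions, not details given in the text.

```python
def select_response(candidates, keywords):
    """Pick the candidate containing the largest number of distinct stored keywords."""

    def keyword_hits(sentence):
        # Assumption: a keyword "appears" if it occurs as a substring.
        return sum(1 for kw in keywords if kw in sentence)

    # max() returns the first maximal element, so list order breaks ties.
    return max(candidates, key=keyword_hits)


# Hypothetical keywords accumulated in the utterance state.
keywords = {"Yokohama", "sightseeing", "family"}
candidates = [
    "Are you looking for sightseeing spots in Hakone?",
    "Are you looking for family sightseeing spots in Yokohama?",
]
best = select_response(candidates, keywords)
print(best)
```

The second candidate is selected because it contains all three keywords, whereas the first contains only one.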

Further, the response sentence extraction unit of the response sentence generation device of the present invention searches the dialogue corpus for user utterances having the same label value as the determined label value, together with their response sentences; calculates the similarity between the input user utterance and each retrieved user utterance; selects a predetermined number of the retrieved user utterances in descending order of the calculated similarity; and extracts from the dialogue corpus the response sentences to the selected user utterances.
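The extraction procedure just described (filter by label, rank by similarity, keep a fixed number) can be sketched as below. The text does not specify the similarity measure at this point, so Jaccard overlap of word sets is used purely as a stand-in; the record layout, the function names, and the sample corpus are likewise hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap; an assumed stand-in for the unspecified similarity measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def extract_responses(corpus, label, user_words, top_n=2):
    """Return responses of the top_n same-label utterances most similar to the input."""
    same_label = [pair for pair in corpus if pair["label"] == label]
    ranked = sorted(
        same_label,
        key=lambda pair: jaccard(user_words, pair["user_words"]),
        reverse=True,
    )
    return [pair["response"] for pair in ranked[:top_n]]


# Hypothetical miniature dialogue corpus.
corpus = [
    {"label": 5, "user_words": ["hotel", "Hakone", "hot", "spring"],
     "response": "Do you have a preferred hot spring area?"},
    {"label": 5, "user_words": ["restaurant", "Yokohama"],
     "response": "Do you have a preferred cuisine?"},
    {"label": 1, "user_words": ["hotel", "Hakone"],
     "response": "Yes, it is."},
]
result = extract_responses(corpus, 5, ["hotel", "Hakone"], top_n=1)
print(result)
```

The `top_n` parameter corresponds to the "predetermined number" in the text and is the knob that would be tuned to the available processing capacity.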

According to this configuration, the number of extracted response sentences can be narrowed down to match the processing capacity of the response sentence generation device. In addition, because the device extracts a variety of response sentences from the dialogue corpus, it can generate a more appropriate response sentence.

In the response sentence extraction unit of the response sentence generation device of the present invention, the feature amount of the user utterance is described by the combination of the keywords stored in the utterance state storage unit, the number of appearances of each keyword in the utterance state storage unit, and the number of appearances of each keyword in the retrieved document data.

According to this configuration, the feature amount of the user utterance can be described easily.

The correction location determination unit of the response sentence generation device of the present invention determines, as the correction locations, the positions of those words in the extracted response sentence whose frequency of appearance in the dialogue corpus is lower than a predetermined threshold.

According to this configuration, the frequently used words of a response sentence extracted from the dialogue corpus (which together amount to a template sentence) are kept, while the remaining, infrequently used words are replaced. The response sentence generation device can therefore generate a response sentence better suited to the user utterance.
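The frequency-threshold rule for choosing correction locations can be sketched as follows. The tokenised sample sentences, the threshold value, and the function name `correction_positions` are hypothetical; the text only states that words rarer than a predetermined threshold in the dialogue corpus mark the positions to be replaced.

```python
from collections import Counter


def correction_positions(response_words, corpus_counts, threshold):
    """Indices of words whose dialogue-corpus frequency is below the threshold."""
    return [
        i for i, word in enumerate(response_words)
        if corpus_counts[word] < threshold  # Counter returns 0 for unseen words
    ]


# Hypothetical tokenised response sentences from a dialogue corpus.
corpus_sentences = [
    ["do", "you", "have", "a", "preference", "for", "Hakone"],
    ["do", "you", "have", "a", "preference", "for", "Yokohama"],
    ["do", "you", "have", "a", "budget"],
]
corpus_counts = Counter(word for sentence in corpus_sentences for word in sentence)

positions = correction_positions(corpus_sentences[0], corpus_counts, threshold=2)
print(positions)
```

Here only the place name falls below the threshold, so the remaining words act as the template and the place name becomes the slot to be filled with a replacement candidate.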

The response sentence generation device of the present invention further includes a speech processing unit that converts the speech data of the user utterance into text data and outputs it to the user utterance input unit, and that outputs the text data of the response sentence output from the response sentence output unit as speech data.

According to this configuration, the response sentence generation device can generate a response sentence even when it receives the user utterance as speech data, and it can output the generated response sentence as speech data.

In the response sentence generation device of the present invention, the label value of the dialogue state is determined by whether each of the user utterance and its response sentence is a question and, if it is a question, by which of the following it is classified as: a YES/NO-type question, a WH-type question, an enumeration-type question that calls for a list of several answers, or any other question.

According to this configuration, the response sentence generation device can classify dialogue states accurately with a small number of classes, which reduces the processing load of response sentence generation.

According to the present invention, complicated rules no longer need to be created when constructing a dialogue system such as an interactive question answering system. In other words, implementing the present invention in a dialogue system reduces the cost of constructing that system.

Hereinafter, the best mode for carrying out the present invention (hereinafter, the embodiment) is described with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of the response sentence generation device of the present embodiment.
The response sentence generation device 100 includes a CPU (Central Processing Unit) 10 that performs various arithmetic processes, a main memory 20 that serves as working storage when the CPU 10 performs arithmetic processing, an input/output interface 50 that handles the input and output of various data, and a storage unit 130 that stores various programs and data. The storage unit 130 is, for example, a hard disk device. The CPU 10 implements each function described below by reading the programs and data stored in the storage unit 130 into the main memory 20 and performing arithmetic processing.

The storage unit 130 stores a response sentence generation program 101, with which the CPU 10 generates a response sentence based on the input user utterance. In predetermined areas it also provides an utterance state storage unit 131, a document corpus storage unit 132 that stores a document corpus, and a dialogue corpus storage unit 133 that stores a dialogue corpus.

The document corpus is a collection of various document data such as newspaper articles. The dialogue corpus consists of transcribed text of user utterances and their response sentences (utterance pairs), with a dialogue action label (described later) attached to each utterance pair. The utterance state storage unit 131 stores the keywords contained in the input user utterances and the document data containing those keywords. This document data is the document data retrieved from the document corpus storage unit 132.

The response sentence generation program 101 includes a dialogue state management module 141 and a response sentence generation module 158.
The dialogue state management module 141 is the module with which the CPU 10 estimates (determines) the dialogue state of an input user utterance.
The response sentence generation module 158 is the module with which the CPU 10 generates a response sentence that follows the user utterance, based on the input user utterance and its estimated dialogue state.
The functions of each module and the data stored in each storage unit are described in detail with reference to the block diagram of FIG. 2.

FIG. 2 is a functional block diagram of the response sentence generation device of FIG. 1. The functions of the response sentence generation device 100 are described using FIG. 2, with reference to FIG. 1.

The response sentence generation device 100 includes a user utterance input unit 110 that receives input of a user utterance; a dialogue state management unit 140 that determines the dialogue state (dialogue action label) of the input user utterance; a response sentence generation unit 150 that generates a response sentence following the user utterance, based on the input user utterance and its determined dialogue state; a response sentence output unit 160 that outputs the generated response sentence; and the storage unit 130.

<Dialogue state management unit>
The dialogue state management unit 140 is realized by the CPU 10 of FIG. 1 executing the dialogue state management module 141. The response sentence generation unit 150 is realized by the CPU 10 executing the response sentence generation module 158.
The user utterance input unit 110 and the response sentence output unit 160 are realized by the input/output interface 50 of FIG. 1 and by the CPU 10 executing the response sentence generation program 101. As described above, the storage unit 130 includes the utterance state storage unit 131, the document corpus storage unit 132, and the dialogue corpus storage unit 133.

In the present embodiment, the user utterance input to the user utterance input unit 110 is text data, for example text converted from speech data by a speech processing unit (not shown). This speech processing unit may be provided inside the response sentence generation device 100 or connected externally. The response sentence generation device 100 may further include a speech input unit (microphone) that accepts the speech of the user utterance, a speech output unit (speaker) that outputs the response sentence generated (selected) by the device as speech, and the like.

The dialogue state management unit 140 includes a keyword extraction unit 142, a keyword addition unit 143, a document search unit 144, an utterance state update unit 145, and a dialogue state estimation unit 146.

The keyword extraction unit 142 extracts the keywords contained in the user utterance output from the user utterance input unit 110. Known techniques can be used for this keyword extraction.

The keyword addition unit 143 adds the keywords extracted from the user utterance to the utterance state storage unit 131.

The document search unit 144 reads the keywords accumulated in the utterance state storage unit 131 and searches the document corpus storage unit 132 for document data containing these keywords.

The utterance state update unit 145 outputs the retrieved document data to the utterance state storage unit 131 and updates the utterance state held there. Here, the utterance state (utterance state information) stored in the utterance state storage unit 131 consists of the keywords extracted from the user utterances so far and the document data containing those keywords. Table 1 shows an example of this utterance state.

For example, the utterance state in Table 1 shows that the keywords “Yokohama”, “sightseeing”, “facility”, and “teach” have each appeared once in the user utterances, and that the document data “A guide who teaches about sightseeing facilities in Yokohama is to be dispatched to ○○.” was retrieved from the document corpus storage unit 132 as document data containing these keywords. The document data retrieved from the document corpus storage unit 132 may also carry the number of appearances (score) of each keyword.

Suppose, for example, that the user utterance input unit 110 receives “Somewhere the family can enjoy.” as the next user utterance, and that the keyword extraction unit 142 extracts “family” and “enjoy” as keywords. The keyword addition unit 143 then adds the keywords “family” and “enjoy” to the utterance state described above. The document search unit 144 searches the document corpus storage unit 132 for document data containing the keywords “Yokohama”, “sightseeing”, “facility”, “family”, and “enjoy”. The utterance state update unit 145 then updates the utterance state by replacing the document data registered in it with the newly retrieved document data.
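The turn-by-turn update of the utterance state (add keywords, search the document corpus again, replace the stored documents) can be sketched as follows. The class name `UtteranceState`, the substring matching, the ranking by number of matched keywords, the `top_k` cut-off, and the sample documents are all assumptions made for illustration; the text only says that documents containing the keywords are retrieved and that the stored documents are replaced.

```python
from collections import Counter


class UtteranceState:
    """Keywords seen so far, plus the documents retrieved for them."""

    def __init__(self, document_corpus, top_k=1):
        self.document_corpus = document_corpus
        self.top_k = top_k  # assumed cut-off; not specified in the text
        self.keyword_counts = Counter()
        self.documents = []

    def update(self, new_keywords):
        # Keyword addition: keywords accumulate across turns.
        self.keyword_counts.update(new_keywords)
        # Document search: score each document by the distinct keywords it contains.
        scored = [
            (sum(kw in doc for kw in self.keyword_counts), doc)
            for doc in self.document_corpus
        ]
        hits = [(score, doc) for score, doc in scored if score > 0]
        hits.sort(key=lambda item: item[0], reverse=True)
        # State update: previously stored documents are replaced, not appended.
        self.documents = [doc for _, doc in hits[: self.top_k]]


# Hypothetical document corpus.
docs = [
    "A guide to sightseeing facilities in Yokohama",
    "Places in Yokohama that a family can enjoy while sightseeing",
    "Stock market report",
]
state = UtteranceState(docs)
state.update(["Yokohama", "sightseeing", "facilities", "teach"])
first_turn = list(state.documents)
state.update(["family", "enjoy"])
second_turn = list(state.documents)
print(first_turn, second_turn)
```

After the second turn the stored document changes, because the accumulated keywords now match the second document better than the first.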

The dialogue state estimation unit 146 estimates the dialogue state of the user utterance based on the utterance state stored in the utterance state storage unit 131, and assigns to the user utterance a label value (dialogue action label) indicating the estimation result. This estimation uses, for example, a support vector machine, a statistical classifier (V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995).
The support vector machine is assumed to have been trained on the keywords contained in the user utterances of the dialogue corpus and the dialogue action labels attached to those utterances, so that when it receives the feature amount of a user utterance (described as a vector), calculated from the keywords stored in the utterance state storage unit 131 and the retrieved document data, it outputs the dialogue action label of that user utterance.

The method of creating the vector input to the support vector machine is described next. Here, the feature amount of the user utterance is a vector describing (1) the keywords contained in the user utterances, (2) the number of appearances of each keyword in the utterance state (see Table 1), and (3) the number of appearances of each keyword in the retrieved document data.
The description assumes that the keywords registered as the utterance state in the utterance state storage unit 131, and the number of appearances of each keyword, are as shown in Table 2 below.

A number is also assigned to each keyword in advance, for example as shown in Table 3 below. This assignment number indicates the position in the vector at which the number of appearances of that keyword is written. The assignment is set in advance by an administrator of the response sentence generation device 100 or the like and stored in the storage unit 130. When the vector is created, the field that holds each keyword's number of appearances in the utterance state is kept separate from the field that holds its number of appearances in the retrieved document data. FIG. 3 illustrates a vector input to the support vector machine of the present embodiment. For example, the numbers of appearances in the utterance state are written in the field indicated by reference numeral 701, and the numbers of appearances in the retrieved document data are written in the field indicated by reference numeral 702.

The dialogue state estimation unit 146 creates the vector to be input to the support vector machine from this information (see FIG. 3). For example, the assignment number of the keyword “Yokohama” is “2” and its number of appearances is “3”, so “3” is written in the second position from the left. The numbers of appearances of the other keywords are written in the same way. Keywords other than those with assignment numbers 2, 3, 5, and 8 have not appeared, so their positions are set to 0. The dialogue state estimation unit 146 likewise writes the number of appearances of each keyword in the retrieved document data (see reference numeral 702). The assignment numbers used for the field 702 are prepared separately from the assignment numbers for the numbers of appearances in the user utterances (Table 3).
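The vector construction described above can be sketched as follows. Only the assignment number 2 and the count 3 for “Yokohama”, and the fact that assignment numbers 2, 3, 5, and 8 are occupied, come from the text; the other keyword-to-number pairings, the remaining counts, the vocabulary size of 8, and the function names are hypothetical, because Tables 2 and 3 are not reproduced here. Concatenating two equal-sized fields is one simple way of keeping the two sets of positions separate.

```python
def count_vector(counts, assignment, size):
    """Write each keyword's count at its 1-based assignment position."""
    vec = [0] * size
    for keyword, n in counts.items():
        vec[assignment[keyword] - 1] = n
    return vec


def build_feature_vector(state_counts, doc_counts, state_assign, doc_assign, size):
    """Concatenate the utterance-state field (701) and the document field (702)."""
    return (count_vector(state_counts, state_assign, size)
            + count_vector(doc_counts, doc_assign, size))


# "Yokohama" -> 2 follows the text; the other pairings are assumed.
assignment = {"Yokohama": 2, "sightseeing": 3, "facility": 5, "teach": 8}
state_counts = {"Yokohama": 3, "sightseeing": 1, "facility": 1, "teach": 1}
doc_counts = {"Yokohama": 1, "sightseeing": 1, "facility": 1, "teach": 1}

vector = build_feature_vector(state_counts, doc_counts, assignment, assignment, size=8)
print(vector)
```

A vector of this form would then be passed to the trained classifier, which returns the dialogue action label.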

The dialogue state estimation unit 146 inputs the vector created in this way to the support vector machine and estimates the dialogue state of the user utterance. It then outputs the dialogue action label obtained as the estimation result, together with the input user utterance, to the response sentence generation unit 150.

The dialogue action label assigned by the dialogue state estimation unit 146 is described next. A dialogue action label is a label given to a pair consisting of a user utterance and its response sentence (wizard utterance). In the present embodiment, the types of the user utterance and the response sentence are classified into the eight types shown in Table 4 below according to (1) whether each is a question and (2) if it is a question, what kind of question it is, and a dialogue action label is assigned accordingly.

In Table 4, UT denotes a user utterance and WT denotes a wizard utterance (response sentence). A WH-type question is a question concerning Who, Which, What, When, Where, or How. For example, if the user utterance is a YES/NO-type question and the wizard utterance in response is not a question, the dialogue action label is "1".

Next, the dialogue corpus stored in the dialogue corpus storage unit 133 is described. This dialogue corpus consists of transcribed text data of utterance pairs, each made up of a user utterance and its response sentence (wizard utterance), to which the dialogue action labels described above have been assigned. The dialogue action labels in the corpus are assigned manually.
An example of an utterance pair included in the dialogue corpus is shown below.

UT:5: I would like to stay at a hotel in Hakone and enjoy my holiday relaxing in a hot spring. My budget is 20,000 yen per person, and if there is a good hotel within that range I would like to stay there. What kind of places are there?
WT:: Yes, do you have a preference for the location of the hot spring?

In the example above, the wizard utterance is a question, so the dialogue action labels "1" to "4" are excluded from the candidates. Because the wizard utterance is a YES/NO-type question, the dialogue action label "5" is assigned.

In this dialogue corpus, the utterances are segmented in advance into phrases and words using existing morphological analysis techniques, and each word is annotated with part-of-speech information. Each word whose part of speech is a noun, verb, or adjective is also annotated with its number of appearances in the document corpus of the document corpus storage unit 132. In addition, named entities are extracted and annotated with their types (for example, person name or place name) using a named entity extraction technique (for example, Hideki Isozaki, Hideto Kazawa: Efficient Support Vector Classifiers for Named Entity Recognition, Proceedings of COLING-2002, pp.390-396, 2002). The dialogue corpus is assumed to contain a sufficiently large number of utterance pairs.

<Response sentence generation unit>
The response sentence generation unit 150 includes a response sentence extraction unit 151, which extracts response candidate sentences from the dialogue corpus in the dialogue corpus storage unit 133, and a response sentence correction unit 152, which modifies the extracted response candidate sentences and selects one sentence from among the modified candidates.

On receiving a user utterance and its dialogue action label from the dialogue state estimation unit 146, the response sentence extraction unit 151 uses this information to extract, from the dialogue corpus storage unit 133, user utterances whose dialogue state is similar to that of the received user utterance. It then extracts the response sentences (wizard utterances) to those user utterances as response candidate sentences and passes them to the response sentence correction unit 152. In other words, it extracts from the dialogue corpus, as response candidate sentences, response sentences that are suitable to follow the input user utterance. The details of the response sentence extraction unit 151 are described later using a flowchart.

The configuration of the response sentence correction unit 152 is now described with reference to FIG. 4 (see also FIG. 2 as appropriate). FIG. 4 is a block diagram showing the configuration of the response sentence correction unit of FIG. 2.
As shown in FIG. 4, the response sentence correction unit 152 includes a correction location determination unit 153, which determines the correction locations in each response candidate sentence output from the response sentence extraction unit 151; a replacement candidate determination unit 154, which determines the replacement candidate words for each correction location; a response candidate sentence creation unit 155, which creates response candidate sentences in which the word at each correction location is replaced with a replacement candidate word; a score assignment unit 156, which assigns a score to each created response candidate sentence; and a response sentence selection unit 157, which selects the response candidate sentence with the highest score from among those created.
The details of these components are described later with reference to the flowchart of FIG. 7.

<Processing procedure>
Next, an overview of the processing procedure of the response sentence generation device 100 is given with reference to FIG. 5, referring also to FIGS. 1 to 4 as appropriate. FIG. 5 is a flowchart showing an overview of the processing procedure of the response sentence generation device of FIG. 2.

First, the response sentence generation device 100 (see FIG. 2) initializes the system (S401). The user utterance input unit 110 then accepts the input of a user utterance (S402), and the dialogue state management unit 140 performs dialogue state management for the input user utterance (S403). That is, the dialogue state management unit 140 extracts the keywords contained in the user utterance and searches for document data containing these keywords. It then assigns a dialogue action label to the user utterance based on the keywords and the document data containing them. Next, the response sentence generation unit 150 generates a response sentence (S404). That is, based on the user utterance and its dialogue action label, the response sentence generation unit 150 extracts response candidate sentences from the dialogue corpus storage unit 133 and modifies them. It then assigns a score to each modified response candidate sentence and selects the one with the highest score as the response sentence. Finally, the response sentence output unit 160 outputs the selected response sentence (S405), and processing returns to S402 to accept the next user utterance.
In this way, the response sentence generation device 100 generates and outputs a response sentence for each input user utterance.
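The overall loop of S401 to S405 can be sketched as below. This is only a structural illustration under stated assumptions: `run_dialogue`, the shape of the state dictionary, and the two toy stand-in functions are hypothetical and do not reproduce the processing of units 140 and 150.

```python
def run_dialogue(utterances, manage_state, generate_response):
    """Process user utterances one at a time, following S401 to S405."""
    state = {"keywords": [], "documents": []}    # S401: initialize
    outputs = []
    for utterance in utterances:                 # S402: accept an utterance
        label = manage_state(state, utterance)   # S403: dialogue state management
        response = generate_response(state, utterance, label)  # S404
        outputs.append(response)                 # S405: output, then back to S402
    return outputs


# Toy stand-ins, purely for illustration.
def toy_manage_state(state, utterance):
    state["keywords"].extend(utterance.split())
    return 3  # fixed label, not an actual estimation


def toy_generate_response(state, utterance, label):
    return f"label={label}, keywords={len(state['keywords'])}"


outputs = run_dialogue(
    ["Yokohama sightseeing", "hotel"],
    toy_manage_state,
    toy_generate_response,
)
```

The point of the sketch is that the state persists across turns, so keywords from earlier utterances remain available when later responses are generated.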

Next, the dialogue state management of S403 in FIG. 5 is described in detail with reference to FIG. 6, referring also to FIGS. 1 to 4. FIG. 6 is a flowchart explaining the details of the dialogue state management of S403 in FIG. 5.

The keyword extraction unit 142 performs morphological analysis on the text data of the user utterance input from the user utterance input unit 110 and extracts the keywords contained in this text data (S501). For example, the keyword extraction unit 142 performs morphological analysis on the user utterance "Please tell me about tourist facilities in Yokohama" and extracts the keywords "Yokohama", "sightseeing", "facility", and "teach".

Next, the keyword adding unit 143 adds the keywords contained in the user utterance to the utterance state storage unit 131 (S502). In the example above, the keywords "Yokohama", "sightseeing", "facility", and "teach" are additionally registered in the utterance state storage unit 131.

The document search unit 144 then searches the document corpus storage unit 132 for document data containing all of the keywords in the utterance state storage unit 131 (S503). For example, if the keywords "Yokohama", "sightseeing", "facility", and "teach" are registered in the utterance state storage unit 131, the document search unit 144 searches for document data containing these keywords.

The utterance state update unit 145 outputs the retrieved document data to the utterance state storage unit 131 (S504), thereby updating the utterance state.

The dialogue state estimation unit 146 estimates the dialogue state of the user utterance based on the utterance state stored in the utterance state storage unit 131 and assigns to the user utterance a dialogue action label indicating the estimation result (S505). For example, it assigns "3" as the dialogue action label of the user utterance "Please tell me about sightseeing spots in Yokohama". It then passes the user utterance and the dialogue action label to the response sentence extraction unit 151.

Next, the response sentence generation of S404 in FIG. 5 is described in detail with reference to FIG. 7, referring also to FIGS. 1 to 6. FIG. 7 is a flowchart explaining the details of the response sentence generation of S404 in FIG. 5.

First, on receiving the user utterance and its dialogue action label from the dialogue state estimation unit 146, the response sentence extraction unit 151 uses this information to search the dialogue corpus (dialogue corpus storage unit 133) for the user utterances closest to the received user utterance (S601).

That is, the response sentence extraction unit 151 first searches for user utterances that have the same label as the dialogue action label received from the dialogue state estimation unit 146. It then computes the similarity between the user utterance in each retrieved utterance pair and the user utterance received from the dialogue state estimation unit 146 and selects, for example, the N (a predetermined number) user utterances with the highest similarity. Alternatively, the user utterances whose similarity exceeds a predetermined threshold may be selected.

The similarity here may be based on, for example, a distance measure. As this distance measure, one that takes two character strings as input and returns their similarity (such as the kernel of Vapnic95) can be used.
For example, the utterances "I am looking for a hot spring inn in Nara" and "I am looking for a hot spring inn in Kyoto" are more similar to each other than the utterances "I am looking for a hot spring inn in Nara" and "I am looking for a weekend package tour", so the distance measure takes a smaller value for the former pair than for the latter. Accordingly, when the response sentence extraction unit 151 uses this distance measure as the similarity in S601, it selects the N user utterances with the smallest distance.
The response sentence extraction unit 151 then reads the response sentences for these user utterances from the dialogue corpus storage unit 133.
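The selection in S601 can be sketched as follows. This is a minimal illustration, assuming a toy corpus of dictionaries with hypothetical labels. The distance used here is derived from Python's `difflib.SequenceMatcher`, which is a stand-in chosen for convenience and is not the kernel mentioned in the text.

```python
from difflib import SequenceMatcher


def string_distance(a, b):
    """Stand-in distance: 0.0 for identical strings, up to 1.0."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def select_similar_pairs(utterance, label, corpus, n):
    """Return the n same-label pairs whose user utterance is closest."""
    same_label = [pair for pair in corpus if pair["label"] == label]
    ranked = sorted(same_label,
                    key=lambda pair: string_distance(utterance, pair["user"]))
    return ranked[:n]


# Hypothetical utterance pairs and labels, for illustration only.
corpus = [
    {"label": 3, "user": "I am looking for a hot spring inn in Kyoto",
     "wizard": "What area of Kyoto do you prefer?"},
    {"label": 3, "user": "I am looking for a weekend package tour",
     "wizard": "Where would you like to go?"},
    {"label": 1, "user": "I am looking for a hot spring inn in Nara",
     "wizard": "Yes, there are several."},
]
selected = select_similar_pairs(
    "I am looking for a hot spring inn in Nara", 3, corpus, 1)
```

The pair with label 1 is excluded before any distance is computed, even though its user utterance is identical to the query, because only pairs with the same dialogue action label are considered.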

Next, the correction location determination unit 153 determines the correction locations (P_i1, …, P_im_i) of the response sentences for the user utterances selected in S601 (S602). The determined correction locations are stored in the main memory 20 (see FIG. 1) or the like.

Here, in FIG. 7:
m_i: the number of correction locations in the i-th response sentence
P_ij: the j-th correction location in the i-th response sentence
i: the index of the utterance
j: the index of the correction location
First, the correction location determination unit 153 sets i = 1 and determines the correction locations.

The correction locations are determined as follows. First, the positions of the words in the response sentence whose part of speech is a noun, verb, or adjective are found. Among these, the positions of the words whose number of appearances (appearance frequency) in the dialogue corpus is lower than a predetermined threshold become the correction locations. If a word does not appear in the dialogue corpus, its number of appearances (appearance frequency) in the document corpus (document corpus storage unit 132) is used instead. In other words, the correction location determination unit 153 keeps the words (phrasing) that are frequently used in response sentences and treats infrequently used words as targets for correction (replacement).
For example, when the response sentence "What kind of tourist facility are you looking for?" is extracted from the dialogue corpus and the number of appearances (appearance frequency) of the word "tourist facility" is lower than the predetermined threshold, this word is determined to be a correction location. The frequently used wording "What kind of ... are you looking for?" is thus left intact.
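The frequency-based rule above can be sketched as follows. This is an illustrative sketch only: the function name, the part-of-speech tags, the counts, and the threshold value are all hypothetical, and the response sentence is assumed to be already segmented and tagged.

```python
CONTENT_POS = {"noun", "verb", "adjective"}


def find_correction_positions(tagged_words, dialogue_counts,
                              document_counts, threshold):
    """Return indices of content words that appear fewer than threshold times."""
    positions = []
    for index, (word, pos) in enumerate(tagged_words):
        if pos not in CONTENT_POS:
            continue
        if word in dialogue_counts:
            count = dialogue_counts[word]
        else:
            # Fall back to the document corpus count when the word
            # is not found in the dialogue corpus counts.
            count = document_counts.get(word, 0)
        if count < threshold:
            positions.append(index)
    return positions


# Hypothetical segmentation and counts, for illustration only.
tagged = [("What kind of", "other"),
          ("tourist facility", "noun"),
          ("are you looking for", "verb")]
positions = find_correction_positions(
    tagged,
    dialogue_counts={"tourist facility": 4, "are you looking for": 120},
    document_counts={},
    threshold=10,
)
```

With these numbers only "tourist facility" falls below the threshold, so the frequently used frame of the sentence is kept and a single slot is opened for replacement.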

Next, the replacement candidate determination unit 154 selects the replacement candidates (S_ij^1, …, S_ij^l) for each correction location (P_ij) determined in S602 (S603). Specifically, it uses the part of speech of the word at the correction location determined in S602, whether the word is a named entity (a proper noun or a noun expressing a number), and, if it is a named entity, its type. From the keywords of the utterance information in the utterance state storage unit 131, it selects as replacement candidates the words for which this information is most similar. The replacement candidate words are stored in the main memory 20. First, the correction location determination unit 153 sets j = 1 and a replacement candidate is selected.
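The selection in S603 can be sketched as follows. This is a hedged illustration: the text does not specify how the similarity of part-of-speech and named entity information is computed, so the simple count of matching attributes used here is an assumption, as are the function name and the field names `pos` and `ne_type`.

```python
def select_replacement_candidates(target, state_keywords):
    """Return the state keywords whose annotations best match the target word."""
    def match_score(candidate):
        score = 0
        if candidate["pos"] == target["pos"]:
            score += 1
        if candidate["ne_type"] == target["ne_type"]:
            score += 1
        return score

    best = max((match_score(c) for c in state_keywords), default=0)
    if best == 0:
        return []
    return [c["word"] for c in state_keywords if match_score(c) == best]


# Hypothetical annotations, for illustration only.
target = {"word": "Hakone", "pos": "noun", "ne_type": "place"}
state_keywords = [
    {"word": "Yokohama", "pos": "noun", "ne_type": "place"},
    {"word": "sightseeing", "pos": "noun", "ne_type": None},
    {"word": "teach", "pos": "verb", "ne_type": None},
]
candidates = select_replacement_candidates(target, state_keywords)
```

In this toy case a place name in the extracted response is matched with the place name the user mentioned earlier, which is the intended effect of comparing named entity types.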

Next, when j < m_i (Yes in S604), that is, when replacement candidates have not yet been selected for all correction locations (the 1st through the m_i-th) of the i-th response sentence, the replacement candidate determination unit 154 increments j (S605) and executes S603 again. When j = m_i (No in S604), that is, when the replacement candidate determination unit 154 has selected replacement candidates for all correction locations of the i-th response sentence, processing proceeds to S611. In S611, when i < N (Yes in S611), that is, when among the response sentences for the N user utterances selected in S601 there remains one whose correction locations the correction location determination unit 153 has not yet determined, i is incremented (S612) and S602 is executed. When i = N (No in S611), that is, when the correction locations of all response sentences for the N user utterances selected in S601 have been determined, the response candidate sentence creation unit 155 reads from the main memory 20 the correction locations of each response sentence and the replacement candidate words for those locations, and modifies each response sentence accordingly (S621).
In this way, the response candidate sentence creation unit 155 creates a list of response candidate sentences in which the correction locations of each response sentence have been replaced with the respective replacement candidate words.
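The creation of the candidate list in S621 can be sketched as follows for a single response sentence. This is an illustration under an assumption: the text does not say how candidates at several correction locations are combined, and the sketch simply enumerates every combination. The function name and the sample data are hypothetical.

```python
from itertools import product


def create_response_candidates(words, replacements):
    """Build every sentence obtained by substituting candidates.

    words is the segmented response sentence; replacements maps the index
    of each correction location to its list of replacement candidates.
    """
    positions = sorted(replacements)
    candidates = []
    for choice in product(*(replacements[p] for p in positions)):
        new_words = list(words)
        for position, word in zip(positions, choice):
            new_words[position] = word
        candidates.append(" ".join(new_words))
    return candidates


# Hypothetical segmented sentence and candidates, for illustration only.
words = ["What kind of", "tourist facility", "are you looking for?"]
candidate_sentences = create_response_candidates(
    words, {1: ["hotel", "hot spring"]})
```

If no correction location was found, the mapping is empty and the function returns the original sentence unchanged as the only candidate.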

Next, the score assignment unit 156 assigns a score to each response candidate sentence created by the response candidate sentence creation unit 155 (S622). The score is determined by how many keywords (the keywords registered in the utterance state storage unit 131) each response candidate sentence contains. The response candidate sentence with the highest score is then output as the response sentence (S623). That is, the response sentence selection unit 157 selects the response candidate sentence with the highest score as the response sentence, and the response sentence output unit 160 outputs the selected response sentence. The response sentence output unit 160 thus outputs the response sentence that contains the most keywords from the user utterances so far.
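The scoring and selection of S622 and S623 can be sketched as follows. This is a minimal illustration: the function names are hypothetical, keywords are matched as substrings, and each distinct keyword is counted once, which is one possible reading of "how many keywords are contained".

```python
def keyword_score(sentence, keywords):
    """Count how many distinct registered keywords the sentence contains."""
    return sum(1 for keyword in set(keywords) if keyword in sentence)


def select_response(candidates, keywords):
    """Return the candidate with the highest keyword score (S623)."""
    return max(candidates, key=lambda s: keyword_score(s, keywords))


# Hypothetical candidates and keywords, for illustration only.
candidates = [
    "What kind of hotel are you looking for?",
    "What kind of hotel in Yokohama are you looking for?",
]
response = select_response(candidates, ["Yokohama", "hotel", "sightseeing"])
```

When several candidates share the highest score, `max` returns the first of them. The text does not specify a tie-breaking rule, so this behavior is an artifact of the sketch.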

By the procedure described above, the response sentence generation device 100 generates and outputs a response sentence for the input user utterance.
The response sentence generation device 100 of the present embodiment can thus create and output a response sentence for a user utterance without using a back end equipped with complicated rules as in the prior art.

The response sentence generation device 100 according to the present embodiment can be realized by a response sentence generation program 101 that causes the processing described above to be executed, and this program can be provided stored on a computer-readable storage medium (such as a CD-ROM). The program can also be provided over a network such as the Internet.

FIG. 1 is a block diagram showing the configuration of the response sentence generation device of the present embodiment.
FIG. 2 is a block diagram showing the response sentence generation device of FIG. 1 expanded by function.
FIG. 3 is a diagram illustrating the vector input to the support vector machine of the present embodiment.
FIG. 4 is a block diagram showing the configuration of the response sentence correction unit of FIG. 2.
FIG. 5 is a flowchart showing an overview of the processing procedure of the response sentence generation device of FIG. 2.
FIG. 6 is a flowchart explaining the details of the dialogue state management of S403 in FIG. 5.
FIG. 7 is a flowchart explaining the details of the response sentence generation of S404 in FIG. 5.

Explanation of symbols

10 CPU (Central Processing Unit)
20 Main memory
50 Input/output interface
100 Response sentence generation device
101 Response sentence generation program
110 User utterance input unit
130 Storage unit
131 Utterance state storage unit
132 Document corpus storage unit
133 Dialogue corpus storage unit
140 Dialogue state management unit
141 Dialogue state management module
142 Keyword extraction unit
143 Keyword adding unit
144 Document search unit
145 Utterance state update unit
146 Dialogue state estimation unit
150 Response sentence generation unit
151 Response sentence extraction unit
152 Response sentence correction unit
153 Correction location determination unit
154 Replacement candidate determination unit
155 Response candidate sentence creation unit
156 Score assignment unit
157 Response sentence selection unit
158 Response sentence generation module
160 Response sentence output unit

Claims

A response sentence generation device for generating a response sentence for a user utterance,
A user utterance input unit that accepts input of the user utterance;
A keyword extraction unit that analyzes the input user utterance and extracts a keyword included in the user utterance;
An utterance state storage unit for storing the extracted keyword and document data including the keyword;
A keyword adding unit for adding the extracted keyword to the utterance state storage unit;
A document data retrieval unit that retrieves document data including the extracted keyword from a document corpus storage unit that stores a document corpus that is a set of predetermined document data;
An utterance state update unit that outputs the retrieved document data to the utterance state storage unit and updates the document data in the utterance state storage unit;
A dialogue corpus storage unit that stores a dialogue corpus in which a dialogue state label value, identified by the combination of the type of the user utterance and the type of the response sentence, is assigned to each user utterance in an utterance pair consisting of a user utterance and its response sentence;
A dialogue state estimation unit that determines the label value of the input user utterance based on (1) a feature amount of the user utterance calculated from the keywords stored in the utterance state storage unit and the retrieved document data and (2) the keywords included in the user utterances of the dialogue corpus and the label values of those user utterances;
A response sentence extraction unit that extracts a response sentence to a user utterance having the same label value as the determined label value from the dialogue corpus storage unit;
A correction location determination unit for determining a correction location of each extracted response sentence;
A replacement candidate determination unit that selects a replacement candidate word at each of the determined correction locations from words included in at least one of the document data of the utterance state storage unit and the dialogue corpus;
A response candidate sentence creating unit that creates a response candidate sentence by replacing the selected replacement candidate word at the correction location of each extracted response sentence;
Among the created response candidate sentences, a response sentence selection unit that selects a response candidate sentence that includes the most keywords stored in the utterance state storage unit;
A response sentence output unit for outputting the selected response candidate sentence;
A response sentence generation device comprising:

The response sentence extraction unit
searches the dialogue corpus for user utterances having the same label value as the determined label value and for the responses to those user utterances, calculates the similarity between the input user utterance and each retrieved user utterance, selects a predetermined number of the retrieved user utterances in descending order of similarity, and extracts the response sentences for the selected user utterances from the dialogue corpus. The response sentence generation device according to claim 1.

The feature amount of the user utterance is:
It is described by a combination of a keyword stored in the utterance state storage unit, the number of appearances of the keyword in the utterance state storage unit, and the number of appearances of the keyword in the retrieved document data. The response sentence generation device according to claim 1 or 2.

The correction location determination unit
determines, as the correction location, the position of a word whose appearance frequency in the dialogue corpus is lower than a predetermined threshold among the words included in the extracted response sentence. The response sentence generation device according to any one of the preceding claims.

The response sentence generation device according to claim 1, further comprising a voice processing unit that converts voice data of the user utterance into text data, outputs the text data to the user utterance input unit, and outputs, as voice data, the text data of the response sentence output from the response sentence output unit.

The label value of the dialogue state is
determined according to whether the user utterance and the response sentence are each a question sentence and, if a question sentence, which of the following it is classified as: a YES/NO-type question, a WH-type question, an enumeration-type question that calls for listing a plurality of answers, or another type of question. The response sentence generation device according to any one of claims 1 to 5.

A response sentence generation method for generating a response sentence for a user utterance,
A response sentence generation device, comprising a dialogue corpus storage unit that stores a dialogue corpus in which a dialogue state label value identified by the combination of the type of the user utterance and the type of the response sentence is assigned to each user utterance in an utterance pair consisting of a user utterance and its response sentence, performs the steps of:
Receiving an input of the user utterance;
Analyzing the input user utterance and extracting a keyword included in the user utterance;
Adding the extracted keyword to an utterance state storage unit that stores the extracted keyword and document data including the keyword;
Retrieving document data including the extracted keyword from a document corpus storage unit that stores a document corpus that is a set of predetermined document data;
Outputting the retrieved document data to the utterance state storage unit and updating the document data in the utterance state storage unit;
(1) a feature amount of a user utterance calculated based on a keyword stored in the utterance state storage unit and the retrieved document data;
(2) determining a label value of the input user utterance based on a keyword included in the user utterance of the dialogue corpus and a label value of the user utterance;
Extracting a response sentence to a user utterance having the same label value as the determined label value from the dialogue corpus storage unit;
Determining a correction location of each extracted response sentence;
Selecting a replacement candidate word at each determined correction location from words included in at least one of the document data of the utterance state storage unit and the dialogue corpus;
Creating a response candidate sentence in which the selected replacement candidate word is replaced at the correction location of each extracted response sentence;
Selecting a response candidate sentence including the most keywords stored in the utterance state storage unit from among the created response candidate sentences;
Outputting the selected response candidate sentence;
The response sentence generation method being characterized by performing the above steps.

A response sentence generation program for causing a computer which is the response sentence generation apparatus to execute the response sentence generation method according to claim 7.

A computer-readable storage medium storing the response sentence generating program according to claim 8.