JP2021196798A

JP2021196798A - Interactive system and control method for interactive system

Info

Publication number: JP2021196798A
Application number: JP2020102121A
Authority: JP
Inventors: 利昇三好; Toshinori Miyoshi; 健三黒土; Kenzo Kurotsuchi; 力光井; Tsutomu Mitsui
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2020-06-12
Filing date: 2020-06-12
Publication date: 2021-12-27
Anticipated expiration: 2040-06-12
Also published as: JP7416665B2

Abstract

To provide an interactive system that accurately returns an appropriate response to various input expressions of a user, and to provide a control method therefor. SOLUTION: In an interactive system 1, an interactive device 10 includes: an interactive processing unit 7 that outputs a response sentence to an input sentence; a storage unit that stores question/answer data that associates an assumed input sentence (a sentence anticipated as an input sentence) with a response sentence, a synonym dictionary, a list of discrimination terms, which are terms to be distinguished from each other according to a topic of an interaction, and a similarity degree calculation model that determines a degree of similarity between the input sentence and the assumed input sentence; and a similarity degree calculation model generating unit 6 that generates the similarity degree calculation model. The interactive system generates the similarity degree calculation model such that synonyms recorded in the synonym dictionary have a high degree of similarity and discrimination terms in the list of discrimination terms have a low degree of similarity; obtains a degree of similarity between the input sentence and the assumed input sentence by using the similarity degree calculation model; selects the assumed input sentence based on the obtained degree of similarity; and outputs the response sentence corresponding to the selected assumed input sentence. SELECTED DRAWING: Figure 2

Description

The present invention relates to a dialogue system and a method for controlling the dialogue system.

Patent Document 1 describes a dialogue system configured to enable efficient dialogue with a user. The dialogue system acquires a user utterance that a natural language understanding unit has converted into a predetermined format, updates the current dialogue state based on the utterance in that format, uses a first policy model to determine the current subdomain based on the updated dialogue state, and uses a second policy model associated with the current subdomain to determine an action based on the dialogue state. The dialogue system also reduces the complexity of the dialogue state in dialogues sampled from a database.

Japanese Unexamined Patent Publication No. 2019-191517

In recent years, development has advanced on information processing systems that automatically respond to user input such as voice or text (hereinafter referred to as "dialogue systems"), for uses such as automated handling of inquiries and consultations, interactive search, and interactive device operation. A dialogue system can provide users with services such as always-available and immediate inquiry handling, simple information retrieval, and device operation guidance.

A dialogue system is required, for example, to absorb the synonymy among a user's varied input expressions while still distinguishing expressions that must be distinguished, and to return an appropriate response accurately. The dialogue system described in Patent Document 1 aims to realize efficient dialogue between the dialogue system and the user in a composite dialogue domain, but it does not disclose a mechanism for improving the quality of a dialogue system from this viewpoint.

The present invention has been made in view of this background, and its object is to provide a dialogue system, and a control method for the dialogue system, capable of accurately returning an appropriate response to a user's varied input expressions.

One aspect of the present invention for achieving the above object is a dialogue system configured using an information processing device, comprising: a dialogue processing unit that outputs a response sentence to an input sentence; a storage unit that stores question-answer data associating an assumed input sentence (a sentence anticipated as the input sentence) with the response sentence, a synonym dictionary, a distinguishing term list (a list of distinguishing terms, which are terms that should be distinguished from each other according to the topic of the dialogue), and a similarity calculation model that obtains the similarity between an input sentence and an assumed input sentence; and a similarity calculation model generation unit that generates the similarity calculation model. The similarity calculation model generation unit generates a similarity calculation model that calculates similarity such that synonyms recorded in the synonym dictionary have a high similarity to each other and distinguishing terms in the distinguishing term list have a low similarity to each other. The dialogue processing unit obtains the similarity between the input sentence and the assumed input sentence using the similarity calculation model, selects an assumed input sentence based on the obtained similarity, and outputs the response sentence corresponding to the selected assumed input sentence.

Other problems disclosed in the present application, and their solutions, will become apparent from the section describing the embodiments for carrying out the invention and from the drawings.

According to the present invention, an appropriate response can be returned accurately to a user's varied input expressions.

A diagram showing the schematic configuration of the dialogue system.
A diagram showing an example hardware configuration of an information processing device constituting the dialogue system.
A diagram outlining the basic operation of the dialogue system.
A diagram showing an FAQ Web page from which question-answer data originates.
A diagram outlining question-answer data generated based on the FAQ.
A diagram showing a manual from which question-answer data originates.
A diagram outlining question-answer data generated based on the manual.
A diagram showing a specification from which question-answer data originates.
A diagram outlining question-answer data in scenario table format generated based on the specification.
A diagram explaining a dialogue procedure based on question-answer data in scenario table format.
An example of a synonym dictionary.
An example of a distinguishing term list.
An example of a main term list.
An example of a screen displayed on a user terminal.
An example of a screen displayed on a user terminal.
An example of a screen displayed on a user terminal.
An example of a screen displayed on a user terminal.

Embodiments of the present invention are described below with reference to the drawings. The following description and drawings are examples for explaining the present invention, and parts are omitted or simplified as appropriate to clarify the explanation. The present invention can also be implemented in various other forms. Unless otherwise limited, each component may be singular or plural.

In the configurations of the invention described below, the same reference numerals are used across different drawings for identical parts or parts having similar functions, and duplicate descriptions may be omitted. When there are multiple elements having the same or similar functions, they may be described by attaching different subscripts to the same reference numeral. When the elements do not need to be distinguished, the subscripts may be omitted.

In the following description, various kinds of data may be described using the expression "data", but such data may also be represented in other data structures such as tables or lists. When identification information is described, expressions such as "identifier" and "ID" are used, and these are interchangeable. In the following description, the letter "s" prefixed to a reference numeral denotes a processing step.

FIG. 1 shows the schematic configuration of a dialogue system 1, an information processing system illustrated as one embodiment. As shown in the figure, the dialogue system 1 includes a dialogue device 10 and a user terminal 40 communicably connected to the dialogue device 10 via a communication network 30.

The dialogue device 10 generates a response sentence to a text-format sentence received from the user, such as an inquiry (hereinafter referred to as an "input sentence"), and transmits the response sentence to the user terminal 40. The dialogue between the dialogue device 10 and the user takes place, for example, in chat form via a Web page.

FIG. 2 is a system flow diagram illustrating the main functions of the dialogue device 10. As shown in the figure, the dialogue device 10 includes a dialogue content management unit 5, a similarity calculation model generation unit 6, and a dialogue processing unit 7. The dialogue content management unit 5 includes a question-answer generation unit 51, a distinguishing term list generation unit 52, and a main term list generation unit 53. The dialogue processing unit 7 includes a question-answer generation unit 71 and an input assistance unit 72. The dialogue device 10 also includes a storage unit (not shown). The storage unit stores text data 21, dialogue content 22 (question-answer data 221, a synonym dictionary 222, and a distinguishing term list 223), a similarity calculation model 23, and a main term list 24.

The question-answer generation unit 51 of the dialogue content management unit 5 generates the question-answer data 221 based on the text data 21. The text data 21 is text-format data obtained from, for example, business manuals, business reports, specifications, or Web pages. The question-answer data 221 may also be created (entered) manually by a user.

The question-answer data 221 includes information on the correspondence between question sentences and response sentences, the flow of the dialogue, and so on. Specifically, the question-answer data 221 aggregates combinations of a question from the user and the response to that question, as well as combinations of a response sentence that the user returns to a question from the dialogue device 10 and the question sentence to be sent to the user next.

The dialogue processing unit 7 conducts dialogue with the user by exchanging text data via the user terminal 40. When the question-answer generation unit 71 of the dialogue processing unit 7 receives an input sentence from the user terminal 40, it searches the question-answer data 221 for the response sentence corresponding to the received input sentence. In this search, the question-answer generation unit 71 uses the synonym dictionary 222, the distinguishing term list 223, and the similarity calculation model 23.

The synonym dictionary 222 includes information that associates synonyms with each other. The synonym dictionary 222 may be an existing synonym dictionary, or it may be generated automatically, for example by analyzing information (question sentences, response sentences, etc.) that the dialogue device 10 acquired during dialogues with users.

The distinguishing term list 223 includes, for each subject of conversation (hereinafter referred to as a "topic"), a list of combinations of terms that must be distinguished from each other (hereinafter referred to as "distinguishing terms"). The distinguishing term list 223 is generated, for example, by the distinguishing term list generation unit 52 of the dialogue content management unit 5 using the question-answer data 221, the synonym dictionary 222, the text data 21, and so on. The distinguishing term list 223 may also be created (entered) manually by a user.

The dialogue processing unit 7 uses the distinguishing term list 223 in the above search in order to return an appropriate response accurately to a user's varied input expressions. Even words with similar notation may have to be distinguished depending on the topic of the dialogue. The synonym dictionary 222, however, exists to treat synonymous expressions uniformly, so if the question-answer generation unit 71 searched based only on the synonym dictionary 222, it might be unable to handle expressions that must be distinguished according to the topic. By also using the distinguishing term list 223, the dialogue device 10 of this embodiment absorbs the synonymy of a user's varied input expressions while distinguishing the expressions that require distinction, and returns an appropriate response accurately.
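The combined use of a synonym dictionary and a distinguishing term list can be illustrated with a minimal Python sketch. It is not the method of the embodiment: the word-overlap score, the `SYNONYMS` and `DISTINCT_TERMS` tables, and the English example words are all assumptions made for illustration.

```python
from __future__ import annotations

# Hypothetical synonym dictionary: variant -> representative word.
SYNONYMS = {"fee": "premium", "payout": "benefit"}
# Hypothetical distinguishing terms for the topic "insurance".
DISTINCT_TERMS = {"premium", "benefit"}


def normalize(sentence: str) -> set[str]:
    """Lower-case, split on whitespace, and map synonyms to their representative."""
    return {SYNONYMS.get(word, word) for word in sentence.lower().split()}


def similarity(a: str, b: str) -> float:
    """Word-overlap (Jaccard) score that is forced to 0 when the two
    sentences contain different distinguishing terms."""
    ta, tb = normalize(a), normalize(b)
    da, db = ta & DISTINCT_TERMS, tb & DISTINCT_TERMS
    if da and db and da != db:
        return 0.0  # terms that must be distinguished: treat as dissimilar
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0
```

In this sketch, "when is the fee due" and "when is the premium due" score 1.0 because the synonym is absorbed, while "when is the premium due" and "when is the benefit due" score 0.0 even though they differ by a single word.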

The similarity calculation model 23 is a function or machine learning model that obtains the similarity between an input sentence and an assumed input sentence in the question-answer data 221, where an assumed input sentence is an input sentence anticipated in advance and defined in the question-answer data 221. The similarity calculation model 23 is generated by the similarity calculation model generation unit 6. By using the similarity calculation model 23 during a dialogue with the user, the dialogue processing unit 7 identifies which assumed input sentence in the question-answer data 221 an input sentence corresponds to, even when no assumed input sentence that exactly matches the input sentence is defined in the question-answer data 221.
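The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: `QA_DATA`, `jaccard`, and `respond` are hypothetical names, and the word-overlap score merely stands in for the similarity calculation model 23, whose actual form is not specified here.

```python
from typing import Callable, Dict

# Hypothetical QA pair data: assumed input sentence -> response sentence.
QA_DATA: Dict[str, str] = {
    "how do i apply for service a": "You can apply at a store or on the website.",
    "how do i change my address": "You can change it on the website.",
}


def jaccard(a: str, b: str) -> float:
    """Toy similarity (word-set overlap), a stand-in for the similarity model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def respond(input_sentence: str,
            similarity: Callable[[str, str], float] = jaccard) -> str:
    """Pick the assumed input sentence most similar to the input sentence
    and return the response sentence associated with it."""
    best = max(QA_DATA, key=lambda assumed: similarity(input_sentence, assumed))
    return QA_DATA[best]
```

Passing the similarity function as a parameter keeps the lookup independent of how the model is built, so a learned model could be swapped in without changing `respond`.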

The main term list generation unit 53 of the dialogue content management unit 5 generates the main term list 24. The main term list 24 is referenced by the input assistance unit 72, a function of the dialogue processing unit 7 of the dialogue device 10 that helps the user enter an appropriate input sentence. The main term list 24 may also be created (entered) manually by a user, for example.

FIG. 3 shows an example of the hardware configuration of an information processing device 100 that constitutes the dialogue device 10 or the user terminal 40. As shown in the figure, the information processing device 100 includes a processor 101, a main storage device 102, a communication device 103, an input device 104, an output device 105, and an auxiliary storage device 106.

The processor 101 is configured using, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), an AI (Artificial Intelligence) chip, an FPGA (Field Programmable Gate Array), an SoC (System on Chip), or an ASIC (Application Specific Integrated Circuit).

The main storage device 102 is a device that stores programs and data, for example a ROM (Read Only Memory), a RAM (Random Access Memory), or a non-volatile memory (NVRAM (Non Volatile RAM)).

The communication device 103 is a device that communicates with other information processing devices, such as user terminals, via a communication network, a communication cable, or the like. It is a wireless or wired communication module (a wireless communication module, a communication network adapter, a USB module, etc.).

The input device 104 and the output device 105 constitute the user interface of the dialogue device 10. The input device 104 is a user interface that accepts user input and data input from outside, for example a keyboard, a mouse, a touch panel, a card reader, or a voice input device (for example, a microphone). The output device 105 is a user interface that outputs various information to the user, for example a display device that displays information (a liquid crystal display, an organic EL panel, etc.), an audio output device that outputs information as sound (for example, a speaker), or a printer that prints on paper.

The auxiliary storage device 106 is a device that stores programs and data, for example an SSD (Solid State Drive), a hard disk drive, an optical storage medium (a CD (Compact Disc), a DVD (Digital Versatile Disc), etc.), an IC card, or an SD card. The auxiliary storage device 106 stores the programs and data for realizing the functions of the dialogue device 10. Programs and data can be written to and read from the auxiliary storage device 106 via a recording medium reader or the communication device 103. Programs and data stored in the auxiliary storage device 106 are read into the main storage device 102 as needed. Each function of the dialogue device 10 is realized by the processor 101 reading and executing the programs stored in the main storage device 102.

All or part of the functions of the dialogue device 10 may be realized by other arithmetic devices, for example hardware such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).

The information processing device 100 is, for example, a personal computer (desktop or notebook), a smartphone, a tablet, or a general-purpose machine. All or part of the information processing device 100 may be realized using virtual information processing resources, such as a cloud server provided by a cloud system.

Next, the function by which the question-answer generation unit 51 of the dialogue content management unit 5 generates the question-answer data 221 is described in detail.

FIGS. 4A and 4B illustrate a case in which the question-answer generation unit 51 generates the question-answer data 221 based on text data acquired from a Web page on which FAQs (Frequently Asked Questions) are posted.

FIG. 4A is an example of the text data 21 from which the question-answer data 221 is generated, namely a Web page on which FAQs are posted. As shown in the figure, on the illustrated Web page the individual question sentences 301 and response sentences 302 of the FAQ are listed under titles such as "How to apply for Service A" and "How to change your address".

FIG. 4B is an example of the question-answer data 221 generated based on the text data 21 shown in FIG. 4A. As shown in the figure, the illustrated question-answer data 221 is text consisting of pairs of a question sentence and a response sentence (hereinafter referred to as "QA pair data"). The dialogue content management unit 5 extracts combinations of a question sentence and its corresponding response sentence from the Web page and generates QA pair data based on the extracted combinations. In this example, the FAQ questions serve as the assumed input sentences.

FIGS. 5A and 5B illustrate a case in which the question-answer generation unit 51 generates the question-answer data 221 from text data 21 consisting of a manual, such as a business manual, written according to a fixed structure. Based on the structural features of the chapters, sections, and writing style in the manual text data 21, the dialogue content management unit 5 generates QA pair data and the question-answer data 221 that aggregates the QA pair data.

FIG. 5A is an example of the text data 21 from which the question-answer data 221 is generated, namely data extracted from part of a manual. In the illustrated manual, Chapter 5 is titled "About Service A", and Section 1 of Chapter 5 (Section 5.1) describes "How to apply for Service A". In this example, the dialogue content management unit 5 joins "Service A" (サービスＡ) from the Chapter 5 title and the Section 5.1 title "application method" (申込方法) with the Japanese particle "の" ("of") to generate the question "What is the application method for Service A?" (サービスＡの申込方法は？), and extracts the content of Section 5.1 as the response to that question. The illustrated manual also has a chapter "6. Change of basic information", which contains items such as "6.1 Address" for each type of basic information. The dialogue content management unit 5 replaces "basic information" in "6. Change of basic information" with the section title (for example, "address") to generate the question "How do I change my address?" (住所の変更方法は？), and extracts the content of Section 6.1 as the response to that question.

FIG. 5B is an example of the question-answer data 221 generated from the text data 21 shown in FIG. 5A. As shown in the figure, the dialogue content management unit 5 generates QA pair data with "What is the application method for Service A?" as the question sentence and "You can complete the procedure at a store or on the Web site." as the response sentence. The dialogue content management unit 5 also generates QA pair data that pairs the question sentence "How do I change my address?" with the response sentence "You can complete the procedure on the Web site."
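The two heading-based rules described for the manual (joining a chapter topic with a section title, and substituting a section title into a chapter title) can be sketched as below. This is a rough illustration, not the embodiment's implementation: the function names are hypothetical, and the English question templates are assumed stand-ins for the Japanese patterns built with "の" and "は？".

```python
from typing import Dict


def join_titles(chapter_topic: str, section_title: str) -> str:
    """Rule 1: combine the chapter topic and the section title into a question.
    The English template is an assumption standing in for '<topic>の<title>は？'."""
    return f"What is the {section_title} for {chapter_topic}?"


def substitute_title(chapter_title: str, placeholder: str, section_title: str) -> str:
    """Rule 2: replace the generic phrase in the chapter title with the
    section title, then wrap the result in an assumed question template."""
    return f"How do I make a {chapter_title.replace(placeholder, section_title)}?"


def make_qa_pair(question: str, section_body: str) -> Dict[str, str]:
    """Pair a generated question with the section body as its response."""
    return {"question": question, "response": section_body}


# Example data modeled on the manual described above.
pairs = [
    make_qa_pair(join_titles("Service A", "application method"),
                 "You can complete the procedure at a store or on the Web site."),
    make_qa_pair(substitute_title("change of basic information",
                                  "basic information", "address"),
                 "You can complete the procedure on the Web site."),
]
```

The resulting `pairs` list plays the role of QA pair data; a real generator would also need to detect chapter and section boundaries in the manual text, which this sketch leaves out.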

FIGS. 6A and 6B are examples of the text data 21 and the question-answer data 221 in a case where the dialogue device 10 provides a service that, through dialogue with the user, presents information (model number, product name, etc.) identifying a product that meets the user's wishes. For example, when a user intends to purchase a certain product, the dialogue device 10 presents a product matching the user's wishes through dialogue with the user. FIGS. 6A and 6B illustrate a case in which the user intends to purchase a BTO (Build To Order) personal computer (hereinafter referred to as a "PC"). The question-answer generation unit 51 generates the question-answer data 221 based on the text data 21 of a specification table that lists the specifications of the components of the PC products.

FIG. 6A is an example of the text data 21, namely a specification table that lists the specifications of the components of the PC products. As shown in the figure, the specification table includes, for each product, information that associates the product name, price, storage device, ports, and so on.

FIG. 6B is an example of the question-answer data 221 that the question-answer generation unit 51 generates based on the specification table text data 21 shown in FIG. 6A. As shown in the figure, the illustrated question-answer data 221 includes information that associates an entry (record) identifier (ID) with a question to ask the user, the response expected from the user to that question (the assumed input sentence), and the question to ask the user next.

The question-answer data 221 shown in FIG. 6B includes information for determining the next question according to the user's response sentence (assumed input sentence) to a question asked by the dialogue processing unit 7 of the dialogue device 10. In other words, this question-answer data 221 is information that summarizes, in table form, the scenario by which the dialogue processing unit 7 advances the dialogue with the user (hereinafter referred to as a "scenario table").

FIG. 6C is a diagram explaining the procedure of the dialogue (chat) that the dialogue processing unit 7 conducts with the user. The dialogue processing unit 7 first sends the user terminal 40 the first question Q1 in the scenario table shown in FIG. 6B (for example, "Which storage device would you like?"). When the user returns a response sentence to question Q1 (for example, "HDD" or "hard disk drive"; here it is assumed, as an example, to be the assumed input sentence A1 or a response sentence close to A1), the dialogue processing unit 7 searches the scenario table and acquires the next question Q2 associated with question Q1 and assumed input sentence A1. If question Q2 is, for example, the question "Which USB port standard would you like?" concerning the attribute "USB", the dialogue processing unit 7 sends question Q2 to the user terminal 40, acquires the next question from the scenario table according to the user's response to question Q2, and sends the acquired question to the user terminal 40. In this way, the dialogue processing unit 7 ultimately identifies a single product and sends its product name to the user terminal 40.
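The scenario-table procedure above amounts to walking a small decision structure. The following Python sketch shows one way to represent and traverse it. The table contents, the product names `PC-A` to `PC-D`, and the exact-match lookup are assumptions for illustration; the embodiment would match the user's response to an assumed input sentence by similarity rather than by exact string comparison.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical scenario table: question ID -> question text and, for each
# assumed input sentence, either the next question ID or a final product name.
SCENARIO: Dict[str, dict] = {
    "Q1": {"question": "Which storage device would you like?",
           "next": {"hdd": "Q2a", "ssd": "Q2b"}},
    "Q2a": {"question": "Which USB port standard would you like?",
            "next": {"usb2.0": "PC-A", "usb3.0": "PC-B"}},
    "Q2b": {"question": "Which USB port standard would you like?",
            "next": {"usb2.0": "PC-C", "usb3.0": "PC-D"}},
}


def run_dialogue(answers: List[str]) -> Tuple[Optional[str], List[str]]:
    """Ask questions starting from Q1, follow each answer to the next entry,
    and return (product name or None, list of questions asked)."""
    node = "Q1"
    asked: List[str] = []
    for answer in answers:
        entry = SCENARIO[node]
        asked.append(entry["question"])
        node = entry["next"][answer.strip().lower()]  # exact match (assumption)
        if node not in SCENARIO:
            return node, asked  # reached a product name
    return None, asked  # ran out of answers before identifying a product
```

For example, `run_dialogue(["HDD", "USB3.0"])` asks the storage question and then the USB question, and ends at the hypothetical product `PC-B`.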

FIG. 7 shows an example of the synonym dictionary 222. As shown in the figure, the synonym dictionary 222 contains information that associates expressions with similar meanings, such as synonyms and near-synonyms.

FIG. 8 shows an example of the distinctive term list 223. The illustrated list is for the topic "insurance business". The distinctive term list 223 enumerates terms that must be distinguished according to the topic of the dialogue even when their notation is similar, treating each as an independent distinctive term. For example, "insurance premium" (保険料) and "insurance payout" (保険金) are similar in notation and are related terms, but in the insurance business the two must be clearly distinguished. In the illustrated list, "insurance premium" and "insurance payout" are registered as separate distinctive terms (as data in different entries (records)). When a distinctive term has synonyms, one of them is taken as the representative word and the other synonyms are registered in association with it (as data in the same entry (record)).

The distinctive term list 223 can take various forms. For example, when the topic is "inquiries about products and services", product or service names may be expressed similarly, as with "Plan A" and "Plan B"; in that case "Plan A" and "Plan B" are each registered in the distinctive term list 223 as distinctive terms. Likewise, when the topic concerns the contents of the specification table shown in FIG. 5A, "USB2.0" and "USB3.0", for example, are registered as distinctive terms.

The distinctive term list 223 is generated, for example, by the distinctive term list generation unit 52 of the dialogue content management unit 5, which extracts distinctive terms from the question answering data 221 and the text data 21 and aggregates the extracted terms.

The process by which the distinctive term list generation unit 52 generates the distinctive term list 223 is described below with reference to FIG. 2.

When the question answering data 221 is in the form of question sentence and response sentence pairs, as illustrated in FIG. 4B and FIG. 5B, the distinctive term list generation unit 52 extracts, for example, the nouns and compound nouns contained in the question sentences. Terms that require distinction are often nouns or compound nouns, such as "Plan A" and "Plan B" or "insurance premium" and "insurance payout" in an inquiry about an insurance application, and they must be understood as distinct. Distinctive terms are not limited to nouns and compound nouns and may be other parts of speech. The parts of speech treated as distinctive terms can be set according to, for example, the topic of the dialogue.

The distinctive term list generation unit 52 also extracts distinctive terms from the text data 21 by performing, for example, morphological analysis or syntactic analysis. It then takes each extracted distinctive term as a representative word and associates with it the synonyms found in the synonym dictionary 222. For example, suppose that "insurance payout" (保険金), "insurance premium" (保険料), and "passport" (パスポート) have been extracted as distinctive terms, and that the synonym dictionary 222 registers "premium payment" (掛け金) as a synonymous expression for "insurance premium" and "travel document" (旅券) as a synonymous expression for "passport". The distinctive term list generation unit 52 then registers "insurance payout" and "insurance premium" as separate distinctive terms in different fields of the distinctive term list 223, registers "premium payment" in the same field as "insurance premium" as its synonym, and registers "travel document" in the same field as "passport" as its synonym.
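The registration step above can be sketched as follows, assuming the distinctive terms have already been extracted by a morphological analyzer. The entry layout (a dictionary with `representative` and `synonyms` keys), the function name, and the English glosses of the example terms are assumptions made for illustration, not the layout of FIG. 8.

```python
def build_distinctive_term_list(extracted_terms, synonym_dictionary):
    """Return one entry per extracted term: representative word plus its synonyms."""
    return [
        {"representative": term, "synonyms": list(synonym_dictionary.get(term, []))}
        for term in extracted_terms
    ]


# Toy data mirroring the example in the text (English glosses are assumptions).
synonym_dictionary = {
    "insurance premium": ["premium payment"],
    "passport": ["travel document"],
}
extracted = ["insurance payout", "insurance premium", "passport"]

for entry in build_distinctive_term_list(extracted, synonym_dictionary):
    print(entry)
```

Each extracted term gets its own entry, so "insurance payout" and "insurance premium" stay separate while "premium payment" shares an entry with "insurance premium".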

When distinctive terms are extracted from text data 21 such as a specification table, the attribute values in the table must be distinguished from one another, so the distinctive term list generation unit 52 extracts these values as distinctive terms. When distinctive terms are extracted from question answering data 221 in the form of a scenario table, the distinctive term list generation unit 52 extracts nouns, compound nouns, and the like as distinctive terms from the question sentences, the response sentences (assumed input sentences), and the next questions.

The distinctive term list generation unit 52 may accept edits to the contents of the distinctive term list 223 from a user. The distinctive term list 223 may also be created manually by a person referring to the question answering data 221 and the text data 21.

If distinctive terms are extracted from the question answering data 221 alone, the number of terms becomes large and terms that do not need to be distinguished (noise) may be extracted. Therefore, for example, among the distinctive terms extracted from the question answering data 221, those that do not appear at predetermined locations in the text data 21 may be filtered out (excluded), so that only terms appearing at those locations are extracted as distinctive terms. In the text data 21 of FIG. 4A, for example, the predetermined locations are those where category titles such as "How to apply for Service A" and "How to change your address" are written. In the text data 21 of FIG. 5A, the predetermined locations are those where chapter and section headings such as "Service A", "Application method", and "Cancellation method" are written. Because categories and chapter or section headings are organized by the products and inquiry contents that need to be distinguished, the terms appearing in them may, for example, be extracted as the representative words of the distinctive terms.
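The heading-based noise filter can be sketched as a substring test against the heading strings. This is a simplification under stated assumptions: the candidate terms and headings below are made-up English examples, and a real system would compare morphemes rather than raw substrings.

```python
def filter_by_headings(candidate_terms, headings):
    """Keep only the candidates that appear in at least one heading string."""
    return [term for term in candidate_terms if any(term in h for h in headings)]


# Illustrative data (assumed): headings from the text data, candidates from the Q&A data.
headings = ["How to apply for Service A", "How to change your address"]
candidates = ["Service A", "address", "weekday", "counter"]

print(filter_by_headings(candidates, headings))
```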

The similarity calculation model 23 is generated by the similarity calculation model generation unit 6 on the basis of the synonym dictionary 222 and the distinctive term list 223. With the input sentence denoted X and the assumed input sentence denoted Y, the model can be expressed as a function f(X, Y). As the similarity, for example, the edit distance between X and Y, or the BLEU score of X against Y when X is regarded as a translation and Y as the correct (reference) sentence, can be adopted. Similarity measures other than edit distance and BLEU may also be used.

Based on the synonym dictionary 222, the similarity calculation model 23 treats words with different notation (surface forms) as the same word. It is therefore preferable to unify synonyms into a single term in advance using the synonym dictionary 222 and the distinctive term list 223 (for example, unifying 値段, 料金, and 代金, which all denote a price or charge, into the single term 価格 ("price")).
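One simple way to perform this unification is to map every surface form to a representative term before similarity is computed. The sketch below assumes each synonym group is a list whose first element is the representative; the English terms and the function names are illustrative assumptions.

```python
def build_canonical_map(synonym_groups):
    """Map each term to the first (representative) term of its synonym group."""
    canonical = {}
    for group in synonym_groups:
        representative = group[0]
        for term in group:
            canonical[term] = representative
    return canonical


def normalize(tokens, canonical):
    """Replace each token by its representative; unknown tokens pass through unchanged."""
    return [canonical.get(token, token) for token in tokens]


groups = [["price", "fee", "charge", "cost"]]  # assumed English stand-ins
canonical = build_canonical_map(groups)
print(normalize(["monthly", "fee"], canonical))
```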

The process by which the similarity calculation model generation unit 6 generates the similarity calculation model 23 is described below with reference to FIG. 2. The similarity calculation model generation unit 6 generates a similarity calculation model 23 in which terms registered in the same field of the synonym dictionary 222 have high similarity to one another and distinctive terms registered in the distinctive term list 223 have low similarity to one another. Two examples of the similarity calculation model 23, a first model f1(X, Y) and a second model f2(X, Y), are described below.

In the calculation method of the first model f1(X, Y), the input sentence X and the assumed input sentence Y are first decomposed into terms such as words and phrases by morphological analysis or syntactic analysis. The resulting sets of terms are denoted S1 = {x1, x2, ..., xn} and S2 = {y1, y2, ..., ym}, respectively, and the similarity between S1 and S2 is calculated. Particular parts of speech, such as particles, and particular terms may be defined as stop words and excluded from the sets S1 and S2.

The similarity between S1 and S2 can be calculated as a similarity between sets by a method such as the Jaccard coefficient or the Dice coefficient. Alternatively, the similarity s(xi, yj) between words xi and yj can be defined by the closeness of their notation (for example, the negative of the edit distance), and the similarity between sets S1 and S2 calculated by WMD (Word Mover's Distance) or the like. A method based on the degree of word overlap between S1 and S2 can also be used, as can a method using an edit distance between X and Y such as the Levenshtein distance. The similarity between S1 and S2 may be calculated by still other methods. In this way, the first model f1(X, Y) calculates similarity based on the notation of X and Y. A distance is converted into a similarity by, for example, taking its negative.
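The notation-based measures named above can be sketched in plain Python. The Jaccard and Dice coefficients and the Levenshtein distance follow their standard definitions. The function `edit_similarity` is an assumption of this sketch: it rescales the distance into the range 0 to 1 instead of taking the negative as the text suggests, which is just one possible conversion.

```python
def jaccard(s1, s2):
    """Jaccard coefficient |S1 & S2| / |S1 | S2| of two term collections."""
    a, b = set(s1), set(s2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dice(s1, s2):
    """Dice coefficient 2|S1 & S2| / (|S1| + |S2|) of two term collections."""
    a, b = set(s1), set(s2)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def levenshtein(x, y):
    """Edit distance between strings x and y (insert, delete, substitute cost 1)."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, 1):
        cur = [i]
        for j, cy in enumerate(y, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cx != cy)))
        prev = cur
    return prev[-1]


def edit_similarity(x, y):
    """Assumed conversion: 1 - distance / longer length, so identical strings give 1.0."""
    longest = max(len(x), len(y))
    return 1.0 if longest == 0 else 1.0 - levenshtein(x, y) / longest


print(jaccard(["apply", "where"], ["apply", "where", "Service A"]))
print(levenshtein("kitten", "sitting"))
```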

Because the calculation method based on the first model f1(X, Y) computes similarity from notation, the calculation process is transparent and highly explainable. Consequently, when an administrator or user of the dialogue device 10 wants to know, for example, why the device gave a particular reply in order to improve its response accuracy, the similarity calculation can be traced. In addition, because the method is notation-based, it has the advantage that the assumed input sentence can be estimated correctly when the notations are similar.

However, because the calculation method based on the first model f1(X, Y) measures closeness by notation rather than by word meaning, expressions with similar meanings may be treated as different words. For example, 利用する ("to make use of") and 使う ("to use") share no characters, so their similarity is low.

The similarity calculation method of the second model f2(X, Y) therefore uses an inter-word similarity s(w1, w2) between words w1 and w2. The inter-word similarity s(w1, w2) is the similarity between individual words; it can be calculated, for example, as the cosine similarity of the distributed representations (word embeddings) of w1 and w2. Another method uses the conceptual distance between the words w1 and w2.
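As a small illustration of the cosine-similarity option, the sketch below compares toy two-dimensional vectors. The vectors are invented for the example and do not come from any trained embedding model; in practice the distributed representations would be supplied by such a model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors (assumed values) for three words.
vectors = {
    "use": [0.9, 0.1],
    "utilize": [0.8, 0.2],
    "cancel": [-0.7, 0.6],
}

print(cosine(vectors["use"], vectors["utilize"]))
print(cosine(vectors["use"], vectors["cancel"]))
```

With these toy values, "use" and "utilize" score close to 1 even though they share few characters, which is the property the second model relies on.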

In the calculation method using the inter-word similarity s(w1, w2), for example, the length of the path between the two words in WordNet can be used. Using these inter-word similarities, the distance between the word sets S1 and S2 is calculated with, for example, WMD. Because the second model f2(X, Y) uses the similarity of distributed representations or conceptual distance rather than notation, words with similar meanings receive a high similarity even when their notations differ. However, because the inter-word similarity s(w1, w2) is not calculated from notation, explainability may be lower.

In the method described above, the second model f2(X, Y) calculates the inter-word similarity s(w1, w2) from the similarity of distributed representations or from conceptual distance rather than from notation; the similarity values may additionally be corrected using the synonym dictionary 222 and the distinctive term list 223. An example of such a correction, for the case where similarity is normalized to a real number between 0 and 1, is given below.

First, for terms w1 and w2 listed as synonymous expressions in the synonym dictionary 222, s(w1, w2) = c1 is set, where c1 is a predetermined large value in the range 0 to 1. Alternatively, w1 and w2 may be regarded as the same term, with c1 = 1. For two distinctive terms (w1, w2) listed in the distinctive term list 223, s(w1, w2) = c2 is set, where c2 is a predetermined small value in the range 0 to 1; for example, c2 = 0. This correction lowers the similarity between terms listed in the distinctive term list 223.
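The correction can be sketched as a wrapper around any base inter-word similarity. Several details are assumptions of this sketch rather than statements from the text: synonym groups and distinctive-term entries are modeled as sets, two words in the same distinctive-term entry are treated as synonyms, and the synonym rule is checked before the distinctive-term rule.

```python
def find_group(word, groups):
    """Return the index of the group containing `word`, or None."""
    for index, group in enumerate(groups):
        if word in group:
            return index
    return None


def corrected_similarity(w1, w2, base, synonym_groups, distinct_entries, c1=1.0, c2=0.0):
    """Override base(w1, w2) with c1 for synonyms and c2 for distinctive terms."""
    if w1 == w2:
        return 1.0
    s1, s2 = find_group(w1, synonym_groups), find_group(w2, synonym_groups)
    if s1 is not None and s1 == s2:
        return c1  # listed as synonymous expressions
    d1, d2 = find_group(w1, distinct_entries), find_group(w2, distinct_entries)
    if d1 is not None and d2 is not None:
        # Same entry means synonyms of one distinctive term; different entries must be kept apart.
        return c1 if d1 == d2 else c2
    return base(w1, w2)


# Illustrative data (assumed).
synonym_groups = [{"price", "fee"}]
distinct_entries = [{"insurance premium", "premium payment"}, {"insurance payout"}]
base = lambda a, b: 0.5  # stand-in for an embedding or WordNet similarity

print(corrected_similarity("insurance premium", "insurance payout",
                           base, synonym_groups, distinct_entries))
```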

In particular, when WordNet concept-based similarity is used, the second model f2(X, Y) may yield the same similarity value for several assumed input sentences Y. In such a case, a similarity calculation model f(X, Y) that combines f1 and f2 may be used:

f(X, Y) = C × f1(X, Y) + f2(X, Y)

The synthesis coefficient C is set to, for example, C = 0.1. When f1(X, Y) is sufficiently large, the notations are very similar and the value of f1(X, Y) is reliable. A threshold H may therefore be set in advance and, as in the following equations, the value of f1 adopted as the similarity when f1 exceeds H:

f(X, Y) = f1(X, Y)                      if f1(X, Y) > H   ... (Equation 1)
f(X, Y) = C × f1(X, Y) + f2(X, Y)       if f1(X, Y) ≤ H   ... (Equation 2)
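Equations 1 and 2 translate directly into a small function. The default C = 0.1 follows the example in the text; the default threshold H = 0.8 is an assumed value chosen only for illustration, since the text does not give one.

```python
def combined_similarity(f1_value, f2_value, C=0.1, H=0.8):
    """Return f1 when it exceeds H (Equation 1), else C * f1 + f2 (Equation 2)."""
    if f1_value > H:
        return f1_value
    return C * f1_value + f2_value


print(combined_similarity(0.9, 0.2))  # notation is close, so f1 is used as is
print(combined_similarity(0.5, 0.7))  # falls back to the weighted combination
```

Note that Equation 2 is not bounded by 1 when f1 and f2 are both normalized to the range 0 to 1, so only the ranking of candidates, not the absolute value, should be relied on in this sketch.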

With this calculation method, f1 is adopted when the notations are similar, which makes the result more explainable. When f1 is low because the notations differ but the words are similar in meaning, a high similarity can still be obtained by using the combined similarity of f2 and f1, in which f2 dominates.

The dialogue processing unit 7 uses the similarity calculation model 23 generated as described above to calculate the similarity between the input sentence and each assumed input sentence, and acquires the closest assumed input sentence. For example, in a dialogue with the user based on the scenario table shown in FIG. 6B, if the dialogue device 10 asks "Which storage device would you like?" and the input sentence is "Hard Disk Drive", the unit calculates which of the values of the "storage device" attribute (HDD, SSD) the input is closest to.

As a result, a highly explainable similarity is obtained when the notations are similar, and a high similarity is obtained when the meanings are similar even if the notations differ. In addition, a low similarity can be calculated for terms that need to be distinguished. Response accuracy can thereby be improved. By using the similarity calculation model 23 described above, the dialogue device 10 can identify the assumed input sentence with the highest similarity to the input sentence and return an appropriate response sentence.

Because input sentences can be expressed in many ways, an input sentence may lack the information the dialogue device 10 needs to return an appropriate response sentence, or may mix up a term that should be entered as a distinctive term with another term; both lower the response accuracy of the dialogue device 10. The input assisting unit 72 of the dialogue processing unit 7 shown in FIG. 2 helps the user enter an appropriate input sentence and thereby improves the response accuracy of the dialogue device 10.

While the user is entering an input sentence, the input assisting unit 72 displays terms that are used in the assumed input sentences. The dialogue device 10 manages the terms contained in the assumed input sentences as the main term list 24.

As shown in FIG. 2, the main term list generation unit 53 of the dialogue device 10 generates the main term list 24 on the basis of the question answering data 221. During dialogue processing with the user via the user terminal 40, the question response generation unit 71 of the dialogue processing unit 7 estimates the assumed input sentence, and the input assisting unit 72 presents to the user terminal 40 the main terms from the main term list 24 that correspond to the estimated assumed input sentence.

FIG. 9 shows an example of the main term list 24. The main term list 24 is generated, for example, by the main term list generation unit 53, which uses morphological analysis to extract particular parts of speech (nouns, verbs, and so on) contained in each assumed input sentence of the question answering data 221 as main terms. The illustrated main term list 24 was generated from the question answering data 221 shown in FIG. 4A and FIG. 5A.

FIGS. 10A to 10D show examples of the screens that the input assisting unit 72 displays while the user is entering an input sentence.

For example, as shown in FIG. 10A, when the user enters "Where can I apply?" in the user input field 41 displayed on the user terminal 40, the dialogue processing unit 7 calculates the similarity f(X, Y) between the input sentence X and each assumed input sentence Y and selects the top p assumed input sentences by similarity (p is determined in advance; for example, p = 10).

Here, let M be the highest similarity value (M = max f(X, Y)), and suppose that assumed input sentences Y containing "Service A" and "Service B" have been selected. For a main term w in the main term list 24 that is contained in the selected assumed input sentences Y (here, one of "Service A", "Service B", and "application"), let X' be the input sentence X with w added. If M' = max f(X', Y) exceeds M by more than a predetermined constant K, that is, if M' - M > K, then w is selected as a missing information complement candidate 42. In other words, a main term w is selected as a missing information complement candidate 42 when adding it to the input sentence X raises the similarity between the input sentence and the assumed input sentences Y.
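The selection rule M' - M > K can be sketched as follows. This is a simplified illustration: sentences are modeled as lists of already-tokenized terms, the Jaccard coefficient stands in for f(X, Y), the value K = 0.2 is assumed, and the main terms are passed in directly instead of being looked up from the top p assumed input sentences.

```python
def jaccard(s1, s2):
    """Jaccard coefficient of two term lists, used here as a stand-in for f(X, Y)."""
    a, b = set(s1), set(s2)
    return len(a & b) / len(a | b) if (a or b) else 1.0


def completion_candidates(x_terms, assumed_inputs, main_terms, similarity, K):
    """Return main terms w such that adding w to X raises max f by more than K."""
    best = max(similarity(x_terms, y) for y in assumed_inputs)  # M
    candidates = []
    for w in main_terms:
        if w in x_terms:
            continue  # already present in the input
        best_with_w = max(similarity(x_terms + [w], y) for y in assumed_inputs)  # M'
        if best_with_w - best > K:
            candidates.append(w)
    return candidates


# Illustrative data (assumed tokenization and assumed input sentences).
x = ["apply", "where"]
assumed = [
    ["Service A", "apply", "where"],
    ["Service B", "apply", "where"],
    ["Service A", "cancel", "how"],
]
print(completion_candidates(x, assumed, ["Service A", "Service B", "apply"], jaccard, K=0.2))
```

Here M is 2/3 and adding either service name raises the best score to 1.0, so both names are offered, while "apply" is skipped because the input already contains it.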

FIG. 10A shows the case where "Service A" and "Service B" have been selected as missing information complement candidates 42. The dialogue processing unit 7 displays the selected candidates "Service A" and "Service B" on the user terminal 40 so that the user can choose between them. When the user selects, for example, "Service A" in FIG. 10A, the input assisting unit 72 displays the input sentence X' completed with the selected term, "Where can I apply for Service A?", as shown in FIG. 10B.

A user may also enter a wrong term, for example because the user does not know the term or confuses it with another. As shown in FIG. 10C, when the input sentence X is "Where can I apply for Service A?", the user may be confusing "Service A" with "Service B" or "Service C". In this case, the dialogue processing unit 7 displays "Service B" and "Service C" on the user terminal 40 as selectable alternative candidates 43 at the display position of "Service A" in the input sentence X. This makes the user aware that the terms may have been confused and reduces erroneous input in the input sentence X. This mechanism can be realized as follows: because the user is asking where "Service A" can be applied for and "Service A" is in the main term list 24, "Service A" is taken as w1 and each term in the main term list 24 as w2, the inter-word similarity s(w1, w2) is calculated, and terms w2 similar to w1 are selected as the terms to present. A predetermined number of terms w2 may be selected in descending order of similarity to w1, or terms w2 whose similarity is at or above a predetermined threshold may be selected. If necessary, the user can correct the input by selecting one of the presented terms w2 and replacing w1 with it.
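Selecting alternative candidates by inter-word similarity can be sketched as a ranking over the main term list. As a stand-in for s(w1, w2), this sketch uses the ratio from Python's standard `difflib.SequenceMatcher`, which is a notation-based score and not one of the measures named in the text; the threshold, the term list, and the function names are likewise assumptions.

```python
from difflib import SequenceMatcher


def notation_similarity(w1, w2):
    """Stand-in for s(w1, w2): character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, w1, w2).ratio()


def alternative_terms(w1, main_terms, s, top_n=None, threshold=None):
    """Rank main terms other than w1 by similarity; filter by threshold and/or count."""
    scored = [(s(w1, w2), w2) for w2 in main_terms if w2 != w1]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # high score first, ties by name
    if threshold is not None:
        scored = [pair for pair in scored if pair[0] >= threshold]
    if top_n is not None:
        scored = scored[:top_n]
    return [w2 for _, w2 in scored]


main_terms = ["Service A", "Service B", "Service C", "application"]  # assumed list
print(alternative_terms("Service A", main_terms, notation_similarity, threshold=0.8))
```

Both selection modes from the text are supported: `top_n` keeps a fixed number of the most similar terms, and `threshold` keeps every term at or above a similarity value.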

If the user can tell, while entering an input sentence, which terms are useful for the similarity calculation of the dialogue device 10 and which are not, the user can enter the input sentence efficiently. For example, let S1 be the word set of the input sentence and S3 a reference word set. The reference word set S3 may be the set of words appearing in all assumed input sentences, or the set of terms listed in the distinctive term list 223. For each word w1 in S1, the dialogue device 10 calculates g = max{s(w1, w2)}, where the maximum is taken over all words w2 in S3. Then, as shown in FIG. 10D, in the input field for the input sentence X (hereinafter referred to as the user input field 41), terms w1 whose value of g is at or above a predetermined threshold may be displayed so as to be distinguishable from other terms. In the example of the figure, the term w1 is highlighted (dotted rectangular frame 44 in the figure). Terms w1 at or above the threshold may also be shown with a gradation whose density varies according to the value of g. Making the term w1 distinguishable from other terms in this way lets the user see which terms the dialogue device 10 weights in responding, and encourages the user to provide input to which the dialogue device 10 can respond accurately.
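The highlighting decision can be sketched by computing g for each input term and comparing it with a threshold. In this sketch the inter-word similarity is an exact-match stand-in (1.0 for identical terms, otherwise 0.0), and the threshold, the tokenized input, and the reference set are all assumed values.

```python
def exact_match(w1, w2):
    """Toy stand-in for s(w1, w2): 1.0 if the terms are identical, else 0.0."""
    return 1.0 if w1 == w2 else 0.0


def highlight_terms(input_terms, reference_terms, s, threshold):
    """Return (term, g, highlighted) per input term, with g = max over S3 of s(w1, w2)."""
    result = []
    for w1 in input_terms:
        g = max((s(w1, w2) for w2 in reference_terms), default=0.0)
        result.append((w1, g, g >= threshold))
    return result


s1 = ["Service A", "apply", "please"]        # assumed tokenized input sentence
s3 = ["Service A", "Service B", "apply"]     # assumed reference word set
for term, g, highlighted in highlight_terms(s1, s3, exact_match, threshold=0.5):
    print(term, g, highlighted)
```

The returned g values could also drive the gradation display mentioned in the text, for example by mapping g to a color density.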

The inter-word similarity s(w1, w2) above may be any one of the inter-word similarities described earlier (cosine similarity of distributed representations, conceptual distance between words, and so on) or a combination of several of them. The inter-word similarity s(w1, w2) may also be used, for example, for weighting corresponding to the synthesis coefficient C of the similarity calculation model f(X, Y). The similarity f(X, Y) may be used in place of the inter-word similarity s(w1, w2).

Embodiments of the present invention have been described above, but the present invention is not limited to these embodiments and includes various modifications. For example, the embodiments above describe the configurations in detail in order to explain the present invention clearly, and the invention is not necessarily limited to implementations having all of the described configurations. Part of the configuration of each embodiment may also be added to, deleted from, or replaced with other configurations.

Some or all of the configurations, functional units, processing units, processing means, and the like described above may be realized in hardware, for example by designing them as integrated circuits. The configurations, functions, and the like described above may also be realized in software, with a processor interpreting and executing programs that implement the respective functions. Information such as the programs, tables, and files that implement each function can be placed in a recording device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD.

The arrangement of the functional units, processing units, and databases of each information processing device described above is only an example. The arrangement can be changed to whatever is optimal from the viewpoint of the performance, processing efficiency, communication efficiency, and so on of the hardware and software of these devices.

The configuration of the databases that store the various data described above (schema and the like) can also be changed flexibly from the viewpoint of efficient use of resources and improved processing, access, and search efficiency.

1 dialogue system, 5 dialogue content management unit, 51 question-and-answer generation unit, 52 distinctive term list generation unit, 53 main term list generation unit, 6 similarity calculation model generation unit, 7 dialogue processing unit, 21 text data, 22 dialogue content, 221 question-and-answer data, 222 synonym dictionary, 223 distinctive term list, 23 similarity calculation model, 24 main term list, 30 communication network, 40 user terminal, 41 user input field

Claims

A dialogue system configured using an information processing device, comprising:
a dialogue processing unit that outputs a response sentence to an input sentence;
a storage unit that stores question-and-answer data in which assumed input sentences, each anticipating an input sentence, are associated with response sentences, a synonym dictionary, a distinctive term list that is a list of distinctive terms to be distinguished from one another according to the topic of the dialogue, and a similarity calculation model for obtaining the similarity between the input sentence and an assumed input sentence; and
a similarity calculation model generation unit that generates the similarity calculation model,
wherein the similarity calculation model generation unit generates the similarity calculation model such that the calculated similarity is high between synonyms recorded in the synonym dictionary and low between distinctive terms in the distinctive term list, and
the dialogue processing unit obtains the similarity between the input sentence and the assumed input sentence by using the similarity calculation model, selects an assumed input sentence based on the obtained similarity, and outputs the response sentence corresponding to the selected assumed input sentence.
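For illustration only, the selection flow of this claim can be sketched in Python as follows. The sketch is not taken from the specification: the sample data, the names `QA_DATA`, `SYNONYMS`, `DISTINCTIVE`, `term_similarity`, `sentence_similarity`, and `respond`, and the fixed scores 0.9, 0.0, and 0.1 are all assumptions, and a hand-written scoring rule stands in for the generated similarity calculation model.

```python
from typing import Dict, Set, Tuple

# Hypothetical question-and-answer data: assumed input sentence -> response sentence.
QA_DATA: Dict[str, str] = {
    "how do i reset my password": "Open Settings and choose Reset password.",
    "how do i reset my router": "Hold the reset button for ten seconds.",
}
# Hypothetical synonym dictionary and distinctive term list, stored as term pairs.
SYNONYMS: Set[Tuple[str, str]] = {("password", "passcode")}
DISTINCTIVE: Set[Tuple[str, str]] = {("password", "router")}


def term_similarity(a: str, b: str) -> float:
    """Toy term score: synonyms score high, distinctive terms score low."""
    if a == b:
        return 1.0
    if (a, b) in SYNONYMS or (b, a) in SYNONYMS:
        return 0.9  # assumed score for synonyms
    if (a, b) in DISTINCTIVE or (b, a) in DISTINCTIVE:
        return 0.0  # assumed score for distinctive terms
    return 0.1      # assumed default for unrelated terms


def sentence_similarity(x: str, y: str) -> float:
    """Mean, over the terms of x, of the best-matching term score in y."""
    xs, ys = x.split(), y.split()
    return sum(max(term_similarity(a, b) for b in ys) for a in xs) / len(xs)


def respond(input_sentence: str) -> str:
    """Select the most similar assumed input sentence and return its response."""
    best = max(QA_DATA, key=lambda q: sentence_similarity(input_sentence, q))
    return QA_DATA[best]
```

With this toy data, an input that uses "passcode" is matched to the password question rather than the router question, because the synonym pair contributes a high term score.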

The dialogue system according to claim 1,
further comprising a dialogue content management unit that generates the question-and-answer data based on text data consisting of a document having a chapter structure, and extracts the distinctive terms from chapter headings in the text data or from notation indicating chapter divisions.

The dialogue system according to claim 1,
wherein the similarity calculation model generation unit generates a first similarity calculation model based on the similarity between the sets of terms included in the input sentence and in the assumed input sentence, and a second similarity calculation model based on the similarity between words included in the input sentence and in the assumed input sentence, and generates, as the similarity calculation model, a function obtained by combining the first similarity calculation model and the second similarity calculation model.
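One possible reading of this combination is sketched below in Python. The claim does not fix the form of either model or of the combining function, so the Jaccard overlap, the mean best-match word score, the weighted average, and the names `set_similarity`, `word_similarity`, `combined_similarity`, and `weight` are all assumptions made for illustration.

```python
from typing import Callable, Sequence

WordSim = Callable[[str, str], float]


def set_similarity(x: Sequence[str], y: Sequence[str]) -> float:
    """First model (assumed form): Jaccard overlap of the two term sets."""
    sx, sy = set(x), set(y)
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0


def word_similarity(x: Sequence[str], y: Sequence[str], word_sim: WordSim) -> float:
    """Second model (assumed form): mean best word-to-word score."""
    if not x or not y:
        return 0.0
    return sum(max(word_sim(a, b) for b in y) for a in x) / len(x)


def combined_similarity(
    x: Sequence[str], y: Sequence[str], word_sim: WordSim, weight: float = 0.5
) -> float:
    """Combined model (assumed form): weighted average of the two scores."""
    return weight * set_similarity(x, y) + (1.0 - weight) * word_similarity(x, y, word_sim)
```

A weighted average is only one way to combine the two scores; the weight would normally be tuned on held-out question pairs.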

The dialogue system according to claim 3,
wherein, when the similarity between the input sentence and the assumed input sentence calculated by the first similarity calculation model is equal to or greater than a predetermined threshold, the dialogue processing unit adopts the first similarity calculation model as the similarity calculation model.
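The threshold rule can be sketched as a small wrapper in Python. The claim states only the case in which the first model's score reaches the threshold; falling back to the combined model otherwise is an assumption of this sketch, as are the names `select_similarity`, `first`, `combined`, and `threshold`.

```python
from typing import Callable

Similarity = Callable[[str, str], float]


def select_similarity(first: Similarity, combined: Similarity, threshold: float) -> Similarity:
    """Return a similarity function that prefers the first model when it is confident."""

    def similarity(input_sentence: str, assumed_sentence: str) -> float:
        score = first(input_sentence, assumed_sentence)
        if score >= threshold:
            return score  # first model adopted as the similarity calculation model
        # Assumed fallback: use the combined model below the threshold.
        return combined(input_sentence, assumed_sentence)

    return similarity
```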

The dialogue system according to claim 1,
wherein the dialogue processing unit calculates the similarity after unifying terms included in the input sentence with terms included in the assumed input sentence by using the synonym dictionary or the distinctive term list.
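A minimal Python sketch of the unification step follows, covering only the synonym dictionary case. The representation of the dictionary as a variant-to-representative mapping, the sample entries, and the names `SYNONYM_DICT` and `unify_terms` are assumptions for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical synonym dictionary: variant term -> representative term.
SYNONYM_DICT: Dict[str, str] = {"passcode": "password", "pw": "password"}


def unify_terms(
    input_terms: Sequence[str],
    assumed_terms: Sequence[str],
    synonyms: Dict[str, str],
) -> List[str]:
    """Rewrite input terms to the form used in the assumed input sentence."""
    assumed = set(assumed_terms)
    unified = []
    for term in input_terms:
        representative = synonyms.get(term, term)
        # Replace only when the representative actually occurs in the assumed sentence.
        unified.append(representative if representative in assumed else term)
    return unified
```

After unification, a plain term-overlap measure can treat "pw" and "password" as the same term without needing a learned model.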

The dialogue system according to claim 1,
wherein the storage unit stores a main term list of terms extracted, as main terms, from the terms included in each of the assumed input sentences, and
the dialogue processing unit outputs the main terms of assumed input sentences similar to the input sentence in a form selectable by the user, and complements the input sentence with the main term selected by the user.
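The suggestion and completion behaviour can be sketched in Python as below. The sample main term list, the word-overlap ranking, appending the chosen term to the end of the sentence, and the names `MAIN_TERMS`, `overlap`, `suggest_main_terms`, and `complement` are assumptions of this sketch rather than details given in the claim.

```python
from typing import Dict, List, Set

# Hypothetical main term list: assumed input sentence -> its main terms.
MAIN_TERMS: Dict[str, List[str]] = {
    "reset password for email": ["password", "email"],
    "reset router at home": ["router"],
}


def overlap(x: Set[str], y: Set[str]) -> float:
    """Assumed similarity: Jaccard overlap of two word sets."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def suggest_main_terms(input_sentence: str, main_terms: Dict[str, List[str]], top_k: int = 1) -> List[str]:
    """List main terms of the most similar assumed sentences that the input lacks."""
    words = set(input_sentence.split())
    ranked = sorted(main_terms, key=lambda s: overlap(words, set(s.split())), reverse=True)
    suggestions: List[str] = []
    for sentence in ranked[:top_k]:
        for term in main_terms[sentence]:
            if term not in words and term not in suggestions:
                suggestions.append(term)
    return suggestions


def complement(input_sentence: str, chosen_term: str) -> str:
    """Complement the input sentence with the term the user selected."""
    return f"{input_sentence} {chosen_term}"
```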

The dialogue system according to claim 1,
wherein the storage unit stores a main term list of terms extracted, as main terms, from the terms included in each of the assumed input sentences, and
the dialogue processing unit outputs the main terms of assumed input sentences similar to the input sentence in a form in which they can replace some of the terms included in the input sentence, and, upon receiving a user instruction to replace some of the terms, replaces that part of the input sentence with the indicated main term.

The dialogue system according to claim 1,
wherein the storage unit stores a main term list of terms extracted, as main terms, from the terms included in each of the assumed input sentences, and
the dialogue processing unit displays, in an identifiable manner within the input sentence, those words included in the input sentence whose similarity to a word included in the assumed input sentence is equal to or greater than a predetermined threshold.
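As a rough illustration of this display step, the Python sketch below marks matching words with square brackets. The bracket notation, the default threshold of 0.8, and the names `highlight` and `word_sim` are assumptions; an actual user interface would more likely use colour or emphasis.

```python
from typing import Callable, Sequence


def highlight(
    input_words: Sequence[str],
    assumed_words: Sequence[str],
    word_sim: Callable[[str, str], float],
    threshold: float = 0.8,
) -> str:
    """Mark input words whose best match in the assumed sentence reaches the threshold."""
    marked = []
    for word in input_words:
        best = max((word_sim(word, other) for other in assumed_words), default=0.0)
        marked.append(f"[{word}]" if best >= threshold else word)
    return " ".join(marked)
```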

A method for controlling a dialogue system configured using an information processing device,
wherein the dialogue system executes:
a step of outputting a response sentence to an input sentence;
a step of storing question-and-answer data in which assumed input sentences, each anticipating an input sentence, are associated with response sentences, a synonym dictionary, a distinctive term list that is a list of distinctive terms to be distinguished from one another according to the topic of the dialogue, and a similarity calculation model for obtaining the similarity between the input sentence and an assumed input sentence;
a step of generating the similarity calculation model;
a step of generating, as the similarity calculation model, a model that calculates the similarity such that it is high between synonyms recorded in the synonym dictionary and low between distinctive terms in the distinctive term list; and
a step of obtaining the similarity between the input sentence and the assumed input sentence by using the similarity calculation model, selecting an assumed input sentence based on the obtained similarity, and outputting the response sentence corresponding to the selected assumed input sentence.

The method for controlling a dialogue system according to claim 9,
wherein the dialogue system further executes a step, carried out by a dialogue content management unit, of generating the question-and-answer data based on text data consisting of a document having a chapter structure, and extracting the distinctive terms from chapter headings in the text data or from notation indicating chapter divisions.

The method for controlling a dialogue system according to claim 9,
wherein the dialogue system further executes a step of generating a first similarity calculation model based on the similarity between the sets of terms included in the input sentence and in the assumed input sentence, and a second similarity calculation model based on the similarity between words included in the input sentence and in the assumed input sentence, and generating, as the similarity calculation model, a function obtained by combining the first similarity calculation model and the second similarity calculation model.

The method for controlling a dialogue system according to claim 11,
wherein the dialogue system further executes a step of adopting the first similarity calculation model as the similarity calculation model when the similarity between the input sentence and the assumed input sentence calculated by the first similarity calculation model is equal to or greater than a predetermined threshold.

The method for controlling a dialogue system according to claim 9,
wherein the dialogue system further executes a step of calculating the similarity after unifying terms included in the input sentence with terms included in the assumed input sentence by using the synonym dictionary or the distinctive term list.

The method for controlling a dialogue system according to claim 9,
wherein the dialogue system further executes:
a step of storing a main term list of terms extracted, as main terms, from the terms included in each of the assumed input sentences; and
a step of outputting the main terms of assumed input sentences similar to the input sentence in a form selectable by the user, and complementing the input sentence with the main term selected by the user.

The method for controlling a dialogue system according to claim 9,
wherein the dialogue system further executes:
a step of storing a main term list of terms extracted, as main terms, from the terms included in each of the assumed input sentences; and
a step of outputting the main terms of assumed input sentences similar to the input sentence in a form in which they can replace some of the terms included in the input sentence, and, upon receiving a user instruction to replace some of the terms, replacing that part of the input sentence with the indicated main term.

The method for controlling a dialogue system according to claim 9,
wherein the dialogue system further executes:
a step of storing a main term list of terms extracted, as main terms, from the terms included in each of the assumed input sentences; and
a step of displaying, in an identifiable manner within the input sentence, those words included in the input sentence whose similarity to a word included in the assumed input sentence is equal to or greater than a predetermined threshold.