JP2024009724A

JP2024009724A - Dialogue reply candidate proposal system and dialogue reply candidate proposal method

Info

Publication number: JP2024009724A
Application number: JP2022111469A
Authority: JP
Inventors: 剛齊藤; Tsuyoshi Saito; 敦荻野; Atsushi Ogino
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2022-07-11
Filing date: 2022-07-11
Publication date: 2024-01-23

Abstract

To provide a dialogue reply candidate proposal system that generates an appropriate reply candidate sentence for a new question sentence in a dialogue. SOLUTION: A dialogue reply candidate proposal system comprises a processor and a sub-storage device 33. The sub-storage device 33 stores a question-reply database 21, in which past question sentences are associated with the past reply sentences given to them, and a learning question-reply database 22, in which the sentences of the question-reply database 21 have been combined using the time-series information of each dialogue. When a new question sentence is input by a questioner, the processor calculates the similarity between the question information and each past question sentence stored in the learning question-reply database 22, further calculates a summed similarity by summing the similarities within a dialogue, extracts a past question sentence similar to the new question sentence on the basis of each similarity, and presents it as a reply candidate sentence. SELECTED DRAWING: Figure 2

Description

The present invention relates to a dialogue answer candidate proposal system and a dialogue answer candidate proposal method that generate answer candidate sentences, that is, candidates for the answer sentence to a new question sentence in a dialogue.

BACKGROUND ART

There are techniques that, in order to prepare an answer sentence for a user's new question sentence obtained via the Internet or a public communication network, generate answer candidate sentences as candidates for it. For example, Patent Document 1 discloses a technique that decomposes a question string based on a user's question into a plurality of morphemes (words) and, based on the obtained morphemes, selects and outputs an answer string from an accumulated group of past answer strings.

Patent Document 2 discloses a technique that performs syntactic, semantic, and thematic analysis on a prepared FAQ so that, even when a customer's inquiry is incomplete, the system interprets the intent and meaning of the inquiry while guiding the customer through a dialogue of natural follow-up questions, complements and narrows down the content, and presents an answer. Patent Document 3 discloses a technique that labels each accumulated past dialogue sentence as "question", "information provision", or "other". Based on these labels, it estimates, for each utterance by the questioner and each utterance by the answerer in the dialogue data, whether the utterance expresses a question. It then reconstructs question sentences and answer sentences by extracting from the dialogue data the questioner's utterances made in response to answerer utterances estimated to express a question.

Patent Document 1: JP 2018-181033 publication
Patent Document 2: JP 2019-3319 publication
Patent Document 3: WO 2021/234773 publication

The question answering support device of Patent Document 1 does not sufficiently anticipate cases in which the user's question sentence is incomplete and a dialogue becomes necessary, such as asking additional questions to obtain supplementary information or presenting options to narrow down the content. Specifically, if the technique of Patent Document 1 is applied to input sentences such as those shown in FIG. 14, each of the question sentences Q1 to Q4 is learned in isolation. That is, for a question similar to Q2, "It is on.", the model is trained to propose the answer "Are the antenna bars showing?". This training ignores the exchange that led to that answer, namely "My mobile phone won't connect..." followed by "Is the power on?". If each sentence is learned without its surrounding context in this way, the model is adversely affected: for example, it may propose "Are the antenna bars showing?" in response to a question from a completely unrelated context.

The technique described in Patent Document 2 presupposes multi-turn dialogue and generates appropriate follow-up questions from FAQs (Frequently Asked Questions), so appropriate exchanges can be expected. However, the FAQs must be created manually, which makes them costly to produce. The same applies to the well-known rule-based algorithms often adopted in chatbots and the like, in which the answering rules are defined manually in advance.

To address this problem, the technique described in Patent Document 3 aims to reduce the FAQ creation cost by recombining, from past dialogue sentences, the series of sentences written by the user into one sentence and the series of sentences written by the answerer into one sentence. For example, when Patent Document 3 is applied to input sentences such as those shown in FIG. 14, question and answer sentences such as those shown in FIG. 15 are extracted. However, as "It is on." and "Three bars are showing." in this example illustrate, users in a dialogue generally give only the minimum necessary response and omit important words such as the subject. The subject and similar words are therefore also missing from the question sentences of FIG. 15 that the AI is trained on, which may adversely affect the model.
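The two problems described above can be illustrated with a rough sketch. It is not the method of either patent document: `isolated_pairs` and `recombine` are hypothetical helpers written for this illustration, the dialogue contains only the utterances quoted in the text (FIG. 14 itself is not reproduced here), and the English renderings of the utterances are approximate.

```python
# Utterances quoted in the text; speaker labels are an assumption of this sketch.
dialogue = [
    ("user", "My mobile phone won't connect..."),
    ("operator", "Is the power on?"),
    ("user", "It is on."),
    ("operator", "Are the antenna bars showing?"),
    ("user", "Three bars are showing."),
]


def isolated_pairs(turns):
    """Pair each user utterance with the operator utterance that directly follows it.

    Each pair is learned on its own, so the earlier exchange is lost.
    """
    pairs = []
    for (speaker_a, text_a), (speaker_b, text_b) in zip(turns, turns[1:]):
        if speaker_a == "user" and speaker_b == "operator":
            pairs.append((text_a, text_b))
    return pairs


def recombine(turns):
    """Join all user utterances into one text and all operator utterances into another."""
    user_text = " ".join(text for speaker, text in turns if speaker == "user")
    operator_text = " ".join(text for speaker, text in turns if speaker == "operator")
    return user_text, operator_text


pairs = isolated_pairs(dialogue)
user_text, operator_text = recombine(dialogue)
```

In `pairs`, "It is on." maps to "Are the antenna bars showing?" with no trace of the mobile phone that the exchange was about. In `user_text`, the fragments "It is on." and "Three bars are showing." are joined as they stand, without the subjects they refer to.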

SUMMARY OF THE INVENTION

Accordingly, an object of the present invention is to provide a dialogue answer candidate proposal system and a dialogue answer candidate proposal method that output, at low cost, suitable answer candidate sentences for a new question sentence in a dialogue.

Representative features of the invention disclosed in this application are as follows.
According to one feature of the present invention, a dialogue answer candidate proposal system has a processor, a storage device, and a transmitting/receiving device that communicates with a user terminal used by a questioner and an operator terminal used by an answerer. The system generates answer candidate sentences, which are candidates for the answer sentence to a question sentence from the questioner, and displays them on the operator terminal. The system is provided with: a question and answer database that records past question sentences input by questioners in association with the past answer sentences to them, with dialogue time-series information attached; a learning question and answer database that records learning question sentences and learning answer sentences generated based on the information stored in the question and answer database; an answer candidate sentence generation unit that, when a new question sentence is input by the questioner, extracts from the question and answer database or the learning question and answer database the past question sentences or learning question sentences similar to the new question sentence; and a learning question and answer data generation unit that generates learning question and answer data using the past answer sentences associated with the extracted past question sentences or learning question sentences. The learning question and answer data generation unit calculates, for each past question sentence and learning question sentence stored in the question and answer database and the learning question and answer database, its similarity to the new question sentence, and calculates the summed similarity of each dialogue. It then generates the learning question and answer data based on the similarity to the new question sentence or the summed similarity of each dialogue.
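The per-sentence similarity and the per-dialogue summed similarity can be sketched as follows. This is a minimal illustration under stated assumptions: the text here does not specify the similarity measure, so cosine similarity over sentence vectors is assumed, and `dialogue_id`, `question_id`, and `score_dialogues` are hypothetical names introduced only for this example.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_dialogues(new_vector, records):
    """Score stored question sentences against a new question vector.

    records: iterable of (dialogue_id, question_id, vector) tuples.
    Returns (per_question, per_dialogue): the similarity of each stored
    question, and the sum of those similarities within each dialogue.
    """
    per_question = {}
    per_dialogue = defaultdict(float)
    for dialogue_id, question_id, vector in records:
        similarity = cosine_similarity(new_vector, vector)
        per_question[question_id] = similarity
        per_dialogue[dialogue_id] += similarity
    return per_question, dict(per_dialogue)


# Toy vectors chosen only to make the arithmetic easy to follow.
records = [
    ("d1", "q1", [1.0, 0.0]),
    ("d1", "q2", [0.0, 1.0]),
    ("d2", "q3", [1.0, 1.0]),
]
per_question, per_dialogue = score_dialogues([1.0, 0.0], records)
```

A caller could then rank candidates either by `per_question` or by `per_dialogue`, matching the "similarity or summed similarity" choice described above.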

According to another feature of the present invention, the storage device stores the question and answer database, the learning question and answer database, an interrogative word table that stores interrogative words indicating that a question is being asked, a synonym table that stores words to be treated as synonymous when analyzing question sentences, and an affirmative/negative word table. The learning question and answer database is generated by using the words stored in the synonym table in text analysis of the past question and answer sentences and by applying predefined rules to the results of that analysis. When a new question sentence is input from the questioner, the answer candidate sentence generation unit searches the learning question and answer database and causes the operator terminal to display questions similar to the new question sentence together with answer candidates for those similar questions, thereby assisting the operator in replying to the questioner. The learning question and answer database is built in batch processing by the learning question and answer data generation program.
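As a rough sketch of how the three tables might be consulted during text analysis, the example below uses plain dictionaries and sets. Every table entry and function name here is hypothetical: the actual table contents are shown in FIGS. 5 to 7, which are not reproduced in this section, and English words are used only for readability.

```python
import re

# Hypothetical table contents for illustration only.
INTERROGATIVE_WORDS = {"what", "why", "how", "when", "where"}
SYNONYMS = {"cell phone": "mobile phone", "cellphone": "mobile phone"}
AFFIRMATIVE_WORDS = {"yes"}
NEGATIVE_WORDS = {"no", "not"}


def normalize(text, synonyms):
    """Lower-case the text and replace each listed variant with its canonical form."""
    result = text.lower()
    for variant, canonical in synonyms.items():
        result = result.replace(variant, canonical)
    return result


def tokens(text):
    """Split lower-cased text into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def has_interrogative(text, interrogatives):
    """Return True if any token of the text appears in the interrogative word table."""
    return any(token in interrogatives for token in tokens(text))


def polarity(text, affirmative, negative):
    """Classify a sentence as 'negative', 'affirmative', or 'unknown' by table lookup."""
    words = set(tokens(text))
    if words & negative:
        return "negative"
    if words & affirmative:
        return "affirmative"
    return "unknown"
```

For example, `normalize("My cellphone won't connect", SYNONYMS)` maps the variant spelling to the canonical "mobile phone", so that two differently worded questions can be compared on equal terms.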

According to the present invention, past question sentences similar to the question information are extracted from the question and answer database, and the similar past question sentence or group of question sentences, together with the past answer sentence or group of answer sentences associated with it, is extracted from the question and answer database to create answer candidate sentences and build the learning question and answer database. When responding to a question, the operator can therefore obtain a proposed answer for the questioner based on the learning question and answer database, and suitable answer candidate sentences for a new question sentence can be provided to the questioner efficiently.

FIG. 1 is a block diagram showing the hardware configuration of the dialogue answer candidate proposal system 1 according to an embodiment of the present invention.
FIG. 2 is a data configuration diagram showing an example of the contents stored in the secondary storage device 33 of FIG. 1.
FIG. 3 is a diagram showing an example of the data stored in the question and answer database 21 of FIG. 2.
FIG. 4 is a diagram showing an example of the data stored in the learning question and answer database 22 of FIG. 2.
FIG. 5 is a diagram showing an example of the data stored in the interrogative word table 23 of FIG. 2.
FIG. 6 is a diagram showing an example of the data stored in the synonym table 24 of FIG. 2.
FIG. 7 is a diagram showing an example of the data stored in the affirmative/negative word table 25 of FIG. 2.
FIG. 8 is a flowchart (part 1) showing the procedure for generating learning question data in this embodiment.
FIG. 9 is a flowchart (part 2) showing the procedure for generating learning question data in this embodiment.
FIG. 10 is a diagram showing an example of preprocessing and combination in the generation of learning question data in this embodiment.
FIG. 11 is a diagram showing an example of the question and answer display screen 1200 displayed on the operator terminal 3.
FIG. 12 is a flowchart showing the procedure of the answer candidate sentence generation process of the dialogue answer candidate proposal system 1.
FIG. 13 is an explanatory diagram showing an example of the answer candidate sentence display screen 1400 displayed on the operator terminal 3.
FIG. 14 is a diagram showing an example of input sentences for explaining the problems of the prior art and the features of the present invention.
FIG. 15 is a diagram showing an example of question sentences and answer sentences for explaining the problems of the prior art.

Embodiments of the present invention will be described below with reference to the drawings. The present invention can also be implemented in various other forms. Unless otherwise limited, each component may be singular or plural. Various kinds of information are described using the expression "table", but they may be expressed in other data structures, for example as an "XX list", "XX queue", or "XX information". Expressions such as "identification information", "identifier", "name", "ID", and "number" are used when describing identification information, and these are interchangeable.

A program may be installed on a computer from a program source. The program source may be, for example, a program distribution server or a computer-readable storage medium. When the program source is a program distribution server, the program distribution server includes a processor and a storage resource that stores the program to be distributed, and the processor of the program distribution server may distribute that program to other computers. In the embodiments, two or more programs may be implemented as one program, and one program may be implemented as two or more programs.

<System configuration>
FIG. 1 is a block diagram showing the system configuration of the dialogue answer candidate proposal system 1 according to an embodiment of the present invention. The dialogue answer candidate proposal system 1 can be realized by a computer that executes programs to perform processing. It includes a processor 31, storage resources (a main storage device 32 and a secondary storage device 33), an input device 34, and an output device 35, which are interconnected by a data bus 37. The data bus 37 of the dialogue answer candidate proposal system 1 is further provided with a network interface 36 for connecting to a network 5.

The processor 31 implements specific functions by reading the data and programs stored in the secondary storage device 33 into the main storage device 32 and executing them. The processor 31 is, for example, a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit); the type, operating frequency, number of cores, and number of processors 31 used are arbitrary. The entity that performs the processing need not be the processor 31 alone and may be a controller, device, system, computer, or node that has a processor. The entity that executes a program to perform processing need only be an arithmetic unit, and may include a dedicated circuit that performs specific processing. Examples of such dedicated circuits include an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), and a CPLD (Complex Programmable Logic Device).

The processor 31 executes various programs installed in the main storage device 32 or the secondary storage device 33. Here, executing the learning question and answer data generation program 26 (see FIG. 2) realizes the function of the learning question and answer data generation unit 11, and executing the answer candidate sentence generation program 27 (see FIG. 2) realizes the function of the answer candidate sentence generation unit 12, both in software. The question and answer execution program 28 (see FIG. 2) likewise realizes in software the functions of the question and answer execution unit 13, such as providing a chat environment. The functions realized by the processor 31 are not limited to these three, but illustration and description of the other functions are omitted.

The network 5 may be, for example, a well-known global network such as the Internet. However, the network 5 is not limited to the Internet and may be a public communication network provided by a telecommunications carrier, a local area network (LAN), or another well-known network. The network interface (I/F) 36 is a device that connects the dialogue answer candidate proposal system 1 to the network 5. Through it, the dialogue answer candidate proposal system 1 exchanges data with the user terminals 2 owned by the respective users connected to the network 5 and with the operator terminals 3 used by the operators (answerers) working at a help desk or the like.

The dialogue answer candidate proposal system 1 is further provided with an input device 34, such as a mouse or keyboard, that accepts operations, and an output device 35, such as a display device, that outputs information, so that it can be operated by a worker who operates the dialogue answer candidate proposal system 1 directly. One of the operator terminals 3 may also be configured to serve as both the input device 34 and the output device 35.

The main storage device 32 has a volatile storage element such as RAM and temporarily stores the programs executed by the processor 31 and their data. The secondary storage device 33 has a non-volatile storage element such as an HDD (Hard Disk Drive) or SSD (Solid State Drive) and stores programs, data, and the like. The storage resources are not limited to this configuration: in addition to or in place of the secondary storage device 33, an external storage device, a network server, a cloud server, or the like may be used so that data and programs are stored in a distributed manner.

The user terminal 2 is an information device operated by a user who asks a question, and question sentences addressed to the dialogue answer candidate proposal system 1 are input from it. The user terminal 2 is an electronic device that includes at least an input device (not shown) that accepts input from the user, an output device (not shown) such as a display or touch panel that displays information, and a network interface (not shown). A well-known information terminal such as a personal computer, tablet, smartphone, or mobile phone can be used. The user terminal 2 can communicate bidirectionally with the dialogue answer candidate proposal system 1 and the operator terminal 3 via the network 5.

The operator terminal 3 is a device used by an operator who converses with the user operating the user terminal 2, and is installed where the operator works, such as at a help desk. The operator terminal 3 includes at least an input device (not shown) that accepts input operations from the operator, an output device (not shown) such as a display or touch panel that displays information, and a network interface (not shown). A well-known electronic device such as a personal computer, tablet, or smartphone can be used as the operator terminal 3. Part of the data stored in the secondary storage device 33, for example copies of the question and answer database 21 and the learning question and answer database 22, may be copied in advance to an internal storage device of the operator terminal 3 so that the processor of the operator terminal performs the processing for answering the user's questions. In this way, some of the functions executed by the processor 31 of the dialogue answer candidate proposal system 1 may instead be executed on the operator terminal 3 to speed up processing.

When a new question sentence input by the user is transmitted from the user terminal 2 to the dialogue answer candidate proposal system 1, the processor 31 uses the received new question sentence to generate a past question sentence or group of question sentences, together with the past answer sentence or group of answer sentences associated with it, as candidates for the answer to the new question sentence, and displays them on the output device of the operator terminal 3 via the network 5. The operator gives an appropriate answer to the user who asked the question while viewing various information, including the dialogue answer candidates provided by the dialogue answer candidate proposal system 1. The answer confirmed by the operator is transmitted to the user terminal 2 via the dialogue answer candidate proposal system 1. In other words, the processor 31 mediates the exchange of information between the user who asks a question and the operator in charge of answering it, and at the same time supports the operator's answering work by generating the past question sentence or group of question sentences related to the user's question, together with the associated past answer sentence or group of answer sentences, and displaying them on the operator terminal 3. Because the operator can compose a reply and send it to the user while referring to the various information displayed on the operator terminal 3, the answering work can be performed quickly and accurately.

The learning question and answer data generation unit 11 realizes the function of generating learning question and answer data from the past question and answer sentences accumulated in the question and answer database 21 (described later with reference to FIG. 2). This function is realized by executing a computer program, the details of which are described later with reference to FIG. 8. When a new question sentence is input by the user, the answer candidate sentence generation unit 12 generates the most suitable answer candidates by referring to past question sentences or groups of question sentences and the past answer sentences or groups of answer sentences associated with them. The question and answer execution unit 13 provides the environment for the series of question and answer exchanges: it accepts a question sentence from the user, transmits it to the operator terminal 3, and returns the answer sentence input at the operator terminal 3 to the user terminal 2. The question and answer execution unit 13 can use well-known methods such as a chatbot, question and answer by e-mail, a one-question-one-answer environment on a chat screen, or a dedicated question answering environment on a Web screen, and these can be realized by the processor 31 executing a computer program (the question and answer execution program 28 described later with reference to FIG. 2). The question and answer execution unit 13 transmits the answer sentence candidates generated by the answer candidate sentence generation unit 12 to the operator terminal 3 via the network 5.
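The relay role of the question and answer execution unit 13 can be sketched as below. This is only an illustration of the message flow described above, not the program 28 itself: the class `QuestionAnswerRelay`, its in-memory inboxes, and the `generate_candidates` callback (standing in for the answer candidate sentence generation unit 12) are all hypothetical, and real network transport is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class QuestionAnswerRelay:
    """In-memory stand-in for the exchange between user and operator terminals."""

    # Stand-in for the answer candidate sentence generation unit (assumption).
    generate_candidates: Callable[[str], List[str]]
    operator_inbox: List[Dict[str, object]] = field(default_factory=list)
    user_inbox: List[str] = field(default_factory=list)

    def receive_question(self, question: str) -> None:
        """Accept a user question and forward it, with candidates, to the operator."""
        candidates = self.generate_candidates(question)
        self.operator_inbox.append({"question": question, "candidates": candidates})

    def send_answer(self, answer: str) -> None:
        """Return the answer confirmed by the operator to the user."""
        self.user_inbox.append(answer)


# Usage: the operator sees the question with candidates, then confirms one answer.
relay = QuestionAnswerRelay(generate_candidates=lambda q: ["Is the power on?"])
relay.receive_question("My mobile phone won't connect...")
relay.send_answer("Is the power on?")
```

Keeping candidate generation behind a callback mirrors the separation in the text between the unit that generates candidates and the unit that relays messages.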

FIG. 2 is a data configuration diagram showing an example of the contents stored in the secondary storage device 33 of FIG. 1. The question and answer database 21 and the learning question and answer database 22 are built in the secondary storage device 33. The interrogative word table 23, the synonym table 24, and the affirmative/negative word table 25, which are used when building or using these databases, are also stored there, as are the learning question and answer data generation program 26, the answer candidate sentence generation program 27, and the question and answer execution program 28 executed by the processor 31 shown in FIG. 1. Various other programs and data are also stored in the secondary storage device 33, but their description is omitted.

The question and answer database 21 is a data group that stores past question sentences, the past answer sentences to them, and the sentence vectors of the past question sentences in association with one another (details are described later with reference to FIG. 3). The learning question and answer database 22 is a database that stores the data group generated by the learning question and answer data generation unit 11 from the sentences in the question and answer database 21 (details are described later with reference to FIG. 4).

The interrogative word table 23 is a database that stores interrogative words, which indicate that a question is being asked, and request words, which indicate that a request is being made (details are described later with reference to FIG. 5). The synonym table 24 is a database that stores synonyms (details are described later with reference to FIG. 6). The affirmative/negative word table 25 is a database used to determine whether a sentence is affirmative or negative (details are described later with reference to FIG. 7).

The learning question and answer data generation program 26, the answer candidate sentence generation program 27, and the question and answer execution program 28 are installed in the secondary storage device 33. The learning question and answer data generation unit 11 shown in FIG. 1 is realized by the processor 31 reading out and executing the learning question and answer data generation program 26 stored in the secondary storage device 33. Likewise, the answer candidate sentence generation unit 12 shown in FIG. 1 is realized by the processor 31 reading out and executing the answer candidate sentence generation program 27 stored in the secondary storage device 33. The question and answer execution program 28 is a program that provides the environment in which questions and answers are exchanged between the questioner's user terminal 2 and the operator terminal 3.

<Various data structures>
FIG. 3 is a diagram showing an example of the question and answer database 21. In the question and answer database 21, a question and answer ID 501 is an ID that identifies each of the stored past question texts 502. A past answer text 503 is the text that was given in answer to the past question text 502, and a NextID 504 (details are described later) is the question and answer ID 501 of the question that follows the past question text 502. The same record 511 further stores a sentence vector 505, a user evaluation 506, and a learning data generation progress 507 in association with the past question text 502.

In this way, the question and answer database 21 stores each past question text 502 in association with the past answer text 503 given to it. The NextID 504 is a number that records how a dialogue is linked together: "0" means the end of the dialogue, and "- (NULL)" means that the answer was completed in a single exchange. Assigning the NextID 504 by means of the question and answer ID 501 makes the sequence information of the dialogue available. The sentence vector 505 may be generated for all past question texts stored in the question and answer database 21 every time a new pair of a past question text and a past answer text is stored in the database. Alternatively, the sentence vector 505 may be regenerated at a preset timing, for example each time a predetermined number of such pairs has been stored in the question and answer database 21. The user evaluation 506 is stored so that the user's evaluation of the answer can be associated with the dialogue result; "- (NULL)" means that there is no user evaluation. The learning data generation progress 507 is stored so that it can be determined how far the learning question and answer data generation program has been executed: "〇" means that learning data has already been generated by the program, "- (NULL)" means that the record is not subject to learning data generation, and "×" means that data generation by the program has not yet been completed.
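The NextID linkage described above can be pictured with a small sketch. This is only an illustration under assumptions: the class name `QARecord`, the function `dialogue_chain`, the sample texts, and the use of `None` for "- (NULL)" are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QARecord:
    qa_id: int               # question and answer ID 501
    question: str            # past question text 502
    answer: str              # past answer text 503
    next_id: Optional[int]   # NextID 504: None = single exchange, 0 = end of dialogue


def dialogue_chain(db: dict, start_id: int) -> list:
    """Follow NextID links from start_id until the dialogue ends."""
    chain, seen = [], set()
    current_id = start_id
    while current_id not in seen:          # guard against accidental cycles
        seen.add(current_id)
        record = db[current_id]
        chain.append(record)
        if record.next_id is None or record.next_id == 0:
            break                          # single exchange or end of dialogue
        current_id = record.next_id
    return chain


# Hypothetical sample data: IDs 1 and 2 form one dialogue, ID 3 is a single exchange.
db = {
    1: QARecord(1, "Which plan is cheapest?", "Which device do you use?", 2),
    2: QARecord(2, "A smartphone.", "Plan A is the cheapest.", 0),
    3: QARecord(3, "What are your opening hours?", "9:00 to 17:00.", None),
}
chain_ids = [r.qa_id for r in dialogue_chain(db, 1)]
```

Walking the chain from ID 1 visits records 1 and 2, which is the sequence information that the later combining steps rely on.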

FIG. 4 is a diagram showing an example of the learning question and answer database 22. The learning question and answer database 22 shown in FIG. 4 stores learning question texts 602 and learning answer texts 603, which are generated by executing the learning question and answer data generation program 26 (see FIG. 2). A question and answer No. 601 is an ID that identifies the learning question text 602. A creation source ID 604 is the ID 501 (see FIG. 3) of the past question text 502 (see FIG. 3) from which the learning question text 602 was built. A sentence vector 605 is a numerical representation of a natural language sentence. The sentence vector 605 can be calculated by a known method, and it expresses the features of the sentence as a set of numbers. A user evaluation 606 stores the user's evaluation of the learning answer text 603 given to the learning question text 602, and is the evaluation entered by the user, for example, at the end of the questioning. In this example, the entry records that the user who asked the questions contained in the learning question text 602 rated the final answer in the resulting learning answer text 603 as appropriate ("Good"). In this way, the learning question and answer database 22 stores each learning question text 602 in association with the learning answer text 603 given to it.

FIG. 5 is a diagram showing an example of the interrogative word table 23. In the interrogative word table 23 shown in FIG. 5, interrogative words 702 used to analyze a question from a user are stored in advance in list form. An interrogative word ID 701 is an ID that identifies each stored interrogative word 702. The interrogative word 702 is a word indicating that a question is being asked. An interrogative word type 703 indicates, for each interrogative word 702, which type of interrogative word it is. As examples of the interrogative word 702 and the type 703, FIG. 5 shows the records "いつ: WHEN" and "なぜ: WHY" (interrogative word ID = "1", "2"). Other examples of interrogative word 702 and type 703 include "だれ: WHO", "何: WHAT", "どこ: WHERE", "どう: HOW", "どの: WHICH", "どなた: WHO", "どれ: WHAT", "どんな: HOW", "いかなる: HOW", "ますか: YES/NO", and "？: YES/NO".

FIG. 6 is a diagram showing an example of the synonym table 24. In the synonym table 24 shown in FIG. 6, a synonym ID 801 is an ID that identifies a synonym 802. The synonym 802 holds the words of a stored synonym pair. As an example of the synonym 802, FIG. 6 shows "スマホ" (sumaho) and its synonym "スマートフォン" (smartphone). Accordingly, in the text analysis processing of the dialog answer candidate proposal system 1, when the user enters "スマホ", it is treated as a term synonymous with "スマートフォン". A large number of synonyms 802 are stored in the synonym table 24.

FIG. 7 is a diagram showing an example of the affirmative/negative word table 25. The affirmative/negative word table 25 shown in FIG. 7 stores a list of affirmative/negative words 902. An affirmative/negative word ID 901 is an ID that identifies each affirmative/negative word 902. The affirmative/negative words 902 are words that are considered likely to carry important meaning with respect to the intention of a sentence. An affirmative/negative word type 903 is information indicating whether the affirmative/negative word 902 has an affirmative or a negative meaning. As examples of the affirmative/negative words 902, FIG. 7 shows "はい" (yes) and "違う" (that is wrong), with their types 903 given as "はい: affirmative" and "違う: negative". Many entries other than those shown in FIG. 7 are stored as affirmative/negative words 902 and affirmative/negative word types 903.
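The three lookup tables of FIGS. 5 to 7 can be pictured as simple dictionaries. The following is a minimal sketch under assumptions: the input is taken to be already split into tokens (morphological analysis is out of scope here), and the names `INTERROGATIVES`, `SYNONYMS`, `POLARITY_WORDS`, `normalize`, `interrogative_types`, and `polarity` are hypothetical. Only a few of the entries mentioned in the text are included.

```python
# Interrogative word 702 -> type 703 (subset of the examples in the text)
INTERROGATIVES = {"いつ": "WHEN", "なぜ": "WHY", "だれ": "WHO", "何": "WHAT", "どこ": "WHERE"}
# Synonym 802: surface form -> canonical form
SYNONYMS = {"スマホ": "スマートフォン"}
# Affirmative/negative word 902 -> type 903
POLARITY_WORDS = {"はい": "affirmative", "違う": "negative"}


def normalize(tokens):
    """Replace each token by its canonical synonym, if one is registered."""
    return [SYNONYMS.get(t, t) for t in tokens]


def interrogative_types(tokens):
    """Return the types of all interrogative words found in the tokens."""
    return [INTERROGATIVES[t] for t in tokens if t in INTERROGATIVES]


def polarity(tokens):
    """Return the type of the first affirmative/negative word, or None."""
    for t in tokens:
        if t in POLARITY_WORDS:
            return POLARITY_WORDS[t]
    return None


normalized = normalize(["スマホ", "は", "軽い"])
```

Here `normalized` treats "スマホ" as "スマートフォン", which mirrors the synonym handling described for FIG. 6.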

<Similar sentence extraction method>
In the course of generating answer candidate sentences from a new question text entered by the user, the dialog answer candidate proposal system 1 vectorizes sentences and generates a model using a known technique such as the tf-idf method (term frequency-inverse document frequency method), and then calculates the cosine similarity between the vectors. The following outlines the sentence vectorization and model generation method, and the similar sentence extraction method, which calculates the cosine similarity between sentences and extracts sentences similar to a target sentence. Sentence vectorization and cosine similarity calculation use a database in which a plurality of sentences is stored (in this embodiment, the question and answer database 21 and the learning question and answer database 22).

In the similar sentence extraction method, which calculates the cosine similarity between sentences and extracts sentences similar to the target sentence, a vector is calculated by the tf-idf method or the like for each sentence stored in the database and for the target sentence, and the cosine similarity is then calculated, as described below.

When the tf-idf method is used, all sentences stored in the database and the target sentence are first morphologically analyzed and broken down into words (morphemes). Next, duplicates are removed from the resulting words, and a word vector having each distinct word as a component is generated. A tf-idf vector is then calculated for each sentence stored in the database and for the target sentence. The tf-idf vector is a vector whose components are the tf-idf values of the words that make up the word vector.

As an example of a word vector and a tf-idf vector, the word vector generated by morphologically analyzing the sentence "スマートフォンは軽い。" ("Smartphones are light.") is, for example, (スマートフォン, は, 軽い, 。). The corresponding tf-idf vector is, for example, (tf-idf value of "スマートフォン", tf-idf value of "は", tf-idf value of "軽い", tf-idf value of "。").

Next, the cosine similarity (the cosine of the angle between two tf-idf vectors) is calculated between the tf-idf vector of each sentence stored in the database and the tf-idf vector of the target sentence. The cosine similarity of two tf-idf vectors A and B is given by cosine similarity = A·B / (|A||B|). The larger the cosine similarity between a sentence and the target sentence, the more similar that sentence is considered to be.

Then, among the sentences in the database, a predetermined proportion (or a predetermined number) of sentences taken from the top in order of cosine similarity are regarded as similar sentences with high similarity. Instead of the cosine similarity, the inner product of the tf-idf vector of a sentence stored in the database and the tf-idf vector of the target sentence may be used.
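The procedure above (tokenize, build tf-idf vectors, rank by cosine similarity, keep the top few) can be sketched in plain Python. This is an illustration only: the sentences are assumed to be already tokenized, the smoothed idf formula is one common variant chosen here and is not prescribed by the specification, and the function names `tfidf_vectors`, `cosine`, and `top_k_similar` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns (vocabulary, list of tf-idf vectors)."""
    vocab = sorted({t for d in docs for t in d})          # deduplicated words
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))         # document frequency
    # Smoothed idf (an assumption; other idf variants work as well)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        length = max(len(d), 1)                           # avoid division by zero
        vectors.append([tf[t] / length * idf[t] for t in vocab])
    return vocab, vectors


def cosine(a, b):
    """Cosine similarity A.B / (|A||B|); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_similar(stored, target, k):
    """Return indices of the k stored sentences most similar to the target."""
    _, vecs = tfidf_vectors(stored + [target])
    target_vec = vecs[-1]
    scored = [(cosine(v, target_vec), i) for i, v in enumerate(vecs[:-1])]
    scored.sort(key=lambda s: (-s[0], s[1]))              # highest similarity first
    return [i for _, i in scored[:k]]


stored = [
    ["スマートフォン", "は", "軽い"],
    ["料金", "プラン", "を", "変更"],
    ["スマートフォン", "の", "料金"],
]
target = ["スマートフォン", "は", "重い"]
ranking = top_k_similar(stored, target, 2)
```

In this toy data the first stored sentence shares two words with the target and the third shares one, so they are ranked in that order, while the second sentence shares none and has similarity zero.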

The above is an outline of the tf-idf method, and the algorithm used when applying the tf-idf method may be changed as appropriate from the one described above. In place of the tf-idf-based similar sentence extraction method, which calculates the cosine similarity between sentences and extracts sentences similar to the target sentence, similar sentences may be extracted by another method of calculating sentence similarity, such as the BERT (Bidirectional Encoder Representations from Transformers) method. Unlike the tf-idf method, the BERT method generates a model capable of understanding context, and is therefore better suited to the dialogue-aware program of the present invention. One example of using the BERT method is as follows. First, a small subset of the question sentences in the database is labeled as to whether pairs of sentences are synonymous or not. Then, a model that determines whether sentences are synonymous is generated by fine-tuning a publicly known model pre-trained on Japanese Wikipedia with the prepared label data.

<Processing procedure>
Next, the processing procedure of the dialog answer candidate proposal system 1 is described. FIGS. 8 and 9 are flowcharts showing an example of the procedure of the learning question and answer data generation processing of the dialog answer candidate proposal system 1. In this system, the learning question and answer data generation processing is executed as batch processing by a system administrator or an administrator of the dialog answer candidate proposal system. The learning question and answer data generation program 26 (see FIG. 2) is started at regular or irregular intervals, and the processing shown in FIGS. 8 and 9 begins.

When execution of the learning question and answer data generation program 26 (see FIG. 2) is specified, the processor 31 of the dialog answer candidate proposal system 1 reads the program from the secondary storage device 33 and starts executing it. In FIG. 8, the processor 31 first initializes the variables used during program execution. Here, the variable i used for counting IDs is cleared to zero (step S101) and then incremented by 1 (step S102).

Next, the processor 31 checks whether the record in the question and answer database 21 whose ID equals the variable i has already been checked by this program, by examining whether its learning data generation progress is "×" (step S103). If the record is determined not to have been checked (step S103: NO), the processing proceeds to step S104; if it is determined to have been checked (step S103: YES), the processing returns to step S102. The processor 31 thus repeats steps S102 and S103, starting from ID = 1, until it finds a record that this program has not yet checked.

Next, the processor 31 analyzes, in accordance with preprocessing A, the question texts and answer texts of the records whose IDs are equal to or greater than the ID corresponding to the variable i (step S104). Preprocessing A is the preliminary processing for combining question data and answer data, and performs the analysis needed to combine sentences. Each analysis technique used in preprocessing A may be a known technique, for example the syntactic analysis, semantic analysis, and thematic analysis described in Patent Document 2.

FIG. 10 is an explanatory diagram showing an example of the learning question and answer data generation processing. First, as shown in the data column 1000, the pairs of question text and answer text to be combined are read from the question and answer database 21. Next, these question texts and answer texts are subjected to syntactic analysis, semantic analysis, thematic analysis, and the like in preprocessing A 1101, and are analyzed into the form shown in the analysis result column 1102. Rule A 1103, which determines whether or not combination is to be performed, is then applied to the content shown in the analysis result column 1102. Furthermore, rule B 1104, which performs the combination, is applied to the items that rule A has determined are to be combined. Table 1105 shows an example of the result of combination by rule B 1104.

Syntactic analysis analyzes the structure of the question texts and answer texts by morphological analysis or the like, yielding results such as "恐れ入る ⇒ verb" and "が ⇒ particle". Semantic analysis grammatically analyzes the semantic attributes on the basis of the syntactic analysis result and determines meaning, as in "でしょうか ⇒ auxiliary verb: question". Using the interrogative word table 23 at this stage makes it possible to predict the unknown part that is needed for combination, as in "どの ⇒ adnominal: unknown part: WHAT". Using the synonym table 24 allows words that are to be treated as synonymous in the analysis to be recognized, as in "スマホ ⇒ スマートフォン". Furthermore, using the affirmative/negative word table 25 makes it possible to predict whether the reply to a YES/NO question is affirmative or negative, as in "はい ⇒ affirmative". Thematic analysis is the means defined in Patent Document 2 for identifying the important parts of a sentence, and analyzes sentences as in "携帯 ⇒ noun MAIN: WHAT" and "使う ⇒ verb MAIN: HOW".

Returning to FIG. 8, in step S105 the processor 31 checks whether the record in the question and answer database 21 whose ID corresponds to the variable i belongs to a multi-turn dialogue or was completed in a single exchange, by examining whether its NextID is "-". If the record is determined to have been completed in a single exchange (step S105: YES), the processor 31 stores "-", meaning not subject to learning data generation, in the learning data generation progress of that record, increments the variable i, and returns to step S104 (step S106). If the record is determined in step S105 to belong to a multi-turn dialogue (step S105: NO), the processing proceeds to step S107. The processor 31 thus repeats steps S105 and S106 until it finds multi-turn dialogue data.

In step S107, the processor 31 checks whether rule A applies to the dialogue texts in the question and answer database 21 from the ID corresponding to the variable i up to the ID at which the dialogue ends (the ID whose NextID is 0). If rule A is determined to apply (step S107: YES), the processing proceeds to step S108, and after step S108 returns to step S104. In step S108, "-" is stored in the learning data generation progress 507 (see FIG. 3) for ID = i, and the variable i is advanced to the ID following the one that indicates the end of the dialogue (NextID 504 in FIG. 3 is "0"). If rule A is determined in step S107 not to apply, the processing proceeds to step S109. The processor 31 thus repeats steps S105 and S107 until it finds dialogue text to which rule A applies. Rule A is a rule for judging whether a series of dialogue turns is suitable for being combined into learning question and answer data and, as shown in FIG. 10, it is applied to the result of preprocessing A 1101. Examples of rule A include: "a single answer text from the operator contains two or more unknown parts"; "a single answer text from the operator consists of two or more sentences"; and "the exchange with the user continues (the issue is unresolved) even though the operator's answer contains no unknown part or question element (that is, it is an answer rather than a clarifying question)".
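The first two example conditions of rule A can be expressed as a simple predicate over the analysis results. The sketch below is an assumption-laden illustration: the structure `AnalyzedAnswer`, its fields, and the function `matches_rule_a` are hypothetical, the analysis results are supplied by hand rather than produced by preprocessing A, and the third example condition (an answer without a clarifying question followed by further exchange) is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnalyzedAnswer:
    sentences: List[str]   # operator answer split into sentences (assumed given)
    unknown_parts: int     # number of "unknown part" markers found by analysis


def matches_rule_a(answers: List[AnalyzedAnswer]) -> bool:
    """True if any operator answer in the dialogue meets an example condition:
    two or more unknown parts, or two or more sentences in a single answer."""
    return any(a.unknown_parts >= 2 or len(a.sentences) >= 2 for a in answers)


# Hypothetical dialogues: the first has a single clarifying question,
# the second has an operator answer made of two sentences.
simple_dialogue = [AnalyzedAnswer(["Which device do you use?"], 1)]
complex_dialogue = [AnalyzedAnswer(["Which device?", "Which plan are you on?"], 2)]
```

In the flow described for step S107, a dialogue for which the check answers YES is marked "-" in the learning data generation progress 507 and processing moves on to the next dialogue.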

Proceeding to FIG. 9, the processor 31 next checks whether, among the question texts in the question and answer database 21 from the ID corresponding to the variable i up to the ID at which the dialogue ends (the ID whose NextID is 0), there is any text whose analysis result from preprocessing A contains an "unknown part" or a "question" (step S109). If such a text is determined to exist (step S109: YES), the processor 31 stores "-", meaning not subject to learning data generation, in the learning data generation progress of the record whose ID corresponds to the point at which the question was asked, and stores that ID (step S110). If it is determined in step S109 that no text contains an "unknown part" or a "question" (step S109: NO), the processing proceeds to step S111. Steps S109 and S110 make it possible to branch on the case where the user asks a question at a point other than the start of the dialogue.

Next, the processor 31 combines, in accordance with rule B, the answer text of the ID corresponding to the variable i with the question text of that ID's NextID (step S111). Rule B is the rule by which a series of dialogue turns is combined into learning question and answer data and, as shown in FIG. 10, it is applied to the result of preprocessing A 1101. In FIG. 10, the analyzed question texts and answer texts shown in the analysis result column 1102 are combined by rule B into the form shown in table 1105. For example, when the "HOW type" and the "unknown part type" in the analysis result are "verb" and "WHAT", respectively, the texts are combined as in "携帯は～を使う" ("the mobile phone uses ..."), as indicated in the "combination rule" column. Defining a rule for each combination of analysis result types in this way makes it possible to combine sentences with appropriate syntax.

Next, the processor 31 advances the variable i to the ID that was combined (step S112). It then checks whether the NextID 504 for the variable i is "0", which means the end of the dialogue (step S113). If the dialogue is determined to have ended (step S113: YES), the processing proceeds to step S114; if not (step S113: NO), the processing returns to step S111. Steps S111 and S112 are thus repeated until the series of dialogue turns is complete.

Next, the processor 31 takes the question text at the start of the dialogue together with the combined texts generated in steps S111 to S113 as the learning question data, takes the answer data at the end of the dialogue as the learning answer data, and stores them in the learning question and answer database 22 (step S114). The processor 31 then checks whether an ID equal to the next value, i + 1, exists (step S115). If it exists (step S115: YES), the processing returns to step S105; if it is determined not to exist (step S115: NO), the processing shown in FIGS. 8 and 9 ends. In this way, learning question and answer texts can be created for data that has not yet undergone the data generation processing. In this embodiment, the processing of FIGS. 8 and 9 has been described as batch processing performed at a timing not linked to input operations by the questioner. However, the processing is not limited to batch processing and may instead be configured to run in real time, executing the processing of FIGS. 8 and 9 when a series of input operations by the questioner and the operator's answers have been completed.
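The assembly of one learning record from a multi-turn dialogue can be sketched as follows. This is an illustration under assumptions: the function name `build_training_pair` is hypothetical, the dialogue is given as an ordered list of (question, answer) pairs, and the `combine` argument is a stand-in for rule B, whose actual combination rules are defined per analysis result type in FIG. 10 and are not reproduced here.

```python
def build_training_pair(chain, combine):
    """chain: list of (question, answer) tuples in dialogue order.

    Returns (learning question text, learning answer text): the first question
    followed by each operator answer combined with the user's next utterance,
    and the final answer of the dialogue.
    """
    parts = [chain[0][0]]                              # question at dialogue start
    for (_, answer), (next_question, _) in zip(chain, chain[1:]):
        parts.append(combine(answer, next_question))   # stand-in for rule B
    return " ".join(parts), chain[-1][1]               # final answer of dialogue


# Hypothetical dialogue and a placeholder combiner that merely brackets the
# operator's clarifying question and the user's reply.
chain = [
    ("Which plan is cheapest?", "Which device do you use?"),
    ("A smartphone.", "Plan A is the cheapest."),
]
pair = build_training_pair(chain, lambda a, q: f"[{a} -> {q}]")
```

A real combiner would rewrite the clarifying question and its reply into a single well-formed sentence; the bracketed form above only shows where that combined text is placed.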

FIG. 11 is an explanatory diagram showing an example of the question and answer display screen displayed on the operator terminal 3. The question and answer display screen 1200 shown in FIG. 11 displays an in-dialogue past question text display field 1201, an answer text display field 1202, a new question text display field 1203, a send button 1204, and an answer button 1205. The in-dialogue past question text display field 1201 displays the question texts of the current dialogue. The answer text display field 1202 displays the answer texts given by the operator. The new question text display field 1203 displays the new question text. The example shows only three display fields for question and answer texts, but as the number of texts grows, a correspondingly larger number of display fields is shown (if they cannot all be displayed on the display screen 1200 at once, the screen can be scrolled).

When the operator of the operator terminal 3 presses (for example, clicks) inside the in-dialogue past question text display field 1201 or the new question text display field 1203, the operator terminal 3 accepts input from the operator, who can then edit the question text in that field. Through this editing, the operator terminal 3 can send to the dialog answer candidate proposal system 1 a new question text to which the operator has made edits such as correcting typographical errors. The processor 31 then treats the edited question text as the new question text and generates answer candidate sentences from it, which allows the processor 31 to generate more suitable answer candidate sentences.

When the operator presses the send button 1204, the operator terminal 3 sends generation start information to the dialog answer candidate proposal system 1. The generation start information contains the question texts in the in-dialogue past question text display field 1201, the new question text in the new question text display field 1203, and answer candidate sentence generation start information, which instructs the dialog answer candidate proposal system 1 to generate answer candidate sentences. The send button 1204 is displayed as an icon, and "pressing" here means selecting the icon; if the operator terminal 3 has a touch input panel, "pressing the send button 1204" is synonymous with "touching the icon". The question texts contained in the generation start information are those in the in-dialogue past question text display field 1201 and the new question text display field 1203 at the moment the operator presses the send button 1204. Therefore, if the operator edits the question text in either field before pressing the send button 1204, the edited question text is included in the generation start information.

When the operator presses the answer button 1205, the screen displayed on the operator terminal 3 switches from the question and answer display screen to a screen on which the operator enters an answer text for the new question text. Any known input method may be used for that screen, so its description is omitted here.

When the processor 31 receives the generation start information from the operator terminal 3 via the network I/F 36 (that is, when the new question text included in the generation start information is input), it starts the answer candidate sentence generation process executed by the answer candidate sentence generation unit 12. The answer text produced through the answer candidate sentence generation process is sent by the question answer execution program 28 (see FIG. 2) to the user terminal 2 used by the questioner, after which further questions from the questioner are accepted.

Next, the answer candidate sentence generation process is described with reference to FIG. 12. FIG. 12 is a flowchart showing an example of the answer candidate sentence generation process of the dialog answer candidate proposal system 1; the process is executed by the answer candidate sentence generation program 27 (see FIG. 2) when the operator is about to answer a question from the questioner. First, the processor 31 stores the in-dialogue past question texts and the new question text (step S201). Next, the processor 31 generates a sentence vector for the text obtained by joining the in-dialogue past question texts and the new question text, using a model generated by the similar sentence extraction method (step S202).
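Steps S201 and S202 can be pictured with the short sketch below. The source does not specify the model behind the similar sentence extraction method, so a plain word-count vector is used here purely as a stand-in; the function names are hypothetical.

```python
import re
from collections import Counter


def build_query_text(past_questions, new_question):
    """Join the in-dialogue past questions and the new question (input to S202)."""
    return " ".join([*past_questions, new_question])


def sentence_vector(text):
    """Stand-in sentence vector: counts of lowercase word tokens.
    A real system would call its trained similar-sentence model here."""
    return Counter(re.findall(r"\w+", text.lower()))


past = ["My printer is offline.", "I restarted the printer."]
query_text = build_query_text(past, "The printer still shows an error.")
query_vector = sentence_vector(query_text)
```

Because the earlier questions are joined in, terms such as "printer" that recur across the dialogue carry more weight in the vector than they would from the new question alone.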

Next, the processor 31 uses the question and answer database 21 and the learning question and answer database 22 to calculate similarities by the similar sentence extraction method, extracts past question texts similar to the question information, and stores the extracted past question texts and the past answer texts associated with them as answer candidate sentences (step S203). Here, texts in the question and answer database 21 whose learning data generation progress is "〇" are the pre-combination source texts of the texts stored in the learning question and answer database 22, so they are excluded from the similarity calculation. At this point, the user ratings stored in the database may also be used, for example by raising the similarity of entries rated "good".
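The scoring in step S203 might look like the following sketch. Cosine similarity over word counts, the record layout, and the boost factor of 1.1 are all assumptions for illustration; the source only says that a similarity is calculated, that entries with progress "〇" are skipped, and that a "good" rating may raise the similarity.

```python
import math
import re
from collections import Counter

EXCLUDED_PROGRESS = "〇"  # already merged into the learning database 22
GOOD_BOOST = 1.1          # hypothetical boost for entries rated "good"


def vectorize(text):
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score_records(query, records):
    """Return (similarity, record) pairs, skipping excluded records."""
    q = vectorize(query)
    scored = []
    for rec in records:
        if rec.get("progress") == EXCLUDED_PROGRESS:
            continue  # pre-combination text; its merged form is scored instead
        sim = cosine(q, vectorize(rec["question"]))
        if rec.get("rating") == "good":
            sim = min(1.0, sim * GOOD_BOOST)  # cap so scores stay comparable
        scored.append((sim, rec))
    return scored


records = [
    {"question": "printer shows error", "answer": "Reinstall the driver.", "progress": EXCLUDED_PROGRESS},
    {"question": "printer shows error", "answer": "Check the cable.", "progress": "×", "rating": "good"},
    {"question": "invoice is missing", "answer": "Resend the invoice.", "progress": None},
]
scored = score_records("printer shows error", records)
```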

Next, from the similarities calculated in step S203, the processor 31 calculates a total similarity for the dialogue texts in the question and answer database 21 whose learning data generation progress is "－" or "×" (step S204). This yields a single index covering the whole dialogue for dialogue data that has not been combined into learning question and answer data. One example of calculating the total similarity is to weight the sentences among the question texts in the dialogue that contain an "unknown part", a "question", or a "request", and then sum the similarities. This can be expected to raise the similarity of similar questions that match the user's intent.
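The weighted sum in step S204 can be sketched as follows. The tag names, the weight of 2.0, and the per-sentence record layout are hypothetical; the source gives the weighting idea but no concrete values or data format.

```python
from collections import defaultdict

PROGRESS_DASH = "－"   # progress values that are summed per dialogue in S204
PROGRESS_CROSS = "×"
# Hypothetical weights for sentences tagged as unknown part, question, or request.
TAG_WEIGHTS = {"unknown_part": 2.0, "question": 2.0, "request": 2.0}


def total_similarity(sentences):
    """Sum weighted per-sentence similarities into one score per dialogue."""
    totals = defaultdict(float)
    for s in sentences:
        if s["progress"] not in (PROGRESS_DASH, PROGRESS_CROSS):
            continue  # only dialogues not yet combined into learning data
        weight = max((TAG_WEIGHTS.get(tag, 1.0) for tag in s["tags"]), default=1.0)
        totals[s["dialogue_id"]] += weight * s["similarity"]
    return dict(totals)


sentences = [
    {"dialogue_id": "d1", "progress": PROGRESS_DASH, "similarity": 0.5, "tags": []},
    {"dialogue_id": "d1", "progress": PROGRESS_DASH, "similarity": 0.25, "tags": ["question"]},
    {"dialogue_id": "d2", "progress": "〇", "similarity": 0.9, "tags": []},
]
totals = total_similarity(sentences)
```

In this sample, dialogue d1 scores 0.5 + 2.0 × 0.25 = 1.0, while d2 is skipped because its progress value is outside the two handled in this step.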

Next, using the similarities calculated in steps S203 and S204, the processor 31 extracts the past question texts ranked highest by similarity, up to a predetermined proportion (for example, 20%) or a predetermined number (for example, 3). It then extracts the past answer texts associated with the extracted past question texts from the question and answer database 21 and stores the extracted past question texts and answer texts as answer candidate sentences (step S205). When a total similarity ranks among the highest, either the whole corresponding dialogue or only the question text with the highest similarity in that dialogue and its corresponding answer text may be used as the answer candidate sentences. When the similarity of learning question data ranks among the highest, either the learning question data and the corresponding learning answer data or the original dialogue data may be used as the answer candidate sentences. Furthermore, to avoid displaying near-identical answer candidate sentences, the similarity between top-ranked answer texts may be calculated and one of them suppressed when it exceeds a predetermined value. The processor 31 then outputs the answer candidate sentences and the answer candidate sentence display screen information to the output device (network I/F 36), which sends them to the operator terminal 3.
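The selection and de-duplication described for step S205 can be sketched as below. The Jaccard token overlap used to compare answers and the threshold of 0.8 are assumptions; the source leaves both the answer-to-answer similarity measure and the cut-off unspecified.

```python
import math
import re


def select_top(scored, count=None, ratio=None):
    """Keep the highest-scoring items: a fixed count, or a proportion of the list.
    One of count or ratio must be given."""
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    if count is None:
        count = math.ceil(len(ranked) * ratio)
    return ranked[:count]


def jaccard(a, b):
    """Token-set overlap, used here as a stand-in answer similarity."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def drop_near_duplicates(candidates, threshold=0.8):
    """Keep a candidate only if its answer differs enough from those already kept."""
    kept = []
    for score, record in candidates:
        if all(jaccard(record["answer"], k["answer"]) <= threshold for _, k in kept):
            kept.append((score, record))
    return kept


scored = [
    (0.9, {"question": "q1", "answer": "Restart the printer."}),
    (0.8, {"question": "q2", "answer": "Restart the printer."}),
    (0.7, {"question": "q3", "answer": "Check the cable."}),
    (0.2, {"question": "q4", "answer": "Contact support."}),
    (0.1, {"question": "q5", "answer": "Update the driver."}),
]
top3 = select_top(scored, count=3)
top20 = select_top(scored, ratio=0.2)
shown = drop_near_duplicates(top3)
```

Because candidates are processed in rank order, the lower-ranked of two near-identical answers is the one suppressed.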

The processor 31 waits for a predetermined time (step S206) and then checks whether a question text has been input by the same user (step S207). If one has been input (step S207: YES), the process returns to step S201; if not (step S207: NO), the process ends. In this way, the processor 31 repeats the process and proposes answer candidate sentences for as long as the dialogue continues.
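The loop formed by steps S206 and S207 can be outlined as follows. The callback names and the convention that `None` means "no further question" are assumptions for this sketch; `propose` stands for steps S201 to S205 as a whole.

```python
import time
from collections import deque


def run_dialogue(first_question, next_question, propose, wait_seconds=1.0):
    """Repeat candidate generation while the same user keeps asking questions."""
    history, proposals = [], []
    question = first_question
    while question is not None:
        proposals.append(propose(list(history), question))  # steps S201 to S205
        history.append(question)      # becomes an in-dialogue past question
        time.sleep(wait_seconds)      # step S206: wait a predetermined time
        question = next_question()    # step S207: None corresponds to NO
    return proposals


pending = deque(["It still fails."])
result = run_dialogue(
    "My printer is offline.",
    next_question=lambda: pending.popleft() if pending else None,
    propose=lambda history, question: (len(history), question),
    wait_seconds=0,
)
```

Each pass hands the accumulated history to `propose`, which is how earlier questions in the same dialogue come to influence later candidates.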

FIG. 13 is an explanatory diagram showing an example of the answer candidate sentence display screen displayed on the operator terminal 3. The answer candidate sentence display screen 1400 shown in FIG. 13 includes answer candidate sentence (past question text) fields 1401, 1402, 1403, and 1404 and answer candidate sentence (past answer text) fields 1405, 1406, 1407, and 1408. In this example a single dialogue is displayed as the answer candidate sentences, but, as described with reference to FIG. 12, the display format is not limited to this.

Because the answer candidate sentences are displayed on the answer candidate sentence display screen of the operator terminal 3, the operator can write an answer text for the new question text with reference to them, and can therefore produce the answer text more easily. This can also reduce the energy the operator needs to produce the answer text and the resulting carbon dioxide emissions, and can thereby help suppress global warming.

In this embodiment, the dialog answer candidate proposal system 1 generates answer candidate sentences based not only on the new question text but also on the past question texts stored in the question and answer database 21 and the learning question and answer database 22. Compared with generating answer candidate sentences from the new question text alone, this allows the dialog answer candidate proposal system 1 to generate and output suitable answer candidate sentences that follow the intent of the new question more closely.

In addition, the answer candidate sentences are generated using the past question texts and past answer texts stored in the question and answer database 21 and the learning question and answer database 22. This allows the dialog answer candidate proposal system 1 to generate answer candidate sentences more easily.

The dialog answer candidate proposal system 1 also outputs the answer candidate sentences to the network I/F 36 (transmission/reception device) and causes the network I/F 36 to send them to the operator terminal 3 via the network 5. The operator of the operator terminal 3 can thus easily read the answer candidate sentences.

The present invention has been described above based on an embodiment, but it is not limited to the example described and can be modified in various ways without departing from its spirit. For example, the embodiment above was described in detail to make the invention easy to understand, and the invention is not necessarily limited to configurations that include every element described. Other configurations may also be added to, deleted from, or substituted for part of the configuration of the embodiment.

1 Dialog answer candidate proposal system
2 User terminal
3 Operator terminal
5 Network
11 Learning question and answer data generation unit
12 Answer candidate sentence generation unit
13 Question answer execution unit
21 Question and answer database
22 Learning question and answer database
23 Interrogative word table
24 Synonym table
25 Affirmative/negative word table
26 Learning question and answer data generation program
27 Answer candidate sentence generation program
28 Question answer execution program
31 Processor
32 Main storage device
33 Secondary storage device
34 Input device
35 Output device
36 Network interface
37 Data bus

Claims

A dialog answer candidate proposal system that has a processor, a storage device, and a transmitting/receiving device that communicates with a user terminal used by a questioner and an operator terminal used by an answerer, and that generates answer candidate sentences, which are candidates for answer texts to question texts from the questioner, and displays them on the operator terminal, the system comprising:
a question and answer database in which past question texts input by the questioner are associated with past answer texts to those past question texts and recorded together with dialogue time series information;
a learning question and answer database that records learning question sentences and learning answer sentences generated based on information stored in the question and answer database;
an answer candidate sentence generation unit that, when a new question text is input by the questioner, extracts past question texts or learning question sentences similar to the new question text from the question and answer database or the learning question and answer database; and
a learning question and answer data generation unit that generates learning question and answer data using the past answer texts associated with the extracted past question texts or learning question sentences that are similar to the new question text.

The dialog answer candidate proposal system according to claim 1, wherein the learning question and answer data generation unit calculates the degree of similarity with the new question text for each of the past question texts and learning question sentences stored in the question and answer database and the learning question and answer database, calculates the total similarity of each dialogue, and generates the learning question and answer data based on the similarity with the new question text or the total similarity of each dialogue.

The dialog answer candidate proposal system according to claim 2, wherein the storage device stores an interrogative word table that stores interrogative words expressing questions, and the learning question and answer data generation unit extracts the interrogative words stored in the interrogative word table from the past question texts and answer texts using sentence analysis and generates the learning question sentences from the result of the sentence analysis based on rules generated in advance.

The dialog answer candidate proposal system according to claim 2, wherein the storage device stores a synonym table that stores words treated as synonymous when analyzing the question text, and the learning question and answer database is generated by extracting the words stored in the synonym table from the past question texts and answer texts using sentence analysis and applying rules generated in advance to the result of the sentence analysis.

The dialog answer candidate proposal system according to claim 4, wherein each of the answer candidate sentences stored in the learning question and answer database is generated using a plurality of the past question texts having similar contents.

The dialog answer candidate proposal system according to claim 2, wherein the transmitting/receiving device includes a network interface that allows the dialog answer candidate proposal system to be connected to a network, and the processor receives input from the user terminal and the operator terminal and outputs to the user terminal and the operator terminal via the network interface.

The dialog answer candidate proposal system according to claim 6, wherein, when the new question text from the questioner is input, the answer candidate sentence generation unit searches the learning question and answer database, and the processor causes the operator terminal to display questions similar to the new question text and answer candidates for those similar questions.

The dialog answer candidate proposal system according to claim 7, wherein the processor constructs the learning question and answer database by executing, in real time or in batch processing, a learning question and answer data generation program that associates past question texts stored in the question and answer database with the past answer texts for those past question texts.

A dialog answer candidate proposal method in a dialog answer candidate proposal system that has a processor, a storage device, a transmitting/receiving device that communicates with a user terminal used by a questioner and an operator terminal used by an answerer, a question and answer database that stores past question texts from the questioner in association with past answer texts to those past question texts, and a learning question and answer database that records learning question sentences and learning answer sentences generated based on information stored in the question and answer database, the method comprising:
creating the learning question and answer database using past question texts or past answer texts associated with learning question sentences;
when a new question text is input by the questioner, calculating the degree of similarity with the new question text for each of the past question texts and the learning question sentences, and calculating the total similarity of each dialogue; and
extracting, from the question and answer database or the learning question and answer database, the past question texts similar to the new question text or the past answer texts associated with the learning question sentences, and adding them to the learning question and answer database as answer candidate sentences.

The dialog answer candidate proposal method according to claim 9, wherein the storage device stores an interrogative word table that stores interrogative words expressing questions, and the processor extracts the interrogative words stored in the interrogative word table from the past question texts and answer texts using sentence analysis and creates the answer candidate sentences from the result of the sentence analysis based on rules generated in advance.

The dialog answer candidate proposal method according to claim 10, wherein the storage device stores a synonym table that stores words treated as synonymous when analyzing the question text, and the processor extracts the words stored in the synonym table from the past question texts and answer texts using sentence analysis, based on the result of the sentence analysis and rules generated in advance.

The dialog answer candidate proposal method according to claim 11, wherein the transmitting/receiving device includes a network interface that allows the dialog answer candidate proposal system to be connected to a network, and the processor receives input from the user terminal and the operator terminal and outputs to the user terminal and the operator terminal via the network interface.

The dialog answer candidate proposal method according to claim 11, wherein, when the new question text from the questioner is input, the processor searches the learning question and answer database for similar questions and displays the similar questions found by the search, together with answer candidates for those similar questions, on the operator terminal.

The dialog answer candidate proposal method according to claim 13, wherein the processor performs in advance, in a batch process, a learning question and answer data generation process that associates past question texts stored in the question and answer database with the past answer texts for those past question texts, so that the learning question and answer database is available for use when the new question text from the questioner is input.