JP7052438B2

JP7052438B2 - Training data generation method, training data generation program and data structure

Info

Publication number: JP7052438B2
Application number: JP2018043606A
Authority: JP
Inventors: 拓哉牧野; 智哉野呂
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2018-03-09
Filing date: 2018-03-09
Publication date: 2022-04-12
Anticipated expiration: 2038-03-09
Also published as: JP2019159613A

Description

Embodiments of the present invention relate to a training data generation method, a training data generation program, and a data structure.

Conventionally, companies that provide products and services prepare pairs of frequently asked questions and their answers as FAQ collections. Such an FAQ collection serves as a knowledge source for call center operators answering customer inquiries. It is also provided on the Web as a search target so that customers can look up answers directly.

A search over the FAQ collection is performed based on the natural-language sentences or vocabulary entered as an inquiry. However, when the vocabulary in the FAQ collection differs from the vocabulary the user uses when searching, the search becomes difficult and search accuracy drops. One known way of coping with this drop in accuracy is to build a model by machine learning from pairs of a question in the FAQ collection and the answer corresponding to that question.

Japanese Unexamined Patent Publication No. 2017-228272

With the above conventional technique, however, it is difficult to handle the diversity of vocabulary that users employ in inquiries, for example immediately after a call center starts operation or for a newly created FAQ collection. As a result, it can be difficult to provide sufficient search accuracy.

In one aspect, an object is to provide a training data generation method, a training data generation program, and a data structure that make it possible to improve search accuracy.

In a first proposal, a training data generation method generates training data for a searcher that searches a first case collection, which is a set of first cases, and a computer executes an acquiring process, a learning process, and a generating process. The acquiring process acquires a second case from a second case collection, which is a set of second cases each including a question and at least one answer to that question. The learning process trains a question generator so that, for an input to the question generator based on the question and answer included in the acquired second case, the word string of the virtual question output by the question generator corresponds to the word string of the question. The generating process generates training data that associates a virtual question, generated by giving the trained question generator an input based on a first case, with the first case that was input.

According to one embodiment of the present invention, search accuracy can be improved.

FIG. 1 is a block diagram showing a functional configuration example of the system according to the embodiment.
FIG. 2 is a flowchart showing an operation example of the system according to the embodiment.
FIG. 3 is a flowchart showing an example of the training data generation process according to the embodiment.
FIG. 4 is an explanatory diagram illustrating an operation example of the question generator during training.
FIG. 5 is an explanatory diagram illustrating an operation example of the question generator during question generation.
FIG. 6 is an explanatory diagram illustrating a specific example of a conventional answer search.
FIG. 7 is an explanatory diagram illustrating a specific example of an answer search by the system according to the embodiment.
FIG. 8 is an explanatory diagram showing an example of a computer that executes a program.

Hereinafter, a training data generation method, a training data generation program, and a data structure according to embodiments will be described with reference to the drawings. Configurations having the same function in the embodiments are given the same reference numerals, and duplicate description is omitted. The training data generation method, training data generation program, and data structure described in the following embodiments are merely examples and do not limit the embodiments. The following embodiments may also be combined as appropriate to the extent that they do not contradict each other.

FIG. 1 is a block diagram showing a functional configuration example of the system according to the embodiment. As shown in FIG. 1, the training data generation device 1 generates training data 4 used when building, by machine learning, a model for the answer searcher 21 of the search device 2. The search device 2 searches a case collection, which is a set of cases to be searched, stored in the search target QA collection DB 3. The answer searcher 21 is thus an example of a searcher.

The search target QA collection DB 3 is a database that stores a case collection, that is, a set of cases (question (x_j), answer (y_j)) to be searched by users such as call center operators. For example, the search target QA collection DB 3 stores a plurality of cases, each a pair of a frequently asked question and its answer.

The search device 2 has an interface unit 20 and an answer searcher 21. The interface unit 20 accepts input from a user via an input device 102 such as a keyboard (see FIG. 8) and outputs the processing result for that input to an output device such as a monitor 103 (see FIG. 8). For example, the interface unit 20 takes as input a question 6 that a call center operator has received from a customer, and outputs the search result 7 that the answer searcher 21 produces for the question 6. The user (operator) can then pass the answer obtained as the search result 7 on to the customer.

The answer searcher 21 uses a search model of the search target QA collection DB 3, built by machine learning with the training data 4 as teacher data, to retrieve from the search target QA collection DB 3 the answer corresponding to the question 6 input to the search device 2, and outputs it as the search result 7. As the search model in the answer searcher 21, for example, a neural network can be used in which units modeled on neurons of the brain are connected hierarchically from an input layer through intermediate layers to an output layer.

During training, an input based on the training data 4 (for example, a question (x_j, x')) is given to the input layer of the answer searcher 21, and an output value indicating the computation result is output from the output layer. The parameters with which the answer searcher 21 outputs an answer are then learned based on a comparison between the correct-answer information (answer (y_j)) in the training data 4 and the output value. More specifically, the answer searcher 21 learns the parameters of its neural network by, for example, the error backpropagation (BP) method using the result of comparing the output value with the correct-answer information.

At search time (when the search model is applied), the answer searcher 21 outputs, as the search result 7, an answer determined by the learned parameters for the question 6 given to the input layer.

The training data generation device 1 has an acquisition unit 10, a learning unit 11, a question generator 12, and a generation unit 13.

The acquisition unit 10 acquires cases from a case collection (online QA collection) stored in the online QA collection DB 5, which is a set of cases each including a question (q_i) and at least one answer (a_i) to that question.

The online QA collection stored in the online QA collection DB 5 is separate from the search target QA collection DB 3. It is, for example, information from a sharing site (knowledge community) such as an electronic bulletin board on which knowledge is shared via a communication network such as the Internet. Each case in this online QA collection consists of a question (q_i) posted via the communication network and at least one answer (a_i) posted in response to that question.

The acquisition unit 10 sequentially reads the cases (1, ..., n) stored in the online QA collection DB 5 and acquires the question (q_i) and the answer (a_i) of each case.

The cases stored in the online QA collection DB 5 may be classified into a plurality of categories (for example, PC-related, home-appliance-related, and so on). When the cases are classified in this way, the acquisition unit 10 may acquire the cases in the category that the search target QA collection DB 3 relates to.

For example, when the cases in the search target QA collection DB 3 are PC-related, the acquisition unit 10 acquires the cases in the PC-related category of the online QA collection DB 5. Which category of the online QA collection DB 5 the cases in the search target QA collection DB 3 fall under may be decided by the user and set in advance, or may be determined by analyzing the cases in the search target QA collection DB 3 with natural language processing.

Each answer (a_i) included in a case may also carry evaluation information for that answer. Examples of evaluation information are ratings given by the questioner, such as a "like" for an answer the questioner judged to be good or "best answer" for the answer judged to be the best.

The acquisition unit 10 may acquire, from among the answers (a_i) included in a case, an answer whose evaluation information satisfies a predetermined condition. For example, the acquisition unit 10 acquires the question (q_i) together with the answer rated "best answer" among the plurality of answers (a_i) to that question.
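The acquisition step described above (filter by category, then keep the answer whose evaluation meets a condition) can be sketched roughly as follows. This is only an illustration: the `Case` and `Answer` classes, the `acquire` function, the rating labels, and the sample data are all hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    text: str
    rating: str = ""  # hypothetical label: "best_answer", "like", or ""


@dataclass
class Case:
    category: str
    question: str
    answers: list = field(default_factory=list)


def acquire(cases, target_category, required_rating="best_answer"):
    """Return (question, answer) pairs for cases in the target category
    that have an answer whose rating meets the condition."""
    pairs = []
    for case in cases:
        if case.category != target_category:
            continue
        for answer in case.answers:
            if answer.rating == required_rating:
                pairs.append((case.question, answer.text))
                break  # this sketch keeps at most one answer per question
    return pairs


# Made-up stand-in for the online QA collection
online_qa = [
    Case("pc", "My browser shows no pages", [
        Answer("Buy a new PC."),
        Answer("Restart your router.", "best_answer"),
    ]),
    Case("appliance", "My fridge is noisy", [
        Answer("Check that it stands level.", "best_answer"),
    ]),
    Case("pc", "Screen is dark", [Answer("Try the brightness key.", "like")]),
]

pairs = acquire(online_qa, "pc")
```

Here `pairs` holds only the PC-related question that has a "best answer". Passing a different `required_rating` would model a different predetermined condition.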

The learning unit 11 uses the cases acquired by the acquisition unit 10 as teacher data and trains the question generator 12, which generates a virtual question for an input based on a case, within the sequence-to-sequence framework used, for example, in the field of machine translation.

For the sequence-to-sequence framework, see, for example: Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pp. 3104-3112, 2014.

Specifically, the learning unit 11 gives the question generator 12 an input based on the question (q_i) and the answer (a_i) included in a case acquired by the acquisition unit 10. The learning unit 11 then learns the parameters with which the question generator 12 generates the word string of a virtual question, so that the word string of the virtual question output for that input corresponds to the word string of the question (q_i).

More specifically, the learning unit 11 takes as teacher data D the pairs of the word string of a question (q_i), y = <y_1, ..., y_M>, and the word string of its answer (a_i), x = <x_1, ..., x_N>, from the cases acquired by the acquisition unit 10. In training the question generator 12, the learning unit 11 updates the parameter (φ) of the question generator 12 so as to minimize the negative log-likelihood in the following equation (1).
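Equation (1) itself is not reproduced in this text. As an assumption based on the surrounding description and the standard sequence-to-sequence objective, a negative log-likelihood over the teacher data D would typically take the following form, where P is the probability that the question generator with parameter φ assigns to the question word string y given the answer word string x:

```latex
L(\phi) = -\sum_{(x, y) \in D} \log P(y \mid x; \phi)
        = -\sum_{(x, y) \in D} \sum_{m=1}^{M} \log P(y_m \mid y_1, \ldots, y_{m-1}, x; \phi)
```

The second form uses the usual word-by-word factorization of a sequence model. The exact notation in the original equation may differ.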

The question generator 12 uses the question generation model built through training by the learning unit 11 to generate and output a virtual question corresponding to a case input by the generation unit 13. A neural network similar to that of the answer searcher 21 can be used for the question generator 12.

The generation unit 13 generates the training data 4 using virtual questions that the question generator 12 generates from the cases (question (x_j), answer (y_j)) included in the search target QA collection DB 3. Specifically, the generation unit 13 gives the question generator 12 trained by the learning unit 11 an input based on a case (question (x_j), answer (y_j)) included in the search target QA collection DB 3 and obtains a virtual question (x') corresponding to that case. The generation unit 13 then generates training data 4 that associates the case (question (x_j), answer (y_j)) input to the question generator 12 with the virtual question (x') generated from it.

FIG. 2 is a flowchart showing an operation example of the system according to the embodiment. As shown in FIG. 2, the learning unit 11 of the training data generation device 1 trains the question generator 12 using cases acquired from the online QA collection DB 5 as teacher data (S1).

Next, the generation unit 13 of the training data generation device 1 uses the trained question generator 12 to generate a question (virtual question) from the answer of each target case included in the search target QA collection DB 3 (S2). The generation unit 13 thereby generates training data 4 that associates each generated virtual question with its target case.

The training data generation process that generates the training data 4 in S1 and S2 is now described in detail. FIG. 3 is a flowchart showing an example of the training data generation process according to the embodiment.

As shown in FIG. 3, when processing starts, the training data generation device 1 begins a loop (S10 to S15) over the cases to be read among those stored in the online QA collection DB 5 (1, ..., n: for i = 1 to n). The cases to be read are, for example, the cases in the category that the search target QA collection DB 3 relates to.

When the loop starts, the acquisition unit 10 acquires a question (q_i) and an answer (a_i) from the online QA collection DB 5 (S11). For the answer (a_i), for example, the one rated "best answer" is acquired from among the plurality of answers.

Next, the learning unit 11 inputs, for example, the answer (a_i) to the input layer of the question generator 12 and generates a question (q) from the answer (a_i) based on the parameter (φ) of the question generator 12 (S12).

Next, the learning unit 11 computes a loss (the error relative to the correct question) from the correct question (q_i) and the generated question (q) (S13).

Next, the learning unit 11 updates the parameter (φ) of the question generator 12 so that the question generator 12 can generate questions close to the correct question (q_i) (S14).

FIG. 4 is an explanatory diagram illustrating an operation example of the question generator 12 during training. As shown in FIG. 4, during training the answer (a_i) is input to the question generator 12. The word string output from the question generator 12 is then compared with the word string of the correct question (q_i), and the parameter (φ) is updated so as to minimize the negative log-likelihood in equation (1).
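To make the loss in S13 concrete, the following toy sketch computes a negative log-likelihood of a reference question given per-position word probabilities. It assumes the generator exposes a probability distribution over words at each output position. The dictionaries below are made-up stand-ins for the output of a neural decoder, and the function name is hypothetical.

```python
import math


def negative_log_likelihood(step_distributions, reference_words, eps=1e-12):
    """Sum of -log P(word) over the reference question's words.

    step_distributions[m] is a dict mapping candidate words to the
    probability assigned at output position m (a toy stand-in for a decoder).
    """
    if len(step_distributions) != len(reference_words):
        raise ValueError("need one distribution per reference word")
    total = 0.0
    for dist, word in zip(step_distributions, reference_words):
        # eps avoids log(0) when the reference word got no probability mass
        total -= math.log(max(dist.get(word, 0.0), eps))
    return total


reference = ["cannot", "open", "website"]

# Generator output before and after some (imagined) parameter updates
poor = [{"cannot": 0.5, "how": 0.5},
        {"open": 0.5, "buy": 0.5},
        {"website": 0.5, "router": 0.5}]
good = [{"cannot": 0.9, "how": 0.1},
        {"open": 0.9, "buy": 0.1},
        {"website": 0.9, "router": 0.1}]

loss_poor = negative_log_likelihood(poor, reference)
loss_good = negative_log_likelihood(good, reference)
```

The loss shrinks as the generator puts more probability on the words of the correct question, which is the direction in which S14 moves the parameter (φ).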

By repeating the above loop (S10 to S15) for every case to be read, the training data generation device 1 obtains the parameter (φ) of the question generator 12 (S16).

Next, the training data generation device 1 begins a loop (S17 to S20) over the cases stored in the search target QA collection DB 3 (1, ..., k: for j = 1 to k).

When the loop starts, the generation unit 13 inputs, for example, an answer (y_j) from the search target QA collection DB 3 to the input layer of the question generator 12 and generates a virtual question (x') from that answer (y_j) based on the parameter (φ) of the question generator 12 (S18).

FIG. 5 is an explanatory diagram illustrating an operation example of the question generator 12 during question generation. As shown in FIG. 5, during question generation an answer (y_j) is input to the question generator 12. For this input, the question generator 12 outputs the word string generated based on the parameter (φ) as a virtual question (x').

Next, the generation unit 13 adds the generated virtual question (x') to the training data 4, associated as a query whose correct result is the FAQ case (question (x_j), answer (y_j)) of the search target QA collection DB 3 (S19).

By repeating the above loop (S17 to S20) for every FAQ case in the search target QA collection DB 3, the training data generation device 1 generates training data 4 for learning the FAQ cases of the search target QA collection DB 3 and outputs the generated training data 4 (S21).
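The generation loop (S17 to S21) can be sketched as follows. The record layout (`query` and `case` keys) and the function names are assumptions for illustration. In particular, `toy_question_generator` is a hypothetical placeholder that merely wraps the answer text, where the patent uses a trained sequence-to-sequence model.

```python
def generate_training_data(faq_cases, question_generator):
    """Pair each FAQ case with a virtual question generated from its answer."""
    training_data = []
    for question, answer in faq_cases:                  # for j = 1 to k (S17 to S20)
        virtual_question = question_generator(answer)   # S18
        training_data.append({                          # S19
            "query": virtual_question,   # x'
            "case": (question, answer),  # correct result (x_j, y_j)
        })
    return training_data                                # S21


def toy_question_generator(answer):
    """Hypothetical stand-in for the trained question generator 12."""
    return "When should I do this: " + answer


# Made-up stand-in for the search target QA collection
faq = [
    ("The website cannot be browsed",
     "Please restart network devices such as the router"),
    ("The printer does not print",
     "Check the ink cartridge and the paper tray"),
]

training_data = generate_training_data(faq, toy_question_generator)
```

Each record ties one generated query to the FAQ case that should be retrieved for it, which is the association the answer searcher is later trained on.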

Returning to FIG. 2, the search device 2 trains the answer searcher 21 with the training data 4 that uses the generated virtual questions (x') (S3).

For example, when the answer searcher 21 searches the FAQ cases of the search target QA collection DB 3, it returns the FAQ case with the highest score given by the search model, as shown in the following equation (2).

In equation (2), S is the set of FAQ cases to be searched, and F_θ(q, d) is the score of FAQ case (d) for query (q) when the parameters of the search model are θ.
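Equation (2) is not reproduced in this text. From the description it presumably selects the case d in S that maximizes F_θ(q, d). A minimal sketch of that selection follows, with a word-overlap count standing in for the learned score. The scoring function, the function names, and the sample data are assumptions for illustration, not the patent's search model.

```python
def overlap_score(query, case):
    """Toy F(q, d): number of distinct query words that appear in the case."""
    question, answer = case
    case_words = set((question + " " + answer).lower().split())
    return sum(1 for word in set(query.lower().split()) if word in case_words)


def retrieve(query, cases, score=overlap_score):
    """Return the case d in S that maximizes score(query, d)."""
    return max(cases, key=lambda case: score(query, case))


# Made-up stand-in for the set S of FAQ cases
faq = [
    ("The website cannot be browsed",
     "Please restart network devices such as the router"),
    ("The printer does not print",
     "Check the ink cartridge and the paper tray"),
]

best = retrieve("printer prints nothing", faq)
```

With a purely lexical score like this one, a query that shares no words with any case cannot be matched reliably. That is the vocabulary gap the generated virtual questions are meant to narrow once a learned score replaces the toy one.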

In training the answer searcher 21, given a set of pairs of a query and its correct answer, R = {(q^(1), d^(1)), ..., (q^(n), d^(n))}, the θ that minimizes the following equation (3) is sought.

Here, S(q^(i)) is the set of FAQ cases to be searched for query q^(i). Equation (3) expresses the retrieval error, and its value is 0 if the correct answer can be returned for every question.
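Equation (3) is likewise not reproduced in this text. One form that matches the stated property (the value is 0 when every query retrieves its correct case) is a count of retrieval errors over R. This is purely an assumption for illustration. The original equation may well be a different loss, for example a margin-based one.

```latex
E(\theta) = \sum_{i=1}^{n} \mathbf{1}\!\left[ \operatorname*{arg\,max}_{d \in S(q^{(i)})} F_{\theta}\!\left(q^{(i)}, d\right) \neq d^{(i)} \right]
```

Here the indicator is 1 when the top-scoring case for query q^(i) is not its correct case d^(i), and 0 otherwise.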

Next, the search device 2 accepts a question 6 from the user through the interface unit 20 (S4). The search device 2 uses the answer searcher 21 to search for the FAQ case corresponding to the accepted question 6 (S5). The search device 2 then outputs the search result 7 obtained by the search through the interface unit 20.

As described above, the training data generation device 1 generates training data 4 for the answer searcher 21 in the search device 2, which searches the cases included in the search target QA collection DB 3. The acquisition unit 10 of the training data generation device 1 acquires cases from the online QA collection DB 5, which stores cases each including a question and at least one answer to that question. The learning unit 11 of the training data generation device 1 trains the question generator 12 so that, for an input to the question generator 12 based on the question and answer included in an acquired case, the word string of the virtual question output by the question generator 12 corresponds to the word string of the acquired question. The generation unit 13 of the training data generation device 1 generates training data 4 that associates a virtual question, generated by giving the trained question generator 12 an input based on a case included in the search target QA collection DB 3, with the case that was input.

In this way, the training data generation device 1 generates, as training data 4 for training the answer searcher 21, data that associates the cases of the search target QA collection DB 3 with virtual questions corresponding to those cases. The virtual questions are generated by a question generator 12 trained on the online QA collection DB 5, which is separate from the search target QA collection DB 3. Using the training data 4 to train the answer searcher 21 therefore lets the answer searcher 21 handle the diversity of vocabulary that users employ in inquiries. The training data 4 also makes it possible to improve the search accuracy of the answer searcher 21 even when, for example, the search target QA collection DB 3 is new and the call center has only just started operation.

FIG. 6 is an explanatory diagram illustrating a specific example of a conventional answer search. As shown in FIG. 6, a conventional answer searcher 30 is trained using the question-answer pairs in the search target QA collection DB 3 directly as teacher data. Consequently, for a question 6 such as "can't see the HP (homepage)" that contains vocabulary not found in the search target QA collection DB 3, the answer searcher 30 yields a search result 7 such as "no hits".

FIG. 7 is an explanatory diagram illustrating a specific example of an answer search by the system according to the embodiment. As shown in FIG. 7, in the present embodiment the question generator 12 trained on the online QA collection DB 5 generates virtual questions corresponding to the cases included in the search target QA collection DB 3. The cases included in the search target QA collection DB 3, together with the generated virtual questions, are then used as the training data for the answer searcher 21. The virtual questions generated by the question generator 12 may include vocabulary (for example, "the HP cannot be browsed") that corresponds even to a question 6 such as "can't see the HP". In the present embodiment, therefore, the answer searcher 21 can return the applicable answer, "Please restart network devices such as the router.", as the search result 7 for the question 6.

The acquisition unit 10 also acquires, from among the plurality of categories in the case collection of the online QA collection DB 5, the cases in the category that the case collection of the search target QA collection DB 3 relates to. The training data generation device 1 can thereby acquire from the online QA collection DB 5 cases related to the case collection of the search target QA collection DB 3 and use them to train the question generator 12. Because the training data generation device 1 can thus have the question generator 12 generate virtual questions related to the case collection of the search target QA collection DB 3, the search accuracy of the answer searcher 21 can be improved.

The acquisition unit 10 also acquires a question included in a case of the online QA collection DB 5 together with, among the plurality of answers to that question, an answer whose evaluation information satisfies a predetermined condition. The training data generation device 1 can therefore train the question generator 12 using answers whose evaluation information satisfies the predetermined condition, for example highly rated answers, which makes it possible to improve the search accuracy of the answer searcher 21.

The case collection stored in the online QA collection DB 5 comes from a sharing site on which knowledge is shared via a communication network, and the cases that the acquisition unit 10 acquires from the online QA collection DB 5 are questions posted to the sharing site and at least one answer posted to each question. The training data generation device 1 can thereby train the question generator 12 on the content posted to the sharing site and reflect the diverse vocabulary used by the site's users in the virtual questions. Using training data 4 that includes the virtual questions to train the answer searcher 21 therefore lets the answer searcher 21 handle the diversity of vocabulary that users employ in inquiries.

The components of each illustrated device need not be physically configured as shown. That is, the specific form of distribution and integration of each device is not limited to the one illustrated, and all or part of each device may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

All or any part of the various processing functions performed by the learning data generation device 1 and the search device 2 may be executed on a CPU (or a microcomputer such as an MPU or an MCU (Micro Controller Unit)). Needless to say, all or any part of the various processing functions may also be executed on a program analyzed and executed by the CPU (or a microcomputer such as an MPU or MCU), or on hardware using wired logic. The various processing functions performed by the learning data generation device 1 and the search device 2 may also be executed by a plurality of computers cooperating through cloud computing.

The various processes described in the above embodiment can be realized by executing a program prepared in advance on a computer. An example of a computer (hardware) that executes a program having the same functions as the above embodiment is therefore described below. FIG. 8 is an explanatory diagram showing an example of a computer that executes the program.

As shown in FIG. 8, the computer 100 includes a CPU 101 that executes various arithmetic processes, an input device 102 that accepts data input, a monitor 103, and a speaker 104. The computer 100 also includes a medium reading device 105 that reads programs and the like from a storage medium, an interface device 106 for connecting to various devices, and a communication device 107 for communicating with external devices by wire or wirelessly. The computer 100 further includes a RAM 108 that temporarily stores various information and a hard disk device 109. The units (101 to 109) in the computer 100 are connected to a bus 110.

The hard disk device 109 stores a program 111 for executing the various processes of the functional units described in the above embodiment, such as the acquisition unit 10, the learning unit 11, the question generator 12, the generation unit 13, the interface unit 20, and the answer searcher 21. The hard disk device 109 also stores various data 112 referenced by the program 111, such as the search target QA collection DB 3 and the learning data 4. The input device 102 accepts, for example, operation information entered by an operator of the computer 100. The monitor 103 displays, for example, various screens operated by the operator. The interface device 106 is connected to, for example, a printing device. The communication device 107 is connected to a communication network such as a LAN (Local Area Network) and exchanges various information with external devices via the communication network.

The CPU 101 reads the program 111 stored in the hard disk device 109, loads it into the RAM 108, and executes it, thereby performing the various processes of the acquisition unit 10, the learning unit 11, the question generator 12, the generation unit 13, the interface unit 20, the answer searcher 21, and the like. The program 111 need not be stored in the hard disk device 109. For example, the computer 100 may read and execute the program 111 stored in a storage medium readable by the computer 100. Such storage media include, for example, portable recording media such as CD-ROMs, DVD discs, and USB (Universal Serial Bus) memory, semiconductor memory such as flash memory, and hard disk drives. Alternatively, the program 111 may be stored in a device connected to a public line, the Internet, a LAN, or the like, and the computer 100 may read the program 111 from that device and execute it.

The following appendices are further disclosed with respect to the above embodiment.

(Appendix 1) A learning data generation method for generating learning data for a searcher that searches a first case collection, which is a set of first cases, wherein a computer executes a process comprising:
acquiring a second case from a second case collection, which is a set of second cases each including a question and at least one answer to the question;
training a question generator so that, for an input to the question generator based on the question and answer included in the acquired second case, the word string of the virtual question output by the question generator corresponds to the word string of the question; and
generating learning data in which a virtual question, generated by providing the trained question generator with an input based on the first case, is associated with the input first case.

(Appendix 2) The learning data generation method according to Appendix 1, wherein the acquiring acquires, from among a plurality of categories in the second case collection, the second case belonging to a category related to the first case collection.

(Appendix 3) The learning data generation method according to Appendix 1 or 2, wherein
the second case has evaluation information indicating an evaluation of each of a plurality of answers to the question, and
the acquiring acquires the question included in the second case and, from among the plurality of answers to the question, an answer whose evaluation information satisfies a predetermined condition.

(Appendix 4) The learning data generation method according to any one of Appendices 1 to 3, wherein the second case collection is a sharing site on which knowledge is shared via a communication network, and the second case is a question posted on the sharing site and at least one answer posted in response to the question.

(Appendix 5) A learning data generation program for generating learning data for a searcher that searches a first case collection, which is a set of first cases, the program causing a computer to execute a process comprising:
acquiring a second case from a second case collection, which is a set of second cases each including a question and at least one answer to the question;
training a question generator so that, for an input to the question generator based on the question and answer included in the acquired second case, the word string of the virtual question output by the question generator corresponds to the word string of the question; and
generating learning data in which a virtual question, generated by providing the trained question generator with an input based on the first case, is associated with the input first case.

(Appendix 6) The learning data generation program according to Appendix 5, wherein the acquiring acquires, from among a plurality of categories in the second case collection, the second case belonging to a category related to the first case collection.

(Appendix 7) The learning data generation program according to Appendix 5 or 6, wherein
the second case has evaluation information indicating an evaluation of each of a plurality of answers to the question, and
the acquiring acquires the question included in the second case and, from among the plurality of answers to the question, an answer whose evaluation information satisfies a predetermined condition.

(Appendix 8) The learning data generation program according to any one of Appendices 5 to 7, wherein the second case collection is a sharing site on which knowledge is shared via a communication network, and the second case is a question posted on the sharing site and at least one answer posted in response to the question.

(Appendix 9) A data structure of learning data used to train a searcher that searches a first case collection, which is a set of first cases, the data structure comprising:
a virtual question generated by acquiring a second case from a second case collection, which is a set of second cases each including a question and at least one answer to the question, training a question generator so that, for an input to the question generator based on the question and answer included in the acquired second case, the word string of the virtual question output by the question generator corresponds to the word string of the question, and providing the trained question generator with an input based on the first case; and
the first case that was input,
wherein the data structure, by being input to an input layer of the searcher as learning data in which the virtual question and the first case are associated with each other, causes a computer to execute a process of outputting an output value indicating a computation result from an output layer of the searcher and performing training based on a comparison between correct answer information and the output value.
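The three steps recited in the appendices (acquiring second cases, training the question generator, and pairing generated virtual questions with first cases) can be sketched end to end as follows. This is only a toy under stated assumptions: `ToyQuestionGenerator` is a retrieval-based stand-in that returns a stored question by word overlap, not the learned generator of the embodiment, and it uses only the answer text as the generator input. All class, function, and variable names are hypothetical.

```python
from typing import List, Set, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return set(text.lower().split())


class ToyQuestionGenerator:
    """Retrieval stand-in for a learned question generator (NOT the patented model)."""

    def __init__(self) -> None:
        self.memory: List[Tuple[Set[str], str]] = []

    def train(self, second_cases: List[QAPair]) -> None:
        """Remember which question accompanied each answer in the second cases."""
        for question, answer in second_cases:
            self.memory.append((tokenize(answer), question))

    def generate(self, answer_text: str) -> str:
        """Return the stored question whose answer shares the most words with the input."""
        tokens = tokenize(answer_text)
        return max(self.memory, key=lambda item: len(item[0] & tokens))[1]


def build_learning_data(generator: ToyQuestionGenerator,
                        first_cases: List[QAPair]) -> List[Tuple[str, QAPair]]:
    """Associate each first case with a virtual question generated from it."""
    return [(generator.generate(answer), (question, answer))
            for question, answer in first_cases]


second_cases = [
    ("why is my screen black", "check the display cable and the power button"),
    ("how do i reset my password", "open the account settings page and choose reset password"),
]
first_cases = [("The display shows nothing", "Confirm that the display cable is connected")]

generator = ToyQuestionGenerator()
generator.train(second_cases)
learning_data = build_learning_data(generator, first_cases)
```

Each element of `learning_data` pairs a virtual question, phrased in the vocabulary of the second case collection, with the first case it was generated from; such pairs are what would then be used to train the searcher.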

1 … Learning data generation device
2 … Search device
3 … Search target QA collection DB
4 … Learning data
5 … Online QA collection DB
6 … Question
7 … Search result
10 … Acquisition unit
11 … Learning unit
12 … Question generator
13 … Generation unit
20 … Interface unit
21 … Answer searcher
30 … Answer searcher
100 … Computer
101 … CPU
102 … Input device
103 … Monitor
104 … Speaker
105 … Medium reading device
106 … Interface device
107 … Communication device
108 … RAM
109 … Hard disk device
110 … Bus
111 … Program
112 … Various data

Claims

A learning data generation method for generating learning data for a searcher that searches a first case collection, which is a set of first cases, wherein a computer executes a process comprising:
acquiring a second case from a second case collection, which is a set of second cases each including a question and at least one answer to the question;
training a question generator so that, for an input to the question generator based on the question and answer included in the acquired second case, the word string of the virtual question output by the question generator corresponds to the word string of the question; and
generating learning data in which a virtual question, generated by providing the trained question generator with an input based on the first case, is associated with the input first case,
wherein the acquiring acquires, from among a plurality of categories in the second case collection, the second case belonging to a category determined by analyzing the cases of the first case collection through natural language processing.

The learning data generation method according to claim 1, wherein
the second case has evaluation information indicating an evaluation of each of a plurality of answers to the question, and
the acquiring acquires the question included in the second case and, from among the plurality of answers to the question, an answer whose evaluation information satisfies a predetermined condition.

The learning data generation method according to claim 1 or 2, wherein the second case collection is a sharing site on which knowledge is shared via a communication network, and the second case is a question posted on the sharing site and at least one answer posted in response to the question.

A learning data generation program for generating learning data for a searcher that searches a first case collection, which is a set of first cases, the program causing a computer to execute a process comprising:
acquiring a second case from a second case collection, which is a set of second cases each including a question and at least one answer to the question;
training a question generator so that, for an input to the question generator based on the question and answer included in the acquired second case, the word string of the virtual question output by the question generator corresponds to the word string of the question; and
generating learning data in which a virtual question, generated by providing the trained question generator with an input based on the first case, is associated with the input first case,
wherein the acquiring acquires, from among a plurality of categories in the second case collection, the second case belonging to a category determined by analyzing the cases of the first case collection through natural language processing.