JP2020161012A

JP2020161012A - Information processing apparatus, control method and program

Info

Publication number: JP2020161012A
Application number: JP2019062106A
Authority: JP
Inventors: Itsuki Shimokooriyama (下郡山 敬己)
Original assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Priority date: 2019-03-28
Filing date: 2019-03-28
Publication date: 2020-10-01

Abstract

To provide an information processing apparatus, a control method, and a program that make it possible to select an appropriate learning model from learning models that are classified and managed, for ranking learning and the like in information retrieval. SOLUTION: An information processing apparatus comprises: reception means that receives a specification of a classification item; and selection means that selects the learning model of one category on the basis of the specified classification item. SELECTED DRAWING: Figure 1

Description

The present invention relates to a document retrieval technique for presenting, from a group of documents to be searched, documents that appear appropriate for a specified search condition.

Conventionally, to present appropriate search results to users, there are techniques that calculate, as a statistical value, the relevance between a search condition and the terms (character strings extracted according to a fixed criterion, such as morphological analysis or N-grams) contained in each document of the document group. Such techniques are called similarity search or the like. Hereinafter, in the description of the present invention, these techniques are collectively referred to as similarity search, to distinguish them from the search based on ranking learning described later.
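The statistic used by similarity search is not specified here. The following is a minimal sketch of one possible similarity search. It assumes character bigrams as the terms (one of the N-gram options mentioned above) and cosine similarity over term-frequency vectors as the relevance statistic. The function names and the sample FAQs are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Cut a string into overlapping character N-grams (whitespace removed)."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(query: str, docs: dict[str, str], n: int = 2) -> list[tuple[str, float]]:
    """Score every document against the query; return (doc_id, score), best first."""
    q = char_ngrams(query, n)
    scored = [(doc_id, cosine(q, char_ngrams(text, n))) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical FAQ texts keyed by FAQID.
faqs = {"F1": "How do I enter the license key?", "F2": "The printer shows an error"}
print(similarity_search("license key", faqs))
```

Character N-grams need no dictionary, so the same sketch also works on Japanese text without morphological analysis.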

There is also a ranking learning technique that improves the accuracy of similarity search. It works in two steps:
(1) Machine learning models the features observed when the learning data and documents in the search target group are similar.
(2) When a new search condition is specified, the ranking is adjusted based on that learning model.

Ranking learning requires a large amount of learning data, but such data is difficult to collect. One option is to collect learning data from users' search logs after the similarity search system goes into operation. However, evaluating search results places a burden on users, so a sufficient amount of log data cannot be guaranteed. Before operation begins, learning data is limited to data such as that created by developers for testing.

Patent Document 1 concerns a technique that works on a set of prepared answers (in effect, a group of FAQ documents). It finds the question (a question sentence in the learning data) most similar to a user's inquiry and returns the corresponding answer. For this technique, Patent Document 1 provides a way to improve topic estimation accuracy even when there are few question sentences.

Specifically, words appearing in a question sentence of the learning data are replaced with words from the corresponding answer. This expands the question sentences, that is, increases the number of learning data items. To exclude unnatural sentences from the expanded set, the existence probability of each question sentence is calculated using a probabilistic language model. A sentence is used as learning data only when its existence probability exceeds a certain threshold.

Patent Document 1: JP-A-2017-37588

However, the technique of Patent Document 1 uses a probabilistic language model to judge whether an expanded question sentence is appropriate, while the substituted words come only from the prepared answers. Those words may therefore include technical terms or terms specific to a particular organization. In such cases the probabilistic language model lacks examples of those terms, and the question sentences may not be expanded appropriately.

Furthermore, the technique of Patent Document 1 aims to enhance the learning effect by expanding the question sentences used as learning data. However, as the number of learning data items grows, the computation time required for learning becomes enormous, which can make the approach impractical.

An object of the present invention is to provide a technique that makes it possible to select an appropriate learning model from learning models that are classified and managed, for ranking learning and the like in information retrieval.

The present invention is an information processing apparatus that manages a plurality of learning models, each corresponding to a category determined by classification items. The apparatus comprises: reception means for receiving a specification of a classification item; and selection means for selecting the learning model of one category based on the specified classification item.

According to the present invention, it is possible to select an appropriate learning model from learning models that are classified and managed, for ranking learning and the like in information retrieval.

FIG. 1 is a diagram showing an example of a functional configuration according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of a hardware configuration applicable to the information processing apparatus 100 according to the embodiment.
FIG. 3 shows an example of a document subject to similarity search according to the embodiment.
FIG. 4 shows an example of the classification of search target documents and the number of documents according to the embodiment.
FIG. 5 shows an example of a user interface at search time according to the embodiment.
FIG. 6 shows an example of learning data according to the embodiment.
FIG. 7 shows an example of information for determining whether to generate a learning model for a category according to the embodiment.
FIG. 8 shows an example of criteria for determining whether to generate a learning model for a category according to the embodiment.
FIG. 9 is a flowchart illustrating an example of processing at learning time according to the embodiment.
FIG. 10 is a flowchart illustrating an example of the evaluation of one category at learning time according to the embodiment.
FIG. 11 is a flowchart illustrating an example of accuracy evaluation at learning time according to the embodiment.
FIG. 12 shows an example of a storage unit that stores learning models in association with categories according to the embodiment.
FIG. 13 is a flowchart illustrating an example of search processing according to the embodiment.
FIG. 14 is a flowchart illustrating an example of processing for selecting a category at search time according to the embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

In the present invention, the results of a conventional document search are re-ordered using machine learning. This is called ranking learning or the like. For convenience of explanation, two processes are named as follows:
- "Learning model generation": determining a learning model in advance.
- "Re-ranking": using the generated learning model to reassign ranks to search results obtained for actual search conditions, such as those entered by a user.

FIG. 1 is a diagram showing an example of a functional configuration according to an embodiment of the present invention. This functional configuration can be broadly divided into functions used at learning time and functions used at search time.

The learning category determination unit 101 is a functional unit that determines the document categories for which learning models are generated. It bases this on two sources: the learning data (user logs such as search conditions and selections of correct answers) stored in the learning data storage unit 121, and information about the search target documents stored in the search target document storage unit 123. The criteria for evaluating which categories are to be learned are set in the learning execution condition storage unit 122 and are referred to by the learning category determination unit 101. Information associated with each category to be evaluated is stored in the document/learning status storage unit 700.

The learning model generation unit 102 generates a learning model for each learning-target category determined by the learning category determination unit 101 and stores it in the learning model storage unit 124. During learning, the learning model generation unit 102 calls the similarity search unit 103 to search the search target document storage unit 123.

The search condition reception unit 111 is a functional unit that receives search conditions from a user of the search process via a user interface, or from another application.

The category acquisition unit 112 determines which category's learning model is to be used for re-ranking. To do so, it refers to the search condition received by the search condition reception unit 111, the learning model storage unit 124, and the document/learning status storage unit 700.

The re-ranking unit 113 passes the search condition received by the search condition reception unit 111 to the similarity search unit 103. The similarity search unit 103 then executes a similarity search on the search target document storage unit 123 based on that condition. The re-ranking unit 113 then refers to the learning model storage unit 124 and re-ranks the search results using the learning model corresponding to the determined category.

The result presentation unit 114 presents the search results via the user interface or to the application that called the search function of the present invention.

However, the category acquisition unit 112 may be unable to determine a category. In that case, no re-ranking is performed, and the similarity search results from the similarity search unit 103 are presented as they are.
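The search-time flow described above runs in three steps: similarity search, category acquisition, then either re-ranking or fallback. The following hypothetical illustration sketches that flow; it is not the embodiment's implementation. It makes three simplifying assumptions:
- A category is a frozenset of (field, value) narrowing pairs.
- A learning model is reduced to a scoring function.
- Category acquisition is an exact match of the narrowing condition against the stored models.

```python
from typing import Callable, Optional

# Hypothetical representation: a category is the frozenset of (field, value) pairs
# used as the narrowing condition; frozenset() means "no narrowing" (all documents).
Category = frozenset
# A learned model is reduced here to a scoring function (query, doc_id) -> score.
RankModel = Callable[[str, str], float]


def acquire_category(narrowing: Category, models: dict) -> Optional[Category]:
    """Return the category whose model should be used, or None if none applies.

    Simplifying assumption: only an exact match on the narrowing condition counts.
    """
    return narrowing if narrowing in models else None


def search(query: str, narrowing: Category,
           similarity_search: Callable[[str, Category], list],
           models: dict) -> list:
    results = similarity_search(query, narrowing)  # doc ids, best first
    category = acquire_category(narrowing, models)
    if category is None:
        return results  # no category determined: present similarity results as they are
    model: RankModel = models[category]
    # Re-rank the similarity search results by the category's model score.
    return sorted(results, key=lambda doc_id: model(query, doc_id), reverse=True)
```

Keeping the similarity search as an injected function mirrors the way the re-ranking unit delegates retrieval to the similarity search unit and only reorders what comes back.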

FIG. 2 is a block diagram showing an example of a hardware configuration applicable to the information processing apparatus 100 according to the embodiment of the present invention.

As shown in FIG. 2, the information processing apparatus 100 has a configuration in which the following components are connected via a system bus 204:
- a CPU (Central Processing Unit) 201
- a RAM (Random Access Memory) 202
- a ROM (Read Only Memory) 203
- an input controller 205
- a video controller 206
- a memory controller 207
- a communication I/F controller 208, and the like

The CPU 201 comprehensively controls the devices and controllers connected to the system bus 204.

The ROM 203 or the external memory 211 stores the following:
- the BIOS (Basic Input/Output System) and the OS (Operating System), which are control programs of the CPU 201;
- various programs, described later, that are required to implement the functions executed by each server or PC;
- information needed to carry out the present invention.
The external memory may be a database.

The RAM 202 functions as the main memory, work area, and the like of the CPU 201. The CPU 201 implements various operations by loading the programs needed for processing from the ROM 203 or the external memory 211 into the RAM 202 and executing them.

The input controller 205 controls input from a keyboard (KB) 209 and from a pointing device such as a mouse (not shown).

The video controller 206 controls display on a display device such as the display 210. The display device may be, for example, a liquid crystal display. These devices are used by an administrator as needed.

The memory controller 207 controls access to the external memory 211. The external memory 211 is, for example, one of the following:
- an external storage device (hard disk (HD)) that stores a boot program, various applications, font data, user files, edit files, and various other data;
- a flexible disk (FD);
- a CompactFlash (registered trademark) memory connected to a PCMCIA (Personal Computer Memory Card International Association) card slot via an adapter.

The communication I/F controller 208 connects to and communicates with external devices via a network and executes network communication control processing. For example, communication using TCP/IP (Transmission Control Protocol/Internet Protocol) is possible.

The CPU 201 can display information on the display 210 by, for example, executing outline font expansion (rasterization) into a display information area in the RAM 202. The CPU 201 also enables user instructions via a mouse cursor (not shown) or the like on the display 210.

The various programs described later for implementing the present invention are recorded in the external memory 211. They are loaded into the RAM 202 as needed and executed by the CPU 201.

FIG. 3 shows an example of a document to be searched according to the embodiment of the present invention. The example shows a collection of FAQs (Frequently Asked Questions) used for software product support. However, the documents targeted by the present invention may be anything that contains text, such as papers, newspaper articles, company regulations, or posts on social networking services; they are not limited to FAQs.

In the examples 300a and 300b, each FAQ consists of the following fields, each containing the corresponding content:
- "FAQID": uniquely identifies the FAQ.
- "Product type" and "Inquiry classification": classify the content of the FAQ as a whole.
- "Question": describes the support content and serves as a guide to which FAQ should be consulted for, for example, an inquiry from a user of the software product.
- "Answer": describes the response.
The FAQID is used to associate a query character string with a specific FAQ in the learning data described later; the field name itself is not limiting. The names and number of the other fields are likewise not limited.

FIG. 4 shows an example of the classification of search target documents and the number of documents in each class according to the embodiment. The set of documents shown in FIG. 3 is stored in the search target document storage unit 123. The documents need not be partitioned administratively within that storage unit (for example, by storing them in separate databases). It is sufficient that they can be classified by fields such as product type and inquiry classification in FIG. 3.

In this embodiment, documents are classified under two schemes, product type and inquiry classification, which yields the two-dimensional table of FIG. 4; needless to say, any number of dimensions may be used. These classifications correspond to conditions for narrowing down the documents and are referred to as categories. Each cell of the table gives the number of documents in the category narrowed down by both conditions, product type and inquiry classification. The set of documents narrowed down by only one of the two conditions is also a category.

First, classification by product type is described. The document set is divided into three groups: for individuals, for corporations, and for educational institutions. When one of these three (for example, "for individuals") is specified, summing the document counts vertically, down that column, gives the number of documents narrowed down by "product type = for individuals".

Next, classification by inquiry classification is described. The document set is divided into groups such as operating procedures, troubles, product information, ... (omitted), and license. When one of these (for example, "license") is specified, summing the document counts horizontally, across that row, gives the 200 documents narrowed down by "inquiry classification = license". Narrowing further by product type gives the following counts at the cells where the two classifications intersect:
- "for individuals": 50 documents
- "for corporations": 100 documents
- "for educational institutions": 50 documents

As in this example, the set of documents narrowed down by "product type = for individuals & inquiry classification = license" is a subset of the set narrowed down by "product type = for individuals", and also of the set narrowed down by "inquiry classification = license". When such an inclusion relationship exists, the present invention may use the following terms:
- Higher category: the larger set narrowed down by "product type = for individuals" or by "inquiry classification = license".
- Lower category: the smaller set narrowed down by both conditions, "product type = for individuals & inquiry classification = license".
The entire document set is the highest category. This two-dimensional example therefore yields three category levels; narrowing by more conditions yields more levels.

In the above description, each scheme is narrowed down by a single value, as in "inquiry classification = license". However, these values are determined not by the table of FIG. 4 but by the learning data described with reference to FIG. 6. For example, learning data may use an OR condition such as "inquiry classification = (product information OR license)". That narrowing condition then forms one category, which is a higher category of both "inquiry classification = product information" and "inquiry classification = license". In this case, needless to say, the category hierarchy can have more levels than in the three-level example above, even though the table of FIG. 4 is two-dimensional.
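The following sketch makes the category hierarchy concrete; its names and sample data are hypothetical. A category is a mapping from field to allowed values, with values OR-ed within a field and fields AND-ed. Under this representation, an OR condition such as "inquiry classification = (product information OR license)" is simply another category. The `contains` check is a sufficient, condition-by-condition test for the higher/lower relationship. It does not detect containment that holds only because of the particular documents present.

```python
# Hypothetical representation of a category (narrowing condition):
# field -> allowed values. Values within a field are OR-ed, fields are AND-ed,
# and the empty mapping is the top-level category (all documents).
Category = dict[str, frozenset[str]]


def matches(doc: dict[str, str], cat: Category) -> bool:
    """True if the document satisfies every field condition of the category."""
    return all(doc.get(field) in values for field, values in cat.items())


def count_documents(docs: list[dict[str, str]], cat: Category) -> int:
    """Number of documents in the category (as in the cells and sums of FIG. 4)."""
    return sum(matches(d, cat) for d in docs)


def contains(upper: Category, lower: Category) -> bool:
    """Sufficient check that `lower` is a subset of `upper` (upper is a higher category).

    Every field that `upper` constrains must also be constrained in `lower`,
    to a subset of the values that `upper` allows.
    """
    return all(field in lower and lower[field] <= values for field, values in upper.items())


# Hypothetical sample documents (only the classification fields are shown).
docs = [
    {"product_type": "individual", "inquiry": "license"},
    {"product_type": "corporate", "inquiry": "license"},
    {"product_type": "corporate", "inquiry": "product_info"},
    {"product_type": "education", "inquiry": "trouble"},
]
top: Category = {}
lic = {"inquiry": frozenset({"license"})}
info_or_lic = {"inquiry": frozenset({"product_info", "license"})}
ind_lic = {"product_type": frozenset({"individual"}), "inquiry": frozenset({"license"})}
```

With the sample data, the OR category contains three documents. It is a higher category of "license" (two documents), which in turn is a higher category of the two-condition category (one document). The hierarchy therefore has four levels.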

FIG. 5 shows an example of the user interface at search time according to the embodiment of the present invention. In the embodiment, this user interface is shown in order to explain how learning data is collected.

On the search condition input screen 501, the user enters a question sentence (search condition) (502). The user can also specify narrowing conditions. As described with reference to FIG. 4, narrowing conditions restrict the set of documents to be searched, which has the effect of relatively improving accuracy. In this example, the "Product type" (503) and "Inquiry classification" (506) fields each have a selection button 504; pressing it displays a selection list 505 from which the user may choose. Narrowing down is not mandatory, however. Values may also be typed on the keyboard; this example does not limit the input method. When the user presses the search button 507 after entering the search conditions, the conditions are sent to the search condition reception unit 111 of the information processing apparatus 100, and a search result list (not shown) is displayed.

When the user selects, from the search result list, a document whose details they wish to view, the document viewing screen 511 is displayed. The search conditions specified by the user are shown in fields 512 and 513.

If the document displayed on the document viewing screen 511 contains the information the user was looking for, the user can press button 515. This notifies the information processing apparatus 100 and causes the information to be stored in the learning data storage unit 121. The format of the data stored in the learning data storage unit 121 is described with reference to FIG. 6.

FIG. 6 shows in detail an example of the data format of the learning data storage unit 121; each row represents one learning data item. For example, when button 515 is pressed on the document viewing screen 511 of FIG. 5, the information in row 607 is registered.

Each learning data item consists of three parts:
- a question sentence 601: the character string entered in the actual search;
- a FAQID 602 (information identifying a document): the document the user judged correct, that is, the one containing the information sought;
- a narrowing condition 603.
The narrowing condition appears as in 604 when narrowed by product type and as in 605 when narrowed by product type and inquiry classification. It is blank, as in 606, when all documents are searched without narrowing.

Because FIG. 6 shows only learning data, the FAQID 602 always has a value. However, search logs that do not constitute learning data may also be registered in the same table with a blank FAQID. These are cases where button 515 was not pressed, that is, no correct answer was identified. Even then, category-related information can be collected, such as which narrowing conditions are specified for the information users frequently look for.
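The table in FIG. 6, together with search logs that have a blank FAQID, could be modelled as in the following sketch. The field names, the FAQ IDs, and the use of `None` for a blank FAQID are assumptions made for illustration. The sketch separates learning data from plain search logs and counts searches per narrowing condition, which is the kind of category information that can still be collected from logs without a correct answer.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRow:
    """One row of the table in FIG. 6 (field names are hypothetical)."""
    question: str                       # question sentence 601 (text entered at search time)
    faq_id: Optional[str]               # FAQID 602; None when button 515 was not pressed
    narrowing: frozenset = frozenset()  # narrowing condition 603; empty = no narrowing (606)

    @property
    def is_learning_data(self) -> bool:
        return self.faq_id is not None


def learning_data(rows: list) -> list:
    """Keep only rows that identify a correct document."""
    return [r for r in rows if r.is_learning_data]


def searches_per_category(rows: list) -> Counter:
    """Count searches per narrowing condition, including rows without a FAQID."""
    return Counter(r.narrowing for r in rows)


# Hypothetical log rows.
rows = [
    LogRow("How do I activate the license?", "FAQ-0012",
           frozenset({("product_type", "corporate")})),
    LogRow("Install fails", None, frozenset({("product_type", "corporate")})),
    LogRow("Forgot password", "FAQ-0003"),
]
```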

Returning to the document viewing screen 511: the search log may be sent to and stored in the information processing apparatus 100 regardless of whether button 515 is pressed.

These screens are merely examples. For instance, the search result list screen could let the user check several individual results, so that learning data can be specified without viewing details on the document viewing screen 511.

FIG. 7 shows an example of information used to determine whether to generate a learning model for each category according to the embodiment of the present invention. It is a table created by aggregating the contents of the learning data storage unit 121 described with reference to FIG. 6, and it is stored in the document/learning status storage unit 700.

Each category is evaluated using the figures in this table and the learning execution range 802 shown in FIG. 8. The aim is to shorten learning time by learning not all categories but only those expected to yield a high learning effect in operation. One possible approach works in three steps:
(1) Prioritize the categories.
(2) Estimate the learning time of each category from its number of learning data items.
(3) Learn only the categories that can be learned within a fixed time window (for example, the six hours from midnight to 6 a.m.).
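One way to learn only the categories that fit in a fixed window is a greedy pass over the categories in priority order. The following sketch assumes, hypothetically, that learning time is proportional to the number of learning data items at a constant `seconds_per_item`. In practice the estimate would come from measured training times.

```python
def select_within_budget(candidates, seconds_per_item, budget_seconds):
    """Greedy selection of categories to learn within a time window.

    candidates: list of (category, priority, n_learning_items); higher priority first.
    Learning time is assumed (hypothetically) to be n_items * seconds_per_item.
    A category that does not fit is skipped, and lower-priority categories that
    still fit are tried.
    """
    selected, used = [], 0.0
    # sorted() is stable even with reverse=True, so ties keep their input order.
    for category, _priority, n_items in sorted(candidates, key=lambda c: c[1], reverse=True):
        cost = n_items * seconds_per_item
        if used + cost <= budget_seconds:
            selected.append(category)
            used += cost
    return selected


SIX_HOURS = 6 * 60 * 60  # e.g. midnight to 6 a.m.
```

For example, assume a rate of 2 seconds per item and three candidates: ("A", 3, 6000), ("B", 2, 6000), and ("C", 1, 4000). A and C fit in the six-hour window. B is skipped because adding it after A would exceed the budget.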

Each row in FIG. 7 corresponds to the category shown in the leftmost column (corresponding to a narrowing condition), with the values aggregated for that category listed to the right, item by item. These values are compared with the thresholds held in the learning execution condition storage unit 122 shown in FIG. 8 to decide whether the category should be learned. The illustrated table is partly abbreviated and does not list information for all categories. The figures in the following description are therefore also calculated from items not shown in the table.

First, the category specified by "product type = for corporations" is described as an example. This category (including its lower categories) contains 1,500 documents. That falls within the range of 200 to 5,000 given as the condition in FIG. 8, so a "○" is assigned, indicating that learning is advisable. Similarly, the number of searches (the total for this category and its lower categories) is 20,000, which also meets its condition, so a "○" is assigned.
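The range check against the learning execution conditions of FIG. 8 can be expressed as a small table of inclusive ranges, as in the following sketch. Only the document-count range (200 to 5,000) is taken from the text. The search-count range is a hypothetical placeholder, and inclusive bounds are an assumption.

```python
import math

# Learning execution ranges in the style of FIG. 8: (lower, upper), inclusive.
# The document-count range comes from the text; the search-count range is a
# hypothetical placeholder because its value is not given here.
CRITERIA = {
    "documents": (200, 5_000),
    "searches": (1_000, math.inf),  # hypothetical
}


def evaluate(stats: dict, criteria=CRITERIA) -> dict:
    """Assign "○" to each item whose aggregated value lies within its range."""
    return {item: ("○" if low <= stats[item] <= high else "")
            for item, (low, high) in criteria.items()}


corporate = {"documents": 1_500, "searches": 20_000}  # "product type = for corporations"
print(evaluate(corporate))
```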

The search-count coverage rate is the proportion of the searches in the search log that would be executed without a learned model if no learning model were generated for this category. Even if this category itself has not been learned, a search narrowed further to a lower category (for example, 704) can still be performed with high accuracy, because that lower category has its own learning model. However, a search with "product type = for corporations" that is not narrowed to a lower category has no learning model and is therefore less accurate. If the proportion of such searches exceeds the "search-count coverage rate" in FIG. 8 (30%), the category should be learned even if all of its lower categories have been learned, and a "○" is assigned. The amount of learning data and the learning-data coverage rate are judged in the same way.
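The following sketch computes the search-count coverage rate for a category and its direct lower categories. It rests on assumptions not stated in the text:
- Searches narrowed to a lower category that has its own model are treated as covered.
- All other searches in the category count as uncovered.
- Only one level of lower categories is considered.
The numbers in the example are hypothetical, chosen only to total 20,000 searches.

```python
def search_coverage_rate(own_searches: int, lower_searches: dict,
                         lower_has_model: dict) -> float:
    """Share of a category's searches that would run without any learning model
    if the category itself were not learned.

    own_searches: searches narrowed to exactly this category (no further narrowing).
    lower_searches: number of searches narrowed to each lower category.
    lower_has_model: whether each lower category has its own learning model.
    """
    total = own_searches + sum(lower_searches.values())
    uncovered = own_searches + sum(
        n for cat, n in lower_searches.items() if not lower_has_model[cat])
    return uncovered / total if total else 0.0


COVERAGE_THRESHOLD = 0.30  # "search-count coverage rate" in FIG. 8

# Hypothetical split of 20,000 searches; both lower categories are already learned.
rate = search_coverage_rate(7_000, {"A": 8_000, "B": 5_000}, {"A": True, "B": True})
mark = "○" if rate > COVERAGE_THRESHOLD else ""
```

Here 7,000 of the 20,000 searches would run without a model, a rate of 0.35. That exceeds the 30% threshold, so the category receives a "○" even though both lower categories are learned.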

Finally, "accuracy (average rank)" is based on the position in the search result list at which the information the searching user wanted was displayed. For example, when the user presses the button 515 in FIG. 5, the rank of that document may be recorded in a column (not shown) of the learning data storage unit 121 in FIG. 6. Whether to perform new learning is then decided for each learning model according to how strongly the effect of learning has shown up in search result rankings since the previous learning.

For example, categories 703 to 705, for which the correct answer already appears within the top 20 without learning, can have their learning priority lowered and are therefore not given a "○". On the other hand, category 701 averages 24th place without learning; if learning improves this by 13 places on average, a learning effect is considered to exist and a "○" is given.

A "-" in this column means the category was not learned the previous time, so there is no post-learning accuracy evaluation for it. How to handle this case is a design matter with various possible implementations; for example, the "unlearned" rank and the "improvement from learning" can be evaluated separately, giving 0, 1, or 2 "○" marks (three levels).
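A minimal sketch of the two-part accuracy evaluation, giving 0 to 2 marks. The rank threshold of 20 follows the 703 to 705 example above; the minimum improvement of 5 places is a hypothetical parameter, and treating "-" (no previous learning) as `None` that earns no improvement mark is an assumption of this sketch.

```python
def accuracy_marks(unlearned_rank, improvement, rank_threshold=20, min_improvement=5):
    """Return 0, 1 or 2 "○" marks for the accuracy (average rank) item.

    unlearned_rank: average rank of the correct answer without learning
    improvement:    average number of places gained by learning, or None
                    when the column shows "-" (not learned last time)
    rank_threshold, min_improvement: hypothetical thresholds
    """
    marks = 0
    if unlearned_rank > rank_threshold:
        marks += 1  # correct answers rank poorly without a model
    if improvement is not None and improvement >= min_improvement:
        marks += 1  # learning visibly moved the correct answer up
    return marks


print(accuracy_marks(24, 13))  # 2, like row 701
print(accuracy_marks(15, None))  # 0, already in the top 20 and never learned
```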

Alternatively, a certain proportion of the latest learning data may be used as provisional training data and the rest as evaluation data: a model is trained on the provisional training data, and the evaluation data is used to check whether learning actually has an effect. However, since the purpose of the present invention is to make learning more efficient, learning solely for evaluation runs counter to that aim. Accuracy evaluation may therefore be performed only when the evaluation results other than accuracy leave several categories with the same priority and only some of them can be selected.

Another example of deciding whether to learn is the category corresponding to "Inquiry classification = License" in row 702. For this category, the search-count coverage rate is 0%. This value assumes that learning models are generated for its subcategories (703 to 705): although there are 10,000 searches, all of them correspond to subcategories, and no search is narrowed only by "Inquiry classification = License". The category is therefore not given a "○", since it need not be learned. However, the subcategories 703 to 705 may, for example, contain few documents and rank well even without learning, so they may receive no "○" in the other evaluations and end up not being learned. In that case, the coverage rate of "Inquiry classification = License" in row 702 must be evaluated again: if none of the subcategories is learned, its coverage rate becomes 100%, a "○" is given in this column, and the priority must be re-evaluated. This re-evaluation of priority is decided in step S908 of the flowchart in FIG. 9. In other words, if the values in FIG. 7 used for the evaluation change, the evaluation may be performed again.

Although FIG. 8 expresses the document count, the number of searches, the training data count, and so on as absolute numbers, they may instead be ratios of the overall total. For example, immediately after operation starts, when there are only 1,000 training records in total, no category can satisfy an absolute training-data condition; in that case ratios may be used instead. Counts and ratios may also be used together, so that a category is not learned at all unless it also meets an absolute minimum, even if it accounts for a large share of the data.
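The three ways of applying a count-type condition described above (absolute range, ratio of the total, or both together) can be sketched as one check. The function name, the `mode` switch, and the sample values are assumptions for illustration.

```python
def meets_volume_condition(count, total, lo, hi, min_ratio, mode="both"):
    """Check a count-type item (documents, searches, training data) against FIG. 8.

    mode "absolute": count must lie within [lo, hi]
    mode "ratio":    count / total must reach min_ratio (useful just after launch)
    mode "both":     both must hold, so a large share alone is not enough
    """
    absolute_ok = lo <= count <= hi
    ratio_ok = total > 0 and count / total >= min_ratio
    if mode == "absolute":
        return absolute_ok
    if mode == "ratio":
        return ratio_ok
    if mode == "both":
        return absolute_ok and ratio_ok
    raise ValueError(f"unknown mode: {mode}")


# Shortly after launch: 300 of 1,000 training records fall into one category.
print(meets_volume_condition(300, 1_000, 5_000, 20_000, 0.2, mode="ratio"))  # True
print(meets_volume_condition(300, 1_000, 5_000, 20_000, 0.2, mode="both"))   # False
```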

In any case, this is only one example of an embodiment of the present invention, and the judgment may combine various kinds of information. Also, although FIG. 7 shows a "○" being given whenever a condition is met, with the number of marks deciding whether to learn, each item may instead be given a weight 803, or a formula may be prepared to calculate a score. In that case, for example, a training data count of 10,000 could receive the highest score, with the score worsening as the count approaches the boundaries of the range 802 in FIG. 8 (minimum 5,000, maximum 20,000).
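A sketch of the score-based alternative: the training-data score peaks at 10,000 and falls to zero at the 5,000 and 20,000 boundaries, and per-item scores are combined with weights in the spirit of weight 803. The linear falloff and the default weight of 1.0 are assumptions; the text only says the score worsens toward the boundaries.

```python
def volume_score(n, lo=5_000, peak=10_000, hi=20_000):
    """Score in [0, 1]: 1 at `peak`, falling linearly to 0 at the range bounds."""
    if n < lo or n > hi:
        return 0.0
    if n <= peak:
        return (n - lo) / (peak - lo)
    return (hi - n) / (hi - peak)


def weighted_score(item_scores, weights):
    """Combine per-item scores with per-item weights; unlisted items weigh 1.0."""
    return sum(weights.get(item, 1.0) * score for item, score in item_scores.items())


print(volume_score(10_000), volume_score(7_500), volume_score(15_000))  # 1.0 0.5 0.5
```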

This completes the description of FIGS. 7 and 8.

FIG. 9 is an example of a flowchart explaining the processing during learning according to the embodiment of the present invention. Learning is performed for high-priority categories based on the learning data and related information. Each step of the flowchart of FIG. 9 is executed by the CPU 201 of the information processing apparatus 100.

In step S901, the learning data is read from the learning data storage unit 121. Search logs that are not learning data (those for which no correct FAQ ID is specified) may also be read, for example to count the number of searches.

In step S902, the table described for the document/learning status storage unit 700 is generated from the learning data (including search logs) read in step S901. The "number of ○" column, however, is filled in by the evaluation in the subsequent steps. The accuracy (average rank) column is also blank at this point when actual learning is to be performed as in FIG. 11, described later.

Steps S903 to S906 are repeated for every row of the table described in step S902, each row corresponding to a registered category. Since nothing has been evaluated yet, every category with at least one entry in the learning data (search log) is a target.

In step S904, one category is taken from the table as the category of interest, and in step S905 that category is evaluated (described later with reference to FIG. 10). The evaluation refers to the learning execution condition storage unit 122 described with reference to FIG. 8.

In step S907, the categories to actually learn are selected according to the number of "○" marks given to each evaluated category (or the score described with reference to FIG. 7).

In step S908, it is checked whether the category selection in step S907 has changed the information in FIG. 7. For example, if it is decided not to learn the categories corresponding to 703 to 705, the search-count coverage rate and the training-data coverage rate of category 702, their common parent, change: since none of the subcategories is learned, both coverage rates become 100%, and the learning priority of parent category 702 changes. If the priority has changed (YES), the process returns to step S907 and the learning categories are selected again. If it has not changed (NO), the process proceeds to step S909.
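The S907 to S908 loop can be sketched as a fixed-point iteration: select categories, recompute each category's coverage mark given the current selection, and reselect until the selection stops changing. Everything here is an assumption for illustration: `base_marks` holds the marks from every item except search coverage, a category is selected at 3 or more marks, coverage earns a mark above 30%, and `max_rounds` guards against oscillation. The coverage helper repeats the subtree reading used for the coverage rate so the block stays self-contained.

```python
def coverage(cat, parent, searches, modeled):
    """Share of searches in cat's subtree with no model at or below them (above cat)."""
    total = uncovered = 0
    for start, n in searches.items():
        node, covered = start, False
        while node is not None and node != cat:
            covered = covered or node in modeled
            node = parent.get(node)
        if node == cat:
            total += n
            uncovered += 0 if covered else n
    return uncovered / total if total else 0.0


def select_until_stable(base_marks, parent, searches, min_marks=3,
                        coverage_threshold=0.3, max_rounds=10):
    """Repeat S907/S908 until the selected set no longer changes."""
    categories = list(base_marks)
    selected = set(categories)  # first pass: assume every category gets a model
    for _ in range(max_rounds):
        marks = {
            c: base_marks[c]
            + (1 if coverage(c, parent, searches, selected - {c}) > coverage_threshold else 0)
            for c in categories
        }
        new_selected = {c for c in categories if marks[c] >= min_marks}
        if new_selected == selected:
            break  # S908 "NO": priorities no longer change
        selected = new_selected  # S908 "YES": back to S907
    return selected


parent = {"lic_a": "license", "lic_b": "license"}
searches = {"lic_a": 5, "lic_b": 5}
# Weak subcategories drop out, so the parent regains its coverage mark.
print(select_until_stable({"license": 2, "lic_a": 1, "lic_b": 1}, parent, searches))  # {'license'}
```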

In step S909, a learning model is generated for each selected category and the learned data is stored as described later with reference to FIG. 12.

FIG. 10 is an example of a flowchart explaining the evaluation of a single category during learning according to the embodiment of the present invention. Each step of the flowchart of FIG. 10 is executed by the CPU 201 of the information processing apparatus 100. The category to be evaluated has been specified in FIG. 9.

In step S1001, it is determined whether the overall document count in the corresponding row of FIG. 7 falls within the learning execution range 802 specified for the document count in FIG. 8. If it does, the process proceeds to step S1002 and a "○" is given. If it does not, the process proceeds to step S1003.

In steps S1003 and S1004, a "○" is given according to the judgment on the number of searches.

In steps S1005 and S1006, a "○" is given according to the judgment on the search-count coverage rate.

In steps S1007 and S1008, a "○" is given according to the judgment on the overall training data volume.

In steps S1009 and S1010, a "○" is given according to the judgment on the training-data coverage rate.

In step S1011, the learning priority of the categories is determined from the number of "○" marks given in the preceding steps (or the score described with reference to FIGS. 7 and 8).
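Steps S1001 to S1011 amount to checking each item of a FIG. 7 row against its FIG. 8 threshold, counting the marks, and ordering categories by that count. In the sketch below, the document range and the 30% search coverage follow the examples in the text, while the search-count and training-coverage thresholds, the row layout, and the sample rows are hypothetical. Bounds are inclusive for simplicity.

```python
# Hypothetical thresholds per item: (lower bound, upper bound); None = unbounded.
THRESHOLDS = {
    "documents": (200, 5_000),         # S1001-S1002 (range from the example above)
    "searches": (1_000, None),         # S1003-S1004 (hypothetical)
    "search_coverage": (0.3, None),    # S1005-S1006 (30% from the example above)
    "training_data": (5_000, 20_000),  # S1007-S1008 (range quoted for 802)
    "training_coverage": (0.3, None),  # S1009-S1010 (hypothetical)
}


def count_marks(row, thresholds=THRESHOLDS):
    """Give one "○" per item whose value lies within its range."""
    marks = 0
    for item, (lo, hi) in thresholds.items():
        value = row[item]
        if value >= lo and (hi is None or value <= hi):
            marks += 1
    return marks


def rank_categories(rows, thresholds=THRESHOLDS):
    """S1011: order categories by mark count, most first (ties keep input order)."""
    marks = {cat: count_marks(row, thresholds) for cat, row in rows.items()}
    return sorted(marks, key=lambda cat: -marks[cat]), marks


rows = {
    "corp": {"documents": 1_500, "searches": 20_000, "search_coverage": 0.4,
             "training_data": 8_000, "training_coverage": 0.5},
    "lic": {"documents": 100, "searches": 10_000, "search_coverage": 0.0,
            "training_data": 3_000, "training_coverage": 0.0},
}
print(rank_categories(rows))  # (['corp', 'lic'], {'corp': 5, 'lic': 1})
```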

In step S1012, the time needed to learn each category is estimated, specifically from the training data volume in FIG. 7. The estimate may be built into this step as a formula, or a manually prepared table of estimated times by training data volume may be consulted. Once the times are estimated, if the time allowed for learning is specified (for example, in a settings file, not shown) as the six hours from midnight to 6 a.m., then, going down from the highest-priority category, the first category at which the cumulative time exceeds six hours and all categories after it are tentatively excluded from learning.
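A minimal sketch of the time-budget cut-off in S1012, assuming a hypothetical linear estimate of a fixed number of seconds per training record. Following the text, the scan stops at the first category that would push the cumulative time past the budget, so lower-priority categories are dropped even if they would fit on their own.

```python
def fit_to_time_budget(ranked, training_volume, seconds_per_record, budget_seconds):
    """Keep categories in priority order until the cumulative estimate overflows."""
    chosen, elapsed = [], 0
    for cat in ranked:
        estimate = training_volume[cat] * seconds_per_record  # hypothetical linear estimate
        if elapsed + estimate > budget_seconds:
            break  # this category and every lower-priority one are tentatively dropped
        elapsed += estimate
        chosen.append(cat)
    return chosen


SIX_HOURS = 6 * 60 * 60  # midnight to 6 a.m.
print(fit_to_time_budget(["a", "b", "c"], {"a": 4_000, "b": 3_000, "c": 100}, 4, SIX_HOURS))  # ['a']
```

Category "c" alone would fit in the remaining time, but it is excluded because the text drops everything after the first overflow; skipping over "b" and continuing would be a different, greedier policy.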

In step S1013, it is decided whether to perform accuracy evaluation and, if so, for which categories. Depending on the number of "○" marks described above, for some categories the decision whether to learn does not change regardless of the result of accuracy evaluation. For others, accuracy evaluation may swap the learning targets tentatively decided in the previous step. In that case, it suffices to evaluate only the minimum set of categories whose outcome could be swapped. Since learning for evaluation itself takes time, that time is also taken into account when choosing the categories to learn for evaluation. If there is a category to evaluate for accuracy, the process proceeds to step S1014; otherwise, the processing of this flowchart ends and control returns to the flowchart of FIG. 9. Step S1014 is described later with reference to FIG. 11.
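One possible rule for finding "the minimum set of categories whose outcome could be swapped" in S1013, assuming accuracy can add at most 2 marks (as in the 0 to 2 scheme above) and that a fixed number of slots is available after S1012. A selected category is safe if it stays ahead even when the best excluded category gains the full bonus; an excluded category is safely out if even the full bonus cannot reach the weakest selected one. Ties are treated as undecided. The function name and rule are illustrative, not the patent's definition.

```python
def categories_needing_accuracy_check(marks, slots, max_bonus=2):
    """Return categories whose selection could flip once accuracy marks are added.

    marks:     marks per category before accuracy evaluation
    slots:     how many categories fit (e.g. from the time budget of S1012)
    max_bonus: most marks the accuracy items can add
    """
    ranked = sorted(marks, key=lambda c: -marks[c])
    selected, unselected = ranked[:slots], ranked[slots:]
    if not selected or not unselected:
        return set()  # nothing can be swapped
    best_out = max(marks[c] for c in unselected)
    worst_in = min(marks[c] for c in selected)
    undecided = {c for c in selected if marks[c] <= best_out + max_bonus}
    undecided |= {c for c in unselected if marks[c] + max_bonus >= worst_in}
    return undecided


print(sorted(categories_needing_accuracy_check({"a": 6, "b": 4, "c": 3, "d": 1}, slots=2)))  # ['b', 'c']
```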

In steps S1015 and S1016, a "○" is given according to the unlearned rank in the accuracy (average rank).

In steps S1017 and S1018, a "○" is given according to the improvement in accuracy (average rank) achieved by learning.

FIG. 11 is an example of a flowchart explaining the accuracy evaluation processing during learning according to the embodiment of the present invention. Each step of the flowchart of FIG. 11 is executed by the CPU 201 of the information processing apparatus 100.

In step S1101, it is determined whether to perform learning with the latest learning data (in the learning data storage unit 121). This determination may, for example, be a manual setting that decides in advance whether learning for evaluation is performed, or the information processing apparatus 100 may decide using a prepared decision rule. As an example of such a rule, the time required for evaluation learning may be estimated from the training data volume, and the apparatus may check whether that time, together with the actual learning time after evaluation, fits within the time available for learning (for example, the six hours mentioned above). If it is determined that learning is to be performed, the process proceeds to step S1102; otherwise, it proceeds to step S1105.

In step S1102, all of the learning data is split by some criterion into one part used to generate a learning model for evaluation and another part used to evaluate the effect of that model. For example, the data may be split randomly 50/50. The amount of training and evaluation data may also be reduced according to the time available for evaluation.
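The random 50/50 split of S1102, with optional shrinking when evaluation time is short, might look like the sketch below. The parameter names are hypothetical; `random.Random(seed)` gives a reproducible shuffle when a seed is supplied.

```python
import random


def split_for_evaluation(records, train_fraction=0.5, seed=None, max_records=None):
    """Shuffle records and split them into (training part, evaluation part)."""
    rng = random.Random(seed)
    pool = list(records)
    rng.shuffle(pool)
    if max_records is not None:
        pool = pool[:max_records]  # shrink both parts when evaluation time is short
    cut = int(len(pool) * train_fraction)
    return pool[:cut], pool[cut:]


train, evaluation = split_for_evaluation(range(10), seed=1)
print(len(train), len(evaluation))  # 5 5
```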

In step S1103, the learning data allocated to training is used to generate an evaluation learning model for the category of interest.

In step S1104, the learning data allocated to evaluation is used to actually run searches for the category of interest, and the rank of the correct answer in the results of the similarity search unit 103 and of the re-ranking unit 113 is obtained and evaluated. These results are entered in the "unlearned" and "improvement from learning" columns of the accuracy (average rank) in FIG. 7.
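The comparison in S1104 can be sketched as the average 1-based rank of the correct answer in the similarity-search results versus the re-ranked results, with the improvement being the difference. Representing each evaluation query as a pair of ranked ID lists plus the correct ID, and skipping queries whose correct answer is missing from either list, are assumptions of this sketch.

```python
def average_rank_improvement(results):
    """results: iterable of (similar_search_ids, reranked_ids, correct_id).

    Returns (unlearned average rank, learned average rank, improvement),
    or None when no query has its correct answer in both lists.
    """
    before, after = [], []
    for baseline, reranked, correct in results:
        if correct in baseline and correct in reranked:
            before.append(baseline.index(correct) + 1)  # 1-based rank
            after.append(reranked.index(correct) + 1)
    if not before:
        return None
    avg_before = sum(before) / len(before)
    avg_after = sum(after) / len(after)
    return avg_before, avg_after, avg_before - avg_after


queries = [(["a", "b", "c"], ["c", "a", "b"], "c"), (["x", "y"], ["y", "x"], "y")]
print(average_rank_improvement(queries))  # (2.5, 1.0, 1.5)
```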

In step S1105, based on the existing learning data in FIG. 7, it is assumed that the rank improvement obtained with the latest learning model will not differ greatly from that of the previous learning model, and this assumption is used as the accuracy evaluation result. That is, although accuracy is evaluated, no learning for evaluation and no search and re-ranking for evaluation are actually performed; past results are used instead. In actual operation, when a search condition is entered, the results of the similarity search unit 103 are ranked into the final results by the re-ranking unit 113 (if a learning model exists), so a single search yields both the "unlearned" and the "learned" rank. This completes the flowchart of FIG. 11, and the process returns to FIG. 10.

Although not shown in the flowchart, not all of the learning data needs to be used. For example, when the information processing apparatus 100 of the present invention has been in operation for a long time, learning data from several years ago remains. Because users' search conditions and the registered documents change over time, the embodiment of the present invention may, for example, use only learning data from within the past year. This completes the description of FIG. 11.
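Restricting learning data to the past year is a simple age filter. The sketch assumes each record is a dict with a hypothetical `timestamp` field holding a `datetime`, and approximates one year as 365 days.

```python
from datetime import datetime, timedelta


def recent_learning_data(records, now, max_age=timedelta(days=365)):
    """Keep only records whose timestamp is within `max_age` of `now`."""
    cutoff = now - max_age
    return [r for r in records if r["timestamp"] >= cutoff]


records = [{"timestamp": datetime(2018, 6, 1)}, {"timestamp": datetime(2016, 1, 1)}]
print(len(recent_learning_data(records, now=datetime(2019, 3, 28))))  # 1
```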

FIG. 12 is an example of a storage unit that stores learning models in association with categories according to the embodiment of the present invention. The learning models 1202 are used by the re-ranking unit 113 in the search processing described later; to associate them with the narrowing conditions specified in a search condition, a learned category table 1201 is also stored (possibly in a different storage unit).

In associating narrowing conditions with categories, the field names and condition values written in a narrowing condition may appear in any order. That is, the learning model of the category corresponding to "Product type = Personal & Inquiry classification = (Product information OR License)" can be associated with a search whenever the conditions are the same, regardless of the order in which "Product type" and "Inquiry classification" are specified when narrowing the search, and regardless of the order of the OR values "Product information" and "License".
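Order-independent matching can be obtained by canonicalizing a narrowing condition into a nested frozenset: values within a field are OR-ed, fields are AND-ed, and neither order matters. The key can then index the learned category table. The field and value names are hypothetical English stand-ins, and the sketch assumes each field appears at most once.

```python
def category_key(conditions):
    """Order-independent, hashable key for a set of narrowing conditions.

    conditions: dict or list of (field, values) pairs; values within a field
    are OR-ed, fields are AND-ed. Each field is assumed to appear once.
    """
    items = conditions.items() if isinstance(conditions, dict) else conditions
    return frozenset((field, frozenset(values)) for field, values in items)


a = category_key({"product_type": ["personal"], "inquiry_class": ["product_info", "license"]})
b = category_key([("inquiry_class", ["license", "product_info"]), ("product_type", ["personal"])])
learned_models = {a: "model-1"}  # stand-in for the learned category table 1201
print(learned_models[b])  # model-1
```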

This completes the description of the embodiment of the process for generating learning models in the present invention. Next, search processing using the generated learning models is described with reference to FIGS. 13 and 14.

FIG. 13 is an example of a flowchart explaining the search processing according to the embodiment of the present invention. FIG. 14 is an example of a flowchart explaining the processing for selecting a category at search time according to the embodiment of the present invention. Each step of the flowcharts of FIGS. 13 and 14 is executed by the CPU 201 of the information processing apparatus 100.

In step S1301, a search condition is accepted from the user through the user interface of FIG. 5, or from another application. As described with reference to FIG. 5, the search condition specified by the user includes a question sentence and narrowing conditions. If there are no narrowing conditions, the search may be treated as corresponding to an "All documents" category representing the entire document set.

In step S1302, the narrowing-condition part is extracted from the search condition accepted in step S1301, and the category is identified. As described with reference to FIG. 12, the order in which narrowing conditions are written is not significant and is ignored: if the narrowing effectively designates the same document group in the subsequent processing, it corresponds to the same category, so the learning model 1202 of the corresponding category in FIG. 12 can be retrieved. Even if no learning model exists, the narrowing conditions themselves correspond to a category. This category is called the "starting category", that is, the category the searching user explicitly specified as a condition.

Step S1303 determines the category to use for the search. Details are described with reference to the flowchart of FIG. 14.

In step S1401, it is determined with reference to FIG. 12 whether a learning model exists for the starting category. If one exists (YES), the process proceeds to step S1405, and re-ranking is specified to use the starting category's learning model, that is, the learning model suited to the narrowing the user specified. If none exists (NO), the process proceeds to step S1402.

In step S1402, all categories above the starting category that have a learning model are listed.

If a parent category has no learning model, the search is in the end narrowed by the narrowing conditions in the search condition, and the re-ranking unit 113 makes no ranking adjustment afterwards, so the search results are unchanged. If a parent category does have a learning model, the same document group is searched under the user's narrowing conditions and then re-ranked with the parent category's learning model; this is presumed to have a learning effect and should take priority. Therefore only the parent categories that have a learning model need to be listed.

In step S1403, among the parent categories, the one with the highest accuracy is selected with reference to the accuracy (average rank) in FIG. 7. If several categories are tied, the one with the fewest documents may be selected.

In step S1404, it is determined whether a category has been selected. For example, if in step S1402 no parent category has a learning model at all, no category is selected. Even if a parent category has a learning model, that model was generated for a document group wider than the starting category, so for the document group narrowed by the starting category's conditions it may have no learning effect and may even lower accuracy. In that case, the parent category is not selected. If no parent category is selected (NO), the process proceeds to step S1405, and the starting category is used for the search, although no learning model exists for it (this case is handled in step S1305 of FIG. 13). If a parent category is selected (YES), the process proceeds to step S1406, and it is decided to re-rank using the learning model of the selected parent category. This completes the description of the flowchart of FIG. 14, and the description returns to the point where step S1303 of FIG. 13 has completed.
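The FIG. 14 decision can be sketched as follows. Assumptions: the category tree is a child-to-parent dict, "highest accuracy" means the lowest learned average rank, and the S1404 check for a parent model that might hurt accuracy is a hypothetical `usable` predicate applied while listing candidates. The function returns the category to use and whether a model should be applied.

```python
def choose_rerank_category(start, parent, has_model, avg_rank, doc_count,
                           usable=lambda category: True):
    """Sketch of FIG. 14. Returns (category, use_model).

    avg_rank: learned average rank per category (lower = more accurate)
    usable:   hypothetical S1404 check rejecting a parent model expected
              to hurt accuracy on the narrower document set
    """
    if start in has_model:                        # S1401 YES -> S1405
        return start, True
    candidates = []                               # S1402: ancestors with models
    node = parent.get(start)
    while node is not None:
        if node in has_model and usable(node):
            candidates.append(node)
        node = parent.get(node)
    if not candidates:                            # S1404 NO -> S1405, no model
        return start, False
    # S1403: best average rank; ties broken by the fewest documents
    best = min(candidates, key=lambda c: (avg_rank[c], doc_count[c]))
    return best, True                             # S1404 YES -> S1406


parent = {"lic_a": "license", "license": "all"}
has_model = {"license", "all"}
avg_rank = {"license": 5.0, "all": 5.0}
doc_count = {"license": 300, "all": 3_000}
print(choose_rerank_category("lic_a", parent, has_model, avg_rank, doc_count))  # ('license', True)
```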

In step S1304, the similarity search unit 103 searches the search target document storage unit 123 under the narrowing conditions accepted in step S1301. Even when the learning model of a parent category is used, the similarity search unit 103 is most accurate when it narrows by the starting category, which has the most restrictive narrowing conditions, and searches among fewer documents. Even though the results are re-ranked with a learning model in a later step, a more accurate similarity search is still preferable.

In step S1305, it is determined whether a learning model exists. That is, the learning model storage unit 124 is searched for the learning model of the starting category if FIG. 14 decided to re-rank with the starting category, or for that of the parent category if it decided to re-rank with the parent category.

In step S1306, the search results obtained by the similarity search in step S1304 are re-ranked.

In step S1307, the similarity search result of step S1304, or, if re-ranking was performed, the re-ranking result of step S1306, is presented to the caller. This completes the description of the processing in the flowcharts of FIGS. 13 and 14.
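The FIG. 13 flow end to end, as a toy sketch: narrow by the user's own conditions, run a similarity search, then re-rank only if a model exists for the chosen category. Simple term-overlap scoring stands in for the similarity search unit 103, and `models` maps a category key to any callable re-ranker; both are hypothetical, as is passing the FIG. 14 decision in as `pick_category`. Empty conditions give the empty key, which plays the role of the "All documents" category.

```python
def search(question, conditions, documents, models, pick_category, top_k=10):
    """Sketch of FIG. 13 (S1301-S1307).

    documents:     list of dicts with a "text" field plus metadata fields
    conditions:    dict field -> allowed values (OR within a field, AND across)
    models:        dict category key -> callable(question, ranked_docs) -> ranked_docs
    pick_category: callable(start_key) -> (category_key, use_model), e.g. FIG. 14
    """
    start = frozenset((f, frozenset(v)) for f, v in conditions.items())  # S1302
    category, use_model = pick_category(start)                           # S1303
    # S1304: always narrow by the user's own, most restrictive conditions.
    pool = [d for d in documents
            if all(d.get(f) in set(v) for f, v in conditions.items())]
    terms = set(question.lower().split())
    scored = sorted(pool, key=lambda d: -len(terms & set(d["text"].lower().split())))
    results = scored[:top_k]
    model = models.get(category) if use_model else None                  # S1305
    if model is not None:
        results = model(question, results)                               # S1306
    return results                                                       # S1307


docs = [
    {"id": 1, "text": "license key lost", "inquiry_class": "license"},
    {"id": 2, "text": "install error", "inquiry_class": "product_info"},
    {"id": 3, "text": "license transfer", "inquiry_class": "license"},
]
conds = {"inquiry_class": ["license"]}
hits = search("lost license key", conds, docs, {}, lambda key: (key, False))
print([d["id"] for d in hits])  # [1, 3]
```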

The structure and content of the various data described above are not limited to these, and may of course take various forms depending on the application and purpose.

Although several embodiments have been described above, the present invention can be embodied as, for example, a system, an apparatus, a method, a computer program, or a recording medium. Specifically, it may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device.

The computer program of the present invention is a program that causes a computer to execute the processing methods of the flowcharts shown in FIGS. 9 to 11, 13, and 14, and the storage medium of the present invention stores such a program. The computer program of the present invention may also be a separate program for each processing method of the apparatus shown in FIGS. 9 to 11, 13, and 14.

As described above, the object of the present invention is of course also achieved when a recording medium storing a computer program that implements the functions of the above embodiment is supplied to a system or apparatus, and the computer (or CPU or MPU) of that system or apparatus reads and executes the program stored in the recording medium.

In this case, the computer program read from the recording medium itself implements the novel functions of the present invention, and the recording medium storing that program constitutes the present invention.

As the recording medium for supplying the computer program, for example, a flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, DVD-ROM, magnetic tape, non-volatile memory card, ROM, EEPROM, silicon disk, or solid-state drive can be used.

The functions of the above embodiment are realized not only when the computer executes the program it has read. Needless to say, this also includes the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the program's instructions, and that processing realizes the functions of the above embodiment.

Needless to say, this further covers the case where the computer program read from the recording medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on the function expansion board or in the function expansion unit performs part or all of the actual processing based on the instructions of the computer program code, and that processing realizes the functions of the above-described embodiment.

The present invention may be applied either to a system composed of a plurality of devices or to an apparatus consisting of a single device. It also goes without saying that the present invention is applicable where it is achieved by supplying a computer program to a system or apparatus. In this case, reading a recording medium that stores the computer program for achieving the present invention into the system or apparatus allows that system or apparatus to obtain the effects of the present invention.

Furthermore, downloading and reading the computer program for achieving the present invention from a server, database, or the like on a network by means of a communication program likewise allows the system or apparatus to obtain the effects of the present invention.

All configurations that combine the above-described embodiments and their modifications are also included in the present invention.

100 Information processing apparatus
101 Category determination unit
102 Learning model generation unit
103 Similarity search unit
111 Search condition reception unit
112 Category acquisition unit
113 Re-ranking unit
114 Result presentation unit
121 Learning data storage unit
122 Learning execution condition storage unit
123 Search target document storage unit
124 Learning model storage unit
700 Document/learning status storage unit

Claims

An information processing apparatus that manages a plurality of learning models corresponding to categories determined by classification items, the apparatus comprising:
a reception means that accepts designation of the classification item; and
a selection means that selects a learning model of one category based on the designated classification item.

The information processing apparatus according to claim 1, wherein, when a learning model exists for the category determined by the designated classification item, the selection means selects that learning model.

The information processing apparatus according to claim 1 or 2, wherein, when no learning model exists for the category determined by the designated classification item, the selection means selects another learning model that includes that category.
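The selection behavior of the three claims above can be illustrated with a minimal Python sketch. It assumes that a designated classification item maps directly to a category key, that learning models are held in a plain dictionary keyed by category, and that "another learning model that includes that category" means the model of an enclosing parent category in a hierarchy. The names `select_model`, `models`, and `parents` are hypothetical and do not come from the specification.

```python
from typing import Dict, Optional

# Hypothetical registry: category key -> learning model identifier.
# The specification does not say how models are stored; a dict stands in here.
ModelRegistry = Dict[str, str]

# Hypothetical hierarchy: category key -> enclosing (parent) category key, or None.
# Reading "a learning model that includes the category" as a parent-category
# model is an assumption of this sketch.
ParentMap = Dict[str, Optional[str]]


def select_model(category: str, models: ModelRegistry,
                 parents: ParentMap) -> Optional[str]:
    """Return the model for `category`, else the nearest enclosing category's model."""
    current: Optional[str] = category
    visited = set()  # guards against accidental cycles in the hierarchy
    while current is not None and current not in visited:
        visited.add(current)
        if current in models:
            # A model exists for this category (the claim 2 case).
            return models[current]
        # No model here: fall back to the enclosing category (the claim 3 case).
        current = parents.get(current)
    # No model anywhere up the hierarchy; the caller decides what to do.
    return None
```

For example, with models registered only for `"manual"` and `"manual/printer"`, a request for `"manual/printer/laser"` would fall back to the `"manual/printer"` model, while an unrelated category such as `"faq"` would yield `None`.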

The information processing apparatus according to claim 3, wherein the selection means selects a learning model of one category based on the number of pieces of learning data used to create the learning model of each category.

The information processing apparatus according to claim 3, further comprising an evaluation means that evaluates the learning effect of the learning model of each category,
wherein the selection means selects a learning model of one category based on the evaluated learning effect.
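When several candidate category models are available, the two selection criteria above could be sketched as follows. The `CandidateModel` fields, the rules "most learning data wins" and "highest learning effect wins", and the `min_effect` threshold are all assumptions made for illustration; the claims do not state how the learning effect is measured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateModel:
    category: str
    num_training_examples: int  # pieces of learning data used to build the model (claim 4)
    learning_effect: float      # evaluated learning effect; the metric is an assumption (claim 5)


def pick_by_training_size(candidates: List[CandidateModel]) -> Optional[CandidateModel]:
    """Prefer the candidate trained on the most learning data; None if there are none."""
    return max(candidates, key=lambda c: c.num_training_examples, default=None)


def pick_by_learning_effect(candidates: List[CandidateModel],
                            min_effect: float = 0.0) -> Optional[CandidateModel]:
    """Prefer the candidate with the highest evaluated effect above a hypothetical threshold."""
    eligible = [c for c in candidates if c.learning_effect > min_effect]
    return max(eligible, key=lambda c: c.learning_effect, default=None)
```

Returning `None` when no candidate qualifies leaves room for the fallback in which similarity-search results are output without re-ranking.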

The information processing apparatus according to any one of claims 1 to 5, wherein the information processing apparatus retrieves documents by a similarity search based on a search text, re-ranks the retrieved documents using the learning model selected by the selection means, and outputs the re-ranked documents.

The information processing apparatus according to claim 6, wherein, when no learning model of one category is selected by the selection means, the information processing apparatus outputs the documents retrieved by the similarity search.
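The search flow of the two claims above can be sketched end to end. The term-overlap scoring in `similarity_search` is only a stand-in for the statistical similarity search described in the specification, and the learning model is assumed to be any callable that scores a (search text, document) pair. All function names are hypothetical, although by name they correspond to the similarity search unit 103 and the re-ranking unit 113.

```python
from typing import Callable, List, Optional, Tuple

# Assumption: a learning model is any callable scoring a (search text, document) pair.
Scorer = Callable[[str, str], float]


def similarity_search(query: str, documents: List[str], top_k: int = 10) -> List[str]:
    """Stand-in similarity search: rank documents by how many query terms they contain."""
    terms = set(query.lower().split())
    scored: List[Tuple[int, str]] = []
    for doc in documents:
        overlap = len(terms & set(doc.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc))
    # Stable sort, so ties keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def search(query: str, documents: List[str],
           model: Optional[Scorer], top_k: int = 10) -> List[str]:
    """Similarity search, then re-rank with the selected model if there is one."""
    candidates = similarity_search(query, documents, top_k)
    if model is None:
        # No learning model selected: output the similarity-search order (claim 7).
        return candidates
    # Re-rank the retrieved documents with the selected learning model (claim 6).
    return sorted(candidates, key=lambda doc: model(query, doc), reverse=True)
```

Because Python's sort is stable even with `reverse=True`, documents that the model scores equally keep their similarity-search order.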

A control method for an information processing apparatus that manages a plurality of learning models corresponding to categories determined by classification items, the method comprising:
a reception step in which a reception means accepts designation of the classification item; and
a selection step in which a selection means selects a learning model of one category based on the designated classification item.

A program executable by an information processing apparatus that manages a plurality of learning models corresponding to categories determined by classification items, the program causing the information processing apparatus to function as:
a reception means that accepts designation of the classification item; and
a selection means that selects a learning model of one category based on the designated classification item.